Clarification on Intel Haswell microarchitecture pipeline

cianfa72
Posts: 4
Joined: 2021-11-03, 14:15:30

Clarification on Intel Haswell microarchitecture pipeline

Post by cianfa72 » 2021-11-03, 18:08:01

Hi Agner,
I'm a newbie on this topic, and while reading your very interesting 'The microarchitecture of Intel, AMD, and VIA CPUs' (last updated 2021-08-17) I came up with some questions.

From Figure 6.1 on page 71, as far as I can tell, the 'ROB wb' block writes back the result of each executed uop into the PRF (Permanent Register File), while the RRF writes results from the PRF to the architectural registers (e.g. EAX, EBX, etc.) only when the uop retires. In addition, 'ROB wb' forwards each result to the ROB entries of any uops waiting for it as an input operand.
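To make the difference between writeback and retirement concrete, here is a toy Python model, assuming a simplified design in which each ROB entry holds the not-yet-committed result and a separate register file (`arch_regs`, standing in for the RRF) is updated only at retirement. `ToyRob`, `RobEntry` and their methods are hypothetical names; this is a sketch of the idea, not how Intel actually builds it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RobEntry:
    dest: str                    # architectural destination register, e.g. "EAX"
    value: Optional[int] = None  # filled in at writeback
    done: bool = False           # True once the uop has executed


class ToyRob:
    """Toy reorder buffer (hypothetical, not Intel's design): results are
    written back out of order, but copied to the architectural registers
    only when the uop retires, strictly in program order."""

    def __init__(self) -> None:
        self.entries: List[RobEntry] = []
        self.head = 0                        # oldest entry not yet retired
        self.arch_regs: Dict[str, int] = {}  # stands in for the RRF

    def allocate(self, dest: str) -> int:
        self.entries.append(RobEntry(dest))
        return len(self.entries) - 1

    def writeback(self, idx: int, value: int) -> None:
        # "ROB wb": the result is stored in the entry, out of program order
        entry = self.entries[idx]
        entry.value, entry.done = value, True

    def retire(self) -> List[str]:
        # Retire completed uops from the head; stop at the first unfinished one
        retired = []
        while self.head < len(self.entries) and self.entries[self.head].done:
            entry = self.entries[self.head]
            self.arch_regs[entry.dest] = entry.value
            retired.append(entry.dest)
            self.head += 1
        return retired


rob = ToyRob()
a = rob.allocate("EAX")
b = rob.allocate("EBX")
rob.writeback(b, 2)    # the younger uop finishes first
print(rob.retire())    # [] because EAX has not executed yet
rob.writeback(a, 1)
print(rob.retire())    # ['EAX', 'EBX']
print(rob.arch_regs)   # {'EAX': 1, 'EBX': 2}
```

Note how EBX, although written back first, only reaches `arch_regs` after the older EAX uop retires.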

What about load and store memory operations? I'm aware that there are execution units dedicated to this purpose, for instance in the Intel Haswell microarchitecture:

[Image: Intel Haswell microarchitecture block diagram (U6UNu9i.png)]

As far as I can tell, Port 2 and Port 3 are attached to units that handle both loads and stores (LD/STA execution units). Port 7 is attached to an STA unit used only for stores, while Port 4 serves the STD unit, which I think writes the actual data into the store buffer entry allocated to the store uop when it entered the ROB.
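The port assignment described in this paragraph can be summarized as a small lookup table. This is a minimal Python sketch of that reading of the diagram, not an official Intel table; `HASWELL_MEM_PORTS` and `ports_for` are hypothetical names, and the non-memory ports 0, 1, 5 and 6 are left out.

```python
from typing import Dict, List, Set

# Memory-related Haswell execution ports as read from the diagram above
# (simplified; ports 0, 1, 5 and 6 are omitted).
HASWELL_MEM_PORTS: Dict[int, Set[str]] = {
    2: {"LOAD", "STA"},   # load / store-address
    3: {"LOAD", "STA"},   # load / store-address
    4: {"STD"},           # store data
    7: {"STA"},           # store-address only
}


def ports_for(uop_kind: str) -> List[int]:
    """Return, in ascending order, the ports that can execute this kind of uop."""
    return sorted(p for p, kinds in HASWELL_MEM_PORTS.items() if uop_kind in kinds)


# A store such as MOV [mem], reg is split into an STA uop and an STD uop.
for kind in ("LOAD", "STA", "STD"):
    print(kind, ports_for(kind))
# LOAD [2, 3]
# STA [2, 3, 7]
# STD [4]
```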

I'm not sure whether the above is correct, or whether the STA units are actually the AGUs (Address Generation Units) or really different units used for other purposes.

What is the role of the 'ROB wb' unit for store uops? Is its job just to feed operands to the ROB entries of uops that may be waiting for those inputs inside the ROB?

Thank you.

agner
Site Admin
Posts: 75
Joined: 2019-12-27, 18:56:25

Re: Clarification on Intel Haswell microarchitecture pipeline

Post by agner » 2021-11-04, 15:35:23

STA at port 7 calculates the address for a store. The address can be calculated before the data to store is available. The calculated address is passed back to the scheduler where it waits until STD at port 4 needs it. For example: MOV [RAX+RBX],ECX. Here RAX and RBX go to STA to calculate the address. Then the calculated address and ECX go to port 4 where the data store process is initiated. The memory control unit probably has store buffers, store forwarding logic, and logic to determine if memory operations can go out of order.
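The key point here, that the address half of a store can complete before the data half, can be shown with a minimal Python sketch. It assumes that each half simply writes its own field of a store-buffer entry; `StoreBufferEntry`, `sta` and `std` are hypothetical names, and whether the address instead travels back through the scheduler to port 4, as described above, does not change the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreBufferEntry:
    """Hypothetical store-buffer entry, allocated when the store is issued.
    The address and the data are filled in independently, by two uops."""
    address: Optional[int] = None  # written by the STA uop
    data: Optional[int] = None     # written by the STD uop

    def ready(self) -> bool:
        # The store can be committed to cache only when both halves are known
        return self.address is not None and self.data is not None


def sta(entry: StoreBufferEntry, base: int, index: int) -> None:
    # Store-address uop: needs only the address registers (RAX, RBX)
    entry.address = (base + index) & 0xFFFF_FFFF_FFFF_FFFF


def std(entry: StoreBufferEntry, value: int) -> None:
    # Store-data uop: needs only the data register (ECX), truncated to 32 bits
    entry.data = value & 0xFFFF_FFFF


# MOV [RAX+RBX], ECX with RAX, RBX available early and ECX produced later
rax, rbx = 0x1000, 0x20
entry = StoreBufferEntry()
sta(entry, rax, rbx)   # address computed before the data is ready
print(entry.ready())   # False
ecx = 42               # e.g. the result of a slow instruction arriving later
std(entry, ecx)
print(entry.ready(), hex(entry.address), entry.data)  # True 0x1020 42
```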

cianfa72
Posts: 4
Joined: 2021-11-03, 14:15:30

Re: Clarification on Intel Haswell microarchitecture pipeline

Post by cianfa72 » 2021-11-04, 16:12:01

agner wrote:
2021-11-04, 15:35:23
STA at port 7 calculates the address for a store. The address can be calculated before the data to store is available. The calculated address is passed back to the scheduler where it waits until STD at port 4 needs it. For example: MOV [RAX+RBX],ECX. Here RAX and RBX go to STA to calculate the address. Then the calculated address and ECX go to port 4 where the data store process is initiated.
So, as far as I understand, the STA unit (there are actually three STA units) is the same as the AGU (Address Generation Unit), just a different name for the same thing, right?
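Whatever the unit is called, the arithmetic behind address generation is the x86 effective-address formula base + index × scale + displacement. Here is a minimal Python sketch of that formula, ignoring segment bases; `effective_address` is a hypothetical helper, not a real API.

```python
MASK64 = (1 << 64) - 1


def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """x86-64 effective address: base + index * scale + displacement,
    wrapped to 64 bits (segment bases ignored)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64


# MOV [RAX+RBX], ECX  ->  base = RAX, index = RBX, scale = 1, disp = 0
print(hex(effective_address(base=0x1000, index=0x20)))                  # 0x1020
# MOV [RSI+RDI*8-16], EAX (negative displacement)
print(hex(effective_address(base=0x2000, index=3, scale=8, disp=-16)))  # 0x2008
```

A side note that may be worth checking against Intel's optimization manual: Haswell's port 7 AGU is generally described as handling only simple base + displacement addresses, so for an indexed store like MOV [RAX+RBX],ECX the address would likely be computed on port 2 or 3 instead.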
agner wrote:
2021-11-04, 15:35:23
The memory control unit probably has store buffers, store forwarding logic, and logic to determine if memory operations can go out of order.
From the Haswell microarchitecture picture above, I guess the memory control unit's store buffers are the Store Buffers shown in the "Load Buffers, Store Buffers, Reorder Buffers" block in the upper part...
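To illustrate what the store forwarding logic mentioned in the reply has to decide, here is a minimal Python sketch, assuming a deliberately simplified rule: a load gets its value forwarded only when a single pending store fully contains it, and any partial overlap stalls. Real Intel cores have more detailed, model-specific rules; `execute_load` and the tuple layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A pending store: (address, size in bytes, value). Oldest first; all are
# assumed to be older than the load being executed.
Store = Tuple[int, int, int]


def execute_load(stores: List[Store], memory: Dict[int, int],
                 addr: int, size: int) -> Tuple[Optional[int], str]:
    """Simplified store-to-load forwarding (hypothetical rules, not Haswell's
    exact ones): check pending stores from youngest to oldest."""
    for s_addr, s_size, s_value in reversed(stores):
        if s_addr >= addr + size or addr >= s_addr + s_size:
            continue  # no overlap with this store
        if s_addr <= addr and addr + size <= s_addr + s_size:
            # Store fully covers the load: forward the matching bytes
            shift = 8 * (addr - s_addr)
            return (s_value >> shift) & ((1 << (8 * size)) - 1), "forwarded"
        return None, "stall"  # partial overlap: wait until the store commits
    # No pending store overlaps: read little-endian bytes from memory/cache
    value = sum(memory.get(addr + i, 0) << (8 * i) for i in range(size))
    return value, "cache"


pending = [(0x1020, 4, 0x11223344)]   # e.g. MOV [RAX+RBX], ECX still in the buffer
mem = {0x2000: 0x78, 0x2001: 0x56}
for addr, size in [(0x1020, 4), (0x1020, 2), (0x1022, 4), (0x2000, 2)]:
    value, how = execute_load(pending, mem, addr, size)
    print(hex(addr), size, how, None if value is None else hex(value))
# 0x1020 4 forwarded 0x11223344
# 0x1020 2 forwarded 0x3344
# 0x1022 4 stall None
# 0x2000 2 cache 0x5678
```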
