inconsistent port usage about CALL vs PUSH&JMP

News and research about CPU microarchitecture and software optimization
katsu
Posts: 2
Joined: 2022-07-23, 7:43:13

Post by katsu » 2022-07-23, 8:46:28

Hi

According to the latest version of Agner Fog's instruction tables, the `CALL m` instruction uses `2p237 p4 p6` (taking Skylake as an example).
As far as I know, `CALL m` should be similar to `PUSH RIP; JMP m`. However, according to the tables,
`PUSH r` uses `p237 p4` and `JMP m` uses `p23 p6`, so the combined port usage should be `p23 p237 p4 p6`.

I have checked the CALL entry at uops.info and did not find `2p237` anywhere. Is this a small mistake in the table, or did I
miss something?

Thanks
Katsu
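
The comparison in the post above can be written out as a small tally. This is only a sketch of the arithmetic, using the Skylake port strings quoted in the post; the helper `parse_ports` is a hypothetical name introduced here for illustration and is not part of any tool.

```python
import re
from collections import Counter

def parse_ports(spec: str) -> Counter:
    """Parse a port string such as '2p237 p4 p6' into {port group: µop count}.

    Hypothetical helper: it only understands tokens of the form 'p<digits>'
    with an optional leading repeat count.
    """
    counts = Counter()
    for token in spec.split():
        match = re.fullmatch(r"(\d*)p(\d+)", token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        repeat = int(match.group(1) or 1)
        counts["p" + match.group(2)] += repeat
    return counts

# Port strings as quoted in this thread (Skylake).
PUSH_R = parse_ports("p237 p4")
JMP_M = parse_ports("p23 p6")
CALL_M = parse_ports("2p237 p4 p6")

# Expected usage if CALL m behaved exactly like PUSH followed by JMP m.
push_plus_jmp = PUSH_R + JMP_M

print("PUSH r + JMP m:", dict(push_plus_jmp))
print("CALL m (table):", dict(CALL_M))
print("total µops:", sum(push_plus_jmp.values()), "vs", sum(CALL_M.values()))
```

Both tallies contain four µops. The only difference is that the table lists a second `p237` µop for `CALL m` where the PUSH plus JMP sum has a `p23` µop.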

agner
Site Admin
Posts: 75
Joined: 2019-12-27, 18:56:25

Re: inconsistent port usage about CALL vs PUSH&JMP

Post by agner » 2022-07-23, 11:56:14

`call [m]` is similar to

    mov r, [m]
    call r

It is using the address generation units (p237) twice, once for reading m and once for storing the return address on the stack.

Call and return instructions are usually tested together to avoid messing up the return stack buffer. It may be difficult to divide the µop count between the two instructions.

Re: inconsistent port usage about CALL vs PUSH&JMP

Post by katsu » 2022-07-23, 13:49:46

agner wrote: 2022-07-23, 11:56:14

> `call [m]` is similar to
>
>     mov r, [m]
>     call r
>
> It is using the address generation units (p237) twice, once for reading m and once for storing the return address on the stack.
>
> call instructions and return instructions are usually tested together to avoid messing up the return stack buffer. It may be difficult to divide the µop count between the two instructions.

Thank you, Agner, for the clarification. I have one more point to confirm.

> It is using the address generation units (p237) twice, once for reading m and once for storing the return address on the stack.

Can port 7 be used for load address calculation? The instruction tables show that `MOV r32/r64,m` decodes into one µop
in the unfused domain, which is dispatched to p23.

The explanation of the column headings in the instruction tables also contains the following, which explicitly says that port 7 is used for
store address calculation:

> µops each port: The number of µops for each execution port. p0 means a µop to execution port 0.
> p01 means a µop that can go to either port 0 or port 1. p0 p1 means two µops going to
> port 0 and 1, respectively.
>
> Port 0: Integer, f.p. and vector ALU, mul, div, branch
> Port 1: Integer, f.p. and vector ALU
> Port 2: Load
> Port 3: Load
> Port 4: Store
> Port 5: Integer and vector ALU
> Port 6: Integer ALU, branch
> Port 7: Store address

Thanks
Katsu
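
The question above can be restated as a set check over the port roles quoted from the column headings. This is a sketch only: the names `LOAD_PORTS`, `STORE_ADDRESS_PORTS`, `ports_in` and `all_ports_can` are hypothetical, and treating ports 2 and 3 as store-address capable is an assumption inferred from the `p237` entry for the store-address µop of `PUSH r`, since the heading list labels them only as "Load".

```python
# Roles taken from the column-heading list quoted in the post.
LOAD_PORTS = {2, 3}

# Assumption: ports 2 and 3 can also generate store addresses, inferred from
# the 'p237' entry for PUSH r. The heading list itself names only port 7.
STORE_ADDRESS_PORTS = {2, 3, 7}

def ports_in(group: str) -> set[int]:
    """Expand a port group such as 'p237' into the set {2, 3, 7}."""
    if not (group.startswith("p") and group[1:].isdigit()):
        raise ValueError(f"unrecognized port group: {group!r}")
    return {int(digit) for digit in group[1:]}

def all_ports_can(group: str, capable_ports: set[int]) -> bool:
    """True if every port in the group is in the capable set."""
    return ports_in(group) <= capable_ports

# A load µop: 'p23' fits the listed roles, 'p237' does not because of port 7.
print("p23 all load-capable: ", all_ports_can("p23", LOAD_PORTS))
print("p237 all load-capable:", all_ports_can("p237", LOAD_PORTS))
print("ports in p237 that cannot load:", ports_in("p237") - LOAD_PORTS)

# A store-address µop: 'p237' fits under the stated assumption.
print("p237 all store-address capable:", all_ports_can("p237", STORE_ADDRESS_PORTS))
```

Under these listed roles, only one of the two `p237` µops of `CALL m` (the store address for the return address) fits the `p237` group, while the load of `m` would be expected on `p23`. The check only restates the question and does not settle how the table entry should be read.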
