Intel's new Chimera: Alder Lake

News and research about CPU microarchitecture and software optimization
andreas
Posts: 4
Joined: 2021-04-06, 18:25:56

Re: Intel's new Chimera: Alder Lake

Post by andreas » 2022-05-17, 15:54:54

With 8000 8-byte NOPs, the bottleneck is not the decoders but instruction cache misses. You can see this by looking at the L2_RQSTS.CODE_RD_HIT counter (24.C4):

Code:

sudo ./nanoBench.sh -f -conf configs/cfg_AlderLakeP_all.txt -cpu 0 -basic -unroll 1000 -loop 1000 -asm "|8|8|8|8|8|8|8|8" | grep -v 0.00

RDTSC: 3.01
Instructions retired: 8.00
Core cycles: 4.00
Reference cycles: 3.01
L2_RQSTS.CODE_RD_HIT: 1.00
L2_RQSTS.ALL_CODE_RD: 1.00
L2_REQUEST.ALL: 1.00
L2_REQUEST.ALL: 1.00
...
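
The counter values above can be cross-checked with some quick arithmetic. This is only a sketch: the NOP size, group size and unroll count come from the command line in the post, while the 64-byte cache line and the 32 KiB L1 instruction cache are my assumptions about the P core, not something the post states.

```python
# Figures taken from the post: 8-byte NOPs, 8 per group, -unroll 1000.
NOP_BYTES = 8
NOPS_PER_GROUP = 8
UNROLL = 1000

# Assumptions (not stated in the post): cache line size and L1I size of the P core.
CACHE_LINE_BYTES = 64
L1I_BYTES = 32 * 1024


def code_footprint_bytes(nop_bytes=NOP_BYTES, nops_per_group=NOPS_PER_GROUP, unroll=UNROLL):
    """Total size of the unrolled NOP sequence in bytes."""
    return nop_bytes * nops_per_group * unroll


def lines_per_group(nop_bytes=NOP_BYTES, nops_per_group=NOPS_PER_GROUP):
    """Number of cache lines covered by one group of NOPs."""
    return nop_bytes * nops_per_group / CACHE_LINE_BYTES


def fetch_bytes_per_cycle(instructions=8.0, cycles=4.0, nop_bytes=NOP_BYTES):
    """Code bytes consumed per core cycle, from the measured counters."""
    return instructions * nop_bytes / cycles


if __name__ == "__main__":
    footprint = code_footprint_bytes()
    print("code footprint:", footprint, "bytes")            # 64000
    print("fits in assumed L1I:", footprint <= L1I_BYTES)    # False
    print("cache lines per group:", lines_per_group())       # 1.0
    print("fetch rate:", fetch_bytes_per_cycle(), "B/cycle")  # 16.0
```

Under these assumptions the loop body is about twice the size of the L1 instruction cache, each group of eight NOPs is exactly one cache line (which matches one L2 code read per group), and 4 core cycles per group corresponds to roughly 16 bytes of code per cycle.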

agner
Site Admin
Posts: 75
Joined: 2019-12-27, 18:56:25

Re: Intel's new Chimera: Alder Lake

Post by agner » 2022-05-19, 5:41:49

We have now made some more tests on the P core after fixing the problem with the CPU overheating. The results are more stable now and basically confirm what Andreas wrote:
  • The decoders can handle up to 6 µops per clock.
  • Simple integer instructions have a maximum throughput of 5 instructions per clock.
  • Integer additions with a small immediate constant have latencies near zero.
  • Floating point addition has a latency of 2 clock cycles in chains of similar instructions, otherwise the latency is 3.
  • Cache read throughput: 3 reads per clock with sizes ≤ 256 bits, 2 reads per clock with 512 bits.
  • Cache write throughput: 2 writes per clock with sizes ≤ 256 bits, 1 write per clock with 512 bits.
  • Mixed read/write throughput: 3 reads and 1 write per clock with sizes ≤ 128 bits.
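
To put the cache throughput figures in the list above into bytes per clock, here is a small sketch. It simply multiplies the number of operations per clock by the operand size at the upper end of each range; the helper name is mine and the results are derived numbers, not separate measurements.

```python
def bytes_per_clock(ops_per_clock, operand_bits):
    """Peak bytes moved per clock for a given operation rate and operand width."""
    return ops_per_clock * operand_bits // 8


# Reads: 3 per clock up to 256 bits, 2 per clock at 512 bits.
reads_256 = bytes_per_clock(3, 256)   # 96
reads_512 = bytes_per_clock(2, 512)   # 128

# Writes: 2 per clock up to 256 bits, 1 per clock at 512 bits.
writes_256 = bytes_per_clock(2, 256)  # 64
writes_512 = bytes_per_clock(1, 512)  # 64

# Mixed: 3 reads and 1 write per clock up to 128 bits.
mixed_128 = bytes_per_clock(3, 128) + bytes_per_clock(1, 128)  # 64

if __name__ == "__main__":
    print("reads:  ", reads_256, "B/clk (256-bit),", reads_512, "B/clk (512-bit)")
    print("writes: ", writes_256, "B/clk (256-bit),", writes_512, "B/clk (512-bit)")
    print("mixed:  ", mixed_128, "B/clk (128-bit, 3 reads + 1 write)")
```

So 512-bit reads raise the peak read bandwidth from 96 to 128 bytes per clock, while the peak write bandwidth is 64 bytes per clock at either width.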

alfred0809
Posts: 1
Joined: 2022-05-22, 6:15:46

Re: Intel's new Chimera: Alder Lake

Post by alfred0809 » 2022-05-22, 6:19:37

I have tested an Alder Lake, but I have not been able to get access to a setup that makes it possible to enable the AVX512 instructions. The performance of the P cores is improved somewhat over the Intel Ice Lake. The µop cache can hold 4k µops. The µop cache can deliver a maximum of 6 µops per clock cycle for a single thread.

agner
Site Admin
Posts: 75
Joined: 2019-12-27, 18:56:25

Re: Intel's new Chimera: Alder Lake

Post by agner » 2022-05-22, 10:06:47

"The µop cache can hold 4k µops. The µop cache can deliver a maximum of 6 µops per clock cycle for a single thread."
This agrees with my measurements.

agner
Site Admin
Posts: 75
Joined: 2019-12-27, 18:56:25

P cores and E cores

Post by agner » 2022-06-28, 6:35:28

This Reddit post reports experiments on how to make sure that heavy tasks run on the P cores:
https://www.reddit.com/r/XMG_gg/comment ... ores_when/.

I still think it is unreasonable to expect ordinary computer users to attend to processor-specific performance tuning details. Intel's confusing product names make it difficult to even know what microarchitecture your computer is based on.
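
For those who do want to experiment, here is a minimal sketch of pinning a process to the P cores on Linux. It is not the method from the Reddit post. It assumes that the kernel exposes the P-core logical CPUs in /sys/devices/cpu_core/cpus (an assumption about hybrid-aware kernels) and it uses Python's os.sched_setaffinity, which is only available on Linux. The helper names are made up for illustration.

```python
import os

# Assumption: hybrid-aware Linux kernels list the P-core logical CPUs here.
P_CORE_CPUS_PATH = "/sys/devices/cpu_core/cpus"


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-3,8,10-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def p_core_cpus(path=P_CORE_CPUS_PATH):
    """Return the set of P-core logical CPUs, or an empty set if unknown."""
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except OSError:
        return set()


def pin_to_p_cores(pid=0):
    """Restrict a process (0 = the calling process) to the P cores.

    Returns True if the affinity was changed, False if the P cores could
    not be determined or the platform does not support affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    target = p_core_cpus() & os.sched_getaffinity(pid)
    if not target:
        return False
    os.sched_setaffinity(pid, target)
    return True


if __name__ == "__main__":
    print("P-core CPUs:", sorted(p_core_cpus()))
    print("pinned:", pin_to_p_cores())
```

On a machine without the assumed sysfs file the script leaves the affinity untouched and reports that nothing was pinned.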
