Search found 4 matches

by andreas
2022-05-17, 15:54:54
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 643741

Re: Intel's new Chimera: Alder Lake

With 8000 8-byte NOPs, the bottleneck is not the decoders but instruction cache misses. You can see this by looking at the L2_RQSTS.CODE_RD_HIT counter (24.C4):
sudo ./nanoBench.sh -f -conf configs/cfg_AlderLakeP_all.txt -cpu 0 -basic -unroll 1000 -loop 1000 -asm "|8|8|8|8|8|8|8|8" | grep -v 0.00 ...
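A rough way to see why instruction cache misses come into play is to compare the code footprint of the test with the size of the L1 instruction cache. The sketch below is only an illustration: the 32 KiB L1I size is an assumption (it is not stated in the post), and the function names are made up for this example.

```python
# Assumed L1 instruction cache size in bytes (32 KiB). This value is an
# assumption for illustration; it is not given in the post.
ASSUMED_L1I_BYTES = 32 * 1024


def code_footprint_bytes(num_instructions: int, bytes_per_instruction: int) -> int:
    """Total size of a straight-line instruction sequence in bytes."""
    return num_instructions * bytes_per_instruction


def fits_in_l1i(footprint: int, l1i_bytes: int = ASSUMED_L1I_BYTES) -> bool:
    """True if the whole sequence could stay resident in the L1I cache."""
    return footprint <= l1i_bytes


# 8000 NOPs of 8 bytes each, as in the post.
footprint = code_footprint_bytes(8000, 8)
print(footprint)               # 64000
print(fits_in_l1i(footprint))  # False under the 32 KiB assumption
```

Under that assumption the sequence is roughly twice the size of the L1I cache, so code fetches would have to be served from L2, which is what a high L2_RQSTS.CODE_RD_HIT count would suggest.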
by andreas
2022-05-16, 14:06:34
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 643741

Re: Intel's new Chimera: Alder Lake

agner wrote:
2022-05-16, 4:47:49
This is when your code is running out of the µop cache. The µops have already been decoded. The decoder throughput can only be measured when the loop is bigger than the µop cache.
My code is not running out of the µop cache. This can be seen from the UOPS_MITE count that is shown in the output.
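The check described here can be sketched as a small calculation: if (almost) all µops are counted as coming from the legacy decoders (MITE), the loop is not being served from the µop cache. The counter values below are hypothetical placeholders, not measurements, and the name UOPS_DSB for the µop cache counter is an assumption for this example.

```python
def mite_fraction(uops_mite: float, uops_dsb: float) -> float:
    """Fraction of delivered µops that came from the legacy decoders (MITE)."""
    total = uops_mite + uops_dsb
    if total == 0:
        raise ValueError("no µops counted")
    return uops_mite / total


# Hypothetical per-iteration counter readings (not real measurements).
uops_mite = 6.0   # µops delivered by the decoders
uops_dsb = 0.0    # µops delivered by the µop cache (assumed counter name: UOPS_DSB)

fraction = mite_fraction(uops_mite, uops_dsb)
print(fraction)  # 1.0: everything went through the decoders
```

A fraction close to 1 would mean the decoders are being measured; a fraction close to 0 would mean the code is running out of the µop cache.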
by andreas
2022-05-15, 21:04:11
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 643741

Re: Intel's new Chimera: Alder Lake

"The decoders can deliver a maximum of 4 µops per clock for a single thread"
According to my tests, the decoders on the P cores can decode 6 instructions per cycle. Here is an example of a sequence of NOP instructions that require, on average, 0.17 cycles each: https://uops.info/html-tp/ADL-P/NOP-Measure...
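The step from 0.17 cycles to 6 instructions per cycle is just the reciprocal of the measured value. A minimal sketch of that arithmetic, assuming the 0.17 figure is the average number of cycles per NOP (the function name is made up for this example):

```python
def instructions_per_cycle(cycles_per_instruction: float) -> float:
    """Convert an average cycles-per-instruction figure into instructions per cycle."""
    return 1.0 / cycles_per_instruction


ipc = instructions_per_cycle(0.17)
print(f"{ipc:.2f}")  # 5.88
print(round(ipc))    # 6
```

A value of about 5.9 is consistent with 6 instructions per cycle (1/6 is about 0.167, which rounds to 0.17) and is clearly above 4.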
by andreas
2021-04-06, 19:09:04
Forum: Agner's CPU blog
Topic: Intel Sunny Cove
Replies: 7
Views: 77684

Re: Intel Sunny Cove

According to your optimization guide, inc and dec cannot be macro-fused on Tiger Lake. What do your tests for this look like? According to my tests (which are available here: https://www.uops.info/html-tp/TGL/DEC_R64-Measurements.html#macroFusion), they do macro-fuse in the same way as on previous mi...