Search found 75 matches

by agner
2022-07-23, 11:56:14
Forum: Agner's CPU blog
Topic: inconsistent port usage about CALL vs PUSH&JMP
Replies: 2
Views: 59047

Re: inconsistent port usage about CALL vs PUSH&JMP

call [m] is similar to mov r, [m] followed by call r. It uses the address generation units (p237) twice, once for reading m and once for storing the return address on the stack. Call and return instructions are usually tested together to avoid messing up the return stack buffer. It may be diffi...
by agner
2022-06-28, 6:35:28
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 644037

P cores and E cores

This Reddit post reports experiments on how to make sure heavy tasks run on the P cores: https://www.reddit.com/r/XMG_gg/comments/vlqn6d/psa_rendering_tasks_are_moved_to_ecores_when/ I still think it is unreasonable to expect ordinary computer users to attend to processor-specific pe...
by agner
2022-05-22, 10:06:47
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 644037

Re: Intel's new Chimera: Alder Lake

"The µop cache can hold 4k µops. The µop cache can deliver a maximum of 6 µops per clock cycle for a single thread."
This agrees with my measurements.
by agner
2022-05-19, 5:41:49
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 644037

Re: Intel's new Chimera: Alder Lake

We have now made some more tests on the P core after fixing the problem with overheating the CPU. The results are more stable now and basically confirm what Andreas wrote:
The decoders can handle up to 6 µops per clock
Simple integer instructions have a maximum throughput of 5 instructions per cl...
by agner
2022-05-17, 4:18:07
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 644037

Re: Intel's new Chimera: Alder Lake

Andreas wrote:
This can be seen from the UOPS_MITE count that is shown in the output
With 4000 8-byte NOPs I see 32 bytes per clock. With 8000 8-byte NOPs I see 16 bytes per clock.
MITE_UOPS keeps counting (event 0x79, umask=4). DSB_UOPS (umask=8) stops counting.
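
The NOP figures above can be turned into instructions per clock with a little arithmetic. The following is only a rough sketch, not part of the original measurements: it assumes that each 8-byte NOP becomes one µop and that the "4k µops" capacity quoted elsewhere in this listing means 4096 entries. The function names are hypothetical, and a real µop cache has set and way constraints that this ignores.

```python
NOP_BYTES = 8              # size of each NOP in the test loop (from the post)
UOP_CACHE_CAPACITY = 4096  # assumption: "4k µops" read as 4096 entries


def nops_per_clock(bytes_per_clock, nop_bytes=NOP_BYTES):
    """Convert a measured code-bytes-per-clock rate into NOPs per clock."""
    return bytes_per_clock / nop_bytes


def fits_in_uop_cache(n_nops, capacity=UOP_CACHE_CAPACITY):
    """Rough capacity check, assuming one µop per NOP and ideal packing."""
    return n_nops <= capacity


# (loop length in NOPs, measured bytes per clock) as reported in the post
for n_nops, bytes_per_clock in [(4000, 32), (8000, 16)]:
    print(n_nops, fits_in_uop_cache(n_nops), nops_per_clock(bytes_per_clock))
```

Under these assumptions the 4000-NOP loop fits in the µop cache and the 8000-NOP loop does not, which is consistent with DSB_UOPS no longer counting in the longer test.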
by agner
2022-05-16, 4:47:49
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 644037

Re: Intel's new Chimera: Alder Lake

Thank you for contributing your measurements.
Andreas wrote:
The decoders on the P cores can decode 6 instructions per cycle. ..they can handle 32 bytes per clock
This is when your code is running from the µop cache. The µops have already been decoded. The decoder throughput can only be measu...
by agner
2022-05-13, 9:01:33
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 644037

Test results for Alder Lake

I have now been able to test the Alder Lake P cores with full access to the AVX512 instructions thanks to a lot of help from Zingaburga. I can confirm that the full AVX512 instruction set is working on the Alder Lake with an early microcode and a certain BIOS when only the P cores are enabled. The ...
by agner
2022-04-30, 12:02:34
Forum: Agner's CPU blog
Topic: AVX or SSE code performance question
Replies: 1
Views: 45214

Re: AVX or SSE code performance question

I suggest that you post your question on https://stackoverflow.com with details about your code. Your results may in fact reflect cache effects, compiler optimization, or thread effects.
by agner
2022-04-26, 4:26:47
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 644037

Re: Intel's new Chimera: Alder Lake

davidbak wrote:
Isn't there also software that would have been pessimized in the other direction
Yes
by agner
2022-04-25, 13:00:47
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 644037

Re: Intel's new Chimera: Alder Lake

Thank you for your proposal. The easiest way is for me to test it by remote access (Linux). Please send me an email at agner_at-agner.org