Agner's CPU blog

...

ARM SVE/2 can vary vector length between 128 bits and 2048 bits, in 128-bit increments. I don't see any reason to adopt this in x86 since you can just mask off the unused part of a vector when saving it. Might be worth pointing out that ARM has published an erratum (see C215) which restricts SVE vec...

...

Responding to "What do you guys think?": Why would any other CPU vendor want to adopt this ISA extension? AFAIK AVX10.1 is just Sapphire Rapids-level AVX-512 renamed, with some new CPUID bits. I don't see why AMD (assuming they adopt FP16) would choose not to support it. AVX10.2 doesn't seem to cha...

...

Thanks for the writeup.
If it helps, I have an AVX-512-enabled 12700K if there's some program/code you want me to run on it.

If you want to get such a chip yourself, I wrote an article regarding requirements.

...

Pre-loading the destination register of the multiply speeds up this loop by 5 times. Loading the source register does not speed it up. There's no way that loop should be able to run faster than the 5 clock cycle latency of the multiply instruction, and yet it does. This should be impossible. Even m...

Search found 4 matches

Re: Intel AVX10 & APX announcement

Re: Intel AVX10 & APX announcement

Re: Intel's new Chimera: Alder Lake

Re: Intel Floating Point Executing 3 to 4 Times Faster Than it Should. MAKES NO SENSE