Search found 75 matches

by agner
2023-12-25, 8:34:40
Forum: Agner's CPU blog
Topic: Is using BSF instruction instead of using GNU C __builtin_ctz inefficient?
Replies: 1
Views: 12510

Re: Is using BSF instruction instead of using GNU C __builtin_ctz inefficient?

__builtin_ctz is not portable to all compilers. I don't think there is any difference in performance. Let's keep this discussion on Stack Overflow. Remember to use the tag "vector-class-library" on Stack Overflow.
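Here is a minimal sketch of a portable wrapper. It assumes GCC/Clang provide __builtin_ctz and MSVC provides _BitScanForward from <intrin.h>. The name ctz32 and the choice to return 32 for a zero input are my own and not part of any library. Both __builtin_ctz(0) and BSF with a zero source give an undefined result, so the sketch handles the zero case explicitly.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Hypothetical helper: count trailing zero bits of a 32-bit value.
   Returns 32 for x == 0 (a choice made here, matching TZCNT),
   because __builtin_ctz(0) and BSF on zero are undefined. */
static unsigned ctz32(uint32_t x) {
    if (x == 0) return 32;
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_ctz(x);   /* usually compiles to BSF or TZCNT on x86 */
#elif defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, x);          /* MSVC intrinsic that maps to BSF */
    return (unsigned)index;
#else
    unsigned n = 0;                      /* portable fallback: shift until the lowest bit is set */
    while ((x & 1u) == 0) { x >>= 1; ++n; }
    return n;
#endif
}
```

For example, ctz32(8) gives 3 and ctz32(0x80000000u) gives 31.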
by agner
2023-09-06, 10:06:59
Forum: Agner's CPU blog
Topic: Intel's "cripple AMD" function
Replies: 6
Views: 331552

Re: Intel's "cripple AMD" function

Karalinda wrote:
In many cases, there are no good alternatives to Intel's function libraries
Apparently, it is now possible to use the Intel function libraries without the cripple feature. See my previous post "New Clang-based Intel compiler is better".
by agner
2023-08-27, 5:39:08
Forum: Agner's CPU blog
Topic: Suggestion: Stop using "vector" for computer science
Replies: 1
Views: 25219

Re: Suggestion: Stop using "vector" for computer science

Language evolves. I am not sure this is the right forum to discuss this.
by agner
2023-08-25, 6:22:49
Forum: Agner's CPU blog
Topic: Testp Question
Replies: 1
Views: 24640

Re: Testp Question

An optimizing assembler should code mov rax,123 as mov eax,123 because the result is zero-extended into rax anyway. The two instructions should give identical results. Test results may vary for random reasons. Zero extension cannot be used with negative constants. mov rax,-123 is two bytes longer th...
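The byte counts behind this can be modeled in a few lines. This sketch covers only RAX and assumes the assembler picks the shortest of the three standard x86-64 forms. The function name is hypothetical. Registers r8 to r15 need a REX byte for the 32-bit form, and the sketch does not model that case.

```c
#include <stdint.h>

/* Hypothetical model: size in bytes of the shortest encoding of
   "mov rax, imm" in 64-bit mode. Only RAX is modeled. */
static int mov_rax_imm_size(int64_t imm) {
    if (imm >= 0 && imm <= (int64_t)UINT32_MAX)
        return 5;   /* mov eax, imm32: B8 id; writing EAX zero-extends into RAX */
    if (imm >= INT32_MIN && imm <= INT32_MAX)
        return 7;   /* mov rax, imm32: REX.W C7 /0 id; imm32 is sign-extended */
    return 10;      /* mov rax, imm64: REX.W B8 io */
}
```

With this model, mov_rax_imm_size(123) is 5 and mov_rax_imm_size(-123) is 7. A negative constant cannot use the zero-extending 5-byte form.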
by agner
2023-07-31, 13:14:35
Forum: Agner's CPU blog
Topic: Intel AVX10 & APX announcement
Replies: 7
Views: 177276

Re: Intel AVX10 & APX announcement

APX, on the other hand, does add decoder complexity. x86 up to AVX512 already has 15 to 18 different prefixes, depending on how you count. APX adds just one more prefix (REX2) and extends the number of uses of an existing one (EVEX). This is just an incremental increase in complexity. It should be p...
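Here is a rough sketch of how a decoder might classify the first byte of an instruction in 64-bit mode. It shows where REX2 (D5) fits among the existing prefixes. The enum and function names are hypothetical, and the grouping is only one way of counting. A real decoder also has to look at the following bytes, for example to tell XOP (8F) apart from POP r/m.

```c
#include <stdint.h>

/* Hypothetical categories for the first byte in 64-bit mode. */
typedef enum {
    PFX_NONE,    /* not a prefix: treat as opcode */
    PFX_LEGACY,  /* lock, rep/repne, segment overrides, operand/address size */
    PFX_REX,     /* 40-4F (INC/DEC in 32-bit mode, not handled here) */
    PFX_REX2,    /* D5, new with APX (AAD is invalid in 64-bit mode) */
    PFX_VEX,     /* C4 (3-byte) or C5 (2-byte) */
    PFX_EVEX,    /* 62 */
    PFX_XOP      /* 8F, AMD XOP; the next byte decides whether it is POP r/m */
} prefix_kind;

static prefix_kind classify_64(uint8_t b) {
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:             /* LOCK, REPNE, REP */
    case 0x2E: case 0x36: case 0x3E: case 0x26:  /* CS, SS, DS, ES overrides */
    case 0x64: case 0x65:                        /* FS, GS overrides */
    case 0x66: case 0x67:                        /* operand size, address size */
        return PFX_LEGACY;
    case 0xD5:             return PFX_REX2;
    case 0xC4: case 0xC5:  return PFX_VEX;
    case 0x62:             return PFX_EVEX;
    case 0x8F:             return PFX_XOP;
    default:
        return (b & 0xF0) == 0x40 ? PFX_REX : PFX_NONE;
    }
}
```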
by agner
2023-07-30, 11:54:04
Forum: Agner's CPU blog
Topic: Intel AVX10 & APX announcement
Replies: 7
Views: 177276

Re: Intel AVX10 & APX announcement

We have no promise that in 10 years a 1024-bit AVX1024 vector won't crash on a 512-bit FPU. And so we'll have to reinvent the wheel yet again. The EVEX prefix used by AVX512 and AVX10 has space for extensions to 1024-bit vectors, but not 2048-bit. SVE/2 might support scaling vector width ARM SVE/...
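The 1024-bit headroom comes from the two-bit L'L vector-length field in the EVEX prefix. Three of its four values are taken by 128, 256, and 512 bits, which leaves one reserved code. Below is a sketch of decoding that field. It assumes p2 is the third EVEX payload byte (the fourth byte counting the 62 escape). The function name is hypothetical, and reading 11b as 1024 bits would be an unannounced future extension.

```c
#include <stdint.h>

/* Hypothetical helper: vector width from EVEX L'L (bits 6:5 of payload byte P2).
   Returns -1 for the reserved value 11b.
   Caveat: for register-register forms with EVEX.b set, L'L encodes the
   rounding mode instead; this sketch ignores that case. */
static int evex_vector_bits(uint8_t p2) {
    switch ((p2 >> 5) & 3) {
    case 0:  return 128;
    case 1:  return 256;
    case 2:  return 512;
    default: return -1;  /* 11b: the only code left, i.e. room for one more doubling */
    }
}
```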
by agner
2023-07-30, 6:41:13
Forum: Agner's CPU blog
Topic: Intel AVX10 & APX announcement
Replies: 7
Views: 177276

Re: Intel AVX10 & APX announcement

Thanks for the links. As far as I can see from the manuals, the future AVX10.2 processors will be binary compatible with existing AVX512 code. You only have to recompile the code if you want to use the extra registers and new instructions. The advantages of the new features are limited, so I don't e...
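If AVX10.2 processors run existing AVX-512 binaries unchanged, code that already dispatches on AVX512F should keep working without recompiling. Here is a sketch of such dispatch. It assumes GCC or Clang on x86, where __builtin_cpu_supports exists. It also assumes these processors report AVX512F in CPUID. The function and path names are placeholders.

```c
/* Hypothetical dispatcher: choose a code path at run time. */
static const char *select_path(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* only required before constructors run; harmless otherwise */
    if (__builtin_cpu_supports("avx512f")) return "avx512";
    if (__builtin_cpu_supports("avx2"))    return "avx2";
    return "generic";
#else
    return "generic";      /* non-x86 or other compilers: no dispatch in this sketch */
#endif
}
```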
by agner
2023-07-20, 17:17:54
Forum: Agner's CPU blog
Topic: AMD processors do allow you to change the CPUID string
Replies: 1
Views: 29702

Re: AMD processors do allow you to change the CPUID string

Sorry, you can change the CPU name string, but not the "vendor string" that says AuthenticAMD. It is the vendor string that is checked by Intel software.
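For reference, here is a sketch of how software reads the vendor string. CPUID leaf 0 returns it in EBX, EDX, ECX order. This is separate from the processor name string in leaves 0x80000002 to 0x80000004, which is why changing the name does not affect a vendor check. The sketch assumes GCC or Clang with <cpuid.h> on x86, and cpu_vendor is a hypothetical helper.

```c
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: copy the 12-character vendor string
   (e.g. "GenuineIntel" or "AuthenticAMD") into out.
   Returns 0 on success, -1 if CPUID is unavailable here. */
static int cpu_vendor(char out[13]) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return -1;
    memcpy(out, &ebx, 4);      /* leaf 0: vendor bytes come in EBX, EDX, ECX order */
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 0;
#else
    out[0] = '\0';
    return -1;
#endif
}
```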
by agner
2023-06-04, 4:54:03
Forum: Agner's CPU blog
Topic: VZEROUPPER issue with Zen4 in 32-bit mode?
Replies: 3
Views: 67589

Re: VZEROUPPER issue with Zen4 in 32-bit mode?

It appears that moves are eliminated in 64-bit mode, but not in 32-bit mode. Perhaps this explains your results. Try with some other instructions.