Agner's CPU blog


Test results for AMD Ryzen
Author:  Date: 2017-05-02 14:16
On Ryzen, Bulldozer, and/or Jaguar, does vxorps-zeroing of a ymm register still take only 1 micro-op, unlike the non-special case of vxorps ymm1,ymm2,ymm3, which is split into two?

I'm worried that the special case of xor-zeroing might not be identified until after the decoder has already split the instruction in two. Or that, if it still needs an execution port, ymm zeroing might still use two micro-ops instead of taking advantage of AVX implicit zero-extension to the high lanes. (Previously posted at stackoverflow.com/questions/43713273/is-vxorps-zeroing-on-amd-jaguar-bulldozer-zen-faster-with-xmm-registers-than-ymm, but this is probably a better place to ask.)

If ymm-zeroing is slower on any CPUs, then compilers should use vxorps xmm0,xmm0,xmm0 even for _mm256_setzero_ps.
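
To make the zero-extension argument concrete, here is a minimal sketch (GCC/Clang inline asm, x86-64 only, AT&T operand order) that loads non-zero data into ymm0, zeroes it with the 128-bit VEX form, and then stores all 256 bits. The function name xmm_zeroing_clears_ymm is made up for this example. It only checks the architectural result, i.e. that the xmm form is a valid replacement for the ymm form; it says nothing about micro-op counts, which need performance counters to measure.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if "vxorps xmm0,xmm0,xmm0" left all
   256 bits of ymm0 zero, 0 if any element was non-zero, and -1 if the
   check cannot run here (not x86-64, not GCC/Clang, or no AVX). */
int xmm_zeroing_clears_ymm(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (!__builtin_cpu_supports("avx"))
        return -1;

    float in[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float out[8];
    memset(out, 0xFF, sizeof out);      /* make sure out[] starts non-zero */

    __asm__ volatile(
        "vmovups (%1), %%ymm0\n\t"            /* ymm0 = 8 non-zero floats      */
        "vxorps  %%xmm0, %%xmm0, %%xmm0\n\t"  /* 128-bit VEX xor-zeroing       */
        "vmovups %%ymm0, (%0)\n\t"            /* store the full 256-bit value  */
        :
        : "r"(out), "r"(in)
        : "xmm0", "memory");

    for (int i = 0; i < 8; i++)
        if (out[i] != 0.0f)
            return 0;
    return 1;
#else
    return -1;
#endif
}
```

On any AVX-capable CPU this should return 1, because a VEX-encoded instruction with a 128-bit destination zeroes the upper bits of the full register.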

---

For _mm512_setzero_ps, using a VEX-encoded instruction saves two bytes vs. EVEX when all registers are in the xmm0-7 range: 4 bytes instead of 6 (reported to clang as bug 32862).
No existing AVX512 hardware has a problem with mixing VEX and EVEX vector instructions, or vector widths, AFAIK. And there's no reason to expect problems on future CPUs because of AVX's zero-extending to VLMAX.
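
As an illustration of the size difference, the byte sequences below are hand-assembled for register 0 from the VEX and EVEX prefix layouts; they are worth double-checking against an assembler's listing output. The array names are made up for this sketch.

```c
/* Machine-code bytes for zeroing register 0 at each width
   (hand-assembled; names are hypothetical). */

/* vxorps xmm0,xmm0,xmm0 : 2-byte VEX prefix + opcode + ModRM */
static const unsigned char vex_xmm_zero[]  = {0xC5, 0xF8, 0x57, 0xC0};

/* vxorps ymm0,ymm0,ymm0 : same length, only VEX.L differs */
static const unsigned char vex_ymm_zero[]  = {0xC5, 0xFC, 0x57, 0xC0};

/* vpxord zmm0,zmm0,zmm0 : 4-byte EVEX prefix + opcode + ModRM (AVX512F) */
static const unsigned char evex_zmm_zero[] = {0x62, 0xF1, 0x7D, 0x48, 0xEF, 0xC0};

/* Code-size saving from using the VEX xmm form to zero a zmm register. */
static int zeroing_bytes_saved(void)
{
    return (int)sizeof evex_zmm_zero - (int)sizeof vex_xmm_zero;
}
```

The xmm and ymm VEX forms are the same length, so the size argument only applies to zmm zeroing. With a register in the 8-15 range as the second source operand, the VEX form needs the 3-byte prefix and grows to 5 bytes.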

----

On Intel CPUs, the choice affects whether it warms up the 256b execution units (and throttles the max-turbo on Xeon CPUs). So calling a noinline function that returns _mm256_setzero_ps wouldn't be a reliable way to warm up the execution units. But it already wasn't portably reliable anyway, because MSVC already always uses 128b vxorps for zeroing ymm/zmm regs. Returning 256b all-ones would work, but only clang and icc avoid loading a constant when AVX2 isn't available. See all 4 compilers on godbolt.

 
thread Test results for AMD Ryzen new - Agner - 2017-05-02
replythread Ryzen analyze new - Daniel - 2017-05-02
last reply Ryzen analyze new - Agner - 2017-05-02
replythread Test results for AMD Ryzen - Peter Cordes - 2017-05-02
last replythread Test results for AMD Ryzen new - Agner - 2017-05-03
last replythread Test results for AMD Ryzen new - Phenominal - 2017-05-06
last replythread Test results for AMD Ryzen new - Agner - 2017-05-06
last replythread Test results for AMD Ryzen new - Phenominal - 2017-05-06
last reply Test results for AMD Ryzen new - Agner - 2017-05-06
replythread Test results for AMD Ryzen new - Tacit Murky - 2017-05-05
last replythread Test results for AMD Ryzen new - Tacit Murky - 2017-07-08
last reply Test results for AMD Ryzen new - Michael Rolle - 2019-05-15
replythread Test results for AMD Ryzen--POPCNT new - Xing Liu - 2017-05-08
last reply Test results for AMD Ryzen--POPCNT new - Agner - 2017-05-11
replythread Test results for AMD Ryzen new - Justin - 2017-07-11
last reply EPYC new - Agner - 2017-07-11
replythread Test results for AMD Ryzen new - Lefty - 2017-07-12
last replythread Test results for AMD Ryzen new - Agner - 2017-07-12
replythread Test results for AMD Ryzen new - cvax - 2017-07-13
last reply Test results for AMD Ryzen new - Agner - 2017-07-13
last replythread Test results for AMD Ryzen new - Lefty - 2017-07-13
reply Test results for AMD Ryzen new - Agner - 2017-07-13
last replythread Test results for AMD Ryzen new - Travis - 2017-07-13
last reply Test results for AMD Ryzen new - Johannes - 2017-07-25
last replythread Test results for AMD Ryzen new - Conrad - 2017-09-22
reply Test results for AMD Ryzen new - Agner - 2017-09-22
last reply Test results for AMD Ryzen new - Travis - 2017-09-26