Agner's CPU blog


Test results for Knights Landing
Author: Ioan Hadade  Date: 2017-07-13 04:00
As always, many thanks for your work on the instruction timings and evaluations; they are truly helpful.

I have one question regarding permutations on KNL. As can be seen in your instruction manual, and as corroborated in "Intel Xeon Phi Processor High Performance Programming", there is roughly a 2x difference in throughput and a 33% difference in latency between two-source and one-source permutations/shuffles, with the one-source variant being superior. To be more precise, I am interested in permutations on double-precision operands.

Scanning through your instruction table, I see that vpermpd is listed only for ymm registers, which is presumably a one-source permutation. However, the intrinsics guide shows that vpermpd can also be generated with zmm registers via _mm512_mask_permutex_pd, where a mask keeps selected elements from the source argument (which I think gets overwritten) and the other operand is shuffled within each 256-bit lane. This compiles to vpermpd zmm {k}, zmm, imm. As far as I can tell, this is a one-source permutation, which would have a latency of 3-6 cycles and a reciprocal throughput of 1. Am I correct? From my understanding, the two-source forms are the intrinsics that compile to vpermpd zmm {k}, zmm, zmm, such as _mm512_maskz_permutexvar_pd, or the VSHUFF64X2 instructions.
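
To make the two vpermpd forms above concrete, here is a minimal scalar sketch in C of what each one computes, as I read the intrinsics guide. The function names model_mask_permutex_pd and model_maskz_permutexvar_pd are hypothetical helpers made up for illustration; they only mirror the argument order of the real intrinsics. This models element selection only and says nothing about latency or throughput on KNL. Note that in the variable-index form the extra register operand carries the indices, while the data elements still come from the single register a.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model (hypothetical helper) of vpermpd zmm{k}, zmm, imm8,
   i.e. _mm512_mask_permutex_pd(src, k, a, imm8).
   Each 256-bit lane holds 4 doubles; the same four 2-bit selectors
   from imm8 are applied inside both lanes. Elements whose mask bit
   is clear are copied from src (merge masking). */
void model_mask_permutex_pd(double dst[8], const double src[8],
                            uint8_t k, const double a[8], uint8_t imm8)
{
    double tmp[8];                    /* temp so dst may alias src or a */
    assert(dst && src && a);
    for (int i = 0; i < 8; i++) {
        int lane_base = i & 4;                    /* 0 = low lane, 4 = high lane */
        int sel = (imm8 >> (2 * (i & 3))) & 3;    /* 2-bit selector for slot i%4 */
        tmp[i] = ((k >> i) & 1) ? a[lane_base + sel] : src[i];
    }
    memcpy(dst, tmp, sizeof tmp);
}

/* Scalar model (hypothetical helper) of vpermpd zmm{k}{z}, zmm, zmm,
   i.e. _mm512_maskz_permutexvar_pd(k, idx, a).
   Each destination element uses the low 3 bits of its own index, so it
   can pick any of the 8 doubles in a. Elements whose mask bit is clear
   are zeroed (zero masking). */
void model_maskz_permutexvar_pd(double dst[8], uint8_t k,
                                const uint64_t idx[8], const double a[8])
{
    double tmp[8];                    /* temp so dst may alias a */
    assert(dst && idx && a);
    for (int i = 0; i < 8; i++) {
        tmp[i] = ((k >> i) & 1) ? a[idx[i] & 7] : 0.0;
    }
    memcpy(dst, tmp, sizeof tmp);
}
```

For example, with a = {0,1,2,3,4,5,6,7}, imm8 = 0x1B and a full mask, the immediate form reverses the elements inside each 256-bit lane, whereas the variable-index form with idx = {7,6,5,4,3,2,1,0} reverses the whole 512-bit vector.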

Your clarification between these would be very helpful.

Kind regards,
Ioan

 
thread Test results for Knights Landing - Agner - 2016-11-26
reply Test results for Knights Landing - Nathan Kurz - 2016-11-26
replythread Test results for Knights Landing - Tom Forsyth - 2016-11-27
reply Test results for Knights Landing - Søren Egmose - 2016-11-27
last reply Test results for Knights Landing - Agner - 2016-11-30
replythread Test results for Knights Landing - Joe Duarte - 2016-12-03
replythread Test results for Knights Landing - Agner - 2016-12-04
last reply Test results for Knights Landing - Constantinos Evangelinos - 2016-12-05
last replythread Test results for Knights Landing - John McCalpin - 2016-12-06
replythread Test results for Knights Landing - Agner - 2016-12-06
last reply Test results for Knights Landing - John McCalpin - 2016-12-08
last reply Test results for Knights Landing - Joe Duarte - 2016-12-07
replythread Test results for Knights Landing - zboson - 2016-12-28
last reply VZEROUPPER - Agner - 2016-12-28
replythread Test results for Knights Landing - Ioan Hadade - 2017-07-13
last reply Test results for Knights Landing - Agner - 2017-07-13
last replythread INC/DEC throughput - Peter Cordes - 2017-10-09
last reply INC/DEC throughput - Agner - 2017-10-10