Agner's CPU blog


Test results for AMD Ryzen
Author:  Date: 2017-07-13 13:48
Lefty wrote:
Thanks for the answer.
I have another question. I am wondering why AVX-256 / AVX-512 is considered superior to AVX-128.
You can pack two AVX-128 instructions into one AVX-256 instruction (provided that the instructions are independent), but it will not necessarily execute faster. A CPU with one 256-bit SIMD unit can execute the AVX-256 instruction in one cycle; however, a CPU with two 128-bit SIMD units would just schedule the two AVX-128 instructions to execute simultaneously, also in one cycle. I don't see where the advantage is.
There may be no advantage if you compare a CPU that offers 2N execution units of width W against one that offers N execution units of width 2W (e.g., 4 x 128-bit units versus 2 x 256-bit units), but that is not the comparison you usually see in actual hardware. It is much easier to double the width of the vector units than it is to sustainably execute at double the IPC. Indeed, Intel chips have been "stuck" at 4-wide for nearly a decade while the vector size grew from 128 bits to 512 bits.
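
To put numbers on the 2N x W versus N x 2W comparison, here is a minimal Python sketch. The function names are made up for illustration, and the 4-slot pipeline is an assumption taken from the "4-wide" remark above. It is plain arithmetic, not a model of any particular CPU.

```python
# Toy model: peak SIMD throughput versus issue slots consumed.
# All names here are hypothetical; this is arithmetic, not a simulator.

PIPELINE_WIDTH = 4  # assumed instructions issued per cycle ("4-wide")

def peak_bits_per_cycle(units, width_bits):
    """Peak vector bits processed per cycle if every unit is busy every cycle."""
    return units * width_bits

def issue_slots_used(units, pipeline_width=PIPELINE_WIDTH):
    """Fraction of issue slots needed to feed every vector unit each cycle."""
    return units / pipeline_width

for units, width in [(4, 128), (2, 256)]:
    print(f"{units} x {width}-bit: "
          f"{peak_bits_per_cycle(units, width)} bits/cycle peak, "
          f"{issue_slots_used(units):.0%} of issue slots")
```

Both configurations reach the same peak bits per cycle, but under these assumptions the narrow one needs every issue slot for vector work, leaving none for loads, stores and loop overhead.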

To double sustained IPC (in code that can provide the necessary ILP in the first place), you'd have to approximately double fetch, decode, rename and retire throughput, and increase the size of many structures such as the ROB and PRF. Even then you might run out of registers in the ISA, since you effectively need twice as many registers to keep the same amount of data "in flight". Many of these changes aren't just linear increases in hardware complexity, but quadratic or worse - and at some point they aren't even possible without reducing the clock frequency.
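
As a rough illustration of the "quadratic or worse" point and the register-count point, the sketch below uses a simplified textbook model. When renaming a group of instructions in one cycle, each source operand must be compared against the destinations of all earlier instructions in the same group. The two-sources-per-instruction figure, the 512-byte working set and the function names are assumptions for illustration only, not a description of any real design.

```python
# Simplified model of two costs of widening the pipeline.
# Hypothetical names; real rename logic is more involved than this.

def rename_comparators(width, sources_per_insn=2):
    """Comparators for intra-group dependency checks when renaming
    `width` instructions per cycle: instruction i (0-based) compares
    each of its sources against the i earlier destinations."""
    return sources_per_insn * width * (width - 1) // 2

def registers_needed(bytes_in_flight, width_bits):
    """Vector registers needed to hold a given amount of data (rounded up)."""
    return (bytes_in_flight * 8 + width_bits - 1) // width_bits

for width in (4, 8):
    print(f"{width}-wide rename: {rename_comparators(width)} comparators")

for width_bits in (128, 256):
    print(f"512 bytes in flight with {width_bits}-bit registers: "
          f"{registers_needed(512, width_bits)} registers")
```

In this model, doubling the rename width from 4 to 8 more than quadruples the comparator count, while halving the register width doubles the number of registers needed for the same data.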

Increasing the width of the SIMD units, on the other hand, is generally a straightforward linear increase in complexity (with the exception of some lane-crossing operations, which is why those often have a longer latency and are generally discouraged).
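
The difference between in-lane and lane-crossing operations can be sketched with a naive multiplexer count. The assumptions are 32-bit elements, 128-bit lanes (four elements per lane) and a full crossbar for the lane-crossing case. Real shuffle networks are built differently, so treat the numbers as an illustration of the scaling only. The function names are hypothetical.

```python
# Naive wiring model: how many mux inputs a shuffle unit needs in total.
# Assumes 32-bit elements and 128-bit lanes; purely illustrative.

LANE_ELEMS = 4  # 128-bit lane / 32-bit elements

def mux_inputs_in_lane(total_elems, lane_elems=LANE_ELEMS):
    """Each output element selects only among the elements of its own lane."""
    return total_elems * lane_elems

def mux_inputs_full_crossbar(total_elems):
    """Each output element can select any input element (lane-crossing)."""
    return total_elems * total_elems

for bits in (128, 256, 512):
    n = bits // 32
    print(f"{bits}-bit vector: in-lane {mux_inputs_in_lane(n)}, "
          f"lane-crossing {mux_inputs_full_crossbar(n)}")
```

With the lane size fixed, the in-lane count doubles each time the vector width doubles, while the full-crossbar count quadruples.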

 