Agner`s CPU blog

Software optimization resources | E-mail subscription to this blog | www.agner.org

Test results for Broadwell and Skylake
Author:  Date: 2017-06-20 12:18
- wrote:
> so we can compare it to the facts

So we have some info out now: www.anandtech.com/show/11550/the-intel-skylakex-review-core-i9-7900x-i7-7820x-and-i7-7800x-tested/3

From the article:

> Nominally the FMAs on ports 0 and 1 are 256-bit, so in order to drive towards the AVX-512-F these two ports are fused together, similar to how AVX-512-F is implemented in Knights Landing. The six-core and eight-core Skylake-X parts support one fused FMA for AVX-512-F, although the 10-core will support dual 512-bit AVX-512-F ports, which seems to be located on port 5. This means that the 10-core i9-7900X can support 64 SP or 32 DP calculations per cycle, whereas the 8-core/6-core parts can support 32 SP or 16 DP per cycle.

I don't recall Intel ever doing anything similar for product segmentation in the past, so limiting execution ports on cheaper SKUs seems to be a first to me.
Anyway, it sounds like you were on the ball, except that port 5 can also do FMA on higher SKUs, for more FLOPS.

Based also on Bulat Ziganshin comments, I'd find highly unlikely a 3 uOp dual AVX-512 issue (2 256-bit uOps on Ports 0-1 + 1 512-bit uOp on port 5). I'd assume, the 256 bit units on Ports 0-1 were kept mostly intact (except for some extra vector-mask logic required for AVX-512) on the six-and-eight core variants whereas both ports were expanded to 512-bit for the 8+ core variants. (Since as mentioned before, most functionalities were "cloned" between them on Skylake's first iteration).

Also, let's recall that Knights Landing don't "fuse ports" for AVX-512 executions, since both per-core-VPUs are 512-bit wide :)

 
thread Test results for Broadwell and Skylake new - Agner - 2015-12-26
replythread Sustained 64B loads per cycle on Haswell & Sky new - Nathan Kurz - 2015-12-26
last replythread Sustained 64B loads per cycle on Haswell & Sky new - Agner - 2015-12-27
last replythread Sustained 64B loads per cycle on Haswell & Sky new - Nathan Kurz - 2015-12-27
reply Sustained 64B loads per cycle on Haswell & Sky new - John D. McCalpin - 2016-01-04
reply Sustained 64B loads per cycle on Haswell & Sky new - T - 2016-06-18
last reply Sustained 64B loads per cycle on Haswell & Sky new - Jens Nurmann - 2017-01-12
replythread Test results for Broadwell and Skylake new - Peter Cordes - 2015-12-28
last reply Test results for Broadwell and Skylake new - Agner - 2015-12-29
replythread Test results for Broadwell and Skylake new - Tacit Murky - 2016-01-04
last replythread Test results for Broadwell and Skylake new - Agner - 2016-01-05
last replythread Test results for Broadwell and Skylake new - Tacit Murky - 2016-03-09
last reply Test results for Broadwell and Skylake new - Tacit Murky - 2016-06-05
replythread Minor bug in the microarchitecture manual new - SHK - 2016-01-10
last reply Minor bug in the microarchitecture manual new - Agner - 2016-01-16
replythread Test results for Broadwell and Skylake new - John D. McCalpin - 2016-01-12
last replythread Test results for Broadwell and Skylake new - Jess - 2016-02-11
last reply Description of discrepancy new - Nathan Kurz - 2016-03-13
reply Test results for Broadwell and Skylake new - Russell Van Zandt - 2016-02-22
replythread Instruction Throughput on Skylake new - Nathan Kurz - 2016-04-23
last replythread Instruction Throughput on Skylake new - Agner - 2016-04-24
last replythread Instruction Throughput on Skylake new - Nathan Kurz - 2016-04-26
last replythread Instruction Throughput on Skylake new - Agner - 2016-04-27
last replythread Instruction Throughput on Skylake new - T - 2016-06-18
reply Instruction Throughput on Skylake new - Agner - 2016-06-19
last replythread Instruction Throughput on Skylake new - Nathan Kurz - 2016-07-08
last replythread Instruction Throughput on Skylake new - Nathan Kurz - 2016-07-11
replythread Instruction Throughput on Skylake new - Tacit Murky - 2016-07-17
last replythread Haswell register renaming / unfused limits new - Peter Cordes - 2017-05-11
reply Haswell register renaming / unfused limits new - Tacit Murky - 2017-05-11
last reply Haswell register renaming / unfused limits new - Peter Cordes - 2017-05-12
last reply Instruction Throughput on Skylake new - T - 2016-08-08
reply Unlamination of micro-fused ops in SKL and earlier new - Travis - 2016-09-09
replythread 32B store-forwarding is slower than 16B new - Peter Cordes - 2017-05-11
last replythread 32B store-forwarding is slower than 16B new - Fabian Giesen - 2017-06-28
last reply 32B store-forwarding is slower than 16B new - Agner - 2017-06-28
reply SHL/SHR r,cl latency is lower than throughput new - Peter Cordes - 2017-05-27
replythread Test results for Broadwell and Skylake new - Bulat Ziganshin - 2017-05-30
last replythread Test results for Broadwell and Skylake new - Agner - 2017-05-30
last replythread Test results for Broadwell and Skylake new - Bulat Ziganshin - 2017-05-30
last replythread Test results for Broadwell and Skylake new - - - 2017-06-19
replythread Test results for Broadwell and Skylake - Jorcy Neto - 2017-06-20
last reply Test results for Broadwell and Skylake new - Jorcy Neto - 2017-06-20
replythread Test results for Broadwell and Skylake new - Bulat Ziganshin - 2017-06-21
reply Test results for Broadwell and Skylake new - Jorcy Neto - 2017-06-26
last replythread Test results for Broadwell and Skylake new - - - 2017-07-05
last replythread Test results for Broadwell and Skylake new - - - 2017-07-12
last reply Test results for Broadwell and Skylake new - Jorcy Neto - 2017-07-19
last replythread Test results for Broadwell and Skylake new - Xing Liu - 2017-06-28
last replythread Test results for Broadwell and Skylake new - Travis - 2017-06-29
last replythread Test results for Broadwell and Skylake new - Xing Liu - 2017-06-30
last reply Test results for Broadwell and Skylake new - Travis - 2017-07-13
last reply Official information about uOps and latency SNB+ new - SEt - 2017-07-17