Agner's CPU blog

Test results for Knights Landing
Author: John McCalpin  Date: 2016-12-06 11:19
The Xeon Phi x200 (Knights Landing) has many modes of operation (selected at boot time), and the latency and bandwidth characteristics are slightly different for each mode.

It is also important to remember that the latency can be different for each physical address, depending on the location of the requesting core, the location of the coherence agent responsible for that address, and the location of the memory controller for that address. Intel has not publicly disclosed the mapping of core numbers (APIC IDs) to physical locations on the chip or the locations of coherence agents (CHA boxes) on the chip, nor has it disclosed the hash functions used to map physical addresses to coherence agents and to map physical addresses to MCDRAM or DDR4 memory controllers. (In some modes of operation the memory mappings are trivial, but not in all modes.)

The Knights Landing system at TACC uses the Xeon Phi 7250 processor (68 cores, 1.4 GHz nominal). With the memory in "Flat" mode (MCDRAM as memory, located in the upper 16 GiB of the physical address space) and the coherence agent mapping in "Quadrant" mode (addresses are hashed to coherence agents spread across the entire chip, but each cache line is assigned to an MCDRAM controller in the same "quadrant" as the CHA responsible for coherence), my preferred latency tester gives a value of 154ns +/- 1ns (1 standard deviation) for MCDRAM. This value is averaged over many addresses, with the variation mostly from core to core (plus a few ns of random variability). My latency tester uses permutations of even-numbered cache lines in address range blocks of various sizes, so it is not guaranteed that my averages are uniformly distributed over all the coherence agents.
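The post does not show the latency tester itself, so the following is only a minimal sketch of how a permutation over even-numbered cache lines might be built for a dependent-load (pointer-chasing) test. The function names, the 1 MiB block size, and the use of a single random cycle are all assumptions made for illustration. A real tester would store the chain in memory as pointers and time the chase in C or assembly; Python is used here only to show the visiting order.

```python
import random

CACHE_LINE_BYTES = 64  # cache line size on Knights Landing


def build_chase_chain(block_bytes, seed=0):
    """Map each even-numbered cache line to the line visited after it.

    Hypothetical helper: the even lines of the block are shuffled and linked
    into one cycle, so a dependent-load chase touches each of them exactly
    once per lap. Odd-numbered lines never appear in the chain.
    """
    n_lines = block_bytes // CACHE_LINE_BYTES
    even_lines = list(range(0, n_lines, 2))
    random.Random(seed).shuffle(even_lines)  # random visiting order
    successors = even_lines[1:] + even_lines[:1]  # rotate by one to close the cycle
    return dict(zip(even_lines, successors))


def walk(next_line, start, steps):
    """Follow the chain for `steps` dependent hops and return the lines visited."""
    visited = []
    line = start
    for _ in range(steps):
        line = next_line[line]  # each hop depends on the previous one
        visited.append(line)
    return visited


# Hypothetical 1 MiB block; the post only says that block sizes vary.
chain = build_chase_chain(block_bytes=1 << 20)
lap = walk(chain, start=0, steps=len(chain))
```

Because only every second line is touched and the order is random, nothing in this construction forces the visited addresses to be spread evenly over the coherence agents, which is the caveat raised above.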

For the same system in "Flat" "All-to-All" mode (addresses are hashed to coherence agents spread across the entire chip, with no special correlation between the location of coherence agents and the MCDRAM controller owning an address), the corresponding value is 156ns +/- 1ns (1 standard deviation).

For the same system in "Flat" "Sub-NUMA Cluster 4" mode, the corresponding values are 150.5ns +/- 0.9ns (1 standard deviation) for "local" accesses, and 156.8ns +/- 3.1ns for "remote" accesses. Variability across nodes is not entirely negligible, in part because different nodes have different patterns of disabled tiles. (Four of the 38 tiles are disabled on each Xeon Phi 7250 processor.) Run-to-run variability is typically small when using large pages, but there are certain idiosyncrasies that have yet to be explained.
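To put the figures above on a common footing, here is a small worked calculation that converts the reported latencies to core cycles and computes the SNC-4 remote penalty. It assumes the cores run at the nominal 1.4 GHz, which may not match the actual clock during the measurements. The mode labels are shorthand, and the two-cores-per-tile figure is not stated in the post; it comes from the published Knights Landing design.

```python
NOMINAL_GHZ = 1.4  # nominal core clock of the Xeon Phi 7250 (assumed to apply)

# Mean MCDRAM latencies reported in the post, in nanoseconds.
latency_ns = {
    "Flat-Quadrant": 154.0,
    "Flat-All-to-All": 156.0,
    "Flat-SNC-4 local": 150.5,
    "Flat-SNC-4 remote": 156.8,
}

# nanoseconds * (cycles per nanosecond) = cycles
latency_cycles = {mode: ns * NOMINAL_GHZ for mode, ns in latency_ns.items()}

# Extra latency of a remote access relative to a local one in SNC-4 mode.
snc4_penalty_ns = latency_ns["Flat-SNC-4 remote"] - latency_ns["Flat-SNC-4 local"]
snc4_penalty_pct = 100.0 * snc4_penalty_ns / latency_ns["Flat-SNC-4 local"]

# Consistency check on the tile count: 38 tiles, 4 disabled.
CORES_PER_TILE = 2  # assumption from the KNL design, not from the post
active_cores = (38 - 4) * CORES_PER_TILE

for mode, cycles in latency_cycles.items():
    print(f"{mode:18s} {latency_ns[mode]:6.1f} ns  ~{cycles:4.0f} cycles")
print(f"SNC-4 remote penalty: {snc4_penalty_ns:.1f} ns ({snc4_penalty_pct:.1f}%)")
print(f"Active cores: {active_cores}")
```

At the nominal clock all four cases fall between roughly 211 and 220 cycles, and the SNC-4 remote penalty is 6.3 ns, or about 4% of the local latency.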

Note that even though the average latency differences are quite small across these modes of operation, the sustained bandwidth differences are much larger. The decreased number of "hops" required for coherence transactions in "Quadrant" and "SNC-4" modes reduces contention on the mesh links and thereby allows higher sustained bandwidths. The difference between sustained bandwidth in Flat-All-to-All and Flat-Quadrant modes suggests that contention on the non-data mesh links (address, acknowledge, and invalidate) is more important than contention on the data transfer links (which should be the same for those two modes of operation). I will post more details to my blog as they become available.

 