Agner's CPU blog


Overview of CPU dispatching in Intel software
Author: Agner Fog Date: 2010-08-23 06:38

There are many different versions of Intel compilers and function libraries with different CPU dispatching schemes. Some of these are fair to non-Intel processors and some are unfair. By unfair dispatching I mean that the software chooses a suboptimal code path when running on a non-Intel CPU even when the CPU is compatible with a better code path. The different versions can get quite confusing, so I have tested as many versions of Intel software products as I could get my hands on, and I present an overview of the results here.

The tables below show the highest instruction set available to Intel and non-Intel processors when running the different software products. The instruction sets, in increasing order, have these not very logical names:

386 MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX
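To make the definition of unfair dispatching concrete, here is a minimal sketch in C. It is not taken from any Intel product. The names isa_level, highest_level, dispatch_fair and dispatch_unfair are hypothetical, and the sketch assumes that a separate code path exists for every level, which real libraries do not have. The feature bit positions are those documented for CPUID leaf 1. The AVX check also needs operating system support for the YMM registers, which the caller is assumed to have checked already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Instruction set levels in the order listed above (hypothetical names). */
typedef enum {
    ISA_386, ISA_MMX, ISA_SSE, ISA_SSE2, ISA_SSE3,
    ISA_SSSE3, ISA_SSE41, ISA_SSE42, ISA_AVX
} isa_level;

/* Test bit n of a CPUID output register. */
static bool bit(uint32_t reg, unsigned n) { return (reg >> n) & 1u; }

/* Map the EDX and ECX feature bits of CPUID leaf 1 to the highest level.
   A level counts only if all lower levels are present as well.
   os_saves_ymm: caller has verified (via XGETBV) that the OS saves YMM state. */
isa_level highest_level(uint32_t edx, uint32_t ecx, bool os_saves_ymm)
{
    if (!bit(edx, 23)) return ISA_386;    /* no MMX    */
    if (!bit(edx, 25)) return ISA_MMX;    /* no SSE    */
    if (!bit(edx, 26)) return ISA_SSE;    /* no SSE2   */
    if (!bit(ecx, 0))  return ISA_SSE2;   /* no SSE3   */
    if (!bit(ecx, 9))  return ISA_SSE3;   /* no SSSE3  */
    if (!bit(ecx, 19)) return ISA_SSSE3;  /* no SSE4.1 */
    if (!bit(ecx, 20)) return ISA_SSE41;  /* no SSE4.2 */
    if (!bit(ecx, 28) || !os_saves_ymm) return ISA_SSE42;  /* no usable AVX */
    return ISA_AVX;
}

/* Fair dispatching: the vendor is ignored. The CPU gets the best
   implemented path that it supports. */
isa_level dispatch_fair(isa_level cpu, isa_level best_implemented)
{
    return cpu < best_implemented ? cpu : best_implemented;
}

/* Unfair dispatching: a non-Intel CPU is capped at non_intel_cap
   even when it supports a better path. */
isa_level dispatch_unfair(isa_level cpu, bool is_intel,
                          isa_level best_implemented, isa_level non_intel_cap)
{
    assert(non_intel_cap <= best_implemented);
    isa_level chosen = dispatch_fair(cpu, best_implemented);
    if (!is_intel && chosen > non_intel_cap) chosen = non_intel_cap;
    return chosen;
}
```

With a cap of ISA_SSE2, a non-Intel CPU that supports SSE4.2 ends up on the SSE2 path under dispatch_unfair, while dispatch_fair gives it SSE4.2. This is the pattern that shows up in several of the non-Intel columns below.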

Intel Math Kernel Library

The Math Kernel Library (MKL) contains many advanced mathematical functions. The results in the following table do not apply to various (sub-)packages that may be bundled with the MKL, such as the Intel Vector Math Library (VML), Intel Performance Primitives (IPP) and Intel Threading Building Blocks (TBB).

| Library version | Intel processor, 32 bit mode | non-Intel processor, 32 bit mode | Intel processor, 64 bit mode | non-Intel processor, 64 bit mode |
|---|---|---|---|---|
| MKL 7.0, 2004 | SSE3 | 386 | n.a. | n.a. |
| MKL 8.1, 2006 | SSSE3 | SSSE3 | SSSE3 | SSSE3 |
| MKL 9.0, 2006 | SSSE3 | SSSE3 | SSSE3 | SSSE3 |
| MKL 10.2, 2008 | SSE4.2 | SSE4.2 | SSE4.2 | SSE2 |
| MKL 10.3, 2010 | SSE4.2 | SSE4.2 | AVX | SSE2 |

As we can see, versions 8 and 9 give Intel and non-Intel processors access to the same instruction sets, while version 7 and the 64-bit builds of version 10 have unfair dispatching. MKL 7.0 has no x86-64 version.

Intel Vector Math Library

The Vector Math Library (VML) contains procedures for calculating elementary mathematical functions on vectors of arbitrary size.

| Library version | Intel processor, 32 bit mode | non-Intel processor, 32 bit mode | Intel processor, 64 bit mode | non-Intel processor, 64 bit mode |
|---|---|---|---|---|
| VML 7.0, 2004 | SSE3 | 386 | n.a. | n.a. |
| VML 8.1, 2006 | SSSE3 | SSE | SSSE3 | SSE2 |
| VML 9.0, 2006 | SSSE3 | SSE | SSSE3 | SSE2 |
| VML 10.2, 2006 | SSE4.2 | SSE2 | SSE4.2 | SSE2 |
| VML 10.3, 2010 | AVX | SSE2 | AVX | SSE2 |

As we can see, all versions have unfair dispatching. There are different branches for Intel processors with SSE2 and non-Intel processors with SSE2. I have not tested which of the SSE2 branches runs fastest on non-Intel processors.

Intel Performance Primitives

All the versions I have tested have fair CPU dispatching.

Intel Threading Building Blocks

This library has some CPU dispatching, but I have not tested whether it is fair or not.

Intel standard C library and standard math library

These libraries are called automatically from code compiled with an Intel C++ compiler.

| Library version | Intel processor, 32 bit mode | non-Intel processor, 32 bit mode | Intel processor, 64 bit mode | non-Intel processor, 64 bit mode |
|---|---|---|---|---|
| 7.1, 2004 | SSE2 | 386 | n.a. | n.a. |
| 8.1, 2005 | SSE3 | 386 | n.a. | n.a. |
| 9.1, 2006 | SSE3 | 386 | SSE3 | SSE2 |
| 10.1, 2008 | SSE4.2 | 386 | SSE4.2 | SSE2 |
| 11.1, 2010 | AVX | 386 | AVX | SSE2 |
| 12.0, 2010 | AVX | 386 | AVX | SSE2 |

All versions have unfair CPU dispatching. In many cases, however, the Intel compiler can generate calls directly to the SSE2 version of a function when compiling for the SSE2 or higher instruction set. This also applies to non-Intel processors.

Intel Short Vector Math Library

The Short Vector Math Library (SVML) is used for elementary mathematical functions on vector registers (XMM and YMM registers). It is called automatically from code compiled with an Intel compiler when the SSE2 or higher instruction set is enabled. The SVML can also be used with other compilers such as the GNU C++ compiler.

| Library version | Intel processor, 32 bit mode | non-Intel processor, 32 bit mode | Intel processor, 64 bit mode | non-Intel processor, 64 bit mode |
|---|---|---|---|---|
| 7.1, 2004 | SSE2 | SSE2 | n.a. | n.a. |
| 8.1, 2005 | SSE3 | SSE2 | n.a. | n.a. |
| 9.1, 2006 | SSE3 | SSE2 | SSE3 | SSE2 |
| 10.1, 2008 | SSE4.2 | SSE2 | SSE4.2 | SSE2 |
| 11.1, 2010 | AVX | SSE2 | AVX | SSE2 |
| 12.0, 2010 | AVX | SSE2 | AVX | SSE2 |

Intel C++ compiler

The Intel C++ compiler has various options that allow the programmer to generate code for a specific instruction set or to make multiple versions of the code for different instruction sets with automatic CPU dispatching. Non-Intel processors will always get the generic version of the code if CPU dispatching is used. In 32-bit mode, the default level for the generic code is SSE2 for versions 11 and 12 of the compiler, and 386 for version 10 and earlier, as indicated in the following table.

| Compiler version | Intel processor, 32 bit mode | non-Intel processor, 32 bit mode | Intel processor, 64 bit mode | non-Intel processor, 64 bit mode |
|---|---|---|---|---|
| 7.1, 2004 | SSE2 | 386 | n.a. | n.a. |
| 8.1, 2005 | SSE3 | 386 | n.a. | n.a. |
| 9.1, 2006 | SSE3 | 386 | SSE3 | SSE2 |
| 10.1, 2008 | SSE4.2 | 386 | SSE4.2 | SSE2 |
| 11.1, 2010 | AVX | SSE2 | AVX | SSE2 |
| 12.0, 2010 | AVX | SSE2 | AVX | SSE2 |

There is an option for setting the generic level higher or lower. For example, the options /arch:SSE3 /QaxSSE4.1,AVX will set the generic level to SSE3 and generate three versions of the code, for the SSE3, SSE4.1 and AVX instruction sets. Non-Intel processors can only get the generic version, which will be SSE3 in this example. Code compiled with the /Qx option, for example /QxSSE4.1, will fail to run on non-Intel processors and on processors without the specified instruction set.
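The behavior described above can be modelled in a few lines of C. This is only a sketch of the observable outcome, not the dispatcher that the compiler actually generates. The names select_version, qx_runs and NO_VERSION are hypothetical. What happens on a CPU that lacks even the generic level is not covered by my tests, so the sketch simply assumes that no version can run in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Instruction set levels in increasing order (hypothetical names). */
typedef enum {
    ISA_386, ISA_MMX, ISA_SSE, ISA_SSE2, ISA_SSE3,
    ISA_SSSE3, ISA_SSE41, ISA_SSE42, ISA_AVX
} isa_level;

#define NO_VERSION (-1)  /* assumption: nothing can run below the generic level */

/* Model of /arch:<generic> /Qax<extra[0]>,<extra[1]>,...
   Returns the level of the code version that gets executed. */
int select_version(isa_level generic, const isa_level *extra, size_t n_extra,
                   bool is_intel, isa_level cpu)
{
    assert(extra != NULL || n_extra == 0);
    if (cpu < generic) return NO_VERSION;
    int chosen = (int)generic;
    if (is_intel) {
        /* Only Intel CPUs are considered for the extra versions. */
        for (size_t i = 0; i < n_extra; i++) {
            if (extra[i] <= cpu && (int)extra[i] > chosen) chosen = (int)extra[i];
        }
    }
    return chosen;  /* non-Intel CPUs always stay on the generic version */
}

/* Model of /Qx<required>: a single version that runs only on an
   Intel CPU with the required instruction set. */
bool qx_runs(isa_level required, bool is_intel, isa_level cpu)
{
    return is_intel && cpu >= required;
}
```

For the /arch:SSE3 /QaxSSE4.1,AVX example, calling select_version with generic = ISA_SSE3 and extra = {ISA_SSE41, ISA_AVX} returns ISA_AVX for an Intel CPU with AVX, ISA_SSE41 for an Intel CPU with SSE4.2, and ISA_SSE3 for any non-Intel CPU that supports at least SSE3.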

Other Intel products

The above test results were obtained with Intel C++ compilers and function libraries for Windows and Linux. I have found no differences between the Windows and Linux versions in the cases where I have had access to both. I have not tested the Macintosh versions, but this is less relevant as long as no Macintosh computers are available with AMD or VIA processors. I have not tested the Intel Fortran compiler, but it seems to be similar to the Intel C++ compiler with respect to CPU dispatching.

Anybody who has earlier versions of the compiler and function libraries than the ones I have tested is welcome to contact me.

 