Agner's CPU blog

AMD library contains Intel's cripple-AMD function!
Author: Agner Fog Date: 2010-08-11 10:57

This issue is getting more and more absurd the more I dig into it. AMD makes a function library called AMD Core Math Library (ACML) to match Intel's Math Kernel Library (MKL). I have tested a Windows version of ACML and found that some of the functions run faster when the CPU vendor ID is artificially changed to "GenuineIntel". Maybe this is not so surprising after all, since this version of ACML is compiled with Intel's Fortran compiler.
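As background, the vendor ID is a 12-character ASCII string that the CPUID instruction (leaf 0) returns in the registers EBX, EDX and ECX, in that order. The following Python sketch only models that encoding. It does not execute CPUID, and the function names are made up for illustration.

```python
import struct


def vendor_from_registers(ebx: int, edx: int, ecx: int) -> str:
    """Join three 32-bit register values into the 12-character vendor string."""
    # Each register holds four ASCII characters, lowest byte first.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def registers_from_vendor(vendor: str) -> tuple:
    """Inverse mapping: split a vendor string into (EBX, EDX, ECX)."""
    if len(vendor) != 12:
        raise ValueError("vendor ID must be exactly 12 characters")
    return struct.unpack("<III", vendor.encode("ascii"))


if __name__ == "__main__":
    # "CentaurHauls" is the string VIA processors normally report.
    for name in ("GenuineIntel", "AuthenticAMD", "CentaurHauls"):
        ebx, edx, ecx = registers_from_vendor(name)
        print(f"{name}: EBX={ebx:#010x} EDX={edx:#010x} ECX={ecx:#010x}")
```

A library that compares this string against "GenuineIntel" before choosing a code path is what a faked vendor ID fools.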

Here are some of the most marked test results:

Execution time (lower is better), by faked CPU vendor ID

ACML function       VIA    AMD    Intel   % difference
drandlogistic       1.95   1.96   1.84     6
drandexponential    1.67   1.72   1.57     8
drandlognormal      3.42   3.46   2.99    15

ACML version acml4.4.0-ifort32.exe, VIA L3050 1.8 GHz processor, Windows 7, 32 bit. MS VS 2010 C++. Loop 100000 times * 256 values. Time unit = 10^9 clock cycles. Average of 20 runs.
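The post does not say how the "% difference" column was calculated. One reading that reproduces all three values is to compare the mean of the two non-Intel timings with the Intel timing, relative to the Intel timing. The Python sketch below checks that assumption and also converts a timing into clock cycles per generated value, assuming each timing covers one full loop of 100000 * 256 values. Both formulas and all function names are assumptions made for illustration, not something stated in the post.

```python
# Timings from the table, in units of 10^9 clock cycles: (VIA, AMD, Intel).
RESULTS = {
    "drandlogistic": (1.95, 1.96, 1.84),
    "drandexponential": (1.67, 1.72, 1.57),
    "drandlognormal": (3.42, 3.46, 2.99),
}

VALUES_PER_RUN = 100_000 * 256  # loop count * values per call


def pct_difference(via: float, amd: float, intel: float) -> float:
    """Assumed formula: mean non-Intel time versus Intel time, in percent."""
    non_intel = (via + amd) / 2
    return 100 * (non_intel - intel) / intel


def cycles_per_value(time_units: float) -> float:
    """Clock cycles per generated value, assuming one timing = one full loop."""
    return time_units * 1e9 / VALUES_PER_RUN


if __name__ == "__main__":
    for name, (via, amd, intel) in RESULTS.items():
        print(f"{name}: {pct_difference(via, amd, intel):.2f} % slower, "
              f"{cycles_per_value(intel):.1f} cycles/value as Intel")
```

Rounded to whole percentages, this gives 6, 8 and 15, matching the table.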

For many of the functions in ACML, the CPU vendor ID makes little or no difference to performance, but some functions show a significant bias, as shown in the table above. Intel have repeatedly claimed that their compilers give good performance on AMD chips if you compile for the SSE2 instruction set. With this compiler option, the compiler-generated code uses the SSE2 instruction set only. Maybe the AMD people believed this claim, or maybe they had no other option because they couldn't find a better Fortran compiler. I think that Intel first made the SSE2 recommendation at a time when AMD processors supported nothing beyond SSE2, so this was the best performance you could get then. Today, you get suboptimal performance when compiling for SSE2 because later instruction sets are not used. And of course, the code will not work on older computers without SSE2.

To find the reason for the vendor ID effect, I decided to investigate the function with the strongest effect, which is drandlognormal. After a lot of detective work, I found that drandlognormal calls a logarithm function in Intel's Short Vector Math Library (SVML). This logarithm function is dispatched into three branches: one for SSE2/generic, one for SSE3, and one for the future AVX instruction set. It uses the standard Intel CPU dispatcher, which gives the generic branch to all non-Intel processors. The SVML library supports only SSE2 and above, so the generic branch uses SSE2. When my VIA processor pretends to be an Intel processor, it gets the SSE3 branch, which is better optimized. The difference in performance is likely to be larger on future processors that support AVX.
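The dispatching behaviour described above can be modelled in a few lines. This is a simplified Python sketch, not Intel's actual code: the `Branch` enum, the function names and the feature-flag sets are assumptions made for illustration, and a real dispatcher would read the vendor string and feature bits from CPUID and check more conditions than this.

```python
from enum import IntEnum


class Branch(IntEnum):
    SSE2_GENERIC = 1
    SSE3 = 2
    AVX = 3


def best_supported(features: set) -> Branch:
    """Pick the highest branch that the CPU's feature flags allow."""
    if "avx" in features:
        return Branch.AVX
    if "sse3" in features:
        return Branch.SSE3
    return Branch.SSE2_GENERIC


def vendor_gated_dispatch(vendor: str, features: set) -> Branch:
    """Model of the behaviour described in the post: non-Intel gets generic."""
    if vendor != "GenuineIntel":
        return Branch.SSE2_GENERIC
    return best_supported(features)


def feature_based_dispatch(vendor: str, features: set) -> Branch:
    """Alternative for comparison: ignore the vendor, trust the feature flags."""
    return best_supported(features)


if __name__ == "__main__":
    via_features = {"sse2", "sse3"}  # assumed flags for the test machine
    for vendor in ("CentaurHauls", "GenuineIntel"):
        print(vendor, vendor_gated_dispatch(vendor, via_features).name)
```

With the same feature flags, the vendor-gated model returns the generic SSE2 branch for "CentaurHauls" and the SSE3 branch for "GenuineIntel", which is the effect a faked vendor ID exploits. The feature-based variant returns the SSE3 branch in both cases.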

There is another version of ACML for Windows built with the PGI compiler, but I couldn't make it work because some library files were missing.

The proposed settlement with the FTC requires Intel to reimburse its compiler customers for the cost of recompiling their code with a different compiler. While this reimbursement program probably has little more than symbolic significance, it would be funny to see Intel compensating AMD for relying on their compiler. Unfortunately, it will be difficult for AMD to find a better Fortran compiler.

 