Agner`s CPU blog

Software optimization resources | E-mail subscription to this blog | www.agner.org

AMD library contains Intel's cripple-AMD function!
Author: Agner Fog Date: 2010-08-11 10:57

This issue is getting more and more absurd the more I dig into it. AMD makes a function library called AMD Core Math Library (ACML) to match Intel's Math Kernel Library (MKL). I have tested a Windows version of ACML and found that some of the functions run faster when the CPU vendor ID is artificially changed to "GenuineIntel". Maybe this is not so surprising after all, since this version of ACML is compiled with Intel's Fortran compiler.

Here are some of the most marked test results:

Execution time
(lower is better)
Faked CPU vendor ID  
ACML function VIA AMD Intel % difference
drandlogistic 1.95 1.96 1.84 6
drandexponential 1.67 1.72 1.57 8
drandlognormal 3.42 3.46 2.99 15
ACML version acml4.4.0-ifort32.exe, VIA L3050 1.8 GHz processor, Windows 7, 32 bit. MS VS 2010 C++. Loop 100000 times * 256 values. Time unit = 109 clock cycles. Average of 20 runs.

On many of the functions in ACML there is little or no difference in performance depending on the CPU vendor ID, but some functions have a significant bias, as shown in the table above. Intel have repeatedly claimed that their compilers give a good performance on AMD chips if you compile for the SSE2 instruction set. Maybe the AMD people have believed this claim, or maybe they had no other option since they couldn't find a better Fortran compiler. With this compiler option, the compiler-generated code will be for the SSE2 instruction set only. I think that Intel first made the SSE2 recommendation at the time when AMD processors supported only SSE2, so this was the best performance you could get at that time. Today, you get suboptimal performance when compiling for SSE2 because later instruction sets are not used. And of course, the code will not work on older computers without SSE2.

To find the reason for the vendor ID effect, I decided to investigate the function with the strongest effect, which is the drandlognormal function. After a lot of detective work, I found that drandlognormal calls a logarithm function in Intel's Short Vector Math Library (SVML). This logarithm function is dispatched into three branches for the SSE2/generic, SSE3, and the future AVX instruction set, respectively. It uses the standard Intel CPU dispatcher, which gives the generic branch to all non-Intel processors. The SVML library supports only SSE2 and above, so the generic branch uses SSE2. When my VIA processor fakes to be an Intel, it gets the SSE3 branch, which is better optimized. The difference in performance is likely to be higher on future processors that support AVX.

There is another version of ACML for Windows built with the PGI compiler, but I couldn't make it work because some library files were missing.

The proposed settlement with FTC requires that Intel shall reimburse its compiler customers for the cost of recompiling their code with a different compiler. While this reimbursement program probably has little more than symbolic significance, it would be funny to see Intel compensating AMD for relying on their compiler. Unfortunately, it will be difficult for AMD to find a better Fortran compiler.

 
thread Intel's "cripple AMD" function new - Agner Fog - 2009-12-30
reply Intel's new - Felid - 2010-01-01
replythread Intel's "cripple AMD" function new - inhahe - 2010-01-03
last replythread Intel's new - Agner Fog - 2010-01-04
replythread Intel's compiler is the best? new - Weber - 2010-01-04
last reply Intel's compiler is the best? new - Agner Fog - 2010-01-09
reply Intel article new - Agner Fog - 2010-01-22
last reply Intel's new - Deng - 2016-12-11
replythread Web Parallels new - Jeff Craig - 2010-01-04
last replythread More Parallels new - Agner Fog - 2010-01-23
reply Early Examples new - Yuhong Bao - 2010-02-01
last reply More Parallels new - Yuhong Bao - 2010-02-20
replythread New CPUID manipulation program new - Agner Fog - 2010-01-22
replythread CPUID manipulation through virtualization new - Andrew Lofthouse - 2010-08-16
reply CPUID manipulation through virtualization new - Agner Fog - 2010-08-16
replythread CPUID manipulation program for AMD new - Agner - 2010-10-01
last replythread CPUID manipulation program for AMD new - Ralf - 2012-01-30
last reply CPUID manipulation program for AMD new - Agner - 2012-01-31
last reply CPUID manipulation through virtualization new - akshay - 2015-07-08
last replythread New CPUID manipulation program new - AVK - 2011-02-09
last reply New CPUID manipulation program new - Agner - 2011-02-09
reply AMD Blog on compilers/benchmarch new - margaret lewis - 2010-02-01
replythread New version is still crippling Intel's competitors new - Agner Fog - 2010-06-29
last reply New version is still crippling Intel's competitors new - granyte - 2014-09-16
reply Out of court settlement with FTC new - Agner Fog - 2010-08-05
reply AMD library contains Intel's cripple-AMD function! - Agner Fog - 2010-08-11
replythread Common math programs are affected new - Agner Fog - 2010-08-20
last reply Preliminary test results for Matlab new - Agner Fog - 2010-09-16
reply Overview of CPU dispatching in Intel software new - Agner Fog - 2010-08-23
replythread New Intel compiler version - still the same! new - Agner Fog - 2010-09-22
reply GCC now has support for function dispatch new - Jean-Luc - 2010-09-27
replythread Intel compiler question new - James Russell - 2010-10-11
last reply Intel compiler question new - Agner - 2010-10-12
reply New Intel compiler version - still the same! new - Don Kretsch - 2010-11-29
last replythread New Intel compiler version - still the same! new - Daniel - 2011-12-23
last replythread New Intel compiler version - still the same! new - Agner - 2011-12-25
last replythread New Intel compiler version - still the same! new - Stanley Theamer - 2012-02-12
last reply New Intel compiler version - still the same! new - Stretcho - 2012-03-14
replythread Still no library that is optimal on all processors new - Agner - 2012-04-18
replythread Still no library that is optimal on all processors new - Guest - 2012-05-17
last replythread Still no library that is optimal on all processors new - Agner - 2012-05-17
last replythread Still no library that is optimal on all processors new - David - 2012-05-19
last replythread Still no library that is optimal on all processors new - Agner - 2012-05-20
last reply Still no library that is optimal on all processors new - Bubba_Hotepp - 2012-06-16
last replythread Still no library that is optimal on all processors new - Marat Dukhan - 2013-05-20
last replythread Still no library that is optimal on all processors new - Agner - 2013-05-21
last replythread This is still going on, wow just wow new - Vuurdraak - 2016-11-10
last replythread This is still going on, wow just wow new - Agner - 2016-11-10
last replythread This is still going on, wow just wow new - Vuurdraak - 2016-11-11
last replythread This is still going on, wow just wow new - Denis - 2017-01-02
last replythread This is still going on, wow just wow new - Agner - 2017-01-02
replythread RYZEN thoughts? new - Noob programmer - 2017-03-10
last replythread RYZEN thoughts? new - Chromatix - 2017-03-16
last replythread RYZEN thoughts? new - Peter - 2017-04-11
reply RYZEN thoughts? new - Agner - 2017-04-12
last reply RYZEN thoughts? new - itsmydamnation - 2017-04-21
last reply This is still going on, wow just wow new - Naoki Shibata - 2017-07-19
last replythread A long history of legal antitrust battles new - Agner - 2017-07-27
last reply A long history of legal antitrust battles new - Jorcy Neto - 2017-07-27