Agner's CPU blog

Still no library that is optimal on all processors
Author: Agner Date: 2012-04-18 08:52

Choosing the most efficient function library can be a nightmare for a programmer. I have calculated the cosine function with different libraries and compared the calculation times. The best version is 19 times faster than the worst!

AMD have now updated their math libraries and added CPU dispatching. There are two versions of the code in AMD's LIBM library: one for the SSE2 instruction set and one for AVX and FMA4. Intel processors will run the inferior SSE2 branch because they don't have the FMA4 instruction set. The incompatibility between Intel's and AMD's FMA instructions is another scandal, which I have discussed in a separate blog post. The AMD library does not check the CPU brand name as Intel's libraries do. It checks only for the FMA4 instructions, which are not supported by Intel processors although, quite ironically, they were designed by Intel. It will be possible to run the better branch on Intel processors if Intel decides to support the FMA4 instruction set in the future.
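To make the difference between feature-based and brand-based dispatching concrete, here is a minimal sketch in C. It is not AMD's actual dispatcher. The function name `select_cos_branch` and the enum are hypothetical, and the function takes raw CPUID register values as arguments so that the decision logic can be read on its own. The bit positions are the ones documented for CPUID (SSE2: leaf 1, EDX bit 26; AVX: leaf 1, ECX bit 28; FMA4: leaf 0x80000001, ECX bit 16). The sketch assumes the operating system has enabled the YMM registers; a real dispatcher would also have to verify that through OSXSAVE and XGETBV.

```c
#include <stdint.h>

/* CPUID feature bits (see the vendors' CPUID documentation). */
#define CPUID1_EDX_SSE2      (UINT32_C(1) << 26)  /* leaf 1, EDX */
#define CPUID1_ECX_AVX       (UINT32_C(1) << 28)  /* leaf 1, ECX */
#define CPUID_EXT1_ECX_FMA4  (UINT32_C(1) << 16)  /* leaf 0x80000001, ECX */

/* Hypothetical branch identifiers for a two-version library. */
typedef enum { BRANCH_NONE, BRANCH_SSE2, BRANCH_AVX_FMA4 } cos_branch;

/*
 * Hypothetical helper: choose a code branch from feature bits only.
 * The vendor string ("GenuineIntel", "AuthenticAMD") is deliberately
 * not an input, so any CPU that reports the features gets the fast path.
 * Simplification: OS support for YMM state is assumed, not checked.
 */
cos_branch select_cos_branch(uint32_t leaf1_ecx, uint32_t leaf1_edx,
                             uint32_t ext1_ecx)
{
    int has_sse2 = (leaf1_edx & CPUID1_EDX_SSE2) != 0;
    int has_avx  = (leaf1_ecx & CPUID1_ECX_AVX) != 0;
    int has_fma4 = (ext1_ecx & CPUID_EXT1_ECX_FMA4) != 0;

    if (has_avx && has_fma4)
        return BRANCH_AVX_FMA4;   /* better branch */
    if (has_sse2)
        return BRANCH_SSE2;       /* fallback branch */
    return BRANCH_NONE;           /* no supported vector instruction set */
}
```

With this kind of logic, a processor that reports AVX but not FMA4 ends up in the SSE2 branch regardless of who made it, which is the behaviour described above for current Intel processors.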

The following table shows the results of some tests I have made of different math libraries.

| Library | Elements per vector | Dispatch version | Time, 32-bit mode | Time, 64-bit mode |
|---|---|---|---|---|
| glibc 2.13 | 1 | none | 1400 | 1030 |
| MS 17.00 | 1 | SSE2 | 724 | 200 |
| Intel 12.1.3 | 1 | generic | 1950 | 303 |
| Intel 12.1.3 | 1 | Intel | 720 | 295 |
| Intel SVML 12.1.3 | 4 | generic (SSE2) | 360 | 203 |
| Intel SVML 12.1.3 | 4 | Intel AVX | 99 | 188 |
| Intel SVML 12.1.3 | 8 | generic (AVX) | 112 | 128 |
| Intel SVML 12.1.3 | 8 | Intel AVX | 108 | 101 |
| AMD LIBM 3.0.2 | 4 | generic (SSE2) | n.a. | 245 |
| AMD LIBM 3.0.2 | 4 | FMA4 | n.a. | 148 |
Calculation time in clock cycles for the cosine of a vector of 8 single-precision floats on an AMD Bulldozer CPU in a single thread with different function libraries.
(values are imprecise due to the varying clock frequency).
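The figures in the table can be turned into cycles per element and speed ratios with a little arithmetic. The sketch below is only an illustration; the helper names are hypothetical. Each measurement covers a vector of 8 floats, so the cost per element is the measured time divided by 8. Taking the largest value in the table (1950, Intel general library, generic branch, 32-bit mode) and the smallest (99, Intel SVML with 4 elements per vector, 32-bit mode) gives a ratio between 19 and 20, which matches the factor of 19 quoted above, bearing in mind that the measurements are imprecise.

```c
/* Each test computes the cosine of 8 single-precision floats. */
#define ELEMENTS_PER_TEST 8

/* Hypothetical helper: clock cycles spent per float. */
double cycles_per_element(double total_cycles)
{
    return total_cycles / ELEMENTS_PER_TEST;
}

/* Hypothetical helper: how many times faster the fast version is. */
double speedup(double slow_cycles, double fast_cycles)
{
    return slow_cycles / fast_cycles;
}
```

For example, `cycles_per_element(99)` gives 12.375 cycles per float for the fastest 32-bit result, while `cycles_per_element(1950)` gives 243.75 for the slowest.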

The Gnu function library (glibc) uses outdated and inefficient code. The Microsoft library has decent performance in the 64-bit version, but of course it supports only the Windows platform. Intel's general math library is no better in my test case, but Intel's Short Vector Math Library (SVML) is very good. SVML supports vectors of 4 floats in an XMM register (SSE2) or vectors of 8 floats in a YMM register (AVX). It will choose the inferior generic path for non-Intel processors unless we replace Intel's CPU dispatcher as described above. Intel's libraries are available for Windows, Linux and Mac. AMD's LIBM library supports vectors of 4 floats. It is available for Windows and Linux, but only in 64-bit mode.

The sad conclusion is that we have no fully optimized math function library that supports all brands of x86 processors and all operating systems. If we want optimal performance on all processors, the best choice is to use Intel's SVML library and manipulate it into treating non-Intel processors better.

It would be nice if more people worked on improving glibc. This library supports all processors and platforms, but it is poorly optimized. Only a few memory and string functions in glibc have CPU dispatching, while the math functions exist only in old and poorly optimized versions. It would also be nice to have vector versions of the math functions in glibc because the Gnu compiler has support for such functions.

 