Agner's CPU blog

Preliminary test results for Matlab
Author: Agner Fog Date: 2010-09-16 07:20

I have now verified that the performance of Matlab depends strongly on the CPU vendor string. The benchmark test on my VIA processor gives the following results.

 

Benchmark timing (lower is better)

| Faked CPU | Matrix LU factorization | Fast Fourier Transform | Ordinary differential equation | Solve sparse matrix | 2-D graphics | 3-D graphics |
|---|---|---|---|---|---|---|
| VIA | 0.7243 | 0.4415 | 0.2074 | 0.5543 | 1.1418 | 0.8214 |
| AMD | 0.3197 | 0.4502 | 0.2201 | 0.4952 | 1.1812 | 0.8179 |
| Intel | 0.3161 | 0.2729 | 0.2218 | 0.4958 | 1.1967 | 0.7945 |

Built-in benchmark test on Matlab v. 7.11, 32 bit, Windows 7, VIA Nano L3050, 1.8 GHz. Average of 10 measurements.

These differences in benchmarks arise mostly because Matlab uses different function libraries for different processors. (The graphics performance is irrelevant here because my test board has no proper graphics card.)

It is possible to choose different function libraries by modifying two poorly documented configuration files, named blas.spec and fftw.spec.

By modifying these configuration files, I got the following benchmarks for different function libraries on the VIA processor.

 

Benchmark timing (lower is better)

| BLAS library | Matrix LU factorization | Solve sparse matrix | Ordinary differential equation |
|---|---|---|---|
| mkl.dll | 0.3162 | 0.4949 | 0.2213 |
| acml.dll | 0.6232 | 0.7589 | 0.2355 |
| Default | 0.7238 | 0.5537 | 0.2075 |

Benchmark tests on VIA processor with different libraries specified in blas.spec file. Same conditions as above.
 
 

Benchmark timing (lower is better)

| FFT libraries | Fast Fourier Transform |
|---|---|
| libfftw3.dll, libfftw3f.dll | 0.4494 |
| libfftw3i.dll, libfftw3f.dll | 0.2708 |

Benchmark tests on VIA processor with different libraries specified in fftw.spec file. Same conditions as above.

This shows that most of the difference in performance can be accounted for by the fact that Matlab has specified different libraries to be used on different processor brands. The Matlab configuration files make specifications only for Intel and AMD processors, while VIA processors get a default library. Apparently, they have never heard about VIA processors. As you can see, the speed can be more than doubled for some tasks by adding an appropriate specification for VIA processors to the configuration files.
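The "more than doubled" figure can be checked directly against the tables above. The following is a small sketch that divides the timing with the default library by the timing with the alternative library. The dictionary and function names are chosen for illustration only and are not part of Matlab.

```python
# Timings copied from the tables above (VIA Nano, lower is better).
# All names in this sketch are illustrative and not part of Matlab.
blas_timings = {
    "Matrix LU factorization": {"Default": 0.7238, "mkl.dll": 0.3162, "acml.dll": 0.6232},
    "Solve sparse matrix": {"Default": 0.5537, "mkl.dll": 0.4949, "acml.dll": 0.7589},
    "Ordinary differential equation": {"Default": 0.2075, "mkl.dll": 0.2213, "acml.dll": 0.2355},
}
fft_timings = {"libfftw3.dll": 0.4494, "libfftw3i.dll": 0.2708}


def speedup(old_time, new_time):
    """Return how many times faster the new timing is than the old one."""
    return old_time / new_time


# Speedup from replacing the default BLAS library with mkl.dll.
for task, timing in blas_timings.items():
    factor = speedup(timing["Default"], timing["mkl.dll"])
    print(f"{task}: {factor:.2f}x with mkl.dll")

# Speedup from replacing libfftw3.dll with libfftw3i.dll.
fft_factor = speedup(fft_timings["libfftw3.dll"], fft_timings["libfftw3i.dll"])
print(f"Fast Fourier Transform: {fft_factor:.2f}x with libfftw3i.dll")
```

The LU factorization comes out roughly 2.3 times faster and the FFT roughly 1.7 times faster, while the differential equation benchmark is essentially unaffected by the choice of BLAS library.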

Next, I analyzed the library files to see if there was any CPU dispatching inside these libraries. This analysis gave the following results:

mkl.dll
This is Intel's Math Kernel Library version 10.2.3, 32 bit. As mentioned in another posting, this 32-bit version of MKL uses the same instruction sets for Intel and non-Intel processors, while the 64-bit version gives a (minor) advantage to Intel processors over non-Intel processors. More importantly, this MKL contains another check for the Intel vendor string in connection with a check for the number of processor cores. It looks as if multithreading works poorly, or not at all, on non-Intel processors in this library. If this suspicion holds true, it could have quite a dramatic negative effect on the performance on AMD processors. However, I cannot test this with my current test methods because there is no VIA processor with multiple cores yet. I don't have the time to make another test setup right now, so unfortunately we can't yet tell whether this affects multithreading on AMD processors.

acml.dll
This is AMD's Core Math Library, version 4.2.0. This version of ACML is compiled with an Intel compiler, just like the one I reported on in a previous posting. It contains an Intel CPU dispatcher which enables the SSE2 instruction set only on Intel processors. This has only a minor effect in this case because only a few functions are affected. Furthermore, it uses Intel's OpenMP library for threading. This library may have inferior functionality on non-Intel processors.

Default BLAS library
This library contains no CPU dispatching. It calls several other libraries that do have CPU dispatching, but apparently nothing that favors a specific CPU vendor.

FFT libraries
These libraries contain a CPU dispatcher that enables SSE2 in some functions. They are compiled with a Microsoft compiler and they contain no check for the CPU vendor. The library used for AMD and VIA processors (libfftw3.dll) has very little SSE2 code, while the library used for Intel processors (libfftw3i.dll) has more SSE2 code. Reportedly, Matlab has disabled the use of SSE2 on AMD processors because it was inefficient in their tests. This decision is probably based on the old AMD K8 processors, while SSE2 is more efficient in newer AMD processors.
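To make the distinction concrete, here is a minimal sketch of the two dispatching styles discussed above: choosing a code path from the CPU vendor string, and choosing it from the supported instruction set. It is written in Python purely for illustration. The function names, the path labels and the way register values are passed in are assumptions made for this sketch and are not taken from any of the libraries analyzed. The only hardware facts it relies on are that CPUID leaf 0 returns the 12-character vendor string in the registers EBX, EDX, ECX (in that order) and that SSE2 support is reported in bit 26 of EDX from CPUID leaf 1.

```python
import struct


def vendor_string(ebx, edx, ecx):
    """Assemble the 12-character vendor string returned by CPUID leaf 0."""
    # Each register holds four ASCII characters, least significant byte first.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def has_sse2(edx_leaf1):
    """SSE2 support is reported in bit 26 of EDX from CPUID leaf 1."""
    return bool(edx_leaf1 & (1 << 26))


def dispatch_by_vendor(vendor, sse2):
    """Hypothetical vendor-based dispatcher: fast path on one brand only."""
    if vendor == "GenuineIntel" and sse2:
        return "sse2"
    return "generic"


def dispatch_by_feature(vendor, sse2):
    """Hypothetical feature-based dispatcher: the vendor string is ignored."""
    return "sse2" if sse2 else "generic"


# CPUID leaf 0 register values (EBX, EDX, ECX) for three vendor strings.
cpus = {
    "Intel": (0x756E6547, 0x49656E69, 0x6C65746E),
    "AMD": (0x68747541, 0x69746E65, 0x444D4163),
    "VIA": (0x746E6543, 0x48727561, 0x736C7561),
}

# Pretend every CPU reports SSE2 (bit 26 set) to isolate the vendor effect.
sse2 = has_sse2(1 << 26)

for name, regs in cpus.items():
    vendor = vendor_string(*regs)
    print(name, vendor, dispatch_by_vendor(vendor, sse2), dispatch_by_feature(vendor, sse2))
```

With SSE2 available, the vendor-based function selects the fast path only for the Intel string, while the feature-based function selects it for all three. Faking the vendor string, as in the tests above, changes the outcome of the first function but not of the second.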

My conclusion so far is that the performance of Matlab depends strongly on the CPU vendor string, but this effect is mainly due to suboptimal settings in the configuration files, and this problem can be solved easily by modifying these files. Several of the library files contain Intel CPU dispatchers that favor Intel processors, but the effect of this is too small to give statistically significant results in my tests.

So far, I have only made tests on a single-core processor. There may be larger effects on multi-core processors, but I have not been able to test this yet. I have made a small test package with the appropriate configuration files and descriptions for my readers to experiment with. You can download it here.

 