Preliminary test results for Matlab
I have now verified that the performance of Matlab depends strongly on the CPU vendor
string. The benchmark test on my VIA processor gives the following results.
Benchmark timing (lower is better)
| Faked CPU | Matrix LU factorization | Fast Fourier Transform | Ordinary differential equation | Solve sparse matrix | 2-D graphics | 3-D graphics |
| VIA | 0.7243 | 0.4415 | 0.2074 | 0.5543 | 1.1418 | 0.8214 |
| AMD | 0.3197 | 0.4502 | 0.2201 | 0.4952 | 1.1812 | 0.8179 |
| Intel | 0.3161 | 0.2729 | 0.2218 | 0.4958 | 1.1967 | 0.7945 |
Built-in benchmark test on Matlab v. 7.11, 32 bit, Windows 7, VIA Nano L3050, 1.8 GHz. Average of 10 measurements.
These differences in benchmarks are mostly due to the fact that Matlab uses
different function libraries for different processors. (The graphics performance
is irrelevant here since I have no proper graphics card on my test board.)
It is possible to choose different function libraries by modifying two poorly
documented configuration files, named blas.spec and fftw.spec.
By modifying these configuration files, I got the following benchmarks for different function libraries on the VIA processor.
Benchmark timing (lower is better)
| BLAS library | Matrix LU factorization | Solve sparse matrix | Ordinary differential equation |
| mkl.dll | 0.3162 | 0.4949 | 0.2213 |
| acml.dll | 0.6232 | 0.7589 | 0.2355 |
| Default | 0.7238 | 0.5537 | 0.2075 |
Benchmark tests on VIA processor with different libraries specified in blas.spec file. Same conditions as above.
Benchmark timing (lower is better)
| FFT libraries | Fast Fourier Transform |
| libfftw3.dll libfftw3f.dll | 0.4494 |
| libfftw3i.dll libfftw3f.dll | 0.2708 |
Benchmark tests on VIA processor with different libraries specified in fftw.spec file. Same conditions as above.
This shows that most of the difference in performance can be accounted for by
the fact that Matlab has specified different libraries to be used on different
processor brands. The Matlab configuration files make specifications only for
Intel and AMD processors, while VIA processors get a default library.
Apparently, they have never heard about VIA processors. As you can see, the
speed can be more than doubled for some tasks by adding an appropriate
specification for VIA processors to the configuration files.
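To make the comparison concrete, here is a small Python sketch that lines up the faked-vendor timings with the per-library timings from the tables above and picks the nearest BLAS row for each. The helper names (distance, closest_library) are my own, and a close match in timings only suggests which library was in use; it does not prove which file Matlab loaded.

```python
# Timings in seconds (lower is better), copied from the tables above.
# Tuple order: (LU factorization, sparse solve, ODE).
by_faked_vendor = {
    "VIA":   (0.7243, 0.5543, 0.2074),
    "AMD":   (0.3197, 0.4952, 0.2201),
    "Intel": (0.3161, 0.4958, 0.2218),
}
by_blas_library = {
    "mkl.dll":  (0.3162, 0.4949, 0.2213),
    "acml.dll": (0.6232, 0.7589, 0.2355),
    "Default":  (0.7238, 0.5537, 0.2075),
}


def distance(a, b):
    """Sum of absolute timing differences between two result rows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def closest_library(timings):
    """Name of the BLAS row whose timings are nearest to `timings`."""
    return min(
        by_blas_library,
        key=lambda name: distance(timings, by_blas_library[name]),
    )


closest = {vendor: closest_library(t) for vendor, t in by_faked_vendor.items()}
for vendor, library in closest.items():
    print(f"{vendor:5s} looks most like {library}")

# LU speedup on the VIA processor when switching from the default to mkl.dll.
lu_speedup = by_blas_library["Default"][0] / by_blas_library["mkl.dll"][0]
print(f"LU factorization speedup: {lu_speedup:.2f}x")
```

On these numbers the faked VIA row lands nearest the default library and the faked Intel row nearest mkl.dll, and the LU ratio comes out a little under 2.3.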
Next, I analyzed the library files to see if there was any CPU dispatching
inside these libraries. This analysis gave the following results:
mkl.dll
This is Intel's Math Kernel Library version 10.2.3, 32 bit. As mentioned in another
posting, this 32-bit version of MKL uses the same instruction sets for Intel
and non-Intel processors, while the 64-bit version gives a (minor) advantage to
Intel processors over non-Intel processors. What is more important is that this
MKL contains another check for the Intel vendor string in connection with a
check for the number of processor cores. It looks like multithreading works
poorly, or not at all, on non-Intel processors in this library. If this
suspicion holds true then it can have quite a dramatic negative effect on the
performance on AMD processors. However, I cannot test this with my current test
methods because there is no VIA processor with multiple cores yet. I don't have
the time to make another test setup right now so unfortunately we can't tell yet
if this affects multi-threading on AMD processors.
acml.dll
This is AMD's Core Math Library, version 4.2.0. This version of ACML is
compiled with an Intel compiler, just like the one I have reported about in a
previous posting. It contains an Intel CPU dispatcher which enables the SSE2
instruction set only on Intel processors. This has only a minor effect in this
case because only a few functions are affected. Furthermore, it uses Intel's
OpenMP library for threading. This library may have inferior functionality on
non-Intel processors.
Default BLAS library
This library contains no CPU dispatching. It calls several other libraries that
do have CPU dispatching, but apparently nothing that favors a specific CPU
vendor.
FFT libraries
These libraries contain a CPU dispatcher that enables SSE2 in some functions.
They are compiled with a Microsoft compiler and they contain no check for the
CPU vendor. The library used for AMD and VIA processors (libfftw3.dll) has very
little SSE2 code, while the library used for Intel processors (libfftw3i.dll)
has more SSE2 code. Reportedly, Matlab has disabled the use of SSE2 on AMD
processors because it was inefficient in their tests (link).
This decision is probably based on the old AMD K8 processors, while SSE2 is more
efficient in newer AMD processors.
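The difference between the two kinds of dispatcher described above can be modelled in a few lines of Python. This is only an illustration: the Cpu class and both dispatch functions are hypothetical and are not taken from any of the libraries. A real dispatcher would read the vendor string and feature flags from the CPUID instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    vendor: str     # CPUID vendor string, e.g. "GenuineIntel"
    has_sse2: bool  # whether the CPU reports SSE2 support


def dispatch_by_vendor(cpu: Cpu) -> str:
    """Hypothetical dispatcher that enables SSE2 only on Intel CPUs."""
    if cpu.vendor == "GenuineIntel" and cpu.has_sse2:
        return "sse2"
    return "generic"


def dispatch_by_feature(cpu: Cpu) -> str:
    """Hypothetical dispatcher that looks only at the SSE2 feature flag."""
    return "sse2" if cpu.has_sse2 else "generic"


# Three CPUs that all support SSE2 but report different vendor strings.
cpus = [
    Cpu("GenuineIntel", True),
    Cpu("AuthenticAMD", True),
    Cpu("CentaurHauls", True),  # vendor string reported by VIA processors
]
for cpu in cpus:
    print(cpu.vendor, dispatch_by_vendor(cpu), dispatch_by_feature(cpu))
```

With the vendor-based check, the AMD and VIA entries fall back to the generic path even though they report SSE2. The feature-based check treats all three alike.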
My conclusion so far is that the performance of Matlab depends strongly on
the CPU vendor string, but this effect is mainly due to suboptimal settings in
the configuration files, and this problem can be solved easily by modifying
these files. Several of the library files contain Intel CPU dispatchers that
favor Intel processors, but the effect of this is too small to give
statistically significant results in my tests.
So far, I have only made tests on a single-core processor. There may be
larger effects on multi-core processors, but I have not been able to test this
yet. I have made a small test package with the appropriate configuration files
and descriptions for my readers to experiment with. You can download it here.