Which C++ compiler is best?

News and research about CPU microarchitecture and software optimization
agner
Site Admin
Posts: 75
Joined: 2019-12-27, 18:56:25

Which C++ compiler is best?

Post by agner » 2022-08-09, 8:35:04

C++ is the best programming language if you want programs to run fast. My C++ optimization manual explains why.

The number of C++ compilers on the market is decreasing. Many compilers have disappeared from the market or are no longer maintained. Do you remember compiler names like Borland, Embarcadero, CodeGear, PathScale, PGI, Zortech, Symantec, Digital Mars, Watcom, Codeplay, Glockenspiel? It is becoming increasingly expensive to develop C++ compilers because the x86 instruction set now has around two thousand different instructions and the number keeps growing. At the same time, the C++ language keeps developing, with new complicated features being added steadily. Optimization techniques are also becoming more and more advanced, as the different compilers compete to produce the most efficient code.

It looks like the free open source compilers - Gnu and Clang - are going to outcompete the commercial compilers in this race. The popular Microsoft Visual Studio now has an option for using a Clang compiler instead of Microsoft's own compiler. Intel's compiler has switched to an open source base: the latest version is based on Clang and is now offered for free, whereas previous versions were quite expensive.

I have made a comparison of C++ compilers for the x86 platform. The detailed results are listed in my C++ optimization manual. The result is that the Clang compiler is the best in terms of the kinds of optimizations it can do. The Clang compiler can do really amazing things, like optimizing across a vector permutation. The Clang compiler has one serious drawback, though: it tends to unroll loops excessively. This puts unnecessary pressure on critical resources such as the code cache, micro-op cache, and loop buffer in the CPU. Until this problem has been fixed, it may be wise to optimize for size rather than for speed with Clang in order to avoid excessive loop unrolling.
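
For example, the unrolling can be limited globally with the -fno-unroll-loops flag, or per loop with a Clang pragma. A minimal sketch (the loop itself is just an arbitrary illustration, not code from my tests):

Code:

// Sketch: limiting Clang's loop unrolling. Compile to an object file with e.g.
//   clang++ -O2 -fno-unroll-loops -c sum.cpp     (disable unrolling globally)
// or optimize for size instead of speed:
//   clang++ -Os -c sum.cpp
#include <cstddef>

double sum(const double* p, std::size_t n) {
    double s = 0.0;
    // This pragma disables unrolling for the following loop only.
    #pragma clang loop unroll(disable)
    for (std::size_t i = 0; i < n; i++) {
        s += p[i];
    }
    return s;
}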

The Gnu compiler is also excellent. It can do most of the same optimizations as Clang can. It is difficult to predict whether Gnu or Clang will produce the best result in a specific case without testing. In most cases, it makes little difference whether you use Gnu or Clang.

The Microsoft compiler is very popular because of the user-friendly Visual Studio IDE with excellent debugging features. But in terms of optimization, it is not as good as the other compilers. Sometimes the Microsoft compiler uses a lot of instructions to do relatively simple things. The Microsoft compiler is good enough for less demanding applications, but if you want top performance you may use the Clang plugin for Visual Studio. The Microsoft compiler does not support Linux. The other compilers work on all platforms.
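
A quick way to see the difference is to compile the same small function with both compilers and compare the assembly listings. A minimal sketch (the function is just an arbitrary example; it assumes both toolchains are installed):

Code:

// Sketch: comparing generated code from the Microsoft compiler and Clang.
// Typical commands from a Visual Studio command prompt:
//   cl /O2 /FA /c compare.cpp        (Microsoft compiler; /FA writes compare.asm)
//   clang-cl /O2 /FA /c compare.cpp  (Clang's MSVC-compatible driver)
#include <cstdint>

// A small saturating add; simple enough that differences in
// instruction selection are easy to spot in the listings.
std::uint8_t saturating_add(std::uint8_t a, std::uint8_t b) {
    unsigned sum = static_cast<unsigned>(a) + b;
    return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}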

The Intel compiler now comes in two versions: a legacy version named "Classic" and a new version termed "LLVM-based". The legacy version is a continuation of the old Intel compiler. It is better than the Microsoft compiler but not as good as Gnu and Clang in terms of optimization, according to my tests. There is a long-running controversy over the Intel compiler producing code that deliberately degrades performance on competing brands of microprocessors. The long story about this controversy is told in another thread. Both versions of Intel's compiler are now able to produce code that performs well on AMD processors, but there is little reason to use the soon-obsolete legacy compiler. The legacy Intel compiler is currently the only compiler that can make multiple versions of the user's code with automatic dispatching, depending on which instruction set the CPU supports. This dispatch mechanism does not work on microprocessor brands other than Intel.
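
For comparison, the Gnu compiler (and recent versions of Clang) offers a related per-function mechanism: the target_clones attribute compiles the annotated function for several instruction sets and dispatches at load time on the CPU's feature flags rather than its brand. A minimal sketch, assuming an x86 Linux target (the function itself is arbitrary):

Code:

// Sketch: per-function CPU dispatching with the target_clones attribute
// (Gnu, and recent Clang versions; requires ELF/ifunc support, e.g. Linux).
// The compiler emits one clone per listed target plus a resolver that
// selects a clone at load time based on the CPU's feature flags.
// Compile with e.g.:  g++ -O2 -c dispatch.cpp
#include <cstddef>

__attribute__((target_clones("avx2", "sse4.2", "default")))
double dot(const float* x, const float* y, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        s += static_cast<double>(x[i]) * y[i];
    }
    return s;
}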

The LLVM-based Intel compiler is a fork of the Clang compiler. Its behavior is almost identical to Clang's. Intel claims that it optimizes better than the plain Clang compiler (link), but this is not seen in my tests.

Intel provides a lot of highly optimized function libraries for mathematical and other applications. Intel function libraries have historically been part of the controversy over crippled performance on AMD processors. The situation has improved: most of the function libraries now perform well on AMD processors when used with suitable compiler options in the new Intel compiler or with a non-Intel compiler. However, there is still no guarantee from Intel that all their library functions give fair performance on AMD and other processors. The new Intel compiler may be used where the smooth integration with Intel function libraries is an advantage, or if it optimizes better in specific cases; otherwise you may as well use the plain Clang compiler, which delivers at least as good performance.
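
The libraries are called through ordinary C interfaces, so they can be combined with code from any of the compilers. A minimal sketch using the standard CBLAS dot-product routine from MKL (the file name is arbitrary, and the exact link options depend on the MKL version and setup):

Code:

// Sketch: calling an Intel MKL routine from code built with any compiler.
// Typical Linux link line (varies by MKL version and threading layer):
//   clang++ -O2 mkl_dot.cpp -lmkl_rt
#include <mkl.h>        // MKL's CBLAS declarations
#include <cstdio>
#include <vector>

int main() {
    const MKL_INT n = 1000;
    std::vector<double> x(n, 1.0), y(n, 2.0);
    // cblas_ddot computes the dot product of two double-precision vectors.
    double s = cblas_ddot(n, x.data(), 1, y.data(), 1);
    std::printf("s = %g\n", s);   // expected: 2000
    return 0;
}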

nanxiao
Posts: 1
Joined: 2022-09-06, 7:53:09

Re: Which C++ compiler is best?

Post by nanxiao » 2022-09-06, 13:58:35

agner wrote:
C++ is the best programming language if you want programs to run fast.
@agner It seems Rust has become more and more popular recently, and it is said that Rust's performance is quite good. Just curious, what do you think about Rust? Could you compare Rust with C++ if possible? Thanks very much in advance!

agner
Site Admin
Posts: 75
Joined: 2019-12-27, 18:56:25

Re: Which C++ compiler is best?

Post by agner » 2022-09-06, 14:59:13

nanxiao wrote:
it is said that Rust's performance is quite good
The Rust compiler is based on LLVM, the same back end that Clang uses. This means that you can get performance similar to Clang C++ in some cases, but not all. You had better do your own tests.

Rustom
Posts: 1
Joined: 2022-09-19, 11:29:14

Re: Which C++ compiler is best?

Post by Rustom » 2022-09-19, 12:31:12

agner wrote:
2022-08-09, 8:35:04
C++ is the best programming language if you want programs to run fast.
...
The Intel compiler now comes in two versions: a legacy version named "Classic" and a new version termed "LLVM-based".
...
Intel function libraries have historically been part of the controversy over crippled performance on AMD processors. The situation has improved: most of the function libraries now perform well on AMD processors when used with suitable compiler options in the new Intel compiler or with a non-Intel compiler. However, there is still no guarantee from Intel that all their library functions give fair performance on AMD and other processors. The new Intel compiler may be used where the smooth integration with Intel function libraries is an advantage, or if it optimizes better in specific cases; otherwise you may as well use the plain Clang compiler, which delivers at least as good performance.
The key point to note is "when used with suitable compiler options". With AMD CPUs, do not use the /Qxhost option with the "classic" Intel Fortran and C compilers if you care about performance.

The Intel C and Fortran compilers (version 2021.6.0) provide a /Qxhost option, and one may expect that with this option the compiler will select an appropriate instruction set for the host processor. However, I stumbled upon some unusual effects of using this option on an AMD processor. The compilers have a -# option whose purpose is to show the detailed list of flags passed by the compiler drivers to the compilers proper, and using this option showed me that the correct optimization flag is not passed when /Qxhost is used with the "classic" compilers.

Although the CPUINFO utility correctly recognizes the processor, the classic compiler drivers (ifort and icl) do not pass the flag -mGLOB_advanced_optim=TRUE when the compilation command includes /Qxhost, even if optimizations have been requested with the /O2 or /fast option. Not only does this result in slower code, but it also causes the generated 64-bit object code to use only x87 instructions for all floating-point calculations!

When I used /Qxavx2 instead of /Qxhost, I found that the generated program contained AVX2 instructions and ran fine. One would expect that on a Zen 2 AMD CPU, whose instruction set includes AVX2, specifying /Qxhost would cause generation of AVX2 (or better) instructions rather than the decades-old "legacy" x87 FPU instructions.

When the new LLVM-based Intel compilers (ifx and icx) are used, this misbehavior does not occur.

Here are the test programs. Compile to an OBJ file and run dumpbin /disasm on them to see the instructions used.

The Fortran program:

Code:

program tst
   implicit none
   integer, parameter :: NN = 1000
   integer i,n
   real, dimension(NN) :: vx,vy
   double precision s

   n = NN
   do i = 1, n
      vx(i) = i*0.003
      vy(i) = 30.0-vx(i)
   end do
   s = 0
   do i=1,n
      s = s+vx(i)*vy(i)
   end do
   print '(ES20.10)',s
end program
The C program:

Code:

#include <stdio.h>

int main(){

#define NN 100
float vx[NN],vy[NN];
double s;
int i, n;

n = NN; s = 0.0;
for(i=0; i<n; i++){
   vx[i] = (i+1)*0.003;
   vy[i] = 30.0 - vx[i];
   }
for(i=0; i<n; i++)
   s += vx[i]*vy[i];
printf("s = %10.4e\n",s);
}
