Agner`s CPU blog


 
thread Intel's "cripple AMD" function - Agner Fog - 2009-12-30
reply Intel's - Felid - 2010-01-01
replythread Intel's "cripple AMD" function - inhahe - 2010-01-03
last replythread Intel's - Agner Fog - 2010-01-04
replythread Intel's compiler is the best? - Weber - 2010-01-04
last reply Intel's compiler is the best? - Agner Fog - 2010-01-09
reply Intel article - Agner Fog - 2010-01-22
last replythread Intel's - Deng - 2016-12-11
last replythread Intel's "cripple AMD" function - Biplab Raut - 2019-12-20
last reply Intel's cripple AMD function - Agner - 2019-12-29
replythread Web Parallels - Jeff Craig - 2010-01-04
last replythread More Parallels - Agner Fog - 2010-01-23
reply Early Examples - Yuhong Bao - 2010-02-01
last reply More Parallels - Yuhong Bao - 2010-02-20
replythread New CPUID manipulation program - Agner Fog - 2010-01-22
replythread CPUID manipulation through virtualization - Andrew Lofthouse - 2010-08-16
reply CPUID manipulation through virtualization - Agner Fog - 2010-08-16
replythread CPUID manipulation program for AMD - Agner - 2010-10-01
last replythread CPUID manipulation program for AMD - Ralf - 2012-01-30
last reply CPUID manipulation program for AMD - Agner - 2012-01-31
last reply CPUID manipulation through virtualization - akshay - 2015-07-08
last replythread New CPUID manipulation program - AVK - 2011-02-09
last reply New CPUID manipulation program - Agner - 2011-02-09
reply AMD Blog on compilers/benchmarch - margaret lewis - 2010-02-01
replythread New version is still crippling Intel's competitors - Agner Fog - 2010-06-29
last reply New version is still crippling Intel's competitors - granyte - 2014-09-16
reply Out of court settlement with FTC - Agner Fog - 2010-08-05
reply AMD library contains Intel's cripple-AMD function! - Agner Fog - 2010-08-11
replythread Common math programs are affected - Agner Fog - 2010-08-20
last reply Preliminary test results for Matlab - Agner Fog - 2010-09-16
replythread Overview of CPU dispatching in Intel software - Agner Fog - 2010-08-23
last reply Overview of CPU dispatching in Intel software - Mingye Wang - 2020-08-31
replythread New Intel compiler version - still the same! - Agner Fog - 2010-09-22
reply GCC now has support for function dispatch - Jean-Luc - 2010-09-27
replythread Intel compiler question - James Russell - 2010-10-11
last reply Intel compiler question - Agner - 2010-10-12
reply New Intel compiler version - still the same! - Don Kretsch - 2010-11-29
last replythread New Intel compiler version - still the same! - Daniel - 2011-12-23
last replythread New Intel compiler version - still the same! - Agner - 2011-12-25
last replythread New Intel compiler version - still the same! - Stanley Theamer - 2012-02-12
last reply New Intel compiler version - still the same! - Stretcho - 2012-03-14
replythread Still no library that is optimal on all processors - Agner - 2012-04-18
replythread Still no library that is optimal on all processors - Guest - 2012-05-17
last replythread Still no library that is optimal on all processors - Agner - 2012-05-17
last replythread Still no library that is optimal on all processors - David - 2012-05-19
last replythread Still no library that is optimal on all processors - Agner - 2012-05-20
last reply Still no library that is optimal on all processors - Bubba_Hotepp - 2012-06-16
last replythread Still no library that is optimal on all processors - Marat Dukhan - 2013-05-20
last replythread Still no library that is optimal on all processors - Agner - 2013-05-21
last replythread This is still going on, wow just wow - Vuurdraak - 2016-11-10
last replythread This is still going on, wow just wow - Agner - 2016-11-10
last replythread This is still going on, wow just wow - Vuurdraak - 2016-11-11
last replythread This is still going on, wow just wow - Denis - 2017-01-02
last replythread This is still going on, wow just wow - Agner - 2017-01-02
replythread RYZEN thoughts? - Noob programmer - 2017-03-10
last replythread RYZEN thoughts? - Chromatix - 2017-03-16
last replythread RYZEN thoughts? - Peter - 2017-04-11
replythread RYZEN thoughts? - Agner - 2017-04-12
last replythread RYZEN thoughts? - Ballsystemlord - 2019-02-12
last reply RYZEN thoughts? - Agner - 2019-02-13
last reply RYZEN thoughts? - itsmydamnation - 2017-04-21
last reply This is still going on, wow just wow - Naoki Shibata - 2017-07-19
replythread A long history of legal antitrust battles - Agner - 2017-07-27
last replythread A long history of legal antitrust battles - Jorcy Neto - 2017-07-27
last replythread A long history of legal antitrust battles - Royi - 2018-02-19
last reply A long history of legal antitrust battles - Agner - 2018-05-15
reply Intel's "cripple AMD" function - PCPMD - 2019-02-27
replythread Patches and workarounds - Neville C - 2019-11-21
last reply Patches and workarounds - Mingye Wang - 2020-09-01
replythread Intel's "cripple AMD" function - Walker - 2020-06-29
last replythread Intel's "cripple AMD" function - Forsen - 2020-09-16
reply Intel's - Agner - 2020-09-16
last replythread Intel's "cripple AMD" function - ETERNALBLUEbullrun - 2021-11-28
last reply Intel's - Agner - 2021-11-28
last replythread New Intel compiler. Latest update - Agner - 2022-08-08
last replythread MKL performance on AMD with the new compiler - Gil Moses - 2022-08-22
last reply MKL performance on AMD with the new compiler - Agner - 2022-08-22
 
Intel's "cripple AMD" function
Author: Agner Fog Date: 2009-12-30 10:22

Will Intel be forced to remove the "cripple AMD" function from their compiler?

Many software programmers consider Intel's compiler the best optimizing compiler on the market, and it is often the preferred compiler for the most critical applications. Likewise, Intel is supplying a lot of highly optimized function libraries for many different technical and scientific applications. In many cases, there are no good alternatives to Intel's function libraries.

Unfortunately, software compiled with the Intel compiler or the Intel function libraries has inferior performance on AMD and VIA processors. The reason is that the compiler or library can make multiple versions of a piece of code, each optimized for a certain processor and instruction set, for example SSE2, SSE3, etc. The system includes a function that detects which type of CPU it is running on and chooses the optimal code path for that CPU. This is called a CPU dispatcher. However, the Intel CPU dispatcher does not only check which instruction set is supported by the CPU, it also checks the vendor ID string. If the vendor string says "GenuineIntel" then it uses the optimal code path. If the CPU is not from Intel then, in most cases, it will run the slowest possible version of the code, even if the CPU is fully compatible with a better version.
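The difference between the two dispatching policies can be modeled in a few lines. This is only a sketch under stated assumptions, written in Python for readability: a real dispatcher is compiled code that reads these values with the CPUID instruction. The names `CpuInfo`, `fair_dispatch` and `biased_dispatch` are hypothetical and are not taken from Intel's software, and collapsing the fallback to a single "generic" path simplifies the "in most cases" behavior described above.

```python
from dataclasses import dataclass

# Feature bits reported by CPUID leaf 1 (EDX: SSE and SSE2, ECX: SSE3).
SSE_BIT_EDX = 1 << 25
SSE2_BIT_EDX = 1 << 26
SSE3_BIT_ECX = 1 << 0


@dataclass(frozen=True)
class CpuInfo:
    """Hypothetical snapshot of what CPUID would report for one processor."""
    vendor: str  # 12-character vendor ID string from CPUID leaf 0
    edx: int     # feature flags from CPUID leaf 1, register EDX
    ecx: int     # feature flags from CPUID leaf 1, register ECX


def best_supported_level(cpu):
    """Return the highest level in the chain generic < SSE < SSE2 < SSE3."""
    level = "generic"
    if cpu.edx & SSE_BIT_EDX:
        level = "SSE"
        if cpu.edx & SSE2_BIT_EDX:
            level = "SSE2"
            if cpu.ecx & SSE3_BIT_ECX:
                level = "SSE3"
    return level


def fair_dispatch(cpu):
    """Choose the code path from the supported instruction sets only."""
    return best_supported_level(cpu)


def biased_dispatch(cpu):
    """Model of a dispatcher that also checks the vendor ID string."""
    if cpu.vendor != "GenuineIntel":
        return "generic"  # slowest path, regardless of what the CPU supports
    return best_supported_level(cpu)


# Two made-up processors with identical feature flags but different vendors.
intel = CpuInfo("GenuineIntel", SSE_BIT_EDX | SSE2_BIT_EDX, SSE3_BIT_ECX)
amd = CpuInfo("AuthenticAMD", SSE_BIT_EDX | SSE2_BIT_EDX, SSE3_BIT_ECX)

for cpu in (intel, amd):
    print(cpu.vendor, fair_dispatch(cpu), biased_dispatch(cpu))
```

In this model the two processors differ only in the vendor string, yet the biased dispatcher sends the second one down the generic path.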

I have complained about this behavior for years, and so have many others, but Intel have refused to change their CPU dispatcher. If Intel had advertised their compiler as compatible with Intel processors only, then there would probably be no complaints. The problem is that they are trying to hide what they are doing. Many software developers think that the compiler is compatible with AMD processors, and in fact it is, but unbeknownst to the programmer it puts in a biased CPU dispatcher that chooses an inferior code path whenever it is running on a non-Intel processor. If programmers knew this fact they would probably use another compiler. Who wants to sell a piece of software that doesn't work well on AMD processors?

Because of their size, Intel can afford to put more money into their compiler than other CPU vendors can. The Intel compiler is relatively cheap, it has superior performance, and the support is excellent. Selling such a compiler is certainly not a profitable business in itself, but it is obviously intended as a way of supporting Intel's microprocessors. There would be no point in adding new advanced instructions to the microprocessors if there were no tools to use these instructions. AMD is also making a compiler, but the current version supports only Linux, not Windows.

Various people have raised suspicion that the biased CPU dispatching has made its way into common benchmark programs (link link). This is a serious issue indeed. We know that many customers base their buying decision on published benchmark results, and a biased benchmark means an unfair market advantage worth billions of dollars.

  

The legal battle

AMD have been suing Intel for unfair competition since at least 2005, and the case was settled in November 2009. This settlement deals with many issues of unfair competition, apparently including the Intel compiler. The settlement says:

2.3 TECHNICAL PRACTICES

Intel shall not include any Artificial Performance Impairment in any Intel product or require any Third Party to include an Artificial Performance Impairment in the Third Party's product. As used in this Section 2.3, "Artificial Performance Impairment" means an affirmative engineering or design action by Intel (but not a failure to act) that (i) degrades the performance or operation of a Specified AMD product, (ii) is not a consequence of an Intel Product Benefit and (iii) is made intentionally to degrade the performance or operation of a Specified AMD Product. For purposes of this Section 2.3, "Product Benefit" shall mean any benefit, advantage, or improvement in terms of performance, operation, price, cost, manufacturability, reliability, compatibility, or ability to operate or enhance the operation of another product.

In no circumstances shall this Section 2.3 impose or be construed to impose any obligation on Intel to (i) take any act that would provide a Product Benefit to any AMD or other non-Intel product, either when such AMD or non-Intel product is used alone or in combination with any other product, (ii) optimize any products for Specified AMD Products, or (iii) provide any technical information, documents, or know how to AMD.

This looks like a victory for AMD. If we read "any Intel product" as Intel's compilers and function libraries, "any Third Party" as programmers using these compilers and libraries, and "Artificial Performance Impairment" as the CPU dispatcher checking the vendor ID string, then the settlement puts an obligation on Intel to change their CPU dispatcher. I will certainly check the next version of Intel's compiler and libraries to see whether they have done so or whether they have found a loophole in the settlement.

Interestingly, this is not the end of the story. Only about one month after the AMD/Intel settlement, the US Federal Trade Commission (FTC) filed an antitrust complaint against Intel. The accusations in the FTC complaint are unusually strong:

Intel sought to undercut the performance advantage of non-Intel x86 CPUs relative to Intel x86 CPUs when it redesigned and distributed software products, such as compilers and libraries.
[...]
To the public, OEMs, ISVs, and benchmarking organizations, the slower performance of non-Intel CPUs on Intel-compiled software applications appeared to be caused by the non-Intel CPUs rather than the Intel software. Intel failed to disclose the effects of the changes it made to its software in or about 2003 and later to its customers or the public. Intel also disseminated false or misleading documentation about its compiler and libraries. Intel represented to ISVs, OEMs, benchmarking organizations, and the public that programs inherently performed better on Intel CPUs than on competing CPUs. In truth and in fact, many differences were due largely or entirely to the Intel software. Intel's misleading or false statements and omissions about the performance of its software were material to ISVs, OEMs, benchmarking organizations, and the public in their purchase or use of CPUs. Therefore, Intel's representations that programs inherently performed better on Intel CPUs than on competing CPUs were, and are, false or misleading. Intel's failure to disclose that the differences were due largely to the Intel software, in light of the representations made, was, and is, a deceptive practice. Moreover, those misrepresentations and omissions were likely to harm the reputation of other x86 CPUs companies, and harmed competition.
[...]
Some ISVs requested information from Intel concerning the apparent variation in performance of identical software run on Intel and non-Intel CPUs. In response to such requests, on numerous occasions, Intel misrepresented, expressly or by implication, the source of the problem and whether it could be solved.
[...]
Intel's software design changes slowed the performance of non-Intel x86 CPUs and had no sufficiently justifiable technological benefit. Intel's deceptive conduct deprived consumers of an informed choice between Intel chips and rival chips, and between Intel software and rival software, and raised rivals' costs of competing in the relevant CPU markets. The loss of performance caused by the Intel compiler and libraries also directly harmed consumers that used non-Intel x86 CPUs.

The remedy that the FTC asks for is also quite far-reaching:

Requiring that, with respect to those Intel customers that purchased from Intel a software compiler that had or has the design or effect of impairing the actual or apparent performance of microprocessors not manufactured by Intel ("Defective Compiler"), as described in the Complaint:

  1. Intel provide them, at no additional charge, a substitute compiler that is not a Defective Compiler;
  2. Intel compensate them for the cost of recompiling the software they had compiled on the Defective Compiler and of substituting, and distributing to their own customers, the recompiled software for software compiled on a Defective Compiler; and
  3. Intel give public notice and warning, in a manner likely to be communicated to persons that have purchased software compiled on Defective Compilers purchased from Intel, of the possible need to replace that software.

Maybe the FTC has decided that the AMD/Intel settlement was not a fair and sufficient remedy against Intel's monopoly behavior? The settlement compensates AMD, but not VIA and other microprocessor vendors, and not the customers who have been harmed by insufficient competition and by the "defective" software produced with the Intel compiler.

  

My own findings

When I started testing Intel's compiler several years ago, I soon found out that it had a biased CPU dispatcher. Back in January 2007 I complained to Intel about the unfair CPU dispatcher. I had a long correspondence with Intel engineers about the issue, where they kept denying the problem and I kept providing more evidence. They said that:

The CPU dispatch, coupled with optimizations, is designed to optimize performance across Intel and AMD processors to give the best results. This is clearly our goal and with one exception we believe we are there now. The one exception is that our 9.x compilers do not support SSE3 on AMD processors because of the timing of the release of AMD processors vs. our compiler (our compiler was developed before AMD supported SSE3). The future 10.x compilers, which enter beta this quarter and release around the middle of the year, will address this now that we've had time to tune and adjust to the new AMD processors.

Sounds nice, but the truth is that the CPU dispatcher didn't support SSE or SSE2 or any higher SSE in AMD processors and still doesn't today (Intel compiler version 11.1.054). I have later found out that others have made similar complaints to Intel and got similarly useless answers (link link).

The Intel CPU dispatcher does not only check the vendor ID string and the instruction sets supported. It also checks for specific processor models. In fact, it will fail to recognize future Intel processors with a family number different from 6. When I mentioned this to the Intel engineers they replied:

You mentioned we will not support future Intel processors with non-'6' family designations without a compiler update. Yes, that is correct and intentional. Our compiler produces code which we have high confidence will continue to run in the future. This has the effect of not assuming anything about future Intel or AMD or other processors. You have noted we could be more aggressive. We believe that would not be wise for our customers, who want a level of security that their code (built with our compiler) will continue to run far into the future. Your suggested methods, while they may sound reasonable, are not conservative enough for our highly optimizing compiler. Our experience steers us to issue code conservatively, and update the compiler when we have had a chance to verify functionality with new Intel and new AMD processors. That means there is a lag sometime in our production release support for new processors.

In other words, they claim that they are optimizing for specific processor models rather than for specific instruction sets. If true, this gives Intel an argument for not supporting AMD processors properly. But it also means that all software developers who use an Intel compiler have to recompile their code and distribute new versions to their customers every time a new Intel processor appears on the market. Now, this was three years ago. What happens if I try to run a program compiled with an old version of Intel's compiler on the newest Intel processors? You guessed it: It still runs the optimal code path. But the reason is more difficult to guess: Intel have manipulated the CPUID family numbers on new processors in such a way that they appear as known models to older Intel software. I have described the technical details elsewhere.
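For reference, the family and model numbers discussed here are packed into the EAX register returned by CPUID leaf 1. The sketch below decodes them using the rule documented for Intel processors; the function name `decode_family_model` is hypothetical and the sample EAX values are illustrative signatures. It is not a reconstruction of Intel's dispatcher, and whether this encoding is exactly the mechanism referred to above is an assumption.

```python
def decode_family_model(eax):
    """Decode (display family, display model) from CPUID leaf 1 EAX.

    Follows the rule documented for Intel processors: the extended family
    is added only when the base family is 15, and the extended model is
    used only when the base family is 6 or 15.
    """
    base_family = (eax >> 8) & 0xF
    base_model = (eax >> 4) & 0xF
    ext_family = (eax >> 20) & 0xFF
    ext_model = (eax >> 16) & 0xF

    family = base_family + ext_family if base_family == 0xF else base_family
    model = (ext_model << 4) | base_model if base_family in (0x6, 0xF) else base_model
    return family, model


# Illustrative signatures (assumed sample values, not measured here).
samples = [0x000006F6, 0x000106A5, 0x00000F29, 0x00100F42]
for eax in samples:
    family, model = decode_family_model(eax)
    print(f"EAX={eax:#010x} family={family} model={model}")
```

A signature such as 0x000106A5 still decodes to family 6 because only the extended model field differs from older parts, so software that tests for family 6 would continue to accept it.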

Perhaps the initial design of Intel's CPU dispatcher was indeed intended to optimize for known processor models only, without regard for future models. If any of my students had made such a solution that was not future-oriented, I would consider it a serious flaw. Perhaps the Intel engineers discovered the missing support for future processors too late so that they had to design the next generation of their processors in such a way that they appeared as known models to existing Intel software.

After Intel had flatly refused to change their CPU dispatcher, I decided that the most efficient way to make them change their minds was to create publicity about the problem. I contacted several IT magazines, but nobody wanted to write about it. Sad, but not very surprising, considering that they all depend on advertising money from Intel. The only publicity was my own optimization manual where I have described the problem in detail and given instructions on how to replace the unfair CPU dispatcher. I wonder why AMD didn't create public awareness about the problem. Were they obliged to keep quiet about an ongoing lawsuit? And what about VIA/Centaur?

  

Workarounds

At present, we don't know if or when Intel will make a new compiler and new software libraries that do not check the vendor ID string. In the meantime, here is what we can do about the problem.

  • Use another compiler. In my tests, the Gnu compiler for Linux has an optimizing performance similar to the Intel compiler, but the Gnu function library (glibc) is inferior. All other compilers gave lower performance in my tests. There is no other Windows compiler with a similar performance, not even the Gnu compiler for Windows.
      
  • Use the Intel software and patch the CPU dispatcher. In my C++ manual, I have provided the code for alternative CPU dispatchers for Intel's compiler and function libraries and descriptions on how to patch them into your software. This, of course, relies on undocumented details of the Intel software. This dispatcher-patch can improve performance on non-Intel processors considerably in many cases.
      
  • Never trust any benchmark unless it is open source and compiled with a neutral compiler, such as Gnu or Microsoft.
       
  • It is possible to change the CPUID of AMD processors by using the AMD virtualization instructions. I hope that somebody will volunteer to make a program for this purpose. This will make it easy for anybody to check if their benchmark is fair and to improve the performance of software compiled with the Intel compiler on AMD processors.

  

Links

My Discussion in Aceshardware forum 2007.

Discussion in AMD Developer Forums 2008.

My Discussion in AMDzone 2009.

Discussion in comp.arch 2004.

Complaint to Intel 2004, discussion in slashdot.org.

Mark Mackey, complaint to Intel 2005.

PCMark 2005 benchmark proven unfair. Arstechnica.

Testimony by John Oram regarding BAPCo benchmark organization.

Comment on AMD Developer Central 2005.

AMD files lawsuit 2005.

AMD antitrust complaints 2005.

Settlement agreement between AMD and Intel, 2009.

FTC complaint 2009.

Technical details in my C++ optimization manual.

[Added later:]

Discussion on XtremeSystems Forum.

Discussion on OSnews.

   
Intel's
Author: Felid Date: 2010-01-01 12:50
About
"It is possible to change the CPUID of AMD processors by using the AMD virtualization instructions. I hope that somebody will volunteer to make a program for this purpose."
How exactly can we do that? As of now, the simplest option is to change the vendor strings embedded in executables, which are compared with the CPU's vendor string. We at iXBT.com are doing this right now and will publish test results soon (Intel & AMD CPUs, Intel <-> AMD & AMD <-> Intel string change, 4-6 runs per app, 50-100 apps). If it is OK to use your findings (with all the copyrights), we will :)

Felid
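
As background for the string-replacement approach described above: CPUID leaf 0 returns the vendor ID as three 32-bit registers, in the order EBX, EDX, ECX. The sketch below assembles the string from those registers. The helper name `vendor_string` is hypothetical. One consequence, stated here as an assumption about how a given executable is built, is that the comparison may be coded against the three 4-byte pieces rather than against one contiguous 12-byte string.

```python
import struct


def vendor_string(ebx, edx, ecx):
    """Assemble the 12-byte vendor ID from CPUID leaf 0 (order EBX, EDX, ECX)."""
    # Each register holds four ASCII characters in little-endian byte order.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))  # GenuineIntel
print(vendor_string(0x68747541, 0x69746E65, 0x444D4163))  # AuthenticAMD
```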

   
Intel's "cripple AMD" function
Author: inhahe Date: 2010-01-03 18:37
I think it's arguable whether or not Intel crippled AMD via an "affirmative engineering or design action" as opposed to a "failure to act" (as distinguished in the settlement). They optimize code paths based on Vendor ID, which obviously has to do with more than supported instruction sets, or they wouldn't have to do it by processor family, etc. I doubt Intel can be required to optimize specifically for a CPU that's not theirs (also implying they wouldn't necessarily know its internal workings in order to do so).

Yes, it may be true that the same code paths they used for Intel CPUs would have optimized well for AMD CPUs, but Intel probably isn't required to know that. Also, their code path optimizations are apparently targeted for specific vendors (namely, theirs, I guess) and models, not generic strategies, so it would have been a hack for them to apply an Intel-derived code path to AMD processors, because, while being better than no optimizer, the code might also include superfluous and deleterious algorithms as applied to AMD CPUs. Yes, it's pragmatic to do so and would have been better for consumers, but can we ethically require them to engineer a dirty hack into their programming structure? Another reason it's a gray area is that it's possible that the code path optimizations they took were obvious and would likely apply to any modern x86 CPU (though the fact that AMD and Intel are the only two players in the game sort of makes it beg the question), and if only some of them were obvious those optimizations may or may not be modularized already for easy extraction and universal use.

From a business standpoint, this was obviously a strategic decision on Intel's part to undermine AMD, but if it's an ideological grey area whether they undermined AMD or simply took advantage of the fact that they happen to be the producers of one of the CPUs they compile for in order to improve their compiler for their *own* product, then they can't necessarily be faulted, because the law requires evidence and analysis. (Taking in-house advantage isn't necessarily anti-competitive in itself; for example, Microsoft makes plenty of applications that run on their own OS but not on Linux or Mac OS.)

It is true that the settlement they issued flat-out admits wrongdoing; i.e., it implies that they did, in fact, effect an "affirmative engineering or design action" meant to undermine AMD. As far as I can see there are three possible reasons for this.
1. We don't know all the details regarding how their code paths engine was implemented and other contingent issues. Being that it's a gray area, depending on these details we don't know, they could have been clearly in the wrong and thus admitted it.
2. It's a settlement; settlements are always political. Therefore the fact that they made a statement implying wrongdoing doesn't necessarily mean a whole lot. In fact, the excerpt shown doesn't really say *anything* except that Intel will not in the future do things that are already clearly classified as anti-competitive. In other words, it's a formal statement that they won't break the law... and not being able to break the law is already implied.
3. As per Reason 2, given the small excerpt of the settlement shown, it seems possible to me that what they *actually* did is make something up that will sate AMD's lawyers while at the same time leaving the door open for them to either continue the same practice, or cease the practice (if it's too obviously anti-competitive or if they explicitly said they'd cease it elsewhere) but instate similar and/or related practices in the future, on account of the fact that those practices can easily be classified as "failures to act."

Of course it complicates the matter that there is actually a complete instruction set (SSE) that they don't support for AMD. They have an excuse for that. Whether or not it's valid, who knows. Since they do support that instruction set, it would seem a deliberate "affirmative action" to disable it on AMD, since AMD supports it in exactly the same way that Intel does on a purely functional level (independently of speed), and SSE is generally more efficient when applicable. However, if the decisions for how and when to use SSE instructions are intricately tied in with the rest of their code path algorithm (and possibly rely on internal structure of the CPU design), then the caveats I brought up earlier still apply.

In any case, whether not supporting optimizations on AMD's CPUs was an affirmative design decision to undermine AMD machines or merely a failure to act (to benefit AMD machines), either way, it's clearly wrong for them to publish benchmarks to OEMs, etc. comparing AMD CPUs to Intel CPUs using their own compiler that specifically optimizes for Intel CPUs (based on Vendor ID no less!, but either way) and not for AMD CPUs. It's misleading, and according to the FTC, even when specifically confronted with the issue they would habitually either mislead or directly lie about the cause for the speed difference and whether it could be solved. So *that's* the part that's really devious, and I can see why the FTC sued them. I *hate* companies like that. Incidentally, though, all companies are companies like that.

   
Intel's
Author: Agner Fog Date: 2010-01-04 03:28
inhahe wrote:
I think it's arguable whether or not Intel crippled AMD via an "affirmative engineering or design action" as opposed to a "failure to act" (as distinguished in the settlement).
Checking for vendor ID is an affirmative action. The grey area is whether they are optimizing for specific CPU models or for specific instruction sets. There is only one case where they distinguish between different CPU models that have the same instruction set, namely Pentium 4 versus Pentium M. In most cases, however, they use the same code path for both, or the two paths are identical or almost identical. The distinction may be unimportant from a technical point of view, but it may give Intel a legal excuse for claiming that they are optimizing for specific CPU models.
I doubt Intel can be required to optimize specifically for a CPU that's not theirs
The settlement doesn't require that.
Another reason it's a gray area is that it's possible that the code path optimizations they took were obvious and would likely apply to any modern x86 CPU (though the fact that AMD and Intel are the only two players in the game sort of makes it beg the question),
Most optimizations are indeed obvious applications of the available instruction set. If you have SSE2 you can do four additions in one instruction. That's an obvious thing to do regardless of CPU model. Don't forget there is a third player, VIA. Their chips are fast enough for being relevant here.
given the small excerpt of the settlement shown, it seems possible to me that what they *actually* did is make something up that will sate AMD's lawyers while at the same time leaving the door open for them to either continue the same practice, or cease the practice (if it's too obviously anti-competitive or if they explicitly said they'd cease it elsewhere) but instate similar and/or related practices in the future, on account of the fact that those practices can easily be classified as "failures to act." [...] However, if the decisions for how and when to use SSE instructions are intricately tied in with the rest of their code path algorithm (and possibly rely on internal structure of the CPU design), then the caveats I brought up earlier still apply.
Yes, they will probably be able to claim that. From a purely technical perspective, I think it's a bad idea to make different code paths for two processors that support the same instruction set based on whether a particular instruction runs a little faster on one than on the other. If you consider the time it takes to develop a complete program plus the time it takes to market it, then it is likely that the processors you optimized for will be obsolete for your most demanding customers before the time your software peaks on the market. My advice would certainly be to optimize for the newest processor, but make sure you maintain compatibility with older processors.

But of course Intel compiler engineers are not obliged to listen to my advice if doing otherwise enables them to harm their competitors.

In any case, whether not supporting optimizations on AMD's CPUs was an affirmative design decision to undermine AMD machines or merely a failure to act (to benefit AMD machines), either way, it's clearly wrong for them to publish benchmarks to OEMs, etc. comparing AMD CPUs to Intel CPUs using their own compiler that specifically optimizes for Intel CPUs (based on Vendor ID no less!, but either way) and not for AMD CPUs. It's misleading, and according to the FTC, even when specifically confronted with the issue they would habitually either mislead or directly lie about the cause for the speed difference and whether it could be solved. So *that's* the part that's really devious, and I can see why the FTC sued them. I *hate* companies like that. Incidentally, though, all companies are companies like that.
Fortunately, not all companies are like that. I am sure this case has harmed Intel's reputation. They can be damn sure that their next compiler version will be thoroughly scrutinized. Hopefully, they will take their reputation into account when they design the next compiler version and function libraries.
   
Intel's compiler is the best?
Author: Weber Date: 2010-01-04 16:46
You wrote: "There is no other Windows compiler with a similar performance, not even the Gnu compiler for Windows."

Which compilers did you test?

Did you try e.g. the Portland Group (PGI) compiler? That means, does Intel's compiler (icc) produce faster executables than e.g. the PGI compiler for Intel CPU's? Does icc (with a patched CPU dispatcher) even produce faster executables than e.g. PGI for AMD CPU's?

Not that I'd try to challenge your claims (I haven't benchmarked any of the compilers), I'm just interested. Thanks! :-)

   
Intel's compiler is the best?
Author: Agner Fog Date: 2010-01-09 04:23
Weber wrote:
Which compilers did you test?
You can see my comparison of compilers in my C++ manual. The PathScale and PGI compilers are also fairly good, but not the best.

[Update 2019:] The newest versions of Gnu and Clang C++ compilers are now optimizing better than the Intel compiler in my tests.

   
Intel article
Author: Agner Fog Date: 2010-01-22 04:04
Intel have just published an article on how the CPU dispatching works in the Intel Performance Primitives (IPP) function library, see software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-understanding-cpu-optimized-code-used-in-intel-ipp/

The article indicates a fair handling of non-Intel processors in the IPP library. This is in accordance with my test results.

What the article doesn't mention is the unfair CPU dispatching in several other Intel function libraries :-)

   
Intel's
Author: Deng Date: 2016-12-11 20:43
Recently I tested the inteldispatchpatch.zip (2014-07-30) in asmlib on the MKL library in Intel Parallel Studio 2015 Update 3.
It gave me an error like "Intel MKL ERROR: CPU 1 is not supported." on AMD servers (from mkl_blas_xdgemv()). If I remove 'dispatchpatch64.o' during linking, MKL works on AMD servers.

It seems to me that the dispatch patch does not work on newer Intel MKL. Do you know another way to patch it correctly?
   
Intel's "cripple AMD" function
Author: Biplab Raut Date: 2019-12-20 03:51
Hi Agner,
I have read your blog and tutorial (Optimizing software in C++) with special focus on section 13.7 (page 139 - 141).
I am trying to run MKL2019 on AMD cpu on Ubuntu 18.04 by changing the CPU dispatcher as per your code snippet given in intel_mkl_feature_patch.c (from asmlib.zip).
I added your CPU dispatcher snippet into the application and tried to compile by additionally linking libmkl_core.so.
But I got the following errors:-
undefined reference to __intel_mkl_feature_indicator
undefined reference to __intel_mkl_feature_indicator_x
undefined reference to __intel_mkl_features_init_x
undefined reference to __intel_mkl_feature_indicator_x
undefined reference to __intel_mkl_feature_indicator

When I directly use dispatchpatch64.o from asmlib.zip, I get an undefined reference to intel_mkl_patch.

Can you please let me know what other system dependencies are required to use your patch? Or has the new MKL's CPU dispatcher changed so that the above functions no longer exist?

Thanks,
Biplab Raut

   
Intel's cripple AMD function
Author: Agner Date: 2019-12-29 10:32
You need to link the patch before any Intel function library.


I just found another function that needs patching. The function named
int mkl_serv_intel_cpu()
returns 1 when running on an Intel CPU and 0 when running on another brand of CPU. I have provided a patch at
www.agner.org/optimize/intel_dispatch_patch.zip
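
A minimal sketch of what an override of this function can look like. It assumes that the dispatcher only looks at the return value of mkl_serv_intel_cpu and that the replacement is linked before the Intel libraries, so that the linker resolves the symbol to this definition. It is an illustration only, not necessarily the contents of the patch in the zip file, and it has not been checked against any particular MKL version.

```c
/* Hypothetical override: always tell the MKL dispatcher "this is an Intel CPU".
   Must appear before the Intel libraries on the link line so that this
   definition is the one the linker picks. */
int mkl_serv_intel_cpu(void)
{
    return 1;   /* 1 = Intel CPU, 0 = other brand, as described above */
}
```

If the linker reports a duplicate symbol, the library's own definition has been pulled in as well, and the link order needs to be checked.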

   
Web Parallels
Author: Jeff Craig Date: 2010-01-04 10:12
Really fascinating article. I've generally deferred to open source compilers, like GCC, but that's largely been because of preferring open source to proprietary software when possible. I'd not realized that Intel was doing this in their compilers, which is a major concern for me, since I haven't purchased an Intel desktop processor for home use in well over a decade.

What I found most interesting, is how this seems to parallel a fight we've been having in the web development community a lot over the last five years or so, on the benefits of feature detection versus browser detection. I'll defer to Nicholas Zakas' explanation here: www.nczonline.net/blog/2009/12/29/feature-detection-is-not-browser-detection/

Now, on the web it's a bit trickier, web browsers' behaviour tends to be considerably more variable than x86 behaviour across different vendors, but ultimately, the desire is the same. Check what is supported, and select code-paths based on what's supported, not based on whether you're on IE or Safari or Firefox, etc. And the bad design is generally made by the developer, not some third-party vendor (though not always, as Nicholas talks about in his post).

I'm going to be publicizing this issue, to the best of my ability, since I think it's important that people understand the implications of choosing to use Intel's compiler, which is otherwise quite high quality. And, while I didn't understand why the antitrust suit was filed by the US government when it was announced, I'm certainly far more interested in its outcome than I was before.

   
More Parallels
Author: Agner Fog Date: 2010-01-23 02:28
I have always thought that it is bad programming practice to make software that relies on specific CPU brands and models. Any program that makes assumptions based on the CPU model number is likely to be obsolete as soon as a new processor appears on the market.

Unfortunately, this practice is more widespread than I thought. Here are some examples:

  1. According to CNET news, Skype has made a deal with Intel to limit the functionality on non-Intel computers, alleging that contemporary AMD processors were not powerful enough. See CNET News and Slashdot News. This limitation has later been removed.
     
  2. 64-bit Windows will only run on processors with known vendor names. This is a serious barrier for new companies to enter the x86 processor market and it is an obstacle to emulation. Same problem in FreeBSD. In fact, VIA had to make a feature to change the vendor string in their processors in order to run 64-bit Windows. See analysis by Geoff Chappell. Incidentally, this feature made it possible for me to make my CPUID manipulation program.
     
  3. An Intel employee is making a separate branch for the Intel Atom processor in a certain function in the Gnu Libc library (see libc-alpha archive). Intel also have a separate branch for the Atom processor in their IPP library. If this practice spreads then the libraries will soon be bloated with many separate branches for different brands and models of microprocessors. In my opinion, the branching should be based on certain performance characteristics, not on specific processor models. If necessary, add CPUID bits for specific performance parameters, or let the software test which branch is fastest.
     
Thank you to Yuhong Bao for sending me the first two examples.
   
Early Examples
Author: Yuhong Bao Date: 2010-02-01 22:03
In fact, it dates back to when AMD released the Athlon XP in 2001, which I think was the first non-Intel processor that supported SSE. Back then it was discovered that Windows Movie Maker 1.1, shipped with the original RTM release of Windows XP, as well as Windows Media Encoder 7, did not use SSE on non-Intel processors. Luckily, by the time of the XP launch in October 2001, MS was ready with WMM 1.2 and WME 7.1, which removed the vendor check, and AMD itself had patches as well. BTW, it was reported that this accounted for a dip in the Sysmark 2001 and Winstone 2002 Content Creation benchmarks.
References:
www.geek.com/articles/chips/wmm-v12-adds-athlon-xp-sse-support-for-wme-20020110/
www.anandtech.com/showdoc.aspx?i=1543&p=5
www.tomshardware.com/forum/68600-28-toms-super-comparision
   
More Parallels
Author: Yuhong Bao Date: 2010-02-20 23:56
Also, currently OpenSolaris checks for a vendor of GenuineIntel before using SSSE3 and later extensions. While that wasn't a problem initially when the code was written because only Intel implemented them and AMD was going to go their own path with SSE5, VIA now implement them too with the Nano, and AMD later decided to change and implement all of them up to AVX in Bulldozer.

Bug has been filed at:
defect.opensolaris.org/bz/show_bug.cgi?id=14706

   
New CPUID manipulation program
Author: Agner Fog Date: 2010-01-22 03:54
It is possible to manipulate the CPUID in VIA Nano processors. This feature is currently undocumented, and it is different from the method described in manuals for earlier VIA processors. I have got the necessary information now, and I have made a little program that can change the CPUID vendor string, family and model number on a VIA Nano processor.

You can download the program from www.agner.org/optimize/cpuidfake.zip. It is open source (GPL). Please read the included instructions.

The program requires a VIA Nano processor and Windows (32 or 64 bit). This program makes it possible to test if a benchmark result depends on the CPUID vendor string. You can also test whether other CPU-intensive programs perform differently depending on the CPUID.

If you find any benchmark or other generally used CPU-intensive program that appears to have an unfair CPU dispatching then please let me know. You may also contact the producer of the program and ask if it has been built with an Intel compiler or Intel function libraries. It is important to know how widespread this problem is.

   
CPUID manipulation through virtualization
Author: Andrew Lofthouse Date: 2010-08-16 08:31
If you do not have a VIA processor, you can also test applications using a VMWare virtual machine. If VMWare is using hardware virtualization, all cpuid instructions are intercepted and hence can be spoofed. Using the following lines in my .vmx file, I can change the vendor_id string from GenuineIntel (I have a Core 2 Duo) to AuthenticAMD:

cpuid.0.ebx="0110:1000:0111:0100:0111:0101:0100:0001"
cpuid.0.edx="0110:1001:0111:0100:0110:1110:0110:0101"
cpuid.0.ecx="0100:0100:0100:1101:0100:0001:0110:0011"

I've verified the behavior of Intel's Compiler using this method...
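
The bit strings above are the ASCII codes of the vendor string, four characters per register, with the first character in the lowest byte. As a sketch, the lines for any 12-character vendor string can be generated like this. The function names are made up for this example, and the .vmx syntax is copied from the lines above rather than from VMware documentation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Pack four vendor-string characters into a register value.
   CPUID returns the string little-endian: the first character
   sits in the lowest byte. */
uint32_t pack4(const char *s)
{
    return (uint32_t)(unsigned char)s[0]
         | (uint32_t)(unsigned char)s[1] << 8
         | (uint32_t)(unsigned char)s[2] << 16
         | (uint32_t)(unsigned char)s[3] << 24;
}

/* Write a 32-bit value as eight colon-separated groups of four bits,
   most significant bit first. 32 bits + 7 colons + terminator = 40 bytes. */
void vmx_bits(uint32_t value, char out[40])
{
    int pos = 0;
    for (int bit = 31; bit >= 0; bit--) {
        out[pos++] = ((value >> bit) & 1u) ? '1' : '0';
        if (bit % 4 == 0 && bit != 0)
            out[pos++] = ':';
    }
    out[pos] = '\0';
}

/* Print the three .vmx lines for a vendor string.
   CPUID leaf 0 returns the string in the order EBX, EDX, ECX. */
void print_vmx_vendor(const char *vendor)
{
    char bits[40];
    assert(strlen(vendor) == 12);
    vmx_bits(pack4(vendor),     bits); printf("cpuid.0.ebx=\"%s\"\n", bits);
    vmx_bits(pack4(vendor + 4), bits); printf("cpuid.0.edx=\"%s\"\n", bits);
    vmx_bits(pack4(vendor + 8), bits); printf("cpuid.0.ecx=\"%s\"\n", bits);
}
```

Calling print_vmx_vendor("AuthenticAMD") reproduces the three lines quoted above.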

   
CPUID manipulation through virtualization
Author: Agner Fog Date: 2010-08-16 12:24
Andrew Lofthouse wrote:
If VMWare is using hardware virtualization, all cpuid instructions are intercepted and hence can be spoofed.
Thanks a lot for the tip. Now everybody can test if their software has vendor-specific performance. You can get a 60-day evaluation from VMWare or get the server version for free.

By analogy to Andrew's code, I assume that you can make an AMD processor spoof to be "GenuineIntel" with these lines:

cpuid.0.ebx="0111:0101:0110:1110:0110:0101:0100:0111"
cpuid.0.edx="0100:1001:0110:0101:0110:1110:0110:1001"
cpuid.0.ecx="0110:1100:0110:0101:0111:0100:0110:1110"
The Intel software also checks the family number, which should be set to 6:
cpuid.1.eax="0000:0000:0000:0001:0000:0110:0111:0001"
You can verify this with CPUID.
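
To see which family, model and stepping a cpuid.1.eax line encodes, the value can be decoded as in the sketch below. It follows the rule Intel documents for the CPUID signature (extended fields are only combined in for base family 6 or 15). The struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned family;    /* displayed family */
    unsigned model;     /* displayed model */
    unsigned stepping;
} cpu_signature;

/* Parse a string such as "0000:0000:...:0001" into a number.
   Colons (and anything else that is not 0 or 1) are skipped. */
uint32_t parse_vmx_bits(const char *s)
{
    uint32_t v = 0;
    for (; *s; s++) {
        if (*s == '0' || *s == '1')
            v = (v << 1) | (uint32_t)(*s - '0');
    }
    return v;
}

/* Decode CPUID leaf 1 EAX: the extended family is added only when the
   base family is 15, and the extended model is prepended only when the
   base family is 6 or 15. */
cpu_signature decode_signature(uint32_t eax)
{
    unsigned base_family = (eax >> 8)  & 0xF;
    unsigned base_model  = (eax >> 4)  & 0xF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;

    cpu_signature sig;
    sig.stepping = eax & 0xF;
    sig.family = base_family;
    if (base_family == 0xF)
        sig.family += ext_family;
    sig.model = base_model;
    if (base_family == 0x6 || base_family == 0xF)
        sig.model |= ext_model << 4;
    return sig;
}
```

For the line above, the value is 0x00010671, which decodes to family 6, model 23 (17 hex), stepping 1.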

Anybody who finds software with a strong performance effect of the vendor string is welcome to post the details here or mail them to me.

I am currently testing this effect in various math programs. I will post the findings here soon.

   
CPUID manipulation program for AMD
Author: Agner Date: 2010-10-01 06:11
Here is a CPUID manipulation program for AMD processors that I have received from a Russian friend. link. I haven't tested it. Use it at your own risk. Explanation.
   
CPUID manipulation program for AMD
Author: Ralf Date: 2012-01-30 19:15
I just came across this:
https://github.com/jimenezrick/patch-AuthenticAMD
Interesting.
Not what I want though, I'm googling for a way to make KVM return "GenuineIntel" for the CPUID instruction.
   
CPUID manipulation program for AMD
Author: Agner Date: 2012-01-31 01:56
Ralf wrote:
I just came across this: https://github.com/jimenezrick/patch-AuthenticAMD
Thanks for the reference. The patch program is not a perfect solution. I would prefer that the patch skips the vendor string test rather than replace it with a test for "AuthenticAMD" so that it would work with any vendor. It should also remove the check for CPU family number. But it's a quick solution in some cases, including Intel .DLL and .SO library files. Try it at your own risk, and make sure you make a backup first :)
   
CPUID manipulation through virtualization
Author: akshay Date: 2015-07-08 04:13
Hi,

I have been trying to install Mac OS 10.8.2 in VMware using souldev teams tutorial.
I have used the hardware bypass software from the download (VMware patch) and it still shows the same error.
I also tried adding the "cpuid" lines to the .vmx file, with no result.

CPU: AMD Athlon
ASUS motherboard
4 GB RAM
Windows 7 Ultimate 64 bit

I tried several versions of VMware (8 and 9 series) and got the same error everywhere.
I have been searching for a solution for weeks with no result.
Please, somebody help me out.

Thanks in advance.

   
New CPUID manipulation program
Author: AVK Date: 2011-02-09 00:55
Mr. Fog,

Have you read AMD document #43170, "BKDG for AMD Processors Family 14h"? In there, on page 319, I've noticed an interesting table 102 named "Reset mapping for CPUID Fn0000_0000_E[B,C,D]X". What do you think about the "Reset mapping" wording? Does it mean that AMD CPUs finally get a feature to change its CPUID.Vendor string like VIA CPUs have? If so, would you update your CPUID Fake utility?

   
New CPUID manipulation program
Author: Agner Date: 2011-02-09 02:15
AVK wrote:
Does it mean that AMD CPUs finally get a feature to change its CPUID.Vendor string like VIA CPUs have?
Unfortunately not. It is read-only.
   
AMD Blog on compilers/benchmarch
Author: margaret lewis Date: 2010-02-01 18:50
Posted a blog on compilers/benchmarks - Chipping Away the Façade on Compilers and Benchmarks for AMD Processors
blogs.amd.com/work/2010/01/22/chipping-away-the-facade-on-compilers-and-benchmarks-for-amd-processors/
   
New version is still crippling Intel's competitors
Author: Agner Fog Date: 2010-06-29 04:16
Intel have released a new version of their Math Kernel Library (v. 10.3) in beta test.

I have tested the new libraries and found that the CPU dispatching works basically the same way as before. The standard math library, vector math library, short vector math library and the 64-bit version of other math kernel library functions still use an inferior code path for non-Intel processors.

I have found the following differences from previous versions:

  • Many functions now have a branch for the forthcoming AVX instruction set, but still only for Intel processors. This will increase the difference in performance between Intel and AMD processors on these functions. Both Intel and AMD are planning to support AVX in 2011.
     
  • The CPU dispatcher for the vector math library has a new branch for non-Intel processors with SSE2. Unlike the generic branch, the new non-Intel SSE2 branch is used only on non-Intel processors, and it is inferior in many cases to the branch used by Intel processors with the same instruction set. The non-Intel SSE2 branch is implemented in the 32-bit Windows version and the 32-bit Linux version, but not in the 64-bit versions of the library.
     
  • A new Summary Statistics library uses the same CPU dispatcher as the vector math library.

Obviously, I haven't tested all functions in the library. There may be more differences that I haven't discovered. But it is clear that many functions in the new version of the library still cripple performance on non-Intel processors. I don't understand how they can do this without violating the legal settlement with AMD.

   
New version is still crippling Intel's competitors
Author: granyte Date: 2014-09-16 04:37
I recently peeked at the code generated by the Intel C++ compiler suite 2015. On x64, not only is the CPU dispatcher present, but it seems there is an entry for AuthenticAMD in it. Sadly, I don't have enough knowledge of x64 assembly to know what is going on.
   
Out of court settlement with FTC
Author: Agner Fog Date: 2010-08-05 09:43

Yesterday, the Federal Trade Commission (FTC) announced that they are going for an out of court settlement with Intel. See their press release and proposed decision.

I will comment only on the part of the settlement that deals with Intel's compilers and function libraries. The FTC orders that Intel must inform its software customers about the CPU dispatch mechanism that leads to suboptimal performance on non-Intel CPUs. It also recognizes that certain published benchmarks were misleading. The advantage of an out-of-court settlement is that it is faster. A court battle could take so long that the issues would be obsolete before a decision was made. In a comment, the FTC explains that the purpose of the order is not punitive but remedial.

The settlement with FTC is less far-reaching than the settlement with AMD. The AMD settlement requires that Intel remove any "Artificial Performance Impairment", while the FTC settlement requires only that Intel inform their customers of what they do. This will not solve the problem, only make it more visible. The wording of the settlement is also somewhat ambiguous as to which clauses apply to both the compiler and the function libraries, and which clauses apply to the compiler only. This is unfortunate since many software developers are using the Intel function libraries without using the Intel compiler.

The FTC have asked me to testify in court about the CPU dispatching in Intel's compilers and function libraries. I will not have to do this now, of course, but I will continue to publish my findings here on my blog. I am currently doing a survey of software that is affected by the biased CPU dispatching and I am going to publish the results here soon.

Since Intel have not removed the biased CPU dispatching from their MKL library despite the settlement with AMD, and since the settlement with FTC does not require them to do so, we can expect that the problem will persist.

It is interesting that the FTC in their comment suggests that software developers can override the code dispatch mechanisms in Intel compilers and libraries. This is a technique that I have developed and described in my C++ manual. However, I doubt that commercial software developers will be happy to use such hacking techniques that rely on undocumented features.

The response of the software community will probably be to avoid Intel software products entirely. In my test of the optimizing performance of C++ compilers, the Intel compiler and the Gnu compiler for Linux shared the first place. Unfortunately, the Gnu compiler for Windows is not up to date so we still need a good replacement for the Intel compiler for Windows. It is not a profitable business to make a well optimized math function library. If we cannot use Intel's libraries then we probably have to rely on the open source community for making such libraries. The Gnu function libraries (glibc) are not very well optimized, so there is still a lot of work to do. The work of optimizing the Gnu function libraries is going very slowly and is done mainly by an Intel guy. Why don't AMD and independent programmers contribute to this work to make sure the software performs well on non-Intel processors as well?

After all, the FTC settlement leaves the software community with more problems than we could expect after the AMD settlement. Maybe this reflects the limited power of the FTC?

   
AMD library contains Intel's cripple-AMD function!
Author: Agner Fog Date: 2010-08-11 10:57

This issue is getting more and more absurd the more I dig into it. AMD makes a function library called AMD Core Math Library (ACML) to match Intel's Math Kernel Library (MKL). I have tested a Windows version of ACML and found that some of the functions run faster when the CPU vendor ID is artificially changed to "GenuineIntel". Maybe this is not so surprising after all, since this version of ACML is compiled with Intel's Fortran compiler.

Here are some of the most marked test results:

Execution time (lower is better) by faked CPU vendor ID

ACML function       VIA     AMD     Intel   % difference
drandlogistic       1.95    1.96    1.84     6
drandexponential    1.67    1.72    1.57     8
drandlognormal      3.42    3.46    2.99    15

ACML version acml4.4.0-ifort32.exe, VIA L3050 1.8 GHz processor, Windows 7, 32 bit. MS VS 2010 C++. Loop 100000 times * 256 values. Time unit = 10^9 clock cycles. Average of 20 runs.

On many of the functions in ACML there is little or no difference in performance depending on the CPU vendor ID, but some functions have a significant bias, as shown in the table above. Intel have repeatedly claimed that their compilers give a good performance on AMD chips if you compile for the SSE2 instruction set. Maybe the AMD people have believed this claim, or maybe they had no other option since they couldn't find a better Fortran compiler. With this compiler option, the compiler-generated code will be for the SSE2 instruction set only. I think that Intel first made the SSE2 recommendation at the time when AMD processors supported only SSE2, so this was the best performance you could get at that time. Today, you get suboptimal performance when compiling for SSE2 because later instruction sets are not used. And of course, the code will not work on older computers without SSE2.

To find the reason for the vendor ID effect, I decided to investigate the function with the strongest effect, which is the drandlognormal function. After a lot of detective work, I found that drandlognormal calls a logarithm function in Intel's Short Vector Math Library (SVML). This logarithm function is dispatched into three branches for the SSE2/generic, SSE3, and the future AVX instruction set, respectively. It uses the standard Intel CPU dispatcher, which gives the generic branch to all non-Intel processors. The SVML library supports only SSE2 and above, so the generic branch uses SSE2. When my VIA processor fakes to be an Intel, it gets the SSE3 branch, which is better optimized. The difference in performance is likely to be higher on future processors that support AVX.
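
The dispatching behaviour described above can be modelled in a few lines. This is a sketch of the logic as I have described it, not code taken from the SVML library. The struct, the enum and the function names are invented for the illustration, the family number check is left out, and has_avx is assumed to already include the check that the operating system supports AVX.

```c
#include <assert.h>
#include <string.h>

enum branch { BRANCH_GENERIC_SSE2, BRANCH_SSE3, BRANCH_AVX };

/* Hypothetical summary of what a dispatcher reads from CPUID. */
struct cpu_info {
    const char *vendor;   /* 12-character CPUID vendor string */
    int has_sse3;
    int has_avx;          /* assumed to include the OS-support check */
};

/* Vendor-neutral dispatch: pick the best branch that the
   supported instruction set allows. */
enum branch dispatch_by_features(const struct cpu_info *cpu)
{
    if (cpu->has_avx)  return BRANCH_AVX;
    if (cpu->has_sse3) return BRANCH_SSE3;
    return BRANCH_GENERIC_SSE2;
}

/* Vendor-gated dispatch as described in the post: any vendor other
   than "GenuineIntel" gets the generic branch regardless of features. */
enum branch dispatch_vendor_gated(const struct cpu_info *cpu)
{
    if (strcmp(cpu->vendor, "GenuineIntel") != 0)
        return BRANCH_GENERIC_SSE2;
    return dispatch_by_features(cpu);
}
```

With vendor "CentaurHauls" and has_sse3 set, the vendor-gated version returns the generic branch, while the same feature flags with vendor "GenuineIntel" return the SSE3 branch. This is the effect measured in the table above.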

There is another version of ACML for Windows built with the PGI compiler, but I couldn't make it work because some library files were missing.

The proposed settlement with FTC requires that Intel shall reimburse its compiler customers for the cost of recompiling their code with a different compiler. While this reimbursement program probably has little more than symbolic significance, it would be funny to see Intel compensating AMD for relying on their compiler. Unfortunately, it will be difficult for AMD to find a better Fortran compiler.

   
Common math programs are affected
Author: Agner Fog Date: 2010-08-20 11:05

Back in January, I made a tool to manipulate the CPUID of VIA processors and published the code here in the hope that somebody would test a lot of programs to see if the performance depends on the CPU vendor. The research staff of a Russian IT webzine iXBT.com offered to help with this. I gave them some equipment that they couldn't get in Russia, and they have tested a lot of programs. Their results are published in Russian, and later also in English.

The Russian researchers found several programs where the performance depended on the vendor name on the CPU. While this is grounds for suspicion, it does not necessarily mean that Intel software is involved. It is necessary to make a deeper investigation in order to see if the programs are compiled with an Intel compiler or an Intel function library.

I noticed from the screening results that Matlab and Mathematica were among the programs where the vendor name effect was highest. I decided therefore to make further investigations of mathematical programs and found that Mathcad was also affected. Matlab, Mathematica and Mathcad are the most commonly used math programs at universities and colleges.

Below are the results of my investigations so far on a VIA Nano L3050, 1.8 GHz.

Mathematica

Mathematica version 7.0.1 was tested using the BenchmarkReport function that is included with the package. The overall benchmark result for different (faked) CPU vendors was as follows (average of 5 tests):

Faked vendor and family number   Benchmark (higher is better)
VIA                              1.078
AMD                              1.102
Intel, (nonexisting) family 7    1.102
Intel, family 6                  1.114

A further investigation shows that Mathematica uses the Intel Math Kernel Library (MKL version 10.1 beta, 2008), including the Vector Math Library (VML) which contains optimized code paths used exclusively for Intel processors, and the Gnu Multiple Precision Arithmetic Library (GMP, no version info), which contains Intel-specific and AMD-specific code paths but no VIA-specific code paths. Another executable file (mathdll.dll) with Mathematica's own kernel code contains a check for the Intel vendor string, but I could not find out what the purpose of this check is, and I found no evidence that it originates from Intel software.

The benchmark is based on 15 different tests. Some of these tests were influenced by the CPU ID, others were not. The benchmark for elementary mathematical functions was 26.9% better when the CPU was identified as an Intel than when it was identified as anything else, probably due to the VML. The tests for high precision math were better for both AMD and Intel, probably due to the GMP.

Mathcad

Mathcad version 15.0 was tested with some simple benchmarks made by myself. Matrix algebra was among the types of calculations that were highly affected by the CPU ID. The calculation time for a series of matrix inversions was as follows:

Faked CPU                  Computation time, s   MKL version loaded   Instruction set used
VIA Nano                   69.6                  default              386
AMD Opteron                68.7                  default              386
Intel Core 2               44.7                  Pentium 3            SSE
Intel Atom                 73.9                  Pentium 3            SSE
Intel Pentium 4            33.2                  Pentium 4 w. SSE3    SSE3
Intel nonexisting fam. 7   69.5                  default              386

Using a debugger, I could verify that it uses an old version of Intel MKL (version 7.2.0, 2004), and that it loads different versions of the MKL depending on the CPU ID as indicated in the table above. The speed is more than doubled when the CPU pretends to be an Intel Pentium 4.

It is interesting that this version of MKL doesn't choose the optimal code path for an Intel Core 2. This proves my point that dispatching by CPU model number rather than by instruction set is not sure to be optimal on future processors, and that it sometimes takes years before a new library makes it to the end product. Any processor-specific optimization is likely to be obsolete at that time. In this case the library is six years behind the software it is used in.
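The mismatch in the Mathcad table can be pictured with a small model. The Python sketch below is hypothetical and is not taken from MKL. The family numbers (6 for Pentium 3, Core 2 and Atom, 15 for Pentium 4) are the standard CPUID values, but the rule that maps them to code paths is only a reading of the table above, and the function names are invented for illustration.

```python
# Hypothetical model of two dispatch strategies. Not Intel's actual code.
LEVELS = ["386", "MMX", "SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2", "AVX"]

# Code paths that the Mathcad table shows for this old MKL version.
LIBRARY_PATHS = ["386", "SSE", "SSE3"]


def dispatch_by_model(vendor, family):
    """Choose a code path from the vendor string and family number only (assumed rule)."""
    if vendor != "GenuineIntel":
        return "386"
    if family == 15:   # Pentium 4
        return "SSE3"
    if family == 6:    # Pentium 3 at the time, but later also Core 2 and Atom
        return "SSE"
    return "386"       # unknown family, such as the nonexisting family 7


def dispatch_by_features(highest_supported):
    """Choose the best library path that the CPU's instruction set permits."""
    limit = LEVELS.index(highest_supported)
    usable = [p for p in LIBRARY_PATHS if LEVELS.index(p) <= limit]
    return usable[-1]


# A Core 2 reports family 6 and supports SSSE3.
print(dispatch_by_model("GenuineIntel", 6))   # path chosen by model number
print(dispatch_by_features("SSSE3"))          # path chosen by instruction set
```

Under these assumptions the model-number rule gives a Core 2 the SSE path, while the feature-based rule gives it the SSE3 path that the library already contains.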

Matlab

I haven't got a Matlab package for testing yet, so the detailed results will have to wait. However, it is known that Matlab uses Intel's MKL library. The Russians report that Matlab runs 28% slower when the CPU identifies as a VIA compared to an Intel Core 2.

Apparently, the Matlab people are aware of the problem because they have announced that they are now using Intel's MKL library on Intel machines, and AMD's ACML library on AMD machines for basic linear algebra calculations. However, this is probably no improvement. Our Russian friends reported two years ago that Matlab runs faster with MKL than with ACML on an AMD machine!

It may sound like a fair solution that each CPU vendor makes its own function libraries, but this can soon be a nightmare for the producers of application software. This goes against the very principle of having a standardized instruction set. And apparently, only Intel can afford the costs of optimizing large function libraries on the detailed instruction level. They don't make much money on these function libraries, and it is surely very costly to develop, test and optimize such a big library of complicated mathematical functions, let alone the costs of making a different version for every new instruction set extension. The AMD libraries cannot match this level of optimization, and VIA can hardly afford to make any function libraries at all.

This is the core of the problem. By investing in the development of large, comprehensive and highly optimized math libraries, Intel have obtained a dominating market position in mathematical software, but in a very subtle way. They are not making the application software for the end user; but they are making some of the tools and building blocks for making such software. This enables them to manipulate the performance of this software on the CPUs produced by their competitors. And this manipulation is completely invisible to the end user and perhaps even to the application programmer. In fact, Intel have managed to put their CPU dispatcher into an AMD function library, as revealed in a previous post here.

Intel are putting themselves into an advantageous position by making better function libraries than everybody else, and they are taking advantage of this position by lowering the performance of common mathematical software on the CPUs of their competitors relative to their own. We have probably not seen the end of the legal battles yet.

   
Preliminary test results for Matlab
Author: Agner Fog Date: 2010-09-16 07:20

I have now verified that the performance of Matlab depends strongly on the CPU vendor string. The benchmark test on my VIA processor gives the following results.

 

Benchmark timing (lower is better)

Faked CPU   Matrix LU factorization   Fast Fourier Transform   Ordinary differential equation   Solve sparse matrix   2-D graphics   3-D graphics
VIA         0.7243                    0.4415                   0.2074                           0.5543                1.1418         0.8214
AMD         0.3197                    0.4502                   0.2201                           0.4952                1.1812         0.8179
Intel       0.3161                    0.2729                   0.2218                           0.4958                1.1967         0.7945
Built-in benchmark test on Matlab v. 7.11, 32 bit, Windows 7, VIA Nano L3050, 1.8 GHz. Average of 10 measurements.

These differences in benchmarks are mostly due to the fact that Matlab uses different function libraries for different processors. (The graphics performance is irrelevant here since I have no proper graphics card on my test board).

It is possible to choose different function libraries by modifying two poorly documented configuration files, named blas.spec and fftw.spec.

By modifying these configuration files, I got the following benchmarks for different function libraries on the VIA processor.

 

Benchmark timing (lower is better)

BLAS library   Matrix LU factorization   Solve sparse matrix   Ordinary differential equation
mkl.dll        0.3162                    0.4949                0.2213
acml.dll       0.6232                    0.7589                0.2355
Default        0.7238                    0.5537                0.2075
Benchmark tests on VIA processor with different libraries specified in blas.spec file. Same conditions as above.
 
 

Benchmark timing (lower is better)

FFT libraries                 Fast Fourier Transform
libfftw3.dll libfftw3f.dll    0.4494
libfftw3i.dll libfftw3f.dll   0.2708
Benchmark tests on VIA processor with different libraries specified in fftw.spec file. Same conditions as above.

This shows that most of the difference in performance can be accounted for by the fact that Matlab has specified different libraries to be used on different processor brands. The Matlab configuration files make specifications only for Intel and AMD processors, while VIA processors get a default library. Apparently, they have never heard about VIA processors. As you can see, the speed can be more than doubled for some tasks by adding an appropriate specification for VIA processors to the configuration files.
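The "more than doubled" figure can be checked directly from the two tables above. The Python sketch below only redoes that arithmetic. The dictionary layout and the function name speedup are illustrative choices, and the numbers are copied from the tables.

```python
# Timings copied from the BLAS and FFT tables above (lower is better).
blas_timings = {
    "mkl.dll":  {"LU": 0.3162, "sparse": 0.4949, "ODE": 0.2213},
    "acml.dll": {"LU": 0.6232, "sparse": 0.7589, "ODE": 0.2355},
    "Default":  {"LU": 0.7238, "sparse": 0.5537, "ODE": 0.2075},
}
fft_timings = {
    "libfftw3.dll":  0.4494,   # library used for AMD and VIA
    "libfftw3i.dll": 0.2708,   # library used for Intel
}


def speedup(baseline_time, new_time):
    """How many times faster the new timing is compared to the baseline."""
    return baseline_time / new_time


# Gain on the VIA processor from replacing the default library with mkl.dll.
for test in ("LU", "sparse", "ODE"):
    gain = speedup(blas_timings["Default"][test], blas_timings["mkl.dll"][test])
    print(test, round(gain, 2))

fft_gain = speedup(fft_timings["libfftw3.dll"], fft_timings["libfftw3i.dll"])
print("FFT", round(fft_gain, 2))
```

The LU factorization is the task that gains a factor of more than two. The differential equation test is essentially unaffected by the choice of BLAS library.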

Next, I analyzed the library files to see if there was any CPU dispatching inside these libraries. This analysis gave the following results:

mkl.dll
This is Intel's Math Kernel Library version 10.2.3, 32 bit. As mentioned in another posting, this 32-bit version of MKL uses the same instruction sets for Intel and non-Intel processors, while the 64-bit version gives a (minor) advantage to Intel processors over non-Intel processors. What is more important is that this MKL contains another check for the Intel vendor string in connection with a check for the number of processor cores. It looks like multithreading works less well, or not at all, on non-Intel processors in this library. If this suspicion holds true, it can have quite a dramatic negative effect on the performance on AMD processors. However, I cannot test this with my current test methods because there is no VIA processor with multiple cores yet. I don't have the time to make another test setup right now, so unfortunately we can't tell yet whether this affects multithreading on AMD processors.

acml.dll
This is AMD's Core Math Library, version 4.2.0. This version of ACML is compiled with an Intel compiler, just like the one I have reported about in a previous posting. It contains an Intel CPU dispatcher which enables the SSE2 instruction set only on Intel processors. This has only a minor effect in this case because only a few functions are affected. Furthermore, it uses Intel's OpenMP library for threading. This library may have inferior functionality on non-Intel processors.

default blas library
This library contains no CPU dispatching. It calls several other libraries that do have CPU dispatching, but apparently nothing that favors a specific CPU vendor.

FFT libraries
These libraries contain a CPU dispatcher that enables SSE2 in some functions. They are compiled with a Microsoft compiler and they contain no check for the CPU vendor. The library used for AMD and VIA processors (libfftw3.dll) has very little SSE2 code, while the library used for Intel processors (libfftw3i.dll) has more SSE2 code. Reportedly, Matlab have disabled the use of SSE2 on AMD processors because it was inefficient in their tests (link). This decision is probably based on the old AMD K8 processors, while SSE2 is more efficient in newer AMD processors.

My conclusion so far is that the performance of Matlab depends strongly on the CPU vendor string, but this effect is mainly due to suboptimal settings in the configuration files, and this problem can be solved easily by modifying these files. Several of the library files contain Intel CPU dispatchers that favor Intel processors, but the effect of this is too small to give statistically significant results in my tests.

So far, I have only made tests on a single-core processor. There may be larger effects on multi-core processors, but I have not been able to test this yet. I have made a small test package with the appropriate configuration files and descriptions for my readers to experiment with. You can download it here.

   
Overview of CPU dispatching in Intel software
Author: Agner Fog Date: 2010-08-23 06:38

There are many different versions of Intel compilers and function libraries with different CPU dispatching schemes. Some of these are fair to non-Intel processors and some are unfair. By unfair dispatching I mean that it chooses a suboptimal code path when running on a non-Intel CPU even when the CPU is compatible with a better code path. The different versions can get quite confusing, so I have tried to test as many different versions of Intel software products as I could get my hands on and present an overview of the results here.

The tables below show the highest instruction set available to Intel and non-Intel processors when running the different software products. The instruction sets, in order from oldest to newest, have these not very logical names:

386 MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX

Intel Math Kernel Library

The Math Kernel Library (MKL) contains many advanced mathematical functions. The results in the following table do not apply to various (sub-)packages that may be bundled with the MKL, such as the Intel Vector Math Library (VML), Intel Performance Primitives (IPP) and Intel Threading Building Blocks (TBB).

Library version   Intel processor       non-Intel processor   Intel processor       non-Intel processor
                  32 bit mode           32 bit mode           64 bit mode           64 bit mode
MKL 7.0, 2004     SSE3                  386                   n.a.                  n.a.
MKL 8.1, 2006     SSSE3                 SSSE3                 SSSE3                 SSSE3
MKL 9.0, 2006     SSSE3                 SSSE3                 SSSE3                 SSSE3
MKL 10.2, 2008    SSE4.2                SSE4.2                SSE4.2                SSE2
MKL 10.3, 2010    SSE4.2                SSE4.2                AVX                   SSE2

As we can see, versions 8 and 9 give Intel and non-Intel processors access to the same instruction sets, while version 7 and the 64-bit version 10 have unfair dispatching. MKL 7.0 has no x86-64 version.
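The classification above follows mechanically from the table once the instruction sets are ranked in the order listed earlier. The Python sketch below is only an illustration of that comparison. The names LEVELS, MKL_TABLE and unfair_modes are invented here, and the data is copied from the MKL table.

```python
# Instruction sets in the order given earlier in this post.
LEVELS = ["386", "MMX", "SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2", "AVX"]

# (Intel 32 bit, non-Intel 32 bit, Intel 64 bit, non-Intel 64 bit); None means n.a.
MKL_TABLE = {
    "MKL 7.0, 2004":  ("SSE3", "386", None, None),
    "MKL 8.1, 2006":  ("SSSE3", "SSSE3", "SSSE3", "SSSE3"),
    "MKL 9.0, 2006":  ("SSSE3", "SSSE3", "SSSE3", "SSSE3"),
    "MKL 10.2, 2008": ("SSE4.2", "SSE4.2", "SSE4.2", "SSE2"),
    "MKL 10.3, 2010": ("SSE4.2", "SSE4.2", "AVX", "SSE2"),
}


def is_unfair(intel_level, other_level):
    """True if non-Intel processors are limited to a lower instruction set."""
    if intel_level is None:
        return False   # this mode does not exist for the library version
    return LEVELS.index(other_level) < LEVELS.index(intel_level)


def unfair_modes(row):
    """List the modes (32 bit, 64 bit) in which a table row is unfair."""
    intel32, other32, intel64, other64 = row
    modes = []
    if is_unfair(intel32, other32):
        modes.append("32 bit")
    if is_unfair(intel64, other64):
        modes.append("64 bit")
    return modes


for version, row in MKL_TABLE.items():
    print(version, unfair_modes(row) or "fair")
```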

Intel Vector Math Library

The Vector Math Library (VML) contains procedures for calculating elementary mathematical functions on vectors of arbitrary size.

Library version   Intel processor       non-Intel processor   Intel processor       non-Intel processor
                  32 bit mode           32 bit mode           64 bit mode           64 bit mode
VML 7.0, 2004     SSE3                  386                   n.a.                  n.a.
VML 8.1, 2006     SSSE3                 SSE                   SSSE3                 SSE2
VML 9.0, 2006     SSSE3                 SSE                   SSSE3                 SSE2
VML 10.2, 2006    SSE4.2                SSE2                  SSE4.2                SSE2
VML 10.3, 2010    AVX                   SSE2                  AVX                   SSE2

As we can see, all versions have unfair dispatching. There are different branches for Intel processors with SSE2 and non-Intel processors with SSE2. I have not tested which of the SSE2 branches run fastest on non-Intel processors.

Intel Performance Primitives

All the versions I have tested have fair CPU dispatching.

Intel Threading Building Blocks

This library has some CPU dispatching, but I have not tested whether it is fair or not.

Intel standard C library and standard math library

These libraries are called automatically from code compiled with an Intel C++ compiler.

Library version   Intel processor       non-Intel processor   Intel processor       non-Intel processor
                  32 bit mode           32 bit mode           64 bit mode           64 bit mode
7.1, 2004         SSE2                  386                   n.a.                  n.a.
8.1, 2005         SSE3                  386                   n.a.                  n.a.
9.1, 2006         SSE3                  386                   SSE3                  SSE2
10.1, 2008        SSE4.2                386                   SSE4.2                SSE2
11.1, 2010        AVX                   386                   AVX                   SSE2
12.0, 2010        AVX                   386                   AVX                   SSE2

All versions have unfair CPU dispatching. In many cases, however, the Intel compiler can generate calls directly to the SSE2 version of a function when compiling for the SSE2 or higher instruction set. This also applies to non-Intel processors.

Intel Short Vector Math Library

The Short Vector Math Library (SVML) is used for elementary mathematical functions on vector registers (XMM and YMM registers). It is called automatically from code compiled with an Intel compiler when the SSE2 or higher instruction set is enabled. The SVML can also be used with other compilers such as the Gnu C++ compiler.

Library version   Intel processor       non-Intel processor   Intel processor       non-Intel processor
                  32 bit mode           32 bit mode           64 bit mode           64 bit mode
7.1, 2004         SSE2                  SSE2                  n.a.                  n.a.
8.1, 2005         SSE3                  SSE2                  n.a.                  n.a.
9.1, 2006         SSE3                  SSE2                  SSE3                  SSE2
10.1, 2008        SSE4.2                SSE2                  SSE4.2                SSE2
11.1, 2010        AVX                   SSE2                  AVX                   SSE2
12.0, 2010        AVX                   SSE2                  AVX                   SSE2

Intel C++ compiler

The Intel C++ compiler has various options that allow the programmer to generate code for a specific instruction set or to make multiple versions of the code for different instruction sets with automatic CPU dispatching. Non-Intel processors will always get the generic version of the code if CPU dispatching is used. The default level for the generic code is SSE2 for versions 11 and 12 of the compiler, and 386 for version 10 and earlier in 32-bit mode, as indicated in the following table.

Compiler version  Intel processor       non-Intel processor   Intel processor       non-Intel processor
                  32 bit mode           32 bit mode           64 bit mode           64 bit mode
7.1, 2004         SSE2                  386                   n.a.                  n.a.
8.1, 2005         SSE3                  386                   n.a.                  n.a.
9.1, 2006         SSE3                  386                   SSE3                  SSE2
10.1, 2008        SSE4.2                386                   SSE4.2                SSE2
11.1, 2010        AVX                   SSE2                  AVX                   SSE2
12.0, 2010        AVX                   SSE2                  AVX                   SSE2

There is an option for setting the generic level higher or lower. For example, the options /arch:SSE3 /QaxSSE4.1,AVX will set the generic level to SSE3 and generate three versions of the code for the SSE3, SSE4.1 and AVX instruction sets. Non-Intel processors can only get the generic version, which will be SSE3 in this example. Code compiled with the /Qx option, for example /QxSSE4.1, will fail to run on non-Intel processors and on processors without the specified instruction set.
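The effect of combining /arch with /Qax can be summarized in a small model. The Python sketch below is a simplified, hypothetical description of the behavior explained in this section, not code from the compiler. The function names are invented, and the assumption that an Intel processor gets the highest generated version it supports is a reading of the text.

```python
# Instruction sets in the order given earlier in this post.
LEVELS = ["386", "MMX", "SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2", "AVX"]


def rank(level):
    """Position of an instruction set in the ordered list."""
    return LEVELS.index(level)


def code_versions(arch, qax):
    """Versions generated: the generic /arch level plus one per /Qax level."""
    return sorted({arch, *qax}, key=rank)


def selected_version(arch, qax, vendor, cpu_level):
    """Version that runs on a given CPU, or None if the program cannot run."""
    if rank(cpu_level) < rank(arch):
        return None                      # CPU lacks the generic instruction set
    if vendor != "GenuineIntel":
        return arch                      # non-Intel CPUs get only the generic version
    candidates = [v for v in code_versions(arch, qax) if rank(v) <= rank(cpu_level)]
    return candidates[-1]                # highest version the Intel CPU supports


# Model of the options /arch:SSE3 /QaxSSE4.1,AVX
print(code_versions("SSE3", ["SSE4.1", "AVX"]))
print(selected_version("SSE3", ["SSE4.1", "AVX"], "AuthenticAMD", "AVX"))
print(selected_version("SSE3", ["SSE4.1", "AVX"], "GenuineIntel", "AVX"))
```

In this model an AMD processor with AVX still runs the SSE3 version, while an Intel processor with the same instruction set runs the AVX version.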

Other Intel products

The above test results are obtained with Intel C++ compilers and function libraries for Windows and Linux. I have found no differences between the Windows and Linux versions in the cases where I have had access to both. I have not tested the Macintosh versions, but this is less relevant as long as no Macintosh computers are available with AMD or VIA processors. I have not tested the Intel Fortran compiler, but it seems to be similar to the Intel C++ compiler with respect to CPU dispatching.

Anybody who has earlier versions of the compiler and function libraries than the ones I have tested is welcome to contact me.

   
Overview of CPU dispatching in Intel software
Author:  Date: 2020-08-31 10:21
I don't know who needs to know this, but a quick check on Intel MKL 2020.1.217-2 (from Debian sid) shows that VML is subject to the main MKL dispatcher too. And Intel is still sticking to the __intel_cpu_features_init_x name even after all the years!

I can't really decipher the things going on with the AVX512 bits. The MKL_ENABLE_INSTRUCTIONS environment variable causes some strange assignments on EDI, but I don't see how that gets used. As for the feature init code... I can't be bothered to check the CPUID table. At least this won't really be a problem until Centaur releases their new AVX512 "AI" chip anyways.

   
New Intel compiler version - still the same!
Author: Agner Fog Date: 2010-09-22 11:37

A new Intel C++ compiler version 12 has now been released as part of the new "Intel Parallel Composer 2011". The CPU dispatching methods are unchanged from version 11. Apparently, all that has come out of the legal battles over CPU dispatching is a notice on Intel's website that the compiler does not optimize equally for non-Intel microprocessors (see link).

The main difference between version 11 and version 12 of the compiler is that the latter has more features for splitting the code into parallel threads in order to take advantage of multi-core processors. I have not tested how these features work on non-Intel processors.

The settlement with AMD requires that Intel shall not include any Artificial Performance Impairment in any Intel product. I cannot find any change in the new compiler version that reflects this requirement.

While the wording of the AMD settlement with regard to CPU dispatching is much more far-reaching than the FTC settlement, it has had no apparent effect so far, perhaps because it is subject to interpretation. Likewise, the FTC have not succeeded in making Intel change their compiler and libraries - maybe because they don't have the power to do so, or maybe because they don't have sufficient specialized knowledge to counter the technical arguments of Intel's experts.

Anyway, the software community will still have to live with the technical problems. My best advice now is to override Intel's CPU dispatcher as explained in my C++ manual, or use another compiler.

   
GCC now has support for function dispatch
Author:  Date: 2010-09-27 11:33
GCC support for function dispatch has recently been checked in.

More information here:
nickclifton.livejournal.com/6612.html
and here:
www.airs.com/blog/archives/403

JL

   
Intel compiler question
Author:  Date: 2010-10-11 06:21
Hello Agner and thank you for sharing all of your wisdom with us, it's really appreciated.

At the time of writing there's the following paragraph in your optimizing software in C++ manual:

"Programs compiled with the Intel compiler with the /Qax option will run sub-optimally on non-Intel processors unless the above patch is included. Programs compiled with the /Qx option will not run at all on non-Intel processors unless the above patch is included."

Does that mean that the unfair CPU path dispatcher is only present when either of those switches are used in the compilation? Or is it present too when using /arch:SSE3 only for example?

Thank you very much for your time, kind regards,
J. Russell.

   
Intel compiler question
Author: Agner Date: 2010-10-12 02:11
When the /arch option is specified to the Intel compiler, it will use the SSE2 version of most library functions (bypassing the CPU dispatching in the library) even when a higher instruction set is specified. The default value for /arch is SSE2 in version 11 and 12 of the compiler. The program will run fine on non-Intel processors unless it calls a library function with CPU dispatching, for example in the MKL library. It will not run on any CPU with an instruction set lower than specified by the /arch option.

The /arch:AVX option is unofficial at the moment. /arch:AVX will use the AVX version of some library functions. This doesn't work with non-Intel processors at the moment, but this is a bug which they have promised to fix.

   
New Intel compiler version - still the same!
Author:  Date: 2010-11-29 00:59
Thanks for all of your excellent analysis of the Intel compiler behavior on Intel and AMD hardware. I wanted to inform you of another alternative to using either the new Intel compiler or the GNU tools for x86 (Intel and AMD) systems. Sun Microsystems, now part of Oracle, offers a great developer tool suite... and it is free for both development and production usage. The Oracle Solaris Studio suite includes C, C++, and Fortran compilers, along with a number of advanced tools (such as a dbx debugger and a performance analyzer). All of these tools support both Intel and AMD hardware via command-line options, while the default is an architecture type of "generic" which tries to balance performance for both chipsets in one binary. It is an impressive free set of tools and can be downloaded from: www.oracle.com/goto/solarisstudio (even though the name is *Solaris* Studio, there is a Linux download for it as well). I think you'll find the performance comparable to, and in some cases better than(!), the Intel compiler on both the Intel and AMD chipsets.

Regards,
--Don

   
New Intel compiler version - still the same!
Author: Daniel Date: 2011-12-23 12:33
First of all I want to thank you for all your research and work coding a fair dispatcher override for Intel's compiler. Without your work this compiler wouldn't be usable for me at all, because code that favors Intel CPUs is unacceptable to me!

I'd like to add that in version 12 of Intel's C++ Compiler for Windows your dispatcher override code works for 32bit, but not for 64bit. Seems like the x64 equivalent for AVX is smaller than the 32bit number.

Adding a simple switch for 64bit and changing the value at the line __intel_cpu_indicator = 0x20000; from 0x20000 to 1<<16 for the 64bit branch solves the problem. I'm not sure whether the other values are correct or not, because the only AMD CPU I have available supports AVX.
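For readers comparing the two constants, the short Python sketch below shows how they relate. It only does the bit arithmetic. What each bit of __intel_cpu_indicator means to Intel's dispatcher is not stated in this post, so the sketch makes no claim about that, and the helper name set_bit_positions is invented for illustration.

```python
value_in_override = 0x20000   # value in the dispatcher override code, as quoted above
value_for_64bit = 1 << 16     # replacement reported to work in the 64-bit branch


def set_bit_positions(value):
    """Return the positions of all bits that are set in value."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


print(hex(value_in_override), set_bit_positions(value_in_override))
print(hex(value_for_64bit), set_bit_positions(value_for_64bit))
```

Each constant has a single bit set, and the reported 64-bit value is exactly half of the original one, that is, the next lower bit.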

The code for the MKL and VML is fine though.

Regards
Daniel

   
New Intel compiler version - still the same!
Author: Agner Date: 2011-12-25 04:19
Daniel wrote:
Seems like the x64 equivalent for AVX is smaller than the 32bit number.

Adding a simple switch for 64bit and changing the value at the line __intel_cpu_indicator = 0x20000; from 0x20000 to 1<<16 for the 64bit branch solves the problem.

Please email me with an example of a function that fails with __intel_cpu_indicator = 0x20000. I will investigate it.
   
New Intel compiler version - still the same!
Author:  Date: 2012-02-12 12:39
www.compilerreimbursementprogram.com

software.intel.com/en-us/articles/optimization-notice/#opt-en

Looks like the FTC has finished the ruling. If you purchased the Intel compiler you can receive compensation.

Nicely Nicely. I wasn't going to type this entire thing, however it needs to be done.

"Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimizations on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice."
Notice Revision #20110804

They hide the text in a gif to avoid the search engines. However this site is indexed very well.

   
New Intel compiler version - still the same!
Author:  Date: 2012-03-14 10:50
I am not sure why Intel used images for that one instance, but there are a number of pages on the Intel website where the announcement is in text format. A quick search for a phrase from the announcement found a number of them in various formats: PDF, text, image.

I doubt they were hiding anything but it makes good reading.

software.intel.com/sites/products/documentation/hpc/composerxe/en-us/cpp/lin/main/main_welcome.htm

This is not directly related to the compiler but it does raise the issue of potential problems if there is some difference in the way that Intel and others implement new features. They have updated the BOINC project PRIMEGRID to use the AVX instructions introduced on Sandy Bridge and Bulldozer. The code works fine on a Sandy Bridge part but fails on a Bulldozer.

AVX build of llr (20%-50% faster)
www.primegrid.com/forum_thread.php?id=3912

"This has been tested on Intel Sandy Bridge but should also work on AMD Bulldozer [UPDATE] DOES NOT WORK on AMD Bulldozer. Currently, these are the only two CPU's that support AVX. Attention: AVX is supported only after Win7 SP1. WinXP does not have AVX support."


There may be a coding error that trips the AMD Bulldozer implementation of the PRIMEGRID client project but not the Sandy Bridge. If there is that difference, developers will have to be careful when they add support. I think that in this case they are using the gcc compiler.


AMD FX X8 8120 not working with AVX!
www.primegrid.com/forum_thread.php?id=3933

I thought you might be interested.

RJ

   
Still no library that is optimal on all processors
Author: Agner Date: 2012-04-18 08:52

Choosing the most efficient function library can be a nightmare to a programmer. I have tried to calculate the cosine function with different libraries and compare the calculation time. The best version is 19 times faster than the worst!

AMD have now updated their math libraries and added CPU dispatching. There are two versions of code in AMD's LIBM library: One for the SSE2 instruction set and one for AVX and FMA4. Intel processors will run the inferior SSE2 branch because they don't have the FMA4 instruction set. The incompatibility between Intel's and AMD's FMA instructions is another scandal, which I have discussed in this blog post. The AMD library does not check the CPU brand name as Intel libraries do. It only checks for the FMA4 instructions which are not supported by Intel processors, although - quite ironically - they were designed by Intel. It will be possible to run the better branch on Intel processors if Intel decides to support the FMA4 instruction set in the future.

The following table shows the results of some tests I have made of different math libraries.

Library             Elements per vector   Dispatch version   Time, 32 bit mode   Time, 64 bit mode
glibc 2.13          1                     none               1400                1030
MS 17.00            1                     SSE2               724                 200
Intel 12.1.3        1                     generic            1950                303
Intel 12.1.3        1                     Intel              720                 295
Intel SVML 12.1.3   4                     generic (SSE2)     360                 203
Intel SVML 12.1.3   4                     Intel AVX          99                  188
Intel SVML 12.1.3   8                     generic (AVX)      112                 128
Intel SVML 12.1.3   8                     Intel AVX          108                 101
AMD LIBM 3.0.2      4                     generic (SSE2)     n.a.                245
AMD LIBM 3.0.2      4                     FMA4               n.a.                148
Calculation time in clock cycles for the cosine of a vector of 8 single-precision floats on an AMD Bulldozer CPU in a single thread with different function libraries.
(values are imprecise due to the varying clock frequency).

The Gnu function library (glibc) uses outdated and inefficient code. The Microsoft library has decent performance in the 64-bit version, but of course it supports only the Windows platform. Intel's general math library is no better in my test case, but Intel's Short Vector Math Library (SVML) is very good. The SVML library supports vectors of 4 floats in an XMM register (SSE2) or vectors of 8 floats in a YMM register (AVX). It will choose the inferior generic path for non-Intel processors unless we replace Intel's CPU dispatcher as described above. Intel's libraries are available for Windows, Linux and Mac. AMD's LIBM library supports vectors of 4 floats. It is available for Windows and Linux, but only in 64-bit mode.

The sad conclusion is that we have no fully optimized math function library that supports all brands of x86 processors and all operating systems. If we want optimal performance on all processors, the best choice is to use Intel's SVML library and manipulate it into treating non-Intel processors better.

It would be nice if more people would work on improving glibc. This library supports all processors and platforms, but it is poorly optimized. Only a few memory and string functions in glibc have CPU dispatching, while the math functions have only old and poorly optimized versions. It would also be nice to have vector versions of the math functions in glibc because the Gnu compiler has support for such functions.

   
Still no library that is optimal on all processors
Author: Guest Date: 2012-05-17 11:38
I know this is an old blog, but does anyone have any news on whether Intel has any intention of removing the performance throttle against AMD? It seems antitrust lawsuits aren't enough, and only when Intel has to continually pay fines and compensation will they realise that the cost of compensation is crippling them. Since most users only care about performance, use of Intel's compiler, which is made to benefit only Intel CPUs, gives Intel an unfair advantage in the market by costing AMD sales.

I suppose the only way around it is for developers to blacklist and boycott the Intel compilers and move to one that is independent of CPU vendors (i.e. open source or made by the developers themselves). Unfortunately this would cost time, effort and money, and it would be much easier to use Intel's, considering that any such move would probably lead to Intel giving out sponsorships or 'bribes' in the same way that AMD and Nvidia do with certain games, when the logo comes up at the start.

Whilst it could be true that Intel generally have better architecture and overall performance vs AMD, they shouldn't have to resort to cheating.

   
Still no library that is optimal on all processors
Author: Agner Date: 2012-05-17 13:40
Guest wrote:
does anyone have any news on whether Intel has any intention of removing the performance throttle against AMD?
It's not getting better. The latest version of Intel's SVML (Short Vector Math Library) has some functions that can only be called from processors with AVX because the input parameter is an AVX vector (YMM register). There is no logical reason why these functions should have a CPU dispatcher, yet they have two different code paths for the same instruction set: an optimized version for Intel processors with AVX and an inferior version for other brands of CPU with AVX.
   
Still no library that is optimal on all processors
Author: David Date: 2012-05-19 16:35
For the static IPP library, can't a custom dispatcher be built with something like... if not "GenuineIntel" then if cpu reports support for SSE2 / 3 / 4 / etc.. then staticinitcpu(the intelcpu compatible with those instructions)??

Has such a dispatcher already been created by AMD or someone??

   
Still no library that is optimal on all processors
Author: Agner Date: 2012-05-20 01:58
David wrote:
For the static IPP library, can't a custom dispatcher be built...

The IPP has fair CPU dispatching. There is no need for a custom dispatcher here. Dispatchers for the other Intel libraries are described in my C++ manual.

   
Still no library that is optimal on all processors
Author:  Date: 2012-06-16 04:45
Do you have any tips for changing the CPUID using VMWare Workstation 8.0.4? I edited the .vmx file to read (at the top part of the file) -
.encoding = "windows-1252"
config.version = "8"
virtualHW.version = "8"
numvcpus = "4"
cpuid.coresPerSocket = "4"
cpuid.0.ebx="0111:0101:0110:1110:0110:0101:0100:0111"
cpuid.0.edx="0100:1001:0110:0101:0110:1110:0110:1001"
cpuid.0.ecx="0110:1100:0110:0101:0111:0100:0110:1110"
cpuid.1.eax="0000:0000:0000:0001:0000:0110:0111:1110"
scsi0.present = "TRUE"
memsize = "4096"

The VM boots up and at the first screen of Win XP install when it says in the bottom grey bar "Setup is starting Windows"
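
For readers wondering what the cpuid.* lines in the .vmx fragment above actually encode, here is a small Python sketch that decodes them. The helper names (parse_vmx_mask, decode_signature) are made up for this illustration, and the parser only handles fully specified masks of 0s and 1s like the ones quoted in the post. It says nothing about why the VM stops at the setup screen.

```python
import struct


def parse_vmx_mask(mask: str) -> int:
    """Turn a colon-separated 32-bit mask such as '0111:0101:...' into an int.

    Only fixed bits (0 or 1) are accepted; this sketch does not handle any
    other mask characters.
    """
    bits = mask.replace(":", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 bits made of 0 and 1")
    return int(bits, 2)


def decode_signature(eax: int):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    if family in (6, 15):
        model |= ext_model << 4      # extended model is used for families 6 and 15
    if family == 15:
        family += ext_family         # extended family is added for family 15
    return family, model, stepping


# Values copied from the .vmx fragment in the post above.
ebx = parse_vmx_mask("0111:0101:0110:1110:0110:0101:0100:0111")
edx = parse_vmx_mask("0100:1001:0110:0101:0110:1110:0110:1001")
ecx = parse_vmx_mask("0110:1100:0110:0101:0111:0100:0110:1110")
eax1 = parse_vmx_mask("0000:0000:0000:0001:0000:0110:0111:1110")

# CPUID leaf 0 returns the vendor string in EBX, EDX, ECX order, little-endian.
vendor = struct.pack("<III", ebx, edx, ecx).decode("ascii")

print(vendor)                   # vendor string the guest will see
print(hex(eax1), decode_signature(eax1))
```

So the three leaf 0 masks spell out an Intel vendor string, and the leaf 1 mask sets the family/model/stepping signature that the guest sees.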

   
Still no library that is optimal on all processors
Author: Marat Dukhan Date: 2013-05-20 18:05
Open-source Yeppp! library (www.yeppp.info) provides vector mathematical functions optimized for both Intel and AMD processors, and demonstrates performance comparable to the best vendor-optimized libraries: www.yeppp.info/home/yeppp-performance-numbers
   
Still no library that is optimal on all processors
Author: Agner Date: 2013-05-21 07:22
Marat Dukhan wrote:
Open-source Yeppp! library (www.yeppp.info) provides vector mathematical functions optimized for both Intel and AMD processors, and demonstrates performance comparable to the best vendor-optimized libraries: www.yeppp.info/home/yeppp-performance-numbers
Thank you. That is good news indeed. I hope that you will add more functions to the library.
   
This is still going on, wow just wow
Author: Vuurdraak Date: 2016-11-10 12:30
I knew Intel was doing this stuff in the past, but I didn't realize they were still crippling their compilers.
I'm not a professional coder so I never looked at it again. I am, however, a long-time AMD CPU user, one of the people Intel is hurting on purpose.
I checked on the Intel website and the notice that they only optimize for Intel processors is still there, so they are still crippling their compilers, ten years after Agner started to complain to Intel.

I'm surprised that no anti-trust authorities have slammed Intel for this, as to me it is clearly anti-competitive coding.
It's amazing how a company that has a virtual CPU monopoly for desktop computers can get away with using the worst possible solution for choosing the best instruction set. They are even willing to partly cripple their own new CPUs, as in the example of renaming a Core CPU to Pentium 4, because they refuse to use blacklisting or proper feature set detection regardless of the CPU brand or model.

There should be no reason at all to attempt to detect the CPU's brand other than to hurt the competition. Even if this needs to be done to exclude bugged CPU models, any branching that automatically eliminates non-Intel CPUs from a faster code path before checking whether you are dealing with a bugged CPU, or checking its feature set, is clearly done on purpose.
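
To make the blacklisting idea in the paragraph above concrete, here is a minimal Python sketch of a dispatcher that looks at feature flags first and consults the vendor/family/model only to skip specific, known-bad models. Everything in it is hypothetical: the Cpu record, the code path names, and the blacklist entry ("ExampleVendor", 6, 99) are invented for illustration and do not describe any real CPU erratum or any vendor's actual dispatcher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    vendor: str            # only ever used for blacklist lookups
    family: int
    model: int
    features: frozenset    # e.g. frozenset({"sse2", "avx"})


# Code paths ordered from best to worst, each with the features it requires.
CODE_PATHS = [
    ("avx", frozenset({"avx"})),
    ("sse4.2", frozenset({"sse4.2"})),
    ("sse2", frozenset({"sse2"})),
]

# Hypothetical blacklist: exact models where one code path is known to misbehave.
BLACKLIST = {
    "avx": {("ExampleVendor", 6, 99)},
}


def select_code_path(cpu: Cpu) -> str:
    """Pick the best code path the CPU supports, skipping blacklisted models."""
    for name, required in CODE_PATHS:
        if not required <= cpu.features:
            continue                    # CPU lacks a required instruction set
        if (cpu.vendor, cpu.family, cpu.model) in BLACKLIST.get(name, set()):
            continue                    # this exact model is excluded from this path
        return name
    return "generic"


flags = frozenset({"sse2", "sse4.2", "avx"})
print(select_code_path(Cpu("GenuineIntel", 6, 42, flags)))
print(select_code_path(Cpu("AuthenticAMD", 21, 1, flags)))
print(select_code_path(Cpu("ExampleVendor", 6, 99, flags)))
```

With this ordering, two CPUs that report the same features end up on the same path whatever their brand, and only the one listed model falls back to the next-best path.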

If Intel were some small business struggling for market share, then I would not care too much, but they are a monopolist whose current products cost more for the same speed than what I paid for my Phenom II CPU five years ago, because there is almost no competition. This also hurts the people who buy Intel CPUs, as Intel can jack up the price.

While Microsoft was forced by the EU to include competitors' browsers as a choice in its OS, Intel is allowed to purposefully code compilers, widely used due to their great optimization, to slow down competitors' products. It's just amazing.

   
This is still going on, wow just wow
Author: Agner Date: 2016-11-10 13:22
Vuurdraak wrote:
I'm surprised that no anti-trust authorities have slammed Intel for this, as to me it is clearly anti-competitive coding.
See www.agner.org/optimize/blog/read.php?i=49#112

BTW, here is more evidence that benchmarks have been corrupted:

arstechnica.com/gadgets/2008/07/atom-nano-review/6/

   
This is still going on, wow just wow
Author: Vuurdraak Date: 2016-11-11 04:34
I did read the FTC and AMD court case part. I'm not a lawyer, but has this made it impossible to go after Intel's compilers in the USA?
Apart from paying some cash to AMD and promising not to hurt AMD, they never stopped crippling the compiler, and I'm wondering what kind of non-techies have allowed this to continue. Anybody who has ever written a piece of code understands that this compiler stuff is done on purpose and is unnecessary. They do not have to explicitly optimize for non-Intel CPUs to allow non-Intel CPUs to take the best possible code path; the only thing they need to do is not kill a CPU code path based on the CPU's brand name.

The EU is still in a court case with Intel, although that seems to be about giving rebates to Dell, HP, NEC & Lenovo, for not selling AMD products: fortune.com/2016/10/20/intel-eu-antitrust-fine/
I'm wondering if the EU trade commission has looked into the compiler story. I'm going to find out where I can send an email that reaches the EU, asking them to take a look at this blog and the compiler shenanigans.

   
This is still going on, wow just wow
Author: Denis Date: 2017-01-02 10:35
Hello! I am just a simple user. For more than 15 years I have been using AMD CPUs because I want to help the company provide real competition, so that Intel does not become a monopolist.
Is it possible to make some software that defeats the "GenuineIntel" check?
There is one program called Intel Compiler Patcher, or icc_patch. Here is the link - https://cloud.mail.ru/public/GMpC/YQJ6spHtQ
Is it possible to modify it to make it more useful in 2017?
Or maybe something else exists?
Dear Agner, you have a great program, cpuidfake, that works only with VIA. Can you make it work with AMD in 2017?
Sorry, I am just a simple user, but I hope that it is possible. Thank you for your work!
   
This is still going on, wow just wow
Author: Agner Date: 2017-01-02 11:35
Denis wrote:
There is one program called Intel Compiler Patcher, or icc_patch. Here is the link - https://cloud.mail.ru/public/GMpC/YQJ6spHtQ
I haven't tried the Russian patcher. You may try it at your own risk - but be sure to back up your programs first!


Agner, you have a great program, cpuidfake, that works only with VIA. Can you make it work with AMD in 2017?
Unfortunately, it is not so simple to modify the CPUID on an AMD machine.
   
RYZEN thoughts?
Author:  Date: 2017-03-10 15:51
Can anyone explain what is going on with this new CPU? They are shattering records in rendering and synthetic benchmarks, but gaming performance seems to be affected.
Game engines normally use compilers from Microsoft, and Windows can't even properly identify the amount of cache memory yet. So a simple Windows update will fix everything?

The problem about incorrect cache memory is reported by some tech review sites and the discussion can be found in Microsoft's own website too:

https://answers.microsoft.com/en-us/windows/forum/windows_10-hardware/fyi-smt-configuration-error-affecting-amd-ryzen/6f911994-1c17-4886-98ab-93c55852285a

http://www.universityherald.com/articles/68738/20170310/amd-ryzen-cpu-severely-crippled-windows-10-microsoft-patches-way.htm

   
RYZEN thoughts?
Author:  Date: 2017-03-16 08:23
It's not clear to me that "compiler shenanigans" are responsible for the difference - which does vary quite markedly between games, and even between different APIs on the same game (e.g. DX12 and Vulkan are relatively much worse for Ryzen than DX11 and OpenGL, which is the opposite of what you'd expect from an ordinary CPU bottleneck).

What we do know is that communication between CCXes (each of which carries 4 cores) is somewhat more constrained than communication within a CCX; they are linked by the Infinity Fabric which runs at a speed tied to the memory clock, whereas everything within each CCX runs at core clock speed. Games tend to have a lot of producer and consumer threads which communicate voraciously, and they generally don't achieve full core utilisation on a CPU as highly multithreaded as Ryzen 7, so Windows further complicates matters by migrating threads randomly from one core to another, often *between* CCXes. Productivity benchmarks appear to need relatively little communication between threads, and because all cores are fully utilised, Windows doesn't migrate threads between cores.

Hence productivity benchmarks can use Ryzen efficiently without needing to know about its quirks. Games need to be a bit more intelligent, and some of them apparently have intelligence tuned for Intel CPUs which unfortunately breaks on Ryzen. AMD posted something revealing about F1 2016 recently which illustrates this very neatly.

   
RYZEN thoughts?
Author: Peter Date: 2017-04-11 22:29
The most plausible sounding explanation I've read is that Ryzen lives and dies by its large L2 caches.

Neither its IPC nor its top clock speeds can quite beat Intel, but it can keep substantially larger program and data working set fragments in its 512 kB L2s than Intel chips can in their 256 kB ones and can avoid memory stalls for some programs more easily.

By the same token, thread rescheduling can kill performance, especially if one is migrated to the opposite CCX, since the large L2 cache takes inherently longer to re-warm, and Ryzen's internal datapaths are only 32B wide (compared to 64B for post-Haswell) even within a CCX, and the Infinity Fabric crossbar is only clocked at half-nominal DDR4 interface rates, for something in the neighborhood of only 40-50 GB/s bandwidth in each direction. Re-warming an L2 cache for a thread moved to an adjacent core within a CCX could take 4x as many clocks as for an Intel chip and perhaps 10x-12x longer for migrations to a remote CCX.
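
The "4x as many clocks" figure in the paragraph above follows from two factors of two: the L2 is twice as large and the datapath is half as wide. Here is a back-of-the-envelope Python check using only the sizes and widths quoted in the post. It assumes a full-width transfer on every clock and ignores latency, prefetching and contention, so the absolute clock counts are idealized lower bounds, not measurements.

```python
KIB = 1024


def refill_clocks(cache_bytes: int, bytes_per_clock: int) -> int:
    """Idealized clocks needed to stream a whole cache at full datapath width."""
    return cache_bytes // bytes_per_clock


# Figures taken from the post: 512 kB L2 over a 32 B path versus 256 kB over 64 B.
ryzen_clocks = refill_clocks(512 * KIB, 32)
intel_clocks = refill_clocks(256 * KIB, 64)

print(ryzen_clocks)                  # 16384
print(intel_clocks)                  # 4096
print(ryzen_clocks / intel_clocks)   # 4.0
```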

The programs that do better in Ryzen will probably be the ones that can both keep thread migration under control and that can keep their working sets under control, not using so much that 512 kB L2s and 8 MB L3 cluster still can't keep up.

   
RYZEN thoughts?
Author: Agner Date: 2017-04-12 00:52
The single-thread instructions per clock rate of Ryzen is higher than for any Intel processor, except for 256-bit vector code. I am testing the Ryzen right now and the test results are coming soon. Please be patient.
   
RYZEN thoughts?
Author:  Date: 2019-02-12 14:54
It's been almost 2 years since you mentioned that you were testing Ryzen. Have you finished? I don't see any results. You may have just placed them somewhere and never replied to this thread, but I did not see them.
   
RYZEN thoughts?
Author: Agner Date: 2019-02-13 00:33
The test results for Ryzen are in my optimization manual. It can execute five instructions per clock in small loops.
   
RYZEN thoughts?
Author: itsmydamnation Date: 2017-04-21 01:29
So on the whole gaming performance thing. It looks a bit multifaceted but nothing long lasting:

1. Currently Win10 power management is not Zen aware, so either you park cores and get very slow wake-up and get the XFR clocks, or you disable core parking and don't get the XFR clocks. AMD released a workaround power profile, but it's a stopgap; Zen needs the equivalent of Intel SpeedStep support in Windows.

2. Memory latency is high. Lots of initial reviews were done at DDR4 2133/2400 with high timings, resulting in around 100 ns memory access times in canned benchmarks (Sandra, AIDA, SYSmark etc). With new BIOS versions memory latency is lowered and higher frequencies are becoming easier to hit, e.g. 3200 MHz CAS 14/16. This is bringing those memory latency numbers down to the low 70s and, on some motherboard/memory combinations, the high 60s.

3. The inter-CCX fabric clock, and thus latency/throughput, is tied to memory clock, so between this and point 2, upping memory clock has a big performance impact in gaming, far bigger than on Intel platforms. My guess is that it is latency, both inter-CCX and to memory, but I guess it could be throughput.

4. Nvidia drivers seem to run into performance issues in many games above 4 physical cores. It's an odd problem that can't be clearly seen in all games, but as core counts increase NV performance can go backwards while Radeon performance goes forward. This can be seen on both Broadwell-E and Ryzen. Given the current GPU landscape this wasn't picked up initially because everyone tested on 1070/1080/1080ti etc and tested 1800X's against 7700k's.

5. Some games are doing things Zen doesn't like. AOTS got a 20% perf uplift via a patch; we will have to wait and see what other games bring.
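
As a rough illustration of points 2 and 3, the arithmetic below converts a DDR4 transfer rate into the memory clock and a CAS setting into nanoseconds. It assumes, as an earlier post in this thread puts it, that the fabric runs at half the nominal DDR4 transfer rate, i.e. at the memory clock. CAS latency is only one component of the end-to-end latencies quoted above, so these numbers are not directly comparable to the 100 ns or 70 ns benchmark figures.

```python
def memory_clock_mhz(transfer_rate_mts: float) -> float:
    """DDR transfers twice per clock, so the clock is half the transfer rate."""
    return transfer_rate_mts / 2


def cas_latency_ns(transfer_rate_mts: float, cas_cycles: int) -> float:
    """CAS latency in nanoseconds: cycles divided by the memory clock."""
    return cas_cycles * 1000 / memory_clock_mhz(transfer_rate_mts)


for rate in (2133, 2400, 3200):
    print(rate, "MT/s ->", memory_clock_mhz(rate), "MHz memory clock")

print(cas_latency_ns(3200, 14))   # 8.75
print(cas_latency_ns(3200, 16))   # 10.0
```

Going from DDR4-2133 to DDR4-3200 raises the memory clock, and under the stated assumption the fabric clock, by about 50%, which is consistent with memory speed mattering more here than on platforms where the interconnect is clocked independently.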

Unfortunately regular hardware sites aren't keeping up with this very fluid and changing environment; you have to take to forums and YouTube reviewers (yuk) to see this stuff being tested.

   
This is still going on, wow just wow
Author:  Date: 2017-07-19 05:07
Please allow me to announce the new release of our SLEEF library.
It is basically an open source alternative to SVML, and of course it works very well on AMD computers.

Please check out our benchmark results. The performance of SLEEF is now comparable to SVML.

sleef.org

   
A long history of legal antitrust battles
Author: Agner Date: 2017-07-27 00:48
Intel has a long history of monopoly-like behavior and expensive legal battles, according to this video blogger:
www.youtube.com/watch?v=osSMJRyxG0k.
   
A long history of legal antitrust battles
Author:  Date: 2017-07-27 12:04
Agner wrote:
Intel has a long history of monopoly-like behavior and expensive legal battles, according to this video blogger:
www.youtube.com/watch?v=osSMJRyxG0k.
And maybe there's another one lying in the future regarding x86 emulation on Windows 10 for ARM:

Sources:
www.pcper.com/news/Mobile/Qualcomm-and-Microsoft-bring-full-Windows-10-Snapdragon-devices
www.eetimes.com/author.asp?section_id=36&doc_id=1331874
newsroom.intel.com/editorials/x86-approaching-40-still-going-strong/

   
A long history of legal antitrust battles
Author: Royi Date: 2018-02-19 05:10
Has something changed in the latest ICC (Version 18.1)?

Both for the code path that ICC generates and for SVML.

Thank You.

   
A long history of legal antitrust battles
Author: Agner Date: 2018-05-15 04:11
Royi wrote:
Has something changed in the latest ICC (Version 18.1)?

Both for the code path that ICC generates and for SVML.

Thank You.

It looks like ICC version 18.1 works in the same way as the previous versions. The MKL (Math Kernel Library) has both a discriminating and a non-discriminating CPU dispatcher (__intel_mkl_features_init and __intel_mkl_features_init_x, respectively). I don't know how it decides which one to use.

The SVML (Short Vector Math Library) uses the non-discriminating CPU dispatcher.

   
Intel's "cripple AMD" function
Author: PCPMD Date: 2019-02-27 16:05
I feel like it needs to be mentioned that there's another issue popping up in the gaming world that I believe is related to this feud.

Many new games are releasing with an SSSE3/SSE4.2 requirement, which I suspect is now a default setting in the Intel Compiler. The Phenom II line of AMD processors are some of the best processors available for the AM2+/AM3 socket, and they do not support SSSE3. This may not immediately seem like favoritism, but Intel has consistently been releasing new CPU instruction sets that AMD has had to keep up with, being one step behind. This is not a performance issue, but rather an engineered obsolescence issue. By slowly moving the bar for compatibility, they'll shorten the lifespans of AMD processors, leaving people feeling like AMD isn't good enough. This will continue to be an issue, every time Intel changes the Intel Compiler default settings to include a new instruction set.

I have a Phenom II 1100T 6X and I'm quite happy with its performance, but now I'm looking at having to downgrade to a newer, but slower processor, or upgrade and replace my motherboard and my RAID0 tied to that motherboard. My CPU has never been my bottleneck in performance, however Intel is trying to make it obsolete.

I strongly urge programmers to fight this by maintaining support for processors without the newest Intel instruction sets for as long as possible.

   
Patches and workarounds
Author: Neville C Date: 2019-11-21 00:41
Entering MKL_DEBUG_CPU_TYPE=5 into the System Environment Variables 'fixes' Intel's Math Kernel Library (MKL).
200+ % performance gains.

For Linux:
Edit your shell's configuration scripts (~/.bashrc for bash, ~/.zshrc for zsh etc) adding the line export MKL_DEBUG_CPU_TYPE=5.

Ref:
https://www.reddit.com/r/matlab/comments/dxn38s/howto_force_matlab_to_use_a_fast_codepath_on_amd/
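
For programs that load MKL from Python, the same variable can be set from inside the process, provided this happens before the MKL-linked library is loaded. The sketch below shows one way to do that. The helper name request_mkl_fast_path is made up for this example, MKL_DEBUG_CPU_TYPE is an undocumented setting, and a later post in this thread reports that it stopped working in MKL 2020 Update 1, so treat this as a workaround for older MKL builds only.

```python
import os


def request_mkl_fast_path(env=None) -> str:
    """Set MKL_DEBUG_CPU_TYPE=5 unless the user has already chosen a value.

    `env` defaults to the real process environment; passing a plain dict
    makes the helper easy to try out without touching os.environ.
    """
    env = os.environ if env is None else env
    env.setdefault("MKL_DEBUG_CPU_TYPE", "5")
    return env["MKL_DEBUG_CPU_TYPE"]


# Must run before any MKL-linked module is imported, because the library
# reads its environment when it is loaded.
request_mkl_fast_path()
# An MKL-linked package would be imported here, after the variable is set.
```

Using setdefault rather than a plain assignment leaves an existing user setting untouched, which makes it easy to compare timings with and without the override.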

This makes me wonder if *_DEBUG_CPU_TYPE=5 or ???_DEBUG_CPU_TYPE=5 would work for all Intel compiled libraries and files?
A long sought universal 'fix' would be great!

You are aware of the patches for Intel's compiler and exe's here:
www.swallowtail.org/naughty-intel.shtml#patches
and the Intel Compiler Patcher based on it:
https://www.majorgeeks.com/files/details/intel_compiler_patcher.html
but running the Intel Compiler Patcher through:
https://www.hybrid-analysis.com/
seems to point to it containing very well hidden malware.

   
Patches and workarounds
Author:  Date: 2020-09-01 01:48
The environment variable has been patched out in MKL 2020 Update 1. Daniel de Kok has found an alternative approach that involves overriding the mkl_serv_intel_cpu_true() function.[1] This version seems a bit nicer than the `intel_mkl_feature_patch.c` in that it requires no manual calling, only a preload. For Windows, however, I have no idea how one can cleanly inject a DLL, let alone modify the PE file to always load it.

[1]: https://danieldk.eu/Posts/2020-08-31-MKL-Zen.html

Looking for the constant 0x756e6547 in Debian MKL 2020.1.217-2 (presumably 2020 Update 1), it appears that this function is only one of three places that run a GenuineIntel check. The other two are our familiar __intel_mkl_features_init_body and a mkl_dft_ipp_is_GenuineIntel function.
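
For anyone who wants to repeat this kind of search: 0x756e6547 is simply the first four bytes of "GenuineIntel" read as a little-endian 32-bit integer, which is how the string comes back in EBX from CPUID leaf 0. Below is a small Python sketch that derives the three constants and scans a byte buffer for one of them. The buffer is a synthetic stand-in for a real library file, and find_constant is a hypothetical helper, not the tool used in the post above.

```python
import struct


def vendor_constants(vendor: bytes = b"GenuineIntel"):
    """Return the (EBX, EDX, ECX) values that CPUID leaf 0 yields for a vendor."""
    return struct.unpack("<III", vendor)


def find_constant(blob: bytes, value: int):
    """Return every offset where `value` occurs as a little-endian 32-bit word."""
    needle = struct.pack("<I", value)
    offsets = []
    pos = blob.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(needle, pos + 1)
    return offsets


ebx, edx, ecx = vendor_constants()
print(hex(ebx), hex(edx), hex(ecx))

# Synthetic example: nop; cmp ebx, 0x756e6547; jne +0x10
blob = b"\x90\x81\xfb" + struct.pack("<I", ebx) + b"\x75\x10"
print(find_constant(blob, ebx))   # [3]
```

A hit only shows that the constant appears at that offset. Deciding whether it belongs to a vendor check still takes a look at the surrounding instructions in a disassembler.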

Looking through the references does not give me half an idea how only overriding mkl_serv_intel_cpu_true() helps performance. Presumably this has something to do with the "Zen kernel" Daniel talks about. Since this symbol is externally available, it might be checked by the architecture-specific libraries instead.

   
Intel's "cripple AMD" function
Author: Walker Date: 2020-06-29 03:27
Check d3dx9_43.dll and d3dx10_43.dll. Branch selection dependent on cpuid "GenuineIntel" is still there and functional. By Microsoft.
lol. And AMD, in contrast, did lick Microsoft all over with their Windows 10 only (not really, 7 runs perfectly) support for Ryzen. Pathetic.
   
Intel's "cripple AMD" function
Author: Forsen Date: 2020-09-16 06:17
It makes no sense to deploy ICC compiled binaries to AMD users, nor would it be smart for Intel to align their optimization pipeline to another vendor. It's fairly cheap to add a few additional build tasks before you begin to publish your application. Checking the CPUID for the backing chip is trivial and branching the starting PE, or simply overwriting files, at the launching step isn't hard either. Preferably you want to avoid SIMD capability checks at runtime and instead do that before you actually start the internal process. This obviously requires some compilation overhead due to the fact that you need to build multiple binaries for essentially the same purpose, but since it's just done at the deployment stage, I see no decisive reason why this method wouldn't be superior. You can build your targets in parallel on different servers with different compilers and input switches.
And since LLVM should be faster than ICC anyways (*), why even bother about it anymore. This is the end game where micro optimizations in software make up for the hardware limitations.

*Check replies: https://www.agner.org/optimize/blog/read.php?i=1015
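
The launch-time selection described in this post could look roughly like the Python sketch below on Linux, where the kernel exposes CPU feature flags as a "flags" line in /proc/cpuinfo. The build file names and the BUILDS table are invented for illustration, and the sketch only parses text handed to it, so reading the real file and starting the chosen binary are left out.

```python
def parse_cpuinfo_flags(cpuinfo_text: str) -> set:
    """Extract the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Hypothetical builds, best first, with the flags each one requires.
BUILDS = [
    ("app_avx2.bin", {"avx2"}),
    ("app_sse42.bin", {"sse4_2"}),
    ("app_generic.bin", set()),
]


def pick_build(flags: set, builds=BUILDS) -> str:
    """Return the first build whose required flags are all present."""
    for filename, required in builds:
        if required <= flags:
            return filename
    raise RuntimeError("no compatible build")


sample = "vendor_id\t: AuthenticAMD\nflags\t\t: fpu sse2 ssse3 sse4_2 avx\n"
print(pick_build(parse_cpuinfo_flags(sample)))   # app_sse42.bin
```

Note that the vendor_id line is never consulted: the choice depends only on which instruction sets the CPU reports.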

   
Intel's
Author: Agner Date: 2020-09-16 10:31
Forsen wrote:
It's fairly cheap to add a few additional build tasks before you begin to publish your application. Checking the CPUID for the backing chip is trivial and branching the starting PE, or simply overwriting files, at the launching step isn't hard either.
It is quite expensive for a software company to develop, debug, test, verify, support, and maintain multiple versions of their code using different compilers. It is more convenient to use a single compiler that gives good performance on all brands of CPU.
   
Intel's "cripple AMD" function
Author: ETERNALBLUEbullrun Date: 2021-11-28 22:43
I think there may be some misunderstanding here.
Nobody thinks that Intel should go out of their way to help AMD, such as by making optimizations specific for AMD processors.

Rather, Intel licensed out an instruction set called x86, and there are standardized extensions to it that must be implemented exactly the same as Intel did or not at all.
All other compilers on Earth check if these extensions (such as SSE, AVX, AES-NI, etc.) are present from the CPUID flags, and Intel is also checking the CPUID flags, but they are checking for what brand of CPU it is rather than which extensions it supports.
If a compiler has a way to run 16 computations at once (per thread) using AVX, for example, it does not need to make an AMD version.
It just needs to check if AVX is supported by the current CPU (not all Intel CPUs, nor all AMD CPUs, have this extension) and make use of that if available (in code that can be vectorized, the throughput increases 1600%, whereas micro-optimization is generally a gain far less than 10% or so.)
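
The feature check described above comes down to testing a few bits in the registers returned by CPUID leaf 1. The Python sketch below decodes those bits from register values passed in as plain integers. The bit positions are the standard ones for leaf 1, but the function names and the example register values are made up for illustration. Real code also has to confirm with XGETBV that the operating system saves the YMM registers, which is only noted here and not modeled.

```python
# Feature bits in CPUID leaf 1 (EAX=1), by register.
LEAF1_ECX_BITS = {
    "sse3": 0, "ssse3": 9, "sse4.1": 19, "sse4.2": 20,
    "aes": 25, "osxsave": 27, "avx": 28,
}
LEAF1_EDX_BITS = {"sse": 25, "sse2": 26}


def decode_leaf1(ecx: int, edx: int) -> set:
    """Return the names of the features whose bits are set in ECX/EDX."""
    found = {name for name, bit in LEAF1_ECX_BITS.items() if (ecx >> bit) & 1}
    found |= {name for name, bit in LEAF1_EDX_BITS.items() if (edx >> bit) & 1}
    return found


def may_use_avx(ecx: int, edx: int) -> bool:
    """CPU-side precondition for AVX: both the AVX and OSXSAVE bits are set.

    A complete check would additionally read XCR0 via XGETBV; that step
    cannot be expressed in this register-only sketch.
    """
    return {"avx", "osxsave"} <= decode_leaf1(ecx, edx)


# Made-up register values: SSE3, OSXSAVE and AVX in ECX; SSE and SSE2 in EDX.
ecx = (1 << 0) | (1 << 27) | (1 << 28)
edx = (1 << 25) | (1 << 26)
print(sorted(decode_leaf1(ecx, edx)))
print(may_use_avx(ecx, edx))
```

Nothing in this check takes the vendor string as input, which is the point the post is making.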

The brand name is not in any way related to which extensions the currently running CPU supports. Intel's compiler is still going out of its way to cripple competitors. This has nothing to do with going out of their way to accommodate AMD or not.
Imagine if when you tried to set a default browser, Microsoft made it so Windows would not let you unless the browser was from Microsoft, even if all browsers were standardized and it was easier to just check if the browser was supported on your computer.
Microsoft did this and got taken to court for it. They lost billions of dollars. Just for preventing competing browsers from bundling with Windows!
What Intel is doing here is far worse. No project run by volunteers with limited time, nor any commercial project with budgetary/time constraints, can afford to port everything they own to another compiler. Remember, many people start using Intel's compiler without any warning of this anticompetitive behaviour.

And no, unless you are talking about a personal pet/toy project, it is not trivial to port everything to a new compiler.
There are intrinsics (heavily promoted by Intel) which must all be rewritten.
There are cross-organization interactions of code that require everyone to use the same compiler for the duration of a project.
There are all sorts of legacy issues where you may end up with some binaries/libraries that you no longer have the source code to (or never had the source to,) and they may not have been certified to be used with code generated from other compilers. All throughout industry the costs of this could be infeasible, even where it is possible at all (which is mostly small projects.)

   
Intel's
Author: Agner Date: 2021-11-28 23:40
ETERNALBLUEbullrun wrote:
Remember, many people start using Intel's compiler without any warning of this anticompetitive behaviour.
Early versions of Intel compilers had no warnings about discriminatory code. This warning was added after the legal action from the FTC in 2010.

Today, the GCC and Clang compilers are optimizing better than the Intel compiler in many cases. Most programmers now prefer to use compilers other than Intel's, unless they need some specific feature.

Programmers still have problems if they need the Intel Math Kernel Library and there is no good alternative.

   
New Intel compiler. Latest update
Author: Agner Date: 2022-08-08 04:58
The new Intel compiler is better. For latest updates, see my new forum: link.
   
MKL performance on AMD with the new compiler
Author: Gil Moses Date: 2022-08-22 09:02
Hi Agner,

I've read through your blog (focusing on the recent posts), and found it very informative.
I hope you could help with some questions I have:

There's a new version of MKL coming with the new compiler, namely 2022.1.0.
Does that version, too, need to be patched to overcome Intel's discrimination of non-Intel processors?

I tried to use both intel_mkl_cpuid_patch.cpp and intel_mkl_feature_patch.cpp you mention. Now:

- The cpuid patch failed to link due to duplicate symbols, e.g.:
mkl_core.lib(mkl_cpuid_patched.obj) : error LNK2005: mkl_serv_intel_cpu already defined in intel_mkl_cpuid_patch.cpp.obj
It appears that mkl_core source code includes the file "mkl_cpuid_patched.*" - what could that mean?

- I have trouble understanding how the feature patch works. The code generated doesn't seem like it's doing anything special:

__intel_mkl_features_init_x:
00007FFE13F6A230 xor eax,eax
00007FFE13F6A232 jmp __intel_mkl_features_init+10h (07FFE13F6A250h)
00007FFE13F6A237 nop word ptr [rax+rax]
__intel_mkl_features_init:
00007FFE13F6A240 mov eax,1
00007FFE13F6A245 jmp __intel_mkl_features_init+10h (07FFE13F6A250h)
00007FFE13F6A24A nop word ptr [rax+rax]
00007FFE13F6A250 push rdx
00007FFE13F6A251 push rcx

... am I missing something?

The final and largest issue is, I need to use the dll version of MKL, actually we create our own compact mkl dll to be used by many of the products. Is there a way to patch when using a dll and not a static lib?

Thanks,
Gil.

   
MKL performance on AMD with the new compiler
Author: Agner Date: 2022-08-22 09:48
Gil Moses wrote:
There's a new version of MKL coming with the new compiler, namely 2022.1.0. Does that version, too, need to be patched to overcome Intel's discrimination of non-Intel processors?

This messageboard is no longer used. Please see the latest update on this topic at my new forum: link.

The new version of MKL seems to work well on AMD processors without a patch, especially if you use it with another compiler than Intel. I have asked Intel to confirm this, but I have not received a useful answer yet. I suggest that you ask them too.