Software optimization resources | E-mail subscription to this blog | www.agner.org
| Threaded View | Search | List | List Messageboards | Help |
| Intel's "cripple AMD" function |
|---|
| Author: Agner Fog | Date: 2009-12-30 10:22 |
Will Intel be forced to remove the "cripple AMD" function from their compiler?Many software programmers consider Intel's compiler the best optimizing compiler on the market, and it is often the preferred compiler for the most critical applications. Likewise, Intel is supplying a lot of highly optimized function libraries for many different technical and scientific applications. In many cases, there are no good alternatives to Intel's function libraries. Unfortunately, software compiled with the Intel compiler or the Intel function libraries has inferior performance on AMD and VIA processors. The reason is that the compiler or library can make multiple versions of a piece of code, each optimized for a certain processor and instruction set, for example SSE2, SSE3, etc. The system includes a function that detects which type of CPU it is running on and chooses the optimal code path for that CPU. This is called a CPU dispatcher. However, the Intel CPU dispatcher does not only check which instruction set is supported by the CPU, it also checks the vendor ID string. If the vendor string says "GenuineIntel" then it uses the optimal code path. If the CPU is not from Intel then, in most cases, it will run the slowest possible version of the code, even if the CPU is fully compatible with a better version. I have complained about this behavior for years, and so have many others, but Intel have refused to change their CPU dispatcher. If Intel had advertised their compiler as compatible with Intel processors only, then there would probably be no complaints. The problem is that they are trying to hide what they are doing. Many software developers think that the compiler is compatible with AMD processors, and in fact it is, but unbeknownst to the programmer it puts in a biased CPU dispatcher that chooses an inferior code path whenever it is running on a non-Intel processor. If programmers knew this fact they would probably use another compiler. Who wants to sell a piece of software that doesn't work well on AMD processors? Because of their size, Intel can afford to put more money into their compiler than other CPU vendors can. The Intel compiler is relatively cheap, it has superior performance, and the support is excellent. Selling such a compiler is certainly not a profitable business in itself, but it is obviously intended as a way of supporting Intel's microprocessors. There would be no point in adding new advanced instructions to the microprocessors if there were no tools to use these instructions. AMD is also making a compiler, but the current version supports only Linux, not Windows. Various people have raised suspicion that the biased CPU dispatching has made its way into common benchmark programs (link link). This is a serious issue indeed. We know that many customers base their buying decision on published benchmark results, and a biased benchmark means an unfair market advantage worth billions of dollars.
The legal battleAMD have sued Intel for unfair competition at least since 2005, and the case has been settled in November 2009. This settlement deals with many issues of unfair competition, apparently including the Intel compiler. The settlement says:
This looks like a victory for AMD. If we read "any Intel product" as Intel's compilers and function libraries, "any Third Party" as programmers using these compilers and libraries, and "Artificial Performance Impairment" as the CPU dispatcher checking the vendor ID string; then the settlement puts an obligation on Intel to change their CPU dispatcher. I will certainly check the next version of Intel's compiler and libraries to see if they have done so or they have found a loophole in the settlement. Interestingly, this is not the end of the story. Only about one month after the AMD/Intel settlement, the US Federal Trade Commission (FTC) filed an antitrust complaint against Intel. The accusations in the FTC complaint are unusually strong:
The remedy that the FTC asks for is also quite farreaching:
Maybe the FTC has decided that the AMD/Intel settlement was not a fair and sufficient remedy against Intel's monopoly behavior? The settlement compensates AMD, but not VIA and other microprocessor vendors, and not the customers who have been harmed by insufficient competition and by the "defective" software produced with the Intel compiler.
My own findingsWhen I started testing Intel's compiler several years ago, I soon found out that it had a biased CPU dispatcher. Back in January 2007 I complained to Intel about the unfair CPU dispatcher. I had a long correspondence with Intel engineers about the issue, where they kept denying the problem and I kept providing more evidence. They said that:
Sounds nice, but the truth is that the CPU dispatcher didn't support SSE or SSE2 or any higher SSE in AMD processors and still doesn't today (Intel compiler version 11.1.054). I have later found out that others have made similar complaints to Intel and got similarly useless answers (link link). The Intel CPU dispatcher does not only check the vendor ID string and the instruction sets supported. It also checks for specific processor models. In fact, it will fail to recognize future Intel processors with a family number different from 6. When I mentioned this to the Intel engineers they replied:
In other words, they claim that they are optimizing for specific processor models rather than for specific instruction sets. If true, this gives Intel an argument for not supporting AMD processors properly. But it also means that all software developers who use an Intel compiler have to recompile their code and distribute new versions to their customers every time a new Intel processor appears on the market. Now, this was three years ago. What happens if I try to run a program compiled with an old version of Intel's compiler on the newest Intel processors? You guessed it: It still runs the optimal code path. But the reason is more difficult to guess: Intel have manipulated the CPUID family numbers on new processors in such a way that they appear as known models to older Intel software. I have described the technical details elsewhere. Perhaps the initial design of Intel's CPU dispatcher was indeed intended to optimize for known processor models only, without regard for future models. If any of my students had made such a solution that was not future-oriented, I would consider it a serious flaw. Perhaps the Intel engineers discovered the missing support for future processors too late so that they had to design the next generation of their processors in such a way that they appeared as known models to existing Intel software. After Intel had flatly denied to change their CPU dispatcher, I decided that the most efficient way to make them change their minds was to create publicity about the problem. I contacted several IT magazines, but nobody wanted to write about it. Sad, but not very surprising, considering that they all depend on advertising money from Intel. The only publicity was my own optimization manual where I have described the problem in detail and given instructions on how to replace the unfair CPU dispatcher. I wonder why AMD didn't create public awareness about the problem. Were they obliged to keep quiet about an ungoing lawsuit? And what about VIA/Centaur?
WorkaroundsAt present, we don't know if or when Intel will make a new compiler and new software libraries that do not check the vendor ID string. In the meantime, here is what we can do about the problem.
LinksMy Discussion in Aceshardware forum 2007. Discussion in AMD Developer Forums 2008. My Discussion in AMDzone 2009. Complaint to Intel 2004, discussion in slashdot.org. Mark Mackey, complaint to Intel 2005. PCMark 2005 benchmark proven unfair. Arstechnica. Testimony by John Oram regarding BAPCo benchmark organization. Comment on AMD Developer Central 2005. AMD untitrust complaints 2005. Settlement agreement between AMD and Intel, 2009. Technical details in my C++ optimization manual. [Added later:] | |
| Reply To This Message |
| Intel's |
|---|
| Author: | Date: 2010-01-01 12:50 |
| About… «It is possible to change the CPUID of AMD processors by using the AMD virtualization instructions. I hope that somebody will volunteer to make a program for this purpose.» How exactly we can do that? As of now the simpliest option is to change vendor strings embedded in executables to compare with CPU's. We're at iXBT.com are doing this right now and will publish test results soon (Intel & AMD CPU's, „Intel“—>„AMD“ & „AMD“—>„Intel“ string change, 4-6 runs per app, 50-100 app's). If it is OK to use your findings (with all the copyrights), we will :) Felid | |
| Reply To This Message |
| Intel's |
|---|
| Author: Agner Fog | Date: 2010-01-02 03:22 |
Felid wrote:«It is possible to change the CPUID of AMD processors by using the AMD virtualization instructions. I hope that somebody will volunteer to make a program for this purpose.»It would work the same way as a virtualization tool. If you can get your hands on the source code of an existing virtualization tool for AMD processors you can add this as an extra feature. As of now the simpliest option is to change vendor strings embedded in executables to compare with CPU's. We're at iXBT.com are doing this right now and will publish test results soon (Intel & AMD CPU's, „Intel“—>„AMD“ & „AMD“—>„Intel“ string change, 4-6 runs per app, 50-100 app's). If it is OK to use your findings (with all the copyrights), we will :)Good initialive. I'm looking forward to the results. Remember to check for the vendor string "Genu", "ineI", "ntel" in all DLLs as well. | |
| Reply To This Message |
| Intel's "cripple AMD" function |
|---|
| Author: | Date: 2010-01-03 18:37 |
| I think it's arguable whether or not Intel crippled AMD via an "affirmative engineering or design action" as opposed to a "failure to act" (as distinguished in the settlement). They optimize code paths based on Vendor ID, which obviously has to do with more than supported instruction sets, or they wouldn't have to do it by processor family, etc. I doubt Intel can be required to optimize specifically for a CPU that's not theirs (also implying they wouldn't necessarily know its internal workings in order to do so). Yes, it may be true that the same code paths they used for Intel CPUs would have optimized well for AMD CPUs, but Intel probably isn't required to know that. Also, their code path optimizations are apparently targeted for specific vendors (namely, theirs, I guess) and models, not generic strategies, so it would have been a hack for them to apply an Intel-derived code path to AMD processors, because, while being better than no optimizer, the code might also include superfluous and deleterious algorithms as applied to AMD CPUs. Yes, it's pragmatic to do so and would have been better for consumers, but can we ethically require them to engineer a dirty hack into their programming structure? Another reason it's a gray area is that it's possible that the code path optimizations they took were obvious and would likely apply to any modern x86 CPU (though the fact that AMD and Intel are the only two players in the game sort of makes it beg the question), and if only some of them were obvious those optimizations may or may not be modularized already for easy extraction and universal use. From a business standpoint, this was obviously a strategic decision on Intel's part to undermine AMD, but if it's an ideological grey area whether they undermined AMD or simply took advantage of the fact that they happen to be the producers of one of the CPUs they compile for in order to improve their compiler for their *own* product, then they can't necessarily be faulted, because the law requires evidence and analysis. (Taking in-house advantage isn't necessarily anti-competitive in itself; for example, Microsoft makes plenty of applications that run on their own OS but not on Linux or Mac OS.) It is true that the settlement they issued flat-out admits wrongdoing; i.e., it implies that they did, in fact, effect an "affirmative or design action" meant to undermine AMD. As far as I can see there are three possible reasons for this. Of course it complicates the matter that there is actually a complete instruction set (SSE) that they don't support for the AMD. They have an excuse for that. Whether or not it's valid who knows. Since they do support that instruction set, it would seem deliberate "affirmative action" to disable it on AMD since AMD supports it in exactly the same way that Intel does on a purely functional level (independently of speed), and SSE is generally more efficient when applicable. However, if the decisions for how and when to use SSE instructions are intricately tied in with the rest of their code path algorithm (and possibly rely on internal structure of the CPU design), then the caveats I brought up earlier still apply. In any case, whether not supporting optimizations on AMD's CPUs was an affirmative design decision to undermine AMD machines or merely a failure to act (to benefit AMD machines), either way, it's clearly wrong for them to publish benchmarks to OEMs, etc. comparing AMD CPUs to Intel CPUs using their own compiler that specifically optimizes for Intel CPUs (based on Vendor ID no less!, but either way) and not for AMD CPUs. It's misleading, and according to the UTC, even when specifically confronted with the issue they would habitually either mislead or directly lie about the cause for the speed difference and whether it could be solved. So *that's* the part that's really devious, and I can see why the FTC sued them. I *hate* companies like that. Incidentally, though, all companies are companies like that. | |
| Reply To This Message |
| Intel's |
|---|
| Author: Agner Fog | Date: 2010-01-04 03:28 |
inhahe wrote:I think it's arguable whether or not Intel crippled AMD via an "affirmative engineering or design action" as opposed to a "failure to act" (as distinguished in the settlement).Checking for vendor ID is an affirmative action. The grey area is whether they are optimizing for specific CPU models or for specific instruction sets. There is only one case where they distinguish between different CPU models that have the same instruction set, namely Pentium 4 versus Pentium M. In most cases, however, they use the same code path for both, or the two paths are identical or almost identical. The distinction may be unimportant from a technical point of view, but it may give Intel a legal excuse for claiming that they are optimizing for specific CPU models. I doubt Intel can be required to optimize specifically for a CPU that's not theirsThe settlement doesn't require that. Another reason it's a gray area is that it's possible that the code path optimizations they took were obvious and would likely apply to any modern x86 CPU (though the fact that AMD and Intel are the only two players in the game sort of makes it beg the question),Most optimizations are indeed obvious applications of the available instruction set. If you have SSE2 you can do four additions in one instruction. That's an obvious thing to do regardless of CPU model. Don't forget there is a third player, VIA. Their chips are fast enough for being relevant here. given the small excerpt of the settlement shown, it seems possible to me that what they *actually* did is make something up that will sate AMD's lawyers while at the same time leaving the door open for them to either continue the same practice, or cease the practice (if it's too obviously anti-competitive or if they explicitly said they'd cease it elsewhere) but instate similar and/or related practices in the future, on account of the fact that those practices can easily be classified as "failures to act." [...] However, if the decisions for how and when to use SSE instructions are intricately tied in with the rest of their code path algorithm (and possibly rely on internal structure of the CPU design), then the caveats I brought up earlier still apply.Yes, they will probably be able to claim that. From a merely technical perspective, I think it's a bad idea to make different code paths for two processors that support the same instruction set based on whether a particular instruction runs a little faster on one than on the other. If you consider the time it takes to develop a complete program plus the time it takes to market it, then it is likely that the processors you optimized for will be obsolete for your most demanding customers before the time your software peaks on the market. My advise would certainly be to optimize for the newest processor, but make sure you maintain compatibility with older processors. But of course Intel compiler engineers are not obliged to listen to my advice if doing otherwise enables them to harm their competitors. In any case, whether not supporting optimizations on AMD's CPUs was an affirmative design decision to undermine AMD machines or merely a failure to act (to benefit AMD machines), either way, it's clearly wrong for them to publish benchmarks to OEMs, etc. comparing AMD CPUs to Intel CPUs using their own compiler that specifically optimizes for Intel CPUs (based on Vendor ID no less!, but either way) and not for AMD CPUs. It's misleading, and according to the UTC, even when specifically confronted with the issue they would habitually either mislead or directly lie about the cause for the speed difference and whether it could be solved. So *that's* the part that's really devious, and I can see why the FTC sued them. I *hate* companies like that. Incidentally, though, all companies are companies like that.Fortunately, not all companies are like that. I am sure this case has harmed Intel's reputation. They can be damn sure that their next compiler version will be thoroughly scrutinized. Hopefully, they will take their reputation into account when they design the next compiler version and function libraries. | |
| Reply To This Message |
| Intel's compiler is the best? |
|---|
| Author: Weber | Date: 2010-01-04 16:46 |
| You wrote: "There is no other Windows compiler with a similar performance, not even the Gnu compiler for Windows." Which compilers did you test? Did you try e.g. the Portland Group (PGI) compiler? That means, does Intel's compiler (icc) produce faster executables than e.g. the PGI compiler for Intel CPU's? Does icc (with a patched CPU dispatcher) even produce faster executables than e.g. PGI for AMD CPU's? Not that I'd try to challenge your claims (I haven't benchmarked any of the compilers), I'm just interested. Thanks! :-) | |
| Reply To This Message |
| Intel's compiler is the best? |
|---|
| Author: Agner Fog | Date: 2010-01-09 04:23 |
Weber wrote:Which compilers did you test?You can see my comparison of compilers in my C++ manual. The PathScale and PGI compilers are also fairly good, but not the best. | |
| Reply To This Message |
| Intel article |
|---|
| Author: Agner Fog | Date: 2010-01-22 04:04 |
| Intel have just published an article on how the CPU dispatching works in the Intel Performance Primitives (IPP) function library, see software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-understanding-cpu-optimized-code-used-in-intel-ipp/ The article indicates a fair handling of non-Intel processors in the IPP library. This is in accordance with my test results. What the article doesn't mention is the unfair CPU dispatching in several other Intel function libraries :-) | |
| Reply To This Message |
| Web Parallels |
|---|
| Author: | Date: 2010-01-04 10:12 |
| Really fascinating article. I've generally deferred to open source compilers, like GCC, but that's largely been because of preferring open source to proprietary software when possible. I'd not realized that Intel was doing this in their compilers, which is a major concern for me, since I purchased an Intel desktop processor for home use in well over a decade. What I found most interesting, is how this seems to parallel a fight we've been having in the web development community a lot over the last five years or so, on the benefits of feature detection versus browser detection. I'll defer to Nicholas Zakas' explanation here: www.nczonline.net/blog/2009/12/29/feature-detection-is-not-browser-detection/ Now, on the web it's a bit trickier, web browsers' behaviour tends to be considerably more variable than x86 behaviour across different vendors, but ultimately, the desire is the same. Check what is supported, and select code-paths based on what's supported, not based on whether you're on IE or Safari or Firefox, etc. And the bad design is generally made by the developer, not some third-party vendor (though not always, as Nicholas talks about in his post). I'm going to be publicizing this issue, to the best of my ability, since I think it's important that people understand the implications of choosing to use Intel's compiler, which is otherwise quite high quality. And, while I didn't understand why the antitrust suit was filed by the US government when it was announced, I'm certainly far more interested in it's outcome than I was before. | |
| Reply To This Message |
| More Parallels |
|---|
| Author: Agner Fog | Date: 2010-01-23 02:28 |
| I have always thought that it is bad programming practice to make software that relies on specific CPU brands and models. Any program that makes assumptions based on the CPU model number is likely to be obsolete as soon as a new processor appears on the market.
Unfortunately, this practice is more widespread than I thought. Here are some examples:
| |
| Reply To This Message |
| Early Examples |
|---|
| Author: | Date: 2010-02-01 22:03 |
| In fact, it dates to when AMD released the Athlon XP back in 2001, which was I think the first non-Intel processor that supported SSE. Back then it was discovered that Windows Movie Maker 1.1 shipped with the original RTM release of Windows XP as well as Windows Movie Encoder 7 did not use SSE on non-Intel processors. Luckily by the time of the XP launch in October 2001, MS was ready with WMM 1.2, as well as WME 7.1, which removed the vendor check, and AMD itself had patches as well. BTW, It was reported that it accounted for a dip in the Sysmark 2001 and Winstone 2002 Content Creation benchmarks. References: www.geek.com/articles/chips/wmm-v12-adds-athlon-xp-sse-support-for-wme-20020110/ www.anandtech.com/showdoc.aspx?i=1543&p=5 www.tomshardware.com/forum/68600-28-toms-super-comparision | |
| Reply To This Message |
| More Parallels |
|---|
| Author: | Date: 2010-02-20 23:56 |
| Also, currently OpenSolaris checks for a vendor of GenuineIntel before using SSSE3 and later extensions. While that wasn't a problem initially when the code was written because only Intel implemented them and AMD was going to go their own path with SSE5, VIA now implement them too with the Nano, and AMD later decided to change and implement all of them up to AVX in Bulldozer.
Bug has been filed at: | |
| Reply To This Message |
| Intel's "cripple AMD" function |
|---|
| Author: | Date: 2010-01-06 11:42 |
It is possible to change the CPUID of AMD processors by using the AMD virtualization instructions. I hope that somebody will volunteer to make a program for this purpose. This will make it easy for anybody to check if their benchmark is fair and to improve the performance of software compiled with the Intel compiler on AMD processors. And VIA processors can do this too, which would be useful not only for this, but also because some 64-bit OSes include a hardcoded list of CPU vendor IDs that the OS will run on. Windows is a common 64-bit OS that include such a list. Often older versions of such an OS will not include the CentaurHauls vendor ID. Windows would react by bluescreening the computer at boot. MS was able to release a hotfix that added CentaurHauls to the list, and it is included in Vista SP2 and 7. | |
| Reply To This Message |
| Intel's |
|---|
| Author: Agner Fog | Date: 2010-01-09 04:42 |
Yuhong Bao wrote:
And VIA processors can do this too, which would be useful not only for this, but also because some 64-bit OSes include a hardcoded list of CPU vendor IDs that the OS will run on. Windows is a common 64-bit OS that include such a list. Often older versions of such an OS will not include the CentaurHauls vendor ID. Windows would react by bluescreening the computer at boot. MS was able to release a hotfix that added CentaurHauls to the list, and it is included in Vista SP2 and 7.A list of known CPUs is really bad programming practice. Such a list is almost certain to be obsolete long before the software reaches the last customer. Software should make decisions based on what the CPU can do not on its name or model number. In the rare case where necessary information can't be extracted from a CPUID it is better to do a test than to rely on a list of known models. If somebody makes an x64 CPU emulator or a softcore in VHDL or another chip producer decides to go into the competition, then it wouldn't be able to run Windows unless you put in a known vendor ID, which would be a violation of trademark law if you publish the thing! PS. If you know how to change the vendor string on a VIA then please tell me. I can't find it and I have tried everything. | |
| Reply To This Message |
| Intel's |
|---|
| Author: | Date: 2010-01-15 01:47 |
| Yea, unfortunately VIA has not released public datasheets for the C7 and later CPUs. But according to the C3 Nehemiah datasheet at datasheets.chipdb.org/VIA/Nehemiah/VIA%20C3%20Nehemiah%20Datasheet%20R113.pdf, it was done using the FCR2 and FCR3 MSRs. | |
| Reply To This Message |
| New CPUID manipulation program |
|---|
| Author: Agner Fog | Date: 2010-01-22 03:54 |
| It it possible to manipulate the CPUID in VIA Nano processors. This feature is currently undocumented, and it is different from the method described in manuals for earlier VIA processors. I have got the necessary information now, and I have made a little program that can change the CPUID vendor string, family and model number on a VIA Nano processor. You can download the program from www.agner.org/optimize/cpuidfake.zip. It is open source (GPL). Please read the included instructions. The program requires a VIA Nano processor, and Windows (32 or 64 bit). This program makes it possible to test if a benchmark result depends on the CPUID vendor string. You can also test if the performance of other CPU-intensive programs perform differently depending on the CPUID. If you find any benchmark or other generally used CPU-intensive program that appears to have an unfair CPU dispatching then please let me know. You may also contact the producer of the program and ask if it has been built with an Intel compiler or Intel function libraries. It is important to know how widespread this problem is.
| |
| Reply To This Message |
| AMD Blog on compilers/benchmarch |
|---|
| Author: | Date: 2010-02-01 18:50 |
| Posted a blog on compilers/benchmarks - Chipping Away the Façade on Compilers and Benchmarks for AMD Processors blogs.amd.com/work/2010/01/22/chipping-away-the-facade-on-compilers-and-benchmarks-for-amd-processors/ | |
| Reply To This Message |
| New version is still crippling Intel's competitors |
|---|
| Author: Agner Fog | Date: 2010-06-29 04:16 |
| Intel have released a new version of their Math Kernel Library (v. 10.3) in beta test.
I have tested the new libraries and found that the CPU dispatching works basically the same way as before. The standard math library, vector math library, short vector math library and the 64-bit version of other math kernel library functions still use an inferior code path for non-Intel processors. I have found the following differences from previous versions:
Obviously, I haven't tested all functions in the library. There may be more differences that I haven't discovered. But it is clear that many functions in the new version of the library still cripples performance on non-Intel processors. I don't understand how they can do this without violating the legal settlement with AMD. | |
| Reply To This Message |
| Threaded View | Search | List | List Messageboards | Help |