Agner's CPU blog

How good is hyperthreading?
Author: Agner Fog Date: 2009-09-29 04:35
Today, most high-end microprocessors have two or more cores. Multi-threaded applications take advantage of multi-core processors by running multiple threads simultaneously. If you are running four threads simultaneously on a processor with four cores, you get four times as much work done per unit of time.

Some processors take multithreading even further by running two threads in each core. This is what Intel calls hyperthreading (also called simultaneous multithreading). For example, the Intel Core i7 processor with four cores can run eight threads simultaneously - two in each core. Apparently, the more threads you can run simultaneously, the more work you get done in a given time. But there is a problem here: The two threads running in the same core are competing for the same resources. If each of the two threads gets only half of a limiting resource, then it will run at half speed, and the advantage of hyperthreading is completely gone. Two threads running at half speed are certainly not better than a single thread running at full speed.

I have made some tests of hyperthreading to see how fast each of the two threads is running. The following resources are shared between two threads running in the same core:

  • Cache
  • Branch prediction resources
  • Instruction fetch and decoding
  • Execution units

Hyperthreading is no advantage if any of these resources is a limiting factor for the speed. But hyperthreading can be an advantage if the speed is limited by something else. To be more specific, each of the two threads will run at more than half speed in the following cases:

  • If memory data are so scattered that there will be many cache misses regardless of whether each thread can use the full cache or only half of it. Then one thread can use all the execution resources while the other thread is waiting for a memory operand that was not in the cache.
  • If there are many branch mispredictions and the number of branch mispredictions is not increased much by sharing the branch target buffer and branch history table between two threads. Then one thread can use all the execution resources while the other thread is waiting for the misprediction to be resolved.
  • If the code has many long dependency chains that prevent efficient use of the execution units.

In these cases, each of the two threads will run at more than half speed, but less than full speed. The total performance is never doubled by hyperthreading, but it may be increased by e.g. 25%. On the other hand, if the performance is limited by any of the shared resources, for example the instruction fetcher, the memory read port, or the multiply unit, then the total performance is not increased by hyperthreading. Actually, in the worst cases the total performance is decreased by hyperthreading because resources are wasted when the two threads compete for them. A quick Google search reveals several examples of applications that run slower with hyperthreading enabled than with it disabled.
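
As a rough sketch (my own illustration, not a tuned benchmark), the kind of loop where hyperthreading can be expected to help looks like this: each iteration depends on the result of the previous one, so a single thread uses only a small fraction of the core's execution capacity, and a second thread running in the same core can fill the idle cycles. Timing two copies of such a loop, pinned to the same core, with hyperthreading on and off is a simple way to see the effect.

// Latency-bound loop: a long dependency chain of floating point operations.
// One thread alone cannot keep the execution units busy, so a second thread
// in the same core may run almost for free.
#include <cstdio>

double dependency_chain(long n) {
    double x = 1.0;
    for (long i = 0; i < n; i++) {
        // Each iteration needs the result of the previous one, so the loop
        // runs at the latency of the multiply-add chain rather than at the
        // throughput of the execution units.
        x = x * 0.9999999 + 1.0;
    }
    return x;
}

int main() {
    // Print the result so the compiler cannot optimize the loop away.
    printf("%f\n", dependency_chain(100000000L));
    return 0;
}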

I have tested two microprocessors with hyperthreading: the Intel Core i7 and the Intel Atom. The Core i7 has four cores. This processor is quite powerful. The execution units of each core are so powerful that a single thread will rarely utilize their full potential. Therefore, it makes good sense to run two threads in the same core. Unfortunately, the instruction fetch unit is less powerful, and this is likely to be a bottleneck even in single-threaded applications. With hyperthreading enabled, the Core i7 can run eight threads simultaneously. This can give impressive performance in favorable cases, but how many applications are able to keep eight threads busy at the same time?

The Intel Atom is a small low-power processor used in netbooks and embedded applications. It has two cores capable of running two threads each. The execution units of the Atom are much smaller than those of the i7. It sounds like a weird idea to share the already meager execution units between two threads. The rationale is that the Atom lacks the out-of-order capabilities of the bigger processors. When the execution unit is waiting for an uncached memory operand or some other long-latency event, it would have nothing else to do in the meantime unless there was a second thread it could work on.

The details of these processors are explained in my microarchitecture manual www.agner.org/optimize/#manuals.

Obviously, it can be quite difficult for a software programmer to predict whether hyperthreading is good or bad for a particular application. The only safe way of answering this question is to test it. Ideally, the programmer should test his or her application on several different microprocessors with several different data sets and with hyperthreading turned on and off. This is a large burden indeed to put on software developers, and very few programmers are willing to spend time and money on testing how hyperthreading affects their application.

If it turns out that hyperthreading is not good for a particular application then comes the next problem of how to turn it off. Telling the user to turn off hyperthreading in the BIOS setup is not an option. The average user may not have the skills to do so; the feature may not be supported in the BIOS; or it may be that hyperthreading is good for one program and bad for another program running on the same computer. The programmer has to put the "avoid hyperthreading" feature into the program. First the program has to detect whether the computer it is running on has hyperthreading or not. Later versions of Windows have system functions that can give this information. In Linux you have to read a configuration file. If hyperthreading is detected then lock the process to use the even-numbered logical processors only. This will make one of the two threads in each processor core idle so that there is no contention for resources.
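
A rough Linux sketch of this could look as follows. It assumes that sibling hyperthreads are numbered as adjacent even/odd pairs, which is not guaranteed on every system, so the topology files under /sys should be checked on the actual machine; on Windows, the corresponding functions are GetLogicalProcessorInformation and SetProcessAffinityMask.

#ifndef _GNU_SOURCE
#define _GNU_SOURCE          // for CPU_ZERO, CPU_SET and sched_setaffinity
#endif
#include <sched.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    // Read the list of logical CPUs that share a core with CPU 0,
    // e.g. "0,4" or "0-1". A single entry means no hyperthreading.
    char siblings[64] = "";
    FILE* f = fopen("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list", "r");
    if (f) {
        if (!fgets(siblings, sizeof(siblings), f)) siblings[0] = '\0';
        fclose(f);
    }
    bool hyperthreading = strpbrk(siblings, ",-") != 0;

    if (hyperthreading) {
        // Allow only the even-numbered logical processors, so that one of
        // the two threads in each core stays idle.
        int ncpus = (int)sysconf(_SC_NPROCESSORS_ONLN);
        cpu_set_t mask;
        CPU_ZERO(&mask);
        for (int cpu = 0; cpu < ncpus; cpu += 2) CPU_SET(cpu, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) perror("sched_setaffinity");
    }
    printf("hyperthreading detected: %s\n", hyperthreading ? "yes" : "no");
    return 0;
}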

Unfortunately, you cannot prevent the operating system from using the idle logical processors for something else. There is no way to tell the microprocessor to give one of the two threads in a core higher priority than the other. Sometimes it happens that the operating system lets two threads with very different priorities run in the same processor core. This has the unfortunate consequence that the low-priority thread steals resources from the high-priority thread. I have seen this happening even with the new Windows 7. It is the responsibility of the operating system to avoid putting threads with different priorities into the same core, but operating system designers haven't fully solved this problem yet.

What the application programmer needs is a system call that tells the operating system that "This application wants to run no more than one thread in each core and I don't want to share any core with any other processes". Unfortunately, current operating systems have no such system call to my knowledge.

Other microprocessor vendors use hyperthreading as well. In fact, there are rumors that AMD will use hyperthreading in some of their processors in the future.

Hyperthreading does indeed give a measurable advantage that shows in benchmark tests. This is a strong sales argument that may convince the confused consumer. But the microprocessor designer should also take into account that few applications are able to handle hyperthreading optimally. This is a technology that places a considerable burden on software developers as well as on operating system designers. We may ask whether the silicon space that is used for implementing hyperthreading might be better used for other purposes.

   
How good is hyperthreading?
Author: StuffMaster Date: 2009-10-30 09:34
Actually, hyperthreading *is* SMT. It's just Intel's name for it. Running 4 threads on 4 processors is SMP (regardless of whether they're threads or processes, I think).
   
How good is hyperthreading?
Author: Agner Fog Date: 2009-10-30 09:48
StuffMaster wrote:
Actually, hyperthreading *is* SMT. It's just Intel's name for it.

I would say that hyperthreading is a subset of simultaneous multithreading.

If you are running two threads in two different cores you have simultaneous multithreading, but not hyperthreading.

   
How good is hyperthreading?
Author: Andrew Rodland Date: 2009-10-31 01:00
Multiple threads on multiple processors/cores is "multithreading" and of course it's "simultaneous" but it's still not what's meant by SMT. SMT is multiple threads of execution *per core*, unless you want to confuse the whole world.
   
How good is hyperthreading?
Author: Agner Fog Date: 2009-10-31 02:47
Andrew Rodland wrote:
SMT is multiple threads of execution *per core*.

The definition of SMT in Wikipedia is not clear and the reference links are dead. Can anybody point to an authoritative reference?

I think the term SMT should be avoided because it is ambiguous. However, I am not sure the term hyperthreading is better. Hyperthreading is not ambiguous, but it's not self-explanatory either. I don't know if the term hyperthreading is trademarked.

   
How good is hyperthreading?
Author: Gabe Parmer Date: 2009-11-03 09:14
As defined in Levy's ISCA paper, SMT == hyperthreading.

www.cs.washington.edu/research/smt

and the publication:

"Simultaneous Multithreading: Maximizing On-Chip Parallelism"
with ps link @
www.cs.washington.edu/research/smt/papers/isca95abstract.html

The name SMT might be quite ambiguous now and I agree it probably shouldn't be used. However, for history's sake, it is well-defined, and was good research.

   
How good is hyperthreading?
Author: Agner Fog Date: 2009-11-03 09:45
Thank you for the references, Gabe. I have corrected the original message.
   
How good is hyperthreading?
Author: Fred Bosick Date: 2009-12-14 00:13
If I may be so bold, I may have discovered two disadvantages to Hyperthreading. The first is very iffy and should be tested more before coming to a conclusion. A processor with Hyperthreading enabled puts out more heat for a given load. The second is so compelling that I no longer bother to test for the first one. It's a subtle problem which may never show up for anyone else in general use. One needs to obtain a program called "crashme" by some guy named Carrette. It spawns processes and hands the processor garbage data to execute in them. An OS crash is the intended result.

Running Win2003 Server on a dual Opteron system, the program could not cause a crash. When I transferred the system to reside atop a 950, it crashed within an hour. After I turned off Hyperthreading, it never crashed, even after running all night. I have not kept conditions absolutely identical between all of these runs. I like to use my computer at least occasionally. :-) So this must be tested by others in case I am mistaken.

(The OS in question is Win2003 Server 32-bit, Enterprise, with all the patches up to the time of testing, including SP2. The first machine was a dual Opteron 248 on an Iwill DK8N board. The second machine is an ASUS P6T SE with an Intel Core i7 950, stepping D0. Both mainboard BIOSes were regularly updated to the latest versions.)

Thanks!

   
How good is hyperthreading?
Author: Agner Fog Date: 2009-12-15 01:57
Fred Bosick wrote:
A processor with Hyperthreading enabled puts out more heat for a given load.

This claim must be tested.

One needs to obtain a program called "crashme" by some guy named Carrette. It spawns processes and hands the processor garbage data to execute in them. [...] When I transferred the system to reside atop a 950, it crashed within an hour. After I turned off Hyperthreading, it never crashed, even after running all night.

Interesting. We need to know whether this is an operating system issue or a hardware bug. It would be nice to test it on different operating systems and with different microprocessors (Core i7, Atom).
   
AMD Bulldozer
Author: Agner Fog Date: 2009-12-15 02:15
There have been various rumors about whether AMD would use the same technology. Now, early information on the forthcoming AMD Bulldozer processor seems to indicate that it will share the instruction fetch and decode stages and the FP/SIMD unit between two threads, but not the integer units or other parts of the pipeline.

This means that there will be less interference between threads unless both threads have a lot of floating point or SIMD code. So far so good, but it will be very difficult for the operating system to detect if threads are FP/SIMD intensive and to move such threads around to avoid interference. And it is even more difficult to predict whether instruction fetch and decoding will be a bottleneck. There will probably be a buffer after the instruction decoder, but that doesn't help if there are many branch mispredictions.

The fact that different processors share different resources between threads makes it very difficult for software developers to optimize their code. You may have to test an application on several different processors in order to detect whether it is advantageous or not to run two threads per core on each particular processor. Processor-specific optimization is generally a bad idea because the software has to be updated every time a new processor appears on the market. Ideally, there should be CPUID feature bits telling exactly which resources are shared and which are not shared between threads; and the programmer should write code that predicts whether it is optimal to run one or two threads per core, based on which resources are critical to that particular application and whether these resources are shared or not.
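
The topology itself can at least be queried today: a rough sketch using CPUID leaf 0BH (extended topology enumeration, available on Intel processors from Nehalem onward) to find the number of logical processors per core is shown below. But this only tells whether hyperthreading is present, not which execution resources the sibling threads share, which is what the decision really depends on.

#include <cpuid.h>               // GCC/Clang intrinsics: __get_cpuid_max, __cpuid_count
#include <cstdio>

// Returns the number of logical processors per core, or 1 if the
// extended topology leaf is not supported.
int threads_per_core() {
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, 0) < 0x0B) return 1;
    // Sub-leaf 0 of leaf 0BH describes the SMT level:
    // ECX bits 15:8 = level type (1 = SMT),
    // EBX bits 15:0 = logical processors at this level, i.e. threads per core.
    __cpuid_count(0x0B, 0, eax, ebx, ecx, edx);
    if (((ecx >> 8) & 0xFF) != 1) return 1;
    return (int)(ebx & 0xFFFF);
}

int main() {
    printf("logical processors per core: %d\n", threads_per_core());
    return 0;
}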

I can't wait to get access to the new AMD processor to test how much interference there actually is between threads.

References:
phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9MjAzMzJ8Q2hpbGRJRD0tMXxUeXBlPTM=&t=1
phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9MjAzMjd8Q2hpbGRJRD0tMXxUeXBlPTM=&t=1
www.amdzone.com/phpbb3/viewtopic.php?f=52&t=137216
www.theregister.co.uk/2009/12/14/amd_bulldozer_preview/

This message modified 2010-03-19.

   
How good is hyperthreading?
Author: Alain Amiouni Date: 2010-04-30 12:17
A straightforward example of hyperthreading side effects is well known to chess engine users.
Rybka 3, for instance, incurs a 20% penalty when hyperthreading is enabled (on my i7 950, affinities not set).
So one must set affinities from the task manager, mapping each engine instance to an even-numbered logical processor. Unfortunately, this does not work under all chess GUIs, and you often trigger (with Vista and 7) an access violation when trying to do so. Curiously, chess engine programmers seem not to worry much about this issue.
As expected, chess engines run faster when HT is on and affinities are set than when HT is off, probably because it leaves the remaining logical processors free for the OS.