Agner's CPU blog


Micro-fusion limited to 1-reg addressing modes
Author: Peter Cordes  Date: 2015-07-11 21:39
Micro-fusion on Intel Sandy Bridge (SnB) seems to be possible only when the resulting fused uop would have no more than 2 input dependencies. Intel's code analyzer (IACA, from https://software.intel.com/en-us/articles/intel-architecture-code-analyzer) knows about this, and experiments on real Sandy Bridge hardware confirm it: see my answer to stackoverflow.com/questions/26046634/micro-fusion-and-addressing-modes.

I didn't see any mention of this in your optimization manual or microarchitecture docs.

I tested again with store instructions, as that's an example used in your microarch doc, and it seems they can only fuse when 1-reg addressing modes are used. For example,

mov [rsi + 0 + rdi], eax ; produces as many fused-domain as unfused-domain uops (no micro-fusion)

mov [rsi + 0], eax ; produces 1 fused-domain, 2 unfused-domain uops
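To make the pattern in these two stores explicit, here is a toy model in Python. It is only a sketch of the hypothesis in this post (a store stays micro-fused only if the address registers plus the data register add up to at most 2 inputs), based on my SnB observations, not a description of what the hardware actually does. The function names and the MAX_INPUTS constant are made up for illustration.

```python
# Toy model of the hypothesis above; NOT a hardware simulator.
# Assumption (from the SnB observations in this post only): a store can stay
# micro-fused only if its total number of register inputs is <= 2.
MAX_INPUTS = 2


def address_registers(mem_operand):
    """Return the register names inside a [base + index*scale + disp] operand.

    Simplifications: terms are joined with '+', any scale factor is written
    after the register (rdi*4, not 4*rdi), and displacements start with a digit.
    """
    inner = mem_operand.strip().strip("[]")
    regs = []
    for term in inner.split("+"):
        term = term.split("*")[0].strip()   # drop a trailing scale factor
        if term and not term[0].isdigit():  # skip numeric displacements
            regs.append(term)
    return regs


def store_input_count(mem_operand, data_reg):
    """Register inputs of `mov mem_operand, data_reg`: address regs plus the data reg."""
    return len(address_registers(mem_operand)) + len([data_reg])


def store_can_micro_fuse(mem_operand, data_reg):
    """Predict micro-fusion under the assumed 2-input limit."""
    return store_input_count(mem_operand, data_reg) <= MAX_INPUTS


if __name__ == "__main__":
    for mem in ("[rsi + 0 + rdi]", "[rsi + 0]", "[esi+edi]"):
        print(mem, store_input_count(mem, "eax"), store_can_micro_fuse(mem, "eax"))
```

Under this model the 2-register forms count 3 inputs and are predicted not to fuse, while [rsi + 0] counts 2 and is predicted to fuse, which matches the counts I measured. Whether the same limit holds on other microarchitectures is exactly what still needs testing.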

I'm doing all my testing on 64-bit Linux, on an i5 2500k (SnB). Since your example used 32-bit registers, I also tested a 32-bit binary and got the same result: mov [esi+edi], eax can't micro-fuse. Assembled/linked with:
yasm -f elf32 uop-test.s && ld -m elf_i386 uop-test.o -o 32.uop-test
file(1) says it's a 32-bit ELF statically linked binary, so I'm pretty sure I did this right. :P

In the Core2/Nehalem section of your architecture guide, you say:
A fused μop can have three input dependencies, while an unfused μop can have only two.

I think this is wrong. I haven't tested Core2 or Nehalem, just SnB, but the SnB/IvB section simply refers back to the Nehalem section without mentioning any caveats. I'm sure it's wrong for SnB. IACA with -arch NHM doesn't show micro-fusion for 2-reg addresses for stores, or ALU ops, so this needs testing on Nehalem hardware, too. (IACA can't analyse for pre-Nehalem arches.)


off-topic: It'd be nice if the microarch doc didn't refer back to how things were somewhat different on older architectures quite as much. It gets to be a problem for micro-op fusion, where SnB refers you back to the Nehalem section AND the P-M section. At least the SnB section doesn't have anything new to add. I think it might be a good idea to have the Nehalem section not refer back to P-M, or at least summarize anything it doesn't say itself, though, since two levels of recursion is pushing it.

That's about the only bad thing I can say about your work, though! Overall, it's an amazing resource. Making each section stand alone would bloat things, and make it less obvious when things were the same for multiple CPUs, so that wouldn't be good, either.

 