Agner's CPU blog

Micro-fusion limited to 1-reg addressing modes
Author: Peter Cordes  Date: 2015-12-15 20:25
Agner wrote:
Peter Cordes wrote:
uop micro-fusion on Intel SnB seems to be possible only when it doesn't create uops with more than 2 input dependencies.
I have now tested this on Sandy Bridge, Ivy Bridge, Haswell and Broadwell. I have not had access to test on a Skylake yet.

The results show that instructions with three input dependencies are fusing alright and use only a single entry in the micro-operation cache ...

I guess you didn't see my response on Stack Overflow to your second answer there.

Our test results disagree. I see a change in the uop perf counters, and an increase in the clock cycles taken, when changing the instruction from "or eax, [rsi]" to "or eax, [rsi+rdi]". I didn't try to measure uop-cache slots, just the total cycle count and the fused/unfused uop counts. My full test code, and the Linux perf command I used to get data from the performance counters, are posted on Stack Overflow.

Based on Tacit Murky's information that SnB's internal uop format doesn't have room for a micro-fused index register, maybe 2-register addressing modes can still micro-fuse in the uop cache, but not in the pipeline where the ROB tracks them.

Did your test results make an assumption about uops in the uop cache being the same as fused-domain uops in the pipeline?

I re-ran my test after seeing your response, and I'm still sure I'm seeing 2-register source operands NOT micro-fusing. If I'm wrong, can you please have a look and help me figure out what's wrong with my test procedure? I've been using:
ocperf.py stat -r4 -e task-clock,cycles,instructions,uops_issued.any,uops_dispatched.thread,uops_retired.all,uops_retired.retire_slots,stalled-cycles-frontend,stalled-cycles-backend ./uop-test

I'm essentially testing fused-domain uops against the 4-wide limit of the pipeline for issuing / retiring 4 fused-domain uops per clock. Some of my fused-domain uops are NOPs, to avoid execution port unfused-domain bottlenecks on SnB.
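
To make the reasoning behind this test concrete, here is a rough arithmetic sketch in Python. It is not the actual test code (that is on Stack Overflow), and the loop composition below is a made-up example, not a measurement. It assumes only what is stated above: the front end issues at most 4 fused-domain uops per clock, a micro-fused memory-source ALU instruction counts as 1 fused-domain uop, and the same instruction counts as 2 if it does not stay fused. The function names are hypothetical.

```python
from fractions import Fraction

ISSUE_WIDTH = 4  # fused-domain uops issued per clock, as stated in the post


def fused_domain_uops(mem_src_alu, other_single_uop, micro_fused):
    """Count fused-domain uops for one loop iteration.

    mem_src_alu:      number of ALU instructions with a memory source operand
    other_single_uop: number of other single-uop instructions (e.g. NOPs)
    micro_fused:      True if load+ALU stays fused (1 uop), False if not (2 uops)
    """
    per_mem_insn = 1 if micro_fused else 2
    return mem_src_alu * per_mem_insn + other_single_uop


def min_cycles_per_iteration(uops, width=ISSUE_WIDTH):
    """Lower bound on cycles per iteration imposed by the issue width alone."""
    return Fraction(uops, width)


# Hypothetical loop body: 2 memory-source ALU instructions plus 2 NOPs.
fused = fused_domain_uops(2, 2, micro_fused=True)
unfused = fused_domain_uops(2, 2, micro_fused=False)

print(fused, min_cycles_per_iteration(fused))      # 4 1
print(unfused, min_cycles_per_iteration(unfused))  # 6 3/2
```

If a loop sized to exactly fill the issue width gets slower when only the addressing mode changes, the extra cycles point to extra fused-domain uops. This is only a lower bound from issue width; real loops can be limited by other things.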

 