Agner's CPU blog


Test results for Intel's Sandy Bridge processor
Author: Agner Date: 2013-08-10 05:52
Now I have done some tests of the alignment effects. This explains the weird results I have seen earlier, where the performance improved when some instructions were made longer.
mov ebp, 100
align 32
LL:                  ; loop entry, aligned to a 32-byte boundary
%rep 100
                     ;  uops  Bytes
cmove eax,eax        ;    2      3
cmove ebx,ebx        ;    2      3
xchg  r8,r9          ;    3      3
nop7                 ;    1      7
nop7                 ;    1      7
nop8                 ;    1      8
nop                  ;    1      1
;                Total:  11     32
%endrep
dec ebp
jnz LL

This takes almost 4 clocks. When I add a nop after align 32 to change the alignment by one byte, it takes only 3 clocks. The explanation is this. Each µop cache line can take 6 µops. The first two instructions take one µop cache line with 4 µops. The xchg instruction has 3 µops, which do not fit in the remaining 2 slots, and an instruction cannot cross a cache line, so it starts in a new cache line. The next three instructions go in the same line, filling it with 6 µops, and the last nop takes a third line. Then there is a 32-byte boundary and we start a new cache line. In total we need 300 cache lines, and there are only 256 lines in the µop cache. The loop doesn't fit into the µop cache, so the decoders become the bottleneck. When the alignment is changed, the last nop goes together with the two cmove instructions of the next iteration, and we need only about 200 cache lines. Now it fits into the µop cache and the speed goes up. The same improvement can be obtained by lowering the repeat count.
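The line counting above can be checked with a small script. This is only a toy model of the packing rules as I have described them (6 µops per line, an instruction is never split between lines, a new line starts at every 32-byte boundary); it is not a description of the actual hardware. The names count_lines, BODY and start_offset are made up for this illustration, and the loop overhead (dec/jnz) and the padding nop are left out.

```python
# Toy model of the µop cache packing rules described in the post.
# All names here are hypothetical; the constants are taken from the post.
UOPS_PER_LINE = 6      # each µop cache line holds up to 6 µops
WINDOW_BYTES = 32      # a new line starts at every 32-byte boundary
UOP_CACHE_LINES = 256  # total number of lines in the µop cache

# (name, µops, bytes) for one repetition of the loop body
BODY = [
    ("cmove", 2, 3),
    ("cmove", 2, 3),
    ("xchg", 3, 3),
    ("nop7", 1, 7),
    ("nop7", 1, 7),
    ("nop8", 1, 8),
    ("nop", 1, 1),
]


def count_lines(start_offset, repeats=100):
    """Count the µop cache lines used by `repeats` copies of BODY.

    start_offset is the distance in bytes from a 32-byte boundary to the
    first instruction (0 = aligned, 1 = shifted by one nop).
    """
    address = start_offset
    lines = 0
    window = None  # 32-byte window that the current line belongs to
    used = 0       # µops already placed in the current line
    for _ in range(repeats):
        for _name, uops, size in BODY:
            instr_window = address // WINDOW_BYTES
            # Open a new line when we enter a new 32-byte window or when
            # the whole instruction does not fit in the current line.
            if instr_window != window or used + uops > UOPS_PER_LINE:
                lines += 1
                window = instr_window
                used = 0
            used += uops
            address += size
    return lines


if __name__ == "__main__":
    for offset in (0, 1):
        n = count_lines(offset)
        print(f"offset {offset}: {n} lines, fits: {n <= UOP_CACHE_LINES}")
```

With these assumptions the aligned case needs 300 lines and the shifted case 201 (one more than the round figure because the very first line holds only the two cmove instructions). Calling count_lines(0, repeats=85) gives 255 lines, which illustrates why a lower repeat count also brings the aligned loop body under the 256-line limit.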
