Agner's CPU blog


Optimization manuals updated
Author: Agner Date: 2014-02-19 05:15

The optimization manuals at www.agner.org/optimize/#manuals have now been updated with test results for the AMD Steamroller microprocessor.

There are also minor additions regarding the forthcoming AVX-512 instruction set.

I have not yet tested the Intel Silvermont/Bay Trail processor because the test machine I have access to cannot run Linux, and the kind of tests I want to run is very difficult to do under Windows.

Test results for AMD Steamroller

  • Similar microarchitecture to Bulldozer and Piledriver
  • Has one instruction decoder per thread, where previous designs shared a decoder between two threads. This removes a potential bottleneck.
  • Instruction fetch is still shared between two threads. This is a likely bottleneck.
  • Instruction cache increased by 50%
  • New loop buffer can store at least 32 decoded instructions. The exact size is not known.
  • Improved throughput for level-2 cache write
  • Store forwarding improved. A small read after a bigger write is now supported.
  • Floating point / vector unit redesigned with three pipes where previous designs had four pipes
  • Maximum throughput is four instructions per clock when integer and vector instructions are mixed
  • Floating point division improved
  • No penalty for floating point denormal and underflow results
  • Some performance flaws in Piledriver have been fixed. Most importantly, 256-bit stores now perform well.
  • A new performance flaw has been introduced, though: floating point vector addition has lower throughput than expected.
  • Supports AVX, but not AVX2
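
To illustrate the store forwarding item in the list above, here is a minimal sketch in C of the access pattern "small read after bigger write": an 8-byte store followed immediately by a 4-byte load that lies entirely inside the bytes just written. The type `Slot` and the function `read_half_after_store` are hypothetical names made up for this illustration. The code only shows the pattern. It does not measure anything, and it makes no claim about which size and offset combinations are forwarded quickly on Steamroller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration: one 8-byte slot that can also be
   accessed as two 4-byte halves. Reading a union member other than the
   one last written is permitted in C (not in C++). */
typedef union {
    uint64_t whole;
    uint32_t half[2];
} Slot;

/* Hypothetical helper: do a bigger write, then a smaller read from the
   same memory. 'volatile' asks the compiler to really emit the store and
   the load instead of keeping the value in a register. */
static uint32_t read_half_after_store(volatile Slot *slot, uint64_t value, int index)
{
    assert(index == 0 || index == 1);  /* the load must lie inside the store */
    slot->whole = value;               /* the bigger write: 8 bytes */
    return slot->half[index];          /* the smaller read: 4 bytes */
}
```

Which half holds the low 32 bits depends on the byte order of the machine. To turn this into a timing test, the call would have to sit in a long dependency chain inside a loop and be timed with a cycle counter, which is beyond the scope of this sketch.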
 