Agner's CPU blog

Indexed registers
Author: Hubert Lamontagne  Date: 2016-10-04 10:57
Kurt Baumgardner wrote:
[...]
Well, that's a shame. I've been looking at some 80x86 disassembly of a C .exe recently that contained a lot of function calls. Most of them do a bunch of PUSHes, read some stack vars, perform the function's actual purpose, end with the POPs, and finally return. And, of course, that's in the majority of function calls: this constant pattern of register save, fill, use, reload, followed by return (another stack pop). Seems like there should be a better way. Maybe instead of a set for every program/thread, you could have just enough space for a few levels deep of registers. This would correspond with the "call stack". Since most calls come with PUSHes, and most returns are preceded by POPs, it seems like a prime candidate for optimization, for both speed and code size.

Maybe that's still too complex or slow for hardware, though.

Function calls and their associated push/pops are very common, but a large majority of them happen on cold paths:
- Anything that happens on a cold path rarely makes any noticeable speed difference. Even moderately warm paths tend to count a lot less than the hottest paths in overall speed.
- The hottest paths are often loops, and they tend to either not have any function calls, or have calls that are only occasionally taken (error conditions etc.), or have calls that can be inlined. Inlining means the compiler can allocate registers automatically and even reorder operations within the parent function, as long as the function is small enough for this to be beneficial.
- Hot loops that do have function calls are often very large and complex loops that are likely to stall for all sorts of other reasons: not fitting in the instruction cache, data cache misses, pointer-following chains that stall on data cache latency, hard-to-predict branches, loading/storing lots of values from objects and so forth. So the function call overhead might get lost in the noise, or, if you're lucky, it could overlap with time the CPU spends waiting on something else (which makes it free).
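
To make the inlining case concrete, here is a minimal sketch in C. The names clamp_i32 and sum_clamped are made up for illustration. Note that "static inline" is only a hint: whether the call actually disappears depends on the compiler and the optimization level. When it does, the helper's push/pop prologue and epilogue go away and its values are register-allocated together with the loop's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical small helper: a typical inlining candidate.
   When inlined, it needs no register save/restore of its own. */
static inline int32_t clamp_i32(int32_t x, int32_t lo, int32_t hi)
{
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Hypothetical hot loop: sums the clamped values of v[0..n-1].
   The only call in the loop body is the small helper above. */
int64_t sum_clamped(const int32_t *v, size_t n, int32_t lo, int32_t hi)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += clamp_i32(v[i], lo, hi);
    }
    return sum;
}
```

Comparing the compiler's assembly output for sum_clamped with and without optimization is an easy way to see whether the call was removed.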

One option would be register windows like on SPARC, which act as a hardware-assisted stack. Here's an article about that:
http://ieng9.ucsd.edu/~cs30x/sparcstack.html
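
As a rough illustration of the idea, here is a simplified software model of a register window file, in C. This is a sketch under assumptions, not how SPARC actually does it: real SPARC windows overlap (the caller's out registers are the callee's in registers) and spills/fills are handled by trap handlers. All names and sizes here (RegWindows, rw_call, NWINDOWS = 4 and so on) are made up for the example. It only shows the main point: calls and returns are free until the call depth exceeds the number of on-chip windows, at which point the oldest window goes to memory.

```c
#include <stdint.h>
#include <string.h>

#define NWINDOWS     4   /* hypothetical on-chip window count, kept small on purpose */
#define REGS_PER_WIN 8   /* hypothetical registers per window */
#define SPILL_DEPTH  64  /* hypothetical capacity of the in-memory spill area */

typedef struct {
    uint64_t win[NWINDOWS][REGS_PER_WIN];      /* on-chip windows, used as a ring */
    uint64_t spill[SPILL_DEPTH][REGS_PER_WIN]; /* memory that receives spilled windows */
    int depth;       /* current call depth, 0 = outermost frame */
    int resident;    /* frames currently held in on-chip windows */
    int spilled;     /* frames currently held in spill memory */
    int spill_count; /* statistics: windows written to memory */
    int fill_count;  /* statistics: windows read back from memory */
} RegWindows;

void rw_init(RegWindows *rw)
{
    memset(rw, 0, sizeof *rw);
    rw->resident = 1; /* the outermost frame owns window 0 */
}

/* Pointer to register r of the current frame. */
uint64_t *rw_reg(RegWindows *rw, int r)
{
    return &rw->win[rw->depth % NWINDOWS][r];
}

/* Function call: move to the next window. If the ring is full, the oldest
   frame lives in the slot we are about to reuse, so it is spilled first.
   Returns 0, or -1 if the spill area is exhausted. */
int rw_call(RegWindows *rw)
{
    int slot = (rw->depth + 1) % NWINDOWS;
    if (rw->resident == NWINDOWS) {
        if (rw->spilled == SPILL_DEPTH) return -1;
        memcpy(rw->spill[rw->spilled++], rw->win[slot], sizeof rw->win[slot]);
        rw->spill_count++;
    } else {
        rw->resident++;
    }
    rw->depth++;
    memset(rw->win[slot], 0, sizeof rw->win[slot]); /* fresh window for the callee */
    return 0;
}

/* Function return: move back one window. If the caller's window was spilled
   earlier, it is read back from memory. Returns 0, or -1 at depth 0. */
int rw_return(RegWindows *rw)
{
    if (rw->depth == 0) return -1;
    rw->depth--;
    rw->resident--;
    if (rw->resident == 0) {
        int slot = rw->depth % NWINDOWS;
        memcpy(rw->win[slot], rw->spill[--rw->spilled], sizeof rw->win[slot]);
        rw->fill_count++;
        rw->resident = 1;
    }
    return 0;
}
```

With these sizes, five nested calls from the outermost frame cause two spills, and the matching five returns cause two fills. Call chains that stay within four levels never touch memory for register saving.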

Another option would be load-dual and store-dual instructions, which let you combine two 64-bit register loads or stores into a single 128-bit memory access (as long as your stack is 128-bit aligned). That is probably about as fast as possible for saving/restoring a bunch of registers at the same time (especially if you can dual-issue it to a 2-port data cache).
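
For comparison, existing ARM instruction sets have this kind of thing (LDRD/STRD on 32-bit ARM, LDP/STP on AArch64). The effect on the number of memory operations can be sketched with a small C model. The function names store_dual, load_dual, save_regs_dual and restore_regs_dual are hypothetical, and the model only counts operations and checks alignment; it says nothing about actual timing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one store-dual: two 64-bit registers written with a
   single 128-bit access. Fails with -1 if addr is not 16-byte aligned. */
static int store_dual(unsigned char *addr, uint64_t r0, uint64_t r1)
{
    if (((uintptr_t)addr & 15u) != 0) return -1;
    memcpy(addr, &r0, sizeof r0);
    memcpy(addr + 8, &r1, sizeof r1);
    return 0;
}

/* Hypothetical model of one load-dual: the reverse of store_dual. */
static int load_dual(const unsigned char *addr, uint64_t *r0, uint64_t *r1)
{
    if (((uintptr_t)addr & 15u) != 0) return -1;
    memcpy(r0, addr, sizeof *r0);
    memcpy(r1, addr + 8, sizeof *r1);
    return 0;
}

/* Save n registers (n must be even) to a 16-byte-aligned save area.
   Returns the number of memory operations issued, or -1 on error. */
int save_regs_dual(unsigned char *area, const uint64_t *regs, size_t n)
{
    if (n % 2 != 0) return -1;
    for (size_t i = 0; i < n; i += 2) {
        if (store_dual(area + 8 * i, regs[i], regs[i + 1]) != 0) return -1;
    }
    return (int)(n / 2);
}

/* Restore n registers (n must be even) from the save area.
   Returns the number of memory operations issued, or -1 on error. */
int restore_regs_dual(const unsigned char *area, uint64_t *regs, size_t n)
{
    if (n % 2 != 0) return -1;
    for (size_t i = 0; i < n; i += 2) {
        if (load_dual(area + 8 * i, &regs[i], &regs[i + 1]) != 0) return -1;
    }
    return (int)(n / 2);
}
```

Saving six registers this way takes three memory operations instead of six, which is where the gain over plain push/pop sequences would come from.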

 