Agner's CPU blog

 
Instruction set bloat - Agner - 2018-04-25
  Instruction set bloat - someone - 2018-04-26
  Instruction set bloat - undocumented - 2018-07-22
    Instruction set bloat - Agner - 2018-07-23
      Instruction set bloat - Avid reader - 2018-09-22
 
Instruction set bloat
Author: Agner Date: 2018-04-25 14:29
The new AVX512 instruction set extension adds three hundred new instructions. The x86 instruction set with its many extensions now includes more than two thousand different instructions. We can only guess what this costs in terms of design complexity and silicon space on the latest microprocessors.


No compiler is able to generate so many different instructions from a high-level language, and few programmers, if any, are competent to use them all in assembly code or intrinsic functions. However, most of the instructions appear to be useful and they may be used in specially designed function libraries.


Many of the older instructions are now obsolete as they have been replaced by new more efficient instructions, but the old instructions are still supported for the sake of compatibility with legacy software. AMD has removed some of their obsolete instructions from their new processors, but Intel processors still support even the most obscure undocumented instructions dating back to the first 8086 processor.


The x86 instruction set was initially developed at a time when CISC design was technologically optimal. This instruction set with its many extensions is now a confusing hodgepodge witnessing a long history of changing technologies, short-sighted decisions, patches, changing priorities, and changing marketing fads.


Other instruction sets such as ARM, etc., are not quite as messy. This is not because of better foresight from the designers but simply because of a shorter history. The ARM instruction set also has patches, and it will certainly need more patches in the future when extensions are needed.


Some of the problems could perhaps have been avoided if the design process had been open and transparent. Decisions about new instructions and extensions are typically kept as business secrets for competitive reasons. When a new instruction set extension is finally published, it is too late to change anything in case an outsider comes up with a better proposal.


I have been following this development for many years and I have often been frustrated to watch the tortuous results. I can't help thinking about what an ideal computer design would look like if we could start from scratch with no concern for backward compatibility, but using the knowledge and experience we have collected through all these years. Well - I have done more than just thinking. I have published some of my ideas and got a lot of useful feedback and new ideas from the users of this forum. The result is a new instruction set and computer system that I call ForwardCom (Forward Compatible Computer System). ForwardCom is neither RISC nor CISC, but a hybrid with few instructions but many variants of each instruction. It has vector registers with variable length, where each vector register contains information about its own length. It is designed so that existing software can take advantage of unlimited future extensions of the vector length without the need for new instructions and recompilation. Hence the name "forward compatible". The instruction format is standardized and the complexity is limited in order to enable a simple pipeline design.
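
To illustrate the idea of code that does not depend on the vector length, here is a minimal sketch in Python. It is only a simulation of the principle, not ForwardCom code: the function name and the max_vector_elems parameter are hypothetical and stand in for whatever maximum vector length the hardware provides.

```python
def vector_add(a, b, max_vector_elems):
    """Add two equal-length lists chunk by chunk.

    max_vector_elems is a hypothetical stand-in for the maximum number of
    elements a vector register can hold on a given processor.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    result = []
    pos = 0
    remaining = len(a)
    while remaining > 0:
        # The "hardware" decides how many elements this iteration handles:
        # a full vector, or fewer on the last iteration.
        n = min(remaining, max_vector_elems)
        result.extend(x + y for x, y in zip(a[pos:pos + n], b[pos:pos + n]))
        pos += n
        remaining -= n
    return result


data_a = list(range(10))
data_b = [10 * x for x in range(10)]

# The same unchanged code runs on a "processor" with 4-element vectors
# and on a later one with 16-element vectors; only the number of loop
# iterations differs.
small = vector_add(data_a, data_b, max_vector_elems=4)
large = vector_add(data_a, data_b, max_vector_elems=16)
```

The loop never hard-codes a vector width, which is the property that lets the same program benefit from longer vectors without recompilation.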


All the necessary software tools for ForwardCom have been designed. You can see it all at www.forwardcom.info including the many innovative features. No hardware or FPGA implementation has been developed yet.


I don't expect the mainstream computer market to make a radical change of technology even if it might appear to be collapsing under its own weight. The cost of replacing all legacy software would simply be insurmountable. But the ForwardCom project can still be useful as a sandbox for experiments and university projects. I have already got the first request from a student who wants to do such a project. It might also be useful for niche products where backward compatibility is not needed, for example large vector processors.

   
Instruction set bloat
Author: someone Date: 2018-04-26 07:38
I'd say that the ARMv8 ISA reset was a great idea that allowed preserving backwards compatibility while avoiding further ISA bloat. The legacy ARM ISA accumulated quite some bloat since 1987 (+ the 26 to 32-bit transition too).

AMD could have used x86_64 as an opportunity to clean up quite some of the bloat, but did none of that for some reason.

I'd argue that fixed size 32-bit aligned instructions as implemented on ARMv8 are easier to decode (not on legacy ARM though, as every instruction there can be conditional, making decoding complicated) than any kind of variable-length instructions.
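
A rough sketch of why this is so, in Python. The variable-length rule below is a toy encoding invented for this example (the low two bits of the first byte give the length); it is not the x86 or any real encoding. The point is only that fixed-width start offsets can all be computed independently, while variable-length start offsets each depend on the previous instruction.

```python
def fixed_width_starts(code, width=4):
    # Every start offset is known up front, so a decoder could look at
    # all instructions in a fetch block in parallel.
    return list(range(0, len(code), width))


def variable_length_starts(code):
    # Toy rule (an assumption of this sketch, not a real ISA): the low two
    # bits of the first byte give the instruction length minus one,
    # so lengths range from 1 to 4 bytes.
    starts = []
    pos = 0
    while pos < len(code):
        starts.append(pos)
        # The next start is unknown until this instruction's length
        # has been determined: an inherently sequential dependency.
        pos += (code[pos] & 0x03) + 1
    return starts


code = bytes([0x03, 0xAA, 0xBB, 0xCC, 0x00, 0x01,
              0xDD, 0x02, 0xEE, 0xFF, 0x00, 0x00])

fixed = fixed_width_starts(code)
variable = variable_length_starts(code)
```

Real variable-length decoders work around the dependency with tricks such as speculatively decoding at every byte offset, which costs extra hardware.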

   
Instruction set bloat
Author: undocumented Date: 2018-07-22 22:42
https://www.youtube.com/watch?v=KrksBdWcZgQ
Did You know and can You dig it more?
   
Instruction set bloat
Author: Agner Date: 2018-07-23 01:07
undocumented wrote:
Did You know and can You dig it more?
What the eloquent Christopher Domas explains in this video is a way of finding undocumented instructions. The 32-bit x86 instruction code map is really crammed with instructions. The map was basically full by 1985. Since then, they have used all kinds of tricks and patches to put more instructions into the map. A search for holes in this map is not likely to reveal much. There are still several holes in the map in 64-bit mode, though.

If the hardware vendor wants to put in secret backdoors, they would probably not use undocumented instruction codes. Instead, they might use some officially unused bits of control registers or they might make undocumented model-specific registers. I am pretty sure that some processors have undocumented model-specific registers which are used for internal test purposes. It would be easy to make a backdoor that is impossible to find. If a backdoor is opened only when certain model-specific registers have a certain combination of values (a strong password) then it would never be found by a systematic search. The only way to find such a backdoor - if it exists - would be to reverse engineer a piece of software that uses it.
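
A back-of-the-envelope calculation shows why a systematic search is hopeless. The password widths and the guess rate below are assumptions chosen for illustration only; a real register write would be far slower than one billion attempts per second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(bits, guesses_per_second):
    """Years needed to try every value of a password of the given width."""
    return 2 ** bits / guesses_per_second / SECONDS_PER_YEAR


# Hypothetical and very generous rate: one billion attempts per second.
rate = 1e9

for bits in (32, 64, 128):
    print(f"{bits:3d}-bit password: {years_to_exhaust(bits, rate):.3g} years")
```

Under these assumptions a 32-bit value falls in seconds, a 64-bit value takes roughly 585 years, and a 128-bit value is far out of reach.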
   
Instruction set bloat
Author: Avid reader Date: 2018-09-22 12:00
Agner wrote:
The only way to find such a backdoor - if it exists - would be to reverse engineer a piece of software that uses it.
Christopher Domas once again found a clever workaround [0]. It turns out that, for the time being, enough information can be extracted from rdmsr latency to identify suspicious model-specific registers.
Of course, that wouldn't help break a 2*64 bit password, but knowing that something exists and where to look is a great first step.

[0]: https://media.defcon.org/DEF%20CON%2026/DEF%20CON%2026%20presentations/Christopher%20Domas/