The idea of supporting vector registers with variable length has important consequences for the instruction set architecture as well as for the entire ecosystem of compilers, function libraries, etc. I will discuss my thoughts about this here.

First, the register set. We have discussed whether there should be different registers for integers and floating point numbers, and for scalars and vectors. So far, the following solutions have been proposed:

1. One universal register set for everything.
2. Two register sets, one for scalars and one for vectors. The same registers are used for integers and floating point.
3. Two register sets, one for integer scalars and one for everything else: floating point scalars, integer vectors and floating point vectors.
4. Three register sets, one for integer scalars, one for floating point scalars and one for vectors of all types.

The reason for using the same vector registers for integers and floating point numbers is that they share many of the same instructions, as mentioned in a previous post.

If we assume that a lot of floating point code involves arrays and loops, then we must prioritize easy vectorization of floating point code. If we assume, furthermore, that a lot of floating point code contains calls to mathematical function libraries, then we must make these library calls vectorizable. Mathematical library functions such as sine or logarithm should have a variable-size vector as input and a similar variable-size vector as output. It will be simpler to use the same functions for scalars by specifying a vector length of one, rather than having separate function versions for scalars and vectors. This will make it easier for an optimizing compiler to convert function calls in scalar code to vector code. A consequence of this is that we should use the same register set for floating point scalars and floating point vectors.

A drawback of using vector registers for scalars is that vector registers cannot have callee-save status because the vector length is variable with no theoretical upper limit. We must find out if scalar non-vectorizable floating point code is sufficiently common to justify having a separate register set for floating point scalars. For integers, on the other hand, there is no doubt that scalar code is common. We need scalar integer registers for pointers, loop control, and all kinds of general code.

This leaves us with option 3 above as probably the optimal solution: one register set for integer scalars, and another register set for floating point numbers and vectors.

The priority on vector support, variable-length vectors, and variable-length vector functions has important consequences for the whole ecosystem of compilers, function libraries, etc.
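The point about library functions serving both scalar and vector code can be illustrated with a small model. This is only a sketch in Python, not real library code: the names `vec_log`, `log_scalar_loop`, `log_vector_loop` and the limit `MAX_VECTOR_ELEMENTS` are all made up for the illustration, and a Python list stands in for a variable-length vector register.

```python
import math

# Assumed hardware limit for this illustration only. On real hardware the
# maximum vector length would be decided by the processor, not by the code.
MAX_VECTOR_ELEMENTS = 4


def vec_log(v):
    """Hypothetical library function: natural logarithm of every element.

    The input is a vector of any length and the output has the same length.
    There is no separate scalar version of the function.
    """
    return [math.log(x) for x in v]


def log_scalar_loop(data):
    """Scalar code: one call per element, each with a vector length of one."""
    out = []
    for x in data:
        out.extend(vec_log([x]))
    return out


def log_vector_loop(data):
    """Vectorized code: the same function, called with longer vectors.

    The last iteration simply uses a shorter vector, so no scalar
    remainder loop is needed.
    """
    out = []
    for i in range(0, len(data), MAX_VECTOR_ELEMENTS):
        out.extend(vec_log(data[i:i + MAX_VECTOR_ELEMENTS]))
    return out
```

Both loops call the same function and give the same result. The only difference is the vector length passed in each call, which is the property that should make the conversion from scalar to vector code easy for a compiler.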
We must define an ABI standard that supports functions with variable-length vectors. If registers r9 - r15 are used for specifying vector length, as proposed, then it will be natural to use these registers also to specify the vector length of function parameters and function returns. If multiple vector parameters have the same length (in bytes), then they should use the same vector length register, r9. If multiple vector parameters have different lengths, then they will use r9, r10, etc. If there are more than 9 scalar integer parameters before one or more variable-length vector parameters, then the vector length will have precedence over the scalar integer parameters for the use of r9 - r15.

The vector length is specified in bytes in the vector registers and in assembly code because this makes loops more efficient (we can use the same register for loop counter, array index and vector length, as explained in my previous post). High level code differs from this by specifying the vector length as the number of vector elements. The compiler can easily translate this to bytes, like it is already doing for array indexes. A function that uses multiple vectors of different kinds should preferably have the same element size for all vectors, e.g. 64-bit integers if you have double precision floats.

This system also needs special support in compilers. As a minimum, we need a way of defining functions with variable-length vectors as parameters and as return value. Many contemporary compilers already have a way of specifying fixed-length vector registers as parameters and variables.

The problem that vector registers cannot have callee-save status can be addressed by making an addition to the object file format that allows a library function to specify which registers it modifies. A compiler that supports whole-program optimization can use this information at the register allocation stage to avoid the need to save registers across calls to library functions with static linking.
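The proposed rule for vector length registers can be sketched as a small model of what a compiler might do when it lays out a function call. This is Python pseudo-tooling under my own assumptions, not part of any existing compiler: the function names and the tuple format for parameters are hypothetical, and the case of more than seven distinct lengths is not covered by the proposal above, so the sketch just reports an error there.

```python
# Registers proposed above for holding vector lengths.
LENGTH_REGISTERS = ["r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def length_in_bytes(element_count, element_size):
    """Translate a high-level element count to the byte length used in assembly."""
    return element_count * element_size


def assign_length_registers(vector_params):
    """Assign a vector length register to each vector parameter.

    vector_params is a list of (name, count_symbol, element_size) tuples,
    where count_symbol names the element count in the high-level declaration
    (hypothetical format). Two parameters have the same length in bytes when
    they have the same element count and the same element size, and then
    they share one length register.
    """
    register_for_length = {}
    assignment = {}
    for name, count_symbol, element_size in vector_params:
        key = (count_symbol, element_size)
        if key not in register_for_length:
            if len(register_for_length) == len(LENGTH_REGISTERS):
                # Assumption: the proposal does not say what happens here.
                raise ValueError("more distinct vector lengths than r9 - r15 can hold")
            register_for_length[key] = LENGTH_REGISTERS[len(register_for_length)]
        assignment[name] = register_for_length[key]
    return assignment


# Example: two double precision vectors of n elements and one vector of
# m 32-bit integers.
example = assign_length_registers([("x", "n", 8), ("y", "n", 8), ("idx", "m", 4)])
```

In this example `x` and `y` share r9 while `idx` gets r10. If `idx` had been declared with `n` 64-bit elements instead, all three parameters would share r9, which is why the same element size for all vectors is preferable.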