Author: Agner |
Date: 2017-07-27 02:06 |
I am working on making an assembler for ForwardCom, and I think that the design of assemblers could use an update, now that we are redesigning everything anyway. I found that it was not too hard to make the assembly syntax look very much like C or Java, rather than the terrible syntaxes of many current assemblers. This will surely make it easier to write assembly code and make the code easier to read. My plans for the assembler so far are as follows:
Instructions look like C code. For example to add two registers:
int32 r1 = add(r2, r3)
Or simply:
int32 r1 = r2 + r3
Memory operands are indicated with square brackets, where the length of vector registers is indicated if necessary:
float v1 = [r1 + 0x100, length = r5] // read memory operand of r5 bytes into vector register v1
Masked instructions can be coded conveniently with the ?: operator:
int32 r1 = r4 ? r2 + r3 : r5 // add r2 and r3 with boolean in r4 as mask and r5 as fallback value
This is still assembly, because you make one machine instruction per line, and the programmer has to decide which registers to use, but it looks like high-level language.
Branches and loops can be coded either with conditional jumps or as high-level constructs like if, else, switch, for, while, do, break, continue, with {} for block nesting. I also intend to support preprocessing directives like #define, #include, #if; #else, #endif. Now, a more tricky problem is macros and metaprogramming. Most assemblers have macro features that allow metaprogramming, but the syntax is often complicated, confusing, and inconsistent. Such features are probably needed, but I would like a more logical syntax. I have looked at high-level languages to see if they have useful metaprogramming features. Metaprogramming in C++ is possible with templates, but this is very convoluted because the template feature was designed for a more narrow purpose. I would prefer something more similar to the support for self-modifying code in script languages. You make a string variable and then issue this string to the assembler and have it interpreted as assembly language. We need a way to distinguish between code that is generated for run-time execution, and meta-code that is executed in the assembler. My idea so far is to use a dot (.) or percent (%) or some other symbol to indicate assembly-time meta code. For example, a loop that adds the numbers from 1 to 10 at runtime will look like this:
int64 r0 = 0
for (int64 r1 = 1; r1 <= 10; r1++) {
int64 r0 += r1
}
A similar assemble-time loop could look like this:
. sum = 0 // define assembly-time variable sum
.for (i = 0; i <= 10; i++) { // dot indicates assembly-time loop
. sum += i
}
int64 r1 = sum // runtime code r1 = 55
This will calculate the sum at assembly time and just generate a single runtime instruction setting register r1 = 55. The assembly-time variables will be dynamically typed, using 64-bit integers, double, or string, depending on the context. All common operators are supported. String expressions can be converted to assembly code. For example:
. text = " nop \n"
. emit text + text /* This will send the string " nop \n nop \n" to the assembler and produce two nop instructions */
With this proposal, we have three kinds of code. For example, "#if" will branch at the preprocessing stage, ".if" will branch at the assembly metaprogramming stage, and "if" will generate executable code that branches at runtime. It may be confusing with three types of code, but at least the "#" and "." prefixes will make the distinction clear. The preprocessing directives (with #) are executed before the assembler interprets any code, so it has no knowledge of anything defined in the code, while the metaprogramming features (with .) can check e.g. if a function is defined before making code that calls it, or it can make different code depending on the type or size of a defined data object. I would like to hear your comments before I implement all this. |