Porting Compilers
Instead of translating directly into machine code, modern compilers translate to a machine independent intermediate code in order to enhance portability of the compiler and minimize design efforts. The intermediate language defines a virtual machine that can execute all programs written in the intermediate language (a machine is defined by its language and vice versa). The intermediate code instructions are translated into equivalent machine code sequences by a code generator to create executable code. It is also possible to skip the generation of machine code by actually implementing the virtual machine in machine code. This virtual machine implementation is called an interpreter, because it reads in the intermediate code instructions one by one and after each read executes the equivalent machine code sequences (the interpretation) of the read intermediate instruction directly.
The use of intermediate code enhances portability of the compiler, because only the machine dependent code (the interpreter or the code generator) of the compiler itself needs to be ported to the target machine. The remainder of the compiler can be imported as intermediate code and then further processed by the ported code generator or interpreter, thus producing the compiler software or directly executing the intermediate code on the interpreter. The machine independent part can be developed and tested on another machine (the host machine). This greatly reduces design efforts, because the machine independent part needs to be developed only once to create portable intermediate code.
An interpreter is less complex and therefore easier to port than a code generator, because it is not able to do code optimizations due to its limited view of the program code (it only sees one instruction at a time and you need a sequence to do optimization). Some interpreters are extremely easy to port, because they only make minimal assumptions about the instruction set of the underlying hardware. As a result the virtual machine is even simpler than the target CPU.
Writing the compiler sources entirely in the programming language the compiler is supposed to translate, makes the following approach, better known as compiler bootstrapping, feasible on the target machine:
- Port the interpreter. This needs to be coded in assembly code, using an already present assembler on the target.
- Adapt the source of the code generator to the new machine.
- Execute the adapted source using the interpreter with the code generator source as input. This will generate the machine code for the code generator.
The difficult part of coding the optimization routines is done using the high-level language instead of the assembly language of the target.
According to the designers of the BCPL language, interpreted code (in the BCPL case) is more compact than machine code; typically by a factor of two to one. Interpreted code however runs about ten times slower than compiled code on the same machine.
The designers of the Java programming language try to take advantage of the compactness of interpreted code, because in Java a program needs to be transmitted over the Internet before execution can start on the target's Java Virtual Machine.
Read more about this topic: Porting