The Ita Compiler
The compiler is responsible for compiling the Ita programming language
into either a VM code, or native Ita machine code. The compiler for Ita
was written on PC in C#. C# was chosen because it is a very productive
language. C# is also fairly similar (although far more powerful) to the
Ita programming language. So, it theory it should be easier to port the
compiler to the Ita programming language (i.e. to make it self hosting)
than if a language with different syntax/semantics was used.
Lexer/Parser
The lexer and parser for the language were written by hand instead of
using a parser generator. One of my project goals was to write everything
by hand, so I didn't want to use any external tools to generate a lexer/parser.
The lexer and parser are quite straightforward. The parser uses rescursive
decent to build up an AST. The parser keeps track of source locations (line
and column info) so fairly precise errors can be reported.
Compilation
Once an AST is created, the compiler generates VM (virtual machine)
code. The VM operations are relatively high level. For example VM operations
may perform memory allocations or load fields from an object. The VM code is
however lower level than many other languages. For example, the VM code doesn't
have the equivalent of a procedure call. The VM instructions were designed
to be easily translated into native code or calls to runtime library functions.
The VM code has the concept of an "item". The treatment of items is similar
to the Oberon compiler. An item may be an integer constant, a stack location,
or an immediate object. A register item is also available for the optimizer,
but a register is never directly created by the compiler front end.
The compilation to VM code automatically performs a couple simple optimizations.
For example, constants are folded and evaluated before emitting any VM code.
However, the compiler doesn't do any sort of constant propagation. The compiler
also tries to eliminate anonymous "boolean" values that are immediately used
in an if statement.
Multiplication by a power of two is automatically converted to a shift
left operation.
VM Level Optimization
After the compiler emits VM code, some further optimizations can be
done on the VM code before conversion to native code. The primary optimization
is to promote some stack allocations (i.e. temporary values and local variables)
to registers.
The VM level optimizer breaks the VM code into basic blocks and then
determines information such as the lifetime of a variable and looks at
possible paths through the VM code to determine if a variable could be
read at any point after registers may be spilled. If a variable is never
read after it is potentially spilled then it is promoted to a register.
Once the VM instructions have been generated, it is possible to execute them
directly on the PC inside a VM simulator. To execute the code directly on
the Ita computer, the VM instructions must next be converted into native Ita
code. This final step is fairly straightforward because most instructions
can be translated into a short sequence of native instructions. More complex
operation, like memory allocation, are converted to calls to the runtime library.
Native Code Optimization
The only real complexity in the backend is the need for optimization. The
backend tries to take advantage of constants and register items to reduce
the number of instructions emitted. Some of the more unique instructions
of the CPU are also used for some operations like adding one to a register.
The compiler also designates a register to hold a constant value of 4 since
it is very commonly used when advancing pointers by the size of a machine
word.
Since the CPU doesn't have an instruction to shift by a variable amount,
the compiler will unroll shifts by an immediate value to reduce loop overhead.
Executable Format
Most compilers generate an executable in a serialized format that can be
loaded by the target platform's operating system. Instead of using this
technique, the Ita compiler generates a memory image for the Ita computer.
The memory image contains all the code required to run the operating system
and the applications that have been compiled. Running the output of the
compiler requires this memory image to be loaded into RAM and then executed.
Execution begins by executing the image starting at address 0.
On a machine simulator the memory image can be read directly from a file
into the simulated machine's RAM. On the actual Ita computer a small boot
strap stored in flash memory is used to copy the memory image (also stored in
flash) into the system's RAM. Once the image is copied the bootstrap running
from flash memory jumps to RAM address 0 to begin execution of the image.
Error Reporting
Error reporting is one area that many hobby compilers are often relatively
weak. The Ita compiler is sure to track source locations for all tokens. This
information makes it easy to identify where an error occured in the source
code.
Here is an example of an error reported by the compiler with provided context
information:
{
static Object MainThreadStart(Object dummy)
{
StarterClass.result = Main();
^ Use of undefined variable or function 'Main'.
}
static void MainHelper()
{
The compiler only attempts to report the first error encountered. I felt
this wasn't much of a deficiency since the compile times are quite low. But
it is one area that could be improved in the compiler.
|