Dave Cope's Homepage
   Main | Personal | Photo Gallery | Projects | Linux | Employment Info. | Links | Misc. | About e-mail  

The Ita Compiler

The compiler is responsible for compiling the Ita programming language into either a VM code, or native Ita machine code. The compiler for Ita was written on PC in C#. C# was chosen because it is a very productive language. C# is also fairly similar (although far more powerful) to the Ita programming language. So, it theory it should be easier to port the compiler to the Ita programming language (i.e. to make it self hosting) than if a language with different syntax/semantics was used.

Lexer/Parser

The lexer and parser for the language were written by hand instead of using a parser generator. One of my project goals was to write everything by hand, so I didn't want to use any external tools to generate a lexer/parser.

The lexer and parser are quite straightforward. The parser uses rescursive decent to build up an AST. The parser keeps track of source locations (line and column info) so fairly precise errors can be reported.

Compilation

Once an AST is created, the compiler generates VM (virtual machine) code. The VM operations are relatively high level. For example VM operations may perform memory allocations or load fields from an object. The VM code is however lower level than many other languages. For example, the VM code doesn't have the equivalent of a procedure call. The VM instructions were designed to be easily translated into native code or calls to runtime library functions.

The VM code has the concept of an "item". The treatment of items is similar to the Oberon compiler. An item may be an integer constant, a stack location, or an immediate object. A register item is also available for the optimizer, but a register is never directly created by the compiler front end.

The compilation to VM code automatically performs a couple simple optimizations. For example, constants are folded and evaluated before emitting any VM code. However, the compiler doesn't do any sort of constant propagation. The compiler also tries to eliminate anonymous "boolean" values that are immediately used in an if statement.

Multiplication by a power of two is automatically converted to a shift left operation.

VM Level Optimization

After the compiler emits VM code, some further optimizations can be done on the VM code before conversion to native code. The primary optimization is to promote some stack allocations (i.e. temporary values and local variables) to registers.

The VM level optimizer breaks the VM code into basic blocks and then determines information such as the lifetime of a variable and looks at possible paths through the VM code to determine if a variable could be read at any point after registers may be spilled. If a variable is never read after it is potentially spilled then it is promoted to a register.

Once the VM instructions have been generated, it is possible to execute them directly on the PC inside a VM simulator. To execute the code directly on the Ita computer, the VM instructions must next be converted into native Ita code. This final step is fairly straightforward because most instructions can be translated into a short sequence of native instructions. More complex operation, like memory allocation, are converted to calls to the runtime library.

Native Code Optimization

The only real complexity in the backend is the need for optimization. The backend tries to take advantage of constants and register items to reduce the number of instructions emitted. Some of the more unique instructions of the CPU are also used for some operations like adding one to a register. The compiler also designates a register to hold a constant value of 4 since it is very commonly used when advancing pointers by the size of a machine word.

Since the CPU doesn't have an instruction to shift by a variable amount, the compiler will unroll shifts by an immediate value to reduce loop overhead.

Executable Format

Most compilers generate an executable in a serialized format that can be loaded by the target platform's operating system. Instead of using this technique, the Ita compiler generates a memory image for the Ita computer. The memory image contains all the code required to run the operating system and the applications that have been compiled. Running the output of the compiler requires this memory image to be loaded into RAM and then executed. Execution begins by executing the image starting at address 0.

On a machine simulator the memory image can be read directly from a file into the simulated machine's RAM. On the actual Ita computer a small boot strap stored in flash memory is used to copy the memory image (also stored in flash) into the system's RAM. Once the image is copied the bootstrap running from flash memory jumps to RAM address 0 to begin execution of the image.

Error Reporting

Error reporting is one area that many hobby compilers are often relatively weak. The Ita compiler is sure to track source locations for all tokens. This information makes it easy to identify where an error occured in the source code.

Here is an example of an error reported by the compiler with provided context information:

                {
                    
                    static Object MainThreadStart(Object dummy)
                    {
                        StarterClass.result = Main();
                                              ^ Use of undefined variable or function 'Main'.
                    }

                    static void MainHelper()
                    {

The compiler only attempts to report the first error encountered. I felt this wasn't much of a deficiency since the compile times are quite low. But it is one area that could be improved in the compiler.



This site was designed and coded by Dave Cope - © 1998-2003