Skip to content
calvis edited this page Apr 28, 2012 · 2 revisions

Our compiler is divided into three parts, a front-end, a middle, and a back-end.

Front End

The front-end of the Harlan compiler is comprised of our parser, typechecker, and several passes that are responsible for making the code more uniform and simple.

  • parse-harlan verifies the input program, and inserts tags around all scalar expressions (for example, the integer 0 becomes (int 0)). This reduces ambiguity in the typechecker.
  • returnify will insert a return around the last expression in each function body, if one does not exist already. This pass must run before typechecking, so that the typechecker can enforce that all returns are returning expressions of the same type.
  • typecheck does exactly what it sounds like. Every expression is tagged with its type, assuming that the program is statically well-typed. If not, the typechecker will error.
  • expand-primitives rewrites front-end primitives, such as write-pgm (primitives that are too complex to deal with in passes later), or print and assert that are called on complex data-types.
  • remove-danger adds safety-checks before vector-refs, to ensure safe memory access.

Middle

  • make-kernel-dimensions-explicit adds another field to kernels, to describe their dimension. Usually, this is the length of the first argument.
  • optimize-fuse-kernels creates two-dimensional kernels where possible, and inlines iota as a kernel argument.
  • lift-complex unnests all complex expressions.
  • remove-nested-kernels turns all kernels nested inside another kernel into for-loops, leaving the outermost kernel intact.
  • optimize-lift-lets lifts let-bindings as high as possible in the program, possibly outside of kernels, for-loops, and while-loops.
  • returnify-kernels changes kernel syntax again. Instead of returning expressions, a "return value" argument is explicitly passed in as an argument, and the kernel repeatedly assigns to the return value instead.
  • make-vector-refs-explicit replaces all references to kernel arguments inside of kernel bodies with vector-refs to the source vector. (This could use an example).
  • annotate-free-vars walks the body of all kernels and adds an explicit annotation of all free-variables referenced inside of them.
  • insert-let-regions inserts a new primitive, let-region, inside of each function body, for-loop, and while-loop. This indicates a scoped region of memory, where things can be allocated.
  • infer-regions infers all variables onto a region. This is an understatement.
  • uglify-vectors replaces make-vector with an explicit alloc, replaces length with a region-reference to the field where a vector's length is stored, and adds an additional offset to vector-ref to account for the new addition of the vector-length field.
  • remove-let-regions explicitly creates and destroys a region based on its lexical scope.
  • flatten-lets unnests lets.
  • hoist-kernels moves all kernels to a special module at the top of the program, replacing them with a new apply-kernel primitive that applies a kernel's unique identifier to its arguments.
  • generate-kernel-calls turns apply-kernel into calls to runtime functions.
  • compile-module transforms the entire module into something more C-like and discards unnecessary annotations.
  • convert-types converts Harlan types into C-types.

Back End

Currently, harlan-format-c is the only pass in the back-end of the compiler, and it is responsible for turning s-expressions into a string of C++ with OpenCL.