-
Notifications
You must be signed in to change notification settings - Fork 7
Intermediate Representation
Bernard Teo edited this page Apr 30, 2020
·
1 revision
The main thing in the IR is a struct representation of the program, stored in a way that makes it easier to do optimisations. These structs are defined in lib.rs
.
It has a tree structure much like ESTree. However, there are many differences, such as:
- All variables are represented with the location it should be read/written to (i.e. local#1.var#1, or local#1.var#1.var#1, or global#1). Variable names are not stored (TODO: in the future there should be variable names too, but only for displaying compilation errors).
- Parameters come before local variables. (So if a function as two parameters, then local#0 is the first parameter (usually the closure, if this function has a closure) and local#1 is the second parameter. local#2 onwards are the actual local variables.)
- Locals have block scope (i.e. their lifetime is tied to a block statement). A block statement is a list of locals and a sequence of statements.
- There are no constant/variable declarations — the frontend should have hoisted them to the top of the current block statement, and converted the corresponding constant/variable declaration to an assignment statement. There is no behavioural difference between
const
andlet
in the IR. It is expected that in the future, there would be a warning that detects whenlet
variables could have beenconst
(as this kind of analysis would be required by either the frontend or the IR to decide if a variable needs to be placed in a synthesised struct (see the point that immediately follows)). - This might be changed to a finer-grained lifetime measure, in order to better determine whether a variable needs to go onto the GC roots stack.
- There are no constant/variable declarations — the frontend should have hoisted them to the top of the current block statement, and converted the corresponding constant/variable declaration to an assignment statement. There is no behavioural difference between
- Note that the locals here do not necessary map one-to-one to those in the ESTree. The frontend might synthesise a struct to contain locals in the same block if these locals are needed by an inner function. The frontend is allowed to synthesise structs, eliminate variables, or do anything else that it sees fit (but such things are not implemented in the frontend yet).
- All expressions have a type, which is set to Any by the frontend. Optimisation passes might assign more specific types to the variables and expressions in the IR.
- All functions are top level and don't have closures (the closure is just treated as another parameter). This is something like C. However, the Function primitive contains a function pointer and a closure, packed into a single primitive. There is a special expression variant (called
ExprKind::Appl
) that, given a Function primitive and a list of n arguments, will unpack Function primitive and call the function pointer with (n + 1) arguments — the first argument is the closure parameter, and the remaining arguments are the original list of arguments. - Top-level code is put into a separate function, which is set as the "entry point" — this is the function that will be invoked directly by the host (JavaScript).
- There is global variable support in the IR, but the backend does not currently support it and consequently the frontend does not use it. (TODO: Variables in the top-level should be global variables.)
The IR has some issues that should be fixed before undertaking any other changes:
- Expressions and statements should be unified so that we can eliminate the
ExprKind::Sequence
abomination and changeTypeOf
toTypeIfStatement
that would automatically declare a new local of the correct type (see the Backend for a bit more details). We would also be able to unify if-statements and conditional expressions. Note that we will not need a special type to represent statements that are not expressions (e.g. declarations) &mdash the frontend will already ensure that they are not embedded in a expression, so for the IR it is as if it just returns theundefined
value (which is a zero-sized type anyway, so nothing will be emitted by the backend for it). - The
Statement::Void
variant of statement, which represents a statement that never returns, should never have existed. There's duplication of a bunch of things due to this. Instead, the return type of an expression should beOption<VarType>
, where theNone
variant means that the expression never returns. (In an expression that never returns, we do not allocate space for the return value, but instead we emit anunreachable
instruction in the backend. Knowing that a function never returns also might help with some optimisations. It could also be used to warn on unreachable code.) - Builtin operators (e.g. + - * /) are currently encoded as calls to pregenerated functions. This is bad in a number of ways: we lose error information and can't report the error at the call site, it is slower due to the function call overhead, and it's also an awkward abstraction. Instead, the primitive instruction should be embedded directly in the calling function.
Other less-important issues:
- Functions should be annotated with whether they might allocate, to aid the backend in deciding whether it needs to prepare for the possibility of the garbage collector being executed during a function call.
- How to implement overloaded functions (e.g.
display()
) properly? - How to
display()
a function? A pair? - Should pairs be implemented with structs (much much nicer and more efficient, but breaks compatibility with default Source §2), or arrays (compatible with Source §2)? Personally I'm strongly for implementing pairs with structs. In well-written Source you're strongly discouraged from utilising the fact that pairs are actually arrays with two elements anyway, so we might as well enforce it (and in the process, get some efficiency gains).
- Encoding source information (file, line, column) in the IR, so that the backend can embed them in the resulting WebAssembly binary when a narrowing cast (or otherwise potentially failable operation) is performed. Then we can get nicer error messages at runtime.