- Feature Name: N/A
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

Introduce a "mid-level IR" (MIR) into the compiler. The MIR desugars
most of Rust's surface representation, leaving a simpler form that is
well-suited to type-checking and translation.

# Motivation

The current compiler uses a single AST from the initial parse all the
way to the final generation of LLVM IR. While this has some advantages,
there are also a number of distinct downsides.

1. The complexity of the compiler is increased because all passes must
   be written against the full Rust language, rather than being able
   to consider a reduced subset. The MIR proposed here is *radically*
   simpler than the surface Rust syntax; for example, it contains no
   `match` statements, and converts both `ref` bindings and `&`
   expressions into a single form.

   a. There are numerous examples of "desugaring" in Rust. In
      principle, desugaring one language feature into another should
      make the compiler *simpler*, but in our current implementation,
      it tends to make things more complex, because every phase must
      simulate the desugaring anew. The most prominent example is
      closure expressions (`|| ...`), which desugar to a fresh struct
      instance, but other examples abound: `for` loops, `if let` and
      `while let`, `box` expressions, overloaded operators (which
      desugar to method calls), and method calls (which desugar to UFCS
      notation). There are also a number of features (such as `box`
      patterns) which are almost infeasible to implement today but
      which should be nearly trivial given a MIR representation.

2. Reasoning about fine-grained control-flow in an AST is rather
   difficult. The right tool for this job is a control-flow graph
   (CFG). We currently construct a CFG that lives "on top" of the AST,
   which allows the borrow checking code to be flow sensitive, but it
   is awkward to work with. Worse, because this CFG is not used by
   trans, it is not necessarily the case that the control-flow as seen
   by the analyses corresponds to the code that will be generated.
   The MIR is based on a CFG, resolving this situation.

3. The reliability of safety analyses is reduced because the gap
   between what is being analyzed (the AST) and what is being executed
   (LLVM bitcode) is very wide. The MIR is very low-level and hence the
   translation to LLVM should be straightforward.

4. The reliability of safety proofs, when we have some, would be
   reduced because the formal language we are modeling is so far from
   the full compiler AST. The MIR is simple enough that it should be
   possible to (eventually) make safety proofs based on the MIR
   itself.

5. Rust-specific optimizations, and optimizing trans output, are very
   challenging. There are numerous cases where it would be nice to be
   able to do optimizations *before* translating to LLVM
   bitcode. Currently, we are forced to do these optimizations as part
   of lowering to bitcode, which can get quite complex. Having an
   intermediate form improves the situation because:

   a. In some cases, we can do the optimizations in the MIR itself
      before translation.

   b. In other cases, we can do analyses on the MIR to easily
      determine when the optimization would be safe.

   c. Finally, because the MIR is so much closer to LLVM bitcode, the
      complexity of trans is greatly reduced, and so it is easier to
      manage a more optimized translation.

6. Migrating away from LLVM is nearly impossible. It would be nice to
   provide a choice of backends. Currently, though, this is infeasible,
   since so much of the semantics of Rust itself are embedded in the
   `trans` step which converts to LLVM IR. Under the MIR design, those
   semantics are instead described in the translation from AST to MIR,
   and the LLVM step itself simply applies optimizations.

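To make points 1 and 2 concrete, here is a toy control-flow graph for a function like `fn abs(x: i64) -> i64 { if x < 0 { -x } else { x } }`. It is a minimal sketch under our own assumptions: the types `BasicBlock`, `Assign`, `Rvalue`, and `Terminator`, the numbering of locals, and the `run` interpreter are all hypothetical and are not taken from the prototype's `repr` module. What it shows is that the surface `if` disappears into an `If` terminator between basic blocks, so every consumer of the function walks the same explicit graph.

```rust
/// Index of a local slot. In this sketch, `_0` holds the return value,
/// `_1` the argument, and higher indices are temporaries (an assumption).
type Local = usize;
/// Index of a basic block in the function's block list.
type BlockId = usize;

/// Right-hand side of an assignment; every operand is already a local.
#[derive(Clone, Copy)]
enum Rvalue {
    Use(Local),
    Neg(Local),
    /// `local < constant`, stored as 1 (true) or 0 (false).
    LtConst(Local, i64),
}

#[derive(Clone, Copy)]
struct Assign {
    dest: Local,
    value: Rvalue,
}

/// How control leaves a block: the only place where branching happens.
#[derive(Clone, Copy)]
enum Terminator {
    Goto(BlockId),
    If { cond: Local, then_bb: BlockId, else_bb: BlockId },
    Return,
}

struct BasicBlock {
    statements: Vec<Assign>,
    terminator: Terminator,
}

/// Hand-lowered CFG for `fn abs(x: i64) -> i64 { if x < 0 { -x } else { x } }`.
fn abs_cfg() -> Vec<BasicBlock> {
    vec![
        // bb0: _2 = _1 < 0; if _2 goto bb1 else goto bb2
        BasicBlock {
            statements: vec![Assign { dest: 2, value: Rvalue::LtConst(1, 0) }],
            terminator: Terminator::If { cond: 2, then_bb: 1, else_bb: 2 },
        },
        // bb1: _0 = -_1; goto bb3
        BasicBlock {
            statements: vec![Assign { dest: 0, value: Rvalue::Neg(1) }],
            terminator: Terminator::Goto(3),
        },
        // bb2: _0 = _1; goto bb3
        BasicBlock {
            statements: vec![Assign { dest: 0, value: Rvalue::Use(1) }],
            terminator: Terminator::Goto(3),
        },
        // bb3: return _0
        BasicBlock { statements: vec![], terminator: Terminator::Return },
    ]
}

/// Executes the CFG directly, standing in for a backend that walks the same graph.
fn run(blocks: &[BasicBlock], arg: i64) -> i64 {
    let mut locals = vec![0i64; 3]; // _0 (return), _1 (arg), _2 (temp)
    locals[1] = arg;
    let mut bb: BlockId = 0;
    loop {
        let block = &blocks[bb];
        for stmt in &block.statements {
            let value = match stmt.value {
                Rvalue::Use(l) => locals[l],
                Rvalue::Neg(l) => -locals[l],
                Rvalue::LtConst(l, c) => (locals[l] < c) as i64,
            };
            locals[stmt.dest] = value;
        }
        bb = match block.terminator {
            Terminator::Goto(target) => target,
            Terminator::If { cond, then_bb, else_bb } => {
                if locals[cond] != 0 { then_bb } else { else_bb }
            }
            Terminator::Return => return locals[0],
        };
    }
}

fn main() {
    let cfg = abs_cfg();
    for x in [-5, 0, 3] {
        println!("abs({}) = {}", x, run(&cfg, x));
    }
}
```

Here `run` plays the role of trans: it consumes exactly the blocks and edges that a flow-sensitive analysis would inspect, which is the property point 2 asks for.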

# Detailed design

### Prototype

The MIR design being described here [has been prototyped][proto-crate]
and can be viewed in the `nikomatsakis` repository on GitHub. In
particular, [the `repr` module][repr] defines the MIR representation,
and [the `build` module][build] contains the code to create a MIR
representation from an AST-like form.

For increased flexibility, as well as to make the code simpler, the
prototype is not coded directly against the compiler's AST, but rather
against an idealized representation defined by [the `HIR` trait][hir].
The `HIR` trait contains a number of opaque associated types for the
various aspects of the compiler. For example,
[the type `H::Expr`][hirexpr] represents an expression. To find out
what kind of expression it is, one calls [the `mirror` method][mirror],
which converts an `H::Expr` into an [`Expr<H>` mirror][expr]. This
mirror then contains [embedded `ExprRef<H>` nodes][exprref] to refer
to further subexpressions; these may either be mirrors themselves, or
else additional `H::Expr` nodes. This allows the tree that is exported
to differ in small ways from the actual tree within the compiler; the
primary intention is to use this to model "adjustments" like
autoderef.

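The mirroring scheme can be illustrated with a small, self-contained Rust sketch. This is an illustration under our own assumptions, not the prototype's code: the trait name `HIR` and the names `H::Expr`, `mirror`, `Expr<H>`, and `ExprRef<H>` follow the text, but their exact signatures, the variant names (`Opaque`, `Mirror`, `Deref`, and so on), the toy AST `Ast`, the `ToyCompiler` type, and the `render` walker are hypothetical. The toy compiler records which variables are auto-dereferenced; its AST has no dereference node for them, but the mirror it hands out makes that adjustment explicit.

```rust
use std::collections::HashSet;

/// Sketch of the interface the MIR builder codes against (signatures assumed).
trait HIR: Sized {
    /// Opaque expression handle owned by the compiler (`H::Expr` in the text).
    type Expr;
    /// Exposes one level of an opaque expression as an inspectable mirror.
    fn mirror(&self, expr: Self::Expr) -> Expr<Self>;
}

/// A subexpression: either still opaque or already mirrored.
enum ExprRef<H: HIR> {
    Opaque(H::Expr),
    Mirror(Box<Expr<H>>),
}

/// A (drastically simplified) mirror of one expression node.
enum Expr<H: HIR> {
    Lit(i64),
    Var(String),
    Add(ExprRef<H>, ExprRef<H>),
    /// A dereference that exists only as an adjustment, not in the AST.
    Deref(ExprRef<H>),
}

/// Hypothetical compiler-internal AST: note that it has no `Deref` variant.
enum Ast {
    Lit(i64),
    Var(String),
    Add(Box<Ast>, Box<Ast>),
}

/// Hypothetical compiler state: variables whose uses need autoderef.
struct ToyCompiler {
    autoderef_vars: HashSet<String>,
}

impl HIR for ToyCompiler {
    type Expr = Ast;

    fn mirror(&self, expr: Ast) -> Expr<Self> {
        match expr {
            Ast::Lit(n) => Expr::Lit(n),
            // Materialize the autoderef adjustment as an explicit `Deref` node.
            Ast::Var(name) if self.autoderef_vars.contains(&name) => {
                Expr::Deref(ExprRef::Mirror(Box::new(Expr::Var(name))))
            }
            Ast::Var(name) => Expr::Var(name),
            // Children stay opaque until a consumer asks for them.
            Ast::Add(lhs, rhs) => Expr::Add(ExprRef::Opaque(*lhs), ExprRef::Opaque(*rhs)),
        }
    }
}

/// A consumer that only ever sees mirrors, calling `mirror` on demand.
fn render<H: HIR>(hir: &H, expr: ExprRef<H>) -> String {
    let node = match expr {
        ExprRef::Opaque(raw) => hir.mirror(raw),
        ExprRef::Mirror(boxed) => *boxed,
    };
    match node {
        Expr::Lit(n) => n.to_string(),
        Expr::Var(name) => name,
        Expr::Add(lhs, rhs) => format!("({} + {})", render(hir, lhs), render(hir, rhs)),
        Expr::Deref(inner) => format!("*{}", render(hir, inner)),
    }
}

fn main() {
    let hir = ToyCompiler { autoderef_vars: HashSet::from(["r".to_string()]) };
    // AST for `r + 1`, where `r` is a reference that gets autoderef'd.
    let ast = Ast::Add(Box::new(Ast::Var("r".to_string())), Box::new(Ast::Lit(1)));
    println!("{}", render(&hir, ExprRef::Opaque(ast))); // prints "(*r + 1)"
}
```

In this sketch, `ExprRef::Opaque` defers each call to `mirror`, so a consumer mirrors only the nodes it actually visits, and the compiler's internal `Ast` never has to change to expose the extra `Deref`.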

Note that the HIR mirroring system is an experiment and not really
part of the MIR itself. It does, however, present an interesting option
for (eventually) stabilizing access to the compiler's internals.

[proto-crate]: https://github.com/nikomatsakis/rust/tree/mir/src/librustc_mir
[repr]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/repr.rs
[build]: https://github.com/nikomatsakis/rust/tree/mir/src/librustc_mir/build
[hir]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs
[hirexpr]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L28
[mirror]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L32-L35
[expr]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L111-L161
[exprref]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L163-L167

# Drawbacks