Commit 5fd3035

committed
Initial commit
1 parent 2604c80 commit 5fd3035

2 files changed

+891
-6
lines changed

0000-template.md

+102-6
@@ -1,21 +1,117 @@
1-
- Feature Name: (fill me in with a unique ident, my_awesome_feature)
1+
- Feature Name: N/A
22
- Start Date: (fill me in with today's date, YYYY-MM-DD)
33
- RFC PR: (leave this empty)
44
- Rust Issue: (leave this empty)
55

66
# Summary
77

8-
One para explanation of the feature.
8+
Introduce a "mid-level IR" (MIR) into the compiler. The MIR desugars
9+
most of Rust's surface representation, leaving a simpler form that is
10+
well-suited to type-checking and translation.
911

1012
# Motivation
1113

12-
Why are we doing this? What use cases does it support? What is the expected outcome?
14+
The current compiler uses a single AST from the initial parse all the
15+
way to the final generation of LLVM IR. While this has some advantages,
16+
there are also a number of distinct downsides.
17+
18+
1. The complexity of the compiler is increased because all passes must
19+
be written against the full Rust language, rather than being able
20+
to consider a reduced subset. The MIR proposed here is *radically*
21+
simpler than the surface Rust syntax -- for example, it contains no
22+
"match" statements, and converts both `ref` bindings and `&`
23+
expressions into a single form.
24+
25+
a. There are numerous examples of "desugaring" in Rust. In
26+
principle, desugaring one language feature into another should
27+
make the compiler *simpler*, but in our current implementation,
28+
it tends to make things more complex, because every phase must
29+
simulate the desugaring anew. The most prominent examples are
30+
closure expressions (`|| ...`), which desugar to a fresh struct
31+
instance, but other examples abound: `for` loops, `if let` and
32+
`while let`, `box` expressions, overloaded operators (which
33+
desugar to method calls), method calls (which desugar to UFCS
34+
notation). There are also a number of features (such as `box`
35+
patterns) which are almost infeasible to implement today but
36+
which should be nearly trivial given a MIR representation.
37+
38+
2. Reasoning about fine-grained control-flow in an AST is rather
39+
difficult. The right tool for this job is a control-flow graph
40+
(CFG). We currently construct a CFG that lives "on top" of the AST,
41+
which allows the borrow checking code to be flow sensitive, but it
42+
is awkward to work with. Worse, because this CFG is not used by
43+
trans, it is not necessarily the case that the control-flow as seen
44+
by the analyses corresponds to the code that will be generated.
45+
The MIR is based on a CFG, resolving this situation.
46+
47+
3. The reliability of safety analyses is reduced because the gap
48+
between what is being analyzed (the AST) and what is being executed
49+
(LLVM bitcode) is very wide. The MIR is very low-level and hence the
50+
translation to LLVM should be straightforward.
51+
52+
4. The reliability of safety proofs, when we have some, would be
53+
reduced because the formal language we are modeling is so far from
54+
the full compiler AST. The MIR is simple enough that it should be
55+
possible to (eventually) make safety proofs based on the MIR
56+
itself.
57+
58+
5. Rust-specific optimizations, and optimizing trans output, are very
59+
challenging. There are numerous cases where it would be nice to be
60+
able to do optimizations *before* translating to LLVM
61+
bitcode. Currently, we are forced to do these optimizations as part
62+
of lowering to bitcode, which can get quite complex. Having an intermediate
63+
form improves the situation because:
64+
65+
a. In some cases, we can do the optimizations in the MIR itself before translation.
66+
b. In other cases, we can do analyses on the MIR to easily determine when the optimization
67+
would be safe.
68+
c. Finally, because the MIR is so much closer to LLVM bitcode, the complexity of trans
69+
is greatly reduced, and so it is easier to manage a more optimized translation.
70+
71+
6. Migrating away from LLVM is nearly impossible. It would be nice to
72+
provide a choice of backends. Currently, though, this is infeasible,
73+
since so much of the semantics of Rust itself are embedded in the
74+
`trans` step which converts to LLVM IR. Under the MIR design, those
75+
semantics are instead described in the translation from AST to MIR,
76+
and the LLVM step itself simply applies optimizations.
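
As a small illustration of the desugaring mentioned in point 1a, the
sketch below writes a `for` loop by hand in roughly its desugared form
(a `loop` over an explicit iterator with a `match` on `next()`). The
function names are hypothetical, and the exact expansion used by the
compiler is not specified by this RFC; this is only meant to show the
kind of rewriting that each pass must otherwise simulate.

```rust
/// Sums a slice using the surface `for` syntax.
pub fn sum_with_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

/// The same loop written out by hand in an approximately desugared form.
/// (Illustrative only; not the compiler's literal expansion.)
pub fn sum_desugared(values: &[i32]) -> i32 {
    let mut total = 0;
    match IntoIterator::into_iter(values) {
        mut iter => loop {
            match iter.next() {
                Some(v) => {
                    total += *v;
                }
                None => break,
            }
        },
    }
    total
}
```

Both functions compute the same result; a pass written against the
second form only needs to understand `loop`, `match`, and calls.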
1377

1478
# Detailed design
1579

16-
This is the bulk of the RFC. Explain the design in enough detail for somebody familiar
17-
with the language to understand, and for somebody familiar with the compiler to implement.
18-
This should get into specifics and corner-cases, and include examples of how the feature is used.
80+
### Prototype
81+
82+
The MIR design being described here [has been prototyped][proto-crate]
83+
and can be viewed in the `nikomatsakis` repository on GitHub. In
84+
particular, [the `repr` module][repr] defines the MIR representation,
85+
and [the `build` module][build] contains the code to create a MIR
86+
representation from an AST-like form.
87+
88+
For increased flexibility, as well as to make the code simpler, the
89+
prototype is not coded directly against the compiler's AST, but rather
90+
against an idealized representation defined by [the `HIR` trait][hir].
91+
The `HIR` trait contains a number of opaque associated types for the
92+
various aspects of the compiler. For example,
93+
[the type `H::Expr`][hirexpr] represents an expression. In order to
94+
find out what kind of expression it is, [the `mirror` method][mirror] is called,
95+
which converts an `H::Expr` into an [`Expr<H>` mirror][expr]. This
96+
mirror then contains [embedded `ExprRef<H>` nodes][exprref] to refer
97+
to further subexpressions; these may either be mirrors themselves, or
98+
else they may be additional `H::Expr` nodes. This allows the tree that
99+
is exported to differ in small ways from the actual tree within the
100+
compiler; the primary intention is to use this to model "adjustments"
101+
like autoderef.
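
The mirroring idea can be sketched roughly as follows. This is a
simplified illustration, not the prototype's actual code: only the
names `HIR`, `Expr`, `ExprRef`, and `mirror` come from the text above,
while the variants (`Hair`, `Mirror`, `Literal`, `Add`, `Deref`), the
`Toy` compiler, and the `render` helper are assumptions made up for
this example. It shows an implicit autoderef adjustment being exposed
as an explicit `Deref` node when an expression is mirrored.

```rust
/// Abstracts over a compiler's internal tree via opaque associated types.
pub trait HIR: Sized {
    /// Opaque handle to an expression inside the compiler.
    type Expr: Clone;
    /// Exposes one level of structure for `expr`.
    fn mirror(&self, expr: &Self::Expr) -> Expr<Self>;
}

/// A subexpression: either a not-yet-mirrored node or an existing mirror.
pub enum ExprRef<H: HIR> {
    Hair(H::Expr),
    Mirror(Box<Expr<H>>),
}

/// The mirrored view of an expression (hypothetical, heavily reduced).
pub enum Expr<H: HIR> {
    Literal(i64),
    Add(ExprRef<H>, ExprRef<H>),
    Deref(ExprRef<H>),
}

/// A toy "compiler tree" that records autoderefs as a side adjustment.
#[derive(Clone)]
pub struct ToyExpr {
    pub kind: ToyKind,
    pub autoderefs: usize,
}

#[derive(Clone)]
pub enum ToyKind {
    Lit(i64),
    Add(Box<ToyExpr>, Box<ToyExpr>),
}

pub struct Toy;

impl HIR for Toy {
    type Expr = ToyExpr;

    fn mirror(&self, expr: &ToyExpr) -> Expr<Toy> {
        if expr.autoderefs > 0 {
            // Peel one implicit adjustment off as an explicit `Deref` node.
            let inner = ToyExpr {
                kind: expr.kind.clone(),
                autoderefs: expr.autoderefs - 1,
            };
            return Expr::Deref(ExprRef::Hair(inner));
        }
        match &expr.kind {
            ToyKind::Lit(n) => Expr::Literal(*n),
            ToyKind::Add(l, r) => Expr::Add(
                ExprRef::Hair((**l).clone()),
                ExprRef::Hair((**r).clone()),
            ),
        }
    }
}

/// Walks an `ExprRef`, mirroring on demand, and renders it as text.
pub fn render<H: HIR>(hir: &H, e: &ExprRef<H>) -> String {
    let owned;
    let expr: &Expr<H> = match e {
        ExprRef::Hair(h) => {
            owned = hir.mirror(h);
            &owned
        }
        ExprRef::Mirror(m) => &**m,
    };
    match expr {
        Expr::Literal(n) => n.to_string(),
        Expr::Add(l, r) => format!("({} + {})", render(hir, l), render(hir, r)),
        Expr::Deref(inner) => format!("*{}", render(hir, inner)),
    }
}
```

In this sketch the consumer (`render`) never inspects `ToyExpr`
directly; it only sees the exported `Expr<H>` tree, which is why the
exported tree can differ from the compiler's own.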
102+
103+
Note that the HIR mirroring system is an experiment and not really
104+
part of the MIR itself. It does however present an interesting option
105+
for (eventually) stabilizing access to the compiler's internals.
106+
107+
[proto-crate]: https://github.com/nikomatsakis/rust/tree/mir/src/librustc_mir
108+
[repr]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/repr.rs
109+
[build]: https://github.com/nikomatsakis/rust/tree/mir/src/librustc_mir/build
110+
[hir]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs
111+
[hirexpr]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L28
112+
[mirror]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L32-L35
113+
[expr]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L111-L161
114+
[exprref]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs#L163-L167
19115

20116
# Drawbacks
21117
