
Commit f60bc3a

committedSep 19, 2017
Auto merge of #44505 - nikomatsakis:lotsa-comments, r=steveklabnik
rework the README.md for rustc and add other readmes

OK, so, long ago I committed to the idea of trying to write some high-level documentation for rustc. This has proved to be much harder for me to get done than I thought it would! This PR is far from as complete as I had hoped, but I wanted to open it so that people can give me feedback on the conventions that it establishes. If this seems like a good way forward, we can land it and I will open an issue with a good check-list of things to write (and try to take down some of them myself).

Here are the conventions I established on which I would like feedback.

**Use README.md files**. First off, I'm aiming to keep most of the high-level docs in `README.md` files, rather than entries on forge. My thought is that such files are (a) more discoverable than forge and (b) closer to the code, and hence can be edited in a single PR. However, since they are not *in the code*, they will naturally get out of date, so the intention is to focus on the highest-level details, which are least likely to bitrot. I've included a few examples of common functions and so forth, but never tried to (e.g.) exhaustively list the names of functions and so forth.

- I would like to use the tidy scripts to try and check that these do not go out of date. Future work.

**librustc/README.md as the main entrypoint.** This seems like the most natural place people will look first. It lays out how the crates are structured and **is intended** to give pointers to the main data structures of the compiler (I didn't update that yet; the existing material is terribly dated).

**A glossary listing abbreviations and things.** It's much harder to read code if you don't know what some obscure set of letters like `infcx` stands for.

**Major modules each have their own README.md that documents the high-level idea.** For example, I wrote some stuff about `hir` and `ty`. Both of them have many missing topics, but I think that is roughly the level of depth that would be good. The idea is to give people a "feeling" for what the code does.

What is missing primarily here is lots of content. =) Here are some things I'd like to see:

- A description of what a QUERY is and how to define one
- Some comments for `librustc/ty/maps.rs`
- An overview of how compilation proceeds now (i.e., the hybrid demand-driven and forward model) and how we would like to see it going in the future (all demand-driven)
- Some coverage of how incremental will work under red-green
- An updated list of the major IRs in use of the compiler (AST, HIR, TypeckTables, MIR) and major bits of interesting code (typeck, borrowck, etc)
- More advice on how to use `x.py`, or at least pointers to that
- Good choice for `config.toml`
- How to use `RUST_LOG` and other debugging flags (e.g., `-Zverbose`, `-Ztreat-err-as-bug`)
- Helpful conventions for `debug!` statement formatting

cc @rust-lang/compiler @mgattozzi
2 parents 325ba23 + 638958b commit f60bc3a

File tree

20 files changed

+2571
-1757
lines changed


‎src/librustc/README.md

+185-156
Large diffs are not rendered by default.

‎src/librustc/hir/README.md

+119
@@ -0,0 +1,119 @@
1+
# Introduction to the HIR
2+
3+
The HIR -- "High-level IR" -- is the primary IR used in most of
4+
rustc. It is a desugared version of the "abstract syntax tree" (AST)
5+
that is generated after parsing, macro expansion, and name resolution
6+
have completed. Many parts of HIR resemble Rust surface syntax quite
7+
closely, with the exception that some of Rust's expression forms have
8+
been desugared away (as an example, `for` loops are converted into a
9+
`loop` and do not appear in the HIR).
10+
11+
This README covers the main concepts of the HIR.
12+
13+
### Out-of-band storage and the `Crate` type
14+
15+
The top-level data-structure in the HIR is the `Crate`, which stores
16+
the contents of the crate currently being compiled (we only ever
17+
construct HIR for the current crate). Whereas in the AST the crate
18+
data structure basically just contains the root module, the HIR
19+
`Crate` structure contains a number of maps and other things that
20+
serve to organize the content of the crate for easier access.
21+
22+
For example, the contents of individual items (e.g., modules,
23+
functions, traits, impls, etc) in the HIR are not immediately
24+
accessible in the parents. So, for example, if you had a module item `foo`
25+
containing a function `bar()`:
26+
27+
```
28+
mod foo {
29+
fn bar() { }
30+
}
31+
```
32+
33+
Then in the HIR the representation of module `foo` (the `Mod`
34+
struct) would have only the **`ItemId`** `I` of `bar()`. To get the
35+
details of the function `bar()`, we would lookup `I` in the
36+
`items` map.
37+
38+
One nice result from this representation is that one can iterate
39+
over all items in the crate by iterating over the key-value pairs
40+
in these maps (without the need to trawl through the IR in total).
41+
There are similar maps for things like trait items and impl items,
42+
as well as "bodies" (explained below).
43+
44+
The other reason to set up the representation this way is for better
45+
integration with incremental compilation. This way, if you gain access
46+
to a `&hir::Item` (e.g. for the mod `foo`), you do not immediately
47+
gain access to the contents of the function `bar()`. Instead, you only
48+
gain access to the **id** for `bar()`, and you must invoke some
49+
function to lookup the contents of `bar()` given its id; this gives us
50+
a chance to observe that you accessed the data for `bar()` and record
51+
the dependency.
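
The out-of-band layout described above can be modelled in a few lines. This is only a toy sketch, not the compiler's real data structures: `ToyCrate`, `Item`, the `reads` log, and `build_example` are hypothetical names invented for illustration, and the real `hir::Crate` holds far more than a single map.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

/// Hypothetical stand-in for the HIR `ItemId`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ItemId(u32);

/// Toy item: a module stores only the ids of the items it contains.
#[derive(Debug)]
enum Item {
    Mod { name: &'static str, item_ids: Vec<ItemId> },
    Fn { name: &'static str },
}

/// Toy crate: all items live in one flat, out-of-band map.
struct ToyCrate {
    items: BTreeMap<ItemId, Item>,
    /// Every id passed to `item()`, in order; a stand-in for dependency tracking.
    reads: RefCell<Vec<ItemId>>,
}

impl ToyCrate {
    /// Look up an item by id, recording the access first.
    fn item(&self, id: ItemId) -> &Item {
        self.reads.borrow_mut().push(id);
        &self.items[&id]
    }
}

/// Builds the `mod foo { fn bar() { } }` example from the text.
fn build_example() -> ToyCrate {
    let mut items = BTreeMap::new();
    items.insert(ItemId(0), Item::Mod { name: "foo", item_ids: vec![ItemId(1)] });
    items.insert(ItemId(1), Item::Fn { name: "bar" });
    ToyCrate { items, reads: RefCell::new(Vec::new()) }
}

fn main() {
    let krate = build_example();
    // Holding the module gives us only the id of `bar`, not its contents.
    if let Item::Mod { name, item_ids } = krate.item(ItemId(0)) {
        println!("mod {} contains {:?}", name, item_ids);
        for &id in item_ids {
            // The explicit lookup is the point where the access can be observed.
            println!("  child: {:?}", krate.item(id));
        }
    }
    // Visiting every item needs no tree walk, just a pass over the map.
    for (id, item) in &krate.items {
        println!("{:?} => {:?}", id, item);
    }
    println!("recorded reads: {:?}", krate.reads.borrow());
}
```

Because every lookup goes through `item()`, the toy crate can log which items were touched, which is the same hook the text describes for recording dependencies.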
52+
53+
### Identifiers in the HIR
54+
55+
Most of the code that has to deal with things in HIR tends not to
56+
carry around references into the HIR, but rather to carry around
57+
*identifier numbers* (or just "ids"). Right now, you will find four
58+
sorts of identifiers in active use:
59+
60+
- `DefId`, which primarily names "definitions" or top-level items.
61+
- You can think of a `DefId` as being shorthand for a very explicit
62+
and complete path, like `std::collections::HashMap`. However,
63+
these paths are able to name things that are not nameable in
64+
normal Rust (e.g., impls), and they also include extra information
65+
about the crate (such as its version number, as two versions of
66+
the same crate can co-exist).
67+
- A `DefId` really consists of two parts, a `CrateNum` (which
68+
identifies the crate) and a `DefIndex` (which indexes into a list
69+
of items that is maintained per crate).
70+
- `HirId`, which combines the index of a particular item with an
71+
offset within that item.
72+
- The key point of a `HirId` is that it is *relative* to some item (which is named
73+
via a `DefId`).
74+
- `BodyId`, which is an absolute identifier that refers to a specific
75+
body (definition of a function or constant) in the crate. It is currently
76+
effectively a "newtype'd" `NodeId`.
77+
- `NodeId`, which is an absolute id that identifies a single node in the HIR tree.
78+
- While these are still in common use, **they are being slowly phased out**.
79+
- Since they are absolute within the crate, adding a new node
80+
anywhere in the tree causes the node-ids of all subsequent code in
81+
the crate to change. This is terrible for incremental compilation,
82+
as you can perhaps imagine.
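
To see why absolute ids are fragile while item-relative ids are not, consider the following toy model. It is a sketch under simplifying assumptions: `Item`, `LocalId`, `absolute_id`, `relative_id`, and `sample` are hypothetical names, nodes are plain labels, and the owner is identified by its position in a list rather than by a real `DefId`.

```rust
/// Toy model: each item owns a list of nodes (just labels here).
struct Item {
    name: &'static str,
    nodes: Vec<&'static str>,
}

/// Hypothetical analogue of `HirId`: an owning item plus an offset inside it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct LocalId {
    owner: usize,
    local: usize,
}

/// Absolute numbering: count every node in crate order (in the spirit of `NodeId`).
fn absolute_id(items: &[Item], owner: &str, label: &str) -> Option<usize> {
    let mut next = 0;
    for item in items {
        for node in &item.nodes {
            if item.name == owner && *node == label {
                return Some(next);
            }
            next += 1;
        }
    }
    None
}

/// Relative numbering: position of the node within its owning item.
fn relative_id(items: &[Item], owner: &str, label: &str) -> Option<LocalId> {
    let owner_index = items.iter().position(|item| item.name == owner)?;
    let local = items[owner_index].nodes.iter().position(|n| *n == label)?;
    Some(LocalId { owner: owner_index, local })
}

/// Two items; optionally with one extra node added to the first item.
fn sample(extra_node_in_first: bool) -> Vec<Item> {
    let mut first_nodes = vec!["block", "call"];
    if extra_node_in_first {
        first_nodes.insert(0, "let");
    }
    vec![
        Item { name: "first", nodes: first_nodes },
        Item { name: "second", nodes: vec!["block", "ret"] },
    ]
}

fn main() {
    let before = sample(false);
    let after = sample(true);
    // Editing `first` shifts the absolute id of a node inside `second`...
    println!(
        "absolute: {:?} -> {:?}",
        absolute_id(&before, "second", "ret"),
        absolute_id(&after, "second", "ret")
    );
    // ...but leaves its item-relative id untouched.
    println!(
        "relative: {:?} -> {:?}",
        relative_id(&before, "second", "ret"),
        relative_id(&after, "second", "ret")
    );
}
```

The edit only touches the item `first`, yet every absolute id after it moves; anything cached under the old number for `second` would look stale even though `second` itself did not change.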
83+
84+
### HIR Map
85+
86+
Most of the time when you are working with the HIR, you will do so via
87+
the **HIR Map**, accessible in the tcx via `tcx.hir` (and defined in
88+
the `hir::map` module). The HIR map contains a number of methods to
89+
convert between ids of various kinds and to lookup data associated
90+
with a HIR node.
91+
92+
For example, if you have a `DefId`, and you would like to convert it
93+
to a `NodeId`, you can use `tcx.hir.as_local_node_id(def_id)`. This
94+
returns an `Option<NodeId>` -- this will be `None` if the def-id
95+
refers to something outside of the current crate (since then it has no
96+
HIR node), but otherwise returns `Some(n)` where `n` is the node-id of
97+
the definition.
98+
99+
Similarly, you can use `tcx.hir.find(n)` to lookup the node for a
100+
`NodeId`. This returns a `Option<Node<'tcx>>`, where `Node` is an enum
101+
defined in the map; by matching on this you can find out what sort of
102+
node the node-id referred to and also get a pointer to the data
103+
itself. Often, you know what sort of node `n` is -- e.g., if you know
104+
that `n` must be some HIR expression, you can do
105+
`tcx.hir.expect_expr(n)`, which will extract and return the
106+
`&hir::Expr`, panicking if `n` is not in fact an expression.
107+
108+
Finally, you can use the HIR map to find the parents of nodes, via
109+
calls like `tcx.hir.get_parent_node(n)`.
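
The shape of these lookups can be imitated with ordinary hash maps. The sketch below is a stand-alone toy, not the real `hir::map::Map`: the method names mirror the ones mentioned above, but the types, the `sample_map` helper, and the `Option` returned by the toy `get_parent_node` are simplifications assumed for illustration.

```rust
use std::collections::HashMap;

const LOCAL_CRATE: u32 = 0;

/// Toy `DefId`: a crate number plus an index within that crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DefId {
    krate: u32,
    index: u32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct NodeId(u32);

/// Toy `Node` enum: which sort of thing a node id refers to, plus its data.
#[derive(Debug, PartialEq)]
enum Node<'hir> {
    Item(&'hir str),
    Expr(&'hir str),
}

struct Map<'hir> {
    def_to_node: HashMap<u32, NodeId>,
    nodes: HashMap<NodeId, Node<'hir>>,
    parents: HashMap<NodeId, NodeId>,
}

impl<'hir> Map<'hir> {
    /// `None` for definitions from other crates: they have no local HIR node.
    fn as_local_node_id(&self, def_id: DefId) -> Option<NodeId> {
        if def_id.krate != LOCAL_CRATE {
            return None;
        }
        self.def_to_node.get(&def_id.index).copied()
    }

    /// Generic lookup; callers match on the returned `Node`.
    fn find(&self, id: NodeId) -> Option<&Node<'hir>> {
        self.nodes.get(&id)
    }

    /// Shortcut for callers that already know the node is an expression.
    fn expect_expr(&self, id: NodeId) -> &'hir str {
        match self.find(id) {
            Some(Node::Expr(expr)) => *expr,
            other => panic!("expected an expression, found {:?}", other),
        }
    }

    /// Toy parent lookup (returns `None` for nodes without a recorded parent).
    fn get_parent_node(&self, id: NodeId) -> Option<NodeId> {
        self.parents.get(&id).copied()
    }
}

/// One item (node 1, def index 7) containing one expression (node 2).
fn sample_map() -> Map<'static> {
    let mut map = Map {
        def_to_node: HashMap::new(),
        nodes: HashMap::new(),
        parents: HashMap::new(),
    };
    map.def_to_node.insert(7, NodeId(1));
    map.nodes.insert(NodeId(1), Node::Item("fn foo"));
    map.nodes.insert(NodeId(2), Node::Expr("x + y"));
    map.parents.insert(NodeId(2), NodeId(1));
    map
}

fn main() {
    let map = sample_map();
    let local = DefId { krate: LOCAL_CRATE, index: 7 };
    let external = DefId { krate: 3, index: 7 };
    println!("local def    -> {:?}", map.as_local_node_id(local));
    println!("external def -> {:?}", map.as_local_node_id(external));
    println!("find(2)      -> {:?}", map.find(NodeId(2)));
    println!("expect_expr  -> {}", map.expect_expr(NodeId(2)));
    println!("parent of 2  -> {:?}", map.get_parent_node(NodeId(2)));
}
```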
110+
111+
### HIR Bodies
112+
113+
A **body** represents some kind of executable code, such as the body
114+
of a function/closure or the definition of a constant. Bodies are
115+
associated with an **owner**, which is typically some kind of item
116+
(e.g., a `fn()` or `const`), but could also be a closure expression
117+
(e.g., `|x, y| x + y`). You can use the HIR map to find the body
118+
associated with a given def-id (`maybe_body_owned_by()`) or to find
119+
the owner of a body (`body_owner_def_id()`).

‎src/librustc/hir/map/README.md

+4
@@ -0,0 +1,4 @@
1+
The HIR map, accessible via `tcx.hir`, allows you to quickly navigate the
2+
HIR and convert between various forms of identifiers. See [the HIR README] for more information.
3+
4+
[the HIR README]: ../README.md

‎src/librustc/hir/mod.rs

+25-1
@@ -413,6 +413,10 @@ pub struct WhereEqPredicate {
413413

414414
pub type CrateConfig = HirVec<P<MetaItem>>;
415415

416+
/// The top-level data structure that stores the entire contents of
417+
/// the crate currently being compiled.
418+
///
419+
/// For more details, see [the module-level README](README.md).
416420
#[derive(Clone, PartialEq, Eq, RustcEncodable, RustcDecodable, Debug)]
417421
pub struct Crate {
418422
pub module: Mod,
@@ -927,7 +931,27 @@ pub struct BodyId {
927931
pub node_id: NodeId,
928932
}
929933

930-
/// The body of a function or constant value.
934+
/// The body of a function, closure, or constant value. In the case of
935+
/// a function, the body contains not only the function body itself
936+
/// (which is an expression), but also the argument patterns, since
937+
/// those are something that the caller doesn't really care about.
938+
///
939+
/// # Examples
940+
///
941+
/// ```
942+
/// fn foo((x, y): (u32, u32)) -> u32 {
943+
/// x + y
944+
/// }
945+
/// ```
946+
///
947+
/// Here, the `Body` associated with `foo()` would contain:
948+
///
949+
/// - an `arguments` array containing the `(x, y)` pattern
950+
/// - a `value` containing the `x + y` expression (maybe wrapped in a block)
951+
/// - `is_generator` would be false
952+
///
953+
/// All bodies have an **owner**, which can be accessed via the HIR
954+
/// map using `body_owner_def_id()`.
931955
#[derive(Clone, PartialEq, Eq, RustcEncodable, RustcDecodable, Hash, Debug)]
932956
pub struct Body {
933957
pub arguments: HirVec<Arg>,

‎src/librustc/lib.rs

+22-1
@@ -8,7 +8,28 @@
88
// option. This file may not be copied, modified, or distributed
99
// except according to those terms.
1010

11-
//! The Rust compiler.
11+
//! The "main crate" of the Rust compiler. This crate contains common
12+
//! type definitions that are used by the other crates in the rustc
13+
//! "family". Some prominent examples (note that each of these modules
14+
//! has its own README with further details).
15+
//!
16+
//! - **HIR.** The "high-level (H) intermediate representation (IR)" is
17+
//! defined in the `hir` module.
18+
//! - **MIR.** The "mid-level (M) intermediate representation (IR)" is
19+
//! defined in the `mir` module. This module contains only the
20+
//! *definition* of the MIR; the passes that transform and operate
21+
//! on MIR are found in the `librustc_mir` crate.
22+
//! - **Types.** The internal representation of types used in rustc is
23+
//! defined in the `ty` module. This includes the **type context**
24+
//! (or `tcx`), which is the central context during most of
25+
//! compilation, containing the interners and other things.
26+
//! - **Traits.** Trait resolution is implemented in the `traits` module.
27+
//! - **Type inference.** The type inference code can be found in the `infer` module;
28+
//! this code handles low-level equality and subtyping operations. The
29+
//! type check pass in the compiler is found in the `librustc_typeck` crate.
30+
//!
31+
//! For a deeper explanation of how the compiler works and is
32+
//! organized, see the README.md file in this directory.
1233
//!
1334
//! # Note
1435
//!

‎src/librustc/ty/README.md

+165
@@ -0,0 +1,165 @@
1+
# Types and the Type Context
2+
3+
The `ty` module defines how the Rust compiler represents types
4+
internally. It also defines the *typing context* (`tcx` or `TyCtxt`),
5+
which is the central data structure in the compiler.
6+
7+
## The tcx and how it uses lifetimes
8+
9+
The `tcx` ("typing context") is the central data structure in the
10+
compiler. It is the context that you use to perform all manner of
11+
queries. The struct `TyCtxt` defines a reference to this shared context:
12+
13+
```rust
14+
tcx: TyCtxt<'a, 'gcx, 'tcx>
15+
// -- ---- ----
16+
// | | |
17+
// | | innermost arena lifetime (if any)
18+
// | "global arena" lifetime
19+
// lifetime of this reference
20+
```
21+
22+
As you can see, the `TyCtxt` type takes three lifetime parameters.
23+
These lifetimes are perhaps the most complex thing to understand about
24+
the tcx. During Rust compilation, we allocate most of our memory in
25+
**arenas**, which are basically pools of memory that get freed all at
26+
once. When you see a reference with a lifetime like `'tcx` or `'gcx`,
27+
you know that it refers to arena-allocated data (or data that lives as
28+
long as the arenas, anyhow).
29+
30+
We use two distinct levels of arenas. The outer level is the "global
31+
arena". This arena lasts for the entire compilation: so anything you
32+
allocate in there is only freed once compilation is basically over
33+
(actually, when we shift to executing LLVM).
34+
35+
To reduce peak memory usage, when we do type inference, we also use an
36+
inner level of arena. These arenas get thrown away once type inference
37+
is over. This is done because type inference generates a lot of
38+
"throw-away" types that are not particularly interesting after type
39+
inference completes, so keeping around those allocations would be
40+
wasteful.
41+
42+
Often, we wish to write code that explicitly asserts that it is not
43+
taking place during inference. In that case, there is no "local"
44+
arena, and all the types that you can access are allocated in the
45+
global arena. To express this, the idea is to use the same lifetime
46+
for the `'gcx` and `'tcx` parameters of `TyCtxt`. Just to be a touch
47+
confusing, we tend to use the name `'tcx` in such contexts. Here is an
48+
example:
49+
50+
```rust
51+
fn not_in_inference<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>, def_id: DefId) {
52+
// ---- ----
53+
// Using the same lifetime here asserts
54+
// that the innermost arena accessible through
55+
// this reference *is* the global arena.
56+
}
57+
```
58+
59+
In contrast, if you want code that can be used during type inference, then you
60+
need to declare a distinct `'gcx` and `'tcx` lifetime parameter:
61+
62+
```rust
63+
fn maybe_in_inference<'a, 'gcx, 'tcx>(tcx: TyCtxt<'a, 'gcx, 'tcx>, def_id: DefId) {
64+
// ---- ----
65+
// Using different lifetimes here means that
66+
// the innermost arena *may* be distinct
67+
// from the global arena (but doesn't have to be).
68+
}
69+
```
70+
71+
### Allocating and working with types
72+
73+
Rust types are represented using the `Ty<'tcx>` defined in the `ty`
74+
module (not to be confused with the `Ty` struct from [the HIR]). This
75+
is in fact a simple type alias for a reference with `'tcx` lifetime:
76+
77+
```rust
78+
pub type Ty<'tcx> = &'tcx TyS<'tcx>;
79+
```
80+
81+
[the HIR]: ../hir/README.md
82+
83+
You can basically ignore the `TyS` struct -- you will basically never
84+
access it explicitly. We always pass it by reference using the
85+
`Ty<'tcx>` alias -- the only exception I think is to define inherent
86+
methods on types. Instances of `TyS` are only ever allocated in one of
87+
the rustc arenas (never e.g. on the stack).
88+
89+
One common operation on types is to **match** and see what kinds of
90+
types they are. This is done by doing `match ty.sty`, sort of like this:
91+
92+
```rust
93+
fn test_type<'tcx>(ty: Ty<'tcx>) {
94+
match ty.sty {
95+
ty::TyArray(elem_ty, len) => { ... }
96+
...
97+
}
98+
}
99+
```
100+
101+
The `sty` field (the origin of this name is unclear to me; perhaps
102+
structural type?) is of type `TypeVariants<'tcx>`, which is an enum
103+
defining all of the different kinds of types in the compiler.
104+
105+
> NB: inspecting the `sty` field on types during type inference can be
106+
> risky, as there may be inference variables and other things to
107+
> consider, or sometimes types are not yet known that will become
108+
> known later.
109+
110+
To allocate a new type, you can use the various `mk_` methods defined
111+
on the `tcx`. These have names that correspond mostly to the various kinds
112+
of type variants. For example:
113+
114+
```rust
115+
let array_ty = tcx.mk_array(elem_ty, len * 2);
116+
```
117+
118+
These methods all return a `Ty<'tcx>` -- note that the lifetime you
119+
get back is the lifetime of the innermost arena that this `tcx` has
120+
access to. In fact, types are always canonicalized and interned (so we
121+
never allocate exactly the same type twice) and are always allocated
122+
in the outermost arena where they can be (so, if they do not contain
123+
any inference variables or other "temporary" types, they will be
124+
allocated in the global arena). However, the lifetime `'tcx` is always
125+
a safe approximation, so that is what you get back.
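
Interning itself is easy to demonstrate outside the compiler. The following is a minimal sketch, not rustc's implementation: `TyKind`, `Interner`, and this `mk_array` are hypothetical, and leaking a `Box` stands in for arena allocation (it yields a `'static` reference where the compiler would hand out a `'tcx` one).

```rust
use std::collections::HashSet;

/// Toy type representation; the real enum has many more variants.
#[derive(Debug, PartialEq, Eq, Hash)]
enum TyKind {
    U32,
    Array(Ty, u64),
}

/// Like `Ty<'tcx>`, a type is just a reference to interned data.
type Ty = &'static TyKind;

/// Hypothetical interner: remembers every type it has handed out.
#[derive(Default)]
struct Interner {
    set: HashSet<Ty>,
}

impl Interner {
    /// Return the existing allocation for `kind`, or create one.
    fn intern(&mut self, kind: TyKind) -> Ty {
        if let Some(&existing) = self.set.get(&kind) {
            return existing;
        }
        // Leaking stands in for "allocate in an arena that outlives us".
        let fresh: Ty = Box::leak(Box::new(kind));
        self.set.insert(fresh);
        fresh
    }

    /// Analogue of a `mk_` method: build an array type from its parts.
    fn mk_array(&mut self, elem: Ty, len: u64) -> Ty {
        self.intern(TyKind::Array(elem, len))
    }
}

fn main() {
    let mut interner = Interner::default();
    let elem = interner.intern(TyKind::U32);
    let a = interner.mk_array(elem, 4);
    let b = interner.mk_array(elem, 4);
    let c = interner.mk_array(elem, 8);
    // Structurally equal types share one allocation, so comparing
    // addresses is enough to compare the types.
    println!("a and b share an allocation: {}", std::ptr::eq(a, b));
    println!("a and c share an allocation: {}", std::ptr::eq(a, c));
    println!("distinct types interned: {}", interner.set.len());
}
```

The sketch uses a single pool; it does not model the choice between the global and the inference-local arena described above.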
126+
127+
> NB. Because types are interned, it is possible to compare them for
128+
> equality efficiently using `==` -- however, this is almost never what
129+
> you want to do unless you happen to be hashing and looking for
130+
> duplicates. This is because often in Rust there are multiple ways to
131+
> represent the same type, particularly once inference is involved. If
132+
> you are going to be testing for type equality, you probably need to
133+
> start looking into the inference code to do it right.
134+
135+
You can also find various common types in the tcx itself by accessing
136+
`tcx.types.bool`, `tcx.types.char`, etc (see `CommonTypes` for more).
137+
138+
### Beyond types: Other kinds of arena-allocated data structures
139+
140+
In addition to types, there are a number of other arena-allocated data
141+
structures that you can allocate, and which are found in this
142+
module. Here are a few examples:
143+
144+
- `Substs`, allocated with `mk_substs` -- this will intern a slice of types, often used to
145+
specify the values to be substituted for generics (e.g., `HashMap<i32, u32>`
146+
would be represented as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
147+
- `TraitRef`, typically passed by value -- a **trait reference**
148+
consists of a reference to a trait along with its various type
149+
parameters (including `Self`), like `i32: Display` (here, the def-id
150+
would reference the `Display` trait, and the substs would contain
151+
`i32`).
152+
- `Predicate` defines something the trait system has to prove (see `traits` module).
153+
154+
### Import conventions
155+
156+
Although there is no hard and fast rule, the `ty` module tends to be used like so:
157+
158+
```rust
159+
use ty::{self, Ty, TyCtxt};
160+
```
161+
162+
In particular, since they are so common, the `Ty` and `TyCtxt` types
163+
are imported directly. Other types are often referenced with an
164+
explicit `ty::` prefix (e.g., `ty::TraitRef<'tcx>`). But some modules
165+
choose to import a larger or smaller set of names explicitly.

‎src/librustc/ty/context.rs

+4-3
@@ -793,9 +793,10 @@ impl<'tcx> CommonTypes<'tcx> {
793793
}
794794
}
795795

796-
/// The data structure to keep track of all the information that typechecker
797-
/// generates so that so that it can be reused and doesn't have to be redone
798-
/// later on.
796+
/// The central data structure of the compiler. It stores references
797+
/// to the various **arenas** and also houses the results of the
798+
/// various **compiler queries** that have been performed. See [the
799+
/// README](README.md) for more details.
799800
#[derive(Copy, Clone)]
800801
pub struct TyCtxt<'a, 'gcx: 'a+'tcx, 'tcx: 'a> {
801802
gcx: &'a GlobalCtxt<'gcx>,

‎src/librustc/ty/maps.rs

-1,551
This file was deleted.

‎src/librustc/ty/maps/README.md

+302
@@ -0,0 +1,302 @@
1+
# The Rust Compiler Query System
2+
3+
The Compiler Query System is the key to our new demand-driven
4+
organization. The idea is pretty simple. You have various queries
5+
that compute things about the input -- for example, there is a query
6+
called `type_of(def_id)` that, given the def-id of some item, will
7+
compute the type of that item and return it to you.
8+
9+
Query execution is **memoized** -- so the first time you invoke a
10+
query, it will go do the computation, but the next time, the result is
11+
returned from a hashtable. Moreover, query execution fits nicely into
12+
**incremental computation**; the idea is roughly that, when you do a
13+
query, the result **may** be returned to you by loading stored data
14+
from disk (but that's a separate topic we won't discuss further here).
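
Memoization of this kind can be sketched in isolation. The code below is a toy under stated assumptions: `ToyCtxt`, its `type_of_cache`, the `provider_calls` counter, and `compute_type_of` are all hypothetical, keys are plain `u32`s, and nothing here touches incremental compilation or on-disk data.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical stand-in for the tcx: one query cache plus a call counter.
#[derive(Default)]
struct ToyCtxt {
    type_of_cache: RefCell<HashMap<u32, String>>,
    provider_calls: Cell<u32>,
}

impl ToyCtxt {
    /// The "query method": consult the cache, fall back to the computation.
    fn type_of(&self, def_id: u32) -> String {
        // Clone the cached value out so the cache borrow ends right here.
        let cached = self.type_of_cache.borrow().get(&def_id).cloned();
        if let Some(ty) = cached {
            return ty;
        }
        let ty = compute_type_of(self, def_id);
        self.type_of_cache.borrow_mut().insert(def_id, ty.clone());
        ty
    }
}

/// The actual computation; it runs at most once per key.
fn compute_type_of(cx: &ToyCtxt, def_id: u32) -> String {
    cx.provider_calls.set(cx.provider_calls.get() + 1);
    format!("Ty#{}", def_id)
}

fn main() {
    let cx = ToyCtxt::default();
    println!("first call:  {}", cx.type_of(7));
    println!("second call: {}", cx.type_of(7));
    println!("computations performed: {}", cx.provider_calls.get());
}
```

The second call returns a clone of the stored value without running the computation again, which is why cheaply cloneable result types matter.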
15+
16+
The overall vision is that, eventually, the entire compiler
17+
control-flow will be query driven. There will effectively be one
18+
top-level query ("compile") that will run compilation on a crate; this
19+
will in turn demand information about that crate, starting from the
20+
*end*. For example:
21+
22+
- This "compile" query might demand to get a list of codegen-units
23+
(i.e., modules that need to be compiled by LLVM).
24+
- But computing the list of codegen-units would invoke some subquery
25+
that returns the list of all modules defined in the Rust source.
26+
- That query in turn would invoke something asking for the HIR.
27+
- This keeps going further and further back until we wind up doing the
28+
actual parsing.
29+
30+
However, that vision is not fully realized. Still, big chunks of the
31+
compiler (for example, generating MIR) work exactly like this.
32+
33+
### Invoking queries
34+
35+
Invoking a query is simple. The tcx ("type context") offers a method
36+
for each defined query. So, for example, to invoke the `type_of`
37+
query, you would just do this:
38+
39+
```rust
40+
let ty = tcx.type_of(some_def_id);
41+
```
42+
43+
### Cycles between queries
44+
45+
Currently, cycles during query execution should always result in a
46+
compilation error. Typically, they arise because of illegal programs
47+
that contain cyclic references they shouldn't (though sometimes they
48+
arise because of compiler bugs, in which case we need to factor our
49+
queries in a more fine-grained fashion to avoid them).
50+
51+
However, it is nonetheless often useful to *recover* from a cycle
52+
(after reporting an error, say) and try to soldier on, so as to give a
53+
better user experience. In order to recover from a cycle, you don't
54+
get to use the nice method-call-style syntax. Instead, you invoke
55+
using the `try_get` method, which looks roughly like this:
56+
57+
```rust
58+
use ty::maps::queries;
59+
...
60+
match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
61+
Ok(result) => {
62+
// no cycle occurred! You can use `result`
63+
}
64+
Err(err) => {
65+
// A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
66+
// meaning essentially an "in-progress", not-yet-reported error message.
67+
// See below for more details on what to do here.
68+
}
69+
}
70+
```
71+
72+
So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This means that
73+
you must ensure that a compiler error message is reported. You can do that in two ways:
74+
75+
The simplest is to invoke `err.emit()`. This will emit the cycle error to the user.
76+
77+
However, often cycles happen because of an illegal program, and you
78+
know at that point that an error either already has been reported or
79+
will be reported due to this cycle by some other bit of code. In that
80+
case, you can invoke `err.cancel()` to not emit any error. It is
81+
traditional to then invoke:
82+
83+
```
84+
tcx.sess.delay_span_bug(some_span, "some message")
85+
```
86+
87+
`delay_span_bug()` is a helper that says: we expect a compilation
88+
error to have happened or to happen in the future; so, if compilation
89+
ultimately succeeds, make an ICE with the message `"some
90+
message"`. This is basically just a precaution in case you are wrong.
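
One way to picture how a cycle can be noticed at all is to track the stack of queries currently being computed. The sketch below is a guess at the general technique, not a description of rustc's internals: `ToyCtxt`, `CycleError`, the `deps` table, and the "depth" query are hypothetical, and a plain error value stands in for the `DiagnosticBuilder`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Stand-in for the in-progress error: the chain of keys forming the cycle.
#[derive(Debug, PartialEq)]
struct CycleError {
    stack: Vec<u32>,
}

struct ToyCtxt {
    /// Toy input: computing key `k` requires the results for `deps[k]`.
    deps: HashMap<u32, Vec<u32>>,
    /// Keys whose computation is currently in progress.
    active: RefCell<Vec<u32>>,
    cache: RefCell<HashMap<u32, u32>>,
}

impl ToyCtxt {
    fn new(edges: Vec<(u32, Vec<u32>)>) -> Self {
        ToyCtxt {
            deps: edges.into_iter().collect(),
            active: RefCell::new(Vec::new()),
            cache: RefCell::new(HashMap::new()),
        }
    }

    /// Toy query: length of the longest dependency chain below `key`.
    fn try_get(&self, key: u32) -> Result<u32, CycleError> {
        let cached = self.cache.borrow().get(&key).copied();
        if let Some(value) = cached {
            return Ok(value);
        }
        // Asking for a key that is already being computed is a cycle.
        if self.active.borrow().contains(&key) {
            let mut stack = self.active.borrow().clone();
            stack.push(key);
            return Err(CycleError { stack });
        }
        self.active.borrow_mut().push(key);
        let result = self.compute(key);
        self.active.borrow_mut().pop();
        if let Ok(value) = &result {
            self.cache.borrow_mut().insert(key, *value);
        }
        result
    }

    fn compute(&self, key: u32) -> Result<u32, CycleError> {
        let deps = self.deps.get(&key).cloned().unwrap_or_default();
        let mut depth = 0;
        for dep in deps {
            depth = depth.max(self.try_get(dep)? + 1);
        }
        Ok(depth)
    }
}

fn main() {
    let cx = ToyCtxt::new(vec![
        (1, vec![2]),
        (2, vec![3]),
        (3, vec![1]), // closes the cycle 1 -> 2 -> 3 -> 1
        (10, vec![11]),
    ]);
    println!("depth of 10: {:?}", cx.try_get(10));
    match cx.try_get(1) {
        Ok(depth) => println!("depth of 1: {}", depth),
        Err(err) => {
            // The caller decides here whether to report or suppress the error.
            let chain: Vec<String> = err.stack.iter().map(|k| k.to_string()).collect();
            println!("cycle detected: {}", chain.join(" -> "));
        }
    }
}
```

As in the text, the caller of `try_get` is the one who decides what to do with the error; the toy simply prints the chain.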
91+
92+
### How the compiler executes a query
93+
94+
So you may be wondering what happens when you invoke a query
95+
method. The answer is that, for each query, the compiler maintains a
96+
cache -- if your query has already been executed, then, the answer is
97+
simple: we clone the return value out of the cache and return it
98+
(therefore, you should try to ensure that the return types of queries
99+
are cheaply cloneable; insert an `Rc` if necessary).
100+
101+
#### Providers
102+
103+
If, however, the query is *not* in the cache, then the compiler will
104+
try to find a suitable **provider**. A provider is a function that has
105+
been defined and linked into the compiler somewhere that contains the
106+
code to compute the result of the query.
107+
108+
**Providers are defined per-crate.** The compiler maintains,
109+
internally, a table of providers for every crate, at least
110+
conceptually. Right now, there are really two sets: the providers for
111+
queries about the **local crate** (that is, the one being compiled)
112+
and providers for queries about **external crates** (that is,
113+
dependencies of the local crate). Note that what determines the crate
114+
that a query is targeting is not the *kind* of query, but the *key*.
115+
For example, when you invoke `tcx.type_of(def_id)`, that could be a
116+
local query or an external query, depending on what crate the `def_id`
117+
is referring to (see the `self::keys::Key` trait for more information
118+
on how that works).
119+
120+
Providers always have the same signature:
121+
122+
```rust
123+
fn provider<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>,
124+
key: QUERY_KEY)
125+
-> QUERY_RESULT
126+
{
127+
...
128+
}
129+
```
130+
131+
Providers take two arguments: the `tcx` and the query key. Note also
132+
that they take the *global* tcx (i.e., they use the `'tcx` lifetime
133+
twice), rather than taking a tcx with some active inference context.
134+
They return the result of the query.
135+
136+
#### How providers are setup
137+
138+
When the tcx is created, it is given the providers by its creator using
139+
the `Providers` struct. This struct is generated by the macros here, but it
140+
is basically a big list of function pointers:
141+
142+
```rust
143+
struct Providers {
144+
type_of: for<'cx, 'tcx> fn(TyCtxt<'cx, 'tcx, 'tcx>, DefId) -> Ty<'tcx>,
145+
...
146+
}
147+
```
148+
149+
At present, we have one copy of the struct for local crates, and one
150+
for external crates, though the plan is that we may eventually have
151+
one per crate.
152+
153+
These `Providers` structs are ultimately created and populated by
154+
`librustc_driver`, but it does this by distributing the work
155+
throughout the other `rustc_*` crates. This is done by invoking
156+
various `provide` functions. These functions tend to look something
157+
like this:
158+
159+
```rust
160+
pub fn provide(providers: &mut Providers) {
161+
*providers = Providers {
162+
type_of,
163+
..*providers
164+
};
165+
}
166+
```
167+
168+
That is, they take an `&mut Providers` and mutate it in place. Usually
169+
we use the formulation above just because it looks nice, but you could
170+
as well do `providers.type_of = type_of`, which would be equivalent.
171+
(Here, `type_of` would be a top-level function, defined as we saw
172+
before.) So, if we wanted to add a provider for some other query,
173+
let's call it `fubar`, into the crate above, we might modify the `provide()`
174+
function like so:
175+
176+
```rust
177+
pub fn provide(providers: &mut Providers) {
178+
*providers = Providers {
179+
type_of,
180+
fubar,
181+
..*providers
182+
};
183+
}
184+
185+
fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>, key: DefId) -> Fubar<'tcx> { .. }
186+
```
187+
188+
NB. Most of the `rustc_*` crates only provide **local
189+
providers**. Almost all **extern providers** wind up going through the
190+
`rustc_metadata` crate, which loads the information from the crate
191+
metadata. But in some cases there are crates that provide queries for
192+
*both* local and external crates, in which case they define both a
193+
`provide` and a `provide_extern` function that `rustc_driver` can
194+
invoke.
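
The provider machinery described in this section can be imitated with plain function pointers. This is a self-contained sketch, not the macro-generated code: `ToyCtxt`, the two modules, `build_ctxt`, and the panicking defaults are hypothetical, the toy does no caching, and results are simple strings and integers.

```rust
const LOCAL_CRATE: u32 = 0;

#[derive(Clone, Copy, Debug)]
struct DefId {
    krate: u32,
    index: u32,
}

/// Toy `Providers`: one function pointer per query.
struct Providers {
    type_of: fn(&ToyCtxt, DefId) -> String,
    fubar: fn(&ToyCtxt, DefId) -> u32,
}

fn missing_type_of(_: &ToyCtxt, key: DefId) -> String {
    panic!("no `type_of` provider for {:?}", key)
}

fn missing_fubar(_: &ToyCtxt, key: DefId) -> u32 {
    panic!("no `fubar` provider for {:?}", key)
}

impl Default for Providers {
    /// Every query starts out with a provider that just panics.
    fn default() -> Self {
        Providers { type_of: missing_type_of, fubar: missing_fubar }
    }
}

/// One provider table for the local crate, one for all external crates.
struct ToyCtxt {
    local: Providers,
    external: Providers,
}

impl ToyCtxt {
    /// The *key*, not the kind of query, selects the provider table.
    fn providers_for(&self, key: DefId) -> &Providers {
        if key.krate == LOCAL_CRATE { &self.local } else { &self.external }
    }

    fn type_of(&self, key: DefId) -> String {
        (self.providers_for(key).type_of)(self, key)
    }

    fn fubar(&self, key: DefId) -> u32 {
        (self.providers_for(key).fubar)(self, key)
    }
}

mod local_crate {
    use super::{DefId, Providers, ToyCtxt};

    fn type_of(_cx: &ToyCtxt, key: DefId) -> String {
        format!("local type #{}", key.index)
    }

    fn fubar(_cx: &ToyCtxt, key: DefId) -> u32 {
        key.index * 2
    }

    /// Overwrite two entries and keep the rest via struct-update syntax.
    pub fn provide(providers: &mut Providers) {
        *providers = Providers { type_of, fubar, ..*providers };
    }
}

mod metadata {
    use super::{DefId, Providers, ToyCtxt};

    fn type_of(_cx: &ToyCtxt, key: DefId) -> String {
        format!("metadata type {}:{}", key.krate, key.index)
    }

    /// Only `type_of` is provided for external crates in this toy.
    pub fn provide_extern(providers: &mut Providers) {
        *providers = Providers { type_of, ..*providers };
    }
}

/// Plays the role of the driver: create the tables, let each module fill them.
fn build_ctxt() -> ToyCtxt {
    let mut local = Providers::default();
    let mut external = Providers::default();
    local_crate::provide(&mut local);
    metadata::provide_extern(&mut external);
    ToyCtxt { local, external }
}

fn main() {
    let cx = build_ctxt();
    println!("{}", cx.type_of(DefId { krate: LOCAL_CRATE, index: 3 }));
    println!("{}", cx.type_of(DefId { krate: 2, index: 3 }));
    println!("{}", cx.fubar(DefId { krate: LOCAL_CRATE, index: 3 }));
}
```

The `..*providers` form works here because the remaining fields are function pointers, which are `Copy`, so they can be copied out of the borrowed struct.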
195+
196+
### Adding a new kind of query
197+
198+
So suppose you want to add a new kind of query, how do you do so?
199+
Well, defining a query takes place in two steps:
200+
201+
1. first, you have to specify the query name and arguments; and then,
202+
2. you have to supply query providers where needed.
203+
204+
To specify the query name and arguments, you simply add an entry
205+
to the big macro invocation in `mod.rs`. This will probably have changed
206+
by the time you read this README, but at present it looks something
207+
like:
208+
209+
```
210+
define_maps! { <'tcx>
211+
/// Records the type of every item.
212+
[] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
213+
214+
...
215+
}
216+
```
217+
218+
Each line of the macro defines one query. The name is broken up like this:
219+
220+
```
221+
[] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
222+
^^ ^^^^^^^ ^^^^^^^^^^ ^^^^^ ^^^^^^^^
223+
| | | | |
224+
| | | | result type of query
225+
| | | query key type
226+
| | dep-node constructor
227+
| name of query
228+
query flags
229+
```
230+
Let's go over them one by one:

- **Query flags:** these are largely unused right now, but the intention
  is that we'll be able to customize various aspects of how the query is
  processed.
- **Name of query:** the name of the query method
  (`tcx.type_of(..)`). Also used as the name of a struct
  (`ty::maps::queries::type_of`) that will be generated to represent
  this query.
- **Dep-node constructor:** indicates the constructor function that
  connects this query to incremental compilation. Typically, this is a
  `DepNode` variant, which can be added by modifying the
  `define_dep_nodes!` macro invocation in
  `librustc/dep_graph/dep_node.rs`.
  - However, sometimes we use a custom function, in which case the
    name will be in snake case and the function will be defined at the
    bottom of the file. This is typically used when the query key is
    not a def-id, or just not the type that the dep-node expects.
- **Query key type:** the type of the argument to this query.
  This type must implement the `ty::maps::keys::Key` trait, which
  defines (for example) how to map it to a crate, and so forth.
- **Result type of query:** the type produced by this query. This type
  should (a) not use `RefCell` or other interior mutability and (b) be
  cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for
  non-trivial data types.
  - The one exception to those rules is the `ty::steal::Steal` type,
    which is used to cheaply modify MIR in place. See the definition
    of `Steal` for more details. New uses of `Steal` should **not** be
    added without alerting `@rust-lang/compiler`.

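As a rough illustration of how an entry in that format can be taken apart, here is a self-contained toy macro. It is not the real `define_maps!`: the name `define_toy_maps!`, the simplified `QueryConfig` trait with its `NAME` and `DEP_NODE` constants, the `summarize` helper, and the stand-in `DefId` and `Ty` types are all assumptions made for this sketch. It accepts the query flags but ignores them, and it only records the remaining pieces of each entry.

```rust
// Toy stand-ins; none of these are the real rustc definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Ty<'tcx>(pub &'tcx str);

/// Simplified trait: the key and value types plus two names kept as strings.
pub trait QueryConfig {
    type Key;
    type Value;
    const NAME: &'static str;
    const DEP_NODE: &'static str;
}

/// Hypothetical macro that parses `[flags] fn name: Ctor(Key) -> Value,`.
macro_rules! define_toy_maps {
    (<$tcx:lifetime>
     $($(#[$attr:meta])*
       [$($flag:ident)*] fn $name:ident: $node:ident($K:ty) -> $V:ty,)*) => {
        $(
            $(#[$attr])*
            #[allow(non_camel_case_types, dead_code)]
            pub struct $name<$tcx> {
                phantom: ::std::marker::PhantomData<&$tcx ()>,
            }

            impl<$tcx> QueryConfig for $name<$tcx> {
                type Key = $K;
                type Value = $V;
                const NAME: &'static str = stringify!($name);
                const DEP_NODE: &'static str = stringify!($node);
            }
        )*
    };
}

define_toy_maps! { <'tcx>
    /// Records the type of every item.
    [] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
}

/// Prints the pieces that the macro extracted for a query struct.
pub fn summarize<Q: QueryConfig>() -> String {
    format!("query `{}` uses dep-node `{}`", Q::NAME, Q::DEP_NODE)
}
```

Calling `summarize::<type_of<'static>>()` shows which part of the entry ended up where. The real macro generates considerably more than this.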
So, to add a query:

- Add an entry to `define_maps!` using the format above.
- Possibly add a corresponding entry to the dep-node macro.
- Link the provider by modifying the appropriate `provide` method;
  or add a new one if needed and ensure that `rustc_driver` is invoking it.

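The last step, linking a provider, can be pictured with a small self-contained model. This is only a sketch under assumptions: `ToyCtxt`, the one-field `Providers` struct, `missing_type_of`, `type_of_provider`, and `build_ctxt` are hypothetical names, and the real provider table and context type are far richer. The idea shown is that a `provide` function overwrites a function pointer, and the driver is the one that calls `provide` before any query runs.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId(pub u32);

/// Toy provider table: one function pointer per query.
pub struct Providers {
    pub type_of: fn(&ToyCtxt, DefId) -> String,
}

/// Toy context that owns the table and exposes the query method.
pub struct ToyCtxt {
    providers: Providers,
}

impl ToyCtxt {
    /// Query method: forwards to whichever provider was linked.
    pub fn type_of(&self, key: DefId) -> String {
        (self.providers.type_of)(self, key)
    }
}

/// Default entry used until some crate links a real provider.
fn missing_type_of(_tcx: &ToyCtxt, key: DefId) -> String {
    panic!("no provider linked for type_of({:?})", key)
}

/// The provider itself (a placeholder computation for the demo).
fn type_of_provider(_tcx: &ToyCtxt, key: DefId) -> String {
    format!("Type#{}", key.0)
}

/// Plays the role of a crate-level `provide` function.
pub fn provide(providers: &mut Providers) {
    providers.type_of = type_of_provider;
}

/// Plays the role of the driver: build the table, then call `provide`.
pub fn build_ctxt() -> ToyCtxt {
    let mut providers = Providers { type_of: missing_type_of };
    provide(&mut providers);
    ToyCtxt { providers }
}
```

If `build_ctxt` forgot to call `provide`, the first `type_of` call would hit the panicking default, which is the toy analogue of a provider that was never linked.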
#### Query structs and descriptions

For each kind of query, the `define_maps!` macro will generate a "query
struct" named after the query. This struct is a kind of placeholder
describing the query. Each such struct implements the
`self::config::QueryConfig` trait, which has associated types for the
key/value of that particular query. Basically the code generated looks something
like this:

```rust
// Dummy struct representing a particular kind of query:
pub struct type_of<'tcx> { phantom: PhantomData<&'tcx ()> }

impl<'tcx> QueryConfig for type_of<'tcx> {
    type Key = DefId;
    type Value = Ty<'tcx>;
}
```

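To see why a placeholder struct with associated types is useful, here is a minimal sketch of generic code that is written once against the trait and reused for every query. Everything in it is an assumption for illustration: the `QueryMap` type and its `get_or_compute` method are hypothetical, the trait bounds are added so the map can be built, and `&'tcx str` stands in for `Ty<'tcx>`.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::marker::PhantomData;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId(pub u32);

/// Simplified version of the trait, with bounds needed by the toy map.
pub trait QueryConfig {
    type Key: Eq + Hash;
    type Value: Clone;
}

// Dummy struct representing a particular kind of query.
#[allow(non_camel_case_types, dead_code)]
pub struct type_of<'tcx> {
    phantom: PhantomData<&'tcx ()>,
}

impl<'tcx> QueryConfig for type_of<'tcx> {
    type Key = DefId;
    type Value = &'tcx str; // stand-in for `Ty<'tcx>`
}

/// Hypothetical per-query storage, generic over the query struct.
pub struct QueryMap<Q: QueryConfig> {
    map: HashMap<Q::Key, Q::Value>,
}

impl<Q: QueryConfig> QueryMap<Q> {
    pub fn new() -> Self {
        QueryMap { map: HashMap::new() }
    }

    /// Returns the stored value, running `compute` only on first use.
    pub fn get_or_compute<F>(&mut self, key: Q::Key, compute: F) -> Q::Value
    where
        F: FnOnce() -> Q::Value,
    {
        self.map.entry(key).or_insert_with(compute).clone()
    }
}
```

The query struct never needs to be constructed; it only selects which `Key` and `Value` types the generic code works with.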
There is an additional trait that you may wish to implement called
`self::config::QueryDescription`. This trait is used during cycle
errors to give a "human readable" name for the query, so that we can
summarize what was happening when the cycle occurred. Implementing
this trait is optional if the query key is `DefId`, but if you *don't*
implement it, you get a pretty generic error ("processing `foo`...").
You can put new impls into the `config` module. They look something like this:

```rust
impl<'tcx> QueryDescription for queries::type_of<'tcx> {
    fn describe(tcx: TyCtxt, key: DefId) -> String {
        format!("computing the type of `{}`", tcx.item_path_str(key))
    }
}
```

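The following toy model sketches how such descriptions might be consumed when a cycle is reported. It is an assumption-laden illustration, not the compiler's code: the `Frame` type, the `cycle_notes` function, the `TypeOf` and `Generic` structs, the `item{N}` naming, and the "while ..." wording are all invented here, and `describe` takes only a key because the toy has no `TyCtxt`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DefId(pub u32);

/// Simplified description trait (no context argument in this toy).
pub trait QueryDescription {
    fn describe(key: DefId) -> String;
}

/// A query with a hand-written description.
pub struct TypeOf;
impl QueryDescription for TypeOf {
    fn describe(key: DefId) -> String {
        format!("computing the type of `item{}`", key.0)
    }
}

/// A query that only has the generic fallback wording.
pub struct Generic;
impl QueryDescription for Generic {
    fn describe(key: DefId) -> String {
        format!("processing `item{}`", key.0)
    }
}

/// One active query on the (hypothetical) query stack.
pub struct Frame {
    describe: fn(DefId) -> String,
    key: DefId,
}

impl Frame {
    pub fn new<Q: QueryDescription>(key: DefId) -> Frame {
        Frame { describe: Q::describe, key }
    }
}

/// Turns the stack of active queries into one note per frame.
pub fn cycle_notes(stack: &[Frame]) -> Vec<String> {
    stack
        .iter()
        .map(|frame| format!("while {}", (frame.describe)(frame.key)))
        .collect()
}
```

The second frame shows why a specific description is worth writing: the generic wording says that something was being processed, but not what the query was trying to compute.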

‎src/librustc/ty/maps/config.rs

+492
Large diffs are not rendered by default.

‎src/librustc/ty/maps/keys.rs

+162
@@ -0,0 +1,162 @@
// Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! Defines the set of legal keys that can be used in queries.

use hir::def_id::{CrateNum, DefId, LOCAL_CRATE, DefIndex};
use mir::transform::{MirSuite, MirPassIndex};
use ty::{self, Ty, TyCtxt};
use ty::subst::Substs;
use ty::fast_reject::SimplifiedType;

use std::fmt::Debug;
use std::hash::Hash;
use syntax_pos::{Span, DUMMY_SP};
use syntax_pos::symbol::InternedString;

/// The `Key` trait controls what types can legally be used as the key
/// for a query.
pub trait Key: Clone + Hash + Eq + Debug {
    /// Given an instance of this key, what crate is it referring to?
    /// This is used to find the provider.
    fn map_crate(&self) -> CrateNum;

    /// In the event that a cycle occurs, if no explicit span has been
    /// given for a query with key `self`, what span should we use?
    fn default_span(&self, tcx: TyCtxt) -> Span;
}

impl<'tcx> Key for ty::InstanceDef<'tcx> {
    fn map_crate(&self) -> CrateNum {
        LOCAL_CRATE
    }

    fn default_span(&self, tcx: TyCtxt) -> Span {
        tcx.def_span(self.def_id())
    }
}

impl<'tcx> Key for ty::Instance<'tcx> {
    fn map_crate(&self) -> CrateNum {
        LOCAL_CRATE
    }

    fn default_span(&self, tcx: TyCtxt) -> Span {
        tcx.def_span(self.def_id())
    }
}

impl Key for CrateNum {
    fn map_crate(&self) -> CrateNum {
        *self
    }
    fn default_span(&self, _: TyCtxt) -> Span {
        DUMMY_SP
    }
}

impl Key for DefIndex {
    fn map_crate(&self) -> CrateNum {
        LOCAL_CRATE
    }
    fn default_span(&self, _tcx: TyCtxt) -> Span {
        DUMMY_SP
    }
}

impl Key for DefId {
    fn map_crate(&self) -> CrateNum {
        self.krate
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        tcx.def_span(*self)
    }
}

impl Key for (DefId, DefId) {
    fn map_crate(&self) -> CrateNum {
        self.0.krate
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.1.default_span(tcx)
    }
}

impl Key for (CrateNum, DefId) {
    fn map_crate(&self) -> CrateNum {
        self.0
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.1.default_span(tcx)
    }
}

impl Key for (DefId, SimplifiedType) {
    fn map_crate(&self) -> CrateNum {
        self.0.krate
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.0.default_span(tcx)
    }
}

impl<'tcx> Key for (DefId, &'tcx Substs<'tcx>) {
    fn map_crate(&self) -> CrateNum {
        self.0.krate
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.0.default_span(tcx)
    }
}

impl Key for (MirSuite, DefId) {
    fn map_crate(&self) -> CrateNum {
        self.1.map_crate()
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.1.default_span(tcx)
    }
}

impl Key for (MirSuite, MirPassIndex, DefId) {
    fn map_crate(&self) -> CrateNum {
        self.2.map_crate()
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.2.default_span(tcx)
    }
}

impl<'tcx> Key for Ty<'tcx> {
    fn map_crate(&self) -> CrateNum {
        LOCAL_CRATE
    }
    fn default_span(&self, _: TyCtxt) -> Span {
        DUMMY_SP
    }
}

impl<'tcx, T: Key> Key for ty::ParamEnvAnd<'tcx, T> {
    fn map_crate(&self) -> CrateNum {
        self.value.map_crate()
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.value.default_span(tcx)
    }
}

impl Key for InternedString {
    fn map_crate(&self) -> CrateNum {
        LOCAL_CRATE
    }
    fn default_span(&self, _tcx: TyCtxt) -> Span {
        DUMMY_SP
    }
}

‎src/librustc/ty/maps/mod.rs

+453
Large diffs are not rendered by default.

‎src/librustc/ty/maps/plumbing.rs

+494
Large diffs are not rendered by default.

‎src/librustc/ty/maps/values.rs

+49
@@ -0,0 +1,49 @@
// Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

use ty::{self, Ty, TyCtxt};

use syntax::symbol::Symbol;

pub(super) trait Value<'tcx>: Sized {
    fn from_cycle_error<'a>(tcx: TyCtxt<'a, 'tcx, 'tcx>) -> Self;
}

impl<'tcx, T> Value<'tcx> for T {
    default fn from_cycle_error<'a>(tcx: TyCtxt<'a, 'tcx, 'tcx>) -> T {
        tcx.sess.abort_if_errors();
        bug!("Value::from_cycle_error called without errors");
    }
}

impl<'tcx, T: Default> Value<'tcx> for T {
    default fn from_cycle_error<'a>(_: TyCtxt<'a, 'tcx, 'tcx>) -> T {
        T::default()
    }
}

impl<'tcx> Value<'tcx> for Ty<'tcx> {
    fn from_cycle_error<'a>(tcx: TyCtxt<'a, 'tcx, 'tcx>) -> Ty<'tcx> {
        tcx.types.err
    }
}

impl<'tcx> Value<'tcx> for ty::DtorckConstraint<'tcx> {
    fn from_cycle_error<'a>(_: TyCtxt<'a, 'tcx, 'tcx>) -> Self {
        Self::empty()
    }
}

impl<'tcx> Value<'tcx> for ty::SymbolName {
    fn from_cycle_error<'a>(_: TyCtxt<'a, 'tcx, 'tcx>) -> Self {
        ty::SymbolName { name: Symbol::intern("<error>").as_str() }
    }
}

‎src/librustc_back/README.md

+6
@@ -0,0 +1,6 @@
NB: This crate is part of the Rust compiler. For an overview of the
compiler as a whole, see
[the README.md file found in `librustc`](../librustc/README.md).

`librustc_back` contains some very low-level details that are
specific to different LLVM targets and so forth.

‎src/librustc_driver/README.md

+12
@@ -0,0 +1,12 @@
NB: This crate is part of the Rust compiler. For an overview of the
compiler as a whole, see
[the README.md file found in `librustc`](../librustc/README.md).

The `driver` crate is effectively the "main" function for the Rust
compiler. It orchestrates the compilation process and "knits together"
the code from the other crates within rustc. This crate itself does
not contain any of the "main logic" of the compiler (though it does
have some code related to pretty printing or other minor compiler
options).

‎src/librustc_trans/README.md

+7-1
@@ -1 +1,7 @@
-See [librustc/README.md](../librustc/README.md).
+NB: This crate is part of the Rust compiler. For an overview of the
+compiler as a whole, see
+[the README.md file found in `librustc`](../librustc/README.md).
+
+The `trans` crate contains the code to convert from MIR into LLVM IR,
+and then from LLVM IR into machine code. In general it contains code
+that runs towards the end of the compilation process.

‎src/librustc_typeck/README.md

+48
@@ -0,0 +1,48 @@
NB: This crate is part of the Rust compiler. For an overview of the
compiler as a whole, see
[the README.md file found in `librustc`](../librustc/README.md).

The `rustc_typeck` crate contains the source for "type collection" and
"type checking", as well as a few other bits of related functionality.
(It draws heavily on the [type inference][infer] and
[trait solving][traits] code found in librustc.)

[infer]: ../librustc/infer/README.md
[traits]: ../librustc/traits/README.md

## Type collection

Type "collection" is the process of converting the types found in the
HIR (`hir::Ty`), which represent the syntactic things that the user
wrote, into the **internal representation** used by the compiler
(`Ty<'tcx>`) -- we also do similar conversions for where-clauses and
other bits of the function signature.

To get a sense of the difference, consider this function:

```rust
struct Foo { }
fn foo(x: Foo, y: self::Foo) { .. }
//        ^^^     ^^^^^^^^^
```

The two parameters `x` and `y` have the same type, but they
will have distinct `hir::Ty` nodes. Those nodes will have different
spans, and of course they encode the path somewhat differently. But
once they are "collected" into `Ty<'tcx>` nodes, they will be
represented by the exact same internal type.

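The difference between the two representations can be sketched with a tiny self-contained model. This is an illustration only: `HirTy`, `Ty`, `Interner`, and `lower` are hypothetical stand-ins, the "name resolution" here just drops leading `self` segments, and the real compiler's interning and resolution are far more involved.

```rust
use std::collections::HashMap;

/// Toy syntactic type: the path as written, plus where it was written.
#[derive(Clone, Debug, PartialEq)]
pub struct HirTy {
    pub path: Vec<&'static str>,
    pub span: (usize, usize),
}

/// Toy internal type: an index handed out by the interner.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Ty(pub usize);

/// Maps each canonical path to a single shared `Ty`.
#[derive(Default)]
pub struct Interner {
    by_name: HashMap<String, Ty>,
}

impl Interner {
    /// "Collects" a syntactic type into its internal representation.
    pub fn lower(&mut self, hir_ty: &HirTy) -> Ty {
        // Toy resolution: `self::Foo` and `Foo` name the same item.
        let canonical = hir_ty
            .path
            .iter()
            .copied()
            .skip_while(|segment| *segment == "self")
            .collect::<Vec<_>>()
            .join("::");
        // Reuse the existing entry, or hand out the next free index.
        let next = Ty(self.by_name.len());
        *self.by_name.entry(canonical).or_insert(next)
    }
}
```

With this model, the `HirTy` values for `Foo` and `self::Foo` compare unequal (different paths and spans), while `lower` returns the same `Ty` for both.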
Collection is defined as a bundle of queries (e.g., `type_of`) for
computing information about the various functions, traits, and other
items in the crate being compiled. Note that each of these queries is
concerned with *interprocedural* things -- for example, for a function
definition, collection will figure out the type and signature of the
function, but it will not visit the *body* of the function in any way,
nor examine type annotations on local variables (that's the job of
type *checking*).

For more details, see the `collect` module.

## Type checking

TODO

‎src/librustc_typeck/collect.rs

+15-44
@@ -8,50 +8,21 @@
 // option. This file may not be copied, modified, or distributed
 // except according to those terms.
 
-/*
-
-# Collect phase
-
-The collect phase of type check has the job of visiting all items,
-determining their type, and writing that type into the `tcx.types`
-table. Despite its name, this table does not really operate as a
-*cache*, at least not for the types of items defined within the
-current crate: we assume that after the collect phase, the types of
-all local items will be present in the table.
-
-Unlike most of the types that are present in Rust, the types computed
-for each item are in fact type schemes. This means that they are
-generic types that may have type parameters. TypeSchemes are
-represented by a pair of `Generics` and `Ty`. Type
-parameters themselves are represented as `ty_param()` instances.
-
-The phasing of type conversion is somewhat complicated. There is no
-clear set of phases we can enforce (e.g., converting traits first,
-then types, or something like that) because the user can introduce
-arbitrary interdependencies. So instead we generally convert things
-lazilly and on demand, and include logic that checks for cycles.
-Demand is driven by calls to `AstConv::get_item_type_scheme` or
-`AstConv::trait_def`.
-
-Currently, we "convert" types and traits in two phases (note that
-conversion only affects the types of items / enum variants / methods;
-it does not e.g. compute the types of individual expressions):
-
-0. Intrinsics
-1. Trait/Type definitions
-
-Conversion itself is done by simply walking each of the items in turn
-and invoking an appropriate function (e.g., `trait_def_of_item` or
-`convert_item`). However, it is possible that while converting an
-item, we may need to compute the *type scheme* or *trait definition*
-for other items.
-
-There are some shortcomings in this design:
-- Because the item generics include defaults, cycles through type
-  parameter defaults are illegal even if those defaults are never
-  employed. This is not necessarily a bug.
-
-*/
+//! "Collection" is the process of determining the type and other external
+//! details of each item in Rust. Collection is specifically concerned
+//! with *interprocedural* things -- for example, for a function
+//! definition, collection will figure out the type and signature of the
+//! function, but it will not visit the *body* of the function in any way,
+//! nor examine type annotations on local variables (that's the job of
+//! type *checking*).
+//!
+//! Collecting is ultimately defined by a bundle of queries that
+//! inquire after various facts about the items in the crate (e.g.,
+//! `type_of`, `generics_of`, `predicates_of`, etc). See the `provide` function
+//! for the full set.
+//!
+//! At present, however, we do run collection across all items in the
+//! crate as a kind of pass. This should eventually be factored away.
 
 use astconv::{AstConv, Bounds};
 use lint;

‎src/libsyntax/README.md

+7
@@ -0,0 +1,7 @@
NB: This crate is part of the Rust compiler. For an overview of the
compiler as a whole, see
[the README.md file found in `librustc`](../librustc/README.md).

The `syntax` crate contains those things concerned purely with syntax
– that is, the AST ("abstract syntax tree"), parser, pretty-printer,
lexer, macro expander, and utilities for traversing ASTs.
