
Commit 0ffc56b

Merge pull request #270 from michaelwoerister/query-eval-model-update
Add "The Query Evaluation Model in Detail" and "Incremental Compilation In Detail" chapters.
2 parents 3dadf43 + 808a9a1 commit 0ffc56b

File tree

7 files changed

+603
-57
lines changed

src/SUMMARY.md

Lines changed: 3 additions & 1 deletion
@@ -27,7 +27,9 @@
2727
- [The Rustc Driver](./rustc-driver.md)
2828
- [Rustdoc](./rustdoc.md)
2929
- [Queries: demand-driven compilation](./query.md)
30-
- [Incremental compilation](./incremental-compilation.md)
30+
- [The Query Evaluation Model in Detail](./queries/query-evaluation-model-in-detail.md)
31+
- [Incremental compilation](./queries/incremental-compilation.md)
32+
- [Incremental compilation In Detail](./queries/incremental-compilation-in-detail.md)
3133
- [Debugging and Testing](./incrcomp-debugging.md)
3234
- [The parser](./the-parser.md)
3335
- [`#[test]` Implementation](./test-implementation.md)

src/appendix/glossary.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ completeness | completeness is a technical term in type theory. Comp
1515
control-flow graph | a representation of the control-flow of a program; see [the background chapter for more](./background.html#cfg)
1616
CTFE | Compile-Time Function Evaluation. This is the ability of the compiler to evaluate `const fn`s at compile time. This is part of the compiler's constant evaluation system. ([see more](../const-eval.html))
1717
cx | we tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
18-
DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries. ([see more](../incremental-compilation.html))
18+
DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries. ([see more](../queries/incremental-compilation.html))
1919
data-flow analysis | a static analysis that figures out what properties are true at each point in the control-flow of a program; see [the background chapter for more](./background.html#dataflow)
2020
DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`.
2121
Double pointer | a pointer with additional metadata. See "fat pointer" for more.

src/queries/incremental-compilation-in-detail.md

Lines changed: 354 additions & 0 deletions
Large diffs are not rendered by default.

src/queries/query-evaluation-model-in-detail.md

Lines changed: 237 additions & 0 deletions
@@ -0,0 +1,237 @@
1+
2+
3+
# The Query Evaluation Model in Detail
4+
5+
This chapter provides a deeper dive into the abstract model queries are built on.
6+
It does not go into implementation details but tries to explain
7+
the underlying logic. The examples here, therefore, have been stripped down and
8+
simplified and don't directly reflect the compiler's internal APIs.
9+
10+
## What is a query?
11+
12+
Abstractly we view the compiler's knowledge about a given crate as a "database"
13+
and queries are the way of asking the compiler questions about it, i.e.
14+
we "query" the compiler's "database" for facts.
15+
16+
However, there's something special to this compiler database: It starts out empty
17+
and is filled on-demand when queries are executed. Consequently, a query must
18+
know how to compute its result if the database does not contain it yet. For
19+
doing so, it can access other queries and certain input values that the database
20+
is pre-filled with on creation.
21+
22+
A query thus consists of the following things:
23+
24+
- A name that identifies the query
25+
- A "key" that specifies what we want to look up
26+
- A result type that specifies what kind of result it yields
27+
- A "provider" which is a function that specifies how the result is to be
28+
computed if it isn't already present in the database.
29+
30+
As an example, the name of the `type_of` query is `type_of`, its query key is a
31+
`DefId` identifying the item we want to know the type of, the result type is
32+
`Ty<'tcx>`, and the provider is a function that, given the query key and access
33+
to the rest of the database, can compute the type of the item identified by the
34+
key.
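
The four parts of a query can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in rather than the compiler's API: `DefId` is a plain newtype, `Ty` is a two-variant enum instead of `Ty<'tcx>`, and `InputContext` only holds some pre-filled input data.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the compiler's `DefId` (the query key).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct DefId(u32);

/// Hypothetical stand-in for `Ty<'tcx>` (the result type).
#[derive(Clone, PartialEq, Eq, Debug)]
enum Ty {
    Int,
    Bool,
}

/// Simplified query context: here it only holds pre-filled input data.
struct InputContext {
    /// Input: the declared type name of each item.
    declared: HashMap<DefId, &'static str>,
}

/// The "provider" of a `type_of`-like query. It only looks at its key
/// and at the context, and returns the same result for the same key.
fn type_of_provider(cx: &InputContext, key: DefId) -> Ty {
    match cx.declared[&key] {
        "bool" => Ty::Bool,
        _ => Ty::Int,
    }
}

/// Builds a tiny context and asks for the type of item 0.
fn example_type_of() -> Ty {
    let mut declared = HashMap::new();
    declared.insert(DefId(0), "bool");
    let cx = InputContext { declared };
    type_of_provider(&cx, DefId(0))
}
```

In this sketch the name is `type_of`, the key is a `DefId`, the result type is `Ty`, and the provider is `type_of_provider`.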
35+
36+
So in some sense a query is just a function that maps the query key to the
37+
corresponding result. However, we have to apply some restrictions in order for
38+
this to be sound:
39+
40+
- The key and result must be immutable values.
41+
- The provider function must be a pure function, that is, for the same key it
42+
must always yield the same result.
43+
- The only parameters a provider function takes are the key and a reference to
44+
the "query context" (which provides access to the rest of the "database").
45+
46+
The database is built up lazily by invoking queries. The query providers will
47+
invoke other queries, for which the result is either already cached or computed
48+
by calling another query provider. These query provider invocations
49+
conceptually form a directed acyclic graph (DAG) at the leaves of which are
50+
input values that are already known when the query context is created.
51+
52+
53+
54+
## Caching/Memoization
55+
56+
Results of query invocations are "memoized" which means that the query context
57+
will cache the result in an internal table and, when the query is invoked with
58+
the same query key again, will return the result from the cache instead of
59+
running the provider again.
60+
61+
This caching is crucial for making the query engine efficient. Without
62+
memoization the system would still be sound (that is, it would yield the same
63+
results) but the same computations would be done over and over again.
64+
65+
Memoization is one of the main reasons why query providers have to be pure
66+
functions. If calling a provider function could yield different results for
67+
each invocation (because it accesses some global mutable state) then we could
68+
not memoize the result.
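
As a rough illustration of memoization, the following sketch caches the result of a single made-up query called `square`. `MemoContext`, `square`, and `square_provider` are hypothetical names chosen for this example; the call counter exists only so the caching becomes observable and is not part of the query result.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical query context holding the cache table for one query.
struct MemoContext {
    cache: RefCell<HashMap<u32, u64>>,
    /// Instrumentation only: counts how often the provider actually ran.
    provider_calls: Cell<u32>,
}

impl MemoContext {
    fn new() -> Self {
        MemoContext {
            cache: RefCell::new(HashMap::new()),
            provider_calls: Cell::new(0),
        }
    }

    /// Query entry point: consult the cache first and run the provider
    /// only on a cache miss.
    fn square(&self, key: u32) -> u64 {
        let cached = self.cache.borrow().get(&key).copied();
        if let Some(hit) = cached {
            return hit;
        }
        let result = square_provider(self, key);
        self.cache.borrow_mut().insert(key, result);
        result
    }
}

/// The provider: its result depends only on the key.
fn square_provider(cx: &MemoContext, key: u32) -> u64 {
    cx.provider_calls.set(cx.provider_calls.get() + 1);
    u64::from(key) * u64::from(key)
}

/// Invokes the same query twice and reports how often the provider ran.
fn example_memoization() -> (u64, u64, u32) {
    let cx = MemoContext::new();
    let first = cx.square(7);
    let second = cx.square(7);
    (first, second, cx.provider_calls.get())
}
```

Both invocations yield the same value, but the provider runs only once. If `square_provider` could return different values for the same key, the second invocation would silently return a stale result.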
69+
70+
71+
72+
## Input data
73+
74+
When the query context is created, it is still empty: No queries have been
75+
executed, no results are cached. But the context already provides access to
75+
"input" data, i.e. pieces of immutable data that were computed before the
77+
context was created and that queries can access to do their computations.
78+
Currently this input data consists mainly of the HIR map and the command-line
79+
options the compiler was invoked with. In the future, inputs will just consist
80+
of command-line options and a list of source files -- the HIR map will itself
81+
be provided by a query which processes these source files.
82+
83+
Without inputs, queries would live in a void without anything to compute their
84+
result from (remember, query providers only have access to other queries and
85+
the context but not any other outside state or information).
86+
87+
For a query provider, input data and results of other queries look exactly the
88+
same: It just tells the context "give me the value of X". Because input data
89+
is immutable, the provider can rely on it being the same across
90+
different query invocations, just as is the case for query results.
91+
92+
93+
94+
## An example execution trace of some queries
95+
96+
How does this DAG of query invocations come into existence? At some point
97+
the compiler driver will create the, as yet empty, query context. It will then,
98+
from outside of the query system, invoke the queries it needs to perform its
99+
task. This looks something like the following:
100+
101+
```rust,ignore
fn compile_crate() {
    // Input data, computed before the query context exists
    let cli_options = ...;
    let hir_map = ...;

    // Create the query context `tcx`
    let tcx = TyCtxt::new(cli_options, hir_map);

    // Do type checking by invoking the type check query
    tcx.type_check_crate();
}
```
113+
114+
The `type_check_crate` query provider would look something like the following:
115+
116+
```rust,ignore
fn type_check_crate_provider(tcx, _key: ()) {
    // Access input data
    let list_of_items = tcx.hir_map.list_of_items();

    // Invoke another query for each item
    for item_def_id in list_of_items {
        tcx.type_check_item(item_def_id);
    }
}
```
125+
126+
We see that the `type_check_crate` query accesses input data
127+
(`tcx.hir_map.list_of_items()`) and invokes other queries
128+
(`type_check_item`). The `type_check_item`
129+
invocations will themselves access input data and/or invoke other queries,
130+
so that in the end the DAG of query invocations will be built up backwards
131+
from the node that was initially executed:
132+
133+
```ignore
         (2)                                                  (1)
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
    (5)             (4)                  (3)                   |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                      |                        |
                    +-----------------+                        |
                    |                                          |
    (7)             v  (6)               (8)                   |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+

// (x) denotes invocation order
```
147+
148+
We also see that often a query result can be read from the cache:
149+
`type_of(bar)` was computed for `type_check_item(foo)` so when
150+
`type_check_item(bar)` needs it, it is already in the cache.
151+
152+
Query results stay cached in the query context as long as the context lives.
153+
So if the compiler driver invoked another query later on, the above graph
154+
would still exist and already executed queries would not have to be re-done.
155+
156+
157+
158+
## Cycles
159+
160+
Earlier we stated that query invocations form a DAG. However, it would be easy
161+
to form a cyclic graph by, for example, having a query provider like the following:
162+
163+
```rust,ignore
fn cyclic_query_provider(tcx, key) -> u32 {
    // Invoke the same query with the same key again
    tcx.cyclic_query(key)
}
```
169+
170+
Since query providers are regular functions, this would behave much as expected:
171+
Evaluation would get stuck in an infinite recursion. A query like this would not
172+
be very useful either. However, sometimes certain kinds of invalid user input
173+
can result in queries being called in a cyclic way. The query engine includes
174+
a check for cyclic invocations and, because cycles are an irrecoverable error,
175+
will abort execution with a "cycle error" message that tries to be human
176+
readable.
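
One way such a check could work is to remember which keys are currently being evaluated and to report an error when a key is requested while it is still active. The sketch below is an assumption about the general technique, not the query engine's actual implementation; `CycleCheckingContext`, `run_query`, and `CycleError` are hypothetical names. It returns an `Err` instead of aborting only so that the behavior is easy to observe.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

/// Hypothetical error describing which key closed the cycle.
#[derive(Debug, PartialEq)]
struct CycleError {
    key: u32,
}

/// Hypothetical provider signature for this sketch.
type Provider = fn(&CycleCheckingContext, u32) -> Result<u32, CycleError>;

/// Hypothetical context that tracks the keys currently being evaluated.
struct CycleCheckingContext {
    active: RefCell<HashSet<u32>>,
}

impl CycleCheckingContext {
    fn new() -> Self {
        CycleCheckingContext {
            active: RefCell::new(HashSet::new()),
        }
    }

    fn run_query(&self, key: u32, provider: Provider) -> Result<u32, CycleError> {
        // `insert` returns `false` if the key is already in the set,
        // which means an invocation for it is still in progress.
        if !self.active.borrow_mut().insert(key) {
            return Err(CycleError { key });
        }
        let result = provider(self, key);
        self.active.borrow_mut().remove(&key);
        result
    }
}

/// Invokes the same query with the same key again.
fn cyclic_provider(cx: &CycleCheckingContext, key: u32) -> Result<u32, CycleError> {
    cx.run_query(key, cyclic_provider)
}

/// A provider that does not call back into itself.
fn well_behaved_provider(_cx: &CycleCheckingContext, key: u32) -> Result<u32, CycleError> {
    Ok(key + 1)
}
```

With this check in place, the cyclic provider terminates with an error instead of recursing forever, while well-behaved providers are unaffected.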
177+
178+
At some point the compiler had a notion of "cycle recovery", that is, one could
179+
"try" to execute a query and if it ended up causing a cycle, proceed in some
180+
other fashion. However, this was later removed because it is not entirely
181+
clear what the theoretical consequences of this are, especially regarding
182+
incremental compilation.
183+
184+
185+
## "Steal" Queries
186+
187+
Some queries have their result wrapped in a `Steal<T>` struct. These queries
188+
behave exactly the same as regular queries with one exception: Their result is expected
189+
to be "stolen" out of the cache at some point, meaning some other part of the
190+
program is taking ownership of it and the result cannot be accessed anymore.
191+
192+
This stealing mechanism exists purely as a performance optimization because some
193+
result values are too costly to clone (e.g. the MIR of a function). It seems
194+
like result stealing would violate the condition that query results must be
195+
immutable (after all we are moving the result value out of the cache) but it is
196+
OK as long as the mutation is not observable. This is achieved by two things:
197+
198+
- Before a result is stolen, we make sure to eagerly run all queries that
199+
might ever need to read that result. This has to be done manually by calling
200+
those queries.
201+
- Whenever a query tries to access a stolen result, we make the compiler ICE so
202+
that such a condition cannot go unnoticed.
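
The idea behind the wrapper can be sketched with a `RefCell<Option<T>>`. This is a simplified, single-threaded illustration and not the compiler's own `Steal<T>`; in particular, the `is_stolen` helper and the use of a plain panic in place of an ICE are assumptions made for the example.

```rust
use std::cell::{Ref, RefCell};

/// Simplified sketch of a stealable value.
struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    fn new(value: T) -> Self {
        Steal {
            value: RefCell::new(Some(value)),
        }
    }

    /// Read access. Panics if the value has been stolen, so that such
    /// an access cannot go unnoticed.
    fn borrow(&self) -> Ref<'_, T> {
        Ref::map(self.value.borrow(), |opt| {
            opt.as_ref().expect("attempted to read from stolen value")
        })
    }

    /// Moves the value out, leaving `None` behind.
    fn steal(&self) -> T {
        self.value
            .borrow_mut()
            .take()
            .expect("value was already stolen")
    }

    /// Hypothetical helper, used here only to observe the state.
    fn is_stolen(&self) -> bool {
        self.value.borrow().is_none()
    }
}

/// Reads the value once, then takes ownership of it without cloning.
fn example_steal() -> (usize, Vec<u32>, bool) {
    let mir = Steal::new(vec![1, 2, 3]);
    let len_before = mir.borrow().len();
    let owned = mir.steal();
    (len_before, owned, mir.is_stolen())
}
```

All reads have to happen before `steal` is called, which mirrors the requirement to eagerly run every query that might need the result.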
203+
204+
This is not an ideal setup because of the manual intervention needed, so it
205+
should be used sparingly and only when it is well known which queries might
206+
access a given result. In practice, however, stealing has not turned out to be
207+
much of a maintenance burden.
208+
209+
To summarize: "Steal queries" break some of the rules in a controlled way.
210+
There are checks in place that make sure that nothing can go silently wrong.
211+
212+
213+
## Parallel Query Execution
214+
215+
The query model has some properties that make it actually feasible to evaluate
216+
multiple queries in parallel without too much of an effort:
217+
218+
- All data a query provider can access is accessed via the query context, so
219+
the query context can take care of synchronizing access.
220+
- Query results are required to be immutable so they can safely be used by
221+
different threads concurrently.
222+
223+
The nightly compiler already implements parallel query evaluation as follows:
224+
225+
When a query `foo` is evaluated, the cache table for `foo` is locked.
226+
227+
- If there already is a result, we can clone it, release the lock and
228+
we are done.
229+
- If there is no cache entry and no other active query invocation computing the
230+
same result, we mark the key as being "in progress", release the lock and
231+
start evaluating.
232+
- If there *is* another query invocation for the same key in progress, we
233+
release the lock, and just block the thread until the other invocation has
234+
computed the result we are waiting for. This cannot deadlock because, as
235+
mentioned before, query invocations form a DAG. Some thread will always make
236+
progress.
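
The three cases above can be sketched with a `Mutex`-protected table and a `Condvar` from the standard library. This is only an illustration of the locking protocol under simplifying assumptions: there is a single query with `u32` keys, `ParallelCache` and its `provider` are hypothetical, and there is no cycle detection.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// State of one cache entry.
#[derive(Clone, Copy)]
enum Entry {
    InProgress,
    Done(u64),
}

/// Hypothetical cache table for a single query.
struct ParallelCache {
    table: Mutex<HashMap<u32, Entry>>,
    finished: Condvar,
    /// Instrumentation only: counts how often the provider ran.
    provider_calls: AtomicUsize,
}

impl ParallelCache {
    fn new() -> Self {
        ParallelCache {
            table: Mutex::new(HashMap::new()),
            finished: Condvar::new(),
            provider_calls: AtomicUsize::new(0),
        }
    }

    fn provider(&self, key: u32) -> u64 {
        self.provider_calls.fetch_add(1, Ordering::SeqCst);
        u64::from(key) * 2
    }

    fn get(&self, key: u32) -> u64 {
        let mut table = self.table.lock().unwrap();
        loop {
            let state = table.get(&key).copied();
            match state {
                // A result exists: copy it out; the lock is released on return.
                Some(Entry::Done(value)) => return value,
                // Another invocation is computing it: `wait` releases the
                // lock and blocks until some result has been stored.
                Some(Entry::InProgress) => {
                    table = self.finished.wait(table).unwrap();
                }
                // Nobody is computing it yet.
                None => break,
            }
        }
        // Mark the key as "in progress", release the lock, start evaluating.
        table.insert(key, Entry::InProgress);
        drop(table);

        let value = self.provider(key);

        self.table.lock().unwrap().insert(key, Entry::Done(value));
        self.finished.notify_all();
        value
    }
}

/// Requests the same key from four threads at once.
fn example_parallel() -> (Vec<u64>, usize) {
    let cache = Arc::new(ParallelCache::new());
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || cache.get(21))
        })
        .collect();
    let results: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    (results, cache.provider_calls.load(Ordering::SeqCst))
}
```

All four threads receive the same value while the provider runs only once, because only the first thread to take the lock finds no entry for the key.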
237+

src/query.md

Lines changed: 7 additions & 54 deletions
@@ -35,6 +35,12 @@ will in turn demand information about that crate, starting from the
3535
However, that vision is not fully realized. Still, big chunks of the
3636
compiler (for example, generating MIR) work exactly like this.
3737

38+
### The Query Evaluation Model in Detail
39+
40+
The [Query Evaluation Model in Detail][query-model] chapter gives a more
41+
in-depth description of what queries are and how they work.
42+
If you intend to write a query of your own, this is a good read.
43+
3844
### Invoking queries
3945

4046
To invoke a query is simple. The tcx ("type context") offers a method
@@ -45,60 +51,6 @@ query, you would just do this:
4551
let ty = tcx.type_of(some_def_id);
4652
```
4753

48-
### Cycles between queries
49-
50-
A cycle is when a query becomes stuck in a loop e.g. query A generates query B
51-
which generates query A again.
52-
53-
Currently, cycles during query execution should always result in a
54-
compilation error. Typically, they arise because of illegal programs
55-
that contain cyclic references they shouldn't (though sometimes they
56-
arise because of compiler bugs, in which case we need to factor our
57-
queries in a more fine-grained fashion to avoid them).
58-
59-
However, it is nonetheless often useful to *recover* from a cycle
60-
(after reporting an error, say) and try to soldier on, so as to give a
61-
better user experience. In order to recover from a cycle, you don't
62-
get to use the nice method-call-style syntax. Instead, you invoke
63-
using the `try_get` method, which looks roughly like this:
64-
65-
```rust,ignore
66-
use ty::queries;
67-
...
68-
match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
69-
Ok(result) => {
70-
// no cycle occurred! You can use `result`
71-
}
72-
Err(err) => {
73-
// A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
74-
// meaning essentially an "in-progress", not-yet-reported error message.
75-
// See below for more details on what to do here.
76-
}
77-
}
78-
```
79-
80-
So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This
81-
means that you must ensure that a compiler error message is reported. You can
82-
do that in two ways:
83-
84-
The simplest is to invoke `err.emit()`. This will emit the cycle error to the
85-
user.
86-
87-
However, often cycles happen because of an illegal program, and you
88-
know at that point that an error either already has been reported or
89-
will be reported due to this cycle by some other bit of code. In that
90-
case, you can invoke `err.cancel()` to not emit any error. It is
91-
traditional to then invoke:
92-
93-
```rust,ignore
94-
tcx.sess.delay_span_bug(some_span, "some message")
95-
```
96-
97-
`delay_span_bug()` is a helper that says: we expect a compilation
98-
error to have happened or to happen in the future; so, if compilation
99-
ultimately succeeds, make an ICE with the message `"some
100-
message"`. This is basically just a precaution in case you are wrong.
101-
10254
### How the compiler executes a query
10355

10456
So you may be wondering what happens when you invoke a query
@@ -315,3 +267,4 @@ impl<'tcx> QueryDescription for queries::type_of<'tcx> {
315267
}
316268
```
317269

270+
[query-model]: queries/query-evaluation-model-in-detail.html

src/variance.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ crate (through `crate_variances`), but since most changes will not result in a
139139
change to the actual results from variance inference, the `variances_of` query
140140
will wind up being considered green after it is re-evaluated.
141141

142-
[rga]: ./incremental-compilation.html
142+
[rga]: ./queries/incremental-compilation.html
143143

144144
<a name="addendum"></a>
145145
