Skip to content

Commit a53a22e

Browse files
committed
Implement the new region hierarchy rules, in which regions from distinct
hierarchies are judged based on the lexical relationship of their respective fn bodies.
1 parent bc959fd commit a53a22e

File tree

3 files changed

+110
-140
lines changed

3 files changed

+110
-140
lines changed

src/librustc/middle/infer/region_inference/README.md

+55-108
Original file line numberDiff line numberDiff line change
@@ -249,114 +249,61 @@ there is a reference created whose lifetime does not enclose
249249
the borrow expression, we must issue sufficient restrictions to ensure
250250
that the pointee remains valid.
251251

252-
## Adding closures
253-
254-
The other significant complication to the region hierarchy is
255-
closures. I will describe here how closures should work, though some
256-
of the work to implement this model is ongoing at the time of this
257-
writing.
258-
259-
The body of closures are type-checked along with the function that
260-
creates them. However, unlike other expressions that appear within the
261-
function body, it is not entirely obvious when a closure body executes
262-
with respect to the other expressions. This is because the closure
263-
body will execute whenever the closure is called; however, we can
264-
never know precisely when the closure will be called, especially
265-
without some sort of alias analysis.
266-
267-
However, we can place some sort of limits on when the closure
268-
executes. In particular, the type of every closure `fn:'r K` includes
269-
a region bound `'r`. This bound indicates the maximum lifetime of that
270-
closure; once we exit that region, the closure cannot be called
271-
anymore. Therefore, we say that the lifetime of the closure body is a
272-
sublifetime of the closure bound, but the closure body itself is unordered
273-
with respect to other parts of the code.
274-
275-
For example, consider the following fragment of code:
276-
277-
'a: {
278-
let closure: fn:'a() = || 'b: {
279-
'c: ...
280-
};
281-
'd: ...
282-
}
283-
284-
Here we have four lifetimes, `'a`, `'b`, `'c`, and `'d`. The closure
285-
`closure` is bounded by the lifetime `'a`. The lifetime `'b` is the
286-
lifetime of the closure body, and `'c` is some statement within the
287-
closure body. Finally, `'d` is a statement within the outer block that
288-
created the closure.
289-
290-
We can say that the closure body `'b` is a sublifetime of `'a` due to
291-
the closure bound. By the usual lexical scoping conventions, the
292-
statement `'c` is clearly a sublifetime of `'b`, and `'d` is a
293-
sublifetime of `'d`. However, there is no ordering between `'c` and
294-
`'d` per se (this kind of ordering between statements is actually only
295-
an issue for dataflow; passes like the borrow checker must assume that
296-
closures could execute at any time from the moment they are created
297-
until they go out of scope).
298-
299-
### Complications due to closure bound inference
300-
301-
There is only one problem with the above model: in general, we do not
302-
actually *know* the closure bounds during region inference! In fact,
303-
closure bounds are almost always region variables! This is very tricky
304-
because the inference system implicitly assumes that we can do things
305-
like compute the LUB of two scoped lifetimes without needing to know
306-
the values of any variables.
307-
308-
Here is an example to illustrate the problem:
309-
310-
fn identify<T>(x: T) -> T { x }
311-
312-
fn foo() { // 'foo is the function body
313-
'a: {
314-
let closure = identity(|| 'b: {
315-
'c: ...
316-
});
317-
'd: closure();
318-
}
319-
'e: ...;
320-
}
321-
322-
In this example, the closure bound is not explicit. At compile time,
323-
we will create a region variable (let's call it `V0`) to represent the
324-
closure bound.
325-
326-
The primary difficulty arises during the constraint propagation phase.
327-
Imagine there is some variable with incoming edges from `'c` and `'d`.
328-
This means that the value of the variable must be `LUB('c,
329-
'd)`. However, without knowing what the closure bound `V0` is, we
330-
can't compute the LUB of `'c` and `'d`! Any we don't know the closure
331-
bound until inference is done.
332-
333-
The solution is to rely on the fixed point nature of inference.
334-
Basically, when we must compute `LUB('c, 'd)`, we just use the current
335-
value for `V0` as the closure's bound. If `V0`'s binding should
336-
change, then we will do another round of inference, and the result of
337-
`LUB('c, 'd)` will change.
338-
339-
One minor implication of this is that the graph does not in fact track
340-
the full set of dependencies between edges. We cannot easily know
341-
whether the result of a LUB computation will change, since there may
342-
be indirect dependencies on other variables that are not reflected on
343-
the graph. Therefore, we must *always* iterate over all edges when
344-
doing the fixed point calculation, not just those adjacent to nodes
345-
whose values have changed.
346-
347-
Were it not for this requirement, we could in fact avoid fixed-point
348-
iteration altogether. In that universe, we could instead first
349-
identify and remove strongly connected components (SCC) in the graph.
350-
Note that such components must consist solely of region variables; all
351-
of these variables can effectively be unified into a single variable.
352-
Once SCCs are removed, we are left with a DAG. At this point, we
353-
could walk the DAG in topological order once to compute the expanding
354-
nodes, and again in reverse topological order to compute the
355-
contracting nodes. However, as I said, this does not work given the
356-
current treatment of closure bounds, but perhaps in the future we can
357-
address this problem somehow and make region inference somewhat more
358-
efficient. Note that this is solely a matter of performance, not
359-
expressiveness.
252+
## Modeling closures
253+
254+
Integrating closures properly into the model is a bit of
255+
work-in-progress. In an ideal world, we would model closures as
256+
closely as possible after their desugared equivalents. That is, a
257+
closure type would be modeled as a struct, and the region hierarchy of
258+
different closure bodies would be completely distinct from all other
259+
fns. We are generally moving in that direction but there are
260+
complications in terms of the implementation.
261+
262+
In practice what we currently do is somewhat different. The basis for
263+
the current approach is the observation that the only time that
264+
regions from distinct fn bodies interact with one another is through
265+
an upvar or the type of a fn parameter (since closures live in the fn
266+
body namespace, they can in fact have fn parameters whose types
267+
include regions from the surrounding fn body). For these cases, there
268+
are separate mechanisms which ensure that the regions that appear in
269+
upvars/parameters outlive the dynamic extent of each call to the
270+
closure:
271+
272+
1. Types must outlive the region of any expression where they are used.
273+
For a closure type `C` to outlive a region `'r`, that implies that the
274+
types of all its upvars must outlive `'r`.
275+
2. Parameters must outlive the region of any fn that they are passed to.
276+
277+
Therefore, we can -- sort of -- assume that when we are asked to
278+
compare a region `'a` from a closure with a region `'b` from the fn
279+
that encloses it, in fact `'b` is the larger region. And that is
280+
precisely what we do: when building the region hierarchy, each region
281+
lives in its own distinct subtree, but if we are asked to compute the
282+
`LUB(r1, r2)` of two regions, and those regions are in disjoint
283+
subtrees, we compare the lexical nesting of the two regions.
284+
285+
*Ideas for improving the situation:* The correct argument here is
286+
subtle and a bit hand-wavy. The ideal, as stated earlier, would be to
287+
model things in such a way that it corresponds more closely to the
288+
desugared code. The best approach for doing this is a bit unclear: it
289+
may in fact be possible to *actually* desugar before we start, but I
290+
don't think so. The main option that I've been thinking through is
291+
imposing a "view shift" as we enter the fn body, so that regions
292+
appearing in the types of fn parameters and upvars are translated from
293+
being regions in the outer fn into free region parameters, just as
294+
they would be if we applied the desugaring. The challenge here is that
295+
type inference may not have fully run, so the types may not be fully
296+
known: we could probably do this translation lazilly, as type
297+
variables are instantiated. We would also have to apply a kind of
298+
inverse translation to the return value. This would be a good idea
299+
anyway, as right now it is possible for free regions instantiated
300+
within the closure to leak into the parent: this currently leads to
301+
type errors, since those regions cannot outlive any expressions within
302+
the parent hierarchy. Much like the current handling of closures,
303+
there are no known cases where this leads to a type-checking accepting
304+
incorrect code (though it sometimes rejects what might be considered
305+
correct code; see rust-lang/rust#22557), but it still doesn't feel
306+
like the right approach.
360307

361308
### Skolemization
362309

src/librustc/middle/region.rs

+54-32
Original file line numberDiff line numberDiff line change
@@ -262,7 +262,9 @@ pub struct RegionMaps {
262262
/// fn, if any. Thus the map structures the fn bodies into a
263263
/// hierarchy based on their lexical mapping. This is used to
264264
/// handle the relationships between regions in a fn and in a
265-
/// closure defined by that fn.
265+
/// closure defined by that fn. See the "Modeling closures"
266+
/// section of the README in middle::infer::region_inference for
267+
/// more details.
266268
fn_tree: RefCell<NodeMap<ast::NodeId>>,
267269
}
268270

@@ -337,7 +339,9 @@ pub struct Context {
337339
/// the root of the current region tree. This is typically the id
338340
/// of the innermost fn body. Each fn forms its own disjoint tree
339341
/// in the region hierarchy. These fn bodies are themselves
340-
/// arranged into a tree.
342+
/// arranged into a tree. See the "Modeling closures" section of
343+
/// the README in middle::infer::region_inference for more
344+
/// details.
341345
root_id: Option<ast::NodeId>,
342346

343347
/// the scope that contains any new variables declared
@@ -411,7 +415,18 @@ impl RegionMaps {
411415
assert!(previous.is_none());
412416
}
413417

414-
fn record_encl_scope(&self, sub: CodeExtent, sup: CodeExtent) {
418+
fn fn_is_enclosed_by(&self, mut sub_fn: ast::NodeId, sup_fn: ast::NodeId) -> bool {
419+
let fn_tree = self.fn_tree.borrow();
420+
loop {
421+
if sub_fn == sup_fn { return true; }
422+
match fn_tree.get(&sub_fn) {
423+
Some(&s) => { sub_fn = s; }
424+
None => { return false; }
425+
}
426+
}
427+
}
428+
429+
pub fn record_encl_scope(&self, sub: CodeExtent, sup: CodeExtent) {
415430
debug!("record_encl_scope(sub={:?}, sup={:?})", sub, sup);
416431
assert!(sub != sup);
417432
self.scope_map.borrow_mut().insert(sub, sup);
@@ -600,7 +615,7 @@ impl RegionMaps {
600615
let mut a_index = a_ancestors.len() - 1;
601616
let mut b_index = b_ancestors.len() - 1;
602617

603-
// Here, ~[ab]_ancestors is a vector going from narrow to broad.
618+
// Here, [ab]_ancestors is a vector going from narrow to broad.
604619
// The end of each vector will be the item where the scope is
605620
// defined; if there are any common ancestors, then the tails of
606621
// the vector will be the same. So basically we want to walk
@@ -609,7 +624,32 @@ impl RegionMaps {
609624
// then the corresponding scope is a superscope of the other.
610625

611626
if a_ancestors[a_index] != b_ancestors[b_index] {
612-
return None;
627+
// In this case, the two regions belong to completely
628+
// different functions. Compare those fn for lexical
629+
// nesting. The reasoning behind this is subtle. See the
630+
// "Modeling closures" section of the README in
631+
// middle::infer::region_inference for more details.
632+
let a_root_scope = a_ancestors[a_index];
633+
let b_root_scope = a_ancestors[a_index];
634+
return match (a_root_scope, b_root_scope) {
635+
(CodeExtent::DestructionScope(a_root_id),
636+
CodeExtent::DestructionScope(b_root_id)) => {
637+
if self.fn_is_enclosed_by(a_root_id, b_root_id) {
638+
// `a` is enclosed by `b`, hence `b` is the ancestor of everything in `a`
639+
Some(scope_b)
640+
} else if self.fn_is_enclosed_by(b_root_id, a_root_id) {
641+
// `b` is enclosed by `a`, hence `a` is the ancestor of everything in `b`
642+
Some(scope_a)
643+
} else {
644+
// neither fn encloses the other
645+
None
646+
}
647+
}
648+
_ => {
649+
// root ids are always Misc right now
650+
unreachable!()
651+
}
652+
};
613653
}
614654

615655
loop {
@@ -675,6 +715,7 @@ fn resolve_block(visitor: &mut RegionResolutionVisitor, blk: &ast::Block) {
675715
let prev_cx = visitor.cx;
676716

677717
let blk_scope = CodeExtent::Misc(blk.id);
718+
678719
// If block was previously marked as a terminating scope during
679720
// the recursive visit of its parent node in the AST, then we need
680721
// to account for the destruction scope representing the extent of
@@ -1144,7 +1185,7 @@ fn resolve_item(visitor: &mut RegionResolutionVisitor, item: &ast::Item) {
11441185
}
11451186

11461187
fn resolve_fn(visitor: &mut RegionResolutionVisitor,
1147-
fk: FnKind,
1188+
_: FnKind,
11481189
decl: &ast::FnDecl,
11491190
body: &ast::Block,
11501191
sp: Span,
@@ -1180,32 +1221,13 @@ fn resolve_fn(visitor: &mut RegionResolutionVisitor,
11801221
};
11811222
visit::walk_fn_decl(visitor, decl);
11821223

1183-
// The body of the fn itself is either a root scope (top-level fn)
1184-
// or it continues with the inherited scope (closures).
1185-
match fk {
1186-
visit::FkItemFn(..) | visit::FkMethod(..) => {
1187-
visitor.cx = Context {
1188-
root_id: Some(body.id),
1189-
parent: InnermostEnclosingExpr::None,
1190-
var_parent: InnermostDeclaringBlock::None
1191-
};
1192-
visitor.visit_block(body);
1193-
}
1194-
visit::FkFnBlock(..) => {
1195-
// FIXME(#3696) -- at present we are place the closure body
1196-
// within the region hierarchy exactly where it appears lexically.
1197-
// This is wrong because the closure may live longer
1198-
// than the enclosing expression. We should probably fix this,
1199-
// but the correct fix is a bit subtle, and I am also not sure
1200-
// that the present approach is unsound -- it may not permit
1201-
// any illegal programs. See issue for more details.
1202-
visitor.cx = Context {
1203-
root_id: Some(body.id),
1204-
..outer_cx
1205-
};
1206-
visitor.visit_block(body);
1207-
}
1208-
}
1224+
// The body of the every fn is a root scope.
1225+
visitor.cx = Context {
1226+
root_id: Some(body.id),
1227+
parent: InnermostEnclosingExpr::None,
1228+
var_parent: InnermostDeclaringBlock::None
1229+
};
1230+
visitor.visit_block(body);
12091231

12101232
// Restore context we had at the start.
12111233
visitor.cx = outer_cx;

src/librustc_driver/test.rs

+1
Original file line numberDiff line numberDiff line change
@@ -588,6 +588,7 @@ fn lub_free_free() {
588588
fn lub_returning_scope() {
589589
test_env(EMPTY_SOURCE_STR,
590590
errors(&["cannot infer an appropriate lifetime"]), |env| {
591+
env.create_simple_region_hierarchy();
591592
let t_rptr_scope10 = env.t_rptr_scope(10);
592593
let t_rptr_scope11 = env.t_rptr_scope(11);
593594

0 commit comments

Comments
 (0)