Implement the new region hierarchy rules, in which regions from distinct

nikomatsakis · nikomatsakis · commit a53a22eab94f · 2015-04-01T08:40:42.000-04:00
hierarchies are judged based on the lexical relationship of their
respective fn bodies.
diff --git a/src/librustc/middle/infer/region_inference/README.md b/src/librustc/middle/infer/region_inference/README.md
@@ -249,114 +249,61 @@ there is a reference created whose lifetime does not enclose
 the borrow expression, we must issue sufficient restrictions to ensure
 that the pointee remains valid.
 
-## Adding closures
-
-The other significant complication to the region hierarchy is
-closures. I will describe here how closures should work, though some
-of the work to implement this model is ongoing at the time of this
-writing.
-
-The body of closures are type-checked along with the function that
-creates them. However, unlike other expressions that appear within the
-function body, it is not entirely obvious when a closure body executes
-with respect to the other expressions. This is because the closure
-body will execute whenever the closure is called; however, we can
-never know precisely when the closure will be called, especially
-without some sort of alias analysis.
-
-However, we can place some sort of limits on when the closure
-executes.  In particular, the type of every closure `fn:'r K` includes
-a region bound `'r`. This bound indicates the maximum lifetime of that
-closure; once we exit that region, the closure cannot be called
-anymore. Therefore, we say that the lifetime of the closure body is a
-sublifetime of the closure bound, but the closure body itself is unordered
-with respect to other parts of the code.
-
-For example, consider the following fragment of code:
-
-    'a: {
-         let closure: fn:'a() = || 'b: {
-             'c: ...
-         };
-         'd: ...
-    }
-
-Here we have four lifetimes, `'a`, `'b`, `'c`, and `'d`. The closure
-`closure` is bounded by the lifetime `'a`. The lifetime `'b` is the
-lifetime of the closure body, and `'c` is some statement within the
-closure body. Finally, `'d` is a statement within the outer block that
-created the closure.
-
-We can say that the closure body `'b` is a sublifetime of `'a` due to
-the closure bound. By the usual lexical scoping conventions, the
-statement `'c` is clearly a sublifetime of `'b`, and `'d` is a
-sublifetime of `'d`. However, there is no ordering between `'c` and
-`'d` per se (this kind of ordering between statements is actually only
-an issue for dataflow; passes like the borrow checker must assume that
-closures could execute at any time from the moment they are created
-until they go out of scope).
-
-### Complications due to closure bound inference
-
-There is only one problem with the above model: in general, we do not
-actually *know* the closure bounds during region inference! In fact,
-closure bounds are almost always region variables! This is very tricky
-because the inference system implicitly assumes that we can do things
-like compute the LUB of two scoped lifetimes without needing to know
-the values of any variables.
-
-Here is an example to illustrate the problem:
-
-    fn identify<T>(x: T) -> T { x }
-
-    fn foo() { // 'foo is the function body
-      'a: {
-           let closure = identity(|| 'b: {
-               'c: ...
-           });
-           'd: closure();
-      }
-      'e: ...;
-    }
-
-In this example, the closure bound is not explicit. At compile time,
-we will create a region variable (let's call it `V0`) to represent the
-closure bound.
-
-The primary difficulty arises during the constraint propagation phase.
-Imagine there is some variable with incoming edges from `'c` and `'d`.
-This means that the value of the variable must be `LUB('c,
-'d)`. However, without knowing what the closure bound `V0` is, we
-can't compute the LUB of `'c` and `'d`! Any we don't know the closure
-bound until inference is done.
-
-The solution is to rely on the fixed point nature of inference.
-Basically, when we must compute `LUB('c, 'd)`, we just use the current
-value for `V0` as the closure's bound. If `V0`'s binding should
-change, then we will do another round of inference, and the result of
-`LUB('c, 'd)` will change.
-
-One minor implication of this is that the graph does not in fact track
-the full set of dependencies between edges. We cannot easily know
-whether the result of a LUB computation will change, since there may
-be indirect dependencies on other variables that are not reflected on
-the graph. Therefore, we must *always* iterate over all edges when
-doing the fixed point calculation, not just those adjacent to nodes
-whose values have changed.
-
-Were it not for this requirement, we could in fact avoid fixed-point
-iteration altogether. In that universe, we could instead first
-identify and remove strongly connected components (SCC) in the graph.
-Note that such components must consist solely of region variables; all
-of these variables can effectively be unified into a single variable.
-Once SCCs are removed, we are left with a DAG.  At this point, we
-could walk the DAG in topological order once to compute the expanding
-nodes, and again in reverse topological order to compute the
-contracting nodes. However, as I said, this does not work given the
-current treatment of closure bounds, but perhaps in the future we can
-address this problem somehow and make region inference somewhat more
-efficient. Note that this is solely a matter of performance, not
-expressiveness.
+## Modeling closures
+
+Integrating closures properly into the model is a bit of
+work-in-progress. In an ideal world, we would model closures as
+closely as possible after their desugared equivalents. That is, a
+closure type would be modeled as a struct, and the region hierarchy of
+different closure bodies would be completely distinct from all other
+fns. We are generally moving in that direction but there are
+complications in terms of the implementation.
+
+In practice what we currently do is somewhat different. The basis for
+the current approach is the observation that the only time that
+regions from distinct fn bodies interact with one another is through
+an upvar or the type of a fn parameter (since closures live in the fn
+body namespace, they can in fact have fn parameters whose types
+include regions from the surrounding fn body). For these cases, there
+are separate mechanisms which ensure that the regions that appear in
+upvars/parameters outlive the dynamic extent of each call to the
+closure:
+
+1. Types must outlive the region of any expression where they are used.
+   For a closure type `C` to outlive a region `'r`, that implies that the
+   types of all its upvars must outlive `'r`.
+2. Parameters must outlive the region of any fn that they are passed to.
+
+Therefore, we can -- sort of -- assume that when we are asked to
+compare a region `'a` from a closure with a region `'b` from the fn
+that encloses it, in fact `'b` is the larger region. And that is
+precisely what we do: when building the region hierarchy, each region
+lives in its own distinct subtree, but if we are asked to compute the
+`LUB(r1, r2)` of two regions, and those regions are in disjoint
+subtrees, we compare the lexical nesting of the two regions.
+
+*Ideas for improving the situation:* The correct argument here is
+subtle and a bit hand-wavy. The ideal, as stated earlier, would be to
+model things in such a way that it corresponds more closely to the
+desugared code. The best approach for doing this is a bit unclear: it
+may in fact be possible to *actually* desugar before we start, but I
+don't think so. The main option that I've been thinking through is
+imposing a "view shift" as we enter the fn body, so that regions
+appearing in the types of fn parameters and upvars are translated from
+being regions in the outer fn into free region parameters, just as
+they would be if we applied the desugaring. The challenge here is that
+type inference may not have fully run, so the types may not be fully
+known: we could probably do this translation lazilly, as type
+variables are instantiated. We would also have to apply a kind of
+inverse translation to the return value. This would be a good idea
+anyway, as right now it is possible for free regions instantiated
+within the closure to leak into the parent: this currently leads to
+type errors, since those regions cannot outlive any expressions within
+the parent hierarchy. Much like the current handling of closures,
+there are no known cases where this leads to a type-checking accepting
+incorrect code (though it sometimes rejects what might be considered
+correct code; see rust-lang/rust#22557), but it still doesn't feel
+like the right approach.
 
 ### Skolemization
 
diff --git a/src/librustc/middle/region.rs b/src/librustc/middle/region.rs
@@ -262,7 +262,9 @@ pub struct RegionMaps {
     /// fn, if any. Thus the map structures the fn bodies into a
     /// hierarchy based on their lexical mapping. This is used to
     /// handle the relationships between regions in a fn and in a
-    /// closure defined by that fn.
+    /// closure defined by that fn. See the "Modeling closures"
+    /// section of the README in middle::infer::region_inference for
+    /// more details.
     fn_tree: RefCell<NodeMap<ast::NodeId>>,
 }
 
@@ -337,7 +339,9 @@ pub struct Context {
     /// the root of the current region tree. This is typically the id
     /// of the innermost fn body. Each fn forms its own disjoint tree
     /// in the region hierarchy. These fn bodies are themselves
-    /// arranged into a tree.
+    /// arranged into a tree. See the "Modeling closures" section of
+    /// the README in middle::infer::region_inference for more
+    /// details.
     root_id: Option<ast::NodeId>,
 
     /// the scope that contains any new variables declared
@@ -411,7 +415,18 @@ impl RegionMaps {
         assert!(previous.is_none());
     }
 
-    fn record_encl_scope(&self, sub: CodeExtent, sup: CodeExtent) {
+    fn fn_is_enclosed_by(&self, mut sub_fn: ast::NodeId, sup_fn: ast::NodeId) -> bool {
+        let fn_tree = self.fn_tree.borrow();
+        loop {
+            if sub_fn == sup_fn { return true; }
+            match fn_tree.get(&sub_fn) {
+                Some(&s) => { sub_fn = s; }
+                None => { return false; }
+            }
+        }
+    }
+
+    pub fn record_encl_scope(&self, sub: CodeExtent, sup: CodeExtent) {
         debug!("record_encl_scope(sub={:?}, sup={:?})", sub, sup);
         assert!(sub != sup);
         self.scope_map.borrow_mut().insert(sub, sup);
@@ -600,7 +615,7 @@ impl RegionMaps {
         let mut a_index = a_ancestors.len() - 1;
         let mut b_index = b_ancestors.len() - 1;
 
-        // Here, ~[ab]_ancestors is a vector going from narrow to broad.
+        // Here, [ab]_ancestors is a vector going from narrow to broad.
         // The end of each vector will be the item where the scope is
         // defined; if there are any common ancestors, then the tails of
         // the vector will be the same.  So basically we want to walk
@@ -609,7 +624,32 @@ impl RegionMaps {
         // then the corresponding scope is a superscope of the other.
 
         if a_ancestors[a_index] != b_ancestors[b_index] {
-            return None;
+            // In this case, the two regions belong to completely
+            // different functions.  Compare those fn for lexical
+            // nesting. The reasoning behind this is subtle.  See the
+            // "Modeling closures" section of the README in
+            // middle::infer::region_inference for more details.
+            let a_root_scope = a_ancestors[a_index];
+            let b_root_scope = a_ancestors[a_index];
+            return match (a_root_scope, b_root_scope) {
+                (CodeExtent::DestructionScope(a_root_id),
+                 CodeExtent::DestructionScope(b_root_id)) => {
+                    if self.fn_is_enclosed_by(a_root_id, b_root_id) {
+                        // `a` is enclosed by `b`, hence `b` is the ancestor of everything in `a`
+                        Some(scope_b)
+                    } else if self.fn_is_enclosed_by(b_root_id, a_root_id) {
+                        // `b` is enclosed by `a`, hence `a` is the ancestor of everything in `b`
+                        Some(scope_a)
+                    } else {
+                        // neither fn encloses the other
+                        None
+                    }
+                }
+                _ => {
+                    // root ids are always Misc right now
+                    unreachable!()
+                }
+            };
         }
 
         loop {
@@ -675,6 +715,7 @@ fn resolve_block(visitor: &mut RegionResolutionVisitor, blk: &ast::Block) {
     let prev_cx = visitor.cx;
 
     let blk_scope = CodeExtent::Misc(blk.id);
+
     // If block was previously marked as a terminating scope during
     // the recursive visit of its parent node in the AST, then we need
     // to account for the destruction scope representing the extent of
@@ -1144,7 +1185,7 @@ fn resolve_item(visitor: &mut RegionResolutionVisitor, item: &ast::Item) {
 }
 
 fn resolve_fn(visitor: &mut RegionResolutionVisitor,
-              fk: FnKind,
+              _: FnKind,
               decl: &ast::FnDecl,
               body: &ast::Block,
               sp: Span,
@@ -1180,32 +1221,13 @@ fn resolve_fn(visitor: &mut RegionResolutionVisitor,
     };
     visit::walk_fn_decl(visitor, decl);
 
-    // The body of the fn itself is either a root scope (top-level fn)
-    // or it continues with the inherited scope (closures).
-    match fk {
-        visit::FkItemFn(..) | visit::FkMethod(..) => {
-            visitor.cx = Context {
-                root_id: Some(body.id),
-                parent: InnermostEnclosingExpr::None,
-                var_parent: InnermostDeclaringBlock::None
-            };
-            visitor.visit_block(body);
-        }
-        visit::FkFnBlock(..) => {
-            // FIXME(#3696) -- at present we are place the closure body
-            // within the region hierarchy exactly where it appears lexically.
-            // This is wrong because the closure may live longer
-            // than the enclosing expression. We should probably fix this,
-            // but the correct fix is a bit subtle, and I am also not sure
-            // that the present approach is unsound -- it may not permit
-            // any illegal programs. See issue for more details.
-            visitor.cx = Context {
-                root_id: Some(body.id),
-                ..outer_cx
-            };
-            visitor.visit_block(body);
-        }
-    }
+    // The body of the every fn is a root scope.
+    visitor.cx = Context {
+        root_id: Some(body.id),
+        parent: InnermostEnclosingExpr::None,
+        var_parent: InnermostDeclaringBlock::None
+    };
+    visitor.visit_block(body);
 
     // Restore context we had at the start.
     visitor.cx = outer_cx;
diff --git a/src/librustc_driver/test.rs b/src/librustc_driver/test.rs
@@ -588,6 +588,7 @@ fn lub_free_free() {
 fn lub_returning_scope() {
     test_env(EMPTY_SOURCE_STR,
              errors(&["cannot infer an appropriate lifetime"]), |env| {
+                 env.create_simple_region_hierarchy();
                  let t_rptr_scope10 = env.t_rptr_scope(10);
                  let t_rptr_scope11 = env.t_rptr_scope(11);