Commit 794c2d6

Merge pull request #27 from Gankro/reref
rewrite references.md
2 parents c0e8c56 + c4822cd commit 794c2d6

3 files changed: +149 -159 lines changed

src/SUMMARY.md

+1
@@ -11,6 +11,7 @@
 * [Other reprs](other-reprs.md)
 * [Ownership](ownership.md)
 * [References](references.md)
+* [Aliasing](aliasing.md)
 * [Lifetimes](lifetimes.md)
 * [Limits of Lifetimes](lifetime-mismatch.md)
 * [Lifetime Elision](lifetime-elision.md)

src/aliasing.md

+135
@@ -0,0 +1,135 @@
+# Aliasing
+
+First off, let's get some important caveats out of the way:
+
+* We will be using the broadest possible definition of aliasing for the sake
+  of discussion. Rust's definition will probably be more restricted to factor
+  in mutations and liveness.
+
+* We will be assuming a single-threaded, interrupt-free execution. We will also
+  be ignoring things like memory-mapped hardware. Rust assumes these things
+  don't happen unless you tell it otherwise. For more details, see the
+  [Concurrency Chapter](concurrency.html).
+
+With that said, here's our working definition: variables and pointers *alias*
+if they refer to overlapping regions of memory.
+
+
+
+
+# Why Aliasing Matters
+
+So why should we care about aliasing?
+
+Consider this simple function:
+
+```rust
+fn compute(input: &u32, output: &mut u32) {
+    if *input > 10 {
+        *output = 1;
+    }
+    if *input > 5 {
+        *output *= 2;
+    }
+}
+```
+
+We would *like* to be able to optimize it to the following function:
+
+```rust
+fn compute(input: &u32, output: &mut u32) {
+    let cached_input = *input; // keep *input in a register
+    if cached_input > 10 {
+        *output = 2; // x > 10 implies x > 5, so double and exit immediately
+    } else if cached_input > 5 {
+        *output *= 2;
+    }
+}
+```
+
+In Rust, this optimization should be sound. For almost any other language, it
+wouldn't be (barring global analysis). This is because the optimization relies
+on knowing that aliasing doesn't occur, which most languages are fairly liberal
+with. Specifically, we need to worry about function arguments that make `input`
+and `output` overlap, such as `compute(&x, &mut x)`.
+
+With that input, we could get this execution:
+
+```rust,ignore
+// input == output == 0xabad1dea
+// *input == *output == 20
+if *input > 10 { // true (*input == 20)
+    *output = 1; // also overwrites *input, because they are the same
+}
+if *input > 5 { // false (*input == 1)
+    *output *= 2;
+}
+// *input == *output == 1
+```
+
+Our optimized function would produce `*output == 2` for this input, so the
+correctness of our optimization relies on this input being impossible.
+
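The aliased execution above can be reproduced with raw pointers, which Rust does allow to overlap. The following is a minimal sketch and is not part of the commit: `compute_raw`, `compute_raw_optimized`, and `run_aliased` are hypothetical names, and the two functions are assumed ports of the two versions of `compute` shown earlier.

```rust
// Hypothetical raw-pointer ports of the two `compute` versions.
// Raw pointers carry no aliasing guarantees, so `input` and `output`
// are allowed to point at the same memory.

/// Straightforward version: re-reads `*input` before each comparison.
///
/// Safety: `input` must be valid for reads, `output` for reads and writes.
unsafe fn compute_raw(input: *const u32, output: *mut u32) {
    unsafe {
        if *input > 10 {
            *output = 1;
        }
        if *input > 5 {
            *output *= 2;
        }
    }
}

/// "Optimized" version: caches `*input`, which is only valid without aliasing.
///
/// Safety: same requirements as `compute_raw`.
unsafe fn compute_raw_optimized(input: *const u32, output: *mut u32) {
    unsafe {
        let cached_input = *input;
        if cached_input > 10 {
            *output = 2;
        } else if cached_input > 5 {
            *output *= 2;
        }
    }
}

/// Calls `f` with both pointers aimed at a single `u32` that starts at `start`,
/// and returns the final value of that `u32`.
fn run_aliased(f: unsafe fn(*const u32, *mut u32), start: u32) -> u32 {
    let mut x = start;
    let p: *mut u32 = &mut x;
    // Both arguments are derived from the same raw pointer, so they alias.
    unsafe { f(p as *const u32, p) };
    x
}

fn main() {
    println!("naive:     {}", run_aliased(compute_raw, 20));
    println!("optimized: {}", run_aliased(compute_raw_optimized, 20));
}
```

With a starting value of 20 the naive version leaves 1 behind and the cached version leaves 2, which is exactly the divergence described above.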
+In Rust we know this input should be impossible because `&mut` isn't allowed to be
+aliased. So we can safely reject its possibility and perform this optimization.
+In most other languages, this input would be entirely possible, and must be considered.
+
+This is why alias analysis is important: it lets the compiler perform useful
+optimizations! Some examples:
+
+* keeping values in registers by proving no pointers access the value's memory
+* eliminating reads by proving some memory hasn't been written to since last we read it
+* eliminating writes by proving some memory is never read before the next write to it
+* moving or reordering reads and writes by proving they don't depend on each other
+
+These optimizations also tend to prove the soundness of bigger optimizations
+such as loop vectorization, constant propagation, and dead code elimination.
+
+In the previous example, we used the fact that `&mut u32` can't be aliased to prove
+that writes to `*output` can't possibly affect `*input`. This let us cache `*input`
+in a register, eliminating a read.
+
+By caching this read, we knew that the write in the `> 10` branch couldn't
+affect whether we take the `> 5` branch, allowing us to also eliminate a
+read-modify-write (doubling `*output`) when `*input > 10`.
+
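As a sanity check on that reasoning, the two versions can be compared over a range of non-aliased inputs. This is only a sketch and not part of the commit: `compute_optimized` is the second version renamed so both can coexist, and `versions_agree` is a hypothetical helper that brute-forces small values rather than proving anything.

```rust
/// The original `compute`.
fn compute(input: &u32, output: &mut u32) {
    if *input > 10 {
        *output = 1;
    }
    if *input > 5 {
        *output *= 2;
    }
}

/// The hand-optimized version, renamed so both can live in one file.
fn compute_optimized(input: &u32, output: &mut u32) {
    let cached_input = *input; // keep *input in a register
    if cached_input > 10 {
        *output = 2;
    } else if cached_input > 5 {
        *output *= 2;
    }
}

/// Returns true if both versions leave the same value in `output` for every
/// combination of input and starting output in `0..limit`.
/// `input` and `output` are always distinct variables here, so they never alias.
fn versions_agree(limit: u32) -> bool {
    for input in 0..limit {
        for start in 0..limit {
            let (mut a, mut b) = (start, start);
            compute(&input, &mut a);
            compute_optimized(&input, &mut b);
            if a != b {
                return false;
            }
        }
    }
    true
}

fn main() {
    println!("versions agree: {}", versions_agree(32));
}
```

The borrow checker is what guarantees the "never alias" condition: safe code cannot construct the overlapping call at all.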
+The key thing to remember about alias analysis is that writes are the primary
+hazard for optimizations. That is, the only thing that prevents us
+from moving a read to any other part of the program is the possibility of us
+re-ordering it with a write to the same location.
+
+For instance, we have no concern for aliasing in the following modified version
+of our function, because we've moved the only write to `*output` to the very
+end of our function. This allows us to freely reorder the reads of `*input` that
+occur before it:
+
+```rust
+fn compute(input: &u32, output: &mut u32) {
+    let mut temp = *output;
+    if *input > 10 {
+        temp = 1;
+    }
+    if *input > 5 {
+        temp *= 2;
+    }
+    *output = temp;
+}
+```
+
+We're still relying on alias analysis to assume that `temp` doesn't alias
+`input`, but the proof is much simpler: the value of a local variable can't be
+aliased by things that existed before it was declared. This is an assumption
+every language freely makes, and so this version of the function could be
+optimized the way we want in any language.
+
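One way to see this is to port the `temp` version to raw pointers and let the two arguments overlap on purpose. The sketch below is an illustration, not part of the commit: `compute_temp_raw`, `compute_temp_raw_cached`, and `run_aliased` are hypothetical names.

```rust
/// Hypothetical raw-pointer port of the `temp` version.
///
/// Safety: `input` must be valid for reads, `output` for reads and writes.
unsafe fn compute_temp_raw(input: *const u32, output: *mut u32) {
    unsafe {
        let mut temp = *output;
        if *input > 10 {
            temp = 1;
        }
        if *input > 5 {
            temp *= 2;
        }
        // The only write through `output` happens after every read of `*input`.
        *output = temp;
    }
}

/// The same function with the read of `*input` hoisted into a local.
///
/// Safety: same requirements as `compute_temp_raw`.
unsafe fn compute_temp_raw_cached(input: *const u32, output: *mut u32) {
    unsafe {
        let cached_input = *input;
        let mut temp = *output;
        if cached_input > 10 {
            temp = 1;
        }
        if cached_input > 5 {
            temp *= 2;
        }
        *output = temp;
    }
}

/// Calls `f` with both pointers aimed at a single `u32` that starts at `start`.
fn run_aliased(f: unsafe fn(*const u32, *mut u32), start: u32) -> u32 {
    let mut x = start;
    let p: *mut u32 = &mut x;
    unsafe { f(p as *const u32, p) };
    x
}

fn main() {
    println!("re-reading: {}", run_aliased(compute_temp_raw, 20));
    println!("cached:     {}", run_aliased(compute_temp_raw_cached, 20));
}
```

Even with `input` and `output` pointing at the same `u32`, both variants leave the same value behind, because no write can land between the reads.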
+This is why the definition of "alias" that Rust will use likely involves some
+notion of liveness and mutation: we don't actually care if aliasing occurs if
+there aren't any actual writes to memory happening.
+
+Of course, a full aliasing model for Rust must also take into consideration things like
+function calls (which may mutate things we don't see), raw pointers (which have
+no aliasing requirements on their own), and UnsafeCell (which lets the referent
+of an `&` be mutated).
+
+
+
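The UnsafeCell point can be seen from safe code through `std::cell::Cell`, which the standard library builds on `UnsafeCell`. The sketch below is an illustration, not part of the commit: `compute_cell`, `aliased_result`, and `distinct_result` are hypothetical names. Because both parameters are shared references, passing the same cell twice is accepted by the borrow checker, and the write through `output` is visible through `input`.

```rust
use std::cell::Cell;

/// Hypothetical variant of `compute` that takes shared references to `Cell`s.
/// Shared references may alias, and `Cell` permits mutation through them.
fn compute_cell(input: &Cell<u32>, output: &Cell<u32>) {
    if input.get() > 10 {
        output.set(1);
    }
    if input.get() > 5 {
        output.set(output.get() * 2);
    }
}

/// Passes the same cell as both `input` and `output`.
fn aliased_result(start: u32) -> u32 {
    let x = Cell::new(start);
    compute_cell(&x, &x);
    x.get()
}

/// Passes two separate cells that both start at `start`.
fn distinct_result(start: u32) -> u32 {
    let input = Cell::new(start);
    let output = Cell::new(start);
    compute_cell(&input, &output);
    output.get()
}

fn main() {
    println!("aliased:  {}", aliased_result(20));
    println!("distinct: {}", distinct_result(20));
}
```

Here the compiler cannot cache `input.get()` across the `output.set(1)` call, which is why the aliased run ends at 1 while the distinct run ends at 2.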

src/references.md

+13 -159
@@ -1,12 +1,5 @@
 # References
 
-This section gives a high-level view of the memory model that *all* Rust
-programs must satisfy to be correct. Safe code is statically verified
-to obey this model by the borrow checker. Unsafe code may go above
-and beyond the borrow checker while still satisfying this model. The borrow
-checker may also be extended to allow more programs to compile, as long as
-this more fundamental model is satisfied.
-
 There are two kinds of reference:
 
 * Shared reference: `&`
@@ -17,161 +10,22 @@ Which obey the following rules:
 * A reference cannot outlive its referent
 * A mutable reference cannot be aliased
 
-That's it. That's the whole model. Of course, we should probably define
-what *aliased* means. To define aliasing, we must define the notion of
-*paths* and *liveness*.
-
-
-**NOTE: The model that follows is generally agreed to be dubious and have
-issues. It's ok-ish as an intuitive model, but fails to capture the desired
-semantics. We leave this here to be able to use notions introduced here in later
-sections. This will be significantly changed in the future. TODO: do that.**
-
-
-# Paths
-
-If all Rust had were values (no pointers), then every value would be uniquely
-owned by a variable or composite structure. From this we naturally derive a
-*tree* of ownership. The stack itself is the root of the tree, with every
-variable as its direct children. Each variable's direct children would be their
-fields (if any), and so on.
-
-From this view, every value in Rust has a unique *path* in the tree of
-ownership. Of particular interest are *ancestors* and *descendants*: if `x` owns
-`y`, then `x` is an ancestor of `y`, and `y` is a descendant of `x`. Note
-that this is an inclusive relationship: `x` is a descendant and ancestor of
-itself.
-
-We can then define references as simply *names* for paths. When you create a
-reference, you're declaring that an ownership path exists to this address
-of memory.
-
-Tragically, plenty of data doesn't reside on the stack, and we must also
-accommodate this. Globals and thread-locals are simple enough to model as
-residing at the bottom of the stack (though we must be careful with mutable
-globals). Data on the heap poses a different problem.
-
-If all Rust had on the heap was data uniquely owned by a pointer on the stack,
-then we could just treat such a pointer as a struct that owns the value on the
-heap. Box, Vec, String, and HashMap, are examples of types which uniquely
-own data on the heap.
-
-Unfortunately, data on the heap is not *always* uniquely owned. Rc for instance
-introduces a notion of *shared* ownership. Shared ownership of a value means
-there is no unique path to it. A value with no unique path limits what we can do
-with it.
-
-In general, only shared references can be created to non-unique paths. However
-mechanisms which ensure mutual exclusion may establish One True Owner
-temporarily, establishing a unique path to that value (and therefore all
-its children). If this is done, the value may be mutated. In particular, a
-mutable reference can be taken.
-
-The most common way to establish such a path is through *interior mutability*,
-in contrast to the *inherited mutability* that everything in Rust normally uses.
-Cell, RefCell, Mutex, and RWLock are all examples of interior mutability types.
-These types provide exclusive access through runtime restrictions.
-
-An interesting case of this effect is Rc itself: if an Rc has refcount 1,
-then it is safe to mutate or even move its internals. Note however that the
-refcount itself uses interior mutability.
-
-In order to correctly communicate to the type system that a variable or field of
-a struct can have interior mutability, it must be wrapped in an UnsafeCell. This
-does not in itself make it safe to perform interior mutability operations on
-that value. You still must yourself ensure that mutual exclusion is upheld.
-
-
-
+That's it. That's the whole model references follow.
 
-# Liveness
+Of course, we should probably define what *aliased* means.
 
-Note: Liveness is not the same thing as a *lifetime*, which will be explained
-in detail in the next section of this chapter.
+```text
+error[E0425]: cannot find value `aliased` in this scope
+ --> <rust.rs>:2:20
+  |
+2 |     println!("{}", aliased);
+  |                    ^^^^^^^ not found in this scope
 
-Roughly, a reference is *live* at some point in a program if it can be
-dereferenced. Shared references are always live unless they are literally
-unreachable (for instance, they reside in freed or leaked memory). Mutable
-references can be reachable but *not* live through the process of *reborrowing*.
-
-A mutable reference can be reborrowed to either a shared or mutable reference to
-one of its descendants. A reborrowed reference will only be live again once all
-reborrows derived from it expire. For instance, a mutable reference can be
-reborrowed to point to a field of its referent:
-
-```rust
-let x = &mut (1, 2);
-{
-    // reborrow x to a subfield
-    let y = &mut x.0;
-    // y is now live, but x isn't
-    *y = 3;
-}
-// y goes out of scope, so x is live again
-*x = (5, 7);
+error: aborting due to previous error
 ```
 
-It is also possible to reborrow into *multiple* mutable references, as long as
-they are *disjoint*: no reference is an ancestor of another. Rust
-explicitly enables this to be done with disjoint struct fields, because
-disjointness can be statically proven:
-
-```rust
-let x = &mut (1, 2);
-{
-    // reborrow x to two disjoint subfields
-    let y = &mut x.0;
-    let z = &mut x.1;
-
-    // y and z are now live, but x isn't
-    *y = 3;
-    *z = 4;
-}
-// y and z go out of scope, so x is live again
-*x = (5, 7);
-```
-
-However it's often the case that Rust isn't sufficiently smart to prove that
-multiple borrows are disjoint. *This does not mean it is fundamentally illegal
-to make such a borrow*, just that Rust isn't as smart as you want.
-
-To simplify things, we can model variables as a fake type of reference: *owned*
-references. Owned references have much the same semantics as mutable references:
-they can be re-borrowed in a mutable or shared manner, which makes them no
-longer live. Live owned references have the unique property that they can be
-moved out of (though mutable references *can* be swapped out of). This power is
-only given to *live* owned references because moving its referent would of
-course invalidate all outstanding references prematurely.
-
-As a local lint against inappropriate mutation, only variables that are marked
-as `mut` can be borrowed mutably.
-
-It is interesting to note that Box behaves exactly like an owned reference. It
-can be moved out of, and Rust understands it sufficiently to reason about its
-paths like a normal variable.
-
-
-
-
-# Aliasing
-
-With liveness and paths defined, we can now properly define *aliasing*:
-
-**A mutable reference is aliased if there exists another live reference to one
-of its ancestors or descendants.**
-
-(If you prefer, you may also say the two live references alias *each other*.
-This has no semantic consequences, but is probably a more useful notion when
-verifying the soundness of a construct.)
-
-That's it. Super simple right? Except for the fact that it took us two pages to
-define all of the terms in that definition. You know: Super. Simple.
-
-Actually it's a bit more complicated than that. In addition to references, Rust
-has *raw pointers*: `*const T` and `*mut T`. Raw pointers have no inherent
-ownership or aliasing semantics. As a result, Rust makes absolutely no effort to
-track that they are used correctly, and they are wildly unsafe.
+Unfortunately, Rust hasn't actually defined its aliasing model. 🙀
 
-**It is an open question to what degree raw pointers have alias semantics.
-However it is important for these definitions to be sound that the existence of
-a raw pointer does not imply some kind of live path.**
+While we wait for the Rust devs to specify the semantics of their language,
+let's use the next section to discuss what aliasing is in general, and why it
+matters.
