Commit e5276df

Merge pull request #2514 from RalfJung/union-initialization-and-drop
Union initialization and Drop
2 parents eecc3f5 + e9f3184 commit e5276df

2 files changed: +378 -0 lines changed

text/1444-union.md

+2
@@ -10,6 +10,8 @@ Provide native support for C-compatible unions, defined via a new "contextual
keyword" `union`, without breaking any existing code that uses `union` as an
identifier.

**Note:** This RFC has been partially superseded by `unions-and-drop`.

# Motivation
[motivation]: #motivation

+376
@@ -0,0 +1,376 @@
- Feature Name: `union_initialization_and_drop`
- Start Date: 2018-08-03
- RFC PR: [rust-lang/rfcs#2514](https://github.com/rust-lang/rfcs/pull/2514)
- Rust Issue: [rust-lang/rust#55149](https://github.com/rust-lang/rust/issues/55149)

# Summary
[summary]: #summary

Unions do not allow fields of types that require drop glue (the code that is
automatically run when a variable goes out of scope: recursively dropping the
variable and all its fields), but they may still `impl Drop` themselves. We
specify when one may move out of a union field and when the union's `drop` is
called. To avoid undesired implicit calls of drop, we also restrict the use of
`DerefMut` when unions are involved.

# Motivation
[motivation]: #motivation

Currently, it is unstable to have a non-`Copy` field in a union. The main
reason for this is that having fields which need drop glue raises some hard
questions about whether to call that drop glue when assigning a union field, and
how to make programming with such unions less of a time bomb (triggered by
accidentally dropping data one meant to just overwrite). Not much progress has
been made on stabilizing the unstable union features. This RFC proposes a route
forward that side-steps the time bomb: Do not allow fields with drop glue.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## Union Definition

When defining a union, it is a hard error to use a field type that requires drop glue.
Examples:
```rust
use std::cell::RefCell;
use std::mem::ManuallyDrop;

// Accepted
union Example1<T> {
    // `ManuallyDrop<T>` never has drop glue, even if `T` does.
    f1: ManuallyDrop<T>,
    // `RefCell<i32>` is a fully known type, and does not have drop glue.
    f2: RefCell<i32>,
}
union Example2<T: Copy> {
    // `Copy` types never have drop glue.
    f1: T,
}
trait Trait3 { type Assoc: Copy; }
union Example3<T: Trait3> {
    // `T::Assoc` is `Copy` and hence cannot have drop glue.
    f1: T::Assoc,
}

// Rejected
union Example4<T> {
    // `T` might have drop glue, and then `RefCell<T>` would as well.
    f1: RefCell<T>,
}
trait Trait5 { type Assoc; }
union Example5<T: Trait5> {
    // `T::Assoc` might have drop glue.
    f1: T::Assoc,
}
```
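You can roughly cross-check the concrete field types above against the standard library's `std::mem::needs_drop`, which reports whether a type has drop glue. This is only a sketch. It is not the check this RFC specifies, which also has to handle type parameters. The docs also allow `needs_drop` to be conservative and return `true` for a type that does not actually need dropping. The helper `drop_glue_report` is a hypothetical name.

```rust
use std::cell::RefCell;
use std::mem::{needs_drop, ManuallyDrop};

/// Hypothetical helper: for a few concrete field types from the examples
/// above, report whether values of that type have drop glue.
fn drop_glue_report() -> Vec<(&'static str, bool)> {
    vec![
        // Acceptable as union fields: no drop glue.
        ("ManuallyDrop<Vec<i32>>", needs_drop::<ManuallyDrop<Vec<i32>>>()),
        ("RefCell<i32>", needs_drop::<RefCell<i32>>()),
        // Would be rejected as union fields: drop glue present.
        ("RefCell<String>", needs_drop::<RefCell<String>>()),
        ("Vec<i32>", needs_drop::<Vec<i32>>()),
    ]
}
```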

Ruling out possibly dropping types may seem restrictive, but thanks to
`ManuallyDrop` it is not, in fact: If the compiler rejects a union definition,
you can always wrap field types in `ManuallyDrop` to obtain a working
definition. This means you have to manually take care of when to drop the data,
but that is already something to be concerned with when working on unions.
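Here is a minimal sketch of what the `ManuallyDrop` wrapping looks like in practice. The union `StrOrNum` and the function `string_then_number` are hypothetical names. The sketch assumes a compiler that accepts `ManuallyDrop` union fields, as current stable Rust does.

The `String` is wrapped, so the union has no drop glue. Writing a field is safe, and reading one is `unsafe`. Dropping the `String` is done by hand with `ManuallyDrop::drop`.

```rust
use std::mem::ManuallyDrop;

// Hypothetical union: the `String` is wrapped so no field has drop glue.
union StrOrNum {
    s: ManuallyDrop<String>,
    n: u64,
}

fn string_then_number() -> (usize, u64) {
    let mut u = StrOrNum { s: ManuallyDrop::new(String::from("hello")) };
    // Reading is unsafe: the compiler does not know which field is valid.
    let len = unsafe { u.s.len() };
    // Nothing drops the `String` automatically; we must do it ourselves.
    unsafe { ManuallyDrop::drop(&mut u.s) };
    // Writing a field is safe and never runs any drop glue.
    u.n = 7;
    let n = unsafe { u.n };
    (len, n)
}
```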
69+
70+
As a consequence, it is quite obvious that writing to a union field will never
71+
implicitly call `drop`. Such a write is hence always a safe operation. This
72+
removes a whole class of pitfalls related to `drop` being called in tricky
73+
unsafe code when you might not expect that to happen. (However, see below for
74+
some pitfalls that remain.)
75+
76+
Reading from a union field and creating a reference remain unsafe: We cannot
77+
guarantee that the field contains valid data.
78+
79+
## Union initialization and `Drop`
80+
81+
In two cases, the compiler cares about whether a (field of a) variable is
82+
initialized: When deciding whether a move from the field/variable is allowed
83+
(for cases where the type is not `Copy`), and when deciding whether or not the
84+
variable has to be dropped when it goes out of scope.
85+
86+
A union just does very simple initialization tracking: There is a single boolean
87+
state for the entire union and all of its fields. Nested inner fields are
88+
tracked just like they are for structs; however, when the union becomes
89+
(un)initialized, then all nested inner fields of all union fields are
90+
(un)initialized at once. So, (un)initializing a union field also
91+
(un)initializes its siblings. For example:
92+
93+
```rust
94+
// This code creates bad references and transmutes to `Vec` in incorrect ways.
95+
// This is just to demonstrate what the compiler would accept in terms of
96+
// tracking initialization.
97+
98+
struct S(i32); // not `Copy`, no drop glue
99+
union U { f1: ManuallyDrop<Vec<i32>>, f2: (S, S), f3: i32 }
100+
101+
let mut u: U;
102+
// Now `u` is not initialized: `&u`, `&u.f2` and `&u.f2.0` are all rejected.
103+
104+
// We can write into uninitialized inner fields:
105+
u.f2.1 = S(42);
106+
{ let _x = &u.f2.1; } // This field is initialized now.
107+
// But this does not change the initialization state of the union itself,
108+
// or any other (inner) field.
109+
110+
// We can initialize by assigning an entire field:
111+
u.f1 = ManuallyDrop::new(Vec::new());
112+
// Now *all (nested) fields* of `u` are initialized, including the siblings of `f1`:
113+
{ let _x = &u.f2; }
114+
{ let _x = &u.f2.0; }
115+
116+
// Equivalently, we can assign the entire union:
117+
u = U { f2: (S(42), S(23) };
118+
// Now `u` is still initialized.
119+
120+
// Copying does not change anything:
121+
let _x = u.f3;
122+
// Now `u` is still initialized.
123+
124+
// We can move out of an initialized union:
125+
let v = u.f1;
126+
// Now `f1` *and its siblings* are no longer initialized (they got "moved out of"):
127+
// `let _x = u.f2;` would hence get rejected, as would `&u.f1` and `foo(u)`.
128+
u.f1 = v;
129+
// Now `u` and all of its fields are initialized again ("moving back in").
130+
131+
// When we move out of an inner field, the other union fields become uninitialized
132+
// even if they are `Copy`.
133+
let s = u.f2.1;
134+
// Now `u.f1` and `u.f3` are no longer initialized. But `u.f2.0` is:
135+
let s = u.f2.0;
136+
```
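The example above follows the rules proposed here and is not meant to compile as written. It uses top-level statements and leaves out `unsafe`.

Below is a smaller sketch of just the "move out, then move back in" step, which we expect current stable rustc to accept. The names `U` and `move_out_and_back_in` are hypothetical. After `u.a` is moved out, any use of `u.a` or `u.b` would be rejected until a field is assigned again.

```rust
use std::mem::ManuallyDrop;

// Hypothetical union with one non-`Copy` field and one `Copy` field.
union U {
    a: ManuallyDrop<String>,
    b: u32,
}

fn move_out_and_back_in() -> String {
    let mut u = U { a: ManuallyDrop::new(String::from("moved")) };
    // Moving out of `u.a` leaves the whole union (including `u.b`) uninitialized.
    let v: ManuallyDrop<String> = unsafe { u.a };
    // Assigning a field "moves back in" and re-initializes the whole union.
    u.a = v;
    // `u` is usable again: move the `String` out for good and unwrap it.
    ManuallyDrop::into_inner(unsafe { u.a })
}
```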
137+
138+
If the union implements `Drop`, the same restrictions as for structs apply: It
139+
is not possible to initialize a field before initializing the entire variable,
140+
and it is not possible to move out of a field. For example:
141+
142+
```rust
143+
// This code creates bad references and transmutes to `Vec` in incorrect ways.
144+
// This is just to demonstrate what the compiler would accept in terms of
145+
// tracking initialization.
146+
147+
struct S(i32); // not `Copy`, no drop glue
148+
149+
union U { f1: ManuallyDrop<Vec<i32>>, f2: (S, S), f3: u32 }
150+
impl Drop for U {
151+
fn drop(&mut self) {
152+
println!("Goodbye!");
153+
}
154+
}
155+
156+
let mut u: U;
157+
// `u.f1 = ...;` gets rejected: Cannot initialize a union with `Drop` by assigning a field.
158+
u = U { f2: (S(42), S(1)) };
159+
// Now `u` is initialized.
160+
161+
// `let v = u.f1;` gets rejected: Cannot move out of union that implements `Drop`.
162+
let v_ref = &mut u.f1; // creating a reference is allowed
163+
let _x = u.f3; // copying out is allowed
164+
```
165+
166+
When a union implementing `Drop` goes out of scope, its destructor gets called if and only if the union is currently considered initialized:
167+
(Continuing the example from above.)
168+
169+
```rust
170+
{
171+
let u = U { f2: (S(0), S(1)) };
172+
// drop gets called
173+
}
174+
{
175+
let u = U { f1: ManuallyDrop::new(Vec::new()) };
176+
foo(u);
177+
// drop does NOT get called
178+
}
179+
```
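To observe the "dropped if and only if initialized" rule on a current compiler, here is a sketch built on a hypothetical union `Tracked`. Its `Drop` impl increments a thread-local counter, and the hypothetical function `consume` plays the role of `foo` above.

The union has only `Copy` fields, so its `Drop` impl never needs to know which field is active. We assume a compiler that accepts `impl Drop` for unions, as current stable Rust does.

```rust
use std::cell::Cell;

thread_local! {
    // Counts how many times `Tracked::drop` has run on the current thread.
    static DROPS: Cell<u32> = Cell::new(0);
}

fn drops() -> u32 {
    DROPS.with(|d| d.get())
}

// Hypothetical union with only `Copy` fields that still implements `Drop`.
union Tracked {
    n: u32,
    f: f32,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.with(|d| d.set(d.get() + 1));
    }
}

// Takes ownership, so the union is dropped when this function returns.
fn consume(_t: Tracked) {}

/// Returns (copied field, drops after the first scope, drops after the second scope).
fn demo() -> (u32, u32, u32) {
    let start = drops();
    let copied;
    {
        let u = Tracked { n: 42 };
        // Copying a field out is allowed even though `Tracked: Drop`.
        copied = unsafe { u.n };
    } // `u` is still initialized here, so `drop` runs once.
    let after_first = drops() - start;
    {
        let u = Tracked { f: 1.5 };
        // `drop` runs inside `consume`...
        consume(u);
    } // ...but not again here, because `u` was moved out.
    let after_second = drops() - start;
    (copied, after_first, after_second)
}
```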
180+
181+
## Potential pitfalls around `DerefMut`
182+
183+
There is still a potential pitfall left around assigning to union fields: If the
184+
assignment implicitly happens through a `DerefMut`, it may call drop glue. For
185+
example:
186+
187+
```rust
188+
#![feature(untagged_unions)]
189+
190+
use std::mem::ManuallyDrop;
191+
192+
union U<T> { x:(), f: ManuallyDrop<T> }
193+
194+
fn main() {
195+
let mut u : U<(Vec<i32>,)> = U { x: () };
196+
unsafe { u.f.0 = Vec::new() }; // uninitialized `Vec` being droped
197+
}
198+
```
199+
This requires `unsafe` because it desugars to `ManuallyDrop::deref_mut(&mut u.f).0`,
200+
and while writing to a union field is safe, taking a reference is not.
201+
202+
For this reason, `DerefMut` auto-deref is not applied when working on a union or
203+
its fields. However, note that manually dereferencing is still possible, so
204+
`*(u.f).0 = Vec::new()` is still a way to drop an uninitialized field! But this
205+
can never happen when no `*` is involved, and hopefully dereferencing an element
206+
of a union is a clear enough signal that the union better be initialized
207+
properly for this to make sense.
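Here is a sketch of how the example above can be written so that no uninitialized value is ever dropped. The function name `init_then_overwrite` is hypothetical. It first initializes the whole `ManuallyDrop` field with a plain assignment, which runs no drop glue, and only then writes through an explicit `*`.

That explicit deref is exactly the case the text warns about. It is sound here only because `u.f` was initialized just before. We assume a current stable compiler.

```rust
use std::mem::ManuallyDrop;

union U<T> {
    x: (),
    f: ManuallyDrop<T>,
}

fn init_then_overwrite() -> usize {
    let mut u: U<(Vec<i32>,)> = U { x: () };
    // Safe plain field assignment: initializes `u`, drops nothing.
    u.f = ManuallyDrop::new((vec![1, 2, 3],));
    // Explicit deref: this drops the old `Vec`, which is fine only because
    // `u.f` holds a valid value at this point.
    unsafe { (*u.f).0 = vec![4, 5] };
    let len = unsafe { (*u.f).0.len() };
    // Nothing drops `u.f` automatically, so release the `Vec` by hand.
    unsafe { ManuallyDrop::drop(&mut u.f) };
    len
}
```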

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## Union definition

When defining a union, it is a hard error to use a field type that requires drop glue.
This is checked as follows:

* Proceed recursively down the given type, insofar as the type involved is known
  at compile-time. For example, `u32`, `&mut T` and `ManuallyDrop<T>` are known
  to not have drop glue no matter the choice of `T`.
* When hitting a type variable `T` where no progress can be made, check that
  `T: Copy` as a proxy for `T` not requiring drop glue.
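The two rules above amount to a simple recursive check. The sketch below runs that check over a heavily simplified model of field types. `Ty` and `field_ok` are hypothetical names, and `Ty` is not rustc's type representation. The cases in the usage mirror `Example1` through `Example5` from the guide-level section.

```rust
// Hypothetical, simplified model of a union field type.
enum Ty {
    U32,
    // `&T` / `&mut T`: never has drop glue.
    Ref(Box<Ty>),
    // `ManuallyDrop<T>`: never has drop glue, whatever `T` is.
    ManuallyDrop(Box<Ty>),
    // `RefCell<T>`: has drop glue exactly when `T` does.
    RefCell(Box<Ty>),
    // `Vec<T>`: always has drop glue.
    Vec(Box<Ty>),
    // A type variable such as `T` or `T::Assoc`; records whether `T: Copy` is known.
    Param { copy_bound: bool },
}

/// Returns `true` if the field type is accepted, i.e. provably has no drop glue.
fn field_ok(ty: &Ty) -> bool {
    match ty {
        Ty::U32 | Ty::Ref(_) | Ty::ManuallyDrop(_) => true,
        // Known type constructor: recurse into its contents.
        Ty::RefCell(inner) => field_ok(inner),
        Ty::Vec(_) => false,
        // No further progress possible: fall back to `T: Copy` as a proxy.
        Ty::Param { copy_bound } => *copy_bound,
    }
}
```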

Note: Currently, union fields with drop glue are allowed on nightly with an
unstable feature. This RFC proposes to remove support for that entirely; code using
nightly might have to be changed.

## Writing to union fields

Writing to union fields is currently unsafe when the field has drop glue. This
check is no longer needed, because union fields will never have drop glue.
Moreover, writing to a nested field (e.g., `u.f1.x = 0;`) is currently unsafe as
well; this should also become a safe operation as long as the path (expanded,
i.e., after auto-derefs are inserted) consists *only of field projections, not
derefs*. Note that this is sound only because `ManuallyDrop`'s only field is
private (so, in fact, this is *not* sound inside the module that defines
`ManuallyDrop`).
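The safety condition for nested writes can be stated as a one-line predicate over the expanded path. In this sketch, `Proj` and `union_write_is_safe` are hypothetical names, and the path is assumed to already have its auto-derefs made explicit.

For example, `u.f1.x = 0;` expands to `[Field, Field]`. The guide-level `u.f.0 = Vec::new()` would expand to `[Field, Deref, Field]` if auto-deref were applied.

```rust
// Hypothetical model of one step of an (already auto-deref-expanded) place path.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Proj {
    // A field projection such as `.f1` or `.0`.
    Field,
    // A dereference, whether written as `*` or inserted by auto-deref.
    Deref,
}

/// Under the rule above, an assignment `u.p1.p2... = value;` into a union is
/// safe only if every projection after the local `u` is a field projection.
fn union_write_is_safe(path: &[Proj]) -> bool {
    path.iter().all(|&p| p == Proj::Field)
}
```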

## Union initialization tracking

A "fragment" is a place of the form `local_var.field.field.field`, without any
implicit derefs. A fragment can be either *initialized* or *uninitialized*.
This state is approximated statically: The type system will only allow accesses
to definitely initialized fragments. Drop elaboration needs to know the precise
state of a fragment, for which purpose it adds run-time drop flags as needed.

If a fragment has some uninitialized nested fragments, then it is still
uninitialized and accesses to this fragment as a whole are prevented. This
applies even if it also has a nested initialized fragment (in which case we speak
of a *partially initialized* fragment). If a fragment has only initialized
nested fragments, then it is initialized as a whole and can be accessed.

A fragment becomes initialized when it is assigned to, or created using an
initializer, or it is a union field and a sibling becomes initialized, or all
its nested fragments become initialized. A fragment becomes uninitialized when
it doesn't implement `Copy` and is moved out of, or it is a union field
(possibly `Copy`) and its sibling becomes uninitialized, or some of its nested
fragments become uninitialized.

In other words, union fields behave a lot like struct fields except that if one
field changes initialization state, the others follow suit. In particular, if
one union field becomes partially initialized (because one of its nested
fragments got uninitialized), all its siblings become *entirely* uninitialized,
including their nested fragments.
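The rules above can be made concrete with a small simulation. This is a hypothetical toy model: the `UnionState` type and its methods are invented for illustration and are not taken from rustc.

The model tracks one union-typed local. It stores the state of each innermost fragment and treats reads of `Copy` fields as no-ops. It ignores the extra restrictions for unions that implement `Drop`.

In the usage, the guide-level example with fields `f1`, `f2: (S, S)` and `f3` corresponds to `UnionState::new(&[1, 2, 1])`.

```rust
/// Hypothetical toy model of the tracking rules for one local `u` of union type.
/// `fields[i][j]` is the state of the innermost fragment `u.fi.j`; a field with
/// no nested fields is modelled as a single leaf.
struct UnionState {
    fields: Vec<Vec<bool>>,
}

impl UnionState {
    /// `shape[i]` is the number of leaf fragments of field `i`; all start uninitialized.
    fn new(shape: &[usize]) -> Self {
        UnionState { fields: shape.iter().map(|&n| vec![false; n]).collect() }
    }

    /// A fragment is initialized only if all of its nested fragments are.
    fn field_init(&self, i: usize) -> bool {
        self.fields[i].iter().all(|&b| b)
    }

    fn union_init(&self) -> bool {
        (0..self.fields.len()).all(|i| self.field_init(i))
    }

    fn set_all(&mut self, state: bool) {
        for leaf in self.fields.iter_mut().flatten() {
            *leaf = state;
        }
    }

    /// `u.fi = ...;`: the field and, with it, all of its siblings become initialized.
    fn assign_field(&mut self, _i: usize) {
        self.set_all(true);
    }

    /// `let x = u.fi;` for a non-`Copy` field: every fragment becomes uninitialized.
    fn move_field(&mut self, _i: usize) {
        self.set_all(false);
    }

    /// `u.fi.j = ...;`: the leaf becomes initialized; if that completes `u.fi`,
    /// the siblings follow suit.
    fn assign_leaf(&mut self, i: usize, j: usize) {
        self.fields[i][j] = true;
        if self.field_init(i) {
            self.set_all(true);
        }
    }

    /// `let x = u.fi.j;` for a non-`Copy` leaf: `u.fi` becomes partially
    /// initialized, so every sibling becomes entirely uninitialized.
    fn move_leaf(&mut self, i: usize, j: usize) {
        self.fields[i][j] = false;
        for (k, field) in self.fields.iter_mut().enumerate() {
            if k != i {
                field.iter_mut().for_each(|leaf| *leaf = false);
            }
        }
    }
}
```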

If a fragment is of a type which has an `impl Drop`, then its nested fragments
cannot be separately (un)initialized. Only the entire fragment can be
initialized by assignment, and the entire fragment can be uninitialized by
moving out.

NOTE: To my knowledge, this already mostly matches the current
implementation. The only exception is that the "fragment becomes initialized when
all its nested fragments become initialized" rule is currently implemented
for neither structs nor unions, so the compiler accepts less code than it
should. However, `impl Drop for Union` and non-`Copy` union fields are behind a
feature gate, so the effects of this on unions cannot currently be observed on
stable compilers.

(This closely follows a
[previously proposed RFC by @petrochenkov](https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#initialization-state).)

## Potential pitfalls around `DerefMut`

When adding auto-derefs on the left-hand side of an assignment, as we traverse
the path, once we hit a `union`, we stop adding further auto-derefs. So with
`s: Struct` and `u: Union`, when encountering `s.u.f.x`, auto-deref *does*
happen on `s`, but not on `s.u` or any of the later components.
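The stopping rule is a single pass over the prefixes of the assigned place. This is a minimal sketch, assuming the type of each prefix has already been determined. The function name `autoderef_allowed` and its boolean encoding are hypothetical.

For `s.u.f.x`, the prefixes `s`, `s.u` and `s.u.f` are encoded as `[false, true, false]`. The result is `[true, false, false]`, meaning auto-deref is allowed only on `s`.

```rust
/// Hypothetical sketch: `prefix_is_union[k]` says whether the k-th prefix of
/// the assigned place (`s`, `s.u`, `s.u.f`, ...) has a union type. Returns,
/// per prefix, whether an auto-deref may still be inserted at that point.
fn autoderef_allowed(prefix_is_union: &[bool]) -> Vec<bool> {
    let mut seen_union = false;
    prefix_is_union
        .iter()
        .map(|&is_union| {
            // Once a union has been hit, no further auto-derefs are added.
            seen_union |= is_union;
            !seen_union
        })
        .collect()
}
```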

Notice that this relies crucially on the only field of `ManuallyDrop` being
private! If we could project directly through that field, no `DerefMut` would
be needed to reproduce the problematic example from the "guide" section.

# Drawbacks
[drawbacks]: #drawbacks

This makes working with unions involving types that may have drop glue slightly
more verbose than today: One has to write `ManuallyDrop` more often than one may
want to.

The restriction placed on `DerefMut` is not fully backwards compatible: A type
could implement `Copy + DerefMut` and actually rely on the deref coercion inside
a union. That seems very unlikely, but should be tested with a crater run.

The initialization tracking rules are somewhat surprising, and one might prefer
the compiler to just not track anything when it comes to unions. After all, the
compiler fundamentally cannot know what part of the union is properly
initialized. Unfortunately, not having any initialization tracking is not an
option when non-`Copy` fields are involved: We have to decide if moving out of a
union field is allowed.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Ruling out fields with drop glue does not, in fact, reduce the expressiveness of
unions because one can use `ManuallyDrop<T>` to obtain a drop-glue-free version
of `T`. If anything, having the `ManuallyDrop` in the union definition should
help to drive home the point that no automatic dropping is happening, ever.
(Without this RFC, automatic dropping happens when assigning to a union
field but not when the union goes out of scope. That seems to be the result of
necessity, not of a coherent design.)

An alternative approach to proceed with unions has been
[previously proposed by @petrochenkov](https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#initialization-state).
That proposal replaces RFC 1444 and addresses many more points than this much more
limited proposal. In particular, it allows fields with drop glue. However, it
can be pretty hard for the programmer to predict whether drop glue will be
automatically invoked on assignment, because the initialization tracking
(which this RFC adapts from @petrochenkov's proposal) can sometimes be a little
surprising when looking at individual fields: Whether `u.f2 = ...;` drops
depends on whether `u.f1` has been previously initialized. Hence, we have
a lint to warn people that unions with drop-glue fields are not always
very well-behaved. This RFC, on the other hand, side-steps the entire question
by not allowing fields with drop glue. Initialization tracking thus has no
effect on the code executed during an assignment of a union field. For unions
that `impl Drop`, it still has an effect on what happens when the union goes out
of scope, but in that case initialization is so restricted that I cannot think
of any surprises. Together with the `DerefMut` restriction, that should make it
very unlikely to accidentally call `drop` when it was not intended.

We could significantly simplify the initialization tracking by always applying
the rules that are currently only applied to unions that `impl Drop`. However,
that does not actually help with the pitfall described above. The more complex
rules allow more code that many will reasonably expect to work, and do not seem
to introduce any additional pitfalls.

We could reduce the relevance of state tracking further by not allowing
`impl Drop for Union`. It would still be possible to wrap the union in a struct
that implements `Drop`, so this does not restrict expressiveness. However, this
seems unnecessarily cumbersome, and it does not seem to help avoid any
surprises. State tracking around unions that `impl Drop` is pretty much as
simple as it gets.
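To illustrate why a wrapper struct keeps the same expressiveness, here is a sketch with hypothetical names. The union `Raw` has no `impl Drop`. The wrapper `Wrapper` carries a flag recording which field is active, so its `Drop` impl knows what to clean up. An `Rc` is used only to make the cleanup observable through `Rc::strong_count`.

```rust
use std::mem::ManuallyDrop;
use std::rc::Rc;

// Hypothetical dropless union: no `impl Drop`, and no field has drop glue.
union Raw {
    rc: ManuallyDrop<Rc<()>>,
    n: u64,
}

// Hypothetical wrapper that knows which field is active and cleans up.
struct Wrapper {
    holds_rc: bool,
    raw: Raw,
}

impl Drop for Wrapper {
    fn drop(&mut self) {
        if self.holds_rc {
            // Sound only because `holds_rc` says `rc` is the active field.
            unsafe { ManuallyDrop::drop(&mut self.raw.rc) };
        }
    }
}

/// Returns the `Rc` strong count before and after dropping a `Wrapper` holding a clone.
fn counts() -> (usize, usize) {
    let shared = Rc::new(());
    // Holding the integer: the `Drop` impl leaves the union alone.
    drop(Wrapper { holds_rc: false, raw: Raw { n: 7 } });
    let w = Wrapper {
        holds_rc: true,
        raw: Raw { rc: ManuallyDrop::new(Rc::clone(&shared)) },
    };
    let before = Rc::strong_count(&shared);
    drop(w);
    (before, Rc::strong_count(&shared))
}
```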

# Prior art
[prior-art]: #prior-art

I do not know of any language combining initialization tracking and destructors
with unions: C++ [never runs destructors for fields of unions][cpp_union_drop],
and it does not track whether fields of a data structure are initialized to
(dis)allow references or moves.

[cpp_union_drop]: https://en.cppreference.com/w/cpp/language/union

# Unresolved questions
[unresolved-questions]: #unresolved-questions

Should we even try to avoid the `DerefMut`-related pitfall? And if yes, should
we maybe try harder, e.g. lint against using `*` below a union type when
describing a place? That would make people write
`let v = &mut u.f; *v = Vec::new();`. It is not clear that this helps in terms
of pointing out that an automatic drop may be happening.

We could allow moving out of a union field even if the union implements `Drop`. That
would have the effect of making the union considered uninitialized, i.e., it
would not be dropped implicitly when it goes out of scope. However, it might be
useful to not let people do this accidentally. The same effect can always be
achieved by having a dropless union wrapped in a newtype `struct` with the
desired `Drop`.
