| 1 | +- Feature Name: `union_initialization_and_drop` |
| 2 | +- Start Date: 2018-08-03 |
| 3 | +- RFC PR: [rust-lang/rfcs#2514](https://github.com/rust-lang/rfcs/pull/2514) |
| 4 | +- Rust Issue: [rust-lang/rust#55149](https://github.com/rust-lang/rust/issues/55149) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +Unions do not allow fields of types that require drop glue (the code that is |
| 10 | +automatically run when a variable goes out of scope: recursively dropping the |
| 11 | +variable and all its fields), but they may still `impl Drop` themselves. We |
| 12 | +specify when one may move out of a union field and when the union's `drop` is |
| 13 | +called. To avoid undesired implicit calls of drop, we also restrict the use of |
| 14 | +`DerefMut` when unions are involved. |
| 15 | + |
| 16 | +# Motivation |
| 17 | +[motivation]: #motivation |
| 18 | + |
| 19 | +Currently, it is unstable to have a non-`Copy` field in a union. The main |
| 20 | +reason for this is that having fields which need drop glue raises some hard |
| 21 | +questions about whether to call that drop glue when assigning a union field, and |
| 22 | +how to make programming with such unions less of a time bomb (triggered by |
| 23 | +accidentally dropping data one meant to just overwrite). Not much progress has |
| 24 | +been made on stabilizing the unstable union features. This RFC proposes a route |
| 25 | +forwards that side-steps the time bomb: Do not allow fields with drop glue. |
| 26 | + |
| 27 | +# Guide-level explanation |
| 28 | +[guide-level-explanation]: #guide-level-explanation |
| 29 | + |
| 30 | +## Union Definition |
| 31 | + |
| 32 | +When defining a union, it is a hard error to use a field type that requires drop glue. |
| 33 | +Examples: |
| 34 | +```rust |
| 35 | +// Accepted |
| 36 | +union Example1<T> { |
| 37 | + // `ManuallyDrop<T>` never has drop glue, even if `T` does. |
| 38 | + f1: ManuallyDrop<T>, |
| 39 | + // `RefCell<i32>` is a fully known type, and does not have drop glue. |
| 40 | + f2: RefCell<i32>, |
| 41 | +} |
| 42 | +union Example2<T: Copy> { |
| 43 | + // `Copy` types never have drop glue. |
| 44 | + f1: T, |
| 45 | +} |
| 46 | +trait Trait3 { type Assoc: Copy; } |
| 47 | +union Example3<T: Trait3> { |
| 48 | + // `T::Assoc` is `Copy` and hence cannot have drop glue. |
| 49 | + f1: T::Assoc, |
| 50 | +} |
| 51 | + |
| 52 | +// Rejected |
| 53 | +union Example4<T> { |
| 54 | + // `T` might have drop glue, and then `RefCell<T>` would as well. |
| 55 | + f1: RefCell<T>, |
| 56 | +} |
| 57 | +trait Trait5 { type Assoc; } |
| 58 | +union Example5<T: Trait5> { |
| 59 | + // `T::Assoc` might have drop glue. |
| 60 | + f1: T::Assoc, |
| 61 | +} |
| 62 | +``` |
| 63 | + |
| 64 | +Ruling out possibly dropping types may seem restrictive, but thanks to |
| 65 | +`ManuallyDrop` it in fact is not: If the compiler rejects a union definition, |
| 66 | +you can always wrap field types in `ManuallyDrop` to obtain a working |
| 67 | +definition. This means you have to manually take care of when to drop the data, |
| 68 | +but that is already something to be concerned with when working on unions. |
| 69 | + |
| 70 | +As a consequence, it is quite obvious that writing to a union field will never |
| 71 | +implicitly call `drop`. Such a write is hence always a safe operation. This |
| 72 | +removes a whole class of pitfalls related to `drop` being called in tricky |
| 73 | +unsafe code when you might not expect that to happen. (However, see below for |
| 74 | +some pitfalls that remain.) |
| 75 | + |
| 76 | +Reading from a union field and creating a reference remain unsafe: We cannot |
| 77 | +guarantee that the field contains valid data. |
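
A minimal sketch of the safe-write / unsafe-read split, assuming a union whose fields are both `Copy` (the names `IntOrFloat` and `bits_of_one` are hypothetical and chosen only for illustration):

```rust
// Both fields are `Copy`, so neither can have drop glue.
union IntOrFloat {
    i: u32,
    f: f32,
}

fn bits_of_one() -> u32 {
    let mut u = IntOrFloat { i: 0 };
    // Writing a field never runs drop glue, so no `unsafe` is needed.
    u.f = 1.0;
    // Reading reinterprets whatever bytes are stored, so it is `unsafe`:
    // the compiler cannot know which field currently holds valid data.
    unsafe { u.i }
}
```

Here the read is sound because every bit pattern is a valid `u32`; for most field types that guarantee does not hold, which is why the read stays unsafe.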
| 78 | + |
| 79 | +## Union initialization and `Drop` |
| 80 | + |
| 81 | +In two cases, the compiler cares about whether a (field of a) variable is |
| 82 | +initialized: When deciding whether a move from the field/variable is allowed |
| 83 | +(for cases where the type is not `Copy`), and when deciding whether or not the |
| 84 | +variable has to be dropped when it goes out of scope. |
| 85 | + |
| 86 | +A union just does very simple initialization tracking: There is a single boolean |
| 87 | +state for the entire union and all of its fields. Nested inner fields are |
| 88 | +tracked just like they are for structs; however, when the union becomes |
| 89 | +(un)initialized, then all nested inner fields of all union fields are |
| 90 | +(un)initialized at once. So, (un)initializing a union field also |
| 91 | +(un)initializes its siblings. For example: |
| 92 | + |
| 93 | +```rust |
| 94 | +// This code creates bad references and transmutes to `Vec` in incorrect ways. |
| 95 | +// This is just to demonstrate what the compiler would accept in terms of |
| 96 | +// tracking initialization. |
| 97 | + |
| 98 | +struct S(i32); // not `Copy`, no drop glue |
| 99 | +union U { f1: ManuallyDrop<Vec<i32>>, f2: (S, S), f3: i32 } |
| 100 | + |
| 101 | +let mut u: U; |
| 102 | +// Now `u` is not initialized: `&u`, `&u.f2` and `&u.f2.0` are all rejected. |
| 103 | + |
| 104 | +// We can write into uninitialized inner fields: |
| 105 | +u.f2.1 = S(42); |
| 106 | +{ let _x = &u.f2.1; } // This field is initialized now. |
| 107 | +// But this does not change the initialization state of the union itself, |
| 108 | +// or any other (inner) field. |
| 109 | + |
| 110 | +// We can initialize by assigning an entire field: |
| 111 | +u.f1 = ManuallyDrop::new(Vec::new()); |
| 112 | +// Now *all (nested) fields* of `u` are initialized, including the siblings of `f1`: |
| 113 | +{ let _x = &u.f2; } |
| 114 | +{ let _x = &u.f2.0; } |
| 115 | + |
| 116 | +// Equivalently, we can assign the entire union: |
| 117 | +u = U { f2: (S(42), S(23)) }; |
| 118 | +// Now `u` is still initialized. |
| 119 | + |
| 120 | +// Copying does not change anything: |
| 121 | +let _x = u.f3; |
| 122 | +// Now `u` is still initialized. |
| 123 | + |
| 124 | +// We can move out of an initialized union: |
| 125 | +let v = u.f1; |
| 126 | +// Now `f1` *and its siblings* are no longer initialized (they got "moved out of"): |
| 127 | +// `let _x = u.f2;` would hence get rejected, as would `&u.f1` and `foo(u)`. |
| 128 | +u.f1 = v; |
| 129 | +// Now `u` and all of its fields are initialized again ("moving back in"). |
| 130 | + |
| 131 | +// When we move out of an inner field, the other union fields become uninitialized |
| 132 | +// even if they are `Copy`. |
| 133 | +let s = u.f2.1; |
| 134 | +// Now `u.f1` and `u.f3` are no longer initialized. But `u.f2.0` is: |
| 135 | +let s = u.f2.0; |
| 136 | +``` |
| 137 | + |
| 138 | +If the union implements `Drop`, the same restrictions as for structs apply: It |
| 139 | +is not possible to initialize a field before initializing the entire variable, |
| 140 | +and it is not possible to move out of a field. For example: |
| 141 | + |
| 142 | +```rust |
| 143 | +// This code creates bad references and transmutes to `Vec` in incorrect ways. |
| 144 | +// This is just to demonstrate what the compiler would accept in terms of |
| 145 | +// tracking initialization. |
| 146 | + |
| 147 | +struct S(i32); // not `Copy`, no drop glue |
| 148 | + |
| 149 | +union U { f1: ManuallyDrop<Vec<i32>>, f2: (S, S), f3: u32 } |
| 150 | +impl Drop for U { |
| 151 | + fn drop(&mut self) { |
| 152 | + println!("Goodbye!"); |
| 153 | + } |
| 154 | +} |
| 155 | + |
| 156 | +let mut u: U; |
| 157 | +// `u.f1 = ...;` gets rejected: Cannot initialize a union with `Drop` by assigning a field. |
| 158 | +u = U { f2: (S(42), S(1)) }; |
| 159 | +// Now `u` is initialized. |
| 160 | + |
| 161 | +// `let v = u.f1;` gets rejected: Cannot move out of union that implements `Drop`. |
| 162 | +let v_ref = &mut u.f1; // creating a reference is allowed |
| 163 | +let _x = u.f3; // copying out is allowed |
| 164 | +``` |
| 165 | + |
| 166 | +When a union implementing `Drop` goes out of scope, its destructor gets called if and only if the union is currently considered initialized: |
| 167 | +(Continuing the example from above.) |
| 168 | + |
| 169 | +```rust |
| 170 | +{ |
| 171 | + let u = U { f2: (S(0), S(1)) }; |
| 172 | + // drop gets called |
| 173 | +} |
| 174 | +{ |
| 175 | + let u = U { f1: ManuallyDrop::new(Vec::new()) }; |
| 176 | + foo(u); |
| 177 | + // drop does NOT get called |
| 178 | +} |
| 179 | +``` |
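
The "dropped only if still initialized" rule can be observed with a drop counter. This is a sketch under assumptions: the counter `DROPS`, the helper `consume` (standing in for `foo`), and `drops_per_scope` are hypothetical names, and `consume` deliberately forgets its argument so that no destructor runs inside it either.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how often `U::drop` has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

union U {
    a: u32,
    b: f32,
}

impl Drop for U {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Takes ownership and forgets the value, so `drop` does not run here.
fn consume(u: U) {
    std::mem::forget(u);
}

// Returns the number of destructor calls caused by each of the two scopes.
fn drops_per_scope() -> (usize, usize) {
    let start = DROPS.load(Ordering::SeqCst);
    {
        let _u = U { a: 1 };
        // `_u` is still initialized at the end of the scope: `drop` runs.
    }
    let first = DROPS.load(Ordering::SeqCst) - start;
    {
        let u = U { b: 1.0 };
        consume(u);
        // `u` was moved out, so it is uninitialized here: no `drop`.
    }
    let second = DROPS.load(Ordering::SeqCst) - start - first;
    (first, second)
}
```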
| 180 | + |
| 181 | +## Potential pitfalls around `DerefMut` |
| 182 | + |
| 183 | +There is still a potential pitfall left around assigning to union fields: If the |
| 184 | +assignment implicitly happens through a `DerefMut`, it may call drop glue. For |
| 185 | +example: |
| 186 | + |
| 187 | +```rust |
| 188 | +#![feature(untagged_unions)] |
| 189 | + |
| 190 | +use std::mem::ManuallyDrop; |
| 191 | + |
| 192 | +union U<T> { x: (), f: ManuallyDrop<T> } |
| 193 | + |
| 194 | +fn main() { |
| 195 | + let mut u: U<(Vec<i32>,)> = U { x: () }; |
| 196 | + unsafe { u.f.0 = Vec::new() }; // uninitialized `Vec` being dropped |
| 197 | +} |
| 198 | +``` |
| 199 | +This requires `unsafe` because it desugars to `ManuallyDrop::deref_mut(&mut u.f).0`, |
| 200 | +and while writing to a union field is safe, taking a reference is not. |
| 201 | + |
| 202 | +For this reason, `DerefMut` auto-deref is not applied when working on a union or |
| 203 | +its fields. However, note that manually dereferencing is still possible, so |
| 204 | +`(*u.f).0 = Vec::new()` is still a way to drop an uninitialized field! But this |
| 205 | +can never happen when no `*` is involved, and hopefully dereferencing an element |
| 206 | +of a union is a clear enough signal that the union better be initialized |
| 207 | +properly for this to make sense. |
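
One way to stay clear of this pitfall is to initialize the field with a whole-field assignment first and use an explicit `*` only afterwards. The following is a sketch, not a prescribed pattern; `init_then_push` is a hypothetical name.

```rust
use std::mem::ManuallyDrop;

union U<T> {
    x: (),
    f: ManuallyDrop<T>,
}

fn init_then_push() -> usize {
    let mut u: U<(Vec<i32>,)> = U { x: () };
    // Whole-field assignment is a plain write: safe, and it never drops
    // whatever bytes were stored in the union before.
    u.f = ManuallyDrop::new((vec![1, 2, 3],));
    unsafe {
        // The explicit `*` goes through `DerefMut`. That is fine here
        // because `u.f` was initialized just above.
        (*u.f).0.push(4);
        let len = (*u.f).0.len();
        // The union never drops its fields, so release the `Vec` by hand.
        ManuallyDrop::drop(&mut u.f);
        len
    }
}
```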
| 208 | + |
| 209 | +# Reference-level explanation |
| 210 | +[reference-level-explanation]: #reference-level-explanation |
| 211 | + |
| 212 | +## Union definition |
| 213 | + |
| 214 | +When defining a union, it is a hard error to use a field type that requires drop glue. |
| 215 | +This is checked as follows: |
| 216 | + |
| 217 | +* Proceed recursively down the given type, insofar as the type involved is known |
| 218 | + at compile-time. For example, `u32`, `&mut T` and `ManuallyDrop<T>` are known |
| 219 | + to not have drop glue no matter the choice of `T`. |
| 220 | +* When hitting a type variable where no progress can be made, check that `T: |
| 221 | + Copy` as a proxy for `T` not requiring drop glue. |
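
For concrete types, the outcome of this check can be approximated with `std::mem::needs_drop`. This is only an illustration, not the compiler's actual check: `needs_drop` is documented as a hint that may conservatively return `true`, and `drop_glue_report` is a hypothetical helper.

```rust
use std::cell::RefCell;
use std::mem::{needs_drop, ManuallyDrop};

// Reports, for a few fully known types, whether dropping a value matters.
fn drop_glue_report() -> [bool; 5] {
    [
        needs_drop::<u32>(),                    // plain integer
        needs_drop::<&mut String>(),            // references never own their target
        needs_drop::<ManuallyDrop<Vec<i32>>>(), // wrapper suppresses the glue
        needs_drop::<RefCell<i32>>(),           // fully known, nothing to drop
        needs_drop::<RefCell<Vec<i32>>>(),      // the inner `Vec` must be freed
    ]
}
```

With current compilers only the last entry is expected to be `true`, which corresponds to the one type in this list that would be rejected as a union field.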
| 222 | + |
| 223 | +Note: Currently, union fields with drop glue are allowed on nightly with an |
| 224 | +unstable feature. This RFC proposes to remove support for that entirely; code using |
| 225 | +nightly might have to be changed. |
| 226 | + |
| 227 | +## Writing to union fields |
| 228 | + |
| 229 | +Writing to union fields is currently unsafe when the field has drop glue. This |
| 230 | +check is no longer needed, because union fields will never have drop glue. |
| 231 | +Moreover, writing to a nested field (e.g., `u.f1.x = 0;`) is currently unsafe as |
| 232 | +well. This should also become a safe operation as long as the path (expanded, |
| 233 | +i.e., after auto-derefs are inserted) consists *only of field projections, not |
| 234 | +derefs*. Note that this is sound only because `ManuallyDrop`'s only field is |
| 235 | +private (so, in fact, this is *not* sound inside the module that defines |
| 236 | +`ManuallyDrop`). |
| 237 | + |
| 238 | +## Union initialization tracking |
| 239 | + |
| 240 | +A "fragment" is a place of the form `local_var.field.field.field`, without any |
| 241 | +implicit derefs. A fragment can be either *initialized* or *uninitialized*. |
| 242 | +This state is approximated statically: The type system will only allow accesses |
| 243 | +to definitely initialized fragments. Drop elaboration needs to know the precise |
| 244 | +state of a fragment, for which purpose it adds run-time drop flags as needed. |
| 245 | + |
| 246 | +If a fragment has some uninitialized nested fragments then it is still |
| 247 | +uninitialized and accesses to this fragment as a whole are prevented. This |
| 248 | +applies even if it also has a nested initialized fragment (in which case we speak |
| 249 | +of a *partially initialized* fragment). If a fragment has only initialized |
| 250 | +nested fragments then it is initialized as a whole and can be accessed. |
| 251 | + |
| 252 | +A fragment becomes initialized when it is assigned to, or created using an |
| 253 | +initializer, or it is a union field and a sibling becomes initialized, or all |
| 254 | +its nested fragments become initialized. A fragment becomes uninitialized when |
| 255 | +it doesn't implement `Copy` and is moved out from, or it is a union field |
| 256 | +(possibly `Copy`) and its sibling becomes uninitialized, or one of its nested |
| 257 | +fragments becomes uninitialized. |
| 258 | + |
| 259 | +In other words, union fields behave a lot like struct fields except that if one |
| 260 | +field changes initialization state, the others follow suit. In particular, if |
| 261 | +one union field becomes partially initialized (because one of its nested |
| 262 | +fragments got uninitialized), all its siblings become *entirely* uninitialized, |
| 263 | +including their nested fragments. |
| 264 | + |
| 265 | +If a fragment is of a type which has an `impl Drop`, then its nested fragments |
| 266 | +cannot be separately (un)initialized. Only the entire fragment can be |
| 267 | +initialized by assignment, and the entire fragment can be uninitialized by |
| 268 | +moving out. |
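
For structs, the per-fragment tracking described above is already observable today. The sketch below shows a fragment being moved out and moved back in; `Pair`, `sum` and `move_out_and_back` are hypothetical names. Union fields follow the same pattern, except that siblings change state together.

```rust
struct S(i32); // not `Copy`, no drop glue

struct Pair {
    a: S,
    b: S,
}

fn sum(p: Pair) -> i32 {
    p.a.0 + p.b.0
}

fn move_out_and_back() -> i32 {
    let mut p = Pair { a: S(1), b: S(2) };
    // Moving out of `p.a` leaves `p` only partially initialized.
    let a = p.a;
    // `sum(p)` would be rejected at this point, but the still-initialized
    // fragment `p.b` can be accessed.
    let b_val = p.b.0;
    // Assigning the fragment again ("moving back in") makes all of `p`
    // initialized, so it can be used as a whole once more.
    p.a = S(a.0 + 10);
    sum(p) + b_val
}
```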
| 269 | + |
| 270 | +NOTE: To my knowledge, this already mostly matches the current |
| 271 | +implementation. The only exception is that the "fragment becomes initialized when |
| 272 | +all its nested fragments become initialized" rule is not currently implemented |
| 273 | +for either structs or unions, so the compiler accepts less code than it |
| 274 | +should. However, `impl Drop for Union` and non-`Copy` union fields are behind a |
| 275 | +feature gate, so the effects of this on unions cannot currently be observed on |
| 276 | +stable compilers. |
| 277 | + |
| 278 | +(This closely follows a |
| 279 | +[previously proposed RFC by @petrochenkov](https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#initialization-state).) |
| 280 | + |
| 281 | +## Potential pitfalls around `DerefMut` |
| 282 | + |
| 283 | +When adding auto-derefs on the left-hand side of an assignment, as we traverse |
| 284 | +the path, once we hit a `union`, we stop adding further auto-derefs. So with |
| 285 | +`s: Struct` and `u: Union`, when encountering `s.u.f.x`, auto-deref *does* |
| 286 | +happen on `s`, but not on `s.u` or any of the later components. |
| 287 | + |
| 288 | +Notice that this relies crucially on the only field of `ManuallyDrop` being |
| 289 | +private! If we could project directly through that field, no `DerefMut` would |
| 290 | +be needed to reproduce the problematic example from the "guide" section. |
| 291 | + |
| 292 | +# Drawbacks |
| 293 | +[drawbacks]: #drawbacks |
| 294 | + |
| 295 | +This makes working with unions involving types that may have drop glue slightly |
| 296 | +more verbose than today: One has to write `ManuallyDrop` more often than one may |
| 297 | +want to. |
| 298 | + |
| 299 | +The restriction placed on `DerefMut` is not fully backwards compatible: A type |
| 300 | +could implement `Copy + DerefMut` and actually rely on the deref coercion inside |
| 301 | +a union. That seems very unlikely, but should be tested with a crater run. |
| 302 | + |
| 303 | +The initialization tracking rules are somewhat surprising, and one might prefer |
| 304 | +the compiler to just not track anything when it comes to unions. After all, the |
| 305 | +compiler fundamentally cannot know what part of the union is properly |
| 306 | +initialized. Unfortunately, not having any initialization tracking is not an |
| 307 | +option when non-`Copy` fields are involved: We have to decide if moving out of a |
| 308 | +union field is allowed. |
| 309 | + |
| 310 | +# Rationale and alternatives |
| 311 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 312 | + |
| 313 | +Ruling out fields with drop glue does not, in fact, reduce the expressiveness of |
| 314 | +unions because one can use `ManuallyDrop<T>` to obtain a drop-glue-free version |
| 315 | +of `T`. If anything, having the `ManuallyDrop` in the union definition should |
| 316 | +help to drive home the point that no automatic dropping is happening, ever. |
| 317 | +(Before this RFC, automatic dropping is happening when assigning to a union |
| 318 | +field but not when the union goes out of scope. That seems to be the result of |
| 319 | +necessity, not of a coherent design.) |
| 320 | + |
| 321 | +An alternative approach to proceed with unions has been |
| 322 | +[previously proposed by @petrochenkov](https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#initialization-state). |
| 323 | +That proposal replaces RFC 1444 and goes into a lot more points than this much more |
| 324 | +limited proposal. In particular, it allows fields with drop glue. However, it |
| 325 | +can be pretty hard for the programmer to predict when drop glue will be |
| 326 | +automatically invoked on assignment or not, because the initialization tracking |
| 327 | +(which this RFC adapts from @petrochenkov's proposal) can sometimes be a little |
| 328 | +surprising when looking at individual fields: Whether `u.f2 = ...;` drops |
| 329 | +depends on whether `u.f1` has been previously initialized. We hence |
| 330 | +have a lint to warn people that unions with drop-glue fields are not always |
| 331 | +very well-behaved. This RFC, on the other hand, side-steps the entire question |
| 332 | +by not allowing fields with drop glue. Initialization tracking thus has no |
| 333 | +effect on the code executed during an assignment of a union field. For unions |
| 334 | +that `impl Drop`, it still has an effect on what happens when the union goes out |
| 335 | +of scope, but in that case initialization is so restricted that I cannot think |
| 336 | +of any surprises. Together with the `DerefMut` restriction, that should make it |
| 337 | +very unlikely to accidentally call `drop` when it was not intended. |
| 338 | + |
| 339 | +We could significantly simplify the initialization tracking by always applying |
| 340 | +the rules that are currently only applied to unions that `impl Drop`. However, |
| 341 | +that does not actually help with the pitfall described above. The more complex |
| 342 | +rules allow more code that many will reasonably expect to work, and do not seem |
| 343 | +to introduce any additional pitfalls. |
| 344 | + |
| 345 | +We could reduce the relevance of state tracking further by not allowing `impl |
| 346 | +Drop for Union`. It is still possible to add a wrapper struct around the union |
| 347 | +which has drop glue, so this does not restrict expressiveness. However, this |
| 348 | +seems unnecessarily cumbersome, and it does not seem to help avoid any |
| 349 | +surprises. State tracking around unions that `impl Drop` is pretty much as |
| 350 | +simple as it gets. |
| 351 | + |
| 352 | +# Prior art |
| 353 | +[prior-art]: #prior-art |
| 354 | + |
| 355 | +I do not know of any language combining initialization tracking and destructors |
| 356 | +with unions: C++ [never runs destructors for fields of unions][cpp_union_drop], |
| 357 | +and it does not track whether fields of a data structure are initialized to |
| 358 | +(dis)allow references or moves. |
| 359 | + |
| 360 | +[cpp_union_drop]: https://en.cppreference.com/w/cpp/language/union |
| 361 | + |
| 362 | +# Unresolved questions |
| 363 | +[unresolved-questions]: #unresolved-questions |
| 364 | + |
| 365 | +Should we even try to avoid the `DerefMut`-related pitfall? And if yes, should |
| 366 | +we maybe try harder, e.g. lint against using `*` below a union type when |
| 367 | +describing a place? That would make people write `let v = &mut u.f; *v = |
| 368 | +Vec::new();`. It is not clear that this helps in terms of pointing out that an |
| 369 | +automatic drop may be happening. |
| 370 | + |
| 371 | +We could allow moving out of a union field even if the union implements `Drop`. That |
| 372 | +would have the effect of making the union considered uninitialized, i.e., it |
| 373 | +would not be dropped implicitly when it goes out of scope. However, it might be |
| 374 | +useful to not let people do this accidentally. The same effect can always be |
| 375 | +achieved by having a dropless union wrapped in a newtype `struct` with the |
| 376 | +desired `Drop`. |