
Commit 9b61841

Merge pull request #166 from CAD97/patch-1
Note design constraints on hypothetical `DynSized`
# Exotically sized types (`DynSized` and `extern type`)

## Overview

In current Rust, there are two kinds of types with respect to sizing:
if a type is `Sized`, its layout (size and alignment) is known statically,
and if a type is `?Sized`, its layout may not be known until runtime (e.g. via a vtable).
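The stable `std::mem` functions illustrate the difference. A `Sized` type's layout is a compile-time constant. For `?Sized` slices and trait objects, `size_of_val`/`align_of_val` use the length or vtable carried in the wide pointer at runtime. The helper `layout_of` is only an illustrative name.

```rust
use std::fmt::Debug;
use std::mem::{align_of, align_of_val, size_of, size_of_val};

/// Size and alignment of a value as reported by `size_of_val`/`align_of_val`.
fn layout_of<T: ?Sized>(value: &T) -> (usize, usize) {
    (size_of_val(value), align_of_val(value))
}

fn main() {
    // `Sized`: the layout is known statically.
    assert_eq!((size_of::<u64>(), align_of::<u64>()), layout_of(&0u64));

    // `?Sized` slice: the size is computed from the length in the wide pointer.
    let data = [1u16, 2, 3];
    let slice: &[u16] = &data;
    assert_eq!(layout_of(slice), (3 * size_of::<u16>(), align_of::<u16>()));

    // `?Sized` trait object: size and alignment are read from the vtable.
    let object: &dyn Debug = &7u32;
    assert_eq!(layout_of(object), (size_of::<u32>(), align_of::<u32>()));

    // Both references are wide: a data pointer plus metadata.
    assert_eq!(size_of::<&[u16]>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&dyn Debug>(), 2 * size_of::<usize>());
}
```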

However, more exotically sized types exist; the most common example is the opaque `extern type`.
An `extern type` has a layout that is *unknown* to Rust, and as such can only be used behind a pointer type.
Since the most unsized a type can currently be is `?Sized`, though,
the compiler has to make up a size and alignment to return from `mem::size_of_val`/`align_of_val`.
Currently the compiler returns a size of 0 and an alignment of 1.
Lying in this fashion is considered undesirable \[2].

Additionally, some C libraries expose an opaque (incomplete) type in their headers,
but also provide a function returning the size of that type and expect the caller to allocate space for it.
This allows the library to change the size of the type
while still letting the caller control allocation (e.g. using a custom arena allocator).
When bridged to Rust, these types should ideally have access to a dynamic size/align.
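Here is a minimal sketch of that pattern. The "library" is mocked in Rust so the block is self-contained: `WidgetImpl`, `widget_size`, `widget_align`, `widget_init`, and `widget_id` are hypothetical stand-ins for a real C API. The caller learns the layout only at runtime and builds a `std::alloc::Layout` from it. The global allocator stands in for a custom arena.

```rust
use std::alloc::{alloc, dealloc, Layout};

// ---- Hypothetical library side (normally `extern "C"` functions). ----
// The caller never sees this definition, only the reported size/alignment.
#[allow(dead_code)]
#[repr(C)]
struct WidgetImpl {
    id: u64,
    flags: u32,
}

fn widget_size() -> usize {
    std::mem::size_of::<WidgetImpl>()
}

fn widget_align() -> usize {
    std::mem::align_of::<WidgetImpl>()
}

/// Initializes a widget in caller-provided storage.
/// Safety: `storage` must be valid for `widget_size()` bytes at `widget_align()`.
unsafe fn widget_init(storage: *mut u8, id: u64) {
    unsafe { storage.cast::<WidgetImpl>().write(WidgetImpl { id, flags: 0 }) }
}

/// Safety: `storage` must hold a widget initialized by `widget_init`.
unsafe fn widget_id(storage: *const u8) -> u64 {
    unsafe { (*storage.cast::<WidgetImpl>()).id }
}

// ---- Caller side: allocate using only the layout reported at runtime. ----
fn with_widget(id: u64) -> u64 {
    let layout = Layout::from_size_align(widget_size(), widget_align())
        .expect("library reported an invalid layout");
    unsafe {
        // Stand-in for a custom arena: any allocator honouring `layout` works.
        let storage = alloc(layout);
        assert!(!storage.is_null(), "allocation failed");
        widget_init(storage, id);
        let result = widget_id(storage);
        dealloc(storage, layout);
        result
    }
}

fn main() {
    assert_eq!(with_widget(42), 42);
}
```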

## Proposed Solution

The most obvious (and independently reinvented) solution is a "`DynSized`" trait that provides dynamic size/align information.
`extern type` would not implement `DynSized`, and generic code could opt into supporting such types with a `?DynSized` bound.

At the time of writing, there is weak approval from T-lang to proceed with an internal-only version of `DynSized`,
which would be used to prohibit the use of `extern type` as a standard `<T: ?Sized>` generic argument \[2].

This design document is about the restrictions on what `T: ?Sized + DynSized` actually needs to imply.

## Design Constraints

### `Arc` and `Weak`

`Arc` supports "zombie" references, where all strong `Arc` handles and the pointee have been dropped,
but `Weak` handles still exist, and so the allocation still exists.
This means that `Weak` needs to be able to determine the layout of the allocation from a dropped pointee,
as the `T` is dropped with the last `Arc` but the allocation is freed with the last `Weak`.
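The stable `std::sync::{Arc, Weak}` API makes the zombie state observable. In this sketch, the pointee's destructor runs when the last `Arc` is dropped. The `Weak` still refers to a live allocation, which is only freed when that `Weak` is dropped. `Pointee`, `DROPPED`, and `zombie_demo` are illustrative names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Weak};

static DROPPED: AtomicBool = AtomicBool::new(false);

struct Pointee;

impl Drop for Pointee {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

/// Returns (pointee dropped?, strong count seen by the `Weak`, upgrade failed?).
fn zombie_demo() -> (bool, usize, bool) {
    let strong = Arc::new(Pointee);
    let weak: Weak<Pointee> = Arc::downgrade(&strong);

    // Dropping the last `Arc` runs `Pointee::drop` immediately...
    drop(strong);
    let dropped = DROPPED.load(Ordering::SeqCst);

    // ...but `weak` still points into the live allocation (the refcounts),
    // which is only deallocated when `weak` itself is dropped below.
    let result = (dropped, weak.strong_count(), weak.upgrade().is_none());
    drop(weak); // last `Weak`: the allocation is freed here
    result
}

fn main() {
    assert_eq!(zombie_demo(), (true, 0, true));
}
```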

In addition, `Weak` handles point to the *reference count* part of the `ArcInner` allocation,
and thus need to *statically* know the alignment of the pointee type to determine the pointee's offset
(they cannot call `align_of_val_raw` without first knowing that offset).
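This small sketch uses `std::alloc::Layout::extend` to show why the offset depends on the pointee's alignment. `Header` is a hypothetical stand-in for the two reference counts at the start of the allocation. `std`'s real `ArcInner` is private, so this mirrors its shape rather than its exact definition.

```rust
use std::alloc::Layout;
use std::sync::atomic::AtomicUsize;

/// Hypothetical stand-in for the strong and weak counts.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    strong: AtomicUsize,
    weak: AtomicUsize,
}

/// Offset of the pointee from the start of the allocation,
/// computed from the pointee's size and alignment.
fn pointee_offset(size: usize, align: usize) -> usize {
    let pointee = Layout::from_size_align(size, align).expect("invalid layout");
    let (_whole, offset) = Layout::new::<Header>()
        .extend(pointee)
        .expect("layout overflow");
    offset
}

fn main() {
    let header = std::mem::size_of::<Header>();
    // Small alignment: the pointee directly follows the header.
    assert_eq!(pointee_offset(1, 1), header);
    // Large alignment: padding is inserted after the header, so a `Weak`
    // holding only the allocation's address must know `align` to find the pointee.
    assert_eq!(pointee_offset(4, 128), 128);
    println!("header is {header} bytes; pointee offset varies with alignment");
}
```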

There are four potential resolutions that handle both size and alignment uniformly:

- Store layout information in the `ArcInner` header, or
- Store layout information in `T`'s space after it's been dropped, or
- Require that layout be determined solely from pointee metadata, or
- Require that layout be determinable from a dropped pointee.[^why]

[^why]: This is trivially the case if determining the layout does not read the pointee (i.e. it is derivable from just the potentially wide pointer);
alternatively, the pointee could ensure that layout information (e.g. a vtable pointer) remains valid to read even after it's been dropped.

Dealing with alignment can be simplified by changing `Arc<T>` from storing `*mut ArcInner<T>` to
storing `*mut T`, with the refcount metadata stored at a fixed negative offset independent of `T`.
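This rough sketch of that alternative assumes a hypothetical `Header` of refcounts placed immediately before the value. The handle stores `*mut T`, and `header_of` always finds the header `size_of::<Header>()` bytes earlier, whatever `T` is. Freeing still needs `T`'s layout (known statically here), which is exactly what the constraints above are about. `Header`, `combined_layout`, `alloc_with_header`, `header_of`, and `free_with_header` are illustrative names, not `std` APIs.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::size_of;
use std::ptr;

/// Hypothetical refcount header, stored immediately before the value.
#[repr(C)]
struct Header {
    strong: usize,
    weak: usize,
}

/// Layout of the whole allocation and the offset of the value inside it.
/// The value goes at the first offset >= size_of::<Header>() that suits its
/// alignment, so the header can sit directly in front of it.
fn combined_layout<T>() -> (Layout, usize) {
    let value = Layout::new::<T>();
    let align = value.align().max(Layout::new::<Header>().align());
    let offset = size_of::<Header>().next_multiple_of(value.align());
    let layout = Layout::from_size_align(offset + value.size(), align).unwrap();
    (layout, offset)
}

/// Finds the header at a fixed negative offset. Only the address is used, so
/// this would also work for `?Sized` pointees.
/// Safety: `data` must come from `alloc_with_header`.
unsafe fn header_of<T: ?Sized>(data: *mut T) -> *mut Header {
    unsafe { data.cast::<u8>().sub(size_of::<Header>()).cast::<Header>() }
}

/// Allocates `value` behind a header and returns a pointer to the value itself.
fn alloc_with_header<T>(value: T) -> *mut T {
    let (layout, offset) = combined_layout::<T>();
    unsafe {
        let base = alloc(layout);
        assert!(!base.is_null(), "allocation failed");
        let data = base.add(offset).cast::<T>();
        data.write(value);
        header_of(data).write(Header { strong: 1, weak: 1 });
        data
    }
}

/// Drops the value and frees the allocation; this still needs `T`'s layout.
/// Safety: `data` must come from `alloc_with_header::<T>` and not be freed yet.
unsafe fn free_with_header<T>(data: *mut T) {
    let (layout, offset) = combined_layout::<T>();
    unsafe {
        ptr::drop_in_place(data);
        dealloc(data.cast::<u8>().sub(offset), layout);
    }
}

fn main() {
    #[repr(align(32))]
    struct Aligned([u8; 32]);

    let small = alloc_with_header(7u8);
    let big = alloc_with_header(Aligned([1; 32]));
    unsafe {
        assert_eq!((*header_of(small)).strong, 1);
        assert_eq!((*header_of(big)).weak, 1);
        assert_eq!(big as usize % 32, 0); // the value is correctly aligned
        assert_eq!((*big).0[0], 1);
        free_with_header(small);
        free_with_header(big);
    }
}
```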

T-lang commented on this in \[3] (w.r.t. const `Weak<T>::[into|from]_raw` and `Weak::new`):

> Consensus from meeting:
> - We approve the option to make `align_of_val_raw` require a once-valid-but-dropped value, in order to better support thin objects
> - we believe the sentinel design \[of `Weak::new`] means that `align_of_val_raw` is only ever invoked on once-valid-but-dropped values
> - We do not want `align_of_val_raw` to be forced to work for metadata + thin pointer
> - Implement `Weak::from_raw` to check for sentinel and take some special action if it is observed
> - potential cost: for unsized types (only), there is an extra branch (but if custom dst doesn’t require \[dynamic] alignment, we can change this later)
> - It is not really lang team’s call, but we are -1 on adding more fields to `Rc`/`Arc`
> - For custom dst, the design will have to accommodate getting the size and alignment from “once-valid-but-dropped” values (values that were once valid but have been dropped); this is a non-issue for known use cases like c-string and thin-objects (which store a vtable)
> - (but could be relevant for dynamically allocated vtables)
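The `Weak::new` sentinel discussed in these notes is visible through stable `std` APIs. A `Weak` created by `Weak::new` has no allocation behind it, so it can never be upgraded, and `Weak::into_raw`/`Weak::from_raw` still have to round-trip it. The check below uses `round_trip`, an illustrative helper. How the sentinel is represented internally is an implementation detail of `std`.

```rust
use std::sync::{Arc, Weak};

/// Round-trips a `Weak` through a raw pointer and reports whether it can be upgraded.
fn round_trip<T>(weak: Weak<T>) -> bool {
    let raw: *const T = Weak::into_raw(weak);
    // SAFETY: `raw` came from `Weak::into_raw`, and its weak reference is reclaimed exactly once.
    let weak = unsafe { Weak::from_raw(raw) };
    weak.upgrade().is_some()
}

fn main() {
    // `Weak::new` allocates nothing, so it can never be upgraded.
    let sentinel: Weak<u64> = Weak::new();
    assert!(!round_trip(sentinel));

    // A `Weak` from a live `Arc` still upgrades after the round trip.
    let strong = Arc::new(5u64);
    assert!(round_trip(Arc::downgrade(&strong)));
}
```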

### `Mutex` (and more generally, `UnsafeCell`)

The problem here is the combination of `&Mutex<T>` and `&mut T` both being usable concurrently,
plus the following presumably sound function:

```rust
/// Reads and writes back every byte of `*it`, leaving its value unchanged.
fn noop_write<T: ?Sized>(it: &mut T) {
    let len = std::mem::size_of_val(it);
    let ptr = it as *mut T as *mut u8;
    // Copying a region onto itself is allowed: `ptr::copy` permits overlap.
    unsafe { std::ptr::copy(ptr, ptr, len); }
}
```

To make the conflict abundantly clear, consider the following:

```rust
// `ThinCStr` is a thin-pointer, nul-terminated string type (hypothetical today);
// `join` runs both closures, potentially in parallel (e.g. `rayon::join`).
let mutex: &Mutex<ThinCStr> = /* elided */;

join(
    || {
        let mut lock = mutex.lock();
        let it: &mut ThinCStr = &mut *lock;
        noop_write(it);
    },
    || {
        std::mem::size_of_val(mutex);
    },
);
```

In order to determine the size of `Mutex<ThinCStr>`, you have to know the size of `ThinCStr`, which is stored inline in the `Mutex`.
To determine the size of `ThinCStr`, you have to read every byte to find the terminating nul byte (equivalently, call `strlen`).
However, in the other fork, we lock the mutex and use the `&mut ThinCStr` to read and write back every byte of the `ThinCStr`.
Because the `&mut` side of the operation is surely nonatomic (and `strlen` likely isn't atomic either), this is a data race, and thus UB.
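This minimal sketch shows what computing the size of a hypothetical `ThinCStr` involves. With only a thin pointer and no metadata, the size can only be found by scanning the pointee for the terminating nul, just like `strlen`. Each of those reads is exactly the kind of access that races with `noop_write` in the other closure. `thin_cstr_size` is an illustrative name. It works on a raw `*const u8` because `ThinCStr` does not exist in Rust today.

```rust
/// Size in bytes (including the terminating nul) of a nul-terminated string,
/// found the only way a thin pointer allows: by reading the pointee.
///
/// # Safety
/// `ptr` must point to readable memory that contains a nul byte.
unsafe fn thin_cstr_size(ptr: *const u8) -> usize {
    let mut len = 0;
    // Each iteration reads one byte of the pointee; in the `Mutex` scenario
    // these are unsynchronized reads of bytes another thread may be writing.
    while unsafe { *ptr.add(len) } != 0 {
        len += 1;
    }
    len + 1
}

fn main() {
    let bytes = b"hello\0";
    let size = unsafe { thin_cstr_size(bytes.as_ptr()) };
    assert_eq!(size, 6);
    // `CStr::from_bytes_with_nul` likewise has to scan the bytes to validate them.
    let cstr = std::ffi::CStr::from_bytes_with_nul(bytes).unwrap();
    assert_eq!(cstr.to_bytes_with_nul().len(), size);
}
```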

This constraint is more difficult to resolve than the previous one coming from `Arc`/`Weak`.
Fundamentally, if `std::mem::size_of_val` works without locking, then types like `ThinCStr`,
which require reading the pointee to determine layout information, break a core property of `UnsafeCell`:
that `&UnsafeCell<T>` cannot (safely) read (or write) any of `T`'s bytes.

Thus (at the time of writing) there are four known potential resolutions to this constraint:

- Require layout to be calculated solely from the thin pointer and pointee metadata,
- Require `size_of_val` to acquire a read lock (for `Mutex`-like types),
- Declare that `noop_write` is only sound for types which determine layout without reading the pointee, or
- Prohibit the use of pointee-determined-layout types in `Mutex`-like types.
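For contrast, slices already satisfy the first resolution. The sketch below uses only stable `std`. `concurrent_size` is an illustrative name, and `noop_write` is repeated from above so the block stands alone. A `Mutex<[u8]>` is created by unsizing a `Box<Mutex<[u8; 4]>>`. Its size comes from the slice length stored in the wide pointer, so `size_of_val` on the mutex does not read the protected bytes. It can therefore run while another thread holds the lock and performs `noop_write`.

```rust
use std::sync::Mutex;
use std::thread;

fn noop_write<T: ?Sized>(it: &mut T) {
    let len = std::mem::size_of_val(it);
    let ptr = it as *mut T as *mut u8;
    unsafe { std::ptr::copy(ptr, ptr, len); }
}

/// Runs `noop_write` under the lock while another thread measures the mutex.
fn concurrent_size(mutex: &Mutex<[u8]>) -> usize {
    thread::scope(|s| {
        s.spawn(|| {
            let mut guard = mutex.lock().unwrap();
            noop_write(&mut *guard);
        });
        // Only the wide pointer's metadata (the slice length) is consulted,
        // so this does not read the bytes the other thread is writing.
        let measure = s.spawn(|| std::mem::size_of_val(mutex));
        measure.join().unwrap()
    })
}

fn main() {
    let mutex: Box<Mutex<[u8]>> = Box::new(Mutex::new([1u8, 2, 3, 4]));
    let size = concurrent_size(&mutex);
    assert!(size >= 4); // at least the four payload bytes
    assert_eq!(*mutex.lock().unwrap(), [1u8, 2, 3, 4]);
}
```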

## Potential Conclusions

This section reflects only the opinion of the notes' author (@CAD97):

From the above, there are *four* classes of sizedness that Rust *could* care about \[1]:

- "`T: Sized + MetaSized + DynSized`", where the size and alignment are known statically;
- "`T: ?Sized + MetaSized + DynSized`", where the size and alignment are known from the data pointer and metadata;
- "`T: ?Sized + ?MetaSized + DynSized`", where the size and alignment require reading the pointee; and
- "`T: ?Sized + ?MetaSized + ?DynSized`", where the size and alignment cannot be determined by (generic) code.

Examples of these are, respectively, `u8`, `dyn Trait`, `ThinCStr`, and `extern type`.

@CAD97 posits that in the majority of cases,
`OwningPointer<T>`-like types want "`?Sized + ?MetaSized + DynSized`",
`Ref<T>`-like types want "`?Sized + ?MetaSized + ?DynSized`", and
`UnsafeCell<T>`-like types want "`?Sized + MetaSized + DynSized`".

Additionally, it could be useful to restrict `MetaSized` to know only the pointee metadata and not the data pointer;
this would allow things like `[T] where T: ?Sized + MetaSized`, using both the slice and `T` metadata for an extra-fat pointer
(e.g. `[[T]]` as a 2D slice doing the obvious thing, without a stride).
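As a rough stable-Rust model of the extra-fat `[[T]]` idea (not a real unsized type), the view below carries a data pointer plus two pieces of metadata: the row count and the row length. The layout of the whole pointee follows from that metadata alone, with rows stored back to back and no separate stride. `Slice2D` and its methods are hypothetical.

```rust
use std::alloc::Layout;

/// Hypothetical model of an extra-fat `&[[T]]`: a data pointer plus two
/// pieces of metadata (number of rows and row length).
struct Slice2D<'a, T> {
    data: &'a [T],
    rows: usize,
    cols: usize,
}

impl<'a, T> Slice2D<'a, T> {
    /// Builds the view; `data.len()` must equal `rows * cols`.
    fn new(data: &'a [T], rows: usize, cols: usize) -> Option<Self> {
        let len = rows.checked_mul(cols)?;
        (len == data.len()).then_some(Slice2D { data, rows, cols })
    }

    /// Layout of the whole pointee, computed from the metadata alone.
    fn layout(&self) -> Layout {
        Layout::array::<T>(self.rows * self.cols).expect("layout overflow")
    }

    /// Row `i` is the obvious contiguous sub-slice; no stride is stored.
    fn row(&self, i: usize) -> &'a [T] {
        let data: &'a [T] = self.data;
        &data[i * self.cols..(i + 1) * self.cols]
    }
}

fn main() {
    let cells = [1u32, 2, 3, 4, 5, 6];
    let grid = Slice2D::new(&cells, 2, 3).unwrap();
    assert_eq!(grid.row(1), &[4u32, 5, 6]);
    assert_eq!(grid.layout().size(), 6 * std::mem::size_of::<u32>());
    // Metadata that does not match the data is rejected.
    assert!(Slice2D::new(&cells, 4, 2).is_none());
}
```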

## References

- \[1] https://internals.rust-lang.org/t/erfc-minimal-custom-dsts-via-extern-type-dynsized/16591?u=cad97
- \[2] https://github.com/rust-lang/rust/issues/49708
- \[3] https://hackmd.io/7r3_is6uTz-163fsOV8Vfg

0 commit comments

Comments
 (0)