
Commit c75caa5

Gankra authored and workingjubilee committed
WIP PROOF-OF-CONCEPT: experiment with very strict pointer provenance
This patch series examines the question: how bad would it be if we adopted an extremely strict pointer provenance model that completely banished all int<->ptr casts?

The key insight to making this approach even *vaguely* palatable is the `ptr.with_addr(addr) -> ptr` function, which takes a pointer and an address and creates a new pointer with that address and the provenance of the input pointer. In this way the "chain of custody" is completely and dynamically restored, making the model suitable even for dynamic checkers like CHERI and Miri.

This is not a formal model, but lots of the docs discussing the model have been updated to try to convey the *concept* of this design in the hopes that it can be iterated on. Many new methods have been added to ptr to attempt to fill in semantic gaps that this introduces, or to just get the ball rolling on "hey this is a problem that needs to be solved, here's a bad solution as a starting point".
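The round trip the message describes can be sketched in stable Rust. `with_addr` is not assumed to exist yet, so this free function (a hypothetical stand-in, not the patch's method) emulates it with the `wrapping_offset` desugaring the patch itself uses:

```rust
// A minimal sketch of the addr/with_addr "chain of custody" round trip,
// assuming with_addr(a) is defined "as if" it were a wrapping byte offset
// from the pointer's old address to the new one.
fn with_addr<T>(ptr: *const T, addr: usize) -> *const T {
    let offset = (addr as isize).wrapping_sub(ptr as usize as isize);
    ptr.cast::<u8>().wrapping_offset(offset).cast::<T>()
}

fn main() {
    let x = 42u32;
    let p: *const u32 = &x;

    // u32 is 4-aligned, so the low bit of the address is free for a tag.
    let tagged = with_addr(p, (p as usize) | 0x1);
    assert_eq!(tagged as usize & 0x1, 1);

    // Strip the tag. The pointer was derived from `p`, so its provenance
    // (permission to access `x`) is preserved and the read is sound.
    let untagged = with_addr(tagged, tagged as usize & !0x1);
    assert_eq!(unsafe { *untagged }, 42);
}
```

Because the new pointer is derived from the old one rather than conjured from a bare integer, a checker like Miri or hardware like CHERI can track exactly which allocation it may access.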
1 parent 7b0bf9e commit c75caa5

File tree

3 files changed: +452 −81 lines changed


library/core/src/ptr/const_ptr.rs

Lines changed: 136 additions & 31 deletions
@@ -60,44 +60,39 @@ impl<T: ?Sized> *const T {
 
     /// Casts a pointer to its raw bits.
     ///
-    /// This is equivalent to `as usize`, but is more specific to enhance readability.
-    /// The inverse method is [`from_bits`](#method.from_bits).
+    /// In general, pointers cannot be understood as "just an integer"
+    /// and cannot be created from one without additional context.
     ///
-    /// In particular, `*p as usize` and `p as usize` will both compile for
-    /// pointers to numeric types but do very different things, so using this
-    /// helps emphasize that reading the bits was intentional.
-    ///
-    /// # Examples
-    ///
-    /// ```
-    /// #![feature(ptr_to_from_bits)]
-    /// let array = [13, 42];
-    /// let p0: *const i32 = &array[0];
-    /// assert_eq!(<*const _>::from_bits(p0.to_bits()), p0);
-    /// let p1: *const i32 = &array[1];
-    /// assert_eq!(p1.to_bits() - p0.to_bits(), 4);
-    /// ```
+    /// If you would like to treat a pointer like an integer anyway,
+    /// see [`addr`](#method.addr-1) and [`with_addr`](#method.with_addr-1) for the responsible
+    /// way to do that.
     #[unstable(feature = "ptr_to_from_bits", issue = "91126")]
-    pub fn to_bits(self) -> usize
+    pub fn to_bits(self) -> [u8; core::mem::size_of::<*const ()>()]
     where
         T: Sized,
     {
-        self as usize
+        // SAFETY: I AM THE MAGIC
+        unsafe { core::mem::transmute(self) }
     }
 
     /// Creates a pointer from its raw bits.
     ///
     /// This is equivalent to `as *const T`, but is more specific to enhance readability.
-    /// The inverse method is [`to_bits`](#method.to_bits).
+    /// The inverse method is [`to_bits`](#method.to_bits-1).
     ///
     /// # Examples
     ///
     /// ```
     /// #![feature(ptr_to_from_bits)]
     /// use std::ptr::NonNull;
-    /// let dangling: *const u8 = NonNull::dangling().as_ptr();
-    /// assert_eq!(<*const u8>::from_bits(1), dangling);
+    /// let dangling: *mut u8 = NonNull::dangling().as_ptr();
+    /// assert_eq!(<*mut u8>::from_bits(1), dangling);
     /// ```
+    #[rustc_deprecated(
+        since = "1.61.0",
+        reason = "This design is incompatible with Pointer Provenance",
+        suggestion = "from_addr"
+    )]
     #[unstable(feature = "ptr_to_from_bits", issue = "91126")]
     pub fn from_bits(bits: usize) -> Self
     where
@@ -106,6 +101,85 @@ impl<T: ?Sized> *const T {
         bits as Self
     }
 
+    /// Gets the "address" portion of the pointer.
+    ///
+    /// On most platforms this is a no-op, as the pointer is just an address,
+    /// and is equivalent to the deprecated `ptr as usize` cast.
+    ///
+    /// On more complicated platforms like CHERI and segmented architectures,
+    /// this may remove some important metadata. See [`with_addr`](#method.with_addr-1) for
+    /// details on this distinction and why it's important.
+    #[unstable(feature = "strict_provenance", issue = "99999999")]
+    pub fn addr(self) -> usize
+    where
+        T: Sized,
+    {
+        // FIXME(strict_provenance_magic): I am magic and should be a compiler intrinsic.
+        self as usize
+    }
+
+    /// Creates a new pointer with the given address.
+    ///
+    /// This replaces the deprecated `usize as ptr` cast, which had
+    /// fundamentally broken semantics because it couldn't restore
+    /// *segment* and *provenance*.
+    ///
+    /// A pointer semantically has 3 pieces of information associated with it:
+    ///
+    /// * Segment: The address-space it is part of.
+    /// * Provenance: An allocation (slice) that it is allowed to access.
+    /// * Address: The actual address it points at.
+    ///
+    /// The compiler and hardware need to properly understand all 3 of these
+    /// values at all times to properly execute your code.
+    ///
+    /// Segment and Provenance are implicitly defined by *how* a pointer is
+    /// constructed and generally propagate verbatim to all derived pointers.
+    /// It is therefore *impossible* to convert an address into a pointer
+    /// on its own, because there is no way to know what its segment and
+    /// provenance should be.
+    ///
+    /// By introducing a "representative" pointer into the process we can
+    /// properly construct a new pointer with *its* segment and provenance,
+    /// just as any other derived pointer would. This *should* be equivalent
+    /// to `wrapping_offset`ting the given pointer to the new address. See the
+    /// docs for `wrapping_offset` for the restrictions this implies.
+    ///
+    /// # Example
+    ///
+    /// Here is an example of how to properly use this API to mess around
+    /// with tagged pointers. Here we have a tag in the lowest bit:
+    ///
+    /// ```text
+    /// let my_tagged_ptr: *const T = ...;
+    ///
+    /// // Get the address and do whatever bit tricks we like
+    /// let addr = my_tagged_ptr.addr();
+    /// let has_tag = (addr & 0x1) != 0;
+    /// let real_addr = addr & !0x1;
+    ///
+    /// // Reconstitute a pointer with the new address and use it
+    /// let my_untagged_ptr = my_tagged_ptr.with_addr(real_addr);
+    /// let val = *my_untagged_ptr;
+    /// ```
+    #[unstable(feature = "strict_provenance", issue = "99999999")]
+    pub fn with_addr(self, addr: usize) -> Self
+    where
+        T: Sized,
+    {
+        // FIXME(strict_provenance_magic): I am magic and should be a compiler intrinsic.
+        //
+        // In the meantime, this operation is defined to be "as if" it were
+        // a wrapping_offset, so we can emulate it as such. This should properly
+        // restore pointer provenance even under today's compiler.
+        let self_addr = self.addr() as isize;
+        let dest_addr = addr as isize;
+        let offset = dest_addr.wrapping_sub(self_addr);
+
+        // This is the canonical desugaring of this operation
+        self.cast::<u8>().wrapping_offset(offset).cast::<T>()
+    }
+
     /// Decompose a (possibly wide) pointer into its address and metadata components.
     ///
     /// The pointer can be later reconstructed with [`from_raw_parts`].
@@ -305,10 +379,10 @@ impl<T: ?Sized> *const T {
     /// This operation itself is always safe, but using the resulting pointer is not.
     ///
     /// The resulting pointer "remembers" the [allocated object] that `self` points to; it must not
-    /// be used to read or write other allocated objects.
+    /// be used to read or write other allocated objects. This is tracked by provenance.
     ///
-    /// In other words, `let z = x.wrapping_offset((y as isize) - (x as isize))` does *not* make `z`
-    /// the same as `y` even if we assume `T` has size `1` and there is no overflow: `z` is still
+    /// In other words, `let z = x.wrapping_offset((y.addr() as isize) - (x.addr() as isize))`
+    /// does *not* make `z` the same as `y` even if we assume `T` has size `1` and there is no overflow: `z` is still
     /// attached to the object `x` is attached to, and dereferencing it is Undefined Behavior unless
     /// `x` and `y` point into the same allocated object.
     ///
@@ -320,8 +394,39 @@ impl<T: ?Sized> *const T {
     ///
     /// The delayed check only considers the value of the pointer that was dereferenced, not the
     /// intermediate values used during the computation of the final result. For example,
-    /// `x.wrapping_offset(o).wrapping_offset(o.wrapping_neg())` is always the same as `x`. In other
-    /// words, leaving the allocated object and then re-entering it later is permitted.
+    /// `x.wrapping_offset(o).wrapping_offset(o.wrapping_neg())` is always the same as `x`...
+    ///
+    /// Usually.
+    ///
+    /// More work needs to be done to define the rules here, but on CHERI it is not *actually*
+    /// a no-op to wrapping_offset a pointer to some random address and back again. For practical
+    /// applications that actually need this, it *will* generally work, but if your offset is
+    /// "too out of bounds" the system will mark your pointer as invalid, and subsequent reads
+    /// will fault *as if* the pointer had been corrupted by a non-pointer instruction.
+    ///
+    /// CHERI has a roughly 64-bit address space but its 128-bit pointers contain
+    /// 3 ostensibly-address-space-sized values:
+    ///
+    /// * 2 values for the "slice" that the pointer can access.
+    /// * 1 value for the actual address it points to.
+    ///
+    /// To accomplish this, CHERI compresses the values and even requires large allocations
+    /// to have higher alignment to free up extra bits. This compression scheme can support
+    /// the pointer being offset outside of the slice, but only to an extent. A *generous*
+    /// extent, but a limited one nonetheless. To quote CHERI's documentation:
+    ///
+    /// > With 27 bits of the capability used for bounds, CHERI-MIPS and 64-bit
+    /// > CHERI-RISC-V provide the following guarantees:
+    /// >
+    /// > * A pointer is able to travel at least 1⁄4 the size of the object, or 2 KiB,
+    /// >   whichever is greater, above its upper bound.
+    /// > * It is able to travel at least 1⁄8 the size of the object, or 1 KiB,
+    /// >   whichever is greater, below its lower bound.
+    ///
+    /// Needless to say, any scheme that relies on reusing the least significant bits
+    /// of a pointer based on alignment is going to be fine. Any scheme which tries
+    /// to set *high* bits isn't going to work, but that was *already* extremely
+    /// platform-specific and not at all portable.
     ///
     /// [`offset`]: #method.offset
     /// [allocated object]: crate::ptr#allocated-object
@@ -427,10 +532,10 @@ impl<T: ?Sized> *const T {
     /// ```rust,no_run
    /// let ptr1 = Box::into_raw(Box::new(0u8)) as *const u8;
     /// let ptr2 = Box::into_raw(Box::new(1u8)) as *const u8;
-    /// let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
+    /// let diff = (ptr2.addr() as isize).wrapping_sub(ptr1.addr() as isize);
     /// // Make ptr2_other an "alias" of ptr2, but derived from ptr1.
     /// let ptr2_other = (ptr1 as *const u8).wrapping_offset(diff);
-    /// assert_eq!(ptr2 as usize, ptr2_other as usize);
+    /// assert_eq!(ptr2.addr(), ptr2_other.addr());
     /// // Since ptr2_other and ptr2 are derived from pointers to different objects,
     /// // computing their offset is undefined behavior, even though
     /// // they point to the same address!
@@ -653,7 +758,7 @@ impl<T: ?Sized> *const T {
     /// The resulting pointer "remembers" the [allocated object] that `self` points to; it must not
     /// be used to read or write other allocated objects.
     ///
-    /// In other words, `let z = x.wrapping_add((y as usize) - (x as usize))` does *not* make `z`
+    /// In other words, `let z = x.wrapping_add((y.addr()) - (x.addr()))` does *not* make `z`
     /// the same as `y` even if we assume `T` has size `1` and there is no overflow: `z` is still
     /// attached to the object `x` is attached to, and dereferencing it is Undefined Behavior unless
     /// `x` and `y` point into the same allocated object.
@@ -715,7 +820,7 @@ impl<T: ?Sized> *const T {
     /// The resulting pointer "remembers" the [allocated object] that `self` points to; it must not
     /// be used to read or write other allocated objects.
     ///
-    /// In other words, `let z = x.wrapping_sub((x as usize) - (y as usize))` does *not* make `z`
+    /// In other words, `let z = x.wrapping_sub((x.addr()) - (y.addr()))` does *not* make `z`
     /// the same as `y` even if we assume `T` has size `1` and there is no overflow: `z` is still
     /// attached to the object `x` is attached to, and dereferencing it is Undefined Behavior unless
     /// `x` and `y` point into the same allocated object.
@@ -1003,7 +1108,7 @@ impl<T> *const [T] {
     /// use std::ptr;
     ///
     /// let slice: *const [i8] = ptr::slice_from_raw_parts(ptr::null(), 3);
-    /// assert_eq!(slice.as_ptr(), 0 as *const i8);
+    /// assert_eq!(slice.as_ptr(), ptr::null());
     /// ```
     #[inline]
     #[unstable(feature = "slice_ptr_get", issue = "74265")]
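The patched `to_bits` in the first hunk of this file returns the pointer's raw bytes rather than a `usize`. Here is a rough stable-Rust analogue of that shape. `ptr_bytes` is a hypothetical helper, not the patch's implementation (the patch transmutes); this sketch goes through the very `as usize` cast the patch deprecates, purely so it runs today:

```rust
use core::mem::size_of;

// Expose a pointer's raw bytes instead of pretending it is "just a usize".
// The array length mirrors the patched signature: the size of a pointer.
fn ptr_bytes<T>(p: *const T) -> [u8; size_of::<*const ()>()] {
    (p as usize).to_ne_bytes()
}

fn main() {
    let x = 1u8;
    let p: *const u8 = &x;
    // The bytes round-trip the *address*, but crucially not the provenance:
    // nothing reconstructed from them alone would be safe to dereference.
    assert_eq!(usize::from_ne_bytes(ptr_bytes(p)), p as usize);
}
```

Returning bytes instead of an integer makes the "a pointer is not just a number" point structurally: you can inspect or store the representation, but you cannot accidentally do arithmetic on it and cast it back.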
