
Commit 8b26f5c

Merge pull request #4057 from RalfJung/scfix
Fix weak memory emulation to avoid generating behaviors that are forbidden under C++ 20
2 parents f2349f5 + 4e7f53a commit 8b26f5c

File tree

6 files changed: +225 −241 lines changed

README.md

Lines changed: 4 additions & 3 deletions
@@ -68,9 +68,10 @@ Further caveats that Miri users should be aware of:
   not support networking. System API support varies between targets; if you run
   on Windows it is a good idea to use `--target x86_64-unknown-linux-gnu` to get
   better support.
-* Weak memory emulation may [produce weak behaviors](https://github.com/rust-lang/miri/issues/2301)
-  when `SeqCst` fences are used that are not actually permitted by the Rust memory model, and it
-  cannot produce all behaviors possibly observable on real hardware.
+* Weak memory emulation is not complete: there are legal behaviors that Miri will never produce.
+  However, Miri produces many behaviors that are hard to observe on real hardware, so it can help
+  quite a bit in finding weak memory concurrency bugs. To be really sure about complicated atomic
+  code, use specialized tools such as [loom](https://github.com/tokio-rs/loom).

 Moreover, Miri fundamentally cannot ensure that your code is *sound*. [Soundness] is the property of
 never causing undefined behavior when invoked from arbitrary safe code, even in combination with
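The README caveat above is easiest to see with a litmus test. The following is a hypothetical standalone program (not taken from this PR or the Miri repository): a classic store-buffering test where, under the C++20 model that Miri emulates, both relaxed loads may return 0, an outcome that is hard to observe on strongly ordered hardware such as x86.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Store-buffering litmus test: each thread stores to one location and then
// loads the other, all with Relaxed ordering. The C++20 model allows all
// four outcomes for (r1, r2), including (0, 0); Miri's weak memory
// emulation can actually produce that weak outcome.
static X: AtomicU32 = AtomicU32::new(0);
static Y: AtomicU32 = AtomicU32::new(0);

fn main() {
    let t1 = thread::spawn(|| {
        X.store(1, Ordering::Relaxed);
        Y.load(Ordering::Relaxed)
    });
    let t2 = thread::spawn(|| {
        Y.store(1, Ordering::Relaxed);
        X.load(Ordering::Relaxed)
    });
    let (r1, r2) = (t1.join().unwrap(), t2.join().unwrap());
    // Every combination of 0/1 is a legal result here.
    assert!(r1 <= 1 && r2 <= 1);
    println!("r1 = {r1}, r2 = {r2}");
}
```

Running such a program many times under Miri (with its varying seeds) is how weak memory bugs typically surface; a single native run proves nothing either way.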

src/concurrency/data_race.rs

Lines changed: 120 additions & 110 deletions
@@ -108,19 +108,19 @@ pub(super) struct ThreadClockSet {
     fence_acquire: VClock,

     /// The last timestamp of happens-before relations that
-    /// have been released by this thread by a fence.
+    /// have been released by this thread by a release fence.
     fence_release: VClock,

-    /// Timestamps of the last SC fence performed by each
-    /// thread, updated when this thread performs an SC fence
-    pub(super) fence_seqcst: VClock,
-
     /// Timestamps of the last SC write performed by each
-    /// thread, updated when this thread performs an SC fence
+    /// thread, updated when this thread performs an SC fence.
+    /// This is never acquired into the thread's clock, it
+    /// just limits which old writes can be seen in weak memory emulation.
     pub(super) write_seqcst: VClock,

     /// Timestamps of the last SC fence performed by each
-    /// thread, updated when this thread performs an SC read
+    /// thread, updated when this thread performs an SC read.
+    /// This is never acquired into the thread's clock, it
+    /// just limits which old writes can be seen in weak memory emulation.
     pub(super) read_seqcst: VClock,
 }

@@ -256,6 +256,106 @@ enum AccessType {
     AtomicRmw,
 }

+/// Per-byte vector clock metadata for data-race detection.
+#[derive(Clone, PartialEq, Eq, Debug)]
+struct MemoryCellClocks {
+    /// The vector-clock timestamp and the thread that did the last non-atomic write. We don't need
+    /// a full `VClock` here, it's always a single thread and nothing synchronizes, so the effective
+    /// clock is all-0 except for the thread that did the write.
+    write: (VectorIdx, VTimestamp),
+
+    /// The type of operation that the write index represents,
+    /// either newly allocated memory, a non-atomic write or
+    /// a deallocation of memory.
+    write_type: NaWriteType,
+
+    /// The vector-clock of all non-atomic reads that happened since the last non-atomic write
+    /// (i.e., we join together the "singleton" clocks corresponding to each read). It is reset to
+    /// zero on each write operation.
+    read: VClock,
+
+    /// Atomic access, acquire, release sequence tracking clocks.
+    /// For non-atomic memory this value is set to None.
+    /// For atomic memory, each byte carries this information.
+    atomic_ops: Option<Box<AtomicMemoryCellClocks>>,
+}
+
+/// Extra metadata associated with a thread.
+#[derive(Debug, Clone, Default)]
+struct ThreadExtraState {
+    /// The current vector index in use by the
+    /// thread currently, this is set to None
+    /// after the vector index has been re-used
+    /// and hence the value will never need to be
+    /// read during data-race reporting.
+    vector_index: Option<VectorIdx>,
+
+    /// Thread termination vector clock, this
+    /// is set on thread termination and is used
+    /// for joining on threads since the vector_index
+    /// may be re-used when the join operation occurs.
+    termination_vector_clock: Option<VClock>,
+}
+
+/// Global data-race detection state, contains the currently
+/// executing thread as well as the vector-clocks associated
+/// with each of the threads.
+// FIXME: it is probably better to have one large RefCell, than to have so many small ones.
+#[derive(Debug, Clone)]
+pub struct GlobalState {
+    /// Set to true once the first additional
+    /// thread has launched, due to the dependency
+    /// between before and after a thread launch.
+    /// Any data-races must be recorded after this
+    /// so concurrent execution can ignore recording
+    /// any data-races.
+    multi_threaded: Cell<bool>,
+
+    /// A flag to mark we are currently performing
+    /// a data race free action (such as atomic access)
+    /// to suppress the race detector
+    ongoing_action_data_race_free: Cell<bool>,
+
+    /// Mapping of a vector index to a known set of thread
+    /// clocks, this is not directly mapping from a thread id
+    /// since it may refer to multiple threads.
+    vector_clocks: RefCell<IndexVec<VectorIdx, ThreadClockSet>>,
+
+    /// Mapping of a given vector index to the current thread
+    /// that the execution is representing, this may change
+    /// if a vector index is re-assigned to a new thread.
+    vector_info: RefCell<IndexVec<VectorIdx, ThreadId>>,
+
+    /// The mapping of a given thread to associated thread metadata.
+    thread_info: RefCell<IndexVec<ThreadId, ThreadExtraState>>,
+
+    /// Potential vector indices that could be re-used on thread creation
+    /// values are inserted here on after the thread has terminated and
+    /// been joined with, and hence may potentially become free
+    /// for use as the index for a new thread.
+    /// Elements in this set may still require the vector index to
+    /// report data-races, and can only be re-used after all
+    /// active vector-clocks catch up with the threads timestamp.
+    reuse_candidates: RefCell<FxHashSet<VectorIdx>>,
+
+    /// We make SC fences act like RMWs on a global location.
+    /// To implement that, they all release and acquire into this clock.
+    last_sc_fence: RefCell<VClock>,
+
+    /// The timestamp of last SC write performed by each thread.
+    /// Threads only update their own index here!
+    last_sc_write_per_thread: RefCell<VClock>,
+
+    /// Track when an outdated (weak memory) load happens.
+    pub track_outdated_loads: bool,
+}
+
+impl VisitProvenance for GlobalState {
+    fn visit_provenance(&self, _visit: &mut VisitWith<'_>) {
+        // We don't have any tags.
+    }
+}
+
 impl AccessType {
     fn description(self, ty: Option<Ty<'_>>, size: Option<Size>) -> String {
         let mut msg = String::new();
@@ -309,30 +409,6 @@ impl AccessType {
     }
 }

-/// Per-byte vector clock metadata for data-race detection.
-#[derive(Clone, PartialEq, Eq, Debug)]
-struct MemoryCellClocks {
-    /// The vector-clock timestamp and the thread that did the last non-atomic write. We don't need
-    /// a full `VClock` here, it's always a single thread and nothing synchronizes, so the effective
-    /// clock is all-0 except for the thread that did the write.
-    write: (VectorIdx, VTimestamp),
-
-    /// The type of operation that the write index represents,
-    /// either newly allocated memory, a non-atomic write or
-    /// a deallocation of memory.
-    write_type: NaWriteType,
-
-    /// The vector-clock of all non-atomic reads that happened since the last non-atomic write
-    /// (i.e., we join together the "singleton" clocks corresponding to each read). It is reset to
-    /// zero on each write operation.
-    read: VClock,
-
-    /// Atomic access, acquire, release sequence tracking clocks.
-    /// For non-atomic memory this value is set to None.
-    /// For atomic memory, each byte carries this information.
-    atomic_ops: Option<Box<AtomicMemoryCellClocks>>,
-}
-
 impl AtomicMemoryCellClocks {
     fn new(size: Size) -> Self {
         AtomicMemoryCellClocks {
@@ -803,9 +879,17 @@ pub trait EvalContextExt<'tcx>: MiriInterpCxExt<'tcx> {
             clocks.apply_release_fence();
         }
         if atomic == AtomicFenceOrd::SeqCst {
-            data_race.last_sc_fence.borrow_mut().set_at_index(&clocks.clock, index);
-            clocks.fence_seqcst.join(&data_race.last_sc_fence.borrow());
-            clocks.write_seqcst.join(&data_race.last_sc_write.borrow());
+            // Behave like an RMW on the global fence location. This takes full care of
+            // all the SC fence requirements, including C++17 §32.4 [atomics.order]
+            // paragraph 6 (which would limit what future reads can see). It also rules
+            // out many legal behaviors, but we don't currently have a model that would
+            // be more precise.
+            let mut sc_fence_clock = data_race.last_sc_fence.borrow_mut();
+            sc_fence_clock.join(&clocks.clock);
+            clocks.clock.join(&sc_fence_clock);
+            // Also establish some sort of order with the last SC write that happened, globally
+            // (but this is only respected by future reads).
+            clocks.write_seqcst.join(&data_race.last_sc_write_per_thread.borrow());
         }

         // Increment timestamp in case of release semantics.
@@ -1463,80 +1547,6 @@ trait EvalContextPrivExt<'tcx>: MiriInterpCxExt<'tcx> {
     }
 }

-/// Extra metadata associated with a thread.
-#[derive(Debug, Clone, Default)]
-struct ThreadExtraState {
-    /// The current vector index in use by the
-    /// thread currently, this is set to None
-    /// after the vector index has been re-used
-    /// and hence the value will never need to be
-    /// read during data-race reporting.
-    vector_index: Option<VectorIdx>,
-
-    /// Thread termination vector clock, this
-    /// is set on thread termination and is used
-    /// for joining on threads since the vector_index
-    /// may be re-used when the join operation occurs.
-    termination_vector_clock: Option<VClock>,
-}
-
-/// Global data-race detection state, contains the currently
-/// executing thread as well as the vector-clocks associated
-/// with each of the threads.
-// FIXME: it is probably better to have one large RefCell, than to have so many small ones.
-#[derive(Debug, Clone)]
-pub struct GlobalState {
-    /// Set to true once the first additional
-    /// thread has launched, due to the dependency
-    /// between before and after a thread launch.
-    /// Any data-races must be recorded after this
-    /// so concurrent execution can ignore recording
-    /// any data-races.
-    multi_threaded: Cell<bool>,
-
-    /// A flag to mark we are currently performing
-    /// a data race free action (such as atomic access)
-    /// to suppress the race detector
-    ongoing_action_data_race_free: Cell<bool>,
-
-    /// Mapping of a vector index to a known set of thread
-    /// clocks, this is not directly mapping from a thread id
-    /// since it may refer to multiple threads.
-    vector_clocks: RefCell<IndexVec<VectorIdx, ThreadClockSet>>,
-
-    /// Mapping of a given vector index to the current thread
-    /// that the execution is representing, this may change
-    /// if a vector index is re-assigned to a new thread.
-    vector_info: RefCell<IndexVec<VectorIdx, ThreadId>>,
-
-    /// The mapping of a given thread to associated thread metadata.
-    thread_info: RefCell<IndexVec<ThreadId, ThreadExtraState>>,
-
-    /// Potential vector indices that could be re-used on thread creation
-    /// values are inserted here on after the thread has terminated and
-    /// been joined with, and hence may potentially become free
-    /// for use as the index for a new thread.
-    /// Elements in this set may still require the vector index to
-    /// report data-races, and can only be re-used after all
-    /// active vector-clocks catch up with the threads timestamp.
-    reuse_candidates: RefCell<FxHashSet<VectorIdx>>,
-
-    /// The timestamp of last SC fence performed by each thread
-    last_sc_fence: RefCell<VClock>,
-
-    /// The timestamp of last SC write performed by each thread
-    last_sc_write: RefCell<VClock>,
-
-    /// Track when an outdated (weak memory) load happens.
-    pub track_outdated_loads: bool,
-}
-
-impl VisitProvenance for GlobalState {
-    fn visit_provenance(&self, _visit: &mut VisitWith<'_>) {
-        // We don't have any tags.
-    }
-}
-
 impl GlobalState {
     /// Create a new global state, setup with just thread-id=0
     /// advanced to timestamp = 1.
@@ -1549,7 +1559,7 @@ impl GlobalState {
             thread_info: RefCell::new(IndexVec::new()),
             reuse_candidates: RefCell::new(FxHashSet::default()),
             last_sc_fence: RefCell::new(VClock::default()),
-            last_sc_write: RefCell::new(VClock::default()),
+            last_sc_write_per_thread: RefCell::new(VClock::default()),
            track_outdated_loads: config.track_outdated_loads,
         };

@@ -1851,7 +1861,7 @@ impl GlobalState {
     // SC ATOMIC STORE rule in the paper.
     pub(super) fn sc_write(&self, thread_mgr: &ThreadManager<'_>) {
         let (index, clocks) = self.active_thread_state(thread_mgr);
-        self.last_sc_write.borrow_mut().set_at_index(&clocks.clock, index);
+        self.last_sc_write_per_thread.borrow_mut().set_at_index(&clocks.clock, index);
     }

     // SC ATOMIC READ rule in the paper.
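The core of this commit is the new SC-fence rule in `atomic_fence`: instead of tracking a separate `fence_seqcst` clock, an SC fence now behaves like an acquire-release RMW on one global clock. That mechanism can be sketched with a toy vector clock (hypothetical types, not Miri's actual `VClock` or `GlobalState`):

```rust
// Minimal vector clock: one timestamp per thread index.
#[derive(Clone, Debug, Default, PartialEq)]
struct VClock(Vec<u64>);

impl VClock {
    // Pointwise maximum, the usual vector-clock join.
    fn join(&mut self, other: &VClock) {
        if self.0.len() < other.0.len() {
            self.0.resize(other.0.len(), 0);
        }
        for (i, &t) in other.0.iter().enumerate() {
            self.0[i] = self.0[i].max(t);
        }
    }
}

// An SC fence acting like an AcqRel RMW on a single global location:
// release the thread's clock into the global clock, then acquire the
// global clock back. Every pair of SC fences thereby synchronizes.
fn sc_fence(thread_clock: &mut VClock, global_sc_fence_clock: &mut VClock) {
    global_sc_fence_clock.join(thread_clock); // "release" into the global clock
    thread_clock.join(global_sc_fence_clock); // "acquire" it back
}

fn main() {
    let mut global = VClock::default();
    let mut a = VClock(vec![3, 0]); // thread A's clock
    let mut b = VClock(vec![0, 5]); // thread B's clock
    sc_fence(&mut a, &mut global);
    sc_fence(&mut b, &mut global);
    // B's fence happened after A's, so B has acquired A's fence-time state.
    assert_eq!(b.0, vec![3, 5]);
}
```

As the commit's comment notes, this is deliberately conservative: it creates more synchronization than the C++20 model requires, ruling out some legal behaviors in exchange for a simple, clearly sound rule.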

src/concurrency/weak_memory.rs

Lines changed: 14 additions & 11 deletions
@@ -11,10 +11,17 @@
 //! This implementation is not fully correct under the revised C++20 model and may generate behaviours C++20
 //! disallows (<https://github.com/rust-lang/miri/issues/2301>).
 //!
-//! A modification is made to the paper's model to partially address C++20 changes.
-//! Specifically, if an SC load reads from an atomic store of any ordering, then a later SC load cannot read from
-//! an earlier store in the location's modification order. This is to prevent creating a backwards S edge from the second
-//! load to the first, as a result of C++20's coherence-ordered before rules.
+//! Modifications are made to the paper's model to address C++20 changes:
+//!  - If an SC load reads from an atomic store of any ordering, then a later SC load cannot read
+//!    from an earlier store in the location's modification order. This is to prevent creating a
+//!    backwards S edge from the second load to the first, as a result of C++20's coherence-ordered
+//!    before rules. (This seems to rule out behaviors that were actually permitted by the RC11 model
+//!    that C++20 intended to copy (<https://plv.mpi-sws.org/scfix/paper.pdf>); a change was
+//!    introduced when translating the math to English. According to Viktor Vafeiadis, this
+//!    difference is harmless. So we stick to what the standard says, and allow fewer behaviors.)
+//!  - SC fences are treated like AcqRel RMWs to a global clock, to ensure they induce enough
+//!    synchronization with the surrounding accesses. This rules out legal behavior, but it is really
+//!    hard to be more precise here.
 //!
 //! Rust follows the C++20 memory model (except for the Consume ordering and some operations not performable through C++'s
 //! `std::atomic<T>` API). It is therefore possible for this implementation to generate behaviours never observable when the
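The synchronization that SC fences must induce (and that the second bullet above conservatively enforces) is visible in a Dekker-style litmus test. This is a hypothetical illustration, not code from this PR: with `SeqCst` fences between each store and the opposite load, the outcome where both loads return 0 is forbidden.

```rust
use std::sync::atomic::{fence, AtomicU32, Ordering};
use std::thread;

// Store-buffering with SC fences: the fences order each store before the
// opposite load in the global SC order S, so r1 == 0 && r2 == 0 cannot
// happen. Treating SC fences as AcqRel RMWs on a global clock is one
// conservative way an emulator can guarantee this outcome stays excluded.
static X: AtomicU32 = AtomicU32::new(0);
static Y: AtomicU32 = AtomicU32::new(0);

fn main() {
    let t1 = thread::spawn(|| {
        X.store(1, Ordering::Relaxed);
        fence(Ordering::SeqCst);
        Y.load(Ordering::Relaxed)
    });
    let t2 = thread::spawn(|| {
        Y.store(1, Ordering::Relaxed);
        fence(Ordering::SeqCst);
        X.load(Ordering::Relaxed)
    });
    let (r1, r2) = (t1.join().unwrap(), t2.join().unwrap());
    // At least one thread must observe the other's store.
    assert!(r1 == 1 || r2 == 1);
}
```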
@@ -138,6 +145,7 @@ struct StoreElement {

     /// The timestamp of the storing thread when it performed the store
     timestamp: VTimestamp,
+
     /// The value of this store. `None` means uninitialized.
     // FIXME: Currently, we cannot represent partial initialization.
     val: Option<Scalar>,
@@ -335,11 +343,6 @@ impl<'tcx> StoreBuffer {
             // then we cannot read-from anything earlier in modification order.
             // C++20 §6.9.2.2 [intro.races] paragraph 16
             false
-        } else if store_elem.timestamp <= clocks.fence_seqcst[store_elem.store_index] {
-            // The current load, which may be sequenced-after an SC fence, cannot read-before
-            // the last store sequenced-before an SC fence in another thread.
-            // C++17 §32.4 [atomics.order] paragraph 6
-            false
         } else if store_elem.timestamp <= clocks.write_seqcst[store_elem.store_index]
             && store_elem.is_seqcst
         {
@@ -356,9 +359,9 @@ impl<'tcx> StoreBuffer {
             false
         } else if is_seqcst && store_elem.load_info.borrow().sc_loaded {
             // The current SC load cannot read-before a store that an earlier SC load has observed.
-            // See https://github.com/rust-lang/miri/issues/2301#issuecomment-1222720427
+            // See https://github.com/rust-lang/miri/issues/2301#issuecomment-1222720427.
             // Consequences of C++20 §31.4 [atomics.order] paragraph 3.1, 3.3 (coherence-ordered before)
-            // and 4.1 (coherence-ordered before between SC makes global total order S)
+            // and 4.1 (coherence-ordered before between SC makes global total order S).
             false
         } else {
             true
0 commit comments
