
Commit b01bd37
Cranelift: implement "precise store traps" in presence of store-tearing hardware.
As discussed in bytecodealliance#7237 and WebAssembly/design#1490, some instruction-set architectures do not guarantee that a store that "partially traps" (overlaps multiple pages, only one of which disallows the store) has no side effects. In particular, the part of the store that *is* legal might succeed.

This has fascinating implications for virtual-memory-based WebAssembly heap implementations: when a store is partially out of bounds, it should trap (there is no "partially" here: if the last byte is out of bounds, the store is out of bounds), and a trapping store should not alter the final memory state, which is observable by the outside world after the trap. Yet the simple lowering of a Wasm store to a machine store instruction can violate this expectation in the presence of "store tearing" as described above. Neither ARMv8 (aarch64) nor RISC-V guarantees the absence of store tearing, and we have observed it in tests on RISC-V.

This PR implements the idea first proposed [here], namely to prepend a load of the same size to every store. The idea is that if the store will trap, the load will as well; and precise exception handling, a well-established principle in all modern ISAs, guarantees that instructions beyond a trapping instruction have no effect.

This is off by default, and is mainly meant as an option to study the impact of this idea and to allow for precise trap execution semantics on affected machines unless/until the spec is clarified. On an Apple M2 Max machine (aarch64), this was measured to have a 2% performance impact when running `spidermonkey.wasm` with a simple recursive Fibonacci program. It can be enabled via the `-Ccranelift-ensure_precise_store_traps=true` flag to Wasmtime.

[here]: WebAssembly/design#1490 (comment)
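The load-before-store argument can be made concrete with a toy model (plain Rust written for this note, not Cranelift code; `tearing_store`, `probe_load`, and `precise_store` are invented names): on "tearing" hardware, a partially-faulting store leaves the in-bounds bytes written, while probing with a same-sized load first leaves memory untouched.

```rust
/// Toy model of a linear memory where only the first `mapped` bytes
/// are accessible; any access past that faults.
#[derive(Debug, PartialEq)]
struct Trap;

struct Memory {
    bytes: Vec<u8>,
    mapped: usize,
}

impl Memory {
    fn new(mapped: usize, total: usize) -> Self {
        Memory { bytes: vec![0; total], mapped }
    }

    /// A "tearing" store: writes byte-by-byte and faults at the first
    /// inaccessible byte, leaving the earlier bytes already written.
    fn tearing_store(&mut self, addr: usize, data: &[u8]) -> Result<(), Trap> {
        for (i, b) in data.iter().enumerate() {
            if addr + i >= self.mapped {
                return Err(Trap);
            }
            self.bytes[addr + i] = *b;
        }
        Ok(())
    }

    /// A load of the same size and address: faults iff any byte is
    /// inaccessible, with no side effects.
    fn probe_load(&self, addr: usize, len: usize) -> Result<(), Trap> {
        if addr + len > self.mapped { Err(Trap) } else { Ok(()) }
    }

    /// The transform: probe with a load first, so a faulting store
    /// leaves memory unchanged.
    fn precise_store(&mut self, addr: usize, data: &[u8]) -> Result<(), Trap> {
        self.probe_load(addr, data.len())?;
        self.tearing_store(addr, data)
    }
}

fn main() {
    // A 4-byte store at offset 6 covers bytes 6..10, but only 0..8 are mapped.
    let mut torn = Memory::new(8, 16);
    assert_eq!(torn.tearing_store(6, &[1, 2, 3, 4]), Err(Trap));
    assert_eq!(torn.bytes[6..8], [1, 2]); // partial write escaped the trap

    let mut precise = Memory::new(8, 16);
    assert_eq!(precise.precise_store(6, &[1, 2, 3, 4]), Err(Trap));
    assert_eq!(precise.bytes[6..8], [0, 0]); // post-trap memory state preserved
}
```

This mirrors the commit's claim that precise exceptions make the prepended load a sufficient guard; it deliberately does not model the read-only-page case, which the pass also does not handle.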
1 parent 668e816 commit b01bd37

File tree

6 files changed: +181 -0 lines changed

cranelift/codegen/meta/src/shared/settings.rs (+18)

@@ -118,6 +118,24 @@ pub(crate) fn define() -> SettingGroup {
         false,
     );

+    settings.add_bool(
+        "ensure_precise_store_traps",
+        "Enable transform to ensure precise store traps in the presence of store-tearing hardware.",
+        r#"
+            This inserts a load before every store so that hardware that "tears" stores when
+            part of the stored address range would fault due to a not-present page does not do so.
+            This can be useful to ensure precise post-trap memory state: for example, a Wasm guest
+            that traps on an unaligned partially-out-of-bounds store might otherwise produce
+            incorrect final memory state on hardware that performs the in-bounds part of the store
+            before trapping.
+
+            See also:
+            * https://github.com/WebAssembly/design/issues/1490
+            * https://github.com/bytecodealliance/wasmtime/issues/7237
+        "#,
+        false,
+    );
+
     settings.add_bool(
         "enable_pinned_reg",
         "Enable the use of the pinned register.",
cranelift/codegen/src/context.rs (+11)

@@ -20,6 +20,7 @@ use crate::legalizer::simple_legalize;
 use crate::loop_analysis::LoopAnalysis;
 use crate::machinst::{CompiledCode, CompiledCodeStencil};
 use crate::nan_canonicalization::do_nan_canonicalization;
+use crate::precise_store_traps::do_precise_store_traps;
 use crate::remove_constant_phis::do_remove_constant_phis;
 use crate::result::{CodegenResult, CompileResult};
 use crate::settings::{FlagsOrIsa, OptLevel};
@@ -176,6 +177,9 @@ impl Context {
         if isa.flags().enable_nan_canonicalization() {
             self.canonicalize_nans(isa)?;
         }
+        if isa.flags().ensure_precise_store_traps() {
+            self.ensure_precise_store_traps(isa)?;
+        }

         self.legalize(isa)?;

@@ -286,6 +290,13 @@ impl Context {
         self.verify_if(isa)
     }

+    /// Translate all stores to load-then-store pairs to ensure
+    /// precise traps on store-tearing hardware.
+    pub fn ensure_precise_store_traps(&mut self, isa: &dyn TargetIsa) -> CodegenResult<()> {
+        do_precise_store_traps(&mut self.func);
+        self.verify_if(isa)
+    }
+
     /// Run the legalizer for `isa` on the function.
     pub fn legalize(&mut self, isa: &dyn TargetIsa) -> CodegenResult<()> {
         // Legalization invalidates the domtree and loop_analysis by mutating the CFG.
cranelift/codegen/src/lib.rs (+1)

@@ -77,6 +77,7 @@ mod iterators;
 mod legalizer;
 mod nan_canonicalization;
 mod opts;
+mod precise_store_traps;
 mod remove_constant_phis;
 mod result;
 mod scoped_hash_map;
cranelift/codegen/src/precise_store_traps.rs (new file, +109)

@@ -0,0 +1,109 @@
//! Precise-store-traps pass.
//!
//! On some instruction-set architectures, a store that crosses a page
//! boundary such that one of the pages would fault on a write can
//! sometimes still perform part of its memory update on the other
//! page. This becomes relevant, and problematic, when page
//! protections are load-bearing for Wasm VM semantics: see [this
//! issue] where a partially-out-of-bounds store in Wasm is currently
//! defined to perform no side-effect, but with a common lowering on
//! several ISAs and on some microarchitectures does actually perform
//! a "torn write".
//!
//! [this issue]: https://github.com/WebAssembly/design/issues/1490
//!
//! This pass performs a transform on CLIF that should avoid "torn
//! partially-faulting stores" by performing a throwaway *load* before
//! every store, of the same size and to the same address. This
//! throwaway load will fault if the store would have faulted due to
//! not-present pages (this still does nothing for
//! readonly-page-faults). Because the load happens before the store
//! in program order, if it faults, any ISA that guarantees precise
//! exceptions (all ISAs that we support) will ensure that the store
//! has no side-effects. (Microarchitecturally, once the faulting
//! instruction retires, the later not-yet-retired entries in the
//! store buffer will be flushed.)
//!
//! This is not on by default and remains an "experimental" option
//! while the Wasm spec resolves this issue, and serves for now to
//! allow collecting data on overheads and experimenting on affected
//! machines.

use crate::cursor::{Cursor, FuncCursor};
use crate::ir::types::*;
use crate::ir::*;

fn covering_type_for_value(func: &Function, value: Value) -> Type {
    match func.dfg.value_type(value).bits() {
        8 => I8,
        16 => I16,
        32 => I32,
        64 => I64,
        128 => I8X16,
        _ => unreachable!(),
    }
}

/// Perform the precise-store-traps transform on a function body.
pub fn do_precise_store_traps(func: &mut Function) {
    let mut pos = FuncCursor::new(func);
    while let Some(_block) = pos.next_block() {
        while let Some(inst) = pos.next_inst() {
            match &pos.func.dfg.insts[inst] {
                &InstructionData::StackStore {
                    opcode: _,
                    arg: data,
                    stack_slot,
                    offset,
                } => {
                    let ty = covering_type_for_value(&pos.func, data);
                    let _ = pos.ins().stack_load(ty, stack_slot, offset);
                }
                &InstructionData::DynamicStackStore {
                    opcode: _,
                    arg: data,
                    dynamic_stack_slot,
                } => {
                    let ty = covering_type_for_value(&pos.func, data);
                    let _ = pos.ins().dynamic_stack_load(ty, dynamic_stack_slot);
                }
                &InstructionData::Store {
                    opcode,
                    args,
                    flags,
                    offset,
                } => {
                    let (data, addr) = (args[0], args[1]);
                    let ty = match opcode {
                        Opcode::Store => covering_type_for_value(&pos.func, data),
                        Opcode::Istore8 => I8,
                        Opcode::Istore16 => I16,
                        Opcode::Istore32 => I32,
                        _ => unreachable!(),
                    };
                    let _ = pos.ins().load(ty, flags, addr, offset);
                }
                &InstructionData::StoreNoOffset {
                    opcode: Opcode::AtomicStore,
                    args,
                    flags,
                } => {
                    let (data, addr) = (args[0], args[1]);
                    let ty = covering_type_for_value(&pos.func, data);
                    let _ = pos.ins().atomic_load(ty, flags, addr);
                }
                &InstructionData::AtomicCas { .. } | &InstructionData::AtomicRmw { .. } => {
                    // Nothing: already does a read before the write.
                }
                &InstructionData::NullAry {
                    opcode: Opcode::Debugtrap,
                } => {
                    // Marked as `can_store`, but no concerns here.
                }
                inst => {
                    assert!(!inst.opcode().can_store());
                }
            }
        }
    }
}
(new filetest, +41)

@@ -0,0 +1,41 @@
test optimize
set opt_level=speed
set ensure_precise_store_traps=true
target x86_64

function %f0(i64) {
block0(v0: i64):
    v1 = iconst.i64 0
    store.i64 v1, v0
    return
}

; check: load.i64 v0
; check: store v1, v0

function %f1(i64, i8x16) {
block0(v0: i64, v1: i8x16):
    store v1, v0
    return
}

; check: load.i8x16 v0
; check: store v1, v0

function %f2(i64, i64) {
block0(v0: i64, v1: i64):
    istore8 v1, v0
    return
}

; check: load.i8 v0
; check: istore8 v1, v0

function %f3(i64, i64) {
block0(v0: i64, v1: i64):
    atomic_store.i64 v1, v0
    return
}

; check: atomic_load.i64 v0
; check: atomic_store v1, v0

crates/wasmtime/src/engine.rs (+1)

@@ -314,6 +314,7 @@ impl Engine {
             "enable_heap_access_spectre_mitigation"
             | "enable_table_access_spectre_mitigation"
             | "enable_nan_canonicalization"
+            | "ensure_precise_store_traps"
             | "enable_jump_tables"
             | "enable_float"
             | "enable_verifier"
