[IR Container] Phase 2.6 Concurrency & Thread Safety #5971
mdavis36 wants to merge 16 commits into md/phase2-copy-move
PR Reviewer Guide
Here are some key observations to aid the review process:
- 🧪 No relevant tests
- ⚡ Recommended focus areas for review: lock ordering consistency
Test failures
- (Low, 1) Tensor numerical mismatches in the nvFuser HopperMatmulTest suite: HopperMatmulTest.PingPongPersistent failed (❌) on H100.
Moved special values (`zero_val_`, `one_val_`, `true_val_`,
`false_val_`, `magic_zero_val_`) from `IrContainer` to the `Fusion`
class. This ensures that with shared containers, each Fusion has its own
special values, preventing ownership conflicts when one Fusion is
destroyed.
**Option Implemented:** Option A (Move Special Values to Fusion) as
recommended in the prompt.
Added private members and public accessors to Fusion class:
```cpp
// Phase 2: Per-Fusion special values
// With shared containers, each Fusion needs its own special values.
// These are raw pointers - memory is owned by IrContainer's vals_up_.
// Destroying this Fusion removes these vals via
// removeStatementsOwnedBy().
Val* zero_val_ = nullptr;
Val* one_val_ = nullptr;
Val* true_val_ = nullptr;
Val* false_val_ = nullptr;
NamedScalar* magic_zero_val_ = nullptr;
```
Public accessors:
- `Val* zeroVal()` - Returns Index 0
- `Val* oneVal()` - Returns Index 1
- `Val* falseVal()` - Returns Bool false
- `Val* trueVal()` - Returns Bool true
- `NamedScalar* magicZeroVal()` - Returns magic zero named scalar
- `Val* zeroVal(DataType dtype)` - Returns 0 for specified dtype
- `Val* oneVal(DataType dtype)` - Returns 1 for specified dtype
Implemented lazy creation pattern for all special value accessors:
```cpp
Val* Fusion::zeroVal() {
  if (!zero_val_) {
    zero_val_ =
        IrBuilder::createInContainer<Val>(this, 0L, DataType::Index);
  }
  return zero_val_;
}
// Similar implementations for oneVal(), falseVal(), trueVal(),
// magicZeroVal()
```
Updated `Fusion::clear()` to reset special value pointers:
```cpp
// Reset per-Fusion special values (they'll be recreated lazily if
// needed)
// The actual Val objects were removed by removeStatementsOwnedBy above.
zero_val_ = nullptr;
one_val_ = nullptr;
true_val_ = nullptr;
false_val_ = nullptr;
magic_zero_val_ = nullptr;
```
Removed special value members and added documentation comment:
```cpp
// Note: Special values (zero_val_, one_val_, true_val_, false_val_,
// magic_zero_val_) are now per-Fusion, stored in Fusion class.
// This avoids ownership conflicts when multiple Fusions share an
// IrContainer.
// See Fusion::zeroVal(), etc. for the per-Fusion implementation.
```
Removed special value accessor implementations (they're now in Fusion).
All call sites were already updated to use `fusion->zeroVal()` instead
of `ir_container()->zeroVal()`. Verified with grep that no call sites
remain using the old pattern.
Added 8 new unit tests for Task 7:
1. **PerFusionSpecialValuesBasic** - Tests that special values are
created and owned by the Fusion
2. **SpecialValuesOwnedByFusion** - Tests that special values are
tracked in `ownedVals()`
3. **SeparateFusionsHaveOwnSpecialValues** - Tests that two Fusions have
different special value objects
4. **DestroyFusionDoesNotAffectOther** - Tests that destroying one
Fusion doesn't affect another's special values
5. **SpecialValuesLazyCreation** - Tests that same value is returned on
repeated calls
6. **AllSpecialValuesPerFusion** - Tests all five special value
accessors
7. **SpecialValuesClearedOnFusionClear** - Tests that `clear()` resets
special values
8. **SpecialValuesWithDtype** - Tests `zeroVal(dtype)` and
`oneVal(dtype)` accessors
```
[==========] Running 34 tests from 3 test suites.
[ PASSED ] 34 tests.
```
```
[==========] Running 26 tests from 1 test suite.
[ PASSED ] 26 tests.
```
Including 8 new Task 7 tests:
- `Phase2ContainerTest.PerFusionSpecialValuesBasic` - PASSED
- `Phase2ContainerTest.SpecialValuesOwnedByFusion` - PASSED
- `Phase2ContainerTest.SeparateFusionsHaveOwnSpecialValues` - PASSED
- `Phase2ContainerTest.DestroyFusionDoesNotAffectOther` - PASSED
- `Phase2ContainerTest.SpecialValuesLazyCreation` - PASSED
- `Phase2ContainerTest.AllSpecialValuesPerFusion` - PASSED
- `Phase2ContainerTest.SpecialValuesClearedOnFusionClear` - PASSED
- `Phase2ContainerTest.SpecialValuesWithDtype` - PASSED
- `csrc/fusion.h` - Added special value members and accessors
- `csrc/fusion.cpp` - Added accessor implementations, updated `clear()`
- `csrc/ir/container.h` - Removed special values, added comment
- `csrc/ir/container.cpp` - Removed accessor implementations
- `tests/cpp/test_phase2_container_sharing.cpp` - Added 8 unit tests
- [x] Each Fusion has its own special values
- [x] Destroying Fusion A doesn't affect Fusion B's special values
- [x] Special value accessors (`zeroVal()`, `oneVal()`, etc.) return
this Fusion's values
- [x] Lazy creation still works (create on first access)
- [x] Smoke tests pass (34/34)
- [x] Unit tests added (8 tests)
- [x] Unit tests pass (26/26 Phase 2 tests)
- [x] Code compiles without errors
- [x] REPORT.md delivered
1. **Memory ownership:** Special values are raw pointers stored in
Fusion, but the actual memory is owned by IrContainer's `vals_up_`. When
a Fusion is destroyed, `removeStatementsOwnedBy()` cleans up these vals.
2. **Lazy creation pattern:** Special values are created on first
access. This matches the original IrContainer behavior and avoids
creating values that aren't needed.
3. **Clear handling:** `Fusion::clear()` now resets special value
pointers to nullptr after `removeStatementsOwnedBy()` removes the actual
Val objects. This ensures lazy recreation works correctly after clear.
4. **Copy/move handling:** Will be addressed in Tasks 5 and 6. This task
just moves the members and accessors.
Moved `axioms_` and `metadata_` from `IrContainer` to the `Fusion` class. This completes the deprecation of `parent_` usage for val-creating methods, which was necessary because `parent_` implies a 1-1 relationship (container → Fusion), but Phase 2 has 1-many (shared containers).
Methods that used `parent_` to create vals were moved to Fusion:
- `metadataOf(Val*)` - Now uses `v->container()` to get owning Fusion
- `axioms()` - Now creates axiom vals owned by `this` Fusion
- `assumePositive/assumeNonNegative` - Now adds to `this` Fusion's axioms
Changes in Fusion:
- Added `axioms_` and `metadata_` private members
- Changed method declarations from forwarding to actual implementations
- Added includes for `ir/builder.h` and `ir/internal_nodes.h`
- Implemented `metadataOf()`, `axioms()`, `assumePositive()`, `assumeNonNegative()` methods
- Updated `clear()` to reset `axioms_` and `metadata_`
Changes in IrContainer:
- Removed `metadataOf()`, `axioms()`, `assumePositive()`, `assumeNonNegative()` declarations
- Removed `lazyInitAxioms()` declaration
- Removed `axioms_` and `metadata_` members
- Removed implementations of above methods
- Updated `IrContainer::swap` to remove axioms_/metadata_ swapping
- Updated `IrContainer::copy` to remove axioms_/metadata_ handling
- Updated `IrContainer::clear` to remove axioms_/metadata_ clearing
Each Fusion now has its own axioms and metadata cache. This ensures:
1. No ownership conflicts when multiple Fusions share an IrContainer
2. Correct behavior when one Fusion is destroyed (doesn't affect others)
3. Lazy creation pattern preserved (create on first access)
This is a prerequisite for the copy/move semantics changes, which will swap/transfer these per-Fusion members.
- Add missing swap of axioms_ and metadata_ in Fusion::swap to prevent dangling pointers after move/assignment
- Add missing cloning of axioms_ and metadata_ in Fusion::copy to preserve custom assumptions and metadata cache across copies
- Guard Fusion::removeVal against removing cached special vals
- Use std::unique_ptr for special vals and steal from vals_up_ to preserve the original invariant (shortcuts in vals_ but not vals_up_)
- Fix metadataOf to use 'this' instead of v->container()
The old IrContainer approach popped special vals (zeroVal, oneVal, etc.) from vals_up_ after creation. During Fusion::copy, these vals were not cloned through the normal deterministic_vals() path. Instead, they were first cloned during axiom cloning, which happened AFTER val_type_name_map_ was overridden from the source — causing the name counter to be incremented 1 past the source value. Now that special vals remain in vals_up_, they are properly cloned before the counter override, so the counter stays accurate. This shifts loop index val names down by 1 (e.g., i113 instead of i114). The index expression structure is unchanged.
Special vals (trueVal, falseVal, oneVal, etc.) can be lazily created inside a StatementGuard scope (e.g. by simplifyExpr called from haveDifferentShardings). When the guard rolls back, it pops vals_up_ back to the snapshot, destroying those vals while the Fusion cache pointers still reference them. Subsequent calls return dangling pointers causing UB — this manifested as LoopShardedSplitReshapeIds incorrectly classifying a reshape as resharding on CI. Fusion::removeStatementsCreatedAfter now nulls out any special val cache pointers that are about to be destroyed, so they get re-created on next access.
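The rollback fix described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToyVal`, `ToyFusion`, `numVals()`, and `hasCachedZero()` are hypothetical stand-ins, not the nvFuser classes, and only a single cached special value is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>

// Hypothetical stand-in for a Val; not the real nvFuser type.
struct ToyVal {
  long value;
};

// Hypothetical stand-in for a Fusion that lazily caches one special value.
class ToyFusion {
 public:
  // Lazily create the cached zero value on first access.
  ToyVal* zeroVal() {
    if (zero_val_ == nullptr) {
      vals_up_.push_back(std::make_unique<ToyVal>(ToyVal{0}));
      zero_val_ = vals_up_.back().get();
    }
    return zero_val_;
  }

  std::size_t numVals() const { return vals_up_.size(); }

  // Roll back to a snapshot. Before a val is destroyed, null out the cache
  // pointer if it refers to that val, so the next access re-creates it
  // instead of returning a dangling pointer.
  void removeStatementsCreatedAfter(std::size_t snapshot) {
    while (vals_up_.size() > snapshot) {
      if (vals_up_.back().get() == zero_val_) {
        zero_val_ = nullptr;
      }
      vals_up_.pop_back();
    }
  }

  bool hasCachedZero() const { return zero_val_ != nullptr; }

 private:
  std::deque<std::unique_ptr<ToyVal>> vals_up_;
  ToyVal* zero_val_ = nullptr;
};
```

Without the check inside the loop, `zero_val_` would keep pointing at freed memory after the rollback, which is the failure mode the commit message describes.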
SubstituteInExpr directly sets mutations_[reference] = substitute
without checking reference == substitute, unlike registerMutation
which guards against this. With per-Fusion special vals, Fusion::copy
now maps zero_val_ through the cloner so that IterDomain extents and
zero_val_ share the same pointer. When concretizeEmptyExtents finds
an extent that IS zero_val_, SubstituteInExpr created a self-mapping
that tripped the two-hop assertion in maybeMutated.
Why this didn't happen before:
Old code (main):
zero_val_ was stored in a separate unique_ptr, popped from
vals_up_. Fusion::copy didn't wire it up — B->zeroVal() lazily
created a brand new Val, so ext != zero always held.
New code (this branch):
zero_val_ lives in vals_up_ like any other Val. Fusion::copy
remaps it via ir_cloner.clone(), so B->zero_val_ IS the same
cloned Val that IterDomain extents reference:
Fusion A Fusion B (clone)
┌─────────────────┐ ┌──────────────────┐
│ zero_val_ ─► 0x1111 │ zero_val_ ─► 0x2222
│ id->extent() ─► 0x1111 │ id->extent() ─► 0x2222
└─────────────────┘ └──────────────────┘
clone maps 0x1111 → 0x2222
So ext == zero, and SubstituteInExpr(ext, zero) created:
mutations_[0x2222] = 0x2222 (self-mapping)
Then maybeMutated looked up 0x2222, found itself, treated
it as a two-hop chain, and asserted.
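A minimal sketch of the guard, assuming a simplified mutation map. `ToyVal`, `ToyMutator`, and `numMutations()` are hypothetical names for illustration; the real `registerMutation` and `maybeMutated` do more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for a Val.
struct ToyVal {
  int id;
};

// Hypothetical, simplified mutation table.
class ToyMutator {
 public:
  // Guarded registration: a self-mapping carries no information and would
  // later look like a two-hop chain, so it is skipped.
  void registerMutation(ToyVal* reference, ToyVal* substitute) {
    if (reference == substitute) {
      return;
    }
    mutations_[reference] = substitute;
  }

  // Look up a val. The substitute must not itself be remapped
  // (this models the two-hop assertion).
  ToyVal* maybeMutated(ToyVal* v) const {
    auto it = mutations_.find(v);
    if (it == mutations_.end()) {
      return v;
    }
    assert(mutations_.find(it->second) == mutations_.end());
    return it->second;
  }

  std::size_t numMutations() const { return mutations_.size(); }

 private:
  std::unordered_map<ToyVal*, ToyVal*> mutations_;
};
```

With the early return in place, substituting an extent that already is the zero value becomes a no-op rather than an assertion failure.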
Inlines registerVal, registerExpr, removeVal, and removeExpr logic directly into Fusion, eliminating the delegation to IrContainer. This consolidates the registration path after per-Fusion special values were moved from IrContainer to Fusion. Also removes vestigial friend class StatementGuard from IrContainer (it only uses public Fusion API) and adds Fusion as a friend of IrContainerPasskey so it can construct passkeys for setName() calls.
Change Fusion::ir_container_ from unique_ptr to shared_ptr to enable future container sharing between Fusions. Add Fusion tracking API to IrContainer (addFusion/removeFusion/transferFusion/sharingCount). Remove IrContainer::parent_ since the 1:1 relationship no longer holds. Disable parallel compilation during the shared_ptr transition.
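As an illustration of the ownership change, the sketch below shows two owners sharing one container through `std::shared_ptr`, with the container tracking who shares it. `ToyContainer` and `ToyFusion` are hypothetical; only the method names `addFusion`, `removeFusion`, and `sharingCount` come from the commit message, and their real signatures are assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <utility>

class ToyFusion;

// Hypothetical container that tracks which owners share it.
class ToyContainer {
 public:
  void addFusion(const ToyFusion* f) { sharing_.insert(f); }
  void removeFusion(const ToyFusion* f) { sharing_.erase(f); }
  std::size_t sharingCount() const { return sharing_.size(); }

 private:
  std::unordered_set<const ToyFusion*> sharing_;
};

// Hypothetical owner holding the container by shared_ptr.
class ToyFusion {
 public:
  // Default: create a private container.
  ToyFusion() : container_(std::make_shared<ToyContainer>()) {
    container_->addFusion(this);
  }
  // Share an existing container with another owner.
  explicit ToyFusion(std::shared_ptr<ToyContainer> c)
      : container_(std::move(c)) {
    container_->addFusion(this);
  }
  ToyFusion(const ToyFusion&) = delete;
  ToyFusion& operator=(const ToyFusion&) = delete;
  ~ToyFusion() { container_->removeFusion(this); }

  const std::shared_ptr<ToyContainer>& container() const {
    return container_;
  }

 private:
  std::shared_ptr<ToyContainer> container_;
};
```

Because the container no longer has a single parent, a back-pointer like `parent_` cannot identify the owner; a set of sharing owners can.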
Greptile Summary
Confidence Score: 4/5
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Fusion::registerVal/Expr/removeVal/Expr] --> B[Acquire unique_lock on mutex_]
B --> C[Call ContainerMutator static method]
C --> D[Direct field access - lock-free]
D --> E[May call nested ContainerMutator methods]
E --> D
D --> F[Release lock when function returns]
G[IrContainer public accessors] --> H{Read or Write?}
H -->|Read| I[Acquire shared_lock]
H -->|Write| J[Acquire unique_lock]
I --> K[Access internal data]
J --> K
K --> L[Return/Release lock]
M[Nested Call Example:<br/>removeVal → removeExpr] --> N[Lock acquired once in removeVal]
N --> O[ContainerMutator::removeVal<br/>calls ContainerMutator::removeExpr]
O --> P[Both execute under same lock<br/>No deadlock!]
style B fill:#90EE90
style C fill:#87CEEB
style I fill:#FFD700
style J fill:#FF6B6B
style P fill:#90EE90
Last reviewed commit: 31bccb9
const std::unordered_set<Expr*>& IrContainer::unordered_exprs() const noexcept {
  std::shared_lock lock(mutex_);
  return exprs_;
}

const std::unordered_set<Val*>& IrContainer::vals() const noexcept {
  std::shared_lock lock(mutex_);
  return vals_;
}
Reference outlives lock scope
unordered_exprs(), vals(), sharingFusions(), valsOwnedBy(), and exprsOwnedBy() all return const& to internal data, but the shared_lock is released when the function returns. This means callers hold an unprotected reference to the data structure, and any concurrent writer (under unique_lock) could mutate or invalidate it.
This is safe in Phase 2 (containers are not shared across threads), but in Phase 3 where multiple threads write to the same IrContainer, this pattern will produce data races. Consider documenting this as a known limitation to address in Phase 3, or returning by value for these accessors (at the cost of a copy).
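One way to picture the return-by-value alternative is the sketch below. It is not the PR's code: `ToyContainer`, `insert`, and `valsSnapshot` are hypothetical names, and `int` stands in for `Val*`.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <unordered_set>

// Hypothetical container; int stands in for Val*.
class ToyContainer {
 public:
  // Writer: exclusive lock for the duration of the mutation.
  void insert(int v) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    vals_.insert(v);
  }

  // Reader: the copy is made while the shared lock is held, so the caller
  // receives a snapshot that later writers cannot invalidate.
  std::unordered_set<int> valsSnapshot() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return vals_;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::unordered_set<int> vals_;
};
```

The trade-off is the one the review comment names: each call pays for a copy, in exchange for a result that stays valid after the lock is released.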
Add per_fusion_vals_ / per_fusion_exprs_ maps to IrContainer so each Fusion can efficiently query only its own statements in a shared container. Fusion forwarding methods (vals(), unordered_exprs(), deterministic_vals(), etc.) now return per-Fusion filtered results. Fusion::clear() uses removeStatementsOwnedBy(this) instead of ir_container()->clear().
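A rough sketch of per-owner bookkeeping in a shared container follows. The member name `per_fusion_vals_` and the method names `valsOwnedBy` and `removeStatementsOwnedBy` appear in this PR, but the signatures, the `ToyFusion` tag type, and the use of `int` in place of `Val*` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>

// Hypothetical owner tag; only its address is used.
struct ToyFusion {};

// Hypothetical shared container; int stands in for Val*.
class ToyContainer {
 public:
  // Record the val globally and under its owner.
  void registerVal(const ToyFusion* owner, int val) {
    vals_.insert(val);
    per_fusion_vals_[owner].insert(val);
  }

  // Return only the vals that belong to the given owner.
  std::unordered_set<int> valsOwnedBy(const ToyFusion* owner) const {
    auto it = per_fusion_vals_.find(owner);
    if (it == per_fusion_vals_.end()) {
      return {};
    }
    return it->second;
  }

  // Remove one owner's vals without touching anyone else's.
  void removeStatementsOwnedBy(const ToyFusion* owner) {
    auto it = per_fusion_vals_.find(owner);
    if (it == per_fusion_vals_.end()) {
      return;
    }
    for (int v : it->second) {
      vals_.erase(v);
    }
    per_fusion_vals_.erase(it);
  }

  std::size_t numVals() const { return vals_.size(); }

 private:
  std::unordered_set<int> vals_;
  std::unordered_map<const ToyFusion*, std::unordered_set<int>>
      per_fusion_vals_;
};
```

This is why `clear()` on one owner can no longer call a container-wide clear: that would also destroy statements belonging to the other owners.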
Copy constructor now shares the source's container pointer instead of creating a new one. Fusion::copy clones directly from per-Fusion filtered vals rather than delegating to IrContainer::copy. Swap changed from content-based (IrContainer::swap) to pointer-based with per-Fusion ownership tracking for both same-container and different-container cases.
Move val/expr name counters from IrContainer to Fusion so each Fusion independently tracks name assignment. This fixes CI failures where Fusion::copy left the dest counter at N (number of cloned vals) instead of max(name)+1 when source names were non-sequential, causing newly created TVs to collide with existing names. The fix adds val_type_name_map_ and expr_name_counter_ to Fusion, and updates registerVal/registerExpr to use the Fusion-level counters. Fusion::copy syncs counters from source to dest after cloning. Fusion::swap exchanges counters. Fusion::clear resets them.
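The collision can be shown with plain numbers. The helper below is hypothetical and only illustrates the invariant that the next name must exceed every existing name; the commit itself restores that invariant by syncing counters from the source Fusion rather than recomputing a maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: the smallest counter value that cannot collide with
// any name already present, i.e. max(name) + 1 (0 for an empty set).
inline unsigned nextNameAfterCopy(const std::vector<unsigned>& cloned_names) {
  unsigned next = 0;
  for (unsigned n : cloned_names) {
    next = std::max(next, n + 1);
  }
  return next;
}
```

For cloned names 0, 3, and 7, a counter set to the number of vals (3) would hand out name 3 again, while `nextNameAfterCopy` yields 8.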
Add std::shared_mutex to IrContainer to protect shared mutable state during concurrent access from parallel compilation threads.
- IrContainer public methods self-lock (shared_lock for reads, unique_lock for writes)
- Fusion mutation methods (registerVal/Expr, removeVal/Expr, removeStatementsCreatedAfter) acquire unique_lock then delegate to lock-free ContainerMutator static methods, avoiding self-deadlock on nested calls (e.g., removeVal → removeExpr)
- ContainerMutator is a PIMPL struct defined only in fusion.cpp, keeping lock-free impl methods out of the header
- Remove kPhase2DisableParallelCompile guard, re-enabling parallel compilation now that the mutex is in place
- Delete dead IrContainer::copy() and IrContainer::swap() methods
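The two-layer pattern can be sketched as follows, under stated assumptions: `ToyContainer` and `ToyMutator` are hypothetical, integers stand in for `Val*` and `Expr*`, and each expression has a single input. The public methods take the lock exactly once; the static helpers assume it is already held and may call each other freely.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct ToyMutator;  // forward declaration, as a header would carry it

// Hypothetical container; ints stand in for Val* and Expr*.
class ToyContainer {
 public:
  void registerExpr(int expr, int input_val);
  void removeExpr(int expr);
  void removeVal(int val);

  std::size_t numExprs() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return exprs_.size();
  }
  std::size_t numVals() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return vals_.size();
  }

 private:
  friend struct ToyMutator;
  mutable std::shared_mutex mutex_;
  std::unordered_set<int> vals_;
  std::unordered_map<int, int> exprs_;  // expr id -> input val id
};

// Inner layer: no locking here. Every method assumes the caller already
// holds the unique lock, so nested calls cannot self-deadlock.
struct ToyMutator {
  static void registerExpr(ToyContainer& c, int expr, int input_val) {
    c.vals_.insert(input_val);
    c.exprs_[expr] = input_val;
  }
  static void removeExpr(ToyContainer& c, int expr) { c.exprs_.erase(expr); }
  static void removeVal(ToyContainer& c, int val) {
    std::vector<int> uses;
    for (const auto& kv : c.exprs_) {
      if (kv.second == val) {
        uses.push_back(kv.first);
      }
    }
    for (int e : uses) {
      removeExpr(c, e);  // nested call under the same lock
    }
    c.vals_.erase(val);
  }
};

// Outer layer: acquire the unique lock once, then delegate.
void ToyContainer::registerExpr(int expr, int input_val) {
  std::unique_lock<std::shared_mutex> lock(mutex_);
  ToyMutator::registerExpr(*this, expr, input_val);
}
void ToyContainer::removeExpr(int expr) {
  std::unique_lock<std::shared_mutex> lock(mutex_);
  ToyMutator::removeExpr(*this, expr);
}
void ToyContainer::removeVal(int val) {
  std::unique_lock<std::shared_mutex> lock(mutex_);
  ToyMutator::removeVal(*this, val);
}
```

Had `ToyContainer::removeVal` called the public `removeExpr` instead, the second `unique_lock` on the same non-recursive mutex would be undefined behavior and would typically hang.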
…tainers Update Fusion::removeStatementsCreatedAfter to compare per-Fusion counts (from exprsOwnedBy(this) and numValsExcludingShortcuts()) instead of global deque sizes. This correctly handles shared containers where other Fusions' statements would inflate the global counts. Add NVF_ERROR assertions to verify the LIFO invariant: the tail element of the global deque must belong to this Fusion. If violated, another Fusion appended concurrently (should be prevented by PR #5971 locking). Remove now-unnecessary deque size validation checks.
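A toy version of the per-owner rollback with the LIFO check might look like this. All names apart from `removeStatementsCreatedAfter` are hypothetical, a plain `assert` stands in for `NVF_ERROR`, and only vals are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_map>

// Hypothetical owner tag; only its address is used.
struct ToyFusion {};

// Hypothetical shared container that records the owner of each val in
// creation order.
class ToyContainer {
 public:
  void registerVal(const ToyFusion* owner) {
    owners_.push_back(owner);
    ++owned_count_[owner];
  }

  std::size_t numValsOwnedBy(const ToyFusion* owner) const {
    auto it = owned_count_.find(owner);
    return it == owned_count_.end() ? 0 : it->second;
  }

  std::size_t numVals() const { return owners_.size(); }

  // Roll back until this owner's count matches its snapshot. The snapshot
  // is a per-owner count, so other owners' vals do not inflate it.
  void removeStatementsCreatedAfter(
      const ToyFusion* owner,
      std::size_t snapshot) {
    while (numValsOwnedBy(owner) > snapshot) {
      // LIFO invariant: the newest val in the shared deque must be ours.
      assert(!owners_.empty() && owners_.back() == owner);
      owners_.pop_back();
      --owned_count_[owner];
    }
  }

 private:
  std::deque<const ToyFusion*> owners_;
  std::unordered_map<const ToyFusion*, std::size_t> owned_count_;
};
```

Comparing against the global deque size instead would treat another owner's earlier vals as if they were created after the snapshot.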
Summary
Add `std::shared_mutex` to IrContainer for concurrent read access during parallel compilation, remove the `kPhase2DisableParallelCompile` serialization guard introduced in PR 1, and validate that the full test suite passes with parallel compilation re-enabled.
This is a future-proofing and defensive correctness change. Phase 2's `makeFusion` path does NOT share containers (each segment gets its own container via the default constructor), so parallel compilation is technically safe without the mutex. However, Phase 3 will change `makeFusion` to use the copy constructor (shared container), at which point multiple threads will write to the same `IrContainer` concurrently. The mutex must be in place before Phase 3 can enable that.
The Nested Call Problem and ContainerMutator
Five Fusion methods directly access IrContainer's internal fields because statement registration was moved from IrContainer to Fusion previously:
`removeVal()` calls `removeExpr()`, and `registerExpr()` also calls `removeExpr()`. Since `std::shared_mutex` is not recursive, acquiring `unique_lock` in both the outer and inner methods would deadlock.
The solution is a two-layer locking architecture:
`ContainerMutator` is forward-declared in `fusion.h` (2 lines) and fully defined in `fusion.cpp`. This keeps the header clean and makes the locking architecture self-documenting: everything inside `ContainerMutator` assumes the lock is already held.
Thread Safety Analysis
Dead Code Removal
Investigation revealed that `IrContainer::copy()` and `IrContainer::swap()` have zero call sites; all copy/move/swap semantics are handled at the Fusion level after previous work. Removing them eliminates ~45 lines of dead code and avoids complex dual-locking patterns.
Relationship to Phase 2
This PR completes the Phase 2 architectural work. With thread safety in place, the full shared scalar infrastructure is ready for Phase 3:
CI Risk
Low-medium. This is the first CI run with parallel compilation re-enabled since PR #5961 serialized it. Any latent concurrency issues would surface here. The parallel compilation path doesn't share containers in Phase 2, so the mutex is defensive — but re-enabling parallelism exercises the full concurrent codegen pipeline.