[IR Container] Phase 2.6 Concurrency & Thread Safety by mdavis36 · Pull Request #5971 · NVIDIA/Fuser

mdavis36 · 2026-02-18T02:58:52Z

Summary

Add std::shared_mutex to IrContainer for concurrent read access during parallel compilation, remove the kPhase2DisableParallelCompile serialization guard introduced in PR 1, and validate that the full test suite passes with parallel compilation re-enabled.

This is a future-proofing and defensive correctness change. Phase 2's makeFusion path does NOT share containers (each segment gets its own container via the default constructor), so parallel compilation is technically safe without the mutex. However, Phase 3 will change makeFusion to use the copy constructor (shared container), at which point multiple threads will write to the same IrContainer concurrently. The mutex must be in place before Phase 3 can enable that.

The Nested Call Problem and ContainerMutator

Five Fusion methods access IrContainer's internal fields directly, because statement registration was previously moved from IrContainer to Fusion:

registerVal()  → writes vals_up_, vals_, per_fusion_vals_
registerExpr() → writes exprs_up_, exprs_, per_fusion_exprs_   (calls removeExpr for SSA)
removeVal()    → writes vals_up_, vals_, per_fusion_vals_      (calls removeExpr)
removeExpr()   → writes exprs_up_, exprs_, per_fusion_exprs_
removeStatementsCreatedAfter() → calls all of the above

removeVal() calls removeExpr(), and registerExpr() also calls removeExpr(). Since std::shared_mutex is not recursive, acquiring unique_lock in both the outer and inner methods would deadlock.

The solution is a two-layer locking architecture:

Layer 1: IrContainer public methods (self-locking)
  Read methods:  std::shared_lock(mutex_)    — concurrent reads OK
  Write methods: std::unique_lock(mutex_)   — exclusive access

Layer 2: Fusion methods that bypass IrContainer (ContainerMutator)
  Public method: acquires std::unique_lock(ir_container()->mutex_)
  Delegates to:  ContainerMutator static methods (lock-free, direct field access)
  Nested calls:  go through ContainerMutator → safe, already under lock

ContainerMutator is forward-declared in fusion.h (2 lines) and fully defined in fusion.cpp. This keeps the header clean and makes the locking architecture self-documenting: everything inside ContainerMutator assumes the lock is already held.
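The two-layer pattern can be sketched in isolation. This is a minimal illustration, not the code in fusion.cpp: `MiniContainer`, its nested `Mutator`, and the int-based vals and exprs are hypothetical stand-ins for Fusion, ContainerMutator, and the real IR types. The public mutators take the unique lock once, and nested calls stay inside the lock-free layer.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the Fusion/IrContainer pair. Vals and exprs are
// plain ints here; each expr records the single val it consumes.
class MiniContainer {
 public:
  // Layer 1: self-locking read accessors (concurrent reads are allowed).
  std::size_t numVals() const {
    std::shared_lock lock(mutex_);
    return vals_.size();
  }
  std::size_t numExprs() const {
    std::shared_lock lock(mutex_);
    return exprs_.size();
  }

  // Layer 2: public mutators take the unique lock once, then delegate.
  void registerVal(int val);
  void registerExpr(int expr, int input_val);
  void removeExpr(int expr);
  void removeVal(int val);

 private:
  struct Mutator;  // defined below; every method assumes mutex_ is held

  mutable std::shared_mutex mutex_;
  std::unordered_set<int> vals_;
  std::unordered_map<int, int> exprs_;  // expr id -> input val id
};

struct MiniContainer::Mutator {
  static void removeExpr(MiniContainer& c, int expr) {
    c.exprs_.erase(expr);
  }

  static void removeVal(MiniContainer& c, int val) {
    // Collect the exprs that consume val before mutating the map.
    std::vector<int> users;
    for (const auto& [expr, input] : c.exprs_) {
      if (input == val) {
        users.push_back(expr);
      }
    }
    // The nested call stays inside Mutator, so the lock is never re-acquired.
    for (int expr : users) {
      removeExpr(c, expr);
    }
    c.vals_.erase(val);
  }
};

void MiniContainer::registerVal(int val) {
  std::unique_lock lock(mutex_);
  vals_.insert(val);
}

void MiniContainer::registerExpr(int expr, int input_val) {
  std::unique_lock lock(mutex_);
  exprs_[expr] = input_val;
}

void MiniContainer::removeExpr(int expr) {
  std::unique_lock lock(mutex_);
  Mutator::removeExpr(*this, expr);
}

void MiniContainer::removeVal(int val) {
  std::unique_lock lock(mutex_);
  Mutator::removeVal(*this, val);
}
```

Had `Mutator::removeVal` called the public `MiniContainer::removeExpr` instead, the second `unique_lock` on the same non-recursive mutex would never be granted.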

Thread Safety Analysis

                        Phase 2                            Phase 3
                        ───────                            ───────
makeFusion behavior:    Default ctor + Fusion::copy        Copy ctor (shared container)
Container sharing:      No (each segment gets its own)     Yes (scalars reused)
Thread safety needed:   No (reads only on completeFusion)  Yes (concurrent writes)
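The Phase 3 scenario (several compile threads writing to one container) can be sketched as follows. `SharedVals` and `registerFromThreads` are hypothetical names used only for illustration; the real IrContainer stores Val and Expr objects, not ints.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical shared container: writers take unique_lock, readers shared_lock.
class SharedVals {
 public:
  void registerVal(int v) {
    std::unique_lock lock(mutex_);
    vals_.push_back(v);
  }
  std::size_t numVals() const {
    std::shared_lock lock(mutex_);
    return vals_.size();
  }

 private:
  mutable std::shared_mutex mutex_;
  std::vector<int> vals_;
};

// Each thread stands in for one segment compile that registers its own vals
// into the same container. Returns the final number of registered vals.
std::size_t registerFromThreads(int num_threads, int vals_per_thread) {
  SharedVals container;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&container, t, vals_per_thread] {
      for (int i = 0; i < vals_per_thread; ++i) {
        container.registerVal(t * vals_per_thread + i);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return container.numVals();
}
```

Without the `unique_lock` in `registerVal`, concurrent `push_back` calls on the same vector would be a data race.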

Dead Code Removal

Investigation revealed that IrContainer::copy() and IrContainer::swap() have zero call sites — all copy/move/swap semantics are handled at the Fusion level after previous work. Removing them eliminates ~45 lines of dead code and avoids complex dual-locking patterns.

Relationship to Phase 2

This PR completes the Phase 2 architectural work. With thread safety in place, the full shared scalar infrastructure is ready for Phase 3.

CI Risk

Low-medium. This is the first CI run with parallel compilation re-enabled since PR #5961 serialized it. Any latent concurrency issues would surface here. The parallel compilation path doesn't share containers in Phase 2, so the mutex is defensive — but re-enabling parallelism exercises the full concurrent codegen pipeline.

mdavis36 · 2026-02-18T02:58:58Z

!test

github-actions · 2026-02-18T02:59:48Z

Description

Add std::shared_mutex to IrContainer for thread-safe concurrent access
Implement ContainerMutator PIMPL pattern with lock-free static methods
Add shared_lock for reads and unique_lock for writes in all container methods
Remove kPhase2DisableParallelCompile guard to re-enable parallel compilation
Delete dead IrContainer::copy() and IrContainer::swap() methods

Changes walkthrough

Relevant files

Enhancement

container.h (csrc/ir/container.h, +14/-20): Add shared_mutex and move method implementations
- Add #include and mutable std::shared_mutex mutex_ member
- Move method implementations from header to cpp file (unordered_exprs, vals, numExprs, numVals)
- Add private helper method declarations (inContainerImpl, assertInContainerImpl)
- Remove static copy() and swap() method declarations

container.cpp (csrc/ir/container.cpp, +49/-48): Add locking to all container methods
- Add std::shared_lock to all read-only methods (deterministic_vals, deterministic_exprs, etc.)
- Add std::unique_lock to all write methods (addFusion, removeFusion, etc.)
- Add locking to all container state access methods
- Remove dead IrContainer::swap() and IrContainer::copy() implementations
- Add inContainerImpl and assertInContainerImpl helper methods

fusion.h (csrc/fusion.h, +4/-0): Add ContainerMutator PIMPL forward declaration
- Add forward declaration of ContainerMutator struct
- Make ContainerMutator a friend struct for access to container internals

fusion.cpp (csrc/fusion.cpp, +213/-198): Implement ContainerMutator PIMPL with lock-free methods
- Add ContainerMutator struct with static lock-free methods (removeExpr, removeVal, registerVal, registerExpr, removeStatementsCreatedAfter)
- Replace Fusion method implementations with unique_lock + delegation to ContainerMutator
- Add std::unique_lock to Fusion::swap method
- Move all mutation logic to ContainerMutator static methods

fusion_kernel_runtime.cpp (csrc/runtime/fusion_kernel_runtime.cpp, +2/-5): Re-enable parallel compilation
- Remove kPhase2DisableParallelCompile guard constant
- Re-enable parallel compilation by removing the disable condition

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 No relevant tests

⚡ Recommended focus areas for review

Lock ordering consistency

The PR uses std::unique_lock in Fusion methods and std::shared_lock in IrContainer methods. Need to verify that the locking order is consistent across the codebase to prevent potential deadlocks, especially when Fusion methods call IrContainer methods that might also acquire locks.

void Fusion::removeExpr(Expr* expr) {
  std::unique_lock lock(ir_container()->mutex_);
  ContainerMutator::removeExpr(this, expr);
}

void Fusion::removeVal(Val* val) {
  std::unique_lock lock(ir_container()->mutex_);
  ContainerMutator::removeVal(this, val);
}

void Fusion::removeStatementsCreatedAfter(
    int64_t num_exprs_before,
    int64_t num_vals_before) {
  std::unique_lock lock(ir_container()->mutex_);
  ContainerMutator::removeStatementsCreatedAfter(
      this, num_exprs_before, num_vals_before);
}

Thread safety of ContainerMutator delegation

The ContainerMutator static methods are called while holding unique_lock on ir_container()->mutex_, but they access ir_container() internals directly. Need to ensure this pattern is safe and doesn't introduce race conditions when multiple threads are accessing different Fusion instances that share the same IrContainer.

    const noexcept {
  std::shared_lock lock(mutex_);
  std::unordered_map<Val*, int64_t> vals_map;
  int64_t count = 0;
  std::transform(
      vals_up_.begin(),
      vals_up_.end(),
      std::inserter(vals_map, vals_map.end()),
      [&count](const std::unique_ptr<Val>& val_up) {
        return std::make_pair(val_up.get(), count++);
      });
  return vals_map;
}

//! Return mapping from expression to integer id
const std::unordered_map<Expr*, int64_t> IrContainer::deterministic_exprs_map()
    const noexcept {
  std::shared_lock lock(mutex_);
  std::unordered_map<Expr*, int64_t> exprs_map;
  int64_t count = 0;
  std::transform(
      exprs_up_.begin(),
      exprs_up_.end(),
      std::inserter(exprs_map, exprs_map.end()),
      [&count](const std::unique_ptr<Expr>& expr_up) {
        return std::make_pair(expr_up.get(), count++);
      });
  return exprs_map;
}

IrContainer::IrContainer() = default;

IrContainer::~IrContainer() {
  clear();
}

void IrContainer::clear() noexcept {
  FUSER_PERF_SCOPE("IrContainer clear");
  vals_.clear();
  vals_up_.clear();
  exprs_.clear();
  exprs_up_.clear();
  val_type_name_map_.clear();
  expr_name_counter_ = 0;
  per_fusion_vals_.clear();
  per_fusion_exprs_.clear();
}

bool IrContainer::inContainer(const Statement* const_stmt) const {
  std::shared_lock lock(mutex_);
  return inContainerImpl(const_stmt);
}

bool IrContainer::inContainerImpl(const Statement* const_stmt) const {
  // We don't use dynamic_cast here because `const_stmt` may be an invalid
  // pointer. Specifically a pointer to a Statement owned by another container
  // that has been freed.

  // NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast)
  void* raw_ptr = const_cast<void*>(reinterpret_cast<const void*>(const_stmt));
  if (exprs_.count(reinterpret_cast<Expr*>(raw_ptr)) == 0 &&
      vals_.count(reinterpret_cast<Val*>(raw_ptr)) == 0) {
    return false;
  }

  NVF_ERROR(
      sharing_fusions_.count(const_stmt->container()) > 0,
      "Container claims to own stmt, but stmt disagrees.");

  // NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast)
  auto* stmt = const_cast<Statement*>(const_stmt);
  if (stmt->isExpr()) {
    NVF_ERROR(
        exprs_.find(stmt->as<Expr>()) != exprs_.end(),
        "Somehow container claims to and not to own an Expr.");
  }
  if (stmt->isVal()) {
    NVF_ERROR(
        vals_.find(stmt->as<Val>()) != vals_.end(),
        "Somehow container claims to and not to own an Val.");
  }

  return true;
}

void IrContainer::assertInContainerImpl(
    const Statement* stmt,
    const std::string& msg) const {
  NVF_CHECK(inContainerImpl(stmt), msg, " it was not found in the active container.");
}

const std::unordered_set<Expr*>& IrContainer::unordered_exprs() const noexcept {
  std::shared_lock lock(mutex_);
  return exprs_;
}

const std::unordered_set<Val*>& IrContainer::vals() const noexcept {
  std::shared_lock lock(mutex_);
  return vals_;
}

int64_t IrContainer::numExprs() const noexcept {
  std::shared_lock lock(mutex_);
  return std::ssize(exprs_);
}

int64_t IrContainer::numVals() const noexcept {
  std::shared_lock lock(mutex_);
  return std::ssize(vals_up_);
}

void IrContainer::addFusion(Fusion* fusion) {
  std::unique_lock lock(mutex_);
  sharing_fusions_.insert(fusion);
}

void IrContainer::removeFusion(Fusion* fusion) {
  std::unique_lock lock(mutex_);
  sharing_fusions_.erase(fusion);
}

void IrContainer::transferFusion(Fusion* from, Fusion* to) {
  std::unique_lock lock(mutex_);
  sharing_fusions_.erase(from);
  sharing_fusions_.insert(to);
}

size_t IrContainer::sharingCount() const {
  std::shared_lock lock(mutex_);
  return sharing_fusions_.size();
}

bool IrContainer::hasMultipleFusions() const {
  std::shared_lock lock(mutex_);
  return sharing_fusions_.size() > 1;
}

const std::unordered_set<Fusion*>& IrContainer::sharingFusions() const {
  std::shared_lock lock(mutex_);
  return sharing_fusions_;
}

const std::unordered_set<Val*>& IrContainer::valsOwnedBy(
    const Fusion* fusion) const {
  std::shared_lock lock(mutex_);
  static const std::unordered_set<Val*> empty;
  auto it = per_fusion_vals_.find(fusion);
  return it != per_fusion_vals_.end() ? it->second : empty;
}

const std::unordered_set<Expr*>& IrContainer::exprsOwnedBy(
    const Fusion* fusion) const {
  std::shared_lock lock(mutex_);
  static const std::unordered_set<Expr*> empty;
  auto it = per_fusion_exprs_.find(fusion);
  return it != per_fusion_exprs_.end() ? it->second : empty;
}

void IrContainer::transferStatementOwnership(
    const Fusion* from,
    const Fusion* to) {
  std::unique_lock lock(mutex_);
  auto vals_it = per_fusion_vals_.find(from);
  if (vals_it != per_fusion_vals_.end()) {
    auto& to_vals = per_fusion_vals_[to];
    to_vals.insert(vals_it->second.begin(), vals_it->second.end());
    per_fusion_vals_.erase(vals_it);
  }

  auto exprs_it = per_fusion_exprs_.find(from);
  if (exprs_it != per_fusion_exprs_.end()) {
    auto& to_exprs = per_fusion_exprs_[to];
    to_exprs.insert(exprs_it->second.begin(), exprs_it->second.end());
    per_fusion_exprs_.erase(exprs_it);
  }
}

void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) {
  std::unique_lock lock(mutex_);
  auto vals_it = per_fusion_vals_.find(fusion);
  if (vals_it != per_fusion_vals_.end()) {
    for (auto it = vals_up_.begin(); it != vals_up_.end();) {
      if (vals_it->second.count(it->get()) > 0) {
        vals_.erase(it->get());
        it = vals_up_.erase(it);
      } else {
        ++it;
      }
    }
    per_fusion_vals_.erase(vals_it);
  }

  auto exprs_it = per_fusion_exprs_.find(fusion);
  if (exprs_it != per_fusion_exprs_.end()) {
    for (auto it = exprs_up_.begin(); it != exprs_up_.end();) {

Parallel compilation re-enabling

The kPhase2DisableParallelCompile guard is removed, re-enabling parallel compilation. This is a significant change that should be thoroughly tested with concurrent fusion compilation scenarios to ensure the mutex implementation provides adequate protection under load.

    if (num_groups == 1 ||
        isOptionDisabled(DisableOption::ParallelCompile)) {
      compileKernel(group_runtime_inputs, group_to_run);
    } else {
      // launch compileKernel thread here
      getThreadPool()->run([this,
                            &group_runtime_inputs,
                            group_to_run,
                            &thread_pool_error_message,
                            &thread_pool_error_message_mutex]() {
        FUSER_PERF_SCOPE("FusionKernelRuntime::compileFusionParallel");
        try {
          compileKernel(group_runtime_inputs, group_to_run);
        } catch (const std::exception& e) {
          // Set flag inside lambda so we can throw an exception after thread
          // pool completes its work.
          const std::lock_guard<std::mutex> lock(
              thread_pool_error_message_mutex);
          std::stringstream ss;
          ss << thread_pool_error_message
             << "\nError from segmentation group " << group_to_run->groupId()
             << ": " << e.what() << "\n";
          thread_pool_error_message = ss.str();
        }
      });
    }
  }
} catch (const std::exception& e) {
  // Before cleaning up unique_ptr-backed resources such as
  // SegmentedGroup, make sure all threads are done as they may
  // be still using the resources.
  getThreadPool()->waitWorkComplete();
  throw;
}

if (num_groups != 1 &&
    !isOptionDisabled(DisableOption::ParallelCompile)) {
  // Wait until all segments finish compiling
  getThreadPool()->waitWorkComplete();
  NVF_ERROR(

Test failures

(Low, 1) Tensor numerical mismatches in nvFuser HopperMatmulTest suite

Test Name	H100	Source
HopperMatmulTest.PingPongPersistent	❌	Link

Moved special values (`zero_val_`, `one_val_`, `true_val_`, `false_val_`, `magic_zero_val_`) from `IrContainer` to the `Fusion` class. This ensures that with shared containers, each Fusion has its own special values, preventing ownership conflicts when one Fusion is destroyed.

**Option Implemented:** Option A (Move Special Values to Fusion) as recommended in the prompt.

Added private members and public accessors to Fusion class:

```cpp
// Phase 2: Per-Fusion special values
// With shared containers, each Fusion needs its own special values.
// These are raw pointers - memory is owned by IrContainer's vals_up_.
// Destroying this Fusion removes these vals via removeStatementsOwnedBy().
Val* zero_val_ = nullptr;
Val* one_val_ = nullptr;
Val* true_val_ = nullptr;
Val* false_val_ = nullptr;
NamedScalar* magic_zero_val_ = nullptr;
```

Public accessors:

- `Val* zeroVal()` - Returns Index 0
- `Val* oneVal()` - Returns Index 1
- `Val* falseVal()` - Returns Bool false
- `Val* trueVal()` - Returns Bool true
- `NamedScalar* magicZeroVal()` - Returns magic zero named scalar
- `Val* zeroVal(DataType dtype)` - Returns 0 for specified dtype
- `Val* oneVal(DataType dtype)` - Returns 1 for specified dtype

Implemented lazy creation pattern for all special value accessors:

```cpp
Val* Fusion::zeroVal() {
  if (!zero_val_) {
    zero_val_ = IrBuilder::createInContainer<Val>(this, 0L, DataType::Index);
  }
  return zero_val_;
}
// Similar implementations for oneVal(), falseVal(), trueVal(), magicZeroVal()
```

Updated `Fusion::clear()` to reset special value pointers:

```cpp
// Reset per-Fusion special values (they'll be recreated lazily if needed)
// The actual Val objects were removed by removeStatementsOwnedBy above.
zero_val_ = nullptr;
one_val_ = nullptr;
true_val_ = nullptr;
false_val_ = nullptr;
magic_zero_val_ = nullptr;
```

Removed special value members and added documentation comment:

```cpp
// Note: Special values (zero_val_, one_val_, true_val_, false_val_,
// magic_zero_val_) are now per-Fusion, stored in Fusion class.
// This avoids ownership conflicts when multiple Fusions share an IrContainer.
// See Fusion::zeroVal(), etc. for the per-Fusion implementation.
```

Removed special value accessor implementations (they're now in Fusion).

All call sites were already updated to use `fusion->zeroVal()` instead of `ir_container()->zeroVal()`. Verified with grep that no call sites remain using the old pattern.

Added 8 new unit tests for Task 7:

1. **PerFusionSpecialValuesBasic** - Tests that special values are created and owned by the Fusion
2. **SpecialValuesOwnedByFusion** - Tests that special values are tracked in `ownedVals()`
3. **SeparateFusionsHaveOwnSpecialValues** - Tests that two Fusions have different special value objects
4. **DestroyFusionDoesNotAffectOther** - Tests that destroying one Fusion doesn't affect another's special values
5. **SpecialValuesLazyCreation** - Tests that same value is returned on repeated calls
6. **AllSpecialValuesPerFusion** - Tests all five special value accessors
7. **SpecialValuesClearedOnFusionClear** - Tests that `clear()` resets special values
8. **SpecialValuesWithDtype** - Tests `zeroVal(dtype)` and `oneVal(dtype)` accessors

```
[==========] Running 34 tests from 3 test suites.
[ PASSED ] 34 tests.
```

```
[==========] Running 26 tests from 1 test suite.
[ PASSED ] 26 tests.
```

Including 8 new Task 7 tests:

- `Phase2ContainerTest.PerFusionSpecialValuesBasic` - PASSED
- `Phase2ContainerTest.SpecialValuesOwnedByFusion` - PASSED
- `Phase2ContainerTest.SeparateFusionsHaveOwnSpecialValues` - PASSED
- `Phase2ContainerTest.DestroyFusionDoesNotAffectOther` - PASSED
- `Phase2ContainerTest.SpecialValuesLazyCreation` - PASSED
- `Phase2ContainerTest.AllSpecialValuesPerFusion` - PASSED
- `Phase2ContainerTest.SpecialValuesClearedOnFusionClear` - PASSED
- `Phase2ContainerTest.SpecialValuesWithDtype` - PASSED

Files changed:

- `csrc/fusion.h` - Added special value members and accessors
- `csrc/fusion.cpp` - Added accessor implementations, updated `clear()`
- `csrc/ir/container.h` - Removed special values, added comment
- `csrc/ir/container.cpp` - Removed accessor implementations
- `tests/cpp/test_phase2_container_sharing.cpp` - Added 8 unit tests

Checklist:

- [x] Each Fusion has its own special values
- [x] Destroying Fusion A doesn't affect Fusion B's special values
- [x] Special value accessors (`zeroVal()`, `oneVal()`, etc.) return this Fusion's values
- [x] Lazy creation still works (create on first access)
- [x] Smoke tests pass (34/34)
- [x] Unit tests added (8 tests)
- [x] Unit tests pass (26/26 Phase 2 tests)
- [x] Code compiles without errors
- [x] REPORT.md delivered

Notes:

1. **Memory ownership:** Special values are raw pointers stored in Fusion, but the actual memory is owned by IrContainer's `vals_up_`. When a Fusion is destroyed, `removeStatementsOwnedBy()` cleans up these vals.
2. **Lazy creation pattern:** Special values are created on first access. This matches the original IrContainer behavior and avoids creating values that aren't needed.
3. **Clear handling:** `Fusion::clear()` now resets special value pointers to nullptr after `removeStatementsOwnedBy()` removes the actual Val objects. This ensures lazy recreation works correctly after clear.
4. **Copy/move handling:** Will be addressed in Tasks 5 and 6. This task just moves the members and accessors.

Moved `axioms_` and `metadata_` from `IrContainer` to the `Fusion` class. This completes the deprecation of `parent_` usage for val-creating methods, which was necessary because `parent_` implies a 1-1 relationship (container → Fusion), but Phase 2 has 1-many (shared containers).

Methods that used `parent_` to create vals were moved to Fusion:

- `metadataOf(Val*)` - Now uses `v->container()` to get owning Fusion
- `axioms()` - Now creates axiom vals owned by `this` Fusion
- `assumePositive/assumeNonNegative` - Now adds to `this` Fusion's axioms

Changes:

- Added `axioms_` and `metadata_` private members
- Changed method declarations from forwarding to actual implementations
- Added includes for `ir/builder.h` and `ir/internal_nodes.h`
- Implemented `metadataOf()`, `axioms()`, `assumePositive()`, `assumeNonNegative()` methods
- Updated `clear()` to reset `axioms_` and `metadata_`
- Removed `metadataOf()`, `axioms()`, `assumePositive()`, `assumeNonNegative()` declarations
- Removed `lazyInitAxioms()` declaration
- Removed `axioms_` and `metadata_` members
- Removed implementations of above methods
- Updated `IrContainer::swap` to remove axioms_/metadata_ swapping
- Updated `IrContainer::copy` to remove axioms_/metadata_ handling
- Updated `IrContainer::clear` to remove axioms_/metadata_ clearing

Each Fusion now has its own axioms and metadata cache. This ensures:

1. No ownership conflicts when multiple Fusions share an IrContainer
2. Correct behavior when one Fusion is destroyed (doesn't affect others)
3. Lazy creation pattern preserved (create on first access)

This is a prerequisite for the copy/move semantics changes which will swap/transfer these per-Fusion members.

- Add missing swap of axioms_ and metadata_ in Fusion::swap to prevent dangling pointers after move/assignment
- Add missing cloning of axioms_ and metadata_ in Fusion::copy to preserve custom assumptions and metadata cache across copies
- Guard Fusion::removeVal against removing cached special vals
- Use std::unique_ptr for special vals and steal from vals_up_ to preserve the original invariant (shortcuts in vals_ but not vals_up_)
- Fix metadataOf to use 'this' instead of v->container()

The old IrContainer approach popped special vals (zeroVal, oneVal, etc.) from vals_up_ after creation. During Fusion::copy, these vals were not cloned through the normal deterministic_vals() path. Instead, they were first cloned during axiom cloning, which happened AFTER val_type_name_map_ was overridden from the source — causing the name counter to be incremented 1 past the source value. Now that special vals remain in vals_up_, they are properly cloned before the counter override, so the counter stays accurate. This shifts loop index val names down by 1 (e.g., i113 instead of i114). The index expression structure is unchanged.

Special vals (trueVal, falseVal, oneVal, etc.) can be lazily created inside a StatementGuard scope (e.g. by simplifyExpr called from haveDifferentShardings). When the guard rolls back, it pops vals_up_ back to the snapshot, destroying those vals while the Fusion cache pointers still reference them. Subsequent calls return dangling pointers causing UB — this manifested as LoopShardedSplitReshapeIds incorrectly classifying a reshape as resharding on CI. Fusion::removeStatementsCreatedAfter now nulls out any special val cache pointers that are about to be destroyed, so they get re-created on next access.
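The rollback fix described above can be sketched with a toy model. `MiniFusion` and `MiniVal` are hypothetical simplifications (one cached special val, no exprs); the only point is that the rollback clears a cache pointer before destroying the val it refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>

struct MiniVal {
  long value;
};

// Hypothetical model of a Fusion with one lazily created, cached special val.
class MiniFusion {
 public:
  MiniVal* create(long v) {
    vals_up_.push_back(std::make_unique<MiniVal>(MiniVal{v}));
    return vals_up_.back().get();
  }

  // Lazy creation: the cached pointer refers to memory owned by vals_up_.
  MiniVal* zeroVal() {
    if (zero_val_ == nullptr) {
      zero_val_ = create(0);
    }
    return zero_val_;
  }

  std::size_t numVals() const {
    return vals_up_.size();
  }

  // Roll back to a snapshot. Any cached pointer to a val that is about to be
  // destroyed is reset first, so the next access re-creates the val.
  void removeStatementsCreatedAfter(std::size_t num_vals_before) {
    while (vals_up_.size() > num_vals_before) {
      if (vals_up_.back().get() == zero_val_) {
        zero_val_ = nullptr;
      }
      vals_up_.pop_back();
    }
  }

 private:
  std::deque<std::unique_ptr<MiniVal>> vals_up_;
  MiniVal* zero_val_ = nullptr;
};
```

Without the reset inside the loop, `zeroVal()` would return a pointer to a destroyed object after the rollback.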

SubstituteInExpr directly sets mutations_[reference] = substitute without checking reference == substitute, unlike registerMutation which guards against this. With per-Fusion special vals, Fusion::copy now maps zero_val_ through the cloner so that IterDomain extents and zero_val_ share the same pointer. When concretizeEmptyExtents finds an extent that IS zero_val_, SubstituteInExpr created a self-mapping that tripped the two-hop assertion in maybeMutated.

Why this didn't happen before:

Old code (main): zero_val_ was stored in a separate unique_ptr, popped from vals_up_. Fusion::copy didn't wire it up; B->zeroVal() lazily created a brand new Val, so ext != zero always held.

New code (this branch): zero_val_ lives in vals_up_ like any other Val. Fusion::copy remaps it via ir_cloner.clone(), so B->zero_val_ IS the same cloned Val that IterDomain extents reference:

  Fusion A                        Fusion B (clone)
  ┌──────────────────────────┐    ┌──────────────────────────┐
  │ zero_val_    ─► 0x1111   │    │ zero_val_    ─► 0x2222   │
  │ id->extent() ─► 0x1111   │    │ id->extent() ─► 0x2222   │
  └──────────────────────────┘    └──────────────────────────┘
            clone maps 0x1111 → 0x2222

So ext == zero, and SubstituteInExpr(ext, zero) created:

  mutations_[0x2222] = 0x2222  (self-mapping)

Then maybeMutated looked up 0x2222, found itself, treated it as a two-hop chain, and asserted.
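One plausible guard against the self-mapping is to skip identity substitutions, as the text says registerMutation already does. The sketch below is an assumption-laden model, not the nvFuser code: `MutationMap` is a hypothetical class, vals are ints rather than pointers, and the one-hop check only mimics the assertion described above.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical mutation table keyed by val id instead of Val*.
class MutationMap {
 public:
  // Record "replace reference with substitute". An identity substitution is a
  // no-op, so it is never stored.
  void substitute(int reference, int substitute) {
    if (reference == substitute) {
      return;
    }
    mutations_[reference] = substitute;
  }

  // Look up the replacement for v. Only one hop is allowed: the replacement
  // must not itself be a key in the table.
  int maybeMutated(int v) const {
    auto it = mutations_.find(v);
    if (it == mutations_.end()) {
      return v;
    }
    assert(mutations_.count(it->second) == 0 && "two-hop mutation chain");
    return it->second;
  }

  std::size_t size() const {
    return mutations_.size();
  }

 private:
  std::unordered_map<int, int> mutations_;
};
```

If the identity check were removed, `substitute(2, 2)` would store `2 -> 2`, and `maybeMutated(2)` would find its own replacement as a key and trip the one-hop assertion.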

Inlines registerVal, registerExpr, removeVal, and removeExpr logic directly into Fusion, eliminating the delegation to IrContainer. This consolidates the registration path after per-Fusion special values were moved from IrContainer to Fusion. Also removes vestigial friend class StatementGuard from IrContainer (it only uses public Fusion API) and adds Fusion as a friend of IrContainerPasskey so it can construct passkeys for setName() calls.

Change Fusion::ir_container_ from unique_ptr to shared_ptr to enable future container sharing between Fusions. Add Fusion tracking API to IrContainer (addFusion/removeFusion/transferFusion/sharingCount). Remove IrContainer::parent_ since the 1:1 relationship no longer holds. Disable parallel compilation during the shared_ptr transition.
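The shared_ptr ownership and Fusion tracking described in this commit can be sketched as below. `TrackedContainer` and `SharingFusion` are hypothetical names; the copy constructor shares the container, which is the behavior a later commit on this page describes, and locking is omitted to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>

class SharingFusion;

// Hypothetical container that tracks which fusions currently share it.
class TrackedContainer {
 public:
  void addFusion(SharingFusion* f) {
    sharing_fusions_.insert(f);
  }
  void removeFusion(SharingFusion* f) {
    sharing_fusions_.erase(f);
  }
  std::size_t sharingCount() const {
    return sharing_fusions_.size();
  }

 private:
  std::unordered_set<SharingFusion*> sharing_fusions_;
};

class SharingFusion {
 public:
  // Default construction creates a fresh container owned by this fusion.
  SharingFusion() : container_(std::make_shared<TrackedContainer>()) {
    container_->addFusion(this);
  }
  // Copy construction shares the source's container instead of cloning it.
  SharingFusion(const SharingFusion& other) : container_(other.container_) {
    container_->addFusion(this);
  }
  SharingFusion& operator=(const SharingFusion&) = delete;

  // Unregister on destruction; the container itself is freed by shared_ptr
  // once the last sharing fusion is gone.
  ~SharingFusion() {
    container_->removeFusion(this);
  }

  const std::shared_ptr<TrackedContainer>& container() const {
    return container_;
  }

 private:
  std::shared_ptr<TrackedContainer> container_;
};
```

Because the container no longer has a single owner, a back-pointer such as `parent_` has no well-defined target, which is why the commit replaces it with the tracking set.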

mdavis36 · 2026-02-18T06:38:00Z

!test

greptile-apps · 2026-02-18T06:42:11Z

Greptile Summary

This PR adds std::shared_mutex to IrContainer and implements a two-layer locking architecture to prepare for Phase 3's shared container model.

Key Changes:

Introduces ContainerMutator PIMPL pattern in fusion.cpp to handle nested mutation calls (removeVal → removeExpr) without deadlock, since std::shared_mutex is non-recursive
All IrContainer public methods now use shared_lock for reads and unique_lock for writes
Fusion mutation methods acquire unique_lock once and delegate to lock-free ContainerMutator static methods
Removes ~45 lines of dead code (IrContainer::copy() and IrContainer::swap() had zero call sites)
Re-enables parallel compilation by removing kPhase2DisableParallelCompile guard

Thread Safety:

Phase 2 safe: containers aren't shared across threads (each segment gets its own via default constructor)
Phase 3 ready: mutex infrastructure in place for when makeFusion uses copy constructor (shared containers)
Known limitation documented: reference-returning methods (vals(), unordered_exprs(), etc.) release locks immediately, which will need addressing when containers are actually shared in Phase 3

Risk Assessment:
The implementation correctly prevents deadlocks through the lock-free ContainerMutator pattern. Re-enabling parallel compilation exercises the concurrent codegen pipeline, but risk is low since Phase 2 doesn't share containers.

Confidence Score: 4/5

Safe to merge for Phase 2; defensive infrastructure for Phase 3 with documented limitations
The locking architecture is well-designed and correctly prevents the nested-call deadlock through ContainerMutator. Phase 2 safety is high since containers aren't shared across threads. Score reduced from 5 due to reference lifetime issues that will need addressing in Phase 3
Monitor csrc/ir/container.cpp reference-returning methods in Phase 3 when containers become shared

Important Files Changed

Filename	Overview
csrc/fusion.cpp	Implemented `ContainerMutator` with lock-free mutation methods; refactored 5 Fusion methods to use two-layer locking architecture; added mutex acquisition in `swap()`
csrc/ir/container.h	Added `std::shared_mutex mutex_` member; moved inline methods to .cpp for locking; removed dead `copy()` and `swap()` methods; added lock-free `*Impl()` helpers
csrc/ir/container.cpp	Implemented shared/unique locking for all accessor methods; documented `clear()` mutex assumptions; removed ~45 lines of dead code
csrc/runtime/fusion_kernel_runtime.cpp	Removed `kPhase2DisableParallelCompile` guard, re-enabling parallel compilation with mutex protection now in place

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Fusion::registerVal/Expr/removeVal/Expr] --> B[Acquire unique_lock on mutex_]
    B --> C[Call ContainerMutator static method]
    C --> D[Direct field access - lock-free]
    D --> E[May call nested ContainerMutator methods]
    E --> D
    D --> F[Release lock when function returns]
    
    G[IrContainer public accessors] --> H{Read or Write?}
    H -->|Read| I[Acquire shared_lock]
    H -->|Write| J[Acquire unique_lock]
    I --> K[Access internal data]
    J --> K
    K --> L[Return/Release lock]
    
    M[Nested Call Example:<br/>removeVal → removeExpr] --> N[Lock acquired once in removeVal]
    N --> O[ContainerMutator::removeVal<br/>calls ContainerMutator::removeExpr]
    O --> P[Both execute under same lock<br/>No deadlock!]
    
    style B fill:#90EE90
    style C fill:#87CEEB
    style I fill:#FFD700
    style J fill:#FF6B6B
    style P fill:#90EE90

Last reviewed commit: 31bccb9

greptile-apps

5 files reviewed, 2 comments


greptile-apps · 2026-02-18T06:42:14Z

csrc/ir/container.cpp

+const std::unordered_set<Expr*>& IrContainer::unordered_exprs() const noexcept {
+  std::shared_lock lock(mutex_);
+  return exprs_;
+}
+
+const std::unordered_set<Val*>& IrContainer::vals() const noexcept {
+  std::shared_lock lock(mutex_);
+  return vals_;
+}


Reference outlives lock scope

unordered_exprs(), vals(), sharingFusions(), valsOwnedBy(), and exprsOwnedBy() all return const& to internal data, but the shared_lock is released when the function returns. This means callers hold an unprotected reference to the data structure, and any concurrent writer (under unique_lock) could mutate or invalidate it.

This is safe in Phase 2 (containers are not shared across threads), but in Phase 3 where multiple threads write to the same IrContainer, this pattern will produce data races. Consider documenting this as a known limitation to address in Phase 3, or returning by value for these accessors (at the cost of a copy).
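Two ways to avoid handing out an unprotected reference can be sketched as follows. `LockedSet`, `valsSnapshot`, and `withVals` are hypothetical names, not part of IrContainer; the first option is the return-by-value approach mentioned above, and the second keeps the shared lock held while a caller-supplied function reads the data.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <unordered_set>

class LockedSet {
 public:
  void insert(int v) {
    std::unique_lock lock(mutex_);
    vals_.insert(v);
  }

  // Option 1: copy under the lock. The caller owns the snapshot, so later
  // writers cannot invalidate it (at the cost of a copy).
  std::unordered_set<int> valsSnapshot() const {
    std::shared_lock lock(mutex_);
    return vals_;
  }

  // Option 2: run the caller's function while the shared lock is held.
  // The plain `auto` return type decays references, so the result is
  // returned by value rather than as a reference into vals_.
  template <typename Fn>
  auto withVals(Fn&& fn) const {
    std::shared_lock lock(mutex_);
    return fn(vals_);
  }

 private:
  mutable std::shared_mutex mutex_;
  std::unordered_set<int> vals_;
};
```

With option 2 the callback must not call a writer such as `insert` on the same object, since that would request the unique lock while the shared lock is still held.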

greptile-apps · 2026-02-18T06:42:19Z

Additional Comments (1)

csrc/ir/container.cpp
clear() does not acquire the mutex

IrContainer::clear() modifies all internal data structures (vals_, exprs_, per_fusion_vals_, etc.) without acquiring mutex_. While clear() is only called from IrContainer::~IrContainer() and Fusion::copy() (which is called in the copy constructor/assignment), there is no lock protection if a concurrent thread holds a shared lock on the same container. Since clear() is protected, this may be intentional (caller guarantees exclusive access), but it's worth a brief comment to document that assumption for Phase 3 safety.


mdavis36 · 2026-02-18T16:02:24Z

!test

Add per_fusion_vals_ / per_fusion_exprs_ maps to IrContainer so each Fusion can efficiently query only its own statements in a shared container. Fusion forwarding methods (vals(), unordered_exprs(), deterministic_vals(), etc.) now return per-Fusion filtered results. Fusion::clear() uses removeStatementsOwnedBy(this) instead of ir_container()->clear().

Copy constructor now shares the source's container pointer instead of creating a new one. Fusion::copy clones directly from per-Fusion filtered vals rather than delegating to IrContainer::copy. Swap changed from content-based (IrContainer::swap) to pointer-based with per-Fusion ownership tracking for both same-container and different-container cases.

Move val/expr name counters from IrContainer to Fusion so each Fusion independently tracks name assignment. This fixes CI failures where Fusion::copy left the dest counter at N (number of cloned vals) instead of max(name)+1 when source names were non-sequential, causing newly created TVs to collide with existing names. The fix adds val_type_name_map_ and expr_name_counter_ to Fusion, and updates registerVal/registerExpr to use the Fusion-level counters. Fusion::copy syncs counters from source to dest after cloning. Fusion::swap exchanges counters. Fusion::clear resets them.
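The difference between a counter left at N and one set to max(name)+1 shows up as soon as the source names have gaps. The helper functions below are hypothetical and only illustrate that arithmetic; the commit itself describes syncing the counters from the source Fusion rather than recomputing them.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Next name if the counter is left at N, the number of cloned vals.
int nextNameFromCount(const std::vector<int>& cloned_names) {
  return static_cast<int>(cloned_names.size());
}

// Next name if the counter is one past the largest name in use.
int nextNameFromMax(const std::vector<int>& cloned_names) {
  if (cloned_names.empty()) {
    return 0;
  }
  return *std::max_element(cloned_names.begin(), cloned_names.end()) + 1;
}

// True if candidate is already used by one of the cloned vals.
bool collides(const std::vector<int>& cloned_names, int candidate) {
  return std::find(cloned_names.begin(), cloned_names.end(), candidate) !=
      cloned_names.end();
}
```

For example, with cloned names {0, 2, 3} the count-based counter hands out 3, which is already taken, while the max-based counter hands out 4.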

Add std::shared_mutex to IrContainer to protect shared mutable state during concurrent access from parallel compilation threads.

- IrContainer public methods self-lock (shared_lock for reads, unique_lock for writes)
- Fusion mutation methods (registerVal/Expr, removeVal/Expr, removeStatementsCreatedAfter) acquire unique_lock then delegate to lock-free ContainerMutator static methods, avoiding self-deadlock on nested calls (e.g., removeVal → removeExpr)
- ContainerMutator is a PIMPL struct defined only in fusion.cpp, keeping lock-free impl methods out of the header
- Remove kPhase2DisableParallelCompile guard, re-enabling parallel compilation now that the mutex is in place
- Delete dead IrContainer::copy() and IrContainer::swap() methods
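The lock-then-delegate pattern can be sketched as follows. ToyContainer, ToyMutator and ToyFusion are hypothetical stand-ins with integer ids in place of Val/Expr pointers, and definition_ is an invented map used only to create a nested call; this is not the code in fusion.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <set>
#include <shared_mutex>

// Hypothetical container: public reads self-lock with shared_lock.
class ToyContainer {
 public:
  std::size_t numVals() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return vals_.size();
  }
  std::size_t numExprs() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return exprs_.size();
  }

 private:
  friend class ToyFusion;
  friend struct ToyMutator;
  mutable std::shared_mutex mutex_;
  std::set<int> vals_;
  std::set<int> exprs_;
  std::map<int, int> definition_;  // output val id -> defining expr id
};

// Non-locking implementations. Callers must already hold a unique_lock.
struct ToyMutator {
  static void registerVal(ToyContainer& c, int val) { c.vals_.insert(val); }

  static void removeExpr(ToyContainer& c, int expr) { c.exprs_.erase(expr); }

  static void registerExpr(ToyContainer& c, int expr, int out_val) {
    assert(c.vals_.count(out_val) == 1);  // output must be registered
    auto it = c.definition_.find(out_val);
    if (it != c.definition_.end()) {
      removeExpr(c, it->second);  // nested call, no second lock
    }
    c.exprs_.insert(expr);
    c.definition_[out_val] = expr;
  }

  static void removeVal(ToyContainer& c, int val) {
    auto it = c.definition_.find(val);
    if (it != c.definition_.end()) {
      removeExpr(c, it->second);  // nested call, no second lock
      c.definition_.erase(it);
    }
    c.vals_.erase(val);
  }
};

// Public mutation methods: lock exactly once, then delegate.
class ToyFusion {
 public:
  explicit ToyFusion(ToyContainer* c) : container_(c) {}

  void registerVal(int val) {
    std::unique_lock<std::shared_mutex> lock(container_->mutex_);
    ToyMutator::registerVal(*container_, val);
  }
  void registerExpr(int expr, int out_val) {
    std::unique_lock<std::shared_mutex> lock(container_->mutex_);
    ToyMutator::registerExpr(*container_, expr, out_val);
  }
  void removeExpr(int expr) {
    std::unique_lock<std::shared_mutex> lock(container_->mutex_);
    ToyMutator::removeExpr(*container_, expr);
  }
  void removeVal(int val) {
    std::unique_lock<std::shared_mutex> lock(container_->mutex_);
    ToyMutator::removeVal(*container_, val);
  }

 private:
  ToyContainer* container_;
};
```

If ToyFusion::removeVal called ToyFusion::removeExpr instead of the mutator, the same thread would try to take the unique_lock twice on a non-recursive std::shared_mutex, which is undefined behavior and typically deadlocks.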

…tainers Update Fusion::removeStatementsCreatedAfter to compare per-Fusion counts (from exprsOwnedBy(this) and numValsExcludingShortcuts()) instead of global deque sizes. This correctly handles shared containers where other Fusions' statements would inflate the global counts. Add NVF_ERROR assertions to verify the LIFO invariant: the tail element of the global deque must belong to this Fusion. If violated, another Fusion appended concurrently (should be prevented by PR #5971 locking). Remove now-unnecessary deque size validation checks.
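A rough sketch of the per-owner rollback with the LIFO check, assuming a single global deque tagged with an owner pointer. ToyStmt, ToyContainer, countOwnedBy() and removeCreatedAfter() are hypothetical, and std::logic_error stands in for NVF_ERROR.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>

// Hypothetical statement record: who owns it and an id.
struct ToyStmt {
  const void* owner;
  int id;
};

class ToyContainer {
 public:
  void append(const void* owner, int id) {
    assert(owner != nullptr);
    stmts_.push_back({owner, id});
  }

  // Per-owner count; the global deque size would also include
  // statements that belong to other owners.
  std::size_t countOwnedBy(const void* owner) const {
    std::size_t n = 0;
    for (const ToyStmt& s : stmts_) {
      if (s.owner == owner) {
        ++n;
      }
    }
    return n;
  }

  std::size_t totalCount() const { return stmts_.size(); }

  // Pop from the tail until `owner` has exactly `keep` statements left.
  // Throws if the tail belongs to someone else (LIFO invariant broken).
  void removeCreatedAfter(const void* owner, std::size_t keep) {
    std::size_t owned = countOwnedBy(owner);
    while (owned > keep) {
      if (stmts_.empty() || stmts_.back().owner != owner) {
        throw std::logic_error(
            "LIFO invariant violated: tail is owned by another Fusion");
      }
      stmts_.pop_back();
      --owned;
    }
  }

 private:
  std::deque<ToyStmt> stmts_;
};
```

Comparing `keep` against the global size would be wrong as soon as a second owner has statements in the deque, which is the case the commit message describes.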

mdavis36 mentioned this pull request Feb 18, 2026

[Ir Refactor] shared_ptr IrContainer #5918

Closed

mdavis36 added 10 commits February 17, 2026 19:13

Cleanup comments

9a756c3

Ownership of special values belongs to the container.

6711ba5

mdavis36 force-pushed the md/phase2-thread-safety branch from 2cac45c to 8fb976b Compare February 18, 2026 03:13

mdavis36 force-pushed the md/phase2-copy-move branch from 192fd55 to 35b7405 Compare February 18, 2026 03:13

This was referenced Feb 18, 2026

[IR Container] Phase 2.3 Basic shared ptr #5960

Open

[IR Container] Phase 2 IR Container Refactor #5975

Draft

mdavis36 changed the title ~~[IR Container] Phase2 thread safety~~ [IR Container] Phase 2.6 Concurrency & Thread Safety Feb 18, 2026

mdavis36 marked this pull request as ready for review February 18, 2026 06:38

greptile-apps bot reviewed Feb 18, 2026

mdavis36 added 6 commits February 25, 2026 16:22

Fix stale comment referencing removed parent backpointer model

c965408

Add mutex documentation to IrContainer::clear() for Phase 3 safety

31bccb9

mdavis36 force-pushed the md/phase2-thread-safety branch from 8fb976b to 31bccb9 Compare February 26, 2026 00:29

mdavis36 force-pushed the md/phase2-copy-move branch from 35b7405 to 88b2e60 Compare February 26, 2026 00:29

mdavis36 force-pushed the md/phase2-copy-move branch 2 times, most recently from a9c62ea to 46080be Compare March 3, 2026 02:51

mdavis36 wants to merge 16 commits into md/phase2-copy-move from md/phase2-thread-safety

Greptile Summary

Confidence Score: 4/5
