Skip to content

fix(install): serialize cusum threshold crossing via cas#554

Draft
imjustprism wants to merge 2 commits intoendevco:mainfrom
imjustprism:fix/cusum-race
Draft

fix(install): serialize cusum threshold crossing via cas#554
imjustprism wants to merge 2 commits intoendevco:mainfrom
imjustprism:fix/cusum-race

Conversation

@imjustprism
Copy link
Copy Markdown
Contributor

@imjustprism imjustprism commented May 9, 2026

Fixed

  • RegimeDetector::record race where two threads near the threshold both fire Rising for one regime change

Why

old path:

let pos = self.pos.fetch_add(dev, Relaxed).saturating_add(dev);
if pos >= threshold {
    self.pos.store(0, Relaxed);
    return Rising;
}

thread A and thread B both observe their post-add pos >= threshold (B's add happens after A's, so B sees an even larger sum). both reset to 0. both return Rising. limiter applies 0.7x shrink twice (~0.49x) for one regime change. floor bounds damage but throughput recovery slows.

Changed

CAS-guarded threshold crossing. only the thread that successfully transitions the over-threshold value to 0 returns Rising. racers see 0 on retry, treat their delta as a fresh accumulation against the cleared sum.

mirror logic on the Falling (neg) path.

Verified

check result
cargo fmt --check clean
cargo clippy --all-targets -D warnings clean
cargo test (1453 tests across all crates) pass
install --diag full on 132-direct / 2839-pkg fixture clean, 69s wall
install on vue fixture (306 pkgs) clean, 18.9s
install on next fixture clean, 8.9s

bats not run (windows host lacks flock). CI bats covers it.


View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 9, 2026

Greptile Summary

This PR replaces the fetch_add + conditional store(0) pattern in RegimeDetector::record with a CAS loop so that only one thread can claim a threshold crossing per event. Both the pos (Rising) and neg (Falling) paths are updated symmetrically.

  • Race fix: the CAS ensures only the one thread that transitions the over-threshold value to 0 returns a signal; all concurrent losers receive Stable.
  • Single-sample edge case: the Err(observed) if observed < current guard prevents a loser from re-entering the threshold arm after the winner has already cleared the accumulator — correctly handling samples whose individual deviation alone meets the threshold. The mirror guard on the neg path is identical in structure.
  • Loser accumulation: losing threads do a fetch_add(abs_dev) onto the freshly-cleared accumulator so their deviation is counted toward the next genuine threshold crossing rather than being silently dropped.

Confidence Score: 5/5

The change is a well-scoped concurrency fix with no new data paths, no schema changes, and symmetric treatment of both accumulator arms.

The CAS loop correctly serializes threshold crossings, the observed < current guard closes the single-sample edge case from the previous review thread, and the loser fetch_add preserves deviation contributions for future accumulation. No correctness issues remain.

No files require special attention.

Important Files Changed

Filename Overview
crates/aube-util/src/adaptive.rs Replaces racy fetch_add+conditional-store with a CAS loop in both the pos (Rising) and neg (Falling) paths; the observed < current guard correctly suppresses losers even when a single sample's deviation alone meets the threshold, which addresses the edge case the previous review flagged.

Reviews (2): Last reviewed commit: "fixes" | Re-trigger Greptile

Comment thread crates/aube-util/src/adaptive.rs
@imjustprism imjustprism marked this pull request as draft May 9, 2026 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant