feat(repair): reduce qm_cids HeadObject calls with bucket list index #199
raymondjacobson merged 3 commits into main
Replace per-key bucket.Attributes (HeadObject) calls during qm_cids cleanup with a single bucket.List that builds an in-memory presence index. For nodes with millions of qm_cids entries, this collapses millions of HeadObject calls into one paginated ListObjects call.

Gated behind OPENAUDIO_REPAIR_QM_CIDS_USE_LIST_INDEX=true (off by default). Falls back gracefully to per-key attrs if the index build fails. Uses existing tracker.Counters for observability.

Also fixes the ValidateCID dead-code bug (checked wrong error variable) and the dropFromMyBucket error variable scoping issue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
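For readers skimming the thread, here is a minimal sketch of the mechanism the commit message describes: one listing builds an in-memory set of keys, lookups hit that set, and a failed build falls back to per-key checks. It is not the code from this PR. The `lister` interface, `memBucket`, `buildPresenceIndex`, `present`, and `countPresent` are hypothetical names invented for illustration, and comparing the env var to the literal string `"true"` is an assumption about how the gate is parsed.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// lister is a hypothetical stand-in for the bucket API. The real code uses
// bucket.List and bucket.Attributes; their signatures are not shown here.
type lister interface {
	ListKeys() ([]string, error)     // one (paginated) listing of all keys
	Exists(key string) (bool, error) // one HeadObject-style call per key
}

// presenceIndex is an in-memory set of the keys returned by the listing.
type presenceIndex map[string]struct{}

func buildPresenceIndex(b lister) (presenceIndex, error) {
	keys, err := b.ListKeys()
	if err != nil {
		return nil, fmt.Errorf("build presence index: %w", err)
	}
	idx := make(presenceIndex, len(keys))
	for _, k := range keys {
		idx[k] = struct{}{}
	}
	return idx, nil
}

// present answers from the index when one exists, else asks the bucket per key.
func present(b lister, idx presenceIndex, key string) (bool, error) {
	if idx != nil {
		_, ok := idx[key]
		return ok, nil
	}
	return b.Exists(key)
}

// countPresent checks every cid, using the index only when the gate is on
// and the build succeeded. A failed build leaves idx nil (per-key fallback).
func countPresent(b lister, useIndex bool, cids []string) (int, error) {
	var idx presenceIndex
	if useIndex {
		if built, err := buildPresenceIndex(b); err == nil {
			idx = built
		}
	}
	n := 0
	for _, c := range cids {
		ok, err := present(b, idx, c)
		if err != nil {
			return n, err
		}
		if ok {
			n++
		}
	}
	return n, nil
}

// memBucket is a fake bucket that counts per-key calls.
type memBucket struct {
	keys      []string
	failList  bool
	headCalls int
}

func (m *memBucket) ListKeys() ([]string, error) {
	if m.failList {
		return nil, errors.New("list failed")
	}
	return m.keys, nil
}

func (m *memBucket) Exists(key string) (bool, error) {
	m.headCalls++
	for _, k := range m.keys {
		if k == key {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	b := &memBucket{keys: []string{"QmA", "QmB"}}
	// Assumption: the gate is enabled only by the exact value "true".
	useIndex := os.Getenv("OPENAUDIO_REPAIR_QM_CIDS_USE_LIST_INDEX") == "true"
	n, err := countPresent(b, useIndex, []string{"QmA", "QmB", "QmC"})
	fmt.Println("present:", n, "err:", err, "per-key calls:", b.headCalls)
}
```

With the gate on, the fake bucket sees zero per-key calls; with the gate off or a failed listing, it sees one call per CID, which is the trade this PR is making.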
Adding the operator-side evidence we gathered, since the simpler shape here is the right upstream path. Our live canaries on the heavier proof branch were validating the same core mechanism this PR keeps: replacing per-key HeadObject calls with a list-built presence index.

Representative live result from the clean cohort window:

So from our side the core list-index mechanism is real. The thing that did not survive review was the extra upstream surface area around it. Happy to help validate or supply more rollout evidence if useful.
One thing I still thought was worth tightening here was direct coverage for the new list-index path itself. I put together a tiny support branch on my fork with just three focused tests and no behavior changes:
Branch:

Focused run on that support branch:

```
go test ./pkg/mediorum/server -run 'TestBuildRepairPresenceIndexIncludesLocalBlob|TestRepairCidWithPresenceIndexUsesListedState|TestRepairCidUsesKnownPresentOutsideCleanup' -count=1
```

I'm not trying to reopen the broader `#198` surface here. This is just the smallest direct coverage I thought might make the simpler path easier to trust.
@RolfAris Thanks for the canary data and the tests; cherry-picked them onto the branch. The 66x reduction numbers are great to have as baseline evidence for this approach.
Summary
Simplified alternative to #198. Same core optimization: replace per-key `bucket.Attributes` (HeadObject) calls during qm_cids cleanup with a single `bucket.List` that builds an in-memory presence map, but without the shadow comparison machinery, cadence controls, evidence tracker, or extra env vars.

What changed:
- `repairPresenceIndex` type: ~47 lines, just a map and a build function
- `repairCid` accepts an optional presence index; when present, does a map lookup instead of HeadObject
- `OPENAUDIO_REPAIR_QM_CIDS_USE_LIST_INDEX=true` (off by default)
- `tracker.Counters` for observability (`qm_cids_list_index_hit`, `qm_cids_list_index_miss`, `qm_cids_list_index_entries`, `qm_cids_list_index_build_fail`)

Also fixes (from review of #198 and #175):
- `ValidateCID` dead-code bug: `if err != nil` -> `if errVal != nil`
- `SeenKeys` invalidation after invalid CID deletion
- `dropFromMyBucket` error variable scoping (`err =` -> `err :=`)

What was intentionally left out vs #198:
- `OPENAUDIO_REPAIR_CLEANUP_EVERY` / `OPENAUDIO_REPAIR_QM_CIDS_CLEANUP_EVERY` cadence knobs
- `repairSourceEvidenceTracker` (119 lines, 10 methods)
- `MediorumConfig` struct

~107 lines changed vs ~872 in #198, same HeadObject reduction.
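The two error-variable fixes in the list above are instances of common Go pitfalls. The sketch below illustrates the general shape of each one; it is not the code from `ValidateCID` or `dropFromMyBucket`. Every function here (`validate`, `drop`, `buggyIsInvalid`, `fixedIsInvalid`, `leaky`, `scoped`) is hypothetical, and the "Qm" prefix check is only a placeholder for real CID validation.

```go
package main

import (
	"errors"
	"fmt"
)

// validate is a hypothetical validator; the prefix check is a placeholder.
func validate(cid string) error {
	if len(cid) < 2 || cid[:2] != "Qm" {
		return errors.New("invalid cid")
	}
	return nil
}

// buggyIsInvalid checks the wrong variable, so the invalid branch is dead code.
func buggyIsInvalid(cid string) bool {
	var err error // unrelated outer error, always nil at this point
	errVal := validate(cid)
	if err != nil { // bug: should test errVal
		return true
	}
	_ = errVal
	return false
}

// fixedIsInvalid tests the variable that actually holds the validation result.
func fixedIsInvalid(cid string) bool {
	errVal := validate(cid)
	return errVal != nil
}

// drop is a hypothetical cleanup step that always fails in this demo.
func drop(key string) error { return errors.New("drop failed: " + key) }

// leaky uses `=` inside the block, overwriting the outer err with the
// cleanup failure even though the main operation succeeded.
func leaky() error {
	var err error // result of the main operation: success
	{
		err = drop("k")
		if err != nil {
			fmt.Println("cleanup:", err)
		}
	}
	return err
}

// scoped uses `:=`, declaring a block-local err so the outer one is untouched.
func scoped() error {
	var err error
	{
		err := drop("k")
		if err != nil {
			fmt.Println("cleanup:", err)
		}
	}
	return err
}

func main() {
	fmt.Println("buggy flags bad cid:", buggyIsInvalid("bad"))
	fmt.Println("fixed flags bad cid:", fixedIsInvalid("bad"))
	fmt.Println("leaky returns error:", leaky() != nil)
	fmt.Println("scoped returns error:", scoped() != nil)
}
```

Which direction the scoping change needs to go depends on the surrounding code; the sketch only shows why `=` and `:=` are not interchangeable inside a nested block.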
Test plan
- `go build ./pkg/mediorum/server/`
- `go vet ./pkg/mediorum/server/`
- `go test ./pkg/mediorum/server/ -run TestRepair`
- `OPENAUDIO_REPAIR_QM_CIDS_USE_LIST_INDEX=true`, observe `qm_cids_list_index_hit` / `qm_cids_list_index_miss` counters in repair tracker

🤖 Generated with Claude Code