feat(repair): reduce qm_cids HeadObject calls with bucket list index by raymondjacobson · Pull Request #199 · OpenAudio/go-openaudio

raymondjacobson · 2026-04-04T07:42:01Z

Summary

Simplified alternative to #198. Same core optimization — replace per-key bucket.Attributes (HeadObject) calls during qm_cids cleanup with a single bucket.List that builds an in-memory presence map — but without the shadow comparison machinery, cadence controls, evidence tracker, or extra env vars.

What changed:

New repairPresenceIndex type: ~47 lines, just a map and a build function
repairCid accepts an optional presence index; when present, does a map lookup instead of HeadObject
Index is built once before the qm_cids loop during cleanup cycles
One env var: OPENAUDIO_REPAIR_QM_CIDS_USE_LIST_INDEX=true (off by default)
Uses existing tracker.Counters for observability (qm_cids_list_index_hit, qm_cids_list_index_miss, qm_cids_list_index_entries, qm_cids_list_index_build_fail)
Falls back to per-key attrs if the index build fails

Also fixes (from review of #198 and #175):

ValidateCID dead-code bug: if err != nil -> if errVal != nil
SeenKeys invalidation after invalid CID deletion
dropFromMyBucket error variable scoping (err = -> err :=)

What was intentionally left out vs #198:

Shadow comparison system (sampling, mismatch detection, auto-disable)
OPENAUDIO_REPAIR_CLEANUP_EVERY / OPENAUDIO_REPAIR_QM_CIDS_CLEANUP_EVERY cadence knobs
repairSourceEvidenceTracker (119 lines, 10 methods)
Whitespace realignment of MediorumConfig struct

~107 lines changed vs ~872 in #198, same HeadObject reduction.

Test plan

Verify build: go build ./pkg/mediorum/server/
Verify vet: go vet ./pkg/mediorum/server/
Run existing repair tests: go test ./pkg/mediorum/server/ -run TestRepair
Deploy to canary with OPENAUDIO_REPAIR_QM_CIDS_USE_LIST_INDEX=true, observe qm_cids_list_index_hit / qm_cids_list_index_miss counters in repair tracker
Deploy without env var set and verify no behavior change

🤖 Generated with Claude Code

Replace per-key bucket.Attributes (HeadObject) calls during qm_cids cleanup with a single bucket.List that builds an in-memory presence index. For nodes with millions of qm_cids entries, this collapses millions of HeadObject calls into one paginated ListObjects call. Gated behind OPENAUDIO_REPAIR_QM_CIDS_USE_LIST_INDEX=true (off by default). Falls back gracefully to per-key attrs if the index build fails. Uses existing tracker.Counters for observability. Also fixes the ValidateCID dead-code bug (checked wrong error variable) and the dropFromMyBucket error variable scoping issue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

RolfAris · 2026-04-04T08:00:42Z

Adding the operator-side evidence we gathered, since the simpler shape here is the right upstream path.

Our live canaries on the heavier proof branch were validating the same core mechanism this PR keeps: replacing per-key qm_cids attr checks with a bucket.List -> map presence index during cleanup.

Representative live result from the clean cohort window:

1,995,657 total qm_cids calls
29,937 effective attr-backed lookups
66.66x direct reduction
98.4999% reduction
zero normal attr calls on val003, val004, and val006 in the clean proof windows

So from our side the core list-index mechanism is real. The thing that did not survive review was the extra upstream surface area around it.

Happy to help validate or supply more rollout evidence if useful.

RolfAris · 2026-04-04T08:13:45Z

One thing I still thought was worth tightening here was direct coverage for the new list-index path itself.

I put together a tiny support branch on my fork with just three focused tests and no behavior changes:

TestBuildRepairPresenceIndexIncludesLocalBlob
TestRepairCidWithPresenceIndexUsesListedState
TestRepairCidUsesKnownPresentOutsideCleanup

Branch: RolfAris:support/pr-199-targeted-tests-20260404
Commit: 46b32bd

Focused run on that support branch:

go test ./pkg/mediorum/server -run 'TestBuildRepairPresenceIndexIncludesLocalBlob|TestRepairCidWithPresenceIndexUsesListedState|TestRepairCidUsesKnownPresentOutsideCleanup' -count=1\n```\n\nI’m not trying to reopen the broader `#198` surface here. This is just the smallest direct coverage I thought might make the simpler path easier to trust.

raymondjacobson · 2026-04-04T09:01:44Z

@RolfAris Thanks for the canary data and the tests — cherry-picked them onto the branch. The 66x reduction numbers are great to have as baseline evidence for this approach.

raymondjacobson force-pushed the feat/qm-cids-list-index-simple branch from aef294e to 83f5d44 Compare April 4, 2026 07:48

raymondjacobson mentioned this pull request Apr 4, 2026

feat(repair): reduce qm_cids cleanup attr checks with presence index #198

Closed

test(repair): cover qm_cids list index fast path

2b35526

Merge branch 'main' into feat/qm-cids-list-index-simple

c356f40

raymondjacobson merged commit ab8be7c into main Apr 4, 2026
6 checks passed

raymondjacobson deleted the feat/qm-cids-list-index-simple branch April 4, 2026 09:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(repair): reduce qm_cids HeadObject calls with bucket list index#199

feat(repair): reduce qm_cids HeadObject calls with bucket list index#199
raymondjacobson merged 3 commits intomainfrom
feat/qm-cids-list-index-simple

raymondjacobson commented Apr 4, 2026

Uh oh!

RolfAris commented Apr 4, 2026 •

edited

Loading

Uh oh!

RolfAris commented Apr 4, 2026

Uh oh!

raymondjacobson commented Apr 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

raymondjacobson commented Apr 4, 2026

Summary

Test plan

Uh oh!

RolfAris commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RolfAris commented Apr 4, 2026

Uh oh!

raymondjacobson commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

RolfAris commented Apr 4, 2026 •

edited

Loading

raymondjacobson commented Apr 4, 2026 •

edited

Loading