Skip to content

Per-block littidx flush + single shard (gated on #3645)#3660

Merged
Kbhat1 merged 1 commit into
mainfrom
littidx-flush-shard
Jun 29, 2026
Merged

Per-block littidx flush + single shard (gated on #3645)#3660
Kbhat1 merged 1 commit into
mainfrom
littidx-flush-shard

Conversation

@Kbhat1

@Kbhat1 Kbhat1 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Description

  • NOTE: Only Merge post 3645: That makes per-block flush affordable, reclaiming near-per-block crash durability for free.
  • littFlushInterval 100ms -> 5ms (~one block at Giga throughput) — bounds crash loss to ~a single block (near-WAL durability for receipt bodies).
  • ShardingFactor 16 -> 1. flushing one segment file is cheaper; sharding mainly helps across multiple disks.

Testing

  • reads-off write ceiling 167k (vs 168k prior)
  • reads-on ~153–157k writes / 58k point-reads/s / 520 getLogs/s @ 7ms
  • 3h GC soak flat (old −27% control-loop step-down gone), 0 panics

With litt's async keymap flush (#3645), flushing ~once per block is cheap, so:
- littFlushInterval 100ms -> 5ms (~one block at Giga throughput), bounding
  crash loss to roughly a single block (near-WAL durability)
- ShardingFactor 16 -> 1 (flushing a single segment file is cheaper; sharding
  mainly helps across multiple disks)

DO NOT MERGE until #3645 (litt async keymap) lands: without it, flushing this
often regresses write throughput ~48%.

Validated via cryptosim (main littidx + #3645 + parallel getLogs):
- reads-off write ceiling 167k (vs 168k prior)
- reads-on ~153-157k writes / 58k point-reads/s / 520 getLogs/s @ 7ms
- 3h GC soak flat (old -27% control-loop step-down gone), 0 panics

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJun 29, 2026, 2:20 PM

@Kbhat1 Kbhat1 requested a review from cody-littley June 29, 2026 14:17
@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.18%. Comparing base (c713a03) to head (362413b).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3660      +/-   ##
==========================================
- Coverage   59.14%   58.18%   -0.97%     
==========================================
  Files        2262     2178      -84     
  Lines      187031   177408    -9623     
==========================================
- Hits       110625   103226    -7399     
+ Misses      66456    65023    -1433     
+ Partials     9950     9159     -791     
Flag Coverage Δ
sei-chain-pr 56.13% <100.00%> (?)
sei-db 70.41% <ø> (-0.22%) ⬇️
sei-db-state-db ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sei-db/ledger_db/receipt/litt_receipt_store.go 58.97% <100.00%> (ø)

... and 85 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Kbhat1 Kbhat1 changed the title feat(receipt): per-block littidx flush + single shard (gated on #3645) Per-block littidx flush + single shard (gated on #3645) Jun 29, 2026
@Kbhat1 Kbhat1 marked this pull request as ready for review June 29, 2026 14:19
@cursor

cursor Bot commented Jun 29, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes crash-durability bounds and write/sharding behavior for the auxiliary receipt store; consensus is unaffected but RPC may see not-found for very recent receipts after a hard crash.

Overview
Tightens littidx receipt-body durability by lowering littFlushInterval from 100ms to 5ms (~one block at Giga block times), so a hard crash should lose at most about one block of auxiliary RPC receipt data instead of a much longer window. Comments now tie that aggressiveness to async keymap flushing off the control loop (without it, per-block-style flushing was observed to hurt write throughput badly).

The receipts litt table ShardingFactor drops from 16 to 1, favoring cheaper single-segment flushes on one disk rather than spreading writes across many shards.

Reviewed by Cursor Bugbot for commit 362413b. Bugbot is set up for automated code reviews on this repo. Configure here.

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small, well-documented tuning PR that lowers the receipt-store litt flush interval (100ms → 5ms) and drops the sharding factor (16 → 1). The changes are correct and internally consistent; the only notes are process-related (merge ordering vs. #3645) and that the second-opinion reviews produced no findings.

Findings: 0 blocking | 4 non-blocking | 0 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Merge-ordering dependency: the PR is only safe to merge after #3645 (async keymap flush), as the author notes and the code comments state. The 5ms per-block flush is described as regressing write throughput by ≈-48% without that change. Confirm #3645 has landed before merging this.
  • The 5ms ticker calls receipts.Flush() ~200x/sec continuously, including when there are no new writes. This is fine since Flush on a clean table is a cheap no-op and time.Ticker coalesces missed ticks (no goroutine pileup), but it is worth confirming Flush() is genuinely cheap on an empty/already-flushed table to avoid idle-CPU overhead.
  • Second-opinion reviews produced no actionable output: codex-review.md reports 'No material findings' and cursor-review.md is empty. REVIEW_GUIDELINES.md is also empty, so no repo-specific standards were applied.
  • ShardingFactor is changed to 1 with the rationale that sharding mainly helps across multiple disks. If a future deployment uses multiple disks for the receipt store, this tuning would need to be revisited — consider whether this should eventually be config-driven rather than a hardcoded constant.

@cody-littley cody-littley left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are true only once I merge my branch, but IMO not a big deal since that's merging soon.

My only concern is that timing might not provide an ironclad crash safety guarantee. But this change by itself doesn't make the problem any worse, so no need to block it while we have this discussion.

Comment on lines +84 to +89
// littFlushInterval is roughly one flush per block at Giga throughput (a
// block is ~7ms), bounding crash loss to about a single block. Flushing this
// often is only cheap because litt flushes its keymap asynchronously, off the
// control loop; without that, a per-block flush regresses write throughput
// badly (≈-48% observed before the async keymap landed).
littFlushInterval = 5 * time.Millisecond

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we requiring one flush per block for crash recovery safety? If so, doing it via wall clock time seems a bit fragile.

@Kbhat1 Kbhat1 added this pull request to the merge queue Jun 29, 2026
Merged via the queue into main with commit bb69d38 Jun 29, 2026
76 of 79 checks passed
@Kbhat1 Kbhat1 deleted the littidx-flush-shard branch June 29, 2026 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants