
refactor!: variable size writes for app storage #24

Merged
tac0turtle merged 4 commits into main from marko/variable_size
Mar 10, 2026

Conversation

Contributor

@tac0turtle tac0turtle commented Mar 10, 2026

Overview

Summary by CodeRabbit

  • Refactor

    • Switched from fixed-size value chunks to variable-size values (max 4092 bytes); key handling adjusted.
    • Commit flow now uses a write-path that runs pruning and sync before computing commit state.
    • Configuration schema updated with renamed fields and new log-related options; oversized-value error text clarified.
  • New Features

    • Background pruning worker with scheduling and retry integrated into commit path.
  • Tests

    • Tests updated for empty-value semantics, max-value size, and pruning behavior.


coderabbitai bot commented Mar 10, 2026

Warning

Rate limit exceeded

@tac0turtle has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 17 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8b15c24c-1779-4c43-b03a-b70217b25e1d

📥 Commits

Reviewing files that changed from the base of the PR and between 997b1b8 and 3700c04.

📒 Files selected for processing (1)
  • crates/storage/src/qmdb_impl.rs
📝 Walkthrough

Walkthrough

Replaces fixed-size storage chunks with variable Vec<u8> values, migrates QMDB config from FixedConfig to VariableConfig, adds a background prune worker with signaling and retry logic, and updates commit flow to perform prune/sync under a write lock before computing the commit hash.
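The commit-path ordering described above (take the write lock, signal the prune worker, sync, then compute the commit hash) can be sketched as follows. This is a hedged, synchronous stand-in with invented types — `Db`, `commit_state`, and the placeholder "hash" are illustrative, not the PR's actual async code:

```rust
use std::collections::BTreeMap;
use std::sync::{mpsc, RwLock};

// Hypothetical in-memory stand-in for the QMDB handle; all names here are
// illustrative, not the crate's real types.
struct Db {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
    synced: bool,
}

impl Db {
    fn sync(&mut self) {
        // A real implementation would flush journals to disk here.
        self.synced = true;
    }
    fn commit_hash(&self) -> u64 {
        // Placeholder "hash": entry count stands in for hashing the state.
        self.entries.len() as u64
    }
}

fn commit_state(db: &RwLock<Db>, prune_tx: &mpsc::Sender<()>) -> u64 {
    // Hold the write lock across prune signaling, sync, and hashing,
    // mirroring the ordering the walkthrough describes.
    let mut db = db.write().unwrap();
    let _ = prune_tx.send(()); // wake the background prune worker
    db.sync();                 // flush before computing the commit state
    db.commit_hash()
}
```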

Changes

  • Types & public constants (crates/storage/src/types.rs) — Removed the fixed-size value-chunk types and helpers; added MAX_VALUE_SIZE = 4092 and StorageKey = FixedBytes<...>. Value framing simplified to raw Vec<u8> with an explicit max size.
  • QMDB core implementation (crates/storage/src/qmdb_impl.rs) — Replaced StorageValueChunk with Vec<u8> across the DB alias, PreparedBatch.updates, and the decode/apply/prepare paths; changed the ValueTooLarge message text; updated internal get/commit paths to propagate raw bytes.
  • Config changes (crates/storage/src/qmdb_impl.rs) — Migrated FixedConfig → VariableConfig; renamed log_journal_partition → log_partition; added log_compression and log_codec_config (RangeCfg-based).
  • Pruning & background worker (crates/storage/src/qmdb_impl.rs) — Added prune_tx: UnboundedSender<()> and spawn_prune_worker(db) with schedule/retry constants (PRUNE_SCHEDULE_DELAY, PRUNE_RETRY_DELAY); integrated prune signaling into commit (write lock, prune + sync before hashing).
  • Tests & expectations (crates/storage/... (tests)) — Updated tests to treat empty values as Some(Vec::new()); adjusted assertions for MAX_VALUE_SIZE; added test_commit_prunes_inactive_history and other prune-related expectations.
  • Manifest/metadata (Cargo.toml) — Manifest references updated (+83/−30 lines) to reflect crate changes and new exports.
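The max-size constraint on variable values implies a write-time bounds check. A minimal sketch — the constant matches the summary above, but `validate_value` and the `StorageError` shape are invented names, not the crate's actual API:

```rust
// Sketch of the variable-size value check described above. MAX_VALUE_SIZE
// matches the PR summary; the function and error names are illustrative.
const MAX_VALUE_SIZE: usize = 4092;

#[derive(Debug, PartialEq)]
enum StorageError {
    ValueTooLarge { len: usize, max: usize },
}

fn validate_value(value: &[u8]) -> Result<(), StorageError> {
    if value.len() > MAX_VALUE_SIZE {
        return Err(StorageError::ValueTooLarge {
            len: value.len(),
            max: MAX_VALUE_SIZE,
        });
    }
    // Empty values remain valid: per the tests, reads of an empty value
    // return Some(Vec::new()) rather than None.
    Ok(())
}
```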

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant QmdbStorage
  participant PruneWorker
  participant Disk

  Client->>QmdbStorage: prepare_batch(updates: Vec<(Key, Option<Vec<u8>>)>) / apply_batch(...)
  Client->>QmdbStorage: commit_state()
  QmdbStorage->>QmdbStorage: acquire write lock
  QmdbStorage->>PruneWorker: send prune signal (prune_tx)
  PruneWorker-->>QmdbStorage: (async) prune scheduled/ack
  QmdbStorage->>Disk: sync()
  QmdbStorage->>QmdbStorage: compute commit hash
  QmdbStorage-->>Client: return commit result

  Note over PruneWorker,Disk: PruneWorker wakes on schedule or signal\nattempts prune -> retries on failure with PRUNE_RETRY_DELAY
  PruneWorker->>Disk: prune inactive history
  PruneWorker->>PruneWorker: retry loop on failure
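The worker behavior in the note above — wake on signal, attempt the lock without blocking, back off with PRUNE_RETRY_DELAY on contention — can be sketched with std threads. This is a simplified synchronous stand-in for the PR's async worker; the `Db` type and its `prune` body are invented for illustration:

```rust
use std::sync::{mpsc, Arc, RwLock};
use std::thread;
use std::time::Duration;

// The constant name follows the PR summary; the value is illustrative.
const PRUNE_RETRY_DELAY: Duration = Duration::from_millis(25);

struct Db {
    pruned: usize,
}

impl Db {
    fn prune(&mut self) {
        self.pruned += 1;
    }
}

fn spawn_prune_worker(
    db: Arc<RwLock<Db>>,
    prune_rx: mpsc::Receiver<()>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // Exit cleanly once every sender has been dropped.
        while prune_rx.recv().is_ok() {
            // Debounce: drain signals that piled up while we were busy.
            while prune_rx.try_recv().is_ok() {}
            loop {
                match db.try_write() {
                    Ok(mut db) => {
                        db.prune(); // a real worker would also sync afterwards
                        break;
                    }
                    // Lock contended (e.g. a commit holds it): back off and
                    // retry instead of blocking the write path.
                    Err(_) => thread::sleep(PRUNE_RETRY_DELAY),
                }
            }
        }
    })
}
```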

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I swapped my chunky burrow for nimble byte rows,
I prune old trails where quiet moss grows.
Commits lock, sync, then sparkle anew,
Little bytes hop, and history trims through. 🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 71.43%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'refactor!: variable size writes for app storage' directly and accurately describes the main change, which replaces fixed-size StorageValueChunk with variable-size Vec<u8> throughout the storage implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/storage/src/qmdb_impl.rs (1)

170-187: ⚠️ Potential issue | 🟠 Major

Add namespace isolation or format detection to prevent codec mismatch on reopen.

This change introduces variable-size value encoding (Vec<u8> with RangeCfg codec) while continuing to reuse the existing "evolve-state" partition prefix. Upgraded nodes will silently reopen state with a different codec, risking startup failure or silent data corruption if the legacy fixed-format journals are misinterpreted. Either bump the partition prefix to isolate the new format, add a format-version guard on initialization, or fail fast when detecting legacy data.

Additionally, commit_state() calls prune() before the fallible sync() call (lines 338-340). If the prune operation persists eagerly and sync subsequently fails, this violates commit atomicity. Verify whether QMDB's prune is idempotent and recoverable on failure, or reverse the order to sync first.

The pruning test test_commit_prunes_inactive_history verifies in-process bounds advancement but does not exercise the reopen-from-disk scenario where codec compatibility matters most.
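One way the format-version guard suggested above could look, as a hedged sketch: a version marker persisted next to the partitions is checked on open. The marker semantics and version numbers here are invented for illustration, not taken from QMDB:

```rust
// Hypothetical on-disk format guard: fail fast instead of misreading legacy
// fixed-size journals with the new variable-size codec.
const FORMAT_VERSION: u8 = 2; // 1 = fixed-size chunks, 2 = variable-size values

fn check_format(stored: Option<u8>) -> Result<(), String> {
    match stored {
        // Fresh directory: adopt the current format (and write the marker).
        None => Ok(()),
        Some(v) if v == FORMAT_VERSION => Ok(()),
        // Legacy (or future) data: refuse to reopen rather than corrupt it.
        Some(v) => Err(format!(
            "storage format v{v} is incompatible with v{FORMAT_VERSION}; refusing to open"
        )),
    }
}
```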

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/storage/src/qmdb_impl.rs` around lines 170 - 187, The VariableConfig
change introduces a variable-size codec for the existing "evolve-state"
partitions; to prevent silent codec mismatches on reopen, either bump the
partition prefix used in VariableConfig (e.g., change format of
log_partition/mmr_metadata_partition/grafted_mmr_metadata_partition), or add an
explicit on-disk format/version guard during initialization in the same module
that checks existing partition metadata and fails fast on incompatible codecs
(detect legacy fixed-size format and error), and update any migration logic
accordingly; also change commit_state() to preserve atomicity by performing
sync() before prune() (or prove and document that prune() is
idempotent/recoverable) so that a failed sync does not leave partially pruned
persistent state, and extend the pruning test
test_commit_prunes_inactive_history to include an on-disk reopen scenario to
verify codec compatibility and reopen behavior.
🧹 Nitpick comments (1)
crates/storage/src/qmdb_impl.rs (1)

1316-1379: Add a reopen step to this regression.

This proves in-process pruning, but the risky part of this refactor is the persisted format change. Reinitializing QmdbStorage from the same temp directory after commit_state() would catch codec/restart regressions that the current assertions miss.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/storage/src/qmdb_impl.rs` around lines 1316 - 1379, The test
test_commit_prunes_inactive_history currently verifies in-process pruning but
misses restart/codec regressions; after the second
storage.commit_state().await.unwrap(), drop the existing QmdbStorage instance
(let it go out of scope), then reinitialize a new QmdbStorage::new(context,
config).await.unwrap() pointed at the same TempDir and re-read the prune
boundary (db.bounds().await.start) and the keys via storage.get to assert
start_after > start_before, key-1 returns value-1-v2 and key-0 is None, ensuring
the persisted format and reopen behavior are validated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 81dafbfc-6a26-4c4b-be71-3bb7475170f1

📥 Commits

Reviewing files that changed from the base of the PR and between 3bd8799 and aeaf994.

📒 Files selected for processing (2)
  • crates/storage/src/qmdb_impl.rs
  • crates/storage/src/types.rs


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
crates/storage/src/qmdb_impl.rs (1)

143-177: Prune worker design is reasonable, but consider limiting retry attempts.

The background prune worker correctly:

  • Debounces multiple signals via try_recv() draining
  • Uses non-blocking try_write() to avoid deadlock with commit operations
  • Gracefully exits when all senders are dropped

However, the inner retry loop (lines 151-172) will retry indefinitely if the write lock is always contended. While the 25ms sleep prevents CPU spinning, consider adding a maximum retry count to prevent the worker from being stuck in extended contention scenarios.

♻️ Optional: Add retry limit to prevent indefinite retry (counter declared outside the loop so it is not reset each iteration)
+                const MAX_RETRIES: usize = 20; // ~500ms total at PRUNE_RETRY_DELAY
+                let mut retries = 0;
                 loop {
                     let mut db = match db.try_write() {
                         Ok(db) => db,
                         Err(_) => {
+                            retries += 1;
+                            if retries >= MAX_RETRIES {
+                                tracing::warn!("prune worker: gave up after {MAX_RETRIES} retries");
+                                break;
+                            }
                             while prune_rx.try_recv().is_ok() {}
                             tokio::time::sleep(PRUNE_RETRY_DELAY).await;
                             continue;
                         }
                     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/storage/src/qmdb_impl.rs` around lines 143 - 177, spawn_prune_worker’s
inner loop currently retries acquiring the write lock forever (using
db.try_write() with sleeps), which can cause the background worker to be stuck
under heavy contention; add a bounded retry mechanism: introduce a max retry
counter (e.g., MAX_PRUNE_WRITE_RETRIES) and increment it each time try_write()
returns Err, sleeping PRUNE_RETRY_DELAY between attempts, and if the counter
exceeds the max, log a warning/error and break out of the loop (or skip this
prune cycle) so the worker can continue processing future signals from prune_rx;
apply this change around the try_write() -> prune() -> sync() sequence in
spawn_prune_worker.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7f2517a4-060b-41db-9967-942fce8e188f

📥 Commits

Reviewing files that changed from the base of the PR and between aeaf994 and 79fe312.

📒 Files selected for processing (1)
  • crates/storage/src/qmdb_impl.rs

@tac0turtle tac0turtle merged commit fd1500f into main Mar 10, 2026
5 of 6 checks passed
@tac0turtle tac0turtle deleted the marko/variable_size branch March 10, 2026 18:28
