
refactor!: variable size writes for app storage #24

Merged
tac0turtle merged 4 commits into main from marko/variable_size
Mar 10, 2026

Conversation

Contributor

@tac0turtle tac0turtle commented Mar 10, 2026

Overview

Summary by CodeRabbit

  • Refactor

    • Switched from fixed-size value chunks to variable-size values (max 4092 bytes); key handling adjusted.
    • Commit flow now uses a write-path that runs pruning and sync before computing commit state.
    • Configuration schema updated with renamed fields and new log-related options; oversized-value error text clarified.
  • New Features

    • Background pruning worker with scheduling and retry integrated into commit path.
  • Tests

    • Tests updated for empty-value semantics, max-value size, and pruning behavior.


coderabbitai bot commented Mar 10, 2026

Warning

Rate limit exceeded

@tac0turtle has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 17 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8b15c24c-1779-4c43-b03a-b70217b25e1d

📥 Commits

Reviewing files that changed from the base of the PR and between 997b1b8 and 3700c04.

📒 Files selected for processing (1)
  • crates/storage/src/qmdb_impl.rs
📝 Walkthrough

Walkthrough

Replaces fixed-size storage chunks with variable Vec<u8> values, migrates QMDB config from FixedConfig to VariableConfig, adds a background prune worker with signaling and retry logic, and updates commit flow to perform prune/sync under a write lock before computing the commit hash.
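The commit-path ordering described above (take the write lock, signal the prune worker, sync, then compute the commit hash) can be sketched as follows. This is a hedged, synchronous stand-in with invented types — `Db`, `commit_state`, and the placeholder "hash" are illustrative, not the PR's actual async code:

```rust
use std::collections::BTreeMap;
use std::sync::{mpsc, RwLock};

// Hypothetical in-memory stand-in for the QMDB handle; all names here are
// illustrative, not the crate's real types.
struct Db {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
    synced: bool,
}

impl Db {
    fn sync(&mut self) {
        // A real implementation would flush journals to disk here.
        self.synced = true;
    }
    fn commit_hash(&self) -> u64 {
        // Placeholder "hash": entry count stands in for hashing the state.
        self.entries.len() as u64
    }
}

fn commit_state(db: &RwLock<Db>, prune_tx: &mpsc::Sender<()>) -> u64 {
    // Hold the write lock across prune signaling, sync, and hashing,
    // mirroring the ordering the walkthrough describes.
    let mut db = db.write().unwrap();
    let _ = prune_tx.send(()); // wake the background prune worker
    db.sync();                 // flush before computing the commit state
    db.commit_hash()
}
```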

Changes

  • Types & public constants (crates/storage/src/types.rs) — Removed the fixed-size value-chunk types and helpers; added MAX_VALUE_SIZE = 4092 and StorageKey = FixedBytes<...>. Value framing simplified to raw Vec<u8> with an explicit max size.
  • QMDB core implementation (crates/storage/src/qmdb_impl.rs) — Replaced StorageValueChunk with Vec<u8> across the DB alias, PreparedBatch.updates, and the decode/apply/prepare paths; changed the ValueTooLarge message text; updated internal get/commit paths to propagate raw bytes.
  • Config changes (crates/storage/src/qmdb_impl.rs) — Migrated FixedConfig → VariableConfig; renamed log_journal_partition → log_partition; added log_compression and log_codec_config (RangeCfg-based).
  • Pruning & background worker (crates/storage/src/qmdb_impl.rs) — Added prune_tx: UnboundedSender<()> and spawn_prune_worker(db) with schedule/retry constants (PRUNE_SCHEDULE_DELAY, PRUNE_RETRY_DELAY); integrated prune signaling into commit (write lock, prune + sync before hashing).
  • Tests & expectations (crates/storage/... (tests)) — Updated tests to treat empty values as Some(Vec::new()); adjusted assertions for MAX_VALUE_SIZE; added test_commit_prunes_inactive_history and other prune-related expectations.
  • Manifest/metadata (Cargo.toml) — Manifest references updated (+83/−30 lines) to reflect crate changes and new exports.
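The max-size constraint on variable values implies a write-time bounds check. A minimal sketch — the constant matches the summary above, but `validate_value` and the `StorageError` shape are invented names, not the crate's actual API:

```rust
// Sketch of the variable-size value check described above. MAX_VALUE_SIZE
// matches the PR summary; the function and error names are illustrative.
const MAX_VALUE_SIZE: usize = 4092;

#[derive(Debug, PartialEq)]
enum StorageError {
    ValueTooLarge { len: usize, max: usize },
}

fn validate_value(value: &[u8]) -> Result<(), StorageError> {
    if value.len() > MAX_VALUE_SIZE {
        return Err(StorageError::ValueTooLarge {
            len: value.len(),
            max: MAX_VALUE_SIZE,
        });
    }
    // Empty values remain valid: per the tests, reads of an empty value
    // return Some(Vec::new()) rather than None.
    Ok(())
}
```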

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant QmdbStorage
  participant PruneWorker
  participant Disk

  Client->>QmdbStorage: prepare_batch(updates: Vec<(Key, Option<Vec<u8>>)>) / apply_batch(...)
  Client->>QmdbStorage: commit_state()
  QmdbStorage->>QmdbStorage: acquire write lock
  QmdbStorage->>PruneWorker: send prune signal (prune_tx)
  PruneWorker-->>QmdbStorage: (async) prune scheduled/ack
  QmdbStorage->>Disk: sync()
  QmdbStorage->>QmdbStorage: compute commit hash
  QmdbStorage-->>Client: return commit result

  Note over PruneWorker,Disk: PruneWorker wakes on schedule or signal\nattempts prune -> retries on failure with PRUNE_RETRY_DELAY
  PruneWorker->>Disk: prune inactive history
  PruneWorker->>PruneWorker: retry loop on failure
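The worker behavior in the note above — wake on signal, attempt the lock without blocking, back off with PRUNE_RETRY_DELAY on contention — can be sketched with std threads. This is a simplified synchronous stand-in for the PR's async worker; the `Db` type and its `prune` body are invented for illustration:

```rust
use std::sync::{mpsc, Arc, RwLock};
use std::thread;
use std::time::Duration;

// The constant name follows the PR summary; the value is illustrative.
const PRUNE_RETRY_DELAY: Duration = Duration::from_millis(25);

struct Db {
    pruned: usize,
}

impl Db {
    fn prune(&mut self) {
        self.pruned += 1;
    }
}

fn spawn_prune_worker(
    db: Arc<RwLock<Db>>,
    prune_rx: mpsc::Receiver<()>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // Exit cleanly once every sender has been dropped.
        while prune_rx.recv().is_ok() {
            // Debounce: drain signals that piled up while we were busy.
            while prune_rx.try_recv().is_ok() {}
            loop {
                match db.try_write() {
                    Ok(mut db) => {
                        db.prune(); // a real worker would also sync afterwards
                        break;
                    }
                    // Lock contended (e.g. a commit holds it): back off and
                    // retry instead of blocking the write path.
                    Err(_) => thread::sleep(PRUNE_RETRY_DELAY),
                }
            }
        }
    })
}
```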

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I swapped my chunky burrow for nimble byte rows,
I prune old trails where quiet moss grows.
Commits lock, sync, then sparkle anew,
Little bytes hop, and history trims through. 🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 71.43%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'refactor!: variable size writes for app storage' directly and accurately describes the main change, which replaces fixed-size StorageValueChunk with variable-size Vec<u8> throughout the storage implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/storage/src/qmdb_impl.rs (1)

170-187: ⚠️ Potential issue | 🟠 Major

Add namespace isolation or format detection to prevent codec mismatch on reopen.

This change introduces variable-size value encoding (Vec<u8> with RangeCfg codec) while continuing to reuse the existing "evolve-state" partition prefix. Upgraded nodes will silently reopen state with a different codec, risking startup failure or silent data corruption if the legacy fixed-format journals are misinterpreted. Either bump the partition prefix to isolate the new format, add a format-version guard on initialization, or fail fast when detecting legacy data.

Additionally, commit_state() calls prune() before the fallible sync() call (lines 338-340). If the prune operation persists eagerly and sync subsequently fails, this violates commit atomicity. Verify whether QMDB's prune is idempotent and recoverable on failure, or reverse the order to sync first.

The pruning test test_commit_prunes_inactive_history verifies in-process bounds advancement but does not exercise the reopen-from-disk scenario where codec compatibility matters most.
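One way the format-version guard suggested above could look, as a hedged sketch: a version marker persisted next to the partitions is checked on open. The marker semantics and version numbers here are invented for illustration, not taken from QMDB:

```rust
// Hypothetical on-disk format guard: fail fast instead of misreading legacy
// fixed-size journals with the new variable-size codec.
const FORMAT_VERSION: u8 = 2; // 1 = fixed-size chunks, 2 = variable-size values

fn check_format(stored: Option<u8>) -> Result<(), String> {
    match stored {
        // Fresh directory: adopt the current format (and write the marker).
        None => Ok(()),
        Some(v) if v == FORMAT_VERSION => Ok(()),
        // Legacy (or future) data: refuse to reopen rather than corrupt it.
        Some(v) => Err(format!(
            "storage format v{v} is incompatible with v{FORMAT_VERSION}; refusing to open"
        )),
    }
}
```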

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/storage/src/qmdb_impl.rs` around lines 170 - 187, The VariableConfig
change introduces a variable-size codec for the existing "evolve-state"
partitions; to prevent silent codec mismatches on reopen, either bump the
partition prefix used in VariableConfig (e.g., change format of
log_partition/mmr_metadata_partition/grafted_mmr_metadata_partition), or add an
explicit on-disk format/version guard during initialization in the same module
that checks existing partition metadata and fails fast on incompatible codecs
(detect legacy fixed-size format and error), and update any migration logic
accordingly; also change commit_state() to preserve atomicity by performing
sync() before prune() (or prove and document that prune() is
idempotent/recoverable) so that a failed sync does not leave partially pruned
persistent state, and extend the pruning test
test_commit_prunes_inactive_history to include an on-disk reopen scenario to
verify codec compatibility and reopen behavior.
🧹 Nitpick comments (1)
crates/storage/src/qmdb_impl.rs (1)

1316-1379: Add a reopen step to this regression.

This proves in-process pruning, but the risky part of this refactor is the persisted format change. Reinitializing QmdbStorage from the same temp directory after commit_state() would catch codec/restart regressions that the current assertions miss.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/storage/src/qmdb_impl.rs` around lines 1316 - 1379, The test
test_commit_prunes_inactive_history currently verifies in-process pruning but
misses restart/codec regressions; after the second
storage.commit_state().await.unwrap(), drop the existing QmdbStorage instance
(let it go out of scope), then reinitialize a new QmdbStorage::new(context,
config).await.unwrap() pointed at the same TempDir and re-read the prune
boundary (db.bounds().await.start) and the keys via storage.get to assert
start_after > start_before, key-1 returns value-1-v2 and key-0 is None, ensuring
the persisted format and reopen behavior are validated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 81dafbfc-6a26-4c4b-be71-3bb7475170f1

📥 Commits

Reviewing files that changed from the base of the PR and between 3bd8799 and aeaf994.

📒 Files selected for processing (2)
  • crates/storage/src/qmdb_impl.rs
  • crates/storage/src/types.rs


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
crates/storage/src/qmdb_impl.rs (1)

143-177: Prune worker design is reasonable, but consider limiting retry attempts.

The background prune worker correctly:

  • Debounces multiple signals via try_recv() draining
  • Uses non-blocking try_write() to avoid deadlock with commit operations
  • Gracefully exits when all senders are dropped

However, the inner retry loop (lines 151-172) will retry indefinitely if the write lock is always contended. While the 25ms sleep prevents CPU spinning, consider adding a maximum retry count to prevent the worker from being stuck in extended contention scenarios.

♻️ Optional: Add retry limit to prevent indefinite retry (counter declared outside the loop so it is not reset each iteration)
+                const MAX_RETRIES: usize = 20; // ~500ms total at PRUNE_RETRY_DELAY
+                let mut retries = 0;
                 loop {
                     let mut db = match db.try_write() {
                         Ok(db) => db,
                         Err(_) => {
+                            retries += 1;
+                            if retries >= MAX_RETRIES {
+                                tracing::warn!("prune worker: gave up after {MAX_RETRIES} retries");
+                                break;
+                            }
                             while prune_rx.try_recv().is_ok() {}
                             tokio::time::sleep(PRUNE_RETRY_DELAY).await;
                             continue;
                         }
                     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/storage/src/qmdb_impl.rs` around lines 143 - 177, spawn_prune_worker’s
inner loop currently retries acquiring the write lock forever (using
db.try_write() with sleeps), which can cause the background worker to be stuck
under heavy contention; add a bounded retry mechanism: introduce a max retry
counter (e.g., MAX_PRUNE_WRITE_RETRIES) and increment it each time try_write()
returns Err, sleeping PRUNE_RETRY_DELAY between attempts, and if the counter
exceeds the max, log a warning/error and break out of the loop (or skip this
prune cycle) so the worker can continue processing future signals from prune_rx;
apply this change around the try_write() -> prune() -> sync() sequence in
spawn_prune_worker.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7f2517a4-060b-41db-9967-942fce8e188f

📥 Commits

Reviewing files that changed from the base of the PR and between aeaf994 and 79fe312.

📒 Files selected for processing (1)
  • crates/storage/src/qmdb_impl.rs

@tac0turtle tac0turtle merged commit fd1500f into main Mar 10, 2026
5 of 6 checks passed
@tac0turtle tac0turtle deleted the marko/variable_size branch March 10, 2026 18:28
