refactor!: variable size writes for app storage #24
Conversation
📝 Walkthrough

Replaces fixed-size storage chunks with variable-size writes for app storage.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant QmdbStorage
    participant PruneWorker
    participant Disk
    Client->>QmdbStorage: prepare_batch(updates: Vec<(Key, Option<Vec<u8>>)>) / apply_batch(...)
    Client->>QmdbStorage: commit_state()
    QmdbStorage->>QmdbStorage: acquire write lock
    QmdbStorage->>PruneWorker: send prune signal (prune_tx)
    PruneWorker-->>QmdbStorage: (async) prune scheduled/ack
    QmdbStorage->>Disk: sync()
    QmdbStorage->>QmdbStorage: compute commit hash
    QmdbStorage-->>Client: return commit result
    Note over PruneWorker,Disk: PruneWorker wakes on schedule or signal<br/>attempts prune -> retries on failure with PRUNE_RETRY_DELAY
    PruneWorker->>Disk: prune inactive history
    PruneWorker->>PruneWorker: retry loop on failure
```
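The commit-to-prune-worker handoff in the diagram hinges on a signal channel that the worker drains to debounce bursts of commits. A minimal std-only sketch of that shape (the names `prune_tx`/`prune_rx` follow the diagram; the real worker is async and also takes the write lock before pruning):

```rust
use std::sync::mpsc;
use std::thread;

// Worker loop: wake per signal, drain extras (debounce), "prune" once.
// Returns how many prune passes ran before all senders were dropped.
fn spawn_and_signal(signals: usize) -> usize {
    let (prune_tx, prune_rx) = mpsc::channel::<()>();
    let worker = thread::spawn(move || {
        let mut prunes = 0usize;
        // recv() errs once every sender is dropped, ending the worker.
        while prune_rx.recv().is_ok() {
            // Debounce: a burst of commits collapses into one prune pass.
            while prune_rx.try_recv().is_ok() {}
            prunes += 1; // stand-in for: try_write -> prune -> sync
        }
        prunes
    });
    for _ in 0..signals {
        prune_tx.send(()).unwrap(); // one signal per commit_state()
    }
    drop(prune_tx); // closing the channel lets the worker exit
    worker.join().unwrap()
}

fn main() {
    let prunes = spawn_and_signal(3);
    // Three rapid signals produce between one and three prune passes.
    assert!((1..=3).contains(&prunes));
    println!("prune passes: {prunes}");
}
```

The drain-then-prune ordering is what keeps repeated `commit_state()` calls from queueing redundant prune work.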
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/storage/src/qmdb_impl.rs (1)
Lines 170-187: ⚠️ Potential issue | 🟠 Major: Add namespace isolation or format detection to prevent codec mismatch on reopen.
This change introduces variable-size value encoding (`Vec<u8>` with a `RangeCfg` codec) while continuing to reuse the existing `"evolve-state"` partition prefix. Upgraded nodes will silently reopen state with a different codec, risking startup failure or silent data corruption if the legacy fixed-format journals are misinterpreted. Either bump the partition prefix to isolate the new format, add a format-version guard on initialization, or fail fast when detecting legacy data.

Additionally, `commit_state()` calls `prune()` before the fallible `sync()` call (lines 338-340). If the prune operation persists eagerly and sync subsequently fails, this violates commit atomicity. Verify whether QMDB's prune is idempotent and recoverable on failure, or reverse the order to sync first.

The pruning test `test_commit_prunes_inactive_history` verifies in-process bounds advancement but does not exercise the reopen-from-disk scenario where codec compatibility matters most.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/storage/src/qmdb_impl.rs` around lines 170 - 187, The VariableConfig change introduces a variable-size codec for the existing "evolve-state" partitions; to prevent silent codec mismatches on reopen, either bump the partition prefix used in VariableConfig (e.g., change format of log_partition/mmr_metadata_partition/grafted_mmr_metadata_partition), or add an explicit on-disk format/version guard during initialization in the same module that checks existing partition metadata and fails fast on incompatible codecs (detect legacy fixed-size format and error), and update any migration logic accordingly; also change commit_state() to preserve atomicity by performing sync() before prune() (or prove and document that prune() is idempotent/recoverable) so that a failed sync does not leave partially pruned persistent state, and extend the pruning test test_commit_prunes_inactive_history to include an on-disk reopen scenario to verify codec compatibility and reopen behavior.
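One way to implement the suggested fail-fast guard is a version marker stamped into the storage directory on first open. This is an illustrative sketch, not QMDB's API: the `FORMAT_VERSION` file name, the version numbers, and the plain-file layout are all assumptions.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

// Hypothetical marker: 1 = legacy fixed-size codec, 2 = variable-size codec.
const FORMAT_VERSION: u32 = 2;

// Stamp a fresh directory with the current version; fail fast when an
// existing directory was written by an incompatible codec.
fn check_or_init_format(dir: &Path) -> io::Result<()> {
    let marker = dir.join("FORMAT_VERSION");
    match fs::read_to_string(&marker) {
        Ok(s) => {
            let found: u32 = s.trim().parse().map_err(|_| {
                io::Error::new(ErrorKind::InvalidData, "unreadable format marker")
            })?;
            if found != FORMAT_VERSION {
                return Err(io::Error::new(
                    ErrorKind::InvalidData,
                    format!("storage format {found}, expected {FORMAT_VERSION}"),
                ));
            }
            Ok(())
        }
        // No marker yet: treat as a fresh store and stamp it.
        Err(e) if e.kind() == ErrorKind::NotFound => {
            fs::write(&marker, FORMAT_VERSION.to_string())
        }
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("evolve-state-demo");
    fs::create_dir_all(&dir)?;
    let _ = fs::remove_file(dir.join("FORMAT_VERSION"));
    check_or_init_format(&dir)?; // fresh: stamps version 2
    check_or_init_format(&dir)?; // reopen: marker matches
    fs::write(dir.join("FORMAT_VERSION"), "1")?; // simulate legacy data
    assert!(check_or_init_format(&dir).is_err()); // fail fast, no silent reopen
    println!("format guard ok");
    Ok(())
}
```

Bumping the partition prefix achieves the same isolation without the marker file, at the cost of silently orphaning legacy data rather than reporting it.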
🧹 Nitpick comments (1)
crates/storage/src/qmdb_impl.rs (1)
Lines 1316-1379: Add a reopen step to this regression test.

This proves in-process pruning, but the risky part of this refactor is the persisted format change. Reinitializing `QmdbStorage` from the same temp directory after `commit_state()` would catch codec/restart regressions that the current assertions miss.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/storage/src/qmdb_impl.rs` around lines 1316 - 1379, The test test_commit_prunes_inactive_history currently verifies in-process pruning but misses restart/codec regressions; after the second storage.commit_state().await.unwrap(), drop the existing QmdbStorage instance (let it go out of scope), then reinitialize a new QmdbStorage::new(context, config).await.unwrap() pointed at the same TempDir and re-read the prune boundary (db.bounds().await.start) and the keys via storage.get to assert start_after > start_before, key-1 returns value-1-v2 and key-0 is None, ensuring the persisted format and reopen behavior are validated.
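The drop-and-reopen shape the prompt describes can be shown with a toy file-backed store standing in for `QmdbStorage` (the `Store` type here is hypothetical; the real test would drop and reconstruct `QmdbStorage` against the same `TempDir` and re-check the prune boundary as well):

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Toy one-file-per-key store; only the open/drop/reopen shape matters here.
struct Store {
    dir: PathBuf,
}

impl Store {
    fn open(dir: &Path) -> io::Result<Self> {
        fs::create_dir_all(dir)?;
        Ok(Self { dir: dir.to_path_buf() })
    }
    fn put(&self, key: &str, value: &[u8]) -> io::Result<()> {
        fs::write(self.dir.join(key), value)
    }
    fn delete(&self, key: &str) -> io::Result<()> {
        fs::remove_file(self.dir.join(key))
    }
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        fs::read(self.dir.join(key)).ok()
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("reopen-demo");
    let _ = fs::remove_dir_all(&dir);
    {
        // First "session": two commits' worth of writes, then drop the
        // handle, mirroring letting QmdbStorage go out of scope.
        let store = Store::open(&dir)?;
        store.put("key-0", b"value-0")?;
        store.put("key-1", b"value-1")?;
        store.put("key-1", b"value-1-v2")?; // second commit updates key-1
        store.delete("key-0")?;             // and deletes key-0
    }
    // Second "session": reopen from the same directory and assert the
    // persisted view survived the restart.
    let store = Store::open(&dir)?;
    assert_eq!(store.get("key-1").as_deref(), Some(b"value-1-v2".as_slice()));
    assert!(store.get("key-0").is_none());
    println!("reopen ok");
    Ok(())
}
```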
ℹ️ Review info
⚙️ Run configuration: defaults | Review profile: CHILL | Plan: Pro
Run ID: 81dafbfc-6a26-4c4b-be71-3bb7475170f1
📒 Files selected for processing (2): crates/storage/src/qmdb_impl.rs, crates/storage/src/types.rs
🧹 Nitpick comments (1)
crates/storage/src/qmdb_impl.rs (1)
Lines 143-177: Prune worker design is reasonable, but consider limiting retry attempts.

The background prune worker correctly:

- Debounces multiple signals via `try_recv()` draining
- Uses non-blocking `try_write()` to avoid deadlock with commit operations
- Gracefully exits when all senders are dropped
However, the inner retry loop (lines 151-172) will retry indefinitely if the write lock is always contended. While the 25ms sleep prevents CPU spinning, consider adding a maximum retry count to prevent the worker from being stuck in extended contention scenarios.
♻️ Optional: Add retry limit to prevent indefinite retry
```diff
+const MAX_RETRIES: usize = 20; // ~500ms total
+let mut retries = 0;
 loop {
     let mut db = match db.try_write() {
         Ok(db) => db,
         Err(_) => {
+            retries += 1;
+            if retries >= MAX_RETRIES {
+                tracing::warn!("prune worker: gave up after {MAX_RETRIES} retries");
+                break;
+            }
             while prune_rx.try_recv().is_ok() {}
             tokio::time::sleep(PRUNE_RETRY_DELAY).await;
             continue;
         }
     };
```

(Note: the counter must live outside the `loop` so it is not reset by `continue`.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/storage/src/qmdb_impl.rs` around lines 143 - 177, spawn_prune_worker’s inner loop currently retries acquiring the write lock forever (using db.try_write() with sleeps), which can cause the background worker to be stuck under heavy contention; add a bounded retry mechanism: introduce a max retry counter (e.g., MAX_PRUNE_WRITE_RETRIES) and increment it each time try_write() returns Err, sleeping PRUNE_RETRY_DELAY between attempts, and if the counter exceeds the max, log a warning/error and break out of the loop (or skip this prune cycle) so the worker can continue processing future signals from prune_rx; apply this change around the try_write() -> prune() -> sync() sequence in spawn_prune_worker.
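The same bounded-retry idea as a runnable std-only sketch (the real worker uses tokio's async `RwLock` and sleep; `MAX_PRUNE_WRITE_RETRIES` and the `Vec<u64>` stand-in for the database are illustrative):

```rust
use std::sync::RwLock;
use std::thread;
use std::time::Duration;

const PRUNE_RETRY_DELAY: Duration = Duration::from_millis(25);
const MAX_PRUNE_WRITE_RETRIES: usize = 20; // ~500ms of contention, then bail

// Bounded retry: return true if the prune ran, false if this cycle was
// skipped so the worker can service the next signal instead of spinning.
fn try_prune(db: &RwLock<Vec<u64>>) -> bool {
    for _ in 0..MAX_PRUNE_WRITE_RETRIES {
        if let Ok(mut guard) = db.try_write() {
            guard.clear(); // stand-in for db.prune() + db.sync()
            return true;
        }
        thread::sleep(PRUNE_RETRY_DELAY);
    }
    eprintln!("prune worker: gave up after {MAX_PRUNE_WRITE_RETRIES} retries");
    false
}

fn main() {
    let db = RwLock::new(vec![1, 2, 3]);
    assert!(try_prune(&db)); // uncontended: prunes on the first attempt

    // Holding a read guard makes every try_write() fail, so the bounded
    // loop gives up instead of retrying forever.
    let reader = db.read().unwrap();
    assert!(!try_prune(&db));
    drop(reader);
    println!("bounded retry ok");
}
```

Returning `false` (rather than blocking) keeps a skipped cycle cheap: the next commit's signal simply triggers another attempt.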
ℹ️ Review info
⚙️ Run configuration: defaults | Review profile: CHILL | Plan: Pro
Run ID: 7f2517a4-060b-41db-9967-942fce8e188f
📒 Files selected for processing (1): crates/storage/src/qmdb_impl.rs
Overview
Summary by CodeRabbit
Refactor
New Features
Tests