refactor: split statistics sketch storage and add RocksDB diagnostics #312

Merged
KKould merged 4 commits into main from refactor/statistics_sketch on Mar 22, 2026

Conversation

Member

KKould commented on Mar 22, 2026

What problem does this PR solve?

  • The previous statistics layout stored the Count-Min Sketch too close to hot table/index data, which made RocksDB locality worse and showed up as a large TPCC regression.
  • Statistics cache misses were also repeatedly falling back to storage, including negative lookup cases, which made the regression harder to reason about.
  • We also lacked phase-level RocksDB diagnostics, so it was difficult to tell whether cache misses came from load, analyze, or the measured TPCC workload.

Issue link:

What is changed and how it works?

  • Split Count-Min Sketch persistence out of the statistics root and store it as sketch meta + sketch pages.
  • Keep histogram metadata and buckets as the eager statistics payload, and load sketch data lazily only when collect_count actually needs it.
  • Change StatisticsMetaCache to cache both positive entries and negative lookup results via Option, so repeated misses do not keep hitting storage.
  • Update ANALYZE, storage codecs, and transaction APIs to read/write the new statistics layout.
  • Add richer RocksDB metrics, including data/index/filter cache hit/miss counters and TPCC phase snapshots, so locality regressions are easier to diagnose.
  • Bump crate version to 0.1.8 for the next publish.
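The negative-caching change can be illustrated with a minimal sketch. This is not the actual StatisticsMetaCache code: the `MetaCache` type, its `(table, index id)` key, the `StatisticsMeta` struct, and the `storage_reads` counter are all hypothetical names chosen for illustration. The only idea taken from the description above is that the cache stores an Option, so a confirmed miss is remembered just like a hit.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the eager statistics payload
/// (histogram metadata and buckets).
#[derive(Clone, Debug, PartialEq)]
pub struct StatisticsMeta {
    pub histogram_buckets: Vec<u64>,
}

/// Toy cache. `Some(meta)` is a positive entry; `None` records that
/// storage was already asked and had nothing for this key.
pub struct MetaCache {
    entries: HashMap<(String, u32), Option<StatisticsMeta>>,
    /// Counts how often the loader (i.e. storage) was actually invoked.
    pub storage_reads: usize,
}

impl MetaCache {
    pub fn new() -> Self {
        MetaCache {
            entries: HashMap::new(),
            storage_reads: 0,
        }
    }

    /// Returns the cached result if the key was seen before (hit or miss);
    /// otherwise calls `load` once and caches whatever it returns.
    pub fn get_or_load<F>(&mut self, table: &str, index_id: u32, load: F) -> Option<StatisticsMeta>
    where
        F: FnOnce() -> Option<StatisticsMeta>,
    {
        let key = (table.to_string(), index_id);
        if let Some(cached) = self.entries.get(&key) {
            // Either a positive entry or a remembered negative lookup.
            return cached.clone();
        }
        self.storage_reads += 1;
        let loaded = load();
        self.entries.insert(key, loaded.clone());
        loaded
    }
}
```

With this shape, a second lookup for a table that has no statistics returns `None` straight from the map and never reaches the loader. A real cache would also need invalidation when ANALYZE writes new statistics, which this sketch leaves out.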

Code changes

  • Has Rust code change
  • Has CI related scripts change

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Manual test steps:

  • cargo test test_analyze --lib
  • cargo check -p tpcc -p kite_sql
  • cargo run -p tpcc --release -- --backend kite --measure-time 60 --num-ware 1 --rocksdb-stats --path /tmp/kitesql_tpcc_phase_metrics
    • Observed: TPCC throughput stayed in the restored ~24k TpmC range, and the phase metrics showed that cache misses during the measured phase were dominated by data blocks.
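To show how per-phase numbers can be derived, here is a small sketch that assumes the underlying cache counters are cumulative, so each phase is the difference between two snapshots. The `CacheCounters` struct and its field and method names are hypothetical and are not the metrics types added in this PR.

```rust
/// Hypothetical snapshot of cumulative block-cache counters.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct CacheCounters {
    pub data_hit: u64,
    pub data_miss: u64,
    pub index_hit: u64,
    pub index_miss: u64,
    pub filter_hit: u64,
    pub filter_miss: u64,
}

impl CacheCounters {
    /// Counters accumulated between `earlier` and `self`.
    /// Saturating subtraction keeps the result at zero if a counter was reset.
    pub fn delta_since(&self, earlier: &CacheCounters) -> CacheCounters {
        CacheCounters {
            data_hit: self.data_hit.saturating_sub(earlier.data_hit),
            data_miss: self.data_miss.saturating_sub(earlier.data_miss),
            index_hit: self.index_hit.saturating_sub(earlier.index_hit),
            index_miss: self.index_miss.saturating_sub(earlier.index_miss),
            filter_hit: self.filter_hit.saturating_sub(earlier.filter_hit),
            filter_miss: self.filter_miss.saturating_sub(earlier.filter_miss),
        }
    }

    /// Which block kind contributed the most misses in this window.
    /// Ties resolve in the order data, index, filter.
    pub fn dominant_miss_kind(&self) -> &'static str {
        if self.data_miss >= self.index_miss && self.data_miss >= self.filter_miss {
            "data"
        } else if self.index_miss >= self.filter_miss {
            "index"
        } else {
            "filter"
        }
    }
}
```

Taking one snapshot after load, one after analyze, and one after the measured run, then calling `delta_since` on consecutive pairs, attributes misses to the phase in which they occurred instead of to the whole process lifetime.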

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Note for reviewer

  • The core of this PR is the sketch-storage split and the cache/load path changes around statistics.
  • The RocksDB TPCC diagnostics are included to validate the locality impact of the refactor and to make future regressions easier to spot.
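For readers unfamiliar with the idea behind "sketch meta + sketch pages", the following is a rough sketch of one way a Count-Min Sketch can be split into fixed-size pages so that a point estimate only touches a few of them. The layout, the `SketchMeta` fields, the page length, and the use of `DefaultHasher` are all assumptions for illustration; the PR's actual on-disk format and hashing may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical sketch metadata, small enough to store separately from the counters.
pub struct SketchMeta {
    pub width: usize,
    pub depth: usize,
    /// Number of counters stored per page.
    pub page_len: usize,
}

/// Column chosen for `key` in a given row (one hash function per row, seeded by the row).
fn column(key: &str, row: usize, width: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    row.hash(&mut hasher);
    key.hash(&mut hasher);
    (hasher.finish() % width as u64) as usize
}

/// Maps a (row, column) cell to (page number, offset inside that page).
pub fn locate(meta: &SketchMeta, row: usize, col: usize) -> (usize, usize) {
    let flat = row * meta.width + col;
    (flat / meta.page_len, flat % meta.page_len)
}

/// Records one occurrence of `key` in the flat, row-major counter table.
pub fn add(meta: &SketchMeta, counters: &mut [u32], key: &str) {
    for row in 0..meta.depth {
        let col = column(key, row, meta.width);
        counters[row * meta.width + col] += 1;
    }
}

/// Splits the flat counter table into pages for persistence.
pub fn split_into_pages(counters: &[u32], page_len: usize) -> Vec<Vec<u32>> {
    counters.chunks(page_len).map(|chunk| chunk.to_vec()).collect()
}

/// Estimates the count of `key`, fetching at most `depth` pages through `load_page`.
/// The result is the minimum over rows, so it can overestimate but never underestimate.
pub fn estimate<F>(meta: &SketchMeta, key: &str, mut load_page: F) -> u32
where
    F: FnMut(usize) -> Vec<u32>,
{
    let mut best = u32::MAX; // stays at MAX only if depth is 0
    for row in 0..meta.depth {
        let (page_no, offset) = locate(meta, row, column(key, row, meta.width));
        let page = load_page(page_no);
        best = best.min(page[offset]);
    }
    best
}
```

Under this layout, an estimate reads at most `depth` pages rather than the whole sketch, which is the kind of lazy access that lets the histogram payload load eagerly while the sketch is fetched only when a count lookup such as collect_count needs it.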

KKould self-assigned this on Mar 22, 2026
KKould added the invalid label ("This doesn't seem right") on Mar 22, 2026
KKould merged commit fbdabef into main on Mar 22, 2026
13 of 14 checks passed