feat(merge): adversarial-review test coverage (F4/F5/F7) + F14 sub-region engine fix#6428
Open
g-talbot wants to merge 1 commit into
Open
Conversation
2a3f09a to
edd45a6
Compare
6ed25aa to
c990bfa
Compare
34d634f to
a6c72d5
Compare
c990bfa to
3b90477
Compare
a6c72d5 to
2efe6c2
Compare
3b90477 to
814a8c7
Compare
2efe6c2 to
5f04604
Compare
814a8c7 to
435a4b5
Compare
Three test additions plus one engine fix surfaced by the F4 tests.
The existing MS-7 test proved the per-input page-cache bound for one
input and one region. F4 extends the coverage:
- `test_ms7_per_input_bound_across_num_inputs` sweeps `num_inputs ∈ {1, 3, 8}` × `rows_per_input ∈
{3 000, 30 000}` and asserts the per-input peak stays bounded. Cross-axis growth check: going
from 1 input to 8 must not push the peak up.
- `test_ms7_per_input_bound_across_sub_regions_does_not_scale_with_rows` runs the prefix_len=0
multi-output sub-region path at 3 000 vs 30 000 rows and asserts peak doesn't scale with input
row count. **This test surfaced F14 (below) — without the engine fix, the sub-region path's peak
grew ~9× when rows grew 10×.**
Tests serialize via `ms7_serial_lock` because
`PEAK_BODY_COL_PAGE_CACHE_LEN` is process-global; concurrent tests
would pollute each other's readings.
Parquet streams emit pages in column-major order (all of col 0,
then all of col 1, ...). The old sub-region-outer / col-inner
ordering meant that while processing sub-region 0's col K, the
stream emitted cols 0..K-1's remaining pages first to reach col K —
those skipped pages got cached under their own col_idx for later
sub-regions to consume, and the cache scaled with input row count.
Fix: new `process_split_region_col_outer` function for the
`needs_split` path. Cols iterate in the outer loop, sub-regions in
the inner. Each parquet col chunk is fully consumed from the stream
across all sub-regions before col K+1 starts. Cache for col K is
empty before col K+1's pages arrive.
Mechanics: pre-determine writer assignments for the region's
sub-regions (a top-level region's sub-regions may span multiple
output writers; consecutive sub-regions on the same writer get
coalesced into one combined Region so each writer holds one RG
concurrently — RGs on the same writer are sequential, so coalescing
keeps the parquet writer's single-active-RG constraint intact).
Single-region path stays on the existing `process_region`.
`prop_merge_prefix_aligned_streaming` sweeps `(num_inputs ∈ 1..=3,
per-input RG specs, num_outputs ∈ 1..=3)` with prefix_len=1 and
asserts MC-1 (rows preserved), MC-3 (sorted_series monotone within
each output), MS-3 (num_row_groups matches footer), PA-1+PA-3
(`assert_unique_rg_prefix_keys`), and CS-1 (metastore prefix_len ==
KV) on every generated case. 32 cases capped to keep runtime under
a second.
Fixture: `make_prefix_len_one_input` writes one RG per
`(metric_name, rows)` entry by calling `writer.flush()` between
batches. `sorted_series` encodes
`metric_base + row_offset_within_metric`, mirroring production's
storekey property that different metric_names produce
non-overlapping `sorted_series` byte ranges.
Plus a focused unit test `test_f5_single_input_two_metrics_minimal`
that pins one specific case for fast iteration.
`test_f7_production_shape_multi_input_multi_rg_multi_output`: 5
inputs × ~4 prefix-aligned RGs each × 4 outputs × prefix_len=1.
Asserts the full invariant bundle (MC-1, MS-3, PA-1+PA-3, MS-5
cross-output sorted_series monotonicity, CS-1) — the corner the
adversarial review flagged as "untested production case".
MS-5 is "across adjacent outputs, sorted_series is monotone
non-decreasing." A single metric CAN span outputs (the engine
splits at sorted_series transitions inside an overflowing region),
so the cross-output invariant is sorted_series monotonicity, not
"each metric in one output."
- `cargo test -p quickwit-parquet-engine --lib` — 498 unit tests pass.
- `cargo clippy -p quickwit-parquet-engine --tests --all-features` with `-Dwarnings`.
- `cargo doc --no-deps -p quickwit-parquet-engine` warning-free.
- `cargo fmt --all -- --check` (nightly via PATH override).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5f04604 to
6b31ac0
Compare
435a4b5 to
154defd
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #6423. Closes the remaining open code findings from the adversarial review (
.planning/reviews/2026-05-13-adversarial-review-parquet-merger.md).Summary
Three test additions plus one engine fix surfaced by the F4 tests:
F4: MS-7 generalized
test_ms7_per_input_bound_across_num_inputssweepsnum_inputs ∈ {1, 3, 8}×rows_per_input ∈ {3 000, 30 000}and asserts the per-input peak stays bounded.test_ms7_per_input_bound_across_sub_regions_does_not_scale_with_rowsruns the prefix_len=0 multi-output sub-region path at 3 000 vs 30 000 rows and asserts peak doesn't scale with input row count.The sub-region test surfaced F14 — without the engine fix, the sub-region path's peak grew ~9× when rows grew 10×.
F14: invert col/sub-region loop nesting
Parquet streams emit pages in column-major order (all of col 0, then all of col 1, ...). The old sub-region-outer / col-inner ordering meant that while processing sub-region 0's col K, the stream emitted cols 0..K-1's remaining pages first to reach col K — those skipped pages got cached under their own col_idx for later sub-regions to consume. The cache scaled with input row count.
Fix: new
process_split_region_col_outerfunction for theneeds_splitpath. Cols iterate in the outer loop, sub-regions in the inner. Each parquet col chunk is fully consumed from the stream across all sub-regions before col K+1 starts. Single-region path stays on the existingprocess_regionunchanged.Mechanics: pre-determine writer assignments for the region's sub-regions (sub-regions can span multiple output writers; consecutive sub-regions on the same writer get coalesced into one combined Region so each writer holds one concurrent RG — RGs on the same writer are sequential per parquet's single-active-RG constraint).
F5: prefix-aware proptest
prop_merge_prefix_aligned_streamingsweeps(num_inputs ∈ 1..=3, per-input RG specs, num_outputs ∈ 1..=3)with prefix_len=1. Asserts on every generated case:MergeOutputFile.num_row_groupsmatches footer.assert_unique_rg_prefix_keyspasses.MergeOutputFile.output_rg_partition_prefix_lenmatches on-disk KV.32 cases capped to keep runtime tight. Fixture honors the storekey property: different metric_names produce non-overlapping
sorted_seriesbyte ranges, same(metric, row_offset)across inputs gets the samesorted_series(the realistic "same series in multiple splits" case).F7: production-shape integration test
test_f7_production_shape_multi_input_multi_rg_multi_output: 5 inputs × ~4 prefix-aligned RGs each × 4 outputs × prefix_len=1, with overlap (every metric appears in 2-3 of the 5 inputs). Asserts MC-1, MS-3, PA-1+PA-3, MS-5 (cross-output sorted_series monotonicity — a single metric CAN span outputs when its region overflows the per-output budget; the cross-output invariant is sorted_series monotonicity, not "each metric in one output"), and CS-1.Test plan
cargo test -p quickwit-parquet-engine --lib— 498 unit tests pass (was 495 on feat(merge): legacy promotion path + body-col schema evolution #6423; added test_ms7_per_input_bound_across_num_inputs, test_ms7_per_input_bound_across_sub_regions_does_not_scale_with_rows, prop_merge_prefix_aligned_streaming, test_f5_single_input_two_metrics_minimal, test_f7_production_shape_multi_input_multi_rg_multi_output).cargo clippy -p quickwit-parquet-engine --tests --all-featureswith-Dwarnings.cargo fmt --all -- --check(nightly).cargo doc --no-deps -p quickwit-parquet-enginewarning-free.Status of the adversarial review
After this PR, the remaining open work from the original review is just the spec docs (PA/MS/CS supplement to ADR-002). All code findings (F1, F2, F4, F6–F14) are closed across PRs #6424, #6425, #6426, #6423, and this one.
🤖 Generated with Claude Code