perf(fts): use block max for or tail bounds #7435
Open
BubbleCal wants to merge 2 commits into
Xuanwo reviewed Jun 24, 2026
// A low-scoring tail can be the only iterator left in the current
// window. Move to the next window so a later high-scoring block is still
// reachable instead of ending the disjunction early.
self.update_max_scores(up_to + 1);
Collaborator
When this path advances a headless tail window, update_max_scores() derives the next up_to from tail.peek() in the no-head/no-lead case. The tail heap is ordered by upper bound, so if the top tail is already on its final block while a lower-bound tail still has later compressed blocks, up_to becomes TERMINATED_DOC_ID and the OR path stops before visiting those later blocks, dropping valid matches.
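The concern above can be reproduced in a toy model. The sketch below is not the lance-index code: the `Tail` struct, its fields, the `u64` doc id type, and both helper functions are hypothetical names introduced for illustration. It assumes a max-heap ordered by upper bound and shows how reading the next window only from `peek()` can yield `TERMINATED_DOC_ID` even though another tail still has a later block. Taking the minimum over all tails is shown as one possible remedy, not necessarily the fix this PR should adopt.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Assumed sentinel value; the real constant may be defined differently.
const TERMINATED_DOC_ID: u64 = u64::MAX;

/// Hypothetical tail entry: only the fields needed to show the hazard.
struct Tail {
    upper_bound: f32,
    /// First doc id of the next compressed block,
    /// or TERMINATED_DOC_ID when the iterator is on its final block.
    next_block_first_doc: u64,
}

// Order tails by upper bound only, so BinaryHeap::peek returns the
// tail with the highest upper bound.
impl Ord for Tail {
    fn cmp(&self, other: &Self) -> Ordering {
        self.upper_bound.total_cmp(&other.upper_bound)
    }
}
impl PartialOrd for Tail {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Tail {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Tail {}

/// Derives the next window start from the heap top only (the pattern under review).
fn next_window_from_peek(tail: &BinaryHeap<Tail>) -> u64 {
    tail.peek().map_or(TERMINATED_DOC_ID, |t| t.next_block_first_doc)
}

/// Derives the next window start from the earliest next block across all tails.
fn next_window_from_min(tail: &BinaryHeap<Tail>) -> u64 {
    tail.iter()
        .map(|t| t.next_block_first_doc)
        .min()
        .unwrap_or(TERMINATED_DOC_ID)
}

fn main() {
    let mut tail = BinaryHeap::new();
    // Highest upper bound, but already on its final block.
    tail.push(Tail { upper_bound: 5.0, next_block_first_doc: TERMINATED_DOC_ID });
    // Lower upper bound, with a later block starting at doc 256.
    tail.push(Tail { upper_bound: 1.0, next_block_first_doc: 256 });

    // The heap top hides the later block, so the disjunction would stop early.
    assert_eq!(next_window_from_peek(&tail), TERMINATED_DOC_ID);
    // Scanning every tail keeps the later block reachable.
    assert_eq!(next_window_from_min(&tail), 256);
}
```

Scanning the whole heap costs O(n) per window advance, which may or may not be acceptable on this path; that trade-off is left to the author.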
Performance Improvement
What is the performance issue or bottleneck?
When FTS OR WAND moved a lead posting back into tail, it assigned the posting list's full approximate upper bound. That made the tail bound too loose inside the active block-max window and reduced the pruning benefit available from existing per-block max-score metadata.
How does this PR improve performance?
When OR WAND moves a lead posting back into tail, it now uses the current block max as the tail upper bound if the active block window still covers the next target. If the window is expired or not applicable, it falls back to the posting list's approximate upper bound so later high-score blocks remain reachable.

The PR also advances a headless OR tail to the next compressed block window when needed, while avoiding no-progress advancement for plain postings.
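The bound selection described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `Posting` struct, its field names, the `tail_upper_bound` function, and the `Option<u64>` representation of the window are all assumptions made for the example.

```rust
/// Hypothetical, simplified view of a posting iterator; the fields are
/// illustrative and do not mirror the lance-index types.
struct Posting {
    /// Loose upper bound over the whole posting list.
    approx_upper_bound: f32,
    /// Max score of the block the iterator currently sits in.
    block_max_score: f32,
}

/// Picks the upper bound to record when a lead posting is moved back into tail.
/// `window_up_to` is the last doc id covered by the active block-max window,
/// or `None` when no window applies (for example, plain postings).
fn tail_upper_bound(p: &Posting, window_up_to: Option<u64>, next_target: u64) -> f32 {
    match window_up_to {
        // The window still covers the next target: the block max is a tighter bound.
        Some(up_to) if next_target <= up_to => p.block_max_score,
        // Window expired or not applicable: fall back to the list-wide bound
        // so later high-score blocks are not pruned away.
        _ => p.approx_upper_bound,
    }
}

fn main() {
    let p = Posting { approx_upper_bound: 9.0, block_max_score: 1.5 };
    // Next target inside the window: use the block max.
    assert_eq!(tail_upper_bound(&p, Some(127), 100), 1.5);
    // Next target past the window: use the approximate list-wide bound.
    assert_eq!(tail_upper_bound(&p, Some(127), 128), 9.0);
    // No window at all: use the approximate list-wide bound.
    assert_eq!(tail_upper_bound(&p, None, 100), 9.0);
}
```

The fallback branch is what keeps the tighter bound safe: a block max only bounds scores inside its own window, so it must not be reused once the next target lies beyond that window.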
This intentionally does not implement impacts skip / ImpactsDISI, shared-floor import, MaxScoreCache, or score-only WAND fast paths.
Benchmark context
Ablation comparison between main-style tail upper bounds and this OR tail block-max optimization:
Validation
All Cargo commands were run with an isolated target directory.
cargo fmt --all
git diff --check
cargo test -p lance-index scalar::inverted::wand::tests::test_or_ -- --nocapture (4 passed)
cargo test -p lance-index scalar::inverted::wand::tests -- --nocapture (34 passed)
cargo check -p lance-index --tests
cargo check --workspace --tests --benches
cargo clippy --all --tests --benches -- -D warnings