perf: Cache num_output_rows in sort merge join to avoid O(n) recount#20478
Open
andygrove wants to merge 2 commits into apache:main
In the SMJ tight loop (join_partial), num_unfrozen_pairs() was called twice per iteration: once in the loop guard and once inside append_output_pair. This method iterated all chunks in output_indices and summed their lengths, making the loop O(batch_size * num_chunks) instead of O(batch_size). Add a num_output_rows field to StreamedBatch that is incremented on each append and reset on freeze, replacing the O(n) summation with an O(1) field read. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
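The idea in the commit message can be sketched in isolation. The following is a simplified stand-in, not the actual DataFusion code: the chunk type (`Vec<Vec<u64>>`), the `new_chunk` flag, and the helper names `count_by_summation` and `freeze` are assumptions made for illustration. It shows a counter that is bumped on each append and zeroed on freeze, next to the summation it replaces.

```rust
/// Simplified stand-in for the PR's `StreamedBatch`; the real struct has
/// more fields and a different chunk type (assumption for illustration).
struct StreamedBatch {
    /// Chunks of streamed-side row indices awaiting freeze (hypothetical layout).
    output_indices: Vec<Vec<u64>>,
    /// Cached number of unfrozen output pairs across all chunks.
    num_output_rows: usize,
}

impl StreamedBatch {
    fn new() -> Self {
        Self {
            output_indices: Vec::new(),
            num_output_rows: 0,
        }
    }

    /// The approach being replaced: walk every chunk and sum the lengths,
    /// which costs O(num_chunks) on every call.
    fn count_by_summation(&self) -> usize {
        self.output_indices.iter().map(|chunk| chunk.len()).sum()
    }

    /// Append one output pair and keep the cached count in sync.
    /// `new_chunk` is a hypothetical flag that starts a fresh chunk.
    fn append_output_pair(&mut self, streamed_idx: u64, new_chunk: bool) {
        if new_chunk || self.output_indices.is_empty() {
            self.output_indices.push(Vec::new());
        }
        if let Some(chunk) = self.output_indices.last_mut() {
            chunk.push(streamed_idx);
        }
        self.num_output_rows += 1; // O(1) bookkeeping instead of a recount
    }

    /// Freezing drops the pending indices, so the cached count goes to zero.
    fn freeze(&mut self) {
        self.output_indices.clear();
        self.num_output_rows = 0;
    }
}
```

In this sketch the loop guard would read `batch.num_output_rows` directly; the summation is kept only to show that both always agree.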
jonathanc-n
reviewed
Feb 23, 2026
Contributor
This looks good, some small comments.
    }

    /// Number of unfrozen output pairs in this streamed batch
    fn num_output_rows(&self) -> usize {
nit: I think this function can be removed; we can replace calls to num_unfrozen_pairs() with streamed_batch.num_output_rows, and add a small comment on the num_output_rows field stating that it represents unfrozen pairs.
    }

    self.streamed_batch.output_indices.clear();
    self.streamed_batch.num_output_rows = 0;
I think we should probably encapsulate this reset pattern into its own function? (self.reset() calls .clear() and sets num_output_rows = 0)
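One possible shape for the suggested helper, as a minimal sketch: the struct below is a simplified stand-in for StreamedBatch with assumed field types, and `reset` is a hypothetical method name taken from the comment above, not code from the PR.

```rust
/// Simplified stand-in for `StreamedBatch`; field types are assumptions.
#[derive(Default)]
struct StreamedBatch {
    /// Pending output index chunks (hypothetical layout).
    output_indices: Vec<Vec<u64>>,
    /// Number of unfrozen output pairs currently held in `output_indices`.
    num_output_rows: usize,
}

impl StreamedBatch {
    /// Hypothetical helper: clear the pending indices and the cached count
    /// in one place so the two cannot drift apart.
    fn reset(&mut self) {
        self.output_indices.clear();
        self.num_output_rows = 0;
    }
}
```

With a helper like this, the freeze path would call `self.streamed_batch.reset()` instead of repeating the two statements at each site.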
Which issue does this PR close?
N/A - performance optimization
Rationale for this change
In the SMJ tight loop (join_partial), num_unfrozen_pairs() was called twice per iteration: once in the loop guard and once inside append_output_pair. This method iterates all chunks in output_indices and sums their lengths, which is O(num_chunks). Over a full batch of batch_size iterations, this makes the inner loop O(batch_size * num_chunks) instead of O(batch_size).
What changes are included in this PR?
Add a num_output_rows field to StreamedBatch that is incremented on each append and reset on freeze, replacing the O(n) summation with an O(1) field read.
- Add a num_output_rows: usize field to StreamedBatch, initialized to 0
- Increment num_output_rows in append_output_pair() after each append
- num_output_rows() now returns the cached field directly
- Reset to 0 in freeze_streamed() when output_indices is cleared
- Remove the num_unfrozen_pairs parameter from append_output_pair() since it can now read self.num_output_rows directly
Are these changes tested?
Yes, all 48 existing sort_merge_join tests pass. This is a pure refactor of an internal counter with no behavioral change.
Performance
Very minor improvement.
Before
After
Are there any user-facing changes?
No.
🤖 Generated with Claude Code