chore(rust/sedona-spatial-join): Integrate bounding box sampler and spilling support into build side collector#542
Merged
Kontinuation merged 3 commits into apache:main from Jan 27, 2026
b54ba58 to
17a9261
Compare
paleolimbot
approved these changes
Jan 23, 2026
rust/sedona-spatial-join/src/exec.rs
Outdated
    filter.as_ref(),
    converted_from_hash_join,
)?;
let seed = fastrand::u64(0..0xFFFF);
Member
No need to do this here, but should this be configurable in some way or logged so we can reproduce failures?
Member
Author
I have added an option, sedona.spatial_join.debug.random_seed, for setting the random seed. The seed is also printed in the debug log.
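To illustrate why a configurable seed makes failures reproducible, here is a minimal sketch using only the standard library. It is not the code in this PR: `resolve_seed`, `SplitMix64`, and `draw` are hypothetical names, `Option<u64>` stands in for the value of sedona.spatial_join.debug.random_seed, and SplitMix64 stands in for the fastrand generator the crate actually uses.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical helper: use the configured seed when one is set,
/// otherwise pick an unpredictable one in the range 0..0xFFFF.
pub fn resolve_seed(configured: Option<u64>) -> u64 {
    match configured {
        Some(seed) => seed,
        // RandomState is randomly keyed, so an empty hash differs between runs.
        None => RandomState::new().build_hasher().finish() % 0xFFFF,
    }
}

/// Small deterministic generator (SplitMix64), a stand-in for fastrand.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical sampler: the same seed always yields the same sequence.
pub fn draw(seed: u64, count: usize) -> Vec<u64> {
    let mut rng = SplitMix64::new(seed);
    (0..count).map(|_| rng.next_u64()).collect()
}
```

With this shape, a failing run can be replayed by reading the seed from the debug log and passing it back as the configured value, since `draw` depends on nothing but the seed.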
geos = { workspace = true }
float_next_after = { workspace = true }
fastrand = { workspace = true }
log = { workspace = true }
Member
Are log and env_logger doing the same thing or do we need both of them?
Member
Author
We need both: log is the logging facade, and env_logger is the logging implementation we actually use.
Reference: https://docs.rs/log/latest/log/#available-logging-implementations
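The facade/implementation split can be sketched in miniature with only the standard library. This is an illustration of the pattern, not the actual API of the log or env_logger crates: the `Log` trait, `set_logger`, `debug`, and `CaptureLogger` below are simplified, hypothetical stand-ins.

```rust
use std::sync::{Mutex, OnceLock};

/// Facade side: library code depends only on this trait and on `debug`.
pub trait Log: Send + Sync {
    fn log(&self, level: &str, message: &str);
}

/// The single globally installed implementation, if any.
static LOGGER: OnceLock<Box<dyn Log>> = OnceLock::new();

/// Install an implementation once; returns false if one is already set.
pub fn set_logger(logger: Box<dyn Log>) -> bool {
    LOGGER.set(logger).is_ok()
}

/// Called by library code; a no-op until an implementation is installed.
pub fn debug(message: &str) {
    if let Some(logger) = LOGGER.get() {
        logger.log("DEBUG", message);
    }
}

/// Implementation side: the binary or test harness picks where records go.
/// This one collects formatted lines in memory.
pub static CAPTURED: Mutex<Vec<String>> = Mutex::new(Vec::new());

pub struct CaptureLogger;

impl Log for CaptureLogger {
    fn log(&self, level: &str, message: &str) {
        CAPTURED
            .lock()
            .unwrap()
            .push(format!("[{level}] {message}"));
    }
}
```

The library crate only ever calls the facade function, so it needs the facade dependency alone. The implementation is chosen and installed by whoever runs the code, which is why both dependencies show up.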
5bc89e1 to
52ec01e
Compare
paleolimbot
approved these changes
Jan 26, 2026
This is an intermediate step that integrates several pieces. The overall behavior is unchanged when DataFusion's memory limit is not set (the default). The collected bounding box samples are unused for now, and according to our tests the performance overhead of sampling boxes is negligible. Committing this won't drive the project into an unstable or unreleasable state.
The next step will add a partitioned spatial index provider and integrate the spatial partitioner into the main spatial join workflow, though it will effectively work on only a single partition for now. This will also be a non-breaking change.