Skip to content

Refactor build_array_reader into a struct #7521

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 20, 2025

Conversation

alamb
Copy link
Contributor

@alamb alamb commented May 16, 2025

Note to reviewers

Looking at the whitespace blind diff (https://github.com/apache/arrow-rs/pull/7521/files?w=1) I think makes it easier to see what is changed in this PR (changed the plumbing of how parameters are passed around rather than any logic changes)

Which issue does this PR close?

Rationale for this change

I am trying to avoid potentially decoding arrays twice

Applying ArrowPredicate is sometimes slower than filtering afterwards. Part of the reason for this is that filter columns are decoded twice.

I want a way to inject a pre-calculated filter result into the record batch decoding machinery and one way I found that works well is to provide an ArrayBuilder instance that uses the cached result. I found it convenient to have a struct on which to hang the cache rather than a bunch of free functions

You can see how this is used here:

Also, even if we choose not to go with the result

What changes are included in this PR?

  • Factor the internal functions that create ArrayBuilders into a new struct

Are there any user-facing changes?

No, this code is entirely internal (e.g. this code is not public: https://docs.rs/parquet/latest/parquet/?search=array_builder)

@github-actions github-actions bot added the parquet Changes to the parquet crate label May 16, 2025
@alamb alamb force-pushed the alamb/extract_array_data_builder branch from 403a636 to 5f42aac Compare May 16, 2025 16:16
@alamb alamb force-pushed the alamb/extract_array_data_builder branch from 5f42aac to 9690125 Compare May 16, 2025 16:19
@alamb alamb changed the title Factor ArrayDataBuilder into a new struct Refactor ArrayDataBuilder into a new struct May 16, 2025
@alamb alamb changed the title Refactor ArrayDataBuilder into a new struct Refactor build_array_reader into a struct May 16, 2025
@@ -45,7 +45,7 @@ mod struct_array;
#[cfg(test)]
mod test_util;

pub use builder::build_array_reader;
pub(crate) use builder::ArrayReaderBuilder;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find the use of pub use in non pub modules confusing to reason about what is pub and what is not. I think using pub(crate) to make it explicit this is crate private makes it easier to understand

@alamb alamb marked this pull request as ready for review May 16, 2025 16:22
Copy link
Contributor

@zhuqi-lucas zhuqi-lucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thank you @alamb.

@alamb alamb merged commit 45bda04 into apache:main May 20, 2025
16 checks passed
@alamb
Copy link
Contributor Author

alamb commented May 20, 2025

Thanks for the review @zhuqi-lucas

@alamb alamb deleted the alamb/extract_array_data_builder branch May 20, 2025 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants