-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Consolidate ParquetExec
tests in parquet_exec
integration test
#4130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -900,6 +900,8 @@ pub async fn plan_to_parquet( | |||
|
|||
#[cfg(test)] | |||
mod tests { | |||
// See also `parquet_exec` integration test | |||
|
|||
use super::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ted-Jiang / @tustvold / @thinkharderdev what do you think about moving the remaining tests in this module into the integration test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some of them may make use of crate private functions, otherwise I have no strong feeling either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 yes I think the pruning ones in particular do so -- I would likely need to refactor somehow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection from me. I think we can probably leave the pruning tests as unit tests here and just move the more "integration-y" tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the only real reason to consolidate them is that my sense of order is somewhat violated that the three kinds of parquet predicate pushdown are not in the same pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I will pursue the idea as a small follow on refactor
// under the License. | ||
|
||
/// Run all tests that are found in the `parquet` directory | ||
mod parquet; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that this has become a single integration test that runs rather than three independent ones (that each need to be compiled / linked / run serially)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable ! 👍
// under the License. | ||
|
||
/// Run all tests that are found in the `parquet` directory | ||
mod parquet; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable ! 👍
@@ -900,6 +900,8 @@ pub async fn plan_to_parquet( | |||
|
|||
#[cfg(test)] | |||
mod tests { | |||
// See also `parquet_exec` integration test | |||
|
|||
use super::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection from me. I think we can probably leave the pruning tests as unit tests here and just move the more "integration-y" tests?
Benchmark runs are scheduled for baseline = 175adbd and contender = ebc279c. ebc279c is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
re #3463
Rationale for this change
The parquet reader is both important and well tested in DataFusion. However, currently the tests are split in several places so it may not be all that clear what is covered.
Also, the parquet integration test currently takes 24 seconds so more parallelism would be better
What changes are included in this PR?
Changes:
Move most ParquetExec tests into the
parquet_exec
integration test:So now we can run the parquet tests like:
cargo test --test parquet_exec
Are there any user-facing changes?
No