Use real row-group sample to estimate partition size #20567
Conversation
How do things look after this PR?
This makes the partition size very accurate, because we sample a real row-group.
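To illustrate the idea, here is a minimal sketch of extrapolating a file's size from one sampled row group and deriving a partition count from it. It is not the cudf-polars implementation: the function names, the byte and row figures, and the target partition size are all hypothetical, chosen only to show the arithmetic.

```python
def estimate_file_bytes(sampled_bytes: int, sampled_rows: int, total_rows: int) -> int:
    """Extrapolate a file's size from a single sampled row group.

    Hypothetical helper: scales the measured bytes-per-row of the sampled
    row group up to the total number of rows in the file.
    """
    if sampled_rows <= 0:
        raise ValueError("sampled row group must contain at least one row")
    # Integer arithmetic avoids floating-point rounding on large sizes.
    return sampled_bytes * total_rows // sampled_rows


def partition_count(estimated_bytes: int, target_partition_bytes: int) -> int:
    """Number of partitions needed so that each stays at or below the target size."""
    if target_partition_bytes <= 0:
        raise ValueError("target partition size must be positive")
    # Ceiling division, with a floor of one partition.
    return max(1, -(-estimated_bytes // target_partition_bytes))


# Assumed numbers: a sampled row group of 1,000,000 rows occupying 64 MB,
# in a file with 10,000,000 rows, and a 256 MB target partition size.
size = estimate_file_bytes(64_000_000, 1_000_000, 10_000_000)
parts = partition_count(size, 256_000_000)
```

The closer the sampled row group is to the rest of the file, the closer the extrapolated size is to the real one, which is why sampling actual data can beat a metadata-only guess.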
/merge
Description
Updates cudf-polars row-group sampling to improve the ParquetSourceInfo.storage_size estimate used to make partitioning decisions.
Checklist