-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Feat: introduce ExecutionPlan::partition_statistics
API
#15852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @berkaysynnada PTAL, I didn't see any challenges during refactoring, the process is smooth. The tests are failing due to #15689 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @xudong963, thank you ! I’ll just ask about specifying deprecation notices for the obsolete API before merging.
b943802
to
9f28472
Compare
b1f701f
to
2144cf2
Compare
|
9d72f0e
to
e3a14ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @xudong963 and @berkaysynnada
I also think you should add a note to the upgrade guide for this change (basically people will need to implement partition_statistics
if they have implemented statistics
-- and I think the compiler may not flag the newly introduced method.
Something that might be a better UX for downstream users might be to NOT provide a default implementation of partition_statistics
and force them to implement it:
fn partition_statistics(&self, partition: Option<usize>) -> Result<Statistics>;
But then they would need some way to avoid the boiler plate code 🤔
ExecutionPlan::partition_statistics
API
e3a14ee
to
71777d4
Compare
Added to #15771
Because we marked the old statistics method as deprecated, the compiler will mention users and navigate them to the new method.
Yes, I've experienced this during upgrading DF45, the version introduced a new method for LexOrdering IIRC, and gave it a default implementation; the default isn't suitable for our case, and I didn't notice the new method; it took me a long time to debug. QAQ
Also agree, it's a trade-off, given that the compiler will metion as I said above and we'll also metion the change in the upgrade doc, life will be easier. Or after merging the PR, we can implement the new method in all places and then remove the default implemation. |
Let's get this one in -- it has been outstanding for too long. Thanks again @xudong963 and @berkaysynnada |
* save * save * save * functional way * fix sort * adding test * add tests * save * update * add PartitionedStatistics structure * use Arc * refine tests * save * resolve conflicts * use PartitionedStatistics * impl index and len for PartitionedStatistics * add test for cross join * fix clippy * Check the statistics_by_partition with real results * rebase main and fix cross join test * resolve conflicts * Feat: introduce partition statistics API * address comments * deprecated statistics API * rebase main and fix tests * fix
Which issue does this PR close?
statistics_by_partition
API toExecutionPlan
#15495statistics_by_partition API
to ExecutionPlan #15503Rationale for this change
statistics_by_partition API
to ExecutionPlan #15503What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?