Skip to content

Window Functions Order Conservation -- Follow-up On Set Monotonicity #14813

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Feb 26, 2025

Conversation

berkaysynnada
Copy link
Contributor

@berkaysynnada berkaysynnada commented Feb 21, 2025

Which issue does this PR close?

Rationale for this change

#14271 had introduced set-monotonicity term for window functions. After that PR we have some false negatives for order conservation, and we don't have a fully complete test coverage for window functions in terms of set-monotonicity, partitioning type, frame type, and order of the function inputs.

What changes are included in this PR?

This PR enriches the ordering properties of WindowAggExec and BoundedWindowAggExec, and I've written a test function which covers all cases for a window function, both positive and negative cases.

Are these changes tested?

Yes

Are there any user-facing changes?

Order conservation in more cases

@berkaysynnada berkaysynnada marked this pull request as draft February 21, 2025 14:24
@github-actions github-actions bot added physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate logical-expr Logical plan and expressions labels Feb 21, 2025
@@ -222,208 +227,6 @@ async fn test_remove_unnecessary_sort5() -> Result<()> {
Ok(())
}

#[tokio::test]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage of these tests are preserved in test_window_partial_constant_and_set_monotonicity()

@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Feb 21, 2025
" BoundedWindowAggExec: wdw=[count: Ok(Field { name: \"count\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(NULL), end_bound: CurrentRow, is_causal: false }], mode=[Sorted]",
" SortExec: expr=[nullable_col@0 ASC, non_nullable_col@1 ASC], preserve_partitioning=[false]",
" DataSourceExec: partitions=1, partition_sizes=[0]"];
" SortExec: expr=[nullable_col@0 ASC, non_nullable_col@1 ASC], preserve_partitioning=[false]",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the reviewers, please double check here. While 2nd BoundedWindowAggExec requires [nullable_col, non_nullable_col] ordering, it is not guaranteed in the previous version. If I'm not missing something, now it is, with a newly added partial sort.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it seems like we had a bug before which is now fixed.

@@ -2280,3 +2086,1265 @@ async fn test_not_replaced_with_partial_sort_for_unbounded_input() -> Result<()>
assert_optimized!(expected_input, expected_no_change, physical_plan, true);
Ok(())
}

#[tokio::test]
async fn test_window_partial_constant_and_set_monotonicity() -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing this with rs-test seems hard to debug, and the relation between the arguments and initial-final plans is not very clear. So I prefer this

@@ -498,6 +697,15 @@ pub fn get_window_mode(
None
}

fn all_possible_sort_options(expr: Arc<dyn PhysicalExpr>) -> Vec<PhysicalSortExpr> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be init once maybe

@berkaysynnada berkaysynnada marked this pull request as ready for review February 23, 2025 18:11
@berkaysynnada
Copy link
Contributor Author

PTAL @ozankabak, @alamb

@github-actions github-actions bot added sql SQL Planner development-process Related to development process of DataFusion logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation labels Feb 25, 2025
@github-actions github-actions bot removed documentation Improvements or additions to documentation sql SQL Planner development-process Related to development process of DataFusion logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation labels Feb 25, 2025
Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed both the code and the tests carefully, and sent a commit for some improvements. This PR greatly improves the test coverage and fills some gaps in terms of set monotonicity optimizations.

I will wait for some more time before merging in case we get more eyes on this (which would be great).

@ozankabak
Copy link
Contributor

I will go ahead and merge this since it is a follow-up to a previously discussed (and reviewed) PR/feature. It would still be great to have some post-merge review on this when you have some time on your hands @alamb.

@ozankabak ozankabak merged commit ea0686b into apache:main Feb 26, 2025
24 checks passed
@alamb
Copy link
Contributor

alamb commented Feb 26, 2025

I will go ahead and merge this since it is a follow-up to a previously discussed (and reviewed) PR/feature. It would still be great to have some post-merge review on this when you have some time on your hands @alamb.

THanks -- I'll try to to look at it but I am currently quite tied up with triyng to get DataFusion 46 ready for release (and dealing with some unexpected fallout from #14224). May not be until next week

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules physical-expr Changes to the physical-expr crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants