Skip to content

CI Red: Fix union in view table test #15300

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 19, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 49 additions & 4 deletions datafusion/sqllogictest/test_files/union.slt
Original file line number Diff line number Diff line change
Expand Up @@ -896,7 +896,7 @@ INSERT INTO u2 VALUES (20, 20), (40, 40);

statement ok
CREATE VIEW v1 AS
SELECT y FROM u1 UNION ALL SELECT y FROM u2 ORDER BY y;
SELECT y FROM u1 UNION ALL SELECT y FROM u2 ORDER BY y;

query I
SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y;
Expand All @@ -907,11 +907,56 @@ SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y;
20
40

query TT
explain SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y;
----
logical_plan
01)Sort: y ASC NULLS LAST
02)--Union
03)----Projection: CAST(u1.y AS Int64) AS y
04)------TableScan: u1 projection=[y]
05)----TableScan: u2 projection=[y]
physical_plan
01)SortPreservingMergeExec: [y@0 ASC NULLS LAST]
02)--UnionExec
03)----SortExec: expr=[y@0 ASC NULLS LAST], preserve_partitioning=[true]
04)------ProjectionExec: expr=[CAST(y@0 AS Int64) as y]
05)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
06)----------DataSourceExec: partitions=1, partition_sizes=[1]
07)----SortExec: expr=[y@0 ASC NULLS LAST], preserve_partitioning=[false]
08)------DataSourceExec: partitions=1, partition_sizes=[1]

# optimize_subquery_sort in create_relation removes Sort so the result is not sorted.
Copy link
Contributor Author

@jayzhan211 jayzhan211 Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea why we have optimize plan rule optimize_subquery_sort in create_relation, I think we should move such rule to optimizer 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before #15201 - This plan doesn't meet the optimize rule condition because we didn't inline view table at this point, so the Sort is not removed.

Now - The view table inlined so Sort is removed by this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think removing Sort makes sense, but we need to move this rule to optimizer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree using ORDER BY in a view is meaningless.

query I
SELECT * FROM v1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this Select query, Postgre does not drop the Sort operation, FYI @jayzhan211 @jonahgao

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either is fine. A view is a type of relation, and a relation is unordered.
In the SQL Server doc, it states

The ORDER BY clause does not guarantee ordered results when the view is queried, unless ORDER BY is also specified in the query itself.

Postgres also does not drop the Sort operation in subqueries but optimize_subquery_sort does.

psql=> explain select * from (select from u1 order by x);
                                 QUERY PLAN
----------------------------------------------------------------------------
 Subquery Scan on unnamed_subquery  (cost=158.51..186.76 rows=2260 width=0)
   ->  Sort  (cost=158.51..164.16 rows=2260 width=4)
         Sort Key: u1.x
         ->  Seq Scan on u1  (cost=0.00..32.60 rows=2260 width=4)
(4 rows)

----
1
3
3
20
40
3
3
1

query TT
explain SELECT * FROM v1;
----
logical_plan
01)SubqueryAlias: v1
02)--Union
03)----Projection: CAST(u1.y AS Int64) AS y
04)------TableScan: u1 projection=[y]
05)----TableScan: u2 projection=[y]
physical_plan
01)UnionExec
02)--ProjectionExec: expr=[CAST(y@0 AS Int64) as y]
03)----RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
04)------DataSourceExec: partitions=1, partition_sizes=[1]
05)--DataSourceExec: partitions=1, partition_sizes=[1]

statement count 0
drop view v1;

statement count 0
drop table u1;

statement count 0
drop table u2;