Open
Labels
bug (Something isn't working)
Description
Describe the bug
When common subexpression elimination deduplicates aggregations, it can generate aliases of the form __common_expr_<n> for the common expression. The logical plan explain output prints these as <original expr> AS __common_expr_<n>. The physical plan explain output, however, prints only __common_expr_<n>, so the expression behind the alias is no longer visible. This makes the explain output hard to interpret.
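To make the aliasing step concrete, here is a minimal sketch of how duplicate aggregate expressions might be collapsed under a generated alias. It is a toy model, not DataFusion's implementation: `AliasedExpr` and `eliminate_common` are hypothetical names, and expressions are compared as plain strings.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an aliased expression such as `sum(column1) AS agg`.
#[derive(Debug, Clone, PartialEq)]
struct AliasedExpr {
    expr: String,
    alias: String,
}

/// Toy deduplication: every expression that occurs more than once is computed
/// a single time under a generated `__common_expr_<n>` alias, and an outer
/// projection maps that alias back to the names the user asked for.
/// Returns `(deduplicated aggregates, outer projection)`.
fn eliminate_common(aggs: &[AliasedExpr]) -> (Vec<AliasedExpr>, Vec<AliasedExpr>) {
    // Count how often each expression text occurs.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for a in aggs {
        *counts.entry(a.expr.as_str()).or_insert(0) += 1;
    }

    // Maps a repeated expression to its generated alias.
    let mut common: HashMap<&str, String> = HashMap::new();
    let mut new_aggs = Vec::new();
    let mut projection = Vec::new();

    for a in aggs {
        let key = a.expr.as_str();
        if counts[key] > 1 {
            let is_new = !common.contains_key(key);
            let next_id = common.len() + 1;
            let name = common
                .entry(key)
                .or_insert_with(|| format!("__common_expr_{next_id}"))
                .clone();
            if is_new {
                // First occurrence: keep the real expression, under the generated alias.
                new_aggs.push(AliasedExpr { expr: a.expr.clone(), alias: name.clone() });
            }
            // Every occurrence becomes a reference to the generated alias.
            projection.push(AliasedExpr { expr: name, alias: a.alias.clone() });
        } else {
            // Unique expressions pass through unchanged.
            new_aggs.push(a.clone());
            projection.push(AliasedExpr { expr: a.alias.clone(), alias: a.alias.clone() });
        }
    }
    (new_aggs, projection)
}
```

Applied to `sum(column1) AS agg` and `sum(column1) AS ord`, this yields a single aggregate `sum(column1) AS __common_expr_1` plus a projection that references `__common_expr_1` twice, which mirrors the shape of the optimized plan in this report.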
To Reproduce
Here's an example logical plan constructed using the DataFrame API. The problematic line is
AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
Logical plan
============
Projection: idx, agg, ord
  Aggregate: groupBy=[[idx]], aggr=[[sum(column1) AS agg, sum(column1) AS ord]]
    Projection: column1, column2, CASE WHEN column2 <= Int64(0) THEN Int64(0) WHEN column2 <= Int64(200) THEN Int64(1) WHEN column2 <= Int64(314) THEN Int64(3) ELSE Int64(4) END AS idx
      Values: (Int64(1), Int64(100)), (Int64(2), Int64(200)), (Int64(3), Int64(314))
Optimized logical plan
======================
Projection: idx, __common_expr_1 AS agg, __common_expr_1 AS ord
  Aggregate: groupBy=[[idx]], aggr=[[sum(column1) AS __common_expr_1]]
    Projection: column1, CASE WHEN column2 <= Int64(0) THEN Int64(0) WHEN column2 <= Int64(200) THEN Int64(1) WHEN column2 <= Int64(314) THEN Int64(3) ELSE Int64(4) END AS idx
      Values: (Int64(1), Int64(100)), (Int64(2), Int64(200)), (Int64(3), Int64(314))
Physical plan
=============
ProjectionExec: expr=[idx@0 as idx, __common_expr_1@1 as agg, __common_expr_1@1 as ord]
  AggregateExec: mode=FinalPartitioned, gby=[idx@0 as idx], aggr=[__common_expr_1]
    RepartitionExec: partitioning=Hash([idx@0], 10), input_partitions=1
      AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
        ProjectionExec: expr=[column1@0 as column1, CASE WHEN column2@1 <= 0 THEN 0 WHEN column2@1 <= 200 THEN 1 WHEN column2@1 <= 314 THEN 3 ELSE 4 END as idx]
          DataSourceExec: partitions=1, partition_sizes=[1]
Expected behavior
Rather than
AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[__common_expr_1]
the explain output should show
AggregateExec: mode=Partial, gby=[idx@1 as idx], aggr=[sum(column1@0) as __common_expr_1]
similar to how the group-by expressions are printed.
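The difference between the current and the requested output can be sketched with a small formatter. This is only an illustration under stated assumptions: `AggExpr`, `fmt_aggr_current`, and `fmt_aggr_proposed` are hypothetical names and do not correspond to DataFusion's actual display code, and the expression is held as pre-rendered text.

```rust
/// Hypothetical, simplified stand-in for a physical aggregate expression:
/// `name` is the output column name, `expr` the rendered expression text.
struct AggExpr {
    name: String,
    expr: String,
}

/// Mimics the output reported in this issue: only the name is printed.
fn fmt_aggr_current(aggs: &[AggExpr]) -> String {
    let names: Vec<&str> = aggs.iter().map(|a| a.name.as_str()).collect();
    format!("aggr=[{}]", names.join(", "))
}

/// Mimics the requested output: `<expr> as <name>`, in the same style
/// the group-by expressions use.
fn fmt_aggr_proposed(aggs: &[AggExpr]) -> String {
    let items: Vec<String> = aggs
        .iter()
        .map(|a| format!("{} as {}", a.expr, a.name))
        .collect();
    format!("aggr=[{}]", items.join(", "))
}
```

For an aggregate named `__common_expr_1` with expression text `sum(column1@0)`, the first function gives `aggr=[__common_expr_1]` and the second gives `aggr=[sum(column1@0) as __common_expr_1]`. Whether the alias should also be printed when it merely repeats the expression text is left open here.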
Additional context
No response