[SPARK-56032][SQL] Support subexpression elimination in `FilterExec` whole-stage codegen by LuciferYang · Pull Request #54862 · apache/spark

LuciferYang · 2026-03-17T12:09:15Z

What changes were proposed in this pull request?

This PR adds subexpression elimination support to FilterExec.doConsume in the whole-stage codegen path.

The implementation follows the established CSE pattern used by ProjectExec, HashAggregateExec, and AggregateCodegenSupport:

1. Collect otherPreds (excluding notNullPreds, which are simple IsNotNull checks with no CSE value) and run ctx.subexpressionEliminationForWholeStageCodegen to discover common subexpressions.
2. Wrap generatePredicateCode inside ctx.withSubExprEliminationExprs so that genCode calls within can look up pre-computed subexpressions.
3. Emit the CSE pre-computation code via ctx.evaluateSubExprEliminationState before the predicate checks.
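The three steps above can be illustrated with a toy model in plain Scala. This is only a sketch of the general discover / look up / pre-compute pattern: the `Expr` types, `ToyCse`, and the `subExpr_N` variable names are hypothetical and are not Spark's `Expression`, `CodegenContext`, or generated code.

```scala
// Toy model only: these types are hypothetical and are not Spark classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Gt(left: Expr, right: Expr) extends Expr

object ToyCse {
  // All non-leaf subtrees of an expression, including the expression itself.
  def subtrees(e: Expr): List[Expr] = e match {
    case Add(l, r) => e :: (subtrees(l) ++ subtrees(r))
    case Gt(l, r)  => e :: (subtrees(l) ++ subtrees(r))
    case _         => Nil
  }

  // Step 1: discover subexpressions that occur more than once across the predicates.
  def commonSubexprs(preds: Seq[Expr]): Set[Expr] =
    preds.flatMap(subtrees).groupBy(identity)
      .collect { case (e, occurrences) if occurrences.size > 1 => e }
      .toSet

  // Step 2: code generation that first looks for a pre-computed variable.
  def gen(e: Expr, states: Map[Expr, String]): String = states.get(e) match {
    case Some(variable) => variable
    case None => e match {
      case Col(n)    => n
      case Lit(v)    => v.toString
      case Add(l, r) => s"(${gen(l, states)} + ${gen(r, states)})"
      case Gt(l, r)  => s"(${gen(l, states)} > ${gen(r, states)})"
    }
  }

  // Step 3: emit the pre-computation first, then the predicate checks.
  // Simplification: every shared value is declared as int and computed inline.
  def filterCode(preds: Seq[Expr]): String = {
    val common = commonSubexprs(preds).toSeq.sortBy(_.toString)
    val states = common.zipWithIndex.map { case (e, i) => e -> s"subExpr_$i" }.toMap
    val precompute = common.map(e => s"int ${states(e)} = ${gen(e, Map.empty)};")
    val checks = preds.map(p => s"if (!${gen(p, states)}) continue;")
    (precompute ++ checks).mkString("\n")
  }
}
```

For the predicates `a + b > 10` and `100 > a + b`, `ToyCse.filterCode` emits a single declaration for `a + b` followed by two checks that reference it, which mirrors the ordering described in step 3.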

Key design decisions:

- Only otherPreds participate in CSE: notNullPreds are guaranteed to be IsNotNull checks (by FilterExec's constructor partitioning logic) with no CSE value, and including them would interfere with equivalence analysis.
- Pre-evaluate input variables before CSE analysis: FilterExec sets usedInputs = AttributeSet.empty to defer input evaluation for short-circuit optimization. However, getLocalInputVariableValues, which subexpressionEliminationForWholeStageCodegen calls internally, has a side effect: it clears ctx.currentVars[i].code for input variables referenced by common subexpressions. If notNullPreds reference the same input variables, evaluateRequiredVariables in generatePredicateCode would find empty code and skip their declarations, causing "is not an rvalue" compilation errors. Pre-evaluating the input variables referenced by otherPreds before CSE analysis ensures their code is already consumed, which avoids this conflict.
- CSE evaluation is placed before predicate short-circuit checks: this is an intentional tradeoff. Common subexpressions are evaluated unconditionally, even if an earlier notNull check would have short-circuited. For expensive shared expressions (e.g., from_json appearing in 500 predicates), the benefit of evaluating once instead of N times far outweighs the cost of losing short-circuiting on the CSE portion. When there are no common subexpressions, subExprsCode is empty and this path adds no overhead. This is safe because Spark SQL expressions handle null inputs gracefully (returning null rather than throwing).
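The short-circuit tradeoff in the last point can be made concrete by counting evaluations in a small model. This is a sketch under stated assumptions: `expensive` is a hypothetical stand-in for a costly shared expression such as from_json, `Option[Int]` models a nullable input, and none of these names come from Spark.

```scala
object CseTradeoff {
  // Counts how many times the expensive expression is evaluated.
  final class Counter { var calls = 0 }

  // Hypothetical expensive expression; null-safe, it returns None for a None input.
  def expensive(x: Option[Int], c: Counter): Option[Int] = {
    c.calls += 1
    x.map(_ * 2)
  }

  // Without CSE: the not-null check short-circuits first, but every
  // predicate that does run re-evaluates the shared expression.
  def withoutCse(x: Option[Int], thresholds: Seq[Int], c: Counter): Boolean =
    x.isDefined && thresholds.forall(t => expensive(x, c).exists(_ > t))

  // With CSE placed before the checks: the shared expression is evaluated
  // exactly once, even when the not-null check would have rejected the row.
  def withCse(x: Option[Int], thresholds: Seq[Int], c: Counter): Boolean = {
    val shared = expensive(x, c)
    x.isDefined && thresholds.forall(t => shared.exists(_ > t))
  }
}
```

In this model a non-null row with three passing predicates costs three evaluations without CSE and one with it, while a null row costs zero evaluations without CSE and one with it. That single extra evaluation on rejected rows is the price paid for evaluating once instead of N times on the rows that reach the predicates.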

Why are the changes needed?

To support subexpression elimination in FilterExec whole-stage codegen. With this change in place, the from_json codegen (PR #48466) can be safely re-enabled in a follow-up PR.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added two new test cases to WholeStageCodegenSuite.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

dongjoon-hyun · 2026-03-17T15:35:56Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala

+        // evaluateSubExprEliminationState must be called after predicate code generation;
+        // it emits the pre-computation code and marks states as consumed.
+        (inputVarsEvalCode,
+         ctx.evaluateSubExprEliminationState(subExprs.states.values), predCode)


nit. Indentation looks a little strange. For this one, one liner might be better.

-        (inputVarsEvalCode,
-         ctx.evaluateSubExprEliminationState(subExprs.states.values), predCode)
+        (inputVarsEvalCode, ctx.evaluateSubExprEliminationState(subExprs.states.values), predCode)

changed to a single line

sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala

dongjoon-hyun · 2026-03-17T15:46:15Z

cc @peter-toth , too

LuciferYang · 2026-03-17T16:03:16Z

Thank you for your review. @dongjoon-hyun

init

6414c3e

LuciferYang changed the title ~~[SPARK-56032][SQL] Support subexpression elimination in FilterExec whole-stage codegen~~ [SPARK-56032][SQL] Support subexpression elimination in FilterExec whole-stage codegen Mar 17, 2026

dongjoon-hyun reviewed Mar 17, 2026

sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala (outdated, resolved)

dongjoon-hyun reviewed Mar 17, 2026

sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala (outdated, resolved)

LuciferYang added 2 commits March 17, 2026 23:51

use SPARK-56032 in test

7deba2e

one line

e18ee82

LuciferYang wants to merge 3 commits into apache:master from LuciferYang:SPARK-56032
