[SPARK-56032][SQL] Support subexpression elimination in FilterExec whole-stage codegen #54862

Open
LuciferYang wants to merge 3 commits into apache:master from LuciferYang:SPARK-56032


LuciferYang commented Mar 17, 2026

What changes were proposed in this pull request?

This PR adds subexpression elimination support to FilterExec.doConsume in the whole-stage codegen path.

The implementation follows the established CSE pattern used by ProjectExec, HashAggregateExec, and AggregateCodegenSupport:

  1. Collect otherPreds (excluding notNullPreds, which are simple IsNotNull checks with no CSE value) and run ctx.subexpressionEliminationForWholeStageCodegen to discover common subexpressions.
  2. Wrap generatePredicateCode inside ctx.withSubExprEliminationExprs so that genCode calls within can look up pre-computed subexpressions.
  3. Emit the CSE pre-computation code via ctx.evaluateSubExprEliminationState before the predicate checks.
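
The three steps above can be illustrated with a toy model in plain Scala. This is only a sketch of the general idea (find shared subexpressions, make them available to predicate evaluation, compute them once before the checks). It does not use Spark's `CodegenContext`, and every name in it (`ToyFilterCse`, `Costly`, `findCommon`, `filterRow`) is hypothetical.

```scala
// Toy model only: none of these types are Spark classes.
object ToyFilterCse {
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Costly(child: Expr) extends Expr // stands in for an expensive function

  sealed trait Pred { def left: Expr; def right: Expr }
  final case class Gt(left: Expr, right: Expr) extends Pred
  final case class Lt(left: Expr, right: Expr) extends Pred

  var costlyCalls = 0 // counts how often the "expensive" function actually runs

  // All subtrees of an expression, including the expression itself.
  def subtrees(e: Expr): List[Expr] = e match {
    case Costly(c) => e :: subtrees(c)
    case _         => List(e)
  }

  // Step 1: find non-leaf subexpressions that occur more than once.
  def findCommon(preds: Seq[Pred]): Set[Expr] =
    preds.flatMap(p => subtrees(p.left) ++ subtrees(p.right))
      .collect { case c: Costly => c: Expr }
      .groupBy(identity)
      .collect { case (e, hits) if hits.size > 1 => e }
      .toSet

  // Step 2: evaluation consults the table of precomputed values first.
  def eval(e: Expr, row: Map[String, Int], cache: Map[Expr, Int]): Int =
    cache.getOrElse(e, e match {
      case Col(n)    => row(n)
      case Lit(v)    => v
      case Costly(c) => costlyCalls += 1; eval(c, row, cache) * 2
    })

  def test(p: Pred, row: Map[String, Int], cache: Map[Expr, Int]): Boolean = p match {
    case Gt(l, r) => eval(l, row, cache) > eval(r, row, cache)
    case Lt(l, r) => eval(l, row, cache) < eval(r, row, cache)
  }

  // Step 3: compute the shared values first, then run the predicate checks.
  def filterRow(preds: Seq[Pred], row: Map[String, Int]): Boolean = {
    val cache = findCommon(preds).map(e => e -> eval(e, row, Map.empty)).toMap
    preds.forall(test(_, row, cache))
  }
}
```

With `val shared = Costly(Col("a"))` and the predicates `Seq(Gt(shared, Lit(10)), Lt(shared, Lit(100)))`, calling `filterRow` on `Map("a" -> 7)` runs the costly function once instead of twice. The toy precomputes each shared expression independently, so nested shared expressions would still repeat some work; it is meant to show the shape of the pattern, not an optimal implementation.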

Key design decisions:

  • Only otherPreds participate in CSE: notNullPreds are guaranteed to be IsNotNull checks (by FilterExec's constructor partitioning logic) with no CSE value; including them would interfere with equivalence analysis.
  • Pre-evaluate input variables before CSE analysis: FilterExec sets usedInputs = AttributeSet.empty to defer input evaluation for short-circuit optimization. However, subexpressionEliminationForWholeStageCodegen's internal getLocalInputVariableValues has a side effect: it clears ctx.currentVars[i].code for input variables referenced by common subexpressions. If notNullPreds reference the same input variables, generatePredicateCode's evaluateRequiredVariables would find empty code and skip their declarations, causing "is not an rvalue" compilation errors. Pre-evaluating the input variables referenced by otherPreds before CSE analysis ensures their code has already been consumed, which avoids this conflict.
  • CSE evaluation is placed before predicate short-circuit checks: This is an intentional tradeoff. Common subexpressions are evaluated unconditionally, even if an earlier notNull check would have short-circuited. For expensive shared expressions (e.g., from_json appearing in 500 predicates), the benefit of evaluating once rather than N times far outweighs the cost of losing short-circuiting on the CSE portion. Evaluating unconditionally is safe because Spark SQL expressions handle null inputs gracefully (returning null rather than throwing). When there are no common subexpressions, subExprsCode is empty and this path has zero overhead.
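
The tradeoff in the last point can be made concrete with a small counting sketch. It is a toy cost model in plain Scala, not generated Spark code: `costly` stands in for an expensive shared expression, and `CseTradeoffSketch`, `withoutCse`, and `withCse` are hypothetical names introduced only for illustration.

```scala
// Toy cost model only: none of these names exist in Spark.
object CseTradeoffSketch {
  final class Counter { var n = 0 }

  // Stand-in for an expensive shared expression. It tolerates null input by
  // returning None, mirroring "return null rather than throw".
  def costly(input: String, calls: Counter): Option[Int] = {
    calls.n += 1
    Option(input).map(_.length)
  }

  // Without CSE: the not-null check short-circuits first, then each predicate
  // re-evaluates the shared expression until one of them fails.
  def withoutCse(input: String, thresholds: Seq[Int], calls: Counter): Boolean =
    input != null && thresholds.forall(t => costly(input, calls).exists(_ > t))

  // With CSE hoisted above the checks: the shared expression runs exactly once
  // per row, including rows that the not-null check goes on to reject.
  def withCse(input: String, thresholds: Seq[Int], calls: Counter): Boolean = {
    val shared = costly(input, calls)
    input != null && thresholds.forall(t => shared.exists(_ > t))
  }
}
```

For a non-null row with three passing predicates, `withoutCse` calls `costly` three times and `withCse` calls it once. For a null row, `withoutCse` calls it zero times and `withCse` still calls it once, which is the cost the PR accepts in exchange for the savings on rows that pass the null check.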

Why are the changes needed?

FilterExec's whole-stage codegen path does not currently perform subexpression elimination. With this change in place, the from_json codegen (PR #48466) can be safely re-enabled in a follow-up PR.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added two new test cases in WholeStageCodegenSuite.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

@LuciferYang LuciferYang changed the title [SPARK-56032][SQL] Support subexpression elimination in FilterExec whole-stage codegen [SPARK-56032][SQL] Support subexpression elimination in FilterExec whole-stage codegen Mar 17, 2026
        // evaluateSubExprEliminationState must be called after predicate code generation;
        // it emits the pre-computation code and marks states as consumed.
        (inputVarsEvalCode,
         ctx.evaluateSubExprEliminationState(subExprs.states.values), predCode)
Member

nit. Indentation looks a little strange. For this one, one liner might be better.

-        (inputVarsEvalCode,
-         ctx.evaluateSubExprEliminationState(subExprs.states.values), predCode)
+        (inputVarsEvalCode, ctx.evaluateSubExprEliminationState(subExprs.states.values), predCode)

Contributor Author

@LuciferYang LuciferYang Mar 17, 2026

changed to a single line

@dongjoon-hyun
Member

cc @peter-toth , too

@LuciferYang
Contributor Author

Thank you for your review. @dongjoon-hyun
