15 changes: 15 additions & 0 deletions core/src/main/java/org/apache/calcite/plan/RelOptUtil.java
@@ -3346,6 +3346,21 @@ public static List<RexNode> pushPastProject(List<? extends RexNode> nodes,
// function? Possibly. But it's invalid SQL, so don't go there.
return null;
}
// [CALCITE-7551] Refuse to merge if it would duplicate a
// non-deterministic expression (e.g. RAND()).
final List<RexNode> bottom = project.getProjects();
final int[] refs = new int[bottom.size()];
new RexVisitorImpl<Void>(true) {
@Override public Void visitInputRef(RexInputRef ref) {
refs[ref.getIndex()]++;
return null;
}
}.visitEach(nodes);
for (int i = 0; i < refs.length; i++) {

Contributor

As discussed in Jira, this is a bit too conservative, since it will not distinguish CURRENT_TIMESTAMP from RAND. But fixing that may be in the scope of a separate PR: we really need two separate notions of nondeterminism.

Member

Agreed. We should first settle the finer-grained classification of deterministic functions before completing this PR.

Contributor Author

As far as I understand, CURRENT_TIMESTAMP is actually different from non-deterministic functions like RAND().

  1. Non-deterministic function: it may return a different value on every evaluation.
    1.1 isDeterministic() returns false.
    1.2 isDynamicFunction() returns false.

  2. Dynamic function: it returns the same value at every call site within one statement, but the value can differ across executions.
    2.1 isDeterministic() returns true.
    2.2 isDynamicFunction() returns true.

This path only blocks when isDeterministic() returns false. Dynamic functions like CURRENT_TIMESTAMP can be duplicated, and it is actually safe to duplicate them.

Contributor

The JavaDoc of isDynamicFunction says:

/**
 * Returns whether it is unsafe to cache query plans referencing this
 * operator; false is assumed by default.
 */
public boolean isDynamicFunction() {

which is not exactly what you seem to imply.

@darpan-e6 darpan-e6 Jun 2, 2026

Contributor Author

You're right, that comment mixed up the two flags. The "same value at every call site within one statement" property I attached to isDynamicFunction() is not in its JavaDoc; I generalized from CURRENT_TIMESTAMP's behavior to isDynamicFunction as a whole, which is wrong. isDynamicFunction is overloaded (e.g. SqlRandIntegerFunction also returns true), and its actual contract is just plan-cache invalidation.

Scoping back to this PR: the bug only affects operators that can return different values across call sites within a single statement, i.e. operators with isDeterministic() == false, like RAND and RAND_INTEGER. The other operators that override isDynamicFunction() to true (SqlAbstractTimeFunction, SqlCurrentDateFunction, SqlBaseContextVariable) are per-statement-stable: duplicating them in the plan is observationally a no-op, because every occurrence resolves to the same value within an execution. So they are not affected by this bug and don't need to be blocked.

The guard in the fix uses RexUtil.isDeterministic(...), which is exactly the right discriminator for the affected set. I'll update the PR description to drop the misleading framing of isDynamicFunction().
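
The guard can be illustrated without Calcite on the classpath. What follows is a minimal sketch, not the PR's code: `MergeGuardSketch`, `Expr`, and `canMerge` are hypothetical names, and the boolean `deterministic` field is an assumed stand-in for the result of `RexUtil.isDeterministic(...)`. It counts how often the top expressions reference each bottom expression and refuses the merge when a non-deterministic one is referenced more than once.

```java
import java.util.List;

public class MergeGuardSketch {
  /** Hypothetical stand-in for one expression of the bottom project. */
  public static final class Expr {
    final String text;
    final boolean deterministic; // assumed to mirror RexUtil.isDeterministic

    public Expr(String text, boolean deterministic) {
      this.text = text;
      this.deterministic = deterministic;
    }
  }

  /**
   * Returns whether merging is safe, i.e. no non-deterministic bottom
   * expression is referenced more than once by the top expressions.
   *
   * @param bottom  expressions of the bottom project
   * @param topRefs for each top expression, the indexes of the bottom
   *                expressions it references
   */
  public static boolean canMerge(List<Expr> bottom,
      List<List<Integer>> topRefs) {
    final int[] refs = new int[bottom.size()];
    for (List<Integer> oneTop : topRefs) {
      for (int index : oneTop) {
        refs[index]++;
      }
    }
    for (int i = 0; i < refs.length; i++) {
      if (refs[i] > 1 && !bottom.get(i).deterministic) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // select a, a + 1 as b from (select rand() as a from emp)
    final List<Expr> rand = List.of(new Expr("RAND()", false));
    // false: both top expressions reference the non-deterministic column
    System.out.println(canMerge(rand, List.of(List.of(0), List.of(0))));
    // true: the same shape over a deterministic column may still merge
    final List<Expr> plain = List.of(new Expr("DEPTNO + 1", true));
    System.out.println(canMerge(plain, List.of(List.of(0), List.of(0))));
  }
}
```

A single reference to a non-deterministic column still merges in this sketch, because inlining it once does not change how many times it is evaluated.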

Contributor

That is fine, but I think we have no method to discriminate between RAND and CURRENT_TIMESTAMP. Maybe we need to introduce such a tool?

Contributor Author

Yes, makes sense.

Contributor

That should probably be a separate JIRA issue and PR. Coming up with a good name for this property of CURRENT_TIMESTAMP may be difficult.
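
For readers following the thread, one way to picture the missing distinction is a three-level classification. This is purely a hypothetical sketch: the `Volatility` enum, its constant names (borrowed from PostgreSQL's function volatility categories), and `safeToDuplicate` do not exist in Calcite and are not a naming proposal for the follow-up ticket.

```java
public class VolatilitySketch {
  /** Hypothetical classification; not part of Calcite. */
  public enum Volatility {
    /** Same arguments always give the same result, e.g. a + 1. */
    IMMUTABLE,
    /** Fixed within one statement execution, e.g. CURRENT_TIMESTAMP. */
    STABLE,
    /** May differ on every evaluation, e.g. RAND(). */
    VOLATILE
  }

  /**
   * Duplicating an expression inside a single plan is observable only if
   * two evaluations within one execution can disagree.
   */
  public static boolean safeToDuplicate(Volatility volatility) {
    return volatility != Volatility.VOLATILE;
  }

  public static void main(String[] args) {
    for (Volatility v : Volatility.values()) {
      System.out.println(v + " safe to duplicate: " + safeToDuplicate(v));
    }
  }
}
```

Under this assumed model, the guard in this PR corresponds to blocking only the `VOLATILE` level, which is the behavior the thread agrees is correct for RAND.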

Member

After thinking about it carefully, I believe we should create this JIRA before this PR is merged. However, it shouldn't block the PR from being merged, since the stricter approach does not introduce any incorrect behavior. Am I right?

Contributor

I think so too. Can you please open a new ticket for this (or check whether there is one already open)?

if (refs[i] > 1 && !RexUtil.isDeterministic(bottom.get(i))) {
return null;
}
}
final List<RexNode> list = pushPastProject(nodes, project);
final int bottomCount = RexUtil.nodeCount(project.getProjects());
final int topCount = RexUtil.nodeCount(nodes);
@@ -165,6 +165,18 @@ protected FilterProjectTransposeRule(
// it can be pushed down. For now we don't support this.
return;
}
// Refuse to transpose if the filter references a projected column whose
// expression is non-deterministic (e.g. RAND()). Pushing the filter
// below the project would inline that expression into the new filter
// condition while the original is still produced by the project above,
// splitting one evaluation into two. References to deterministic columns
// (even when other columns are non-deterministic) are safe to push.
final List<RexNode> projects = project.getProjects();
for (int ref : RelOptUtil.InputFinder.bits(filter.getCondition())) {
if (!RexUtil.isDeterministic(projects.get(ref))) {
return;
}
}
// convert the filter to one that references the child of the project
RexNode newCondition =
RelOptUtil.pushPastProjectUnlessBloat(filter.getCondition(), project, config.bloat());
@@ -36,9 +36,11 @@
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexProgram;
import org.apache.calcite.rex.RexProgramBuilder;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.validate.SqlValidatorUtil;
import org.apache.calcite.tools.RelBuilder;
import org.apache.calcite.tools.RelBuilderFactory;
import org.apache.calcite.util.ImmutableBitSet;
import org.apache.calcite.util.Pair;

import org.checkerframework.checker.nullness.qual.Nullable;
@@ -112,6 +114,22 @@ public JoinProjectTransposeRule(RelOptRuleOperand operand,

//~ Methods ----------------------------------------------------------------

/** Returns whether {@code conditionRefs} (input references of the join
* condition, expressed against the join's combined output) references a
* non-deterministic expression of {@code project}, whose first output
* field is at {@code offset} in that combined output. */
private static boolean referencesNonDeterministic(Project project,
ImmutableBitSet conditionRefs, int offset) {
final List<RexNode> exprs = project.getProjects();
for (int i = 0; i < exprs.size(); i++) {
if (conditionRefs.get(offset + i)
&& !RexUtil.isDeterministic(exprs.get(i))) {
return true;
}
}
return false;
}

@Override public void onMatch(RelOptRuleCall call) {
final Join join = call.rel(0);
final JoinRelType joinType = join.getJoinType();
@@ -151,6 +169,26 @@ public JoinProjectTransposeRule(RelOptRuleOperand operand,
rightJoinChild = join.getRight();
}

// Skip a project when the join condition references one of its
// non-deterministic expressions (e.g. RAND()). The merge below inlines
// that expression into the new join condition via expandLocalRef while
// the project still re-emits it above, splitting one evaluation into
// two. Non-deterministic columns that the condition does not reference
// are safe to pull up.
final ImmutableBitSet conditionRefs =
RelOptUtil.InputFinder.bits(join.getCondition());
final int nLeftFields = join.getLeft().getRowType().getFieldCount();
if (leftProject != null
&& referencesNonDeterministic(leftProject, conditionRefs, 0)) {
leftProject = null;
leftJoinChild = join.getLeft();
}
if (rightProject != null
&& referencesNonDeterministic(rightProject, conditionRefs, nLeftFields)) {
rightProject = null;
rightJoinChild = join.getRight();
}

if ((leftProject == null) && (rightProject == null)) {
return;
}
@@ -17,6 +17,7 @@
package org.apache.calcite.rel.rules;

import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.plan.RelRule;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.Join;
@@ -31,8 +32,10 @@
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexProgram;
import org.apache.calcite.rex.RexProgramBuilder;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.validate.SqlValidatorUtil;
import org.apache.calcite.tools.RelBuilder;
import org.apache.calcite.util.ImmutableBitSet;
import org.apache.calcite.util.Pair;

import com.google.common.collect.ImmutableList;
@@ -72,6 +75,21 @@ protected SemiJoinProjectTransposeRule(Config config) {
final Join semiJoin = call.rel(0);
final Project project = call.rel(1);

// Skip when the semi-join condition references one of the project's
// non-deterministic expressions (e.g. RAND()). Pulling such a project
// above the semi-join inlines that expression into the join condition
// via expandLocalRef while the project still re-emits it above,
// splitting one evaluation into two. Non-deterministic columns that the
// condition does not reference are safe to pull up. See [CALCITE-7551].
final ImmutableBitSet conditionRefs =
RelOptUtil.InputFinder.bits(semiJoin.getCondition());
final List<RexNode> projects = project.getProjects();
for (int i = 0; i < projects.size(); i++) {
if (conditionRefs.get(i) && !RexUtil.isDeterministic(projects.get(i))) {
return;
}
}

// Convert the LHS semi-join keys to reference the child projection
// expression; all projection expressions must be RexInputRefs,
// otherwise, we wouldn't have created this semi-join.
100 changes: 100 additions & 0 deletions core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java
@@ -1533,6 +1533,69 @@ private void checkSemiOrAntiJoinProjectTranspose(JoinRelType type) {
.check();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Project/Filter/Join transpose and merge rules can duplicate
* non-deterministic expressions</a>. JoinProjectTransposeRule must
* not pull a project containing a non-deterministic expression above
* the join, because it inlines the expression into the new join
* condition via {@code mergedProgram.expandLocalRef}. */
@Test void testJoinProjectTransposeShouldIgnoreNonDeterministic() {
final String sql = "select * from (select empno, rand() as r from emp) e\n"
+ "join dept d on e.r = d.deptno";
sql(sql).withRule(CoreRules.JOIN_PROJECT_LEFT_TRANSPOSE).checkUnchanged();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Project/Filter/Join transpose and merge rules can duplicate
* non-deterministic expressions</a>. The transpose is still allowed when
* the join condition only references deterministic projected columns,
* even if the project also computes a non-deterministic column (here
* {@code r} is RAND() but the join is on DEPTNO). */
@Test void testJoinProjectTransposeWithUnrelatedNonDeterministic() {
final String sql = "select * from (select rand() as r, deptno from emp) e\n"
+ "join dept d on e.deptno = d.deptno";
sql(sql).withRule(CoreRules.JOIN_PROJECT_LEFT_TRANSPOSE).check();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Project/Filter/Join transpose and merge rules can duplicate
* non-deterministic expressions</a>. SemiJoinProjectTransposeRule
* uses the same {@code mergePrograms} + {@code expandLocalRef}
* pattern as JoinProjectTransposeRule, and must not pull a project
* above the semi-join when the condition references one of its
* non-deterministic expressions. */
@Test void testSemiJoinProjectTransposeShouldIgnoreNonDeterministic() {
final String sql = "select * from (select empno, rand() as r from emp) e\n"
+ "where e.r in (select sal from emp)";
sql(sql)
.withDecorrelate(false)
.withExpand(true)
.withPreRule(CoreRules.PROJECT_TO_SEMI_JOIN)
.withRule(CoreRules.SEMI_JOIN_PROJECT_TRANSPOSE)
.checkUnchanged();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Project/Filter/Join transpose and merge rules can duplicate
* non-deterministic expressions</a>. The semi-join transpose is still
* allowed when the condition only references deterministic projected
* columns, even if the project also computes a non-deterministic column
* (here {@code r} is RAND() but the semi-join is on DEPTNO). */
@Test void testSemiJoinProjectTransposeWithUnrelatedNonDeterministic() {
final String sql = "select * from (select rand() as r, deptno from emp) e\n"
+ "where e.deptno in (select deptno from dept)";
sql(sql)
.withDecorrelate(false)
.withExpand(true)
.withPreRule(CoreRules.PROJECT_TO_SEMI_JOIN)
.withRule(CoreRules.SEMI_JOIN_PROJECT_TRANSPOSE)
.check();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-1338">[CALCITE-1338]
* JoinProjectTransposeRule should not pull a literal above the
@@ -3204,6 +3267,31 @@ private void checkProjectCorrelateTransposeRuleSemiOrAntiCorrelate(JoinRelType t
.check();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Project/Filter/Join transpose and merge rules can duplicate
* non-deterministic expressions</a>. FilterProjectTransposeRule must
* not pull a filter that references a non-deterministic projected
* column below the project. */
@Test void testFilterProjectTransposeShouldIgnoreNonDeterministic() {
final String sql = "select * from (select rand() as a from emp)\n"
+ "where a > 0 and a < 1";
sql(sql).withRule(CoreRules.FILTER_PROJECT_TRANSPOSE).checkUnchanged();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Project/Filter/Join transpose and merge rules can duplicate
* non-deterministic expressions</a>. The transpose is still allowed when
* the filter only references deterministic projected columns, even if
* other columns in the project are non-deterministic (here {@code r} is
* RAND() but the filter is on {@code b}). */
@Test void testFilterProjectTransposeWithUnrelatedNonDeterministic() {
final String sql = "select * from (select rand() as r, deptno as b from emp)\n"
+ "where b > 0";
sql(sql).withRule(CoreRules.FILTER_PROJECT_TRANSPOSE).check();
}

private static final String NOT_STRONG_EXPR =
"case when e.sal < 11 then 11 else -1 * e.sal end";

@@ -6920,6 +7008,18 @@ private HepProgram getTransitiveProgram() {
sql(sql).withRule(CoreRules.PROJECT_MERGE).checkUnchanged();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Project/Filter/Join transpose and merge rules can duplicate
* non-deterministic expressions</a>. ProjectMergeRule must not merge
* adjacent projects when doing so would duplicate a non-deterministic
* expression. */
@Test void testProjectMergeShouldIgnoreNonDeterministic() {
final String sql = "select a, a + 1 as b from (select rand() as a from emp)";
sql(sql).withRule(CoreRules.PROJECT_MERGE).checkUnchanged();
}


@Test void testAggregateProjectPullUpConstants() {
final String sql = "select job, empno, sal, sum(sal) as s\n"
+ "from emp where empno = 10\n"
@@ -6179,4 +6179,13 @@ void checkUserDefinedOrderByOver(NullCollation nullCollation) {
assertThat(plan, not(containsString("FLOOR(FLOOR")));
assertThat(plan, containsString("FLOOR($4, FLAG(WEEK))"));
}

/** Test case for

Contributor

Why is SqlToRel connected to these rules?

@darpan-e6 darpan-e6 May 28, 2026

Contributor Author

Because SqlToRelConverter builds its RelNode tree through RelBuilder, and RelBuilder eagerly merges adjacent projects at construction time using the same helper that the planner rules use.

Concretely, RelBuilder.project_ calls RelOptUtil.pushPastProjectUnlessBloat(nodeList, project, config.bloat()), which is the method this PR fixes, so I added a test here as well.
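
To see why an eager merge is observable for a query such as `select a, a + 1 as b from (select rand() as a)`, here is a minimal sketch that uses `java.util.Random` in place of RAND(). The names `evaluateUnmerged` and `evaluateMerged` are hypothetical; they model the two plan shapes by hand and do not call any Calcite API.

```java
import java.util.Random;

public class RandDuplicationSketch {
  /** Two stacked projects: RAND() is evaluated once, then reused. */
  public static double[] evaluateUnmerged(Random random) {
    final double a = random.nextDouble(); // inner project: rand() as a
    return new double[] {a, a + 1};       // outer project: a, a + 1 as b
  }

  /** Merged project: the reference to a is inlined, so RAND() runs twice. */
  public static double[] evaluateMerged(Random random) {
    return new double[] {random.nextDouble(), random.nextDouble() + 1};
  }

  /** The invariant the query author expects: b = a + 1 on every row. */
  public static boolean invariantHolds(double[] row) {
    return row[1] == row[0] + 1;
  }

  public static void main(String[] args) {
    System.out.println(invariantHolds(evaluateUnmerged(new Random(42))));
    System.out.println(invariantHolds(evaluateMerged(new Random(42))));
  }
}
```

In the unmerged shape the invariant holds by construction. In the merged shape it holds only if two independent draws happen to coincide, which is the duplication the new test is meant to catch.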

* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Non-deterministic expressions (e.g. {@code RAND()}) should not be
* duplicated when projections are merged</a>. */
@Test void testRandNotDuplicatedInProjectionMerge() {
final String sql = "select a, a + 1 as b from (select rand() as a)";
sql(sql).ok();
}
}