-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[CALCITE-7551] Project/Filter/Join transpose and merge rules should not duplicate non-deterministic expressions (e.g. RAND()) #4970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
5eb31eb
9a3f931
51e6c7a
241bdab
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6179,4 +6179,13 @@ void checkUserDefinedOrderByOver(NullCollation nullCollation) { | |
| assertThat(plan, not(containsString("FLOOR(FLOOR"))); | ||
| assertThat(plan, containsString("FLOOR($4, FLAG(WEEK))")); | ||
| } | ||
|
|
||
| /** Test case of | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is SqlToRel connected to these rules?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because Concretely, |
||
| * <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551] | ||
| * Non-deterministic expressions (e.g. {@code RAND()}) should not be | ||
| * duplicated when projections are merged</a>. */ | ||
| @Test void testRandNotDuplicatedInProjectionMerge() { | ||
| final String sql = "select a, a + 1 as b from (select rand() as a)"; | ||
| sql(sql).ok(); | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in Jira, this is a bit too conservative, since it will not distinguish CURRENT_TIMESTAMP from RAND. But fixing that may be in the scope of a separate PR - we need really two separate notions of nondeterminism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We should complete the fine-grained judgment of the deterministic function first before completing this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, CURRENT_TIMESTAMP is actually different than non-deterministic functions like RAND().
Non-deterministic function: it may return different values for every evaluations.
1.1 Returns false to
isDeterministic().1.2 Returns false to
isDynamicFunction().Dynamic Function: It will return same value at every call site within one statement; can differ across executions
2.1 Returns true to
isDeterministic().2.2 Returns true to
isDynamicFunction().And this path only blocks when we get
isDeterministic()method response as false. Dynamic functions likeCURRENT_TIMESTAMPcan be duplicated and actually it is safe to duplicate them.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JavaDoc of
isDynamicFunctionsays:which is not exactly what you seem to imply.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, that comment mixed up the two flags. The "same value at every call site within one statement" property I attached to
isDynamicFunction()is not in its JavaDoc; I generalized fromCURRENT_TIMESTAMP's behavior toisDynamicFunctionas a whole, which is wrong,isDynamicFunctionis overloaded (e.g.SqlRandIntegerFunctionalso returnstrue), and its actual contract is just plan-cache invalidation.Scoping back to this PR: the bug only affects operators that can return different values across call sites within a single statement, i.e. operators with
isDeterministic() == false, likeRANDandRAND_INTEGER. The other operators that overrideisDynamicFunction()totrue(SqlAbstractTimeFunction,SqlCurrentDateFunction,SqlBaseContextVariable) are per-statement-stable: duplicating them in the plan is observationally a no-op, because every occurrence resolves to the same value within an execution. So they are not affected by this bug and don't need to be blocked.The guard in the fix uses
RexUtil.isDeterministic(...), which is exactly the right discriminator for the affected set. I'll update the PR description to drop the misleading framing ofisDynamicFunction().There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is fine, but I think we have no method to discriminate between RAND and CURRENT_TIMESTAMP. Maybe we need to introduce such a tool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should probably be a separate JIRA issue and PR. Coming up with a good name for this property of CURRENT_TIMESTAMP may be difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking about it carefully, I believe we should create this JIRA before this PR is merged. However, it shouldn't block the PR from being merged, since the stricter approach does not introduce any incorrect behavior. Am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so too. Can you please open a new ticket for this (or check whether there is one already open)?