Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add test for impure function correlation behavior #9014

Merged
merged 6 commits into from
Jan 23, 2025

Conversation

NickCrews
Copy link
Contributor

@NickCrews NickCrews commented Apr 18, 2024

Related to #8921,
trying to write down exactly what the expected behavior is.

I figure we can use this PR to hash out exaclty what we want the semantics to be, and then the other discussions might be easier because our goal is written down precisely somewhere. Please let me know if you agree or disagree with this behavior, or if there are other tests we should add.

Need to fix the UDF test case in a followup. Also wasn't sure where to put these tests, I put them in their own little file but if you point me elsewhere Iwill move them.

@NickCrews
Copy link
Contributor Author

oooh, these CI failures are revealing differences in the backend behaviors... wheeee into the 🐰 🕳️ we go!

@NickCrews
Copy link
Contributor Author

I can think of two reasons a user might care about the semantics here, and I hope we can support both of their needs:

  1. impureness/correlatedness. If a function is impure, they may want it to be executed once, or many times, depending on if they want them to be correlated. See this example
  2. performance. If some computation is slow, they only want it to happen a single time.

So I think this means that we as ibis authors can't assume what the goals of the user is, and how many times they want an expression executed. Therefore, we shouldn't do any clever rewrites or mergings of selects. I think we need to keep a more 1:1 correspondence between what the user writes and the SQL we produce: Every time the user does a .select(), .mutate(), etc, (except for simple column renamings, and maybe a few other cases) that leads to exactly one more with select .... as ... in the generated SQL.

IDK, what do you think of this train of reasoning? I think I'm fairly convinced that those two use cases are the requirements for success, but perhaps there is a different/better way of accomplishing that goal

@kszucs
Copy link
Member

kszucs commented Apr 22, 2024

@NickCrews please rebase to test with #9023 change included

@cpcloud
Copy link
Member

cpcloud commented Apr 22, 2024

@NickCrews My only objection would be that

performance. If some computation is slow, they only want it to happen a single time.

Is not something we can enforce even if we never merge any select statements. This kind of guarantee is at the level of the query engine.

@NickCrews
Copy link
Contributor Author

@cpcloud yup you are right with guarantees, "suggesting" to the backend is the best we can do.

What do you think in general of my proposal of "one CTE per .select"? I'm not sure if you're skeptical of the whole thing or just the performance claims... Thanks!

@NickCrews
Copy link
Contributor Author

I'm trying to decide what is higher priority:

An implementation that is faster 90% of the time, but does clever things and therefore isn't able to be tuned by the user that 10% of the time they need it

Vs

An implementation that is a bit slower in the majority of cases, but is always fine tunable to get the perf you need in the edge cases.

@NickCrews NickCrews force-pushed the test-correlation branch 2 times, most recently from 7700941 to d14384c Compare April 23, 2024 19:21
@NickCrews
Copy link
Contributor Author

slowly going through the backends and adding the correct marks for each kind of failure...

@NickCrews NickCrews force-pushed the test-correlation branch 4 times, most recently from ee0ae0c to 37ae5dd Compare April 23, 2024 22:47
@NickCrews NickCrews enabled auto-merge (rebase) April 23, 2024 23:13
@NickCrews
Copy link
Contributor Author

@cpcloud @kszucs I think this is ready for review whenever you get the chance! I think this is the groundwork for defining the current state, and after we get this in then we can start talking about what we think ideal behavior should be, and how to get there.

@NickCrews NickCrews requested review from cpcloud and kszucs May 7, 2024 01:48
@NickCrews
Copy link
Contributor Author

Anything I can do here to help move this forward?

@cpcloud
Copy link
Member

cpcloud commented Jun 13, 2024

@NickCrews Can you write a docstring for each of these tests explaining what's being tested in each? I know that's atypical, but this is a very hairy problem with lots of specifics that are important to understand, and words like "impure" and "correlation" need to be precisely defined so that everyone is on the same page about exactly what is being tested.

@NickCrews
Copy link
Contributor Author

I think that is a good idea. Will do.

@NickCrews NickCrews force-pushed the test-correlation branch 2 times, most recently from ee68083 to 3834829 Compare June 29, 2024 18:57
@NickCrews
Copy link
Contributor Author

oops, I just wrote the notes as comments, not docstrings, I assume that is still OK? I think this is ready to review now, thanks!

@github-actions github-actions bot added the sql Backends that generate SQL label Oct 8, 2024
@cpcloud
Copy link
Member

cpcloud commented Oct 8, 2024

It seems like this behavior is intentional across our backends, and that writing a query as a CTE in no way guarantees how many times it will be evaluated. It could be one, or it could be as many as there are references to the CTE.

@NickCrews
Copy link
Contributor Author

NickCrews commented Oct 8, 2024

Darn it. Here are specifically which backends.

Looks like DuckDB is aware of this, and they want to fix it, but not currently feasible for them to fix.

@NickCrews
Copy link
Contributor Author

ughh, what do we want to do? In order to get the self-join behavior that the user wants, I think the backend would have to materialize a temporary table. But I really don't think we want to do that automatically for them. So the best we could do is to detect this happening, and then error? That sounds quite hairy.

@cpcloud
Copy link
Member

cpcloud commented Oct 8, 2024

I think the best we can do is to recommend calling .cache() in these cases. That's the only thing that will guarantee the desired semantics.

@cpcloud cpcloud force-pushed the test-correlation branch 2 times, most recently from e09d66d to 8b51625 Compare October 8, 2024 17:05
@NickCrews
Copy link
Contributor Author

Do we do any sort of check to ensure people don't footgun themselves, or do we only just let them get weird results and then have to start searching the issues/docs? I can't think of where this should go in the docs, maybe we want a "common pitfalls" or "troubleshooting" page? I'm not sure if there are other footguns like this that we should call out that would also fit in there

@cpcloud
Copy link
Member

cpcloud commented Oct 9, 2024

Do we do any sort of check to ensure people don't footgun themselves, or do we only just let them get weird results and then have to start searching the issues/docs? I can't think of where this should go in the docs, maybe we want a "common pitfalls" or "troubleshooting" page? I'm not sure if there are other footguns like this that we should call out that would also fit in there

We don't have a check for this and I don't think it's worth adding because of the complexity of such a check.

I'm not sure if there are other footguns like this that we should call out that would also fit in there.

There are almost certainly are, but let's try to stay focused on this issue before moving on to other things.

@NickCrews
Copy link
Contributor Author

OK, sounds good to me to just get flink passing and then get this PR merged and call it a day, at least the behavior is written down somewhere. How does this sound as a plan?

I added a few comments to the tests so someone sees what our conclusions were.

@cpcloud cpcloud force-pushed the test-correlation branch 3 times, most recently from eedd6ec to f11673f Compare January 21, 2025 11:35
@cpcloud
Copy link
Member

cpcloud commented Jan 21, 2025

Still not entirely sure why the Flink backend is failing.

@cpcloud
Copy link
Member

cpcloud commented Jan 22, 2025

Turns out this never worked, but was passing because:

  1. The rich output formatter selects a subset of columns before calling to_pyarrow on the table expression to avoid computing a bunch of stuff that wouldn't ever show up in the terminal due to lack of space imposed by user settings or the width of the terminal.
  2. This PR caused the problem to show up because previously, column inlining optimization allowed the delta_col column in the test to be inlined in the top-level select, and then the subset was computed, which did not include delta_col, so that part of the SQL query never ran and thus never triggered the exception.

This PR disables column inlining for anything subclassing ops.Impure due to the nature of impure operations--impure roughly equates to "stateful"--of which the now() function is one.

This means that in this test case the inner query will remain, and thus likely (though not guaranteed (!) because the engine might still eliminate the inner query) still evaluate the inner query. In the flink case, this is what happens.

For now, I'll xfail the flink backend on this test.

Going forward, we should figure out a way to allow elimination of subexpressions before inlining, as removal of unused impure expressions should allow more opportunities for inlining.

@github-actions github-actions bot added dependencies Issues or PRs related to dependencies nix Issues or PRs related to nix labels Jan 22, 2025
@NickCrews
Copy link
Contributor Author

Turns out this never worked, but was passing because:

wow, that is very interesting. Nice job figuring that one out!

@cpcloud cpcloud merged commit 9b44750 into ibis-project:main Jan 23, 2025
86 of 87 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Issues or PRs related to dependencies duckdb The DuckDB backend nix Issues or PRs related to nix sql Backends that generate SQL tests Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants