Find a safe alternative to LogicalPlan::using_columns()
#14118
Comments
Okay, this definitely sent me on a ride, but here are the results after adding multiple logging statements, analyzing functions, and studying PostgreSQL's planner, most of which turned out to be rather useless. I suspect the bug stems from an incorrect regex (I have not found the code for it yet). Why do I think so?
It seems this bug occurs when the beginning of the alias matches the actual table name (not vice versa), and the culprit code ends up excluding all the columns.
Another function, which also seems fine. The problematic part should be the piece of code where this exclusion actually happens (MIA). By the way, this is the output of the logger when we use
and the bug does not occur when the alias is
Hi @jonahgao
Solution:-
Rationale:- I am not sure whether this will affect other cases, but even if it does, it will only show extra columns, which is better than not showing any columns at all. Also, this does not seem to affect the tests negatively, so should I go with this or not? (I can put up a PR by tomorrow if you would like to see that first.)
Hi @logan-keede , thanks for your investigation.
I think displaying extra columns may introduce more bugs, especially since the case in this issue is quite rare.
I thought so. I looked at some other avenues; here are some things that might help anyone trying to solve this in the future.
If we make the sub-query alias name
Possible Solution:- A normal statement without a join should not need to use
Possible implementation Strategies:-
@jonahgao, please let me know if you think this approach might work. Regardless, I would like to be unassigned from this issue, as I believe it is beyond my current capabilities. I plan to look for more beginner-friendly issues and spend some time familiarizing myself with the codebase first.
PS: I might have circled back to the original issue/problem; regardless, I hope the process contributed something. ^_^
PS2: I think #1468 has the potential to resolve this issue, though it does not seem like we can expect it anytime soon.
I think it is the correct approach. Since unnamed subqueries do not appear in the logical plan tree, we need to do this check earlier. The fix likely requires some refactoring of the existing code, which is not friendly for people unfamiliar with the codebase. Unassigned and thanks @logan-keede again ❤️
Describe the bug
Background
Select with wildcard over a USING/NATURAL JOIN should deduplicate join columns.
For example:
The query above should output the column 'a' only once.
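To make the expected deduplication concrete, here is a minimal sketch in plain Rust. It is not DataFusion code: `QualifiedColumn`, `col`, and `expand_wildcard` are hypothetical names invented for illustration, and the schema is assumed to be a flat list of qualified columns.

```rust
use std::collections::HashSet;

/// A column qualified by its relation. Hypothetical type for illustration,
/// not DataFusion's `Column`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct QualifiedColumn {
    relation: String,
    name: String,
}

/// Small helper to build a `QualifiedColumn`.
fn col(relation: &str, name: &str) -> QualifiedColumn {
    QualifiedColumn {
        relation: relation.to_string(),
        name: name.to_string(),
    }
}

/// Expand `SELECT *` over a join's output schema, keeping only the first
/// occurrence of every column name listed in the USING clause.
fn expand_wildcard(schema: &[QualifiedColumn], using_names: &[&str]) -> Vec<QualifiedColumn> {
    let using: HashSet<&str> = using_names.iter().copied().collect();
    let mut seen: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    for c in schema {
        // `insert` returns false when the name was already emitted once.
        if using.contains(c.name.as_str()) && !seen.insert(c.name.as_str()) {
            continue;
        }
        out.push(c.clone());
    }
    out
}
```

With an assumed schema of `t1.a, t1.b, t2.a, t2.c` and `USING (a)`, calling `expand_wildcard(&schema, &["a"])` keeps `t1.a`, `t1.b`, and `t2.c`, so 'a' appears only once.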
LogicalPlan::using_columns() is used to find these join columns and to help exclude duplicated columns when expanding wildcards.
Problem
using_columns() works by traversing the plan tree. This manner might be unsafe as it could incorrectly find columns that are not relevant to the current SQL context. This may lead to some output columns being incorrectly excluded.
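The following toy model sketches how a whole-tree traversal can over-collect. It is an assumption-laden illustration, not DataFusion's implementation and not the reproduction from this issue: the `Plan` enum, `schema`, `using_columns`, `expand_wildcard`, and `sample_join` are hypothetical, columns are plain qualified strings, and every USING set is assumed to hold columns with the same unqualified name.

```rust
use std::collections::BTreeSet;

/// Toy logical plan. A hypothetical stand-in, far simpler than DataFusion's `LogicalPlan`.
enum Plan {
    /// Table scan; `columns` are qualified names such as "t1.a".
    Scan { columns: Vec<String> },
    /// `left JOIN right USING (...)`; `using` lists the qualified columns from both sides.
    JoinUsing { left: Box<Plan>, right: Box<Plan>, using: Vec<String> },
    /// Projection that outputs only `columns` from its input.
    Projection { input: Box<Plan>, columns: Vec<String> },
}

/// Output schema of a node.
fn schema(plan: &Plan) -> Vec<String> {
    match plan {
        Plan::Scan { columns } | Plan::Projection { columns, .. } => columns.clone(),
        Plan::JoinUsing { left, right, .. } => {
            let mut s = schema(left);
            s.extend(schema(right));
            s
        }
    }
}

/// Collect USING columns by walking the WHOLE tree, one set per join.
/// Nothing stops the walk at the join nested below a projection.
fn using_columns(plan: &Plan, out: &mut Vec<BTreeSet<String>>) {
    match plan {
        Plan::Scan { .. } => {}
        Plan::Projection { input, .. } => using_columns(input, out),
        Plan::JoinUsing { left, right, using } => {
            out.push(using.iter().cloned().collect());
            using_columns(left, out);
            using_columns(right, out);
        }
    }
}

/// Expand `*` over `plan`: keep the first (sorted) column of every USING set, skip the rest.
fn expand_wildcard(plan: &Plan) -> Vec<String> {
    let mut sets = Vec::new();
    using_columns(plan, &mut sets);
    let skip: BTreeSet<String> = sets
        .iter()
        .flat_map(|set| set.iter().skip(1).cloned())
        .collect();
    schema(plan).into_iter().filter(|c| !skip.contains(c)).collect()
}

/// Toy plan for `t1 JOIN t2 USING (a)`.
fn sample_join() -> Plan {
    Plan::JoinUsing {
        left: Box::new(Plan::Scan { columns: vec!["t1.a".to_string()] }),
        right: Box::new(Plan::Scan { columns: vec!["t2.a".to_string()] }),
        using: vec!["t1.a".to_string(), "t2.a".to_string()],
    }
}
```

In this model, `expand_wildcard(&sample_join())` correctly keeps only `t1.a`. Wrapping the same join in a `Projection` that outputs just `t2.a` and expanding a wildcard over it yields no columns at all, because the traversal still finds the inner USING set and treats `t2.a` as a duplicate even though it is the only column visible at that level.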
For example, the result of the query below is different from other databases.
To Reproduce
Run query in CLI (compiled from the latest main: 722307f)
It outputs no columns.
Expected behavior
In PostgreSQL it does output one column.
Additional context
No response