-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Remove inline table scan analyzer rule #15201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jayzhan211
I have a suggestion -- let me know what you think
I would also like to run planning benchmarks on this PR -- will do later
datafusion/sql/src/select.rs
Outdated
@@ -550,7 +550,29 @@ impl<S: ContextProvider> SqlToRel<'_, S> { | |||
0 => Ok(LogicalPlanBuilder::empty(true).build()?), | |||
1 => { | |||
let input = from.remove(0); | |||
self.plan_table_with_joins(input, planner_context) | |||
let table = self.plan_table_with_joins(input, planner_context)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will have the effect of only inlining table scans for SQL
Perhaps we could move the code into LogicalPlanBuilder so it also applies to dataframes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both dataframe and sql works well now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jayzhan211 -- this is looking close. I am also running benchmarks and will report back shortly
1 2 | ||
1 3 | ||
|
||
query error DataFusion error: Execution error: Table 'v' doesn't exist\. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be drop view v
rather than drop table
} | ||
} | ||
|
||
Ok(Self::new(LogicalPlan::TableScan(table_scan))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like the same logic should apply to scan_with_filters_fetch below
@@ -1563,8 +1563,12 @@ async fn with_column_join_same_columns() -> Result<()> { | |||
\n Limit: skip=0, fetch=1\ | |||
\n Sort: t1.c1 ASC NULLS FIRST\ | |||
\n Inner Join: t1.c1 = t2.c1\ | |||
\n TableScan: t1\ | |||
\n TableScan: t2", | |||
\n SubqueryAlias: t1\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still worried about this change -- I don't undersrtand why the subquery allias has not been removed or if that is a problem 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the subquery alias is what shows we correctly inline tablescan, since adding alias is the only thing we have done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can check the into_optimized_plan
below, we have the consistent result.
The process includes both an analyzer and an optimizer. This is why the main branch does not contain a subquery before optimization, but one appears after inline_tablescan is called.
In this change, since inline_tablescan is invoked during plan building, the subquery is introduced before the optimization step
I ran planning performance benchmarks and I would say they showed no discernible difference with this branch Details
|
TableScan: t1 | ||
TableScan: t2 | ||
"### | ||
SubqueryAlias: t1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned about this change -- it seems like a regression to me.
Maybe it is ok, but I don't understand the difference/ why there are a bunch of projections / SubqueryAlias nows
If these are fixed by physical plan time it is fine, but from here it looks like something is wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explain it in the above #15201 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay - thank you @jayzhan211
Thanks @alamb |
Hi @jayzhan211 @alamb, I'm a bit concerned with this patch, I was just in the process of submitting an enhancement to The proposal on the original implementation was that we should be transforming down instead of up, to allow the child plans to be expanded all the way down. What are your thoughts around this? How are you dealing with a query plan that is built incrementally from 2-3 levels of using I'm not seeing any tests handling this use case yet, and my first attempt of cherry-picking your commit breaks in other places for our fork. |
I think nested query is also correctly handled, at least correct for sql case
|
@jayzhan211 can you try with a 3rd view? 2 levels were also working on the original inlining code. |
It works too
|
@aditanase perhaps you can make a PR with whatever tests were breaking in your fork so we can make sure they work here |
The fork was not compiling, we were on 46.0.0 but the latest changes made my rebase more difficult than usual, we'll figure it out. I will port the tests on our next iteration, but would appreciate a recommendation for a good suite to add them to as I rewrite them - they are currently only validating the |
Thanks! I think unit tests on InlineTableScan is likely not as helpful as something like building multiple nested levels of |
Which issue does this PR close?
Analyzer
within LogicalPlan building stage #14618Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?