-
-
Notifications
You must be signed in to change notification settings - Fork 231
remove unnecessary sort over merge joins #3108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2bb9f35
to
4a6dad3
Compare
sql/analyzer/replace_sort.go
Outdated
newNode, err := node.WithChildren(newChildren...) | ||
if err != nil { | ||
return nil, transform.SameTree, err | ||
} | ||
return newNode, transform.NewTree, nil | ||
} | ||
|
||
// buildReverseIndexedTable will attempt to take the lookup from an IndexedTableAccess, and return a new | ||
// IndexedTableAccess with the lookup reversed. | ||
func buildReverseIndexedTable(ctx *sql.Context, node sql.Node) (*plan.IndexedTableAccess, bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't currently used anywhere, but could be if we ever figure out how to properly reverse the index.
" └─ Sort(one_pk.pk ASC)\n" + | ||
" └─ LeftOuterMergeJoin\n" + | ||
" └─ Filter\n" + | ||
" ├─ (NOT(niltable.f IS NULL))\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does do these plans add a filter that wasn't there before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't the diff is just offset: https://github.com/dolthub/go-mysql-server/pull/3108/files#diff-eae6109880e132a9197c2b536f62e9572fc34d8e7c65e957dd37f48657790a94L15426
@@ -15418,129 +15468,64 @@ inner join pq on true | |||
" └─ columns: [i f]\n" + | |||
"", | |||
}, | |||
{ | |||
Query: `SELECT pk,i,f FROM one_pk LEFT JOIN niltable ON pk=i WHERE f IS NOT NULL ORDER BY 1`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this test deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't the diff is just offset: https://github.com/dolthub/go-mysql-server/pull/3108/files#diff-eae6109880e132a9197c2b536f62e9572fc34d8e7c65e957dd37f48657790a94R15413
enginetest/queries/script_queries.go
Outdated
}, | ||
}, | ||
{ | ||
// The Sort node can be optimized out of this query, but currently is not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what's going on with this test. The comments talk about sort nodes, but the test only tests the output, not the plan? What specifically are we testing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That the output is correct.
I'm not sure if we should fill the script tests with plan tests. Plus, we already have plan tests.
@@ -20,9 +20,12 @@ func replaceIdxSort(ctx *sql.Context, a *Analyzer, n sql.Node, scope *plan.Scope | |||
func replaceIdxSortHelper(ctx *sql.Context, scope *plan.Scope, node sql.Node, sortNode *plan.Sort) (sql.Node, transform.TreeIdentity, error) { | |||
switch n := node.(type) { | |||
case *plan.Sort: | |||
sortNode = n // lowest parent sort node | |||
// TODO: are there problems when there are multiple ORDER BYs? | |||
if isValidSortFieldOrder(n.SortFields) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like if there's already an outer sort node, and then we encounter an inner sort node, then we might leave sortNode
set to the outer one if the inner one isn't a valid sort field order. This probably can't cause an issue? (Since if you really had two sort nodes in a row the outer one would override the inner one), but I wanted to make sure we've thought about that possible code path.
sql/analyzer/replace_sort.go
Outdated
@@ -184,6 +176,25 @@ func replaceIdxSortHelper(ctx *sql.Context, scope *plan.Scope, node sql.Node, so | |||
if sameLeft && sameRight { | |||
continue | |||
} | |||
// No need to check all SortField orders because of isValidSortFieldOrder | |||
isReversed := sortNode.SortFields[0].Order == sql.Descending | |||
// either left or right has been reversed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment confuses me. Why do we only use a reverse index if exactly one of the children as been modified? Why does the subsequent comment mention that "both Indexes must be reversed" but then only reverses one of them?
sql/analyzer/replace_sort.go
Outdated
if err != nil { | ||
return nil, transform.SameTree, err | ||
} | ||
// If we could not replace the IndexedTableAccess with a reversed one, result is same, so abandon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what "result is same" means here. Can this comment make more clear what's happening and what it means?
} | ||
return newNode, true, nil | ||
func buildReverseIndexedTable(ctx *sql.Context, node sql.Node) (sql.Node, transform.TreeIdentity, error) { | ||
return transform.Node(node, func(n sql.Node) (sql.Node, transform.TreeIdentity, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we call transform.Node here? What are the possible types of node
? Is it not always an IndexedTableAccess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible for there to be Projects and Filters over the IndexedTableAccess.
sql/rowexec/merge_join.go
Outdated
@@ -216,6 +216,8 @@ func (i *mergeJoinIter) Next(ctx *sql.Context) (sql.Row, error) { | |||
} else if err != nil { | |||
return nil, err | |||
} | |||
// merge join assumes children area sorted in ascending order, so we need to invert the comparison to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
area -> are
This PR looks for Sort nodes over merge joins and removes them when applicable.
benchmarks: dolthub/dolt#9553
partially addresses: dolthub/dolt#8728