
remove unnecessary sort over merge joins #3108


Merged: 21 commits, Jul 23, 2025


@jycor (Contributor) commented Jul 17, 2025

This PR looks for Sort nodes over merge joins and removes them when the merge join already returns rows in the order the Sort requests.

benchmarks: dolthub/dolt#9553
partially addresses: dolthub/dolt#8728
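A minimal sketch of the kind of check involved. It assumes a simplified model rather than go-mysql-server's actual types: `Direction`, `SortField`, `MergeJoin`, and `sortIsRedundant` are hypothetical. A merge join emits rows ordered by its join key, so a Sort directly above it can be dropped when the sort columns are a prefix of that key order and all share one direction. A descending sort additionally needs the inputs read in reverse.

```go
package main

import "fmt"

// Direction of a sort field (hypothetical, simplified).
type Direction int

const (
	Ascending Direction = iota
	Descending
)

// SortField is a hypothetical, simplified ORDER BY term.
type SortField struct {
	Column string
	Order  Direction
}

// MergeJoin is a hypothetical merge join node that emits rows in
// ascending order of its join key columns.
type MergeJoin struct {
	KeyColumns []string
}

// sortIsRedundant reports whether a Sort over mj can be dropped, and whether
// the join's inputs would have to be read in reverse to satisfy it. It
// requires all sort fields to share one direction and the sort columns to be
// a prefix of the join's key order.
func sortIsRedundant(fields []SortField, mj MergeJoin) (redundant, reverse bool) {
	if len(fields) == 0 || len(fields) > len(mj.KeyColumns) {
		return false, false
	}
	dir := fields[0].Order
	for i, f := range fields {
		if f.Order != dir || f.Column != mj.KeyColumns[i] {
			return false, false
		}
	}
	return true, dir == Descending
}

func main() {
	mj := MergeJoin{KeyColumns: []string{"one_pk.pk"}}
	fmt.Println(sortIsRedundant([]SortField{{"one_pk.pk", Ascending}}, mj))  // true false
	fmt.Println(sortIsRedundant([]SortField{{"one_pk.pk", Descending}}, mj)) // true true
	fmt.Println(sortIsRedundant([]SortField{{"niltable.f", Ascending}}, mj)) // false false
}
```

Requiring a single direction keeps the check simple: a mixed ASC/DESC ORDER BY cannot be produced by one forward scan or one reversed scan.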

@jycor jycor marked this pull request as draft July 17, 2025 21:34
@jycor jycor force-pushed the james/merge branch 2 times, most recently from 2bb9f35 to 4a6dad3 Compare July 21, 2025 21:43
@jycor jycor marked this pull request as ready for review July 22, 2025 18:51
	newNode, err := node.WithChildren(newChildren...)
	if err != nil {
		return nil, transform.SameTree, err
	}
	return newNode, transform.NewTree, nil
}

// buildReverseIndexedTable will attempt to take the lookup from an IndexedTableAccess, and return a new
// IndexedTableAccess with the lookup reversed.
func buildReverseIndexedTable(ctx *sql.Context, node sql.Node) (*plan.IndexedTableAccess, bool, error) {
Contributor Author:

This isn't currently used anywhere, but could be if we ever figure out how to properly reverse the index.
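To illustrate what "reversing" a lookup could mean, here is a sketch that assumes a lookup is nothing more than an ascending list of disjoint key ranges over a sorted index (a large simplification of a real index lookup). Reversing visits the ranges last-to-first and walks each range backwards. `Range`, `Lookup`, `reverseLookup`, and `scan` are hypothetical names, not part of the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// Range is a hypothetical closed interval [Lo, Hi] over an integer index key.
type Range struct{ Lo, Hi int }

// Lookup is a hypothetical index lookup: disjoint ranges in ascending order,
// plus a flag saying whether rows should come back in descending order.
type Lookup struct {
	Ranges  []Range
	Reverse bool
}

// reverseLookup returns a new Lookup whose ranges are visited last-to-first
// with the direction flag flipped; the input Lookup is left unchanged.
func reverseLookup(l Lookup) Lookup {
	rs := make([]Range, len(l.Ranges))
	for i, r := range l.Ranges {
		rs[len(rs)-1-i] = r
	}
	return Lookup{Ranges: rs, Reverse: !l.Reverse}
}

// scan reads keys from a sorted index for each range in order, walking each
// range backwards when Reverse is set.
func scan(index []int, l Lookup) []int {
	var out []int
	for _, r := range l.Ranges {
		lo := sort.SearchInts(index, r.Lo)   // first key >= Lo
		hi := sort.SearchInts(index, r.Hi+1) // first key > Hi
		if l.Reverse {
			for i := hi - 1; i >= lo; i-- {
				out = append(out, index[i])
			}
		} else {
			out = append(out, index[lo:hi]...)
		}
	}
	return out
}

func main() {
	index := []int{1, 2, 3, 5, 8, 13}
	l := Lookup{Ranges: []Range{{1, 3}, {8, 13}}}
	fmt.Println(scan(index, l))                // [1 2 3 8 13]
	fmt.Println(scan(index, reverseLookup(l))) // [13 8 3 2 1]
}
```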

" └─ Sort(one_pk.pk ASC)\n" +
" └─ LeftOuterMergeJoin\n" +
" └─ Filter\n" +
" ├─ (NOT(niltable.f IS NULL))\n" +
Contributor:

Why do these plans add a filter that wasn't there before?

Contributor Author (@jycor), Jul 22, 2025:

@@ -15418,129 +15468,64 @@ inner join pq on true
" └─ columns: [i f]\n" +
"",
},
{
Query: `SELECT pk,i,f FROM one_pk LEFT JOIN niltable ON pk=i WHERE f IS NOT NULL ORDER BY 1`,
Contributor:

Why was this test deleted?

Contributor Author:

},
},
{
// The Sort node can be optimized out of this query, but currently is not
Contributor:

I'm not sure what's going on with this test. The comments talk about Sort nodes, but the test only checks the output, not the plan. What specifically are we testing here?

Contributor Author:

That the output is correct.
I'm not sure we should fill the script tests with plan tests, and we already have plan tests.

@@ -20,9 +20,12 @@ func replaceIdxSort(ctx *sql.Context, a *Analyzer, n sql.Node, scope *plan.Scope
func replaceIdxSortHelper(ctx *sql.Context, scope *plan.Scope, node sql.Node, sortNode *plan.Sort) (sql.Node, transform.TreeIdentity, error) {
	switch n := node.(type) {
	case *plan.Sort:
		sortNode = n // lowest parent sort node
		// TODO: are there problems when there are multiple ORDER BYs?
		if isValidSortFieldOrder(n.SortFields) {
Contributor:

If there's already an outer sort node and we then encounter an inner sort node whose sort field order isn't valid, we might leave sortNode set to the outer one. This probably can't cause a problem (if there really were two sort nodes in a row, the outer one would override the inner one), but I want to make sure we've thought about that code path.
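To make the code path concrete, here is a small model of the walk. It assumes a simplified tree: `Node`, `SortField`, and `sortAbove` are hypothetical, and `isValidSortFieldOrder` here is only a stand-in for the PR's helper of the same name (it just requires a non-empty list with one shared direction). The sketch takes the conservative option of clearing the tracked sort when an inner Sort is invalid, so nothing below it is rewritten against a stale outer Sort; this is one possible choice, not necessarily what the PR does.

```go
package main

import "fmt"

type Direction int

const (
	Ascending Direction = iota
	Descending
)

// SortField is a hypothetical, simplified ORDER BY term.
type SortField struct {
	Column string
	Order  Direction
}

// Node is a hypothetical plan node; Fields is set only when Name == "Sort".
type Node struct {
	Name     string
	Fields   []SortField
	Children []*Node
}

// isValidSortFieldOrder (stand-in): a non-empty list whose fields all share
// one direction.
func isValidSortFieldOrder(fields []SortField) bool {
	for _, f := range fields {
		if f.Order != fields[0].Order {
			return false
		}
	}
	return len(fields) > 0
}

// sortAbove records, for each MergeJoin, the nearest enclosing Sort that is
// eligible for the optimization. An invalid Sort clears the tracked sort
// instead of leaving an outer one in place.
func sortAbove(n *Node, current *Node, out map[*Node]*Node) {
	if n.Name == "Sort" {
		if isValidSortFieldOrder(n.Fields) {
			current = n
		} else {
			current = nil
		}
	}
	if n.Name == "MergeJoin" {
		out[n] = current
	}
	for _, c := range n.Children {
		sortAbove(c, current, out)
	}
}

func main() {
	join := &Node{Name: "MergeJoin"}
	inner := &Node{Name: "Sort", Children: []*Node{join},
		Fields: []SortField{{"a", Ascending}, {"b", Descending}}} // mixed: invalid
	outer := &Node{Name: "Sort", Children: []*Node{inner},
		Fields: []SortField{{"a", Descending}}}

	out := map[*Node]*Node{}
	sortAbove(outer, nil, out)
	fmt.Println(out[join] == nil) // true: the outer Sort is not used

	// For a valid order every field shares one direction, so the first
	// field alone decides whether the inputs must be reversed.
	s := outer.Fields
	isReversed := isValidSortFieldOrder(s) && s[0].Order == Descending
	fmt.Println(isReversed) // true
}
```

The last two lines also show why the diff can read only `SortFields[0].Order`: once the uniform-direction check has passed, the first field is representative of all of them.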

@@ -184,6 +176,25 @@ func replaceIdxSortHelper(ctx *sql.Context, scope *plan.Scope, node sql.Node, so
if sameLeft && sameRight {
continue
}
// No need to check all SortField orders because of isValidSortFieldOrder
isReversed := sortNode.SortFields[0].Order == sql.Descending
// either left or right has been reversed
Contributor:

This comment confuses me. Why do we only use a reverse index if exactly one of the children has been modified? And why does the subsequent comment say that "both Indexes must be reversed" but then only reverse one of them?

if err != nil {
return nil, transform.SameTree, err
}
// If we could not replace the IndexedTableAccess with a reversed one, result is same, so abandon
Contributor:

I'm not sure what "result is same" means here. Can this comment make it clearer what's happening and what it means?
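One reading of the "result is same, so abandon" comment, sketched with hypothetical types: `TreeIdentity`, `SameTree`, and `NewTree` here only mimic the role of the `transform` identifiers in the diff, and `Input`, `tryReverse`, and `reverseInputs` are invented for illustration. The sketch assumes both merge-join inputs must be read in reverse for a descending merge to stay correct, so if either rewrite reports no change, the whole optimization is dropped and the Sort above the join stays.

```go
package main

import "fmt"

// TreeIdentity (hypothetical) reports whether a rewrite changed the tree.
type TreeIdentity bool

const (
	SameTree TreeIdentity = true
	NewTree  TreeIdentity = false
)

// Input is a hypothetical merge-join child.
type Input struct {
	Name       string
	Reversible bool // e.g. an indexed access whose lookup can be reversed
	Reversed   bool
}

// tryReverse returns a reversed copy of in, or in itself with SameTree when
// no reversed access path could be built.
func tryReverse(in Input) (Input, TreeIdentity) {
	if !in.Reversible {
		return in, SameTree
	}
	in.Reversed = true
	return in, NewTree
}

// reverseInputs reverses both inputs or neither. If either rewrite comes back
// as SameTree, the optimization is abandoned: the original inputs are returned
// unchanged, and the caller must keep the Sort above the join.
func reverseInputs(left, right Input) (Input, Input, TreeIdentity) {
	newLeft, same := tryReverse(left)
	if same == SameTree {
		return left, right, SameTree
	}
	newRight, same := tryReverse(right)
	if same == SameTree {
		return left, right, SameTree
	}
	return newLeft, newRight, NewTree
}

func main() {
	l := Input{Name: "one_pk", Reversible: true}
	_, _, id := reverseInputs(l, Input{Name: "niltable"})
	fmt.Println(id == SameTree) // true: abandon, keep the Sort

	nl, nr, id := reverseInputs(l, Input{Name: "two_pk", Reversible: true})
	fmt.Println(id == NewTree, nl.Reversed, nr.Reversed) // true true true
}
```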

}
return newNode, true, nil
func buildReverseIndexedTable(ctx *sql.Context, node sql.Node) (sql.Node, transform.TreeIdentity, error) {
return transform.Node(node, func(n sql.Node) (sql.Node, transform.TreeIdentity, error) {
Contributor:

Why do we call transform.Node here? What are the possible types of node? Is it not always an IndexedTableAccess?

Contributor Author:

It is possible for there to be Projects and Filters over the IndexedTableAccess.
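This reply explains why a tree walk is used instead of a type assertion on `node`: the IndexedTableAccess may be wrapped in Project or Filter nodes. Below is a sketch of that idea with hypothetical node types and a hand-written bottom-up `transformUp`. It is written in the spirit of `transform.Node`, not as its actual implementation.

```go
package main

import "fmt"

// Node is a hypothetical plan-node interface with just enough structure
// to show a rewrite that descends through wrapper nodes.
type Node interface {
	Children() []Node
	WithChildren(children ...Node) Node
}

type Project struct{ Child Node }
type Filter struct{ Child Node }
type IndexedTableAccess struct {
	Table   string
	Reverse bool
}

func (p Project) Children() []Node                 { return []Node{p.Child} }
func (p Project) WithChildren(c ...Node) Node      { return Project{Child: c[0]} }
func (f Filter) Children() []Node                  { return []Node{f.Child} }
func (f Filter) WithChildren(c ...Node) Node       { return Filter{Child: c[0]} }
func (t IndexedTableAccess) Children() []Node      { return nil }
func (t IndexedTableAccess) WithChildren(...Node) Node { return t }

// transformUp applies fn bottom-up, rebuilding any parent whose children
// changed, and reports whether anything in the tree changed.
func transformUp(n Node, fn func(Node) (Node, bool)) (Node, bool) {
	children := n.Children()
	changed := false
	newChildren := make([]Node, len(children))
	for i, c := range children {
		nc, ch := transformUp(c, fn)
		newChildren[i] = nc
		changed = changed || ch
	}
	if changed {
		n = n.WithChildren(newChildren...)
	}
	out, ch := fn(n)
	return out, changed || ch
}

func main() {
	tree := Project{Child: Filter{Child: IndexedTableAccess{Table: "one_pk"}}}
	out, changed := transformUp(tree, func(n Node) (Node, bool) {
		// Only the table access is rewritten; wrappers pass through.
		if ita, ok := n.(IndexedTableAccess); ok {
			ita.Reverse = true
			return ita, true
		}
		return n, false
	})
	fmt.Println(changed)
	fmt.Printf("%+v\n", out)
}
```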

@@ -216,6 +216,8 @@ func (i *mergeJoinIter) Next(ctx *sql.Context) (sql.Row, error) {
} else if err != nil {
return nil, err
}
// merge join assumes children area sorted in ascending order, so we need to invert the comparison to
Contributor:

area -> are
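The code comment under review describes how a merge join can consume descending inputs: keep the ascending merge logic and invert the comparison. Here is a self-contained sketch over integer keys. `Pair` and `mergeJoin` are hypothetical; real rows and comparators are more involved.

```go
package main

import "fmt"

// Pair is one joined output row (hypothetical).
type Pair struct{ L, R int }

// mergeJoin joins two key slices sorted in the same direction: ascending when
// descending is false, descending when it is true. For descending inputs the
// comparison is inverted, so the same merge logic still advances whichever
// side is "behind" in iteration order.
func mergeJoin(left, right []int, descending bool) []Pair {
	cmp := func(a, b int) int {
		c := 0
		switch {
		case a < b:
			c = -1
		case a > b:
			c = 1
		}
		if descending {
			c = -c
		}
		return c
	}
	var out []Pair
	i, j := 0, 0
	for i < len(left) && j < len(right) {
		switch c := cmp(left[i], right[j]); {
		case c < 0:
			i++
		case c > 0:
			j++
		default:
			// Emit every right row with this key, then advance left; a
			// duplicate left key will match the same right run again.
			for k := j; k < len(right) && right[k] == left[i]; k++ {
				out = append(out, Pair{left[i], right[k]})
			}
			i++
		}
	}
	return out
}

func main() {
	fmt.Println(mergeJoin([]int{1, 3, 5}, []int{1, 2, 5, 5}, false)) // [{1 1} {5 5} {5 5}]
	fmt.Println(mergeJoin([]int{5, 3, 1}, []int{5, 5, 2, 1}, true))  // [{5 5} {5 5} {1 1}]
}
```

Without the inversion, descending inputs would make the merge advance the wrong side and silently drop matches.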

@jycor jycor merged commit 18eebe5 into main Jul 23, 2025
8 checks passed
@jycor jycor deleted the james/merge branch July 23, 2025 23:48

2 participants