This repository has been archived by the owner on Jan 7, 2025. It is now read-only.

feat: support left-outer and left-mark hash join impl rules #274

Open. Wants to merge 12 commits into main.

yliang412 (Member) commented Dec 22, 2024

Problem

We should be able to convert left-outer and left-mark logical equi-joins into hash joins.

Summary of changes

  • Add implementation rules to handle these cases.
  • Add join-split-filter rules that split the join condition and extract the predicates that can be pushed down as filters.

misc

  • refactor simplify_log_expr to stop using unreachable!

Not ideal: the inner, left-outer, and left-mark cases should eventually be unified into one rule.

Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
@yliang412
Member Author

TPC-H Q13 needs a rule that splits a filter out of the join node and pushes it down, after which the join can be turned into a left-outer hash join. Working on this now.

...
  ├── cond:And
  │   ├── Eq
  │   │   ├── #0
  │   │   └── #9
  │   └── Like { expr: #16, pattern: "%special%requests%", negated: true, case_insensitive: false }
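
The split described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `Expr`, `SplitCond`, and `split_join_cond` are hypothetical names, not optd's actual types, and the left width of 8 used in the usage note is assumed from the TPC-H customer schema. Only predicates that touch the right input alone are extracted, since for a left-outer join a left-only predicate in the join condition cannot be pushed into the left input.

```rust
/// Hypothetical, simplified expression type (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Column index into the concatenated (left ++ right) join schema.
    Col(usize),
    Eq(Box<Expr>, Box<Expr>),
    /// Stand-in for any single-column predicate, such as `Like`.
    Pred(String, Box<Expr>),
    And(Vec<Expr>),
}

/// Hypothetical result of splitting a join condition.
#[derive(Debug, Default, PartialEq)]
struct SplitCond {
    left_keys: Vec<usize>,    // indices into the left schema
    right_keys: Vec<usize>,   // indices into the right schema
    right_filters: Vec<Expr>, // re-indexed against the right schema
    residual: Vec<Expr>,      // everything that must stay in the join
}

/// Flatten nested `And` nodes into a list of conjuncts.
fn conjuncts(e: &Expr, out: &mut Vec<Expr>) {
    match e {
        Expr::And(children) => {
            for c in children {
                conjuncts(c, out);
            }
        }
        other => out.push(other.clone()),
    }
}

fn split_join_cond(cond: &Expr, left_width: usize) -> SplitCond {
    let mut parts = Vec::new();
    conjuncts(cond, &mut parts);
    let mut split = SplitCond::default();
    for part in &parts {
        match part {
            // Equality between one left column and one right column: a hash key pair.
            Expr::Eq(l, r) => match (&**l, &**r) {
                (Expr::Col(a), Expr::Col(b)) if *a < left_width && *b >= left_width => {
                    split.left_keys.push(*a);
                    split.right_keys.push(*b - left_width);
                }
                (Expr::Col(a), Expr::Col(b)) if *b < left_width && *a >= left_width => {
                    split.left_keys.push(*b);
                    split.right_keys.push(*a - left_width);
                }
                _ => split.residual.push(part.clone()),
            },
            // Predicate on a right-side column only: push it below the join,
            // shifting the column index into the right input's schema.
            Expr::Pred(name, arg) => match &**arg {
                Expr::Col(c) if *c >= left_width => split.right_filters.push(Expr::Pred(
                    name.clone(),
                    Box::new(Expr::Col(*c - left_width)),
                )),
                _ => split.residual.push(part.clone()),
            },
            other => split.residual.push(other.clone()),
        }
    }
    split
}
```

Applied to the condition above with a left width of 8, this sketch gives left key #0, right key #1, and a right-side filter on #8, with nothing left over in the join.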

yliang412 and others added 5 commits January 6, 2025 12:37
├── PhysicalScan { table: customer }
└── PhysicalScan { table: orders }
└── PhysicalFilter { cond: Like { expr: #8, pattern: "%special%requests%", negated: true, case_insensitive: false } }
Member Author

Pushes the filter down and turns the join into a hash join.

├── cond:Eq
│ ├── #1
│ └── #14
└── PhysicalHashJoin { join_type: LeftMark, left_keys: [ #1 ], right_keys: [ #0 ] }
Member Author

picked up by the new hash-join-left-mark rule
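
For readers unfamiliar with mark joins, here is a minimal sketch of what a left-mark hash join computes. It is not the executor code behind `PhysicalHashJoin`: the function name and row representation are made up for illustration, and SQL NULL handling (where the mark itself can be NULL) is deliberately ignored.

```rust
use std::collections::HashSet;

/// Hypothetical left-mark hash join over rows of plain integers.
/// Emits every left row exactly once, paired with a boolean mark that
/// says whether at least one right row has the same key.
fn left_mark_hash_join(
    left: &[Vec<i64>],
    right: &[Vec<i64>],
    left_keys: &[usize],
    right_keys: &[usize],
) -> Vec<(Vec<i64>, bool)> {
    // Build phase: only the set of right-side keys is needed.
    let build: HashSet<Vec<i64>> = right
        .iter()
        .map(|row| right_keys.iter().map(|&k| row[k]).collect::<Vec<i64>>())
        .collect();

    // Probe phase: look up each left key and attach the mark.
    left.iter()
        .map(|row| {
            let key: Vec<i64> = left_keys.iter().map(|&k| row[k]).collect();
            (row.clone(), build.contains(&key))
        })
        .collect()
}
```

Unlike an inner or outer join, duplicate matches on the right side never duplicate a left row here, which is why a hash set is enough for the build side.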

│ └── PhysicalScan { table: part }
└── PhysicalScan { table: lineitem }
└── PhysicalProjection { exprs: [ #0, #2 ] }
└── PhysicalHashJoin { join_type: LeftOuter, left_keys: [ #0 ], right_keys: [ #0 ] }
Member Author

left-outer hash join
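
As a rough sketch of the operator these plans now use, a left-outer hash join can be written as below. The names and the `Row` type are assumptions made for this example, not the engine's implementation. The build side is the right input, and rows with a NULL key are never matched.

```rust
use std::collections::HashMap;

/// Hypothetical row type: `None` stands for SQL NULL.
type Row = Vec<Option<i64>>;

/// Extract the key columns; returns `None` if any key column is NULL,
/// since a NULL key never compares equal to anything.
fn key_of(row: &Row, keys: &[usize]) -> Option<Vec<i64>> {
    keys.iter().map(|&k| row[k]).collect()
}

fn left_outer_hash_join(
    left: &[Row],
    right: &[Row],
    left_keys: &[usize],
    right_keys: &[usize],
    right_width: usize,
) -> Vec<Row> {
    // Build phase: hash the right input on its key columns.
    let mut table: HashMap<Vec<i64>, Vec<&Row>> = HashMap::new();
    for row in right {
        if let Some(key) = key_of(row, right_keys) {
            table.entry(key).or_default().push(row);
        }
    }

    // Probe phase: every left row is preserved.
    let mut out = Vec::new();
    for l in left {
        let matches = key_of(l, left_keys).and_then(|k| table.get(&k));
        match matches {
            Some(rows) => {
                for r in rows {
                    let mut joined = l.clone();
                    joined.extend(r.iter().copied());
                    out.push(joined);
                }
            }
            None => {
                // No match: pad the right side with NULLs.
                let mut joined = l.clone();
                joined.extend(std::iter::repeat(None).take(right_width));
                out.push(joined);
            }
        }
    }
    out
}
```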

│ └── Eq
│ ├── #0
│ └── #1
└── PhysicalHashJoin { join_type: LeftOuter, left_keys: [ #0 ], right_keys: [ #0 ] }
Member Author

left-outer hash join

├── cond:Eq
│ ├── #0
│ └── #11
└── PhysicalHashJoin { join_type: LeftMark, left_keys: [ #0 ], right_keys: [ #0 ] }
Member Author

left-mark hash join

Comment on lines +114 to +126
└── PhysicalNestedLoopJoin
├── join_type: Inner
├── cond:And
│ ├── Gt
│ │ ├── Cast { cast_to: Float64, child: #2 }
│ │ └── #8
│ ├── Eq
│ │ ├── #0
│ │ └── #6
│ └── Eq
│ ├── #1
│ └── #7
├── PhysicalFilter { cond: #5 }
Member Author

This one is a little confusing. Investigating ...

@yliang412 yliang412 changed the title [WIP] support left-outer and left-mark hash join impl rules feat: support left-outer and left-mark hash join impl rules Jan 6, 2025
@yliang412 yliang412 marked this pull request as ready for review January 6, 2025 18:39
Signed-off-by: Yuchen Liang <[email protected]>
(Join(JoinType::Inner), child_a, child_b)
);

define_rule!(
Member

I don't think this rule is correct. You cannot move the outer join condition into a filter in some cases.

Consider select * from a left join b on a.x = b.y and b.z = 1. Its result differs from that of select * from a left join b on a.x = b.y where b.z = 1. Assume the left table holds the single row x=1 and the right table holds the single row y=1, z=2. The correct result is (1, NULL, NULL), whereas the rule would produce zero rows.

Member

Ahh, I realized that this is a filter pushdown, so it might be correct. I will do a review later :)
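
The counterexample from this thread can be checked with a small simulation. The sketch below uses a naive nested-loop left join over hypothetical in-memory tables (`left_join` and `variants` are names invented here) and compares three plans: the predicate in the ON clause, the predicate as a WHERE filter above the join, and the predicate pushed into the right input below the join.

```rust
/// Output row: `None` stands for SQL NULL.
type Out = Vec<Option<i64>>;

/// Naive nested-loop left outer join, used only to compare plan shapes.
fn left_join(
    left: &[Vec<i64>],
    right: &[Vec<i64>],
    right_width: usize,
    on: impl Fn(&[i64], &[i64]) -> bool,
) -> Vec<Out> {
    let mut out: Vec<Out> = Vec::new();
    for l in left {
        let mut matched = false;
        for r in right {
            if on(l.as_slice(), r.as_slice()) {
                matched = true;
                out.push(l.iter().chain(r.iter()).map(|&v| Some(v)).collect());
            }
        }
        if !matched {
            // Preserve the left row and pad the right side with NULLs.
            let mut row: Out = l.iter().map(|&v| Some(v)).collect();
            row.extend(std::iter::repeat(None).take(right_width));
            out.push(row);
        }
    }
    out
}

/// Returns (ON-clause plan, WHERE-filter plan, pushed-down plan).
fn variants() -> (Vec<Out>, Vec<Out>, Vec<Out>) {
    let a: Vec<Vec<i64>> = vec![vec![1]]; // a(x)
    let b: Vec<Vec<i64>> = vec![vec![1, 2]]; // b(y, z)

    // 1. Original query: both predicates in the ON clause.
    let on_clause = left_join(&a, &b, 2, |l, r| l[0] == r[0] && r[1] == 1);

    // 2. b.z = 1 moved above the join as a WHERE filter.
    let mut where_filter = left_join(&a, &b, 2, |l, r| l[0] == r[0]);
    where_filter.retain(|row| row[2] == Some(1));

    // 3. b.z = 1 pushed below the join, into the right input.
    let b_filtered: Vec<Vec<i64>> = b.iter().filter(|r| r[1] == 1).cloned().collect();
    let pushed = left_join(&a, &b_filtered, 2, |l, r| l[0] == r[0]);

    (on_clause, where_filter, pushed)
}
```

On this data the ON-clause plan and the pushed-down plan both return the single row (1, NULL, NULL), while the WHERE-filter plan returns no rows. That matches the reading that a right-side predicate may move below a left-outer join but not above it.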
