Support zero copy hash repartitioning for Hash Join #2

zebsme · 2025-03-31T18:31:17Z

Which issue does this PR close?

This is a new draft of "Support zero copy hash repartitioning for Hash Join" (#1).

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

zebsme · 2025-03-31T18:44:51Z

The planner logic is already implemented. Example plan:

explain
SELECT * FROM
(SELECT x+1 AS col0, y+1 AS col1 FROM PAIRS WHERE x == y)
JOIN f
ON col0 = f.a
JOIN s
ON col1 = s.b
----
logical_plan
01)Inner Join: col1 = CAST(s.b AS Int64)
02)--Inner Join: col0 = CAST(f.a AS Int64)
03)----Projection: CAST(pairs.x AS Int64) + Int64(1) AS col0, CAST(pairs.y AS Int64) + Int64(1) AS col1
04)------Filter: pairs.y = pairs.x
05)--------TableScan: pairs projection=[x, y]
06)----TableScan: f projection=[a]
07)--TableScan: s projection=[b]
physical_plan
01)CoalesceBatchesExec: target_batch_size=8192
02)--HashJoinExec: mode=Partitioned(SelectionVector), join_type=Inner, on=[(col1@1, CAST(s.b AS Int64)@1)], projection=[col0@0, col1@1, a@2, b@3]
03)----ProjectionExec: expr=[col0@1 as col0, col1@2 as col1, a@0 as a]
04)------CoalesceBatchesExec: target_batch_size=8192
05)--------HashJoinExec: mode=Partitioned(SelectionVector), join_type=Inner, on=[(CAST(f.a AS Int64)@1, col0@0)], projection=[a@0, col0@2, col1@3]
06)----------RepartitionExec: partitioning=HashSelectionVector([CAST(f.a AS Int64)@1], 16), input_partitions=1
07)------------ProjectionExec: expr=[a@0 as a, CAST(a@0 AS Int64) as CAST(f.a AS Int64)]
08)--------------DataSourceExec: partitions=1, partition_sizes=[1]
09)----------RepartitionExec: partitioning=HashSelectionVector([col0@0], 16), input_partitions=16
10)------------ProjectionExec: expr=[CAST(x@0 AS Int64) + 1 as col0, CAST(y@1 AS Int64) + 1 as col1]
11)--------------RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1
12)----------------CoalesceBatchesExec: target_batch_size=8192
13)------------------FilterExec: y@1 = x@0
14)--------------------DataSourceExec: partitions=1, partition_sizes=[1]
15)----RepartitionExec: partitioning=HashSelectionVector([CAST(s.b AS Int64)@1], 16), input_partitions=1
16)------ProjectionExec: expr=[b@0 as b, CAST(b@0 AS Int64) as CAST(s.b AS Int64)]
17)--------DataSourceExec: partitions=1, partition_sizes=[1]
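For readers unfamiliar with the idea behind `HashSelectionVector` partitioning, here is a minimal, standalone sketch of the general technique. It is not the code from this PR and uses no DataFusion or Arrow APIs. The function name `hash_selection_vectors` and the choice of `DefaultHasher` are assumptions made for illustration only. Instead of copying rows into one new batch per output partition, the repartitioner computes a list of row indices (a selection vector) per partition, and every consumer reads the same shared batch through its own indices.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper (not a DataFusion API): for each output partition,
/// return the indices of the rows whose join key hashes to that partition.
/// The input `keys` column is only read, never copied or reordered.
fn hash_selection_vectors(keys: &[i64], num_partitions: usize) -> Vec<Vec<u32>> {
    assert!(num_partitions > 0, "need at least one output partition");

    // One (initially empty) selection vector per output partition.
    let mut selections: Vec<Vec<u32>> = vec![Vec::new(); num_partitions];

    for (row, key) in keys.iter().enumerate() {
        // Hash the join key; equal keys always produce equal hashes.
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);

        // Map the hash to a partition and record only the row index.
        let partition = (hasher.finish() % num_partitions as u64) as usize;
        selections[partition].push(row as u32);
    }

    selections
}
```

Calling `hash_selection_vectors(&[1, 2, 3, 2, 1, 7], 4)` yields four index lists that together cover every row exactly once, and rows holding the same key (rows 0 and 4, rows 1 and 3) always fall into the same list. A downstream join in this style would then probe or build using only the rows named in its own selection vector.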

zebsme · 2025-04-06T15:44:31Z

The implementation fully supports all join types (INNER, LEFT, RIGHT, FULL, etc.), but profiling reveals a performance degradation.

goldmedal and others added 27 commits March 30, 2025 19:35

introduce selection vector repartitioning

fdd001f

finish the phsyical plan side

d3e7cbe

add config

9fe2975

support for proto

b1a26f9

add sqllogictests

58c4b05

fix fmt and clippy

0c4c77f

rename column and address comment

dbce649

fix config test

0d4cf09

remove hash join test

0609ea1

fix typo

4e1949c

add back the join test

47aa074

add HashPartitionMode

bd8364b

update sqllogictests

1d8f15c

fix fmt

6df2b4b

add todo comment

b7b6a55

fix typo

b8534b3

address review comment

7accad0

fix compile and ehance doc

f298bd8

rename config

d3d9ca8

fix fmt

1ad9181

fix test

a9e1f52

fix sqllogictests

c6b33da

add selection vector mode in hash join PartitionMode

2f3b714

add with_selection_vector flag in HashJoinStream

6353a27

filter SELECTION_FIELD_NAME in try_new HashJoinExec

6dedfaf

add selection vector planner logic

5790ea3

fix fmt

07dc9e1

github-actions bot added documentation Improvements or additions to documentation physical-expr common labels Mar 31, 2025

github-actions bot added optimizer core proto sqllogictest labels Mar 31, 2025

zebsme mentioned this pull request Mar 31, 2025

Support zero copy hash repartitioning for Hash Join #1

Closed
