Commit 1d4b808

viirya and comphead authored
Fix SortMergeJoin with join filter filtering all rows out (#10495)
* Fix SortMergeJoin with join filter filtering all rows out
* Move test
* Update datafusion/sqllogictest/test_files/sort_merge_join.slt

---------

Co-authored-by: Oleks V <[email protected]>
1 parent 424757f commit 1d4b808

2 files changed, +20 -1 lines changed

datafusion/physical-plan/src/joins/sort_merge_join.rs

Lines changed: 3 additions & 1 deletion

@@ -1323,7 +1323,9 @@ impl SMJStream {
     // If join filter exists, `self.output_size` is not accurate as we don't know the exact
     // number of rows in the output record batch. If streamed row joined with buffered rows,
     // once join filter is applied, the number of output rows may be more than 1.
-    if record_batch.num_rows() > self.output_size {
+    // If `record_batch` is empty, we should reset `self.output_size` to 0. It could be happened
+    // when the join filter is applied and all rows are filtered out.
+    if record_batch.num_rows() == 0 || record_batch.num_rows() > self.output_size {
         self.output_size = 0;
     } else {
         self.output_size -= record_batch.num_rows();
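
The effect of the changed condition can be looked at in isolation. The following is a minimal sketch, not DataFusion code: the two function names are hypothetical, and plain `usize` values stand in for `self.output_size` and `record_batch.num_rows()`. It only contrasts the counter update before and after this commit.

```rust
/// Counter update as it was before the commit (hypothetical stand-alone form).
/// `output_size` is the number of rows the stream still expects to emit;
/// `batch_rows` is the number of rows left after the join filter was applied.
fn update_output_size_old(output_size: usize, batch_rows: usize) -> usize {
    if batch_rows > output_size {
        0
    } else {
        output_size - batch_rows
    }
}

/// Counter update after the commit: an empty batch also resets the counter.
fn update_output_size_new(output_size: usize, batch_rows: usize) -> usize {
    if batch_rows == 0 || batch_rows > output_size {
        0
    } else {
        output_size - batch_rows
    }
}
```

With one expected row and an empty filtered batch, the old form leaves the counter at 1 (subtracting 0 changes nothing), while the new form resets it to 0. For non-empty batches the two forms agree.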

datafusion/sqllogictest/test_files/sort_merge_join.slt

Lines changed: 17 additions & 0 deletions

@@ -263,5 +263,22 @@ DROP TABLE t1;
 statement ok
 DROP TABLE t2;

+# Set batch size to 1 for sort merge join to test scenario when data spread across multiple batches
+statement ok
+set datafusion.execution.batch_size = 1;
+
+query II
+SELECT * FROM (
+  WITH
+  t1 AS (
+    SELECT 12 a, 12 b
+  ),
+  t2 AS (
+    SELECT 12 a, 12 b
+  )
+  SELECT t1.* FROM t1 JOIN t2 on t1.a = t2.b WHERE t1.a > t2.b
+) ORDER BY 1, 2;
+----
+
 statement ok
 set datafusion.optimizer.prefer_hash_join = true;
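
The new test expects an empty result: the single row on each side matches on the join key (`12 = 12`), but the predicate `t1.a > t2.b` rejects it. A naive sketch of that evaluation, assuming the predicate acts as a filter on the joined pairs, could look like this. The `Row` alias and `join_with_filter` are hypothetical names, and the nested loop is not how SortMergeJoin is implemented.

```rust
/// Rows are (a, b) pairs, mirroring the two-column tables in the test.
type Row = (i64, i64);

/// Nested-loop stand-in for
/// `SELECT t1.* FROM t1 JOIN t2 ON t1.a = t2.b WHERE t1.a > t2.b`.
fn join_with_filter(t1: &[Row], t2: &[Row]) -> Vec<Row> {
    let mut out = Vec::new();
    for &left in t1 {
        for &right in t2 {
            let keys_match = left.0 == right.1; // t1.a = t2.b
            let filter_passes = left.0 > right.1; // t1.a > t2.b
            if keys_match && filter_passes {
                out.push(left); // keep only the t1 columns
            }
        }
    }
    out
}
```

Calling `join_with_filter(&[(12, 12)], &[(12, 12)])` returns an empty vector, which corresponds to the empty expected output after `----` in the test.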
