Push the runtime filter from HashJoin down to SeqScan. #724

Merged
merged 1 commit into apache:main from runtime_filter on Mar 29, 2025

Conversation

zhangyue-hashdata
Contributor

@zhangyue-hashdata zhangyue-hashdata commented Nov 21, 2024

Push the runtime filter from HashJoin down to SeqScan.

+----------+  AttrFilter   +------+  ScanKey   +---------+
| HashJoin | ------------> | Hash | ---------> | SeqScan |
+----------+               +------+            +---------+

If "gp_enable_runtime_filter_pushdown" is on, three steps will be run:

Step 1. In ExecInitHashJoin(), try to find the mapper between the var in
hashclauses and the var in SeqScan. If found we will save the mapper in
AttrFilter and push them to Hash node;

Step 2. We will create the range/bloom filters in AttrFilter during building
hash table, and these filters will be converted to the list of ScanKey
and pushed down to Seqscan when the building finishes;

Step 3. ScanKeys will be used to filter slot in Seqscan.
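
To make the idea concrete, here is a small self-contained toy in C. It is not the patch itself: the struct, the constants, and the key distribution are invented for illustration. It only shows the mechanism the steps above describe, namely collecting a min/max range plus a Bloom filter from the build side and using them to discard probe-side rows before they reach the join.

```c
/* Toy illustration of a runtime filter (not the actual patch). */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOOM_BITS 1024   /* toy size; the patch derives its size from rows/work_mem */

typedef struct RuntimeFilter
{
    int32_t  min, max;               /* range filter over the join key */
    uint8_t  bloom[BLOOM_BITS / 8];  /* tiny Bloom filter over the join key */
    bool     has_rows;
} RuntimeFilter;

static uint32_t
hash32(int32_t key, uint32_t seed)
{
    uint32_t h = (uint32_t) key * 2654435761u + seed;
    h ^= h >> 16;
    return h;
}

/* "Step 2": called for every build-side key while the hash table is loaded. */
static void
filter_add(RuntimeFilter *f, int32_t key)
{
    if (!f->has_rows || key < f->min)
        f->min = key;
    if (!f->has_rows || key > f->max)
        f->max = key;
    f->has_rows = true;

    for (uint32_t s = 0; s < 3; s++)   /* three hash functions */
    {
        uint32_t bit = hash32(key, s) % BLOOM_BITS;
        f->bloom[bit / 8] |= (uint8_t) (1u << (bit % 8));
    }
}

/* "Step 3": the scan consults the filter and skips rows that cannot join. */
static bool
filter_might_match(const RuntimeFilter *f, int32_t key)
{
    if (!f->has_rows || key < f->min || key > f->max)
        return false;
    for (uint32_t s = 0; s < 3; s++)
    {
        uint32_t bit = hash32(key, s) % BLOOM_BITS;
        if (!(f->bloom[bit / 8] & (1u << (bit % 8))))
            return false;
    }
    return true;   /* "maybe": Bloom filters allow false positives */
}

int
main(void)
{
    RuntimeFilter f;
    memset(&f, 0, sizeof(f));

    /* Build side: 100 sparse keys (100, 107, 114, ...) go into the filter. */
    for (int32_t i = 0; i < 100; i++)
        filter_add(&f, 100 + 7 * i);

    /* Probe side: almost all rows are discarded before the join sees them. */
    int kept = 0;
    for (int32_t key = 0; key < 100000; key++)
        if (filter_might_match(&f, key))
            kept++;
    printf("probe rows surviving the runtime filter: %d of 100000\n", kept);
    return 0;
}
```

In the real executor this information is carried by the AttrFilter and ends up expressed as ScanKeys attached to the SeqScan, as described in the steps above.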

TODO:

  1. support singlenode or utility mode;
  2. support null value filter;
  3. support Motion, SharedScan as target node;
  4. support join qual like: t1.c1 = t2.c1+5;

Perf (test machine: CPU E5-2680 v2, 10 cores; memory 32GB; 3 segments):

  1. tpcds 10s: off 939s, on 779s (~17% faster)
  2. tpcds 100s: off 5270s, on 4365s (~17% faster)

tpcds 10s details

NO.    off (ms)    on (ms)
1 2959.5 1312.9
2 4522.2 2995.9
3 11924.1 9170.6
4 1678.4 1653.4
5-1 17433.1 14723.7
5-2 17244.1 14499.7
6 5541.8 4443.8
7 3144.0 1726.7
8 3895.7 2010.0
9 5991.1 3270.7
10 29981.7 19975.5
11 3113.9 2293.0
12 2166.9 1543.3
13 1258.8 726.0
14 11745.1 7037.3
15 3878.6 2568.8
16 5420.8 3200.1
17 3291.2 2117.3
18 2103.7 979.8
19 4400.1 2371.1
20 14048.3 11135.6
21 7739.5 7489.6
22 5524.1 3083.1
23 5978.4 4041.7
24 2355.7 1993.0
25 5561.1 3355.0
26 6973.0 3180.3
27 3536.0 2134.0
28 6899.6 3141.5
29 6065.5 3766.1
30 15443.3 15137.7
31 9197.8 6387.3
32 3345.3 1815.2
33 2488.1 2451.7
34 10840.0 9291.0
35 52991.2 52504.5
36 3386.1 3381.4
37 1834.0 994.2
38 10047.6 8915.8
39 3035.8 1961.5
40 1591.5 795.5
41 9284.0 7623.0
42 10255.5 10051.7
43 45356.5 44808.1
44 1989.3 1015.8
45 3021.2 1222.5
46 19865.7 18714.8
47 8438.0 4972.7
48 4520.7 3117.3
49 6244.7 6150.8
50 5926.0 4983.4
51 20423.2 20287.7
52 84.5 85.5
53 9273.4 6599.0
54 10214.5 8094.6
55 38920.5 36340.8
56 4248.9 2761.0
57 1980.7 1264.9
58 3954.6 1806.1
59 2982.5 1234.5
60 4099.5 1852.3
61 3080.7 1237.5
62 167.6 162.6
63 691.0 661.1
64 1283.0 549.9
65 2033.5 1017.7
66 8816.5 5342.4
67 10124.9 6664.9
68-1 53775.3 53665.4
68-2 55517.4 54534.5
69-1 20955.6 18578.4
69-2 17315.6 15543.9
70 6269.5 6134.9
71 13798.9 12300.0
72 5199.1 2861.3
73 4017.0 2236.1
74 5578.1 5341.3
75 1056.9 687.7
76 16922.5 16252.2
77 10491.0 8369.5
78 4590.9 2885.0
79 3660.0 2081.5
80 967.8 516.1
81 6620.6 1254.1
82 3072.2 1214.4
83 5111.3 2710.2
84 7458.4 7442.9
85 2989.0 1955.6
86 2546.8 1522.6
87 45634.8 43461.0
88 3555.6 1858.1
89 5630.1 4286.9
90 5321.3 2992.1
91 4748.6 4522.8
92-1 6264.3 4859.7
92-2 6355.4 5334.0
93 31174.3 30890.3
94 3971.3 3955.2
95 5097.4 3004.3
96 1999.7 1100.9
97 5055.0 2974.5
98 7319.9 5527.3
99 1895.3 1838.4
total    939s    779s    ~17%

tpcds 100s details

NO.    off (ms)    on (ms)
1 18410.5 8966.4
2 29278.7 17942.0
3 68332.8 52880.0
4 9753.3 9171.7
5-1 13129.4 11131.0
5-2 13198.6 11041.2
6 33375.2 27429.7
7 19269.8 10077.7
8 22145.1 12336.3
9 64808.8 56969.8
10 241285.3 161582.3
11 12075.9 10860.6
12 9563.9 8114.5
13 2687.4 1484.8
14 8932.8 5325.5
15 28111.9 20017.6
16 30523.0 19822.1
17 3168.7 3126.4
18 8063.8 5925.4
19 19826.5 10147.2
20 85671.7 80051.7
21 42448.4 38791.5
22 27746.9 15857.7
23 38362.9 25001.5
24 14143.9 13982.5
25 38145.1 18836.1
26 21448.6 12437.5
27 22101.6 14043.4
28 20168.0 12036.7
29 25105.1 13663.3
30 129563.3 129378.8
31 13768.9 7181.7
32 19039.5 11568.3
33 10040.4 9460.8
34 66348.7 57294.4
35 202825.5 197756.5
36 28703.6 28536.2
37 7065.0 6596.1
38 82447.6 73465.7
39 16568.5 10866.6
40 5632.2 2801.4
41 65710.0 54836.4
42 63198.1 61843.4
43 154716.5 151916.7
44 10148.0 5314.8
45 18080.4 8917.2
46 92069.5 84997.4
47 29166.7 21165.4
48 23630.6 20430.5
49 43623.1 43465.4
50 33574.3 24155.7
51 140174.1 136511.3
52 121.1 108.4
53 61777.5 53161.0
54 82215.5 70357.6
55 28835.3 26623.2
56 24767.9 15975.6
57 11274.8 7005.6
58 20008.8 11185.9
59 19055.5 8743.5
60 30663.6 13416.8
61 18951.8 8765.2
62 205.8 170.3
63 22338.1 13134.9
64 5539.4 3077.0
65 9817.0 5337.0
66 52164.9 26808.1
67 20848.3 9974.1
68-1 462977.0 458799.8
68-2 468924.8 464181.1
69-1 109512.7 103476.2
69-2 96662.2 88990.1
70 29996.5 28361.0
71 90191.5 75458.1
72 35578.6 20232.3
73 23311.9 13611.2
74 33299.4 30755.5
75 4249.5 3892.5
76 107381.3 102310.4
77 83335.9 72471.8
78 24223.1 15703.0
79 22379.5 14098.0
80 2753.5 1177.6
81 39643.7 15849.0
82 18515.7 8568.5
83 21661.9 11949.9
84 55856.0 55736.4
85 15100.4 9251.1
86 14383.5 9051.7
87 136009.8 134747.4
88 20819.7 12613.0
89 40187.3 31457.0
90 34545.7 12278.3
91 34661.6 32747.8
92-1 102286.6 31937.6
92-2 103260.1 31959.5
93 240605.7 235582.7
94 26150.7 25874.4
95 31199.3 20272.8
96 4154.5 2226.1
97 26195.2 17051.1
98 37977.6 24553.3
99 16624.3 16603.3
total    5270s    4365s    ~17%


@fanfuxiaoran
Contributor

Looks interesting. And I have some questions to discuss.

  • Besides the seqscan, can the runtime filter be applied to other types of scan, such as the index scan?
  • It looks like the runtime filter can only be used when the hashjoin node and the seqscan node run in the same process, which means the tables should have the same distribution policy on the join columns, or one of the tables is replicated.

@avamingli
Contributor

There is code changed in MultiExecParallelHash; please add some parallel tests with the runtime filter.

@zhangyue-hashdata
Contributor Author

There is code changed in MultiExecParallelHash; please add some parallel tests with the runtime filter.

got it.

@zhangyue-hashdata
Contributor Author

  • Besides the seqscan, can the runtime filter be applied to other types of scan, such as the index scan?

Theoretically, it is feasible to apply runtime filters to operators such as Index Scan. However, because Index Scan already reduces data volume by leveraging an optimized storage structure, the performance gains from applying runtime filters to Index Scan would likely be minimal. Thus, I think that applying runtime filters to Index Scan would not yield significant performance benefits.

In subsequent work, when we discover that other scan operators can achieve notable performance improvements from pushdown runtime filters, we will support these operators. Our focus will be on operators where runtime filters can substantially decrease the amount of data processed early in the query execution, leading to more pronounced performance enhancements.

  • It looks like the runtime filter can only be used when the hashjoin node and the seqscan node run in the same process, which means the tables should have the same distribution policy on the join columns, or one of the tables is replicated.

Yes, the current pushdown runtime filter only supports in-process pushdown, which means that the Hash Join and SeqScan need to be within the same process. The design and implementation of cross-process pushdown runtime filters are much more complex.

This limitation arises because coordinating and sharing data structures like Bloom filters or other runtime filters across different processes involves additional challenges such as inter-process communication (IPC), synchronization, and ensuring consistency and efficiency of the filters across process boundaries. Addressing these issues requires a more sophisticated design that can handle the complexities of distributed computing environments.

@avamingli
Contributor

Hi, with gp_enable_runtime_filter_pushdown = on, executing the SQL below will cause a crash:

gpadmin=# show gp_enable_runtime_filter_pushdown;
 gp_enable_runtime_filter_pushdown
-----------------------------------
 on
(1 row)
CREATE TABLE test_tablesample (dist int, id int, name text) WITH (fillfactor=10) DISTRIBUTED BY (dist);
-- use fillfactor so we don't have to load too much data to get multiple pages

-- Changed the column length in order to match the expected results based on relation's blocksz
INSERT INTO test_tablesample SELECT 0, i, repeat(i::text, 875) FROM generate_series(0, 9) s(i) ORDER BY i;
INSERT INTO test_tablesample SELECT 3, i, repeat(i::text, 875) FROM generate_series(10, 19) s(i) ORDER BY i;
INSERT INTO test_tablesample SELECT 5, i, repeat(i::text, 875) FROM generate_series(20, 29) s(i) ORDER BY i;
EXPLAIN (COSTS OFF)
  SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (2);
FATAL:  Unexpected internal error (assert.c:48)
DETAIL:  FailedAssertion("IsA(planstate, SeqScanState)", File: "explain.c", Line: 4154)
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (14.4, server 14.4)

@zhangyue-hashdata
Contributor Author

gpadmin=# show gp_enable_runtime_filter_pushdown;
 gp_enable_runtime_filter_pushdown
-----------------------------------
 on
(1 row)
CREATE TABLE test_tablesample (dist int, id int, name text) WITH (fillfactor=10) DISTRIBUTED BY (dist);
-- use fillfactor so we don't have to load too much data to get multiple pages

-- Changed the column length in order to match the expected results based on relation's blocksz
INSERT INTO test_tablesample SELECT 0, i, repeat(i::text, 875) FROM generate_series(0, 9) s(i) ORDER BY i;
INSERT INTO test_tablesample SELECT 3, i, repeat(i::text, 875) FROM generate_series(10, 19) s(i) ORDER BY i;
INSERT INTO test_tablesample SELECT 5, i, repeat(i::text, 875) FROM generate_series(20, 29) s(i) ORDER BY i;
EXPLAIN (COSTS OFF)
  SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (2);
FATAL:  Unexpected internal error (assert.c:48)
DETAIL:  FailedAssertion("IsA(planstate, SeqScanState)", File: "explain.c", Line: 4154)
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (14.4, server 14.4)

Thanks, I'll reproduce the issue and fix it.

@fanfuxiaoran
Contributor

Thanks for your detailed explanation.

  • Besides the seqscan, can the runtime filter be applied to other types of scan, such as the index scan?

Theoretically, it is feasible to apply runtime filters to operators such as Index Scan. However, because Index Scan already reduces data volume by leveraging an optimized storage structure, the performance gains from applying runtime filters to Index Scan would likely be minimal. Thus, I think that applying runtime filters to Index Scan would not yield significant performance benefits.

Makes sense. When doing a hash join, an index scan or index-only scan is often not used on the probe side.

In subsequent work, when we discover that other scan operators can achieve notable performance improvements from pushdown runtime filters, we will support these operators. Our focus will be on operators where runtime filters can substantially decrease the amount of data processed early in the query execution, leading to more pronounced performance enhancements.

  • It looks like the runtime filter can only be used when the hashjoin node and the seqscan node run in the same process, which means the tables should have the same distribution policy on the join columns, or one of the tables is replicated.

Yes, the current pushdown runtime filter only supports in-process pushdown, which means that the Hash Join and SeqScan need to be within the same process. The design and implementation of cross-process pushdown runtime filters are much more complex.

This limitation arises because coordinating and sharing data structures like Bloom filters or other runtime filters across different processes involves additional challenges such as inter-process communication (IPC), synchronization, and ensuring consistency and efficiency of the filters across process boundaries. Addressing these issues requires a more sophisticated design that can handle the complexities of distributed computing environments.

Exactly, and if a lock is used to solve the problem, it may even lead to bad performance.

@fanfuxiaoran
Contributor

 explain analyze
SELECT count(t1.c3) FROM t1, t3 WHERE t1.c1 = t3.c1 ;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1700.07..1700.08 rows=1 width=8) (actual time=32119.566..32119.571 rows=1 loops=1)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=1700.02..1700.07 rows=3 width=8) (actual time=30.967..32119.550 rows=3 loops=1)
         ->  Partial Aggregate  (cost=1700.02..1700.03 rows=1 width=8) (actual time=32119.131..32119.135 rows=1 loops=1)
               ->  Hash Join  (cost=771.01..1616.68 rows=33334 width=4) (actual time=14.059..32116.962 rows=33462 loops=1)
                     Hash Cond: (t3.c1 = t1.c1)
                     Extra Text: (seg0)   Hash chain length 1.0 avg, 3 max, using 32439 of 524288 buckets.
                     ->  Seq Scan on t3  (cost=0.00..387.34 rows=33334 width=4) (actual time=0.028..32089.490 rows=33462 loops=1)
                     ->  Hash  (cost=354.34..354.34 rows=33334 width=8) (actual time=13.257..13.259 rows=33462 loops=1)
                           Buckets: 524288  Batches: 1  Memory Usage: 5404kB
                           ->  Seq Scan on t1  (cost=0.00..354.34 rows=33334 width=8) (actual time=0.180..4.877 rows=33462 loops=1)
 Planning Time: 0.227 ms

The runtime filter has been pushed down to the seqscan on table t3, but 'explain analyze' doesn't print it out.

\d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 c1     | integer |           |          |
 c2     | integer |           |          |
 c3     | integer |           |          |
 c4     | integer |           |          |
 c5     | integer |           |          |
Checksum: t
Indexes:
    "t1_c2" btree (c2)
Distributed by: (c1)
 \d t3
                 Table "public.t3"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 c1     | integer |           |          |
 c2     | integer |           |          |
 c3     | integer |           |          |
 c4     | integer |           |          |
 c5     | integer |           |          |
Distributed by: (c1)

@zhangyue-hashdata zhangyue-hashdata force-pushed the runtime_filter branch 2 times, most recently from 76a003a to 98dac6d on December 5, 2024 14:37
@zhangyue-hashdata
Contributor Author

 explain analyze
SELECT count(t1.c3) FROM t1, t3 WHERE t1.c1 = t3.c1 ;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1700.07..1700.08 rows=1 width=8) (actual time=32119.566..32119.571 rows=1 loops=1)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=1700.02..1700.07 rows=3 width=8) (actual time=30.967..32119.550 rows=3 loops=1)
         ->  Partial Aggregate  (cost=1700.02..1700.03 rows=1 width=8) (actual time=32119.131..32119.135 rows=1 loops=1)
               ->  Hash Join  (cost=771.01..1616.68 rows=33334 width=4) (actual time=14.059..32116.962 rows=33462 loops=1)
                     Hash Cond: (t3.c1 = t1.c1)
                     Extra Text: (seg0)   Hash chain length 1.0 avg, 3 max, using 32439 of 524288 buckets.
                     ->  Seq Scan on t3  (cost=0.00..387.34 rows=33334 width=4) (actual time=0.028..32089.490 rows=33462 loops=1)
                     ->  Hash  (cost=354.34..354.34 rows=33334 width=8) (actual time=13.257..13.259 rows=33462 loops=1)
                           Buckets: 524288  Batches: 1  Memory Usage: 5404kB
                           ->  Seq Scan on t1  (cost=0.00..354.34 rows=33334 width=8) (actual time=0.180..4.877 rows=33462 loops=1)
 Planning Time: 0.227 ms

The runtime filter has been pushed down to the seqscan on table t3, but 'explain analyze' doesn't print it out.

\d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 c1     | integer |           |          |
 c2     | integer |           |          |
 c3     | integer |           |          |
 c4     | integer |           |          |
 c5     | integer |           |          |
Checksum: t
Indexes:
    "t1_c2" btree (c2)
Distributed by: (c1)
 \d t3
                 Table "public.t3"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 c1     | integer |           |          |
 c2     | integer |           |          |
 c3     | integer |           |          |
 c4     | integer |           |          |
 c5     | integer |           |          |
Distributed by: (c1)

Thanks for your test case. Based on it, I rewrote the code so that the debug info is always displayed, even when the number of filtered rows is zero, and added the test case to gp_runtime_filter.sql too.
Fixed in 98dac6d.

@zhangyue-hashdata
Contributor Author

zhangyue-hashdata commented Dec 5, 2024

Hi, with gp_enable_runtime_filter_pushdown = on, executing the SQL below will cause a crash:

gpadmin=# show gp_enable_runtime_filter_pushdown;
 gp_enable_runtime_filter_pushdown
-----------------------------------
 on
(1 row)
CREATE TABLE test_tablesample (dist int, id int, name text) WITH (fillfactor=10) DISTRIBUTED BY (dist);
-- use fillfactor so we don't have to load too much data to get multiple pages

-- Changed the column length in order to match the expected results based on relation's blocksz
INSERT INTO test_tablesample SELECT 0, i, repeat(i::text, 875) FROM generate_series(0, 9) s(i) ORDER BY i;
INSERT INTO test_tablesample SELECT 3, i, repeat(i::text, 875) FROM generate_series(10, 19) s(i) ORDER BY i;
INSERT INTO test_tablesample SELECT 5, i, repeat(i::text, 875) FROM generate_series(20, 29) s(i) ORDER BY i;
EXPLAIN (COSTS OFF)
  SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (2);
FATAL:  Unexpected internal error (assert.c:48)
DETAIL:  FailedAssertion("IsA(planstate, SeqScanState)", File: "explain.c", Line: 4154)
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (14.4, server 14.4)

Thanks for your test case. I fixed it in 98dac6d and added the test case to gp_runtime_filter.sql too.

@Smyatkin-Maxim
Contributor

Hi @zhangyue-hashdata
I see that the previous runtime filter implementation relies on some cost model in try_runtime_filter(). Do I understand correctly that this PR does not do any cost evaluation?
Also, for TPC-H/TPC-DS, can you provide the results for each query separately?

Asking mostly out of curiosity; I see there are quite a few reviewers here already :)

@zhangyue-hashdata
Contributor Author

Hi @zhangyue-hashdata I see that the previous runtime filter implementation relies on some cost model in try_runtime_filter(). Do I understand correctly that this PR does not do any cost evaluation? Also, for TPC-H/TPC-DS, can you provide the results for each query separately?

Asking mostly out of curiosity; I see there are quite a few reviewers here already :)

Basically, you're correct: our goal is to filter out as much data as possible right at the point where the data is generated. However, a full cost evaluation would be very complex, so we only make a simple estimation based on the row count and work memory when creating the Bloom filter; a rough sketch of that kind of check is shown below.
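
The following is a hedged illustration only; the constants (about 10 bits per expected row, a 1/8 cap on work_mem) and the function name are assumptions made for this sketch, not values taken from the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Return a Bloom filter size in bytes, or 0 to skip the runtime filter.
 * Purely illustrative: ~10 bits per expected row (~1% false positives),
 * capped at 1/8 of work_mem so the filter never crowds out the hash table. */
static uint64_t
bloom_size_for(double est_rows, uint64_t work_mem_bytes)
{
    uint64_t wanted = (uint64_t) (est_rows * 10.0 / 8.0);   /* bits -> bytes */

    if (wanted > work_mem_bytes / 8)
        return 0;                       /* too expensive: no pushdown */
    return wanted < 1024 ? 1024 : wanted;
}

int
main(void)
{
    /* e.g. ~33k estimated rows with 64MB work_mem => a filter of ~41kB */
    printf("%llu bytes\n",
           (unsigned long long) bloom_size_for(33334.0, 64ULL * 1024 * 1024));
    return 0;
}
```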
Furthermore, I have placed the detailed per-query test results for TPC-DS 10s in the PR description.

@zhangyue-hashdata zhangyue-hashdata force-pushed the runtime_filter branch 3 times, most recently from 2ea19a0 to e607900 on February 2, 2025 05:40
@Smyatkin-Maxim
Contributor

Smyatkin-Maxim commented Mar 24, 2025

@zhangyue-hashdata have you tried benchmarks on the recent builds? The last time I ran TPC-H against Cloudberry, 12/22 queries used the legacy optimizer. I guess for TPC-DS it's even worse.
But now, after the cherry-picks from gpdb master, more queries should be using the ORCA optimizer. I wonder if there is still the same benefit from runtime bloom filters.

Btw, can you guys share the TPC-DS and TPC-H toolkit you're using to benchmark Cloudberry?

@zhangyue-hashdata
Contributor Author

@zhangyue-hashdata have you tried benchmarks on the recent builds? The last time I ran TPC-H against Cloudberry, 12/22 queries used the legacy optimizer. I guess for TPC-DS it's even worse. But now, after the cherry-picks from gpdb master, more queries should be using the ORCA optimizer. I wonder if there is still the same benefit from runtime bloom filters.

Btw, can you guys share the TPC-DS and TPC-H toolkit you're using to benchmark Cloudberry?

Not recently. Here is the toolkit for TPC-DS; I hope it's helpful to you :-)
toolkit.tar.gz

@zhangyue-hashdata zhangyue-hashdata force-pushed the runtime_filter branch 2 times, most recently from 874f754 to c80d847 on March 28, 2025 04:07
@zhangyue-hashdata zhangyue-hashdata changed the title from "Push the runtime filter from HashJoin down to SeqScan or AM." to "Push the runtime filter from HashJoin down to SeqScan." on Mar 28, 2025
Contributor

@gfphoenix78 gfphoenix78 left a comment


LGTM

Member

@yjhjstz yjhjstz left a comment


LGTM

@yjhjstz
Member

yjhjstz commented Mar 28, 2025

@zhangyue-hashdata please squash the commits into one commit.

@zhangyue-hashdata
Contributor Author

@zhangyue-hashdata please squash the commits into one commit.

got it

+----------+  AttrFilter   +------+  ScanKey   +---------+
| HashJoin | ------------> | Hash | ---------> | SeqScan |
+----------+               +------+            +---------+

If "gp_enable_runtime_filter_pushdown" is on, three steps will be run:

Step 1. In ExecInitHashJoin(), try to find the mapper between the var in
        hashclauses and the var in SeqScan. If found we will save the mapper in
        AttrFilter and push them to Hash node;

Step 2. We will create the range/bloom filters in AttrFilter during building
        hash table, and these filters will be converted to the list of ScanKey
        and pushed down to Seqscan when the building finishes;

Step 3. ScanKeys will be used to filter slot in Seqscan.

TODO:
1. support singlenode or utility mode;
2. support null value filter;
3. support Motion, SharedScan as target node;
4. support join qual like: t1.c1 = t2.c1+5;
@my-ship-it my-ship-it merged commit ca06b28 into apache:main Mar 29, 2025
12 checks passed
@zhangyue-hashdata zhangyue-hashdata deleted the runtime_filter branch March 31, 2025 01:59