-
Notifications
You must be signed in to change notification settings - Fork 67
Available optimizations
Since the beginning, pg_pathman aims to improve performance of queries involving partitioned tables with significant amount of partitions (i.e. 1K+). Although PostgreSQL 10 features built-in support for partitioning, it still uses an exhaustive search in order to select suitable partitions, which gets slower and slower as the total amount increases.
Our per-relation cache stores an ordered array of structs containing:
- Partition's Oid (a unique identifier of this partition);
- Left & right bounds (RANGE partitioning) OR hash value (HASH partitioning);
This allows us to perform a binary search (tables partitioned by RANGE), and use hash value modulo amount of partitions as an index of the corresponding partition (tables partitioned by HASH). Furthermore, we strive to remove redundant WHERE clauses (for instance, we don't have to recheck key < 20
for rows of partition covering [0, 10)
).
Here's a few examples of queries with improved plans:
create table simple (id int not null);
insert into simple values (10), (20);
create table abc (id int not null);
select create_range_partitions('abc', 'id', 1, 10, 10);
/* pg_pathman */
explain (analyze, costs off) select * from abc where id < 21;
QUERY PLAN
------------------------------------------------------------------------
Append (actual time=0.034..38.046 rows=190000 loops=1)
-> Seq Scan on abc_1 (actual time=0.032..13.249 rows=90000 loops=1)
-> Seq Scan on abc_2 (actual time=0.006..9.746 rows=100000 loops=1)
Planning time: 0.379 ms
Execution time: 45.777 ms
(5 rows)
/* standard inheritance */
explain (analyze, costs off) select * from abc where id < 21;
QUERY PLAN
-------------------------------------------------------------------------
Append (actual time=0.017..51.047 rows=190001 loops=1)
-> Seq Scan on abc (actual time=0.016..0.018 rows=1 loops=1)
Filter: (id < 21)
-> Seq Scan on abc_1 (actual time=0.016..20.356 rows=90000 loops=1)
Filter: (id < 21)
-> Seq Scan on abc_2 (actual time=0.011..17.618 rows=100000 loops=1)
Filter: (id < 21)
Planning time: 0.847 ms
Execution time: 57.893 ms
(9 rows)
/* pg_pathman */
explain (analyze, costs off) update simple set id = 2
from abc where simple.id = abc.id;
QUERY PLAN
----------------------------------------------------------------------------------------
Update on simple (actual time=54.708..54.708 rows=0 loops=1)
-> Nested Loop (actual time=0.098..42.023 rows=20000 loops=1)
-> Seq Scan on simple (actual time=0.017..0.019 rows=2 loops=1)
-> Custom Scan (RuntimeAppend) (actual time=0.022..19.123 rows=10000 loops=2)
Prune by: (simple.id = abc.id)
-> Seq Scan on abc_1 (actual time=0.031..19.920 rows=10000 loops=1)
Filter: (simple.id = id)
Rows Removed by Filter: 90000
-> Seq Scan on abc_2 (actual time=0.012..16.387 rows=10000 loops=1)
Filter: (simple.id = id)
Rows Removed by Filter: 90000
Planning time: 3.450 ms
Execution time: 54.858 ms
(13 rows)
/* standard inheritance */
explain (analyze, costs off) update simple set id = 2
from abc where simple.id = abc.id;
QUERY PLAN
--------------------------------------------------------------------------------------
Update on simple (actual time=399.068..399.068 rows=0 loops=1)
-> Hash Join (actual time=0.096..385.563 rows=20000 loops=1)
Hash Cond: (abc.id = simple.id)
-> Append (actual time=0.034..281.405 rows=1000000 loops=1)
-> Seq Scan on abc (actual time=0.006..0.006 rows=0 loops=1)
-> Seq Scan on abc_1 (actual time=0.025..25.897 rows=100000 loops=1)
-> Seq Scan on abc_2 (actual time=0.008..21.762 rows=100000 loops=1)
-> Seq Scan on abc_3 (actual time=0.008..21.162 rows=100000 loops=1)
-> Seq Scan on abc_4 (actual time=0.006..21.217 rows=100000 loops=1)
-> Seq Scan on abc_5 (actual time=0.006..21.300 rows=100000 loops=1)
-> Seq Scan on abc_6 (actual time=0.006..21.507 rows=100000 loops=1)
-> Seq Scan on abc_7 (actual time=0.007..21.285 rows=100000 loops=1)
-> Seq Scan on abc_8 (actual time=0.006..21.182 rows=100000 loops=1)
-> Seq Scan on abc_9 (actual time=0.008..21.253 rows=100000 loops=1)
-> Seq Scan on abc_10 (actual time=0.006..21.666 rows=100000 loops=1)
-> Hash (actual time=0.019..0.019 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on simple (actual time=0.006..0.009 rows=2 loops=1)
Planning time: 1.405 ms
Execution time: 399.338 ms
(20 rows)
/* pg_pathman */
explain (analyze, costs off) delete from abc where id = 1;
QUERY PLAN
---------------------------------------------------------------------
Delete on abc_1 (actual time=22.751..22.751 rows=0 loops=1)
-> Seq Scan on abc_1 (actual time=22.749..22.749 rows=0 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 90000
Planning time: 0.144 ms
Execution time: 22.786 ms
(6 rows)
/* standard inheritance */
explain (analyze, costs off) delete from abc where id = 1;
QUERY PLAN
------------------------------------------------------------------------
Delete on abc (actual time=36.546..36.546 rows=0 loops=1)
Delete on abc
Delete on abc_1
-> Seq Scan on abc (actual time=0.006..0.006 rows=0 loops=1)
Filter: (id = 1)
-> Seq Scan on abc_1 (actual time=0.092..21.588 rows=10000 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 90000
Planning time: 1.603 ms
Execution time: 36.615 ms
(10 rows)
It's hard to support different releases of PostgreSQL (9.5, 9.6, 10 etc), but we try to do our best. However, there might be some problems with planning that are inherent to the specific release of PostgreSQL, since we use different version-specific pieces of code to make pg_pathman work. For example, we had to disable some optimizations for UPDATE & DELETE on PostgreSQL 9.5:
/* PostgreSQL 9.5 */
explain (costs off)
update simple set id = 2 where id in (select * from abc
where id = simple.id and id < 20);
QUERY PLAN
----------------------------------------------------------------
Update on simple
-> Seq Scan on simple
Filter: (SubPlan 1)
SubPlan 1
-> Append
-> Seq Scan on abc
Filter: ((id < 20) AND (id = simple.id))
-> Seq Scan on abc_1
Filter: ((id < 20) AND (id = simple.id))
-> Seq Scan on abc_2
Filter: ((id < 20) AND (id = simple.id))
(11 rows)
/* PostgreSQL 9.6+ */
explain (costs off)
update simple set id = 2 where id in (select * from abc
where id = simple.id and id < 20);
QUERY PLAN
--------------------------------------------------------------------
Update on simple
-> Seq Scan on simple
Filter: (SubPlan 1)
SubPlan 1
-> Custom Scan (RuntimeAppend)
Prune by: ((abc.id < 20) AND (abc.id = simple.id))
-> Seq Scan on abc_1 abc
Filter: (id = simple.id)
-> Seq Scan on abc_2 abc
Filter: ((id < 20) AND (id = simple.id))
(10 rows)