Open
Labels
bug: Something isn't working
Description
Search before asking
- I searched in the issues and found nothing similar.
Paimon version
I tried using chain group partitions to solve my issue #7503, but some data still fails to be read.
Compute Engine
Spark 3.5.2
Minimal reproduce step
Using the following test case:
CREATE TABLE IF NOT EXISTS `default`.`chain_test` (
`t1` BIGINT COMMENT 't1',
`t2` BIGINT COMMENT 't2',
`t3` STRING COMMENT 't3'
) PARTITIONED BY (`region` STRING, `dt` STRING COMMENT 'dt')
TBLPROPERTIES (
'bucket-key' = 't1',
'primary-key' = 'region,dt,t1',
'partition.timestamp-pattern' = '$dt',
'partition.timestamp-formatter' = 'yyyyMMdd',
'chain-table.enabled' = 'true',
'bucket' = '2',
'merge-engine' = 'deduplicate',
'sequence.field' = 't2',
'chain-table.chain-partition-keys' = 'dt'
);
CALL sys.create_branch('default.chain_test', 'snapshot');
CALL sys.create_branch('default.chain_test', 'delta');
ALTER TABLE `default`.`chain_test` SET TBLPROPERTIES (
'scan.fallback-snapshot-branch' = 'snapshot',
'scan.fallback-delta-branch' = 'delta'
);
ALTER TABLE `default`.`chain_test$branch_snapshot` SET TBLPROPERTIES (
'scan.fallback-snapshot-branch' = 'snapshot',
'scan.fallback-delta-branch' = 'delta'
);
ALTER TABLE `default`.`chain_test$branch_delta` SET TBLPROPERTIES (
'scan.fallback-snapshot-branch' = 'snapshot',
'scan.fallback-delta-branch' = 'delta'
);
Insert some data:
-- Write to the main branch (delta)
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250810')
VALUES (1, 1, '1'), (2, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250810')
VALUES (11, 1, '1'), (12, 1, '1');
-- Write to the delta branch
SET spark.paimon.branch = delta;
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250809')
VALUES (1, 1, '1'), (2, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250810')
VALUES (1, 2, '1-1'), (3, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250811')
VALUES (2, 2, '1-1'), (4, 1, '1');
VALUES (5, 2, '1-1'), (6, 2, '1-1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250809')
VALUES (11, 1, '1'), (12, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250810')
VALUES (11, 2, '1-1'), (13, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250811')
VALUES (12, 2, '1-1'), (14, 1, '1');
-- Write to the snapshot branch
SET spark.paimon.branch = snapshot;
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250810')
VALUES (1, 2, '1-1'), (2, 1, '1'), (3, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250810')
VALUES (11, 2, '1-1'), (12, 1, '1'), (13, 1, '1');
Query:
SELECT * FROM `default`.`chain_test` WHERE dt = '20250811';
Result:
1 2 1-1 CN 20250811
2 2 1-1 CN 20250811
4 1 1 CN 20250811
3 1 1 CN 20250811
12 2 1-1 US 20250811
11 2 1-1 US 20250811
13 1 1 US 20250811
14 1 1 US 20250811
This result is correct, because every partition exists in every branch.
However, if I insert a partition that exists in the snapshot branch but not in the delta branch:
SET spark.paimon.branch = snapshot;
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'UK', dt = '20250810')
VALUES (1, 2, '1-1'), (2, 1, '1'), (3, 1, '1');
the query still returns the same result:
1 2 1-1 CN 20250811
2 2 1-1 CN 20250811
4 1 1 CN 20250811
3 1 1 CN 20250811
12 2 1-1 US 20250811
11 2 1-1 US 20250811
13 1 1 US 20250811
14 1 1 US 20250811
What doesn't meet your expectations?
When a partition is inserted that exists in the snapshot branch but not in the delta branch:
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'UK', dt = '20250810')
VALUES (1, 2, '1-1'), (2, 1, '1'), (3, 1, '1');
the query result should be:
1 2 1-1 CN 20250811
2 2 1-1 CN 20250811
4 1 1 CN 20250811
3 1 1 CN 20250811
12 2 1-1 US 20250811
11 2 1-1 US 20250811
13 1 1 US 20250811
14 1 1 US 20250811
1 2 1-1 UK 20250811
2 1 1 UK 20250811
3 1 1 UK 20250811
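The expectation above can be modeled with a small Python sketch. This is not Paimon code and does not claim to mirror its implementation. It assumes that, for each `region` group, the read starts from the latest snapshot partition with `dt` less than or equal to the queried `dt`, then applies later delta partitions up to the queried `dt`, keeping the row with the highest `t2` (the sequence field) per `t1`. The function name `expected_chain_read` and the dictionary layout are hypothetical, and the main-branch writes are left out of the model.

```python
# Rows are (t1, t2, t3); each branch is modeled as {(region, dt): rows}.
# Data copied from the repro steps (snapshot and delta branches only).
snapshot = {
    ("CN", "20250810"): [(1, 2, "1-1"), (2, 1, "1"), (3, 1, "1")],
    ("US", "20250810"): [(11, 2, "1-1"), (12, 1, "1"), (13, 1, "1")],
    ("UK", "20250810"): [(1, 2, "1-1"), (2, 1, "1"), (3, 1, "1")],
}
delta = {
    ("CN", "20250809"): [(1, 1, "1"), (2, 1, "1")],
    ("CN", "20250810"): [(1, 2, "1-1"), (3, 1, "1")],
    ("CN", "20250811"): [(2, 2, "1-1"), (4, 1, "1")],
    ("US", "20250809"): [(11, 1, "1"), (12, 1, "1")],
    ("US", "20250810"): [(11, 2, "1-1"), (13, 1, "1")],
    ("US", "20250811"): [(12, 2, "1-1"), (14, 1, "1")],
}


def expected_chain_read(snapshot, delta, query_dt):
    """Hypothetical model of the expected chain read for one dt value."""
    # A group is every region seen in either branch (assumption).
    regions = sorted({region for region, _ in list(snapshot) + list(delta)})
    result = []
    for region in regions:
        # Latest snapshot partition at or before the queried dt, if any.
        snap_dts = [dt for (r, dt) in snapshot if r == region and dt <= query_dt]
        base_dt = max(snap_dts) if snap_dts else ""
        merged = {}
        if snap_dts:
            for t1, t2, t3 in snapshot[(region, base_dt)]:
                merged[t1] = (t2, t3)
        # Delta partitions strictly after the base snapshot, up to query_dt.
        delta_dts = sorted(
            dt for (r, dt) in delta if r == region and base_dt < dt <= query_dt
        )
        for dt in delta_dts:
            for t1, t2, t3 in delta[(region, dt)]:
                # Deduplicate on t1, keeping the higher sequence value t2.
                if t1 not in merged or t2 >= merged[t1][0]:
                    merged[t1] = (t2, t3)
        for t1 in sorted(merged):
            t2, t3 = merged[t1]
            result.append((t1, t2, t3, region, query_dt))
    return result


rows = expected_chain_read(snapshot, delta, "20250811")
for row in rows:
    print(*row)
```

Under these assumptions the model yields the eleven expected rows, including the three `UK` rows that come from the snapshot branch alone. Removing the `UK` entry from `snapshot` yields the eight rows that the query currently returns.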
Alternatively, I tried changing the partition key order from PARTITIONED BY (`region` STRING, `dt` STRING COMMENT 'dt') to PARTITIONED BY (`dt` STRING COMMENT 'dt', `region` STRING).
When executing the same SQL as above, the result is:
2 2 1-1 20250811 US
4 1 1 20250811 US
12 2 1-1 20250811 US
14 1 1 20250811 US
2 2 1-1 20250811 CN
4 1 1 20250811 CN
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!