
[Bug] using Chain table group partition ,partition in snapshot but not in delta will not be read #7532

@blackflash997997

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

I tried using chain table group partitions to solve my #7503, but some data still fails to be read.

Compute Engine

Spark 3.5.2

Minimal reproduce step

Using this test case:

CREATE TABLE IF NOT EXISTS `default`.`chain_test` (
  `t1` BIGINT COMMENT 't1',
  `t2` BIGINT COMMENT 't2',
  `t3` STRING COMMENT 't3'
) PARTITIONED BY (`region` STRING, `dt` STRING COMMENT 'dt')
TBLPROPERTIES (
  'bucket-key' = 't1',
  'primary-key' = 'region,dt,t1',
  'partition.timestamp-pattern' = '$dt',
  'partition.timestamp-formatter' = 'yyyyMMdd',
  'chain-table.enabled' = 'true',
  'bucket' = '2',
  'merge-engine' = 'deduplicate',
  'sequence.field' = 't2',
  'chain-table.chain-partition-keys' = 'dt'
);

CALL sys.create_branch('default.chain_test', 'snapshot');
CALL sys.create_branch('default.chain_test', 'delta');

ALTER TABLE default.chain_test SET TBLPROPERTIES (
  'scan.fallback-snapshot-branch' = 'snapshot',
  'scan.fallback-delta-branch' = 'delta'
);

ALTER TABLE `default`.`chain_test$branch_snapshot` SET TBLPROPERTIES (
  'scan.fallback-snapshot-branch' = 'snapshot',
  'scan.fallback-delta-branch' = 'delta'
);

ALTER TABLE `default`.`chain_test$branch_delta` SET TBLPROPERTIES (
  'scan.fallback-snapshot-branch' = 'snapshot',
  'scan.fallback-delta-branch' = 'delta'
);

Insert some data:

-- Write to the main branch (delta)
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250810')
VALUES (1, 1, '1'), (2, 1, '1');

INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250810')
VALUES (11, 1, '1'), (12, 1, '1');

-- Write to the delta branch
SET spark.paimon.branch = delta;

INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250809')
VALUES (1, 1, '1'), (2, 1, '1');

INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250810')
VALUES (1, 2, '1-1'), (3, 1, '1');

INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250811')
VALUES (2, 2, '1-1'), (4, 1, '1');

INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250809')
VALUES (11, 1, '1'), (12, 1, '1');

INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250810')
VALUES (11, 2, '1-1'), (13, 1, '1');

INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250811')
VALUES (12, 2, '1-1'), (14, 1, '1');





-- Write to the snapshot branch
SET spark.paimon.branch = snapshot;

INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250810')
VALUES (1, 2, '1-1'), (2, 1, '1'), (3, 1, '1');

INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250810')
VALUES (11, 2, '1-1'), (12, 1, '1'), (13, 1, '1');

Query:

SELECT * FROM `default`.`chain_test` where dt = '20250811'

Result:

1       2       1-1     CN      20250811
2       2       1-1     CN      20250811
4       1       1       CN      20250811
3       1       1       CN      20250811
12      2       1-1     US      20250811
11      2       1-1     US      20250811
13      1       1       US      20250811
14      1       1       US      20250811

This is OK. Why? Because every partition group (here CN and US) exists in both the snapshot and delta branches.

If I then insert a partition that is in snapshot but not in delta:

SET spark.paimon.branch = snapshot;
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'UK', dt = '20250810')
VALUES (1, 2, '1-1'), (2, 1, '1'), (3, 1, '1');

the same query still returns:

1       2       1-1     CN      20250811
2       2       1-1     CN      20250811
4       1       1       CN      20250811
3       1       1       CN      20250811
12      2       1-1     US      20250811
11      2       1-1     US      20250811
13      1       1       US      20250811
14      1       1       US      20250811
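
To confirm the premise that UK exists only in the snapshot branch, the partitions held by each branch can be listed separately. This is only a sketch: it assumes the branch tables can be queried directly under the `chain_test$branch_snapshot` and `chain_test$branch_delta` names used in the ALTER TABLE statements above, and that reading them this way returns only that branch's own data.

```sql
-- Partitions in the snapshot branch (UK / 20250810 should be listed after the insert above)
SELECT DISTINCT region, dt
FROM `default`.`chain_test$branch_snapshot`
ORDER BY region, dt;

-- Partitions in the delta branch (no UK partition is expected here)
SELECT DISTINCT region, dt
FROM `default`.`chain_test$branch_delta`
ORDER BY region, dt;
```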

What doesn't meet your expectations?

When I insert a partition that exists in snapshot but not in delta:

INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'UK', dt = '20250810')
VALUES (1, 2, '1-1'), (2, 1, '1'), (3, 1, '1');

the query result should be:

1       2       1-1     CN      20250811
2       2       1-1     CN      20250811
4       1       1       CN      20250811
3       1       1       CN      20250811
12      2       1-1     US      20250811
11      2       1-1     US      20250811
13      1       1       US      20250811
14      1       1       US      20250811
1       2       1-1     UK      20250811
2       1       1       UK      20250811
3       1       1       UK      20250811
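
The three UK rows above are the snapshot rows from dt = '20250810' carried forward to dt = '20250811', since delta holds nothing for UK. As a rough illustration of where they come from (not a fix), and assuming the snapshot branch can be read directly as `chain_test$branch_snapshot`, they correspond to:

```sql
-- Snapshot rows for UK at 20250810, relabeled with the queried dt.
-- Assumption: this reads only the snapshot branch's own data.
SELECT t1, t2, t3, region, '20250811' AS dt
FROM `default`.`chain_test$branch_snapshot`
WHERE region = 'UK' AND dt = '20250810';
```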

Alternatively, I tried changing the partition key order from PARTITIONED BY (`region` STRING, `dt` STRING COMMENT 'dt') to PARTITIONED BY (`dt` STRING COMMENT 'dt', `region` STRING).

After executing the same SQL as above, the result is:

2       2       1-1     20250811        US
4       1       1       20250811        US
12      2       1-1     20250811        US
14      1       1       20250811        US
2       2       1-1     20250811        CN
4       1       1       20250811        CN

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
