Open
Description
Search before asking
- I searched in the issues and found no similar issues.
Paimon version
master (latest)
Compute Engine
None
Minimal reproduce step
When using changelog-producer = lookup with sequence.field configured, LookupMergeFunction.pickHighLevel() may select the wrong "old" record when out-of-order data arrives.
Configuration:
CREATE TABLE test (
    id INT PRIMARY KEY NOT ENFORCED,
    value INT,
    update_time BIGINT
) WITH (
    'changelog-producer' = 'lookup',
    'sequence.field' = 'update_time'
);
Scenario:
Initial state after compaction:
L1: (id=1, value=100, update_time=7)
L2: (id=1, value=200, update_time=8) ← Actually newer!
New out-of-order data arrives at L0:
L0: (id=1, value=50, update_time=6) ← Old data arriving late
Expected behavior:
- pickHighLevel() should select L2 (update_time=8) as the "latest" high-level record.
- The result should reflect the record with the highest sequence value.
Actual behavior:
- pickHighLevel() selects L1 (level 1 < level 2), ignoring sequence.field.
- The wrong changelog is generated.
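The scenario above can be reproduced outside Paimon with a small simulation. This is only a sketch: `LevelOnlyPickDemo`, `Candidate`, and `pickByLevelOnly` are hypothetical names introduced for illustration, not Paimon classes, and the `level <= 0` skip is an assumption about how level-0 records are excluded from the "high level" pick.

```java
import java.util.Arrays;
import java.util.List;

/** Standalone simulation of a level-only pick; not Paimon code. */
final class LevelOnlyPickDemo {

    /** Hypothetical stand-in for a merge candidate (level, sequence.field value, payload). */
    static final class Candidate {
        final int level;
        final long sequence;
        final int value;

        Candidate(int level, long sequence, int value) {
            this.level = level;
            this.sequence = sequence;
            this.value = value;
        }
    }

    /** Picks the candidate with the lowest level above 0 and never looks at the sequence. */
    static Candidate pickByLevelOnly(List<Candidate> candidates) {
        Candidate highLevel = null;
        for (Candidate c : candidates) {
            if (c.level <= 0) {
                continue; // assumption: level-0 records are not "high level" candidates
            }
            if (highLevel == null || c.level < highLevel.level) {
                highLevel = c;
            }
        }
        return highLevel;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate(0, 6L, 50),    // late-arriving record at L0
                new Candidate(1, 7L, 100),   // L1
                new Candidate(2, 8L, 200));  // L2, highest sequence
        Candidate picked = pickByLevelOnly(candidates);
        // prints "level=1 sequence=7"
        System.out.println("level=" + picked.level + " sequence=" + picked.sequence);
    }
}
```

With the three records from the scenario, the level-only rule returns the L1 record (sequence 7) even though the L2 record carries the higher sequence value.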
What doesn't meet your expectations?
LookupMergeFunction.pickHighLevel() only compares level numbers, ignoring sequence.field:
// LookupMergeFunction.java:88 - Current behavior
if (highLevel == null || kv.level() < highLevel.level()) {
    highLevel = kv; // Always picks lowest level, ignores sequence
}
Reproducible scenario:
// When candidates contain:
// L1: (key=1, sequence=7) <- level 1
// L2: (key=1, sequence=8) <- level 2, but higher sequence (newer!)
// pickHighLevel() returns L1 (because level 1 < 2)
// But should return L2 (because sequence 8 > 7)
It should use the sequence.field comparator when one is configured, similar to how SortMergeReaderWithMinHeap correctly handles it at lines 61-67.
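One possible shape for a sequence-aware pick is sketched below. It is not the actual fix and not Paimon's API: `SequenceAwarePickDemo`, `Candidate`, and the nullable `sequenceComparator` parameter are hypothetical. The sketch assumes that a null comparator means sequence.field is not configured, and that ties on sequence fall back to the lower level.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Standalone sketch of a sequence-aware pick; not Paimon code. */
final class SequenceAwarePickDemo {

    /** Hypothetical stand-in for a merge candidate. */
    static final class Candidate {
        final int level;
        final long sequence;

        Candidate(int level, long sequence) {
            this.level = level;
            this.sequence = sequence;
        }
    }

    /**
     * Prefers the higher sequence when a comparator is supplied and breaks ties
     * by the lower level. With a null comparator it keeps the level-only rule.
     */
    static Candidate pickHighLevel(
            List<Candidate> candidates, Comparator<Candidate> sequenceComparator) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (c.level <= 0) {
                continue; // assumption: level-0 records are handled elsewhere
            }
            if (best == null) {
                best = c;
            } else if (sequenceComparator != null) {
                int cmp = sequenceComparator.compare(c, best);
                if (cmp > 0 || (cmp == 0 && c.level < best.level)) {
                    best = c;
                }
            } else if (c.level < best.level) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> candidates =
                Arrays.asList(new Candidate(1, 7L), new Candidate(2, 8L));
        Comparator<Candidate> bySequence = Comparator.comparingLong(c -> c.sequence);
        // With the comparator the L2 record (sequence 8) is picked.
        System.out.println(pickHighLevel(candidates, bySequence).level);
        // Without it the level-only rule still picks L1.
        System.out.println(pickHighLevel(candidates, null).level);
    }
}
```

Keeping the level-only branch for the unconfigured case means tables without sequence.field would see no change in behavior under this sketch.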
Anything else?
This issue only affects the changelog-producer = lookup scenario. Normal queries (Batch/Streaming Scan) and Lookup Join are not affected.
I'm working on a fix and will submit a PR shortly. The PR includes a complete unit test to reproduce this issue.