[Support] Checkpoint cannot complete when Flink reads the full data of a table in streaming mode. #7541
Description
Search before asking
- I searched in the issues and found nothing similar.
Paimon version
V1.3.1
Compute Engine
Flink V1.20.0
Minimal reproduce step
I use Flink to read a Paimon primary key table with changelog-producer = none in streaming mode. The read is configured with scan.mode = latest-full and continuous.discovery-interval = 20s.
The core reading code is roughly as follows:
Table table = loadPaimonTable(paimonTableConfig);
// Apply dynamic options for CDC streaming read
Map<String, String> paimonOptions = buildPaimonOptions(paimonTableConfig);
table = table.copy(paimonOptions);
// Build streaming source that directly produces DataStream
// sourceBounded(false) enables continuous streaming mode
return new FlinkSourceBuilder(table)
        .env(env)
        .sourceBounded(false)
        .build();
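The helper buildPaimonOptions is not shown in this report. A minimal sketch of what it might return, assuming it sets only the two read options named above: the class name, the parameterless signature, and the method body are hypothetical, while the option keys and values are the ones quoted in this issue. changelog-producer is left out because it is described as a property of the table itself.

```java
import java.util.HashMap;
import java.util.Map;

class PaimonReadOptionsSketch {

    // Hypothetical stand-in for the reporter's buildPaimonOptions helper.
    // The real helper takes paimonTableConfig; this sketch hard-codes the values.
    static Map<String, String> buildPaimonOptions() {
        Map<String, String> options = new HashMap<>();
        // Read the latest full snapshot first, then continue with incremental changes.
        options.put("scan.mode", "latest-full");
        // Interval at which the streaming read checks for new snapshots.
        options.put("continuous.discovery-interval", "20s");
        return options;
    }

    public static void main(String[] args) {
        // Print the options that would be passed to table.copy(...).
        buildPaimonOptions().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```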
The Flink checkpoint interval is 10 minutes, and the checkpoint timeout is 1 hour. While the job is processing the full data (snapshot phase), every checkpoint times out and none succeeds. Checkpoints only start to succeed after the job begins processing incremental data.
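For reference, the checkpoint settings described above can be expressed in milliseconds, the unit taken by Flink's StreamExecutionEnvironment.enableCheckpointing(long) and CheckpointConfig.setCheckpointTimeout(long). The sketch below only does the conversion with java.time.Duration. The class and member names are hypothetical, and the Flink calls appear as comments because the job's actual setup code is not part of this report.

```java
import java.time.Duration;

class CheckpointSettingsSketch {

    // Values taken from the issue description.
    static final Duration INTERVAL = Duration.ofMinutes(10);
    static final Duration TIMEOUT = Duration.ofHours(1);

    static long intervalMillis() {
        return INTERVAL.toMillis();
    }

    static long timeoutMillis() {
        return TIMEOUT.toMillis();
    }

    public static void main(String[] args) {
        // In a Flink job these values would presumably be applied as:
        //   env.enableCheckpointing(intervalMillis());
        //   env.getCheckpointConfig().setCheckpointTimeout(timeoutMillis());
        System.out.println("checkpoint interval (ms): " + intervalMillis());
        System.out.println("checkpoint timeout (ms): " + timeoutMillis());
    }
}
```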
What doesn't meet your expectations?
Although all data is read successfully, the checkpoints cannot be completed properly. We are concerned that if Flink restarts unexpectedly, the processing progress may not be recoverable.
Anything else?
1. Please explain why checkpoints cannot be completed successfully while processing the full (snapshot) data.
2. How can I modify the configuration or setup so that Flink can successfully complete checkpoints while processing the full data?
Are you willing to submit a PR?
- I'm willing to submit a PR!