[FLINK-37008] [runtime-web] Flink UI should show the type of checkpoint (full vs incremental) #25899
...web/web-dashboard/src/app/pages/job/checkpoints/detail/job-checkpoints-detail.component.html
ffa5245 to c362d7a
Thanks for the PR! overall LGTM
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/SnapshotType.java
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/SavepointType.java
I added one test. I looked at other flags on Checkpoints and Savepoints and couldn't find tests for them. I also couldn't find UI tests, but maybe I'm blind. Let me know if I missed anything!
@gaborgsomogyi If you have a chance to review, given your post on the mailing list.
@flinkbot run azure
Thanks for the update! Please find my comments below:
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/SavepointType.java
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointType.java
private static final CheckpointProperties FULL_CHECKPOINT_NEVER_RETAINED =
        new CheckpointProperties(
                false,
                CheckpointType.FULL_CHECKPOINT,
                true,
                true, // Delete on success
                true, // Delete on cancellation
                true, // Delete on failure
                true, // Delete on suspension
                false);

private static final CheckpointProperties FULL_CHECKPOINT_RETAINED_ON_FAILURE =
        new CheckpointProperties(
                false,
                CheckpointType.FULL_CHECKPOINT,
                true,
                true, // Delete on success
                true, // Delete on cancellation
                false, // Retain on failure
                true, // Delete on suspension
                false);

private static final CheckpointProperties FULL_CHECKPOINT_RETAINED_ON_CANCELLATION =
        new CheckpointProperties(
                false,
                CheckpointType.FULL_CHECKPOINT,
                true,
                true, // Delete on success
                false, // Retain on cancellation
                false, // Retain on failure
                false, // Retain on suspension
                false);
Why are those introduced?
They are only used for tests.
We don't really have a test suite for testing this kind of thing.
I'm open to ideas for where to add tests.
I've thought about `SnapshotUtilsTest` and `CheckpointPropertiesTest`.
Ah, I see.
I read through the `CheckpointCoordinator`, and it seems the JM always propagates periodic checkpoints (not the ones triggered via the REST API) as incremental. But the TM actually takes incremental or full ones according to the configuration it reads (it creates a `RocksNativeFullSnapshotStrategy` or a `RocksIncrementalSnapshotStrategy` when the state backend is built). That means the UI will always show incremental checkpoints even if we disable them. Am I right? If so, we should make some changes in `CheckpointCoordinator`.
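To make the concern above concrete, here is a toy model of the suspected mismatch. Every type and method name in it is a hypothetical stand-in, not a Flink class, and the two "assumption" methods encode the reviewer's reading of the code rather than confirmed behaviour.

```java
// Toy model only: hypothetical stand-ins, not Flink's real classes.
final class CheckpointLabelMismatch {

    /** What the coordinator side records for a checkpoint. */
    enum ReportedType { CHECKPOINT, FULL_CHECKPOINT }

    /** What the state backend on the task side actually executes. */
    enum SnapshotStrategy { INCREMENTAL, NATIVE_FULL }

    /** Assumption: periodic checkpoints always carry the generic type. */
    static ReportedType reportedTypeForPeriodicCheckpoint() {
        return ReportedType.CHECKPOINT;
    }

    /** Assumption: the backend picks its strategy from its own configuration flag. */
    static SnapshotStrategy strategyFor(boolean incrementalEnabled) {
        return incrementalEnabled ? SnapshotStrategy.INCREMENTAL : SnapshotStrategy.NATIVE_FULL;
    }

    /** A UI that only sees the reported type maps everything non-full to "incremental". */
    static String uiLabel(ReportedType type) {
        return type == ReportedType.FULL_CHECKPOINT ? "full" : "incremental";
    }
}
```

In this model, `strategyFor(false)` yields `NATIVE_FULL` while `uiLabel(reportedTypeForPeriodicCheckpoint())` still yields `"incremental"`, which is the discrepancy being asked about.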
I agree, looking at the `CheckpointCoordinator`.
flink/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
Lines 355 to 356 in 55de8d6
this.checkpointProperties =
        CheckpointProperties.forCheckpoint(chkConfig.getCheckpointRetentionPolicy());
Specifically, the issue is that `Checkpoint` can be used for both full and incremental checkpoints.
flink/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointType.java
Lines 29 to 31 in 55de8d6
/** A checkpoint, full or incremental. */
public static final CheckpointType CHECKPOINT =
        new CheckpointType("Checkpoint", SharingFilesStrategy.FORWARD_BACKWARD);
We'd have to move the evaluation of `execution.checkpointing.incremental` or change how we determine whether a Checkpoint is full 🤔
I think a possible solution is to evaluate `execution.checkpointing.incremental` in `CheckpointCoordinator` and set `checkpointProperties` accordingly. WDYT?
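A minimal sketch of what that suggestion could look like, assuming the coordinator can read the incremental flag when it builds its properties. The class, the enum, and the two-argument `forCheckpoint` factory are hypothetical illustrations and do not mirror Flink's actual `CheckpointProperties` API. The sketch also ignores any case where the state backend's real behaviour differs from the flag.

```java
// Hypothetical sketch: none of these types are Flink's real classes.
final class CheckpointTypeResolution {

    enum CheckpointKind { INCREMENTAL, FULL }

    /** Minimal stand-in for checkpoint properties: the kind plus one retention flag. */
    static final class Properties {
        private final CheckpointKind kind;
        private final boolean retainOnCancellation;

        Properties(CheckpointKind kind, boolean retainOnCancellation) {
            this.kind = kind;
            this.retainOnCancellation = retainOnCancellation;
        }

        CheckpointKind kind() {
            return kind;
        }

        boolean retainOnCancellation() {
            return retainOnCancellation;
        }
    }

    /**
     * Resolves the kind once, up front, from the (assumed) incremental flag,
     * so everything downstream of the coordinator sees an explicit kind.
     */
    static Properties forCheckpoint(boolean incrementalEnabled, boolean retainOnCancellation) {
        CheckpointKind kind = incrementalEnabled ? CheckpointKind.INCREMENTAL : CheckpointKind.FULL;
        return new Properties(kind, retainOnCancellation);
    }
}
```

With this shape, `CheckpointTypeResolution.forCheckpoint(false, true).kind()` is `FULL`, so a UI reading the properties would no longer have to guess.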
@flinkbot run azure
@flinkbot run azure
The flink tests seem to fail on a different flaky test. Restarted to make sure.
@Zakelly I'd love to get this in as part of Flink 2.0. I know you're working on a bunch but if you had some time 🙏
No worries. I will spare some time and we'll make it.
@flinkbot run azure
We realized that it is quite difficult to determine at the correct layer if a checkpoint is full when it isn't explicitly marked as full. This needs a more in-depth dive into the code.
JIRA Ticket: https://issues.apache.org/jira/browse/FLINK-37008
What is the purpose of the change
It would be useful for the UI to show if a checkpoint is full or incremental. I'm curious how others would like to expose this in the UI / API but I wanted to do a first pass to get the conversation rolling.
This PR is meant to be a throwaway until we decide on a path forward.
There are currently no tests, and it likely fails CI.
UPDATED IMAGE

Brief change log
Adds "full checkpoint flag" to checkpoints to be displayed in the UI / API
Verifying this change
N/A for this draft PR
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation