[FLINK-39073][runtime] Defer alignment check for idle splits#27638
[FLINK-39073][runtime] Defer alignment check for idle splits#27638pnowojski merged 3 commits intoapache:masterfrom
Conversation
If a split emits watermarks far into the future and then goes idle, alignment check will incorrectly mark it paused. As max allowed watermark advances, Source operator will transition the split back to active state, (while its still idle) This change aims to fix the issue by deferring the alignment check for idle splits until they break out of idleness.
Split transitions directly from paused to idle and from idle to pause don't make any sense, improving the logs to allow further analysis. It is a convention for flink metrics system to not fail the job though.
pnowojski
left a comment
There was a problem hiding this comment.
LGTM % one small comment. Thanks for the fix!
| // Alignment check fires but doesn't pause the idle split | ||
| operator.handleOperatorEvent(new WatermarkAlignmentEvent(allowedWatermark4)); | ||
| assertThat(operator.getSplitMetricGroup(split0.splitId()).isIdle()).isTrue(); |
There was a problem hiding this comment.
Can we assert that the split hasn't been paused?
| } | ||
| } | ||
|
|
||
| private void maybePauseSplit(String splitId) { |
There was a problem hiding this comment.
the method name is strange - I was expecting it to start with a verb like the other methods. Maybe `updateCurrentSplitPausedWatermark
There was a problem hiding this comment.
This method checks if the split watermark is too advanced and if yes calls pauseOrResumeSplit for it, unless already paused.
I am not aware of a "paused watermark" concept, but open to other suggestions on how to name this method.
There was a problem hiding this comment.
I think the maybePauseSplit decsribes pretty well what's happening. Also there are plenty of other cases with similar naming convention (method named maybeACTION).
Re updateCurrentSplitPausedWatermark:
- I presume you ment
updateCurrentPausedSplitWatermark, but in this case "Paused watermark" is still confusing as pointed by @Efrat19 . Even more correct would have beenupdateCurrentlyPausedSplit... - but
updateCurrentlyPausedSplitdoesn't capture that currently paused split split might not be updated - and
updateCurrentlyPausedSplitis basically as synonym ofpauseSplits
So:
maybePauseSplitandmaybeUpdateCurrentlyPausedSplitare both technically correctmaybePauseSplitsimo sounds better, so I would keep it as is.
|
@flinkbot run azure |
What is the purpose of the change
If a split emits watermarks far into the future and then goes idle, alignment check will incorrectly mark it paused.
As max allowed watermark advances, Source operator will transition the split back to active state, (while its still idle)
This change aims to fix the issue by deferring the alignment check for idle splits until they break out of idleness.
Brief change log
currentlyIdleSplitsinsourceOperatorVerifying this change
This change added tests and can be verified. (unitTest)
Does this pull request potentially affect one of the following parts:
@Public(Evolving): noDocumentation