@jiangzho jiangzho commented Oct 24, 2025

### What changes were proposed in this pull request?

This PR adds support for automatic restart counter reset based on application attempt duration. The feature introduces a new `restartCounterResetMillis` field in RestartConfig that allows the restart counter to be reset if an application runs successfully for a specified duration before terminating.

Also added a unit test and enhanced the existing test `assertGeneratedCRDMatchesHelmChart` to show a diff for readability.
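
The reset rule described above can be illustrated with a minimal Java sketch. This is not the operator's code: the class and method names are hypothetical. Two behaviors here are assumptions made only for illustration: a negative `restartCounterResetMillis` disables the reset, and the restart scheduled right after a reset is counted as 1.

```java
import java.time.Duration;

/** Hypothetical illustration of the reset rule; names are not from the operator code base. */
final class RestartCounterSketch {

  private RestartCounterSketch() {}

  /**
   * Decides whether the restart counter should be reset after an attempt ends.
   * Assumption of this sketch: a negative threshold means the feature is disabled.
   */
  static boolean shouldResetCounter(Duration attemptDuration, long restartCounterResetMillis) {
    if (restartCounterResetMillis < 0) {
      return false; // reset disabled, keep counting every restart
    }
    // The attempt ran at least as long as the threshold, so earlier failures are forgiven.
    return attemptDuration.toMillis() >= restartCounterResetMillis;
  }

  /**
   * Returns the counter value after scheduling one more restart.
   * Assumption of this sketch: after a reset, the restart being scheduled counts as 1.
   */
  static long nextRestartCount(
      long currentCount, Duration attemptDuration, long restartCounterResetMillis) {
    long base = shouldResetCounter(attemptDuration, restartCounterResetMillis) ? 0 : currentCount;
    return base + 1;
  }
}
```

Under these assumptions, an application that crashes within seconds keeps accumulating restarts toward its limit, while one that ran longer than the threshold before failing starts counting again from one.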

### Why are the changes needed?

With this feature, users can distinguish between persistent failures (quick consecutive crashes) and applications that run for long periods between failures.

### Does this PR introduce _any_ user-facing change?

A new optional configuration field, `restartCounterResetMillis`, is added to the RestartConfig spec.

### How was this patch tested?

Added a unit test that validates that the restart counter works as expected.

### Was this patch authored or co-authored using generative AI tooling?

No
@jiangzho
Contributor Author

cc @peter-toth, could you please help review this?

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-54010] Support restart counter in Spark app [SPARK-54010] Support applicationTolerations.restartConfig.restartCounterResetMillis Oct 27, 2025
@peter-toth
Contributor

Sorry for the delay, @jiangzho. I can review this PR on Thursday or Friday.

@peter-toth peter-toth left a comment

Looks good to me; I have only minor comments.

@peter-toth
Contributor

@dongjoon-hyun, https://issues.apache.org/jira/browse/SPARK-54010 is under the 0.7.0 umbrella. Would you like us to wait with merging this, or do you think we can move it to 0.6.0?

@dongjoon-hyun
Member

Feel free to move, @peter-toth.

@peter-toth peter-toth closed this in 43be18c Nov 3, 2025
@peter-toth
Contributor

Thank you @jiangzho for the fix!

Merged to main (0.6.0).
