Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASSSIDECAR-210: Fix PeriodicTaskExecutor double execution due to race from reschedule #192

Merged
merged 6 commits into from
Feb 14, 2025

Conversation

yifan-c
Copy link
Contributor

@yifan-c yifan-c commented Feb 13, 2025

Patch by Yifan Cai; Reviewed by TBD for CASSSIDECAR-210

…e from reschedule

Patch by Yifan Cai; Reviewed by TBD for CASSSIDECAR-210
Comment on lines 243 to 267
LOGGER.debug("Reschedule the task. task='{}' execCount={}", key, execCount);
reschedule(periodicTask);
promise.tryComplete();
LOGGER.debug("Rescheduling the task. task='{}' execCount={}", key, execCount);
promise.tryComplete(Result.RESCHEDULED);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix. Do not reschedule, but just signal the decision and update the delay when scheduling a new run.

testRescheduleDecision(0, 1);
}

void testRescheduleDecision(long initialDelay, long delay)
Copy link
Contributor Author

@yifan-c yifan-c Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the test that reproduce the bug.
Test fails with the earlier version of PeriodicTaskExecutor, as there are flee tasks that keep on running.

Verify with the following 2 commands

git checkout upstream/trunk -- server/src/main/java/org/apache/cassandra/sidecar/tasks/PeriodicTaskExecutor.java
./gradlew :server:test --tests org.apache.cassandra.sidecar.tasks.PeriodicTaskExecutorTest.testRescheduleDecision

Copy link
Contributor

@JeetKunDoug JeetKunDoug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll re-review this in the morning (out of time for now) but a few comments so far. The scheduling/rescheduling logic is getting a bit complicated but not sure if there's much to do about that at this point.

* Tells the run completion handler whether task is rescheduled.
* If it is rescheduled, the delay is adjusted to the initial delay of the task
*/
RESCHEDULED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a Result that doesn't mean RESCHEDULED, like COMPLETED? The fact that we callPromise#tryComplete with either RESCHEDULED or nothing at all (essentially, null) seems really odd, even if we never really check for COMPLETED.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COMPLETED is unnecessary. When the handler at onComplete is invoked, we know that it is completed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably rename the enum. It is not really the result of execution. Looking for suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - the biggest thing to me was that this enum is used as the type of an AsyncResult and we set the result to RESCHEDULED or nothing at all, which was just jarring when I was looking at the code.

What do you think of actually using the ScheduleDecision enum here, and just getting rid of the Result all together? Then each branch of the executeInternal method would actually be able to indicate which decision was made - we end up only using RESCHEDULE right now but at least there isn't an odd mix of promise.tryComplete() and promise.tryComplete(Result.RESCHEDULE) which didn't make a lot of sense to me - if the promise is a Promise<Result> it should really be completed with something other than null... using Promise<ScheduleDecision> makes more sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. ScheduleDecision makes better sense.

Copy link
Contributor

@JeetKunDoug JeetKunDoug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 - thanks for the fix!

@yifan-c yifan-c merged commit 0712bf6 into apache:trunk Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants