Replace blocking wait with non-blocking delay in paxos repair #4434
base: trunk
private static class DelayedRepair
{
    private final UncommittedPaxosKey uncommitted;
    private final long startAfterMillis;
I assumed this was a time delta when I read it (i.e. after millis (have elapsed)). Perhaps atMillis or startAtMillis, or scheduledAtMillis?
But also, this should probably be nanos and we should probably use nanoTime?
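A minimal sketch of what the suggestion could look like, assuming the field is renamed to `scheduledAtNanos` and holds an absolute point on the `System.nanoTime()` timeline. The class name `DelayedRepairSketch` and the `isDue` helper are hypothetical and not part of the patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for DelayedRepair; only the timing field is modelled.
final class DelayedRepairSketch
{
    // Absolute deadline on the System.nanoTime() timeline, not a delta.
    final long scheduledAtNanos;

    DelayedRepairSketch(long nowNanos, long delay, TimeUnit unit)
    {
        this.scheduledAtNanos = nowNanos + unit.toNanos(delay);
    }

    // nanoTime values may wrap around, so test the sign of the difference
    // instead of comparing the two readings directly.
    boolean isDue(long nowNanos)
    {
        return nowNanos - scheduledAtNanos >= 0;
    }
}
```

A caller would pass `System.nanoTime()` as `nowNanos` both when creating the entry and when checking it. Unlike `currentTimeMillis`, `nanoTime` is not affected by wall-clock adjustments.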
private final boolean autoRepair;
...
private final Map<DecoratedKey, AbstractPaxosRepair> inflight = new ConcurrentHashMap<>();
private final Queue<DelayedRepair> delayed = new LinkedBlockingQueue<>();
Given usage, can't we just use CLQ?
You mean ConcurrentLinkedQueue? delayed is only accessed inside synchronized blocks, so I don't think we'd gain anything by using it.
You're right, totally fine with swapping it for ArrayDeque instead (or even LinkedList). I was just trying to downgrade from LinkedBlockingQueue, as it is probably too heavyweight.
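For illustration, a sketch of the pattern being discussed: a plain `ArrayDeque` whose every access is guarded by the owner's monitor, so no concurrent collection is needed. `DelayedQueueSketch`, `delay` and `drain` are hypothetical names, and `String` stands in for `DelayedRepair`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical holder; String stands in for DelayedRepair.
final class DelayedQueueSketch
{
    // Not thread-safe on its own; every access below holds the monitor of `this`.
    private final Queue<String> delayed = new ArrayDeque<>();

    synchronized void delay(String key)
    {
        delayed.add(key);
    }

    // Removes and returns everything queued so far, in insertion order.
    synchronized List<String> drain()
    {
        List<String> out = new ArrayList<>(delayed);
        delayed.clear();
        return out;
    }
}
```

This only holds as long as no code path touches `delayed` outside a synchronized block, which is the assumption stated in the reply above.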
logger.info("Paxos auto repair encountered a potentially in progress ballot, sleeping {}ms to allow the in flight operation to finish", sleepMillis);
...
delayed.add(new DelayedRepair(uncommitted, nowMillis + sleepMillis));
ScheduledExecutors.scheduledFastTasks.schedule(this::scheduleKeyRepairsOrFinish, sleepMillis, MILLISECONDS);
I think we need to synchronise scheduleKeyRepairsOrFinish? Or wrap the call in a synchronized block inside the lambda. Currently it inherits its thread safety from its callers, which hold the lock; the executor thread will not.
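A sketch of the first option, assuming the method can simply be declared `synchronized`. It uses the JDK's `ScheduledExecutorService` in place of `ScheduledExecutors.scheduledFastTasks`, and `RepairSchedulerSketch`, `runs` and `rescheduleAfter` are hypothetical names for illustration only.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical owner of the repair state; a counter stands in for the real work.
final class RepairSchedulerSketch
{
    private int runs = 0;

    // Declared synchronized so the method takes the lock itself, whether it is
    // called from an already-synchronized caller or directly by an executor thread.
    synchronized void scheduleKeyRepairsOrFinish()
    {
        runs++;
    }

    synchronized int runs()
    {
        return runs;
    }

    // Re-runs the method after the delay without blocking the calling thread.
    ScheduledFuture<?> rescheduleAfter(ScheduledExecutorService executor, long delayMillis)
    {
        return executor.schedule(this::scheduleKeyRepairsOrFinish, delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

Because Java monitors are reentrant, existing callers that already hold the lock are unaffected by the added modifier.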
return false;
...
if (waitForCoordinator)
    maybeWaitForOriginalCoordinator(uncommitted, txnTimeoutMicros);
Can we also delete maybeWaitForOriginalCoordinator?
@Override
public int compare(DelayedRepair o1, DelayedRepair o2)
{
    long delta = o1.scheduledAtNanos - o2.scheduledAtNanos;
Long.compare?
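To make the trade-off concrete, here is a sketch of two ways to order the deadlines; the rest of the comparator in the patch is not shown above, so this is not a claim about what it does. `Long.compare(a, b)` is the idiomatic way to order ordinary longs. For `System.nanoTime()` readings, the Javadoc recommends comparing via the difference, since the values may wrap. `DeadlineOrderSketch` and both constant names are hypothetical.

```java
import java.util.Comparator;

// Hypothetical comparators over raw deadline values.
final class DeadlineOrderSketch
{
    // Natural long ordering: never overflows, but treats a wrapped-around
    // nanoTime reading as earlier than the readings that preceded it.
    static final Comparator<Long> NATURAL = Long::compare;

    // Difference-based ordering: takes the sign of the long difference,
    // so it stays correct across wrap-around as long as the two values
    // are less than 2^63 nanoseconds apart.
    static final Comparator<Long> BY_ELAPSED = (a, b) -> Long.signum(a - b);
}
```

Either form avoids narrowing the `long` difference to `int`, which would discard the high bits and could flip the sign.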