wgtmac
left a comment
I've carefully reviewed the retry mechanism and found a few parity issues and a structural data-loss concern regarding how pending updates are held during retries. Please see the inline comments.
bool timed_out = config_.total_timeout_ms > 0 &&
                 elapsed > config_.total_timeout_ms && attempt > 1;
if (attempt >= max_attempts || timed_out) {
  return result;
The timed_out check requires attempt > 1. If the first execution takes longer than total_timeout_ms before failing, timed_out still evaluates to false, and the runner goes on to sleep and run a second attempt even though the timeout has already been exceeded. Java's Tasks.java checks durationMs > maxDurationMs unconditionally and aborts immediately without attempting a retry. Please remove the && attempt > 1 condition.
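To make the suggested behavior concrete, here is a minimal sketch of the stop condition without the attempt > 1 guard. RetryConfigSketch, ShouldStop, and the max_attempts field name are hypothetical stand-ins for illustration; only total_timeout_ms and the shape of the condition come from the diff above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the PR's retry configuration. Only
// total_timeout_ms is taken from the diff; max_attempts is an assumed name.
struct RetryConfigSketch {
  int32_t max_attempts = 4;
  int64_t total_timeout_ms = 0;  // 0 disables the timeout, as in the diff
};

// Returns true when the runner should give up and return the last result.
// The timeout is checked on every attempt, including the first one.
inline bool ShouldStop(const RetryConfigSketch& config, int32_t attempt,
                       int64_t elapsed_ms) {
  assert(attempt >= 1);  // attempts are counted from 1 in this sketch
  const bool timed_out = config.total_timeout_ms > 0 &&
                         elapsed_ms > config.total_timeout_ms;
  return attempt >= config.max_attempts || timed_out;
}
```

With this shape, a first attempt that fails after 1500 ms under a 1000 ms budget stops immediately instead of sleeping and retrying.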
  return std::max(1, delay_ms);
}

/// \brief Sleep for the specified duration
The C++ jitter calculation uses a bidirectional spread [-jitter_range, jitter_range]. Java's Tasks.java specifically adds a strictly positive jitter: [0, delayMs * 0.1). Consider generating a strictly positive random value [0, jitter_range] to align precisely with Java.
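A rough sketch of what a strictly non-negative jitter could look like, assuming the 10% factor described above. AddPositiveJitter is a hypothetical helper name, not a function from the PR, and the sketch assumes delay_ms stays far enough below the int32_t maximum that adding the jitter cannot overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: adds a jitter drawn from [0, delay_ms * 0.1) to
// delay_ms, so the result is never shorter than the base delay.
// Assumption: delay_ms is small enough that delay_ms + jitter fits in int32_t.
inline int32_t AddPositiveJitter(int32_t delay_ms, std::mt19937& rng) {
  assert(delay_ms >= 0);
  const double jitter_range = delay_ms * 0.1;
  if (jitter_range <= 0.0) {
    // No room for jitter; keep the 1 ms floor used in the diff.
    return std::max<int32_t>(1, delay_ms);
  }
  std::uniform_real_distribution<double> dist(0.0, jitter_range);
  const int32_t jitter = static_cast<int32_t>(dist(rng));  // truncates toward 0
  return std::max<int32_t>(1, delay_ms + jitter);
}
```

Passing the generator in by reference keeps the helper deterministic under a fixed seed, which makes the bounds easy to check in a unit test.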
Kind kind() const final { return Kind::kUpdateSnapshotReference; }

bool IsRetryable() const override { return false; }
Overriding IsRetryable() to explicitly return false causes Transaction::CanRetry() to fail any transaction containing branch or tag updates on conflicts. In Java, SnapshotManager.commit() utilizes transaction.commitTransaction(), which safely retries UpdateSnapshotReferencesOperation. Branch and tag creations should be retryable. Consider removing this override or returning true.
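To illustrate why a single false override blocks the whole transaction, here is a simplified sketch. It assumes CanRetry() requires every pending update to report IsRetryable(); all type names below are hypothetical and the real Transaction::CanRetry() may check more than this.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical base type: updates are retryable unless they opt out.
struct PendingUpdateSketch {
  virtual ~PendingUpdateSketch() = default;
  virtual bool IsRetryable() const { return true; }
};

// Reference update that inherits the retryable default (the suggested fix).
struct RefUpdateSketch final : PendingUpdateSketch {};

// Reference update that opts out, as the current override does.
struct NonRetryableRefUpdateSketch final : PendingUpdateSketch {
  bool IsRetryable() const override { return false; }
};

// Assumed rule: the transaction can retry only if every update can.
inline bool CanRetry(
    const std::vector<std::shared_ptr<PendingUpdateSketch>>& updates) {
  return std::all_of(updates.begin(), updates.end(), [](const auto& update) {
    assert(update != nullptr);
    return update->IsRetryable();
  });
}
```

Under this assumed rule, mixing one opted-out update into an otherwise retryable transaction makes CanRetry() return false for the whole commit.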
I just recalled a design flaw in the interaction between PendingUpdate and Transaction and created a fix: #591. Without this fix, users have to keep every pending update instance they create alive themselves; otherwise the updates cannot be retried, because the transaction instance holds them only as weak_ptr.
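The ownership problem can be shown with a small standalone sketch. The class names are hypothetical, and the owning variant is only one possible direction, not necessarily what #591 does.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct UpdateSketch {
  int id = 0;
};

// Hypothetical transaction that only observes its updates via weak_ptr.
class WeakTransactionSketch {
 public:
  std::shared_ptr<UpdateSketch> NewUpdate(int id) {
    auto update = std::make_shared<UpdateSketch>();
    update->id = id;
    updates_.push_back(update);  // stored as weak_ptr, no ownership taken
    return update;
  }
  // Number of updates that are still alive and could be re-applied.
  std::size_t LiveUpdates() const {
    std::size_t live = 0;
    for (const auto& weak : updates_) {
      if (!weak.expired()) ++live;
    }
    return live;
  }

 private:
  std::vector<std::weak_ptr<UpdateSketch>> updates_;
};

// Hypothetical alternative in which the transaction shares ownership.
class OwningTransactionSketch {
 public:
  std::shared_ptr<UpdateSketch> NewUpdate(int id) {
    auto update = std::make_shared<UpdateSketch>();
    update->id = id;
    updates_.push_back(update);
    return update;
  }
  std::size_t LiveUpdates() const { return updates_.size(); }

 private:
  std::vector<std::shared_ptr<UpdateSketch>> updates_;
};

// The caller drops the returned pointer right away in both cases.
inline void DemoExpiry() {
  WeakTransactionSketch weak_txn;
  weak_txn.NewUpdate(1);
  assert(weak_txn.LiveUpdates() == 0);  // nothing left to retry

  OwningTransactionSketch owning_txn;
  owning_txn.NewUpdate(1);
  assert(owning_txn.LiveUpdates() == 1);  // still available for a retry
}
```

In the weak variant the update is destroyed as soon as the caller lets go of it, which is exactly the situation where a later retry has nothing to re-apply.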
std::optional<std::vector<ErrorKind>> only_retry_on_;
std::optional<std::vector<ErrorKind>> stop_retry_on_;
Can we just use a plain vector here instead of wrapping it in std::optional?
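A sketch of how plain vectors could carry the same information, treating an empty vector as "no filter configured". The enum values, the struct, and the precedence between the two lists are assumptions made for illustration and may not match the PR's actual semantics.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical error kinds; the real ErrorKind enum has its own values.
enum class ErrorKindSketch { kCommitFailed, kValidationFailed, kIOError };

// Plain vectors: empty means the corresponding filter is not set.
struct RetryFilterSketch {
  std::vector<ErrorKindSketch> only_retry_on;
  std::vector<ErrorKindSketch> stop_retry_on;
};

inline bool Contains(const std::vector<ErrorKindSketch>& kinds,
                     ErrorKindSketch kind) {
  return std::find(kinds.begin(), kinds.end(), kind) != kinds.end();
}

// Assumed precedence: an allow-list, when present, decides on its own;
// otherwise everything is retried except kinds in the stop-list.
inline bool ShouldRetry(const RetryFilterSketch& filter, ErrorKindSketch kind) {
  if (!filter.only_retry_on.empty()) {
    return Contains(filter.only_retry_on, kind);
  }
  return !Contains(filter.stop_retry_on, kind);
}
```

The one case this cannot express is "an allow-list was set explicitly and is empty", so the optional wrapper is only needed if that state has to be distinguished.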
This commit implements retry for transaction commits. It introduces a generic RetryRunner utility with exponential backoff and error-kind filtering, and integrates it into Transaction::Commit() to automatically refresh table metadata and retry on commit conflicts.
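For readers unfamiliar with the pattern, the following is a minimal sketch of a retry loop with exponential backoff. It is not the PR's RetryRunner: the struct, field names, default values, and function signatures are all assumptions, and jitter, timeouts, and error-kind filtering are left out to keep the core loop visible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical backoff settings; names and defaults are illustrative only.
struct BackoffSketch {
  int32_t max_attempts = 4;
  int64_t min_wait_ms = 100;
  int64_t max_wait_ms = 60000;
  double scale_factor = 2.0;
};

// Delay before the retry that follows the given (1-based) failed attempt:
// min_wait_ms * scale_factor^(attempt - 1), capped at max_wait_ms.
inline int64_t DelayForAttempt(const BackoffSketch& config, int32_t attempt) {
  assert(attempt >= 1);
  double delay = static_cast<double>(config.min_wait_ms);
  for (int32_t i = 1; i < attempt && delay < config.max_wait_ms; ++i) {
    delay *= config.scale_factor;
  }
  return std::min<int64_t>(config.max_wait_ms, static_cast<int64_t>(delay));
}

// Runs task until it reports success or max_attempts is reached.
// sleep_fn is injected so callers (and tests) control how waiting happens.
inline bool RunWithRetry(const BackoffSketch& config,
                         const std::function<bool()>& task,
                         const std::function<void(int64_t)>& sleep_fn) {
  for (int32_t attempt = 1;; ++attempt) {
    if (task()) return true;
    if (attempt >= config.max_attempts) return false;
    sleep_fn(DelayForAttempt(config, attempt));
  }
}
```

In a commit path, the task callable would be the place to refresh table metadata and re-apply the pending updates before trying the commit again.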