Adding ETag Usage to the Instance Table #1280
Merged
Pull request overview
This PR introduces ETag-based concurrency control for the Azure Storage instance table to prevent race conditions where a stalled worker incorrectly updates instance status after another worker has already completed the orchestration. The implementation uses ETags to ensure that instance table updates fail if the instance has been modified since it was last read.
Key Changes
- Added `OrchestrationETags` class to track both instance and history table ETags separately
- Modified tracking store interfaces and implementations to use ETags when updating the instance table
- Added comprehensive tests covering both regular orchestration and sub-orchestration scenarios
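
As a rough illustration of why the two ETags are tracked separately, here is a minimal Python sketch. The PR's actual `OrchestrationETags` class is C# and its members are not shown on this page, so the field names `instance_etag` and `history_etag` and the ETag strings are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrchestrationETagsSketch:
    """Hypothetical stand-in for the PR's OrchestrationETags class."""
    instance_etag: Optional[str] = None  # last ETag seen for the instance table row
    history_etag: Optional[str] = None   # last ETag seen for the history table


etags = OrchestrationETagsSketch()
etags.instance_etag = 'W/"instance-v1"'
etags.history_etag = 'W/"history-v1"'

# A history write refreshes only the history ETag; the instance ETag
# stays valid until the instance row itself is rewritten.
etags.history_etag = 'W/"history-v2"'
```

Keeping the values in one object lets a single parameter carry both through the tracking store, while each table's conditional write still checks only its own ETag.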
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| test/DurableTask.AzureStorage.Tests/AzureStorageScaleTests.cs | Reorganized imports and added two new test methods to verify proper exception handling when stalled workers attempt to update instance table |
| src/DurableTask.Core/OrchestrationState.cs | Added internal Etag property to OrchestrationState class |
| src/DurableTask.AzureStorage/Tracking/TrackingStoreBase.cs | Updated method signature to use OrchestrationETags and changed return type from `Task<ETag?>` to `Task` |
| src/DurableTask.AzureStorage/Tracking/InstanceStoreBackedTrackingStore.cs | Updated UpdateStateAsync to use OrchestrationETags parameter and removed ETag return logic |
| src/DurableTask.AzureStorage/Tracking/ITrackingStore.cs | Updated interface signature for UpdateStateAsync and renamed GetStateAsync to FetchInstanceStatusAsync |
| src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs | Implemented TryUpdateInstanceTableAsync method with ETag-based update logic and split-brain detection logging |
| src/DurableTask.AzureStorage/OrchestrationSessionManager.cs | Updated to use FetchInstanceStatusAsync and propagate instance ETags through message metadata |
| src/DurableTask.AzureStorage/OrchestrationETags.cs | New class to encapsulate both instance and history table ETags |
| src/DurableTask.AzureStorage/Messaging/OrchestrationSession.cs | Changed from single ETag property to OrchestrationETags object |
| src/DurableTask.AzureStorage/MessageData.cs | Added MessageMetadata property to store instance ETag information |
| src/DurableTask.AzureStorage/AzureStorageOrchestrationService.cs | Updated to pass OrchestrationETags instead of single ETag to tracking store |
cgillum
reviewed
Dec 15, 2025
cgillum
approved these changes
Dec 16, 2025
This PR introduces the ability to use etags when attempting to update the instance table in Azure Storage upon completion of a work item. This behavior will be "off" by default (elaborated on below). This is to help detect the following scenario.
Since the orchestration was completed in step 2 and all control queue messages for it were deleted, there is no way to detect this scenario (i.e., no future messages will "retrigger" this orchestration to run). The only way to prevent this from happening, as far as I can tell, is to introduce etag usage for the instance table. Then, when worker A attempts to update the instance table in step 3, it will fail due to an etag mismatch.
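
The effect of the etag check can be sketched with a small in-memory simulation. This is not the PR's code: `InstanceTable`, `ETagMismatchError`, and `simulate` are hypothetical names, and the sequence of events (worker A reads and stalls, worker B completes the orchestration, worker A then writes its stale status) is an assumption based on the race described in the pull request overview.

```python
import uuid
from dataclasses import dataclass, field


class ETagMismatchError(Exception):
    """Raised when a conditional update carries a stale ETag."""


@dataclass
class InstanceTable:
    """Hypothetical in-memory stand-in for the instance table."""
    rows: dict = field(default_factory=dict)  # instance_id -> (status, etag)

    def insert(self, instance_id, status):
        etag = uuid.uuid4().hex
        self.rows[instance_id] = (status, etag)
        return etag

    def read(self, instance_id):
        return self.rows[instance_id]

    def update(self, instance_id, status, if_match=None):
        # if_match=None models an unconditional write (no ETag check).
        _, current_etag = self.rows[instance_id]
        if if_match is not None and if_match != current_etag:
            raise ETagMismatchError(instance_id)
        new_etag = uuid.uuid4().hex  # every successful write changes the ETag
        self.rows[instance_id] = (status, new_etag)
        return new_etag


def simulate(use_etags):
    table = InstanceTable()
    table.insert("orch-1", "Running")
    _, etag_a = table.read("orch-1")  # worker A reads the row, then stalls
    _, etag_b = table.read("orch-1")  # worker B picks up the same instance
    # Worker B completes the orchestration.
    table.update("orch-1", "Completed", if_match=etag_b if use_etags else None)
    try:
        # Worker A wakes up and writes the status it computed earlier.
        table.update("orch-1", "Running", if_match=etag_a if use_etags else None)
    except ETagMismatchError:
        pass  # stale write rejected; worker A should abandon its work item
    return table.read("orch-1")[0]


print(simulate(use_etags=False))  # stale write wins
print(simulate(use_etags=True))   # stale write is rejected
```

Without the check, the final status is whatever the stalled worker wrote last; with it, the completed status survives because worker A's ETag no longer matches the row.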
This new behavior would require doing a read on the instance table to get the latest instance table etag for every new orchestration work item (assuming extended sessions are not enabled). After running some performance tests to validate the impact of this new I/O, I discovered that:
- […] `Task.WhenAll` on 10 activity calls, the existing code without the instance table etag usage took around 14.5 minutes to complete across 3 trials whereas this new code took around 17.5 minutes.

Given the negative performance impact of enabling this new etag usage, this PR hides it behind a feature flag in the `AzureStorageOrchestrationServiceSettings` which is off by default.
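
For a sense of scale, the two timings quoted above can be turned into a relative slowdown. The snippet below only does arithmetic on the reported figures (14.5 and 17.5 minutes); it says nothing about other workloads, and the variable names are illustrative.

```python
baseline_minutes = 14.5   # existing code, as reported in the description
with_etag_minutes = 17.5  # new code with instance table etag usage

extra_minutes = with_etag_minutes - baseline_minutes
slowdown_pct = extra_minutes / baseline_minutes * 100

print(f"extra: {extra_minutes:.1f} min, slowdown: {slowdown_pct:.1f}%")
# prints "extra: 3.0 min, slowdown: 20.7%"
```

A slowdown of roughly a fifth on this test is consistent with the decision to keep the behavior opt-in.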