-
Notifications
You must be signed in to change notification settings - Fork 559
Refactor BlobManager tracking, support getPendingBlobs and abort for payload pending blobs #25642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors BlobManager's pending blob tracking system to unify the handling of legacy and payload-pending blob creation flows. The refactoring introduces a state-machine approach for tracking blobs through their lifecycle from creation to attachment, enabling better support for features like getPendingBlobs and abort functionality across both flows.
- Replaces the
pendingBlobsmap withlocalBlobCacheusing explicit state tracking (localOnly, uploading, uploaded, attaching, attached) - Consolidates blob upload/attach logic into a single
uploadAndAttach()method with retry capabilities for TTL expiration - Adds support for abort signals in payload-pending flows and fixes a bug where aborted blobs would still be resubmitted
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
blobManager.ts |
Core refactoring with new state machine and unified uploadAndAttach() method |
containerRuntime.ts |
Updates parameter name from stashedBlobs to pendingBlobs |
blobManager.spec.ts |
Updates abort tests to work with both legacy and payload-pending flows |
blobHandles.spec.ts |
Parameter name update and test behavior adjustment for local blob caching |
getPendingBlobs.spec.ts |
Implements actual test assertions for getPendingBlobs functionality |
blobTestUtils.ts |
Parameter name update in test utilities |
| localBlobRecord.state === "attaching", | ||
| "Pending blob must be in uploading, uploaded, or attaching state", | ||
| ); | ||
| pendingBlobs[localId] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens with attaching blobs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blobs in "uploaded" or "attaching" state are treated the same - we only trust that we have uploaded it to storage and make no assumption about whether a BlobAttach op was sent for them (we're just ready to handle a BlobAttach ack if it shows up). I'll add a code comment here too.
| * @returns A promise that resolves when all the uploads and attaches have completed, or rejects | ||
| * if any of them fail. | ||
| */ | ||
| public readonly sharePendingBlobs = async (): Promise<void> => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not being used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct - this is new functionality that doesn't yet have an entrypoint. It should be conditionally called only for non-frozen containers (not sure if we have a way to distinguish these from frozen containers currently, but that will be required before we can call it safely). There's one test that calls it, and I'll be adding more tests for it in my next change.
| /** | ||
| * Blobs that were created locally are tracked, and may be in one of these states. When first | ||
| * created, they are in localOnly state. The process of sharing has two steps, upload and attach. | ||
| * Progress through the stages may regress back to localOnly if we determine the storage may have | ||
| * deleted the blob before we could finish attaching it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm confused by this, don't we attach first, and then upload in the payload pending case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two things called "attach" here - attaching the handle, and the BlobAttach op. In pretty much every case here I'll be referring to the blob attach rather than the handle attach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like you don't want to change the terminology. if you decided to keep calling the op lifecycle attach can you can at least document it, as i think it will be confusing for anyone used to how attach is used in other contexts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I'd love to change it lol - unfortunately I didn't name the BlobAttach op or SummaryType.Attachment and those are persisted. So I decided to go along with the existing naming rather than go on a renaming campaign. Happy to make some changes here though if you like.
In the meantime I'll add some more documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like maybe we've never loved the name "attachment":
#3974 (review)
Other early (~5 years ago) PRs have comments that sound confused between the concept of handle attachment vs. sending the BlobAttach op. So agree would be nice to rename if we get a chance.
| ...serializableBlobRecord, | ||
| blob: stringToBuffer(serializableBlobRecord.blob, "base64"), | ||
| }; | ||
| this.localBlobCache.set(localId, localBlobRecord); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't these need to go into pendingBlobsWithAttachedHandles? it's not really clear to me what you mean by attach. attach means the object is part of the persisted object graph via a stored reference to its handle. for payload pending i wouldn't consider the blob and object to have separate attach states, both are attached when the handle is stored, the object just might not be available yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, yes they need to go into pendingBlobsWithAttachedHandles to round trip them back out to pending state again if we call getPendingBlobs() before the pending blobs finish sharing. Adding in next.
Both legacy and payloadPending still have to complete the BlobAttach op, which happens separately from handle attach. So they are handle-attached when the handle is stored, and it's guaranteed that the legacy flow will also be blob-attached at that time, but payloadPending blobs will not be blob-attached at that time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah my next PR will be fleshing out all the getPendingBlobs tests
| if (this.pendingBlobs.has(localId)) { | ||
| this.sendBlobAttachOp(localId, storageId); | ||
| const { localId, blobId: remoteId } = metadata; | ||
| // Any blob that we're actively trying to advance to attached state must be in attaching state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i might be starting to understand, are you calling the blob op round trip attach? that feels pretty off from how its normally used. the blob op is really just sent to the server know about the blob, and so the summarizer references it in the next summary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is even more confusing for blob that are not payload pending, as just uploading and sending the op don't attach them to that object graph, only storing the handle does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right re: terminology, but the BlobAttach op has more responsibilities than that - it's how remote clients learn about the existence of the blob and add it to their redirectTables.
Its ack is also when we know the TTL is no longer a threat and can exit monitoring/retry loop.
Handle attach is actually relatively unimportant (it just determines when things kick off, and which pending blobs go into getPendingBlobs()), whereas blob attach is super central the the main function of BlobManager.
| }; | ||
| this.pendingBlobs.set(localId, pendingEntry); | ||
| this.pendingBlobsWithAttachedHandles.add(localId); | ||
| const uploadAndAttachP = this.uploadAndAttach(localId, signal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think we should support aborting a blob upload after the handle is attached
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is especially true as the abort is kind of a lie anyway, we can't stop the upload. we only added abort for the case where closeAndGetPendingState was call, such that the app could signal which blobs not to wait for attachment, with payload pending that's not a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it really hurts anything, as presumably anyone who would want to call abort is also deleting the handle from wherever they stored it. It's deterministic behavior that a customer can reason about, whereas "abort sometimes doesn't work when I call it" would just lead to mystery bugs on the customer side.
Really we should just add driver-level support for abort so we can actually abort the upload, which would simplify the logic here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the possible harm is someone reusing an abort controller, and aborting after the handle is returned, and not realizing it also invalidates the handle. usually abort signals only abort things under the context of the methods they are passed to. whereas in this case the method will return, but the abort signal continues to be bound to internal processes that consumer isn't and shouldn't be aware of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The consumer is already aware of the handle's failed share status via payloadShareError and the relevant events. Aborting the signal updates that and emits in the same predictable way as for any other reason the share might fail.
AB#49640
This PR is a major refactor of BlobManager's tracking of pending blobs. This results in unified tracking between legacy/payloadPending flows. Overall it's simpler and more airtight, e.g. fixing a bug that we were resubmitting BlobAttach ops for aborted blobs. Abort support is added for payloadPending blobs.
getPendingBlobs()is now fully supported. Loading with pending blobs for read-only purposes is now supported (e.g. for frozen container scenarios), as well as then proceeding to resume sharing those blobs via the newsharePendingBlobs()method (e.g. for restoring after a tab close).I've added lots of comments in the code that should explain everything, but at a high level the meat of it is
uploadAndAttach()and thelocalBlobCachestructure. It is called from both legacy/payloadPending flows, and in both normal creation andsharePendingBlobs()flows. It has two subroutinesensureUploadedandtryAttachwhich advance pending blobs' state in a semi-linear fashion (progress can move backwards in the case of TTL expiration). Errors and aborts throw out of the function to either be handled directly by the customer in legacy flows, or to update BlobHandle state in payloadPending flows. It leans heavily on internal eventing to detect external state changes that affect the create flow, so that the main path can be simple and well-contained inuploadAndAttach().This change updates the already-enabled tests to reflect these changes and add coverage for abort, but does not add coverage for the new getPendingBlobs scenarios. Open to opinions on whether we think that additional coverage should be a prerequisite for checkin or not (since it is still an experimental feature).