-
Notifications
You must be signed in to change notification settings - Fork 319
Storage Partitioned Transfer Base #3340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
sdk/storage/azure_storage_blob/src/streams/partitioned_stream.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements foundational components for partitioned upload and copy operations in Azure Storage Blob SDK, introducing stream partitioning and concurrent operation execution capabilities.
- Adds
PartitionedStreamthat converts aSeekableStreaminto partitionedByteschunks for block operations - Implements
run_all_with_concurrency_limit()for executing async operations with configurable concurrency - Includes comprehensive test coverage for both components
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/storage/azure_storage_blob/src/streams/partitioned_stream.rs | New implementation of PartitionedStream with Stream and FusedStream traits, plus test suite |
| sdk/storage/azure_storage_blob/src/streams/mod.rs | Module declaration for streams |
| sdk/storage/azure_storage_blob/src/partitioned_transfer/mod.rs | New implementation of run_all_with_concurrency_limit() with concurrency control logic and tests |
| sdk/storage/azure_storage_blob/src/lib.rs | Module declarations for new partitioned_transfer and streams modules |
| sdk/storage/azure_storage_blob/Cargo.toml | Added bytes and futures dependencies (and rand for tests) |
| Cargo.lock | Lock file updates reflecting new dependencies |
sdk/storage/azure_storage_blob/src/streams/partitioned_stream.rs
Outdated
Show resolved
Hide resolved
sdk/storage/azure_storage_blob/src/streams/partitioned_stream.rs
Outdated
Show resolved
Hide resolved
sdk/storage/azure_storage_blob/src/streams/partitioned_stream.rs
Outdated
Show resolved
Hide resolved
Accept useful generated comments. Co-authored-by: Copilot <[email protected]>
generated docs tried to write a doctest for a non-public function.
| fn take(&mut self) -> Vec<u8> { | ||
| let mut ret = mem::replace( | ||
| &mut self.buf, | ||
| vec![0u8; std::cmp::min(self.partition_len, self.inner.len() - self.total_read)], | ||
| ); | ||
| ret.truncate(self.buf_offset); | ||
| self.buf_offset = 0; | ||
| ret | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're already taking a dependency on Bytes. Use it. It already has all the functionality for this. At the very least, don't repeat the calculation of buf.len() that you'd already have computed during construction. Bytes has already been well-tested. I see potential faults here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @heaths, I can't for the life of me figure out how to get the slice of BytesMut between len() and capacity(). I know I need spare_capacity_mut() but I cannot get the resulting &mut [MaybeUninit<u8>] into &mut [u8]. Every time I think I have it figured out it depends on nightly Rust. If you could please just link a docs page or give me a snippet with the appropriate use lines included, I can finish this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That Bytes::spare_capacity_mut() returns a [MaybeUninit<u8>] isn't meant to be an inconvenience but to help enforce safety. Since you're basically trying to make an out pointer it seems, did you look at MaybeUninit's https://doc.rust-lang.org/nightly/core/mem/union.MaybeUninit.html#out-pointers? Returning a Vec<MaybeUninit<u8>> is the right thing to do, in which case you can create the Vec using https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.from_raw_parts. Both are unsafe, which simply means you have to guarantee their safety in use so a comment explaining why you're sure it's safe is common.
My point, though, was that you're repeating much of what Bytes already has. Did you take a look at its APIs: https://docs.rs/bytes? We also already have a ByteStream that might give you want you need for the stream itself. If not, and any changes you need could be useful elsewhere, we can update core.
My original comment was just to avoid duplicating what's already been done and well-tested. If neither meet your needs than this code seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @heaths, I've opted for transmute instead of from_raw_parts. The code ends up looking far cleaner and I think a conversion from &mut [MaybeUninit<u8>] to &mut [u8] is about as clean as it gets. Hopefully the comments do a good job explaining my work.
sdk/storage/azure_storage_blob/src/streams/partitioned_stream.rs
Outdated
Show resolved
Hide resolved
| Poll::Ready(Some(Ok(Bytes::from(ret)))) | ||
| }; | ||
| } else { | ||
| match ready!(pin!(&mut this.inner).poll_read(cx, &mut this.buf[this.buf_offset..])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still say you should use #[pin_project] like we do elsewhere. Makes everything easier and the code cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am working on integrating that along with the Bytes usage.
| fn take(&mut self) -> Vec<u8> { | ||
| let mut ret = mem::replace( | ||
| &mut self.buf, | ||
| vec![0u8; std::cmp::min(self.partition_len, self.inner.len() - self.total_read)], | ||
| ); | ||
| ret.truncate(self.buf_offset); | ||
| self.buf_offset = 0; | ||
| ret | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That Bytes::spare_capacity_mut() returns a [MaybeUninit<u8>] isn't meant to be an inconvenience but to help enforce safety. Since you're basically trying to make an out pointer it seems, did you look at MaybeUninit's https://doc.rust-lang.org/nightly/core/mem/union.MaybeUninit.html#out-pointers? Returning a Vec<MaybeUninit<u8>> is the right thing to do, in which case you can create the Vec using https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.from_raw_parts. Both are unsafe, which simply means you have to guarantee their safety in use so a comment explaining why you're sure it's safe is common.
My point, though, was that you're repeating much of what Bytes already has. Did you take a look at its APIs: https://docs.rs/bytes? We also already have a ByteStream that might give you want you need for the stream itself. If not, and any changes you need could be useful elsewhere, we can update core.
My original comment was just to avoid duplicating what's already been done and well-tested. If neither meet your needs than this code seems fine.
API Change CheckAPIView identified API level changes in this PR and created the following API reviews |
| match ready!(this.inner.as_mut().poll_read(cx, unsafe { | ||
| // spare_capacity_mut() gives us the known remaining capacity of BytesMut. | ||
| // Those bytes are valid reserved memory but have had no values written | ||
| // to them. Those are the exact bytes we want to write into. | ||
| // This transmuted data is not saved to a variable, leaving it inaccessible | ||
| // to anything but poll_read(). | ||
| mem::transmute::<&mut [MaybeUninit<u8>], &mut [u8]>(this.buf.spare_capacity_mut()) | ||
| })) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, if we really don't want transmute in here, this is what we'll end up with:
let spare_capacity = this.buf.spare_capacity_mut();
match ready!(this.inner.as_mut().poll_read(cx, unsafe {
// spare_capacity_mut() gives us the known remaining capacity of BytesMut.
// Those bytes are valid reserved memory but have had no values written
// to them. Those are the exact bytes we want to write into.
&mut Vec::<u8>::from_raw_parts(
spare_capacity.as_mut_ptr() as *mut u8,
spare_capacity.len(),
spare_capacity.len(),
)
// mem::transmute::<&mut [MaybeUninit<u8>], &mut [u8]>(this.buf.spare_capacity_mut())
})) {@heaths happy to make this change if we are unwilling to touch transmute but to my eyes the usage here is extremely safe and it doesn't seem like from_raw_parts is much safer in the first place.
Implements two foundation components to implement partitioned upload and copy.
PartitionedStream: Consumes aBox<dyn SeekableStream>and converts it to aStream<Item = Result<Bytes, Error>>where eachOk(Bytes)returned is a contiguously buffered partition to be used for a put block or equivalent request.run_all_with_concurrency_limit(): Takes a sequence of async jobs (impl FnOnce() -> impl Future<Output = Result<(), Error>>). These will be sequences of put block operations or equivalent requests.