Skip to content

Conversation

@jaschrep-msft
Copy link
Member

Implements two foundation components to implement partitioned upload and copy.

  • PartitionedStream: Consumes a Box<dyn SeekableStream> and converts it to a Stream<Item = Result<Bytes, Error>> where each Ok(Bytes) returned is a contiguously buffered partition to be used for a put block or equivalent request.
  • run_all_with_concurrency_limit(): Takes a sequence of async jobs (impl FnOnce() -> impl Future<Output = Result<(), Error>>). These will be sequences of put block operations or equivalent requests.

@github-actions github-actions bot added the Storage Storage Service (Queues, Blobs, Files) label Nov 12, 2025
@jaschrep-msft jaschrep-msft marked this pull request as ready for review November 19, 2025 22:01
Copilot AI review requested due to automatic review settings November 19, 2025 22:01
Copilot finished reviewing on behalf of jaschrep-msft November 19, 2025 22:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements foundational components for partitioned upload and copy operations in Azure Storage Blob SDK, introducing stream partitioning and concurrent operation execution capabilities.

  • Adds PartitionedStream that converts a SeekableStream into partitioned Bytes chunks for block operations
  • Implements run_all_with_concurrency_limit() for executing async operations with configurable concurrency
  • Includes comprehensive test coverage for both components

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
sdk/storage/azure_storage_blob/src/streams/partitioned_stream.rs New implementation of PartitionedStream with Stream and FusedStream traits, plus test suite
sdk/storage/azure_storage_blob/src/streams/mod.rs Module declaration for streams
sdk/storage/azure_storage_blob/src/partitioned_transfer/mod.rs New implementation of run_all_with_concurrency_limit() with concurrency control logic and tests
sdk/storage/azure_storage_blob/src/lib.rs Module declarations for new partitioned_transfer and streams modules
sdk/storage/azure_storage_blob/Cargo.toml Added bytes and futures dependencies (and rand for tests)
Cargo.lock Lock file updates reflecting new dependencies

jaschrep-msft and others added 5 commits November 19, 2025 22:21
Accept useful generated comments.

Co-authored-by: Copilot <[email protected]>
generated docs tried to write a doctest for a non-public function.
Comment on lines 36 to 44
fn take(&mut self) -> Vec<u8> {
let mut ret = mem::replace(
&mut self.buf,
vec![0u8; std::cmp::min(self.partition_len, self.inner.len() - self.total_read)],
);
ret.truncate(self.buf_offset);
self.buf_offset = 0;
ret
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're already taking a dependency on Bytes. Use it. It already has all the functionality for this. At the very least, don't repeat the calculation of buf.len() that you'd already have computed during construction. Bytes has already been well-tested. I see potential faults here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @heaths, I can't for the life of me figure out how to get the slice of BytesMut between len() and capacity(). I know I need spare_capacity_mut() but I cannot get the resulting &mut [MaybeUninit<u8>] into &mut [u8]. Every time I think I have it figured out it depends on nightly Rust. If you could please just link a docs page or give me a snippet with the appropriate use lines included, I can finish this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That Bytes::spare_capacity_mut() returns a [MaybeUninit<u8>] isn't meant to be an inconvenience but to help enforce safety. Since you're basically trying to make an out pointer it seems, did you look at MaybeUninit's https://doc.rust-lang.org/nightly/core/mem/union.MaybeUninit.html#out-pointers? Returning a Vec<MaybeUninit<u8>> is the right thing to do, in which case you can create the Vec using https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.from_raw_parts. Both are unsafe, which simply means you have to guarantee their safety in use so a comment explaining why you're sure it's safe is common.

My point, though, was that you're repeating much of what Bytes already has. Did you take a look at its APIs: https://docs.rs/bytes? We also already have a ByteStream that might give you want you need for the stream itself. If not, and any changes you need could be useful elsewhere, we can update core.

My original comment was just to avoid duplicating what's already been done and well-tested. If neither meet your needs than this code seems fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @heaths, I've opted for transmute instead of from_raw_parts. The code ends up looking far cleaner and I think a conversion from &mut [MaybeUninit<u8>] to &mut [u8] is about as clean as it gets. Hopefully the comments do a good job explaining my work.

Poll::Ready(Some(Ok(Bytes::from(ret))))
};
} else {
match ready!(pin!(&mut this.inner).poll_read(cx, &mut this.buf[this.buf_offset..]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still say you should use #[pin_project] like we do elsewhere. Makes everything easier and the code cleaner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am working on integrating that along with the Bytes usage.

Comment on lines 36 to 44
fn take(&mut self) -> Vec<u8> {
let mut ret = mem::replace(
&mut self.buf,
vec![0u8; std::cmp::min(self.partition_len, self.inner.len() - self.total_read)],
);
ret.truncate(self.buf_offset);
self.buf_offset = 0;
ret
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That Bytes::spare_capacity_mut() returns a [MaybeUninit<u8>] isn't meant to be an inconvenience but to help enforce safety. Since you're basically trying to make an out pointer it seems, did you look at MaybeUninit's https://doc.rust-lang.org/nightly/core/mem/union.MaybeUninit.html#out-pointers? Returning a Vec<MaybeUninit<u8>> is the right thing to do, in which case you can create the Vec using https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.from_raw_parts. Both are unsafe, which simply means you have to guarantee their safety in use so a comment explaining why you're sure it's safe is common.

My point, though, was that you're repeating much of what Bytes already has. Did you take a look at its APIs: https://docs.rs/bytes? We also already have a ByteStream that might give you want you need for the stream itself. If not, and any changes you need could be useful elsewhere, we can update core.

My original comment was just to avoid duplicating what's already been done and well-tested. If neither meet your needs than this code seems fine.

@jaschrep-msft jaschrep-msft requested a review from heaths December 3, 2025 19:32
@github-actions
Copy link

github-actions bot commented Dec 3, 2025

API Change Check

APIView identified API level changes in this PR and created the following API reviews

azure_storage_blob

Comment on lines +62 to +69
match ready!(this.inner.as_mut().poll_read(cx, unsafe {
// spare_capacity_mut() gives us the known remaining capacity of BytesMut.
// Those bytes are valid reserved memory but have had no values written
// to them. Those are the exact bytes we want to write into.
// This transmuted data is not saved to a variable, leaving it inaccessible
// to anything but poll_read().
mem::transmute::<&mut [MaybeUninit<u8>], &mut [u8]>(this.buf.spare_capacity_mut())
})) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, if we really don't want transmute in here, this is what we'll end up with:

let spare_capacity = this.buf.spare_capacity_mut();
match ready!(this.inner.as_mut().poll_read(cx, unsafe {
    // spare_capacity_mut() gives us the known remaining capacity of BytesMut.
    // Those bytes are valid reserved memory but have had no values written
    // to them. Those are the exact bytes we want to write into.
    &mut Vec::<u8>::from_raw_parts(
        spare_capacity.as_mut_ptr() as *mut u8,
        spare_capacity.len(),
        spare_capacity.len(),
    )
    // mem::transmute::<&mut [MaybeUninit<u8>], &mut [u8]>(this.buf.spare_capacity_mut())
})) {

@heaths happy to make this change if we are unwilling to touch transmute but to my eyes the usage here is extremely safe and it doesn't seem like from_raw_parts is much safer in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Storage Storage Service (Queues, Blobs, Files)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants