Update unmarshallers to use system text json. #3553

Open: peterrsongg wants to merge 4 commits into petesong/stj from petesong/stj-deserialization-2

peterrsongg (Contributor) commented Nov 19, 2024

Description

Updates the unmarshallers to use System.Text.Json for deserialization.

There are a lot of generated service changes. You can look at SQS as an example, since they all follow the same change. The main thing to review is any change in Core.

5ccb4c6 - generated SQS changes as an example
3654c5d - manual s3 changes
18c64c5 - core and unit tests changes for deserialization
e2072fa - generator changes

NOTE: To run the unit tests I added, you need to run the generator first. Use these arguments to generate the services required to run the unit tests in CoreAndCustomUnitTests.Netframework.sln:

-sm autoscaling;bearer-token-auth-test;cloudfront;monitoring;dynamodb;ec2;elasticmapreduce;elastictranscoder;iot-data;kms;redshift;s3;sts;sso;sso-oidc -sro

All protocol tests pass.

Some things I want to point out:

  1. Since Utf8JsonReader cannot accept a stream in its constructor, I created a wrapper around Utf8JsonReader that accepts a stream and fills a buffer to pass to the reader. Because the wrapper is also a ref struct, it can hold the Utf8JsonReader as a field.
  2. Added AWSConfigs.StreamingUtf8JsonReaderBufferSize as a config option in case a user wants to tweak the default buffer size used by the reader.
  3. Each unmarshaller now implements only the interface for its respective protocol, which means I had to split IUnmarshaller into IXmlUnmarshaller and IJsonUnmarshaller. An added benefit is that structure unmarshallers no longer throw NotImplementedException for the other protocol.
    This is necessary for when we add new protocols down the line: we don't want each structure unmarshaller to implement every protocol interface just to throw NotImplementedException for the others.
  4. Split the Dictionary, List, and KeyValue unmarshallers into separate JSON and XML unmarshallers, since each protocol only implements its own interface.
  5. Split the error unmarshallers into XML and JSON error response unmarshaller classes, since the JSON error response unmarshallers require a reader to be passed in.
  6. Since S3 is hand-coded, I had to make some custom changes, specifically for the XmlErrorResponse instantiations.
  7. Due to a compatibility issue with AWS Tools for PowerShell, the netstandard2.0 and netframework targets must use System.Text.Json 6.0.11.
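The wrapper idea in item 1 can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the SDK's StreamingUtf8JsonReader: the type name SimpleStreamingJsonReader, its members and the 4096-byte default are assumptions. It keeps a pooled buffer, hands a span of it to a Utf8JsonReader, and when Read() runs out of data it shifts the unconsumed bytes to the front, doubles the buffer if a single token fills it, refills from the stream, and recreates the reader from CurrentState. UTF-8 BOM handling is omitted.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text.Json;

// Hypothetical sketch of a stream-backed wrapper over Utf8JsonReader.
// Not the SDK's StreamingUtf8JsonReader; names and defaults are assumptions.
public ref struct SimpleStreamingJsonReader
{
    private readonly Stream _stream;
    private byte[] _buffer;          // rented from ArrayPool<byte>.Shared
    private int _dataLength;         // number of valid bytes currently in _buffer
    private bool _endOfStream;
    private Utf8JsonReader _reader;  // allowed only because this type is itself a ref struct

    public SimpleStreamingJsonReader(Stream stream, int bufferSize = 4096)
    {
        _stream = stream ?? throw new ArgumentNullException(nameof(stream));
        _buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
        _dataLength = 0;
        _endOfStream = false;
        _reader = default;
        Refill(0, default);
    }

    public JsonTokenType TokenType => _reader.TokenType;

    public string GetString() => _reader.GetString();

    public bool Read()
    {
        // Utf8JsonReader.Read() returns false when the span ends mid-token,
        // or at the real end of the JSON when the span is the final block.
        while (!_reader.Read())
        {
            if (_endOfStream)
                return false;
            Refill((int)_reader.BytesConsumed, _reader.CurrentState);
        }
        return true;
    }

    private void Refill(int consumed, JsonReaderState state)
    {
        // Shift the unconsumed tail to the front (Span.CopyTo handles overlap).
        int leftover = _dataLength - consumed;
        _buffer.AsSpan(consumed, leftover).CopyTo(_buffer);

        // Nothing consumed and the buffer is full: one token is larger than the buffer.
        if (leftover == _buffer.Length)
        {
            byte[] larger = ArrayPool<byte>.Shared.Rent(_buffer.Length * 2);
            _buffer.AsSpan(0, leftover).CopyTo(larger);
            ArrayPool<byte>.Shared.Return(_buffer);
            _buffer = larger;
        }

        // Stream.Read may return fewer bytes than requested, so loop until full or EOF.
        _dataLength = leftover;
        while (_dataLength < _buffer.Length)
        {
            int read = _stream.Read(_buffer, _dataLength, _buffer.Length - _dataLength);
            if (read == 0)
            {
                _endOfStream = true;
                break;
            }
            _dataLength += read;
        }

        _reader = new Utf8JsonReader(_buffer.AsSpan(0, _dataLength), _endOfStream, state);
    }

    public void Dispose()
    {
        // Give the rented array back to the pool exactly once.
        if (_buffer != null)
        {
            ArrayPool<byte>.Shared.Return(_buffer);
            _buffer = null;
        }
    }
}
```

Because the sketch holds a pooled array, a caller would call Dispose() from a finally block so the buffer goes back to the pool even if parsing throws.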

Motivation and Context

Testing

I tested a couple of real API calls. I added unit tests for StreamingUtf8JsonReader, and all the protocol tests pass.

A full dry run will be done in the feature branch.

Screenshots (if appropriate)

Overall, I saw around a 77% performance gain in unmarshalling (note: this was measured on my machine, not an EC2 instance). I will attach specific numbers from an EC2 instance as a follow-up.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed

License

  • I confirm that this pull request can be released under the Apache 2 license

peterrsongg force-pushed the petesong/stj-deserialization-2 branch from c01a7d9 to ebc898a November 19, 2024 22:24
peterrsongg marked this pull request as draft November 19, 2024 22:26
slang25 (Contributor) commented Nov 19, 2024

this looks fantastic, very excited for this change

{
bytesRead = stream.Read(buffer, 0, buffer.Length);
}
if (bytesRead == 0)

Contributor comment:

Should the method return here? I worry that the following line sets the reader using the buffer that is being returned to the pool.

Contributor Author reply:

Ah, yeah, good suggestion: the reader would be using a buffer returned to the pool in that scenario. If I return there, though, control goes back to the Read() method and _reader won't have a buffer to work with, and I don't want to set the buffer to null. Instead, I'll refactor a bit and add something like the following to the Read() method:

            if (!hasMoreData)
            {
                GetMoreBytesFromStream(_stream, ref _buffer, ref _reader);
                hasMoreData = _reader.Read();
                if (_reader.IsFinalBlock)
                   ArrayPool<byte>.Shared.Return(_buffer);
            }

Contributor Author reply:

While fixing a bug in the streaming UTF-8 reader, I decided to remove the pooling because the array can be resized multiple times. I put a justification in a comment, but basically the byte array returned by Array.Resize isn't owned by the pool, so tracking it didn't seem worth the extra renting and copying that would be required.

Contributor Author reply:

flipped back to pooling

@@ -63,54 +63,58 @@ public IRequest Marshall(AddPermissionRequest publicRequest)
request.HttpMethod = "POST";

request.ResourcePath = "/";
using (MemoryStream memoryStream = new MemoryStream())
#if NETCOREAPP3_1_OR_GREATER

Contributor comment:

This might be stretching the scope of this change too much, but would it be possible to make the source of the MemoryStream/IBufferWriter<T> pluggable?

I'd really love to add my own pipeline middleware that allows for injecting a RecyclableMemoryStream (which derives from MemoryStream and implements IBufferWriter<T>).

normj (Member) commented Nov 21, 2024:

My interpretation of this is we would have a property on AWSConfigs called StreamFactory of type IStreamFactory defined as:

public interface IStreamFactory
{
    Stream GetStream();
}

For .NET Core 3.1 and above we would default this to an instance of ArrayBufferWriterFactory:

internal class ArrayBufferWriterFactory : IStreamFactory
{
    public Stream GetStream()
    {
        // TODO: Do something like this class to wrap the ArrayBufferWriter as a Stream. https://github.com/CommunityToolkit/dotnet/blob/c23e1cc6e918f86c589facf98f70ca122495c385/src/CommunityToolkit.HighPerformance/Streams/IBufferWriterStream%7BTWriter%7D.cs#L18
        return new ArrayBufferWriter<byte>();
    }
}

For older targets we would use a MemoryStream based factory:

internal class MemoryStreamFactory : IStreamFactory
{
    public Stream GetStream()
    {
        return new MemoryStream();
    }
}

In our unmarshallers where we need a stream, we could get rid of the conditional check and just call AWSConfigs.StreamFactory.GetStream().

Then @slang25 can take a dependency on Microsoft.IO.RecyclableMemoryStream, write his own factory, and assign it to AWSConfigs.StreamFactory.

internal class RecyclableMemoryStreamFactory : IStreamFactory
{
    private static readonly RecyclableMemoryStreamManager manager = new RecyclableMemoryStreamManager();

    public Stream GetStream()
    {
        return manager.GetStream();
    }
}

Does that capture your thoughts @slang25 ?

peterrsongg (Contributor Author) commented Nov 22, 2024:

Interesting. I just took a look at the linked class, which wraps an IBufferWriter as a Stream, and at first glance it doesn't seem too bad. I'm a bit worried about overriding all the methods defined in the base Stream class, though: I wonder how the buffer writer interacts with those methods, and how that in turn interacts with the JsonWriter, if at all. Maybe I'm overthinking it. Is it as performant, and does it behave the same? I'll hold off until I hear back on whether this is what you were thinking.

Contributor comment:

What I'm looking for is a way to get the best perf out of the AWS SDK, so having the ability to pool these buffers that are going to copy all the bytes into memory is going to reduce memory allocations significantly.

My understanding is that IBufferWriter<T> is the fast, modern way to do synchronous buffered writing, whereas Stream is more ubiquitous in the .NET world today.

That said, I think a Stream GetStream() signature would work, but you could additionally do a type check to see if the stream implements IBufferWriter<T> and then use the appropriate overloads. That makes it convenient for use with Streams and gives a fast path for things like RecyclableMemoryStreamManager. To be honest, this isn't a deal breaker; I haven't run benchmarks, but I'm guessing the gains are not tremendous.

The problem with a static IStreamFactory is that I want to be able to pool the buffers and therefore want a way to return them. So the AWSConfigs static property would need an interface to support that for it to achieve what I'm looking for.

I was thinking about having a PipelineHandler that could be added to the pipeline, with the stream set on some sort of context object that passes through. That way the handler could own the creation and pooling of buffers, as well as returning them. However, a static property would be more ergonomic.
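The type-check idea above could look roughly like the following sketch. IPooledStreamFactory, MemoryStreamFactory and JsonBodyWriter are hypothetical names, not SDK APIs. The factory has a Return method so that a pooling implementation (for example one backed by RecyclableMemoryStreamManager) gets its buffers back, and Utf8JsonWriter's IBufferWriter<byte> constructor is used whenever the rented stream also implements that interface. It assumes that, for such streams, bytes committed through IBufferWriter<byte> are reflected in the stream's Length.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text.Json;

// Hypothetical factory: unlike a bare GetStream(), it lets the caller give the stream back.
public interface IPooledStreamFactory
{
    Stream Rent();
    void Return(Stream stream);
}

// Default implementation with no pooling.
public sealed class MemoryStreamFactory : IPooledStreamFactory
{
    public Stream Rent() => new MemoryStream();
    public void Return(Stream stream) => stream.Dispose();
}

public static class JsonBodyWriter
{
    // Fast path when the stream is also an IBufferWriter<byte>;
    // otherwise Utf8JsonWriter buffers internally and writes to the Stream on Flush.
    public static Utf8JsonWriter CreateWriter(Stream stream) =>
        stream is IBufferWriter<byte> bufferWriter
            ? new Utf8JsonWriter(bufferWriter)
            : new Utf8JsonWriter(stream);

    public static void WriteAndSend(IPooledStreamFactory factory, Action<Utf8JsonWriter> write, Action<Stream> send)
    {
        Stream stream = factory.Rent();
        try
        {
            using (Utf8JsonWriter writer = CreateWriter(stream))
            {
                write(writer);
                writer.Flush(); // commit the JSON to the stream or buffer writer
            }
            stream.Position = 0;
            send(stream);
        }
        finally
        {
            factory.Return(stream); // handed back even if writing or sending throws
        }
    }
}
```

Keeping Rent and Return on the same object is what allows a pooling factory to recycle buffers; a static property would still work as long as its type exposes both operations.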

peterrsongg force-pushed the petesong/stj-deserialization-2 branch from 231beda to fe0da63 November 20, 2024 06:37
peterrsongg marked this pull request as ready for review November 22, 2024 07:30
peterrsongg force-pushed the petesong/stj-deserialization-2 branch from ae62725 to 12fb049 December 2, 2024 21:58

/// <summary>
/// Key for the UseSdkCache property.
/// <seealso cref="Amazon.AWSConfigs.UseSdkCache"/>
/// Configures the default buffer size for the <see cref="Amazon.Runtime.Internal.Util.StreamingUtf8JsonReader"/>

Member comment:

Since this cref points to an internal, undocumented class, I would remove "for the <see cref="Amazon.Runtime.Internal.Util.StreamingUtf8JsonReader"/>".


namespace Amazon.Runtime.Documents.Internal.Transform
{
/// <summary>
/// Dedicated <see cref="IUnmarshaller{T,R}"/> for <see cref="Document"/>.
/// Dedicated <see cref="IXmlUnmarshaller{T,R}"/> for <see cref="Document"/>.

Member comment:

Should this be IJsonUnmarshaller instead of IXmlUnmarshaller?


if (bytesRead == 0)
{
ArrayPool<byte>.Shared.Return(buffer);

Member comment:

Side question but if we have an Exception while unmarshalling do we need a mechanism to make sure the buffer is returned?

I can't remember what happens with ArrayPool if the shared array objects are never returned. My guess is that the non-returned buffer is garbage collected and ArrayPool would eventually create a new buffer to match the demand. Does that sound right to you?

If that assumption is true, then maybe it is okay to let the buffers get lost on exceptions, unless it is fairly easy to make sure the buffer is returned. I guess you could have a finalizer for the class and return the buffer if it hasn't already been returned.
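On the exception question: the ArrayPool<T>.Rent documentation says that failing to return a rented buffer is not a fatal error, though the pool may need to create a new buffer to replace the lost one, which matches the guess above. A finalizer is not available for the reader itself, because C# structs (including ref structs) cannot declare finalizers. The following is a minimal sketch, using a hypothetical PooledBufferOwner type, of returning the buffer deterministically from a finally block; CountingPool is a test-only helper that merely counts calls.

```csharp
using System;
using System.Buffers;

// Hypothetical owner of a pooled buffer. Structs cannot declare finalizers,
// so the buffer is returned deterministically through Dispose().
public ref struct PooledBufferOwner
{
    private readonly ArrayPool<byte> _pool;
    private byte[] _buffer;

    public PooledBufferOwner(ArrayPool<byte> pool, int minimumLength)
    {
        _pool = pool;
        _buffer = pool.Rent(minimumLength);
    }

    public void Dispose()
    {
        // Return at most once; a second call is a no-op.
        if (_buffer != null)
        {
            _pool.Return(_buffer);
            _buffer = null;
        }
    }
}

// Test helper: allocates fresh arrays and counts Rent/Return calls.
public sealed class CountingPool : ArrayPool<byte>
{
    public int Rented { get; private set; }
    public int Returned { get; private set; }

    public override byte[] Rent(int minimumLength)
    {
        Rented++;
        return new byte[minimumLength];
    }

    public override void Return(byte[] array, bool clearArray = false)
    {
        Returned++;
    }
}

public static class UnmarshallSketch
{
    // Simulates an unmarshaller that may fail part-way through.
    public static void Unmarshall(ArrayPool<byte> pool, bool fail)
    {
        var owner = new PooledBufferOwner(pool, 4096);
        try
        {
            if (fail)
                throw new InvalidOperationException("simulated unmarshalling failure");
            // ... read from the buffer here ...
        }
        finally
        {
            owner.Dispose(); // runs on both the success path and the exception path
        }
    }
}
```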

[TestMethod]
public void HandlesUtf8BOM()
{
// we can't use reflection to access the private fields of StreamingUtf8JsonReader since it is a ref struct so we have to test it this way.

Member comment:

You could make the methods internal in StreamingUtf8JsonReader and then, since the unit test project has access via the InternalsVisibleToAttribute, test the handle method directly. Following this pattern you could also have tests directly on FillBuffer and GetMoreBytesFromStream.

Contributor Author reply:

I see the value in testing FillBuffer directly, but I don't really see the value in testing GetMoreBytesFromStream directly, since that method is only called in the context of Read(). The test where I make a single JSON token of the default buffer size + 500 covers the array resizing logic, and the smaller buffer size covers the reader.BytesConsumed < buffer.Length logic. However, I could add one additional test where the stream is exactly the size of the buffer.

So I'd like to keep as many methods private as I can, but if you feel strongly about this I can change them to internal. I just don't see what additional testing I would do other than the exact-buffer-size case.

// here we're creating a json string that is greater than the default buffer size to test the GetMoreBytesFromStream logic
var sb = new StringBuilder();
sb.Append("{ \"key\": \"");
sb.Append(new string('x', 7500));

Member comment:

In case we increase the default size but don't change the value here, you might want to reference the AWSConfigs value plus some extra size.
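A minimal sketch of that suggestion, using a hypothetical helper name and taking the buffer size as a parameter (in the real test it would come from AWSConfigs.StreamingUtf8JsonReaderBufferSize or the reader's default). The payload grows automatically with the configured size, so the test keeps exercising the resize path.

```csharp
using System.Text;

public static class OversizedJsonPayload
{
    // Builds a JSON object whose single string token is at least `extra` bytes
    // longer than bufferSize, so reading it always forces a buffer resize.
    public static byte[] Create(int bufferSize, int extra = 500)
    {
        var sb = new StringBuilder();
        sb.Append("{ \"key\": \"");
        sb.Append('x', bufferSize + extra);
        sb.Append("\" }");
        return Encoding.UTF8.GetBytes(sb.ToString());
    }
}
```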

/// This class just tests the wrapper class StreamingUtf8JsonReader.
/// </summary>
[TestClass]
public class StreamingUtf8JsonReaderTests

Member comment:

We should have a test for the scenario where PassReaderByRef is used to convert to a JsonDocument and the document doesn't fit in the buffer.

string largeJson = sb.ToString();

byte[] payload = Encoding.UTF8.GetBytes(largeJson);
using (var stream = new MemoryStream(payload))

Member comment:

To help verification you might want to add an internal property on the StreamingUtf8JsonReader for the current length of the buffer. Then you can check before and after that the buffer size increased.

Contributor Author reply:

I tried to do this, but the problem is that once we hit the final block we set the buffer to null, so getting the buffer's size then throws a NullReferenceException.

peterrsongg requested a review from normj December 17, 2024 19:11
/// </summary>
/// <param name="reader">The Utf8JsonReader</param>
/// <returns></returns>
public abstract bool Read(ref StreamingUtf8JsonReader reader);

Member comment:

Since the XML and JSON UnmarshallerContext subclasses have different read methods, we shouldn't even bother putting them on the base class. Anybody interacting with the context to read has to work with the object as the subclass anyway. And by putting the abstract methods here, each subclass has to implement both versions, with one of them throwing a NotImplementedException.

Contributor Author reply:

Good point. I removed both the abstract Read() with no parameters and the abstract Read(ref StreamingUtf8JsonReader reader), and moved each one to its own unmarshaller context class. I also moved ReadAtDepth(int targetDepth), since the JSON unmarshaller context's ReadAtDepth accepts an additional argument anyway.
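A minimal sketch of the resulting shape, assuming hypothetical type names (UnmarshallerContextBase, XmlContext, JsonContext, JsonStringUnmarshaller) and using Utf8JsonReader directly in place of the SDK's StreamingUtf8JsonReader. The base context carries no Read() at all, each subclass exposes the read method its protocol needs, and a structure unmarshaller implements only the interface for its own protocol, so nothing has to throw NotImplementedException.

```csharp
using System.Text.Json;
using System.Xml;

// Hypothetical names loosely mirroring the discussion; not the SDK's actual types.
public abstract class UnmarshallerContextBase
{
    // Only protocol-agnostic state lives here; there is no abstract Read().
    public int CurrentDepth { get; protected set; }
}

public sealed class XmlContext : UnmarshallerContextBase
{
    private readonly XmlReader _xmlReader;

    public XmlContext(XmlReader xmlReader) => _xmlReader = xmlReader;

    public bool Read()
    {
        bool hasNode = _xmlReader.Read();
        CurrentDepth = _xmlReader.Depth;
        return hasNode;
    }
}

public sealed class JsonContext : UnmarshallerContextBase
{
    // Utf8JsonReader is a ref struct and cannot be stored on a class, so it is passed by ref.
    public bool Read(ref Utf8JsonReader reader)
    {
        bool hasToken = reader.Read();
        CurrentDepth = reader.CurrentDepth;
        return hasToken;
    }
}

// Each unmarshaller implements only its own protocol's interface.
public interface IXmlUnmarshaller<T>
{
    T Unmarshall(XmlContext context);
}

public interface IJsonUnmarshaller<T>
{
    T Unmarshall(JsonContext context, ref Utf8JsonReader reader);
}

public sealed class JsonStringUnmarshaller : IJsonUnmarshaller<string>
{
    public string Unmarshall(JsonContext context, ref Utf8JsonReader reader)
    {
        context.Read(ref reader);
        return reader.GetString();
    }
}
```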

JsonDocument document = null;
streamingReader.PassReaderByRef((ref Utf8JsonReader reader) =>
{
document = JsonDocument.ParseValue(ref reader);

Member comment:

Relying on a comment telling users not to call this unless the full document contents have been read will lead to hard-to-detect bugs. If we need this method, then we should rework and rename the PassReaderByRef method to create a new Utf8JsonReader over the full content.

Where is this method even being called? I didn't see it anywhere, and if it isn't being called, can we get rid of it?

}

/// <summary>
/// This method should be be u

Member comment:

Comment is cut off but I suspect the real change is to get rid of this method and make the other overload method work in all cases.

/// </summary>
public static class JsonConstants
{
public static ReadOnlySpan<byte> Utf8Bom => new byte[] { 0xEF, 0xBB, 0xBF };

Member comment:

Oh yes, the joys of ref structs. Then split the byte array allocation out of the expression-bodied ReadOnlySpan property. Something like this:

        private static readonly byte[] _utf8BomBytes = new byte[] { 0xEF, 0xBB, 0xBF };
        public static ReadOnlySpan<byte> Utf8Bom => _utf8BomBytes;


_stream = stream;
_buffer = ArrayPool<byte>.Shared.Rent(AWSConfigs.StreamingUtf8JsonReaderBufferSize ?? 4096);
_options = new JsonReaderOptions

Member comment:

Since the options never change, can we make them a private static field of the type to avoid recreating them?

        private static readonly JsonReaderOptions _jsonOptions = new JsonReaderOptions
            {
                AllowTrailingCommas = true
            };

SkipUtf8Bom(ref _buffer);
}

private void SkipUtf8Bom(ref byte[] buffer)

normj (Member) commented Dec 18, 2024:

Realizing this method does more than just skip the UTF-8 BOM (it is also in charge of providing the initial content for the buffer), I think you should just collapse this method into the constructor. That also gets rid of the awkward _reader creation in the constructor that then gets recreated here. Something like this:

public StreamingUtf8JsonReader(Stream stream)
{
    if (stream is null)
        throw new ArgumentException("Stream must not be null. Please initialize a stream and pass it into the constructor.");

    _stream = stream;
    _buffer = ArrayPool<byte>.Shared.Rent(4096);
    _options = new JsonReaderOptions
    {
        AllowTrailingCommas = true
    };

    // Read the initial buffer from the stream and skip over UTF8 BOM if it exists.
    int bytesRead = _stream.Read(_buffer, 0, _buffer.Length);
    int start = 0;
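    // Only inspect the bytes actually read; a rented buffer may contain stale data.
    if (_buffer.AsSpan(0, bytesRead).StartsWith(JsonConstants.Utf8Bom))
    {
        start += JsonConstants.Utf8Bom.Length;
        bytesRead -= JsonConstants.Utf8Bom.Length;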
    }

    _reader = new Utf8JsonReader(_buffer.AsSpan(start, bytesRead), isFinalBlock: bytesRead == 0, new JsonReaderState(_options));
}

Contributor Author reply:

that's a good point about the awkward _reader recreation. I will just collapse it at this point.

// because it was too large to fit in the remainder of the buffer.
ReadOnlySpan<byte> leftover = buffer.AsSpan().Slice((int)reader.BytesConsumed);

if (leftover.Length == buffer.Length)

Member comment:

Okay, but the comment about the if block sits above the leftover variable, while the condition for entering the if block is really whether BytesConsumed is equal to 0. It is an extra layer of logic to understand that if BytesConsumed is 0, then leftover.Length == buffer.Length is true. The logic flow is also hard to read in general because the conditions are spread out across the function, with more variables being created that are only used some of the time. For example, if you are recreating the buffer, you go into the first if block, then skip some code, then go into another block to continue the resize logic if the resized variable is true.

This is my attempt at keeping the related logic close together and getting rid of excess variables. I haven't tested this, though.

int bytesRead = 0;

if (reader.BytesConsumed < buffer.Length)
{
    // If BytesConsumed is 0 then the previous reader.Read() method failed because
    // the next JSON token to read from the stream is too large to fit in the buffer. 
    // That means we need to resize the buffer and try again to read the next JSON token.
    if (reader.BytesConsumed == 0)
    {                        
        // Rent a new buffer twice the size of the previous and fill the new resized buffer with data from the stream
        var resizedBuffer = ArrayPool<byte>.Shared.Rent(Math.Min(int.MaxValue, (buffer.Length * 2)));
        bytesRead = FillBuffer(stream, ref resizedBuffer, 0, resizedBuffer.Length);

        // Recreate ref parameter reader using a span of the data from the new resized buffer.
        var resizedSpan = resizedBuffer.AsSpan(0, bytesRead + buffer.Length);
        reader = new Utf8JsonReader(resizedSpan, isFinalBlock: bytesRead == 0, reader.CurrentState);

        // Return old buffer and update the ref buffer parameter to new resized buffer.
        Logger.GetLogger(typeof(StreamingUtf8JsonReader)).DebugFormat("Resizing buffer from {0} to {1}", buffer.Length, resizedBuffer.Length);
        ArrayPool<byte>.Shared.Return(buffer);
        buffer = resizedBuffer;

        // Do an early return since we have a new Utf8JsonReader with a filled buffer.
        return;
    }
    else
    {
        // Move the unprocessed data from the buffer to the start and then fill the
        // remaining space in the buffer with new content from the stream.
        ReadOnlySpan<byte> leftover = buffer.AsSpan().Slice((int)reader.BytesConsumed);
        leftover.CopyTo(buffer);
        bytesRead = FillBuffer(stream, ref buffer, leftover.Length, buffer.Length - leftover.Length);
    }
}

else
{
    bytesRead = FillBuffer(stream, ref buffer, 0, buffer.Length);
}

peterrsongg (Contributor Author) commented:

I've addressed your feedback, thanks for the detailed look. I also updated the logic in StreamingUtf8JsonReader so that it is easier to follow.

peterrsongg requested a review from normj December 18, 2024 19:21
/// <returns>
/// The text contents of the current token being parsed.
/// </returns>
public override string ReadText()

normj (Member) commented Dec 18, 2024:

See whether this method really has to be defined on the base class. Any time we have code like this, check whether the methods should really be on the base type.

Contributor Author reply:

Oh good catch. This one I just forgot to remove. Since the method signatures are different, I removed it from UnmarshallerContext.


_stream = stream;
_buffer = ArrayPool<byte>.Shared.Rent(AWSConfigs.StreamingUtf8JsonReaderBufferSize ?? 4096);
// need to initialize the reader even if the buffer is empty because auto-default of unassigned fields is only

Member comment:

Delete this orphan comment.

ArrayPool<byte>.Shared.Return(buffer);
buffer = resizedBuffer;
bytesRead = FillBuffer(stream, ref buffer, leftover.Length, buffer.Length - leftover.Length);
var resizedSpan = buffer.AsSpan(0, bytesRead + previousBufferLength);

Member comment:

Again, this is a new, empty buffer that we filled with the previous FillBuffer call. The span should be from 0 to bytesRead; we don't care about previousBufferLength.

Contributor Author reply:

I explained this in the comment above. However, I will remove previousBufferLength, because in this case it is just leftover.Length.

peterrsongg requested a review from normj December 18, 2024 22:22
peterrsongg (Contributor Author) commented Dec 19, 2024:

Based on our internal conversation, I made some updates.

bytesRead = FillBuffer(stream, ref buffer, 0, buffer.Length);
}

if (bytesRead == 0)

normj (Member) commented Dec 19, 2024:

I'm thinking this is getting complicated because, depending on whether the code goes into the else block above that deals with leftover data, we also expect this code to handle creating the reader. What do you think of this version, where whichever condition block we enter does everything it needs to do and doesn't rely on logic elsewhere to finish things up?

I also updated how the Utf8JsonReader is created in the resize condition so that it is marked as the final block if we did not fill the entire new buffer.

private static void GetMoreBytesFromStream(Stream stream, ref byte[] buffer, ref Utf8JsonReader reader)
{
    if (reader.BytesConsumed < buffer.Length)
    {
        ReadOnlySpan<byte> leftover = buffer.AsSpan().Slice((int)reader.BytesConsumed);

        // If BytesConsumed is 0 that means that the previous Read failed because the JSON token was too large to fit in the buffer.
        // In that case we need to resize the buffer and try again to read the JSON token.
        if (reader.BytesConsumed == 0)
        {
            var resizedBuffer = ArrayPool<byte>.Shared.Rent(Math.Min(int.MaxValue, (buffer.Length * 2)));
            Logger.GetLogger(typeof(StreamingUtf8JsonReader)).DebugFormat("Resizing buffer from {0} to {1}", buffer.Length, resizedBuffer.Length);
            // copy over the data from the previous read's buffer to the newly resized buffer.
            buffer.AsSpan().CopyTo(resizedBuffer);
            // return the previous buffer to the pool and set the new buffer to equal the resized buffer.
            ArrayPool<byte>.Shared.Return(buffer);
            buffer = resizedBuffer;
            // fill the new resized buffer with data from the stream. the offset MUST be leftover.Length 
            // so we don't overwrite the data that was copied from the previous buffer, and the number of bytes
            // we read must be buffer.Length - leftover.Length which is just the second half of the buffer.
            var bytesRead = FillBuffer(stream, ref buffer, leftover.Length, buffer.Length - leftover.Length);
            var resizedSpan = buffer.AsSpan(0, bytesRead + leftover.Length);
            reader = new Utf8JsonReader(resizedSpan, isFinalBlock: bytesRead == 0 || resizedSpan.Length != resizedBuffer.Length, reader.CurrentState);
            // early return since we have a reader
            return;
        }
        // The buffer has had some data processed but there is leftover data unprocessed.
        // In this case we move the unprocessed data to the start of the buffer and fill the rest with new data.
        else
        {
            leftover.CopyTo(buffer);
            var bytesRead = FillBuffer(stream, ref buffer, leftover.Length, buffer.Length - leftover.Length);
            reader = new Utf8JsonReader(buffer.AsSpan(0, bytesRead + leftover.Length), isFinalBlock: ((bytesRead + leftover.Length) != buffer.Length || bytesRead == 0), reader.CurrentState);
            return;
        }
    }
    // In this case entire buffer has been processed so attempt to refill the entire
    // buffer.
    else
    {
        var bytesRead = FillBuffer(stream, ref buffer, 0, buffer.Length);
        reader = new Utf8JsonReader(buffer.AsSpan(0, bytesRead), isFinalBlock: bytesRead == 0 || bytesRead != buffer.Length, reader.CurrentState);
        return;
    }
}

public static void Initialize(TestContext testContext)
{
originalBufferSize = AWSConfigs.StreamingUtf8JsonReaderBufferSize.GetValueOrDefault();
AWSConfigs.StreamingUtf8JsonReaderBufferSize = 4096;

Member comment:

I would recommend adding a second constructor on the StreamingUtf8JsonReader that takes in the buffer size so you don't have to toggle the AWSConfigs.StreamingUtf8JsonReaderBufferSize global setting.
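A minimal sketch of that suggestion, with hypothetical names (BufferSizedReader, ConfiguredBufferSize, DefaultBufferSize): the constructor without a size chains to an explicit-size constructor, so production code keeps reading the global setting while tests pass a size directly and never mutate shared state.

```csharp
using System;
using System.Buffers;
using System.IO;

// Hypothetical sketch of constructor chaining for the buffer size.
public ref struct BufferSizedReader
{
    // Stand-in for a nullable global setting such as AWSConfigs.StreamingUtf8JsonReaderBufferSize.
    public static int? ConfiguredBufferSize;
    public const int DefaultBufferSize = 4096;

    private readonly Stream _stream;
    private byte[] _buffer;

    // Production path: use the global setting, falling back to the default.
    public BufferSizedReader(Stream stream)
        : this(stream, ConfiguredBufferSize ?? DefaultBufferSize)
    {
    }

    // Test path: an explicit size, so tests don't toggle the global setting.
    public BufferSizedReader(Stream stream, int bufferSize)
    {
        if (bufferSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(bufferSize));
        _stream = stream ?? throw new ArgumentNullException(nameof(stream));
        _buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
    }

    // ArrayPool may return a larger array than requested; 0 after Dispose.
    public int BufferLength => _buffer?.Length ?? 0;

    public void Dispose()
    {
        if (_buffer != null)
        {
            ArrayPool<byte>.Shared.Return(_buffer);
            _buffer = null;
        }
    }
}
```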

peterrsongg added the v4 label Jan 7, 2025
peterrsongg force-pushed the petesong/stj-deserialization-2 branch from fff1500 to 5ccb4c6 January 9, 2025 17:43