Improve async string perf and fix reading chars with initial offset. #3377
base: main
Conversation
…decrease memcopy and gc time spent
@@ -13133,8 +13133,7 @@ bool writeDataSizeToSnapshot
    totalCharsRead = (startOffsetByteCount >> 1);
    charsLeft -= totalCharsRead;
    offst = totalCharsRead;
This is all that's required to fix #3331, done in both netcore and netfx.
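The offset arithmetic in the hunk above can be sketched as follows. This is a hedged illustration in Python (the real code is C# inside TdsParser); the function name `apply_start_offset` is hypothetical, but the variable names mirror the diff.

```python
def apply_start_offset(start_offset_byte_count: int, chars_requested: int):
    # UTF-16 chars are two bytes wide, so the byte offset halved
    # gives the number of chars already consumed
    total_chars_read = start_offset_byte_count >> 1
    chars_left = chars_requested - total_chars_read  # fewer chars remain to read
    offst = total_chars_read                         # resume writing after the offset
    return total_chars_read, chars_left, offst

# e.g. a 10-byte initial offset into a 100-char read
print(apply_start_offset(10, 100))  # (5, 95, 5)
```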
@@ -13088,7 +13088,7 @@ bool writeDataSizeToSnapshot
        ||
        (buff.Length >= offst + len)
        ||
-       (buff.Length == (startOffsetByteCount >> 1) + 1),
+       (buff.Length >= (startOffsetByteCount >> 1) + 1),
This change simply allows the target buffer we're copying into to be a bit larger than strictly necessary. This is required so that we can make the later change, which grows the buffer by more than is immediately needed.
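As a sketch, the relaxed precondition now accepts any sufficiently large buffer rather than one of exactly the expected size. This is an illustrative Python rendition of the assert condition, not the real C# code; `buffer_is_valid` is a hypothetical name.

```python
def buffer_is_valid(buff, offst, length, start_offset_byte_count) -> bool:
    # mirrors the assert in ReadPlpUnicodeChars: the buffer may be absent,
    # large enough for the requested range, or at least large enough to
    # hold the chars implied by the starting byte offset (>= rather than ==)
    return (buff is None
            or len(buff) >= offst + length
            or len(buff) >= (start_offset_byte_count >> 1) + 1)
```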
@@ -13152,7 +13152,10 @@ bool writeDataSizeToSnapshot
    }
    else
    {
-       newbuf = new char[offst + charsRead];
+       // grow by an arbitrary number of packets to avoid needing to reallocate
As this comment explains, we grow the buffer in a slightly unusual way. Typically buffer sizes are doubled or grown by some reasonable fraction (like 1.5x). That can be unhelpful when we know we're likely to run in low-memory conditions, such as containers, and that the data could be very large. It would be very easy to have a 512 MiB buffer and be unable to resize it to 1 GiB because of limits.
The reason we need to resize the buffer at all is that we've encountered more data and we aren't certain of the total data size. In the case of a multi-packet string we would keep encountering more data with each packet and keep creating new buffers, copying data, and then doing it all again, resulting in huge amounts of memory throughput and time spent in memcpy; see the profile traces in #3274 (comment).
This strategy allows the buffer to be slightly oversized each time more space is needed. The multiple of packet size means that we can accept at least that number of packets before needing another resize (because packet data cannot be entirely made up of char data), which reduces the reallocation frequency by the multiplier. 8 is an entirely arbitrary choice guided by the opposing needs to 1) be conservative about growth and 2) reduce the frequency of reallocation.
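The growth strategy described above can be sketched as follows. This is an illustrative Python model (the real code is C# allocating a new char[] and copying with Array.Copy); the names `grow_buffer` and `PACKET_MULTIPLIER` are hypothetical stand-ins.

```python
PACKET_MULTIPLIER = 8  # the arbitrary packet multiple discussed above

def grow_buffer(buff: list, offst: int, chars_read: int, packet_size_bytes: int) -> list:
    needed = offst + chars_read
    if len(buff) >= needed:
        return buff  # already large enough, no copy required
    # a packet holds at most packet_size_bytes / 2 chars, so this extra
    # headroom absorbs at least PACKET_MULTIPLIER more packets before
    # the next reallocation and copy
    capacity = needed + PACKET_MULTIPLIER * (packet_size_bytes >> 1)
    return buff + ['\0'] * (capacity - len(buff))  # stand-in for new char[] + copy
```

With an 8 KiB packet size, each resize buys roughly 32,000 chars of headroom, so a long multi-packet string triggers a copy once per eight packets instead of once per packet.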
@@ -13096,9 +13096,9 @@ bool writeDataSizeToSnapshot
    // If total length is known up front, the length isn't specified as unknown
    // and the caller doesn't pass int.max/2 indicating that it doesn't know the length
    // allocate the whole buffer in one shot instead of realloc'ing and copying over each time
-   if (buff == null && stateObj._longlen != TdsEnums.SQL_PLP_UNKNOWNLEN && len < (int.MaxValue >> 1))
+   if (buff == null && stateObj._longlen != TdsEnums.SQL_PLP_UNKNOWNLEN && stateObj._longlen < (int.MaxValue >> 1))
I've found this line of code suspicious for quite a while but haven't been confident about changing it. Understanding what it's doing requires understanding both the internals of the code and how PLP data is represented in the TDS protocol; please see TDS spec 2.2.5.2.3:
A value sent as PLP can be either NULL, a length followed by chunks (as defined by PLP_CHUNK), or an unknown length token followed by chunks.
The important part is that the total length can be sent up front. In practice, when we encounter a PLP data stream we read the length and store the value in stateObj._longlen. The len parameter is not the same thing: it tells the function how much data we're expecting it to read, and if we're reading chunks we usually don't know that.
Now look at the check: it tests the buffer, then the PLP total length to see if it's unknown, then the length we're asking for. Why does it matter if we're asking for "as much as possible", which is what int.MaxValue / 2 is interpreted as? As far as I can see it doesn't matter in this context. The context is deciding how large a buffer to allocate.
Now imagine we change it like this: if we have no buff yet, and we know that stateObj._longlen (which is what the protocol gave us as the known total length) is not the specific value for unknown, and the value of stateObj._longlen is less than the largest possible char buffer the runtime allows us to allocate, then we can allocate a buffer of that size.
With that change all tests continue to pass and we're more likely to allocate correctly sized buffers for PLP data at the start of the read process. This avoids later copying, resizing, etc.
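The revised condition can be sketched like this. The constant values are assumptions for illustration (SQL_PLP_UNKNOWNLEN stands for the protocol's unknown-length sentinel, and the cap approximates the largest char count the check permits), and `should_allocate_upfront` is a hypothetical name, not the real C# method.

```python
SQL_PLP_UNKNOWNLEN = 0xFFFFFFFFFFFFFFFF  # assumed sentinel for "total length unknown"
MAX_CHAR_LEN = (2**31 - 1) >> 1          # int.MaxValue >> 1

def should_allocate_upfront(buff, longlen: int) -> bool:
    # allocate the whole buffer in one shot only when there is no buffer yet,
    # the protocol told us the total length, and that length fits in a char[]
    return (buff is None
            and longlen != SQL_PLP_UNKNOWNLEN
            and longlen < MAX_CHAR_LEN)
```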
It looks like we're still requesting/creating a buffer of length Min(stateObj._longlen, len) below, so will this actually change the behavior?
When we're chunking PLP data, what values are passed in for len? Is it just a guess? Based on packet size? Something else?
len will be the unknown-length sentinel, which is int.MaxValue / 2 as the original comment mentions, so all valid PLP longlens will be less than it in that case. If we are passed a len value which is less than the chunk size, it'll read the chunk length and then read the len bytes that it wants from that chunk. The Min(len, _longlen) still makes sense, which is why I didn't change it.
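The surviving Min(len, _longlen) sizing might be sketched like this; `initial_buffer_length` is a hypothetical stand-in for the C# allocation site, not the actual method.

```python
UNKNOWN_CHAR_LEN = (2**31 - 1) >> 1  # sentinel passed as len when the caller doesn't know

def initial_buffer_length(longlen: int, len_requested: int) -> int:
    # when len is the sentinel, any valid longlen is smaller and wins;
    # when the caller asks for less than is available, its request wins
    return min(longlen, len_requested)

assert initial_buffer_length(500, UNKNOWN_CHAR_LEN) == 500
assert initial_buffer_length(10_000, 100) == 100
```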
I'm a bit suspicious of this change ... SQL_PLP_UNKNOWNLEN is a magic number that indicates the PLP data is of unknown length, but (int.MaxValue >> 1) is a magic number that indicates the caller doesn't know the length. Also, _longlen is a ulong but len is just an int, so I'm confused why the magic int.MaxValue / 2 value would be useful for _longlen. I.e., there's a wide range between SQL_PLP_UNKNOWNLEN and int.MaxValue >> 1 that isn't being accounted for, afaict.
Either way, rewriting the comment above the check and giving int.MaxValue >> 1 a named constant would probably help with understanding what's going on here.
My interpretation of the int.MaxValue >> 1 value is that it's simply a derivation of the SQL_PLP_UNKNOWNLEN constant: a byte count converted to a char count, since chars are twice as long. As far as I know the protocol only deals with byte-count lengths. If you're reading bytes you'd expect the unknown value, and if you're working in chars you divide by 2; there are a number of other places in the codebase where this operation is done.
I've spent a lot of time debugging through this code over a few years because of the async-continue work and other improvements before that. When doing so, I've observed that the piece of code I'm changing here is almost never executed because len is the sentinel int.MaxValue / 2 value. Inspecting the state on several of those occasions, it wasn't clear to me why the check was written the way it is.
As I mentioned in #3377 (comment) I found this suspicious, but only suspicious. It's not clearly incorrect, and the code functions, but it does so less well than it possibly can. I appreciate the need for confidence in the change, but without being the original author I can't identify the original intention of the check or whether it guards against some obscure condition we can't identify. I can't claim complete knowledge of why it is the way it is, so I also can't say that the change is entirely safe; it's just too complicated for that level of assertion.
Without this change everything continues to work: if we don't allocate the entire array up front, we carry on into the code path where we repeatedly resize the array, which I've changed in this PR. If you'd prefer to drop this particular change from the PR and keep all the other changes, I'd be fine with that; it's still a bug-fix and net-gain PR.
/cc @Tornhoof, @AlexanderKot this should fix your issues. @dotnet/sqlclientdevteam can I get a CI kick? I'd really like to get this into preview 2 because it fixes all the known issues with async-continue.
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
##             main    #3377      +/-   ##
==========================================
- Coverage   67.04%   58.83%    -8.22%
==========================================
  Files         300      292        -8
  Lines       65376    65214      -162
==========================================
- Hits        43831    38366     -5465
- Misses      21545    26848     +5303
Pull Request Overview
This PR targets performance improvements in asynchronous string processing and corrects an issue with reading characters using an initial offset.
- Added a new sequential character reading test ("CanGetCharsSequentially") to verify correct behavior.
- Introduced helper methods in tests to use pooled buffers, ensuring buffers are resized appropriately.
- Updated buffer management logic in TdsParser (both netfx and netcore) to use the correct offset and to improve buffer reallocation efficiency.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/DataReaderTest/DataReaderTest.cs | Adds a new test and helper method for sequential character reading using pooled buffers. |
| src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs | Introduces a GetPacketSize() method and a property to calculate the current packet index. |
| src/Microsoft.Data.SqlClient/netfx/src/Microsoft/Data/SqlClient/TdsParser.cs | Adjusts buffer sizing logic and offset handling to improve performance and correctness. |
| src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParser.cs | Mirrors changes in netfx for improved buffer management and offset accumulation. |
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
I can't say I know much about the internals here, so I appreciate the callouts on what changes are being made. However, I'm not understanding what the _longlen changes are really doing. Please take a look at my comments and I'll approve if you can explain it without code changes :)
@@ -13088,17 +13088,17 @@ bool writeDataSizeToSnapshot
        ||
        (buff.Length >= offst + len)
        ||
-       (buff.Length == (startOffsetByteCount >> 1) + 1),
+       (buff.Length >= (startOffsetByteCount >> 1) + 1),
        "Invalid length sent to ReadPlpUnicodeChars()!"
    );
    charsLeft = len;

    // If total length is known up front, the length isn't specified as unknown
    // and the caller doesn't pass int.max/2 indicating that it doesn't know the length
I think with this change, the comment will need to be changed as well - for this check we no longer care what the caller requested.
What would you like it changed to?
Description
For the performance-related changes, see the background information in #3274 (comment) and the preceding conversation. Reading a string in async mode was slower than expected, so I investigated. I will annotate each change with an explanation.
Issues
fixes #3331
fixes #3274
Testing
Tests have been added which cover both issues that are being fixed.