use new System.Text.Ascii APIs, remove internal helpers #48368

adamsitnik · 2023-05-22T18:35:04Z

This PR does the following:

replaces BytesOrdinalEqualsStringAndAscii with Ascii.Equals
replaces AsciiIgnoreCaseEquals with Ascii.EqualsIgnoreCase
replaces IsAscii with Ascii.IsValid
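
For readers who have not used the new APIs yet, here is a minimal sketch of the three calls that replace the internal helpers. The class and method names (AsciiApiTour, SameHeader, SameHeaderIgnoreCase, IsAsciiOnly) are made up for illustration and are not part of this PR; only the Ascii.* calls come from System.Text.

```csharp
using System;
using System.Text;

public static class AsciiApiTour
{
    // Hypothetical wrapper: compares UTF-16 text with raw bytes through span overloads.
    public static bool SameHeader(string name, ReadOnlySpan<byte> raw) =>
        Ascii.Equals(name.AsSpan(), raw);

    // Hypothetical wrapper: same comparison, ignoring ASCII letter casing.
    public static bool SameHeaderIgnoreCase(string name, ReadOnlySpan<byte> raw) =>
        Ascii.EqualsIgnoreCase(name.AsSpan(), raw);

    // Hypothetical wrapper: true only when every char is in the ASCII range.
    public static bool IsAsciiOnly(ReadOnlySpan<char> text) => Ascii.IsValid(text);
}
```

For example, SameHeader("Host", "host"u8) would be expected to return false, while SameHeaderIgnoreCase("Host", "host"u8) would return true.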

Since only BytesOrdinalEqualsStringAndAscii was vectorized so far and all new Ascii APIs are vectorized, I provided benchmarks only for BytesOrdinalEqualsStringAndAscii vs Ascii.Equals.

Source code, results:

BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22621.1702)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=8.0.100-preview.4.23259.14
  [Host]     : .NET 8.0.0 (8.0.23.25905), X64 RyuJIT AVX2
  Job-CVKHLH : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

OutlierMode=DontRemove  LaunchCount=9 MemoryRandomization=True

Method	Size	Equal	Mean	Ratio	Allocated
SystemAscii	6	False	1.687 ns	1.00	-
AspNet	6	False	2.810 ns	1.67	-

SystemAscii	6	True	4.388 ns	1.00	-
AspNet	6	True	3.536 ns	0.81	-

SystemAscii	32	False	2.150 ns	1.00	-
AspNet	32	False	3.074 ns	1.43	-

SystemAscii	32	True	3.435 ns	1.00	-
AspNet	32	True	3.304 ns	0.96	-

SystemAscii	64	False	2.120 ns	1.00	-
AspNet	64	False	3.075 ns	1.45	-

SystemAscii	64	True	5.033 ns	1.00	-
AspNet	64	True	4.681 ns	0.93	-

Summary:

Ascii.Equals finishes faster when the inputs don't match.
The BytesOrdinalEqualsStringAndAscii helper is slightly faster when the inputs are equal (about 20% for 6 characters, 4% for 32 characters, 7% for 64 characters). Most probably the reason for that is that the new Ascii APIs check both inputs (left and right) for containing invalid Ascii characters, while the existing ASP.NET helper does it only for one of the inputs (as it knows that the other one is always valid).

BrennanConroy · 2023-05-22T23:13:17Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/HttpUtilities.cs

@@ -54,7 +54,8 @@ private static ulong GetAsciiStringAsLong(string str)
    {
        Debug.Assert(str.Length == 8, "String must be exactly 8 (ASCII) characters long.");

-        var bytes = Encoding.ASCII.GetBytes(str);
+        Span<byte> bytes = stackalloc byte[8];
+        Debug.Assert(Ascii.FromUtf16(str, bytes, out _) == OperationStatus.Done);


This is compiled away in Release mode

To my surprise you are right. Thank you for catching this.

using System.Buffers;
using System.Buffers.Binary;
using System.Diagnostics;
using System.Text;

namespace DebugProof
{
    internal class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(Test8("12345678"));
            Console.WriteLine(Test4("1234"));
        }

        static ulong Test8(string str)
        {
            Debug.Assert(str.Length == 8, "String must be exactly 8 (ASCII) characters long.");
            Span<byte> bytes = stackalloc byte[8];
            Debug.Assert(Ascii.FromUtf16(str, bytes, out _) == OperationStatus.Done);
            return BinaryPrimitives.ReadUInt64LittleEndian(bytes);
        }

        static ulong Test4(string str)
        {
            Debug.Assert(str.Length == 4, "String must be exactly 4 (ASCII) characters long.");
            Span<byte> bytes = stackalloc byte[4];
            Debug.Assert(Ascii.FromUtf16(str, bytes, out _) == OperationStatus.Done);
            return BinaryPrimitives.ReadUInt32LittleEndian(bytes);
        }
    }
}

PS C:\Users\adsitnik\source\repos\DebugProof> dotnet run -c Debug
4050765991979987505
875770417
PS C:\Users\adsitnik\source\repos\DebugProof> dotnet run -c Release
0
0
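
The zeros in Release come from Debug.Assert being marked [Conditional("DEBUG")]: when DEBUG is not defined, the call and its argument expression are omitted, so Ascii.FromUtf16 never runs. One way to avoid this, sketched below, is to run the conversion unconditionally and assert only on its result. The class and method names are hypothetical, and this is not necessarily the exact change made in this PR.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.Diagnostics;
using System.Text;

public static class AssertSideEffectDemo
{
    // Hypothetical helper; mirrors the shape of GetAsciiStringAsLong from the diff.
    public static ulong ReadAsciiAsUInt64(string str)
    {
        Debug.Assert(str.Length == 8, "String must be exactly 8 (ASCII) characters long.");

        Span<byte> bytes = stackalloc byte[8];

        // The conversion runs in both Debug and Release builds.
        OperationStatus status = Ascii.FromUtf16(str, bytes, out _);

        // Only this check on the already-computed result disappears in Release.
        Debug.Assert(status == OperationStatus.Done);

        return BinaryPrimitives.ReadUInt64LittleEndian(bytes);
    }
}
```

With this shape, ReadAsciiAsUInt64("12345678") should yield the same value in both configurations (0x3837363534333231, the Debug output shown above).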

BrennanConroy · 2023-05-22T23:14:05Z

src/Servers/Kestrel/Core/src/Internal/Http/Http1Connection.cs

@@ -339,7 +340,7 @@ private void OnOriginFormTarget(TargetOffsetPathLength targetPath, Span<byte> ta
        var previousValue = _parsedRawTarget;
        if (ServerOptions.DisableStringReuse ||
            previousValue == null || previousValue.Length != target.Length ||
-            !StringUtilities.BytesOrdinalEqualsStringAndAscii(previousValue, target))


This method does a null check as well, we'd need to make sure that's ok for all the changed callsites.


I am not sure I understand. So far, all the callers of StringUtilities.BytesOrdinalEqualsStringAndAscii have ensured that the input is not null. If null were passed to StringUtilities.BytesOrdinalEqualsStringAndAscii, it would throw. With my changes, it won't.

Would you like me to keep the old helper method that simply performs a debug assert for the input and calls the new Ascii API to do the job?

bool BytesOrdinalEqualsStringAndAscii(string previousValue, ReadOnlySpan<byte> newValue)
{
    Debug.Assert(previousValue is not null);
    return Ascii.Equals(previousValue, newValue);
}

Sorry, I mistyped. I meant it checks for 0 not null.


The 0 check is performed by TryGetAsciiString which I am not touching in this PR (on purpose)

aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs

Line 141 in 39564d5

if (!CheckBytesInAsciiRange(vector, avxZero))

aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs

Lines 783 to 786 in 39564d5

private static bool CheckBytesInAsciiRange(Vector<sbyte> check)
{
    // Vectorized byte range check, signed byte > 0 for 1-127
    return Vector.GreaterThanAll(check, Vector<sbyte>.Zero);

It is also done in BytesOrdinalEqualsStringAndAscii which is why I'm bringing it up
https://github.com/dotnet/aspnetcore/blob/main/src/Shared/ServerInfrastructure/StringUtilities.cs#L522

I took a closer look at BytesOrdinalEqualsStringAndAscii.

If the remainder after the vectorized loop contains a zero (or the input is simply too small to take the vectorized code path) but the inputs are equal, it can still return true:

aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs

Lines 501 to 512 in 39564d5

if (offset < count)
{
    var ch = (char)Unsafe.Add(ref bytes, offset);
    if (((ch & 0x80) != 0) || Unsafe.Add(ref str, offset) != ch)
    {
        goto NotEqual;
    }
}

// End of input reached, there are no inequalities via widening; so the input bytes are both ascii
// and a match to the string if it was converted via Encoding.ASCII.GetString(...)
return true;

But the most important thing is that it checks only one of the inputs for zero:

aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs

Lines 521 to 522 in 39564d5

var vector = Unsafe.ReadUnaligned<Vector<sbyte>>(ref Unsafe.Add(ref bytes, offset));
if (!CheckBytesInAsciiRange(vector))
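
To make the difference between the two quoted checks concrete, here is a small sketch that isolates their predicates. The class and method names are hypothetical; the scalar predicate copies the `(ch & 0x80)` test from the remainder path and the vector predicate copies CheckBytesInAsciiRange. It only illustrates how each check treats a single byte, not the full helper.

```csharp
using System;
using System.Numerics;

public static class ZeroByteCheckDemo
{
    // Scalar remainder check: rejects only bytes with the high bit set (>= 0x80).
    public static bool ScalarCheckAccepts(byte b) => (b & 0x80) == 0;

    // Vectorized check: every lane must be > 0 when read as a signed byte (1-127).
    public static bool VectorCheckAccepts(byte b)
    {
        var lanes = new sbyte[Vector<sbyte>.Count];
        Array.Fill(lanes, (sbyte)'a');      // fill with a valid ASCII letter
        lanes[0] = unchecked((sbyte)b);     // place the byte under test in one lane
        return Vector.GreaterThanAll(new Vector<sbyte>(lanes), Vector<sbyte>.Zero);
    }
}
```

A zero byte passes the scalar predicate but fails the vector one, which is why the zero check in the old helper depends on which code path the input takes.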

And from what I can see the other input is always a const (known header for example):

aspnetcore/src/Servers/Kestrel/shared/KnownHeaders.cs

Line 326 in 4e17e96

$@"// Matched a known header

It seems that only one of the inputs may contain null characters (because the other one is typically a pre-defined const) and hence Ascii.Equals will always return false in such cases because the inputs will simply not be equal?

@BrennanConroy is my understanding correct?

Edit: I just realized that the existing helper method ensures that the string input does not contain zeros:

https://github.com/dotnet/aspnetcore/blob/main/src/Shared/ServerInfrastructure/StringUtilities.cs#LL418C22-L418C41

aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs

Lines 673 to 684 in 4e17e96

private static bool IsValidHeaderString(string value)
{
    // Method for Debug.Assert to ensure BytesOrdinalEqualsStringAndAscii
    // is not called with an unvalidated string comparitor.
    try
    {
        if (value is null)
        {
            return false;
        }
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true).GetByteCount(value);
        return !value.Contains('\0');

So it should be safe to switch to Ascii.Equals, but to be extra safe I can keep the old helper method just to verify the input and delegate to Ascii.Equals:

public static bool BytesOrdinalEqualsStringAndAscii(string previousValue, ReadOnlySpan<byte> newValue)
{
    // previousValue is a previously materialized string which *must* have already passed validation.
    Debug.Assert(IsValidHeaderString(previousValue));
    return Ascii.Equals(previousValue, newValue);
}

I agree with your assessment for the Http1Connection.cs usage, since the strings being compared are materialized via GetAsciiStringNonNullCharacters which does the null check for us.

I think the HttpHeaders.Generated.cs usage is also fine since it's only comparing against known header values which definitely don't have \0 in them.

> So it should be safe to switch to Ascii.Equals, but to be extra safe I can keep the old helper method just to verify the input and delegate to Ascii.Equals:

That would be nice 😃

* reintroduce StringUtilities.BytesOrdinalEqualsStringAndAscii
* add an assert that ensures that at least one of the inputs does not contain null character
* delegate to Ascii.Equals

adamsitnik · 2023-05-25T14:21:50Z

@BrennanConroy I have added benchmarks with results, PTAL.

javiercn · 2023-05-25T14:24:33Z

src/Http/Routing/test/UnitTests/Matching/AsciiTest.cs

-        // Assert
-        Assert.False(result);
-    }
-}


Did we run this against the new implementation to make sure that we didn't introduce any behavior diff?


That is a very good question. We have not, as the new APIs have great test coverage in dotnet/runtime:
https://github.com/dotnet/runtime/tree/main/src/libraries/System.Text.Encoding/tests/Ascii

including things like boundary checks for the vectorized code:

https://github.com/dotnet/runtime/blob/081f87c02d69184453111d80dd66baf672ec5b4e/src/libraries/System.Text.Encoding/tests/Ascii/IsValidCharTests.cs#L90-L99

but if you want I can contribute the removed ASP.NET tests to dotnet/runtime test suite.

At least try undeleting them locally and running them against the new implementation to see if there are differences.

javiercn

Changes look good to me.

BrennanConroy · 2023-05-25T16:37:09Z

> Most probably the reason for that is that the new Ascii APIs check both inputs (left and right) for containing invalid Ascii characters, while the existing ASP.NET helper does it only for one of the inputs (as it knows that the other one is always valid)

Maybe I'm missing something, but does the Ascii API need to check both inputs? If it's doing an Equals check and only checks one input for valid Ascii, wouldn't it implicitly be checking the second input for valid Ascii via the equals check?

Tratcher · 2023-05-25T16:49:05Z

Does it throw or return false for invalid ascii input?

BrennanConroy · 2023-05-25T16:49:46Z

/benchmark plaintext aspnet-citrine-lin kestrel

BrennanConroy · 2023-05-25T16:50:40Z

> Does it throw or return false for invalid ascii input?

False
https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Equality.cs,80
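
A small sketch of the behavior discussed here, assuming the documented semantics of Ascii.Equals: inputs that are identical but contain a non-ASCII character compare as not equal (no exception is thrown), while U+0000 is inside the ASCII range and therefore compares normally. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class AsciiEqualsEdgeCases
{
    // Identical inputs with a non-ASCII char (U+00E9): expected to return false, not throw.
    public static bool IdenticalNonAscii() =>
        Ascii.Equals("caf\u00E9".AsSpan(), "caf\u00E9".AsSpan());

    // Identical inputs containing a NUL: 0 is valid ASCII, so this is expected to return true.
    public static bool IdenticalWithNul() =>
        Ascii.Equals("a\0b".AsSpan(), new byte[] { (byte)'a', 0, (byte)'b' });
}
```

The second case is the reason the zero check has to be enforced separately (for example by validating the string before it is compared), since Ascii.Equals itself does not reject NUL.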

pr-benchmarks · 2023-05-25T17:01:16Z

Benchmark started for plaintext on aspnet-citrine-lin with kestrel. Logs: link

BrennanConroy · 2023-05-25T17:22:52Z

I just tried the benchmark on one of my machines and got much bigger gaps in the happy path.

BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1702/22H2/2022Update/SunValley2)
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK=8.0.100-preview.5.23275.7
  [Host]     : .NET 8.0.0 (8.0.23.27214), X64 RyuJIT AVX2
  Job-DWFIYY : .NET 8.0.0 (8.0.23.27214), X64 RyuJIT AVX2

OutlierMode=DontRemove  MemoryRandomization=True

Method	Size	Equal	Mean	Error	StdDev	Median	Ratio	RatioSD
SystemAscii	6	False	1.732 ns	0.0219 ns	0.0205 ns	1.729 ns	1.00	0.00
AspNet	6	False	2.440 ns	0.0318 ns	0.0297 ns	2.433 ns	1.41	0.02

SystemAscii	6	True	5.046 ns	0.0264 ns	0.0247 ns	5.045 ns	1.00	0.00
AspNet	6	True	3.585 ns	0.0275 ns	0.0257 ns	3.578 ns	0.71	0.01

SystemAscii	32	False	2.309 ns	0.0602 ns	0.0563 ns	2.296 ns	1.00	0.00
AspNet	32	False	2.914 ns	0.0561 ns	0.0525 ns	2.892 ns	1.26	0.04

SystemAscii	32	True	3.866 ns	0.1062 ns	0.2262 ns	3.747 ns	1.00	0.00
AspNet	32	True	3.213 ns	0.0628 ns	0.0587 ns	3.233 ns	0.82	0.05

SystemAscii	64	False	2.822 ns	0.0861 ns	0.1289 ns	2.857 ns	1.00	0.00
AspNet	64	False	3.383 ns	0.0979 ns	0.1128 ns	3.393 ns	1.20	0.07

SystemAscii	64	True	7.259 ns	0.2143 ns	0.6317 ns	6.939 ns	1.00	0.00
AspNet	64	True	4.861 ns	0.0835 ns	0.0781 ns	4.858 ns	0.68	0.04

Runtime version 8.0.0-preview.5.23272.14

BrennanConroy · 2023-05-25T17:27:14Z

Benchmark run:

application	plaintext.base	plaintext.pr
CPU Usage (%)	99	100	+1.01%
Cores usage (%)	2,781	2,791	+0.36%
Working Set (MB)	126	126	0.00%
Private Memory (MB)	656	654	-0.30%
Build Time (ms)	3,869	3,436	-11.19%
Start Time (ms)	205	213	+3.90%
Published Size (KB)	96,826	96,826	0.00%
Symbols Size (KB)	53	53	0.00%
.NET Core SDK Version	8.0.100-preview.5.23275.7	8.0.100-preview.5.23275.7

load	plaintext.base	plaintext.pr
CPU Usage (%)	98	98	0.00%
Cores usage (%)	2,752	2,734	-0.65%
Working Set (MB)	48	48	0.00%
Private Memory (MB)	370	370	0.00%
Start Time (ms)	0	0
First Request (ms)	96	97	+1.04%
Requests/sec	11,689,046	11,630,543	-0.50%
Requests	176,411,336	175,462,792	-0.54%
Mean latency (ms)	1.33	1.27	-4.51%
Max latency (ms)	56.13	60.84	+8.39%
Bad responses	0	0
Socket errors	0	0
Read throughput (MB/s)	1,402.88	1,392.64	-0.73%
Latency 50th (ms)	0.68	0.70	+2.20%
Latency 75th (ms)	1.04	1.05	+0.96%
Latency 90th (ms)	2.22	2.12	-4.50%
Latency 99th (ms)	14.11	16.74	+18.64%

adamsitnik · 2023-05-31T10:49:03Z

@BrennanConroy Could you please apply [ProcessCount(9)] and re-run the benchmarks? It instructs BDN to benchmark every scenario nine times (each time in a dedicated process), and combined with [MemoryRandomization] it should give us a better picture of the entire distribution.

Which scenario is the most common?

Is my understanding correct that the TechEmpower benchmark run shows basically no difference (all values seem to be within the margin of error)?

BrennanConroy · 2023-06-05T18:25:56Z

I believe I figured out why Ascii.Equals is 20-30% slower than BytesOrdinalEqualsStringAndAscii. I made some changes and local testing is now showing Ascii.Equals is 20-30% faster than BytesOrdinalEqualsStringAndAscii.

I'll try to open a draft PR in runtime with the 3 changes needed to optimize it.

ghost · 2023-06-13T03:00:52Z

Looks like this PR hasn't been active for some time and the codebase could have been changed in the meantime.
To make sure no breaking changes are introduced, please leave an /azp run comment here to rerun the CI pipeline and confirm success before merging the change.

adamsitnik · 2023-07-06T13:56:03Z

@BrennanConroy optimizations got merged to runtime: dotnet/runtime#87141

@javiercn @davidfowl Can I merge the PR now or should I wait until the changes propagate to this repo?

adamsitnik · 2023-07-06T13:56:29Z

> Does it throw or return false for invalid ascii input?

@Tratcher apologies, I missed your question. It returns false for invalid ASCII.

Tratcher · 2023-07-06T15:46:39Z

/azp run

azure-pipelines · 2023-07-06T15:46:57Z

Azure Pipelines successfully started running 3 pipeline(s).

davidfowl · 2023-07-17T15:47:27Z

Great job @BrennanConroy and @adamsitnik !

use new System.Text.Ascii APIs, remove internal helpers

eddb3c9

ghost added the area-runtime label May 22, 2023

BrennanConroy reviewed May 22, 2023


adamsitnik added 4 commits May 23, 2023 09:16

remove tests that are no longer needed

57b5e3f

address code review feedback: prevent from dead code elimination

bd42df1

test fixes

ae6ef49

address code review feedback:

b0f7437

* reintroduce StringUtilities.BytesOrdinalEqualsStringAndAscii
* add an assert that ensures that at least one of the inputs does not contain null character
* delegate to Ascii.Equals

adamsitnik marked this pull request as ready for review May 25, 2023 14:21

adamsitnik requested review from Tratcher, halter73, JamesNK, mgravell, javiercn and captainsafia as code owners May 25, 2023 14:21

adamsitnik requested a review from BrennanConroy May 25, 2023 14:21

javiercn reviewed May 25, 2023


javiercn approved these changes May 25, 2023


amcasey assigned BrennanConroy Jun 2, 2023

BrennanConroy mentioned this pull request Jun 5, 2023

Optimize Ascii.Equals when widening dotnet/runtime#87141

Merged

amcasey added the area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions label Jun 6, 2023

amcasey removed the area-runtime label Jun 6, 2023

ghost added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Jun 13, 2023

ghost removed the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Jul 6, 2023

BrennanConroy approved these changes Jul 11, 2023


BrennanConroy added the Perf label Jul 11, 2023

adamsitnik merged commit d994d31 into dotnet:main Jul 17, 2023

ghost added this to the 8.0-preview7 milestone Jul 17, 2023

adamsitnik deleted the newAsciiApi branch July 17, 2023 15:31

