-
Notifications
You must be signed in to change notification settings - Fork 10.3k
use new System.Text.Ascii APIs, remove internal helpers #48368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -54,7 +54,8 @@ private static ulong GetAsciiStringAsLong(string str) | |||
{ | |||
Debug.Assert(str.Length == 8, "String must be exactly 8 (ASCII) characters long."); | |||
|
|||
var bytes = Encoding.ASCII.GetBytes(str); | |||
Span<byte> bytes = stackalloc byte[8]; | |||
Debug.Assert(Ascii.FromUtf16(str, bytes, out _) == OperationStatus.Done); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is compiled away in Release mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my surprise you are right. Thank you for catching this.
using System.Buffers;
using System.Buffers.Binary;
using System.Diagnostics;
using System.Text;
namespace DebugProof
{
internal class Program
{
static void Main(string[] args)
{
Console.WriteLine(Test8("12345678"));
Console.WriteLine(Test4("1234"));
}
static ulong Test8(string str)
{
Debug.Assert(str.Length == 8, "String must be exactly 8 (ASCII) characters long.");
Span<byte> bytes = stackalloc byte[8];
Debug.Assert(Ascii.FromUtf16(str, bytes, out _) == OperationStatus.Done);
return BinaryPrimitives.ReadUInt64LittleEndian(bytes);
}
static ulong Test4(string str)
{
Debug.Assert(str.Length == 4, "String must be exactly 4 (ASCII) characters long.");
Span<byte> bytes = stackalloc byte[4];
Debug.Assert(Ascii.FromUtf16(str, bytes, out _) == OperationStatus.Done);
return BinaryPrimitives.ReadUInt32LittleEndian(bytes);
}
}
}
PS C:\Users\adsitnik\source\repos\DebugProof> dotnet run -c Debug
4050765991979987505
875770417
PS C:\Users\adsitnik\source\repos\DebugProof> dotnet run -c Release
0
0
@@ -339,7 +340,7 @@ private void OnOriginFormTarget(TargetOffsetPathLength targetPath, Span<byte> ta | |||
var previousValue = _parsedRawTarget; | |||
if (ServerOptions.DisableStringReuse || | |||
previousValue == null || previousValue.Length != target.Length || | |||
!StringUtilities.BytesOrdinalEqualsStringAndAscii(previousValue, target)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method does a null check as well, we'd need to make sure that's ok for all the changed callsites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method does a null check as well, we'd need to make sure that's ok for all the changed callsites.
I am not sure if I understand. So far all the callers of StringUtilities.BytesOrdinalEqualsStringAndAscii
ensured that the input is not null. But if null
would be sent to StringUtilities.BytesOrdinalEqualsStringAndAscii
, it would throw. With my changes, it won't.
Would you like me to keep the old helper method that simply performs a debug assert for the input and calls the new Ascii API to do the job?
bool BytesOrdinalEqualsStringAndAscii(string previousValue, ReadOnlySpan<byte> newValue)
{
Debug.Assert(previousValue is not null);
return Ascii.Equals(previousValue, newValue);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I mistyped. I meant it checks for 0
not null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I mistyped. I meant it checks for
0
not null.
The 0 check is performed by TryGetAsciiString
which I am not touching in this PR (on purpose)
if (!CheckBytesInAsciiRange(vector, avxZero)) |
aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs
Lines 783 to 786 in 39564d5
private static bool CheckBytesInAsciiRange(Vector<sbyte> check) | |
{ | |
// Vectorized byte range check, signed byte > 0 for 1-127 | |
return Vector.GreaterThanAll(check, Vector<sbyte>.Zero); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also done in BytesOrdinalEqualsStringAndAscii which is why I'm bringing it up
https://github.com/dotnet/aspnetcore/blob/main/src/Shared/ServerInfrastructure/StringUtilities.cs#L522
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a closer look at BytesOrdinalEqualsStringAndAscii
.
If the remainder of the vectorized loop contains zero (or the input is simply too small to go the vectorized code path), but the inputs are equal it can still return true
:
aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs
Lines 501 to 512 in 39564d5
if (offset < count) | |
{ | |
var ch = (char)Unsafe.Add(ref bytes, offset); | |
if (((ch & 0x80) != 0) || Unsafe.Add(ref str, offset) != ch) | |
{ | |
goto NotEqual; | |
} | |
} | |
// End of input reached, there are no inequalities via widening; so the input bytes are both ascii | |
// and a match to the string if it was converted via Encoding.ASCII.GetString(...) | |
return true; |
But the most important thing is that it checks for zero only one of the inputs:
aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs
Lines 521 to 522 in 39564d5
var vector = Unsafe.ReadUnaligned<Vector<sbyte>>(ref Unsafe.Add(ref bytes, offset)); | |
if (!CheckBytesInAsciiRange(vector)) |
And from what I can see the other input is always a const (known header for example):
$@"// Matched a known header |
It seems that only one of the inputs may contain null characters (because the other one is typically a pre-defined const) and hence Ascii.Equals
will always return false
in such cases because the inputs will simply not be equal?
@BrennanConroy is my understanding correct?
Edit: I just realized that the existing helper method ensures that the string
input does not contain zeros:
aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs
Lines 673 to 684 in 4e17e96
private static bool IsValidHeaderString(string value) | |
{ | |
// Method for Debug.Assert to ensure BytesOrdinalEqualsStringAndAscii | |
// is not called with an unvalidated string comparitor. | |
try | |
{ | |
if (value is null) | |
{ | |
return false; | |
} | |
new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true).GetByteCount(value); | |
return !value.Contains('\0'); |
So it should be safe to switch to Ascii.Equals
, but to be extra safe I can keep the old helper method just to verify the input and delegate to Ascii.Equals
:
public static bool BytesOrdinalEqualsStringAndAscii(string previousValue, ReadOnlySpan<byte> newValue)
{
// previousValue is a previously materialized string which *must* have already passed validation.
Debug.Assert(IsValidHeaderString(previousValue));
return Ascii.Equals(previousValue, newValue);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your assessment for the Http1Connection.cs usage, since the strings being compared are materialized via GetAsciiStringNonNullCharacters
which does the null check for us.
I think the HttpHeaders.Generated.cs usage is also fine since it's only comparing against known header values which definitely don't have \0
in them.
So it should be safe to switch to Ascii.Equals, but to be extra safe I can keep the old helper method just to verify the input and delegate to Ascii.Equals:
That would be nice 😃
* reintroduce StringUtilities.BytesOrdinalEqualsStringAndAscii * add an assert that ensures that at least one of the inputs does not contain null character * delegate to Ascii.Equals
@BrennanConroy I have added benchmarks with results, PTAL. |
// Assert | ||
Assert.False(result); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we ran this against the new implementation to make sure that we didn't introduce any behavior diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we ran this against the new implementation to make sure that we didn't introduce any behavior diff?
That is a very good question. We have not, as the new APIs have great test coverage in dotnet/runtime:
https://github.com/dotnet/runtime/tree/main/src/libraries/System.Text.Encoding/tests/Ascii
including things like boundaries checks for the vectorized code:
but if you want I can contribute the removed ASP.NET tests to dotnet/runtime test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least try undeleting them locally and running them against the new implementation to see if there are differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me.
Maybe I'm missing something, but does the Ascii API need to check both inputs? If it's doing an Equals check and only checks one input for valid Ascii, wouldn't it implicitly be checking the second input for valid Ascii via the equals check? |
Does it throw or return false for invalid ascii input? |
/benchmark plaintext aspnet-citrine-lin kestrel |
|
Benchmark started for plaintext on aspnet-citrine-lin with kestrel. Logs: link |
I just tried the benchmark on one of my machines and got much bigger gaps in the happy path.
Runtime version 8.0.0-preview.5.23272.14 |
Benchmark run:
|
@BrennanConroy Could you please apply Which scenario is the most common? Is my understanding correct that the TechEmpower benchmarks run show basically no difference (all values seem to be within the range of error)? |
I believe I figured out why I'll try to open a draft PR in runtime with the 3 changes needed to optimize it. |
Looks like this PR hasn't been active for some time and the codebase could have been changed in the meantime. |
@BrennanConroy optimizations got merged to runtime: dotnet/runtime#87141 @javiercn @davidfowl Can I merge the PR now or should I wait until the changes propagate to this repo? |
@Tratcher apologies, I've missed your question. It returns false for invalid ascii |
/azp run |
Azure Pipelines successfully started running 3 pipeline(s). |
Great job @BrennanConroy and @adamsitnik ! |
This PR does the following:
BytesOrdinalEqualsStringAndAscii
withAscii.Equals
AsciiIgnoreCaseEquals
withAscii.EqualsIgnoreCase
IsAscii
withAscii.IsValid
Since only
BytesOrdinalEqualsStringAndAscii
was vectorized so far and all newAscii
APIs are vectorized, I provided benchmarks only forBytesOrdinalEqualsStringAndAscii
vsAscii.Equals
.Source code, results:
Summary:
Ascii.Equals
finishes faster when the inputs don't matchBytesOrdinalEqualsStringAndAscii
helper is slightly faster when the inputs are equal (20% for 6 characters, 4% for 32 chars, 7% for 64 chars). Most probably the reason for that is that the newAscii
APIs check both inputs (left and right) for containing invalid Ascii characters, while the existing ASP.NET helper does it only for one of the inputs (as it knows that the other one is always valid).