Skip to content

Dev/mdaigle/3974 nonascii tests#4008

Draft
mdaigle wants to merge 4 commits intodotnet:mainfrom
mdaigle:dev/mdaigle/3974-nonascii-tests
Draft

Dev/mdaigle/3974 nonascii tests#4008
mdaigle wants to merge 4 commits intodotnet:mainfrom
mdaigle:dev/mdaigle/3974-nonascii-tests

Conversation

@mdaigle
Copy link
Contributor

@mdaigle mdaigle commented Mar 5, 2026

No description provided.

jimhblythe and others added 4 commits February 20, 2026 11:56
…tead of a StringBuilder. (dotnet#1877)

Note: UTF8Encoding(false) addition in s_writerSettings is consistent with prior default used within StringWriter/StringBuilder
…ple elements

Enhance comments within SqlStreamingXml
Extend Manual tests to fully cover GetChars
WriteXmlElement includes uncovered paths not accessible for SQL XML column types which normalize Whitespace, CDATA, EntityReference, XmlDeclaration, ProcessingInstruction, DocumentType, and Comment node types
Copilot AI review requested due to automatic review settings March 5, 2026 23:02
@github-project-automation github-project-automation bot moved this to To triage in SqlClient Board Mar 5, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds new manual test coverage for SqlDataReader.GetChars on XML (including non-ASCII/CJK) and updates SqlStreamingXml to stream via MemoryStream with UTF-8 settings.

Changes:

  • Added a new manual test suite for sequential GetChars behaviors and non-ASCII XML reads.
  • Registered the new manual test file in the ManualTests project.
  • Refactored SqlStreamingXml.GetChars to write to a MemoryStream (UTF-8) instead of a StringWriter.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

File Description
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlStreamingXmlTest/SqlStreamingXmlTest.cs Adds manual tests for GetChars sequential rules, partial reads, and non-ASCII/CJK content.
src/Microsoft.Data.SqlClient/tests/ManualTests/Microsoft.Data.SqlClient.ManualTests.csproj Includes the new manual test file in the project build.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlStream.cs Refactors SqlStreamingXml streaming implementation (now MemoryStream + UTF-8 settings).

Comment on lines +546 to +547
long memoryStreamRemaining = _memoryStream.Length - _memoryStream.Position;
while (memoryStreamRemaining < (length + charsToSkip) && !_xmlReader.EOF)
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This treats UTF-8 encoded bytes in _memoryStream as if they were UTF-16 chars (1 byte == 1 char), which will corrupt non-ASCII data and makes length/charsToSkip comparisons incorrect. GetChars must operate on characters, so either keep a character store (e.g., StringBuilder/TextWriter) or decode bytes properly (e.g., read via a StreamReader with the same encoding and track positions in characters rather than bytes).

Copilot uses AI. Check for mistakes.
Comment on lines +591 to 596
cnt = memoryStreamRemaining < length ? memoryStreamRemaining : length;
for (int i = 0; i < cnt; i++)
{
buffer[bufferIndex + i] = strBldr[i];
// ReadByte moves the Position forward
buffer[bufferIndex + i] = (char)_memoryStream.ReadByte();
}
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This treats UTF-8 encoded bytes in _memoryStream as if they were UTF-16 chars (1 byte == 1 char), which will corrupt non-ASCII data and makes length/charsToSkip comparisons incorrect. GetChars must operate on characters, so either keep a character store (e.g., StringBuilder/TextWriter) or decode bytes properly (e.g., read via a StreamReader with the same encoding and track positions in characters rather than bytes).

Copilot uses AI. Check for mistakes.
Comment on lines +483 to +484
// Potentially limits XML to not supporting UTF-16 characters, but this is required to avoid writing
// a byte order mark and is consistent with prior default used within StringWriter/StringBuilder.
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments are misleading/inaccurate: using UTF-8 does not 'limit XML to not supporting UTF-16 characters' (UTF-8 can represent the full Unicode range), and _charsRemoved is being used as a zero-based 'already-consumed' count (it starts at 0 and is compared directly to dataIndex). Please correct/remove these comments to match the actual semantics and avoid confusion for future maintainers.

Suggested change
// Potentially limits XML to not supporting UTF-16 characters, but this is required to avoid writing
// a byte order mark and is consistent with prior default used within StringWriter/StringBuilder.
// Use UTF-8 without BOM to avoid writing a byte order mark; this is consistent with the prior
// default used within StringWriter/StringBuilder.

Copilot uses AI. Check for mistakes.
else if (dataIndex > _charsRemoved)
{
charsToSkip = (int)(dataIndex - _charsRemoved);
//dataIndex is zero-based, but _charsRemoved is one-based, so the difference is the number of chars to skip in the MemoryStream before we start copying data to the buffer
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments are misleading/inaccurate: using UTF-8 does not 'limit XML to not supporting UTF-16 characters' (UTF-8 can represent the full Unicode range), and _charsRemoved is being used as a zero-based 'already-consumed' count (it starts at 0 and is compared directly to dataIndex). Please correct/remove these comments to match the actual semantics and avoid confusion for future maintainers.

Suggested change
//dataIndex is zero-based, but _charsRemoved is one-based, so the difference is the number of chars to skip in the MemoryStream before we start copying data to the buffer
// Both dataIndex and _charsRemoved are zero-based; _charsRemoved tracks how many chars have already been returned,
// so their difference is the number of additional chars to skip in the MemoryStream before we start copying data to the buffer.

Copilot uses AI. Check for mistakes.
Comment on lines +629 to 633
if (_canReadChunk)
{
if (writeNodeBuffer == null)
{
writeNodeBuffer = new char[WriteNodeBufferSize];
}
char[] writeNodeBuffer = new char[WriteNodeBufferSize];
int read;
while ((read = _xmlReader.ReadValueChunk(writeNodeBuffer, 0, WriteNodeBufferSize)) > 0)
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocating writeNodeBuffer on every XmlNodeType.Text element can create avoidable GC pressure when streaming larger XML. Consider storing a reusable buffer as a field (or renting from ArrayPool<char>) so repeated calls to WriteXmlElement() don't allocate a new array each time.

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +63
SqlConnection connection = new(DataTestUtility.TCPConnectionString);
const string commandText = "SELECT Convert(xml, N'<foo>bar</foo>')";
long charCount = 0;

using (SqlCommand command = connection.CreateCommand())
{
connection.Open();
command.CommandText = commandText;

SqlDataReader sqlDataReader = command.ExecuteReader(CommandBehavior.SequentialAccess);
if (sqlDataReader.Read())
{
charCount = sqlDataReader.GetChars(0, 0, null, 0, 1);
}
connection.Close();
}

//verify -1 is returned since buffer was not provided
Assert.Equal(-1, charCount);
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several tests manually Open()/Close() the connection and do not dispose SqlConnection / SqlDataReader. If an Assert throws inside the using (SqlCommand ...) block, connection.Close() will be skipped and the reader/connection can leak for the remainder of the test run. Prefer using (or await using if async) for SqlConnection and SqlDataReader, or a try/finally to guarantee cleanup.

Suggested change
SqlConnection connection = new(DataTestUtility.TCPConnectionString);
const string commandText = "SELECT Convert(xml, N'<foo>bar</foo>')";
long charCount = 0;
using (SqlCommand command = connection.CreateCommand())
{
connection.Open();
command.CommandText = commandText;
SqlDataReader sqlDataReader = command.ExecuteReader(CommandBehavior.SequentialAccess);
if (sqlDataReader.Read())
{
charCount = sqlDataReader.GetChars(0, 0, null, 0, 1);
}
connection.Close();
}
//verify -1 is returned since buffer was not provided
Assert.Equal(-1, charCount);
using (SqlConnection connection = new(DataTestUtility.TCPConnectionString))
{
const string commandText = "SELECT Convert(xml, N'<foo>bar</foo>')";
long charCount = 0;
using (SqlCommand command = connection.CreateCommand())
{
connection.Open();
command.CommandText = commandText;
using (SqlDataReader sqlDataReader = command.ExecuteReader(CommandBehavior.SequentialAccess))
{
if (sqlDataReader.Read())
{
charCount = sqlDataReader.GetChars(0, 0, null, 0, 1);
}
}
}
//verify -1 is returned since buffer was not provided
Assert.Equal(-1, charCount);
}

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +59
SqlConnection connection = new(DataTestUtility.TCPConnectionString);
const string commandText = "SELECT Convert(xml, N'<foo>bar</foo>')";
long charCount = 0;

using (SqlCommand command = connection.CreateCommand())
{
connection.Open();
command.CommandText = commandText;

SqlDataReader sqlDataReader = command.ExecuteReader(CommandBehavior.SequentialAccess);
if (sqlDataReader.Read())
{
charCount = sqlDataReader.GetChars(0, 0, null, 0, 1);
}
connection.Close();
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several tests manually Open()/Close() the connection and do not dispose SqlConnection / SqlDataReader. If an Assert throws inside the using (SqlCommand ...) block, connection.Close() will be skipped and the reader/connection can leak for the remainder of the test run. Prefer using (or await using if async) for SqlConnection and SqlDataReader, or a try/finally to guarantee cleanup.

Suggested change
SqlConnection connection = new(DataTestUtility.TCPConnectionString);
const string commandText = "SELECT Convert(xml, N'<foo>bar</foo>')";
long charCount = 0;
using (SqlCommand command = connection.CreateCommand())
{
connection.Open();
command.CommandText = commandText;
SqlDataReader sqlDataReader = command.ExecuteReader(CommandBehavior.SequentialAccess);
if (sqlDataReader.Read())
{
charCount = sqlDataReader.GetChars(0, 0, null, 0, 1);
}
connection.Close();
const string commandText = "SELECT Convert(xml, N'<foo>bar</foo>')";
long charCount = 0;
using (SqlConnection connection = new(DataTestUtility.TCPConnectionString))
{
using (SqlCommand command = connection.CreateCommand())
{
connection.Open();
command.CommandText = commandText;
using (SqlDataReader sqlDataReader = command.ExecuteReader(CommandBehavior.SequentialAccess))
{
if (sqlDataReader.Read())
{
charCount = sqlDataReader.GetChars(0, 0, null, 0, 1);
}
}
}

Copilot uses AI. Check for mistakes.
Comment on lines +59 to +60
connection.Close();
}
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several tests manually Open()/Close() the connection and do not dispose SqlConnection / SqlDataReader. If an Assert throws inside the using (SqlCommand ...) block, connection.Close() will be skipped and the reader/connection can leak for the remainder of the test run. Prefer using (or await using if async) for SqlConnection and SqlDataReader, or a try/finally to guarantee cleanup.

Copilot uses AI. Check for mistakes.
[ConditionalTheory(typeof(DataTestUtility), nameof(DataTestUtility.AreConnStringsSetup))]
[InlineData(true)]
[InlineData(false)]
public static void GetChars_SequentialDataIndex(bool backwards)
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name backwards + the comment don't match what the test actually does. The prior call uses dataIndex = 0, so backwards ? 0 : 1 is either the same index (0) or an overlapping index (1), not 'one less than the previous call'. Consider renaming the parameter to reflect the behavior (e.g., overlapByOne) and/or updating the comment to describe 'same index vs overlapping index' to avoid confusion.

Copilot uses AI. Check for mistakes.
Comment on lines +84 to +87
// Verify that providing the same or lower index than the previous call results in an exception.
// When backwards is true we test providing an index that is one less than the previous call,
// otherwise we test providing the same index as the previous call - both should not be allowed.
Assert.Throws<InvalidOperationException>(() => sqlDataReader.GetChars(0, backwards ? 0 : 1, buffer, 0, 2));
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name backwards + the comment don't match what the test actually does. The prior call uses dataIndex = 0, so backwards ? 0 : 1 is either the same index (0) or an overlapping index (1), not 'one less than the previous call'. Consider renaming the parameter to reflect the behavior (e.g., overlapByOne) and/or updating the comment to describe 'same index vs overlapping index' to avoid confusion.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: To triage

Development

Successfully merging this pull request may close these issues.

3 participants