Update _patch.py #39341

952446418 · 2025-01-22T06:40:04Z

The current line-splitting logic does not handle undecodable bytes properly. Specifically, the following line:

    line_list: List[str] = re.split(r"(?<=\n)", element.decode("utf-8"))

decodes the element byte string as UTF-8 with the default strict error handling. If element contains non-ASCII bytes that are not valid UTF-8, the call raises a UnicodeDecodeError.

To address this issue, we should pass an error handler to the decode call through its errors argument. Several handlers are available:

errors="replace": Replace each undecodable byte sequence with the replacement character (U+FFFD, �).
errors="ignore": Silently drop undecodable bytes.
errors="backslashreplace": Replace undecodable bytes with \x escape sequences.
errors="surrogateescape": Map undecodable bytes to lone surrogate code points so the original bytes can be recovered later.

For this pull request, I propose using errors="replace" or errors="ignore".
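The effect of each handler can be sketched on a small payload. This is a minimal illustration, not the SDK's code: the sample bytes and the split_lines helper are hypothetical, and only the re.split pattern is taken from the line quoted above.

```python
import re
from typing import List

# Hypothetical payload: "café\n" in UTF-8, then a lone 0xFF byte
# (never valid in UTF-8), then "ok\n".
element = "café\n".encode("utf-8") + b"\xff" + b"ok\n"


def split_lines(data: bytes, errors: str = "strict") -> List[str]:
    """Decode data as UTF-8 with the given handler, then split after each newline."""
    return re.split(r"(?<=\n)", data.decode("utf-8", errors=errors))


# The default ("strict") handler raises on the 0xFF byte.
try:
    split_lines(element)
except UnicodeDecodeError as exc:
    print("strict failed:", exc.reason)

# The other handlers all succeed but keep different amounts of information.
for handler in ("replace", "ignore", "backslashreplace", "surrogateescape"):
    print(handler, split_lines(element, errors=handler))

# Only surrogateescape allows the original bytes to be rebuilt afterwards.
text = element.decode("utf-8", errors="surrogateescape")
restored = text.encode("utf-8", errors="surrogateescape")
```

Note that "replace" and "ignore" both lose the original byte, so they trade fidelity for robustness; whether that is acceptable depends on what the caller does with the decoded text.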

github-actions · 2025-01-22T06:40:22Z

Thank you for your contribution @952446418! We will review the pull request and get back to you soon.

azure-sdk · 2025-01-22T07:01:45Z

API change check

API changes are not detected in this pull request.

trangevi · 2025-01-30T00:24:15Z

sdk/ai/azure-ai-inference/azure/ai/inference/models/_patch.py

@@ -338,7 +338,7 @@ def _deserialize_and_add_to_queue(self, element: bytes) -> bool:

        # Convert `bytes` to string and split the string by newline, while keeping the new line char.
        # the last may be a partial "line" that does not contain a newline char at the end.
-        line_list: List[str] = re.split(r"(?<=\n)", element.decode("utf-8"))
+        line_list: List[str] = re.split(r"(?<=\n)", element.decode("utf-8", errors="replace"))  # or errors="ignore"


@952446418 can you please let me know of a test where the current handling is failing for you, and which package version you're using? This may have been fixed in a recent change in a different way, as I am unable to force an error, so I would like to validate that first before merging this change.
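One way such a failure could be reproduced is sketched below. It assumes the error comes from a multi-byte UTF-8 character being cut in two by a chunk boundary, which the thread does not confirm. The chunks and both helper functions are hypothetical. The second helper uses the standard library's incremental decoder, a different technique from the errors argument proposed in this pull request, shown only for comparison.

```python
import codecs
from typing import Iterable

# Hypothetical stream: "café\n" as UTF-8, cut in the middle of the 2-byte "é".
chunks = [b"caf\xc3", b"\xa9\n"]


def decode_per_chunk(parts: Iterable[bytes], errors: str = "strict") -> str:
    """Decode each chunk independently, as a bare bytes.decode call does."""
    return "".join(p.decode("utf-8", errors=errors) for p in parts)


def decode_incrementally(parts: Iterable[bytes]) -> str:
    """Decode with a stateful decoder that buffers an incomplete trailing sequence."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(p) for p in parts)
    # final=True flushes the decoder and raises if bytes are still pending.
    return text + decoder.decode(b"", final=True)


# Strict per-chunk decoding fails on the first chunk.
try:
    decode_per_chunk(chunks)
except UnicodeDecodeError as exc:
    print("strict failed:", exc.reason)

# errors="replace" avoids the exception but corrupts the character.
print(repr(decode_per_chunk(chunks, errors="replace")))

# The incremental decoder recovers the original text.
print(repr(decode_incrementally(chunks)))
```

If this is the failure mode, errors="replace" would hide the exception while turning a valid character into two replacement characters, so a test along these lines would help decide between the two approaches.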

dargilco · 2025-02-26T00:33:33Z

Closing this PR, as the latest release of the azure-ai-inference SDK includes a fix for this issue.

952446418 requested review from dargilco, trangevi and jhakulin as code owners January 22, 2025 06:40

trangevi requested changes Jan 30, 2025

dargilco closed this Feb 26, 2025
