Update _patch.py #39341

Closed

952446418

The current line-splitting logic does not handle byte strings that are not valid UTF-8. Specifically, the following line:
line_list: List[str] = re.split(r"(?<=\n)", element.decode("utf-8"))
This line decodes the element byte string as UTF-8 with the default strict error handling. If element contains a byte sequence that is not valid UTF-8, the call raises a UnicodeDecodeError. Non-ASCII text that is validly encoded decodes without error.

To address this issue, we can pass an errors argument to the decode method. There are several options available:

errors="replace": Replace each undecodable byte sequence with the replacement character U+FFFD (�).
errors="ignore": Silently drop undecodable bytes.
errors="backslashreplace": Replace undecodable bytes with \xNN escape sequences.
errors="surrogateescape": Map undecodable bytes to lone surrogate code points so the original bytes can be recovered later.
For this pull request, I propose using errors="replace" or errors="ignore".
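To make the trade-offs between these options concrete, here is a minimal sketch using only the standard `bytes.decode` method. The sample byte string is an assumption chosen for illustration (valid non-ASCII text followed by a truncated multi-byte sequence); it is not taken from the SDK or from a real failing response.

```python
# Illustrative input (an assumption, not real SDK data): "café " encoded as
# valid UTF-8, followed by 0xE4, which starts a multi-byte sequence that is
# never completed.
data = b"caf\xc3\xa9 \xe4"

# Default (strict) handling raises on the truncated sequence.
strict_error = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc

# Each handler deals with the single bad byte differently.
replaced = data.decode("utf-8", errors="replace")            # bad byte becomes U+FFFD
ignored = data.decode("utf-8", errors="ignore")              # bad byte is dropped
escaped = data.decode("utf-8", errors="backslashreplace")    # bad byte becomes the text \xe4
surrogate = data.decode("utf-8", errors="surrogateescape")   # bad byte becomes U+DCE4

# Only surrogateescape lets the original bytes be recovered afterwards.
round_trip = surrogate.encode("utf-8", errors="surrogateescape")
```

Note that `replace` and `ignore` are lossy: once applied, the original bytes cannot be reconstructed from the decoded string.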
@github-actions (bot) added labels on Jan 22, 2025: AI Model Inference (issues related to the client library for Azure AI Model Inference, \sdk\ai\azure-ai-inference); Community Contribution (community members are working on the issue); customer-reported (issues reported by GitHub users external to the Azure organization).

Thank you for your contribution @952446418! We will review the pull request and get back to you soon.

@azure-sdk
Collaborator

API change check

API changes are not detected in this pull request.

@@ -338,7 +338,7 @@ def _deserialize_and_add_to_queue(self, element: bytes) -> bool:

 # Convert `bytes` to string and split the string by newline, while keeping the new line char.
 # the last may be a partial "line" that does not contain a newline char at the end.
-line_list: List[str] = re.split(r"(?<=\n)", element.decode("utf-8"))
+line_list: List[str] = re.split(r"(?<=\n)", element.decode("utf-8", errors="replace")) # or errors="ignore"
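For readers unfamiliar with the split pattern in this hunk, the following sketch shows how `re.split` with the lookbehind `(?<=\n)` keeps each newline attached to its line and leaves any trailing partial line as the last item. The chunk contents are made up for illustration; splitting on empty matches like this requires Python 3.7 or later.

```python
import re
from typing import List

# Hypothetical chunk: one complete line, one empty line, and a partial line
# that has not yet received its terminating newline.
element = b"data: one\n\ndata: tw"

# Same pattern as in the diff: split at every position preceded by "\n".
line_list: List[str] = re.split(r"(?<=\n)", element.decode("utf-8", errors="replace"))

# When the input ends with a newline, the final item is an empty string,
# because the lookbehind also matches at the very end of the text.
complete_only: List[str] = re.split(r"(?<=\n)", "a\nb\n")
```

Here `line_list` ends with the partial line `"data: tw"`, which is the case the code comment in the hunk describes.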
Member

@952446418 can you please let me know of a test where the current handling is failing for you, and which package version you're using? This may have been fixed in a recent change in a different way, as I am unable to force an error, so I would like to validate that first before merging this change.

@dargilco
Member

Closing this PR, as the latest release of the azure-ai-inference SDK includes a fix for this issue.
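The thread does not say how the released fix works. Purely as an illustration of one lossless alternative to `errors="replace"`, and not as a description of the SDK's actual change, the sketch below uses the standard library's incremental decoder. It assumes the failure comes from a multi-byte character being split across two chunks; the chunk values are invented for the example.

```python
import codecs

# Hypothetical stream chunks: the two bytes of "é" (0xC3 0xA9) arrive in
# separate chunks.
chunks = [b"caf\xc3", b"\xa9\nnext"]

# Decoding each chunk on its own fails on the first chunk.
try:
    chunks[0].decode("utf-8")
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

# An incremental decoder buffers the incomplete trailing bytes and emits the
# character once the rest of the sequence arrives.
decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
pieces = [decoder.decode(chunk) for chunk in chunks]
text = "".join(pieces)
```

With this approach no replacement characters are introduced, since the split character is reassembled rather than discarded.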

@dargilco dargilco closed this Feb 26, 2025