Fix: #2936 - Prevent escaping of Latin characters in LLM response #2937
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Summary of Changes
Hello @amenegola, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where Latin characters were being unnecessarily escaped in LLM agent responses, particularly when an output schema was defined. By adjusting the `json.dumps` serialization to explicitly allow non-ASCII characters, the change significantly improves the readability and user experience for outputs in languages that use accented characters or special symbols.
Highlights
- Latin Character Escaping Fix: Addresses an issue where `LlmAgent` outputs were incorrectly escaping Latin characters (e.g., "\xf3" for "ó") when using `output_schema` and `tools`, by explicitly setting `ensure_ascii=False` in `json.dumps`.
- Improved Readability: Ensures that non-ASCII characters in structured model responses are preserved in their natural form, enhancing readability for languages with accented characters and special symbols.
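For context, here's a minimal standalone sketch (not code from this PR) of how the `ensure_ascii` flag changes what `json.dumps` emits:

```python
import json

data = {"word": "ação"}

# Default (ensure_ascii=True): non-ASCII characters are escaped.
print(json.dumps(data))
# {"word": "a\u00e7\u00e3o"}

# With ensure_ascii=False, characters are emitted in their natural form.
print(json.dumps(data, ensure_ascii=False))
# {"word": "ação"}
```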
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.
Response from ADK Triaging Agent: Hello @amenegola, thank you for creating this PR! It looks like you haven't signed the Contributor License Agreement (CLA) yet. Please visit https://cla.developers.google.com/ to sign it. Once the CLA is signed, we can proceed with the review of your PR. Thanks!
Code Review
This pull request correctly addresses the issue of escaped Latin characters in LLM responses by setting `ensure_ascii=False` in the `json.dumps` call. The change is simple and effective. My main feedback is to add an automated unit test to cover this specific case with non-ASCII characters, which will help prevent future regressions. While you've mentioned local testing, incorporating it into the test suite would be a valuable addition for long-term maintainability.
Force-pushed from da4da5f to a7017b9.

Force-pushed from a7017b9 to f99c463.
/gemini review
Code Review
This pull request correctly addresses the issue of escaped Latin characters in LLM responses by setting `ensure_ascii=False` during JSON serialization. The fix is direct and effective. The addition of a new unit test is also great for verifying this behavior. I've added one suggestion to enhance the new test case by parameterizing it, which will improve its robustness and make it easier to extend with more scenarios in the future. Overall, this is a solid contribution.
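For illustration, a parameterized test in the spirit of that suggestion might look like the sketch below. It is illustrative only: it exercises `json.dumps` directly rather than the PR's actual processor, and the test name is hypothetical.

```python
import json

import pytest


@pytest.mark.parametrize(
    "text",
    ["ação", "caminhão", "ícone", "über", "naïve"],
)
def test_serialization_preserves_non_ascii(text):
    # Serialize the same way the processor now does.
    serialized = json.dumps({"response": text}, ensure_ascii=False)

    # The original characters must survive unescaped.
    assert text in serialized
    assert "\\u" not in serialized
```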
Resolved review thread on tests/unittests/flows/llm_flows/test_output_schema_processor.py (outdated).
@seanzhou1023 Please could you review this PR? Thank you!!!
Could you run autoformat.sh to resolve the format error?
Yes, I can. Should I create a new PR to correct the formatting?
Closes #2936
This pull request addresses the issue where `LlmAgent` outputs, when configured with `output_schema` and `tools`, were presenting escaped Latin characters (e.g., `\xf3` for `ó`) in the final response. This behavior occurred because `json.dumps` was being called with `ensure_ascii=True` (its default), which is not ideal for human-readable output, especially when dealing with non-ASCII characters common in many languages such as Portuguese.

Changes Proposed:
- Modified `_OutputSchemaRequestProcessor` in `src/google/adk/flows/llm_flows/_output_schema_processor.py` to explicitly set `ensure_ascii=False` when calling `json.dumps` for the `set_model_response` tool's output (see the sketch below).

Impact:
This change ensures that all non-ASCII characters in the structured model response are preserved in their natural form, improving the readability and user experience of agent outputs, particularly for users interacting in languages with accented characters or other special symbols.
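For readers skimming the diff, the essence of the change is the extra keyword argument. A paraphrased sketch follows; the helper name and surrounding structure are illustrative, not the processor's actual code:

```python
import json

def serialize_model_response(response_data: dict) -> str:
    # Hypothetical helper standing in for the processor's serialization step.
    # Before: json.dumps(response_data) escaped non-ASCII characters by
    # default. Passing ensure_ascii=False preserves them as-is.
    return json.dumps(response_data, ensure_ascii=False)
```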
Testing:
The fix was verified locally by running an `LlmAgent` with an `output_schema` and confirming that responses containing Latin characters (e.g., "ação", "caminhão", "ícone") are now correctly displayed without escaping.