
[Bugfix] Env var to disable xgrammar any_whitespace #12744

Open
wants to merge 1 commit into base: main

Conversation

Contributor

@wallashss wallashss commented Feb 4, 2025

Mistral models combined with guided decoding on a JSON schema through the xgrammar backend generate endless whitespace. This bug was introduced by this change in xgrammar. My proposal is to add an environment variable that can disable flexible whitespace in guided decoding with a JSON schema, so that serving models like Mistral behaves as it did before the xgrammar change.
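
For illustration, a minimal sketch of the kind of gate this proposal describes. The variable name VLLM_XGRAMMAR_DISABLE_ANY_WHITESPACE, the helper, and the call site are assumptions for the example rather than the actual diff; it assumes xgrammar's compile_json_schema accepts the any_whitespace keyword added in mlc-ai/xgrammar#123.

import json
import os

import xgrammar as xgr

# Sketch only: the env var name and wiring are illustrative, not this PR's actual code.
# When the variable is set, compile the JSON schema with fixed whitespace.
DISABLE_ANY_WHITESPACE = os.getenv(
    "VLLM_XGRAMMAR_DISABLE_ANY_WHITESPACE", "0").lower() in ("1", "true")

def compile_schema(compiler: xgr.GrammarCompiler, schema: dict) -> xgr.CompiledGrammar:
    # any_whitespace=True (xgrammar's default) lets the model emit arbitrary
    # whitespace between JSON tokens, which is what degenerates into endless "\n" here.
    return compiler.compile_json_schema(
        json.dumps(schema),
        any_whitespace=not DISABLE_ANY_WHITESPACE,
    )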

Minimal script to repro

from vllm import LLM, SamplingParams
from vllm.model_executor.guided_decoding.guided_fields import GuidedDecodingRequest

import json

# guided_decoding_backend='lm-format-enforcer'
# guided_decoding_backend='outlines'
guided_decoding_backend='xgrammar'
# Initialize the LLM
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3",
          enforce_eager=True,
          guided_decoding_backend=guided_decoding_backend,
          tokenizer_mode='mistral')

input = {"prompt_token_ids": [1, 6, 1501, 7567, 1891, 2032, 1113, 3396, 1316, 1113, 3396, 2032, 10598, 1629, 2032, 1113, 1295, 29498, 3790, 29498, 1537, 1991, 1316, 1113, 7286, 2032, 1113, 2226, 1040, 2636, 8854, 1065, 1032, 2846, 5491, 1316, 1113, 12206, 2032, 10598, 1891, 2032, 1113, 3582, 1316, 1113, 11491, 2032, 10598, 3501, 2032, 10598, 1891, 2032, 1113, 2195, 1316, 1113, 7286, 2032, 1113, 1782, 3758, 1072, 2433, 29493, 1085, 29491, 29489, 29491, 4420, 10454, 29493, 10229, 8474, 1113, 6074, 2032, 10598, 1891, 2032, 1113, 2195, 1316, 1113, 10825, 2032, 8135, 29485, 1958, 3938, 1316, 1113, 29490, 19425, 13075, 3010, 11549, 1113, 11661, 2032, 8135, 3501, 3010, 1743, 10925, 7, 3, 2592, 1117, 1040, 8854, 1505, 1065, 9911, 3922, 29572, 4]}

params = SamplingParams(max_tokens=100)
guided_req = GuidedDecodingRequest(guided_json={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'The city and state, e.g. San Francisco, CA'}, 'unit': {'type': 'string', 'enum': ['celsius', 'fahrenheit']}}, 'required': ['location']})

# Invoke the LLM with tool calling
outputs = llm.generate(input, sampling_params=params, guided_options_request=guided_req, use_tqdm=False)
output = outputs[0].outputs[0].text
print(json.dumps(output))

Output (truncated by the max_tokens parameter)

"{\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"

Notes: With outlines it generates correctly; lm-format-enforcer does as well, though with a couple of extra spaces.
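
For reference, a completion that satisfies the schema above would look something like the following (field values are illustrative only):

{"location": "San Francisco, CA", "unit": "celsius"}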

@wallashss wallashss requested a review from mgoin as a code owner February 4, 2025 18:18

github-actions bot commented Feb 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

Member

@russellb russellb left a comment

Is there a good reason we shouldn't just make any_whitespace=False the default? It seems like that should be fine for all cases and will ensure this problem doesn't happen without having to find the knob.

Member

mgoin commented Feb 4, 2025

Is this something you could control with just the knowledge of whether we are using an HF or Mistral tokenizer?

@wallashss
Contributor Author

Thank you @russellb and @mgoin for the quick feedback. Let's discuss.

Is there a good reason we shouldn't just make any_whitespace=False the default? It seems like that should be fine for all cases and will ensure this problem doesn't happen without having to find the knob.

Good point. From the vLLM perspective, IMO setting any_whitespace=False by default is more backward compatible, since this change in xgrammar comes with a regression that we found in our tests. But according to this [xgrammar PR](mlc-ai/xgrammar#123), some models can have issues without it too, and vLLM upgraded xgrammar with this new behavior. I thought it would make sense not to switch behaviors back and forth too much.

Is this something you could control with just the knowledge of whether we are using an HF or Mistral tokenizer?

I thought about that too, but I am not sure whether this is an issue only for Mistral.

My intention with this variable is to have an option that fixes the issue we found in our environment without risking further regressions with other models or scenarios. It might also be a solution for similar cases, and I guess we could wait to see whether the community reports related bugs before changing the default behavior. We should also consider this for V1, which AFAIK will have xgrammar as the default guided decoding backend and where this option is not yet implemented.
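
For context on the tokenizer-based alternative discussed above, a rough sketch of how the decision could key off tokenizer_mode instead of an env var; the helper name and wiring are purely illustrative and not part of this PR.

def resolve_any_whitespace(tokenizer_mode: str) -> bool:
    # Illustrative assumption: Mistral-format tokenizers are where flexible
    # whitespace was observed to loop on "\n"; everything else keeps xgrammar's default.
    return tokenizer_mode != "mistral"

The open question in the thread is whether the problem is really specific to Mistral, which is why an explicit opt-out knob is proposed instead of hard-coding a tokenizer check.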

mergify bot commented Feb 5, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @wallashss.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 5, 2025