
[Bugfix] Env var to disable xgrammar any_whitespace #12744

Open
wants to merge 1 commit into base: main

Conversation

Contributor

@wallashss wallashss commented Feb 4, 2025

Mistral models combined with guided decoding on a JSON schema through the xgrammar backend generate endless whitespace. This bug was introduced by this change in xgrammar. My proposal is to add an environment variable that can disable flexible whitespace in guided decoding with a JSON schema, so that serving models like Mistral behaves as it did before the xgrammar change.
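
For illustration, a minimal sketch of the kind of gate this proposal describes. The variable name VLLM_XGRAMMAR_DISABLE_ANY_WHITESPACE, the helper, and the call site are assumptions for the example rather than the actual diff; it assumes xgrammar's compile_json_schema accepts the any_whitespace keyword added in mlc-ai/xgrammar#123.

import json
import os

import xgrammar as xgr

# Sketch only: the env var name and wiring are illustrative, not this PR's actual code.
# When the variable is set, compile the JSON schema with fixed whitespace.
DISABLE_ANY_WHITESPACE = os.getenv(
    "VLLM_XGRAMMAR_DISABLE_ANY_WHITESPACE", "0").lower() in ("1", "true")

def compile_schema(compiler: xgr.GrammarCompiler, schema: dict) -> xgr.CompiledGrammar:
    # any_whitespace=True (xgrammar's default) lets the model emit arbitrary
    # whitespace between JSON tokens, which is what degenerates into endless "\n" here.
    return compiler.compile_json_schema(
        json.dumps(schema),
        any_whitespace=not DISABLE_ANY_WHITESPACE,
    )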

Minimal script to repro

from vllm import LLM, SamplingParams
from vllm.model_executor.guided_decoding.guided_fields import GuidedDecodingRequest

import json

# guided_decoding_backend='lm-format-enforcer'
# guided_decoding_backend='outlines'
guided_decoding_backend='xgrammar'
# Initialize the LLM
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3",
          enforce_eager=True,
          guided_decoding_backend=guided_decoding_backend,
          tokenizer_mode='mistral')

input = {"prompt_token_ids": [1, 6, 1501, 7567, 1891, 2032, 1113, 3396, 1316, 1113, 3396, 2032, 10598, 1629, 2032, 1113, 1295, 29498, 3790, 29498, 1537, 1991, 1316, 1113, 7286, 2032, 1113, 2226, 1040, 2636, 8854, 1065, 1032, 2846, 5491, 1316, 1113, 12206, 2032, 10598, 1891, 2032, 1113, 3582, 1316, 1113, 11491, 2032, 10598, 3501, 2032, 10598, 1891, 2032, 1113, 2195, 1316, 1113, 7286, 2032, 1113, 1782, 3758, 1072, 2433, 29493, 1085, 29491, 29489, 29491, 4420, 10454, 29493, 10229, 8474, 1113, 6074, 2032, 10598, 1891, 2032, 1113, 2195, 1316, 1113, 10825, 2032, 8135, 29485, 1958, 3938, 1316, 1113, 29490, 19425, 13075, 3010, 11549, 1113, 11661, 2032, 8135, 3501, 3010, 1743, 10925, 7, 3, 2592, 1117, 1040, 8854, 1505, 1065, 9911, 3922, 29572, 4]}

params = SamplingParams(max_tokens=100)
guided_req = GuidedDecodingRequest(guided_json={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'The city and state, e.g. San Francisco, CA'}, 'unit': {'type': 'string', 'enum': ['celsius', 'fahrenheit']}}, 'required': ['location']})

# Invoke the LLM with tool calling
outputs = llm.generate(input, sampling_params=params, guided_options_request=guided_req, use_tqdm=False)
output = outputs[0].outputs[0].text
print(json.dumps(output))

Output (truncated by the max_tokens parameter)

"{\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"

Notes: With outlines it generates correctly; lm-format-enforcer does as well, though with a couple of extra spaces.
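
For reference, a completion that satisfies the schema above would look something like the following (field values are illustrative only):

{"location": "San Francisco, CA", "unit": "celsius"}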

@wallashss wallashss requested a review from mgoin as a code owner February 4, 2025 18:18

github-actions bot commented Feb 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

Member

@russellb russellb left a comment

Is there a good reason we shouldn't just make any_whitespace=False the default? It seems like that should be fine for all cases and will ensure this problem doesn't happen without having to find the knob.

Member

mgoin commented Feb 4, 2025

Is this something you could control with just the knowledge of whether we are using an HF or Mistral tokenizer?

@wallashss
Contributor Author

Thank you @russellb and @mgoin for the quick feedback. Let's discuss.

Is there a good reason we shouldn't just make any_whitespace=False the default? It seems like that should be fine for all cases and will ensure this problem doesn't happen without having to find the knob.

Good point. From the vLLM perspective, IMO setting any_whitespace=False by default is more backward compatible, since this change in xgrammar comes with a regression that we found in our tests. But according to this [xgrammar PR](mlc-ai/xgrammar#123), some models can have issues without it too, and vLLM upgraded xgrammar with this new behavior. I thought it would make sense not to switch behaviors back and forth too much.

Is this something you could control with just the knowledge of whether we are using an HF or Mistral tokenizer?

I thought about that too, but I am not sure whether this is an issue only for Mistral.

My intention with this variable is to have an option that fixes the issue we found in our environment without risking further regressions with other models or scenarios. It might also be a solution for similar cases, and I guess we could wait to see whether the community reports related bugs before changing the default behavior. We should also consider this for V1, which AFAIK will have xgrammar as the default guided decoding backend and where this option is not yet implemented.
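
For context on the tokenizer-based alternative discussed above, a rough sketch of how the decision could key off tokenizer_mode instead of an env var; the helper name and wiring are purely illustrative and not part of this PR.

def resolve_any_whitespace(tokenizer_mode: str) -> bool:
    # Illustrative assumption: Mistral-format tokenizers are where flexible
    # whitespace was observed to loop on "\n"; everything else keeps xgrammar's default.
    return tokenizer_mode != "mistral"

The open question in the thread is whether the problem is really specific to Mistral, which is why an explicit opt-out knob is proposed instead of hard-coding a tokenizer check.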

mergify bot commented Feb 5, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @wallashss.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 5, 2025