[Bugfix] Env var to disable xgrammar any_whitespace #12744
base: main
Conversation
Signed-off-by: Wallas Santos <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Is there a good reason we shouldn't just make `any_whitespace=False` the default? It seems like that should be fine for all cases and will ensure this problem doesn't happen without having to find the knob.
Is this something you could control with just the knowledge of whether we are using an HF or Mistral tokenizer?
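If that direction were taken, a minimal sketch of the check might look like the following. This is an illustration of the reviewer's suggestion, not code from this PR; it assumes vLLM's `MistralTokenizer` wrapper is a reliable signal for the problematic case.

```python
# Illustrative sketch only -- not this PR's actual code.
from vllm.transformers_utils.tokenizers import MistralTokenizer


def pick_any_whitespace(tokenizer) -> bool:
    # Mistral tokenizers are the known-problematic case for flexible
    # whitespace in xgrammar-compiled JSON grammars, so disable it there
    # and keep xgrammar's default everywhere else.
    return not isinstance(tokenizer, MistralTokenizer)
```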
Thank you @russellb and @mgoin for the quick feedback. Let's discuss.
Good point. From the vLLM perspective, IMO setting `any_whitespace=False` as the default should be fine.
Thought about that too, but I am not sure if this is an issue only for Mistral. My intention with this variable is to have an option that fixes/restores the issue we found in our environment without the risk of introducing regressions for other models or scenarios. It might be a solution for similar cases, and I guess we could wait to see whether the community reports related bugs before we change the default behavior. We should also consider this for V1, which AFAIK will have xgrammar as the default guided decoding backend and where this is not yet implemented.
This pull request has merge conflicts that must be resolved before it can be merged.
Mistral models + guided decoding with a JSON schema via xgrammar generate endless whitespace. This bug was introduced by this change in xgrammar. My proposal is to add an environment variable that disables flexible whitespace in guided decoding with a JSON schema, so that serving models like Mistral behaves as it did before the xgrammar change.
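A minimal sketch of what such a gate could look like, assuming xgrammar's `compile_json_schema` with its `any_whitespace` flag as the call site; the environment variable name and wiring here are illustrative assumptions, not necessarily this PR's exact diff:

```python
# Sketch of the proposed gate; the env var name and call site are
# illustrative assumptions, not this PR's exact diff.
import json
import os

import xgrammar as xgr

# Opt-out knob: set VLLM_XGRAMMAR_DISABLE_ANY_WHITESPACE=1 to pin
# xgrammar-compiled JSON grammars to a fixed whitespace layout.
DISABLE_ANY_WHITESPACE = os.getenv(
    "VLLM_XGRAMMAR_DISABLE_ANY_WHITESPACE", "0") == "1"


def compile_schema(compiler: xgr.GrammarCompiler, schema: dict):
    # any_whitespace=True is xgrammar's default; disabling it removes
    # the flexible-whitespace productions that the model loops on.
    return compiler.compile_json_schema(
        json.dumps(schema), any_whitespace=not DISABLE_ANY_WHITESPACE)
```

Making the knob opt-in preserves current behavior for everyone else, which matches the stated goal of fixing the regression without risking new ones.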
Minimal script to repro
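The original script was not captured on this page; a hypothetical reconstruction based on the description (the model name, prompt, and schema below are placeholders) might look like:

```python
# Hypothetical repro sketch; model, prompt, and schema are placeholders.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    guided_decoding_backend="xgrammar",
)
params = SamplingParams(
    max_tokens=256,
    guided_decoding=GuidedDecodingParams(json=schema),
)

outputs = llm.generate("Return a JSON object with a name field.", params)
# With the buggy xgrammar behavior, the output is whitespace until
# max_tokens is exhausted instead of a closed JSON object.
print(outputs[0].outputs[0].text)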
Output (truncated by the max tokens parameter)
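The captured output is not preserved on this page; illustratively, the generation opens the JSON object and then emits only whitespace until the token budget runs out:

{
    ... (whitespace repeats until max_tokens is reached)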