common: Yet another add GLM-4.5 tool calling support #15904
base: master
Conversation
Got a runtime error:
It looks like it is happening because of the `<10` characters in the generated text during function-call parsing. Is it perhaps trying to parse `<10` as the beginning of an XML tag?
@sbrnaderi From the log you provided, there isn't anything unexpected. The JSON parse error occurs because I first try to parse arg_value as JSON; if that fails, it is parsed as a raw string. The failure log cannot be suppressed due to the design of llama.cpp.
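The JSON-first fallback described above can be sketched roughly as follows (a minimal Python illustration only, not the PR's actual C++ code; `parse_arg_value` is a hypothetical name):

```python
import json

def parse_arg_value(raw: str):
    """Try to parse a tool-call argument value as JSON first;
    if that fails, fall back to treating it as a raw string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw
```

Under this scheme a value like `<10` fails the JSON parse (producing the logged error) and is then kept as the raw string `"<10"`, so the parse error alone does not indicate a real problem.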
@hksdpc255 So you are trying to convert the XML format from the GLM model to JSON, but I think what goes wrong here is that the `<10` part of the text is recognised as an XML tag. No?
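One common way to avoid misreading literal text such as `<10` is to treat `<` as a tag opener only when it is followed by a valid tag name. This is a hedged Python sketch of that idea, not the PR's actual parser:

```python
import re

# A '<' only opens a tag if followed by an optional '/' and a name
# starting with a letter or underscore; '<10' fails this check and
# stays plain text.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")

def find_tags(text: str):
    """Return (name, is_closing) for every well-formed tag in text."""
    return [(m.group(2), m.group(1) == "/") for m in TAG_RE.finditer(text)]
```

For example, `find_tags("<arg_value>x<10</arg_value>")` reports only the two `arg_value` tags and leaves `<10` untouched.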
@sbrnaderi Would you be able to share more logs or your prompt? The current log you shared doesn't seem to show any problem, and additional details would help me figure out what's going wrong.
@sbrnaderi I believe your issue is fixed by the latest commit.
@hksdpc255 thanks, I will try your new commit. |
I'm running this PR with the supplied chat template and it is working 👍
This PR introduces an enhanced implementation of tool calling for GLM-4.5, building upon the existing contributions by @dhandhalyabhavik and @susmitds (see PR #15186).
Key improvements include:
Grammar-constrained tool-call outputs
The model’s tool-call messages are now rigorously enforced by a defined grammar, ensuring that generated calls are always well-formed and reliably parsed.
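A grammar of this kind could look roughly like the GBNF sketch below, assuming the GLM-4.5 `<tool_call>` / `<arg_key>` / `<arg_value>` tag layout; the actual rule names and structure in the PR may differ:

```gbnf
root       ::= "<tool_call>" func-name "\n" arg* "</tool_call>"
func-name  ::= [a-zA-Z0-9_-]+
arg        ::= "<arg_key>" text "</arg_key>" "\n" "<arg_value>" text "</arg_value>" "\n"
text       ::= [^<]*
```

Constraining sampling with such a grammar guarantees the closing tags and key/value pairing are always present, so the parser never sees a structurally malformed tool call.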
Streaming support for tool-call parsing
I have added streaming capabilities to the parser to handle tool-call messages as they’re generated. This enhancement enables more responsive and real-time interactions during inference.
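The incremental behaviour described above can be sketched as follows (an illustrative Python model only; the PR implements this inside llama.cpp's C++ chat parser, and `StreamingArgParser` is a hypothetical name):

```python
import re

ARG_RE = re.compile(r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.S)

class StreamingArgParser:
    """Buffer streamed chunks and emit each completed
    key/value pair exactly once, as soon as it is closed."""
    def __init__(self):
        self.buf = ""
        self.emitted = 0

    def feed(self, chunk: str):
        self.buf += chunk
        pairs = ARG_RE.findall(self.buf)
        new = pairs[self.emitted:]   # only pairs not yet reported
        self.emitted = len(pairs)
        return new
```

A pair split across chunks is held in the buffer until its closing tag arrives, which is what lets the server surface tool-call arguments while the model is still generating.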
Use this Jinja template while testing:
Although not yet implemented, I'm planning the following improvements:
- Patch the Jinja template in common_chat_params_init_glm_4_5 to make it compatible with the original Unsloth GGUF chat template, and potentially even with the official chat template.
- Add dedicated unit tests for grammar enforcement and streaming parsing.
Testing and feedback are welcome.
Suggested commit message after squash commits: