[Frontend] Adding the "User Defined Custom Tool Calling" parser for the Llama models #12752
+381
−1
Description
The current Llama tool parser in vLLM follows the JSON-based tool calling procedure described by Meta. However, the same website describes another tool calling strategy: User Defined Custom Tool Calling.
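To make the difference between the two strategies concrete, here is a minimal sketch of how each output style could be parsed. It assumes the output shapes shown in Meta's Llama 3.1 prompt-format documentation (a bare JSON object versus a `<function=name>{...}</function>` wrapper). The function names, the example tool `get_weather`, and the regex are hypothetical and are not taken from this PR.

```python
import json
import re

# Example model outputs, modeled on Meta's Llama 3.1 prompt-format docs.
# The tool name and arguments are made up for illustration.
JSON_STYLE = '{"name": "get_weather", "parameters": {"city": "Paris"}}'
CUSTOM_STYLE = '<function=get_weather>{"city": "Paris"}</function>'

# Hypothetical pattern; the regex used by the PR's parser may differ.
FUNCTION_CALL_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)


def parse_json_style(text):
    """Parse a JSON-based tool call into (name, arguments)."""
    obj = json.loads(text)
    return obj["name"], obj["parameters"]


def parse_custom_style(text):
    """Parse a user-defined custom tool call into (name, arguments).

    Returns None when the text contains no complete function call.
    """
    match = FUNCTION_CALL_RE.search(text)
    if match is None:
        return None
    name, raw_args = match.groups()
    return name.strip(), json.loads(raw_args)
```

In the custom style, the function name sits outside the JSON payload, so the name can be read before any of the arguments have been parsed.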
The gain is substantial: after testing this approach as a plugin on a private function calling benchmark (more than 120 different scenarios tested with a set of 30 complex and lengthy fintech tool definitions), I observed significantly higher function-calling accuracy compared to the current JSON-based tool parser. I also ran some experiments on the BFCL benchmark (AST non-live bench) and observed the same kind of improvement:
This PR introduces a new Llama3UserDefinedCustomToolParser class that extends the ToolParser base class. The new parser adds streaming support when using custom tools with the Llama models. It extracts tool calls and their arguments from the model's response in streaming mode as well, enabling real-time processing of tool calls.
The flow looks like this:
Main Changes
The Llama3UserDefinedCustomToolParser class is added to handle streaming tool calls for Llama models.
vllm/examples/tool_chat_template_llama3.1_usr_def_tool_call.jinja
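As a rough illustration of what streaming extraction involves, here is a minimal sketch. It is not the PR's implementation and does not use vLLM's ToolParser interface. It assumes the `<function=name>{...}</function>` format from Meta's documentation and a single tool call per response. The class `StreamingSketch`, its `feed` method, and the event tuples are all hypothetical.

```python
from dataclasses import dataclass

START = "<function="
END = "</function>"


@dataclass
class StreamingSketch:
    """Hypothetical incremental parser for one <function=...> call."""

    buffer: str = ""
    name_sent: bool = False
    args_sent: int = 0  # number of argument characters already emitted
    done: bool = False

    def feed(self, delta):
        """Consume one text chunk; return ("name", str) / ("args", str) events."""
        if self.done:
            return []
        self.buffer += delta
        events = []
        start = self.buffer.find(START)
        if start == -1:
            return events  # start tag not seen (or not complete) yet
        header_end = self.buffer.find(">", start)
        if header_end == -1:
            return events  # function name not complete yet
        if not self.name_sent:
            events.append(("name", self.buffer[start + len(START):header_end]))
            self.name_sent = True
        args = self.buffer[header_end + 1:]
        end = args.find(END)
        if end != -1:
            args = args[:end]
            self.done = True
        else:
            # Hold back a tail that might be the beginning of the end tag.
            for k in range(min(len(END) - 1, len(args)), 0, -1):
                if END.startswith(args[-k:]):
                    args = args[:-k]
                    break
        new_args = args[self.args_sent:]
        if new_args:
            events.append(("args", new_args))
            self.args_sent = len(args)
        return events
```

Feeding the chunks of a response one by one yields the function name as soon as the header closes, followed by argument fragments that concatenate to the full JSON payload. Holding back a possible partial end tag keeps pieces of `</function>` from leaking into the streamed arguments.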
Remarks
This is my first PR on the vLLM project, and I believe there are still a few things I need guidance on: