
tool-call: fix DeepSeek R1 Qwen distills #11607

Open: wants to merge 55 commits into master
Conversation


@ochafik ochafik commented Feb 3, 2025

Fixes tool-call support for DeepSeek-R1-Distill-Qwen-7B & 32B (follow-up to #9639), and adds `<think>thoughts</think>` parsing.
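The `<think>…</think>` handling can be sketched roughly as follows. This is a simplified Python illustration of the idea, not the actual parser in this PR (which also has to cope with streaming and grammar-constrained output):

```python
import re

def split_reasoning(text: str):
    """Split a leading <think>...</think> block from a completion.

    Returns (thoughts, content); thoughts is None when no think block is present.
    """
    m = re.match(r"\s*<think>(.*?)</think>\s*", text, re.DOTALL)
    if m is None:
        return None, text
    return m.group(1).strip(), text[m.end():]

thoughts, content = split_reasoning("<think>Need the weather tool.</think>It is sunny.")
```

In this sketch `thoughts` would feed the new API field and `content` the regular message content.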

(Split off Minja changes in #11641, will declutter diff once merged)

  • Had to work around the official template:
    • It doesn't describe the available tools, and the backfill done by Minja wasn't phrased well enough (for the 7B model), so I've added autogenerated tool call examples to minja's revamped "polyfill" behaviour (using a delta template eval).
      sync: minja #11641
    • It ignores message.tool_calls if message.content is not null; updated / tested the server output accordingly (better OpenAI compliance)
    • After a tool result, it leaves the prompt hanging on a <|tool▁outputs▁end|> instead of ending with <|end▁of▁sentence|><|Assistant|>.
      • Hacked a workaround so the default template now works well with this branch
      • Added / documented better template (models/templates/llama-cpp-deepseek-r1.jinja)
  • Both the 7B & 32B models seem to take liberties with their tool call start tag, so I'm accepting variations of the syntax (which then triggers the lazy grammar / full compliance)
  • I've added a "thoughts" field for <think> content to the API similar to the "tool_plan" output of tool-call: support Command R7B (+ return tool_plan "thoughts" in API) #11585
    • Note: thoughts from previous messages are explicitly stripped from the prompt by the template, and we don't try to force them back in (the template wouldn't render tool calls if there was any content anyway, and while stripping thoughts locally resets the KV cache, losing the hot tokens of the tool calls, it saves a lot of tokens over the course of a long chat)
  • Added the Q4_K_M quant to some (but not all) slow server tests (had to tell it not to overthink, which is a bit... ironic).
  • Added slow tool result server tests (checking models make some use of tool call results, which some struggle a bit with)
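The lenient start-tag matching mentioned above could look something like this sketch. The variant spellings accepted here (ASCII underscores or spaces instead of the "▁" glyphs, optional pipes, singular "call") are illustrative assumptions, not the exact set this PR's grammar accepts:

```python
import re

# Hypothetical lenient pattern for a DeepSeek-style tool-call opening marker.
TOOL_CALL_START = re.compile(r"<(?:\||｜)?tool[▁_ ]?calls?[▁_ ]?begin(?:\||｜)?>")

def finds_tool_call(text: str) -> bool:
    """Return True if a (possibly malformed) tool-call opening tag appears."""
    return TOOL_CALL_START.search(text) is not None
```

Once such a tag is seen, the lazy grammar can kick in and constrain the rest of the call to the strict syntax.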
```shell
# Please try with and w/o the chat template override and report back on your results :-)
llama-server --jinja -fa -hf bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M \
  --chat-template-file models/templates/llama-cpp-deepseek-r1.jinja
llama-server --jinja -fa -hf bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF:Q6_K_L \
  --chat-template-file models/templates/llama-cpp-deepseek-r1.jinja
```

TODOs:

Possible follow ups

@github-actions github-actions bot added testing Everything test related examples python python script changes server labels Feb 3, 2025
@ochafik ochafik mentioned this pull request Feb 4, 2025
@ochafik ochafik changed the title tool-call: fix DeepSeek R1 Qwen distill (WIP) tool-call: fix DeepSeek R1 Qwen distill Feb 4, 2025
@ochafik ochafik marked this pull request as ready for review February 4, 2025 04:57
@ochafik ochafik requested a review from ngxson as a code owner February 4, 2025 04:57
@ochafik ochafik changed the title tool-call: fix DeepSeek R1 Qwen distill tool-call: fix DeepSeek R1 Qwen distills Feb 4, 2025