Qwen3Coder unary output parser #3664

atobiszei · 2025-09-22T08:58:30Z

No description provided.

Copilot

Pull Request Overview

This PR implements a Qwen3Coder unary output parser to handle tool calls in the Qwen3Coder model format. The parser supports a unique XML-style format for function calls with parameters.

Key changes:

- Adds a new Qwen3Coder tool parser with XML-style tag parsing
- Updates all existing parsers to accept tool schemas as a parameter
- Modifies the parser interface to include tool schema information
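
To make the "XML-style tag parsing" concrete, here is a minimal sketch of extracting complete tool calls from model output. It is not the parser from this PR: the tag layout (`<tool_call>`, `<function=NAME>`, `<parameter=KEY>`) is an assumption based on the publicly described Qwen3-Coder output format, and `SketchToolCall`, `between`, `trim` and `parseToolCalls` are hypothetical names. Unfinished tool calls are dropped, and whitespace around parameter values is trimmed, which is also an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified result type; not the ToolCall struct from the PR.
struct SketchToolCall {
    std::string name;
    std::vector<std::pair<std::string, std::string>> parameters;  // raw text values
};

// Returns the text between `open` and `close`, searching from `pos`, and moves
// `pos` past `close`. Returns nullopt when either tag is missing.
static std::optional<std::string> between(const std::string& text, const std::string& open,
                                          const std::string& close, std::size_t& pos) {
    std::size_t start = text.find(open, pos);
    if (start == std::string::npos) return std::nullopt;
    start += open.size();
    const std::size_t end = text.find(close, start);
    if (end == std::string::npos) return std::nullopt;
    pos = end + close.size();
    return text.substr(start, end - start);
}

// Strips leading and trailing whitespace (assumed to be insignificant here).
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parses every complete <tool_call>...</tool_call> block; a block without a
// closing tag is never emitted. The closing </function> tag is not validated.
std::vector<SketchToolCall> parseToolCalls(const std::string& output) {
    const std::string fnOpen = "<function=";
    const std::string paramOpen = "<parameter=";
    const std::string paramClose = "</parameter>";
    std::vector<SketchToolCall> calls;
    std::size_t pos = 0;
    while (auto block = between(output, "<tool_call>", "</tool_call>", pos)) {
        std::size_t nameStart = block->find(fnOpen);
        if (nameStart == std::string::npos) continue;
        nameStart += fnOpen.size();
        const std::size_t nameEnd = block->find('>', nameStart);
        if (nameEnd == std::string::npos) continue;
        SketchToolCall call;
        call.name = block->substr(nameStart, nameEnd - nameStart);
        std::size_t cursor = nameEnd + 1;
        for (;;) {
            std::size_t keyStart = block->find(paramOpen, cursor);
            if (keyStart == std::string::npos) break;
            keyStart += paramOpen.size();
            const std::size_t keyEnd = block->find('>', keyStart);
            if (keyEnd == std::string::npos) break;
            const std::size_t valueEnd = block->find(paramClose, keyEnd + 1);
            if (valueEnd == std::string::npos) break;
            call.parameters.emplace_back(block->substr(keyStart, keyEnd - keyStart),
                                         trim(block->substr(keyEnd + 1, valueEnd - keyEnd - 1)));
            cursor = valueEnd + paramClose.size();
        }
        calls.push_back(std::move(call));
    }
    return calls;
}
```

Because the value is taken up to the literal `</parameter>`, a value that itself contains angle brackets is kept verbatim.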

Reviewed Changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.hpp` | Header for new Qwen3Coder tool parser with XML-style tag definitions |
| `src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.cpp` | Implementation of Qwen3Coder parser with state machine for XML tag parsing |
| `src/test/llm/output_parsers/qwen3coder_output_parser_test.cpp` | Comprehensive test suite for the new parser functionality |
| `src/llm/io_processing/base_output_parser.hpp` | Updates parser interface to include tool schemas parameter |
| `src/llm/io_processing/output_parser.cpp` | Registers new Qwen3Coder parser and passes tool schemas to parsers |
| Multiple parser files | Updates existing parsers to accept tool schemas parameter |
| Multiple test files | Updates test calls to include tool schemas parameter |

Comments suppressed due to low confidence (2)

src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.cpp:1

Commented-out member variable should be removed if not needed, or implemented if it serves a purpose.

//*****************************************************************************

src/llm/BUILD:144

Removed line that may be needed for proper Python compilation options. Verify this removal doesn't break Python integration.

    additional_copts = COPTS_PYTHON

mzegla · 2025-10-02T06:19:38Z

src/llm/BUILD

            "io_processing/gptoss/reasoning_parser.hpp",
            "io_processing/gptoss/tool_parser.hpp",
            "io_processing/gptoss/harmony.hpp",
+            "io_processing/qwen3coder/qwen3coder_tool_parser.hpp",


This naming breaks the convention. The file is in the "qwen3coder" directory, so adding the "qwen3coder_" prefix seems redundant.

It's for logging purposes: spdlog prints the filename, but if the tool parser sits inside the qwen3coder dir, the directory does not show up there. I would rename all tool parsers this way for faster debugging.

mzegla · 2025-10-02T06:21:21Z

src/llm/apis/openai_completions.cpp

    choice.AddMember("logprobs", Value(), allocator);
    if (endpoint == Endpoint::CHAT_COMPLETIONS) {
        if (outputParser != nullptr) {
+            // FIXME need tool maps for streaming


Is it still relevant?

Not sure if it's a GitHub issue, but I still see that FIXME here.

mzegla · 2025-10-02T06:24:49Z

src/llm/io_processing/base_output_parser.hpp

    std::string id;
    std::string name;
-    std::string arguments;
+    std::string arguments;  // JSON "{"a":1, "b":"SOME_STRING"}" TODO rename to know in context that's JSON


Regarding the "TODO", I would prefer to keep that name, as it maps exactly to the OpenAI response field we need to fill.
Maybe we could have a comment explaining that the ToolCall struct is supposed to mirror the tool call structure in the OpenAI API.

OK, I see your point. It was initially confusing to me that the arguments field holds JSON wrapped as a string, so I would at least keep the first part of the comment.
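
The "JSON wrapped as a string" point can be illustrated with a small sketch. The three fields come from the diff above; `escapeJsonString` and `toFunctionJson` are hypothetical helpers written only for this illustration, and the emitted object is a simplified fragment rather than the full response the server produces.

```cpp
#include <cstdio>
#include <string>

// Fields as shown in the diff above.
struct ToolCall {
    std::string id;
    std::string name;
    std::string arguments;  // JSON object serialized as a string, e.g. {"a":1,"b":"SOME_STRING"}
};

// Hypothetical helper: escapes a raw string for use inside a JSON string literal.
std::string escapeJsonString(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (unsigned char c : raw) {
        switch (c) {
        case '"': out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        default:
            if (c < 0x20) {
                char buf[7];  // "\uXXXX" plus terminating null
                std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}

// Hypothetical helper: renders a simplified "function" fragment. The arguments
// field is emitted as a JSON *string*, so the inner JSON gets escaped.
std::string toFunctionJson(const ToolCall& call) {
    return "{\"name\":\"" + escapeJsonString(call.name) +
           "\",\"arguments\":\"" + escapeJsonString(call.arguments) + "\"}";
}
```

The escaped inner quotes in the result are the reason the field can look confusing at first glance: the client has to parse `arguments` a second time to get the actual object.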

mzegla · 2025-10-02T08:56:32Z

src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.cpp

+/*
+{"type":"response.output_item.added","response_id":"resp_1234xyz","output_index":0,"item":{"type":"function_call","id":"fc_1234xyz","call_id":"call_1234xyz","name":"get_weather","arguments":""}}
+{"type":"response.function_call_arguments.delta","response_id":"resp_1234xyz","item_id":"fc_1234xyz","output_index":0,"delta":"{\""}
+{"type":"response.function_call_arguments.delta","response_id":"resp_1234xyz","item_id":"fc_1234xyz","output_index":0,"delta":"location"}
+{"type":"response.function_call_arguments.delta","response_id":"resp_1234xyz","item_id":"fc_1234xyz","output_index":0,"delta":"\":\""}
+{"type":"response.function_call_arguments.delta","response_id":"resp_1234xyz","item_id":"fc_1234xyz","output_index":0,"delta":"Paris"}
+{"type":"response.function_call_arguments.delta","response_id":"resp_1234xyz","item_id":"fc_1234xyz","output_index":0,"delta":","}
+{"type":"response.function_call_arguments.delta","response_id":"resp_1234xyz","item_id":"fc_1234xyz","output_index":0,"delta":" France"}
+{"type":"response.function_call_arguments.delta","response_id":"resp_1234xyz","item_id":"fc_1234xyz","output_index":0,"delta":"\"}"}
+{"type":"response.function_call_arguments.done","response_id":"resp_1234xyz","item_id":"fc_1234xyz","output_index":0,"arguments":"{\"location\":\"Paris, France\"}"}
+{"type":"response.output_item.done","response_id":"resp_1234xyz","output_index":0,"item":{"type":"function_call","id":"fc_1234xyz","call_id":"call_1234xyz","name":"get_weather","arguments":"{\"location\":\"Paris, France\"}"}}
+*/
+// example1 {"location":"San Francisco"}
+// example1 {"city":"San Francisco","state":"CA", "length":5, "is_day":true, "temperatures":[5,6,7], "details":{"humidity":80,"condition":"sunny"}}]}
+// index is toolCallId


Is this a leftover? It looks like the Responses API and shows deltas that your parser will not return (arguments broken into smaller pieces).

I kept this initially to underline that the qwen3coder parser does not do that incremental sending, but it may be confusing, so I will remove it.

mzegla · 2025-10-02T08:56:57Z

src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.cpp

+static std::string documentToString(const rapidjson::Document& doc) {
+    rapidjson::StringBuffer buffer;
+    rapidjson::Writer<rapidjson::StringBuffer> writer(buffer);
+    doc.Accept(writer);
+    return buffer.GetString();
+}


Could be moved to some common utilities place

mzegla · 2025-10-02T08:58:16Z

src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.cpp

+std::optional<rapidjson::Document> Qwen3CoderToolParser::sendFullDelta(std::optional<ToolCalls>& toolCallsOpt) {
+    auto& toolCalls = toolCallsOpt.value();
+    if (toolCalls.size() != 1) {
+        SPDLOG_ERROR("For streaming we expected one tool call, got: {}", toolCalls.size());


Logging as error and if we want to treat it as error then maybe we should return nullopt here or throw?

We shouldn't crash OVMS if we have an error in the tool parser, as this is a potential attack surface. I want this at error level so that such occurrences are easier to track. Alternatively, we may consider returning an error in all parsers, but that would be a bigger overhaul.

Are you sure throwing here crashes OVMS? I would say it will be some generic "Response generation error" on the client side only.
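
For reference, the "return nullopt" option mentioned in this thread could look roughly like the sketch below. It is only an illustration: `SketchToolCall` and `buildFullDelta` are hypothetical stand-ins, the returned string replaces the rapidjson document of the real code, and `std::cerr` stands in for the `SPDLOG_ERROR` call.

```cpp
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified tool call; not the type used in the PR.
struct SketchToolCall {
    std::string name;
    std::string arguments;
};

// Hypothetical stand-in for sendFullDelta: reports the problem and returns
// nullopt instead of throwing or continuing with unexpected input.
std::optional<std::string> buildFullDelta(const std::optional<std::vector<SketchToolCall>>& toolCallsOpt) {
    if (!toolCallsOpt.has_value()) {
        return std::nullopt;  // nothing to send yet
    }
    const auto& toolCalls = *toolCallsOpt;
    if (toolCalls.size() != 1) {
        std::cerr << "For streaming we expected one tool call, got: " << toolCalls.size() << '\n';
        return std::nullopt;  // logged as an error, but the request keeps going
    }
    return toolCalls.front().name + ":" + toolCalls.front().arguments;
}
```

With this shape the caller decides what an empty result means, so a malformed model output does not have to propagate as an exception.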

mzegla · 2025-10-02T09:02:22Z

src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.cpp

+
+std::optional<rapidjson::Document> Qwen3CoderToolParser::sendFirstDeltaIfNeeded(const std::string& toolCallName) {
+    if (this->returnedFirstDeltas.size() != this->returnedCompleteDeltas.size()) {
+        SPDLOG_TRACE("Skipping first delta, already sent for current function, fi:{} co:{}", returnedFirstDeltas.size(), returnedCompleteDeltas.size());


The fi:{} co:{} part is not understandable without looking into the code.

mzegla · 2025-10-02T09:05:10Z

src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.cpp

+}
+
+std::optional<rapidjson::Document> Qwen3CoderToolParser::sendFirstDeltaIfNeeded(const std::string& toolCallName) {
+    if (this->returnedFirstDeltas.size() != this->returnedCompleteDeltas.size()) {


Could it be this->returnedFirstDeltas.size() > this->returnedCompleteDeltas.size() ?
That would better indicate the connection between first deltas and complete deltas (like we always have either one more first delta returned when we already returned it for current function or the number is equal when we still wait for the first delta for current function). Is that right?

I will change it to the comparison returnedFirst == (returnedComplete + 1).
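
The invariant discussed here can be sketched as follows. `DeltaTracker` and its method names other than `sendFirstDeltaIfNeeded` are hypothetical, and storing the deltas as strings is an assumption; only the counting relation between the two containers is taken from the thread.

```cpp
#include <string>
#include <vector>

// Hypothetical bookkeeping for streamed tool call deltas.
class DeltaTracker {
public:
    // True when the first delta of the current function was already sent but
    // its complete delta was not: first == complete + 1.
    bool firstDeltaPending() const {
        return returnedFirstDeltas.size() == returnedCompleteDeltas.size() + 1;
    }

    // Records a first delta unless one is already outstanding.
    // Returns true when the delta was recorded.
    bool sendFirstDeltaIfNeeded(const std::string& toolCallName) {
        if (firstDeltaPending()) {
            return false;  // already sent for the current function
        }
        returnedFirstDeltas.push_back(toolCallName);
        return true;
    }

    // Records the complete delta; only valid after a first delta was sent.
    bool sendCompleteDelta(const std::string& arguments) {
        if (!firstDeltaPending()) {
            return false;
        }
        returnedCompleteDeltas.push_back(arguments);
        return true;
    }

private:
    std::vector<std::string> returnedFirstDeltas;
    std::vector<std::string> returnedCompleteDeltas;
};
```

Under these rules the two sizes are either equal (waiting for the first delta of the next function) or differ by exactly one, which is what the explicit `== complete + 1` comparison documents.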

mzegla · 2025-10-02T09:20:55Z

src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.hpp

+        return toolsStartTag;
+    }
+    const std::unordered_set<std::string>& getSpecialParsingStartTags() const override {
+        static const std::unordered_set<std::string> specialParsingStartTags = {toolsStartTag};


This should probably be empty. If qwen3 coder has only one way (one tag) to start tool parsing, it should not be repeated in special tags.

mzegla · 2025-10-02T09:27:43Z

src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.hpp

+    }
+    // Tools calls are expected to be the last part of the content, so we do not specify an end tag.
+    const std::string& getParsingEndTag() const override {
+        return toolsEndTag;


I think we should return an empty string here as well. Your end tag is also the end tag for a single tool call, and in that context it acts as the parsing end tag, meaning the outer parser will switch phase (leave tool parsing mode) after seeing the string you return here. It works because, with multiple tool calls, the opening tag comes right after it, so the outer parser goes back to tool call processing; nevertheless, we should change that.

getParsingEndTag() should return tag that indicates end of parser work (like </think> in reasoning parser), not a separator between tool calls. If there is no definite tag that indicates "no more tool calls", we leave it empty and assume all remaining output is tool calls.
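
A rough sketch of the phase switching described above, under simplifying assumptions: `PhaseTracker` is a hypothetical stand-in for the outer parser, tags are assumed to arrive as whole chunks, and the `<tool_call>` tag values used in the usage note are assumed rather than taken from the PR.

```cpp
#include <string>

enum class Phase { CONTENT, TOOL_CALLS };

// Hypothetical outer-parser state: decides which phase a chunk belongs to.
struct PhaseTracker {
    std::string startTag;
    std::string endTag;  // empty means "tool calls run until the end of output"
    Phase phase = Phase::CONTENT;

    // Updates and returns the phase after seeing `chunk`.
    Phase feed(const std::string& chunk) {
        if (phase == Phase::CONTENT) {
            if (!startTag.empty() && chunk.find(startTag) != std::string::npos) {
                phase = Phase::TOOL_CALLS;
            }
        } else if (!endTag.empty() && chunk.find(endTag) != std::string::npos) {
            phase = Phase::CONTENT;  // only happens when an end tag is defined
        }
        return phase;
    }
};
```

With `PhaseTracker{"<tool_call>", "</tool_call>"}` the tracker drops back to `CONTENT` after every single tool call and relies on the next opening tag to re-enter tool parsing; with an empty end tag it stays in `TOOL_CALLS` for the rest of the output, which is the behavior requested in this comment.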

mzegla · 2025-10-06T09:34:29Z

src/llm/apis/openai_completions.cpp

                            parametersIt->value.Accept(writer);
                            std::string parametersStr = buffer.GetString();
-                            request.toolNameSchemaMap[nameIt->value.GetString()] = parametersStr;
+                            std::pair<rapidjson::Value*, std::string> schemaReprs = {&parametersIt->value, std::move(parametersStr)};


I think it might be clearer and more extensible if we have it in a struct like:

struct ToolSchema: rapidjson::Value& rapidjsonRepresentation; std::string stringRepresentation; ... (nlohmann::json nlohmannRepresentation) etc.
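
Spelled out, the suggested struct might look like the sketch below. It is an assumption-laden illustration: `JsonValueStub` is a placeholder for `rapidjson::Value` so that the snippet compiles without the library, the member names are shortened, and `ToolSchemaMap` and `registerToolSchema` are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Placeholder for rapidjson::Value; used only to keep the sketch self-contained.
struct JsonValueStub {};

// Named members instead of std::pair's first/second.
struct ToolSchema {
    const JsonValueStub* parsed = nullptr;  // non-owning; the request document must outlive it
    std::string text;                       // the same schema serialized to a string
};

using ToolSchemaMap = std::unordered_map<std::string, ToolSchema>;

// Hypothetical helper: stores or replaces the schema for a tool name.
// Returns true when the name was newly inserted, false when it was overwritten.
bool registerToolSchema(ToolSchemaMap& schemas, const std::string& toolName,
                        const JsonValueStub& parsed, std::string text) {
    return schemas.insert_or_assign(toolName, ToolSchema{&parsed, std::move(text)}).second;
}
```

A pointer is used here rather than the reference from the comment because a reference member would make the struct non-assignable, which is awkward for values stored in a map. Further representations can be added as extra members without touching call sites that only use `parsed` or `text`.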

mzegla · 2025-10-06T09:35:34Z

src/llm/apis/openai_completions.cpp

    choice.AddMember("logprobs", Value(), allocator);
    if (endpoint == Endpoint::CHAT_COMPLETIONS) {
        if (outputParser != nullptr) {
+            // FIXME need tool maps for streaming


Not sure if it's a GitHub issue, but I still see that FIXME here.

mzegla · 2025-10-06T09:36:43Z

src/llm/io_processing/output_parser.cpp


    bool reasoningParserExistsAndSupportsStreaming = reasoningParser && !reasoningParser->getParsingStartTag().empty() && !reasoningParser->getParsingEndTag().empty();
-    bool toolParserExistsAndSupportsStreaming = toolParser && !toolParser->getParsingStartTag().empty();
+    bool toolParserExistsAndSupportsStreaming = toolParser && !toolParser->getParsingStartTag().empty();  // FIXME why not check for parsingEntTag not empty?


Please remove the FIXME comment if it is clear.

mzegla · 2025-10-06T09:44:31Z

src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.cpp

+    ToolsParameterTypeMap_t toolsParametersTypes;
+    for (const auto& [toolName, schemaPair] : toolsSchemas) {
+        SPDLOG_TRACE("Creating tools parameters types for tool: {}, schema: {}", toolName, schemaPair.second);
+        toolsParametersTypes.emplace(toolName, parseToolSchema(toolName, *schemaPair.first));


Are you sure schemaPair is always valid at this point? Shouldn't we check it before access here?

Yes, it's produced here:
https://github.com/openvinotoolkit/model_server/pull/3664/files#diff-dc1e3d8f1c59d392baedaaa031edd2ef7bbb062fa25cd68fe99b00011f9f770bR343
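
As background on why a per-tool parameter type map is useful: XML-style parameter values arrive as plain text, so a value such as `12345` is ambiguous between a number and a string id. The sketch below shows one way the schema type could drive the serialization. It is an assumption, not the logic of `parseToolSchema`; `renderArgument` and `escapeQuotes` are hypothetical, the escaping covers only quotes and backslashes, and non-string values are assumed to already be valid JSON.

```cpp
#include <string>

// Minimal escaping for the sketch: only quotes and backslashes are handled.
static std::string escapeQuotes(const std::string& raw) {
    std::string out;
    for (char c : raw) {
        if (c == '"' || c == '\\') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Hypothetical helper: renders a raw parameter value as a JSON value based on
// the type declared in the tool schema.
std::string renderArgument(const std::string& rawValue, const std::string& schemaType) {
    if (schemaType == "string") {
        return "\"" + escapeQuotes(rawValue) + "\"";  // string ids stay strings
    }
    return rawValue;  // numbers, booleans, arrays, objects: passed through unchanged
}
```

For example, `renderArgument("12345", "string")` keeps the id quoted, while the same text with type `"integer"` is emitted as a bare number.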

mzegla · 2025-10-06T09:48:01Z

src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.cpp

+    this->lazyFillInitToolParamatersTypsMap();
+    auto toolCallsOpt = this->streamParser.parseChunk(parsedOutput.content);
+    if (toolCallsOpt.has_value()) {
+        // TODO do we want to support not ending in content state?


Other parsers do not return tool calls if they are not properly finished (in unary mode). We could stick to that for now, as changing this behavior would need to be done for all parsers.

mzegla · 2025-10-06T09:59:31Z

src/test/llm/output_parsers/qwen3coder_output_parser_test.cpp

+        // For Qwen3 model we use hermes3 tool parser (due to the same format of generated tool calls) and qwen3 reasoning parser
+        outputParser = std::make_unique<OutputParser>(*qwen3Tokenizer, "qwen3coder", "", toolsSchemas);
+    }
+    std::tuple<ov::Tensor, std::vector<int64_t>, ParsedOutput> doTheWork(const std::string& input) {


Can we have more precise name here?

mzegla · 2025-10-06T10:00:33Z

src/test/llm/output_parsers/qwen3coder_output_parser_test.cpp

+    }
+};
+TEST_F(Qwen3CoderOutputParserTest, Parse1ToolCall1Function1ArgumentTagsNewline) {
+    std::string input = R"(io_processing/hermes3/generation_config_builder.cpp


Is this hermes3 path intended here?

Added by mistake

mzegla · 2025-10-06T10:01:02Z

src/test/llm/output_parsers/qwen3coder_output_parser_test.cpp

+    EXPECT_EQ(parsedOutput.toolCalls[0].arguments, "{\"arg1\": \"<value=abc>value1</value>\"}");
+    EXPECT_EQ(parsedOutput.toolCalls[0].id.empty(), false);
+}
+// FIXME check if two tool calls is a vali for outputparser as well not only for parser imple


change to TODO?

Test added.

mzegla · 2025-10-06T10:09:17Z

src/test/llm/output_parsers/qwen3coder_output_parser_test.cpp

+    // since unary reuses streaming we don't need to test for partial tool calls
+    // if we don't get closing tag we don't emit tool call
+    int i = -1;
+    // FIXME add content in between tool_calls and test what happens


FIXME -> TODO?
Also can we have case with more than one parameter?

The FIXME is already done; I will add a function with a second parameter.

Copilot

Pull Request Overview

Copilot reviewed 35 out of 35 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.cpp:1

Corrected spelling of 'prametersIt' to 'parametersIt'.

//*****************************************************************************

atobiszei added the WIP Do not merge until resolved label Sep 22, 2025

atobiszei force-pushed the atobisze_qwen3_tool_parser branch from 18b9c2c to f9b33d7 Compare September 22, 2025 08:59

atobiszei requested a review from Copilot September 22, 2025 10:43

Copilot AI reviewed Sep 22, 2025

atobiszei force-pushed the atobisze_qwen3_tool_parser branch from bba2dd1 to e095f8b Compare September 29, 2025 13:23

atobiszei commented Oct 1, 2025 on src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.hpp (outdated)

atobiszei commented Oct 1, 2025 on src/llm/apis/openai_completions.hpp (outdated)

atobiszei commented Oct 1, 2025 on src/llm/io_processing/qwen3coder/qwen3coder_tool_parser.cpp (outdated)

mzegla reviewed Oct 2, 2025

atobiszei added 18 commits October 2, 2025 11:19

Begin

0428b48

Checkpoint

e6f0c21

WIP

58fc69d

TODO: Extract param type from request as discovered by bfcl that otherwise we can't be sure eg with string ids

Qwen3Coder bfcl - 0.95 simple/multiple

d6cc713

0.825 parallel multiple

Test fixes

42d8e4b

Self-review

3b8b620

Self-review p2

312eb9f

Spell fix

198aef9

Streaming working

1b80de7

Accuracy on BFCL simple multiple as in unary

Style fixes

895330a

Add handling arguments in chat template in string format

c1a9c54

BFCL: parallel_multiple. 🎯 Accuracy: 0.83 simple. 🎯 Accuracy: 0.9575 multiple. 🎯 Accuracy: 0.935

Fix rebase

ea14b3b

Fix Qwen3CoderBfcl test

145726e

Unary & stream unification

47e0087

Refactor cd

36bf91d

Refactor cd2

8dafe14

Logging fixes

42d6fec

Self-review

55e4cc0

atobiszei force-pushed the atobisze_qwen3_tool_parser branch from 6aebb03 to 55e4cc0 Compare October 2, 2025 09:19

mzegla reviewed Oct 2, 2025

atobiszei removed the WIP Do not merge until resolved label Oct 2, 2025

atobiszei and others added 6 commits October 2, 2025 15:10

Review fixes p1

de5a859

Merge branch 'main' into atobisze_qwen3_tool_parser

900bb90

Skip double tool schema parsing

84ac218

Create rapidjson utils

cbe4fab

Extend streaming test with another tool call

e70123f

Output parsers build split

1023820

atobiszei force-pushed the atobisze_qwen3_tool_parser branch from dffd55e to 1023820 Compare October 3, 2025 13:29

atobiszei added 2 commits October 3, 2025 15:32

Merge remote-tracking branch 'origin/main' into atobisze_qwen3_tool_p…

830869e

…arser

Fix BUILD file

33cd9f2

atobiszei requested a review from dtrawins October 3, 2025 13:57

Fix unconvential using

9b8c2a3

mzegla reviewed Oct 6, 2025

mzegla requested a review from Copilot October 6, 2025 10:09

Copilot AI reviewed Oct 6, 2025

src/test/llm/output_parsers/qwen3coder_output_parser_test.cpp (outdated)

src/test/llm/output_parsers/qwen3coder_output_parser_test.cpp (outdated)

src/llm/io_processing/output_parser.cpp (outdated)

Review fixes

e648478

atobiszei force-pushed the atobisze_qwen3_tool_parser branch from fbd7ae7 to e648478 Compare October 6, 2025 14:12

dtrawins approved these changes Oct 6, 2025
