fix(inference): preserve reasoning_content across multi-turn conversations (#2800)#2806
Conversation
…tions
Thinking-mode models (DeepSeek-R1, Qwen3, etc.) return a reasoning_content
field that the API requires to be passed back verbatim on subsequent turns.
Previously the field was discarded after the first response, causing every
follow-up turn to fail with 400 "reasoning_content must be passed back".
Two changes together fix the round-trip:
1. parse_native_response: when the response has reasoning_content and no
tool calls, encode both content and reasoning_content as a JSON object
in the returned text (matching the existing pattern for tool-call
messages). The JSON is transparent to callers — they store it as the
ChatMessage content unchanged.
2. convert_messages_for_native: detect the {"content":..,"reasoning_content":..}
JSON envelope in stored assistant messages and unpack it into a
NativeMessage with the reasoning_content field set. This ensures the
field appears in the outbound API payload on the next turn.
NativeMessage gains a new optional reasoning_content field with
skip_serializing_if = "Option::is_none" so it is only emitted when present
and never breaks vanilla providers that do not understand the field.
Four unit tests cover: response preservation, plain-text pass-through,
round-trip restoration in convert_messages_for_native, and no spurious
field for non-thinking models.
Fixes tinyhumansai#2800
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe PR preserves and replays thinking-model ChangesThinking model reasoning_content multi-turn replay
Sequence DiagramsequenceDiagram
participant Client
participant OpenAiCompatibleProvider
participant ThinkingModel
participant ThreadStorage
Client->>OpenAiCompatibleProvider: Turn 1 request
OpenAiCompatibleProvider->>ThinkingModel: convert_messages_for_native (no prior reasoning)
ThinkingModel->>OpenAiCompatibleProvider: response with content + reasoning_content
OpenAiCompatibleProvider->>OpenAiCompatibleProvider: parse_native_response (encode content+reasoning into JSON text)
OpenAiCompatibleProvider->>ThreadStorage: store message text (JSON with reasoning)
Client->>OpenAiCompatibleProvider: Turn 2 request
ThreadStorage->>OpenAiCompatibleProvider: load prior message (JSON text)
OpenAiCompatibleProvider->>OpenAiCompatibleProvider: convert_messages_for_native (decode JSON, restore reasoning_content)
OpenAiCompatibleProvider->>ThinkingModel: send message with reasoning_content restored
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
oxoxDev
left a comment
There was a problem hiding this comment.
Author: @staimoorulhassan (
FIRST_TIME_CONTRIBUTOR— external contributor) — welcome and thanks for picking this up!
⚠ Duplicate situation (3 PRs targeting same bug)
Three open PRs address the same Sentry issue / #2800:
| PR | Author | Scope | Approach |
|---|---|---|---|
| #2806 (this PR) | @staimoorulhassan (first-time) | 3 files, +137 | JSON-blob in message text field (in-place hack) |
| #2818 | @graycyrus (core) | 25 files, +325/-7 | Adds reasoning_content: Option<String> to ChatResponse struct, threaded through stack |
| #2817 | @CodeGhost21 (core) | 28 files | Tool-call-turn variant of same fix |
#2818 is architecturally cleaner (proper field on ChatResponse, no encoding in text). #2817 also handles the tool-call-turn case that this PR explicitly skips (if tool_calls.is_empty() gate at compatible.rs:843).
Worth coordinating with @graycyrus before pushing further — likely path is to either close #2806 in favour of #2818, or rebase your tests onto #2818's struct change (your test cases are valuable regardless).
Concerns (independent of duplicate question)
Major
1. src/openhuman/inference/provider/compatible.rs:843 — tool-call turns LOSE reasoning_content
The gate if tool_calls.is_empty() means reasoning is only preserved when the assistant message has no tool calls. Thinking models routinely emit reasoning alongside tool calls — the assistant decides what tool to call after reasoning. That entire path drops reasoning_content. #2817 by @CodeGhost21 specifically addresses this case. Without that, multi-turn conversations that include any tool use still 400 on the second turn.
2. src/openhuman/inference/provider/compatible.rs:846-855 — JSON-blob in text field leaks into UI / transcript / exports
The assistant message text is consumed by many downstream paths — UI rendering, conversation export, payload_summarizer, subagent_runner::extract_tool, transcript dumps. Any of these will now see a raw {"content":"...","reasoning_content":"..."} string instead of plain message text. Risks:
- UI shows JSON to the user instead of the message
- Summarizer fails to parse / produces broken summaries
- Transcript exports contain encoded JSON instead of conversational text
- Search / indexing matches stringified JSON tokens
The proper fix is to add reasoning_content as a first-class field on ChatResponse / NativeMessage and thread it through the stack — exactly what #2818 does. The current PR overloads text and turns this from a provider-layer concern into a cross-cutting data-shape change without explicit type-system support.
3. src/openhuman/inference/provider/compatible.rs:583-602 — naive JSON-blob detection misclassifies legitimate JSON user content
The decode logic in native_messages_from_transcript:
if let Some(rc) = value.get("reasoning_content").and_then(serde_json::Value::as_str).filter(|s| !s.is_empty())If the user message text legitimately contains the JSON {"content":"...","reasoning_content":"..."} (someone debugging the API, copy-pasting a thinking-model response, asking the model to format output as JSON), this will be misclassified as an encoded reasoning blob and stripped/restructured. Worth a magic-token prefix (e.g. __openhuman_thinking__{...} or a struct sentinel) to make detection unambiguous.
Minor
compatible.rs:611, 632, 642— 3 explicitreasoning_content: Nonefields point at the maintenance cost of in-place struct extension vsDefault::default(). If #2818's approach lands, all of these go away.compatible_types.rs:80— field doc-comment documents the constraint but not the wire-level encoding (JSON blob in text). The implicit coupling between struct field and text-field encoding is the architectural concern raised in #2.
Questions
- Did you see #2818 / #2817 before opening this? Both target the same Sentry issue and were created earlier today.
- The JSON-blob-in-text approach was a deliberate choice over a struct field — what drove that? (Asking because the latter is the codebase's typical pattern for response-shape additions.)
- Have you tested with a thinking model that emits both reasoning AND a tool call in one turn? Per concern #1, that path is currently silently broken.
Verified / looks good
reasoning_contentfield correctly uses#[serde(skip_serializing_if = "Option::is_none")]— non-thinking models won't send the field on the wire- New tests are clean and well-targeted within the scope chosen (no-tool-call case round-trip + non-thinking path unaffected)
- 3 explicit
Noneinitializations are mechanically correct - PR body links #2800 cleanly; commit signature looks fine
- coderabbit APPROVED (after DISMISS+resubmit)
Suggested next step
- Drop a note on #2818 introducing yourself + your test cases — they're useful (the round-trip + JSON-blob-detection tests translate to direct round-trip tests against #2818's struct-field approach).
- Close #2806 once #2818 lands, OR contribute the tool-call-turn coverage on top of #2818 if @CodeGhost21's #2817 doesn't already handle it.
Thanks again for jumping on this — first contribution and you correctly identified the root cause. Coordination is the main blocker here, not your code.
graycyrus
left a comment
There was a problem hiding this comment.
Root cause is correctly identified and the test coverage is solid. Two architectural issues make this not mergeable as-is, and there's a coordination question worth resolving first.
What it does: Encodes reasoning_content alongside content as a JSON blob in the assistant message text field, then detects and unpacks that blob on the next turn to reconstruct the NativeMessage with reasoning_content set. Fixes the 400 from DeepSeek-R1 / Qwen3 on multi-turn conversations.
Breaking risk: Medium. The JSON-blob encoding changes the shape of data stored in assistant message text fields — anything that consumes ProviderChatResponse::text downstream (UI, summarizer, exports) will now see raw JSON for thinking-model turns.
Security risk: Zero.
Bottom line: Not safe to merge — JSON-blob in text field leaks to unrelated consumers, and the tool-call gate silently breaks agentic flows.
Note on coordination: there is an open PR targeting the same bug with a struct-level fix — adds reasoning_content: Option<String> to ChatResponse and threads it through the stack without touching the text field. Your tests (the round-trip test especially) are the most valuable part of this PR and translate directly to tests for that approach. Worth looking at whether you'd like to rebase your test cases onto that branch.
staimoorulhassan
left a comment
There was a problem hiding this comment.
changes confirmed
|
Closing as superseded by #2818 (merged), which landed an architectural fix for the same reasoning_content multi-turn issue (#2800). Your patch on Thanks for the contribution — your investigation helped surface the bug. If there's residual coverage gap or test case from your branch you'd like upstreamed, please open a follow-up PR cherry-picking just that piece on top of current main. |
Summary
Closes #2800.
Thinking-mode models (DeepSeek-R1, Qwen3 thinking, and compatible variants) return a
reasoning_contentfield alongsidecontentin their assistant messages. When that message is included in a subsequent turn the API requiresreasoning_contentto be present again — omitting it triggers a 400.Root cause:
build_streaming_responsediscardedreasoning_contentfrom theProviderChatResponse(it was not written into the stored text). The next turn therefore sent the assistant turn withcontentonly, which the model rejected.Changes:
compatible.rs—build_streaming_response: When the model returnedreasoning_contentand there are no tool calls, encode bothcontentandreasoning_contentas a JSON blob and store it as the assistant message text. Adds adebug!log line for observability.compatible.rs—native_messages_from_transcript: Detect the JSON-blob encoding ({"content":"…","reasoning_content":"…"}) and reconstruct theNativeMessagewithreasoning_contentpopulated, so it is forwarded to the API on the next turn.compatible_types.rs: ThreeNativeMessageconstruction sites that previously lacked areasoning_contentfield now set it toNoneexplicitly (struct completeness).compatible_tests.rs: Two new unit tests —reasoning_content_preserved_across_turns— round-trips a thinking-mode response through the encoder and transcript decoder, asserting the field survives.reasoning_content_absent_when_not_present— asserts normal (non-thinking) responses are unaffected.Test plan
cargo test -p openhuman -- inference::provider::compatible_testspasses (two new tests + existing suite unchanged)reasoning_contentin response) are unaffected — text stored as plain string as beforeCloses #2800
Summary by CodeRabbit