Skip to content

fix(providers): dedup tool specs at wire boundary to prevent 400 "Tool names must be unique"#2846

Merged
M3gA-Mind merged 2 commits into
tinyhumansai:mainfrom
YellowSnnowmann:fix/compatible-dedup-tool-specs-at-wire
May 28, 2026
Merged

fix(providers): dedup tool specs at wire boundary to prevent 400 "Tool names must be unique"#2846
M3gA-Mind merged 2 commits into
tinyhumansai:mainfrom
YellowSnnowmann:fix/compatible-dedup-tool-specs-at-wire

Conversation

@YellowSnnowmann
Copy link
Copy Markdown
Contributor

@YellowSnnowmann YellowSnnowmann commented May 28, 2026

Summary

  • Add first-wins dedup by function.name inside OpenAiCompatibleProvider::convert_tool_specs so duplicate ToolSpec entries never reach the provider's tools array on the wire.

  • Emit a single log::warn! per request listing the dropped names — no per-call spam, but visibility into where the duplicates originated.

  • Add 5 unit tests covering None, empty input, unique passthrough, first-wins dedup (verifying the surviving entry's description / parameters are the first occurrence), and many-duplicates.

Problem

  • Sentry issue TAURI-RUST-2E — cloud API error (400 Bad Request): {"error":{"message":"Tool names must be unique.","type":"invalid_request_error","param":null,"code":"invalid_request_error"}} — 164 events / 14d on the tauri-rust project.

  • OpenAI's chat-completions schema requires every entry in the tools array to have a unique function.name. Sending two entries with the same name fails the entire request with 400 — the chat turn dies and the user sees a hard error.

  • dedup_visible_tool_specs already runs at the session builder layer (src/openhuman/agent/harness/session/builder.rs:44) and at every visible-tool-set materialisation point (initial build, post-Composio refresh, scope-filter change). However, several call paths reach the Provider trait without going through that layer:

  • Sub-agent spawn paths assemble their own tool sets and call Provider::chat directly.
  • Triage / escalation flows can splice tools from multiple sources before the request is constructed.
  • Future callers that bypass the session pipeline (no compiler guard prevents this).

Any of these can hand OpenAiCompatibleProvider::chat a tools: &[ToolSpec] slice with duplicate name values, and the pre-fix convert_tool_specs blindly serialised all of them — producing the 400.

Solution

src/openhuman/inference/provider/compatible.rs — replace the .map(...).collect() pipeline in convert_tool_specs with a single pass that tracks seen names:

fn convert_tool_specs(
    tools: Option<&[crate::openhuman::tools::ToolSpec]>,
) -> Option<Vec<serde_json::Value>> {
    tools.map(|items| {
        let mut seen: std::collections::HashSet<&str> =
            std::collections::HashSet::with_capacity(items.len());
        let mut dropped: Vec<&str> = Vec::new();
        let mut out: Vec<serde_json::Value> = Vec::with_capacity(items.len());
        for tool in items {
            if !seen.insert(tool.name.as_str()) {
                dropped.push(tool.name.as_str());
                continue;
            }
            out.push(serde_json::json!({
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.parameters,
                }
            }));
        }
        if !dropped.is_empty() {
            log::warn!(
                "[providers][compatible] dropped {} duplicate tool spec(s) at wire \
                 boundary (TAURI-RUST-2E): {:?}",
                dropped.len(),
                dropped
            );
        }
        out
    })
}

Design choices

  • First-wins — matches the upstream dedup_visible_tool_specs convention at builder.rs:48-50. Same semantics across both layers, no surprise reorderings.
  • Wire boundary, not just session — the bug exists because not every caller goes through the session dedup. Fixing at the single chokepoint every provider call funnels through (convert_tool_specs) makes future callers safe by construction.
  • log::warn! (not debug!) — when this branch fires, an upstream code path produced duplicates and bypassed the session dedup. That's a real bug worth investigating, not log noise. Fire rate is bounded by the upstream Sentry frequency (≤164/14d).
  • HashSet<&str>, not HashSet — borrows from the input slice; no allocations for the dedup itself, just one alloc each for out and dropped.
  • No change to non-duplicate behaviour — unique inputs produce byte-identical output to pre-fix. Verified by convert_tool_specs_passes_through_unique_names.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case)
  • Diff coverage ≥ 80% — pending local pnpm test:rust run
  • Coverage matrix updated — N/A: bug-fix behaviour-only change, no new feature row
  • No new external network dependencies introduced
  • Manual smoke checklist updated — N/A: no release-cut surface touched
  • Linked issue closed via Closes #NNN — N/A: Sentry-tracked issue, no GitHub issue yet

Impact

  • Runtime: desktop (Rust core). No mobile / web / CLI surface change.

  • Performance: one HashSet<&str> per request, sized to items.len(). O(n) — same complexity as the previous iter().map().collect(). No additional allocations on the unique-only happy path.

  • Security: none — no new network surface, no new inputs trusted, no auth path touched.

  • Migration / compatibility: none. Trait surface untouched. Output for unique inputs is byte-identical to pre-fix. Previously-failing 400 turns now succeed with the first-occurrence definition of any duplicated tool. Upstream session dedup still runs first — this PR is defence-in-depth, not a replacement.

Related

Closes: TAURI-RUST-2E

Summary by CodeRabbit

  • Bug Fixes

    • Duplicate tool definitions are now automatically removed when sending tool specifications to compatible providers; discarded duplicates are reported in logs.
  • Tests

    • Added comprehensive tests to ensure tool specification deduplication preserves the first occurrence and collapses duplicates across many entries.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c2d935c-3e17-4078-a38f-e584ca0be187

📥 Commits

Reviewing files that changed from the base of the PR and between af40687 and 00a36d7.

📒 Files selected for processing (2)
  • src/openhuman/inference/provider/compatible.rs
  • src/openhuman/inference/provider/compatible_tests.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/inference/provider/compatible.rs

📝 Walkthrough

Walkthrough

The PR adds deduplication logic to convert_tool_specs in the OpenAI-compatible provider to drop duplicate tool specs by name before serialization, logs warnings when duplicates are discarded, and validates the behavior with unit tests using a first-occurrence-wins strategy.

Changes

Tool Spec Deduplication at Provider Boundary

Layer / File(s) Summary
Tool spec deduplication with HashSet and logging
src/openhuman/inference/provider/compatible.rs, src/openhuman/inference/provider/compatible_tests.rs
convert_tool_specs deduplicates input tool specs by name using a HashSet, preserves first occurrence, drops later duplicates with a warning log, and returns the deduplicated serialized output. Tests verify None/empty handling, unique name passthrough, and deduplication across single and multiple duplicate scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • tinyhumansai/openhuman#2665: Related deduplication of tool specs by tool.name—that PR restores a test asserting deduplication at the run_tool_call_loop boundary.
  • tinyhumansai/openhuman#2485: Implements the same deduplication (first occurrence wins) in the sub-agent runner spec assembly path.

Suggested labels

rust-core, bug

Suggested reviewers

  • M3gA-Mind
  • oxoxDev

Poem

🐰 I hopped through specs one night,
Saw names repeated, blocking light,
A HashSet peeked — one kept, rest gone,
Warnings whispered at the dawn —
Now calls are tidy, names aligned, all right.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: deduplicating tool specs at the provider wire boundary to prevent OpenAI 400 errors from duplicate tool names.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@YellowSnnowmann YellowSnnowmann marked this pull request as ready for review May 28, 2026 12:34
@YellowSnnowmann YellowSnnowmann requested a review from a team May 28, 2026 12:34
@coderabbitai coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. bug labels May 28, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/openhuman/inference/provider/compatible_tests.rs (1)

1593-1618: 💤 Low value

Consider verifying parameters field in first-wins assertion.

The test correctly validates that the first occurrence's description survives deduplication. For completeness, you could also assert that parameters from the first occurrence is retained (not the duplicate's {"different": true}).

♻️ Optional enhancement
     assert_eq!(
         out[0]["function"]["description"].as_str().unwrap(),
         "alpha desc",
         "first occurrence's description must survive (first-wins)"
     );
+    assert_eq!(
+        out[0]["function"]["parameters"]["type"].as_str().unwrap(),
+        "object",
+        "first occurrence's parameters must survive (first-wins)"
+    );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/inference/provider/compatible_tests.rs` around lines 1593 -
1618, Update the test convert_tool_specs_dedups_duplicate_names_first_wins to
also assert that the first occurrence's `parameters` survive deduplication:
after building `specs` and calling
OpenAiCompatibleProvider::convert_tool_specs(Some(&specs)) -> out, add an
assertion that out[0]["function"]["parameters"] equals the parameters from the
original spec("alpha") (and/or assert it does not equal second_alpha.parameters
/ the serde_json!({"different": true}) value). This ensures `parameters` follow
the same first-wins behavior as `description`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/inference/provider/compatible_tests.rs`:
- Around line 1593-1618: Update the test
convert_tool_specs_dedups_duplicate_names_first_wins to also assert that the
first occurrence's `parameters` survive deduplication: after building `specs`
and calling OpenAiCompatibleProvider::convert_tool_specs(Some(&specs)) -> out,
add an assertion that out[0]["function"]["parameters"] equals the parameters
from the original spec("alpha") (and/or assert it does not equal
second_alpha.parameters / the serde_json!({"different": true}) value). This
ensures `parameters` follow the same first-wins behavior as `description`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 63e714ec-9b72-4efe-9866-256226f7ca3b

📥 Commits

Reviewing files that changed from the base of the PR and between 8365e0c and af40687.

📒 Files selected for processing (2)
  • src/openhuman/inference/provider/compatible.rs
  • src/openhuman/inference/provider/compatible_tests.rs

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 28, 2026
Copy link
Copy Markdown
Contributor

@CodeGhost21 CodeGhost21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix for a real production issue. The implementation is correct and the test coverage is thorough.

Walkthrough: convert_tool_specs previously passed all tool specs to the wire verbatim, causing 400 errors when duplicate names were present. This PR adds a single-pass dedup using HashSet<&str> (borrows from input, no extra allocations on the happy path) with a log::warn! when duplicates are actually dropped. First-wins semantics match the upstream dedup_visible_tool_specs convention at session/builder.rs:44. The 5 unit tests cover None input, empty slice, unique passthrough, first-wins dedup verifying the surviving entry's description, and many-duplicates collapse.

One minor note: the log message embeds the Sentry issue ID TAURI-RUST-2E directly in the string. That's good for current traceability, but it will silently go stale if the issue is re-keyed or the project migrates. A plain-language description plus the ID as a secondary annotation (or a code comment on the warn! call) would age better. Not a blocker.

Overall the fix is well-reasoned, well-tested, and the performance analysis in the PR description is accurate. No issues with the implementation.

Comment thread src/openhuman/inference/provider/compatible.rs
CodeGhost21
CodeGhost21 previously approved these changes May 28, 2026
Copy link
Copy Markdown
Contributor

@CodeGhost21 CodeGhost21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, nice work!

graycyrus
graycyrus previously approved these changes May 28, 2026
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid fix for a real production problem. The dedup logic is exactly right: single-pass, first-wins, HashSet<&str> borrows the input slice so no extra allocations on the happy path — cleaner than the HashSet<String> in dedup_visible_tool_specs upstream since ownership isn't needed here.

The log::warn! on the dropped-names branch is the right call. When this fires it means a caller bypassed the session-layer dedup — that's a real upstream bug worth investigating, not log noise. Rate is bounded by whatever's producing the duplicates.

Five tests cover all the meaningful cases. The first-wins assertion (checking description of the surviving entry) is exactly the right thing to verify.

Two small observations, not blocking:

The Sentry issue ID TAURI-RUST-2E is hardcoded in the log message. That's fine for now, but if the issue is closed or migrated the string becomes stale. A comment linking to the issue is usually more durable than embedding the ID in a log string — but this is a style preference, not a bug.

The dropped Vec<&str> is allocated on every call even when nothing is dropped. Since Vec::new() doesn't heap-allocate until the first push, this is effectively free — but Vec::new() (no with_capacity) would make the zero-drop path even more explicit. Again, not a real concern at this call frequency.

CI is green across the board. Good work tracking down the bypass paths and closing them at the chokepoint.

…bleProvider and log warnings for dropped entries
…tibleProvider to ensure proper handling of input and deduplication
@M3gA-Mind M3gA-Mind force-pushed the fix/compatible-dedup-tool-specs-at-wire branch from af40687 to 00a36d7 Compare May 28, 2026 22:19
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

Copy link
Copy Markdown
Contributor

@M3gA-Mind M3gA-Mind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Reviewed the diff thoroughly:

  • Implementation (convert_tool_specs): correct first-wins dedup using HashSet<&str>, mirrors the existing dedup_visible_tool_specs convention in session/builder.rs exactly. Single-pass O(n) with pre-allocated capacity. Log::warn! on drop is correctly scoped (no per-call spam when empty).
  • Tests: 5 cases covering None, empty slice, unique passthrough, first-wins semantics (verifies description from first occurrence survives), and many-duplicates. All pass locally.
  • Conflict resolution: rebased onto upstream/main, kept both upstream's reasoning_content round-trip tests and this PR's convert_tool_specs dedup tests.
  • CI: all checks pass (coverage gate ≥ 80%, all E2E, Rust + TS quality).

@M3gA-Mind M3gA-Mind merged commit a211cac into tinyhumansai:main May 28, 2026
49 of 74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants