Skip to content

fix: prevent tool-call preface text leaks#9075

Open
vivy1024 wants to merge 2 commits into
AstrBotDevs:masterfrom
vivy1024:fix/tool-call-content-leak
Open

fix: prevent tool-call preface text leaks#9075
vivy1024 wants to merge 2 commits into
AstrBotDevs:masterfrom
vivy1024:fix/tool-call-content-leak

Conversation

@vivy1024

@vivy1024 vivy1024 commented Jun 29, 2026

Copy link
Copy Markdown

Summary

  • Suppress assistant preface text on turns that include tool calls so internal tool-use narration is not sent as a user-visible LLM result.
  • Normalize OpenAI-compatible text-part repr strings like [{text=..., type=text}] to plain text, and drop empty nested repr shells.
  • Add regression coverage for both the tool-loop leak and provider content normalization.

Test plan

  • uv run pytest tests/test_tool_loop_agent_runner.py -k "tool_call_assistant_preface_is_not_sent_as_llm_result"
  • uv run pytest tests/test_openai_source.py -k "normalize_content_handles_text_part_repr_string or normalize_content_drops_empty_nested_text_part_repr_string"
  • uv run ruff format astrbot/core/agent/runners/tool_loop_agent_runner.py astrbot/core/provider/sources/openai_source.py tests/test_tool_loop_agent_runner.py tests/test_openai_source.py --check
  • uv run ruff check astrbot/core/agent/runners/tool_loop_agent_runner.py astrbot/core/provider/sources/openai_source.py tests/test_tool_loop_agent_runner.py tests/test_openai_source.py

Note: Running the full related files with uv run pytest tests/test_tool_loop_agent_runner.py tests/test_openai_source.py passed the tool-loop file but hit 3 pre-existing Windows path separator assertions in test_openai_source.py (file_uri_to_path expects / while Windows returns \).

🤖 Generated with Claude Code

Summary by Sourcery

Prevent internal assistant preface text from being surfaced as user-visible LLM output during tool-call turns and improve normalization of OpenAI-compatible content representations.

Bug Fixes:

  • Suppress LLM reasoning and completion text outputs on steps that invoke tools so assistant preface narration does not leak to users.
  • Normalize OpenAI provider content strings that use text-part repr formatting into plain text and remove empty nested text shells.

Tests:

  • Add regression tests ensuring tool-call assistant preface text is not returned as an LLM result.
  • Add regression tests verifying _normalize_content handles text-part repr strings and drops empty nested text-part repr shells.

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Jun 29, 2026

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • In tool_loop_agent_runner.step, the new if not llm_resp.tools_call_name guard now controls all llm_result emissions; consider restructuring this block (e.g., early-return or a dedicated helper for tool-call turns) to make the separation between internal tool-call narration and user-visible results more explicit and easier to maintain.
  • The string-pattern loop in _normalize_content that strips [{text=..., type=text}] shells is quite heuristic and relies on a magic length cap (8192); consider extracting this into a small helper with clearer intent and tighter conditions so it doesn’t accidentally rewrite legitimate content that happens to start/end with the same tokens.
  • In _normalize_content, the early return for repr-shell strings bypasses the subsequent JSON-array normalization logic; if these repr strings can sometimes contain list-like or mixed content, it may be worth passing the simplified string through the same normalization path to keep behavior consistent across inputs.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `tool_loop_agent_runner.step`, the new `if not llm_resp.tools_call_name` guard now controls all `llm_result` emissions; consider restructuring this block (e.g., early-return or a dedicated helper for tool-call turns) to make the separation between internal tool-call narration and user-visible results more explicit and easier to maintain.
- The string-pattern loop in `_normalize_content` that strips `[{text=..., type=text}]` shells is quite heuristic and relies on a magic length cap (8192); consider extracting this into a small helper with clearer intent and tighter conditions so it doesn’t accidentally rewrite legitimate content that happens to start/end with the same tokens.
- In `_normalize_content`, the early return for repr-shell strings bypasses the subsequent JSON-array normalization logic; if these repr strings can sometimes contain list-like or mixed content, it may be worth passing the simplified string through the same normalization path to keep behavior consistent across inputs.

## Individual Comments

### Comment 1
<location path="tests/test_tool_loop_agent_runner.py" line_range="511-520" />
<code_context>


+@pytest.mark.asyncio
+async def test_tool_call_assistant_preface_is_not_sent_as_llm_result(
+    runner, provider_request, mock_tool_executor, mock_hooks, mock_provider
+):
+    """Assistant preface text on a tool-call turn must stay internal."""
+    mock_provider.should_call_tools = True
+    mock_provider.max_calls_before_normal_response = 1
+
+    await runner.reset(
+        provider=mock_provider,
+        request=provider_request,
+        run_context=ContextWrapper(context=None),
+        tool_executor=mock_tool_executor,
+        agent_hooks=mock_hooks,
+        streaming=False,
+    )
+
+    responses = []
+    async for response in runner.step_until_done(3):
+        responses.append(response)
+
+    llm_texts = [
+        response.data["chain"].get_plain_text()
+        for response in responses
+        if response.type == "llm_result"
+    ]
+    assert "我需要使用工具来帮助您" not in llm_texts
+    assert llm_texts == ["这是我的最终回答"]
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add an assertion that no llm_result is emitted on the tool-call turn itself, to more directly prove the preface text is suppressed.

Currently the test only verifies suppression indirectly by checking that collected `llm_result` texts equal the final answer. To better guard against regressions, also assert that exactly one `llm_result` is emitted (or that the tool-call turn produces none). For instance, inspect `responses` so that any response corresponding to a tool call from the provider yields no `llm_result`, directly tying the test behavior to the bug this PR fixes.

Suggested implementation:

```python
    responses = []
    async for response in runner.step_until_done(3):
        responses.append(response)

    # Collect all llm_result responses and their texts
    llm_results = [response for response in responses if response.type == "llm_result"]
    llm_texts = [
        response.data["chain"].get_plain_text()
        for response in llm_results
    ]

    # There must be exactly one llm_result emitted, corresponding to the final answer.
    # This ensures that no llm_result is emitted on the earlier tool-call turn.
    assert len(llm_results) == 1

    # Assistant preface text must never surface in llm_result outputs.
    assert "我需要使用工具来帮助您" not in llm_texts
    assert llm_texts == ["这是我的最终回答"]

```

To more directly tie the assertion to the tool-call turn (as suggested in the review), you can:
1. Identify responses that correspond to tool calls (for example, by checking a specific `response.type` such as `"tool_call"` or a similar field your framework uses).
2. Add an assertion that there is no `llm_result` response at the same step/turn as those tool-call responses. For instance, group responses by `response.step_index` or equivalent, and assert that steps containing tool-call responses have no accompanying `llm_result`.
These additional checks should be implemented once you confirm the concrete shape and fields of the `response` objects in your test harness.
</issue_to_address>

### Comment 2
<location path="tests/test_tool_loop_agent_runner.py" line_range="527-530" />
<code_context>
+    async for response in runner.step_until_done(3):
+        responses.append(response)
+
+    llm_texts = [
+        response.data["chain"].get_plain_text()
+        for response in responses
+        if response.type == "llm_result"
+    ]
+    assert "我需要使用工具来帮助您" not in llm_texts
+    assert llm_texts == ["这是我的最终回答"]
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider asserting the sequence of response types to document the intended control flow of tool calls vs final answer.

Beyond asserting the text values, also assert the ordered sequence of `response.type` values (e.g., tool-related types followed by a single `llm_result`). This makes the test more robust against regressions in the tool loop and clarifies that the intermediate assistant preface must not be an `llm_result`, while the final answer must be.

```suggestion
    responses = []
    async for response in runner.step_until_done(3):
        responses.append(response)

    # Assert control flow: tool-related responses first, then a single final llm_result
    response_types = [response.type for response in responses]
    assert response_types[-1] == "llm_result"
    assert "llm_result" not in response_types[:-1]

    llm_texts = [
        response.data["chain"].get_plain_text()
        for response in responses
        if response.type == "llm_result"
    ]
    assert "我需要使用工具来帮助您" not in llm_texts
    assert llm_texts == ["这是我的最终回答"]

```
</issue_to_address>

### Comment 3
<location path="tests/test_openai_source.py" line_range="63-66" />
<code_context>
     )


+def test_normalize_content_handles_text_part_repr_string():
+    raw = "[{text=I will use the file reading tool., type=text}]"
+
+    assert ProviderOpenAIOfficial._normalize_content(raw) == (
+        "I will use the file reading tool."
+    )
</code_context>
<issue_to_address>
**suggestion (testing):** Add a companion test covering the strip=False case for text-part repr strings.

The tests currently cover `_normalize_content` with `strip=True`, but the function has different behavior when `strip=False`. Please add a test that uses a `raw` repr string with leading/trailing whitespace and asserts the result of `_normalize_content(raw, strip=False)` to verify normalization when stripping is disabled.
</issue_to_address>

### Comment 4
<location path="tests/test_openai_source.py" line_range="71-74" />
<code_context>
+    )
+
+
+def test_normalize_content_drops_empty_nested_text_part_repr_string():
+    raw = "[{text=[{text=[{text=, type=text}], type=text}], type=text}]"
+
+    assert ProviderOpenAIOfficial._normalize_content(raw) == ""
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Cover a nested repr string where the innermost text is non-empty to ensure we return the actual text, not an empty string.

Please also add a nested repr case with non-empty content, e.g. `[{text=[{text=[{text=hello, type=text}], type=text}], type=text}]`, and assert that `_normalize_content` returns `"hello"`. This will confirm the loop handles multiple layers correctly and doesn’t strip valid text.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread tests/test_tool_loop_agent_runner.py
Comment thread tests/test_tool_loop_agent_runner.py
Comment thread tests/test_openai_source.py
Comment thread tests/test_openai_source.py

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request prevents assistant preface text from being sent as an LLM result during tool-call turns in the tool loop agent runner, and adds handling in the OpenAI source to normalize non-standard string representations of text parts (e.g., [{text=..., type=text}]). Corresponding unit tests have been added for both changes. The reviewer suggested extracting magic strings and length limits into named constants within the content normalization logic to improve readability and maintainability.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +772 to +782
check_content = raw_content.strip()
while (
check_content.startswith("[{text=")
and check_content.endswith(", type=text}]")
and len(check_content) < 8192
):
check_content = check_content[
len("[{text=") : -len(", type=text}]")
].strip()
if check_content != raw_content.strip():
return check_content.strip() if strip else check_content

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better readability and maintainability, it's a good practice to extract magic strings and numbers into named constants. This makes the code easier to understand and modify in the future if these values need to be changed.

            check_content = raw_content.strip()
            # Handle non-standard repr-like strings from some providers, e.g., "[{text=..., type=text}]"
            REPR_LIKE_PREFIX = "[{text=""
            REPR_LIKE_SUFFIX = ", type=text}]"
            REPR_LIKE_MAX_LEN = 8192
            while (
                check_content.startswith(REPR_LIKE_PREFIX)
                and check_content.endswith(REPR_LIKE_SUFFIX)
                and len(check_content) < REPR_LIKE_MAX_LEN
            ):
                check_content = check_content[
                    len(REPR_LIKE_PREFIX) : -len(REPR_LIKE_SUFFIX)
                ].strip()
            if check_content != raw_content.strip():
                return check_content.strip() if strip else check_content

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant