Fix: Add defensive filtering for duplicate observations and diagnostic logging #1784

Draft
jpshackelford wants to merge 3 commits into main from fix/duplicate-observation-recovery

jpshackelford commented Jan 22, 2026

Summary

This PR addresses issue #1782, in which duplicate ObservationEvents with the same tool_call_id cause LLM API errors and leave conversations in an unrecoverable state.

Fixes #1782

Problem

When a conversation is resumed after being paused/finished, a duplicate ObservationEvent can be created with the same tool_call_id as an existing observation. This causes the Anthropic API to reject requests with:

litellm.BadRequestError: ... "messages.59: `tool_use` ids were found without `tool_result` blocks immediately after: toolu_01CGASf7KnafqkuQMuLstHi4..."

Because the duplicate observation is persisted in the event stream, every subsequent LLM call fails and the conversation becomes permanently stuck.
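To check whether a stored event stream is affected, a small scan for repeated tool_call_ids may help. The sketch below is illustrative only: `Observation` is a hypothetical stand-in for the SDK's ObservationEvent that carries just the one field needed, and `duplicated_tool_call_ids` is not part of the SDK.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical stand-in for ObservationEvent; only the field we need."""

    tool_call_id: Optional[str]


def duplicated_tool_call_ids(events: list) -> list:
    """Return tool_call_ids that appear on more than one observation."""
    counts = Counter(
        event.tool_call_id
        for event in events
        if isinstance(event, Observation) and event.tool_call_id is not None
    )
    # Counter keeps insertion order, so results follow first appearance.
    return [tool_call_id for tool_call_id, n in counts.items() if n > 1]


events = [Observation("toolu_A"), Observation("toolu_B"), Observation("toolu_A")]
print(duplicated_tool_call_ids(events))
```

Any ID reported by such a scan marks a conversation that would hit the API error above until the duplicate is filtered out.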

Solution

1. Source Prevention: get_unmatched_actions() fix

The root cause is that get_unmatched_actions() only checked action_id to match actions with observations. Now it also checks tool_call_id:

observed_action_ids: set[str] = set()
observed_tool_call_ids: set[str] = set()  # NEW

for event in reversed(events):
    if isinstance(event, (ObservationEvent, UserRejectObservation)):
        observed_action_ids.add(event.action_id)
        if event.tool_call_id is not None:  # NEW
            observed_tool_call_ids.add(event.tool_call_id)
    elif isinstance(event, ActionEvent):
        is_observed = (
            event.id in observed_action_ids
            or (event.tool_call_id is not None  # NEW
                and event.tool_call_id in observed_tool_call_ids)
        )

This prevents duplicates from being created by ensuring actions with existing observations (matched by tool_call_id) are not re-executed.
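The excerpt above can be exercised end to end with stand-in event types. This is a minimal sketch, not the SDK code: the two dataclasses are simplified hypothetical versions of ActionEvent and ObservationEvent, UserRejectObservation is omitted, and the function is named `get_unmatched_actions_sketch` to make clear it only mirrors the matching logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionEvent:  # hypothetical, simplified stand-in
    id: str
    tool_call_id: Optional[str] = None


@dataclass
class ObservationEvent:  # hypothetical, simplified stand-in
    action_id: str
    tool_call_id: Optional[str] = None


def get_unmatched_actions_sketch(events: list) -> list:
    """Return actions with no observation, matching on action_id OR tool_call_id."""
    observed_action_ids = set()
    observed_tool_call_ids = set()
    unmatched = []

    # Walk backwards so each action is checked against all later observations.
    for event in reversed(events):
        if isinstance(event, ObservationEvent):
            observed_action_ids.add(event.action_id)
            if event.tool_call_id is not None:
                observed_tool_call_ids.add(event.tool_call_id)
        elif isinstance(event, ActionEvent):
            is_observed = event.id in observed_action_ids or (
                event.tool_call_id is not None
                and event.tool_call_id in observed_tool_call_ids
            )
            if not is_observed:
                unmatched.insert(0, event)  # preserve original order
    return unmatched


events = [
    ActionEvent(id="a1", tool_call_id="toolu_1"),
    # The observation references a different action_id but the same tool call.
    ObservationEvent(action_id="stale-id", tool_call_id="toolu_1"),
    ActionEvent(id="a2", tool_call_id="toolu_2"),
]
print([action.id for action in get_unmatched_actions_sketch(events)])
```

With an action_id-only check, `a1` would also be reported as unmatched and re-executed, which is how a second observation for `toolu_1` could be created.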

2. Recovery: filter_unmatched_tool_calls() fix

Added duplicate observation filtering that tracks seen tool_call_ids and skips subsequent observations with the same ID:

if event.tool_call_id in seen_observation_tool_call_ids:
    logger.warning(f"DUPLICATE_OBSERVATION_FILTERED: tool_call_id={event.tool_call_id}...")
    continue
seen_observation_tool_call_ids.add(event.tool_call_id)

This recovers existing corrupted conversations by filtering duplicates at View construction time.
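A self-contained version of this first-one-wins filtering might look like the sketch below. It assumes that observations whose tool_call_id is None are never treated as duplicates; the PR text does not state how that case is handled, so this is a guess. `ObservationEvent` here is a hypothetical two-field stand-in and `drop_duplicate_observations` is an illustrative name.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class ObservationEvent:  # hypothetical, simplified stand-in
    tool_call_id: Optional[str]
    content: str


def drop_duplicate_observations(events: list) -> list:
    """Keep the first observation per tool_call_id and skip later repeats."""
    seen_observation_tool_call_ids = set()
    kept = []
    for event in events:
        # Assumption: observations without a tool_call_id are always kept.
        if isinstance(event, ObservationEvent) and event.tool_call_id is not None:
            if event.tool_call_id in seen_observation_tool_call_ids:
                logger.warning(
                    "DUPLICATE_OBSERVATION_FILTERED: tool_call_id=%s",
                    event.tool_call_id,
                )
                continue
            seen_observation_tool_call_ids.add(event.tool_call_id)
        kept.append(event)
    return kept


events = [
    ObservationEvent("toolu_1", "first result"),
    ObservationEvent("toolu_1", "duplicate result"),
    ObservationEvent(None, "no id, kept"),
    ObservationEvent(None, "no id, also kept"),
]
print([event.content for event in drop_duplicate_observations(events)])
```

Because the filter only affects the list handed to the LLM, the persisted event stream is left untouched, which matches the View-level approach described here.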

Combined Protection

| Layer | What it does | Protects against |
| --- | --- | --- |
| get_unmatched_actions() | Checks tool_call_id when identifying unmatched actions | New duplicates being created |
| filter_unmatched_tool_calls() | Filters duplicate observations in View | Existing corrupted conversations |

3. Diagnostic Logging

Added logging at key points to help trace issues:

| Location | Level | Purpose |
| --- | --- | --- |
| get_unmatched_actions() | DEBUG | Shows what actions are identified as unmatched |
| agent.step() pending actions | INFO | Logs tool_call_ids when pending actions are re-executed |
| _execute_action_event() | DEBUG | Logs when observations are created |
| filter_unmatched_tool_calls() | WARNING | Alerts when duplicates are filtered |
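One way to make use of the WARNING entry is to count filtered duplicates with a custom handler from the standard `logging` module. The logger names below (`openhands.sdk` and `openhands.sdk.context.view`) are assumptions based on the file paths in this PR, not confirmed names, and the final `warning()` call only simulates what the SDK would emit.

```python
import logging


class DuplicateCounter(logging.Handler):
    """Counts log records that carry the DUPLICATE_OBSERVATION_FILTERED marker."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.count = 0

    def emit(self, record: logging.LogRecord) -> None:
        if "DUPLICATE_OBSERVATION_FILTERED" in record.getMessage():
            self.count += 1


# Assumption: the SDK's loggers live under this name.
SDK_LOGGER_NAME = "openhands.sdk"

sdk_logger = logging.getLogger(SDK_LOGGER_NAME)
sdk_logger.setLevel(logging.DEBUG)  # let DEBUG and INFO diagnostics through too
counter = DuplicateCounter()
sdk_logger.addHandler(counter)

# Simulated emission from a child logger; records propagate to the parent handler.
logging.getLogger(SDK_LOGGER_NAME + ".context.view").warning(
    "DUPLICATE_OBSERVATION_FILTERED: tool_call_id=%s", "toolu_demo"
)
print(counter.count)
```

A non-zero count after resuming a conversation would suggest that the source-prevention layer missed a case and the recovery layer had to step in.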

Changes

  • openhands-sdk/openhands/sdk/conversation/state.py - Fix get_unmatched_actions() to also check tool_call_id + debug logging
  • openhands-sdk/openhands/sdk/context/view.py - Duplicate observation filtering + enhanced warning
  • openhands-sdk/openhands/sdk/agent/agent.py - Enhanced pending actions log, observation creation log
  • tests/sdk/context/test_view.py - 4 comprehensive tests for duplicate filtering

Testing

All tests pass, including:

  • test_filter_unmatched_tool_calls_duplicate_observations
  • test_filter_unmatched_tool_calls_multiple_duplicate_observations
  • test_filter_unmatched_tool_calls_independent_tool_calls_with_duplicates
  • test_filter_unmatched_tool_calls_duplicate_none_tool_call_id
  • test_getting_unmatched_events (confirmation mode test)

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:5e15eba-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-5e15eba-python \
  ghcr.io/openhands/agent-server:5e15eba-python

All tags pushed for this build

ghcr.io/openhands/agent-server:5e15eba-golang-amd64
ghcr.io/openhands/agent-server:5e15eba-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:5e15eba-golang-arm64
ghcr.io/openhands/agent-server:5e15eba-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:5e15eba-java-amd64
ghcr.io/openhands/agent-server:5e15eba-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:5e15eba-java-arm64
ghcr.io/openhands/agent-server:5e15eba-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:5e15eba-python-amd64
ghcr.io/openhands/agent-server:5e15eba-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:5e15eba-python-arm64
ghcr.io/openhands/agent-server:5e15eba-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:5e15eba-golang
ghcr.io/openhands/agent-server:5e15eba-java
ghcr.io/openhands/agent-server:5e15eba-python

About Multi-Architecture Support

  • Each variant tag (e.g., 5e15eba-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 5e15eba-python-amd64) are also available if needed

…c logging

This PR addresses the recovery aspect of issue #1782 where duplicate
ObservationEvents with the same tool_call_id cause LLM API errors
and put conversations in an unrecoverable state.

Changes:
- Add duplicate observation filtering in View.filter_unmatched_tool_calls()
  to enable recovery from corrupted event streams
- Add diagnostic logging to trace root cause:
  - DEBUG: get_unmatched_actions() logs what actions are identified as unmatched
  - INFO: Enhanced pending actions log with tool_call_ids
  - DEBUG: ObservationEvent creation logs tool_call_id
  - WARNING: When duplicate observations are filtered
- Add 4 comprehensive tests for duplicate observation filtering

The fix operates at the View level, not persistence, so it:
- Recovers existing corrupted conversations immediately
- Prevents the specific duplicate observation class of issues
- Provides visibility via logging for root cause investigation

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions bot commented Jan 22, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| openhands-sdk/openhands/sdk/agent/agent.py | 236 | 49 | 79% | 90, 94, 129–130, 134–135, 137, 143–144, 146, 148–149, 154, 157, 161–164, 201–203, 231–232, 239–240, 272, 325–326, 328, 368, 507–508, 513, 525–526, 531–532, 551–552, 554, 582–583, 589–590, 594, 602–603, 648, 655 |
| openhands-sdk/openhands/sdk/context/view.py | 231 | 3 | 98% | 254, 290, 456 |
| openhands-sdk/openhands/sdk/conversation/state.py | 179 | 6 | 96% | 144, 283, 329–331, 462 |
| TOTAL | 16125 | 4744 | 70% | |

…icates at source

This addresses the root cause by ensuring get_unmatched_actions() checks
both action_id AND tool_call_id when determining if an action has an
observation. This prevents re-execution of actions that already have
observations, even if there's a mismatch in action_id references.

Combined with the defensive View filtering, this provides:
1. Source prevention: get_unmatched_actions() won't return actions that
   already have observations (matched by tool_call_id)
2. Recovery: View.filter_unmatched_tool_calls() filters duplicates that
   somehow still get persisted

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai bot commented Jan 22, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1784 at branch `fix/duplicate-observation-recovery`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

Co-authored-by: openhands <openhands@all-hands.dev>
ActionEvent in a batch is filtered out, all ActionEvents in that batch
are also filtered out.

Additionally filters out duplicate ObservationEvents with the same

We've got a PR in progress (#1649) that's trying to untangle all the structural assumptions and loops in the view. It'd be helpful for that initiative if we weren't tacking more functionality onto the existing functions.

Can we break this functionality out into a separate function (something like filter_duplicate_events) that gets called first thing in View.from_events?
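The structure suggested in this comment could be sketched as follows. Everything here is hypothetical: `Event` is a toy type, `filter_duplicate_events` uses the reviewer's proposed name, `drop_unpaired_tool_calls` is a much-simplified placeholder for the existing view filtering, and `view_from_events` only imitates the call order proposed for View.from_events.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:  # hypothetical toy event type
    kind: str  # "action", "observation", or "message"
    tool_call_id: Optional[str] = None


def filter_duplicate_events(events: list) -> list:
    """Standalone pass: keep only the first observation per tool_call_id."""
    seen = set()
    kept = []
    for event in events:
        if event.kind == "observation" and event.tool_call_id is not None:
            if event.tool_call_id in seen:
                continue
            seen.add(event.tool_call_id)
        kept.append(event)
    return kept


def drop_unpaired_tool_calls(events: list) -> list:
    """Placeholder for later filtering: drop tool calls lacking a counterpart."""
    action_ids = {e.tool_call_id for e in events if e.kind == "action"}
    observation_ids = {e.tool_call_id for e in events if e.kind == "observation"}
    matched = action_ids & observation_ids
    return [e for e in events if e.tool_call_id is None or e.tool_call_id in matched]


def view_from_events(events: list) -> list:
    """Run deduplication first, then the remaining filters on clean input."""
    deduplicated = filter_duplicate_events(events)
    return drop_unpaired_tool_calls(deduplicated)


events = [
    Event("action", "toolu_1"),
    Event("observation", "toolu_1"),
    Event("observation", "toolu_1"),  # duplicate
    Event("action", "toolu_2"),  # never observed
]
print(view_from_events(events))
```

Keeping deduplication as its own pass means the later filters can assume at most one observation per tool call, which is the kind of separation the comment asks for.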

@all-hands-bot

[Automatic Post]: It has been a while since there was any activity on this PR. @jpshackelford, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

enyst added and then removed the behavior-initiative label (Feb 14, 2026)

Development

Successfully merging this pull request may close these issues.

Bug: Duplicate ObservationEvent with same tool_call_id causes LLM API error on conversation resume

5 participants