Fix: Add defensive filtering for duplicate observations and diagnostic logging#1784
Fix: Add defensive filtering for duplicate observations and diagnostic logging#1784jpshackelford wants to merge 3 commits intomainfrom
Conversation
…c logging This PR addresses the recovery aspect of issue #1782 where duplicate ObservationEvents with the same tool_call_id cause LLM API errors and put conversations in an unrecoverable state. Changes: - Add duplicate observation filtering in View.filter_unmatched_tool_calls() to enable recovery from corrupted event streams - Add diagnostic logging to trace root cause: - DEBUG: get_unmatched_actions() logs what actions are identified as unmatched - INFO: Enhanced pending actions log with tool_call_ids - DEBUG: ObservationEvent creation logs tool_call_id - WARNING: When duplicate observations are filtered - Add 4 comprehensive tests for duplicate observation filtering The fix operates at the View level, not persistence, so it: - Recovers existing corrupted conversations immediately - Prevents the specific duplicate observation class of issues - Provides visibility via logging for root cause investigation Co-authored-by: openhands <openhands@all-hands.dev>
Coverage Report •
|
||||||||||||||||||||||||||||||||||||||||
…icates at source This addresses the root cause by ensuring get_unmatched_actions() checks both action_id AND tool_call_id when determining if an action has an observation. This prevents re-execution of actions that already have observations, even if there's a mismatch in action_id references. Combined with the defensive View filtering, this provides: 1. Source prevention: get_unmatched_actions() won't return actions that already have observations (matched by tool_call_id) 2. Recovery: View.filter_unmatched_tool_calls() filters duplicates that somehow still get persisted Co-authored-by: openhands <openhands@all-hands.dev>
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
Co-authored-by: openhands <openhands@all-hands.dev>
| ActionEvent in a batch is filtered out, all ActionEvents in that batch | ||
| are also filtered out. | ||
|
|
||
| Additionally filters out duplicate ObservationEvents with the same |
There was a problem hiding this comment.
We've got a PR in progress (#1649) that's trying to untangle all the structural assumptions and loops in the view. It'd be helpful for that initiative if we weren't tacking more functionality onto the existing functions.
Can we break this functionality out into a separate function (something like filter_duplicate_events) that gets called first thing in View.from_events?
|
[Automatic Post]: It has been a while since there was any activity on this PR. @jpshackelford, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
Summary
This PR addresses issue #1782 where duplicate
ObservationEvents with the sametool_call_idcause LLM API errors and put conversations in an unrecoverable state.Fixes #1782
Problem
When a conversation is resumed after being paused/finished, a duplicate
ObservationEventcan be created with the sametool_call_idas an existing observation. This causes the Anthropic API to reject requests with:Since the duplicate observation is persisted in the event stream, every subsequent LLM call fails - the conversation becomes permanently stuck.
Solution
1. Source Prevention:
get_unmatched_actions()fixThe root cause is that
get_unmatched_actions()only checkedaction_idto match actions with observations. Now it also checkstool_call_id:This prevents duplicates from being created by ensuring actions with existing observations (matched by
tool_call_id) are not re-executed.2. Recovery:
filter_unmatched_tool_calls()fixAdded duplicate observation filtering that tracks seen
tool_call_ids and skips subsequent observations with the same ID:This recovers existing corrupted conversations by filtering duplicates at View construction time.
Combined Protection
get_unmatched_actions()tool_call_idwhen identifying unmatched actionsfilter_unmatched_tool_calls()3. Diagnostic Logging
Added logging at key points to help trace issues:
get_unmatched_actions()agent.step()pending actions_execute_action_event()filter_unmatched_tool_calls()Changes
openhands-sdk/openhands/sdk/conversation/state.py- Fixget_unmatched_actions()to also checktool_call_id+ debug loggingopenhands-sdk/openhands/sdk/context/view.py- Duplicate observation filtering + enhanced warningopenhands-sdk/openhands/sdk/agent/agent.py- Enhanced pending actions log, observation creation logtests/sdk/context/test_view.py- 4 comprehensive tests for duplicate filteringTesting
All tests pass, including:
test_filter_unmatched_tool_calls_duplicate_observationstest_filter_unmatched_tool_calls_multiple_duplicate_observationstest_filter_unmatched_tool_calls_independent_tool_calls_with_duplicatestest_filter_unmatched_tool_calls_duplicate_none_tool_call_idtest_getting_unmatched_events(confirmation mode test)Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:5e15eba-pythonRun
All tags pushed for this build
About Multi-Architecture Support
5e15eba-python) is a multi-arch manifest supporting both amd64 and arm645e15eba-python-amd64) are also available if needed