Cap event history scanned by StuckDetector#1829
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
The reconcile() call after run completion was removed in PR #1820, but this caused a race condition where events emitted during the final moments of the run could be lost if the WebSocket didn't deliver them before run() returned. This was observed in CI where test_events_not_lost_during_client_disconnection failed because the client only received 3 events while the REST API had 6 events - the ActionEvent(finish) and ObservationEvent(finish) were missing. The fix restores the reconcile() call in _wait_for_run_completion() to ensure all events are captured after run completion. This is safe because reconcile() is idempotent and will only add events that are missing from the client's cache. Fixes the flaky test failure in PR #1829. Co-authored-by: openhands <openhands@all-hands.dev>
The reconcile() call after run completion was removed in PR #1820, but this caused a race condition where events emitted during the final moments of the run could be lost if the WebSocket didn't deliver them before run() returned. This was observed in CI where test_events_not_lost_during_client_disconnection failed because the client only received 3-4 events while the REST API had 6 events - the ActionEvent(finish) and ObservationEvent(finish) were missing. Reproduction: - Inject a 3s delay in the WebSocket callback for finish events - Run the conversation with a finish tool call - Observe that without the reconcile() call, the client is missing events The fix restores the reconcile() call in _wait_for_run_completion() to ensure all events are captured after run completion. This is safe because reconcile() is idempotent and will only add events that are missing from the client's cache. Fixes the flaky test failure in PR #1829. Co-authored-by: openhands <openhands@all-hands.dev>
This PR fixes the race condition where events emitted during the final moments of a run could be lost if the WebSocket didn't deliver them before run() returned. ## Root Cause The race condition occurs when: 1. Server emits events (ActionEvent, ObservationEvent) 2. Client polls and sees 'finished' status 3. run() returns before WebSocket delivers those events ## Solution Instead of using the expensive reconcile() which fetches ALL events, we introduce reconcile_recent() which only fetches events after the last known timestamp. This is much more efficient for long conversations. The fix: 1. Added reconcile_recent() method to RemoteEventsList that uses the timestamp__gte filter to only fetch recent events 2. Call reconcile_recent() after run completion to catch any events that were missed by WebSocket ## Reproduction Added test_event_loss_repro.py which reliably reproduces the race condition by injecting a 3s delay in the WebSocket callback for finish events. Without the fix, the test fails because the client is missing ActionEvent(finish) and ObservationEvent(finish). ## Testing - All cross tests pass - The reproduction test passes with the fix Fixes the flaky test failure in PR #1829. Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
Overall, this is a solid optimization that prevents memory issues with large event histories. The implementation correctly limits the scan window, and the test coverage is thorough. However, there are a few important considerations around the behavioral change and documentation.
|
|
||
| events = events[last_user_msg_index + 1 :] | ||
| if last_user_msg_index != -1: | ||
| events = events[last_user_msg_index + 1 :] |
There was a problem hiding this comment.
🟠 Important: This is a significant behavioral change from the original implementation. Previously, when no user message was found, the function would log a warning and return False (not stuck). Now it proceeds to check ALL events in the window for stuck patterns.
Implications:
- Stuck detection can now trigger even without a recent user message in the 20-event window
- The warning log is removed, reducing debugging visibility
- If the actual last user message is beyond the 20-event window, the detector will analyze events that include previous user interactions
While test_is_stuck_without_recent_user_message_still_detects_loop validates the new behavior works, this change should be:
- Explicitly mentioned in the PR description as a behavioral change
- Documented in the docstring (e.g., "If no user message is found in the recent window, all events in the window are analyzed")
Consider whether removing the warning log is intentional or if it should be adjusted to log when operating without a recent user message boundary.
There was a problem hiding this comment.
^^ This is incorrect, mostly. The point of looking for user message is to make sure the agent doesn't immediately stop because of an older stuck. The user sending a message is a cut off point (we don't want to look before a user message).
If there is no user message, then there is no previously stuck either. It's OK to look at all 20.
And if there is a previously stuck, that's strange, but it's still OK to look now and trigger a stop.
Co-authored-by: OpenHands Bot <contact@all-hands.dev>
Co-authored-by: OpenHands Bot <contact@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
xingyaoww
left a comment
There was a problem hiding this comment.
LGTM and seems pretty stragiht forward to me!
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
Summary
Changes
StuckDetector.is_stuck()now scans only a recent, fixed-size window of events instead oflist(self.state.events).Testing
uv run pytest -q tests/cross/test_stuck_detector.pyuv run pre-commit run -aAddresses in part #1824
@enyst can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:f5dda1e-pythonRun
All tags pushed for this build
About Multi-Architecture Support
f5dda1e-python) is a multi-arch manifest supporting both amd64 and arm64f5dda1e-python-amd64) are also available if needed