Cap event history scanned by StuckDetector #1829

Merged
enyst merged 10 commits into main from openhands/stuck-detector-cap-events
Jan 27, 2026

enyst (Collaborator) commented Jan 26, 2026

Summary

  • Avoids materializing the full conversation event history when running stuck detection.

Changes

  • StuckDetector.is_stuck() now scans only a recent, fixed-size window of events instead of list(self.state.events).
  • Added a regression test ensuring stuck detection still triggers correctly even with a large backlog of older events.
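
The windowing idea in the Changes list can be sketched as follows. This is a minimal illustration, not the SDK's code: `recent_window`, `MAX_EVENTS_TO_SCAN`, and `CountingLog` are hypothetical names, and the window size of 20 is taken from the review discussion rather than from the diff itself. The sketch assumes the event log supports `len()` and integer indexing, so only the tail is ever read.

```python
# Assumed window size; the review thread refers to a 20-event window.
MAX_EVENTS_TO_SCAN = 20


def recent_window(events, limit=MAX_EVENTS_TO_SCAN):
    """Return the last `limit` events, reading only those indices."""
    total = len(events)
    start = max(0, total - limit)
    return [events[i] for i in range(start, total)]


class CountingLog:
    """Hypothetical stand-in for a lazily loaded event log."""

    def __init__(self, size):
        self.size = size
        self.reads = 0

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        self.reads += 1  # each read models loading one event from storage
        return f"event-{index}"


log = CountingLog(100_000)
window = recent_window(log)
print(len(window), window[0], log.reads)  # prints: 20 event-99980 20
```

By contrast, `list(log)` on the same object would touch all 100,000 entries before any stuck check runs.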

Testing

  • uv run pytest -q tests/cross/test_stuck_detector.py
  • uv run pre-commit run -a

Addresses in part #1824


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image                                 | Docs / Tags
java    | amd64, arm64  | eclipse-temurin:17-jdk                     | Link
python  | amd64, arm64  | nikolaik/python-nodejs:python3.12-nodejs22 | Link
golang  | amd64, arm64  | golang:1.21-bookworm                       | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:f5dda1e-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-f5dda1e-python \
  ghcr.io/openhands/agent-server:f5dda1e-python

All tags pushed for this build

ghcr.io/openhands/agent-server:f5dda1e-golang-amd64
ghcr.io/openhands/agent-server:f5dda1e-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:f5dda1e-golang-arm64
ghcr.io/openhands/agent-server:f5dda1e-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:f5dda1e-java-amd64
ghcr.io/openhands/agent-server:f5dda1e-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:f5dda1e-java-arm64
ghcr.io/openhands/agent-server:f5dda1e-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:f5dda1e-python-amd64
ghcr.io/openhands/agent-server:f5dda1e-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:f5dda1e-python-arm64
ghcr.io/openhands/agent-server:f5dda1e-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:f5dda1e-golang
ghcr.io/openhands/agent-server:f5dda1e-java
ghcr.io/openhands/agent-server:f5dda1e-python

About Multi-Architecture Support

  • Each variant tag (e.g., f5dda1e-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., f5dda1e-python-amd64) are also available if needed

Co-authored-by: openhands <openhands@all-hands.dev>
github-actions bot (Contributor) commented Jan 26, 2026

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
openhands-sdk/openhands/sdk/conversation
   stuck_detector.py | 125 | 20 | 84% | 131, 135–136, 207, 216–217, 220, 245, 249, 253, 258–260, 273, 282, 306–307, 313–314, 320
TOTAL | 16429 | 4797 | 70% |

Co-authored-by: openhands <openhands@all-hands.dev>
xingyaoww pushed a commit that referenced this pull request Jan 26, 2026
The reconcile() call after run completion was removed in PR #1820, but
this caused a race condition where events emitted during the final
moments of the run could be lost if the WebSocket didn't deliver them
before run() returned.

This was observed in CI where test_events_not_lost_during_client_disconnection
failed because the client only received 3 events while the REST API had 6
events - the ActionEvent(finish) and ObservationEvent(finish) were missing.

The fix restores the reconcile() call in _wait_for_run_completion() to
ensure all events are captured after run completion. This is safe because
reconcile() is idempotent and will only add events that are missing from
the client's cache.
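
Why an idempotent reconcile is safe to call after every run can be illustrated with a small sketch. The `Event` and `ClientCache` classes below are hypothetical stand-ins, not the SDK's types; the only assumption carried over from the commit message is that events are deduplicated so a repeated reconcile adds nothing.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    id: str
    kind: str


@dataclass
class ClientCache:
    events: list = field(default_factory=list)
    seen_ids: set = field(default_factory=set)

    def add(self, event):
        """Append the event unless its ID is already cached."""
        if event.id in self.seen_ids:
            return False
        self.seen_ids.add(event.id)
        self.events.append(event)
        return True

    def reconcile(self, server_events):
        """Merge the server's view into the cache; return how many were added."""
        return sum(self.add(event) for event in server_events)


kinds = ["message", "action", "observation", "message",
         "action:finish", "observation:finish"]
server = [Event(str(i), kind) for i, kind in enumerate(kinds)]

cache = ClientCache()
for event in server[:4]:  # models a WebSocket that delivered only four events
    cache.add(event)

first = cache.reconcile(server)
second = cache.reconcile(server)
print(first, second, len(cache.events))  # prints: 2 0 6
```

The second call adds nothing, which is the property the commit message relies on.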

Fixes the flaky test failure in PR #1829.

Co-authored-by: openhands <openhands@all-hands.dev>
xingyaoww pushed a commit that referenced this pull request Jan 26, 2026
The reconcile() call after run completion was removed in PR #1820, but
this caused a race condition where events emitted during the final
moments of the run could be lost if the WebSocket didn't deliver them
before run() returned.

This was observed in CI where test_events_not_lost_during_client_disconnection
failed because the client only received 3-4 events while the REST API had 6
events - the ActionEvent(finish) and ObservationEvent(finish) were missing.

Reproduction:
- Inject a 3s delay in the WebSocket callback for finish events
- Run the conversation with a finish tool call
- Observe that without the reconcile() call, the client is missing events

The fix restores the reconcile() call in _wait_for_run_completion() to
ensure all events are captured after run completion. This is safe because
reconcile() is idempotent and will only add events that are missing from
the client's cache.

Fixes the flaky test failure in PR #1829.

Co-authored-by: openhands <openhands@all-hands.dev>
xingyaoww pushed a commit that referenced this pull request Jan 26, 2026
This PR fixes the race condition where events emitted during the final
moments of a run could be lost if the WebSocket didn't deliver them
before run() returned.

## Root Cause

The race condition occurs when:
1. Server emits events (ActionEvent, ObservationEvent)
2. Client polls and sees 'finished' status
3. run() returns before WebSocket delivers those events

## Solution

Instead of using the expensive reconcile() which fetches ALL events,
we introduce reconcile_recent() which only fetches events after the
last known timestamp. This is much more efficient for long conversations.

The fix:
1. Added reconcile_recent() method to RemoteEventsList that uses the
   timestamp__gte filter to only fetch recent events
2. Call reconcile_recent() after run completion to catch any events
   that were missed by WebSocket
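
A rough sketch of the timestamp-filtered fetch described above, under stated assumptions: `FakeServer`, `RemoteEventsSketch`, and their methods are hypothetical, and only the `timestamp__gte` filter name and the `reconcile_recent()` method name come from the commit message. Because the filter is inclusive, the last known event comes back again and has to be dropped by ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    id: str
    timestamp: float


class FakeServer:
    """Hypothetical stand-in for the REST events endpoint."""

    def __init__(self, events):
        self.events = events
        self.returned = 0  # total events sent back across all queries

    def search(self, timestamp__gte=None):
        hits = [e for e in self.events
                if timestamp__gte is None or e.timestamp >= timestamp__gte]
        self.returned += len(hits)
        return hits


class RemoteEventsSketch:
    def __init__(self, server):
        self.server = server
        self.events = []
        self.ids = set()

    def _add(self, event):
        if event.id in self.ids:
            return False
        self.ids.add(event.id)
        self.events.append(event)
        return True

    def reconcile_recent(self):
        """Fetch only events at or after the last cached timestamp."""
        since = self.events[-1].timestamp if self.events else None
        fetched = self.server.search(timestamp__gte=since)
        return sum(self._add(e) for e in fetched)


server = FakeServer([Event(f"e{i}", float(i)) for i in range(1, 7)])
client = RemoteEventsSketch(server)
for event in server.events[:4]:  # models the four events WebSocket delivered
    client._add(event)

added = client.reconcile_recent()
print(added, server.returned, len(client.events))  # prints: 2 3 6
```

Only three events cross the wire here instead of all six, and the gap widens as the conversation grows.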

## Reproduction

Added test_event_loss_repro.py which reliably reproduces the race
condition by injecting a 3s delay in the WebSocket callback for
finish events. Without the fix, the test fails because the client
is missing ActionEvent(finish) and ObservationEvent(finish).

## Testing

- All cross tests pass
- The reproduction test passes with the fix

Fixes the flaky test failure in PR #1829.

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst marked this pull request as ready for review January 26, 2026 17:18

all-hands-bot (Collaborator) left a comment

Overall, this is a solid optimization that prevents memory issues with large event histories. The implementation correctly limits the scan window, and the test coverage is thorough. However, there are a few important considerations around the behavioral change and documentation.


- events = events[last_user_msg_index + 1 :]
+ if last_user_msg_index != -1:
+     events = events[last_user_msg_index + 1 :]

🟠 Important: This is a significant behavioral change from the original implementation. Previously, when no user message was found, the function would log a warning and return False (not stuck). Now it proceeds to check ALL events in the window for stuck patterns.

Implications:

  1. Stuck detection can now trigger even without a recent user message in the 20-event window
  2. The warning log is removed, reducing debugging visibility
  3. If the actual last user message is beyond the 20-event window, the detector will analyze events that include previous user interactions

While test_is_stuck_without_recent_user_message_still_detects_loop validates the new behavior works, this change should be:

  1. Explicitly mentioned in the PR description as a behavioral change
  2. Documented in the docstring (e.g., "If no user message is found in the recent window, all events in the window are analyzed")

Consider whether removing the warning log is intentional or if it should be adjusted to log when operating without a recent user message boundary.

enyst (Collaborator, Author) commented Jan 26, 2026

^^ This is mostly incorrect. The point of looking for a user message is to make sure the agent doesn't immediately stop because of an older stuck state. A user message is a cutoff point (we don't want to look before it).

If there is no user message, then there is no previous stuck state either. It's OK to look at all 20.

And if there is a previous stuck state, that's strange, but it's still OK to look now and trigger a stop.
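
The cutoff rule described in this reply can be sketched as below. This is an illustration only: `Ev`, `events_to_analyze`, and `looks_stuck` are hypothetical names, and the "same action repeated four times" check is a simplified stand-in for the detector's real patterns, which this thread does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ev:
    source: str   # "user" or "agent"
    content: str


def events_to_analyze(window):
    """Keep only the events after the last user message, if there is one."""
    last_user_msg_index = -1
    for i in range(len(window) - 1, -1, -1):
        if window[i].source == "user":
            last_user_msg_index = i
            break
    if last_user_msg_index != -1:
        return window[last_user_msg_index + 1:]
    return window  # no user message in the window: analyze all of it


def looks_stuck(events, repeats=4):
    """Simplified stand-in: the last `repeats` events are identical."""
    tail = events[-repeats:]
    return len(tail) == repeats and len({e.content for e in tail}) == 1


loop = [Ev("agent", "ls")] * 4
after_user = loop + [Ev("user", "try something else")]
resumed = after_user + loop

print(looks_stuck(events_to_analyze(loop)))        # prints: True
print(looks_stuck(events_to_analyze(after_user)))  # prints: False
print(looks_stuck(events_to_analyze(resumed)))     # prints: True
```

The middle case is the one the cutoff exists for: a fresh user message hides the earlier loop, so the agent is not stopped immediately.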

enyst and others added 4 commits January 26, 2026 18:25

xingyaoww (Collaborator) left a comment

LGTM, and this seems pretty straightforward to me!

openhands-ai bot commented Jan 27, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1829 at branch `openhands/stuck-detector-cap-events`

Feel free to include any additional details that might help me get this PR into a better state.

@enyst enyst enabled auto-merge (squash) January 27, 2026 01:38
@enyst enyst merged commit 6c86d7d into main Jan 27, 2026
18 checks passed
@enyst enyst deleted the openhands/stuck-detector-cap-events branch January 27, 2026 01:40
@enyst enyst added the invariants the design invariants of the codebase label Jan 31, 2026