Delegation fixes #6165

Merged 28 commits into main from enyst/delegation on Jan 15, 2025

enyst (Collaborator) commented Jan 9, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below
    Fix agent delegation; use events for communication between parent and delegates.
    Fix the lockup when the model returns a message instead of a tool call.

Give a summary of what the PR does, explaining any non-trivial design decisions

Delegation broke after we made the agent loop rely exclusively on controller-as-observer logic. This PR proposes to fix it in a simple way, by having the parent forward events to the delegate:

  • refactor the current subscription logic (previously the parent unsubscribed when a delegate started, and vice versa): now ONLY the parent is subscribed, it stays subscribed, and it forwards events to its delegate when it has one
  • should_step on MessageActions from both 'user' and 'agent', except when waiting for user input is explicitly set
  • should_step on DelegateAction too; it creates a MessageAction to kickstart the delegate
  • refactor ending conditions
  • added integration tests for DelegatorAgent
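
A minimal sketch of the stepping rule from the bullets above, using stand-in dataclasses rather than the real OpenHands types. The field names `source` and `wait_for_response`, the `OtherEvent` placeholder, and the choice to apply the flag to every message are assumptions made for illustration; the actual controller may differ.

```python
from dataclasses import dataclass


@dataclass
class Event:
    id: int
    source: str = "agent"  # 'user' or 'agent'; assumed field name


@dataclass
class MessageAction(Event):
    content: str = ""
    wait_for_response: bool = False  # assumed flag: waiting for user input


@dataclass
class DelegateAction(Event):
    task: str = ""


@dataclass
class OtherEvent(Event):
    """Placeholder for every event type this sketch does not model."""


def should_step(event: Event) -> bool:
    """Decide whether the controller should take another step."""
    if isinstance(event, MessageAction):
        # Step on messages from both 'user' and 'agent',
        # unless waiting for user input is explicitly set.
        return not event.wait_for_response
    if isinstance(event, DelegateAction):
        # Step so that a MessageAction can kickstart the delegate.
        return True
    # Placeholder only: other event types are out of scope here.
    return False
```

For example, `should_step(MessageAction(id=2, source="agent", content="done?"))` is true in this sketch, while the same message with `wait_for_response=True` is not.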

Also:

  • the delegate starts with a MessageAction
  • and ends with an AgentDelegateObservation/ErrorObservation
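
The forwarding arrangement and the delegate's start and end events can be sketched as a toy model. Everything here is hypothetical: the `Parent` and `Delegate` classes, the `"finish"`/`"fail"` ending conditions, and the id arithmetic are stand-ins chosen for illustration, not the controller code from this PR. Only the event names come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    id: int
    kind: str  # class name from the description, e.g. 'MessageAction'
    content: str = ""


class Delegate:
    """Hypothetical delegate; it never subscribes to the event stream itself."""

    def __init__(self) -> None:
        self.seen: List[Event] = []
        self.outcome: Optional[str] = None  # 'ok' or 'error' once finished

    def on_event(self, event: Event) -> None:
        self.seen.append(event)
        # Toy ending conditions, purely for illustration.
        if event.content == "finish":
            self.outcome = "ok"
        elif event.content == "fail":
            self.outcome = "error"


class Parent:
    """Hypothetical parent; the only subscriber to the event stream."""

    def __init__(self) -> None:
        self.delegate: Optional[Delegate] = None
        self.handled: List[Event] = []

    def start_delegate(self, task: str, event_id: int) -> None:
        self.delegate = Delegate()
        # The delegate starts with a MessageAction carrying the task.
        self.on_event(Event(event_id, "MessageAction", task))

    def on_event(self, event: Event) -> None:
        if self.delegate is None:
            self.handled.append(event)
            return
        # While a delegate is active, forward instead of handling.
        self.delegate.on_event(event)
        if self.delegate.outcome is not None:
            kind = ("AgentDelegateObservation"
                    if self.delegate.outcome == "ok" else "ErrorObservation")
            self.delegate = None
            # The delegation ends with an observation on the parent's side.
            self.handled.append(Event(event.id + 1, kind, "delegate ended"))
```

The point of the design is that the parent never unsubscribes, so there is no hand-off of subscriptions to get wrong; the delegate only ever sees what the parent passes along.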

The code, or at least this delegation logic, is ready for review.
(please ignore the print() stuff, will clean up later)


Link of any specific issues this addresses
Fix #6162


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:3df4d30-nikolaik   --name openhands-app-3df4d30   docker.all-hands.dev/all-hands-ai/openhands:3df4d30

enyst marked this pull request as draft January 9, 2025 07:30

enyst (Collaborator, Author) commented Jan 9, 2025

@openhands-agent Read the diff of this PR carefully. Understand what it tries to achieve. Then, we have two things to do:

  1. The diff has added debug prints. We need them! And we need to enhance them:
     • all events have an event.id, add it to the print() statements after the class name, like '({event.id})'
  2. The unit tests in test_agent_controller.py are outdated and failing. Update them to the new behavior. Understand the difference in the context of the changes of this PR and fix them.

Important:
You don't need to test the rest. Just this test file.
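
A small sketch of the print format requested in item 1, with the event id placed after the class name. The `MessageAction` stand-in and the `describe` helper are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class MessageAction:
    """Stand-in event; the real events carry an `id` as noted above."""
    id: int
    content: str


def describe(event) -> str:
    """Format an event as 'ClassName (id)' for debug output."""
    return f"{type(event).__name__} ({event.id})"


print(describe(MessageAction(id=7, content="hello")))  # prints: MessageAction (7)
```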

All-Hands-AI deleted a comment from openhands-agent Jan 9, 2025
All-Hands-AI deleted a comment from openhands-agent Jan 9, 2025
All-Hands-AI deleted a comment from github-actions bot Jan 9, 2025
github-actions bot (Contributor) commented Jan 9, 2025

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

enyst marked this pull request as ready for review January 9, 2025 16:47
enyst requested a review from rbren January 9, 2025 17:01
enyst requested a review from li-boxuan January 9, 2025 17:10

All-Hands-AI deleted a comment from github-actions bot Jan 15, 2025

Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Contributor

Triggered by: Pull Request (integration-test label on PR #6165)
Commit: f2dcc79
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

Total cost: USD 0.00

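| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t02_add_bash_hello | True | | 0 | |
| t06_github_pr_browsing | True | | 0 | |
| t03_jupyter_write_file | True | | 0 | |
| t01_fix_simple_typo | True | | 0 | |
| t05_simple_browsing | True | | 0 | |
| t04_git_staging | True | | 0 | |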

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 83.33% (5/6)

Total cost: USD 0.00

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t02_add_bash_hello | True | | 0 | |
| t03_jupyter_write_file | True | | 0 | |
| t04_git_staging | True | | 0 | |
| t01_fix_simple_typo | True | | 0 | |
| t06_github_pr_browsing | True | | 0 | |
| t05_simple_browsing | False | The answer is not found in any message. Total messages: 2. | 0 | |

Integration Tests Report Delegator (Haiku)
Success rate: 50.00% (1/2)

Total cost: USD 0.00

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t02_add_bash_hello | True | | 0 | nan |
| t01_fix_simple_typo | False | File not fixed: This is a silly typo. Really! No more typos. Enjoy! | 0 | RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 30, max iteration: 30 |

Integration Tests Report Delegator (DeepSeek)
Success rate: 100.00% (2/2)

Total cost: USD 0.00

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t02_add_bash_hello | True | | 0 | |
| t01_fix_simple_typo | True | | 0 | |

Download testing outputs (includes both Haiku and DeepSeek results): Download

enyst enabled auto-merge (squash) January 15, 2025 03:11
enyst merged commit b9a70c8 into main Jan 15, 2025
13 checks passed
enyst deleted the enyst/delegation branch January 15, 2025 03:24
diwu-sf (Contributor) commented Jan 15, 2025

Thanks!

csmith49 pushed a commit to csmith49/OpenHands that referenced this pull request Jan 19, 2025

Successfully merging this pull request may close these issues.

[Bug]: Regression in AgentController broke AgentDelegationAction
4 participants