Save complete trajectory in presence of history truncation #6751

li-boxuan · 2025-02-17T07:06:12Z

End-user friendly description of the problem this fixes or functionality that this introduces

Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

#4977 introduced truncation feature to handle long context errors by cutting the history. Since trajectory saving feature depends on "controller.state.history", this causes saved trajectories to only contain partial history.

Direct cause is that we are not closing the controller properly in headless mode. If we had done so, the history would be recovered properly and thus trajectory would have been complete. This PR fixes this issue.

Note that state.history is being used in many places (including but not limited to a few benchmark evaluation harness), which we might also want to evaluate if they actually need truncated history or full history. Maybe we need better names for truncated/full history.

~~Also, note that stuck detector also depends on state.history, which could potentially lead to malfunction of stuck detector if the loop appears around the place where we do the truncate.~~

Anyways, this PR fixes the partial trajectory issue and includes a unit test, which could be used as a testbed for any future renaming/refactoring.~~

Kudos to @adityasoni9998 for finding this issue after analyzing a few evaluation trajectories.

Link of any specific issues this addresses

~~This also includes a small fix for #6749 in openhands/server/routes/trajectory.py~~

To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:9962580-nikolaik   --name openhands-app-9962580   docker.all-hands.dev/all-hands-ai/openhands:9962580

openhands/core/main.py

enyst · 2025-02-17T11:05:34Z

#4977 introduced truncation feature to handle long context errors by cutting the history. Since trajectory saving feature depends on "controller.state.history", this causes saved trajectories to only contain partial history.

There is something I don't understand about this:

OpenHands/openhands/controller/agent_controller.py

Line 159 in 3d10df4

    
           # the final state.history will be used by external scripts like evals, tests, etc.

close makes sure that history has all events, so that it's as evals expect it to be: full, as far as I can see, with agent-specific events. It's supposed to happen before the controller state is read in evals and other external scripts. Isn't that happening anymore?

enyst · 2025-02-17T12:20:27Z

Cc: @csmith49

To elaborate a bit:

when delegates are used, state.history contains only the history of the active agent
similarly, in the truncation case, state.history contains only events starting from an id
but evals or other external callers of run_controller need full history
afaict, controller.close was supposed to work for both, putting Humpty back together
when run_controller returns the final state, close must have happened
including for exceptions.

TBH this is fragile code, it doesn't feel great to rely on close being called. Still, it's close, it's bit surprising that the controller wouldn't be closed suddenly? I didn't repro the issue. 🤔

The main reason for state.history being the same variable has been that it's so common in agents frameworks/projects/experiments. Everyone seems to expect it, so it didn't seem obvious how else to call things. (When history was a class, there was a history.get_events with flags for all events or current agent.)

Re: stuck
I think it's fine: stuck works with the latest 6 events in the halved history = the latest 6 events in the full history, unless this half is less than 6 events. If it is less, which should be rare if ever, it's not really stuck... 🤔 (if the LLM gets less than 6, plus the task, seems like a very edge case, but more importantly, I guess it should be able to get out by itself...)

csmith49 · 2025-02-17T16:13:30Z

when delegates are used, state.history contains only the history of the active agent

I'm still working through the control-flow of controller.close. Does this mean that if delegate A truncates, then we switch to delegate B for a bit, when we switch back to delegate A we end up with the non-truncated history?

enyst · 2025-02-17T16:31:48Z

when delegates are used, state.history contains only the history of the active agent

I'm still working through the control-flow of controller.close. Does this mean that if delegate A truncates, then we switch to delegate B for a bit, when we switch back to delegate A we end up with the non-truncated history?

🤔 That's a good question! 😅

...Shouldn't change anything, because the B controller closes, and the B state.history may be there, but A controller doesn't read it. It doesn't seem to care about B's state.history:

OpenHands/openhands/controller/agent_controller.py

Line 610 in ae31a24

obs = AgentDelegateObservation(outputs=delegate_outputs, content=content)

A's controller has A's agent state instance, which is different, and A controller.close didn't run yet... if that made sense

li-boxuan · 2025-02-17T18:12:08Z

#4977 introduced truncation feature to handle long context errors by cutting the history. Since trajectory saving feature depends on "controller.state.history", this causes saved trajectories to only contain partial history.

There is something I don't understand about this:

OpenHands/openhands/controller/agent_controller.py

Line 159 in 3d10df4

# the final state.history will be used by external scripts like evals, tests, etc.

close makes sure that history has all events, so that it's as evals expect it to be: full, as far as I can see, with agent-specific events. It's supposed to happen before the controller state is read in evals and other external scripts. Isn't that happening anymore?

Ah interesting... so the get_history() call is supposed to work? But it isn't... (there's no delegation involved in my experiment)

FYI my commit was aae611a, which is built on top of e487008

Everything between id 0 and 77 are truncated/lost in the final history.

li-boxuan · 2025-02-17T18:15:30Z

Oh wait I see the problem! controller.close() is not invoked in main.py, so the headless mode is not closing the controller properly

…after-truncation

openhands/server/routes/trajectory.py

li-boxuan · 2025-02-17T18:48:19Z

TBH this is fragile code, it doesn't feel great to rely on close being called.

Yeah I agree, especially fragile if we want to export/save trajectory before closing the controller, or, say, in the middle of a conversation.

The main reason for state.history being the same variable has been that it's so common in agents frameworks/projects/experiments. Everyone seems to expect it, so it didn't seem obvious how else to call things. (When history was a class, there was a history.get_events with flags for all events or current agent.)

Yeah understood, it's hard to deal with legacy esp without enough test coverage

enyst · 2025-02-17T18:55:08Z

openhands/core/main.py

@@ -193,6 +193,8 @@ def on_event(event: Event):
        # NOTE: the saved state does not include delegates events
        end_state.save_to_session(event_stream.sid, event_stream.file_store)

+    await controller.close()


You are entirely correct

This seems to be messing up with the event loop... causing tests failure. Will investigate later.

enyst · 2025-02-17T19:02:57Z

I was just coming back to this PR after I saw nothing calls it - the agent_session does (but then that's UI only). Some recent-ish refactoring must have broken this. Weird though.

Thank you for the investigation!

Save complete trajectory in presence of history truncation

3d10df4

li-boxuan requested review from enyst, neubig and xingyaoww February 17, 2025 07:07

enyst reviewed Feb 17, 2025

View reviewed changes

openhands/core/main.py Show resolved Hide resolved

enyst approved these changes Feb 17, 2025

View reviewed changes

li-boxuan added 2 commits February 17, 2025 10:35

Fix: Close controller properly in headless mode

b6ac206

Merge remote-tracking branch 'upstream/main' into boxuanli/traj-save-…

e07c89b

…after-truncation

li-boxuan force-pushed the boxuanli/traj-save-after-truncation branch from 25739af to e07c89b Compare February 17, 2025 18:36

Add an assertion

04c65c1

li-boxuan commented Feb 17, 2025

View reviewed changes

openhands/server/routes/trajectory.py Outdated Show resolved Hide resolved

Revert changes in openhands/server/routes/trajectory.py

6649219

enyst reviewed Feb 17, 2025

View reviewed changes

enyst approved these changes Feb 17, 2025

View reviewed changes

enyst added 4 commits February 18, 2025 22:58

Merge branch 'main' into boxuanli/traj-save-after-truncation

e486d02

Merge branch 'main' into boxuanli/traj-save-after-truncation

6c65ba0

Merge branch 'main' into boxuanli/traj-save-after-truncation

3c7d78f

Update main.py

9962580

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Save complete trajectory in presence of history truncation #6751

Save complete trajectory in presence of history truncation #6751

li-boxuan commented Feb 17, 2025 •

edited by github-actions bot

Loading

enyst commented Feb 17, 2025 •

edited

Loading

enyst commented Feb 17, 2025 •

edited

Loading

csmith49 commented Feb 17, 2025

enyst commented Feb 17, 2025

li-boxuan commented Feb 17, 2025 •

edited

Loading

li-boxuan commented Feb 17, 2025

li-boxuan commented Feb 17, 2025

enyst Feb 17, 2025

li-boxuan Feb 17, 2025

enyst commented Feb 17, 2025

Save complete trajectory in presence of history truncation #6751

Are you sure you want to change the base?

Save complete trajectory in presence of history truncation #6751

Conversation

li-boxuan commented Feb 17, 2025 • edited by github-actions bot Loading

enyst commented Feb 17, 2025 • edited Loading

enyst commented Feb 17, 2025 • edited Loading

csmith49 commented Feb 17, 2025

enyst commented Feb 17, 2025

li-boxuan commented Feb 17, 2025 • edited Loading

li-boxuan commented Feb 17, 2025

li-boxuan commented Feb 17, 2025

enyst Feb 17, 2025

Choose a reason for hiding this comment

li-boxuan Feb 17, 2025

Choose a reason for hiding this comment

enyst commented Feb 17, 2025

li-boxuan commented Feb 17, 2025 •

edited by github-actions bot

Loading

enyst commented Feb 17, 2025 •

edited

Loading

enyst commented Feb 17, 2025 •

edited

Loading

li-boxuan commented Feb 17, 2025 •

edited

Loading