
enh: Refactor Event -> Message pipeline outside of CodeActAgent #6715

Open · csmith49 wants to merge 15 commits into main
Conversation

csmith49 (Collaborator)

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR pulls the Event -> Message pipeline out of the CodeAct agent and into core.

Additionally:

  • Improves the default behavior of truncate_content
  • Moves Message conversion tests out of test_codeact_agent.py and into test_message_utils.py
  • Refactors CodeActAgent._get_messages with two new private methods that encapsulate behavior

These changes enable us to:

  • Move towards a uniform Event -> Message conversion
  • Simplify the CodeAct agent implementation
  • Compute message representations outside of the agent (useful for condensers and other "predictive" calculations)

Other agents don't use a similar pipeline: the browsing agents build messages manually from prompts, and the delegating agent just delegates.
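
For concreteness, here is a minimal sketch of the shape this conversion takes once it lives in core. The function names and signatures below are illustrative assumptions, not the PR's literal API; the real pipeline also handles tool calls, image content, and truncation:

```python
# Hypothetical sketch of an Event -> Message conversion living in core
# (e.g. in message_utils.py). Names and signatures are assumptions,
# not the exact API introduced by this PR.
from openhands.core.message import Message, TextContent


def event_to_message(event) -> Message:
    # Agent-originated events become assistant messages; everything else
    # is presented to the LLM as user input.
    role = 'assistant' if getattr(event, 'source', None) == 'agent' else 'user'
    return Message(role=role, content=[TextContent(text=str(event))])


def events_to_messages(events) -> list[Message]:
    return [event_to_message(e) for e in events]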


Link of any specific issues this addresses

This PR is necessary to resolve #6707, as we need to compute the accurate message structure outside of the agent.

@csmith49 csmith49 requested a review from enyst February 13, 2025 19:06
@csmith49 csmith49 changed the title Fix/event to message refactor enh: Refactor Event -> Message pipeline outside of CodeActAgent Feb 13, 2025
csmith49 (Collaborator, Author)

This is also an opportunity to move message.py, message_format.md, and message_utils.py into a sub-module. I didn't do so here to limit the files touched but LMK if this is a good spot for the larger re-org as well.

```diff
@@ -130,9 +130,9 @@ def event_to_memory(event: 'Event', max_message_chars: int) -> dict:
     return d


-def truncate_content(content: str, max_chars: int) -> str:
+def truncate_content(content: str, max_chars: int | None = None) -> str:
```
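
With the new signature the limit is optional; presumably (an assumption based on the diff, not verified against the implementation) passing no limit returns the content unchanged:

```python
# Presumed semantics of the new default (assumption from the signature):
truncate_content('x' * 50_000)                     # no limit given, returned as-is
truncate_content('x' * 50_000, max_chars=30_000)   # head and tail kept, middle elided
```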
csmith49 (Collaborator, Author)

I'm not sure this function does anything substantial ATM; are there plans to change this?

Collaborator

Oh I think sometimes it matters a lot, because this is the function used to truncate the obs content.

max_chars is 30_000 by default, roughly 9k tokens (I've measured some with the Anthropic tokenizer API).

This is because a single observation can be crazy large. A fun example is the result of an innocent ls -R /workspace on the Django repository. The observation has ~120k tokens. 😭 And we save them all in the stream.

I guess we could:

  • in stream.py, return obs already truncated
  • not save the full content in the first place (but we may want to keep more of it than 30k, idk). I'm not sure about this one conceptually... we do want the event stream to be "the source of truth", so we can later find whatever the underlying content was. The other side of the argument, though, is that the full content is never used. A rough sketch of the first option follows.
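
For illustration, the first option could look roughly like the sketch below. The event/stream shape it assumes (a content attribute on observations) is hypothetical, not OpenHands' actual interface:

```python
# Hypothetical sketch: truncate observation content at read time so the
# stream keeps the full content as the source of truth. The event shape
# shown here is assumed, not OpenHands' actual API.
from typing import Iterable, Iterator


def truncated_events(events: Iterable[object], max_chars: int = 30_000) -> Iterator[object]:
    for event in events:
        content = getattr(event, 'content', None)
        if isinstance(content, str):
            event.content = truncate_content(content, max_chars)
        yield event
```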

csmith49 (Collaborator, Author)

Oh, you're absolutely right -- I was misreading the implementation and thought we were taking half = len(content) // 2 instead of half = max_chars // 2 🙃
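
For reference, the truncation being discussed keeps the head and tail of the content and elides the middle. A minimal sketch consistent with half = max_chars // 2 (the elision notice text here is illustrative, not the repository's exact wording):

```python
def truncate_content(content: str, max_chars: int | None = None) -> str:
    # Sketch of the discussed behavior: keep roughly max_chars total,
    # split between the head and tail, with a notice marking the elision.
    if max_chars is None or len(content) <= max_chars:
        return content
    half = max_chars // 2
    return (
        content[:half]
        + '\n[... Observation truncated due to length ...]\n'
        + content[-half:]
    )
```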

Development

Successfully merging this pull request may close these issues: Token-based condensation triggers