Truncate terminal outputs before persisting events#1823
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
This PR successfully adds validator-based truncation to avoid persisting large terminal outputs. I found a few issues that should be addressed, with the most important being potential double file saving. Details in inline comments below.
Coverage Report •
|
||||||||||||||||||||
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
|
(AGENT) Refactor per discussion: moved “truncate before store” to the TerminalSession layer (where observations are created), instead of a TerminalObservation validator.
Local: pre-commit clean; terminal session + truncation tests pass. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review Summary
The PR successfully implements truncation before persistence to avoid storing multi-megabyte outputs. However, I've identified several issues that should be addressed:
🟠 Important Issues
1. Missing save_dir Parameter (Line 188-190, 230-232, 269-271, 406-408)
The maybe_truncate() calls don't pass a save_dir parameter, meaning large outputs are truncated and lost forever. The maybe_truncate function supports persisting full outputs to disk (like browser tools do).
Recommendation: Consider passing save_dir to allow recovery of full outputs:
command_output = maybe_truncate(
command_output,
truncate_after=MAX_CMD_OUTPUT_SIZE,
save_dir=self.terminal.work_dir, # or dedicated output dir
tool_prefix="terminal"
)2. Metadata Could Be Lost in Truncation (Line 175-190)
Truncation happens AFTER metadata suffix is added (lines 175-182). For very large outputs, maybe_truncate keeps head+tail, but important metadata (exit code messages, timeout info) might fall in the truncated middle section.
Recommendation: Either:
- Truncate before adding metadata, then append metadata (ensures it's always visible)
- Add a test that verifies metadata is preserved after truncation
- Document this behavior if intentional
🟡 Suggestions
3. Code Duplication
The truncation pattern is repeated identically 4 times. Consider extracting to a helper method:
def _truncate_command_output(self, command_output: str) -> str:
"""Truncate command output to MAX_CMD_OUTPUT_SIZE."""
return maybe_truncate(command_output, truncate_after=MAX_CMD_OUTPUT_SIZE)4. Test Coverage Gaps (test_terminal_session.py)
The test only verifies basic truncation. Consider adding:
- Multi-line output truncation test
- Metadata preservation verification
- Head and tail preservation checks
- Edge cases (exactly at limit, just over limit)
Example additions:
assert obs.text.startswith("AAA") # First few chars preserved
assert obs.text.endswith("AAA") # Last few chars preserved
assert "exit code" in obs.text.lower() # Metadata preservedPositive Notes
✅ Truncation is consistently applied across all observation creation paths
✅ Test confirms basic truncation functionality works
✅ Uses existing maybe_truncate utility correctly
✅ MAX_CMD_OUTPUT_SIZE constant properly defined and imported
|
Re: latest review note about metadata being lost. In this PR, TerminalSession truncates only the command output string (obs.text) before creating the TerminalObservation. The prefix/suffix (exit code, timeout info, etc.) live on CmdOutputMetadata (obs.metadata.prefix/suffix) and are not appended into command_output at this layer, so this truncation cannot drop those metadata messages. The only place we concatenate prefix+text+suffix and then truncate is TerminalObservation.to_llm_content, which keeps head+tail; if we want to be extra safe, we can add a test (with a much smaller MAX via monkeypatch) asserting the suffix lines are preserved under truncation. (update: test added) Also: since we’re not passing save_dir to maybe_truncate, we’re intentionally not persisting full outputs / doing dedup here. |
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
|
@OpenHands understand this PR and read all comments. Tell me : what was NOT fixed from the AI reviewer comments? List them. Then take them one by one and address them: do you think it should it be fixed or not, and why. Respond as a comment on github on the PR, using smolpaws. Note that it will be rendered as markdown. |
|
I'm on it! enyst can track my progress at all-hands.dev |
|
smolpaws here. I re-read the AI review (all-hands-bot) and compared it to the current PR state (after the refactor that moved truncation into What AI review comments were not fixed?
Everything else from that review is effectively addressed by the refactor:
Should each remaining item be fixed?1) Code duplication (4 identical
|
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
HUMAN: In V1, currently
(1)- we cap Terminal obs thanks to 10k lines limit
(2)- no max chars limit; I don't know if lines are limited
(3)-
to_llm_messagewill cap with a max chars limit, the content actually used (sent to the LLM)This PR proposes to address "layer 2" as the agent calls it below, capping per chars too. Reasons include: potentially high observations loaded in memory, transferred over the wire, loaded in the UIs even if not displayed in full.
Note: it seems the browser tools already cap to 30k chars, before creating the observation. Basically this PR proposes to make sure Terminal does too.
Related to #1824
Port layer-2 truncation (truncate before store) for terminal tool results.
Background: OpenHands/OpenHands#7404 (V0) truncated CmdOutputObservation before saving into the event stream to avoid persisting multi-megabyte outputs the LLM never sees.
Proposed fix:
Local: pre-commit clean; terminal session + truncation tests pass.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:9b0ee17-pythonRun
All tags pushed for this build
About Multi-Architecture Support
9b0ee17-python) is a multi-arch manifest supporting both amd64 and arm649b0ee17-python-amd64) are also available if needed