
Truncate terminal outputs before persisting events #1823

Merged
xingyaoww merged 16 commits into main from truncate-terminal-before-store
Jan 30, 2026

Conversation

@enyst
Collaborator

@enyst enyst commented Jan 26, 2026

HUMAN: In V1, currently
(1) we cap Terminal obs thanks to a 10k-line limit
(2) there is no max chars limit; I don't know if lines are limited
(3) to_llm_message caps the content actually used (sent to the LLM) with a max chars limit

This PR proposes to address "layer 2", as the agent calls it below, by capping per chars too. Reasons include: potentially huge observations being loaded in memory, transferred over the wire, and loaded in the UIs even if not displayed in full.

Note: it seems the browser tools already cap to 30k chars, before creating the observation. Basically this PR proposes to make sure Terminal does too.

Related to #1824


Port layer-2 truncation (truncate before store) for terminal tool results.

Background: OpenHands/OpenHands#7404 (V0) truncated CmdOutputObservation before saving into the event stream to avoid persisting multi-megabyte outputs the LLM never sees.

Proposed fix:

  • TerminalSession now truncates command output to MAX_CMD_OUTPUT_SIZE (30k) before creating TerminalObservation (completed/nochange-timeout/hard-timeout + previous-command-running).
  • Added a session-level test (runs on tmux + subprocess) asserting large command output is truncated in obs.text.
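
As a rough illustration of "truncate before store", the sketch below caps a command's output with a head-plus-tail cut before the observation object is built. `truncate_middle`, `Observation`, `make_observation`, and the notice text are hypothetical stand-ins, not the SDK's `maybe_truncate` or `TerminalObservation`; only the 30k limit is taken from this PR.

```python
from dataclasses import dataclass

MAX_CMD_OUTPUT_SIZE = 30_000  # 30k chars, the limit named in this PR
NOTICE = "\n[... output truncated ...]\n"  # hypothetical marker text


def truncate_middle(text: str, limit: int = MAX_CMD_OUTPUT_SIZE) -> str:
    """Keep the head and tail of `text` so the result is at most `limit` chars."""
    if len(text) <= limit:
        return text
    if limit <= len(NOTICE):
        return text[:limit]  # no room for the marker, fall back to a plain cut
    keep = limit - len(NOTICE)
    head = keep // 2
    tail = keep - head
    return text[:head] + NOTICE + text[len(text) - tail:]


@dataclass(frozen=True)
class Observation:
    """Hypothetical stand-in for the event that gets persisted."""

    text: str


def make_observation(raw_output: str) -> Observation:
    # Truncate first, so the oversized string is never stored on the event.
    return Observation(text=truncate_middle(raw_output))


obs = make_observation("A" * 1_000_000)
print(len(obs.text))
```

Because the cut happens before construction, anything that later serializes or transmits the observation only ever handles the capped string.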

Local: pre-commit clean; terminal session + truncation tests pass.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:9b0ee17-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-9b0ee17-python \
  ghcr.io/openhands/agent-server:9b0ee17-python

All tags pushed for this build

ghcr.io/openhands/agent-server:9b0ee17-golang-amd64
ghcr.io/openhands/agent-server:9b0ee17-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:9b0ee17-golang-arm64
ghcr.io/openhands/agent-server:9b0ee17-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:9b0ee17-java-amd64
ghcr.io/openhands/agent-server:9b0ee17-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:9b0ee17-java-arm64
ghcr.io/openhands/agent-server:9b0ee17-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:9b0ee17-python-amd64
ghcr.io/openhands/agent-server:9b0ee17-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:9b0ee17-python-arm64
ghcr.io/openhands/agent-server:9b0ee17-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:9b0ee17-golang
ghcr.io/openhands/agent-server:9b0ee17-java
ghcr.io/openhands/agent-server:9b0ee17-python

About Multi-Architecture Support

  • Each variant tag (e.g., 9b0ee17-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 9b0ee17-python-amd64) are also available if needed

Co-authored-by: openhands <openhands@all-hands.dev>
Collaborator

@all-hands-bot all-hands-bot left a comment

This PR successfully adds validator-based truncation to avoid persisting large terminal outputs. I found a few issues that should be addressed, with the most important being potential double file saving. Details in inline comments below.

@github-actions
Contributor

github-actions bot commented Jan 26, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-tools/openhands/tools/terminal/terminal/terminal_session.py | 187 | 58 | 68% | 92, 98, 102–104, 126–127, 154, 169–170, 209–211, 216, 219–220, 224, 230, 233, 248–250, 255, 258–259, 263, 269, 272, 292, 294, 297, 299, 315, 330, 336, 345, 348, 382, 386, 389, 392–393, 399–400, 406, 409, 416–417, 423–424, 494, 499–500, 509–511, 517–518 |
| TOTAL | 16987 | 8481 | 50% | |

enyst and others added 4 commits January 26, 2026 09:12
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst marked this pull request as draft January 26, 2026 10:14
Co-authored-by: openhands <openhands@all-hands.dev>
@enyst
Collaborator Author

enyst commented Jan 26, 2026

(AGENT)

Refactor per discussion: moved “truncate before store” to the TerminalSession layer (where observations are created), instead of a TerminalObservation validator.

  • TerminalSession now truncates command output to MAX_CMD_OUTPUT_SIZE (30k) before creating TerminalObservation (completed/nochange-timeout/hard-timeout + previous-command-running).
  • Removed TerminalObservation model validator and the full_output_save_dir persistence logic.
  • Added a session-level test (runs on tmux + subprocess) asserting large command output is truncated in obs.text.

Local: pre-commit clean; terminal session + truncation tests pass.

@enyst enyst marked this pull request as ready for review January 26, 2026 11:30
@OpenHands OpenHands deleted a comment from openhands-ai bot Jan 26, 2026
Collaborator

@all-hands-bot all-hands-bot left a comment

Code Review Summary

The PR successfully implements truncation before persistence to avoid storing multi-megabyte outputs. However, I've identified several issues that should be addressed:

🟠 Important Issues

1. Missing save_dir Parameter (Line 188-190, 230-232, 269-271, 406-408)

The maybe_truncate() calls don't pass a save_dir parameter, meaning large outputs are truncated and lost forever. The maybe_truncate function supports persisting full outputs to disk (like browser tools do).

Recommendation: Consider passing save_dir to allow recovery of full outputs:

command_output = maybe_truncate(
    command_output, 
    truncate_after=MAX_CMD_OUTPUT_SIZE,
    save_dir=self.terminal.work_dir,  # or dedicated output dir
    tool_prefix="terminal"
)
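
For illustration only, a save-then-truncate variant might look like the sketch below. It is not `maybe_truncate`'s implementation: `truncate_and_save`, the file naming scheme, and the notice text are assumptions, and the content-hash filename is just one way to avoid writing the same output twice.

```python
import hashlib
import tempfile
from pathlib import Path


def truncate_and_save(text: str, limit: int, save_dir: Path) -> str:
    """Write the full text to `save_dir`, then return a head+tail excerpt.

    The excerpt fits within `limit` as long as the notice is shorter than `limit`.
    """
    if len(text) <= limit:
        return text
    # Content-addressed name: identical outputs map to a single file.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    path = save_dir / f"terminal_output_{digest}.txt"
    if not path.exists():
        path.write_text(text, encoding="utf-8")
    notice = f"\n[... truncated; full output saved to {path} ...]\n"
    keep = max(limit - len(notice), 0)
    head = keep // 2
    tail = keep - head
    return text[:head] + notice + text[len(text) - tail:]


with tempfile.TemporaryDirectory() as tmp:
    save_dir = Path(tmp)
    big = "line\n" * 20_000
    short = truncate_and_save(big, limit=1_000, save_dir=save_dir)
    truncate_and_save(big, limit=1_000, save_dir=save_dir)  # reuses the same file
    saved = list(save_dir.iterdir())
    saved_text_len = len(saved[0].read_text(encoding="utf-8"))
    print(len(short), len(saved))
```

The trade-off is that the output directory needs its own cleanup policy, which is one reason a PR might choose to truncate without saving.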

2. Metadata Could Be Lost in Truncation (Line 175-190)

Truncation happens AFTER metadata suffix is added (lines 175-182). For very large outputs, maybe_truncate keeps head+tail, but important metadata (exit code messages, timeout info) might fall in the truncated middle section.

Recommendation: Either:

  • Truncate before adding metadata, then append metadata (ensures it's always visible)
  • Add a test that verifies metadata is preserved after truncation
  • Document this behavior if intentional
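
The first option can be sketched as follows, assuming a head+tail truncation helper. `truncate_middle`, `render`, and the suffix text are hypothetical; the point is only that a suffix appended after truncation cannot be cut, whereas a suffix longer than the kept tail can be.

```python
NOTICE = "\n[... truncated ...]\n"  # hypothetical marker text


def truncate_middle(text: str, limit: int) -> str:
    """Keep head and tail of `text`; the result is at most `limit` chars."""
    if len(text) <= limit:
        return text
    if limit <= len(NOTICE):
        return text[:limit]  # no room for the marker, fall back to a plain cut
    keep = limit - len(NOTICE)
    head = keep // 2
    return text[:head] + NOTICE + text[len(text) - (keep - head):]


def render(body: str, suffix: str, limit: int) -> str:
    """Truncate only the body, then append the metadata suffix untouched."""
    budget = max(limit - len(suffix), 0)
    return truncate_middle(body, budget) + suffix


body = "x" * 5_000
suffix = "\n[Command finished with exit code 1]"  # made-up metadata line

# Truncating after concatenation: here the suffix is longer than the kept tail.
naive = truncate_middle(body + suffix, 80)
# Truncating the body first and appending the suffix afterwards.
safe = render(body, suffix, 80)

print(naive.endswith(suffix))
print(safe.endswith(suffix))
```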

🟡 Suggestions

3. Code Duplication

The truncation pattern is repeated identically 4 times. Consider extracting to a helper method:

def _truncate_command_output(self, command_output: str) -> str:
    """Truncate command output to MAX_CMD_OUTPUT_SIZE."""
    return maybe_truncate(command_output, truncate_after=MAX_CMD_OUTPUT_SIZE)

4. Test Coverage Gaps (test_terminal_session.py)

The test only verifies basic truncation. Consider adding:

  • Multi-line output truncation test
  • Metadata preservation verification
  • Head and tail preservation checks
  • Edge cases (exactly at limit, just over limit)

Example additions:

assert obs.text.startswith("AAA")  # First few chars preserved
assert obs.text.endswith("AAA")  # Last few chars preserved
assert "exit code" in obs.text.lower()  # Metadata preserved

Positive Notes

✅ Truncation is consistently applied across all observation creation paths
✅ Test confirms basic truncation functionality works
✅ Uses existing maybe_truncate utility correctly
✅ MAX_CMD_OUTPUT_SIZE constant properly defined and imported

@smolpaws
Contributor

smolpaws commented Jan 26, 2026

Re: latest review note about metadata being lost. In this PR, TerminalSession truncates only the command output string (obs.text) before creating the TerminalObservation. The prefix/suffix (exit code, timeout info, etc.) live on CmdOutputMetadata (obs.metadata.prefix/suffix) and are not appended into command_output at this layer, so this truncation cannot drop those metadata messages.

The only place we concatenate prefix+text+suffix and then truncate is TerminalObservation.to_llm_content, which keeps head+tail; if we want to be extra safe, we can add a test (with a much smaller MAX via monkeypatch) asserting the suffix lines are preserved under truncation. (update: test added)
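
A test of that shape could look roughly like the sketch below. It is not the test added in this PR: `constants`, `to_llm_text`, and the suffix text are hypothetical stand-ins, and the limit is shrunk with the standard library's `unittest.mock.patch.object` rather than pytest's `monkeypatch` so the example stays self-contained.

```python
from types import SimpleNamespace
from unittest.mock import patch

# Stand-in for the module that defines the limit (hypothetical layout).
constants = SimpleNamespace(MAX_CMD_OUTPUT_SIZE=30_000)
NOTICE = "\n[... truncated ...]\n"  # hypothetical marker text


def to_llm_text(prefix: str, text: str, suffix: str) -> str:
    """Join prefix + text + suffix, then keep head and tail within the limit."""
    full = prefix + text + suffix
    limit = constants.MAX_CMD_OUTPUT_SIZE  # read at call time so patching works
    if len(full) <= limit:
        return full
    keep = max(limit - len(NOTICE), 0)
    head = keep // 2
    return full[:head] + NOTICE + full[len(full) - (keep - head):]


def test_suffix_survives_truncation() -> None:
    suffix = "\n[exit code 1]"  # made-up metadata line
    # Shrink the limit so a small input is enough to trigger truncation.
    with patch.object(constants, "MAX_CMD_OUTPUT_SIZE", 100):
        rendered = to_llm_text("", "A" * 10_000, suffix)
    assert len(rendered) == 100
    assert rendered.startswith("A")
    assert rendered.endswith(suffix)


test_suffix_survives_truncation()
```

Patching to a small limit keeps the test fast and makes the tail short enough that a regression in tail handling would actually show up.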

Also: since we’re not passing save_dir to maybe_truncate, we’re intentionally not persisting full outputs / doing dedup here.

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst requested a review from xingyaoww January 26, 2026 13:12
Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands OpenHands deleted a comment from openhands-ai bot Jan 26, 2026
Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands OpenHands deleted a comment from openhands-ai bot Jan 26, 2026
@enyst
Collaborator Author

enyst commented Jan 27, 2026

@OpenHands understand this PR and read all comments. Tell me: what was NOT fixed from the AI reviewer comments?

List them. Then take them one by one and address them: do you think each should be fixed or not, and why? Respond as a comment on GitHub on the PR, using smolpaws. Note that it will be rendered as markdown.

@openhands-ai

openhands-ai bot commented Jan 27, 2026

I'm on it! enyst can track my progress at all-hands.dev

@smolpaws
Contributor

smolpaws here.

I re-read the AI review (all-hands-bot) and compared it to the current PR state (after the refactor that moved truncation into TerminalSession and removed the TerminalObservation validator + full_output_save_dir logic).

What AI review comments were not fixed?

  1. (Suggestion) Deduplicate the repeated truncation snippet (4 call sites).
  2. (Suggestion) Broaden test coverage / add edge-case tests beyond the basic “output is truncated”.

Everything else from that review is effectively addressed by the refactor:

  • Double file saving: no longer applicable because the validator + full_output_save_dir saving was removed, and we’re not passing save_dir anywhere.
  • “Missing save_dir parameter”: intentionally not implemented (and now structurally impossible in this PR since full_output_save_dir is gone).
  • “Metadata could be lost”: addressed by adding a test around to_llm_content tail preservation.

Should each remaining item be fixed?

1) Code duplication (4 identical maybe_truncate(..., truncate_after=MAX_CMD_OUTPUT_SIZE) blocks)

Do we need to fix it? Probably not for this PR.

Why:

  • The duplicated code is very small and local.
  • Introducing a helper (e.g. _truncate_command_output) slightly improves DRYness, but also creates another abstraction to thread through (and reviewers often disagree on where that helper should live: TerminalSession, a utility module, etc.).
  • Since this PR’s main goal is correctness + preventing large persisted events, the duplication doesn’t materially affect risk/behavior.

When it would be worth doing: if we expect to tune truncation behavior (e.g., different limits per status, or adding optional save-to-disk), a helper would become more valuable.

2) Additional tests / edge cases

Do we need to fix it? Not strictly required, but nice-to-have.

Why:

  • The PR already covers the critical behavior: session-level truncation before persisting + ensuring LLM-facing content keeps the suffix in the tail.
  • Adding more tests (exactly-at-limit, just-over-limit, multi-line behavior, head+tail invariants) increases confidence, but also increases test runtime/maintenance (especially with multi-backend terminal types).

What I’d add if we decide to extend tests:

  • len(obs.text) == small_max behavior when input is much larger (already roughly covered)
  • obs.text.startswith(...) and .endswith(...) to explicitly assert head+tail preservation
  • A multi-line generator (e.g. printf 'A\n%.0s' {1..N}) to ensure truncation isn’t doing anything surprising with newline boundaries
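
Those checks could be sketched against a stand-in truncation function as below. `truncate_middle` and its notice text are hypothetical (the real tests would go through the terminal session), and `"A\n" * N` plays the role of the shell generator mentioned above.

```python
def truncate_middle(
    text: str, limit: int, notice: str = "\n[... truncated ...]\n"
) -> str:
    """Keep head and tail of `text`; the result is at most `limit` chars."""
    if len(text) <= limit:
        return text
    if limit <= len(notice):
        return text[:limit]  # no room for the marker, fall back to a plain cut
    keep = limit - len(notice)
    head = keep // 2
    return text[:head] + notice + text[len(text) - (keep - head):]


small_max = 200

# Exactly at the limit: returned unchanged.
exactly = "A" * small_max
assert truncate_middle(exactly, small_max) == exactly

# One char over the limit: truncated back down to the limit.
just_over = truncate_middle("A" * (small_max + 1), small_max)
assert len(just_over) == small_max

# Multi-line input: head and tail both survive, newline boundaries included.
multi_line = truncate_middle("A\n" * 5_000, small_max)
assert len(multi_line) == small_max
assert multi_line.startswith("A\n")
assert multi_line.endswith("A\n")

print("edge cases hold for the stand-in helper")
```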

If you want, I can follow up with a small refactor adding _truncate_command_output() + 1-2 lightweight edge case tests, but I’m leaning toward keeping this PR minimal.

@OpenHands OpenHands deleted a comment from openhands-ai bot Jan 27, 2026
enyst and others added 2 commits January 27, 2026 11:28
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Jan 27, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1823 at branch `truncate-terminal-before-store`

Feel free to include any additional details that might help me get this PR into a better state.

Contributor

@raymyers raymyers left a comment

Looks solid, thanks!

Collaborator

@xingyaoww xingyaoww left a comment

Thanks Engel! LGTM

@xingyaoww xingyaoww merged commit b53099a into main Jan 30, 2026
27 checks passed
@xingyaoww xingyaoww deleted the truncate-terminal-before-store branch January 30, 2026 06:20
6 participants