Truncate terminal outputs before persisting events by enyst · Pull Request #1823 · OpenHands/software-agent-sdk

enyst · 2026-01-26T08:57:22Z

HUMAN: In V1, currently
(1)- we cap Terminal obs thanks to 10k lines limit
(2)- no max chars limit; I don't know if lines are limited
(3)- to_llm_message will cap with a max chars limit, the content actually used (sent to the LLM)

This PR proposes to address "layer 2", as the agent calls it below, by capping per character count too. Reasons include: potentially very large observations being loaded in memory, transferred over the wire, and loaded in the UIs even if not displayed in full.

Note: it seems the browser tools already cap to 30k chars, before creating the observation. Basically this PR proposes to make sure Terminal does too.

Related to #1824

Port layer-2 truncation (truncate before store) for terminal tool results.

Background: OpenHands/OpenHands#7404 (V0) truncated CmdOutputObservation before saving into the event stream to avoid persisting multi-megabyte outputs the LLM never sees.

Proposed fix:

TerminalSession now truncates command output to MAX_CMD_OUTPUT_SIZE (30k) before creating TerminalObservation (completed/nochange-timeout/hard-timeout + previous-command-running).
Added a session-level test (runs on tmux + subprocess) asserting large command output is truncated in obs.text.

Local: pre-commit clean; terminal session + truncation tests pass.
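To illustrate the "truncate before store" idea described above, here is a minimal sketch. It is not the SDK's `maybe_truncate`; `truncate_head_tail`, `NOTICE`, `Observation`, and `make_observation` are hypothetical names introduced only for this example. Only `MAX_CMD_OUTPUT_SIZE` (30k) comes from the PR description.

```python
from dataclasses import dataclass

MAX_CMD_OUTPUT_SIZE = 30_000  # 30k chars, the limit named in this PR
NOTICE = "\n[... output truncated ...]\n"  # hypothetical marker text


def truncate_head_tail(text: str, limit: int = MAX_CMD_OUTPUT_SIZE) -> str:
    """Keep the start and end of `text` so the result is at most `limit` chars."""
    if len(text) <= limit:
        return text
    budget = limit - len(NOTICE)
    if budget <= 0:
        # Limit too small to fit the marker: fall back to a plain prefix.
        return text[:limit]
    head = budget // 2
    tail = budget - head
    return text[:head] + NOTICE + text[len(text) - tail:]


@dataclass
class Observation:  # hypothetical stand-in for TerminalObservation
    text: str


def make_observation(raw_output: str) -> Observation:
    # Truncate first, so the oversized string is never stored on the event.
    return Observation(text=truncate_head_tail(raw_output))


obs = make_observation("A" * 1_000_000)
print(len(obs.text))
```

The point of the ordering is that the megabyte-sized string only exists transiently inside the session; whatever is persisted, sent over the wire, or loaded by a UI is already capped.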

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:9b0ee17-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-9b0ee17-python \
  ghcr.io/openhands/agent-server:9b0ee17-python

All tags pushed for this build

ghcr.io/openhands/agent-server:9b0ee17-golang-amd64
ghcr.io/openhands/agent-server:9b0ee17-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:9b0ee17-golang-arm64
ghcr.io/openhands/agent-server:9b0ee17-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:9b0ee17-java-amd64
ghcr.io/openhands/agent-server:9b0ee17-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:9b0ee17-java-arm64
ghcr.io/openhands/agent-server:9b0ee17-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:9b0ee17-python-amd64
ghcr.io/openhands/agent-server:9b0ee17-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:9b0ee17-python-arm64
ghcr.io/openhands/agent-server:9b0ee17-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:9b0ee17-golang
ghcr.io/openhands/agent-server:9b0ee17-java
ghcr.io/openhands/agent-server:9b0ee17-python

About Multi-Architecture Support

Each variant tag (e.g., 9b0ee17-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 9b0ee17-python-amd64) are also available if needed

Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot

This PR successfully adds validator-based truncation to avoid persisting large terminal outputs. I found a few issues that should be addressed, with the most important being potential double file saving. Details in inline comments below.

openhands-tools/openhands/tools/terminal/definition.py

tests/tools/terminal/test_observation_truncation.py

openhands-tools/openhands/tools/terminal/definition.py

github-actions · 2026-01-26T09:12:52Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-tools/openhands/tools/terminal/terminal
terminal_session.py	187	58	68%	92, 98, 102–104, 126–127, 154, 169–170, 209–211, 216, 219–220, 224, 230, 233, 248–250, 255, 258–259, 263, 269, 272, 292, 294, 297, 299, 315, 330, 336, 345, 348, 382, 386, 389, 392–393, 399–400, 406, 409, 416–417, 423–424, 494, 499–500, 509–511, 517–518
TOTAL	16987	8481	50%

Co-authored-by: openhands <openhands@all-hands.dev>

enyst · 2026-01-26T10:25:05Z

(AGENT)

Refactor per discussion: moved “truncate before store” to the TerminalSession layer (where observations are created), instead of a TerminalObservation validator.

TerminalSession now truncates command output to MAX_CMD_OUTPUT_SIZE (30k) before creating TerminalObservation (completed/nochange-timeout/hard-timeout + previous-command-running).
Removed TerminalObservation model validator and the full_output_save_dir persistence logic.
Added a session-level test (runs on tmux + subprocess) asserting large command output is truncated in obs.text.

Local: pre-commit clean; terminal session + truncation tests pass.

openhands-tools/openhands/tools/terminal/definition.py

all-hands-bot

Code Review Summary

The PR successfully implements truncation before persistence to avoid storing multi-megabyte outputs. However, I've identified several issues that should be addressed:

🟠 Important Issues

1. Missing save_dir Parameter (Line 188-190, 230-232, 269-271, 406-408)

The maybe_truncate() calls don't pass a save_dir parameter, meaning large outputs are truncated and lost forever. The maybe_truncate function supports persisting full outputs to disk (like browser tools do).

Recommendation: Consider passing save_dir to allow recovery of full outputs:

command_output = maybe_truncate(
    command_output, 
    truncate_after=MAX_CMD_OUTPUT_SIZE,
    save_dir=self.terminal.work_dir,  # or dedicated output dir
    tool_prefix="terminal"
)

2. Metadata Could Be Lost in Truncation (Line 175-190)

Truncation happens AFTER metadata suffix is added (lines 175-182). For very large outputs, maybe_truncate keeps head+tail, but important metadata (exit code messages, timeout info) might fall in the truncated middle section.

Recommendation: Either:

Truncate before adding metadata, then append metadata (ensures it's always visible)
Add a test that verifies metadata is preserved after truncation
Document this behavior if intentional
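The first option in the list above (truncate, then append metadata) could look roughly like the sketch below. All names here (`clip_middle`, `compose`, `LIMIT`) and the suffix text are hypothetical and chosen only for illustration; the limit is deliberately tiny.

```python
LIMIT = 200  # small illustrative limit, not the SDK's value


def clip_middle(text: str, limit: int) -> str:
    """Drop the middle of `text` so the result has at most `limit` chars."""
    if limit <= 0:
        return ""
    if len(text) <= limit:
        return text
    head = limit // 2
    tail = limit - head
    return text[:head] + text[len(text) - tail:]


def compose(body: str, suffix: str, limit: int = LIMIT) -> str:
    # Reserve room for the suffix, clip only the body, then append the suffix,
    # so the metadata line cannot land in the dropped middle section.
    room = max(limit - len(suffix), 0)
    return clip_middle(body, room) + suffix


suffix = "\n[Command finished with exit code 1]"  # hypothetical metadata line
result = compose("x" * 10_000, suffix)
print(result[-len(suffix):])
```

With this ordering the suffix is preserved by construction, whatever the body length, at the cost of giving the body slightly less than the full limit.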

🟡 Suggestions

3. Code Duplication

The truncation pattern is repeated identically 4 times. Consider extracting to a helper method:

def _truncate_command_output(self, command_output: str) -> str:
    """Truncate command output to MAX_CMD_OUTPUT_SIZE."""
    return maybe_truncate(command_output, truncate_after=MAX_CMD_OUTPUT_SIZE)

4. Test Coverage Gaps (test_terminal_session.py)

The test only verifies basic truncation. Consider adding:

Multi-line output truncation test
Metadata preservation verification
Head and tail preservation checks
Edge cases (exactly at limit, just over limit)

Example additions:

assert obs.text.startswith("AAA")  # First few chars preserved
assert obs.text.endswith("AAA")  # Last few chars preserved
assert "exit code" in obs.text.lower()  # Metadata preserved

Positive Notes

✅ Truncation is consistently applied across all observation creation paths
✅ Test confirms basic truncation functionality works
✅ Uses existing maybe_truncate utility correctly
✅ MAX_CMD_OUTPUT_SIZE constant properly defined and imported

smolpaws · 2026-01-26T12:30:37Z

Re: latest review note about metadata being lost. In this PR, TerminalSession truncates only the command output string (obs.text) before creating the TerminalObservation. The prefix/suffix (exit code, timeout info, etc.) live on CmdOutputMetadata (obs.metadata.prefix/suffix) and are not appended into command_output at this layer, so this truncation cannot drop those metadata messages.

The only place we concatenate prefix+text+suffix and then truncate is TerminalObservation.to_llm_content, which keeps head+tail; if we want to be extra safe, we can add a test (with a much smaller MAX via monkeypatch) asserting the suffix lines are preserved under truncation. (update: test added)
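A rough sketch of the kind of test described above, shrinking the limit so that truncation triggers on a small input. This does not use the real `TerminalObservation.to_llm_content`; `settings`, `to_llm_text`, and the suffix text are hypothetical stand-ins, and `unittest.mock.patch.object` plays the role that pytest's `monkeypatch` would play in the actual test suite.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the module that owns the limit.
settings = SimpleNamespace(MAX_CMD_OUTPUT_SIZE=30_000)


def to_llm_text(prefix: str, text: str, suffix: str) -> str:
    """Join the three parts, then keep head and tail if over the limit."""
    full = prefix + text + suffix
    limit = settings.MAX_CMD_OUTPUT_SIZE  # read at call time so patching works
    if len(full) <= limit:
        return full
    head = limit // 2
    tail = limit - head
    return full[:head] + full[len(full) - tail:]


def test_suffix_survives_truncation() -> None:
    suffix = "\n[exit code: 2]"  # hypothetical metadata line
    with mock.patch.object(settings, "MAX_CMD_OUTPUT_SIZE", 200):
        out = to_llm_text("", "B" * 5_000, suffix)
    assert len(out) == 200
    assert out.endswith(suffix)


test_suffix_survives_truncation()
```

The suffix survives here only because it is shorter than the retained tail; a suffix longer than half the limit would be cut, which is the case such a test is meant to guard against.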

Also: since we’re not passing save_dir to maybe_truncate, we’re intentionally not persisting full outputs / doing dedup here.

Co-authored-by: openhands <openhands@all-hands.dev>

enyst · 2026-01-27T10:49:47Z

@OpenHands understand this PR and read all comments. Tell me: what was NOT fixed from the AI reviewer comments?

List them. Then take them one by one and address them: do you think it should be fixed or not, and why. Respond as a comment on GitHub on the PR, using smolpaws. Note that it will be rendered as markdown.

openhands-ai · 2026-01-27T10:49:57Z

I'm on it! enyst can track my progress at all-hands.dev

smolpaws · 2026-01-27T10:52:17Z

smolpaws here.

I re-read the AI review (all-hands-bot) and compared it to the current PR state (after the refactor that moved truncation into TerminalSession and removed the TerminalObservation validator + full_output_save_dir logic).

What AI review comments were not fixed?

(Suggestion) Deduplicate the repeated truncation snippet (4 call sites).
(Suggestion) Broaden test coverage / add edge-case tests beyond the basic “output is truncated”.

Everything else from that review is effectively addressed by the refactor:

Double file saving: no longer applicable because the validator + full_output_save_dir saving was removed, and we’re not passing save_dir anywhere.
“Missing save_dir parameter”: intentionally not implemented (and now structurally impossible in this PR since full_output_save_dir is gone).
“Metadata could be lost”: addressed by adding a test around to_llm_content tail preservation.

Should each remaining item be fixed?

1) Code duplication (4 identical `maybe_truncate(..., truncate_after=MAX_CMD_OUTPUT_SIZE)` blocks)

Do we need to fix it? Probably not for this PR.

Why:

The duplicated code is very small and local.
Introducing a helper (e.g. _truncate_command_output) slightly improves DRYness, but also creates another abstraction to thread through (and reviewers often disagree on where that helper should live: TerminalSession, a utility module, etc.).
Since this PR’s main goal is correctness + preventing large persisted events, the duplication doesn’t materially affect risk/behavior.

When it would be worth doing: if we expect to tune truncation behavior (e.g., different limits per status, or adding optional save-to-disk), a helper would become more valuable.

2) Additional tests / edge cases

Do we need to fix it? Not strictly required, but nice-to-have.

Why:

The PR already covers the critical behavior: session-level truncation before persisting + ensuring LLM-facing content keeps the suffix in the tail.
Adding more tests (exactly-at-limit, just-over-limit, multi-line behavior, head+tail invariants) increases confidence, but also increases test runtime/maintenance (especially with multi-backend terminal types).

What I’d add if we decide to extend tests:

len(obs.text) == small_max behavior when input is much larger (already roughly covered)
obs.text.startswith(...) and .endswith(...) to explicitly assert head+tail preservation
A multi-line generator (e.g. printf 'A\n%.0s' {1..N}) to ensure truncation isn’t doing anything surprising with newline boundaries
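As a sketch of what the multi-line check could assert, the snippet below builds the output in Python instead of running a shell command through a terminal session. `truncate` is a hypothetical head+tail stand-in, not the SDK utility, and numbered lines are used instead of repeated `A` lines (an assumption made here) so that the seam is easy to see.

```python
def truncate(text: str, limit: int) -> str:
    """Hypothetical head+tail truncation; stands in for the real utility."""
    if len(text) <= limit:
        return text
    head = limit // 2
    tail = limit - head
    return text[:head] + text[len(text) - tail:]


# Rough equivalent of the shell generator: N short lines, one per row.
lines = [f"line {i:05d}" for i in range(2_000)]
output = "\n".join(lines) + "\n"

clipped = truncate(output, limit=1_000)
kept = clipped.splitlines()

assert len(clipped) == 1_000
assert kept[0] == lines[0]    # first line survives intact
assert kept[-1] == lines[-1]  # last line survives intact

# The cut is character-based, so a line at the seam may be partial.
partial = [k for k in kept if k not in lines]
print(len(partial), "partial line(s) at the seam")
```

A character-based cut does not respect newline boundaries, so a test of this kind should assert on the first and last lines rather than on every retained line being complete.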

If you want, I can follow up with a small refactor adding _truncate_command_output() + 1-2 lightweight edge case tests, but I’m leaning toward keeping this PR minimal.

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2026-01-27T11:36:40Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1823 at branch `truncate-terminal-before-store`

Feel free to include any additional details that might help me get this PR into a better state.

raymyers

Looks solid, thanks!

xingyaoww

Thanks Engel! LGTM

Truncate terminal observation content before persistence

1896c4a

Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot reviewed Jan 26, 2026

enyst commented Jan 26, 2026

openhands-tools/openhands/tools/terminal/definition.py (outdated, resolved)

Update openhands-tools/openhands/tools/terminal/definition.py

a9ae09e

enyst and others added 4 commits January 26, 2026 09:12

Avoid double-saving truncated terminal outputs

71e3650

Co-authored-by: openhands <openhands@all-hands.dev>

Clarify terminal truncation comment

2f7a506

Co-authored-by: openhands <openhands@all-hands.dev>

Guard terminal observation content type in truncation validator

9943eae

Co-authored-by: openhands <openhands@all-hands.dev>

Propagate full_output_save_dir and test full-output saving

48e59a3

Co-authored-by: openhands <openhands@all-hands.dev>

enyst marked this pull request as draft January 26, 2026 10:14

Truncate terminal outputs in session before persisting

21835a2

Co-authored-by: openhands <openhands@all-hands.dev>

enyst commented Jan 26, 2026

openhands-tools/openhands/tools/terminal/definition.py (outdated, resolved)

Update openhands-tools/openhands/tools/terminal/definition.py

50048c4

enyst commented Jan 26, 2026

openhands-tools/openhands/tools/terminal/definition.py (outdated, resolved)

Update openhands-tools/openhands/tools/terminal/definition.py

146906a

enyst commented Jan 26, 2026

openhands-tools/openhands/tools/terminal/definition.py (outdated, resolved)

Update openhands-tools/openhands/tools/terminal/definition.py

a5437e9

enyst marked this pull request as ready for review January 26, 2026 11:30

OpenHands deleted a comment from openhands-ai bot Jan 26, 2026

all-hands-bot reviewed Jan 26, 2026

test: preserve terminal metadata under truncation

0d79939

Co-authored-by: openhands <openhands@all-hands.dev>

enyst requested a review from xingyaoww January 26, 2026 13:12

test: fix truncation metadata test for lint/type

eb24e17

Co-authored-by: openhands <openhands@all-hands.dev>

OpenHands deleted a comment from openhands-ai bot Jan 26, 2026

test: make metadata truncation assertion robust

f61c0ca

Co-authored-by: openhands <openhands@all-hands.dev>

OpenHands deleted a comment from openhands-ai bot Jan 26, 2026

OpenHands deleted a comment from openhands-ai bot Jan 27, 2026

enyst and others added 2 commits January 27, 2026 11:28

test: add multiline terminal output truncation coverage

31ff167

Co-authored-by: openhands <openhands@all-hands.dev>

test: clarify multiline truncation coverage

84b9b1c

Co-authored-by: openhands <openhands@all-hands.dev>

Merge branch 'main' into truncate-terminal-before-store

e4b52eb

enyst mentioned this pull request Jan 28, 2026

feat(condenser): Hard context reset on unrecoverable error #1596

Merged

5 tasks

raymyers approved these changes Jan 29, 2026

xingyaoww approved these changes Jan 30, 2026

xingyaoww merged commit b53099a into main Jan 30, 2026
27 checks passed

xingyaoww deleted the truncate-terminal-before-store branch January 30, 2026 06:20

enyst mentioned this pull request Jan 31, 2026

Proposal: don't use full events history in the OH ecosystem #1824

Open

6 participants
