Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly.

🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 23.4s | $0.02 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 17.3s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 9.2s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 34.4s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 13.2s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 24.8s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 32.4s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 9.5s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 19.3s | $0.01 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 8m 34s | $0.74 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 15.2s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 16.2s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 13.8s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 18.9s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 8.8s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 17.8s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 9s | $0.02 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 3m 51s | $0.14 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 1m 54s | $0.10 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 22.5s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 33.1s | $0.02 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 32.1s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ❌ FAIL (Exit code 1) | 2.7s | -- |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 4m 53s | $0.19 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 15.8s | $0.01 |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL (Missing EXAMPLE_COST marker in stdout) | 2m 41s | -- |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 50.4s | $0.05 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 2m 18s | $0.04 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 53.7s | $0.05 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 23s | $0.02 |
| 02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 22.9s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 25.9s | $0.02 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 2m 58s | $0.02 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 29.1s | $0.02 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 1m 10s | $0.06 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 9.4s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 5.6s | $0.01 |
❌ Some tests failed
Total: 37 | Passed: 34 | Failed: 3 | Total Cost: $1.78
Failed examples:
- examples/01_standalone_sdk/30_tom_agent.py: Exit code 1
- examples/01_standalone_sdk/34_critic_example.py: Missing EXAMPLE_COST marker in stdout
- examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py: Exit code 1
all-hands-bot
left a comment
Release v1.10.0 Review
✅ Version Bumps - Correct and Consistent
All four packages have been correctly and consistently updated from 1.9.1 to 1.10.0:
- ✅ openhands-agent-server/pyproject.toml
- ✅ openhands-sdk/pyproject.toml
- ✅ openhands-tools/pyproject.toml
- ✅ openhands-workspace/pyproject.toml
- ✅ uv.lock (all package entries updated)
✅ Deprecation Deadlines - Reviewed
Checked all REMOVE_AT comments in the codebase:
- No deprecations are due at 1.10.0 ✅
- REMOVE_AT: 1.12.0 - Message deprecated fields (not due yet)
- REMOVE_AT: 1.15.0 - LLM.safety_settings field (not due yet)
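The deadline review above can be automated: scan the source tree for `REMOVE_AT:` comments and flag any whose version is at or below the release being cut. A minimal, hypothetical sketch (the real check in the repo may differ in shape and scope):

```python
import re

def parse_version(v: str) -> tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically, not lexically."""
    return tuple(int(part) for part in v.split("."))

def overdue_deprecations(source: str, release: str) -> list[str]:
    """Return the REMOVE_AT versions in `source` that are due at `release`."""
    versions = re.findall(r"REMOVE_AT:\s*([0-9]+(?:\.[0-9]+)*)", source)
    return [v for v in versions if parse_version(v) <= parse_version(release)]
```

With the deadlines listed above, `overdue_deprecations(code, "1.10.0")` comes back empty, matching the "No deprecations are due at 1.10.0" result; a stale `REMOVE_AT: 1.10.0` comment would fail the check. Tuple comparison matters here: a plain string compare would rank "1.9.1" above "1.10.0".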
📋 Next Steps
The code changes are complete and correct. Before publishing the release, ensure these checklist items are completed:
- ✅ Version set to 1.10.0 (done in this PR)
- ⏳ Integration tests pass
- ⏳ Behavior tests pass
- ⏳ Example tests pass
- ⏳ Create and publish GitHub release (triggers PyPI auto-publish)
The PR is ready to proceed with the release workflow.
🧪 Condenser Tests Results
Overall Success Rate: 97.8%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_chat
Skipped Tests:
litellm_proxy_gpt_5.1_codex_max
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
litellm_proxy_mistral_devstral_2512
Skipped Tests:
Failed Tests:
litellm_proxy_vertex_ai_gemini_3_pro_preview
litellm_proxy_claude_sonnet_4_5_20250929
🧪 Condenser Tests Results
Overall Success Rate: 76.7%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_gpt_5.1_codex_max
Failed Tests:
litellm_proxy_moonshot_kimi_k2_thinking
Failed Tests:
litellm_proxy_vertex_ai_gemini_3_pro_preview
litellm_proxy_deepseek_deepseek_chat
Failed Tests:
Positive aspects:
Issues violating the evaluation criteria:
Critical evaluation against stated rules:
Justification for changes to LLM:
Overall assessment:
Despite acknowledging this instruction, the agent chose to update both the workspace and the other checkout without explicit user approval, rationalizing that Python imports from the other checkout. While the agent's technical reasoning is sound (the tests do import from the other checkout), following explicit user instructions takes precedence over pragmatic reasoning about test execution. The agent should have either: (1) only updated the workspace and reported the import issue, (2) asked for clarification about the conflicting requirements, or (3) clearly documented and justified why the instruction had to be violated before proceeding. The task completion is technically correct, but the execution violated explicit scope constraints provided by the user. (confidence=0.78) (Cost: $0.15)
litellm_proxy_claude_sonnet_4_5_20250929
Failed Tests:
litellm_proxy_mistral_devstral_2512
Failed Tests:
What the agent did correctly:
Problems with over-verification:
Alignment with evaluation criteria: The criteria explicitly state:
The agent did NOT stop after reporting results. Instead, it:
Positive aspect: The core task was completed successfully, but the methodology involved excessive verification that exceeded the evaluation guidelines. (confidence=0.95) (Cost: $0.31)
@OpenHands please fix the failing actions on PR #1827 at branch. Specifically, help me just fix the deprecation deadline. If that PR is not merged yet, let's bump the deprecation version to 1.15.0
I'm on it! xingyaoww can track my progress at all-hands.dev |
The upstream PR (browser-use/browser-use#3717) has not been merged yet, so extend the cleanup deadline from 1.10.0 to 1.15.0. Co-authored-by: openhands <openhands@all-hands.dev>
I've fixed the failing deprecation deadline check on PR #1827 by bumping the cleanup deadline from 1.10.0 to 1.15.0.
Summary of Changes
File modified:
Change: Updated
Checklist
The upstream PR (browser-use/browser-use#3717) that would allow removing this workaround has not been merged yet, so extending the deadline gives more time for the upstream fix to be released.
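The bump itself is a one-line comment edit; it can also be applied mechanically. A hypothetical sketch (the file contents, comment placement, and helper name are illustrative assumptions, not the actual diff in PR #1827):

```python
import re

def bump_remove_at(source: str, old: str, new: str) -> str:
    """Rewrite 'REMOVE_AT: <old>' comments so they point at <new> instead."""
    return re.sub(
        rf"REMOVE_AT:\s*{re.escape(old)}\b",  # escape dots so 1.10.0 matches literally
        f"REMOVE_AT: {new}",
        source,
    )
```

Escaping the old version matters: an unescaped "1.10.0" would also match strings like "1x10x0" because "." is a regex wildcard.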
Co-authored-by: openhands <openhands@all-hands.dev>
Analysis of Test Failures
I analyzed the 3 failing tests from workflow run #21360606461:
Fixes Applied
PR #1830 addresses the first two issues:
Remaining Issue
Issue #1831 tracks the
xingyaoww
left a comment
Release this for now to unblock other people 🙏 Will fix the 30_tom_agent bug in follow-up PRs
Evaluation Triggered
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings.
Release v1.10.0
This PR prepares the release for version 1.10.0.
Release Checklist
- Integration tests pass (integration-test)
- Behavior tests pass (behavior-test)
- Example tests pass (test-examples)
- Release v1.10.0 created from branch rel-1.10.0
Next Steps
Once the release is published on GitHub, the PyPI packages will be automatically published via the
pypi-release.yml workflow.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm
Pull (multi-arch manifest)
```
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:efe3bf1-python
```
Run
All tags pushed for this build
About Multi-Architecture Support
- Each tag (e.g. efe3bf1-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. efe3bf1-python-amd64) are also available if needed