Return 502 Bad Gateway for MCP service errors#1793

Draft
tofarr wants to merge 2 commits into main from mcp-error-502-response
Conversation

@tofarr (Collaborator) commented Jan 22, 2026

Summary

This PR adds a dedicated exception handler for MCP (Model Context Protocol) errors that returns HTTP 502 Bad Gateway instead of 500 Internal Server Error.

Background: When a user-configured external MCP service (such as an SSE endpoint or a stdio process) fails to respond, the agent-server currently returns a generic 500 error. This makes it difficult to:

  1. Distinguish between internal agent-server bugs and external MCP service failures
  2. Provide meaningful error messages to users about their MCP configuration
  3. Implement appropriate retry strategies in downstream services

Changes:

  • Added MCPError exception handler in api.py that returns 502 Bad Gateway
  • Response includes structured error details:
    • detail: "MCP service error"
    • error_type: Exception class name (e.g., "MCPTimeoutError")
    • message: Full error message with troubleshooting hints
    • timeout: (for MCPTimeoutError) The timeout value that was exceeded
    • mcp_servers: (for MCPTimeoutError with config) List of configured MCP server names
  • Logs at WARNING level since this is an expected failure mode (user misconfiguration)
  • Only exposes MCP server names, not full config (which may contain secrets like auth headers)
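A minimal sketch of how the handler could assemble this payload, written against the standard library only so it runs on its own. The classes `MCPError` and `MCPTimeoutError` below are hypothetical stand-ins for the SDK's real exceptions, `build_mcp_error_response` is an illustrative name, and the `mcpServers` config key is an assumption about the config shape; the actual code in api.py may differ.

```python
from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger("agent_server.mcp")


class MCPError(Exception):
    """Hypothetical stand-in for the SDK's base MCP exception."""


class MCPTimeoutError(MCPError):
    """Hypothetical stand-in: a timeout that carries the limit and the config."""

    def __init__(
        self,
        message: str,
        timeout: float,
        config: dict[str, Any] | None = None,
    ) -> None:
        super().__init__(message)
        self.timeout = timeout
        self.config = config


def build_mcp_error_response(exc: MCPError) -> tuple[int, dict[str, Any]]:
    """Return (status_code, body) for an MCP failure."""
    # WARNING rather than ERROR: an unreachable user-configured MCP server
    # is an expected failure mode, not an agent-server bug.
    logger.warning("MCP service error: %s", exc)

    body: dict[str, Any] = {
        "detail": "MCP service error",
        "error_type": type(exc).__name__,
        "message": str(exc),
    }
    if isinstance(exc, MCPTimeoutError):
        body["timeout"] = exc.timeout
        # Assumed config shape: {"mcpServers": {name: {...settings...}}}.
        servers = (exc.config or {}).get("mcpServers")
        if isinstance(servers, dict):
            # Expose only the names; the settings may hold auth headers.
            body["mcp_servers"] = list(servers)
    return 502, body
```

In a web framework, the returned pair would be turned into a JSON response by whatever exception-handler hook the framework provides.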

Example response:

{
  "detail": "MCP service error",
  "error_type": "MCPTimeoutError",
  "message": "MCP tool listing timed out after 30 seconds.\nMCP servers configured: fetch, custom_sse\n\nPossible solutions:\n  1. Increase the timeout value...",
  "timeout": 30.0,
  "mcp_servers": ["fetch", "custom_sse"]
}
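For downstream services, a rough sketch of how a client might use this response to tell MCP failures apart from other errors. The function name `classify_error` and the returned dictionary shape are illustrative assumptions, not part of this PR; only the field names come from the example response above.

```python
import json
from typing import Any


def classify_error(status_code: int, body_text: str) -> dict[str, Any]:
    """Classify an error response as an MCP failure or something else."""
    try:
        body = json.loads(body_text)
    except ValueError:
        # Not JSON at all, e.g. an HTML error page from a proxy.
        body = {}
    if not isinstance(body, dict):
        body = {}

    # A 502 alone could also come from a reverse proxy, so the sketch
    # additionally checks the "detail" marker set by the handler.
    is_mcp = status_code == 502 and body.get("detail") == "MCP service error"
    return {
        "source": "mcp" if is_mcp else "other",
        "error_type": body.get("error_type"),
        "timeout": body.get("timeout"),
        "mcp_servers": body.get("mcp_servers", []),
    }
```

A caller could then surface the server names and timeout to the user when `source` is `"mcp"`, and treat everything else as a possible agent-server bug.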

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the GitHub CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a2e9ada-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a2e9ada-python \
  ghcr.io/openhands/agent-server:a2e9ada-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a2e9ada-golang-amd64
ghcr.io/openhands/agent-server:a2e9ada-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a2e9ada-golang-arm64
ghcr.io/openhands/agent-server:a2e9ada-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a2e9ada-java-amd64
ghcr.io/openhands/agent-server:a2e9ada-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a2e9ada-java-arm64
ghcr.io/openhands/agent-server:a2e9ada-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a2e9ada-python-amd64
ghcr.io/openhands/agent-server:a2e9ada-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:a2e9ada-python-arm64
ghcr.io/openhands/agent-server:a2e9ada-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:a2e9ada-golang
ghcr.io/openhands/agent-server:a2e9ada-java
ghcr.io/openhands/agent-server:a2e9ada-python

About Multi-Architecture Support

  • Each variant tag (e.g., a2e9ada-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., a2e9ada-python-amd64) are also available if needed

MCP errors indicate failures in external MCP services (user-configured
servers like SSE endpoints, stdio processes, etc.). Using 502 signals
that the agent-server itself is healthy but an upstream dependency failed.

This improves error handling and observability by:
- Differentiating MCP failures (502) from internal errors (500)
- Including structured error details (error_type, message, timeout, server names)
- Enabling smarter error handling in downstream services
- Protecting secrets by only exposing server names, not full config

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions bot (Contributor) commented Jan 22, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| openhands-agent-server/openhands/agent_server | | | | |
| api.py | 160 | 58 | 63% | 56, 62, 66–68, 70, 82, 84, 107, 146–148, 150–154, 167, 201, 208–209, 213–216, 218, 232, 239, 244–245, 247–251, 259, 267, 272–273, 275–276, 279–282, 284, 290, 294, 300, 308–309, 317–318, 326, 330–331, 333, 339 |
| TOTAL | 16061 | 7848 | 51% | |

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai bot commented Jan 22, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1793 at branch `mcp-error-502-response`

Feel free to include any additional details that might help me get this PR into a better state.

@all-hands-bot (Collaborator)

[Automatic Post]: It has been a while since there was any activity on this PR. @tofarr, are you still working on it? If so, please go ahead. If not, please request review, close it, or ask someone else to follow up.
