Merged
26 changes: 16 additions & 10 deletions docs/design/agent-workflows/documentation/adapters/claude-code.md
Original file line number Diff line number Diff line change
@@ -31,16 +31,22 @@ not when the harness name is something in particular. In practice the capability
static per-harness fallback (`engines/sandbox_agent/capabilities.ts`): the daemon rarely fills
a real `info.capabilities`, so the runner uses `mcpTools: true` for any non-Pi harness.

The mechanism is a small stdio MCP server named `agenta-tools` (`tools/mcp-server.ts`, launched
by `tools/mcp-bridge.ts`) that the daemon attaches to the session. This is an Agenta tool
DELIVERY vehicle, not a user-declared MCP server: it carries the same gateway and code specs
the Pi extension would register, just exposed over MCP because Claude cannot take a native
tool. Its env carries only public metadata (names, descriptions, schemas) and a relay
directory; the `call_ref`, the code, the scoped secrets, and the callback auth never reach it.
When the model calls a tool, the server relays the request back to the runner over the file
relay (`tools/relay.ts`), and the runner runs the private spec from memory and POSTs to
`/tools/call`. The safety property is identical to Pi's: the provider key and the connection
auth stay server-side, and the agent only ever asks Agenta to run a named tool.
The mechanism is a small MCP server named `agenta-tools` (`tools/tool-mcp-http.ts`, built by
`tools/mcp-bridge.ts` `buildToolMcpServers`) that the runner serves on a loopback HTTP endpoint
and attaches to the session as an ACP `type: "http"` MCP server. This is an Agenta tool DELIVERY
vehicle, not a user-declared MCP server: it carries the same gateway and code specs the Pi
extension would register, just exposed over MCP because Claude cannot take a native tool. It runs
in the already-running runner process (no runner-host child process) and is reachable only over
loopback. It holds only public metadata (names, descriptions, schemas) and a relay directory; the
`call_ref`, the code, the scoped secrets, and the callback auth never reach it. When the model
calls a tool, the server relays the request back to the runner over the file relay
(`tools/relay.ts`), and the runner runs the private spec from memory and POSTs to `/tools/call`.
The safety property is identical to Pi's: the provider key and the connection auth stay
server-side, and the agent only ever asks Agenta to run a named tool.
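
The public/private split described above can be sketched as an allowlist projection. This is a minimal illustration, not the runner's code: the `PrivateToolSpec` and `PublicToolMetadata` shapes, the field names, and `toPublicMetadata` are all hypothetical.

```typescript
// Hypothetical shapes for illustration only; the real spec types live in the runner.
interface PrivateToolSpec {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  callRef: string; // private: identifies the backend call
  code?: string; // private: code-tool source
  secrets?: Record<string, string>; // private: scoped secrets
}

interface PublicToolMetadata {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

// Copy only allowlisted fields, so a private field added later is not exposed by default.
function toPublicMetadata(spec: PrivateToolSpec): PublicToolMetadata {
  return {
    name: spec.name,
    description: spec.description,
    inputSchema: spec.inputSchema,
  };
}

const privateSpec: PrivateToolSpec = {
  name: "search_tickets",
  description: "Search support tickets",
  inputSchema: { type: "object", properties: { query: { type: "string" } } },
  callRef: "gateway:tickets.search",
  secrets: { API_KEY: "not-for-the-harness" },
};

const publicView = toPublicMetadata(privateSpec);
console.log(Object.keys(publicView).sort().join(","));
```

Building the public view by naming the fields to keep, rather than deleting the private ones, means the MCP-facing object cannot pick up `callRef` or `secrets` by accident.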

(This internal channel was disabled as collateral with the user-stdio-MCP disable in PR #4831 and
restored over loopback HTTP by the gateway-tool-mcp project. It is independent of the user MCP
capability below — the two toggle separately.)

User-declared `mcp_servers` are a separate thing and effectively off today. They would reach
Claude through `toAcpMcpServers` as additional ACP stdio servers, but only when
12 changes: 8 additions & 4 deletions docs/design/agent-workflows/documentation/ground-truth.md
Original file line number Diff line number Diff line change
@@ -50,8 +50,10 @@ this page and the referenced code as the source of truth.
tool secrets.
- Callback tools route through `/tools/call`. On Daytona, Pi tool calls use the runner file
relay.
- MCP delivery exists for non-Pi harnesses through the stdio MCP bridge, but service-side
MCP resolution is feature-gated.
- Gateway/code tool delivery to non-Pi harnesses (Claude) exists through the internal MCP
channel, served over a loopback HTTP MCP endpoint the runner stands up (no runner-host child
process). User-declared MCP resolution is feature-gated (`AGENTA_AGENT_ENABLE_MCP`, off by
default).

## Not Implemented

@@ -71,8 +73,10 @@ this page and the referenced code as the source of truth.
per option (`HARNESS_IDENTITIES`); the stored/wire harness value stays the bare string.
- Per-request model override is not honored on the Pi ACP path. pi-acp accepts only its
default model and silently falls back (`projects/qa/findings.md`, F-007).
- Remote (`http`) MCP servers are skipped by the runner path. Local stdio MCP is the path
represented by the bridge.
- User-declared MCP transports split: remote (`http`) servers are delivered by the runner
(`toAcpMcpServers`, #4834); stdio servers are disabled (`USER_MCP_UNSUPPORTED_MESSAGE`) because
they launch a process on the runner host. The runner's own internal gateway-tool channel is a
separate thing and is delivered over loopback HTTP.
- Trigger lifecycle, Compose.io trigger integration, and event-to-agent mapping are not
implemented in the agent workflow code.
- A persisted agent template object that separates `AGENTS.md`, skills, tools,
7 changes: 4 additions & 3 deletions docs/design/agent-workflows/documentation/tools.md
Original file line number Diff line number Diff line change
@@ -176,8 +176,9 @@ network. So the call is relayed through files instead: the in-sandbox tool write
file to a relay directory, the runner (which can reach Agenta) reads it, performs the same
`/tools/call` POST, and writes the answer back (`relayToolCall` in `dispatch.ts`,
`startToolRelay` in `tools/relay.ts`). Same callback, same envelope, different delivery. The
non-Pi MCP bridge uses this same relay even on local runs, because the bridge runs in a
separate process that the runner keeps blind to the private spec.
non-Pi internal MCP channel (a loopback HTTP MCP server the runner serves) uses this same relay
even on local runs, because the harness calling it is kept blind to the private spec — only
public metadata crosses the channel, and execution relays back to the runner.
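
The file relay described above can be sketched as a request file, one runner-side polling pass, and a response file. This is a simplified sketch under stated assumptions: only the `<id>.req.json` name comes from the docs. The `.res.json` name, the `RelayRequest`/`RelayResponse` shapes, and the helper names are hypothetical, and the `run` callback stands in for the runner's `/tools/call` POST.

```typescript
import { existsSync, mkdtempSync, readdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical envelope shapes; the real relay envelope may differ.
interface RelayRequest { id: string; tool: string; args: Record<string, unknown>; }
interface RelayResponse { id: string; ok: boolean; content?: string; error?: string; }

// Caller side: drop a request file into the relay directory.
function writeRelayRequest(relayDir: string, req: RelayRequest): void {
  writeFileSync(join(relayDir, `${req.id}.req.json`), JSON.stringify(req));
}

// Runner side: one polling pass. Answers every request that has no response yet.
// The `.res.json` suffix is an assumed name for this sketch.
function relayOnce(relayDir: string, run: (req: RelayRequest) => string): number {
  let handled = 0;
  for (const file of readdirSync(relayDir)) {
    if (!file.endsWith(".req.json")) continue;
    const id = file.slice(0, -".req.json".length);
    const resPath = join(relayDir, `${id}.res.json`);
    if (existsSync(resPath)) continue; // already answered on an earlier pass
    const req = JSON.parse(readFileSync(join(relayDir, file), "utf8")) as RelayRequest;
    let res: RelayResponse;
    try {
      res = { id, ok: true, content: run(req) };
    } catch (err) {
      // A failing tool still produces an answer, so the caller does not wait forever.
      res = { id, ok: false, error: err instanceof Error ? err.message : String(err) };
    }
    writeFileSync(resPath, JSON.stringify(res));
    handled += 1;
  }
  return handled;
}

// Caller side: read the answer if the runner has written one.
function readRelayResponse(relayDir: string, id: string): RelayResponse | undefined {
  const resPath = join(relayDir, `${id}.res.json`);
  if (!existsSync(resPath)) return undefined;
  return JSON.parse(readFileSync(resPath, "utf8")) as RelayResponse;
}

const relayDir = mkdtempSync(join(tmpdir(), "relay-sketch-"));
writeRelayRequest(relayDir, { id: "call-1", tool: "echo", args: { text: "hi" } });
const handledCount = relayOnce(relayDir, (req) => `ran ${req.tool}`);
const relayAnswer = readRelayResponse(relayDir, "call-1");
const secondPassCount = relayOnce(relayDir, (req) => `ran ${req.tool}`);
rmSync(relayDir, { recursive: true, force: true });
console.log(handledCount, relayAnswer?.content, secondPassCount);
```

The caller only ever sees the request it wrote and the answer it reads back; whatever the runner needs to execute the call stays inside the `run` callback.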

### Code tools: the runner runs them locally

@@ -304,7 +305,7 @@ declarative UI spec (`RenderHint` in `protocol.ts`).
- The `render` hint is plumbed end to end on the runner side, but full frontend projection of
every render kind is still in progress.
- Gateway calls on Daytona depend on the file relay, because the sandbox cannot reach Agenta
directly. The relay is also used by the non-Pi MCP bridge on local runs.
directly. The relay is also used by the non-Pi internal MCP channel on local runs.
- **Code tools are standard-library-only.** The image ships `python3` and `node`, but the
child env has no package install and no module path to the runner's dependencies, so a tool
cannot import third-party packages.
Original file line number Diff line number Diff line change
@@ -1,26 +1,42 @@
# Runner To MCP Server

Pi takes its tools natively. Every other harness gets tools over MCP. The runner-owned stdio
tool bridge (which exposed backend-resolved tools to non-Pi harnesses) is currently DISABLED —
it launched a child process on the runner host, outside the sandbox boundary. User-declared
MCP servers ride the same `/run` payload after the Python side resolves their secrets; of
those, HTTP (remote) servers are delivered and stdio servers are disabled for the same reason.
This page covers both: the runner-owned bridge and the user servers.
Pi takes its tools natively. Every other harness gets tools over MCP. There are TWO independent
MCP layers, and they toggle separately (do not merge their gates):

1. **The internal gateway-tool channel** — the runner synthesizes it from the run's resolved
`customTools` so a non-Pi harness (Claude) can receive Agenta gateway/callback tools. It is
DELIVERED, over an internal loopback HTTP MCP endpoint the runner serves (no runner-host child
process).
2. **User-declared MCP servers** — the user's own servers on the `/run` payload after the Python
side resolves their secrets. HTTP (remote) servers are delivered; stdio servers are DISABLED
(they launch an arbitrary process on the runner host, outside the sandbox boundary).

PR #4831 once conflated these into a single `MCP_UNSUPPORTED_MESSAGE` switch, which disabled the
internal channel as collateral with the (correct) user-stdio disable; the gateway-tool-mcp
project split them again. The user-facing constant is now `USER_MCP_UNSUPPORTED_MESSAGE` and means
ONLY "user MCP servers are unsupported"; the internal channel never borrows it.

## The contract

**Gating.** The runner builds MCP servers only when the harness is not Pi and the capability
probe reports `mcpTools: true`. Pi always returns an empty MCP set because it gets tools the
native way.
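
The gate above reduces to a two-input predicate. A minimal sketch, assuming a `HarnessCapabilities` shape with a single `mcpTools` flag; `shouldBuildMcpServers` is a hypothetical name, not the runner's function.

```typescript
// Hypothetical capability shape; only `mcpTools` is taken from the docs.
interface HarnessCapabilities {
  mcpTools: boolean;
}

// MCP servers are built only for a non-Pi harness whose capability probe reports MCP support.
function shouldBuildMcpServers(isPi: boolean, capabilities: HarnessCapabilities): boolean {
  return !isPi && capabilities.mcpTools === true;
}

console.log(shouldBuildMcpServers(false, { mcpTools: true }));
console.log(shouldBuildMcpServers(true, { mcpTools: true }));
console.log(shouldBuildMcpServers(false, { mcpTools: false }));
```

Both inputs matter independently: Pi yields an empty MCP set even if a probe claims `mcpTools: true`, and a non-Pi harness without the capability gets none either.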

**The stdio bridge (disabled).** The runner-owned server spoke JSON-RPC 2.0 over stdio and
answered three methods. It is disabled today (`MCP_UNSUPPORTED_MESSAGE`); the shape below is
retained for when its runner-host execution is made sandbox-safe:
**The internal gateway-tool channel (delivered, HTTP on loopback).** For a non-Pi harness with
executable tool specs, `buildToolMcpServers` starts a tiny MCP server on `127.0.0.1:<ephemeral>`
and returns one ACP `type: "http"` entry (`{name: "agenta-tools", url, headers: []}`). The
server speaks JSON-RPC 2.0 over Streamable-HTTP (stateless JSON mode) and answers three methods:

- `initialize`: returns protocol version and `capabilities.tools`.
- `tools/list`: returns the resolved tool specs as MCP tools. Client-kind tools are filtered
out here, because the browser fulfills those.
- `tools/call`: runs the named tool with its arguments and returns `content`, or an error.
- `tools/call`: runs the named tool through `runResolvedTool(..., { relayDir })` (the same relay
the Pi path uses) and returns `content`, or an error.

It carries NO credential: the entry has empty `headers`, the server holds only public metadata and
the relay directory, and it is bound to loopback. It launches no child process, because it is
served by the already-running runner, so it does not reintroduce the runner-host execution hole
that #4831 closed for user stdio MCP. The server is closed when the run ends, which releases the port.
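
The three methods can be sketched as a pure JSON-RPC dispatcher, leaving out the HTTP framing. This is an illustrative sketch only: the `ToolSpec` shape, the `kind` values, `handleRpc`, and the synchronous `runTool` callback are assumptions (the real `tools/call` goes through `runResolvedTool` and the relay), and the protocol version string is a placeholder.

```typescript
// Hypothetical shapes; only the method names and the client-tool filter come from the docs.
type ToolKind = "gateway" | "code" | "client";
interface ToolSpec { name: string; description: string; inputSchema: Record<string, unknown>; kind: ToolKind; }
interface JsonRpcRequest { jsonrpc: "2.0"; id: number | string; method: string; params?: Record<string, unknown>; }
interface JsonRpcResponse { jsonrpc: "2.0"; id: number | string; result?: unknown; error?: { code: number; message: string }; }

const PROTOCOL_VERSION = "2025-03-26"; // placeholder; use whatever the harness's MCP client negotiates

function handleRpc(
  req: JsonRpcRequest,
  specs: ToolSpec[],
  runTool: (name: string, args: Record<string, unknown>) => string,
): JsonRpcResponse {
  // Client-kind tools are fulfilled by the browser, so they are never served here.
  const served = specs.filter((spec) => spec.kind !== "client");
  switch (req.method) {
    case "initialize":
      return { jsonrpc: "2.0", id: req.id, result: { protocolVersion: PROTOCOL_VERSION, capabilities: { tools: {} } } };
    case "tools/list":
      return {
        jsonrpc: "2.0",
        id: req.id,
        result: { tools: served.map(({ name, description, inputSchema }) => ({ name, description, inputSchema })) },
      };
    case "tools/call": {
      const name = String(req.params?.name ?? "");
      const args = (req.params?.arguments ?? {}) as Record<string, unknown>;
      if (!served.some((spec) => spec.name === name)) {
        return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: `Unknown tool: ${name}` } };
      }
      try {
        return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text: runTool(name, args) }] } };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text: message }], isError: true } };
      }
    }
    default:
      return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: `Method not found: ${req.method}` } };
  }
}

const demoSpecs: ToolSpec[] = [
  { name: "lookup", description: "Gateway lookup", inputSchema: { type: "object" }, kind: "gateway" },
  { name: "confirm", description: "Browser confirm", inputSchema: { type: "object" }, kind: "client" },
];
const demoRun = (name: string, args: Record<string, unknown>): string => `${name}:${JSON.stringify(args)}`;

const listed = handleRpc({ jsonrpc: "2.0", id: 1, method: "tools/list" }, demoSpecs, demoRun);
const called = handleRpc({ jsonrpc: "2.0", id: 2, method: "tools/call", params: { name: "lookup", arguments: { q: "x" } } }, demoSpecs, demoRun);
const hidden = handleRpc({ jsonrpc: "2.0", id: 3, method: "tools/call", params: { name: "confirm" } }, demoSpecs, demoRun);
const unknownMethod = handleRpc({ jsonrpc: "2.0", id: 4, method: "resources/list" }, demoSpecs, demoRun);
console.log(JSON.stringify(listed.result));
```

Applying the client-tool filter once, before both `tools/list` and `tools/call`, keeps a tool that is hidden from the listing from being callable by name.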

**The file relay.** A resolved tool may need to run privately rather than inside the harness
process. The relay moves the call across that boundary: the child writes a `<id>.req.json`
@@ -43,23 +59,27 @@ allowlist, and permission. Two transports, opposite states:
- **Stdio (`transport: "stdio"` + `command`) is disabled.** A stdio server launches an
arbitrary process on the runner host, outside the sandbox boundary, so the implementation is
disabled (parity with the removed code execution) until its security is fixed. `run-plan.ts`
refuses any run carrying one (`MCP_UNSUPPORTED_MESSAGE`); `toAcpMcpServers` throws the same as
a defense-in-depth backstop. The wire shape is kept; only delivery is off.
refuses any run carrying one (`USER_MCP_UNSUPPORTED_MESSAGE`); `toAcpMcpServers` throws the same
as a defense-in-depth backstop. The wire shape is kept; only delivery is off.
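
The refuse-then-backstop pattern for user stdio servers can be sketched as follows. This is a sketch under assumptions: the `UserMcpServer` shape, both function names, and the message text are hypothetical. Only the constant's name and the idea of two independent checks come from the docs.

```typescript
// Hypothetical wire shape for a user-declared server.
type UserMcpServer =
  | { name: string; transport: "http"; url: string }
  | { name: string; transport: "stdio"; command: string };

// Placeholder text: the real constant's wording is not shown in the docs.
const USER_MCP_UNSUPPORTED_MESSAGE = "User-declared stdio MCP servers are not supported.";

// Planning step: refuse the whole run if any stdio server is present.
function planUserMcp(servers: UserMcpServer[]): { ok: true } | { ok: false; error: string } {
  return servers.some((s) => s.transport === "stdio")
    ? { ok: false, error: USER_MCP_UNSUPPORTED_MESSAGE }
    : { ok: true };
}

// Delivery step: repeats the check, so a caller that skips planning still cannot launch stdio.
function deliverableUserServers(servers: UserMcpServer[]): { name: string; url: string }[] {
  return servers.map((s) => {
    if (s.transport === "stdio") throw new Error(USER_MCP_UNSUPPORTED_MESSAGE);
    return { name: s.name, url: s.url };
  });
}

const mixedServers: UserMcpServer[] = [
  { name: "docs", transport: "http", url: "https://mcp.example.com" },
  { name: "local", transport: "stdio", command: "some-binary" },
];
const planResult = planUserMcp(mixedServers);
let backstopError = "";
try {
  deliverableUserServers(mixedServers);
} catch (err) {
  backstopError = err instanceof Error ? err.message : String(err);
}
const httpOnly = deliverableUserServers([mixedServers[0]]);
console.log(planResult.ok, backstopError, httpOnly.length);
```

The second check is redundant on the normal path by design: it exists so that a future call site that bypasses planning fails closed instead of spawning a process.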

## Owned by

- `sdks/python/agenta/sdk/agents/mcp/`: the Python models and resolver.
- `services/agent/src/engines/sandbox_agent/mcp.ts`: builds the session's MCP servers.
- `services/agent/src/tools/mcp-bridge.ts`: the bridge.
- `services/agent/src/tools/mcp-server.ts`: the stdio JSON-RPC server.
- `services/agent/src/engines/sandbox_agent/mcp.ts`: builds the session's MCP servers (the two
layers; `USER_MCP_UNSUPPORTED_MESSAGE`).
- `services/agent/src/tools/mcp-bridge.ts`: the internal gateway-tool channel builder.
- `services/agent/src/tools/tool-mcp-http.ts`: the internal loopback HTTP MCP server.
- `services/agent/src/tools/mcp-server.ts`: the removed stdio JSON-RPC server (refusing stub).
- `services/agent/src/tools/relay.ts`: the file relay loop and hosts.

## Watch for when changing

- **The gate.** MCP delivery depends on harness type and the `mcpTools` capability, not on a
single env flag. Changing either changes which tools reach the harness.
- **The MCP server config shape.** It is part of the `/run` contract and the wire serializer.
- **The stdio methods.** `initialize`, `tools/list`, `tools/call`, and the client-tool filter.
- **The internal channel's MCP methods.** `initialize`, `tools/list`, `tools/call`, and the
client-tool filter, served over loopback HTTP. The framing (stateless JSON Streamable-HTTP) is
pinned to the MCP client the installed Claude harness uses; re-verify it if that version moves.
- **The relay.** Polling interval, timeout, and the local-versus-Daytona host. A slow tool
must fail cleanly.
- **HTTP MCP delivery.** `toAcpMcpServers` routes the resolved secret from `env` into a
Original file line number Diff line number Diff line change
@@ -36,8 +36,10 @@ the runner (`toAcpMcpServers`) reads each resolved `env` entry and emits it as a
header (so `secrets: {"Authorization": "vault-name"}` becomes an `Authorization` header on the
remote call). Stdio (`transport: "stdio"` + `command`) servers are disabled in the sidecar — a
stdio server runs an arbitrary process on the runner host, outside the sandbox boundary — so a
run carrying one is refused (`MCP_UNSUPPORTED_MESSAGE`). The SDK models, resolver, and wire are
transport-agnostic; the enable/disable split lives entirely in the runner.
run carrying one is refused (`USER_MCP_UNSUPPORTED_MESSAGE`). This is the USER MCP capability and
is distinct from the runner's internal gateway-tool MCP channel (delivered over loopback HTTP;
see `runner-to-mcp-server.md`). The SDK models, resolver, and wire are transport-agnostic; the
enable/disable split lives entirely in the runner.
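
The `env`-to-header routing for a remote server can be sketched like this. It is illustrative only: `ResolvedHttpServer`, `toAcpHttpServer`, and the `{ name, value }` header element shape are assumptions about the ACP entry, not confirmed by this page. The docs show only `type: "http"`, `name`, `url`, and an array-valued `headers`.

```typescript
// Hypothetical input: a user HTTP server after the Python side has resolved its secrets into `env`.
interface ResolvedHttpServer {
  name: string;
  url: string;
  env: Record<string, string>;
}

// Assumed ACP entry shape; the `{ name, value }` header element is an assumption of this sketch.
interface AcpHttpMcpServer {
  type: "http";
  name: string;
  url: string;
  headers: { name: string; value: string }[];
}

// Each resolved env entry becomes one HTTP header on the remote call.
function toAcpHttpServer(server: ResolvedHttpServer): AcpHttpMcpServer {
  return {
    type: "http",
    name: server.name,
    url: server.url,
    headers: Object.entries(server.env).map(([name, value]) => ({ name, value })),
  };
}

const acpEntry = toAcpHttpServer({
  name: "docs",
  url: "https://mcp.example.com",
  env: { Authorization: "Bearer resolved-secret-value" },
});
console.log(JSON.stringify(acpEntry));
```

By contrast, the internal `agenta-tools` entry is built with an empty `headers` array, since that channel carries no credential.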

## Owned by

Original file line number Diff line number Diff line change
@@ -26,8 +26,15 @@ not new wire fields). See [`status.md`](./status.md) for the landed summary.

Composio, the tool gateway (gateway/callback tools), and named connections are referenced only
as things that already exist; this work changes none of them. **The one exception is MCP:** the
stdio MCP-server implementation in the sidecar is now disabled (parity with the removed code
execution) until its security issues are fixed.
**user-declared stdio** MCP-server implementation in the sidecar is now disabled (parity with the
removed code execution) until its security issues are fixed.

> Follow-up correction (gateway-tool-mcp project, 2026-06-25): this disable was originally wired
> through a single shared constant that ALSO killed the runner's INTERNAL gateway-tool MCP channel
> (the one that delivers Agenta gateway/callback tools to Claude), hard-failing Claude + gateway
> tools. That collateral damage was reverted: the internal channel is restored over a loopback
> HTTP MCP endpoint (no runner-host process), and the user-facing constant was renamed
> `USER_MCP_UNSUPPORTED_MESSAGE`. Only **user stdio** MCP stays disabled.

## Files

22 changes: 17 additions & 5 deletions services/agent/src/engines/sandbox_agent.ts
Original file line number Diff line number Diff line change
@@ -205,6 +205,9 @@ export async function runSandboxAgent(
let otel: ReturnType<typeof createSandboxAgentOtel> | undefined;
// Daytona tool relay loop (started once the session exists, stopped after the prompt).
let toolRelay: { stop: () => Promise<void> } | undefined;
// Internal gateway-tool MCP server closer (set when an internal channel is built for a non-Pi
// harness with executable tools; a no-op otherwise). Released in the `finally`.
let closeToolMcp: (() => Promise<void>) | undefined;
let workspace: { cleanup: () => Promise<void> } | undefined = plan.isDaytona
? undefined
: {
@@ -281,21 +284,22 @@ export async function runSandboxAgent(
log: logger,
});

const mcpServers = buildSessionMcpServers({
const sessionMcp = await buildSessionMcpServers({
isPi: plan.isPi,
capabilities,
harness: plan.harness,
toolSpecs: plan.toolSpecs,
userMcpServers: request.mcpServers,
toolCallback: request.toolCallback as ToolCallbackContext | undefined,
relayDir: plan.relayDir,
log: logger,
});
// Close the internal gateway-tool MCP server (if one started) when the run ends.
closeToolMcp = sessionMcp.close;

const session = await sandbox.createSession({
agent: plan.acpAgent,
cwd: plan.cwd,
sessionInit: { cwd: plan.cwd, mcpServers },
sessionInit: { cwd: plan.cwd, mcpServers: sessionMcp.servers },
});
const sessionId = resolveRunSessionId(request, session.id);

@@ -412,7 +416,11 @@ export async function runSandboxAgent(
if (piError) {
return {
ok: false,
error: conciseError(new Error(piError), plan.harness, request.provider),
error: conciseError(
new Error(piError),
plan.harness,
request.provider,
),
};
}
}
@@ -439,9 +447,13 @@ export async function runSandboxAgent(
} catch (err) {
otel?.finish();
await otel?.flush().catch(() => {});
return { ok: false, error: conciseError(err, plan.harness, request.provider) };
return {
ok: false,
error: conciseError(err, plan.harness, request.provider),
};
} finally {
await toolRelay?.stop().catch(() => {});
await closeToolMcp?.().catch(() => {});
await sandbox?.destroySandbox().catch(() => {});
await sandbox?.dispose().catch(() => {});
await workspace?.cleanup().catch(() => {});