[integration] big-agents#4791
Important
Review skipped
Too many files! This PR contains 1616 files, which is 1466 over the limit of 150.
To get a review, narrow the scope: Upgrade to a paid plan to raise the limit.
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID:
⛔ Files ignored due to path filters (258)
📒 Files selected for processing (1616)
Actionable comments posted: 10
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 76c33a7d-feff-4e5f-acc0-962498f74cfc
📒 Files selected for processing (70)
sdks/python/agenta/__init__.py
sdks/python/agenta/sdk/agents/__init__.py
sdks/python/agenta/sdk/agents/adapters/__init__.py
sdks/python/agenta/sdk/agents/adapters/_runner_config.py
sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
sdks/python/agenta/sdk/agents/adapters/harnesses.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/agenta/sdk/agents/adapters/local.py
sdks/python/agenta/sdk/agents/adapters/sandbox_agent.py
sdks/python/agenta/sdk/agents/adapters/vercel/__init__.py
sdks/python/agenta/sdk/agents/adapters/vercel/messages.py
sdks/python/agenta/sdk/agents/adapters/vercel/routing.py
sdks/python/agenta/sdk/agents/adapters/vercel/sse.py
sdks/python/agenta/sdk/agents/adapters/vercel/stream.py
sdks/python/agenta/sdk/agents/dtos.py
sdks/python/agenta/sdk/agents/errors.py
sdks/python/agenta/sdk/agents/interfaces.py
sdks/python/agenta/sdk/agents/mcp/__init__.py
sdks/python/agenta/sdk/agents/mcp/errors.py
sdks/python/agenta/sdk/agents/mcp/interfaces.py
sdks/python/agenta/sdk/agents/mcp/models.py
sdks/python/agenta/sdk/agents/mcp/parsing.py
sdks/python/agenta/sdk/agents/mcp/resolver.py
sdks/python/agenta/sdk/agents/mcp/wire.py
sdks/python/agenta/sdk/agents/streaming.py
sdks/python/agenta/sdk/agents/tools/__init__.py
sdks/python/agenta/sdk/agents/tools/compat.py
sdks/python/agenta/sdk/agents/tools/errors.py
sdks/python/agenta/sdk/agents/tools/interfaces.py
sdks/python/agenta/sdk/agents/tools/models.py
sdks/python/agenta/sdk/agents/tools/parsing.py
sdks/python/agenta/sdk/agents/tools/resolver.py
sdks/python/agenta/sdk/agents/tools/wire.py
sdks/python/agenta/sdk/agents/ui_messages.py
sdks/python/agenta/sdk/agents/utils/__init__.py
sdks/python/agenta/sdk/agents/utils/ts_runner.py
sdks/python/agenta/sdk/agents/utils/wire.py
sdks/python/agenta/sdk/decorators/routing.py
sdks/python/agenta/sdk/engines/running/interfaces.py
sdks/python/agenta/sdk/engines/running/utils.py
sdks/python/agenta/sdk/middlewares/running/normalizer.py
sdks/python/agenta/sdk/models/workflows.py
sdks/python/agenta/sdk/utils/types.py
sdks/python/agenta/tests/agents/test_streaming.py
sdks/python/oss/tests/pytest/integration/agents/__init__.py
sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py
sdks/python/oss/tests/pytest/unit/agents/__init__.py
sdks/python/oss/tests/pytest/unit/agents/conftest.py
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.json
sdks/python/oss/tests/pytest/unit/agents/mcp/__init__.py
sdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.py
sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py
sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
sdks/python/oss/tests/pytest/unit/agents/tools/__init__.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_models.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_resolver.py
sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py
sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py
sdks/python/oss/tests/pytest/utils/test_routing.py
NOTE on packaging: the Node runner is NOT part of this Python wheel (``pip install agenta``
stays pure Python; the wheel contains zero ``.ts``/``.js``). How a standalone Pi user obtains
the runner -- an ``npx`` npm package, a local checkout, or a Docker sidecar over HTTP -- is an
open distribution decision; see ``docs/design/agent-workflows/typescript-structure/``. Do NOT
silently bundle a JS runner into the wheel.
Align LocalBackend wording with the stated packaging contract.
Lines 9-13 say the wheel must not bundle a JS runner, but line 30 and the NotImplementedError messages still say “bundled JS”. This contradiction will confuse integrators.
Suggested wording fix
-class LocalBackend(Backend):
-    """Run Pi (bundled JS) or Claude (``claude-agent-sdk``) on this machine."""
+class LocalBackend(Backend):
+    """Run Pi (external Node runner) or Claude (``claude-agent-sdk``) on this machine."""
 ...
         raise NotImplementedError(
-            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "LocalBackend is not implemented yet (Phase 3: Pi via external Node runner, "
             "Phase 4: Claude via claude-agent-sdk)."
         )
 ...
         raise NotImplementedError(
-            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "LocalBackend is not implemented yet (Phase 3: Pi via external Node runner, "
             "Phase 4: Claude via claude-agent-sdk)."
         )
Also applies to: 30-38, 50-53
    def __init__(
        self,
        *,
        sandbox: str = "local",
        url: Optional[str] = None,
        command: Optional[Sequence[str]] = None,
        cwd: Optional[str] = None,
        timeout: float = float(os.getenv("AGENTA_AGENT_RUNNER_TIMEOUT_SECONDS", "180")),
    ) -> None:
        self._sandbox = sandbox
        self._url = url
Validate sandbox at construction time.
Line 129 currently accepts any string; invalid values get sent over the wire and fail late. Restrict this to supported values (local, daytona) and raise a configuration error early.
Suggested validation
 from ..dtos import (
@@
 )
+from ..errors import AgentRunnerConfigurationError
@@
     def __init__(
         self,
         *,
         sandbox: str = "local",
@@
         timeout: float = float(os.getenv("AGENTA_AGENT_RUNNER_TIMEOUT_SECONDS", "180")),
     ) -> None:
+        allowed_sandboxes = {"local", "daytona"}
+        if sandbox not in allowed_sandboxes:
+            raise AgentRunnerConfigurationError(
+                f"Unsupported sandbox '{sandbox}'. Expected one of: {sorted(allowed_sandboxes)}."
+            )
         self._sandbox = sandbox
         self._url = url

from agenta.sdk.agents.tools.models import MissingSecretPolicy

from .errors import MissingMCPSecretError
from .interfaces import MCPSecretProvider
from .models import MCPServerConfig, ResolvedMCPServer


class MCPResolver:
    def __init__(
        self,
        *,
        secret_provider: MCPSecretProvider,
        missing_secret_policy: MissingSecretPolicy = MissingSecretPolicy.ERROR,
    ) -> None:
Breaks declared layer direction by importing tools model into MCP.
MCPResolver currently depends on agenta.sdk.agents.tools.models.MissingSecretPolicy, but this cohort declares tools as depending on MCP, not the other way around. This reverse edge can create import-order fragility and circular dependency risk as the stack evolves. Move MissingSecretPolicy to a neutral/shared module (or MCP/shared contract module) and import it from both subsystems.
Possible direction
- from agenta.sdk.agents.tools.models import MissingSecretPolicy
+ from agenta.sdk.agents.shared.missing_secret_policy import MissingSecretPolicy
(Then define or move the enum in that shared module and update the tools imports accordingly.)
    out = stdout.decode("utf-8", "replace")
    err = stderr.decode("utf-8", "replace")
    if not out.strip():
        raise RuntimeError(
            f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}"
        )
    try:
        return json.loads(out)
    except json.JSONDecodeError as exc:
Treat non-zero subprocess exit as transport failure even with parseable JSON.
Line 74 returns parsed JSON without checking proc.returncode; a crashed runner can look successful if it emitted partial/legacy JSON before exiting non-zero.
Suggested fix
@@ async def deliver_subprocess(...):
     out = stdout.decode("utf-8", "replace")
     err = stderr.decode("utf-8", "replace")
+    if proc.returncode not in (0, None):
+        raise RuntimeError(
+            "Agent runner exited non-zero. "
+            f"exit={proc.returncode} stderr={err[-2000:]} stdout={out[:500]}"
+        )
     if not out.strip():
         raise RuntimeError(
             f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}"
         )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-    out = stdout.decode("utf-8", "replace")
-    err = stderr.decode("utf-8", "replace")
-    if not out.strip():
-        raise RuntimeError(
-            f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}"
-        )
-    try:
-        return json.loads(out)
-    except json.JSONDecodeError as exc:
+    out = stdout.decode("utf-8", "replace")
+    err = stderr.decode("utf-8", "replace")
+    if proc.returncode not in (0, None):
+        raise RuntimeError(
+            "Agent runner exited non-zero. "
+            f"exit={proc.returncode} stderr={err[-2000:]} stdout={out[:500]}"
+        )
+    if not out.strip():
+        raise RuntimeError(
+            f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}"
+        )
+    try:
+        return json.loads(out)
+    except json.JSONDecodeError as exc:
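For anyone who wants to try the exit-status check from this comment in isolation, here is a minimal self-contained sketch. It is not the SDK's deliver_subprocess: the function name run_json_subprocess and the stdin/stdout JSON framing are assumptions made for illustration. The point it shows is only the ordering, with the return code inspected before stdout is trusted.

```python
import asyncio
import json
import sys
from typing import Any, Dict, Sequence


async def run_json_subprocess(command: Sequence[str], payload: Dict[str, Any]) -> Any:
    """Hypothetical helper: send `payload` as JSON on stdin, parse stdout as JSON."""
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate(json.dumps(payload).encode("utf-8"))
    out = stdout.decode("utf-8", "replace")
    err = stderr.decode("utf-8", "replace")

    # Check the exit status first: a crashed child may still have printed
    # parseable JSON before it died.
    if proc.returncode != 0:
        raise RuntimeError(
            f"Runner exited non-zero. exit={proc.returncode} "
            f"stderr={err[-2000:]} stdout={out[:500]}"
        )
    if not out.strip():
        raise RuntimeError(f"Runner returned no output. stderr={err[-2000:]}")
    return json.loads(out)


if __name__ == "__main__":
    # A child that prints valid JSON and then exits with status 3.
    crashing = [
        sys.executable,
        "-c",
        "import sys; sys.stdin.read(); print('{\"ok\": true}'); sys.exit(3)",
    ]
    try:
        asyncio.run(run_json_subprocess(crashing, {}))
    except RuntimeError as exc:
        print("rejected:", exc)
```

With the check in place, the crashing child above is reported as a transport failure even though its stdout parses cleanly.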
feat(agent): big-agents-work — turn inspector, HITL hardening, tool catalog, playground UX
… URL contract
The runner's sessions calls (heartbeat/records/mounts/interactions/states) all 404'd because AGENTA_API_URL grew an /api suffix on the direct api:8000 hop, where routes live at root: five subsystems down from one prefix.
- API: ApiPrefixStripMiddleware accepts /api-prefixed paths, so traefik-strip, direct, and ALB-verbatim topologies all route with either URL shape.
- Env contract: *_URL is public (host + prefix), *_INTERNAL_URL is the direct in-network hop (container DNS, no prefix). Runner reads AGENTA_API_INTERNAL_URL; services reads AGENTA_RUNNER_INTERNAL_URL; AGENTA_RUNNER_URL is dropped (no public runner surface).
- Runner: one shared apiBase() replaces five copy-pasted fallbacks; otel keeps its cloud tail behind the same chain.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
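The prefix-stripping idea in this commit can be sketched as a plain ASGI middleware. This is an illustration only, not the actual ApiPrefixStripMiddleware: the class name PrefixStripMiddleware, the default prefix, and the decision to rewrite only scope["path"] are assumptions.

```python
from typing import Any, Awaitable, Callable, Dict

Scope = Dict[str, Any]
Channel = Callable[..., Awaitable[Any]]


class PrefixStripMiddleware:
    """Hypothetical stand-in: route '/api/...' and '/...' to the same handlers."""

    def __init__(self, app: Callable[[Scope, Channel, Channel], Awaitable[None]],
                 prefix: str = "/api") -> None:
        self.app = app
        self.prefix = prefix.rstrip("/")

    async def __call__(self, scope: Scope, receive: Channel, send: Channel) -> None:
        if scope["type"] in ("http", "websocket"):
            path = scope.get("path", "")
            # Match the prefix only on a path-segment boundary, so that
            # '/apiary' is left alone while '/api' and '/api/x' are rewritten.
            if path == self.prefix or path.startswith(self.prefix + "/"):
                scope = dict(scope)  # copy; do not mutate the caller's scope
                scope["path"] = path[len(self.prefix):] or "/"
        await self.app(scope, receive, send)
```

A production version would probably also need to keep raw_path consistent with the rewritten path; that is left out here to keep the sketch short.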
…-prefix' into fix/broken-internal-urls
Collection endpoints keep their trailing slash; only the env-provided base is normalized so a http://api:8000/ value can't produce //-joined paths.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
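The normalization described here is small enough to sketch. The runner's real helper is apiBase() on the TypeScript side; the Python below is only an analogue, and the names normalize_base and endpoint are hypothetical.

```python
def normalize_base(raw: str) -> str:
    """Strip trailing slashes from an env-provided base URL (hypothetical helper)."""
    return raw.strip().rstrip("/")


def endpoint(base: str, path: str) -> str:
    """Join a base and a route; the route's own trailing slash is preserved."""
    return f"{normalize_base(base)}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Both spellings of the base yield the same single-slash URL.
    print(endpoint("http://api:8000/", "/sessions/"))
    print(endpoint("http://api:8000", "sessions/"))
```

Only the base is touched, so a collection route such as sessions/ keeps its trailing slash while a base of http://api:8000/ no longer yields a double slash.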
…odels-secrets-review-part-1
# Conflicts:
# web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/agentTemplate/useModelHarness.tsx
…log lag
The provider/secrets resolver changes moved four things the tests still encoded the old way, and the OpenRouter model check fought the catalog.
- connections: provider keys are addressed by their PROVIDER (header.name is display-only, never a slug), and a bare model id present in the catalog infers its provider instead of failing loud. Tests updated + a new test_bare_catalog_model_infers_provider.
- default template: the runtime selection is provider-qualified, so the /inspect default parses to `provider/model`, not a bare `model`.
- commit diff: agent templates key the model as `llm.model`; the summary builder now reads that alongside the legacy `model`. Tests updated.
- supported_llm_models: OpenRouter's list intentionally tracks current top-used ids that the pinned litellm build hasn't indexed yet. For that provider a miss is expected lag, so it xfails (still catching a typo'd prefix structurally) instead of failing CI. The catalog is unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
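The "bare model id infers its provider" rule from the connections bullet can be illustrated roughly as follows. The catalog shape (provider mapped to a set of model ids), the function name qualify_model, and the model ids are all hypothetical; the real resolver may differ, for example in how it treats a model listed under several providers.

```python
from typing import Dict, Set


def qualify_model(model: str, catalog: Dict[str, Set[str]]) -> str:
    """Return a `provider/model` string, inferring the provider for bare ids."""
    if "/" in model:
        return model  # already provider-qualified
    providers = sorted(p for p, ids in catalog.items() if model in ids)
    if len(providers) == 1:
        return f"{providers[0]}/{model}"
    if not providers:
        raise ValueError(f"Model '{model}' is not in the catalog; qualify it as provider/model.")
    # Assumption: an ambiguous bare id is an error rather than a silent pick.
    raise ValueError(f"Model '{model}' is listed under several providers: {providers}.")


if __name__ == "__main__":
    demo_catalog = {"provider-a": {"model-x"}, "provider-b": {"model-y"}}
    print(qualify_model("model-x", demo_catalog))
    print(qualify_model("provider-b/model-y", demo_catalog))
```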
Providers, models, and secrets: credential resolution fixes (#5057)
Walking a fresh agent through its first run surfaced three breaks in the model-credential chain: (1) the default agent template shipped a bare model id (
#5057 fixes all three: the default template now always carries a provider, provider-key matching goes through the provider family instead of a fabricated slug, a single
Verified live end to end: a fresh agent with only an OpenAI key runs without the provider-prefix error, and a Claude Code + Bedrock connection completes a run in ~6s with the injected bearer token and no credential leakage into
…review-part-1 [fix] Make credential resolution work for provider keys, bare models, and Bedrock
Runner-to-API sessions routing was silently broken; internal vs public URLs now split (#5059)
Since July 1, every runner-to-API sessions call (heartbeat, records ingest, mount signing, interactions, states) had been silently 404ing: transcripts were dropped, no durable cwd was mounted, and the sessions tables stayed empty, but agent runs themselves kept working so nothing surfaced. The root cause was a single env var carrying two incompatible meanings: the public URL (
#5059 fixes this by splitting the contract into three distinct vars:
Verified live: an agent run resolves and completes with the corrected URL,
[fix] Repair runner-to-API sessions routing and split internal vs public URLs
Seven always-on worker-* containers (records/events/tracing stream workers, webhooks/triggers/interactions/evaluations queue workers) each run a single asyncio loop while carrying the full API image, imports, and New Relic agent. The RAM cost is 7x that per-process baseline, not workload-driven.
Introduce two list-parameterized entrypoints, worker-streams and worker-queues, each hosting a selectable subset of its family's loops (AGENTA_WORKER_STREAMS / AGENTA_WORKER_QUEUES; empty means all of that family).
Extract the byte-identical stream-consumer loop into a StreamConsumer base; records/events/tracing subclass it and keep only their deltas (deserialize, group key, meter, write, post-hook), preserving the events EE-gate and its webhook skip-ack. Co-host TaskIQ brokers via Receiver.listen() since run_worker forks and owns its own loop.
All stream/queue/consumer-group names, message shapes, and read/write semantics are unchanged, so this is a process-packing change: nothing external moves.
Batch publish_spans into one Redis pipeline, add init:true plus watchmedo signal hardening to the dev workers, and emit RUDE metrics via a .tick() EMF sibling to .log() at producer and consumer sites.
Dev/preview/cloud default to one of each kind (topology A); the helm chart and all seven compose files follow.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
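The "empty means all of that family" selector could look roughly like the sketch below. The loop names are taken from this commit message; the comma separator, the function name select_loops, and the choice to reject unknown tokens are assumptions.

```python
import os
from typing import List, Optional, Tuple

# Loop names as listed in the commit message above.
STREAM_LOOPS: Tuple[str, ...] = ("records", "events", "tracing")
QUEUE_LOOPS: Tuple[str, ...] = ("webhooks", "triggers", "interactions", "evaluations")


def select_loops(raw: Optional[str], family: Tuple[str, ...]) -> List[str]:
    """Parse a list-valued selector; an empty or unset value selects the whole family."""
    tokens = [t.strip() for t in (raw or "").split(",") if t.strip()]
    if not tokens:
        return list(family)
    unknown = [t for t in tokens if t not in family]
    if unknown:
        # Assumption: fail fast on a typo instead of silently hosting nothing.
        raise ValueError(f"Unknown loops {unknown}; expected a subset of {list(family)}.")
    # Keep the family's canonical order and drop duplicates.
    return [name for name in family if name in tokens]


if __name__ == "__main__":
    print(select_loops(os.getenv("AGENTA_WORKER_STREAMS"), STREAM_LOOPS))
    print(select_loops(os.getenv("AGENTA_WORKER_QUEUES"), QUEUE_LOOPS))
```

Under these assumptions, one container with an unset selector hosts every loop of its family, and a deployment that needs isolation can pin a single loop per container without any change to stream or queue names.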
…client
The stream consumer loop reads with XREADGROUP(block=5000ms). With a finite socket timeout on the client, that blocking read trips the socket timeout and raises "Timeout reading from redis-durable:6381" every cycle instead of returning an empty batch. worker-streams then logged a read failure every five seconds and consumed nothing.
Pass socket_timeout=None when building the durable client so the block resolves server-side and an idle read returns empty.
Verified live: after the fix the three loops sit blocking cleanly, an injected streams:records entry is read and processed (it fails deserialize as expected for a non-zlib test payload), and no timeout is logged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
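A rough sketch of the client setup this fix describes, assuming redis-py. The helper names (durable_client_kwargs, build_durable_client, read_batch), the connect timeout, and the count are illustrative choices, not the project's code; only socket_timeout=None and the 5000 ms block come from the commit message.

```python
from typing import Any, Dict, List


def durable_client_kwargs(host: str, port: int) -> Dict[str, Any]:
    """Connection settings for a client that issues blocking stream reads."""
    return {
        "host": host,
        "port": port,
        # No client-side read timeout: XREADGROUP's BLOCK ends server-side.
        "socket_timeout": None,
        # Assumption: connecting is still bounded by its own timeout.
        "socket_connect_timeout": 5.0,
    }


def build_durable_client(host: str = "redis-durable", port: int = 6381):
    import redis  # imported lazily so the helpers above work without redis-py

    return redis.Redis(**durable_client_kwargs(host, port))


def read_batch(client: Any, stream: str, group: str, consumer: str,
               block_ms: int = 5000, count: int = 100) -> List[Any]:
    """One blocking read; an idle stream yields an empty batch, not an error."""
    reply = client.xreadgroup(
        groupname=group,
        consumername=consumer,
        streams={stream: ">"},  # ">" requests entries never delivered to this group
        count=count,
        block=block_ms,
    )
    return reply or []
```

If a finite read timeout is needed for other commands, one alternative is to keep it strictly larger than the block interval; the commit chose to drop it for this client.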
Add a "Browse by category" rail to the tool catalog: connections pinned at the
top, an independently-scrolling category list below, and a single-select filter
(all / category / connection) driving the grid. Full-bleed recessed rail, and
per-context loading / empty / error states.
Backend: the shared gateway catalog gains a category filter on list_integrations
(passed to Composio /toolkits?category=) plus a new list_categories that derives
a focused, results-guaranteed nav from the most-used toolkits' own tags —
Composio's /toolkits/categories returns ~800 raw tags, most matching nothing.
New /catalog/providers/{provider}/categories/ route.
Frontend: category atom + categories hook (atomWithQuery), category pass-through
on the Fern integrations call, dedupe of integrations/categories to keep React
keys unique across pagination, and search that flattens across categories.
Complete the streams:tracing to streams:spans rename across the ingestion pipeline: the producer XADDs to streams:spans, the consumer reads it under the worker-spans consumer group, and the AGENTA_WORKER_STREAMS selector token, the builder, and the maxlen constant all use "spans". The old streams:tracing name and worker-tracing group are gone.
Fix the EMF metrics the .tick() calls emit so producer and consumer line up on one CloudWatch dimension. The consumer metric family is now derived once from the consumer group (worker-<family> gives spans/events/records) instead of being the raw stream name, so the base loop, the events override, and the producers all emit the same short family value: spans.published pairs with spans.processed under stream=spans, and likewise for events and records. Before, tracing's consumer emitted stream=streams:tracing while its producer emitted stream=tracing, so the two never matched.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
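Deriving the metric family from the consumer group, as described above, amounts to stripping a fixed prefix. The sketch below assumes the worker-<family> naming from this commit; the function names and the returned dictionary shape are hypothetical and do not reflect the real .tick() signature.

```python
from typing import Dict

GROUP_PREFIX = "worker-"


def metric_family(consumer_group: str) -> str:
    """Map a consumer group such as 'worker-spans' to its short family, 'spans'."""
    if not consumer_group.startswith(GROUP_PREFIX) or consumer_group == GROUP_PREFIX:
        raise ValueError(f"Unexpected consumer group name: {consumer_group!r}")
    return consumer_group[len(GROUP_PREFIX):]


def metric(family: str, side: str) -> Dict[str, str]:
    """Build the metric name and dimension shared by producer and consumer."""
    return {"name": f"{family}.{side}", "stream": family}


if __name__ == "__main__":
    family = metric_family("worker-spans")
    print(metric(family, "published"))
    print(metric(family, "processed"))
```

Because both sides compute the dimension from the same short family value, a producer and a consumer can no longer disagree the way stream=tracing and stream=streams:tracing did.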
- Validate fetchToolCategories at the boundary via zod (only non-Fern call); derive the category types from the schema so type and validator can't drift.
- Clear the category filter when a connection is picked so the browse query isn't left filtered behind the detail view or restored stale on Back.
- Resolve connection integration metadata from a monotonic union of loaded integrations so a category/search filter no longer drops name+logo.
- Extract the catalog dedup crash-guard into a shared dedupeBy helper used by both catalog hooks and cover it with unit tests.
[feat] Consolidate worker containers into two list-parameterized kinds
Workers sprawl: 7 worker containers → 2 list-parameterized kinds (#5061, + platform #1652, infra #26)
Folded in the workers-sprawl consolidation as a three-repo stack landing on
Two real bugs surfaced and were fixed with it. The blocking
The stack spans all three repos: application (#5061) carries the entrypoints, the shared consumer/broker factories, and the compose + Helm + docs updates; platform (agenta_cloud #1652) carries the production docker-compose two-kind services and the
…hold
The header flipped to "Results" and un-highlighted the category at 1 typed char, but the browse query only treats a search as active at >=3 chars, so a 1-2 char query still showed category-filtered results under a "Results" label. Gate the searching state on a searchMinChars prop (default 3, matching the consumer hooks) so the UI reflects when the search actually applies and the category context is preserved until then.
…ections [FE Feat] Tools catalog category sections
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…t, agent_v0 SDK handler, dual-surface parity
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The The structural piece is that Verification: a four-level test surface (handlers-direct 27-combo flag cubes, |
feat(sdk): handler-owned invoke negotiation (stream/trim/force) with batch=fold(stream)
Context
big-agents is the integration branch for the agent-workflows feature. Every agent PR targets big-agents (directly, or by stacking on one that does). The plan is to review and merge each sub-PR into big-agents, then merge big-agents into main as a single unit.
This PR is a draft tracker. It stays open until all the open sub-PRs below are merged into big-agents. The branch started from an empty commit, so the diff fills in as sub-PRs land.
Integrated PRs
Each box gets checked when that PR is merged into big-agents. Indented items stack on the item above them.
SDK and service
Runner
big-agents (the relay-bug fix, the CI job, and a superset of its tests already landed via feat(agent): runner engines, HTTP server, tracing, and docker image #4778 + chore(agent): make sandbox-agent runner first-class #4786)
Frontend
Hosting
Sandbox-agent deployment
Docs
Branch-only (no PR yet)
These design-doc branches are stacked on big-agents but have no PR. Open one if you want them reviewed separately, otherwise they fold in with the docs.
docs/agent-model-config-and-provider-auth
docs/agent-skills-config
docs/agent-code-tool-sandbox
docs/agent-harness-capabilities
Notes
big-agents (feat(agent): runner engines, HTTP server, tracing, and docker image #4778 + chore(agent): make sandbox-agent runner first-class #4786 already carry its tests, CI job, and relay-bug fix; its version.ts was stale ["pi","rivet"]).
big-agents as chore(railway): add sandbox-agent preview deployment #4802 / chore(kubernetes): deploy sandbox-agent sidecar #4803 / ci(agent): build and test sandbox-agent images #4804.