
Mcp multi graph yoko integration #2845

Draft

asoorm wants to merge 22 commits into main from mcp-multi-graph-yoko-integration


Conversation

asoorm (Contributor) commented May 8, 2026

Summary by CodeRabbit

  • New Features

    • Code Mode: sandboxed JS/TS execution (run, search, generate), named-ops, persisted TypeScript bundles, and mutation approval workflow
    • Yoko integration: AI-driven query generation/search and a local Yoko mock for demos
  • Infrastructure

    • Multi-tenant MCP mounts and a dedicated Code Mode MCP server
    • In-memory and Redis session backends with TTL/LRU and bundle truncation
    • Observability: structured logs, metrics, and traces
  • Demo & Docs

    • Full local and Connect Code Mode demos, scripts, and documentation updates

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/docs-website.
  • I have read the Contributors Guide.

Open Source AI Manifesto

This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.

asoorm and others added 18 commits May 1, 2026 17:33
Adds a second MCP server next to the existing per-operation one,
exposing two generic tools instead of one-tool-per-operation:

- code_mode_search_tools: takes prompts, generates GraphQL ops,
  registers them in the session catalog, returns their TS signatures.
- code_mode_run_js: runs an async arrow against the session catalog
  inside a JS sandbox (V8 isolate), with tools.<name>(vars) bound
  to the registered ops.

Includes:
- router/internal/codemode: harness, sandbox, server, storage, tsgen,
  yoko (query-gen client), calltrace, observability.
- proto/wg/cosmo/code_mode/yoko/v1: Yoko query-gen Connect API.
- router/pkg/config: code_mode config block (auth, sandbox limits,
  query-generation endpoint, named-ops, mutation approval).
- demo/code-mode: local federation + Yoko mock + start scripts +
  MCP client config snippets (Claude Code, Claude Desktop, Codex).
- demo/code-mode-connect: alternate demo against an external yoko
  Connect supergraph (set YOKO_DIR).
- router-tests: end-to-end named-ops integration test.
Previously a single non-serializable leaf (BigInt, function, symbol,
undefined, circular ref) caused the sandbox to drop the entire return
value and surface a NotSerializable error. Multi-source aggregations
that put raw federation errors or upstream payloads into a debug field
lost the rest of an expensive run.

The validator now mutates each bad leaf to the sentinel string
"<<non-serializable: KIND>>" and records {path, kind} in a new warnings
array on the response envelope. The successful result still flows
through. NotSerializable now only fires when JSON.stringify itself
throws after sanitization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…down

Moves the long inline tool/resource description string constants out of server.go into a new internal/codemode/server/descriptions sub-package, where each description lives in its own .md file embedded via go:embed. Lets the prose be edited as readable markdown without touching Go source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
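
A minimal sketch of the embedded-markdown layout this commit describes; the file and variable names below are illustrative, not the actual ones in the descriptions sub-package:

```go
package descriptions

import _ "embed"

// Each tool/resource description lives in its own markdown file and is
// embedded at build time, so the prose can be edited without touching Go.

//go:embed code_mode_run_js.md
var CodeModeRunJS string

//go:embed code_mode_search_tools.md
var CodeModeSearchTools string
```
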
The proxy now retries the initial dial with exponential backoff and
supervises the live session via session.Wait(); when the upstream
disconnects it reconnects transparently so downstream clients (Claude
Desktop, etc.) keep working across router restarts. KeepAlive pings make
dead connections surface quickly instead of waiting for the next call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
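
An illustrative reconnect loop for the behaviour described above; the dial function and session interface are stand-ins for the real proxy types, not the actual mcp-go API:

```go
package proxy

import (
	"context"
	"time"
)

// upstreamSession stands in for the proxy's upstream MCP session type.
type upstreamSession interface {
	Wait() error // returns when the upstream connection drops
}

// supervise dials with exponential backoff, then blocks on Wait() and
// redials whenever the upstream disconnects, so downstream clients survive
// router restarts.
func supervise(ctx context.Context, dial func(context.Context) (upstreamSession, error)) {
	backoff := 500 * time.Millisecond
	for ctx.Err() == nil {
		session, err := dial(ctx)
		if err != nil {
			time.Sleep(backoff)
			if backoff < 30*time.Second {
				backoff *= 2
			}
			continue
		}
		backoff = 500 * time.Millisecond // reset once a dial succeeds
		_ = session.Wait()
	}
}
```
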
…eState.wg

WriteTimeout was 30s while executeTimeout is 120s, so net/http would cut
off legitimately long code_mode_run_js responses mid-flight. Lift the
write deadline above the configured execute timeout and switch the
listener to ReadHeaderTimeout so a body upload doesn't share the 30s
budget either. The unused sync.WaitGroup on executeState (and its
matching state.wg.Wait() in Execute) was a no-op leftover; remove both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
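
A sketch of the resulting listener configuration (values illustrative): the write deadline sits above the run_js execute timeout, and only header reads keep the short budget.

```go
package server

import (
	"net/http"
	"time"
)

// newListener shows the timeout relationship described above: WriteTimeout
// must exceed the configured execute timeout so long code_mode_run_js
// responses are not cut off, while ReadHeaderTimeout bounds header parsing
// without capping slow request bodies.
func newListener(handler http.Handler, executeTimeout time.Duration) *http.Server {
	return &http.Server{
		Handler:           handler,
		ReadHeaderTimeout: 30 * time.Second,
		WriteTimeout:      executeTimeout + 30*time.Second,
	}
}
```
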
The Employee.startDate field carries @requiresScopes in the shared
demo/pkg/subgraphs/employees schema; the router's authorizer is always
active and rejects unauthenticated callers as soon as a scoped field is
touched, breaking the code-mode demo. The shared schema is exercised by
router-tests/security/authentication_test.go so it has to stay intact —
instead, prepare-schemas.sh copies the four code-mode subgraph schemas
into demo/code-mode/schemas/ with @authenticated and @requiresScopes
directive applications removed, and graph.yaml composes from those
copies. The make compose target depends on prepare-schemas so the local
copies always track upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the building blocks for a demo router that turns any GraphQL graph
into MCP tools, hosts multiple operation collections side by side, and
hot-reloads tools when .graphql files change on disk.

- pkg/mcpserver/introspection.go: fetch upstream SDL via standard
  introspection so a collection can target any GraphQL endpoint without
  a local schema file.
- pkg/mcpserver/multi_server.go: mount many GraphQLSchemaServer handlers
  on one shared listener, each with its own path, operations dir, and
  optional upstream override.
- pkg/mcpserver/operations_watcher.go: per-collection directory watcher
  that picks up added/modified/removed .graphql files and triggers a
  tools reload after a settling tick.
- mcp.config.yaml: working demo config wiring four public graphs
  (anilist, countries, rickandmorty, swapi) as separate MCP servers.
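
A minimal sketch of the settling-tick debounce the operations watcher applies (assumed shape, not the actual implementation): each filesystem event resets a timer, and the reload fires only after the directory has been quiet for the settle window.

```go
package mcpserver

import (
	"context"
	"time"
)

// debounceReload coalesces bursts of .graphql file events into a single
// tools reload once no new event has arrived for the settle duration.
func debounceReload(ctx context.Context, events <-chan struct{}, settle time.Duration, reload func()) {
	timer := time.NewTimer(settle)
	if !timer.Stop() {
		<-timer.C
	}
	for {
		select {
		case <-ctx.Done():
			return
		case <-events:
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
			timer.Reset(settle)
		case <-timer.C:
			reload()
		}
	}
}
```
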
Operation identity is now the ShortSHA over the canonical body — eight
hex chars prefixed with 'o' so it's a valid JS identifier. The model
calls tools.<sha>(...) inside the sandbox, the search response surfaces
(sha, description) pairs plus a TS fragment that names each method by
sha, and storage dedupes by canonical body alone.

Previously, yoko regenerating an operation under the same document name
("getOrders" with v1 vs. v2 selection sets) silently overwrote the
first registration under NormalizeName-based dedup. Tying identity to
content makes that collision impossible — different bodies always get
different identifiers, identical bodies always converge.

DocumentName is preserved separately because the router's GraphQL
parser still needs the literal operation name from the body when
invoking against /graphql. NormalizeName, SuffixedName, and the
reserved-word table are gone; tests are rewritten to compute SHAs from
bodies so expectations stay self-checking. router-tests duplicates a
small codeModeShortSHA helper because it's a separate module and can't
import internal/codemode/storage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
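
A sketch of the content-based identity described above, assuming SHA-256 as the underlying hash (the commit does not name the hash function):

```go
package storage

import (
	"crypto/sha256"
	"encoding/hex"
)

// shortSHA derives the operation identifier: eight hex characters of the
// hash over the canonical body, prefixed with "o" so it is a valid JS
// identifier the model can call as tools.o3f2a9c1d(vars).
func shortSHA(canonicalBody string) string {
	sum := sha256.Sum256([]byte(canonicalBody))
	return "o" + hex.EncodeToString(sum[:])[:8]
}
```
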
Merges code-mode-v2 (#2825) and adds a per-sub-server code_mode_run_js
MCP tool that lets an LLM compose multiple operations in one V8-sandboxed
JS call instead of N MCP round-trips.

The new tool is auto-registered on every MultiServer sub-server that has
at least one operation loaded. Each sub-server gets its own sandbox bound
to its upstream URL; tools.<name>(vars) dispatches via the existing op
catalog (no Yoko, no storage backend, no session state).

Reuses the codemode/sandbox, codemode/harness, and codemode/tsgen packages
from the merged PR. Description bundled inline includes the typed TS
catalog plus shape constraints (single async-arrow root, no top-level
await, no IIFE, sandbox console disabled, etc.) so the LLM lands on the
right syntax first try.

Compile fixes from the merge:
- Add EnsureIndexed(ctx) error to yoko.Searcher and *yoko.Client
- Rename opsFragment -> newOpsFragment to match call sites/tests
- Update fakeYoko test fake with EnsureIndexed counter
- Resolve mcpServer field conflict in router_config.go (keep
  MultiServer type, add codeModeServer alongside)
The watcher and applyToolDiff already logged on every reload, but the
messages didn't make it clear that a tools/list_changed notification
was being broadcast to connected MCP clients. Two tweaks:

- operations_watcher: "...reloading tools and notifying connected clients"
- applyToolDiff: "MCP tool refresh broadcast to connected clients
  (tools/list_changed)" with a total_tools count; the no-change branch
  now says "no notification sent to clients" so the distinction between
  "ticked, nothing useful happened" and "ticked, clients notified" is
  visible at a glance.
The previous description duplicated the per-op MCP tools' inputSchema info
as an inline TypeScript bundle, and used hardcoded example operation
names (SearchAnime, GetA/B, GetCountries) that don't necessarily exist on
every server. The model could read those examples as available bindings,
producing cross-server confusion (e.g. tools.SearchAnime appearing in
the countries server's description).

Drop the TS catalog and the foreign-named examples. The description now:

- Lists the actual bound operation names on this server, nothing more
- Points the model at the per-op MCP tools (and get_operation_info) for
  variable shapes and return types — single source of truth, no dupe
- Uses generic placeholder names (tools.SomeOp) in syntax examples
- Trims to ~half the prior length (3.6 KB vs 7.0 KB) by removing the
  redundant "WHEN TO USE" example list, since RETURN-VALUE DISCIPLINE
  already covers the same ground
Multi-topic snapshot of code-mode-v2 work in progress.

demo/code-mode:
- Include all non-EDFS demo subgraphs in the local federation: hobbies,
  products, test1, countries + the products_fg feature graph (under
  feature flag myff), in addition to the existing employees, family,
  availability, mood. Mirrors demo/graph-no-edg.yaml. The previous
  4-subgraph set silently lacked Employee.hobbies (and Country, products,
  test1 fields), so code-mode-search-tools could never resolve prompts
  about those domains.
- Fix BSD sed bug in prepare-schemas.sh: the previous regex used `\b`
  which macOS BSD sed treats as a literal `b`, so @authenticated stayed
  on type/enum/interface definitions. Switch to a portable
  (^|whitespace)/(non-word|EOL) anchor pair. Also extend the loop to
  cover all 9 subgraphs.
- start.sh + run_subgraphs_subset.sh: launch the 5 added subgraphs on
  ports 4003/4004/4006/4009/4010 with matching wait_url checks.
- README + Makefile comment: document the new subgraph set.

yoko proto + mock + client:
- Rewrite proto/wg/cosmo/code_mode/yoko/v1/yoko.proto and regenerate
  router/gen/proto/.../{yoko.pb.go, yokov1connect/yoko.connect.go}.
- Rewrite demo/code-mode/yoko-mock/main.go (and tests).
- Refresh router/internal/codemode/yoko/{client.go, client_test.go,
  searcher.go} against the new contract.
- Bump demo/code-mode/{mcp-stdio-proxy,yoko-mock}/go.{mod,sum}.

Connect demo:
- Refresh demo/code-mode-connect/{README.md, router-config.yaml, start.sh}
  against the same yoko contract.

Build / generate:
- Makefile: pass --include-imports to `buf generate` so transitive
  protobuf imports are regenerated alongside the yoko v1 schema.
- buf.yaml + buf.lock: pin the buf module deps now needed for that
  --include-imports run.
- router/gen/proto/buf/validate/validate.pb.go: regenerated buf
  validate import.

Misc:
- router/pkg/codemode/varschema: new package + tests.
- router/pkg/grpcconnector/grpcplugin/grpc_plugin.go: minor update.
- router/internal/codemode/sandbox/sandbox_test.go +
  router/internal/codemode/server/observability_handler_test.go:
  adjust tests for the refreshed client surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The mood and availability subgraphs publish employee-update events to
NATS as part of their mutation resolvers. Two failure modes broke the
demo when NATS was not running (or not yet started):

1. cmd/all crashed at boot
   subgraphs.New() eagerly created and started two NATS adapters and
   treated any failure as fatal. Without NATS, the entire process
   exited with "failed to start default nats adapter".

2. Mutations failed at runtime
   mood.UpdateMood returned "no nats pubsub default provider found"
   when the adapter map was empty, and availability.UpdateAvailability
   nil-panicked on the unconditional NatsPubSubByProviderID["default"]
   lookup. Even when the data write succeeded (storage.Set runs first),
   clients saw a downstream error response.

Changes:
- subgraphs.go: extract startNatsAdapter helper that logs and returns
  nil on failure. NATS adapter startup and JetStream stream
  provisioning are now best-effort.
- mood/availability resolvers: extract publishMoodEvent /
  publishAvailabilityEvent helpers that nil-check the adapter and log
  publish errors rather than returning them. Mutations always succeed
  if the local storage write succeeds.
- code-mode demo start.sh and run_subgraphs_subset.sh: switch from
  per-subgraph cmd/<name> processes to a single cmd/all invocation
  with explicit port flags. The individual cmd/<name> binaries pass
  nil for the NATS adapter map; cmd/all wires NATS up correctly when
  it is available, and now degrades gracefully when it is not.
- router-config.yaml: flip require_mutation_approval from true to
  false so the demo MCP client can run mutations end-to-end without an
  approval flow.
- README.md: document the optional NATS prerequisite (make
  edfs-infra-up) and explain why the demo runs cmd/all.

Trade-off: subscriptions on Employee.currentMood / isAvailable will
not deliver updates while NATS is unreachable — direct queries still
reflect the new state. The demo prioritizes "queries and mutations
always work" over "subscriptions always work".

Verified end-to-end with NATS stopped via docker stop
cosmo-dev-nats-1:
- cmd/all boots cleanly with logged warnings
- updateMood and updateAvailability mutations both return successful
  data responses; resolvers log per-publish skip warnings
- after docker start cosmo-dev-nats-1, the same mutations succeed
  with no resolver-level warnings (publishes go through)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
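
A sketch of the best-effort startup pattern this commit describes; the adapter interface and constructor are placeholders for the real NATS wiring:

```go
package subgraphs

import "go.uber.org/zap"

// natsAdapter is a stand-in for the real pub/sub adapter interface.
type natsAdapter interface {
	Publish(subject string, payload []byte) error
}

// startNatsAdapter logs and swallows startup failures so the demo keeps
// serving queries and mutations when NATS is unreachable; callers must
// nil-check the returned adapter before publishing.
func startNatsAdapter(logger *zap.Logger, start func() (natsAdapter, error)) natsAdapter {
	adapter, err := start()
	if err != nil {
		logger.Warn("failed to start nats adapter, continuing without pub/sub", zap.Error(err))
		return nil
	}
	return adapter
}
```
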
Bundling unrelated writes produced tangled operations with mixed argument
shapes; the search-tool description now requires one prompt per logical
write and reserves combination for tightly-correlated cascades.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ation

Bring in 5 upstream commits since fork point 939da74:
- b92593b strip auth directives from demo subgraph schemas
- 3795fc2 identify operations by content SHA instead of name
- 3c5d33e demo full-federation fix + yoko proto/mock refresh
- 3bdd229 make subgraphs resilient to NATS unavailability
- df3562b split unrelated mutations into separate search prompts

# Conflicts:
#	router/internal/codemode/server/search_handler_test.go
#	router/internal/codemode/yoko/searcher.go
Each mcp.servers[*] entry can now opt into PR #2825's Code Mode by setting
code_mode.enabled: true. The per-tenant MCP mount then exposes only
code_mode_search_tools, code_mode_run_js, and the persisted-ops resource —
the regular per-op tools, execute_graphql, get_schema, and get_operation_info
are suppressed for that server. Each tenant gets its own Yoko client and
session storage; Yoko keys schemas by content SHA so a single endpoint
(http://127.0.0.1:3400) handles all tenants.

The bespoke code_mode_run_js in pkg/mcpserver (which bound named operations
as JS functions) is removed — Yoko's generative search registers ops by SHA
into the per-session bundle, which fully replaces it.

Code Mode tool builders (registerTools, registerPersistedOpsResource) gained
a target *mcp.Server parameter so the same tool surface mounts onto either
the standalone listener or an externally-owned per-tenant MCP. RegisterOn +
EnsureStarted are the new exported entrypoints. The standalone Start/Stop
path is unchanged.
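
A hypothetical usage sketch of the new entrypoints; the signatures below are inferred from this description, not copied from the code:

```go
package core

import "context"

// tenantMount stands in for the externally-owned per-tenant *mcp.Server.
type tenantMount struct{}

// codeModeServer captures the two exported entrypoints named above; the
// exact signatures are assumptions for illustration.
type codeModeServer interface {
	EnsureStarted(ctx context.Context) error
	RegisterOn(target *tenantMount) error
}

// mountCodeMode wires the shared Code Mode tool surface onto one tenant
// mount when that server has code_mode.enabled set.
func mountCodeMode(ctx context.Context, cm codeModeServer, target *tenantMount, enabled bool) error {
	if !enabled {
		return nil
	}
	if err := cm.EnsureStarted(ctx); err != nil {
		return err
	}
	return cm.RegisterOn(target)
}
```
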
coderabbitai Bot (Contributor) commented May 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a Code Mode subsystem: proto, Make/demo updates, router MCP multi-server and lifecycle wiring, a dedicated Code Mode MCP server (search/generate/execute), sandbox/harness (transpile, shape-check, QuickJS execution, validation), in-memory and Redis storage backends with TypeScript bundle generation, a Yoko client and demo mock, observability/call-trace, many tests, and demo assets.

Changes

Code Mode feature

  • Build / Demos / Proto (Makefile, buf.yaml, .gitignore, proto/wg/cosmo/code_mode/yoko/v1/yoko.proto, demo/...): Add buf dependency; update generate-go include/imports and proto path; add demo scripts/configs (demo/code-mode*, start.sh, run_subgraphs_subset.sh) and Connect demo files.
  • Router & MCP multi-server (router/core/*, router/core/router.go, router/core/router_config.go, router/pkg/mcpserver/*): Introduce MCP MultiServer, operations watcher, upstream SDL introspection, mount path support, diffed tool reloads, and start/stop lifecycle wiring for Code Mode server.
  • Code Mode server entry & handlers (router/internal/codemode/server/*): New Code Mode server: lifecycle builder, start/stop/reload, MCP tool handlers (search, run_js, generate_query), persisted-ops resource, session plumbing, approval elicitation, and tests.
  • Sandbox & harness (router/internal/codemode/sandbox/*, router/internal/codemode/harness/*): Sandbox abstraction, QuickJS preamble, validation, error normalization/source-map rewriting, transpile (esbuild), shape-check, pipeline, result envelope/truncation logic, semaphore/concurrency, and extensive tests.
  • Storage & TypeScript generation (router/internal/codemode/storage/*, router/internal/codemode/tsgen/*): Memory and Redis backends, naming/canonicalization, SessionStorage/Renderer interfaces, TypeScript bundle/fragment generation, and tests.
  • Yoko client & demo mock (router/internal/codemode/yoko/*, demo/code-mode/yoko-mock/*): Add Yoko proto, client with indexing/caching/singleflight, demo yoko-mock service integrating codex, and tests.
  • Observability / Call-trace (router/internal/codemode/observability/*, router/internal/codemode/calltrace/*): Logging, tracing, metrics, call-trace FileRecorder and tests.
  • Tests / Fixtures / Config / Loader (router-tests/*, router/pkg/config/*, router/mcp.config.yaml, router/pkg/schemaloader/*): Large test suites across modules; config schema/types updated for MCP code_mode and servers; schemaloader strips operation descriptions.
  • Minor subgraph & misc fixes (demo/pkg/subgraphs/*, router/pkg/grpcconnector/*, router/go.mod): Resolver publish helpers, NATS best-effort startup, mutex fix in plugin Stop, go.mod bumps across modules.

Estimated code review effort:
🎯 5 (Critical) | ⏱️ ~120 minutes


github-actions Bot commented May 8, 2026

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-2169f1fc78b42b688cccc5f9be4e5afd1357e98a-nonroot

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.25%. Comparing base (9780ce7) to head (7f86a29).
⚠️ Report is 31 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2845       +/-   ##
===========================================
- Coverage   64.32%   40.25%   -24.07%     
===========================================
  Files         311      278       -33     
  Lines       44308    31990    -12318     
  Branches     4767        0     -4767     
===========================================
- Hits        28502    12879    -15623     
- Misses      15783    17712     +1929     
- Partials       23     1399     +1376     
Files with missing lines Coverage Δ
router/core/errors.go 72.07% <ø> (ø)
router/core/factoryresolver.go 62.98% <ø> (ø)
router/core/flushwriter.go 71.79% <ø> (ø)
router/core/graph_server.go 71.19% <ø> (ø)
router/core/graphql_handler.go 62.71% <ø> (ø)
router/core/header_rule_engine.go 35.62% <ø> (ø)
router/core/plan_generator.go 0.00% <ø> (ø)
router/core/router.go 54.83% <ø> (ø)
router/core/router_config.go 88.12% <ø> (ø)
router/core/transport.go 75.00% <ø> (ø)
... and 50 more

... and 529 files with indirect coverage changes


coderabbitai Bot (Contributor) left a comment


Actionable comments posted: 5

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
router-tests/go.mod (1)

36-36: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Upgrade go.opentelemetry.io/otel/sdk to v1.40.0+ to remediate CVE-2026-24051.

Line 36 requires go.opentelemetry.io/otel/sdk v1.39.0, which is vulnerable to GHSA-9h8m-3fm2-qjrq (CVE-2026-24051, PATH hijacking on macOS/Darwin). Based on learnings, google.golang.org/grpc v1.79.3 can transitively pull a vulnerable go.opentelemetry.io/otel/sdk version. In every module go.mod, explicitly pin go.opentelemetry.io/otel/sdk to v1.40.0 or later.

Additionally, the replace directive at line 213 pins this module to v1.28.0, which also needs updating.

🔒 Proposed fix
-	go.opentelemetry.io/otel/sdk v1.39.0
+	go.opentelemetry.io/otel/sdk v1.40.0

And update the replace directive:

-	go.opentelemetry.io/otel/sdk => go.opentelemetry.io/otel/sdk v1.28.0
+	go.opentelemetry.io/otel/sdk => go.opentelemetry.io/otel/sdk v1.40.0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router-tests/go.mod` at line 36, Update the go.mod entries that pin the
OpenTelemetry SDK: change the required version of module
go.opentelemetry.io/otel/sdk from v1.39.0 to v1.40.0 (or later) and also update
any replace directive that forces go.opentelemetry.io/otel/sdk to v1.28.0 so it
points to v1.40.0+; ensure both the direct require and the replace for
go.opentelemetry.io/otel/sdk are synchronized to v1.40.0 or newer to remediate
CVE-2026-24051.
router/pkg/mcpserver/server.go (1)

1171-1192: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply configured upstream headers on outbound GraphQL calls.

WithUpstreamHeaders is wired into GraphQLSchemaServer, but executeGraphQLQuery() only forwards headers from the incoming MCP request. Any upstream that relies on static auth/routing headers will reload fine and then fail every tool call at runtime.

Suggested fix
 	if err != nil {
 		s.logger.Debug("failed to get headers from context", zap.Error(err))
 	} else {
 		// Copy all headers from the MCP request
 		for key, values := range reqHeaders {
@@
 			}
 		}
 	}
+
+	for key, value := range s.upstreamHeaders {
+		req.Header.Set(key, value)
+	}
 
 	// Override specific headers that must be set for GraphQL requests
 	req.Header.Set("Accept", "application/json")
 	req.Header.Set("Content-Type", "application/json; charset=utf-8")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/server.go` around lines 1171 - 1192, executeGraphQLQuery
currently only forwards MCP request headers (via headersFromContext) and never
applies the configured upstream/static headers provided by
WithUpstreamHeaders/GraphQLSchemaServer, causing upstream auth/routing headers
to be omitted on outbound GraphQL calls; update executeGraphQLQuery to merge the
configured upstream headers into the outgoing req.Header (apply the upstream
header map from the GraphQLSchemaServer/WithUpstreamHeaders configuration or the
server's upstream config), ensuring you still skip headers.SkippedHeaders and
that Accept/Content-Type are set after merging so they take precedence.
🟠 Major comments (26)
demo/pkg/subgraphs/employees/subgraph/schema.resolvers.go-64-66 (1)

64-66: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't return the shared slice element after unlocking.

Line 66 now returns the same *model.Employee stored in r.EmployeesData. Once the deferred unlock runs, another request can mutate that object while gqlgen is still resolving the response, so the mutation no longer returns a stable snapshot.

Proposed fix
 		if id == employee.ID {
 			employee.Tag = tag
 			employee.UpdatedAt = time.Now().String()
-			return employee, nil
+			updated := *employee
+			if employee.Details != nil {
+				details := *employee.Details
+				updated.Details = &details
+			}
+			return &updated, nil
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/pkg/subgraphs/employees/subgraph/schema.resolvers.go` around lines 64 -
66, The resolver currently returns a pointer to the shared element from
r.EmployeesData (the variable employee) which can be mutated after the mutex is
released; instead, create and return a private copy before unlocking: copy the
struct (e.g., cloned := *employee) and return &cloned so callers get a stable
snapshot; ensure the cloned object reflects the updated fields (Tag, UpdatedAt)
before returning.
demo/pkg/subgraphs/availability/subgraph/schema.resolvers.go-18-33 (1)

18-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move this helper to Resolver to survive codegen.

Custom code in auto-generated files (schema.resolvers.go) will be lost on the next gqlgen regeneration. Since mutationResolver embeds *Resolver, the method works identically when placed on the stable, hand-written Resolver type instead. This also aligns with gqlgen's intended pattern of injecting dependencies via Resolver.

Minimal fix
-func (r *mutationResolver) publishAvailabilityEvent(ctx context.Context, providerID, subject, payload string) {
+func (r *Resolver) publishAvailabilityEvent(ctx context.Context, providerID, subject, payload string) {

Move the entire method (lines 18–33) from schema.resolvers.go to resolver.go.

Also applies to: 40-41

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/pkg/subgraphs/availability/subgraph/schema.resolvers.go` around lines 18
- 33, The publishAvailabilityEvent helper is defined on the generated type
*mutationResolver and will be lost on gqlgen regeneration; move the entire
method body to the hand-written Resolver type (change the receiver from (r
*mutationResolver) to (r *Resolver)) in resolver.go, keep the function name
publishAvailabilityEvent and its signature (ctx context.Context, providerID,
subject, payload string), and ensure it still references
r.NatsPubSubByProviderID and uses the same logging and adapter.Publish call;
remove the old method from schema.resolvers.go so there are no duplicate
definitions.
demo/code-mode/mcp-stdio-proxy/go.mod-10-23 (1)

10-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add an explicit otel/sdk pin to meet repository security policy.

This module does not have the explicit go.opentelemetry.io/otel/sdk version pin required by the repository's security policy to mitigate CVE-2026-24051/GHSA-9h8m-3fm2-qjrq. Add v1.40.0 or newer to lines 10-23 in the require block.

Suggested change
 require (
 	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
 	github.com/google/jsonschema-go v0.4.2 // indirect
 	github.com/kr/pretty v0.3.1 // indirect
+	go.opentelemetry.io/otel/sdk v1.40.0 // indirect
 	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
 	github.com/rogpeppe/go-internal v1.13.1 // indirect
 	github.com/segmentio/asm v1.1.3 // indirect
 	github.com/segmentio/encoding v0.5.4 // indirect
 	github.com/yosida95/uritemplate/v3 v3.0.2 // indirect
 	golang.org/x/oauth2 v0.34.0 // indirect
 	golang.org/x/sys v0.40.0 // indirect
 	gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
 	gopkg.in/yaml.v3 v3.0.1 // indirect
 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode/mcp-stdio-proxy/go.mod` around lines 10 - 23, Add an explicit
pinned dependency for the OpenTelemetry SDK in the require block of go.mod: add
the line "go.opentelemetry.io/otel/sdk v1.40.0" (or newer) alongside the other
entries in the require (...) section so the repository security policy
requirement (CVE-2026-24051/GHSA-9h8m-3fm2-qjrq mitigation) is satisfied; update
the require block that currently lists indirect modules (the block shown in
go.mod) to include this SDK entry.
router/pkg/mcpserver/operations_watcher.go-91-94 (1)

91-94: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip non-root WalkDir entry errors instead of aborting the full snapshot.

At Line 93, returning the walk error causes the whole scan to fail. A single unreadable child path can make every tick fail and effectively disable hot-reload.

🛠️ Proposed fix
 	walkErr := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
 		if err != nil {
-			return err
+			// Keep root failures explicit, but tolerate unreadable children.
+			if path == dir {
+				return err
+			}
+			return nil
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/operations_watcher.go` around lines 91 - 94, The current
filepath.WalkDir callback returns any entry error which aborts the entire walk
(variable walkErr) and causes the full snapshot to fail when a single child path
is unreadable; change the callback used in WalkDir so that if err != nil and
path != dir (i.e., a non-root entry) you log/record the error and return nil to
continue scanning, only returning err to abort when the root dir itself is
inaccessible; update the callback surrounding filepath.WalkDir and any
associated logging to reflect skipping unreadable child entries instead of
aborting the whole snapshot.
router/internal/codemode/tsgen/graphql.go-332-350 (1)

332-350: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't apply impossible fragments to concrete parents.

These helpers replace parent with the fragment type condition without checking whether that condition can overlap the enclosing concrete type. For example, viewer { ... on Dog { bark } } on a User field would currently emit bark in the generated shape even though the fragment can never match.

Also applies to: 352-376

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/tsgen/graphql.go` around lines 332 - 350, The
inlineFragmentProperties function (and the analogous block at 352-376) currently
replaces the parent with the fragment's type condition unconditionally; change
it to first verify the fragment type can overlap the enclosing parent before
switching parents: use opDoc.InlineFragmentHasTypeCondition and
InlineFragmentTypeConditionNameString to get the fragment type, resolve it via
r.schema.Index.FirstNonExtensionNodeByNameBytes, and then check whether that
fragment type can possibly be an instance/overlap of the original concrete
parent (e.g., by examining schema type relationships / implements/possibleTypes)
— only when an overlap is possible set fragmentParent = node and proceed to call
r.selectionSetType and propertiesFromInlineObject; if there is no possible
overlap, return an empty property list instead of applying impossible fields.
router/pkg/config/config.schema.json-2777-2856 (1)

2777-2856: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add server-scoped OAuth validation here.

servers[].oauth only reuses the top-level schema shape. The existing mcp-level if/then never runs for these entries, so servers[].oauth.enabled: true can validate without jwks and leave you with a config that passes schema validation but cannot actually enforce OAuth.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/config/config.schema.json` around lines 2777 - 2856, Servers-level
oauth entries (servers[].oauth) reuse the top-level mcp oauth shape but miss the
mcp-level if/then that enforces required JWKS when oauth.enabled is true, so
configs can pass validation without jwks. Fix by moving the conditional
validation into the shared oauth schema (or a new reusable definition like
"oauth") and reference that from both "#/properties/mcp/properties/oauth" and
"servers[].properties/oauth" (or add the same if/then directly under
"servers.items.properties.oauth") so that when "oauth.enabled": true the schema
requires the jwks (or equivalent JWKS fields) for servers as well as the
top-level mcp.
router/internal/codemode/tsgen/graphql.go-69-84 (1)

69-84: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep defaulted GraphQL variables optional.

Line 72 treats every NonNull variable as required, but GraphQL variables with defaults are still optional at the call site. A query like query Q($id: ID! = "1") will currently render a required vars.id, which makes valid tool calls fail type-checking.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/tsgen/graphql.go` around lines 69 - 84, The code
currently marks a variable required whenever opDoc.Types[typeRef].TypeKind ==
ast.TypeKindNonNull; change that so a NonNull variable is only treated as
required when it has no default value. Inspect the variable definition for a
default (using the same varRef and opDoc APIs you already use, e.g. the
VariableDefinition default accessor on opDoc/varRef) and compute required :=
(opDoc.Types[typeRef].TypeKind == ast.TypeKindNonNull) && (variable has no
default); keep the rest of the logic (nullable handling, varsOptional update,
tsProperty optional = !required) unchanged so variables with defaults remain
optional at the call site.
router/internal/codemode/calltrace/calltrace.go-15-19 (1)

15-19: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Persist request/response direction in the trace format.

Both methods serialize the same Record shape, so once calls interleave there is no reliable way to tell whether a given line is the request or the response for a tool invocation.

💡 Minimal fix
 type Record struct {
+	Kind      string          `json:"kind"`
 	ToolName  string          `json:"tool_name"`
 	Timestamp time.Time       `json:"timestamp"`
 	Body      json.RawMessage `json:"body"`
 }
@@
 func (r *FileRecorder) RecordRequest(toolName string, body []byte) {
-	r.record(toolName, body)
+	r.record("request", toolName, body)
 }
 
 func (r *FileRecorder) RecordResponse(toolName string, body []byte) {
-	r.record(toolName, body)
+	r.record("response", toolName, body)
 }
 
-func (r *FileRecorder) record(toolName string, body []byte) {
+func (r *FileRecorder) record(kind, toolName string, body []byte) {
 	if r == nil || r.path == "" {
 		return
 	}
 	line, err := json.Marshal(Record{
+		Kind:      kind,
 		ToolName:  toolName,
 		Timestamp: r.now(),
 		Body:      json.RawMessage(body),
 	})

Also applies to: 53-68

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/calltrace/calltrace.go` around lines 15 - 19, The
Record struct currently lacks any field indicating whether an entry is a request
or a response, so interleaved traces lose directionality; add a Direction
(string or enum) field to the Record struct (e.g., Direction string
`json:"direction"`) and update the code paths that create/serialize Record for
both the request and response flows to set Direction to "request" or "response"
accordingly (ensure both serialization sites that currently populate
ToolName/Timestamp/Body are updated to include the new field).
router/internal/codemode/storage/memory_backend.go-111-129 (1)

111-129: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Refresh lastUsed even when every op was deduped.

lastUsed only moves forward when appendedAny is true. A client that keeps re-sending already-registered ops will still look idle to TTL/LRU eviction and can be swept despite active use.

💡 Minimal fix
-	if appendedAny {
-		session.lastUsed = b.now()
+	session.lastUsed = b.now()
+	if appendedAny {
 		session.bundle = ""
 		session.bundleValid = false
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/storage/memory_backend.go` around lines 111 - 129,
The current loop deduplicates ops but only updates session.lastUsed (and resets
session.bundle/session.bundleValid) when appendedAny is true, causing clients
that re-send only-duplicate ops to appear idle; change the logic so
session.lastUsed = b.now() and session.bundle = "" and session.bundleValid =
false are set unconditionally after the loop (leave the deduplication via
CanonicalBody/byBody and appending to session.ops as-is), so lastUsed is
refreshed even when all ops were deduped.
demo/code-mode/yoko-mock/main.go-159-201 (1)

159-201: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Race condition in IndexSchema may cause duplicate schema indexing.

Between releasing the read lock (line 172) and re-acquiring the write lock (line 195), another request for the same SDL could also pass the existence check and start indexing. Both would create temp directories, run codex, and race to insert—one entry gets overwritten and its temp dir leaks.

🔧 Proposed fix using double-checked locking with write lock
 func (s *yokoService) IndexSchema(ctx context.Context, req *connect.Request[yokov1.IndexSchemaRequest]) (*connect.Response[yokov1.IndexSchemaResponse], error) {
 	sdl := req.Msg.GetSdl()
 	id := schemaID(sdl)
 
-	s.mu.Lock()
+	s.mu.RLock()
 	if existing, ok := s.schemas[id]; ok {
-		s.mu.Unlock()
+		s.mu.RUnlock()
 		existing.mu.RLock()
 		existingSession := existing.sessionID
 		existing.mu.RUnlock()
 		log.Printf("IndexSchema schema_id=%s reused dir=%s session_id=%s", id, existing.dir, existingSession)
 		return connect.NewResponse(&yokov1.IndexSchemaResponse{SchemaId: id}), nil
 	}
-	s.mu.Unlock()
+	s.mu.RUnlock()
+
+	// Perform expensive work outside the lock
+	schemaDoc, err := parseSchemaSDL(sdl)
+	// ... existing parsing and codex indexing ...
 
-	schemaDoc, err := parseSchemaSDL(sdl)
-	if err != nil {
-		return nil, connect.NewError(connect.CodeInvalidArgument, fmt.Errorf("parse schema SDL: %w", err))
-	}
-	// ... rest of indexing ...
-
+	// Re-check under write lock before inserting
+	s.mu.Lock()
+	if existing, ok := s.schemas[id]; ok {
+		s.mu.Unlock()
+		// Another request won the race; clean up our work
+		_ = os.RemoveAll(dir)
+		existing.mu.RLock()
+		existingSession := existing.sessionID
+		existing.mu.RUnlock()
+		log.Printf("IndexSchema schema_id=%s reused (lost race) dir=%s session_id=%s", id, existing.dir, existingSession)
+		return connect.NewResponse(&yokov1.IndexSchemaResponse{SchemaId: id}), nil
+	}
 	entry := &schemaEntry{dir: dir, schema: schemaDoc, sessionID: sessionID}
-	s.mu.Lock()
 	s.schemas[id] = entry
 	s.mu.Unlock()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode/yoko-mock/main.go` around lines 159 - 201, IndexSchema has a
race: after releasing s.mu to do heavy work (create temp dir, parse SDL,
runCodexIndex) another goroutine can insert the same schema, causing duplicate
work and leaked temp dirs. Fix by using double-checked locking: after preparing
the new entry (dir, schemaDoc, sessionID) but before assigning into s.schemas,
acquire s.mu.Lock() again and check if an entry for schemaID already exists; if
it does, release the lock, clean up your newly created resources (RemoveAll dir)
and return the existing schema response (use existing.sessionID), otherwise
insert your entry and release the lock. Reference symbols: IndexSchema, s.mu,
s.schemas, schemaID, schemaEntry, runCodexIndex.
router/internal/codemode/storage/redis_backend.go-96-164 (1)

96-164: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Infinite retry loop does not distinguish retryable vs non-retryable errors.

The Append method retries on all errors from Watch, including connection failures, authentication errors, or other non-transient issues. Only redis.TxFailedErr (optimistic lock conflict) should trigger a retry; other errors should be returned immediately.

🔧 Proposed fix to distinguish error types
 		}, opsKey)
 		if err == nil {
 			return resolved, nil
 		}
+		if !errors.Is(err, redis.TxFailedErr) {
+			return nil, fmt.Errorf("code mode redis append: %w", err)
+		}
 
 		b.logger.Debug("retrying code mode redis append",
 			zap.String("session_id", sessionID),
 			zap.Error(err),
 		)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/storage/redis_backend.go` around lines 96 - 164, The
loop retries on every error from b.client.Watch; change it to only loop on
redis.TxFailedErr (optimistic lock conflicts) and immediately return other
errors. After calling b.client.Watch(ctx, ..., opsKey) check the returned err:
if err == nil return resolved, nil; if errors.Is(err, redis.TxFailedErr)
continue the retry loop (with logging/backoff) ; otherwise return nil, err.
Reference b.client.Watch, redis.TxFailedErr, opsKey, resolved and the
surrounding retry/backoff logic when making the change.
router/core/graph_server.go-1403-1410 (1)

1403-1410: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat Code Mode reload failures as build failures (and use precise error text).

At Lines 1406-1408, reload/SDL print failures are only logged, so the graph can start with stale Code Mode schema/session state after an update. This should fail buildGraphMux (like s.mcpServer.Reload does) to avoid partial initialization.

💡 Suggested fix
 	if opts.IsBaseGraph() && s.codeModeServer != nil {
 		sdl, printErr := astprinter.PrintString(executor.ClientSchema)
 		if printErr != nil {
-			s.logger.Error("failed to reload MCP server", zap.Error(fmt.Errorf("failed to print Code Mode schema SDL: %w", printErr)))
-		} else if mErr := s.codeModeServer.Reload(executor.ClientSchema, sdl); mErr != nil {
-			s.logger.Error("failed to reload MCP server", zap.Error(mErr))
+			return nil, fmt.Errorf("failed to print Code Mode schema SDL: %w", printErr)
+		}
+		if mErr := s.codeModeServer.Reload(executor.ClientSchema, sdl); mErr != nil {
+			return nil, fmt.Errorf("failed to reload Code Mode server: %w", mErr)
 		}
 	}

Based on learnings: in this repository, buildGraphMux error paths should clean up partially initialized resources before returning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/core/graph_server.go` around lines 1403 - 1410, The Code Mode SDL
print and Reload failures in the block guarded by opts.IsBaseGraph() and
s.codeModeServer must return an error from buildGraphMux (not just log) so graph
startup doesn't proceed with stale state; change the branch that calls
astprinter.PrintString(executor.ClientSchema) and s.codeModeServer.Reload(...)
to propagate a wrapped error (including the precise printErr or mErr text) back
to buildGraphMux, and ensure you clean up any partially initialized resources
(mirror the cleanup pattern used after s.mcpServer.Reload failures) before
returning so partially-initialized servers are torn down.
router/mcp.config.yaml-123-129 (1)

123-129: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove user-specific absolute filesystem paths from committed config.

Lines 123-129 expose local machine paths (/Users/asoorm/...) and make this demo config fail out-of-the-box for everyone else.

💡 Suggested fix
-    - id: rickandmorty
-      path: /Users/asoorm/go/src/github.com/wundergraph/conference-mcp-demo/.operations/rickandmorty
-    - id: countries
-      path: /Users/asoorm/go/src/github.com/wundergraph/conference-mcp-demo/.operations/countries
-    - id: anilist
-      path: /Users/asoorm/go/src/github.com/wundergraph/conference-mcp-demo/.operations/anilist
-    - id: swapi
-      path: /Users/asoorm/go/src/github.com/wundergraph/conference-mcp-demo/.operations/swapi
+    - id: rickandmorty
+      path: /ABSOLUTE/PATH/TO/.operations/rickandmorty
+    - id: countries
+      path: /ABSOLUTE/PATH/TO/.operations/countries
+    - id: anilist
+      path: /ABSOLUTE/PATH/TO/.operations/anilist
+    - id: swapi
+      path: /ABSOLUTE/PATH/TO/.operations/swapi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/mcp.config.yaml` around lines 123 - 129, The committed
router/mcp.config.yaml contains user-specific absolute paths for operations
(ids: countries, anilist, swapi); remove these hard-coded /Users/… entries and
replace them with portable references (e.g., relative paths like
.operations/countries, .operations/anilist, .operations/swapi or
environment-variable placeholders such as ${OPERATIONS_DIR}/countries) so the
config works on other machines and CI — update the entries for id: countries,
id: anilist and id: swapi accordingly and ensure any loader code expects
relative or configurable paths.
demo/code-mode-connect/start.sh-152-157 (1)

152-157: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

PID tracking is attached to the tee pipeline, not reliably to the router process.

Lines 152-157 background a pipeline and store $!; in this form shutdown can end up signaling tee while the router keeps running.

🛠️ Suggested fix
-(
-  cd "$YOKO_DIR"
-  exec "$ROUTER_BIN" -config "$ROUTER_CONFIG"
-) 2>&1 | tee "$LOG_DIR/router.log" &
-router_pid="$!"
+(
+  cd "$YOKO_DIR"
+  exec "$ROUTER_BIN" -config "$ROUTER_CONFIG" \
+    > >(tee "$LOG_DIR/router.log") \
+    2>&1
+) &
+router_pid="$!"
 append_pid router "$router_pid"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode-connect/start.sh` around lines 152 - 157, The current startup
backgrounds a pipeline so router_pid ends up as the tee process PID; change the
launch so the router process itself is backgrounded and its PID captured (e.g.
use process substitution or redirect stdout/stderr into tee instead of piping
the subshell into tee) so that the subshell/exec block that runs "$ROUTER_BIN"
-config "$ROUTER_CONFIG" (inside the cd "$YOKO_DIR" subshell) is the
backgrounded process whose PID is assigned to router_pid and then passed to
append_pid; ensure LOG_DIR/router.log is still written via tee but do not
background the tee process itself.
router/core/router.go-948-953 (1)

948-953: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Rollback MCP startup if Code Mode initialization fails.

If startMCPServer succeeds and startCodeModeServer errors, bootstrap returns with the MCP listener still running. That leaks the port and makes a retry in the same process fail immediately. Please stop any server started earlier before returning here.

Suggested fix
 	if err := r.startMCPServer(ctx); err != nil {
 		return err
 	}
 	if err := r.startCodeModeServer(ctx); err != nil {
+		if r.mcpServer != nil {
+			if stopErr := r.mcpServer.Stop(ctx); stopErr != nil {
+				r.logger.Warn("Failed to stop MCP server during bootstrap rollback", zap.Error(stopErr))
+			}
+			r.mcpServer = nil
+		}
 		return err
 	}

Based on learnings: ensure error paths in router/core clean up partially initialized resources before returning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/core/router.go` around lines 948 - 953, When bootstrap calls
startMCPServer then startCodeModeServer, ensure the MCP server started by
startMCPServer is torn down if startCodeModeServer returns an error: after the
failing call to startCodeModeServer in bootstrap, invoke the MCP
shutdown/cleanup routine (prefer r.stopMCPServer(ctx) or
r.shutdownMCPServer(ctx) if available; otherwise close the underlying listener
field such as r.mcpListener.Close() and nil it) before returning the original
error so the port is not leaked.
router/internal/codemode/sandbox/validation.go-25-50 (1)

25-50: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't treat repeated references as cycles.

seen is global for the whole walk, so a shape like {a: shared, b: shared} will sanitize b as <<non-serializable: cycle>> even though JSON.stringify handles that object graph just fine. That mutates valid results and emits false warnings.

Suggested fix
-  const seen = new WeakSet();
+  const inPath = new WeakSet();
   const keyPath = (base, key) => {
     if (typeof key === "number") return base + "[" + key + "]";
     return /^[A-Za-z_$][A-Za-z0-9_$]*$/.test(key) ? base + "." + key : base + "[" + JSON.stringify(key) + "]";
   };
   const sentinel = (kind) => "<<non-serializable: " + kind + ">>";
   const walk = (v, path, parent, key) => {
     const t = typeof v;
     if (t === "bigint" || t === "function" || t === "symbol" || t === "undefined") {
       parent[key] = sentinel(t);
       warnings.push({ path, kind: t });
       return;
     }
     if (v && t === "object") {
-      if (seen.has(v)) {
+      if (inPath.has(v)) {
         parent[key] = sentinel("cycle");
         warnings.push({ path, kind: "cycle" });
         return;
       }
-      seen.add(v);
+      inPath.add(v);
       if (Array.isArray(v)) {
         for (let i = 0; i < v.length; i++) walk(v[i], keyPath(path, i), v, i);
+        inPath.delete(v);
         return;
       }
       for (const k of Object.keys(v)) walk(v[k], keyPath(path, k), v, k);
+      inPath.delete(v);
     }
   };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/sandbox/validation.go` around lines 25 - 50, The
current global seen WeakSet causes shared references to be labeled as cycles;
change the detection in walk to use an ancestor-specific check instead of a
global “seen” for the whole traversal: replace the global seen logic with an
ancestors stack/WeakSet that you push the object into on entry and pop on exit
(or use a WeakMap to track objects currently on the recursion path), so that if
walk encounters an object already on the current ancestor stack you treat it as
a true cycle (emit sentinel("cycle") and push warning), but if the object was
seen earlier outside the current path (a repeated/shared reference) you leave it
unchanged and do not emit a warning; update walk to push before recursing into
properties/array elements and ensure you remove the object from the ancestor set
on return (preserve keyPath, sentinel, and warnings usage).
router/core/router.go-1273-1307 (1)

1273-1307: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return early when Code Mode is disabled, and reuse the resolved router URL.

This function still resolves storage and builds a Code Mode server before checking mcp.code_mode.enabled, so a disabled feature can still fail router startup. It also hardcodes r.graphqlEndpointURL, which skips the mcp.router_url override that startMCPServer already honors.

Suggested fix
 func (r *Router) startCodeModeServer(ctx context.Context) error {
+	if !r.mcp.CodeMode.Enabled {
+		return nil
+	}
+
+	routerGraphQLEndpoint := r.graphqlEndpointURL
+	if r.mcp.RouterURL != "" {
+		routerGraphQLEndpoint = r.mcp.RouterURL
+	}
+
 	var redisProvider *config.RedisStorageProvider
-	if r.mcp.CodeMode.Enabled && r.mcp.CodeMode.NamedOps.Enabled {
+	if r.mcp.CodeMode.NamedOps.Enabled {
 		if providerID := r.mcp.CodeMode.NamedOps.Storage.ProviderID; providerID != "" {
 			provider, ok := r.providerRegistry.Redis(providerID)
 			if !ok {
 				return fmt.Errorf("redis storage provider with id '%s' for mcp code_mode named_ops not found", providerID)
 			}
 			redisProvider = &provider
 		}
 	}
 
 	cm, err := codemodeserver.BuildFromConfig(codemodeserver.BuildOptions{
 		Config:           r.mcp.CodeMode,
 		SessionStateless: r.mcp.Session.Stateless,
-		RouterGraphQLURL: r.graphqlEndpointURL,
+		RouterGraphQLURL: routerGraphQLEndpoint,
 		Logger:           r.logger,
 		TracerProvider:   r.tracerProvider,
 		MeterProvider:    r.otlpMeterProvider,
 		RedisProvider:    redisProvider,
@@
 	})
 	if err != nil {
 		return fmt.Errorf("failed to create code mode MCP server: %w", err)
 	}
 	r.codeModeServer = cm
-
-	if !r.mcp.CodeMode.Enabled {
-		return nil
-	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/core/router.go` around lines 1273 - 1307, Move the early check of
r.mcp.CodeMode.Enabled to the top of startCodeModeServer so the function returns
immediately when Code Mode is disabled (avoid resolving storage/providers or
building the server); when enabled, reuse the same router URL used elsewhere by
passing the resolved router URL (use r.resolvedRouterURL if available, falling
back to r.graphqlEndpointURL) into codemodeserver.BuildFromConfig via
RouterGraphQLURL, and only resolve Redis provider and call
codemodeserver.BuildFromConfig after confirming r.mcp.CodeMode.Enabled.
router/internal/codemode/server/lifecycle.go-143-157 (1)

143-157: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on unsupported JWT auth.

When cfg.Auth.Type == "jwt", this logs and continues with an unauthenticated HTTP client. That turns an explicit auth configuration into a silent fallback and can send Yoko traffic without the expected credentials. Return a build-time error until JWT auth is actually implemented.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/lifecycle.go` around lines 143 - 157, The
buildYokoClient function currently logs and continues when cfg.Auth.Type ==
"jwt", silently creating an unauthenticated client; change this to fail fast by
returning an error (or nil plus logging and panicking as your project idiom
requires) when JWT auth is requested but not implemented. Update buildYokoClient
to detect cfg.Auth.Type == "jwt" and surface a clear build-time failure instead
of using the unauthenticated client (referencing buildYokoClient, cfg.Auth.Type,
and the current logger.Warn call) so callers cannot accidentally send Yoko
traffic without credentials.
router/internal/codemode/harness/envelope.go-33-60 (1)

33-60: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce the byte budget on the full envelope.

The size check only considers sandboxResult.Result. A small result plus a large warnings slice or error stack can still marshal into an oversized MCP payload, so this can exceed maxResultBytes on the exact paths it is supposed to cap. Recheck the assembled ResultEnvelope, or reserve budget for warnings and error before treating the payload as in-bounds.
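
A minimal sketch of budgeting the assembled envelope rather than the raw result; the struct fields below are assumptions based on the names mentioned above (ResultEnvelope, warnings, maxResultBytes), not the harness's actual types:

package harness

import "encoding/json"

// Assumed envelope shape for illustration only.
type resultEnvelope struct {
	Result    any   `json:"result,omitempty"`
	Warnings  []any `json:"warnings,omitempty"`
	Error     any   `json:"error,omitempty"`
	Truncated bool  `json:"truncated"`
}

// fitsBudget marshals the whole envelope, so a large warnings slice or error
// payload counts against the same cap as the result.
func fitsBudget(env resultEnvelope, maxResultBytes int) (bool, error) {
	raw, err := json.Marshal(env)
	if err != nil {
		return false, err
	}
	return len(raw) <= maxResultBytes, nil
}

BuildEnvelope would call something like this after assembling the envelope and only then fall back to truncation or a preview.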

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/harness/envelope.go` around lines 33 - 60,
BuildEnvelope currently only checks sandboxResult.Result size; instead assemble
the ResultEnvelope (using cloneErrorEnvelope for errors and including Warnings)
and marshal it to measure bytes against maxResultBytes, then proceed to
structurallyTruncate(sandboxResult.Result, maxResultBytes) or previewEnvelope as
fallbacks if the marshalled envelope exceeds the budget; ensure any returned
envelope (on OK and non-OK paths) is validated by re-marshalling before
returning and, if too large, apply truncation of the result or a preview and set
Truncated=true; refer to BuildEnvelope, structurallyTruncate, previewEnvelope,
and cloneErrorEnvelope when making the changes.
router/internal/codemode/server/lifecycle.go-122-139 (1)

122-139: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Close the Redis client when backend initialization fails.

factory(...) has already created a live Redis client by the time storage.NewRedisBackend(...) runs. If that second step fails, this returns without releasing the client, so repeated reloads/tests leak sockets and goroutines on the error path.

Suggested fix
 	backend, err := storage.NewRedisBackend(storage.RedisConfig{
 		Client:     client,
 		KeyPrefix:  cfg.NamedOps.Storage.KeyPrefix,
 		SessionTTL: cfg.NamedOps.SessionTTL,
 		Renderer:   renderer,
 		Logger:     logger,
 	})
 	if err != nil {
+		_ = client.Close()
 		return nil, fmt.Errorf("create code mode redis storage backend: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/lifecycle.go` around lines 122 - 139, The
Redis client created by factory(&rediscloser.RedisCloserOptions{...}) (bound to
variable client) is leaked if storage.NewRedisBackend(storage.RedisConfig{...})
fails; modify the error path so that when NewRedisBackend returns an error you
call the client's Close/Shutdown/Close(ctx) method (whichever method exists on
the client type) before returning the wrapped error. Update the lifecycle.go
block around the factory call and storage.NewRedisBackend to ensure client.Close
is invoked on failure to prevent socket/goroutine leaks.
demo/code-mode/mcp-stdio-proxy/main_test.go-140-145 (1)

140-145: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move cancel() into the deferred cleanup before waiting on errCh.

When the test exits, defer statements execute in LIFO order—the cleanup that calls session.Close() and blocks on <-errCh runs before the standalone defer cancel(). Since runProxy relies on ctx.Done() to exit its main loop (via localServer.Run(ctx, opts.transport) and the supervise() goroutine), the test deadlocks: the cleanup waits for runProxy to exit, but runProxy waits for context cancellation, which doesn't happen until after cleanup completes. Cancel the context before reading errCh to allow runProxy to exit promptly.

Also applies to: 203-208
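
A compilable sketch of the ordering, with a stub proxy loop standing in for runProxy; only the cancel-before-wait shape is the point here:

package main_test

import (
	"context"
	"errors"
	"testing"
)

// Stub for the proxy loop: blocks until the context is cancelled.
func runProxy(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

func TestCleanupOrdering(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	errCh := make(chan error, 1)
	go func() { errCh <- runProxy(ctx) }()

	t.Cleanup(func() {
		cancel() // unblock runProxy before waiting on it
		if err := <-errCh; err != nil && !errors.Is(err, context.Canceled) {
			t.Errorf("runProxy: %v", err)
		}
	})
}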

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode/mcp-stdio-proxy/main_test.go` around lines 140 - 145, The
deferred cleanup currently closes the session and then waits on errCh while a
separate defer cancel() runs later, causing a deadlock because runProxy (via
localServer.Run and supervise) waits for ctx.Done(); move the cancel call into
the deferred cleanup before blocking on errCh so the context is cancelled prior
to waiting: inside the same defer that calls session.Close() and reads from
errCh, invoke cancel() first, then close the session and read err := <-errCh,
and keep the existing error checks; apply the same change to the other
occurrence near lines 203-208 referencing errCh/session.Close()/cancel().
router/internal/codemode/server/search_handler.go-58-61 (1)

58-61: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't let the first caller's ctx control the shared search.

This dedupes concurrent requests, but the shared work runs with whichever request context wins the race. If that leader times out or is canceled, every concurrent duplicate caller inherits the same failed result even when their own contexts are still live. The shared search needs its own bounded context, and followers should be able to stop waiting independently.
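
One way to decouple the two, sketched with golang.org/x/sync/singleflight (an assumption; the handler may use a different dedupe mechanism). The shared work gets its own bounded context, and each caller stops waiting when its own context ends; the 30s budget is illustrative:

package search

import (
	"context"
	"time"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// dedupedSearch runs the shared work on a detached, bounded context so the
// leader's cancellation cannot poison the result for concurrent followers,
// while each caller still honors its own ctx.
func dedupedSearch(ctx context.Context, key string, do func(context.Context) (any, error)) (any, error) {
	ch := group.DoChan(key, func() (any, error) {
		workCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		return do(workCtx)
	})
	select {
	case res := <-ch:
		return res.Val, res.Err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}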

router/internal/codemode/yoko/client.go-121-137 (1)

121-137: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cache the schema ID against the SDL that was actually indexed.

If SetSchema() races with this singleflight, currentSDL can differ from the outer sdl. The code still indexes the old sdl, but then caches the returned ID under currentSDL, which can poison the cache with a schema ID for the wrong schema.

Suggested fix
-		indexedSchemaID := resp.Msg.GetSchemaId()
-		c.cacheSchemaID(currentSDL, indexedSchemaID)
+		indexedSchemaID := resp.Msg.GetSchemaId()
+		c.cacheSchemaID(sdl, indexedSchemaID)
 		return indexedSchemaID, nil
 	})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/yoko/client.go` around lines 121 - 137, The code can
cache the returned schema ID under the wrong SDL when SetSchema() races with the
singleflight: inside indexGroup.Do where you call serviceClient.IndexSchema,
stop using currentSDL for caching and instead cache the ID against the SDL that
was actually indexed (the outer sdl passed into indexGroup.Do); update the call
to cacheSchemaID to use that sdl (or any SDL value returned by the IndexSchema
response if the RPC returns it) so the cache entry matches the schema that was
indexed; look at indexGroup.Do, serviceClient.IndexSchema, schemaState and
cacheSchemaID to make the edit.
router/internal/codemode/server/server.go-268-281 (1)

268-281: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bound the eager indexing RPC with a timeout or cancellable context.

This background warm-up uses context.Background() for a network call. If Yoko stalls, the goroutine can hang forever and these calls can accumulate across schema reloads.

Suggested fix
 			go func() {
 				start := time.Now()
+				indexCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+				defer cancel()
 				defer func() {
 					if r := recover(); r != nil {
 						logger.Error("code mode eager schema index panicked",
 							zap.Any("panic", r),
 							zap.Duration("duration", time.Since(start)),
 						)
 					}
 				}()
 				logger.Info("code mode eager schema index started",
 					zap.Int("sdl_bytes", sdlBytes),
 				)
-				if err := yokoClient.EnsureIndexed(context.Background()); err != nil {
+				if err := yokoClient.EnsureIndexed(indexCtx); err != nil {
 					logger.Warn("code mode eager schema index failed",
 						zap.Error(err),
 						zap.Duration("duration", time.Since(start)),
 					)
 					return
 				}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/server.go` around lines 268 - 281, The
goroutine that performs eager indexing currently calls
yokoClient.EnsureIndexed(context.Background()) and can hang indefinitely; update
that anonymous goroutine to create a cancellable/timeout context (e.g., via
context.WithTimeout or context.WithCancel) tied to a reasonable duration, pass
that context into yokoClient.EnsureIndexed, and ensure you defer cancel() so the
context is cleaned up; keep the existing panic recovery and logging (including
duration) but use the bounded context to prevent goroutine accumulation on
stalled RPCs.
router/pkg/config/config.go-1629-1631 (1)

1629-1631: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate mcp.servers[*].code_mode during config load as well.

This only validates the legacy top-level mcp.code_mode block. The new per-server mcp.servers[*].code_mode configs can bypass ValidateMCPCodeMode, so invalid session/query-generation combinations survive until later startup code.

Suggested fix
-	if err := ValidateMCPCodeMode(&cfg.Config.MCP.CodeMode, cfg.Config.MCP.Session.Stateless); err != nil {
-		return nil, err
-	}
+	servers, err := cfg.Config.MCP.NormalizeServers()
+	if err != nil {
+		return nil, err
+	}
+	for i := range servers {
+		if err := ValidateMCPCodeMode(&servers[i].CodeMode, servers[i].Session.Stateless); err != nil {
+			return nil, fmt.Errorf("mcp.servers[%d] (%q): %w", i, servers[i].Name, err)
+		}
+	}
 
 	return cfg, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/config/config.go` around lines 1629 - 1631, The config loader
currently only calls ValidateMCPCodeMode on the top-level
cfg.Config.MCP.CodeMode; update the loader to also iterate
cfg.Config.MCP.Servers and call ValidateMCPCodeMode(&server.CodeMode,
cfg.Config.MCP.Session.Stateless) for each server entry so per-server
mcp.servers[*].code_mode is validated early; locate the call site where
ValidateMCPCodeMode is invoked and add a loop over cfg.Config.MCP.Servers (using
the server variable name present in the diff) to perform the same validation and
return errors immediately if any validation fails.
router/pkg/mcpserver/multi_server.go-123-128 (1)

123-128: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail startup when the shared listener cannot bind.

Start() returns nil before the socket is actually bound, so an EADDRINUSE/permission failure is only logged from the goroutine and the router keeps running without any MCP endpoint.

Suggested fix
 import (
 	"context"
 	"errors"
 	"fmt"
+	"net"
 	"net/http"
 	"time"
@@
-	go func() {
+	ln, err := net.Listen("tcp", m.listenAddr)
+	if err != nil {
+		return fmt.Errorf("MCP multi-server listen: %w", err)
+	}
+
+	go func() {
 		defer m.logger.Info("MCP multi-server stopped")
-		if err := m.httpServer.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
+		if err := m.httpServer.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) {
 			m.logger.Error("MCP multi-server failed", zap.Error(err))
 		}
 	}()
 
 	return nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/multi_server.go` around lines 123 - 128, Start()
currently spawns a goroutine that calls m.httpServer.ListenAndServe() and
returns nil before the socket is bound, so bind errors are only logged
asynchronously; change Start() to synchronously create and bind a listener first
(e.g. call net.Listen("tcp", addr) or http.Listen with the same address) and
return the listen error immediately if it fails, then launch the goroutine to
call m.httpServer.Serve(listener) (or go func() { if err :=
m.httpServer.Serve(l); err != nil && !errors.Is(err, http.ErrServerClosed) {
m.logger.Error("MCP multi-server failed", zap.Error(err)) } }()), ensuring
Start() returns the binding error instead of nil; keep existing logging via
m.logger.Info/ m.logger.Error for server lifetime errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 55251841-1a90-4776-adb7-4cdb7df60d5e

📥 Commits

Reviewing files that changed from the base of the PR and between 992b6c6 and 70d002a.

⛔ Files ignored due to path filters (8)
  • buf.lock is excluded by !**/*.lock
  • demo/code-mode/mcp-stdio-proxy/go.sum is excluded by !**/*.sum
  • demo/code-mode/yoko-mock/go.sum is excluded by !**/*.sum
  • router-tests/go.sum is excluded by !**/*.sum
  • router/gen/proto/buf/validate/validate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • router/gen/proto/wg/cosmo/code_mode/yoko/v1/yoko.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • router/gen/proto/wg/cosmo/code_mode/yoko/v1/yokov1connect/yoko.connect.go is excluded by !**/gen/**
  • router/go.sum is excluded by !**/*.sum
📒 Files selected for processing (120)
  • .gitignore
  • Makefile
  • buf.yaml
  • demo/code-mode-connect/README.md
  • demo/code-mode-connect/router-config.yaml
  • demo/code-mode-connect/start.sh
  • demo/code-mode/.gitignore
  • demo/code-mode/Makefile
  • demo/code-mode/README.md
  • demo/code-mode/graph.yaml
  • demo/code-mode/mcp-configs/README.md
  • demo/code-mode/mcp-configs/claude.desktop.json
  • demo/code-mode/mcp-configs/claude.mcp.json
  • demo/code-mode/mcp-configs/codex.toml
  • demo/code-mode/mcp-configs/sample-prompts/01-search-employees.txt
  • demo/code-mode/mcp-configs/sample-prompts/02-execute-fetch.txt
  • demo/code-mode/mcp-configs/sample-prompts/03-multi-tool.txt
  • demo/code-mode/mcp-configs/sample-prompts/04-mutation-not-approved.txt
  • demo/code-mode/mcp-stdio-proxy/go.mod
  • demo/code-mode/mcp-stdio-proxy/main.go
  • demo/code-mode/mcp-stdio-proxy/main_test.go
  • demo/code-mode/prepare-schemas.sh
  • demo/code-mode/router-config.yaml
  • demo/code-mode/run_subgraphs_subset.sh
  • demo/code-mode/start.sh
  • demo/code-mode/yoko-mock/.gitignore
  • demo/code-mode/yoko-mock/README.md
  • demo/code-mode/yoko-mock/go.mod
  • demo/code-mode/yoko-mock/main.go
  • demo/code-mode/yoko-mock/main_test.go
  • demo/code-mode/yoko-mock/schema.graphql
  • demo/pkg/subgraphs/availability/subgraph/schema.resolvers.go
  • demo/pkg/subgraphs/employees/subgraph/schema.resolvers.go
  • demo/pkg/subgraphs/mood/subgraph/schema.resolvers.go
  • demo/pkg/subgraphs/subgraphs.go
  • proto/wg/cosmo/code_mode/yoko/v1/yoko.proto
  • router-tests/code_mode_named_ops_test.go
  • router-tests/go.mod
  • router-tests/testenv/testenv.go
  • router/core/graph_server.go
  • router/core/router.go
  • router/core/router_config.go
  • router/go.mod
  • router/internal/codemode/calltrace/calltrace.go
  • router/internal/codemode/calltrace/calltrace_test.go
  • router/internal/codemode/deps.go
  • router/internal/codemode/harness/envelope.go
  • router/internal/codemode/harness/envelope_test.go
  • router/internal/codemode/harness/pipeline.go
  • router/internal/codemode/harness/pipeline_test.go
  • router/internal/codemode/harness/shape.go
  • router/internal/codemode/harness/shape_test.go
  • router/internal/codemode/harness/transpile.go
  • router/internal/codemode/harness/transpile_test.go
  • router/internal/codemode/observability/logging.go
  • router/internal/codemode/observability/logging_test.go
  • router/internal/codemode/observability/metrics.go
  • router/internal/codemode/observability/metrics_test.go
  • router/internal/codemode/observability/tracing.go
  • router/internal/codemode/observability/tracing_test.go
  • router/internal/codemode/sandbox/errors.go
  • router/internal/codemode/sandbox/execute.go
  • router/internal/codemode/sandbox/headers.go
  • router/internal/codemode/sandbox/host.go
  • router/internal/codemode/sandbox/preamble.go
  • router/internal/codemode/sandbox/preamble_test.go
  • router/internal/codemode/sandbox/sandbox.go
  • router/internal/codemode/sandbox/sandbox_preamble.js
  • router/internal/codemode/sandbox/sandbox_test.go
  • router/internal/codemode/sandbox/semaphore.go
  • router/internal/codemode/sandbox/validation.go
  • router/internal/codemode/server/approval.go
  • router/internal/codemode/server/approval_test.go
  • router/internal/codemode/server/descriptions/descriptions.go
  • router/internal/codemode/server/descriptions/execute_source.md
  • router/internal/codemode/server/descriptions/execute_tool.md
  • router/internal/codemode/server/descriptions/persisted_ops_resource.md
  • router/internal/codemode/server/descriptions/search_tool.md
  • router/internal/codemode/server/execute_handler.go
  • router/internal/codemode/server/execute_handler_test.go
  • router/internal/codemode/server/lifecycle.go
  • router/internal/codemode/server/lifecycle_test.go
  • router/internal/codemode/server/observability_handler_test.go
  • router/internal/codemode/server/search_handler.go
  • router/internal/codemode/server/search_handler_test.go
  • router/internal/codemode/server/server.go
  • router/internal/codemode/server/server_test.go
  • router/internal/codemode/server/session.go
  • router/internal/codemode/storage/memory_backend.go
  • router/internal/codemode/storage/memory_backend_test.go
  • router/internal/codemode/storage/naming.go
  • router/internal/codemode/storage/naming_test.go
  • router/internal/codemode/storage/redis_backend.go
  • router/internal/codemode/storage/redis_backend_test.go
  • router/internal/codemode/storage/storage.go
  • router/internal/codemode/storage/types.go
  • router/internal/codemode/tsgen/bundle_test.go
  • router/internal/codemode/tsgen/graphql.go
  • router/internal/codemode/tsgen/tsgen.go
  • router/internal/codemode/tsgen/tsgen_test.go
  • router/internal/codemode/tsgen/typescript.go
  • router/internal/codemode/yoko/client.go
  • router/internal/codemode/yoko/client_test.go
  • router/internal/codemode/yoko/searcher.go
  • router/mcp.config.yaml
  • router/pkg/codemode/varschema/varschema.go
  • router/pkg/codemode/varschema/varschema_test.go
  • router/pkg/config/code_mode_config_test.go
  • router/pkg/config/code_mode_validation.go
  • router/pkg/config/config.go
  • router/pkg/config/config.schema.json
  • router/pkg/config/fixtures/full.yaml
  • router/pkg/config/testdata/config_defaults.json
  • router/pkg/config/testdata/config_full.json
  • router/pkg/grpcconnector/grpcplugin/grpc_plugin.go
  • router/pkg/mcpserver/introspection.go
  • router/pkg/mcpserver/multi_server.go
  • router/pkg/mcpserver/operations_watcher.go
  • router/pkg/mcpserver/server.go
  • router/pkg/schemaloader/loader.go

Comment on lines +158 to +162
	done := make(chan awaitResult, 1)
	go func() {
		awaited, err := value.Await()
		done <- awaitResult{value: awaited, err: err}
	}()

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Go, can a defer with recover in one goroutine catch a panic raised in another goroutine?

💡 Result:

No, a defer with recover in one goroutine cannot catch a panic raised in another goroutine. Recover only works within the same goroutine that is panicking, as each goroutine has its own independent call stack and panic handling [1][2][3][4]. The official Go blog explains that recover regains control of a panicking goroutine and is only useful inside deferred functions during panic unwinding in that goroutine [1]. Multiple sources confirm that panics are per-goroutine: a panic in one goroutine terminates only that goroutine unless recovered within it, and cannot be intercepted by another [2][3][4][5]. To handle panics from spawned goroutines, each must wrap its code with its own defer-recover [2][6][4].



🏁 Script executed:

#!/bin/bash
# Locate the execute.go file
find . -name "execute.go" -type f | head -20

Repository: wundergraph/cosmo

Length of output: 107


🏁 Script executed:

#!/bin/bash
# Read the relevant section around lines 158-162
# First, let's check the file exists and get its content
if [ -f "./router/internal/codemode/sandbox/execute.go" ]; then
  wc -l "./router/internal/codemode/sandbox/execute.go"
  echo "=== Lines 140-180 ==="
  sed -n '140,180p' "./router/internal/codemode/sandbox/execute.go"
fi

Repository: wundergraph/cosmo

Length of output: 1074


🏁 Script executed:

#!/bin/bash
# Look for the Await method definition and awaitResult type
rg -A 5 "type awaitResult" . --type go

Repository: wundergraph/cosmo

Length of output: 497


🏁 Script executed:

#!/bin/bash
# Search for any documentation or comments about Await panicking
rg -B 3 -A 3 "Await.*panic|panic.*Await" . --type go

Repository: wundergraph/cosmo

Length of output: 43


🏁 Script executed:

#!/bin/bash
# Find the Execute function definition and look for defer recover
sed -n '1,150p' "./router/internal/codemode/sandbox/execute.go" | head -80

Repository: wundergraph/cosmo

Length of output: 2505


🏁 Script executed:

#!/bin/bash
# Search for "func Execute" and "defer" to find recovery in Execute
rg "func.*Execute|defer.*recover" "./router/internal/codemode/sandbox/execute.go" -B 2 -A 2

Repository: wundergraph/cosmo

Length of output: 736


🏁 Script executed:

#!/bin/bash
# Look at the qjs package to understand if Await can panic
find . -name "*.go" -path "*/qjs/*" | head -10

Repository: wundergraph/cosmo

Length of output: 43


Recover inside the Await() goroutine.

Any panic from value.Await() happens in a spawned goroutine, so the recover() deferred in Execute cannot catch it. The file comments acknowledge that "qjs v0.0.6 panics from inside its Eval/Free/Close paths when the underlying wazero module is closed by context cancellation," which is precisely what can happen when rt.Close() is called during context cancellation (line 175), potentially unblocking the Await() call and triggering a panic in that goroutine. Without recovery in the spawned goroutine itself, this panic will crash the process instead of being folded into ExecuteResult.Error.

Suggested fix
 	done := make(chan awaitResult, 1)
 	go func() {
+		defer func() {
+			if r := recover(); r != nil {
+				done <- awaitResult{err: fmt.Errorf("qjs await panic: %v", r)}
+			}
+		}()
 		awaited, err := value.Await()
 		done <- awaitResult{value: awaited, err: err}
 	}()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/sandbox/execute.go` around lines 158 - 162, The
goroutine that calls value.Await() can panic (and that panic won't be caught by
Execute's defer), so modify the anonymous goroutine (the one that sends to done)
to recover inside itself: wrap the call to value.Await() with a defer recover
block that converts any panic into an error and sends awaitResult with that
error on the done channel; otherwise send the normal awaited value and err as
before. Reference the anonymous goroutine that uses value.Await(), the
awaitResult type, the done channel, and the Execute function so the fix is
localized to ensuring panics from value.Await() are captured and propagated via
awaitResult.err instead of crashing the process.

Comment on lines +50 to +55
func (g *ElicitationGate) Decide(ctx context.Context, req sandbox.ApprovalRequest) (sandbox.ApprovalDecision, error) {
	if g == nil || g.elicitor == nil {
		decision := unsupportedElicitationDecision(errors.New("elicitor is not configured"))
		recordMutationApproval(ctx, decision)
		observability.LogElicitationOutcome(g.logger, SessionIDFromContext(ctx), decision.Approved, decision.Reason)
		return decision, nil

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Make the nil-receiver fallback actually nil-safe.

This branch handles g == nil, but then immediately calls observability.LogElicitationOutcome(g.logger, ...). A typed-nil *ElicitationGate will still panic here instead of returning the fallback decline.

Suggested fix
 func (g *ElicitationGate) Decide(ctx context.Context, req sandbox.ApprovalRequest) (sandbox.ApprovalDecision, error) {
 	if g == nil || g.elicitor == nil {
+		logger := zap.NewNop()
+		if g != nil && g.logger != nil {
+			logger = g.logger
+		}
 		decision := unsupportedElicitationDecision(errors.New("elicitor is not configured"))
 		recordMutationApproval(ctx, decision)
-		observability.LogElicitationOutcome(g.logger, SessionIDFromContext(ctx), decision.Approved, decision.Reason)
+		observability.LogElicitationOutcome(logger, SessionIDFromContext(ctx), decision.Approved, decision.Reason)
 		return decision, nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/approval.go` around lines 50 - 55, The
nil-receiver branch in ElicitationGate.Decide uses g.logger after checking g ==
nil, which still panics for a typed nil receiver; change the branch to avoid
dereferencing g by creating a local logger variable (e.g., logger := g.logger if
g != nil else a nil-safe/default logger) before calling recordMutationApproval
and observability.LogElicitationOutcome, and pass that local logger (or a safe
placeholder) to observability.LogElicitationOutcome so the fallback path is
truly nil-safe; keep unsupportedElicitationDecision and recordMutationApproval
behavior unchanged.

Comment on lines +363 to +365
	cop := http.NewCrossOriginProtection()
	cop.AddInsecureBypassPattern("/{path...}")


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Go's net/http CrossOriginProtection, what does AddInsecureBypassPattern("/{path...}") do? Does it exempt all request paths from cross-origin checks?

💡 Result:

AddInsecureBypassPattern("/{path...}") registers the pattern "/{path...}" with an internal ServeMux, exempting requests whose paths match this pattern from cross-origin checks in CrossOriginProtection [1][2][3]. The pattern "/{path...}" uses Go 1.22+ ServeMux syntax where {path...} is a catch-all wildcard matching the remainder of any path after the leading slash, thus matching all possible request paths (e.g., "/", "/foo", "/api/bar", etc.) [4][5][6]. A request is exempt if the internal ServeMux's Handler(req) returns the sentinel handler for that pattern [7][2]. Therefore, yes, it exempts all request paths from cross-origin checks [8]. Note that documentation specifies only direct matches are permitted, excluding those requiring ServeMux redirects (e.g., path cleaning or trailing slashes), but "/{path...}" matches all paths directly without redirects [1]. This was introduced in Go 1.25 [9]. There was a CVE (GO-2025-3955) about unintended bypasses due to ServeMux redirect behavior with other patterns, fixed in 1.25.1 [10], but "/{path...}" intentionally bypasses everything.



🏁 Script executed:

fd -type f -name "server.go" | grep -E "codemode|router"

Repository: wundergraph/cosmo

Length of output: 232


🏁 Script executed:

cat -n router/internal/codemode/server/server.go | head -400 | tail -150

Repository: wundergraph/cosmo

Length of output: 6304


The catch-all insecure bypass turns cross-origin protection off.

AddInsecureBypassPattern("/{path...}") matches all request paths (the {path...} syntax is a Go 1.22+ catch-all wildcard), so every request to this server skips the protection you just created. This leaves the MCP surface open to cross-origin/CSRF-style requests unless the bypass is narrowed or removed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/server.go` around lines 363 - 365, The code
creates a CrossOriginProtection via http.NewCrossOriginProtection() and then
disables it for all paths by calling cop.AddInsecureBypassPattern("/{path...}");
remove that catch-all bypass or replace it with a narrowly scoped pattern that
only matches the specific safe endpoints that truly need bypassing (e.g., exact
paths or limited prefixes), so that cop (the CrossOriginProtection instance)
enforces protections for all other routes; locate the cop variable and the call
to cop.AddInsecureBypassPattern("/{path...}") and either delete the call or
change the pattern to a specific, minimal path expression.

Comment on lines +181 to +202
func (b *MemoryBackend) Bundle(ctx context.Context, sessionID string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}

	value, ok := b.sessions.Load(sessionID)
	if !ok {
		return b.renderCapped(ctx, nil)
	}
	session := value.(*memSession)

	session.mu.Lock()
	defer session.mu.Unlock()

	session.lastUsed = b.now()
	if session.bundleValid {
		return session.bundle, nil
	}

	if b.renderer == nil {
		return "", errors.New("code mode storage renderer is not configured")
	}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard the empty-session path against a nil renderer.

When the session does not exist yet, Bundle returns b.renderCapped(ctx, nil) before the b.renderer == nil check. A backend constructed without a renderer will panic on the first bundle lookup for an empty session.

💡 Minimal fix
 func (b *MemoryBackend) Bundle(ctx context.Context, sessionID string) (string, error) {
 	if err := ctx.Err(); err != nil {
 		return "", err
 	}
+	if b.renderer == nil {
+		return "", errors.New("code mode storage renderer is not configured")
+	}
 
 	value, ok := b.sessions.Load(sessionID)
 	if !ok {
 		return b.renderCapped(ctx, nil)
 	}
@@
-	if b.renderer == nil {
-		return "", errors.New("code mode storage renderer is not configured")
-	}
-
 	ops := append([]SessionOp(nil), session.ops...)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Original:

func (b *MemoryBackend) Bundle(ctx context.Context, sessionID string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	value, ok := b.sessions.Load(sessionID)
	if !ok {
		return b.renderCapped(ctx, nil)
	}
	session := value.(*memSession)
	session.mu.Lock()
	defer session.mu.Unlock()
	session.lastUsed = b.now()
	if session.bundleValid {
		return session.bundle, nil
	}
	if b.renderer == nil {
		return "", errors.New("code mode storage renderer is not configured")
	}

Suggested:

func (b *MemoryBackend) Bundle(ctx context.Context, sessionID string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	if b.renderer == nil {
		return "", errors.New("code mode storage renderer is not configured")
	}
	value, ok := b.sessions.Load(sessionID)
	if !ok {
		return b.renderCapped(ctx, nil)
	}
	session := value.(*memSession)
	session.mu.Lock()
	defer session.mu.Unlock()
	session.lastUsed = b.now()
	if session.bundleValid {
		return session.bundle, nil
	}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/storage/memory_backend.go` around lines 181 - 202,
The Bundle method in MemoryBackend calls b.renderCapped(ctx, nil) for missing
sessions before checking b.renderer, so a nil renderer panics; update Bundle to
first check if b.renderer == nil (same check used later) and return a clear
error when renderer is not configured, or otherwise call b.renderCapped; ensure
you reference the MemoryBackend.renderer nil guard at the top of Bundle (before
calling renderCapped), preserving existing behavior for non-nil renderer and
keeping session.mu, session.lastUsed and session.bundleValid logic intact.

Comment on lines +69 to +73
				if pendingReload {
					pendingReload = false
					logger.Info("operations directory changed; reloading tools and notifying connected clients")
					onChange()
				}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard onChange() against panics to avoid process-wide crash.

Line 72 invokes the callback without recovery. If callback code panics, the panic escapes this goroutine and can crash the router.

🛡️ Proposed fix
 				if pendingReload {
 					pendingReload = false
 					logger.Info("operations directory changed; reloading tools and notifying connected clients")
-					onChange()
+					func() {
+						defer func() {
+							if r := recover(); r != nil {
+								logger.Error("operations watcher callback panicked", zap.Any("panic", r), zap.Stack("stack"))
+							}
+						}()
+						onChange()
+					}()
 				}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Original:

				if pendingReload {
					pendingReload = false
					logger.Info("operations directory changed; reloading tools and notifying connected clients")
					onChange()
				}

Suggested:

				if pendingReload {
					pendingReload = false
					logger.Info("operations directory changed; reloading tools and notifying connected clients")
					func() {
						defer func() {
							if r := recover(); r != nil {
								logger.Error("operations watcher callback panicked", zap.Any("panic", r), zap.Stack("stack"))
							}
						}()
						onChange()
					}()
				}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/operations_watcher.go` around lines 69 - 73, The call to
onChange() in the operations watcher can panic and crash the process; wrap the
invocation in a panic-recovery guard so panics are recovered and logged instead
of escaping the goroutine. For example, create a small safe wrapper around
onChange (or inline a defer func() { if r := recover(); r != nil {
logger.Error("onChange panic", "err", r) } }()) immediately before calling
onChange, then call the wrapped function; reference the existing pendingReload
boolean and logger.Info("operations directory changed; reloading tools and
notifying connected clients") so behavior and logging remain the same.

…afting

Exposes a new MCP tool that takes a natural-language prompt, calls Yoko search,
and returns named GraphQL operations as a preview — without registering them in
the session bundle. Intended for authoring assistance; use code_mode_search_tools
when the operation needs to be callable from code_mode_run_js.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@router/internal/codemode/server/query_handler.go`:
- Around line 76-79: The function validating the prompt uses
strings.TrimSpace(input.Prompt) but returns the original input.Prompt, so
leading/trailing whitespace can slip through; update code_mode_generate_query
(or the enclosing function) to assign the trimmed value (e.g., trimmed :=
strings.TrimSpace(input.Prompt)) and return that trimmed string instead of
input.Prompt after the nil error to keep validation and returned value
consistent.
- Line 53: The handler currently returns textResult(strings.Join(documents,
"\n\n")), nil even when documents is empty; update the code that prepares the
final result (the documents slice and the call to textResult) to detect when
len(documents) == 0 and return an explicit error or a clear "no results" text
result instead of an empty body so callers can distinguish a true success from
"no operations matched" (refer to the documents variable and the textResult(...)
call in this function); ensure the returned error/message clearly references
that Yoko returned no usable named operations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 23f639f3-25cf-4030-8227-2ec57817f596

📥 Commits

Reviewing files that changed from the base of the PR and between 70d002a and 0fe43b8.

📒 Files selected for processing (5)
  • router/internal/codemode/server/descriptions/descriptions.go
  • router/internal/codemode/server/descriptions/generate_query_tool.md
  • router/internal/codemode/server/query_handler.go
  • router/internal/codemode/server/server.go
  • router/internal/codemode/server/server_test.go
✅ Files skipped from review due to trivial changes (1)
  • router/internal/codemode/server/descriptions/generate_query_tool.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • router/internal/codemode/server/descriptions/descriptions.go
  • router/internal/codemode/server/server.go
  • router/internal/codemode/server/server_test.go

		documents = append(documents, prependOperationDescription(doc, q.GetDescription()))
	}

	return textResult(strings.Join(documents, "\n\n")), nil

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Return an explicit result when no operations survive filtering.

Line 53 currently returns success with an empty text body when Yoko returns no usable named operations. That’s hard for callers to distinguish from a formatting issue. Returning a clear tool error (or explicit “no results” message) improves flow handling.

Proposed fix
-	return textResult(strings.Join(documents, "\n\n")), nil
+	if len(documents) == 0 {
+		return toolErrorResult("code_mode_generate_query: no named operations found for the prompt"), nil
+	}
+	return textResult(strings.Join(documents, "\n\n")), nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/query_handler.go` at line 53, The handler
currently returns textResult(strings.Join(documents, "\n\n")), nil even when
documents is empty; update the code that prepares the final result (the
documents slice and the call to textResult) to detect when len(documents) == 0
and return an explicit error or a clear "no results" text result instead of an
empty body so callers can distinguish a true success from "no operations
matched" (refer to the documents variable and the textResult(...) call in this
function); ensure the returned error/message clearly references that Yoko
returned no usable named operations.

Comment on lines +76 to +79
	if strings.TrimSpace(input.Prompt) == "" {
		return "", errors.New("code_mode_generate_query: prompt must be a non-empty string")
	}
	return input.Prompt, nil

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Return the trimmed prompt after validation.

Line 76 validates with TrimSpace, but Line 79 returns the original string. Returning the trimmed value avoids sending leading/trailing whitespace to Yoko and keeps behavior consistent with validation.

Proposed fix
-	if strings.TrimSpace(input.Prompt) == "" {
+	prompt := strings.TrimSpace(input.Prompt)
+	if prompt == "" {
 		return "", errors.New("code_mode_generate_query: prompt must be a non-empty string")
 	}
-	return input.Prompt, nil
+	return prompt, nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/query_handler.go` around lines 76 - 79, The
function validating the prompt uses strings.TrimSpace(input.Prompt) but returns
the original input.Prompt, so leading/trailing whitespace can slip through;
update code_mode_generate_query (or the enclosing function) to assign the
trimmed value (e.g., trimmed := strings.TrimSpace(input.Prompt)) and return that
trimmed string instead of input.Prompt after the nil error to keep validation
and returned value consistent.

asoorm added 2 commits May 11, 2026 08:47
Drop the codeMode == nil guard on the operations directory watcher and the
per-op tool build in Reload. Code Mode (yoko tools) and per-op tools from the
filesystem now coexist on the same MCP server — useful for demos that want to
expose curated .graphql operations as tools while keeping the yoko surface.
Replace the trailing-edge settle (fire after one quiet tick) with a
leading-edge debounce that fires onChange on the first tick a change is
observed, then coalesces further changes into at most one trailing fire
after a two-tick cooldown. Cooldown is counted in ticks rather than
wall-clock time, because ticker jitter can straddle a wall-clock boundary and
defeat coalescing.

First-notification latency drops from 2-Nx interval to ~1x interval —
with watch_interval: 1s, clients see tools/list_changed within ~1s of a
save instead of 2-10s. Bursts (atomic save + formatter rewrite, etc.)
coalesce into ≤2 fires; applyToolDiff already keeps client noise low by
suppressing notifications when tool fingerprints are unchanged.
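
A self-contained sketch of that debounce shape, with changed() standing in for the directory scan and onChange for the notification; the real watcher's names and wiring differ:

package watcher

import (
	"context"
	"time"
)

// debounceLoop fires on the first tick that observes a change (leading edge),
// then coalesces further changes into at most one trailing fire once a
// two-tick cooldown has elapsed.
func debounceLoop(ctx context.Context, interval time.Duration, changed func() bool, onChange func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	cooldown := 0    // ticks left before another fire is allowed
	pending := false // a change arrived while cooling down

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if cooldown > 0 {
				cooldown--
			}
			if changed() {
				if cooldown == 0 {
					onChange() // leading edge: fire immediately
					cooldown = 2
				} else {
					pending = true // coalesce into one trailing fire
				}
			} else if pending && cooldown == 0 {
				onChange() // trailing edge after the cooldown
				pending = false
				cooldown = 2
			}
		}
	}
}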

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
router/pkg/mcpserver/operations_watcher_test.go (1)

53-66: ⚡ Quick win

Remove unused firedAt slice and mutex.

firedAt is populated but never read or asserted on. The mutex guarding it is also unnecessary since only calls is used for verification.

♻️ Proposed cleanup
 	var calls atomic.Int32
-	var firedAt []time.Time
-	var mu sync.Mutex

 	ctx, cancel := context.WithCancel(context.Background())
 	defer cancel()

 	interval := 50 * time.Millisecond
 	require.NoError(t, WatchOperationsDir(ctx, dir, interval, func() {
-		mu.Lock()
-		firedAt = append(firedAt, time.Now())
-		mu.Unlock()
 		calls.Add(1)
 	}, zaptest.NewLogger(t)))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/operations_watcher_test.go` around lines 53 - 66, Remove
the unused firedAt slice and its mutex mu from the test since only the atomic
calls counter is asserted; in the WatchOperationsDir test remove the
declarations "var firedAt []time.Time" and "var mu sync.Mutex" and the block
inside the callback that locks mu and appends to firedAt (leave only
calls.Add(1)), ensuring the test still creates the context, calls
WatchOperationsDir, and verifies calls as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@router/pkg/mcpserver/operations_watcher_test.go`:
- Around line 53-66: Remove the unused firedAt slice and its mutex mu from the
test since only the atomic calls counter is asserted; in the WatchOperationsDir
test remove the declarations "var firedAt []time.Time" and "var mu sync.Mutex"
and the block inside the callback that locks mu and appends to firedAt (leave
only calls.Add(1)), ensuring the test still creates the context, calls
WatchOperationsDir, and verifies calls as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 418e3f0e-60f8-4a03-889f-37067f95b0e8

📥 Commits

Reviewing files that changed from the base of the PR and between 70d082d and 86817ce.

📒 Files selected for processing (2)
  • router/pkg/mcpserver/operations_watcher.go
  • router/pkg/mcpserver/operations_watcher_test.go

Surface one INFO log per AddTool/RemoveTools call so propagation delay
into downstream MCP clients can be timed against the source event.

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
router/pkg/mcpserver/server.go (4)

1189-1209: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Forward s.upstreamHeaders on every GraphQL request.

WithUpstreamHeaders populates server state, but this request builder never applies it. Right now only request-derived headers are forwarded, so any static upstream auth or tenant headers configured for the collection are dropped.

Suggested fix
 	} else {
 		// Copy all headers from the MCP request
 		for key, values := range reqHeaders {
 			// Skip headers that should not be forwarded
 			if _, ok := headers.SkippedHeaders[key]; ok {
 				continue
 			}
 			for _, value := range values {
 				req.Header.Add(key, value)
 			}
 		}
 	}
+
+	for key, value := range s.upstreamHeaders {
+		req.Header.Set(key, value)
+	}
 
 	// Override specific headers that must be set for GraphQL requests
 	req.Header.Set("Accept", "application/json")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/server.go` around lines 1189 - 1209, The code only
forwards headersFromContext(ctx) to the proxied GraphQL request but omits the
server-level upstream headers stored in s.upstreamHeaders (set by
WithUpstreamHeaders), causing static auth/tenant headers to be dropped; update
the request-building logic where req is populated (near headersFromContext
usage) to first merge s.upstreamHeaders into req.Header (or add them after the
loop) while still skipping headers in headers.SkippedHeaders and without
overwriting explicit request headers unless intended, so both s.upstreamHeaders
and the per-request headers from headersFromContext(ctx) are applied before
setting Accept/Content-Type.

649-671: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply upstreamSchemaSDL before rebuilding tools.

WithUpstreamSchemaSDL is wired into the server state, but Reload() still uses the incoming schema everywhere: Line 659 forwards it to Code Mode, Line 668 caches it, and Line 671 builds the OperationsManager from it. That means upstream-bound collections still derive their tool surface and variable schemas from the local supergraph instead of the configured upstream SDL.
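
The shape being asked for is roughly the following; the field name upstreamSchemaSDL is taken from the comment and is an assumption, not the actual struct:

package mcpserver

// effectiveSDL picks the configured upstream SDL when one is set, otherwise
// the incoming supergraph schema. Reload would parse this once and feed the
// result to Code Mode, the schema compiler, and the operations manager.
func effectiveSDL(upstreamSchemaSDL, incomingSDL string) string {
	if upstreamSchemaSDL != "" {
		return upstreamSchemaSDL
	}
	return incomingSDL
}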

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/server.go` around lines 649 - 671, Reload currently uses
the incoming schema everywhere; if an upstream SDL is configured via
WithUpstreamSchemaSDL you must apply that upstream SDL before forwarding to code
mode, caching, and rebuilding tools. Change Reload to prefer the server's
upstream schema SDL (e.g. a field like s.upstreamSchemaSDL or similar) when
present — print that SDL for s.codeMode.Reload, assign s.lastSchema to the
upstream-parsed document (or a merged/parsed version derived from
s.upstreamSchemaSDL) before setting s.lastFieldConfigs, and pass that
upstream/adjusted schema into NewSchemaCompiler and NewOperationsManager instead
of the raw incoming schema so tools and variable schemas are built from the
upstream SDL.

788-810: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Allow mounted servers to tear down their background state.

RegisterRoutes() explicitly supports shared-listener mode, but Stop() returns on Line 789 before s.cancel() and s.codeMode.Stop(...) if s.httpServer was never created. That leaves per-server watchers/storage alive for mounted instances.

Suggested fix
 func (s *GraphQLSchemaServer) Stop(ctx context.Context) error {
-	if s.httpServer == nil {
-		return fmt.Errorf("server is not started")
-	}
-
 	s.logger.Debug("shutting down MCP server")
 
 	// Cancel the server's context to stop background operations (e.g., JWKS key refresh)
 	if s.cancel != nil {
 		s.cancel()
@@
-	if err := s.httpServer.Shutdown(shutdownCtx); err != nil {
-		return fmt.Errorf("failed to gracefully shutdown MCP server: %w", err)
+	if s.httpServer != nil {
+		if err := s.httpServer.Shutdown(shutdownCtx); err != nil {
+			return fmt.Errorf("failed to gracefully shutdown MCP server: %w", err)
+		}
 	}
 
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/server.go` around lines 788 - 810, The Stop method
currently returns early if s.httpServer == nil which prevents running s.cancel()
and s.codeMode.Stop(...), leaving mounted servers' background watchers alive;
change Stop so it does not return immediately when s.httpServer is nil — always
call s.cancel() (if non-nil) and attempt to stop s.codeMode via
codeMode.Stop(ctx) so mounted instances can tear down, and only skip calling
s.httpServer.Shutdown when httpServer is nil (log that it's not started) instead
of returning before cleanup; locate symbols Stop, s.httpServer, s.cancel,
s.codeMode.Stop, and httpServer.Shutdown to make this change.

821-1002: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Suppress the standard GraphQL tools when Code Mode is mounted.

buildDesiredTools() still adds get_schema, execute_graphql, per-operation tools, and get_operation_info even when WithCodeMode is set. That contradicts the option contract and widens the mounted endpoint unexpectedly.

Suggested fix
 func (s *GraphQLSchemaServer) buildDesiredTools() (map[string]desiredTool, error) {
 	desired := make(map[string]desiredTool)
+
+	if s.codeMode != nil {
+		return desired, nil
+	}
 
 	// get_schema — only when exposeSchema is enabled.
 	if s.exposeSchema {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/server.go` around lines 821 - 1002, buildDesiredTools
currently always adds GraphQL tools even when Code Mode is mounted; update
buildDesiredTools (method on GraphQLSchemaServer) to detect the WithCodeMode
flag (e.g., s.withCodeMode or s.codeModeMounted—use the existing field name in
the struct) and suppress adding the standard GraphQL tools (get_schema,
execute_graphql, per-operation tools generated in the operations loop, and
get_operation_info) when that flag is true by returning an empty desired map (or
skipping the tool-creation blocks) so the mounted endpoint surface matches the
WithCodeMode contract; ensure you still respect other flags (exposeSchema,
enableArbitraryOperations) when Code Mode is not mounted and avoid altering
unrelated logic such as toolScopes handling unless no tools are produced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@router/pkg/mcpserver/server.go`:
- Around line 1189-1209: The code only forwards headersFromContext(ctx) to the
proxied GraphQL request but omits the server-level upstream headers stored in
s.upstreamHeaders (set by WithUpstreamHeaders), causing static auth/tenant
headers to be dropped; update the request-building logic where req is populated
(near headersFromContext usage) to first merge s.upstreamHeaders into req.Header
(or add them after the loop) while still skipping headers in
headers.SkippedHeaders and without overwriting explicit request headers unless
intended, so both s.upstreamHeaders and the per-request headers from
headersFromContext(ctx) are applied before setting Accept/Content-Type.
- Around line 649-671: Reload currently uses the incoming schema everywhere; if
an upstream SDL is configured via WithUpstreamSchemaSDL you must apply that
upstream SDL before forwarding to code mode, caching, and rebuilding tools.
Change Reload to prefer the server's upstream schema SDL (e.g. a field like
s.upstreamSchemaSDL or similar) when present — print that SDL for
s.codeMode.Reload, assign s.lastSchema to the upstream-parsed document (or a
merged/parsed version derived from s.upstreamSchemaSDL) before setting
s.lastFieldConfigs, and pass that upstream/adjusted schema into
NewSchemaCompiler and NewOperationsManager instead of the raw incoming schema so
tools and variable schemas are built from the upstream SDL.
- Around line 788-810: The Stop method currently returns early if s.httpServer
== nil which prevents running s.cancel() and s.codeMode.Stop(...), leaving
mounted servers' background watchers alive; change Stop so it does not return
immediately when s.httpServer is nil — always call s.cancel() (if non-nil) and
attempt to stop s.codeMode via codeMode.Stop(ctx) so mounted instances can tear
down, and only skip calling s.httpServer.Shutdown when httpServer is nil (log
that it's not started) instead of returning before cleanup; locate symbols Stop,
s.httpServer, s.cancel, s.codeMode.Stop, and httpServer.Shutdown to make this
change.
- Around line 821-1002: buildDesiredTools currently always adds GraphQL tools
even when Code Mode is mounted; update buildDesiredTools (method on
GraphQLSchemaServer) to detect the WithCodeMode flag (e.g., s.withCodeMode or
s.codeModeMounted—use the existing field name in the struct) and suppress adding
the standard GraphQL tools (get_schema, execute_graphql, per-operation tools
generated in the operations loop, and get_operation_info) when that flag is true
by returning an empty desired map (or skipping the tool-creation blocks) so the
mounted endpoint surface matches the WithCodeMode contract; ensure you still
respect other flags (exposeSchema, enableArbitraryOperations) when Code Mode is
not mounted and avoid altering unrelated logic such as toolScopes handling
unless no tools are produced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ee5f7511-048d-4048-8ee6-982392577305

📥 Commits

Reviewing files that changed from the base of the PR and between 86817ce and 7f86a29.

📒 Files selected for processing (1)
  • router/pkg/mcpserver/server.go
