[BUG] MCP list_agents / invoke_agent exclude all SandboxAgents (only lists Agent CRs, hardcodes reason=="DeploymentReady") #2123

Description

@shrinedogg

📋 Prerequisites

  • I have searched the existing issues to avoid creating a duplicate
  • By submitting this issue, you agree to follow our Code of Conduct
  • I am using the latest version of the software
  • I have tried to clear cache/cookies or used incognito mode (if ui-related)
  • I can consistently reproduce this issue

🎯 Affected Service(s)

Multiple services / System-wide issue

🚦 Impact/Severity

Blocker

🐛 Bug Description

Summary

Substrate-backed SandboxAgents that are fully Ready (golden actor built, Accepted=True, reconciled into the kagent DB) are never returned by the MCP list_agents tool and can't be reached via invoke_agent: invoke_agent returns "agent <ns>/<name> not found or not ready". The kagent REST API (/agents) and UI list them correctly; only the MCP path drops them.

Root cause is in MCPHandler.listReadyAgents at go/core/internal/mcp/mcp_handler.go (current main):

// listReadyAgents returns agents that are accepted and deployment-ready.
func (h *MCPHandler) listReadyAgents(ctx context.Context) ([]AgentSummary, error) {
	agentList := &v1alpha2.AgentList{}                       // (a) Agent CRs only
	if err := h.kubeClient.List(ctx, agentList); err != nil {
		return nil, err
	}
	agents := make([]AgentSummary, 0, len(agentList.Items))
	for _, agent := range agentList.Items {
		deploymentReady := false
		accepted := false
		for _, condition := range agent.Status.Conditions {
			if condition.Type == "Ready" && condition.Reason == "DeploymentReady" && condition.Status == "True" { // (b)
				deploymentReady = true
			}
			if condition.Type == "Accepted" && condition.Status == "True" {
				accepted = true
			}
		}
		if !accepted || !deploymentReady {
			continue
		}
		...
	}
}

Two compounding problems:

  • (a) Wrong CR type — SandboxAgents are never listed. It lists v1alpha2.AgentList only; SandboxAgentList is never enumerated. So every substrate SandboxAgent is invisible to listReadyAgents regardless of status.
  • (b) Hardcoded condition.Reason == "DeploymentReady". Declarative Agents report Ready with reason DeploymentReady; substrate SandboxAgents report Ready with reason WorkloadReady (set by reconcileSandboxAgentStatus → AgentReadyReasonWorkloadReady when sandboxBackend.ComputeReady returns True). So even if SandboxAgents were listed, this reason check would still drop them.

listReadyAgents feeds both handleListAgents (the list_agents tool) and readAgentsResource (the kagent://agents resource), so both the tool and the resource are affected.

🔄 Steps To Reproduce

  1. Install kagent 0.9.10 with substrate enabled and a SandboxAgent whose golden actor builds successfully (e.g. the k8s-agent from examples/substrate-openclaw, runtime go).
  2. Wait until kubectl get sandboxagent <name> shows READY=True (reason WorkloadReady) and ACCEPTED=True (reason Reconciled). The agent is in the kagent Postgres agents table and appears in the REST /agents API + UI.
  3. Call the MCP list_agents tool → the SandboxAgent is absent. Call invoke_agent with <ns>/<name> → Failed to send A2A message: agent <ns>/<name> not found or not ready.
# declarative Agent
$ kubectl get agent <decl-agent> -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}'
DeploymentReady
# substrate SandboxAgent
$ kubectl get sandboxagent <sub-agent> -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}'
WorkloadReady

🤔 Expected Behavior

MCP list_agents / invoke_agent treat a SandboxAgent that is Accepted=True and Ready=True as invocable, the same as a declarative Agent.

📱 Actual Behavior

No response

💻 Environment

  • kagent: v0.9.10 (also reproduced on main; listReadyAgents is unchanged there)
  • Substrate: v0.0.7
  • Run on Kubernetes 1.36.2 (homelab), substrate installed into a non-ate-system namespace

🔧 CLI Bug Report

No response

🔍 Additional Context

This was the final blocker after fixing a stack of ate-system-hardcoding bugs (conn-spec, jwks-url, atelet-discovery namespace, env-source RBAC namespace, SandboxConfig/runsc, and a Cilium egress policy) that had kept the SandboxAgents at Ready=False. Once those were resolved the agents reached Ready=True but remained un-invocable via MCP, which traced to this listReadyAgents limitation.

📋 Logs

📷 Screenshots

No response

🙋 Are you willing to contribute?

  • I am willing to submit a PR to fix this issue
