📋 Prerequisites
🎯 Affected Service(s)
Controller Service
🚦 Impact/Severity
Blocker
🐛 Bug Description
Agents with spec.memory configured intermittently fail memory search with HTTP 500, and turns are slow. The controller returns:
500 search failed: failed to increment access count: ERROR: deadlock detected (SQLSTATE 40P01)
The root cause is a non-deterministic row-lock order in IncrementMemoryAccessCount, triggered whenever multiple memory searches run concurrently over an overlapping set of rows — which PrefetchMemoryTool does on every first turn (one search per sentence). When the deadlock hits, the search returns 500 and the agent loses memory recall for that turn.
🔄 Steps To Reproduce
- Create an Agent with spec.memory set (an embedding ModelConfig) and store a handful of memories for it.
- Send a multi-sentence first message so PrefetchMemoryTool fans out several concurrent search_memory calls over the same top rows.
- Make sure at least two searches run concurrently and return overlapping rows (a single multi-sentence prompt is enough; two simultaneous turns also work).
- Observe SQLSTATE 40P01 (deadlock detected) in the controller logs and HTTP 500 on /api/memories/search; the client retries with backoff and often ends with "No memories found".
🤔 Expected Behavior
Concurrent memory searches update access_count without deadlocking. Memory search always returns its results (a best-effort access-count bookkeeping failure should never fail the search itself).
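One way to meet that expectation is to treat the access-count update as bookkeeping that cannot fail the request. The Python sketch below is illustrative only and is not the controller's code; `search_with_best_effort_count`, `run_search`, and `increment_access_count` are hypothetical names. A search error still propagates, but an increment error is logged and swallowed.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("memory-search")


def search_with_best_effort_count(
    run_search: Callable[[], Sequence[str]],
    increment_access_count: Callable[[Sequence[str]], None],
) -> list[str]:
    """Return search results even if the access-count update fails."""
    ids = list(run_search())  # a failure of the search itself still propagates
    if not ids:
        return ids  # nothing to count
    try:
        increment_access_count(ids)
    except Exception as exc:  # bookkeeping only: log and keep the results
        logger.warning("access-count increment failed (ignored): %s", exc)
    return ids
```

With this shape, a deadlock in the increment would cost one missed counter bump instead of a 500 and a lost memory recall.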
📱 Actual Behavior
Concurrent searches deadlock in Postgres. Root cause:
For each search, SearchAgentMemory runs the following statement as its own autocommit transaction:
UPDATE memory SET access_count = access_count + 1 WHERE id = ANY($1::text[]);
Postgres acquires the row locks in scan order, which can differ between two concurrent statements over an overlapping id set. The result is a circular wait and SQLSTATE 40P01: one statement is killed and the search returns 500. Controller log (repeats, with client backoff 1.16→2.16→3.17→4.2→5.2s):
{"level":"info","logger":"http-helpers","msg":"Responding with error","statusCode":500,"message":"search failed: failed to increment access count: ERROR: deadlock detected (SQLSTATE 40P01)"}
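The circular wait disappears once every caller takes its row locks in the same global order. The Python sketch below simulates that idea with in-process locks standing in for Postgres row locks; it is an analogy under stated assumptions, not controller code. The row ids `m1` to `m3`, `row_locks`, `access_count`, and both function names are hypothetical.

```python
import threading
from typing import Iterable

# Hypothetical stand-ins for Postgres row locks and the access_count column.
row_locks = {row_id: threading.Lock() for row_id in ("m1", "m2", "m3")}
access_count = {row_id: 0 for row_id in row_locks}


def increment_access_count(ids: Iterable[str]) -> None:
    """Lock rows in sorted id order, bump each counter, then release."""
    ordered = sorted(set(ids))  # the same global order for every caller
    held = []
    try:
        for row_id in ordered:
            row_locks[row_id].acquire()
            held.append(row_id)
        for row_id in ordered:
            access_count[row_id] += 1
    finally:
        for row_id in reversed(held):
            row_locks[row_id].release()


def run_concurrent_searches() -> dict:
    """Two 'searches' return the same rows in opposite scan order."""
    workers = [
        threading.Thread(target=increment_access_count, args=(["m1", "m2", "m3"],)),
        threading.Thread(target=increment_access_count, args=(["m3", "m2", "m1"],)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join(timeout=5)
    return dict(access_count)
```

If the `sorted` call were removed, the two workers could each hold one lock while waiting for the other's, which is the same shape as the 40P01 deadlock described above.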
Compounding factors:
- PrefetchMemoryTool searches memory once per sentence in parallel, which manufactures the concurrent overlapping searches.
- The task store re-POSTs the same task_id on every streaming event, amplifying write contention on the same Postgres.
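The first factor could be reduced by deduplicating the per-sentence queries and bounding how many run at once. The sketch below is a minimal illustration, not the actual PrefetchMemoryTool; `prefetch_memories`, the `search` callable, and the `max_concurrency` and `max_queries` defaults are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable


async def prefetch_memories(
    sentences: list[str],
    search: Callable[[str], Awaitable[list[str]]],
    max_concurrency: int = 2,  # assumed bound, not a kagent setting
    max_queries: int = 4,      # assumed cap on queries per turn
) -> list[str]:
    """Search once per distinct sentence, with bounded parallelism."""
    # Drop blanks and near-duplicates (case and whitespace) in first-seen order.
    distinct: dict[str, str] = {}
    for sentence in sentences:
        key = " ".join(sentence.split()).lower()
        if key:
            distinct.setdefault(key, sentence.strip())
    queries = list(distinct.values())[:max_queries]

    gate = asyncio.Semaphore(max_concurrency)

    async def bounded(query: str) -> list[str]:
        async with gate:  # at most max_concurrency searches in flight
            return await search(query)

    batches = await asyncio.gather(*(bounded(q) for q in queries))

    # Merge results, dropping memories returned by more than one query.
    merged: dict[str, None] = {}
    for batch in batches:
        for memory in batch:
            merged.setdefault(memory, None)
    return list(merged)
```

Bounding the fan-out lowers the chance of overlapping updates but does not remove it, so it complements a deterministic lock order instead of replacing it.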
💻 Environment
- OS and version: N/A (server-side controller + Python runtime, Linux containers)
- Kubernetes version: 1.34 (AWS EKS)
- Kubernetes provider: AWS EKS
- Browser (if applicable): N/A
- Application version: kagent controller from current main (reproduced on our v1.5.0 build; the code path is unchanged on main)
- Database: external Postgres (reproducible on any Postgres provider — the deadlock is in the SQL, not the provider)
🔧 CLI Bug Report
N/A — this is a server-side controller/agent-runtime bug, not a CLI issue, so kagent bug-report does not apply.
📋 Logs
Controller (repeats rapidly during a burst of concurrent memory searches):
{"level":"info","ts":"2026-06-30T13:54:56Z","logger":"http-helpers","msg":"Responding with error","statusCode":500,"message":"search failed: failed to increment access count: ERROR: deadlock detected (SQLSTATE 40P01)"}
{"level":"info","ts":"2026-06-30T13:54:57Z","logger":"http-helpers","msg":"Responding with error","statusCode":500,"message":"search failed: failed to increment access count: ERROR: deadlock detected (SQLSTATE 40P01)"}
... (12 in ~12s in our capture)
Agent-runtime side (client retries with linear backoff, then gives up):
retry in 1.16s → 2.16s → 3.17s → 4.20s → 5.20s
WARNING No memories found
📷 Screenshots
No response
🙋 Are you willing to contribute?
🔍 Additional Context
- IncrementMemoryAccessCount
- SearchAgentMemory (treats the increment error as fatal)
- UPDATE ... WHERE id IN (SELECT id ... ORDER BY id FOR UPDATE); make the access-count increment non-fatal; debounce identical task-store saves; dedupe + bound the prefetch fan-out.
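The "debounce identical task-store saves" idea can be sketched as a thin wrapper that skips a save when the payload has not changed since the last successful one. This is a hedged Python illustration, not the actual task store: `DebouncedTaskStore` and its `post` callable are hypothetical, and the comparison is content-based (a hash of the serialized task), with no timer involved.

```python
import hashlib
import json
from typing import Any, Callable


class DebouncedTaskStore:
    """Skip a save when the task payload is identical to the last one sent."""

    def __init__(self, post: Callable[[str, dict], None]) -> None:
        self._post = post  # hypothetical callable that performs the HTTP POST
        self._last_digest: dict[str, str] = {}

    def save(self, task_id: str, task: dict[str, Any]) -> bool:
        """POST the task only if its content changed; return True if sent."""
        body = json.dumps(task, sort_keys=True, default=str).encode()
        digest = hashlib.sha256(body).hexdigest()
        if self._last_digest.get(task_id) == digest:
            return False  # unchanged since the last save: no extra write
        self._post(task_id, task)
        self._last_digest[task_id] = digest  # recorded only after the POST succeeds
        return True
```

Because the digest is stored only after `post` returns, a failed POST is retried on the next event even if the payload is the same.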