
Fix #1808: fix: [memos-local-plugin] Dreaming background processing starves Gateway event l… #2002

Merged: syzsunshine219 merged 2 commits into dev-v2.0.22 from bugfix/autodev-1808 on Jul 2, 2026

Conversation

@Memtensor-AI

Collaborator

Description

Fix issue #1808 — memos-local-plugin no longer starves the OpenClaw Gateway event loop during plugin bootstrap.

Root cause: core.init() in apps/memos-local-plugin/core/pipeline/memory-core.ts synchronously awaited the full reflect → reward → L2-induction chain for every orphan / dirty closed episode in SQLite. On the reporter's 30K-trace / 324MB database that pinned the Node.js event loop for 3-5s+, long enough for OpenClaw's 3s WebSocket read probe to time out, so TUI and Control UI clients could not connect. A secondary symptom, failed LLM rescores being retried indefinitely with no backoff, kept the retries running across restarts and on every 10-minute periodic rescan.

Fix delivered on bugfix/autodev-1808 (commit 26c96b0):

(1) Split init() into a fast synchronous orphan / dirty classification and a background startupRecoveryPromise that runs the slow reflect/reward chain off the Gateway's critical path.
(2) Added an optional MemoryCore.waitForStartupRecovery?() method so tests and one-shot batch tools can opt back into the historic "await everything" semantics; the promise is contracted never to reject.
(3) shutdown() now awaits the background promise before tearing down storage so SQLite is not closed mid-flush.
(4) Introduced per-episode failure tracking under meta.rewardDirty: after MAX_DIRTY_REWARD_ATTEMPTS=3 consecutive failed automatic rescores the episode enters an exponential backoff (1h → 24h cap) and the init + periodic scans skip it until the cooldown elapses; manual feedback / runManually still rescore unconditionally.
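For readers without the diff open, the shape of points (1) to (3) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from commit 26c96b0: the class name SketchMemoryCore, the pending list, the recoverEpisode callback, the recovered / log / storageOpen fields, and the per-episode setTimeout yield are all hypothetical. Only init, shutdown, waitForStartupRecovery, startupRecoveryPromise, and the init.background_recovery_failed channel name come from the PR text.

```typescript
// Hypothetical sketch: only init, shutdown, waitForStartupRecovery and
// startupRecoveryPromise are names taken from the PR text.
class SketchMemoryCore {
  recovered: string[] = [];
  log: string[] = [];
  storageOpen = true;
  private pending: string[];
  private recoverEpisode: (id: string) => Promise<void>;
  private startupRecoveryPromise: Promise<void> = Promise.resolve();

  constructor(pending: string[], recoverEpisode: (id: string) => Promise<void>) {
    this.pending = pending;
    this.recoverEpisode = recoverEpisode;
  }

  async init(): Promise<void> {
    // Fast path: only decide which episodes need recovery.
    const toRecover = [...this.pending];
    // Slow path: start the chain but do not await it here. The catch handler
    // logs and swallows the error, so the stored promise never rejects.
    this.startupRecoveryPromise = this.runRecovery(toRecover).catch((err: unknown) => {
      this.log.push(`init.background_recovery_failed: ${String(err)}`);
    });
  }

  private async runRecovery(ids: string[]): Promise<void> {
    for (const id of ids) {
      // Assumption: yield to the event loop before each episode so other
      // callbacks (such as a WebSocket probe) get a turn.
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
      await this.recoverEpisode(id); // stand-in for reflect -> reward -> L2 induction
      this.recovered.push(id);
    }
  }

  // Opt back into "await everything" semantics (tests, batch tools).
  waitForStartupRecovery(): Promise<void> {
    return this.startupRecoveryPromise;
  }

  async shutdown(): Promise<void> {
    // Drain background work before closing storage.
    await this.startupRecoveryPromise;
    this.storageOpen = false;
  }
}
```

In this sketch a caller that only needs the core to be usable awaits init(), while a test that asserts on recovered episodes additionally awaits waitForStartupRecovery(). A single failing episode aborts the rest of the background chain here; the real implementation may well handle failures per episode.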

Tests: tests/unit/pipeline/memory-core.test.ts now 32/32 green (3 existing orphan/dirty tests updated to await the new promise; 4 net new tests covering init latency contract, backoff filter, failure-counter logic, and shutdown synchronization). The pipeline + bridge + adapters slice passes 127/127. Full unit sweep is 1044/1047 — the 2 unrelated failures in tests/unit/storage/traces-count.test.ts and migrator.test.ts reproduce identically on the unmodified base branch (verified via stash roundtrip) and are out of scope for this bug. TypeScript tsc --noEmit is clean for both tsconfig.json and tsconfig.build.json. The Python backend, LLM client timeouts, and Makefile/CI are unchanged.

Related Issue (Required): Fixes #1808

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g. code style improvements, linting)
  • Documentation update

How Has This Been Tested?

Automated tests are pending.

  • Unit Test
  • Test Script Or Test Steps (please provide)
  • Pipeline Automated API Test (please provide)

Checklist

  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have created related documentation issue/PR in MemOS-Docs (if applicable)
  • I have linked the issue to this PR (if applicable)
  • I have mentioned the person who will review this PR

@MatthewZhuang, @CarltonXiang, @syzsunshine219, @World-controller please review this PR.

Reviewer Checklist

…op + backoff failed dirty rescores (#1808)

`core.init()` in memos-local-plugin used to synchronously await the
entire reflect → reward → L2-induction chain for every orphan and
dirty closed episode discovered in SQLite. On the 30K-trace / 324MB
database described in issue #1808 this blocked the OpenClaw Gateway's
main Node.js event loop for 3-5s+, well past the WebSocket read-probe
3s budget — TUI and Control UI clients could no longer connect.

The fix:

* Split `init()` into a fast synchronous classification (lightweight
  orphan close + `topicState=interrupted` meta updates + stale /
  dirty bucket assembly) and a background `startupRecoveryPromise`
  that runs the slow reflect/reward chain off the critical path.
  Adapter `await core.init()` now returns in milliseconds regardless
  of the database size.
* Expose `MemoryCore.waitForStartupRecovery?()` so tests and one-shot
  batch tools can opt back into the historic "await everything"
  semantics. The promise is contracted to never reject; failures are
  logged on the `init.background_recovery_failed` channel.
* `shutdown()` now awaits the promise before tearing down storage so
  the SQLite handle is not closed mid-flush.
* Add per-episode failure tracking (`meta.rewardDirty.failedAttempts`
  / `lastFailureAt`) to dirty closed episodes. After
  `MAX_DIRTY_REWARD_ATTEMPTS=3` consecutive failed automatic rescores the
  episode enters an exponential backoff (1h → cap 24h) before the
  init scan / 10-min periodic timer touches it again. Manual feedback
  / `runManually` still rescore unconditionally. This closes the
  "retried indefinitely with no backoff" sub-symptom reported on the
  same issue.
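
The backoff rule in the last bullet can be sketched as below. The commit message gives the threshold (3), the first cooldown (1h), and the cap (24h), but not the growth formula, so the doubling schedule here is an assumption. The `RewardDirtyMeta` shape, the epoch-millisecond type of `lastFailureAt`, and the helper names `cooldownMs`, `shouldSkipAutomaticRescore`, and `recordRescoreResult` are hypothetical as well.

```typescript
const MAX_DIRTY_REWARD_ATTEMPTS = 3;          // from the commit message
const BASE_COOLDOWN_MS = 60 * 60 * 1000;      // 1h, from the commit message
const MAX_COOLDOWN_MS = 24 * 60 * 60 * 1000;  // 24h cap, from the commit message

// Hypothetical shape; only the two field names appear in the commit message.
interface RewardDirtyMeta {
  failedAttempts?: number;
  lastFailureAt?: number; // assumed to be epoch milliseconds
}

// Assumed schedule: no cooldown below the threshold, then 1h, 2h, 4h, ...
// capped at 24h.
function cooldownMs(failedAttempts: number): number {
  if (failedAttempts < MAX_DIRTY_REWARD_ATTEMPTS) return 0;
  const doublings = failedAttempts - MAX_DIRTY_REWARD_ATTEMPTS;
  return Math.min(BASE_COOLDOWN_MS * 2 ** doublings, MAX_COOLDOWN_MS);
}

// Used by the automatic scans only; manual rescoring would bypass this check.
function shouldSkipAutomaticRescore(meta: RewardDirtyMeta, now: number): boolean {
  const attempts = meta.failedAttempts ?? 0;
  if (attempts < MAX_DIRTY_REWARD_ATTEMPTS || meta.lastFailureAt === undefined) {
    return false;
  }
  return now - meta.lastFailureAt < cooldownMs(attempts);
}

// A success resets the counter (failures must be consecutive); a failure
// increments it and stamps the time.
function recordRescoreResult(meta: RewardDirtyMeta, ok: boolean, now: number): RewardDirtyMeta {
  if (ok) return {};
  return { failedAttempts: (meta.failedAttempts ?? 0) + 1, lastFailureAt: now };
}
```

Under these assumptions an episode that has failed three times is skipped for one hour after its last failure, and one that keeps failing is retried at most once every 24 hours.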

Tests: pipeline+bridge+adapters unit suite 127/127 green, full unit
suite 1044/1047 (the 2 unrelated `storage/traces-count` +
`storage/migrator` failures reproduce on the base branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Memtensor-AI

Collaborator Author

✅ Automated Test Results: PASSED

No applicable test scope for the changed files — automated tests skipped. Changed paths do not map to any configured scope (env.yaml source_mapping). Manual review recommended.

Branch: bugfix/autodev-1808

@syzsunshine219 changed the base branch from dev-20260624-v2.0.22 to dev-v2.0.22 on July 1, 2026 07:14
@Memtensor-AI

Collaborator Author

❌ Automated Test Results: FAILED

Auto-fix retry 1/2 triggered.

Failed tests:

  • test_out_of_range_rejected_or_clamped_to_valid[negative_1]
  • test_out_of_range_rejected_or_clamped_to_valid[negative_60s]
  • test_out_of_range_rejected_or_clamped_to_valid[negative_one_day]
  • test_out_of_range_rejected_or_clamped_to_valid[max_plus_1]
  • test_out_of_range_rejected_or_clamped_to_valid[max_plus_one_day]
  • test_out_of_range_rejected_or_clamped_to_valid[hundred_x_max]
  • test_invalid_type_does_not_crash_or_corrupt[string_number]
  • test_invalid_type_does_not_crash_or_corrupt[string_text]
  • test_invalid_type_does_not_crash_or_corrupt[none_value]
  • test_invalid_type_does_not_crash_or_corrupt[dict_value]
Error details
The vector_scan max_age setter accepts invalid values (negative numbers, values exceeding the documented max, and non-numeric types like strings/None/dicts) and persists them verbatim instead of rejecting or clamping to the [0, 31536000000] ms range. AI-generated tests: 30/30 passed.
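
The setter under test is not shown in this thread, so the following is only a sketch of one "reject or clamp" policy that would satisfy the behaviour the failing test names describe. It is written in TypeScript for consistency with the rest of the PR rather than in the language of the tested component. The function name normalizeMaxAge and the result type are hypothetical; the [0, 31536000000] ms range is taken from the error details above. The policy chosen here (reject non-numbers, clamp out-of-range numbers) is an assumption, since the tests appear to accept either rejection or clamping.

```typescript
const MAX_AGE_MIN_MS = 0;
const MAX_AGE_MAX_MS = 31536000000; // upper bound quoted in the error details

// Hypothetical result type: either a value safe to persist or a rejection.
type MaxAgeResult =
  | { ok: true; value: number }
  | { ok: false; reason: string };

function normalizeMaxAge(input: unknown): MaxAgeResult {
  // Reject anything that is not a finite number, including numeric strings,
  // null / undefined, and objects, instead of persisting it verbatim.
  if (typeof input !== "number" || !isFinite(input)) {
    return { ok: false, reason: "max_age must be a finite number of milliseconds" };
  }
  // Clamp out-of-range numbers into the documented range.
  const value = Math.min(Math.max(input, MAX_AGE_MIN_MS), MAX_AGE_MAX_MS);
  return { ok: true, value };
}
```

A caller would persist the value only when ok is true and leave the stored setting untouched otherwise, which is what keeps an invalid input from corrupting the configuration.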

Branch: bugfix/autodev-1808

@CarltonXiang added the plugin label (Plugin/adapter/bridge layer, apps/ directory) on Jul 2, 2026
@Memtensor-AI

Collaborator Author

Automated Test Results: PASSED

Cloud test-engine rerun after resolving the dev-v2.0.22 merge conflict.

Run: tr-9960eb48-574
Scope: memos_local_plugin
Result: 33/33 tests passed
Command group: memos_local_plugin/unit
Duration: 11s

Local pre-push verification also passed: npm run build, plus focused vitest for memory-core startup recovery/backoff tests including the #1966 ghost-trace guard.

Status: merge conflict resolved; automated scope test passed. Manual code review is still required before merge.

@syzsunshine219 merged commit 4e16c68 into dev-v2.0.22 on Jul 2, 2026
16 checks passed
@syzsunshine219 deleted the bugfix/autodev-1808 branch on July 2, 2026 11:27

Labels

ai-generated; bug (Something isn't working); plugin (Plugin/adapter/bridge layer, apps/ directory)

6 participants