Skip to content

Fix #1448: fix: 添加包含偏好信息的记忆后,使用搜索接口POST /product/search无法检索到任何记忆#1983

Open
Memtensor-AI wants to merge 1 commit into
mainfrom
bugfix/autodev-1448
Open

Fix #1448: fix: 添加包含偏好信息的记忆后,使用搜索接口POST /product/search无法检索到任何记忆#1983
Memtensor-AI wants to merge 1 commit into
mainfrom
bugfix/autodev-1448

Conversation

@Memtensor-AI

Copy link
Copy Markdown
Collaborator

Description

Fixes #1448/product/search (fine mode) returning empty results for memories added through the fast pipeline. Root cause: tree_text_memory._graph_recall filtered Neo4j candidates by n.key IN parsed_goal.keys and tag overlap ≥ 2, but fast-mode add stores the raw chat-formatted text as key and only ["mode:fast"] as tags, while fine-mode search asks the LLM for high-level semantic keys/tags, so the strict candidate set comes back empty and the structured branch silently dropped the memory.

The fix extracts a lenient _node_matches_parsed_goal post-filter that accepts substring key matches and a single tag overlap, and adds _fallback_candidates_by_substring which uses the existing graph_store.get_all_memory_items API (bounded to 50 candidates per call) to substring-match parsed_goal.keys against stored key + memory text whenever the strict candidate set is empty. The fallback is scoped to WorkingMemory / LongTermMemory / UserMemory / OuterMemory so PreferenceMemory and other specialized scopes keep their dedicated retrievers. Wrapped in try/except so unexpected backends silently degrade to the previous behavior. No API, schema, or migration impact.

Tests: 4 new regression tests in tests/memories/textual/test_tree_retriever.py cover the reporter's exact scenario, substring post-filter, single-tag overlap, and graceful empty return. All 62 tests under tests/memories/textual/ pass after the fix; ruff check and ruff format both clean. The 6 unrelated failures under tests/mem_reader/ and tests/api/ are pre-existing on the dev-20260624-v2.0.22 baseline (verified by stashing the fix and re-running).

Related Issue (Required): Fixes #1448

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g. code style improvements, linting)
  • Documentation update

How Has This Been Tested?

Automated tests are pending.

  • Unit Test
  • Test Script Or Test Steps (please provide)
  • Pipeline Automated API Test (please provide)

Checklist

  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have created related documentation issue/PR in MemOS-Docs (if applicable)
  • I have linked the issue to this PR (if applicable)
  • I have mentioned the person who will review this PR

@MatthewZhuang, @CarltonXiang, @syzsunshine219, @World-controller please review this PR.

Reviewer Checklist

`/product/search` (fine mode) cannot find memories added through the fast
pipeline because `tree_text_memory._graph_recall` filters Neo4j candidates by
`n.key IN parsed_goal.keys` and tag overlap >= 2. Fast-mode add stores the
raw chat-formatted text as `key` and `["mode:fast"]` as `tags`, while
fine-mode search asks the LLM for high-level semantic keys/tags, so the
strict candidate set comes back empty and the structured branch silently
drops the memory.

Extract a lenient `_node_matches_parsed_goal` post-filter that also accepts
substring matches and a single tag overlap, and add
`_fallback_candidates_by_substring` which uses
`graph_store.get_all_memory_items` (bounded to 50) to recover fast-mode
memories. Restricted to WorkingMemory / LongTermMemory / UserMemory /
OuterMemory so PreferenceMemory and other scopes keep their dedicated
retrievers.

Regression tests cover the reporter's exact scenario, substring post-filter,
single-tag overlap, and graceful empty return.
@Memtensor-AI

Copy link
Copy Markdown
Collaborator Author

✅ Automated Test Results: PASSED

No applicable test scope for the changed files — automated tests skipped. Changed paths do not map to any configured scope (env.yaml source_mapping). Manual review recommended.

Branch: bugfix/autodev-1448

@Memtensor-AI Memtensor-AI changed the base branch from dev-20260624-v2.0.22 to dev-v2.0.22 June 30, 2026 12:25
@Memtensor-AI

Copy link
Copy Markdown
Collaborator Author

🤖 Open Code Review

Target: PR #1983
Task: manual-ocr:MemTensor/MemOS#1983
Base: dev-v2.0.22
Head: bugfix/autodev-1448

🔍 OpenCodeReview found 7 issue(s) in this PR.


1. tests/memories/textual/test_tree_retriever.py (L227-L240)

Missing get_by_metadata stub causes the fallback path to never be exercised.

When mock_graph_store.get_by_metadata is not configured, MagicMock returns a new MagicMock object (truthy) instead of an empty list. In _graph_recall, the code calls candidate_ids.update(...) with that truthy mock, so candidate_ids ends up non-empty and the if not candidate_ids guard is never entered — meaning get_all_memory_items is never called. The final assertion results == [] then passes only because get_nodes returns a mock that the post-filter rejects, not because the no-match fallback logic was exercised.

Add the missing stub:

mock_graph_store.get_by_metadata.return_value = []

2. tests/memories/textual/test_tree_retriever.py (L159-L163)

The use_fast_graph=True branch of _graph_recall is not covered by any of the four new tests.

The production diff adds identical fallback logic inside both the use_fast_graph=False branch and the use_fast_graph=True branch (around line 387 in recall.py). All four new tests invoke _graph_recall without passing use_fast_graph=True, so the fast-graph fallback path is completely untested. A simple copy of each test with retriever._graph_recall(parsed_goal, "UserMemory", use_fast_graph=True) would close this gap.


3. tests/memories/textual/test_tree_retriever.py (L169)

get_by_metadata call order is not verified, weakening the regression value.

The test asserts mock_graph_store.get_all_memory_items.assert_called_once() to confirm the fallback was taken, but does not assert that get_by_metadata was called first. Without this, a future accidental removal of the strict-lookup step (always going straight to the full scan) would not be caught, introducing a silent performance regression.

Consider adding:

mock_graph_store.get_by_metadata.assert_called()

4. src/memos/memories/textual/tree_text_memory/retrieve/recall.py (L274-L276)

[High] Full collection scan on every fallback invocation.
get_all_memory_items is called with no limit argument. The _FALLBACK_CAP of 50 only caps the returned list — it does not reduce the records fetched from the database. For a user with thousands of memories, every query whose strict filters yield zero candidates will trigger a full table/graph scan, causing latency spikes, high memory consumption, and database overload at scale.

Consider passing a server-side limit to the store if the API supports it, or implementing cursor-based pagination so the scan stops as soon as _FALLBACK_CAP matches are found.

💡 Suggested Change

Before:

        try:
            items = self.graph_store.get_all_memory_items(scope=memory_scope, **kwargs)
        except Exception:

After:

        try:
            items = self.graph_store.get_all_memory_items(scope=memory_scope, **kwargs)

5. src/memos/memories/textual/tree_text_memory/retrieve/recall.py (L348-L355)

[High] Post-filter _node_matches_parsed_goal is a no-op for fallback items.
_fallback_candidates_by_substring already guarantees any(pk in haystack for pk in parsed_keys) for every returned item. Inside _node_matches_parsed_goal, the same any(pk in haystack ...) check is the first branch that fires, so it will always return True for these items — the filter never rejects anything.

If there is a genuine intent to apply additional criteria (e.g., tag overlap) after the substring pass, the logic should be restructured so those extra criteria are actually evaluated independently. Otherwise this post-filter call is dead code that only adds noise.


6. src/memos/memories/textual/tree_text_memory/retrieve/recall.py (L233-L234)

[Medium] Substring match on short/common keys produces false positives.
There is no minimum length guard on parsed_keys. An LLM-generated key such as a single Chinese character (e.g. "的", "了") or a very common word will match virtually every stored memory, polluting results with unrelated entries. Consider enforcing a minimum key length (e.g. len(pk) >= 2 or >= 3 for CJK characters) before using a key as a substring needle.

💡 Suggested Change

Before:

            haystack = f"{node_key}\n{node_memory}"
            if any(pk in haystack for pk in parsed_keys):

After:

            haystack = f"{node_key}\n{node_memory}"
            if any(pk in haystack for pk in parsed_keys if len(pk) >= 2):

7. src/memos/memories/textual/tree_text_memory/retrieve/recall.py (L340-L355)

[Medium] Non-fast-graph fallback omits status="activated" filter, inconsistent with fast-graph branch.
The fast-graph fallback (further below) passes status="activated" to _fallback_candidates_by_substring, but this non-fast-graph branch does not. This means deactivated or soft-deleted memories can be returned through the non-fast path, which is inconsistent with the behaviour of both the fast-graph path and the existing WorkingMemory retrieval at the top of retrieve(). The omission should be intentional and documented if so, otherwise it is a latent correctness bug.

💡 Suggested Change

Before:

            if not candidate_ids:
                fallback_items = self._fallback_candidates_by_substring(
                    parsed_goal=parsed_goal,
                    memory_scope=memory_scope,
                    user_name=user_name,
                )
                if not fallback_items:
                    return []
                final_nodes = [
                    TextualMemoryItem.from_dict(node)
                    for node in fallback_items
                    if self._node_matches_parsed_goal(parsed_goal, node)
                ]
                return final_nodes

            # Load nodes and post-filter

After:

            if not candidate_ids:
                fallback_items = self._fallback_candidates_by_substring(
                    parsed_goal=parsed_goal,
                    memory_scope=memory_scope,
                    user_name=user_name,
                    status="activated",
                )
                if not fallback_items:
                    return []
                final_nodes = [
                    TextualMemoryItem.from_dict(node)
                    for node in fallback_items
                    if self._node_matches_parsed_goal(parsed_goal, node)
                ]
                return final_nodes

            # Load nodes and post-filter

Generated by cloud-assistant via Open Code Review.

@CarltonXiang CarltonXiang requested a review from bittergreen July 2, 2026 07:45
@CarltonXiang CarltonXiang added the memos Core MemOS logic (memory, MCP, scheduler, API, database) | 核心模块 label Jul 2, 2026

@bittergreen bittergreen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already merged in Fix #1493: fix: /product/add returns 200 success but stores no memory (No add/update items
#1989

@bittergreen bittergreen closed this Jul 2, 2026
@bittergreen bittergreen reopened this Jul 2, 2026
Base automatically changed from dev-v2.0.22 to main July 3, 2026 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-generated bug Something isn't working | 功能异常 memos Core MemOS logic (memory, MCP, scheduler, API, database) | 核心模块

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants