Skip to content

fix(kb): clean up KBMedia records on document delete, fix update_kb_stats#9120

Merged
Soulter merged 2 commits into
AstrBotDevs:masterfrom
lxfight:fix/kb-document-cleanup
Jul 3, 2026
Merged

fix(kb): clean up KBMedia records on document delete, fix update_kb_stats#9120
Soulter merged 2 commits into
AstrBotDevs:masterfrom
lxfight:fix/kb-document-cleanup

Conversation

@lxfight

@lxfight lxfight commented Jul 3, 2026

Copy link
Copy Markdown
Member

Modifications / 改动点

  • astrbot/core/knowledge_base/kb_db_sqlite.py: delete_document_by_id 现在先清理 KBMedia 记录再删除文档,避免产生孤立的多媒体数据;update_kb_stats 现在传递 kb_id 过滤参数给 vec_db.count_documents,修复了跨知识库统计 chunk 数量不正确的问题

  • tests/unit/test_kb_document_cleanup.py: 新增 4 个测试验证删除文档时 media 清理、不影响其他文档的 media、文档记录删除、update_kb_stats 过滤行为

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

tests/unit/test_kb_document_cleanup.py::test_delete_document_cleans_media_records PASSED [ 25%]
tests/unit/test_kb_document_cleanup.py::test_delete_document_keeps_other_doc_media PASSED [ 50%]
tests/unit/test_kb_document_cleanup.py::test_delete_document_removes_doc_record PASSED [ 75%]
tests/unit/test_kb_document_cleanup.py::test_update_kb_stats_counts_chunks_for_single_kb PASSED [100%]

Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了"验证步骤"和"运行截图"

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txtpyproject.toml 文件相应位置。

  • 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

Summary by Sourcery

Ensure knowledge base document deletions also clean up associated media and correct chunk statistics per knowledge base.

Bug Fixes:

  • Delete associated KBMedia records when a document is removed to prevent orphaned multimedia entries.
  • Filter vector store chunk counts by kb_id in update_kb_stats to avoid cross-knowledge-base statistics contamination.

Tests:

  • Add async integration-style tests for document deletion ensuring related media cleanup, isolation of other documents' media, and removal of document records.
  • Add a test verifying update_kb_stats invokes the vector store with a kb_id-scoped metadata filter.

…tats

- delete_document_by_id now deletes associated KBMedia records before
  removing the KBDocument row, preventing orphaned media data.
- update_kb_stats now passes kb_id filter to vec_db.count_documents,
  fixing chunk count that previously counted across all KBs.
- add tests for document deletion cleanup and update_kb_stats behavior.
@dosubot dosubot Bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Jul 3, 2026

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In delete_document_by_id, you open two separate transactions and explicitly call session.commit() inside a session.begin() context; consider using a single transactional block for both media and document deletion and letting the context manage commits/rollbacks to avoid partial cleanup and redundant commits.
  • Since delete_document_by_id now has two DB write phases plus a vector DB delete, you might want to guard the whole operation with clearer error handling (or a higher-level transaction pattern) so that failures after media deletion but before document/vector cleanup don’t leave the system in an inconsistent state.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `delete_document_by_id`, you open two separate transactions and explicitly call `session.commit()` inside a `session.begin()` context; consider using a single transactional block for both media and document deletion and letting the context manage commits/rollbacks to avoid partial cleanup and redundant commits.
- Since `delete_document_by_id` now has two DB write phases plus a vector DB delete, you might want to guard the whole operation with clearer error handling (or a higher-level transaction pattern) so that failures after media deletion but before document/vector cleanup don’t leave the system in an inconsistent state.

## Individual Comments

### Comment 1
<location path="astrbot/core/knowledge_base/kb_db_sqlite.py" line_range="325-328" />
<code_context>
-        """删除单个文档及其相关数据"""
+        """删除单个文档及其相关数据(包括多媒体记录)"""
+        # 删除多媒体记录
+        async with self.get_db() as session, session.begin():
+            delete_media_stmt = delete(KBMedia).where(col(KBMedia.doc_id) == doc_id)
+            await session.execute(delete_media_stmt)
+            await session.commit()
+
         # 在知识库表中删除
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid explicit commit inside `session.begin()` and consider a single transaction for both delete operations.

`session.begin()` already handles commit/rollback on context exit, so the explicit `await session.commit()` is unnecessary and may behave unexpectedly. Also, because media and document deletions run in separate transactions, a failure in the second can leave media deleted while the document remains. Use a single `async with self.get_db() as session, session.begin():` block to delete both `KBMedia` and `KBDocument` and rely on the context-managed transaction instead of calling `commit()` manually.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/knowledge_base/kb_db_sqlite.py Outdated

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the SQLite knowledge base database implementation to delete associated media records (KBMedia) when a document is deleted, and ensures that knowledge base statistics count documents filtered by the specific knowledge base ID. It also adds comprehensive unit tests for these behaviors. The feedback recommends merging the separate deletion transactions in delete_document_by_id into a single atomic transaction to prevent data inconsistency and redundant commits, and properly closing the database connection in the test fixture to avoid connection leaks.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread astrbot/core/knowledge_base/kb_db_sqlite.py Outdated
Comment thread tests/unit/test_kb_document_cleanup.py
@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Jul 3, 2026
@Soulter Soulter merged commit e2f3b00 into AstrBotDevs:master Jul 3, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants