fix(kb): clean up KBMedia records on document delete, fix update_kb_stats#9120
Conversation
…tats - delete_document_by_id now deletes associated KBMedia records before removing the KBDocument row, preventing orphaned media data. - update_kb_stats now passes kb_id filter to vec_db.count_documents, fixing chunk count that previously counted across all KBs. - add tests for document deletion cleanup and update_kb_stats behavior.
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In
delete_document_by_id, you open two separate transactions and explicitly callsession.commit()inside asession.begin()context; consider using a single transactional block for both media and document deletion and letting the context manage commits/rollbacks to avoid partial cleanup and redundant commits. - Since
delete_document_by_idnow has two DB write phases plus a vector DB delete, you might want to guard the whole operation with clearer error handling (or a higher-level transaction pattern) so that failures after media deletion but before document/vector cleanup don’t leave the system in an inconsistent state.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `delete_document_by_id`, you open two separate transactions and explicitly call `session.commit()` inside a `session.begin()` context; consider using a single transactional block for both media and document deletion and letting the context manage commits/rollbacks to avoid partial cleanup and redundant commits.
- Since `delete_document_by_id` now has two DB write phases plus a vector DB delete, you might want to guard the whole operation with clearer error handling (or a higher-level transaction pattern) so that failures after media deletion but before document/vector cleanup don’t leave the system in an inconsistent state.
## Individual Comments
### Comment 1
<location path="astrbot/core/knowledge_base/kb_db_sqlite.py" line_range="325-328" />
<code_context>
- """删除单个文档及其相关数据"""
+ """删除单个文档及其相关数据(包括多媒体记录)"""
+ # 删除多媒体记录
+ async with self.get_db() as session, session.begin():
+ delete_media_stmt = delete(KBMedia).where(col(KBMedia.doc_id) == doc_id)
+ await session.execute(delete_media_stmt)
+ await session.commit()
+
# 在知识库表中删除
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid explicit commit inside `session.begin()` and consider a single transaction for both delete operations.
`session.begin()` already handles commit/rollback on context exit, so the explicit `await session.commit()` is unnecessary and may behave unexpectedly. Also, because media and document deletions run in separate transactions, a failure in the second can leave media deleted while the document remains. Use a single `async with self.get_db() as session, session.begin():` block to delete both `KBMedia` and `KBDocument` and rely on the context-managed transaction instead of calling `commit()` manually.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request updates the SQLite knowledge base database implementation to delete associated media records (KBMedia) when a document is deleted, and ensures that knowledge base statistics count documents filtered by the specific knowledge base ID. It also adds comprehensive unit tests for these behaviors. The feedback recommends merging the separate deletion transactions in delete_document_by_id into a single atomic transaction to prevent data inconsistency and redundant commits, and properly closing the database connection in the test fixture to avoid connection leaks.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Modifications / 改动点
astrbot/core/knowledge_base/kb_db_sqlite.py:delete_document_by_id现在先清理KBMedia记录再删除文档,避免产生孤立的多媒体数据;update_kb_stats现在传递kb_id过滤参数给vec_db.count_documents,修复了跨知识库统计 chunk 数量不正确的问题tests/unit/test_kb_document_cleanup.py: 新增 4 个测试验证删除文档时 media 清理、不影响其他文档的 media、文档记录删除、update_kb_stats 过滤行为This is NOT a breaking change. / 这不是一个破坏性变更。
Screenshots or Test Results / 运行截图或测试结果
Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了"验证步骤"和"运行截图"。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Ensure knowledge base document deletions also clean up associated media and correct chunk statistics per knowledge base.
Bug Fixes:
Tests: