Skip to content

fix: align knowledge base CRUD API contract with kb_name, canonical payload fields, and matching OpenAPI types#9000

Merged
Soulter merged 4 commits into
AstrBotDevs:masterfrom
lxfight:fix/kb-contract-crud
Jul 3, 2026
Merged

fix: align knowledge base CRUD API contract with kb_name, canonical payload fields, and matching OpenAPI types#9000
Soulter merged 4 commits into
AstrBotDevs:masterfrom
lxfight:fix/kb-contract-crud

Conversation

@lxfight

@lxfight lxfight commented Jun 25, 2026

Copy link
Copy Markdown
Member

Unify knowledge base create/update/list with kb_name, canonical_payload, optional list pagination with total, and matching OpenAPI plus dashboard types.

The knowledge base CRUD (create/update/list) API contract is inconsistent between frontend and backend. The backend schema uses a name field while the generated frontend types expect kb_name. Both create and update share the same KnowledgeBaseRequest schema, but create requires kb_name and embedding_provider_id, while update does not — this shared schema leads to confusing validation semantics. Additionally, the knowledge base payload is missing canonical fields such as emoji, chunk_size, chunk_overlap, top_k_dense, top_k_sparse, and top_m_final.

This change unifies the frontend-backend contract: renames name to kb_name, introduces KnowledgeBaseCreateRequest to separate create and update semantics, promotes optional configuration parameters into canonical KnowledgeBaseRequest fields, and syncs the OpenAPI spec, backend schemas, and generated frontend types.

Modifications / 改动点

  • OpenAPI schema (openspec/openapi-v1.yaml):
    • KnowledgeBaseRequest: namekb_name; added emoji, chunk_size, chunk_overlap, top_k_dense, top_k_sparse, top_m_final fields; embedding_provider_id / rerank_provider_id now accept null
    • Added KnowledgeBaseCreateRequest inheriting KnowledgeBaseRequest via allOf, marking kb_name and embedding_provider_id as required
  • Backend schema (astrbot/dashboard/schemas.py): synced KnowledgeBaseRequest and KnowledgeBaseCreateRequest Pydantic models
  • Backend API (astrbot/dashboard/api/knowledge_bases.py): create_knowledge_base uses KnowledgeBaseCreateRequest; update_knowledge_base and list_knowledge_bases adapted to kb_name
  • Backend service (astrbot/dashboard/services/knowledge_base_service.py): adapted to new schema field names and types
  • Frontend types (dashboard/src/api/generated/openapi-v1/types.gen.ts): regenerated
  • Frontend API (dashboard/src/api/v1.ts): adapted to new types
  • Frontend page (dashboard/src/views/knowledge-base/KBList.vue): namekb_name
  • Tests (tests/unit/test_knowledge_base_service_contract.py): added 266 lines of end-to-end contract tests
  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

tests/unit/test_knowledge_base_service_contract.py::test_list_kbs_applies_pagination PASSED
tests/unit/test_knowledge_base_service_contract.py::test_list_kbs_without_page_size_returns_all_items PASSED
tests/unit/test_knowledge_base_service_contract.py::test_list_route_preserves_unpaginated_default PASSED
tests/unit/test_knowledge_base_service_contract.py::test_list_route_uses_default_page_size_when_page_is_explicit PASSED
tests/unit/test_knowledge_base_service_contract.py::test_create_kb_accepts_legacy_name_field PASSED
tests/unit/test_knowledge_base_service_contract.py::test_update_kb_preserves_omitted_fields PASSED
tests/unit/test_knowledge_base_service_contract.py::test_update_kb_allows_explicit_rerank_provider_clear PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_schemas_match_service_contract PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_request_preserves_explicit_null_updates PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_request_omits_unset_none_fields PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_request_uses_legacy_name_as_input_alias PASSED

============================== 11 passed in 1.60s ==============================

Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txtpyproject.toml 文件相应位置。

  • 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

Summary by Sourcery

Align knowledge base CRUD contracts across backend, OpenAPI spec, and frontend, including canonical field names, pagination behavior, and configuration parameters.

New Features:

  • Support optional pagination with total count in knowledge base list responses while preserving an unpaginated default mode.
  • Expose canonical knowledge base configuration fields such as emoji and retrieval tuning parameters in the public API and dashboard.

Bug Fixes:

  • Normalize knowledge base payloads to consistently use kb_name while preserving compatibility with legacy name input.
  • Ensure knowledge base update requests preserve existing values for omitted fields while allowing explicit null updates for optional providers.

Enhancements:

  • Split knowledge base create and update request schemas to accurately reflect required fields and type nullability.
  • Add a canonical_payload helper on the knowledge base request model to generate service-facing payloads from API inputs.
  • Strengthen the knowledge base API service tests with contract coverage for schemas, pagination semantics, and legacy compatibility.

Tests:

  • Add unit tests covering knowledge base pagination semantics, legacy name handling, canonical payload generation, and update behavior.

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. area:webui The bug / feature is about webui(dashboard) of astrbot. feature:knowledge-base The bug / feature is about knowledge base labels Jun 25, 2026

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • The legacy namekb_name normalization now exists both in KnowledgeBaseService._canonical_kb_payload and KnowledgeBaseRequest.canonical_payload; consider centralizing this logic in one place (e.g., only the schema helper) to avoid future drift between the two paths.
  • Instead of manually handling the legacy name field via getattr(self, "name", None) and payload.get("name"), you could model it explicitly in the Pydantic schema with a Field(alias="name") or similar, which would make the legacy compatibility more self-documenting and less error-prone.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The legacy `name``kb_name` normalization now exists both in `KnowledgeBaseService._canonical_kb_payload` and `KnowledgeBaseRequest.canonical_payload`; consider centralizing this logic in one place (e.g., only the schema helper) to avoid future drift between the two paths.
- Instead of manually handling the legacy `name` field via `getattr(self, "name", None)` and `payload.get("name")`, you could model it explicitly in the Pydantic schema with a `Field(alias="name")` or similar, which would make the legacy compatibility more self-documenting and less error-prone.

## Individual Comments

### Comment 1
<location path="tests/unit/test_knowledge_base_service_contract.py" line_range="187" />
<code_context>
+@pytest.mark.asyncio
+async def test_create_kb_accepts_legacy_name_field():
</code_context>
<issue_to_address>
**suggestion (testing):** Extend create_kb tests to cover validation errors (missing names / embedding provider).

To round this out, please also add tests for validation failures:

- When both `kb_name` and `name` are missing/empty, `create_kb` should raise `KnowledgeBaseServiceError("知识库名称不能为空")`.
- When `embedding_provider_id` is missing or invalid (provider manager returns `None`), the service should raise the current error used in that case.

These will ensure the service enforces the new create semantics and catches regressions in validation behavior.
</issue_to_address>

### Comment 2
<location path="astrbot/dashboard/services/knowledge_base_service.py" line_range="32" />
<code_context>
     def _payload(data: object) -> dict[str, Any]:
         return data if isinstance(data, dict) else {}

+    @staticmethod
+    def _canonical_kb_payload(data: object) -> dict[str, Any]:
+        """Normalize knowledge base create/update payloads.
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralizing request canonicalization in the schema and making `update_kb`’s update fields explicit to avoid indirection and duplicated normalization logic.

You can simplify both of the flagged areas without losing any of the new behavior.

### 1. Avoid duplicating canonicalization logic

Instead of maintaining `_canonical_kb_payload` and a separate `canonical_payload` on the schema, centralize the logic in the schema and call it here.

For example, if you already have `KnowledgeBaseRequest.canonical_payload()`:

```python
# service.py
from .schemas import KnowledgeBaseRequest  # adjust import

@staticmethod
def _canonical_kb_payload(data: object) -> dict[str, Any]:
    raw = KnowledgeBaseService._payload(data)
    req = KnowledgeBaseRequest(**raw)
    return req.canonical_payload()
```

Or, if you can call the schema directly where you currently use `_canonical_kb_payload`, remove the helper entirely and keep the service dealing only with canonical dicts:

```python
async def create_kb(self, data: object) -> tuple[dict[str, Any], str]:
    kb_manager = self.get_kb_manager()
    raw = self._payload(data)
    payload = KnowledgeBaseRequest(**raw).canonical_payload()
    ...
```

This way the `name ➜ kb_name` migration and any future normalization lives in one place.

---

### 2. Simplify `update_kb` without losing partial-update semantics

You can keep the “at least one field” validation and the “default to current values” behavior, but make the update fields explicit instead of flowing through `update_keys` / `update_data` / `**update_data`.

Current logic (simplified):

```python
update_keys = [...]
provided_updates = {key: payload[key] for key in update_keys if key in payload}
...
current = current_kb.kb
update_data = {key: getattr(current, key) for key in update_keys}
update_data.update(provided_updates)

kb_helper = await self.get_kb_manager().update_kb(
    kb_id=kb_id,
    **update_data,
)
```

You can replace it with explicit parameters:

```python
async def update_kb(self, data: object) -> tuple[dict[str, Any], str]:
    payload = self._canonical_kb_payload(data)
    kb_id = payload.get("kb_id")
    if not kb_id:
        raise KnowledgeBaseServiceError("缺少参数 kb_id")

    # Allowed update fields
    fields = (
        "kb_name",
        "description",
        "emoji",
        "embedding_provider_id",
        "rerank_provider_id",
        "chunk_size",
        "chunk_overlap",
        "top_k_dense",
        "top_k_sparse",
        "top_m_final",
    )
    if not any(field in payload for field in fields):
        raise KnowledgeBaseServiceError("至少需要提供一个更新字段")

    current_kb = await self.get_kb_manager().get_kb(kb_id)
    if not current_kb:
        raise KnowledgeBaseServiceError("知识库不存在")
    current = current_kb.kb

    kb_helper = await self.get_kb_manager().update_kb(
        kb_id=kb_id,
        kb_name=payload.get("kb_name", current.kb_name),
        description=payload.get("description", current.description),
        emoji=payload.get("emoji", current.emoji),
        embedding_provider_id=payload.get("embedding_provider_id", current.embedding_provider_id),
        rerank_provider_id=payload.get("rerank_provider_id", current.rerank_provider_id),
        chunk_size=payload.get("chunk_size", current.chunk_size),
        chunk_overlap=payload.get("chunk_overlap", current.chunk_overlap),
        top_k_dense=payload.get("top_k_dense", current.top_k_dense),
        top_k_sparse=payload.get("top_k_sparse", current.top_k_sparse),
        top_m_final=payload.get("top_m_final", current.top_m_final),
    )
    if not kb_helper:
        raise KnowledgeBaseServiceError("知识库不存在")
    return kb_helper.kb.model_dump(), "更新知识库成功"
```

This keeps:

- Partial updates (only provided fields overwrite).
- Defaults from current KB values for missing fields.
- The “at least one update field” requirement.

But it removes the extra dict-building and makes the updatable fields obvious at the call site.
</issue_to_address>

### Comment 3
<location path="astrbot/dashboard/schemas.py" line_range="207" />
<code_context>


 class KnowledgeBaseRequest(OpenModel):
-    kb_id: str | None = None
-    name: str | None = None
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralizing knowledge base payload normalization in `KnowledgeBaseRequest` using Pydantic field aliases for the legacy `name` field to avoid custom migration logic and duplicated canonicalization.

You can reduce complexity here by (1) making `KnowledgeBaseRequest` the single source of truth for normalization, and (2) using Pydantic aliases to handle the legacy `name` field instead of custom logic.

### 1. Remove duplicate canonicalization in the service

Instead of having both `canonical_payload()` on the model and `_canonical_kb_payload()` in the service, keep only the model-level one and call it from the service:

```python
# in the service, replace `_canonical_kb_payload(...)` usage with:
payload = kb_request.canonical_payload()
# ...use `payload` directly
```

Then delete `_canonical_kb_payload` entirely. This way, any future changes to the KB payload shape happen in one place.

### 2. Use a field alias for the legacy `name` input

You can avoid manual `getattr(self, "name", None)` and keep the migration logic declarative:

```python
from pydantic import Field

class KnowledgeBaseRequest(OpenModel):
    kb_name: str | None = Field(None, alias="name")  # accept legacy "name"
    description: str | None = None
    emoji: str | None = None
    embedding_provider_id: str | None = None
    rerank_provider_id: str | None = None
    chunk_size: int | None = None
    chunk_overlap: int | None = None
    top_k_dense: int | None = None
    top_k_sparse: int | None = None
    top_m_final: int | None = None

    def canonical_payload(self) -> dict[str, Any]:
        return self.model_dump(
            exclude_unset=True,
            include={
                "kb_name",
                "description",
                "emoji",
                "embedding_provider_id",
                "rerank_provider_id",
                "chunk_size",
                "chunk_overlap",
                "top_k_dense",
                "top_k_sparse",
                "top_m_final",
            },
            by_alias=False,  # ensure we output `kb_name`, not `name`
        )
```

This keeps all existing behavior (including support for legacy `name`) while:

- Eliminating duplicated normalization logic between layers.
- Removing the manual migration code path that can drift from the service logic.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread tests/unit/test_knowledge_base_service_contract.py
Comment thread astrbot/dashboard/services/knowledge_base_service.py
Comment thread astrbot/dashboard/schemas.py

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the knowledge base management system by standardizing payload schemas, introducing pagination to the listing service, updating OpenAPI specifications, and adding unit tests. However, several critical issues were identified during the review: removing _model_dict will cause NameError exceptions in other endpoints that still reference it; the KnowledgeBaseCreateRequest schema is missing from schemas.py despite being expected by the frontend and OpenAPI spec; combining allOf with additionalProperties: false in the OpenAPI spec will cause validation failures; and using getattr without a default value when updating knowledge bases could lead to an AttributeError.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread astrbot/dashboard/api/knowledge_bases.py
Comment thread astrbot/dashboard/schemas.py
Comment thread openspec/openapi-v1.yaml Outdated
Comment thread astrbot/dashboard/services/knowledge_base_service.py Outdated
@lxfight lxfight force-pushed the fix/kb-contract-crud branch from 85f98f2 to 69de462 Compare June 28, 2026 07:08
@lxfight

lxfight commented Jun 28, 2026

Copy link
Copy Markdown
Member Author

@gemini-code-assist review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the knowledge base creation and update schemas, introducing a distinct KnowledgeBaseCreateRequest and migrating the legacy name field to kb_name. It updates the backend services, OpenAPI specifications, frontend API clients, and UI components to align with these schema changes, and adds comprehensive unit tests. The reviewer suggested avoiding in-place modification of the input dictionary in _canonical_kb_payload to prevent unexpected side-effects.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

``kb_name`` migration while preserving operational fields
like ``kb_id``.
"""
raw = KnowledgeBaseService._payload(data)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Modifying the input data dictionary in-place via raw.update(canonical) can lead to unexpected side-effects in callers that reuse the dictionary. It is safer to create a copy of the dictionary first.

Suggested change
raw = KnowledgeBaseService._payload(data)
raw = dict(KnowledgeBaseService._payload(data))

@lxfight lxfight requested a review from Soulter June 28, 2026 07:17
lxfight added 4 commits July 3, 2026 10:47
Unify knowledge base create/update/list with kb_name, canonical_payload,
optional list pagination with total, and matching OpenAPI plus dashboard types.
…fix OpenAPI allOf validation

- Replace 3 remaining _model_dict() calls with payload.model_dump(exclude_none=True)
- Add KnowledgeBaseCreateRequest schema with required kb_name + embedding_provider_id
- Remove additionalProperties: false from OpenAPI allOf sub-schema to avoid validation failures
- Use Pydantic Field(alias="name") + populate_by_name=True for legacy name migration
- Delegate _canonical_kb_payload to KnowledgeBaseRequest to eliminate duplicated logic
- Add getattr default None for update_kb to prevent AttributeError
- Add validation error tests: missing kb_name, missing embedding_provider_id
@lxfight lxfight force-pushed the fix/kb-contract-crud branch from 7a12034 to de70eaa Compare July 3, 2026 02:47

@Soulter Soulter left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. Potential breaking changes in /kb api.

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Jul 3, 2026
@Soulter Soulter merged commit ea9e342 into AstrBotDevs:master Jul 3, 2026
21 checks passed
@Soulter Soulter deleted the fix/kb-contract-crud branch July 3, 2026 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:webui The bug / feature is about webui(dashboard) of astrbot. feature:knowledge-base The bug / feature is about knowledge base lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants