42 changes: 42 additions & 0 deletions docs/features/openai-compatible-video-generation/plan.md
@@ -0,0 +1,42 @@
# Plan

## Approach
Treat video generation as a first-class model capability parallel to image generation and TTS:
- Extend shared model/type enums and model-db parsing to include `videoGeneration`.
- Add a shared video compatibility helper that can recover video intent from model metadata, endpoint hints, modalities, or known model ID patterns when upstream data is incomplete.
- Add an OpenAI-compatible video runtime path that sends requests to `/v1/videos`, normalizes provider responses, and emits media output into the assistant stream.
- Reuse the current assistant media block transport by carrying video payloads through the existing message block structure with video MIME detection on the renderer side.
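The compatibility helper described in the Approach list can be sketched as a chain of increasingly weak signals. This is a minimal sketch, not the PR's implementation: the `VideoDetectionInput` shape, the `looksLikeVideoGenerationModel` name, and the `/seedance/i` ID pattern are assumptions for illustration.

```typescript
// Hypothetical input shape: only the fields this sketch reads.
interface VideoDetectionInput {
  modelId: string
  type?: string // e.g. 'chat', 'imageGeneration', 'videoGeneration'
  endpointType?: string
  apiEndpoint?: string
  outputModalities?: string[]
}

// Assumed ID patterns for known video model families (Seedance is the one named in the spec).
const KNOWN_VIDEO_MODEL_PATTERNS: RegExp[] = [/seedance/i]

function looksLikeVideoGenerationModel(input: VideoDetectionInput): boolean {
  // 1. Explicit metadata wins.
  if (input.type === 'videoGeneration') return true
  // Assumption: other explicit non-chat types (image, TTS, embedding) are trusted as-is,
  // while `chat` or a missing type falls through to the weaker signals.
  if (input.type && input.type !== 'chat') return false
  // 2. Endpoint hints, e.g. an apiEndpoint of '/v1/videos'.
  const endpointHint = `${input.endpointType ?? ''} ${input.apiEndpoint ?? ''}`.toLowerCase()
  if (endpointHint.includes('video')) return true
  // 3. Declared output modalities.
  if (input.outputModalities?.some((m) => m.toLowerCase() === 'video')) return true
  // 4. Known model ID patterns, for upstream metadata that is missing or still says `chat`.
  return KNOWN_VIDEO_MODEL_PATTERNS.some((pattern) => pattern.test(input.modelId))
}
```

Ordering the checks this way lets explicit `videoGeneration` metadata override everything, while ID patterns only act as a last resort for incomplete upstream data.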

## Affected Areas
- Shared types/contracts:
  - `src/shared/model.ts`
  - `src/shared/types/model-db.ts`
  - `src/shared/types/presenters/llmprovider.presenter.d.ts`
  - `src/shared/types/presenters/legacy.presenters.d.ts`
  - `src/shared/videoGenerationSettings.ts` (new)
- Main runtime/provider:
  - `src/main/presenter/configPresenter/index.ts`
  - `src/main/presenter/configPresenter/modelConfig.ts`
  - `src/main/presenter/llmProviderPresenter/index.ts`
  - `src/main/presenter/llmProviderPresenter/providers/aiSdkProvider.ts`
  - `src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts`
- Renderer:
  - `src/renderer/src/composables/useModelTypeDetection.ts`
  - `src/renderer/src/components/chat/messageListItems.ts`
  - `src/renderer/src/components/message/MessageItemAssistant.vue`
  - `src/renderer/src/components/message/MessageBlockVideo.vue` (new)
  - `src/renderer/settings/components/ProviderModelList.vue`
- Model DB:
  - `resources/model-db/providers.json`
- `resources/model-db/providers.json`

## Compatibility
- Existing text, image, and TTS paths remain unchanged.
- Existing assistant block persistence remains compatible by reusing the current media payload field rather than changing the storage shape.
- Future video models can plug in through shared detection helpers or explicit `videoGeneration` metadata.
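On the renderer side, MIME detection over a reused media payload might look like the following hypothetical helper. The `MediaPayload` shape and the `detectMediaKind` name are assumptions (the actual assistant block fields are not shown in this PR excerpt); the point is the ordering: explicit MIME type first, then a `data:` URL prefix, then a URL extension, with `image` as the default so existing persisted blocks render as before.

```typescript
// Hypothetical payload shape; the real assistant block fields may be named differently.
interface MediaPayload {
  data: string // a data: URL, an http(s) URL, or raw base64
  mimeType?: string
}

type MediaKind = 'image' | 'video'

// Assumed list of container extensions treated as video.
const VIDEO_EXTENSIONS = ['.mp4', '.webm', '.mov', '.m4v']

function detectMediaKind(payload: MediaPayload): MediaKind {
  // 1. Trust an explicit MIME type when the provider supplied one.
  const mime = (payload.mimeType ?? '').toLowerCase()
  if (mime.startsWith('video/')) return 'video'
  if (mime.startsWith('image/')) return 'image'

  const data = payload.data.trim()
  // 2. A data: URL carries its own MIME type.
  if (data.toLowerCase().startsWith('data:video/')) return 'video'

  // 3. For remote URLs, check the path extension, ignoring query strings and fragments.
  if (/^https?:\/\//i.test(data)) {
    const path = (data.split(/[?#]/)[0] ?? '').toLowerCase()
    if (VIDEO_EXTENSIONS.some((ext) => path.endsWith(ext))) return 'video'
  }

  // 4. Default to the existing image path so previously stored blocks render unchanged.
  return 'image'
}
```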

## Verification Strategy
Run:
- `pnpm run typecheck`
- `pnpm run format`
- `pnpm run i18n`
- `pnpm run lint`
32 changes: 32 additions & 0 deletions docs/features/openai-compatible-video-generation/spec.md
@@ -0,0 +1,32 @@
# OpenAI-Compatible Video Generation

## User Need
Users need DeepChat to recognize and run video generation models such as `doubao-seedance-2-0-fast-260128` through the same model-driven provider flow used by text and audio generation, without hardcoding one-off provider logic for each future video model.

## Goal
Enable first-class video generation routing in DeepChat for OpenAI-compatible providers, starting with AIHubMix Seedance models and leaving a compatibility layer for future video models.

## Acceptance Criteria
1. Shared model/type contracts support `videoGeneration` and preserve compatibility with existing model metadata.
2. DeepChat can recognize `doubao-seedance-2-0-fast-260128` as a video generation model even when upstream metadata is incomplete or still marked as `chat`.
3. Main runtime can route video generation requests through an OpenAI-compatible `/v1/videos` flow.
4. Video generation responses are normalized into a stable internal result shape that future providers/models can reuse.
5. Generated video output reaches the existing assistant message pipeline and renders in the chat UI.
6. Validation commands pass:
   - `pnpm run typecheck`
   - `pnpm run format`
   - `pnpm run i18n`
   - `pnpm run lint`
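Criterion 4 asks for a stable internal result shape. One hedged way to get there is a single normalizer that tolerates several response variants. The fields read here (`status`, `video_url`, `data[0].url`, `task_id`, `error.message`) are assumptions about what OpenAI-compatible gateways may return, not a documented contract, and `NormalizedVideoResult` is a hypothetical name.

```typescript
// Hypothetical internal status set and result shape.
type VideoJobStatus = 'pending' | 'running' | 'succeeded' | 'failed'

interface NormalizedVideoResult {
  id: string | null
  status: VideoJobStatus
  videoUrl: string | null
  error: string | null
}

// Return the first non-empty string among the candidates.
function pickString(...candidates: unknown[]): string | null {
  for (const value of candidates) {
    if (typeof value === 'string' && value.trim() !== '') return value
  }
  return null
}

// Map assumed provider status strings onto the internal set.
function mapStatus(raw: string): VideoJobStatus {
  switch (raw.toLowerCase()) {
    case 'completed':
    case 'succeeded':
    case 'success':
      return 'succeeded'
    case 'failed':
    case 'error':
    case 'cancelled':
      return 'failed'
    case 'in_progress':
    case 'running':
    case 'processing':
      return 'running'
    default:
      return 'pending' // 'queued', 'pending', or anything unrecognized
  }
}

function normalizeVideoResponse(body: unknown): NormalizedVideoResult {
  const obj = (body !== null && typeof body === 'object' ? body : {}) as Record<string, any>
  const first = Array.isArray(obj.data) ? (obj.data[0] ?? {}) : {}
  const videoUrl = pickString(obj.video_url, obj.url, first.url, obj.output?.video_url)
  const rawStatus = pickString(obj.status, obj.state)
  return {
    id: pickString(obj.id, obj.task_id),
    // A response that already carries a URL but no status is treated as finished.
    status: rawStatus ? mapStatus(rawStatus) : videoUrl ? 'succeeded' : 'pending',
    videoUrl,
    error: pickString(obj.error?.message, obj.error)
  }
}
```

Keeping provider quirks inside one function means future video models only need to extend `normalizeVideoResponse` rather than the message pipeline.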

## Constraints
- Keep the provider integration generic for OpenAI-compatible video endpoints.
- Reuse the current assistant media block pipeline where practical instead of introducing a parallel storage format.
- Advanced video editing controls and provider-specific parameter UIs are out of scope for this change.

## Non-Goals
- Dedicated video generation settings panels.
- Agent-level video generation tool configuration.
- Non-OpenAI-compatible video provider protocols.

## Open Questions
- None for current scope.
25 changes: 25 additions & 0 deletions docs/features/openai-compatible-video-generation/tasks.md
@@ -0,0 +1,25 @@
# Tasks

## Shared Types + Detection
- [x] Add `ModelType.VideoGeneration` and extend model-db parsing/schema for `videoGeneration`.
- [x] Add shared video detection/compatibility helpers for endpoint hints, modalities, and known model IDs.
- [x] Update model config inference to classify video models consistently in main and renderer flows.
- [x] Extend session generation settings/contracts and draft state to carry `videoGeneration` options.
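Carrying `videoGeneration` through session settings and storage implies sanitizing untrusted values first. A minimal sketch, assuming a `{ duration, seconds, size }` option shape (only `duration` and `seconds` are named elsewhere in these docs; `size` is hypothetical) and a hypothetical `sanitizeVideoGenerationOptions` function standing in for the real normalizer:

```typescript
// Hypothetical option shape. Only `duration` and `seconds` are named in these docs;
// `size` is an assumed extra field.
interface VideoGenerationOptions {
  duration?: number
  seconds?: number
  size?: string
}

// Accept positive integers, including numeric strings coming from form inputs or storage.
function toPositiveInt(value: unknown): number | undefined {
  const n = typeof value === 'string' ? Number(value.trim()) : value
  return typeof n === 'number' && Number.isInteger(n) && n > 0 ? n : undefined
}

function sanitizeVideoGenerationOptions(raw: unknown): VideoGenerationOptions | undefined {
  if (raw === null || typeof raw !== 'object' || Array.isArray(raw)) return undefined
  const source = raw as Record<string, unknown>
  const result: VideoGenerationOptions = {}

  const duration = toPositiveInt(source.duration)
  if (duration !== undefined) result.duration = duration
  const seconds = toPositiveInt(source.seconds)
  if (seconds !== undefined) result.seconds = seconds

  const size = source.size
  if (typeof size === 'string' && /^\d+x\d+$/.test(size.trim())) result.size = size.trim()

  // An empty result becomes undefined so callers can drop the key instead of storing `{}`.
  return Object.keys(result).length > 0 ? result : undefined
}
```

Returning `undefined` for "nothing valid" matches the `if (videoGeneration) ... else delete ...` pattern used by callers in the agent runtime diff.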

## Runtime + Provider
- [x] Add `generateVideoStandalone` presenter contracts and implementation.
- [x] Add OpenAI-compatible `/v1/videos` request/response normalization in the AI SDK runtime/provider path.
- [x] Persist and sanitize session-level video generation settings through the agent runtime and SQLite storage.
- [ ] Mark Seedance built-in model metadata as `videoGeneration` where available.
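The `/v1/videos` flow is asynchronous: a POST creates a job and the client polls until it finishes. This sketch assumes OpenAI-style job objects with `id` and `status` fields and the status values `queued`, `in_progress`, `completed`, and `failed`; other gateways may differ. The HTTP layer is injected as a hypothetical `JsonRequest` function, and `sleep` is injected too (in Node or Electron, `(ms) => new Promise((r) => setTimeout(r, ms))` would do), which keeps the loop testable without real timers.

```typescript
// Hypothetical injected HTTP helper: resolves to the parsed JSON body.
type JsonRequest = (method: 'GET' | 'POST', path: string, body?: unknown) => Promise<unknown>

interface VideoJob {
  id: string
  status: string
}

interface PollOptions {
  maxPolls: number
  intervalMs: number
  sleep: (ms: number) => Promise<void>
}

// Validate the minimal fields this sketch relies on.
function toVideoJob(raw: unknown): VideoJob {
  const obj = (raw !== null && typeof raw === 'object' ? raw : {}) as Record<string, unknown>
  const id = obj.id
  const status = obj.status
  if (typeof id !== 'string' || typeof status !== 'string') {
    throw new Error('Unexpected /v1/videos response: missing id or status')
  }
  return { id, status: status.toLowerCase() }
}

async function createAndPollVideo(
  request: JsonRequest,
  body: Record<string, unknown>,
  options: PollOptions
): Promise<VideoJob> {
  // Creating the job returns right away with an id and an initial status such as 'queued'.
  let job = toVideoJob(await request('POST', '/v1/videos', body))
  for (let polls = 0; ; polls++) {
    if (job.status === 'completed') return job
    if (job.status === 'failed' || job.status === 'cancelled') {
      throw new Error(`Video job ${job.id} ended with status "${job.status}"`)
    }
    if (polls >= options.maxPolls) {
      throw new Error(`Video job ${job.id} still "${job.status}" after ${polls} polls`)
    }
    await options.sleep(options.intervalMs)
    job = toVideoJob(await request('GET', `/v1/videos/${encodeURIComponent(job.id)}`))
  }
}
```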

## Renderer
- [x] Expose video model detection for UI behavior alignment.
- [x] Add assistant message rendering for generated video media.
- [x] Update model list/type display for video generation models.
- [x] Expose video generation settings in chat status bar and model config dialog flows.

## Validation
- [x] Run `pnpm run typecheck`.
- [x] Run `pnpm run format`.
- [x] Run `pnpm run i18n`.
- [x] Run `pnpm run lint`.
20 changes: 20 additions & 0 deletions docs/issues/merge-dev-into-gen-video/plan.md
@@ -0,0 +1,20 @@
# Plan

## Scope
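Merge `origin/dev` into the current `gen-video` branch, identify and resolve the conflicting files, keep the necessary changes from both sides, and run the baseline checks the repository requires.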

## Implementation decisions
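- Run `git fetch origin dev` first, then `git merge origin/dev`, so the merge is based on the latest remote `dev`.
- Before resolving conflicts, read the surrounding context of each conflicting file and make the smallest change that follows the file's existing patterns.
- Apply the same minimal-diff principle to conflicts in documentation or configuration; do not use the merge as an opportunity to tidy unrelated content.
- After the merge, run the repository-required `pnpm run format`, `pnpm run i18n`, and `pnpm run lint`. If a command fails, record where it failed and tell the user.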

## Risks and mitigations
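- Risk: the conflicting files are numerous and scattered, so logic from one side is easy to delete by mistake.
- Mitigation: read the context of each conflict hunk, file by file, before editing, and review the diff afterwards.
- Risk: formatting or lint may surface pre-existing issues that interfere with this validation.
- Mitigation: separate newly introduced issues from pre-existing ones and state the distinction clearly to the user.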

## Test strategy
- Use `git status` to confirm that no conflicts remain.
- Use the format, i18n, and lint commands to validate the repository state after the merge.
23 changes: 23 additions & 0 deletions docs/issues/merge-dev-into-gen-video/spec.md
@@ -0,0 +1,23 @@
# Merge dev into gen-video

## User stories
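- As a developer on the `gen-video` branch, I need the latest `dev` changes merged into the current branch so I can continue development on top of the latest mainline.
- As a reviewer, I need the conflict resolution to have a clear scope, limited to the necessary files, that keeps the valid, completed changes from both sides.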

## Acceptance criteria
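- The current branch has merged `origin/dev` with no unresolved merge conflicts.
- Conflicting files are resolved with minimal changes, without refactors unrelated to this merge.
- After the merge, the working tree is ready to commit, and the relevant validation commands have been run with their results recorded.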

## Non-goals
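- Do not implement new product features in this task.
- Do not proactively change the style of existing code unrelated to the conflicts.
- Do not create a commit unless the user explicitly asks for one.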

## Constraints
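- Handle only the conflicts produced by merging `dev` into the current `gen-video` branch.
- Follow the repository's existing SDD, formatting, i18n, and lint conventions.
- When logic from both sides must be kept, prefer a compatible merge built on the existing implementation over a rewrite.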

## Open questions
- None.
8 changes: 8 additions & 0 deletions docs/issues/merge-dev-into-gen-video/tasks.md
@@ -0,0 +1,8 @@
# Tasks

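1. Fetch the latest `origin/dev` and confirm the current branch state.
2. Create the SDD documents for this merge, recording scope, constraints, and validation approach.
3. Run `git merge origin/dev` and locate all conflicting files.
4. Read the context of each conflicting file and resolve the conflicts one at a time, keeping the necessary changes.
5. Run `pnpm run format`, `pnpm run i18n`, and `pnpm run lint`.
6. Summarize the results and follow-up recommendations, then wait for the user to decide whether to commit.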
@@ -0,0 +1,21 @@
# Plan

## Approach
Add a small runtime helper that extracts an integer duration from obvious prompt hints. The helper applies only when no structured video duration is set and the parsed value is supported by the active model. Reuse its result for both the request trace and the actual `/videos` request body.

## Implementation
- Add a focused runtime test that exercises the OpenAI-compatible `/videos` flow and asserts `duration: 2` is sent for prompts like `... 2s`.
- Add a conservative prompt-duration extractor for `Ns`, `N sec`, `N seconds`, and `N秒` (Chinese for "N seconds").
- Enforce model-specific validity before injecting the derived duration (for Seedance, 4 to 15 seconds).
- Apply the fallback only when `videoGeneration.duration` and `videoGeneration.seconds` are both unset.
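A conservative extractor for the four hint forms listed above could look like this. It is a sketch under stated assumptions: it only inspects the end of the prompt (following the spec's "prompt suffixes" wording), rejects decimals such as `1.5s`, and takes the valid range as a parameter instead of hardcoding Seedance's limits. `extractPromptDuration` and `DurationRange` are hypothetical names.

```typescript
// Hypothetical per-model bounds in whole seconds (the docs cite 4 to 15 for Seedance).
interface DurationRange {
  min: number
  max: number
}

// A duration hint at the very end of the prompt: "5s", "5 sec", "5 seconds", or "5秒",
// optionally followed by whitespace or sentence punctuation. The lookbehind rejects
// digits glued to ASCII letters, digits, or a decimal point (e.g. "1.5s" or "mp4s").
const DURATION_SUFFIX = /(?<![\w.])(\d{1,3})\s*(?:seconds?|secs?|s|秒)[\s.!?。!?]*$/i

function extractPromptDuration(prompt: string, range: DurationRange): number | undefined {
  const match = DURATION_SUFFIX.exec(prompt.trim())
  if (!match) return undefined
  const seconds = Number.parseInt(match[1] ?? '', 10)
  // Out-of-range values are dropped rather than clamped, so the request falls back to
  // whatever default the provider applies.
  return seconds >= range.min && seconds <= range.max ? seconds : undefined
}
```

Anchoring to the end of the prompt keeps hints like "5s of rain" in the middle of a sentence from being misread as a duration.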

## Affected Files
- `src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts`
- `test/main/presenter/llmProviderPresenter/aiSdkRuntime.test.ts`
- `docs/issues/openai-compatible-video-prompt-duration-fallback/tasks.md`

## Validation
- Focused AI SDK runtime tests for video request bodies.
- `pnpm run format`
- `pnpm run i18n`
- `pnpm run lint`
@@ -0,0 +1,25 @@
# OpenAI-Compatible Video Prompt Duration Fallback

## User Need
When users send prompts such as `生成 马斯克 喝酒的视频 2s` ("generate a video of Musk drinking, 2s") to OpenAI-compatible video models, DeepChat should turn the obvious duration hint into a structured `duration` parameter instead of sending only the raw prompt body.

## Goal
Infer an explicit video duration from clear prompt suffixes like `5s` or `5秒` when the session has no structured video duration configured and the parsed value is valid for the target model.

## Acceptance Criteria
1. OpenAI-compatible video requests derive `duration` from obvious prompt hints when neither `duration` nor `seconds` is already configured and the parsed value is valid for the current model.
2. Explicit structured video settings still take precedence over any prompt-derived fallback.
3. The emitted request trace matches the actual `/videos` body for this fallback.
4. Focused validation passes for the touched runtime slice.

## Constraints
- Keep the fallback narrow and conservative; do not attempt broad natural-language parameter parsing.
- Preserve existing request-shape compatibility and polling behavior.

## Non-Goals
- Adding or changing video settings UI.
- Parsing arbitrary style, ratio, or resolution hints from prompts.
- Changing provider safety or moderation behavior.

## Open Questions
- None.
@@ -0,0 +1,11 @@
# Tasks

## Runtime Fallback
- [x] Add a runtime regression test for prompt-derived video duration.
- [x] Apply a conservative prompt duration fallback before building `/videos` requests.

## Validation
- [x] Run focused AI SDK runtime tests.
- [x] Run `pnpm run format`.
- [x] Run `pnpm run i18n`.
- [x] Run `pnpm run lint`.
79 changes: 70 additions & 9 deletions src/main/presenter/agentRuntimePresenter/index.ts
@@ -59,6 +59,11 @@ import {
} from '@shared/imageGenerationSettings'
import { ApiEndpointType, ModelType, isDeepSeekSeriesModelId } from '@shared/model'
import { isTtsModelConfig, isTtsModelId } from '@shared/ttsSettings'
+import {
+isVideoGenerationModelConfig,
+normalizeVideoGenerationOptions,
+supportsOpenAICompatibleVideoGeneration
+} from '@shared/videoGenerationSettings'
import { nanoid } from 'nanoid'
import type { SQLitePresenter } from '../sqlitePresenter'
import { eventBus, SendTarget } from '@/eventbus'
@@ -1434,7 +1439,8 @@ export class AgentRuntimePresenter implements IAgentImplementation {

private shouldUseDeepChatContextBudget(
providerId?: string | null,
-modelConfig?: Pick<ModelConfig, 'apiEndpoint' | 'endpointType' | 'type'> | null
+modelConfig?: Pick<ModelConfig, 'apiEndpoint' | 'endpointType' | 'type'> | null,
+modelId?: string | null
): boolean {
if (providerId?.trim() === 'acp') {
return false
@@ -1456,22 +1462,28 @@ export class AgentRuntimePresenter implements IAgentImplementation {
return false
}

+if (isVideoGenerationModelConfig(modelConfig, modelId?.trim() || '')) {
+return false
+}
+
return true
}

private shouldBypassDeepChatContextBudget(
providerId?: string | null,
-modelConfig?: Pick<ModelConfig, 'apiEndpoint' | 'endpointType' | 'type'> | null
+modelConfig?: Pick<ModelConfig, 'apiEndpoint' | 'endpointType' | 'type'> | null,
+modelId?: string | null
): boolean {
-return !this.shouldUseDeepChatContextBudget(providerId, modelConfig)
+return !this.shouldUseDeepChatContextBudget(providerId, modelConfig, modelId)
}

private resolveDeepChatContextBudgetLength(
providerId: string | null | undefined,
contextLength: number,
-modelConfig?: Pick<ModelConfig, 'apiEndpoint' | 'endpointType' | 'type'> | null
+modelConfig?: Pick<ModelConfig, 'apiEndpoint' | 'endpointType' | 'type'> | null,
+modelId?: string | null
): number {
-return this.shouldBypassDeepChatContextBudget(providerId, modelConfig)
+return this.shouldBypassDeepChatContextBudget(providerId, modelConfig, modelId)
? Number.MAX_SAFE_INTEGER
: contextLength
}
@@ -1655,7 +1667,7 @@ export class AgentRuntimePresenter implements IAgentImplementation {
throw new Error(`Session ${sessionId} not found`)
}
const modelConfig = this.configPresenter.getModelConfig(state.modelId, state.providerId)
-if (this.shouldBypassDeepChatContextBudget(state.providerId, modelConfig)) {
+if (this.shouldBypassDeepChatContextBudget(state.providerId, modelConfig, state.modelId)) {
throw new Error('Manual compaction is only available for DeepChat agent sessions.')
}
if (state.status !== 'idle') {
@@ -1676,7 +1688,8 @@ export class AgentRuntimePresenter implements IAgentImplementation {
const contextBudgetLength = this.resolveDeepChatContextBudgetLength(
state.providerId,
generationSettings.contextLength,
-modelConfig
+modelConfig,
+state.modelId
)
const maxTokens = capAgentRequestMaxTokens(generationSettings.maxTokens, contextBudgetLength)
const activeSkillNames = await this.resolveActiveSkillNamesForToolProfile(sessionId)
@@ -1898,7 +1911,8 @@ export class AgentRuntimePresenter implements IAgentImplementation {
const contextBudgetLength = this.resolveDeepChatContextBudgetLength(
state.providerId,
generationSettings.contextLength,
-baseModelConfig
+baseModelConfig,
+state.modelId
)
const capabilityProviderId = this.resolveCapabilityProviderId(state.providerId, state.modelId)
const reasoningPortrait = this.getReasoningPortrait(state.providerId, state.modelId)
@@ -1913,6 +1927,7 @@ export class AgentRuntimePresenter implements IAgentImplementation {
reasoningVisibility: generationSettings.reasoningVisibility,
verbosity: generationSettings.verbosity,
imageGeneration: generationSettings.imageGeneration,
+videoGeneration: generationSettings.videoGeneration,
reasoning: getReasoningEffectiveEnabledForProvider(capabilityProviderId, reasoningPortrait, {
reasoning: baseModelConfig.reasoning,
reasoningEffort: generationSettings.reasoningEffort ?? baseModelConfig.reasoningEffort
@@ -2601,7 +2616,8 @@ export class AgentRuntimePresenter implements IAgentImplementation {
const contextBudgetLength = this.resolveDeepChatContextBudgetLength(
state.providerId,
generationSettings.contextLength,
-modelConfig
+modelConfig,
+state.modelId
)
const maxTokens = capAgentRequestMaxTokens(generationSettings.maxTokens, contextBudgetLength)
const projectDir = this.resolveProjectDir(sessionId)
@@ -3435,6 +3451,22 @@ export class AgentRuntimePresenter implements IAgentImplementation {
}
}

+if (
+supportsOpenAICompatibleVideoGeneration({
+providerId,
+providerApiType: this.resolveProviderApiType(providerId),
+modelId,
+apiEndpoint: modelConfig.apiEndpoint,
+endpointType: modelConfig.endpointType,
+type: modelConfig.type
+})
+) {
+const videoGeneration = normalizeVideoGenerationOptions(modelConfig.videoGeneration)
+if (videoGeneration) {
+defaults.videoGeneration = videoGeneration
+}
+}
+
const supportsReasoning =
this.configPresenter.supportsReasoningCapability?.(providerId, modelId) === true
if (supportsReasoning) {
@@ -3679,6 +3711,35 @@ export class AgentRuntimePresenter implements IAgentImplementation {
delete next.imageGeneration
}

+if (
+supportsOpenAICompatibleVideoGeneration({
+providerId,
+providerApiType: this.resolveProviderApiType(providerId),
+modelId,
+apiEndpoint: modelConfig.apiEndpoint,
+endpointType: modelConfig.endpointType,
+type: modelConfig.type
+})
+) {
+if (Object.prototype.hasOwnProperty.call(patch, 'videoGeneration')) {
+const videoGeneration = normalizeVideoGenerationOptions(patch.videoGeneration)
+if (videoGeneration) {
+next.videoGeneration = videoGeneration
+} else {
+delete next.videoGeneration
+}
+} else {
+const videoGeneration = normalizeVideoGenerationOptions(next.videoGeneration)
+if (videoGeneration) {
+next.videoGeneration = videoGeneration
+} else {
+delete next.videoGeneration
+}
+}
+} else {
+delete next.videoGeneration
+}
+
if (fixedTemperatureKimi) {
next.temperature = fixedTemperatureKimi.temperature
}
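The session-settings hunk in `agentRuntimePresenter/index.ts` distinguishes three cases: the model does not support OpenAI-compatible video generation (drop the key), the patch explicitly contains `videoGeneration` (normalize the patched value and delete the key if it normalizes to nothing, which is how a client can clear it), and the patch omits the key (re-normalize whatever is already stored). The sketch isolates only that logic; merging the rest of the patch is out of scope. `applyVideoGenerationPatch`, `normalizeOptions`, and the `{ duration }` option shape are hypothetical stand-ins for the real `normalizeVideoGenerationOptions` and settings types.

```typescript
// Hypothetical stand-ins for the real settings types.
interface VideoOptions {
  duration?: number
}

interface SessionSettings {
  temperature?: number
  videoGeneration?: VideoOptions
}

// Stand-in normalizer: keep a positive integer duration, otherwise report "nothing set".
function normalizeOptions(value: unknown): VideoOptions | undefined {
  if (value === null || typeof value !== 'object') return undefined
  const duration = (value as Record<string, unknown>).duration
  return typeof duration === 'number' && Number.isInteger(duration) && duration > 0
    ? { duration }
    : undefined
}

function applyVideoGenerationPatch(
  current: SessionSettings,
  patch: Record<string, unknown>,
  supportsVideo: boolean
): SessionSettings {
  const next: SessionSettings = { ...current }
  if (!supportsVideo) {
    // Models without OpenAI-compatible video support never carry the key.
    delete next.videoGeneration
    return next
  }
  // An own `videoGeneration` key, even one set to null, is an explicit update;
  // when the key is absent, the stored value is re-normalized instead.
  const source = Object.prototype.hasOwnProperty.call(patch, 'videoGeneration')
    ? patch.videoGeneration
    : next.videoGeneration
  const normalized = normalizeOptions(source)
  if (normalized) {
    next.videoGeneration = normalized
  } else {
    delete next.videoGeneration
  }
  return next
}
```

Using `hasOwnProperty` rather than a truthiness check is what lets an explicit `null` clear the setting while an unrelated patch (for example, a temperature change) leaves it intact.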