
Commit 0d11419

Refactor video gen (#765)
1 parent ecbbfe1 commit 0d11419

67 files changed

+3697
-11012
lines changed

.gitattributes

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
docs/resources/a-history-of-us-gdp.mp4 filter=lfs diff=lfs merge=lfs -text
2+
docs/resources/deepspeed-zero.mp4 filter=lfs diff=lfs merge=lfs -text

README.md

Lines changed: 47 additions & 0 deletions
@@ -39,6 +39,7 @@ MS-Agent is a lightweight framework designed to empower agents with autonomous e
3939
- **Multi-Agent for general purpose**: Chat with agent with tool-calling capabilities based on MCP.
4040
- **Deep Research**: To enable advanced capabilities for autonomous exploration and complex task execution.
4141
- **Code Generation**: Supports code generation tasks with artifacts.
42+
- **Short Video Generation**: Supports generating short videos of about 5 minutes.
4243
- **Agent Skills**: Implementation of [Anthropic-Agent-Skills](https://docs.claude.com/en/docs/agents-and-tools/agent-skills) Protocol.
4344
- **Lightweight and Extensible**: Easy to extend and customize for various applications.
4445

@@ -52,6 +53,8 @@ MS-Agent is a lightweight framework designed to empower agents with autonomous e
5253

5354
## 🎉 News
5455

56+
* 🎬 Nov 13, 2025: Release Singularity Cinema, which supports short video generation for complex scenarios. See [here](projects/singularity_cinema/README_EN.md) for details.
57+
5558
* 🚀 Nov 12, 2025: Release MS-Agent v1.5.0, which includes the following updates:
5659
- 🔥 We present [FinResearch](projects/fin_research/README.md), a multi-agent workflow tailored for financial research
5760
- Support financial data collection via [Akshare](https://github.com/akfamily/akshare) and [Baostock](http://baostock.com/mainContent?file=home.md)
@@ -484,6 +487,47 @@ aggregator:
484487
- README: [FinResearch](projects/fin_research/README.md)
485488
- Documentation: [MS-Agent Documentation](https://ms-agent-en.readthedocs.io/en/latest/Projects/FinResearch.html)
486489
490+
### Singularity Cinema
491+
492+
Singularity Cinema is an Agent-powered workflow for generating short videos, capable of producing high-quality complex short videos using either a single-sentence prompt or knowledge-based documents.
493+
494+
#### Core Features
495+
496+
- 🎬 **Supports Both Simple and Complex Requirements**: Can work with a single-sentence description or handle complex information files
497+
498+
- 🎹 **Sophisticated Tables and Formulas**: Can display and interpret formulas and charts within short videos that correspond to the script
499+
500+
- 🎮 **End-to-End**: From requirements to script to storyboard, from voiceover to charts to subtitles, and finally human feedback and video generation—the entire end-to-end process completed with a single command
501+
502+
- 🏁 **High Configurability**: Voice, style, and materials can all be adjusted through simple configuration
503+
504+
- 🚧 **Customizable**: Clear and simple workflow, suitable for secondary development
505+
506+
#### Quick Start
507+
508+
**Usage Example**:
509+
510+
```bash
511+
OPENAI_API_KEY=xxx-xxx T2I_API_KEY=ms-xxx-xxx ms-agent run --config "projects/singularity_cinema" --query "Your custom topic" --load_cache true --trust_remote_code true
512+
```
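The same invocation can be scripted. The following is a minimal sketch, assuming the `ms-agent` executable is on `PATH` and the working directory is the repository root; the helper names `build_run_command` and `run_singularity_cinema` are illustrative and not part of MS-Agent.

```python
import os
import subprocess


def build_run_command(query: str, config: str = "projects/singularity_cinema") -> list[str]:
    """Build the argv list for the `ms-agent run` command shown above."""
    return [
        "ms-agent", "run",
        "--config", config,
        "--query", query,
        "--load_cache", "true",
        "--trust_remote_code", "true",
    ]


def run_singularity_cinema(query: str, llm_key: str, t2i_key: str) -> int:
    """Run the workflow with both API keys passed through the environment."""
    # Copy the current environment and add the two keys used in the example above.
    env = dict(os.environ, OPENAI_API_KEY=llm_key, T2I_API_KEY=t2i_key)
    # Assumption: `ms-agent` is installed and resolvable on PATH.
    return subprocess.run(build_run_command(query), env=env, check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems when the query contains spaces or quotes.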
513+
514+
**Results**:
515+
516+
<video src="docs/resources/deepspeed-zero.mp4" controls="controls" style="max-width: 730px;">
517+
</video>
518+
519+
**An introduction to DeepSpeed ZeRO**
520+
521+
<video src="docs/resources/a-history-of-us-gdp.mp4" controls="controls" style="max-width: 730px;">
522+
</video>
523+
524+
**A history of US GDP**
525+
526+
#### References
527+
528+
- [Complete Documentation](./docs/zh/Projects/短视频生成.md)
529+
530+
487531
<br>
488532
489533
### Interesting works
@@ -499,6 +543,9 @@ We are committed to continuously improving and expanding the MS-Agent framework
499543
- [ ] **FinResearch** – A financial deep-research agent dedicated to in-depth analysis and research in the finance domain.
500544
- [x] Long-term deep financial analysis report generation
501545
- [ ] Near real-time event-driven report generation
546+
- [ ] **Singularity Cinema**
547+
- [ ] Support more complex scenarios
548+
- [ ] Improve stability
502549
- [ ] **Multimodal Agentic Search** – Supporting large-scale multimodal document retrieval and generation of search results combining text and images.
503550
- [ ] Enhanced **Agent Skills** – Providing a richer set of predefined skills and tools to expand agent capabilities and enabling multi-skill collaboration for complex task execution.
504551
- [ ] **Agent-Workstation** - A unified WebUI with one-click local deployment that combines all agent capabilities of MS-Agent, such as AgentChat, MCP, AgentSkills, DeepResearch, DocResearch, CodeScratch, etc.

README_ZH.md

Lines changed: 48 additions & 1 deletion
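@@ -38,7 +38,8 @@ MS-Agent is a lightweight framework designed to give agents autonomous exploration capabilities.
3838

3939
- **General-Purpose Multi-Agent**: Chat with agents that have MCP-based tool-calling capabilities.
4040
- **Deep Research**: Enables advanced capabilities for autonomous exploration and complex task execution.
41-
- **Code Generation**: Supports code generation tasks with artifacts.
41+
- **Code Generation**: Supports code generation tasks for complex projects.
42+
- **Short Video Generation**: Supports generating short videos of about 5 minutes.
4243
- **Agent Skills**: Compatible with the Anthropic-Agent-Skills protocol, implementing agent skill modules.
4344
- **Lightweight and Extensible**: Easy to extend and customize for various applications.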
4445

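@@ -50,6 +51,8 @@ MS-Agent is a lightweight framework designed to give agents autonomous exploration capabilities.
5051

5152
## 🎉 News
5253

54+
* 🎬 Nov 13, 2025: Release Singularity Cinema, for producing short videos of complex scenarios from knowledge-based documents. See [here](projects/singularity_cinema/README.md) for details.
55+
5356
* 🚀 Nov 12, 2025: Release MS-Agent v1.5.0, which includes the following updates:
5457
- 🔥 Add [**FinResearch**](projects/fin_research/README.md), which supports deep research and analysis in the finance domain
5558
- Support financial data collection tools based on [Akshare](https://github.com/akfamily/akshare) and [Baostock](http://baostock.com/mainContent?file=home.md)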
@@ -481,6 +484,47 @@ aggregator:
481484
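- README: see [FinResearch](projects/fin_research/README_zh.md)
482485
- Documentation: see [MS-Agent Documentation](https://ms-agent.readthedocs.io/zh-cn/latest/Projects/%E9%87%91%E8%9E%8D%E6%B7%B1%E5%BA%A6%E7%A0%94%E7%A9%B6.html)
483486
487+
### Singularity Cinema
488+
489+
Singularity Cinema is an agent-driven workflow for generating short videos. It can produce high-quality, complex short videos from either a single-sentence prompt or knowledge-based documents.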
490+
491+
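#### 1) Core Features
492+
493+
- 🎬 **Supports Both Simple and Complex Requirements**: You can describe the requirement in a single sentence or provide complex information files
494+
495+
- 🎹 **Sophisticated Tables and Formulas**: Can display, within the short video, formulas and chart interpretations that correspond to the script
496+
497+
- 🎮 **End-to-End**: From requirements to script to storyboard, from narration audio to charts to subtitles, and finally human feedback and video generation; the whole end-to-end process runs with a single command
498+
499+
- 🏁 **Configurability**: Highly configurable; voice, style, and materials can all be adjusted through simple configuration
500+
501+
- 🚧 **Customization**: Clear and simple workflow, suitable for secondary development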
502+
503+
#### 2) Quick Start
504+
505+
**Usage Example**:
506+
507+
508+
```bash
509+
OPENAI_API_KEY=xxx-xxx T2I_API_KEY=ms-xxx-xxx ms-agent run --config "projects/singularity_cinema" --query "Your custom topic" --load_cache true --trust_remote_code true
510+
```
511+
512+
**Results**
513+
514+
<video src="docs/resources/deepspeed-zero.mp4" controls="controls" style="max-width: 730px;">
515+
</video>
516+
517+
**An introduction to DeepSpeed ZeRO**
518+
519+
<video src="docs/resources/a-history-of-us-gdp.mp4" controls="controls" style="max-width: 730px;">
520+
</video>
521+
522+
**A history of US GDP**
523+
524+
#### 3) References
525+
526+
- [Complete Documentation](./docs/zh/Projects/短视频生成.md)
527+
484528
<br>
485529

486530
### Interesting works
@@ -496,6 +540,9 @@ aggregator:
496540
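- [ ] **FinResearch**, a financial deep-research agent - Dedicated to in-depth research and analysis in the finance domain.
497541
- [x] Long-term deep financial analysis report generation
498542
- [ ] Near real-time event-driven briefing generation
543+
- [ ] **Singularity Cinema**
544+
- [ ] Support more complex short video scenarios
545+
- [ ] Improve stability
499546
- [ ] **Multimodal Agentic Search**, multimodal retrieval-augmented generation - Supports large-scale multimodal document retrieval and generation of search results combining text and images.
500547
- [ ] Enhanced **Agent Skills** - Provides more predefined skills and tools to extend the boundaries of agent capabilities, and supports multi-skill collaboration for complex task execution.
501548
- [ ] Unified WebUI **Agent-Workstation**, with one-click local deployment, integrating all agent capabilities of MS-Agent, such as AgentChat, MCP, AgentSkills, DeepResearch, DocResearch, CodeScratch, etc.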
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
# Contributor Guide
2+
3+
## Workflow Contribution
4+
5+
MS-Agent is designed as an end-to-end workflow Agent framework based on single-command or single-search-box mode. It supports direct loading of external code and configurations:
6+
7+
```shell
8+
ms-agent run --config local-dir --trust_remote_code true
9+
# or
10+
ms-agent run --config group/model-id --trust_remote_code true
11+
```
12+
13+
The two methods above can load configurations and code from local directories or ModelScope model repositories respectively and run them. Based on this, secondary development for MS-Agent is not limited to direct PRs to the GitHub repository. Developers can use our basic capabilities and host their code in ModelScope model repositories. Users only need to specify the repository ID to use your code workflow.
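To illustrate the distinction between the two `--config` forms, here is a minimal sketch that classifies a value as a local directory or a repository ID. The function name, the regular expression, and the rule of checking the local path first are assumptions made for illustration; they are not MS-Agent's actual resolution logic.

```python
import os
import re

# Hypothetical pattern for a "group/model-id" style repository ID.
REPO_ID_PATTERN = re.compile(r"^[\w.-]+/[\w.-]+$")


def classify_config_source(config: str) -> str:
    """Return "local" for an existing directory, "repo" for a group/model-id string."""
    # An existing directory wins, so a relative path such as "projects/foo" stays local.
    if os.path.isdir(config):
        return "local"
    if REPO_ID_PATTERN.match(config):
        return "repo"
    raise ValueError(f"Not a local directory or repository ID: {config!r}")
```

For example, `classify_config_source("ms-agent/simple_workflow")` returns `"repo"` unless a directory with that relative path exists locally.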
14+
15+
This approach is very similar to the current projects under projects/*; the only difference is whether the code is loaded from a local folder or from a model repository. We provide several scaffold projects in the code repository that developers can build upon:
16+
17+
- An example of inheriting LLMAgent to implement custom logic: https://www.modelscope.cn/models/ms-agent/simple_agent_code
18+
- A custom external workflow case: https://www.modelscope.cn/models/ms-agent/simple_workflow
19+
- A custom external tool case: https://www.modelscope.cn/models/ms-agent/simple_tool_plugin
20+
- An agent example with configuration files defined: https://www.modelscope.cn/models/ms-agent/simple_agent
21+
- A slightly more complex data collection case: https://www.modelscope.cn/models/ms-agent/newspaper
22+
23+
We plan to add an external integration method based on GitHub clone, so that developers will also be able to host their code on GitHub.
24+
25+
## Developer Recognition
26+
27+
You are welcome to add your work to the "Interesting works" section of the README via PR, along with an introduction to your project. Additionally, you can provide an author.txt file at the same level as your configuration file directory and write your name in it. When developers use your workflow, they will see a message like this:
28+
29+
![](../../resources/author.jpg)
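As a rough illustration of the author.txt convention, the sketch below reads the author name from a directory. It assumes the file sits in the directory passed in and is UTF-8 encoded; `read_author` is a hypothetical helper, not the function MS-Agent uses to print the message.

```python
from pathlib import Path
from typing import Optional


def read_author(config_dir: str) -> Optional[str]:
    """Return the author name from author.txt in config_dir, or None if absent or empty."""
    author_file = Path(config_dir) / "author.txt"
    if not author_file.is_file():
        return None
    # Strip surrounding whitespace; treat an empty file as "no author".
    return author_file.read_text(encoding="utf-8").strip() or None
```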
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
1+
# SingularityCinema
2+
3+
A lightweight and excellent short video generator
4+
5+
## Installation
6+
7+
1. Clone the code
8+
```shell
9+
git clone https://github.com/modelscope/ms-agent.git
10+
cd ms-agent
11+
```
12+
13+
2. Install dependencies
14+
```shell
15+
pip install .
16+
cd projects/singularity_cinema
17+
pip install -r requirements.txt
18+
```
19+
20+
Install [ffmpeg](https://www.ffmpeg.org/download.html#build-windows).
21+
22+
Before running the installation commands above, make sure your Python version is >= 3.10. To install Python, refer to [Conda](https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html).
23+
24+
## Compatibility and Limitations
25+
26+
SingularityCinema generates scripts and storyboards based on large language models and produces short videos.
27+
28+
### Compatibility
29+
30+
- Short video types: Educational and economics videos, especially those containing charts, formulas, and explanations of principles
31+
- Language: No restrictions; subtitles and voice follow the language of your original query and document materials
32+
- Reading external materials: Plain text is supported; multimodal input is not
33+
- Secondary development: Complete code is in stepN/agent.py with no license restrictions, free for secondary development and commercial use
34+
- Please note and comply with the commercial licenses of background music and fonts you use
35+
36+
### Limitations
37+
38+
- LLM test range: Claude only; results with other models are untested
39+
- AIGC model test range: Qwen-Image only; results with other models are untested
40+
41+
## Running
42+
43+
1. Prepare API Key
44+
45+
### Prepare LLM Key
46+
47+
Taking Claude as an example, first apply for or purchase access to a Claude model. Then set the Claude key in the `OPENAI_API_KEY` environment variable:
48+
49+
```shell
50+
export OPENAI_API_KEY=xxx-xxx
51+
```
52+
53+
### Prepare ModelScope Text-to-Image Key
54+
55+
The default model is currently Qwen-Image. You can apply for a ModelScope API Key [here](https://www.modelscope.cn/my/myaccesstoken). Then set it as an environment variable:
56+
57+
```shell
58+
export T2I_API_KEY=ms-xxx-xxx
59+
```
60+
61+
2. Prepare your short video materials
62+
63+
You can choose to generate a video with a single sentence, for example:
64+
65+
```text
66+
Generate a short video describing GDP economic knowledge, approximately 3 minutes long.
67+
```
68+
69+
Or use your previously collected text materials:
70+
71+
```text
72+
Generate a short video describing large language model technology, read /home/user/llm.txt for detailed content
73+
```
74+
75+
3. Run command
76+
77+
```shell
78+
ms-agent run --config "projects/singularity_cinema" --query "Your custom theme, see description above" --load_cache true --trust_remote_code true
79+
```
80+
81+
4. The run takes approximately 20 minutes. The video is generated at output/final_video.mp4. After generation, review this file, note the parts that do not meet your requirements, and enter them at the command-line prompt; the workflow will then continue refining the video. Once the requirements are met, enter quit or exit and the program terminates automatically.
82+
83+
5. If execution fails, for example because of a URL call timeout or a file generation failure, re-run the command above. ms-agent saves execution information in the output/memory folder, so the re-run continues from where it failed.
84+
* If you want to regenerate from scratch, please rename or move the output folder elsewhere, or delete the corresponding memory and input files.
85+
* You can delete the input files of specific scenes/shots only, so that re-execution processes just those scenes/shots. This is also how the manual feedback correction in the final step works.
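The feedback round described in step 4 can be pictured as a simple loop. This is a minimal sketch under stated assumptions: `feedback_loop` and the `refine` callback are hypothetical names, and the real workflow's handling of feedback is not shown in this document.

```python
from typing import Callable, Iterable


def feedback_loop(inputs: Iterable[str], refine: Callable[[str], None]) -> int:
    """Pass each piece of feedback to `refine` until the user enters quit or exit.

    Returns the number of refinement rounds that were run.
    """
    rounds = 0
    for line in inputs:
        feedback = line.strip()
        if feedback.lower() in {"quit", "exit"}:
            break  # the user is satisfied with the video
        if not feedback:
            continue  # ignore empty input
        refine(feedback)
        rounds += 1
    return rounds
```

In an interactive session, `inputs` could be `sys.stdin`, and `refine` would trigger regeneration of the affected scenes.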
86+
87+
## Technical Principles
88+
89+
1. Generate basic script based on user requirements
90+
* Input: User requirements, may read user-specified files
91+
* Output: Script file script.txt, original requirement file topic.txt, short video name file title.txt
92+
2. Split storyboard design based on script
93+
* Input: topic.txt, script.txt
94+
* Output: segments.txt, a storyboard list describing the narration, the background image generation requirements, and the foreground manim animation requirements
95+
3. Generate audio narration for storyboards
96+
* Input: segments.txt
97+
* Output: a list of audio/audio_N.mp3 files, where N is the segment number starting from 1, and audio_info.txt in the root directory, which contains the audio durations
98+
4. Generate manim animation code based on voice duration
99+
* Input: segments.txt, audio_info.txt
100+
* Output: Manim code file list manim_code/segment_N.py, N is segment number starting from 1
101+
5. Fix manim code
102+
* Input: manim_code/segment_N.py, where N is the segment number starting from 1, and the error prediction file code_fix/code_fix_N.txt
103+
* Output: Updated manim_code/segment_N.py files
104+
6. Render manim code
105+
* Input: manim_code/segment_N.py
106+
* Output: a list of manim_render/scene_N folders. If segments.txt contains manim requirements for a step, the corresponding folder will contain a manim.mov file
107+
7. Generate text-to-image prompts
108+
* Input: segments.txt
109+
* Output: illustration_prompts/segment_N.txt, N is segment number starting from 1
110+
8. Text-to-image
111+
* Input: illustration_prompts/segment_N.txt list
112+
* Output: images/illustration_N.png list, N is segment number starting from 1
113+
9. Generate subtitles
114+
* Input: segments.txt
115+
* Output: subtitles/bilingual_subtitle_N.png list, N is segment number starting from 1
116+
10. Generate background, a solid color image with short video title and slogans
117+
* Input: title.txt
118+
* Output: background.jpg
119+
11. Composite complete video
120+
* Input: All previous file information
121+
* Output: final_video.mp4
122+
12. Human feedback
123+
124+
## Adjustable Parameters
125+
126+
Most adjustable parameters are in agent.yaml. Before running, you can modify this file for customization.
127+
128+
Some important parameters are listed below:
129+
130+
- llm: This group of parameters controls the LLM's url, apikey, etc.
131+
- generation_config: This group of parameters controls LLM generation parameters
132+
- prompt.system: Controls the system prompt for the script generation stage
133+
- To modify the system prompt for storyboard generation, edit it in step2_segment/agent.py
134+
- text2image: Text-to-image model parameters, including url, model id, etc.
135+
- t2i_transition: Background image effect; the default is the Ken Burns effect
136+
- t2i_style: Image style, you can set your desired text-to-image style
137+
- t2i_num_parallel: Text-to-image call parallelism. Default is 1 to prevent rate limiting
138+
- llm_num_parallel: LLM call parallelism, default is 10
139+
- video: Video generation bitrate and other parameters
140+
- voice/voices: edge_tts voice settings, if you have other voice options, you can add them here
141+
- subtitle_lang: Target language for translated subtitles; if not set, no translation is performed
142+
- slogan: Displayed on the right side of the screen, generally shows producer name and short video collection
143+
- fonts: The recommended fonts list
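To show what a parallelism setting such as t2i_num_parallel or llm_num_parallel typically bounds, here is a minimal sketch using a thread pool. It is an assumption that the workflow limits concurrency in a comparable way; `run_parallel` and its `task` argument are hypothetical stand-ins for the real model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_parallel(task: Callable[[str], str], prompts: Sequence[str], num_parallel: int = 1) -> list[str]:
    """Apply `task` to every prompt with at most `num_parallel` calls in flight.

    Results are returned in the same order as the prompts.
    """
    with ThreadPoolExecutor(max_workers=num_parallel) as pool:
        return list(pool.map(task, prompts))
```

With `num_parallel=1` the calls run one at a time, which matches the stated reason for the text-to-image default: avoiding rate limiting.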

docs/en/index.rst

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,7 @@ MS-Agent DOCUMENTATION
2121
Components/Workflow.md
2222
Components/SupportedModels.md
2323
Components/Tools.md
24+
Components/ContributorGuide.md
2425

2526
.. toctree::
2627
:maxdepth: 2
@@ -30,3 +31,4 @@ MS-Agent DOCUMENTATION
3031
Projects/DeepResearch.md
3132
Projects/CodeScratch.md
3233
Projects/FinResearch.md
34+
Projects/VideoGeneration.md
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
version https://git-lfs.github.com/spec/v1
2+
oid sha256:ff703171f9f110866f4e33fbb65eeaffa37feb8f9b1122f9e801d55f48ed3abc
3+
size 42206011

docs/resources/author.jpg

93.9 KB

docs/resources/deepspeed-zero.mp4

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
version https://git-lfs.github.com/spec/v1
2+
oid sha256:9d0cff32b54ab2672316d55cbb4bbe07dd1ddea0747c2f0cd15fe96dfd0b1b44
3+
size 42869503
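The three added lines form a Git LFS pointer: the repository stores this small text file while the actual video lives in LFS storage. As a minimal sketch, the pointer can be parsed as space-separated key/value lines; `parse_lfs_pointer` is an illustrative helper, not part of Git LFS or MS-Agent.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    # Each line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9d0cff32b54ab2672316d55cbb4bbe07dd1ddea0747c2f0cd15fe96dfd0b1b44
size 42869503
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size"])
```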
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
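# Contributor Guide
2+
3+
## Workflow Contribution
4+
5+
MS-Agent is designed as an end-to-end workflow Agent framework based on a single-command or single-search-box mode. It also supports loading external code and configurations directly: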
6+
7+
```shell
8+
ms-agent run --config local-dir --trust_remote_code true
9+
# or
10+
ms-agent run --config group/model-id --trust_remote_code true
11+
```
12+
13+
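The two methods above load and run configurations and code from a local directory or a ModelScope model repository, respectively. As a result, secondary development on MS-Agent is not limited to direct PRs to the GitHub repository: developers can build on our basic capabilities and host their code in a ModelScope model repository, and users only need to specify the repository ID to use that workflow.
14+
15+
This approach is very similar to the current projects under projects/*; the difference is whether the code is loaded from a local folder or from a model repository. We provide several scaffold projects in the code repository that developers can build upon:
16+
17+
- An example of inheriting LLMAgent to implement custom logic: https://www.modelscope.cn/models/ms-agent/simple_agent_code
18+
- A custom external workflow case: https://www.modelscope.cn/models/ms-agent/simple_workflow
19+
- A custom external tool case: https://www.modelscope.cn/models/ms-agent/simple_tool_plugin
20+
- An agent example with configuration files defined: https://www.modelscope.cn/models/ms-agent/simple_agent
21+
- A slightly more complex data collection case: https://www.modelscope.cn/models/ms-agent/newspaper
22+
23+
We will later provide an external integration method based on GitHub clone, so developers will also be able to host their code on GitHub.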
24+
25+
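## Developer Recognition
26+
27+
You are welcome to add your work to the "Interesting works" section of the README via PR, along with an introduction to your project. You can also provide an author.txt file at the same level as your configuration file directory and write your name in it. When developers use your workflow, they will see a printed message like this: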
28+
29+
![](../../resources/author.jpg)
