# SingularityCinema

A lightweight, high-quality short video generator

## Installation

1. Clone the code
```shell
git clone https://github.com/modelscope/ms-agent.git
cd ms-agent
```

2. Install dependencies
```shell
pip install .
cd projects/singularity_cinema
pip install -r requirements.txt
```

Install [ffmpeg](https://www.ffmpeg.org/download.html#build-windows).
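
On most systems, ffmpeg is available from the package manager; the commands below are standard package-manager invocations, not project-specific ones:

```shell
# Ubuntu / Debian
sudo apt-get install -y ffmpeg

# macOS (Homebrew)
brew install ffmpeg
```

On Windows, download a build from the link above and add the folder containing ffmpeg.exe to your PATH.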

Before running the installation commands above, make sure your Python version is 3.10 or later. For Python installation, see [Conda](https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html).

## Compatibility and Limitations

SingularityCinema uses large language models to generate scripts and storyboards, then renders them into short videos.

### Compatibility

- Short video types: educational and economics videos, especially those containing charts, formulas, and explanations of principles
- Language: no restrictions; subtitles and voice follow your original query and document materials
- Reading external materials: plain text is supported; multimodal input is not
- Secondary development: the complete code lives in stepN/agent.py with no license restrictions and is free for secondary development and commercial use
  - Please note and comply with the commercial licenses of any background music and fonts you use

### Limitations

- LLM coverage: tested with Claude; results with other models are untested
- AIGC model coverage: tested with Qwen-Image; results with other models are untested

## Running

1. Prepare API keys

### Prepare LLM Key

Taking Claude as an example, first apply for or purchase access to the Claude models. Set the key in an environment variable:

```shell
export OPENAI_API_KEY=xxx-xxx
```

### Prepare ModelScope Text-to-Image Key

The default model is currently Qwen-Image. Apply for a ModelScope API key [here](https://www.modelscope.cn/my/myaccesstoken), then set it in an environment variable:

```shell
export T2I_API_KEY=ms-xxx-xxx
```

2. Prepare your short video materials

You can generate a video from a single sentence, for example:

```text
Generate a short video describing GDP economic knowledge, approximately 3 minutes long.
```

Or use text materials you have already collected:

```text
Generate a short video describing large language model technology, read /home/user/llm.txt for detailed content
```
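
If you assemble materials yourself, a plain-text file is all that is needed. A minimal illustration (the path and contents are placeholders matching the example query above):

```shell
# Illustrative only: save your collected notes as plain text for the agent to read
cat > /home/user/llm.txt <<'EOF'
Large language models are neural networks trained on large text corpora.
Key topics: transformer architecture, pretraining, fine-tuning, inference.
EOF
```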

3. Run command

```shell
ms-agent run --config "projects/singularity_cinema" --query "Your custom theme, see description above" --load_cache true --trust_remote_code true
```

4. The run takes approximately 20 minutes, and the video is generated at output/final_video.mp4. After generation, review this file, summarize the parts that do not meet your requirements, and enter them at the command-line prompt; the workflow will keep improving the video. Once you are satisfied, type quit or exit and the program terminates automatically.
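
For example, a feedback round might look like this (the wording is purely illustrative):

```text
The narration in scene 2 is too fast, and the chart in scene 5 overlaps the subtitles. Please regenerate these two scenes.
```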

5. If the execution fails, for example with a URL call timeout or a file generation failure, you can simply re-run the command above. ms-agent saves execution state in the output/memory folder, so the re-run continues from where it failed.
   * If you want to regenerate from scratch, rename or move the output folder elsewhere, or delete the corresponding memory and input files.
   * You can also delete the input files of specific scenes/shots only, so that re-execution processes just those scenes/shots; this is the same mechanism behind the manual feedback correction in the final step. A sketch of this follows below.
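
A minimal sketch of targeted regeneration, assuming the per-scene files live under output/ with the names listed in Technical Principles below (exact paths may differ in your run):

```shell
# Assumed layout: remove only scene 3's intermediate files so the re-run rebuilds just that scene
rm output/manim_code/segment_3.py
rm output/images/illustration_3.png

# Re-run with the same query; completed steps are restored from output/memory
ms-agent run --config "projects/singularity_cinema" --query "Your custom theme, see description above" --load_cache true --trust_remote_code true
```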

## Technical Principles

1. Generate the basic script from the user's requirements
   * Input: user requirements; may read user-specified files
   * Output: script file script.txt, original requirement file topic.txt, short video title file title.txt
2. Split the script into a storyboard design
   * Input: topic.txt, script.txt
   * Output: segments.txt, a storyboard list describing the narration, background image generation requirements, and foreground manim animation requirements
3. Generate audio narration for the storyboard
   * Input: segments.txt
   * Output: a list of audio/audio_N.mp3 files, where N is the segment number starting from 1, plus audio_info.txt in the root directory containing the audio durations
4. Generate manim animation code based on the voice durations
   * Input: segments.txt, audio_info.txt
   * Output: a list of manim code files manim_code/segment_N.py, where N is the segment number starting from 1
5. Fix the manim code
   * Input: manim_code/segment_N.py (N is the segment number starting from 1) and the error prediction files code_fix/code_fix_N.txt
   * Output: updated manim_code/segment_N.py files
6. Render the manim code
   * Input: manim_code/segment_N.py
   * Output: a list of manim_render/scene_N folders; if segments.txt contains manim requirements for a step, the corresponding folder will contain a manim.mov file
7. Generate text-to-image prompts
   * Input: segments.txt
   * Output: illustration_prompts/segment_N.txt, where N is the segment number starting from 1
8. Text-to-image
   * Input: the list of illustration_prompts/segment_N.txt files
   * Output: a list of images/illustration_N.png files, where N is the segment number starting from 1
9. Generate subtitles
   * Input: segments.txt
   * Output: a list of subtitles/bilingual_subtitle_N.png files, where N is the segment number starting from 1
10. Generate the background, a solid-color image with the short video title and slogans
    * Input: title.txt
    * Output: background.jpg
11. Composite the complete video
    * Input: all file information from the previous steps
    * Output: final_video.mp4
12. Human feedback (the interactive review loop described in Running above)
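
Putting the steps together, a finished run leaves an output directory roughly like the following (inferred from the inputs and outputs listed above; the exact layout may differ):

```text
output/
├── topic.txt  script.txt  title.txt       # step 1
├── segments.txt                           # step 2
├── audio/audio_1.mp3 ...                  # step 3
├── audio_info.txt                         # step 3 (durations)
├── manim_code/segment_1.py ...            # steps 4-5
├── code_fix/code_fix_1.txt ...            # step 5
├── manim_render/scene_1/ ...              # step 6
├── illustration_prompts/segment_1.txt ... # step 7
├── images/illustration_1.png ...          # step 8
├── subtitles/bilingual_subtitle_1.png ... # step 9
├── background.jpg                         # step 10
├── final_video.mp4                        # step 11
└── memory/                                # execution state used for resuming
```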

## Adjustable Parameters

Most adjustable parameters live in agent.yaml. You can modify this file before running to customize the behavior.

Some important parameters are listed below:

- llm: This group controls the LLM's URL, API key, etc.
- generation_config: This group controls the LLM generation parameters
- prompt.system: Controls the system prompt for the script generation stage
  - If you want to modify the system prompt for storyboard generation, edit it in step2_segment/agent.py
- text2image: Text-to-image model parameters, including the URL, model ID, etc.
  - t2i_transition: Background image effect; the default is a Ken Burns effect
  - t2i_style: Image style; set it to the text-to-image style you want
- t2i_num_parallel: Text-to-image call parallelism; the default is 1 to prevent rate limiting
- llm_num_parallel: LLM call parallelism; the default is 10
- video: Video bitrate and other generation parameters
- voice/voices: edge_tts voice settings; if you have other voice options, you can add them here
- subtitle_lang: Language for multilingual subtitles; if not set, no translation is performed
- slogan: Displayed on the right side of the screen; generally the producer's name and the name of the video series
- fonts: The list of recommended fonts
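
For orientation, here is a heavily abridged sketch of agent.yaml. The top-level names come from the list above, but the nested keys and example values are illustrative assumptions, not the actual file contents:

```yaml
# Abridged sketch -- nested keys and example values are assumptions
llm:
  url: https://your-llm-endpoint/v1      # OpenAI-compatible endpoint serving Claude
  api_key: ${OPENAI_API_KEY}
generation_config:
  temperature: 0.7
prompt:
  system: You are a short-video scriptwriter ...
text2image:
  url: https://your-t2i-endpoint/v1      # e.g. the ModelScope inference API
  model: Qwen/Qwen-Image
  t2i_transition: ken-burns              # background image effect
  t2i_style: flat illustration           # any style description you like
t2i_num_parallel: 1                      # keep low to avoid rate limiting
llm_num_parallel: 10
subtitle_lang: en                        # omit to skip translation
slogan: Produced by ...
```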