# SingularityCinema

A lightweight, high-quality short video generator

## Installation

1. Clone the code
```shell
git clone https://github.com/modelscope/ms-agent.git
cd ms-agent
```

2. Install dependencies
```shell
pip install .
cd projects/singularity_cinema
pip install -r requirements.txt
```

Install [ffmpeg](https://www.ffmpeg.org/download.html#build-windows).
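
On most systems, ffmpeg is available from the package manager; the commands below are standard package-manager invocations, not project-specific ones:

```shell
# Ubuntu / Debian
sudo apt-get install -y ffmpeg

# macOS (Homebrew)
brew install ffmpeg
```

On Windows, download a build from the link above and add the folder containing ffmpeg.exe to your PATH.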

Before running the installation commands above, make sure your Python version is 3.10 or later. For Python installation, see [Conda](https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html).

## Compatibility and Limitations

SingularityCinema uses large language models to generate scripts and storyboards, then renders them into short videos.

### Compatibility

- Short video types: educational and economics videos, especially those containing charts, formulas, and explanations of principles
- Language: no restrictions; subtitles and voice follow your original query and document materials
- Reading external materials: plain text is supported; multimodal input is not
- Secondary development: the complete code lives in stepN/agent.py with no license restrictions and is free for secondary development and commercial use
  - Please note and comply with the commercial licenses of any background music and fonts you use

### Limitations

- LLM coverage: tested with Claude; results with other models are untested
- AIGC model coverage: tested with Qwen-Image; results with other models are untested

## Running

1. Prepare API keys

### Prepare LLM Key

Taking Claude as an example, first apply for or purchase access to the Claude models. Set the key in an environment variable:

```shell
export OPENAI_API_KEY=xxx-xxx
```

### Prepare ModelScope Text-to-Image Key

The default model is currently Qwen-Image. Apply for a ModelScope API key [here](https://www.modelscope.cn/my/myaccesstoken), then set it in an environment variable:

```shell
export T2I_API_KEY=ms-xxx-xxx
```

2. Prepare your short video materials

You can generate a video from a single sentence, for example:

```text
Generate a short video describing GDP economic knowledge, approximately 3 minutes long.
```

Or use text materials you have already collected:

```text
Generate a short video describing large language model technology, read /home/user/llm.txt for detailed content
```
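
If you assemble materials yourself, a plain-text file is all that is needed. A minimal illustration (the path and contents are placeholders matching the example query above):

```shell
# Illustrative only: save your collected notes as plain text for the agent to read
cat > /home/user/llm.txt <<'EOF'
Large language models are neural networks trained on large text corpora.
Key topics: transformer architecture, pretraining, fine-tuning, inference.
EOF
```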

3. Run command

```shell
ms-agent run --config "projects/singularity_cinema" --query "Your custom theme, see description above" --load_cache true --trust_remote_code true
```

4. The run takes approximately 20 minutes, and the video is generated at output/final_video.mp4. After generation, review this file, summarize the parts that do not meet your requirements, and enter them at the command-line prompt; the workflow will keep improving the video. Once you are satisfied, type quit or exit and the program terminates automatically.
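
For example, a feedback round might look like this (the wording is purely illustrative):

```text
The narration in scene 2 is too fast, and the chart in scene 5 overlaps the subtitles. Please regenerate these two scenes.
```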

5. If the execution fails, for example with a URL call timeout or a file generation failure, you can simply re-run the command above. ms-agent saves execution state in the output/memory folder, so the re-run continues from where it failed.
   * If you want to regenerate from scratch, rename or move the output folder elsewhere, or delete the corresponding memory and input files.
   * You can also delete the input files of specific scenes/shots only, so that re-execution processes just those scenes/shots; this is the same mechanism behind the manual feedback correction in the final step. A sketch of this follows below.
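
A minimal sketch of targeted regeneration, assuming the per-scene files live under output/ with the names listed in Technical Principles below (exact paths may differ in your run):

```shell
# Assumed layout: remove only scene 3's intermediate files so the re-run rebuilds just that scene
rm output/manim_code/segment_3.py
rm output/images/illustration_3.png

# Re-run with the same query; completed steps are restored from output/memory
ms-agent run --config "projects/singularity_cinema" --query "Your custom theme, see description above" --load_cache true --trust_remote_code true
```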

## Technical Principles

1. Generate the basic script from the user's requirements
   * Input: user requirements; may read user-specified files
   * Output: script file script.txt, original requirement file topic.txt, short video title file title.txt
2. Split the script into a storyboard design
   * Input: topic.txt, script.txt
   * Output: segments.txt, a storyboard list describing the narration, background image generation requirements, and foreground manim animation requirements
3. Generate audio narration for the storyboard
   * Input: segments.txt
   * Output: a list of audio/audio_N.mp3 files, where N is the segment number starting from 1, plus audio_info.txt in the root directory containing the audio durations
4. Generate manim animation code based on the voice durations
   * Input: segments.txt, audio_info.txt
   * Output: a list of manim code files manim_code/segment_N.py, where N is the segment number starting from 1
5. Fix the manim code
   * Input: manim_code/segment_N.py (N is the segment number starting from 1) and the error prediction files code_fix/code_fix_N.txt
   * Output: updated manim_code/segment_N.py files
6. Render the manim code
   * Input: manim_code/segment_N.py
   * Output: a list of manim_render/scene_N folders; if segments.txt contains manim requirements for a step, the corresponding folder will contain a manim.mov file
7. Generate text-to-image prompts
   * Input: segments.txt
   * Output: illustration_prompts/segment_N.txt, where N is the segment number starting from 1
8. Text-to-image
   * Input: the list of illustration_prompts/segment_N.txt files
   * Output: a list of images/illustration_N.png files, where N is the segment number starting from 1
9. Generate subtitles
   * Input: segments.txt
   * Output: a list of subtitles/bilingual_subtitle_N.png files, where N is the segment number starting from 1
10. Generate the background, a solid-color image with the short video title and slogans
    * Input: title.txt
    * Output: background.jpg
11. Composite the complete video
    * Input: all file information from the previous steps
    * Output: final_video.mp4
12. Human feedback (the interactive review loop described in Running above)
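
Putting the steps together, a finished run leaves an output directory roughly like the following (inferred from the inputs and outputs listed above; the exact layout may differ):

```text
output/
├── topic.txt  script.txt  title.txt       # step 1
├── segments.txt                           # step 2
├── audio/audio_1.mp3 ...                  # step 3
├── audio_info.txt                         # step 3 (durations)
├── manim_code/segment_1.py ...            # steps 4-5
├── code_fix/code_fix_1.txt ...            # step 5
├── manim_render/scene_1/ ...              # step 6
├── illustration_prompts/segment_1.txt ... # step 7
├── images/illustration_1.png ...          # step 8
├── subtitles/bilingual_subtitle_1.png ... # step 9
├── background.jpg                         # step 10
├── final_video.mp4                        # step 11
└── memory/                                # execution state used for resuming
```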

## Adjustable Parameters

Most adjustable parameters live in agent.yaml. You can modify this file before running to customize the behavior.

Some important parameters are listed below:

- llm: This group controls the LLM's URL, API key, etc.
- generation_config: This group controls the LLM generation parameters
- prompt.system: Controls the system prompt for the script generation stage
  - If you want to modify the system prompt for storyboard generation, edit it in step2_segment/agent.py
- text2image: Text-to-image model parameters, including the URL, model ID, etc.
  - t2i_transition: Background image effect; the default is a Ken Burns effect
  - t2i_style: Image style; set it to the text-to-image style you want
- t2i_num_parallel: Text-to-image call parallelism; the default is 1 to prevent rate limiting
- llm_num_parallel: LLM call parallelism; the default is 10
- video: Video bitrate and other generation parameters
- voice/voices: edge_tts voice settings; if you have other voice options, you can add them here
- subtitle_lang: Language for multilingual subtitles; if not set, no translation is performed
- slogan: Displayed on the right side of the screen; generally the producer's name and the name of the video series
- fonts: The list of recommended fonts
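
For orientation, here is a heavily abridged sketch of agent.yaml. The top-level names come from the list above, but the nested keys and example values are illustrative assumptions, not the actual file contents:

```yaml
# Abridged sketch -- nested keys and example values are assumptions
llm:
  url: https://your-llm-endpoint/v1      # OpenAI-compatible endpoint serving Claude
  api_key: ${OPENAI_API_KEY}
generation_config:
  temperature: 0.7
prompt:
  system: You are a short-video scriptwriter ...
text2image:
  url: https://your-t2i-endpoint/v1      # e.g. the ModelScope inference API
  model: Qwen/Qwen-Image
  t2i_transition: ken-burns              # background image effect
  t2i_style: flat illustration           # any style description you like
t2i_num_parallel: 1                      # keep low to avoid rate limiting
llm_num_parallel: 10
subtitle_lang: en                        # omit to skip translation
slogan: Produced by ...
```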