Our project focuses on creating an automated video generation system using AI. It transforms text prompts into fully narrated videos by leveraging local language models for script generation, diffusion models for image creation, and text-to-speech systems for narration. The system processes inputs through multiple stages, from script generation to final video assembly, producing cohesive, engaging content automatically.
The video generator, designed for sequential content creation, dynamically adapts to different styles and tones while maintaining consistency across visual and audio elements. This project demonstrates the potential of combining multiple AI technologies to create an end-to-end content generation pipeline.
- **Python 3.12+**: Core programming language for the project.
- **Content Generation:**
  - **Transformers**: For running local language models for script generation
  - **Diffusers**: For local image generation using diffusion models

Hugging Face's Transformers library is employed for text generation. Here's an example of generating text using a pre-trained GPT model:
```python
from transformers import pipeline

# Load a local text-generation pipeline (GPT-2 as a small example model)
text_generator = pipeline("text-generation", model="gpt2")
script = text_generator("Once upon a time in a forest,", max_length=50)
print(script[0]['generated_text'])
```
Diffusion models are used for creating high-quality images based on text prompts. Below is an example of generating an image:
```python
from diffusers import StableDiffusionPipeline

# Load Stable Diffusion and move it to the GPU
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

prompt = "A futuristic cityscape at sunset"
image = pipe(prompt).images[0]
image.save("generated_image.png")
```
- **Audio Processing:**
  - **TTS Libraries**: For converting text to natural-sounding speech (see the narration sketch after this list)
  - **FFmpeg**: For audio processing and final video assembly (see the assembly sketch after this list)
- **ML Frameworks:**
  - **PyTorch**: Deep learning framework for model inference
  - **CLIP**: For evaluating image-text consistency
- **Development Tools:**
  - **Jupyter Notebooks**: For development and testing
  - **Git**: For version control
- **Visualization & Metrics:**
  - **Matplotlib**: For visualizing generation metrics
  - **TensorBoard**: For tracking generation performance
- **Package Management:**
  - **UV**: For fast and efficient dependency management and project setup
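The stack lists "TTS Libraries" generically rather than a specific backend. As one illustration, Coqui TTS (the `TTS` package) can render a script segment to a WAV file; the model name below is just one publicly available English voice, not necessarily the one this project ships with:

```python
from TTS.api import TTS

# Illustrative TTS backend: any library that writes narration to a WAV file
# plugs into the pipeline the same way.
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(
    text="Once upon a time in a forest, a small fox set out at dawn.",
    file_path="narration.wav",
)
```

For the assembly step, FFmpeg is driven from Python. A minimal muxing sketch is shown below; the file paths, frame timing, and codec flags are assumptions for illustration, not the project's actual invocation:

```python
import subprocess

# Mux a sequence of generated frames with the narration track.
# Each frame is held for 5 seconds; the output stops at the shorter stream.
subprocess.run([
    "ffmpeg",
    "-framerate", "1/5",             # one frame every 5 seconds (illustrative)
    "-i", "frames/frame_%03d.png",   # generated image sequence (assumed path)
    "-i", "narration.wav",           # TTS output (assumed path)
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",
    "-shortest",
    "output/video.mp4",
], check=True)
```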
- Multi-Modal Content Generation: Seamlessly combines text, image, and audio generation
- Style Customization: Supports different content styles and tones
- Quality Assurance: Implements CLIP-based consistency checks (see the sketch below)
- Modular Architecture: Each component can be tested and improved independently
- Content Segmentation: Automatically breaks down content into manageable segments
- Custom Voice Options: Multiple TTS voices and emotional tones
- Format Flexibility: Supports different video durations and formats
- Performance Metrics: Tracks generation quality and consistency
- Error Handling: Robust error management across the pipeline
- Resource Optimization: Efficient resource usage during generation
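The CLIP-based consistency check mentioned above can be sketched with Hugging Face's `CLIPModel`: score each generated frame against its script segment and flag low-scoring frames for regeneration. The checkpoint name and threshold below are assumptions, not project settings:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint and threshold -- illustrative, not the project's configuration.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("generated_image.png")
segment_text = "A futuristic cityscape at sunset"

inputs = processor(text=[segment_text], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image is CLIP's scaled image-text similarity
score = outputs.logits_per_image.item()
print(f"Image-text consistency score: {score:.2f}")
if score < 20.0:  # assumed threshold; tune on sample outputs
    print("Low consistency -- consider regenerating this frame")
```

Because the score is a scaled cosine similarity, a sensible threshold depends on the checkpoint and content; in practice it would be calibrated on a handful of sample generations.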
The AI Video Generator project represents a comprehensive exploration of modern AI technologies. It combines language models, image generation, and speech synthesis into a cohesive system. The project provides hands-on experience with state-of-the-art AI tools while creating practical, user-friendly output. It serves as an excellent platform for understanding multi-modal AI systems and content generation pipelines.
First, install the dependencies and run the generator:

```bash
pip install -r requirements.txt
python main.py --prompt "Your video topic" --style "desired style"
```
This will initiate the generation pipeline and create your video in the output directory.
> [!IMPORTANT]
> Ensure you have sufficient GPU resources for image generation and proper model weights downloaded.

> [!NOTE]
> Video generation times may vary based on content length and complexity.
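A quick, generic PyTorch check (not part of the project's CLI) to confirm a CUDA device is visible before starting a run:

```python
import torch

# Report whether a CUDA-capable GPU is available for image generation.
if torch.cuda.is_available():
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU detected; generation will fall back to CPU and run much slower.")
```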
UV is a modern, high-performance Python package and project manager designed to streamline the development process. Here’s how you can use UV in this project:
- Install UV using pip:

  ```bash
  pip install uv
  ```

- Initialize a new UV project:

  ```bash
  uv init
  ```

- Install dependencies:

  ```bash
  uv pip install -r requirements.txt
  ```

- Run the project with UV-managed Python environments:

  ```bash
  uv run python main.py --prompt "Your video topic" --style "desired style"
  ```
UV simplifies managing multiple Python versions:

```bash
uv python install 3.12
uv python pin 3.12
```
For more information, visit the UV Documentation.
| CONTRIBUTORS | MENTORS | CONTENT WRITER |
|---|---|---|
| [Name] | Soham Roy | [Name] |
| [Name] | Yash Kumar Gupta | |
| Version | Date | Comments |
|---|---|---|
| 1.0 | [Current Date] | Initial release |
- Pipeline foundations
- LLM Agent Handling
- Diffusion Agent Handling
- TTS Handling
- Video Assembly Engine
- Initial Deployment
- Advanced style transfer capabilities
- In-Context Generation for Diffusion Model
- Real-time generation monitoring
- Enhanced video transitions
- Better quality metrics
- Multi-language support
- Custom character consistency
- Animation effects
- Hugging Face Transformers - https://huggingface.co/transformers
- Hugging Face Diffusers - https://huggingface.co/diffusers
- FFmpeg - https://ffmpeg.org/
- UV - https://docs.astral.sh/uv/
- The Illustrated Transformer - A visual, beginner-friendly introduction to transformer architecture.
- Attention Is All You Need - The seminal paper on transformer architecture.
- Introduction to Multi-Agent Systems - Fundamental concepts and principles.
- A Comprehensive Guide to Understanding LangChain Agents and Tools - Practical implementation guide.
- Stable Diffusion: A Comprehensive End-to-End Guide with Examples
- Stable Diffusion Explained
- Stable Diffusion Explained Step-by-Step with Visualization
- Understanding Stable Diffusion: The Magic Behind AI Image Generation
- Stable Diffusion Paper