# Building a Multimodal RAG Pipeline with Elasticsearch: The Story of Gotham City

This repository contains the code for implementing a Multimodal Retrieval-Augmented Generation (RAG) system using Elasticsearch. The system processes and analyzes different types of evidence (images, audio, text, and depth maps) to solve a crime in Gotham City.

## Overview

The pipeline demonstrates how to:
- Generate unified embeddings for multiple modalities using ImageBind
- Store and search vectors efficiently in Elasticsearch
- Analyze evidence using GPT-4 to generate forensic reports

## Prerequisites

- Python 3.x
- Elasticsearch cluster (cloud or local)
- OpenAI API key - Set up an OpenAI account and create a [secret key](https://platform.openai.com/docs/quickstart)
- 8GB+ RAM
- GPU (optional but recommended)

## Code Execution

We provide a Google Colab notebook that allows you to explore the entire pipeline interactively:
- [Open the Multimodal RAG Pipeline Notebook](notebook/01-mmrag-blog-quick-start.ipynb)
- This notebook includes step-by-step instructions and explanations for each stage of the pipeline

## Project Structure

```
├── README.md
├── requirements.txt
├── notebook/
│   └── 01-mmrag-blog-quick-start.ipynb  # Jupyter notebook execution
├── src/
│   ├── embedding_generator.py           # ImageBind wrapper
│   ├── elastic_manager.py               # Elasticsearch operations
│   └── llm_analyzer.py                  # GPT-4 integration
├── stages/
│   ├── 01-stage/                        # File organization
│   ├── 02-stage/                        # Embedding generation
│   ├── 03-stage/                        # Elasticsearch indexing/search
│   └── 04-stage/                        # Evidence analysis
└── data/                                # Sample data
    ├── images/
    ├── audios/
    ├── texts/
    └── depths/
```

## Sample Data

The repository includes sample evidence files:
- Images: Crime scene photos and security camera footage
- Audio: Suspicious sound recordings
- Text: Mysterious notes and riddles
- Depth Maps: 3D scene captures

## How It Works

1. **Evidence Collection**: Files are organized by modality in the `data/` directory
2. **Embedding Generation**: ImageBind converts each piece of evidence into a 1024-dimensional vector (see the first sketch below)
3. **Vector Storage**: Elasticsearch stores embeddings with metadata for efficient retrieval (see the indexing sketch below)
4. **Similarity Search**: New evidence is compared against the database using k-NN search (see the search sketch below)
5. **Analysis**: GPT-4 analyzes the connections between evidence to identify suspects (see the analysis sketch below)
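
The technical steps above can be sketched in code. First, embedding generation: the repo wraps this in `src/embedding_generator.py`; the snippet below is a minimal, illustrative version built directly on the ImageBind reference API. The file paths, the sample text, and the pip-style `imagebind` package layout are assumptions, not the repo's exact code.

```python
# Minimal sketch of step 2 (illustrative, not the repo's exact wrapper):
# a single ImageBind model maps every modality into the same 1024-dim space.
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"
model = imagebind_model.imagebind_huge(pretrained=True).eval().to(device)

# One loader per modality; each returns a batched tensor for the model.
# The file names below are placeholders for files under data/.
inputs = {
    ModalityType.VISION: data.load_and_transform_vision_data(
        ["data/images/example.jpg"], device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(
        ["data/audios/example.wav"], device),
    ModalityType.TEXT: data.load_and_transform_text(
        ["a mysterious riddle"], device),
}

with torch.no_grad():
    embeddings = model(inputs)  # dict: modality -> tensor of shape (batch, 1024)

image_vector = embeddings[ModalityType.VISION][0].cpu().numpy()  # 1024 floats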
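```

Next, vector storage. This is a minimal sketch of an Elasticsearch 8.x `dense_vector` index created with the official Python client; the index name `gotham-evidence` and the field names are illustrative stand-ins for whatever `src/elastic_manager.py` actually uses.

```python
# Minimal sketch of step 3 (illustrative index and field names).
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="...")  # or your cloud endpoint/auth

# dense_vector fields with index=True support native k-NN search in ES 8.x.
es.indices.create(
    index="gotham-evidence",
    mappings={
        "properties": {
            "embedding":   {"type": "dense_vector", "dims": 1024,
                            "index": True, "similarity": "cosine"},
            "modality":    {"type": "keyword"},
            "file_path":   {"type": "keyword"},
            "description": {"type": "text"},
        }
    },
)

es.index(
    index="gotham-evidence",
    document={
        "embedding": image_vector.tolist(),        # from the previous sketch
        "modality": "vision",
        "file_path": "data/images/example.jpg",    # placeholder path
        "description": "Crime scene photo",
    },
)
```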
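
Similarity search then runs as a k-NN query against that index. In this sketch, `query_vector` is assumed to be the ImageBind embedding of a new piece of evidence, produced as in the first snippet.

```python
# Minimal sketch of step 4: cosine k-NN search over the evidence index.
# `query_vector` is the 1024-dim ImageBind embedding of the new evidence.
response = es.search(
    index="gotham-evidence",
    knn={
        "field": "embedding",
        "query_vector": query_vector.tolist(),
        "k": 5,                # return the 5 closest pieces of evidence
        "num_candidates": 50,  # per-shard candidate pool for the ANN search
    },
    source=["modality", "file_path", "description"],
)

for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["file_path"])
```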
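
Finally, analysis. Below is a minimal sketch of handing the retrieved evidence to GPT-4 with the `openai` v1 client; the prompt wording is illustrative, and `src/llm_analyzer.py` may structure this differently.

```python
# Minimal sketch of step 5: asking GPT-4 to connect the retrieved evidence.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Summarize the k-NN hits from the previous sketch into a plain-text list.
evidence_summary = "\n".join(
    f"- {hit['_source']['modality']}: {hit['_source']['description']}"
    for hit in response["hits"]["hits"]
)

completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are a forensic analyst in Gotham City."},
        {"role": "user",
         "content": "Given the following pieces of evidence, write a short "
                    "forensic report linking them and naming the most likely "
                    "suspect:\n" + evidence_summary},
    ],
)
print(completion.choices[0].message.content)
```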