
BabyChrist666/cohere-multilingual-rag


██████╗ ██╗      ██████╗  ██████╗ ██████╗     ████████╗ ██████╗ ███╗   ██╗ ██████╗ ██╗   ██╗███████╗
██╔══██╗██║     ██╔═══██╗██╔═══██╗██╔══██╗    ╚══██╔══╝██╔═══██╗████╗  ██║██╔════╝ ██║   ██║██╔════╝
██████╔╝██║     ██║   ██║██║   ██║██║  ██║       ██║   ██║   ██║██╔██╗ ██║██║  ███╗██║   ██║█████╗
██╔══██╗██║     ██║   ██║██║   ██║██║  ██║       ██║   ██║   ██║██║╚██╗██║██║   ██║██║   ██║██╔══╝
██████╔╝███████╗╚██████╔╝╚██████╔╝██████╔╝       ██║   ╚██████╔╝██║ ╚████║╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚══════╝ ╚═════╝  ╚═════╝ ╚═════╝        ╚═╝    ╚═════╝ ╚═╝  ╚═══╝ ╚═════╝  ╚═════╝ ╚══════╝

⛧ MULTILINGUAL RAG SYSTEM ⛧


[ CROSS-LINGUAL KNOWLEDGE EXTRACTION // POWERED BY COHERE ]

Query in any language. Retrieve with precision.



▼ SYSTEM OVERVIEW

╔══════════════════════════════════════════════════════════════════════════════╗
║                                                                              ║
║   ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓   ║
║   ▓                                                                      ▓   ║
║   ▓   A retrieval-augmented generation system that speaks ALL tongues   ▓   ║
║   ▓   Leveraging Cohere's multilingual models to pierce language        ▓   ║
║   ▓   barriers and extract knowledge from the depths of any corpus      ▓   ║
║   ▓                                                                      ▓   ║
║   ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓   ║
║                                                                              ║
╚══════════════════════════════════════════════════════════════════════════════╝

◈ CAPABILITIES

⛧ RETRIEVAL ENGINE

┌─────────────────────────────────┐
│  ◉ 100+ Languages Supported    │
│  ◉ Cross-Lingual Search        │
│  ◉ Semantic Vector Matching    │
│  ◉ ChromaDB Persistence        │
└─────────────────────────────────┘
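Semantic vector matching ranks stored documents by how close their embeddings are to the query embedding. Because the multilingual model places text with the same meaning in different languages near each other, an English query can match a Chinese document. The sketch below shows the idea with brute-force cosine similarity over toy 3-dimensional vectors. Real Cohere Embed Multilingual v3.0 vectors are 1024-dimensional, and ChromaDB searches approximately with an HNSW index. The distance metric the repo's collection uses is not stated here, so cosine is an assumption, and `top_k_cosine` is a hypothetical helper, not code from this repo.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_cosine(
    query: Sequence[float], docs: dict[str, Sequence[float]], k: int
) -> list[tuple[str, float]]:
    """Score every document exhaustively and return the k most similar (id, score) pairs."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for multilingual embeddings: two documents about
# machine learning (English and Chinese) and one unrelated German document.
docs = {
    "ml_en": [0.9, 0.1, 0.0],
    "ml_zh": [0.85, 0.15, 0.05],
    "cooking_de": [0.0, 0.2, 0.95],
}
query = [0.88, 0.12, 0.02]  # embedding of a query about machine learning
print(top_k_cosine(query, docs, k=2))
```

Brute force costs one similarity computation per stored vector. An HNSW index trades a small amount of recall for much faster search on large collections.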

⛧ GENERATION CORE

┌─────────────────────────────────┐
│  ◉ Cohere Command R+           │
│  ◉ Source Citations            │
│  ◉ Confidence Scoring          │
│  ◉ Context-Aware Responses     │
└─────────────────────────────────┘
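Source citations only work if each retrieved chunk keeps its metadata next to the text handed to the generator, so that cited spans can be traced back to a source. The following is a minimal sketch that assumes two things. First, it assumes hits are shaped like `{"text": ..., "metadata": {"source": ...}}`, mirroring the `/documents` payload. Second, it assumes citations come back as answer spans listing the ids of the documents they draw on, which is how Cohere's grounded generation reports them. The helper names and the `doc_{i}` id scheme are hypothetical.

```python
from typing import Any


def build_documents(hits: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Turn retrieved hits into id/title/snippet dicts for a grounded-generation call."""
    return [
        {
            "id": f"doc_{i}",  # hypothetical id scheme: position in the hit list
            "title": (hit.get("metadata") or {}).get("source", "unknown"),
            "snippet": hit["text"],
        }
        for i, hit in enumerate(hits)
    ]


def resolve_citations(
    citations: list[dict[str, Any]], documents: list[dict[str, str]]
) -> list[tuple[str, list[str]]]:
    """Map each cited answer span to the source titles of the documents it cites."""
    title_by_id = {doc["id"]: doc["title"] for doc in documents}
    return [
        (c["text"], [title_by_id[d] for d in c["document_ids"] if d in title_by_id])
        for c in citations
    ]


hits = [
    # Chinese: "Machine learning is a branch of artificial intelligence."
    {"text": "机器学习是人工智能的一个分支。", "metadata": {"source": "zh_wiki"}},
    {"text": "Machine learning learns patterns from data.", "metadata": {"source": "en_notes"}},
]
documents = build_documents(hits)
# A citation in the assumed shape: the cited answer span plus the ids it relies on.
citations = [{"text": "a branch of AI", "document_ids": ["doc_0"]}]
print(resolve_citations(citations, documents))
```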

◈ FEATURE MATRIX

FEATURE              DESCRIPTION                                  STATUS
───────────────────  ───────────────────────────────────────────  ────────
MULTILINGUAL EMBED   Query and retrieve in 100+ languages         ◉ ACTIVE
CROSS-LINGUAL        Ask in English → Find documents in Chinese   ◉ ACTIVE
SEMANTIC RERANK      Cohere Rerank v3 for precision retrieval     ◉ ACTIVE
PERSISTENT STORAGE   ChromaDB vector database with HNSW           ◉ ACTIVE
SOURCE TRACKING      Full citation chain for every response       ◉ ACTIVE
CONFIDENCE METRICS   Reliability scores for all outputs           ◉ ACTIVE
WEB INTERFACE        Dark-themed UI for human interaction         ◉ ACTIVE

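⛧ SYSTEM ARCHITECTURE

The example traces a Chinese query, "什么是机器学习?" ("What is machine learning?"), through the pipeline. The answer comes back in Chinese, "机器学习是人工智能的一个分支..." ("Machine learning is a branch of artificial intelligence...").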

                              ╔═══════════════════════════════════════╗
                              ║     USER QUERY [ANY LANGUAGE]         ║
                              ║         "什么是机器学习?"              ║
                              ╚═══════════════════╤═══════════════════╝
                                                  │
                                                  ▼
                    ┌─────────────────────────────────────────────────────┐
                    │            ⛧ COHERE EMBED MULTILINGUAL v3.0 ⛧       │
                    │           [ Convert query → 1024-dim vector ]       │
                    └─────────────────────────┬───────────────────────────┘
                                              │
                                              ▼
                    ┌─────────────────────────────────────────────────────┐
                    │              ⛧ CHROMADB VECTOR SEARCH ⛧             │
                    │            [ Retrieve top 10 similar docs ]         │
                    └─────────────────────────┬───────────────────────────┘
                                              │
                                              ▼
                    ┌─────────────────────────────────────────────────────┐
                    │           ⛧ COHERE RERANK MULTILINGUAL v3.0 ⛧       │
                    │          [ Reorder by semantic relevance → 5 ]      │
                    └─────────────────────────┬───────────────────────────┘
                                              │
                                              ▼
                    ┌─────────────────────────────────────────────────────┐
                    │                 ⛧ COHERE COMMAND R+ ⛧               │
                    │              [ Generate grounded answer ]           │
                    └─────────────────────────┬───────────────────────────┘
                                              │
                                              ▼
                              ╔═══════════════════════════════════════╗
                              ║      RESPONSE [QUERY LANGUAGE]        ║
                              ║   "机器学习是人工智能的一个分支..."     ║
                              ╚═══════════════════════════════════════╝
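The four stages above can be sketched as one function. This is a minimal illustration, not the code in `rag.py`. The stages are passed in as plain callables so that the flow can run without an API key: retrieve 10 candidates, rerank them down to 5, then generate from those 5. In a real deployment, `embed` would wrap Cohere Embed Multilingual v3.0, `search` would wrap a ChromaDB query, `rerank` would wrap Cohere Rerank Multilingual v3.0, and `generate` would wrap Command R+. All function and parameter names here are hypothetical.

```python
from typing import Callable

Vector = list[float]


def answer(
    question: str,
    embed: Callable[[str], Vector],
    search: Callable[[Vector, int], list[str]],
    rerank: Callable[[str, list[str], int], list[str]],
    generate: Callable[[str, list[str]], str],
    n_candidates: int = 10,
    n_final: int = 5,
) -> str:
    """Run the four pipeline stages in order and return the generated answer."""
    query_vector = embed(question)                    # 1. query -> embedding
    candidates = search(query_vector, n_candidates)   # 2. nearest stored documents
    top_docs = rerank(question, candidates, n_final)  # 3. keep the most relevant few
    return generate(question, top_docs)               # 4. answer grounded in top_docs


# Offline stand-ins (hypothetical); real versions would call Cohere and ChromaDB.
CORPUS = [f"doc {i}" for i in range(20)]


def fake_embed(text: str) -> Vector:
    return [float(len(text))]


def fake_search(vector: Vector, n: int) -> list[str]:
    return CORPUS[:n]


def fake_rerank(question: str, docs: list[str], n: int) -> list[str]:
    return sorted(docs, reverse=True)[:n]  # pretend higher-numbered docs are more relevant


def fake_generate(question: str, docs: list[str]) -> str:
    return f"answer to {question!r} from {len(docs)} docs: {', '.join(docs)}"


print(answer("What is machine learning?", fake_embed, fake_search, fake_rerank, fake_generate))
```

Retrieving more candidates than you keep lets the cheaper vector search cast a wide net, so the cross-encoder reranker only has to order a short list.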

◈ SUPPORTED LANGUAGES

EUROPEAN

◉ English    ◉ Spanish
◉ French     ◉ German
◉ Italian    ◉ Portuguese
◉ Dutch      ◉ Polish
◉ Russian    ◉ Ukrainian
◉ Greek      ◉ Turkish

ASIAN

◉ Chinese (Simplified)
◉ Chinese (Traditional)
◉ Japanese   ◉ Korean
◉ Vietnamese ◉ Thai
◉ Indonesian ◉ Malay
◉ Hindi      ◉ Bengali

MIDDLE EASTERN

◉ Arabic
◉ Hebrew
◉ Persian (Farsi)
◉ Urdu

AFRICAN

◉ Swahili    ◉ Amharic
◉ Yoruba     ◉ Hausa

⛧ QUICK START

PREREQUISITES

╔════════════════════════════════════════════════╗
║  ◉ Python 3.10+                                ║
║  ◉ Cohere API Key (https://cohere.com)         ║
╚════════════════════════════════════════════════╝

INSTALLATION

# Clone the repository
git clone https://github.com/BabyChrist666/cohere-multilingual-rag.git
cd cohere-multilingual-rag

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env → Add your COHERE_API_KEY
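The `.env` step only helps if `COHERE_API_KEY` ends up in the process environment. The repo may load it with a library such as python-dotenv, which is not confirmed here. The stdlib-only sketch below shows equivalent behaviour for simple `KEY=value` lines and fails fast when the key is missing. `parse_env`, `load_env_file` and `require_api_key` are hypothetical helpers.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks, comments, and lines without '='."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy values from a .env file into os.environ without overriding existing variables."""
    env_path = Path(path)
    if env_path.exists():
        for key, value in parse_env(env_path.read_text(encoding="utf-8")).items():
            os.environ.setdefault(key, value)


def require_api_key() -> str:
    """Return COHERE_API_KEY, or stop with a clear message if it is missing."""
    key = os.environ.get("COHERE_API_KEY", "")
    if not key:
        raise RuntimeError("COHERE_API_KEY is not set; add it to .env or the environment")
    return key
```

Call `load_env_file()` and then `require_api_key()` once at startup, so a missing key surfaces immediately instead of on the first Cohere request.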

EXECUTE

# Run CLI Demo
python rag.py

# Launch Web Server
python server.py
# Access: http://localhost:8000

◈ API ENDPOINTS

POST /documents — Ingest Knowledge

curl -X POST http://localhost:8000/documents \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Document content in any language..."],
    "metadatas": [{"source": "origin"}]
  }'

POST /query — Extract Knowledge

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is machine learning?",
    "n_results": 5
  }'

GET /stats — System Status

curl http://localhost:8000/stats
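The same three calls can be made from Python using only the standard library. This is a minimal client sketch. It assumes the server started with `python server.py` is listening on `http://localhost:8000` and accepts exactly the JSON bodies shown in the curl examples. The response formats are not documented here, so the helpers simply return the decoded JSON. All function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000"  # default address from the Quick Start


def build_request(path: str, payload: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build a GET request, or a JSON POST request when a payload is given."""
    if payload is None:
        return urllib.request.Request(f"{BASE_URL}{path}", method="GET")
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request: urllib.request.Request) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def add_documents(texts: list[str], metadatas: list[dict[str, str]]) -> Any:
    # Assumption: metadatas[i] describes texts[i], so the lists must line up.
    if len(texts) != len(metadatas):
        raise ValueError("texts and metadatas must have the same length")
    return call(build_request("/documents", {"texts": texts, "metadatas": metadatas}))


def query(question: str, n_results: int = 5) -> Any:
    return call(build_request("/query", {"question": question, "n_results": n_results}))


def stats() -> Any:
    return call(build_request("/stats"))
```

With the server running, `add_documents(["机器学习是人工智能的一个分支。"], [{"source": "zh_wiki"}])` followed by `query("What is machine learning?")` exercises the cross-lingual path end to end.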

◈ DIRECTORY STRUCTURE

cohere-multilingual-rag/
├── embeddings.py      # Cohere Embed & Rerank integration
├── vectorstore.py     # ChromaDB vector operations
├── rag.py             # Core RAG pipeline
├── server.py          # FastAPI server & web UI
├── requirements.txt   # Dependencies
└── README.md          # Documentation

⛧ USE CASES

┌────────────────────────────────────────────────────────────────────────┐
│                                                                        │
│   ◉ MULTILINGUAL CUSTOMER SUPPORT                                     │
│     Answer queries in the customer's native language                   │
│                                                                        │
│   ◉ GLOBAL KNOWLEDGE BASE                                             │
│     Index and retrieve documents across language barriers              │
│                                                                        │
│   ◉ CROSS-BORDER RESEARCH                                             │
│     Find relevant papers regardless of publication language            │
│                                                                        │
│   ◉ INTERNATIONAL E-COMMERCE                                          │
│     Product search that transcends linguistic boundaries               │
│                                                                        │
│   ◉ LEGAL/COMPLIANCE                                                  │
│     Search regulations in their original jurisdictional language       │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

◈ DEPLOYMENT

Docker

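FROM python:3.11-slim
WORKDIR /app
# Install dependencies first so this layer stays cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Add .env to .dockerignore so the API key is not baked into the image
COPY . .
# Port used by server.py; it must listen on 0.0.0.0 to be reachable from outside the container
EXPOSE 8000
CMD ["python", "server.py"]
# Build: docker build -t cohere-multilingual-rag .
# Run:   docker run -p 8000:8000 -e COHERE_API_KEY=<your-key> cohere-multilingual-rag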

Cloud Platforms

◉ Railway    → Set COHERE_API_KEY → Deploy
◉ Render     → Set COHERE_API_KEY → Deploy
◉ Fly.io     → Set COHERE_API_KEY → Deploy

◈ TECH STACK

◉ Cohere     → Multilingual LLMs
◉ ChromaDB   → Vector Storage
◉ FastAPI    → API Framework
◉ Python     → Runtime

═══════════════════════════════════════════════════════════════════════════════
                         ⛧ BUILT FOR THE COHERE ECOSYSTEM ⛧
═══════════════════════════════════════════════════════════════════════════════

MIT License

Language is no barrier. Knowledge flows through all tongues.

About

Multilingual RAG system supporting 100+ languages with Cohere Embed, Rerank, and Command R+
