The Data Dilemma

The Data Dilemma is an open-source initiative developing AI tools to address real-world challenges in speech, language, and healthcare.

The Data Dilemma 🚀

Open Source AI Solutions • Real-World Impact
Open to collaboration. Let’s build something impactful together! 🤝


✨ About

We build open-source AI tools that solve real-world problems in speech, language, and healthcare. We specialize in multilingual AI, privacy-preserving systems, and deployable models, with a focus on underserved communities and languages.

🎯 Core Focus: Open Source • Multilingual AI • Privacy-First • Real-World Deployment


🚀 Featured Projects

Medibeng-Orpheus-3b-0.1-ft-Fine-Tuning: Fine-tuned TTS for Bengali-English code-switching. Built on Orpheus + LLaMA-3b.

MediBeng-Whisper-Tiny: Lightweight ASR for Bengali-English conversations and translation.

MediRag-Guard: Privacy-focused RAG system with comprehensive data insights.

GroqStreamChain: Real-time AI chat with FastAPI + WebSocket + Groq.
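
As a rough illustration of the streaming pattern behind the real-time chat project (FastAPI + WebSocket + Groq), the sketch below streams a reply chunk by chunk and records each turn per session. It is not taken from the project's code: `fake_model_stream`, `handle_message`, `SESSIONS`, and the echo reply are hypothetical stand-ins, and it uses only the Python standard library so it runs without a server or an API key.

```python
import asyncio
from collections import defaultdict
from typing import AsyncIterator, Awaitable, Callable, Dict, List, Tuple

# Hypothetical in-memory session store: session id -> list of (role, text) turns.
SESSIONS: Dict[str, List[Tuple[str, str]]] = defaultdict(list)


async def fake_model_stream(prompt: str) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM call; yields the reply word by word."""
    for word in f"Echo: {prompt}".split():
        await asyncio.sleep(0)  # hand control back to the event loop, as a network read would
        yield word + " "


async def handle_message(
    session_id: str,
    message: str,
    send: Callable[[str], Awaitable[None]],
) -> str:
    """Stream a reply through `send` one chunk at a time, then record the turn."""
    SESSIONS[session_id].append(("user", message))
    chunks: List[str] = []
    async for chunk in fake_model_stream(message):
        # In a WebSocket app, `send` could be the connection's send_text method.
        await send(chunk)
        chunks.append(chunk)
    reply = "".join(chunks).strip()
    SESSIONS[session_id].append(("assistant", reply))
    return reply


async def demo() -> List[str]:
    """Collect the streamed chunks in a list instead of sending them to a client."""
    sent: List[str] = []

    async def send(chunk: str) -> None:
        sent.append(chunk)

    await handle_message("demo-session", "hello there", send)
    return sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing `send` in as a parameter keeps the streaming logic independent of the transport, so the same handler could sit behind a WebSocket endpoint or be exercised in a test with a plain list.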


📄 Research


🤝 Contributing

Code: Fork → Branch → PR
Research: Test models, share datasets, collaborate
Community: Documentation, tutorials, discussions

git clone https://github.com/The-Data-Dilemma/[repo-name]
cd [repo-name]
# Create a branch, make changes, commit, push, and open a PR

🛠️ Tech Stack

AI: HuggingFace • Whisper • Orpheus • PyTorch • Unsloth
Web: FastAPI • WebSocket • Docker
Focus: Multilingual • Code-switching • Privacy • Deployment


📊 Impact

🚀 Open Source Projects: 4+
📝 Research Papers: 3+
🌍 Languages: Bengali + English (expanding)
⭐ Community Stars: 50+

📍 Contact

Email: [email protected]
Location: Khulna, Bangladesh 🇧🇩
LinkedIn: The Data Dilemma



Open-source AI for everyone, one commit at a time ❤️

Pinned

  1. GroqStreamChain (Public)

    GroqStreamChain is a real-time AI-powered chat app using FastAPI, WebSocket, and Groq. It streams AI responses for interactive, low-latency communication with session management and a clean, respon…

    Python • 33 stars • 10 forks

  2. MediBeng-Whisper-Tiny (Public)

    MediBeng Whisper Tiny improves doctor-patient transcription by training the Whisper Tiny model to translate mixed Bengali-English speech into English, making it easier for analysis, record-keeping,…

    Python • 26 stars • 2 forks

  3. MediRag-Guard (Public)

    A RAG Proof of Concept that delivers comprehensive, context-aware insights on healthcare data privacy through a novel knowledge tree.

    Python • 13 stars

  4. ParquetToHuggingFace (Public)

    ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scri…

    Python • 8 stars • 2 forks

  5. Medibeng-Orpheus-3b-0.1-ft-Fine-Tuning (Public)

    Medibeng-Orpheus-3b-0.1-ft: a TTS model for bilingual Bengali-English code-switching in healthcare, fine-tuned for seamless patient-doctor interactions.

    Python • 4 stars • 1 fork
