Version 2.1.0 - Privacy-First Voice Transcription Bot
InnerVoice is a Telegram bot that transcribes and translates voice messages using OpenAI's Whisper model. Built with aiogram and other Python libraries, the bot processes incoming voice messages, converts them to WAV format via ffmpeg, and then uses Whisper to generate both a transcription and a translation.
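For illustration, here is a minimal sketch of the core transcription/translation step, assuming the openai-whisper package and a WAV file that has already been extracted from the voice message (the function name and file path are placeholders, not the repository's actual code):

```python
import whisper

# Load a Whisper model once at startup (the project defaults to "medium").
model = whisper.load_model("medium")

def transcribe_and_translate(wav_path: str, language: str = "es"):
    # Transcribe the audio in the selected language.
    transcription = model.transcribe(wav_path, language=language)
    # Whisper's built-in "translate" task produces an English translation.
    translation = model.transcribe(wav_path, task="translate")
    return transcription["text"], translation["text"]
```

Note that Whisper's built-in translate task targets English; see bot.py for how the bot actually combines and presents the results.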
- DOCUMENTATION.md - Complete user & technical guide
- CHANGELOG.md - Version history and changes
```bash
# Clone and setup
cd /home/as/InnerVoice

# Configure your bot token in .env
echo "BOT_TOKEN=your_token_here" > .env

# Start with Docker
docker compose up -d --build

# Test it
# Send /start to your bot in Telegram
```

InnerVoice enables Telegram users to simply send a voice message and receive:
- A transcription of the audio.
- A translation of the spoken content.
The bot is designed for reproducibility and ease of deployment on various servers. You can clone or download the repository and follow the instructions below to set up your environment.
- Voice-to-Text Transcription: Converts audio messages to text.
- Translation: Provides a translation of the transcribed text.
- Multi-language Support: Supports Spanish, English, French, Dutch, Portuguese, and German.
- Privacy-First Design: Runs locally on your machine, keeping your data private.
- Docker Support: Easy deployment with containerization.
- Resource-Efficient: Optimized for basic hardware; no GPU required.
- Logging: Detailed logs aid in debugging and performance monitoring.
The bot requires:
- Python 3.8+
- ffmpeg for audio conversion
- A Telegram Bot API Token (generated via BotFather)
- Linux/macOS/Windows environment
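If you want to confirm the prerequisites before installing, a quick check like the following can help (a hypothetical helper, not part of the repository):

```python
import shutil
import sys

# Check the Python version and that ffmpeg is available on the PATH.
assert sys.version_info >= (3, 8), "Python 3.8+ is required"
assert shutil.which("ffmpeg"), "ffmpeg not found; install it with your package manager"
print("Prerequisites look OK")
```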
Ensure the following packages are installed before running the bot:
```bash
sudo apt update
sudo apt install python3-venv
sudo apt install ffmpeg
```

Note: ffmpeg is required to convert OGG voice messages to WAV format.
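For reference, that conversion is a single ffmpeg call; the snippet below is a sketch of the step using Python's subprocess module (the exact flags used in bot.py may differ):

```python
import subprocess

def ogg_to_wav(ogg_path: str, wav_path: str) -> None:
    # Convert the Telegram OGG/Opus voice file to 16 kHz mono WAV,
    # the sample rate Whisper works with internally.
    subprocess.run(
        ["ffmpeg", "-y", "-i", ogg_path, "-ar", "16000", "-ac", "1", wav_path],
        check=True,
    )
```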
Clone the repository and create a virtual environment:

```bash
git clone https://github.com/arkano1dev/InnerVoice.git
cd InnerVoice
python3 -m venv venv
source venv/bin/activate     # For Linux/macOS
venv\Scripts\activate        # For Windows (Command Prompt)
```

Install the dependencies (CPU-only setup):

```bash
pip install aiogram python-dotenv psutil
pip install tiktoken
pip install openai-whisper --no-cache-dir
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```

Alternatively, install everything from the requirements file:

```bash
pip install -r requirements.txt
```

Note: The requirements.txt includes the standard installation with CUDA support. If you are running the bot on a system without a GPU, follow the CPU installation steps above instead.
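To verify which PyTorch build ended up in your environment (the CPU wheels from the index URL above should report no CUDA), you can run a quick check:

```python
import torch

# False is expected (and fine) for the CPU-only installation.
print("CUDA available:", torch.cuda.is_available())
print("Torch version:", torch.__version__)
```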
Create a .env file inside the InnerVoice folder:
```bash
echo 'BOT_TOKEN=your_telegram_token_here' > .env
```

Replace `your_telegram_token_here` with your actual Telegram Bot API key.
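Since python-dotenv is among the installed dependencies, the token is presumably loaded along these lines at startup (a sketch, not necessarily the exact code in bot.py):

```python
import os
from dotenv import load_dotenv

# Load variables from the .env file into the process environment.
load_dotenv()
BOT_TOKEN = os.getenv("BOT_TOKEN")
if not BOT_TOKEN:
    raise RuntimeError("BOT_TOKEN is not set; check your .env file")
```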
You can run the bot either directly or using Docker.
Running directly:

- Activate the Virtual Environment:

  ```bash
  source venv/bin/activate     # Linux/macOS
  venv\Scripts\activate        # Windows
  ```

- Start the Bot:

  ```bash
  python3 bot.py
  ```

Running with Docker:

- Build and Start:

  ```bash
  docker-compose up -d
  ```

- View Logs:

  ```bash
  docker-compose logs -f
  ```

- Stop the Bot:

  ```bash
  docker-compose down
  ```

Benefits of the Docker setup:

- No need to install Python or dependencies locally
- Automatic restart on failure
- Easy deployment across different systems
- Isolated environment
- Resource management
- Clean temporary file handling
- Start the Bot:
  - Run `python3 bot.py` after setting up your environment.
- Interacting via Telegram:
  - Send the `/start` command to receive a welcome message in Spanish and English.
  - Available commands:
    - `/lang` - Change the language for transcription/translation
    - `/help` - View usage instructions and hardware requirements
    - `/about` - Learn about the bot's privacy features and technology
  - Send a voice message. The bot will (see the sketch after this section):
    - Download and convert the audio.
    - Process the audio using the Whisper model.
    - Return both transcription and translation.
    - Provide processing statistics (time, segments, tokens).
- Language Selection:
  - Use `/lang` to view and change the language.
  - Supported languages:
    - Spanish (default)
    - English
    - French
    - Dutch
    - Portuguese
    - German
- Logging:
  - Refer to `bot.log` for detailed logs and troubleshooting information.
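To make the flow above concrete, here is an end-to-end sketch of a voice-message handler. It assumes aiogram 3.x and the openai-whisper package; the handler name, file paths, and reply format are illustrative and are not the repository's actual bot.py:

```python
import asyncio
import os
import subprocess
import tempfile

import whisper
from aiogram import Bot, Dispatcher, F
from aiogram.types import Message

bot = Bot(token=os.environ["BOT_TOKEN"])
dp = Dispatcher()
model = whisper.load_model("medium")  # default model variant

@dp.message(F.voice)
async def handle_voice(message: Message) -> None:
    with tempfile.TemporaryDirectory() as tmp:
        ogg_path = os.path.join(tmp, "voice.ogg")
        wav_path = os.path.join(tmp, "voice.wav")
        # 1. Download the OGG voice file from Telegram.
        await bot.download(message.voice, destination=ogg_path)
        # 2. Convert it to WAV with ffmpeg.
        subprocess.run(["ffmpeg", "-y", "-i", ogg_path, wav_path], check=True)
        # 3. Transcribe in the selected language (Spanish here), then
        #    produce an English translation via Whisper's translate task.
        transcription = model.transcribe(wav_path, language="es")["text"]
        translation = model.transcribe(wav_path, task="translate")["text"]
        # 4. Reply with both results.
        await message.reply(
            f"Transcription:\n{transcription}\n\nTranslation:\n{translation}"
        )

async def main() -> None:
    await dp.start_polling(bot)

if __name__ == "__main__":
    asyncio.run(main())
```

The blocking ffmpeg and Whisper calls are kept inline for brevity; a production bot would offload them to a thread or executor so the event loop stays responsive.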
The performance of InnerVoice depends on the Whisper model variant selected. See the guide below:
| Model Variant | CPU Requirements | Memory (RAM) | GPU (Optional) | Notes |
|---|---|---|---|---|
| Tiny | 2+ cores | ≥ 2 GB | Not required | Fastest response; lower accuracy. Ideal for low-resource devices. |
| Base | 2+ cores | ≥ 2–3 GB | Not required | Improved accuracy over Tiny; minimal resource use. |
| Small | 4+ cores | ≥ 4 GB | Beneficial: ~2–3 GB VRAM if using GPU | Balances speed and accuracy well. |
| Medium | 4–8 cores | ≥ 8 GB | Recommended: at least 4 GB VRAM for GPU use | Better accuracy; default model used in InnerVoice. |
| Large | 8+ cores | ≥ 16 GB | Strongly recommended: high-end GPU (≥ 8 GB VRAM) | Highest accuracy; most resource intensive. |
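Since psutil is already among the bot's dependencies, a small helper like the following (hypothetical, not part of the repository) could suggest a variant that fits the table above based on available RAM:

```python
import psutil

def suggest_model_variant() -> str:
    # Map total system RAM (GB) to a Whisper model variant,
    # following the rough guidance in the table above.
    ram_gb = psutil.virtual_memory().total / 1024**3
    if ram_gb >= 16:
        return "large"
    if ram_gb >= 8:
        return "medium"
    if ram_gb >= 4:
        return "small"
    if ram_gb >= 3:
        return "base"
    return "tiny"

print("Suggested Whisper model:", suggest_model_variant())
```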
- Change Model Variant: Modify the following line in `bot.py` to use a different Whisper model:

  ```python
  model = whisper.load_model("medium")
  ```

  Replace `"medium"` with `"tiny"`, `"base"`, `"small"`, or `"large"`.

- Contributions: Contributions are welcome! Please open an issue or submit a pull request for improvements.
This project is licensed under the MIT License. Feel free to use and modify it as needed.