This project provides a web application that uses WebRTC to capture audio from the user's microphone, processes it with whisper.cpp for speech-to-text conversion, and integrates with X.AI's Grok API for intelligent responses.
- Real-time audio capture using WebRTC
- Server-side speech-to-text processing with whisper.cpp
- Integration with X.AI's Grok API
- WebSocket-based communication for real-time interactions
- Simple and intuitive user interface
- Docker and Docker Compose (for containerized deployment)
- X.AI (Grok) API key
-
Clone this repository:
git clone https://github.com/yourusername/webrtc-whisper-grok.git cd webrtc-whisper-grok
-
Create a
.env
file in the project root with your API key:GROK_API_KEY=your_api_key_here
-
Build and start the application with Docker Compose:
docker-compose up --build
-
Access the application at http://localhost:8000
If you prefer to run the application without Docker:
-
Install system dependencies:
- Python 3.10+
- FFmpeg
- Build tools (gcc, cmake, etc.)
-
Clone and build whisper.cpp:
git clone https://github.com/ggerganov/whisper.cpp.git cd whisper.cpp make bash ./models/download-ggml-model.sh base.en cd ..
-
Install Python dependencies:
pip install -r requirements.txt
-
Set environment variables:
export GROK_API_KEY=your_api_key_here export WHISPER_CPP_PATH=/path/to/whisper.cpp/main
-
Run the application:
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
- Frontend: HTML/CSS/JavaScript with WebRTC for audio capture
- Backend: FastAPI Python application
- WebSockets: For real-time audio streaming and response delivery
- Processing Pipeline: Audio → whisper.cpp → Grok API → User Interface
The project structure follows a clean architecture approach:
/app
: Backend Python code/static
: Frontend assets/tests
: Test cases
- This application requires microphone access, which is sensitive permission
- HTTPS should be used in production to secure the WebRTC connection
- API keys should be properly secured and not exposed in client-side code
- whisper.cpp for high-performance speech recognition
- X.AI for the Grok API
- FastAPI for the web framework