WebRTC Speech-to-Text with Grok API

This project provides a web application that uses WebRTC to capture audio from the user's microphone, processes it with whisper.cpp for speech-to-text conversion, and integrates with X.AI's Grok API for intelligent responses.

Features

Real-time audio capture using WebRTC
Server-side speech-to-text processing with whisper.cpp
Integration with X.AI's Grok API
WebSocket-based communication for real-time interactions
Simple and intuitive user interface

Prerequisites

Docker and Docker Compose (for containerized deployment)
X.AI (Grok) API key

Setup and Installation

Clone this repository:

git clone https://github.com/yourusername/webrtc-whisper-grok.git
cd webrtc-whisper-grok

Create a .env file in the project root with your API key:
```
GROK_API_KEY=your_api_key_here
```
Build and start the application with Docker Compose:
```
docker-compose up --build
```
Access the application at http://localhost:8000

Manual Setup (without Docker)

If you prefer to run the application without Docker:

Install system dependencies:
- Python 3.10+
- FFmpeg
- Build tools (gcc, cmake, etc.)

Clone and build whisper.cpp:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
make
bash ./models/download-ggml-model.sh base.en
cd ..

Install Python dependencies:
```
pip install -r requirements.txt
```

Set environment variables:

export GROK_API_KEY=your_api_key_here
export WHISPER_CPP_PATH=/path/to/whisper.cpp/main

Run the application:

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

Architecture

Frontend: HTML/CSS/JavaScript with WebRTC for audio capture
Backend: FastAPI Python application
WebSockets: For real-time audio streaming and response delivery
Processing Pipeline: Audio → whisper.cpp → Grok API → User Interface

Development

The project structure follows a clean architecture approach:

/app: Backend Python code
/static: Frontend assets
/tests: Test cases

Security Considerations

This application requires microphone access, which is sensitive permission
HTTPS should be used in production to secure the WebRTC connection
API keys should be properly secured and not exposed in client-side code

License

MIT License

Acknowledgements

whisper.cpp for high-performance speech recognition
X.AI for the Grok API
FastAPI for the web framework

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
app		app
lexllm		lexllm
prompts		prompts
sandbox		sandbox
Dockerfile		Dockerfile
Dockerfile.slim		Dockerfile.slim
InterviewAgent.iml		InterviewAgent.iml
ProjectStructure.txt		ProjectStructure.txt
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

WebRTC Speech-to-Text with Grok API

Features

Prerequisites

Setup and Installation

Manual Setup (without Docker)

Architecture

Development

Security Considerations

License

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Languages

csimoes1/InterviewAgent

Folders and files

Latest commit

History

Repository files navigation

WebRTC Speech-to-Text with Grok API

Features

Prerequisites

Setup and Installation

Manual Setup (without Docker)

Architecture

Development

Security Considerations

License

Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages