A high-performance, real-time AI voice assistant API built in Go with Twilio integration for intelligent phone call handling
Voice API is a comprehensive solution for creating AI-powered phone agents that can handle incoming calls, engage in natural conversations, and perform actions based on user requests. Built with Go for performance and reliability, it provides real-time streaming audio processing, intelligent conversation management, and seamless Twilio integration.
- Real-time conversation: Stream audio processing with minimal latency
- Multiple LLM support: OpenAI, Fireworks, and custom model integration
- Voice synthesis: ElevenLabs integration for natural-sounding responses
- Smart endpointing: Intelligent conversation flow management
- Multilingual support: Handle calls in multiple languages
- Seamless phone integration: Direct Twilio WebSocket streaming
- Call forwarding: Intelligent call routing and forwarding
- Voicemail handling: Automatic voicemail detection and routing
- Phone number management: Easy agent phone number setup
- Filler word detection: Real-time speech analysis and improvement
- Compliance checks: Automated content filtering and rewriting
- Custom actions: Define custom behaviors and call forwarding
- Webhook support: External system integration
- Analytics: Detailed call metrics and performance tracking
- Authentication: JWT-based user authentication
- API key management: Secure API access control
- Stripe integration: Subscription and billing management
- Database persistence: PostgreSQL via GORM
- Docker support: Easy deployment and scaling
- Go 1.22+
- PostgreSQL
- Twilio account
- OpenAI API key (or alternative LLM provider)
- ElevenLabs API key
- Clone the repository
git clone <repository-url>
cd voice-api
- Install dependencies
go mod download
- Set up environment variables
cp .env.example .env
# Edit .env with your API keys and configuration
- Run with Docker Compose
docker-compose up -d
- Start the server
go run cmd/main.go
Create a .env file with the following variables:
# Server Configuration
PORT=8080
ENV=local
JWT_SECRET=your-jwt-secret
# Database
DB_HOST=localhost
DB_PORT=5432
DB_USER=postgres
DB_PASS=password
DB_NAME=voice_api
# API Keys
OPENAI_API_KEY=your-openai-key
FIREWORKS_API_KEY=your-fireworks-key
ELEVENLABS_API_KEY=your-elevenlabs-key
DEEPGRAM_API_KEY=your-deepgram-key
# Twilio Configuration
TWILIO_SID=your-twilio-sid
TWILIO_AUTH_TOKEN=your-twilio-auth-token
TWILIO_STREAMING_URL=https://your-domain.com/twilio/stream
TWILIO_ML_SID=your-twilio-ml-sid
TWILIO_ML_URL=https://your-domain.com/twilio/ml
# Payment Processing
STRIPE_SECRET_KEY=your-stripe-secret-key
# Voice Synthesis
CARTESIA_API_KEY=your-cartesia-key
CARTESIA_VERSION=your-cartesia-version
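As a rough illustration of how a few of these variables might be read at startup, the sketch below uses only the standard library. The `Config` struct, `LoadConfig`, the fallback values (copied from the sample above), and the choice to treat `JWT_SECRET` as required are all assumptions made for this example; the project's own loader in internal/config/ may work differently.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config holds a small, illustrative subset of the variables listed above.
type Config struct {
	Port      int
	Env       string
	JWTSecret string
	DBHost    string
	DBPort    int
}

// LookupFunc matches the signature of os.LookupEnv so tests can inject values.
type LookupFunc func(key string) (string, bool)

// stringOr returns the value for key, or fallback when it is unset or empty.
func stringOr(lookup LookupFunc, key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

// intOr parses the value for key as an integer, or returns fallback when unset.
func intOr(lookup LookupFunc, key string, fallback int) (int, error) {
	v, ok := lookup(key)
	if !ok || v == "" {
		return fallback, nil
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return 0, fmt.Errorf("%s must be an integer: %w", key, err)
	}
	return n, nil
}

// LoadConfig builds a Config from lookup. Requiring JWT_SECRET is an
// assumption of this sketch, not documented project behavior.
func LoadConfig(lookup LookupFunc) (Config, error) {
	port, err := intOr(lookup, "PORT", 8080)
	if err != nil {
		return Config{}, err
	}
	dbPort, err := intOr(lookup, "DB_PORT", 5432)
	if err != nil {
		return Config{}, err
	}
	cfg := Config{
		Port:      port,
		Env:       stringOr(lookup, "ENV", "local"),
		JWTSecret: stringOr(lookup, "JWT_SECRET", ""),
		DBHost:    stringOr(lookup, "DB_HOST", "localhost"),
		DBPort:    dbPort,
	}
	if cfg.JWTSecret == "" {
		return Config{}, errors.New("JWT_SECRET is required")
	}
	return cfg, nil
}

// LoadConfigFromEnv reads the real process environment.
func LoadConfigFromEnv() (Config, error) {
	return LoadConfig(os.LookupEnv)
}
```

Passing the lookup function as a parameter keeps the loader testable without touching the process environment; a caller would normally use `LoadConfigFromEnv()` once at startup.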
curl -X POST http://localhost:8080/v1/agent \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "name": "Customer Service Agent",
    "phone_number": "+1234567890",
    "system_prompt": "You are a helpful customer service representative.",
    "initial_message": "Hello! How can I help you today?",
    "llm_model": "gpt-4",
    "voice_id": "voice-id",
    "filler_words": true,
    "chunking": true,
    "endpointing": 1000
  }'
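The same request can be issued from Go. The sketch below mirrors the curl call above using only the standard library. The `AgentRequest` type and `NewCreateAgentRequest` helper are hypothetical names introduced for this example; the field names are taken from the JSON body above, and the response format is not assumed here (see openapi.yaml for the actual schema).

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// AgentRequest mirrors the JSON body of the curl example. The Go type itself
// is an assumption for illustration, not part of the project's public API.
type AgentRequest struct {
	Name           string `json:"name"`
	PhoneNumber    string `json:"phone_number"`
	SystemPrompt   string `json:"system_prompt"`
	InitialMessage string `json:"initial_message"`
	LLMModel       string `json:"llm_model"`
	VoiceID        string `json:"voice_id"`
	FillerWords    bool   `json:"filler_words"`
	Chunking       bool   `json:"chunking"`
	Endpointing    int    `json:"endpointing"`
}

// NewCreateAgentRequest builds a POST to {baseURL}/v1/agent with the same
// headers as the curl example.
func NewCreateAgentRequest(baseURL, apiKey string, agent AgentRequest) (*http.Request, error) {
	body, err := json.Marshal(agent)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/agent", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}
```

To send it, pass the result to `http.DefaultClient.Do(req)` and close the response body when done. Building the request separately from sending it makes the headers and payload easy to inspect in tests.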
The system automatically handles incoming calls through Twilio:
- Call arrives → Twilio routes to /twilio/ml
- Stream established → WebSocket connection to /twilio/stream
- Real-time processing → Audio streaming, transcription, and response generation
- Call completion → Recording and analytics stored
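The first two steps of this flow can be sketched as a webhook that answers Twilio with TwiML pointing at the streaming endpoint. This is a minimal example, assuming the /twilio/ml route responds with a `<Connect><Stream>` document; the handler and type names are hypothetical, and the project's actual response may contain more verbs or parameters. Note that Twilio's `<Stream>` element expects a `wss://` URL.

```go
package main

import (
	"encoding/xml"
	"net/http"
)

// The three types below model the minimal TwiML document
// <Response><Connect><Stream url="..."/></Connect></Response>.
type twimlResponse struct {
	XMLName xml.Name     `xml:"Response"`
	Connect twimlConnect `xml:"Connect"`
}

type twimlConnect struct {
	Stream twimlStream `xml:"Stream"`
}

type twimlStream struct {
	URL string `xml:"url,attr"`
}

// BuildStreamTwiML returns a TwiML document that asks Twilio to open a
// bidirectional media stream to streamURL.
func BuildStreamTwiML(streamURL string) ([]byte, error) {
	doc := twimlResponse{Connect: twimlConnect{Stream: twimlStream{URL: streamURL}}}
	body, err := xml.Marshal(doc)
	if err != nil {
		return nil, err
	}
	return append([]byte(xml.Header), body...), nil
}

// TwiMLHandler is a hypothetical handler for the /twilio/ml route.
func TwiMLHandler(streamURL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		doc, err := BuildStreamTwiML(streamURL)
		if err != nil {
			http.Error(w, "failed to build TwiML", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/xml")
		_, _ = w.Write(doc)
	}
}
```

A server could register it with `http.Handle("/twilio/ml", TwiMLHandler("wss://your-domain.com/twilio/stream"))`. Marshalling through `encoding/xml` rather than string formatting ensures the URL is escaped correctly.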
The system uses WebSocket connections for real-time audio streaming:
// Example WebSocket connection for custom clients
const ws = new WebSocket('wss://your-domain.com/twilio/stream');
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // Handle audio data, transcriptions, and responses
};
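On the server side, frames arriving from Twilio are JSON text messages with an `event` field (`connected`, `start`, `media`, `stop`), and `media` events carry base64-encoded audio (8 kHz mu-law by default). The sketch below decodes one such frame. The `StreamMessage` type and `DecodeAudio` function are hypothetical names for this example, only the fields used here are modelled, and messages exchanged with custom clients may follow a different schema.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// StreamMessage models the subset of a Twilio Media Streams frame that this
// sketch needs.
type StreamMessage struct {
	Event     string `json:"event"`
	StreamSID string `json:"streamSid"`
	Media     *struct {
		Track   string `json:"track"`
		Payload string `json:"payload"` // base64-encoded audio
	} `json:"media,omitempty"`
}

// DecodeAudio parses one JSON frame and returns its event name. For "media"
// events it also returns the decoded audio bytes; for all other events the
// audio slice is nil.
func DecodeAudio(raw []byte) (string, []byte, error) {
	var msg StreamMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		return "", nil, fmt.Errorf("invalid frame: %w", err)
	}
	if msg.Event != "media" {
		return msg.Event, nil, nil
	}
	if msg.Media == nil {
		return msg.Event, nil, fmt.Errorf("media event without media object")
	}
	audio, err := base64.StdEncoding.DecodeString(msg.Media.Payload)
	if err != nil {
		return msg.Event, nil, fmt.Errorf("invalid audio payload: %w", err)
	}
	return msg.Event, audio, nil
}
```

A read loop would call `DecodeAudio` on each text frame and forward the returned bytes to the transcription service, using the `start` and `stop` events to open and close per-call state.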
- API Layer (internal/api/): RESTful API endpoints for agent and call management
- Streaming Layer (internal/streaming/): Real-time audio processing and conversation handling
- Models (internal/models/): Database models for users, agents, and calls
- Config (internal/config/): Configuration management
- Server (internal/server/): HTTP server and routing setup
- Call Orchestrator: Manages the entire call flow from start to finish
- Audio Processing: Real-time audio streaming with Deepgram integration
- Conversation Handler: Manages conversation state and LLM interactions
- Smart Endpointing: Intelligent conversation flow control
- Action Handler: Executes custom actions based on conversation context
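This README does not describe how Smart Endpointing decides that a caller has finished speaking. As a point of reference only, the sketch below shows the simplest baseline, a fixed silence timeout, which is not the project's algorithm. It assumes the agent's `endpointing` setting (1000 in the earlier example) is a silence threshold in milliseconds; that unit, the `Endpointer` type, and its methods are all assumptions of this example.

```go
package main

// Endpointer is a hypothetical silence-timeout endpointer. All timestamps are
// milliseconds since the start of the stream.
type Endpointer struct {
	SilenceMs    int64 // silence required before the turn is considered over
	lastSpeechMs int64 // timestamp of the most recent speech
	heard        bool  // whether any speech has been seen in this turn
}

// OnSpeech records that speech (for example a transcript fragment) was
// observed at atMs.
func (e *Endpointer) OnSpeech(atMs int64) {
	e.lastSpeechMs = atMs
	e.heard = true
}

// TurnEnded reports whether the caller has spoken at least once and has then
// been silent for at least SilenceMs as of nowMs.
func (e *Endpointer) TurnEnded(nowMs int64) bool {
	return e.heard && nowMs-e.lastSpeechMs >= e.SilenceMs
}

// Reset clears the turn state after the agent has responded.
func (e *Endpointer) Reset() {
	e.heard = false
}
```

A conversation loop would call `OnSpeech` for each transcript fragment, poll `TurnEnded` on a timer, and hand the accumulated transcript to the LLM once it returns true. A smarter endpointer might also weigh punctuation or transcript finality, which this baseline ignores.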
# Run automatic migrations
go run cmd/main.go db automigrate
# Run tests
go test ./...
# Build for production
go build -o voice-api cmd/main.go
# Build and run with Docker
docker build -t voice-api .
docker run -p 8080:8080 voice-api
# Full stack deployment
docker-compose up -d
The project includes Cloud Build configurations for Google Cloud Platform deployment:
# Deploy to GCP
gcloud builds submit --config cloudbuild.yaml
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
For support and questions:
- Create an issue in the repository
- Check the API documentation
- Review the OpenAPI specification in openapi.yaml