This repository contains a voice AI agent built with the LiveKit Agents framework that integrates with MCP (Model Context Protocol) servers to extend its capabilities with custom tools.
This project demonstrates how to create a voice-enabled AI agent that can:
- Conduct natural voice conversations using LiveKit
- Integrate with MCP servers to access custom tools
- Use tools like weather search and Fibonacci sequence generation
- Handle voice input/output with STT/TTS capabilities
- Switch between different TTS providers (Cartesia and Sarvam)
- Dynamically discover and describe available tools in prompts
```
Voice AI Agent
      ↓
LiveKit Agents Framework
      ↓
MCP Client Integration
      ↓
MCP Servers (n8n, etc.)
      ↓
Custom Tools & Workflows
```
- Real-time voice communication
- MCP tool integration for extended capabilities
- Google Gemini LLM for conversation
- Deepgram STT for speech recognition
- Multiple TTS providers (Cartesia and Sarvam)
- Silero VAD for voice activity detection
- Multilingual turn detection
- Dynamic tool discovery and prompt generation
- Python 3.8+
- LiveKit account and credentials
- Google AI API key
- Deepgram API key
- Cartesia API key (optional)
- Sarvam API key (optional, required for Sarvam TTS)
- MCP server endpoints (e.g., n8n workflows)
- Clone the repository:

```bash
git clone <repository-url>
cd voice-ai
```

- Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Install the Sarvam plugin (if using Sarvam TTS):

```bash
pip install "livekit-agents[sarvam]~=1.2"
```

- Configure environment variables:

```bash
cp .env.example .env
# Edit .env with your credentials
```

n8n is a free and source-available workflow automation tool that lets you connect different apps and services. In this project, we use n8n to create MCP-compatible tools.
- Set up an n8n instance (cloud or self-hosted)
- Create a new workflow
- Add an "MCP Tool" node
- Define the tool schema and implementation
- Deploy the workflow to expose it as an MCP endpoint
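Once deployed, the endpoint answers the MCP `tools/list` request with a JSON-RPC payload describing the available tools. As an illustrative sketch (the sample payload and tool names below are hypothetical, not the exact wire format of any particular server), extracting the tool list from such a response looks like this:

```python
import json

# Hypothetical JSON-RPC response an MCP server might return for "tools/list".
sample_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {"name": "get_weather", "description": "Look up current weather for a city"},
            {"name": "fibonacci", "description": "Generate a Fibonacci sequence"},
        ]
    },
})

def extract_tools(raw: str) -> list:
    """Pull the tool list out of a tools/list JSON-RPC response."""
    payload = json.loads(raw)
    return payload.get("result", {}).get("tools", [])

tools = extract_tools(sample_response)
print([t["name"] for t in tools])  # ['get_weather', 'fibonacci']
```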
One of the key features of this implementation is dynamic tool discovery. Instead of hardcoding tool names and descriptions in the prompts, the agent automatically fetches the list of available tools from the MCP server at startup and includes this information in the agent's instructions.
This means:
- You can add new tools to your n8n workflows without changing the prompt files
- The agent will automatically know about and be able to use new tools
- Tool descriptions are always up-to-date with what's available on the server
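The idea behind the `tools_description.py` helper can be sketched as a small function that renders discovered tool metadata into a prompt section (the tool names and wording here are illustrative, not the repo's exact output):

```python
def describe_tools(tools: list) -> str:
    """Render discovered MCP tools as a bulleted section for the system prompt."""
    if not tools:
        return "No external tools are currently available."
    lines = ["You have access to the following tools:"]
    for tool in tools:
        lines.append(f"- {tool['name']}: {tool.get('description', 'No description provided')}")
    return "\n".join(lines)

# Hypothetical tool metadata as it might come back from the MCP server.
tools = [
    {"name": "fibonacci", "description": "Generate a Fibonacci sequence"},
    {"name": "get_weather", "description": "Look up current weather"},
]
print(describe_tools(tools))
```

Because this text is regenerated at startup, adding a tool on the server side is enough for it to show up in the agent's instructions.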
We've created a custom Fibonacci sequence generator tool in n8n that demonstrates how to:
- Define a proper JSON schema for tool parameters
- Handle various input formats from different AI agents
- Implement robust error handling
- Return structured responses that work well with voice interfaces
The tool handles the specific format sent by LiveKit agents:

```json
{"type": "Fibonacci", "properties": {"number": 10}}
```

and provides voice-friendly responses that can be used directly by the TTS system.
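The n8n implementation itself lives in the workflow, but its core logic can be sketched in Python: normalize the incoming payload (agents may send the number nested under `properties` or at the top level), clamp it to the schema's bounds, and return a sentence the TTS can read aloud. The function name and response wording below are illustrative.

```python
def handle_fibonacci(payload: dict) -> str:
    """Normalize the tool payload, compute the sequence, and return a voice-friendly reply."""
    # Agents may nest parameters under "properties" or send them at the top level.
    params = payload.get("properties", payload)
    try:
        n = int(params.get("number", 10))
    except (TypeError, ValueError):
        n = 10  # fall back to the schema default on malformed input
    n = max(1, min(n, 50))  # clamp to the schema's minimum/maximum

    seq = [0, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    seq = seq[:n]

    spoken = ", ".join(str(x) for x in seq)
    return f"The first {n} Fibonacci numbers are {spoken}."

print(handle_fibonacci({"type": "Fibonacci", "properties": {"number": 5}}))
# The first 5 Fibonacci numbers are 0, 1, 1, 2, 3.
```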
Each MCP tool needs a well-defined JSON schema that describes:
- Input parameters
- Data types
- Default values
- Validation rules
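As a concrete illustration of defaults and validation rules, here is a hand-rolled check for a single integer parameter, mirroring what the `default`, `minimum`, and `maximum` keywords of JSON Schema express (the function is a hypothetical helper, not part of this repo):

```python
def validate_int_param(value, default=10, minimum=1, maximum=50) -> int:
    """Apply default, type, and range rules the way a JSON Schema validator would."""
    if value is None:
        return default  # parameter omitted: use the schema default
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError(f"expected an integer, got {type(value).__name__}")
    if not (minimum <= value <= maximum):
        raise ValueError(f"value {value} outside allowed range [{minimum}, {maximum}]")
    return value

print(validate_int_param(None))  # 10 (default applied)
print(validate_int_param(25))    # 25
```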
Our Fibonacci tool schema defines:
```json
{
  "type": "object",
  "properties": {
    "type": {
      "type": "string",
      "description": "Type of operation"
    },
    "properties": {
      "type": "object",
      "properties": {
        "number": {
          "type": "integer",
          "description": "Number of Fibonacci terms to generate",
          "default": 10,
          "minimum": 1,
          "maximum": 50
        }
      }
    }
  }
}
```

This agent supports multiple TTS providers to give you flexibility in voice output:
Cartesia:
- High-quality, natural-sounding voices
- Good for general-purpose applications
- Wide language support

Sarvam:
- Optimized for Indian languages
- Better pronunciation for Hindi and other regional languages
- Specifically configured for hi-IN (Hindi, India) in the example
In `agent.py`, you can easily switch between TTS providers:

```python
# Sarvam TTS (good for Indian languages)
tts = sarvam.TTS(
    target_language_code="hi-IN",
    speaker="anushka",
)

# Cartesia TTS (comment out the above and uncomment this to use Cartesia instead)
# tts = cartesia.TTS(model="sonic-2", voice="f786b574-daa5-4673-aa0c-cbe3e8534c02")
```

Simply comment out one and uncomment the other to switch providers.
```bash
python agent.py
```

The agent will connect to your LiveKit room and start listening for voice input.
See .env.example for all required environment variables:
- `LIVEKIT_URL` - Your LiveKit server URL
- `LIVEKIT_API_KEY` - LiveKit API key
- `LIVEKIT_API_SECRET` - LiveKit API secret
- `GOOGLE_API_KEY` - Google AI API key
- `DEEPGRAM_API_KEY` - Deepgram API key
- `CARTESIA_API_KEY` - Cartesia API key
- `SARVAM_API_KEY` - Sarvam API key (required for Sarvam TTS)
- `N8N_MCP_SERVER_URL` - URL to your n8n MCP endpoint
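Missing credentials are easier to debug when the agent fails fast at startup. A minimal sketch of that check (shown against a plain dict for illustration; in `agent.py` you would pass `os.environ`):

```python
# Required configuration keys, mirroring .env.example.
REQUIRED_VARS = [
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "GOOGLE_API_KEY", "DEEPGRAM_API_KEY", "N8N_MCP_SERVER_URL",
]

def missing_vars(env: dict) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example with an incomplete configuration (values are placeholders):
env = {"LIVEKIT_URL": "wss://example.livekit.cloud", "GOOGLE_API_KEY": "..."}
print(missing_vars(env))
# ['LIVEKIT_API_KEY', 'LIVEKIT_API_SECRET', 'DEEPGRAM_API_KEY', 'N8N_MCP_SERVER_URL']
```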
The MCP client integration is implemented in the `mcp_client/` directory and includes:

- `MCPServerSse` - SSE-based MCP server connection
- `MCPToolsIntegration` - Tools integration with LiveKit agents
- Utility functions for handling MCP tool schemas and calls
- Dynamic tool discovery and description (`tools_description.py`)
- Create a new tool in your n8n workflow
- Expose it via the MCP endpoint
- The agent will automatically discover and integrate it
- No changes needed to prompt files - the agent will automatically know about the new tool
- Update `prompts.py` or `prompts_hindi.py` to change agent instructions
- Modify `agent.py` to adjust voice processing parameters
- Add new local tools in `tools.py`
- MCP Tool Schema Mismatches
  - Ensure your n8n tool schema matches what the agent expects
  - Handle various input formats in your n8n tool implementation
- Rate Limiting
  - Google Gemini and other APIs have rate limits
  - Consider implementing retry logic or using different models
- Voice Quality Issues
  - Adjust VAD parameters in `agent.py`
  - Check network connectivity for real-time processing
- TTS Provider Issues
  - Ensure you have the correct API keys for your chosen provider
  - Check that the provider supports your target language
- Tool Discovery Issues
  - Verify that `N8N_MCP_SERVER_URL` is correctly set in your environment
  - Check that your MCP server is accessible and responding
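For the rate-limiting case above, a minimal retry-with-backoff wrapper looks like the sketch below (illustrative only; the exception types you should catch depend on the client library you are calling):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5, retry_on=(Exception,)):
    """Call fn, retrying with exponential backoff on the given exception types."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch: flaky() stands in for an LLM call that sometimes hits a rate limit.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429: rate limited")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```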
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a pull request
This project is licensed under the MIT License - see the LICENSE file for details.