Voice AI Agent with MCP Integration

This repository contains a voice AI agent built with the LiveKit Agents framework. It integrates with MCP (Model Context Protocol) servers to extend its capabilities with custom tools.

Overview

This project demonstrates how to create a voice-enabled AI agent that can:

Conduct natural voice conversations using LiveKit
Integrate with MCP servers to access custom tools
Use tools like weather search and Fibonacci sequence generation
Handle voice input/output with STT/TTS capabilities
Switch between different TTS providers (Cartesia and Sarvam)
Dynamically discover and describe available tools in prompts

Architecture

Voice AI Agent
    ↓
LiveKit Agents Framework
    ↓
MCP Client Integration
    ↓
MCP Servers (n8n, etc.)
    ↓
Custom Tools & Workflows

Features

Real-time voice communication
MCP tool integration for extended capabilities
Google Gemini LLM for conversation
Deepgram STT for speech recognition
Multiple TTS providers (Cartesia and Sarvam)
Silero VAD for voice activity detection
Multilingual turn detection
Dynamic tool discovery and prompt generation

Prerequisites

Python 3.9+ (LiveKit Agents 1.x does not support older versions)
LiveKit account and credentials
Google AI API key
Deepgram API key
Cartesia API key (optional)
Sarvam API key (optional, required for Sarvam TTS)
MCP server endpoints (e.g., n8n workflows)

Setup

Clone the repository:

git clone <repository-url>
cd voice-ai

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Install the Sarvam plugin (only needed if you use Sarvam TTS):

pip install "livekit-agents[sarvam]~=1.2"

Configure environment variables:

cp .env.example .env
# Edit .env with your credentials

MCP Integration with n8n

What is n8n?

n8n is a free and source-available workflow automation tool that allows you to connect different apps and services. In this project, we use n8n to create MCP-compatible tools.

Creating MCP Tools in n8n

Set up an n8n instance (cloud or self-hosted)
Create a new workflow
Add an "MCP Tool" node
Define the tool schema and implementation
Deploy the workflow to expose it as an MCP endpoint

Dynamic Tool Discovery

One of the key features of this implementation is dynamic tool discovery. Instead of hardcoding tool names and descriptions in the prompts, the agent automatically fetches the list of available tools from the MCP server at startup and includes this information in the agent's instructions.

This means:

You can add new tools to your n8n workflows without changing the prompt files
The agent will automatically know about and be able to use new tools
Tool descriptions are always up-to-date with what's available on the server
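
The discovery step described above can be sketched as follows. This is a minimal illustration, not the code in tools_description.py: the function name describe_tools, the base instructions, and the sample tool list are all hypothetical. The input is assumed to have the shape of the "tools" array in an MCP tools/list result, where each tool carries a name, a description, and an inputSchema.

```python
from typing import Any, Dict, List


def describe_tools(tools: List[Dict[str, Any]]) -> str:
    """Render MCP tool descriptors as plain text for the agent's instructions."""
    if not tools:
        return "No external tools are currently available."

    lines = ["You can call the following tools:"]
    for tool in tools:
        name = tool.get("name", "unnamed_tool")
        description = tool.get("description") or "No description provided."
        # Parameter names come from the tool's JSON schema, if it declares any.
        params = sorted(tool.get("inputSchema", {}).get("properties", {}))
        suffix = f" (parameters: {', '.join(params)})" if params else ""
        lines.append(f"- {name}: {description}{suffix}")
    return "\n".join(lines)


# Hypothetical sample, shaped like the "tools" array of a tools/list result.
sample_tools = [
    {
        "name": "Fibonacci",
        "description": "Generate Fibonacci terms.",
        "inputSchema": {
            "type": "object",
            "properties": {"number": {"type": "integer"}},
        },
    },
    {"name": "Weather", "description": "", "inputSchema": {"type": "object"}},
]

# Hypothetical base prompt; the real one lives in prompts.py.
base_instructions = "You are a helpful voice assistant."
instructions = base_instructions + "\n\n" + describe_tools(sample_tools)
print(instructions)
```

Because the text is rebuilt from whatever the server returns, adding a tool in n8n changes the instructions on the next start without touching the prompt files.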

Example: Fibonacci Sequence Generator

We've created a custom Fibonacci sequence generator tool in n8n that demonstrates how to:

Define a proper JSON schema for tool parameters
Handle various input formats from different AI agents
Implement robust error handling
Return structured responses that work well with voice interfaces

The tool handles the specific format sent by LiveKit agents:

{"type": "Fibonacci", "properties": {"number": 10}}

And provides voice-friendly responses that can be directly used by the TTS system.
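
The logic such a tool needs can be sketched in Python as shown below. The real tool runs inside n8n, so this is an illustration of the idea rather than the workflow's code. All function names are hypothetical, the flat {"number": ...} payload is an assumed alternative format, and starting the sequence at 0 is also an assumption.

```python
from typing import Any, Dict, List


def extract_count(payload: Dict[str, Any], default: int = 10) -> int:
    """Pull the requested term count out of a nested or flat payload."""
    nested = payload.get("properties")
    # LiveKit-style payloads nest the value; fall back to a flat payload.
    source = nested if isinstance(nested, dict) else payload
    raw = source.get("number", default)
    try:
        count = int(raw)  # also accepts numeric strings such as "10"
    except (TypeError, ValueError):
        raise ValueError(f"'number' must be an integer, got {raw!r}") from None
    if not 1 <= count <= 50:
        raise ValueError("'number' must be between 1 and 50")
    return count


def fibonacci(count: int) -> List[int]:
    """Return the first `count` Fibonacci terms, starting 0, 1."""
    terms: List[int] = []
    a, b = 0, 1
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b
    return terms


def handle_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a structured result plus a sentence the TTS can speak as-is."""
    try:
        count = extract_count(payload)
    except ValueError as exc:
        return {"ok": False, "message": f"Sorry, I could not do that: {exc}."}
    terms = fibonacci(count)
    spoken = ", ".join(str(term) for term in terms)
    return {
        "ok": True,
        "sequence": terms,
        "message": f"The first {count} Fibonacci numbers are {spoken}.",
    }


print(handle_request({"type": "Fibonacci", "properties": {"number": 10}})["message"])
```

Returning an error as a speakable message, instead of raising, keeps the conversation going when the model sends an unexpected value.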

Tool Schema

Each MCP tool needs a well-defined JSON schema that describes:

Input parameters
Data types
Default values
Validation rules

Our Fibonacci tool uses the following schema:

{
  "type": "object",
  "properties": {
    "type": {
      "type": "string",
      "description": "Type of operation"
    },
    "properties": {
      "type": "object",
      "properties": {
        "number": {
          "type": "integer",
          "description": "Number of Fibonacci terms to generate",
          "default": 10,
          "minimum": 1,
          "maximum": 50
        }
      }
    }
  }
}
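
In JSON Schema, "default" is only an annotation, so validators do not fill it in; the tool has to apply defaults and bounds itself. The sketch below shows one way to do that with the standard library. The function validate_arguments is hypothetical and covers only the keywords used in the schema above (type, properties, default, minimum, maximum). SCHEMA mirrors that schema with the descriptions omitted.

```python
from typing import Any, Dict

# Mirror of the schema above, without the description fields.
SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "type": {"type": "string"},
        "properties": {
            "type": "object",
            "properties": {
                "number": {
                    "type": "integer",
                    "default": 10,
                    "minimum": 1,
                    "maximum": 50,
                }
            },
        },
    },
}


def validate_arguments(args: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Apply defaults and check types and bounds for an object schema."""
    result: Dict[str, Any] = {}
    for key, rule in schema.get("properties", {}).items():
        kind = rule.get("type")
        if kind == "object":
            nested = args.get(key, {})
            if not isinstance(nested, dict):
                raise ValueError(f"{key} must be an object")
            result[key] = validate_arguments(nested, rule)
        elif key in args:
            value = args[key]
            if kind == "integer":
                # bool is a subclass of int in Python, so reject it explicitly.
                if isinstance(value, bool) or not isinstance(value, int):
                    raise ValueError(f"{key} must be an integer")
                if "minimum" in rule and value < rule["minimum"]:
                    raise ValueError(f"{key} must be >= {rule['minimum']}")
                if "maximum" in rule and value > rule["maximum"]:
                    raise ValueError(f"{key} must be <= {rule['maximum']}")
            elif kind == "string" and not isinstance(value, str):
                raise ValueError(f"{key} must be a string")
            result[key] = value
        elif "default" in rule:
            result[key] = rule["default"]
    return result


# A call with no arguments falls back to the schema default of 10 terms.
print(validate_arguments({}, SCHEMA))
```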

TTS Provider Support

This agent supports multiple TTS providers to give you flexibility in voice output:

Cartesia TTS

High-quality, natural-sounding voices
Good for general purpose applications
Wide language support

Sarvam TTS

Optimized for Indian languages
Better pronunciation for Hindi and other regional languages
Configured for hi-IN (Hindi, India) in the example

Switching Between Providers

In agent.py, you can easily switch between TTS providers:

# Sarvam TTS (good for Indian languages)
tts = sarvam.TTS(
    target_language_code="hi-IN",
    speaker="anushka",
)

# Cartesia TTS (comment out the above and uncomment this to use Cartesia instead)
# tts = cartesia.TTS(model="sonic-2", voice="f786b574-daa5-4673-aa0c-cbe3e8534c02")

Simply comment out one and uncomment the other to switch providers.
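
As an alternative to editing comments, the provider could be chosen from an environment variable. The sketch below is one possible approach, not what agent.py does: the TTS_PROVIDER variable and both function names are hypothetical, while the constructor arguments are copied from the snippet above. Plugin imports are deferred so that only the selected plugin has to be installed.

```python
import os
from typing import Any, Dict, Mapping, Optional, Tuple

# Constructor arguments copied from the snippet above.
TTS_SETTINGS: Dict[str, Dict[str, Any]] = {
    "sarvam": {"target_language_code": "hi-IN", "speaker": "anushka"},
    "cartesia": {
        "model": "sonic-2",
        "voice": "f786b574-daa5-4673-aa0c-cbe3e8534c02",
    },
}


def select_tts(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Dict[str, Any]]:
    """Return the provider name and its settings; defaults to Sarvam."""
    env = os.environ if env is None else env
    # TTS_PROVIDER is a hypothetical variable, not part of .env.example.
    provider = env.get("TTS_PROVIDER", "sarvam").strip().lower()
    if provider not in TTS_SETTINGS:
        known = ", ".join(sorted(TTS_SETTINGS))
        raise ValueError(f"Unknown TTS provider {provider!r}; expected one of: {known}")
    return provider, dict(TTS_SETTINGS[provider])


def build_tts(env: Optional[Mapping[str, str]] = None) -> Any:
    """Instantiate the selected plugin, importing it only when needed."""
    provider, kwargs = select_tts(env)
    if provider == "sarvam":
        from livekit.plugins import sarvam

        return sarvam.TTS(**kwargs)
    from livekit.plugins import cartesia

    return cartesia.TTS(**kwargs)


print(select_tts({"TTS_PROVIDER": "cartesia"})[0])
```

With this in place, agent.py would call build_tts() once, and switching providers becomes a matter of setting TTS_PROVIDER=cartesia in .env.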

Running the Agent

python agent.py

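The agent will connect to your LiveKit room and start listening for voice input. If agent.py is launched through the LiveKit Agents CLI runner, a mode must also be given, for example python agent.py dev or python agent.py console.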

Environment Variables

See .env.example for the full list of environment variables:

LIVEKIT_URL - Your LiveKit server URL
LIVEKIT_API_KEY - LiveKit API key
LIVEKIT_API_SECRET - LiveKit API secret
GOOGLE_API_KEY - Google AI API key
DEEPGRAM_API_KEY - Deepgram API key
CARTESIA_API_KEY - Cartesia API key (required for Cartesia TTS)
SARVAM_API_KEY - Sarvam API key (required for Sarvam TTS)
N8N_MCP_SERVER_URL - URL to your n8n MCP endpoint
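
A small startup check can catch missing variables before the agent tries to connect. The following is a minimal sketch: the function name missing_variables is hypothetical, and the split into always-required variables and one provider-specific TTS key is an assumption based on the prerequisites listed earlier. It inspects the process environment only, so it should run after .env has been loaded.

```python
import os
from typing import Dict, List, Mapping, Optional

# Assumed to be needed regardless of the TTS provider.
REQUIRED: List[str] = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GOOGLE_API_KEY",
    "DEEPGRAM_API_KEY",
    "N8N_MCP_SERVER_URL",
]

# Only the key for the provider in use is needed.
TTS_KEYS: Dict[str, str] = {
    "cartesia": "CARTESIA_API_KEY",
    "sarvam": "SARVAM_API_KEY",
}


def missing_variables(provider: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of variables that are unset or blank."""
    env = os.environ if env is None else env
    needed = REQUIRED + [TTS_KEYS[provider]]
    return [name for name in needed if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_variables("sarvam")
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```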

MCP Client Integration

The MCP client integration is implemented in the mcp_client/ directory and includes:

MCPServerSse - SSE-based MCP server connection
MCPToolsIntegration - Tools integration with LiveKit agents
Utility functions for handling MCP tool schemas and calls
Dynamic tool discovery and description (tools_description.py)
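
To give a sense of what an SSE-based connection has to do, the sketch below parses a server-sent event stream into messages. It is a simplified illustration and not the MCPServerSse implementation: parse_sse is a hypothetical helper, the sample stream is invented, and only the event and data fields are handled. In the MCP SSE transport, the server is expected to send an endpoint event first and then JSON-RPC payloads as message events.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one {'event', 'data'} dict per complete SSE message."""
    event = "message"  # default event type when no event field is sent
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the message; empty messages are skipped.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream: an endpoint announcement, then a tools/list result.
stream = [
    "event: endpoint\n",
    "data: /messages?session_id=abc\n",
    "\n",
    ": keep-alive\n",
    "event: message\n",
    'data: {"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}\n',
    "\n",
]

for message in parse_sse(stream):
    if message["event"] == "message":
        print(json.loads(message["data"])["result"])
    else:
        print(message["event"], message["data"])
```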

Customization

Adding New MCP Tools

Create a new tool in your n8n workflow
Expose it via the MCP endpoint
The agent discovers and integrates it automatically the next time it starts
No changes to the prompt files are needed

Modifying Agent Behavior

Update prompts.py or prompts_hindi.py to change agent instructions
Modify agent.py to adjust voice processing parameters
Add new local tools in tools.py

Troubleshooting

Common Issues

MCP Tool Schema Mismatches
- Ensure your n8n tool schema matches what the agent expects
- Handle various input formats in your n8n tool implementation

Rate Limiting
- Google Gemini and other APIs have rate limits
- Consider implementing retry logic or using different models

Voice Quality Issues
- Adjust VAD parameters in agent.py
- Check network connectivity for real-time processing

TTS Provider Issues
- Ensure you have the correct API keys for your chosen provider
- Check that the provider supports your target language

Tool Discovery Issues
- Verify that N8N_MCP_SERVER_URL is correctly set in your environment
- Check that your MCP server is accessible and responding
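
The retry logic suggested under Rate Limiting can be sketched as exponential backoff with jitter. This is a generic helper and not part of the repository: retry_with_backoff and its parameters are hypothetical, and real code should pass the specific rate-limit exception raised by the client library in use instead of a broad exception type.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds or `attempts` tries have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the wait each time, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from concurrent callers.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


# Usage with a stand-in for a rate-limited API call.
state = {"calls": 0}


def flaky_request() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("rate limited")
    return "ok"


print(retry_with_backoff(flaky_request, retry_on=(TimeoutError,), base_delay=0.01))
```

In a voice agent, long waits are audible to the caller, so a small number of attempts with a low max_delay is usually the safer choice.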

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

LiveKit for the real-time communication infrastructure
n8n for workflow automation and MCP tool creation
Google AI for the language model capabilities
Deepgram for speech recognition
Cartesia for text-to-speech services
Sarvam for Indian language TTS capabilities
