CelestialCreator/vaak

Voice AI Agent with MCP Integration

This repository contains a voice AI agent built with the LiveKit Agents framework. It integrates with MCP (Model Context Protocol) servers to extend its capabilities with custom tools.

Overview

This project demonstrates how to create a voice-enabled AI agent that can:

  • Conduct natural voice conversations using LiveKit
  • Integrate with MCP servers to access custom tools
  • Use tools like weather search and Fibonacci sequence generation
  • Handle voice input/output with STT/TTS capabilities
  • Switch between different TTS providers (Cartesia and Sarvam)
  • Dynamically discover and describe available tools in prompts

Architecture

Voice AI Agent
    ↓
LiveKit Agents Framework
    ↓
MCP Client Integration
    ↓
MCP Servers (n8n, etc.)
    ↓
Custom Tools & Workflows

Features

  • Real-time voice communication
  • MCP tool integration for extended capabilities
  • Google Gemini LLM for conversation
  • Deepgram STT for speech recognition
  • Multiple TTS providers (Cartesia and Sarvam)
  • Silero VAD for voice activity detection
  • Multilingual turn detection
  • Dynamic tool discovery and prompt generation

Prerequisites

  • Python 3.9+ (the minimum supported by livekit-agents 1.x)
  • LiveKit account and credentials
  • Google AI API key
  • Deepgram API key
  • Cartesia API key (optional)
  • Sarvam API key (optional, required for Sarvam TTS)
  • MCP server endpoints (e.g., n8n workflows)

Setup

  1. Clone the repository:
git clone <repository-url>
cd vaak
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Install Sarvam plugin (if using Sarvam TTS):
pip install "livekit-agents[sarvam]~=1.2"
  5. Configure environment variables:
cp .env.example .env
# Edit .env with your credentials

MCP Integration with n8n

What is n8n?

n8n is a free and source-available workflow automation tool that allows you to connect different apps and services. In this project, we use n8n to create MCP-compatible tools.

Creating MCP Tools in n8n

  1. Set up an n8n instance (cloud or self-hosted)
  2. Create a new workflow
  3. Add an "MCP Tool" node
  4. Define the tool schema and implementation
  5. Deploy the workflow to expose it as an MCP endpoint

Dynamic Tool Discovery

One of the key features of this implementation is dynamic tool discovery. Instead of hardcoding tool names and descriptions in the prompts, the agent automatically fetches the list of available tools from the MCP server at startup and includes this information in the agent's instructions.

This means:

  • You can add new tools to your n8n workflows without changing the prompt files
  • The agent will automatically know about and be able to use new tools
  • Tool descriptions are always up-to-date with what's available on the server
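
The idea can be sketched in a few lines of Python. This is an illustration only, not the contents of tools_description.py: the function names and the assumed shape of a tool record (a dict with "name" and "description" keys) are hypothetical.

```python
from typing import Dict, List


def describe_tools(tools: List[Dict[str, str]]) -> str:
    """Render discovered tools as a bullet list for the agent's instructions."""
    if not tools:
        return "No external tools are currently available."
    lines = ["You can use the following tools:"]
    for tool in tools:
        name = tool.get("name", "unnamed_tool")
        description = tool.get("description", "").strip() or "No description provided."
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)


def build_instructions(base_prompt: str, tools: List[Dict[str, str]]) -> str:
    """Append the tool summary to a static base prompt."""
    return f"{base_prompt.rstrip()}\n\n{describe_tools(tools)}"


if __name__ == "__main__":
    # Hypothetical tool list, standing in for what the MCP server reports at startup.
    discovered = [
        {"name": "Fibonacci", "description": "Generate Fibonacci terms"},
        {"name": "weather_search", "description": ""},
    ]
    print(build_instructions("You are a helpful voice assistant.", discovered))
```

Because the tool summary is generated from whatever the server reports, the base prompt can stay free of tool names.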

Example: Fibonacci Sequence Generator

We've created a custom Fibonacci sequence generator tool in n8n that demonstrates how to:

  1. Define a proper JSON schema for tool parameters
  2. Handle various input formats from different AI agents
  3. Implement robust error handling
  4. Return structured responses that work well with voice interfaces

The tool handles the specific format sent by LiveKit agents:

{"type": "Fibonacci", "properties": {"number": 10}}

And provides voice-friendly responses that can be directly used by the TTS system.
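
The real tool lives in an n8n workflow, so the following Python is only a sketch of the logic such a tool might need, not the project's implementation. It accepts both the nested payload shown above and a flat {"number": 10} payload, and returns a sentence suitable for TTS. All function names are hypothetical, starting the sequence at 0 is an assumption, and the 1 to 50 range mirrors the tool schema in this README.

```python
from typing import Any, Dict, List


def extract_number(payload: Dict[str, Any], default: int = 10) -> int:
    """Pull the term count out of either a nested or a flat payload."""
    nested = payload.get("properties")
    source = nested if isinstance(nested, dict) else payload
    raw = source.get("number", default)
    try:
        return int(raw)  # also accepts numeric strings such as "10"
    except (TypeError, ValueError):
        raise ValueError(f"'number' must be an integer, got {raw!r}")


def fibonacci(count: int) -> List[int]:
    """Return the first `count` Fibonacci terms, starting from 0."""
    terms: List[int] = []
    a, b = 0, 1
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b
    return terms


def handle_request(payload: Dict[str, Any]) -> str:
    """Produce a sentence that a TTS engine can read aloud directly."""
    try:
        count = extract_number(payload)
    except ValueError as exc:
        return f"Sorry, I could not understand the request: {exc}."
    if not 1 <= count <= 50:
        return "Sorry, I can only generate between 1 and 50 Fibonacci terms."
    spoken = ", ".join(str(term) for term in fibonacci(count))
    return f"The first {count} Fibonacci terms are: {spoken}."
```

Returning a plain sentence for errors as well as results means the agent always has something it can speak, rather than a stack trace or raw JSON.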

Tool Schema

Each MCP tool needs a well-defined JSON schema that describes:

  • Input parameters
  • Data types
  • Default values
  • Validation rules

Our Fibonacci tool uses the following schema:

{
  "type": "object",
  "properties": {
    "type": {
      "type": "string",
      "description": "Type of operation"
    },
    "properties": {
      "type": "object",
      "properties": {
        "number": {
          "type": "integer",
          "description": "Number of Fibonacci terms to generate",
          "default": 10,
          "minimum": 1,
          "maximum": 50
        }
      }
    }
  }
}
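
To show how the default, minimum, and maximum in this schema might be applied to an incoming call, here is a small hand-written check in Python. It is a sketch under assumptions: resolve_number is a hypothetical helper, and a full JSON Schema validator would be the more thorough choice in practice.

```python
from typing import Any, Dict

# The schema from this README, expressed as a Python dict.
FIBONACCI_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "type": {"type": "string", "description": "Type of operation"},
        "properties": {
            "type": "object",
            "properties": {
                "number": {
                    "type": "integer",
                    "description": "Number of Fibonacci terms to generate",
                    "default": 10,
                    "minimum": 1,
                    "maximum": 50,
                }
            },
        },
    },
}


def resolve_number(arguments: Dict[str, Any], schema: Dict[str, Any] = FIBONACCI_SCHEMA) -> int:
    """Apply the schema's default and range checks to the nested `number` field."""
    spec = schema["properties"]["properties"]["properties"]["number"]
    inner = arguments.get("properties")
    if not isinstance(inner, dict):
        inner = {}
    value = inner.get("number", spec["default"])
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"number must be an integer, got {type(value).__name__}")
    if not spec["minimum"] <= value <= spec["maximum"]:
        raise ValueError(
            f"number must be between {spec['minimum']} and {spec['maximum']}, got {value}"
        )
    return value
```

Reading the limits from the schema dict rather than repeating them in code keeps the check in step with the schema if the bounds change.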

TTS Provider Support

This agent supports multiple TTS providers to give you flexibility in voice output:

Cartesia TTS

  • High quality, natural sounding voices
  • Good for general purpose applications
  • Wide language support

Sarvam TTS

  • Optimized for Indian languages
  • Better pronunciation for Hindi and other regional languages
  • Specifically configured for hi-IN (Hindi India) in the example

Switching Between Providers

In agent.py, you can easily switch between TTS providers:

# Sarvam TTS (good for Indian languages)
tts = sarvam.TTS(
    target_language_code="hi-IN",
    speaker="anushka",
)

# Cartesia TTS (comment out the above and uncomment this to use Cartesia instead)
# tts = cartesia.TTS(model="sonic-2", voice="f786b574-daa5-4673-aa0c-cbe3e8534c02")

Simply comment out one and uncomment the other to switch providers.
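
If you would rather not edit agent.py each time, one possible alternative is to select the provider from an environment variable. The sketch below is not part of this repository: TTS_PROVIDER is a hypothetical variable name, and choose_tts and build_tts are hypothetical helpers. The constructor arguments are copied from the snippet above, and the plugin imports assume the corresponding LiveKit plugins are installed.

```python
import os
from typing import Any, Dict, Mapping, Tuple

# Constructor arguments copied from the agent.py snippet in this README.
TTS_SETTINGS: Dict[str, Dict[str, Any]] = {
    "sarvam": {"target_language_code": "hi-IN", "speaker": "anushka"},
    "cartesia": {"model": "sonic-2", "voice": "f786b574-daa5-4673-aa0c-cbe3e8534c02"},
}


def choose_tts(env: Mapping[str, str]) -> Tuple[str, Dict[str, Any]]:
    """Pick a provider from TTS_PROVIDER (hypothetical variable), defaulting to Sarvam."""
    provider = env.get("TTS_PROVIDER", "sarvam").strip().lower()
    if provider not in TTS_SETTINGS:
        known = ", ".join(sorted(TTS_SETTINGS))
        raise ValueError(f"Unknown TTS provider {provider!r}; expected one of: {known}")
    return provider, dict(TTS_SETTINGS[provider])


def build_tts(env: Mapping[str, str] = os.environ):
    """Instantiate the chosen provider; plugins are imported lazily so only one is needed."""
    provider, kwargs = choose_tts(env)
    if provider == "sarvam":
        from livekit.plugins import sarvam

        return sarvam.TTS(**kwargs)
    from livekit.plugins import cartesia

    return cartesia.TTS(**kwargs)
```

With this in place, tts = build_tts() would replace the hand-edited assignment, and setting TTS_PROVIDER=cartesia would switch providers without a code change.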

Running the Agent

python agent.py

The agent will connect to your LiveKit room and start listening for voice input.

Environment Variables

See .env.example for the environment variables the agent reads. The TTS keys are needed only for the provider you use:

  • LIVEKIT_URL - Your LiveKit server URL
  • LIVEKIT_API_KEY - LiveKit API key
  • LIVEKIT_API_SECRET - LiveKit API secret
  • GOOGLE_API_KEY - Google AI API key
  • DEEPGRAM_API_KEY - Deepgram API key
  • CARTESIA_API_KEY - Cartesia API key
  • SARVAM_API_KEY - Sarvam API key (required for Sarvam TTS)
  • N8N_MCP_SERVER_URL - URL to your n8n MCP endpoint
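
A quick startup check can catch a missing variable before the agent tries to connect. The following is a minimal sketch, not code from this repository: missing_env_vars is a hypothetical helper, and it assumes that only the key for the selected TTS provider is needed.

```python
from typing import List, Mapping

# Variables needed regardless of the TTS provider (taken from the list above).
ALWAYS_REQUIRED = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GOOGLE_API_KEY",
    "DEEPGRAM_API_KEY",
    "N8N_MCP_SERVER_URL",
]

# Provider-specific keys; only the selected provider's key is checked.
TTS_KEYS = {"cartesia": "CARTESIA_API_KEY", "sarvam": "SARVAM_API_KEY"}


def missing_env_vars(env: Mapping[str, str], tts_provider: str) -> List[str]:
    """Return the names of variables that are unset or blank."""
    names = ALWAYS_REQUIRED + [TTS_KEYS[tts_provider]]
    return [name for name in names if not env.get(name, "").strip()]
```

Calling missing_env_vars(os.environ, "sarvam") at the top of agent.py (after import os) and exiting with the returned names gives a clearer error than a failed API call later on.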

MCP Client Integration

The MCP client integration is implemented in the mcp_client/ directory and includes:

  • MCPServerSse - SSE-based MCP server connection
  • MCPToolsIntegration - Tools integration with LiveKit agents
  • Utility functions for handling MCP tool schemas and calls
  • Dynamic tool discovery and description (tools_description.py)

Customization

Adding New MCP Tools

  1. Create a new tool in your n8n workflow
  2. Expose it via the MCP endpoint
  3. Restart the agent so it fetches the updated tool list; discovery happens at startup
  4. No changes to the prompt files are needed: the new tool's description is added to the agent's instructions automatically

Modifying Agent Behavior

  • Update prompts.py or prompts_hindi.py to change agent instructions
  • Modify agent.py to adjust voice processing parameters
  • Add new local tools in tools.py

Troubleshooting

Common Issues

  1. MCP Tool Schema Mismatches

    • Ensure your n8n tool schema matches what the agent expects
    • Handle various input formats in your n8n tool implementation
  2. Rate Limiting

    • Google Gemini and other APIs have rate limits
    • Consider implementing retry logic or using different models
  3. Voice Quality Issues

    • Adjust VAD parameters in agent.py
    • Check network connectivity for real-time processing
  4. TTS Provider Issues

    • Ensure you have the correct API keys for your chosen provider
    • Check that the provider supports your target language
  5. Tool Discovery Issues

    • Verify that N8N_MCP_SERVER_URL is correctly set in your environment
    • Check that your MCP server is accessible and responding
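
For the rate-limiting case, retry logic could look roughly like the sketch below: exponential backoff with a little random jitter. It is a generic illustration, not code from this repository; call_with_retry and its parameters are hypothetical, and in practice retry_on should be narrowed to the rate-limit exception your API client actually raises.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff plus jitter on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt: base_delay, 2x, 4x, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter of up to half the delay avoids synchronized retries.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The sleep parameter exists so the function can be tested without real waiting. For a voice agent, keep max_attempts and base_delay small, since long pauses are noticeable in conversation.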

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • LiveKit for the real-time communication infrastructure
  • n8n for workflow automation and MCP tool creation
  • Google AI for the language model capabilities
  • Deepgram for speech recognition
  • Cartesia for text-to-speech services
  • Sarvam for Indian language TTS capabilities

About

vaak — A Python-based conversational platform built from scratch with LiveKit, multi-provider integrations, and custom Chatterbox TTS. It powers voice and text interactions for Reva, combining real-time communication with flexible speech and AI orchestration.