6 changes: 5 additions & 1 deletion .gitignore
@@ -1,5 +1,6 @@
__pycache__/
.ipynb_checkpoints/
*.pyc
basetts_outputs_use_bert/
basetts_outputs/
multilingual_ckpts
@@ -8,4 +9,7 @@ build/
*.egg-info/

*.zip
*.wav
*.wav

GITHUB
GITHUB.pub
151 changes: 106 additions & 45 deletions README.md
@@ -1,62 +1,123 @@
<div align="center">
<div>&nbsp;</div>
<img src="logo.png" width="300"/> <br>
<a href="https://trendshift.io/repositories/8133" target="_blank"><img src="https://trendshift.io/api/badge/repositories/8133" alt="myshell-ai%2FMeloTTS | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>
# MeloTTS-API

## Introduction
A simple, robust, and OpenAI-compatible FastAPI wrapper for the [MyShell-AI/MeloTTS](https://github.com/myshell-ai/MeloTTS) text-to-speech engine.

MeloTTS itself is a **high-quality multi-lingual** text-to-speech library by [MIT](https://www.mit.edu/) and [MyShell.ai](https://myshell.ai). Supported languages include:

| Language | Example |
| --- | --- |
| English (American) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-US/speed_1.0/sent_000.wav) |
| English (British) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-BR/speed_1.0/sent_000.wav) |
| English (Indian) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN_INDIA/speed_1.0/sent_000.wav) |
| English (Australian) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-AU/speed_1.0/sent_000.wav) |
| English (Default) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-Default/speed_1.0/sent_000.wav) |
| Spanish | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/es/ES/speed_1.0/sent_000.wav) |
| French | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/fr/FR/speed_1.0/sent_000.wav) |
| Chinese (mix EN) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/zh/ZH/speed_1.0/sent_008.wav) |
| Japanese | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/jp/JP/speed_1.0/sent_000.wav) |
| Korean | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/kr/KR/speed_1.0/sent_000.wav) |
This project provides an easy-to-use HTTP interface for MeloTTS, allowing you to integrate high-quality, natural-sounding text-to-speech into your applications with simple API calls. The API structure is designed to mimic common TTS services for easy integration.

## Features

- **High-Quality TTS:** Leverages the power of MeloTTS for fast and natural-sounding speech synthesis.
- **OpenAI-Compatible Endpoint:** Includes a `/v1/audio/speech` endpoint that mirrors the structure of OpenAI's TTS API for drop-in compatibility with tools like Open WebUI.
- **RESTful Interface:** Provides clear endpoints to list available models and voices.
- **Reliable Installation:** A comprehensive `requirements.txt` file ensures a clean and complete installation in an isolated environment.
- **CPU Ready:** Configured to run on CPU out of the box (fast enough for real-time inference); no GPU required.
- **Mixed-Language Support:** The Chinese speaker supports mixed Chinese and English.

## Upstream MeloTTS Usage

- [Use without Installation](docs/quick_use.md)
- [Install and Use Locally](docs/install.md)
- [Training on Custom Dataset](docs/training.md)

The Python API and model cards can be found in [this repo](https://github.com/myshell-ai/MeloTTS/blob/main/docs/install.md#python-api) or on [HuggingFace](https://huggingface.co/myshell-ai).
---

**Contributing**

If you find this work useful, please consider contributing to this repo. Many thanks to [@fakerybakery](https://github.com/fakerybakery) for adding the Web UI and CLI.

## Authors

- [Wenliang Zhao](https://wl-zhao.github.io) at Tsinghua University
- [Xumin Yu](https://yuxumin.github.io) at Tsinghua University
- [Zengyi Qin](https://www.qinzy.tech) (project lead) at MIT and MyShell

## 🚀 Installation

For a reliable setup, please follow these steps exactly. This guide has been tested on Debian-based systems such as Linux Mint 20 and Ubuntu 24.04.

### 📋 Step 1: System Prerequisites

On a barebones system, install Git and Python's core tools first:

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install git python3 python3-pip python3-venv -y
```
### 📦 Step 2: Clone This Repository

```bash
git clone https://github.com/highfillgoods/MeloTTS-API-Locally.git
cd MeloTTS-API-Locally
```

### 🌿 Step 3: Create and Activate a Virtual Environment

Using a dedicated environment helps prevent conflicts. Choose one of the following options.

**Option A: Using conda**

```bash
conda create --name melo_api_env python=3.11 -y
conda activate melo_api_env
```

**Option B: Using venv (standard Python)**

```bash
python3 -m venv venv
source venv/bin/activate
```

### 🐍 Step 4: Install Python Dependencies

This single command installs all necessary Python packages from the provided `requirements.txt` file:

```bash
pip install -r requirements.txt
```

### 🧠 Step 5: Download NLTK Language Models

The text processor requires data packages from the NLTK library. This command downloads the necessary models:

```bash
python3 -m nltk.downloader averaged_perceptron_tagger punkt
```
## ▶️ Running the API Server

Once the installation is complete, start the API server with Uvicorn:

```bash
uvicorn melotts_api:app --host 0.0.0.0 --port 8000
```
The server will start, load the MeloTTS model, and become available at `http://0.0.0.0:8000`.

## Citation

If you use MeloTTS, please cite:

```bibtex
@software{zhao2024melo,
  author = {Zhao, Wenliang and Yu, Xumin and Qin, Zengyi},
  title = {MeloTTS: High-quality Multi-lingual Multi-accent Text-to-Speech},
  url = {https://github.com/myshell-ai/MeloTTS},
  year = {2023}
}
```



### 🔌 Connecting to Open WebUI

This API is designed to work directly with Open WebUI.

1. In Open WebUI, navigate to **Settings > Audio**.
2. Configure the **TTS Settings** section with the following values:
   - **Text-to-Speech Engine:** OpenAI
   - **OpenAI API Base URL:** `http://localhost:8000/v1`
   - **OpenAI API Key:** can be set to anything (e.g., `12345`)

Your settings should look like this:

![Open WebUI Audio Settings](open-webui-settings.png)

### 🛠️ Direct API Usage (Advanced)

You can also interact with the API directly using tools like `curl`.

#### List Available Models

```bash
curl http://localhost:8000/v1/models
```

#### List Available Voices

```bash
curl http://localhost:8000/v1/audio/voices
```
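Client code can mirror the server's fallback behavior when choosing a voice. Below is a small stdlib-only sketch (the helper names are hypothetical; the voice list and the `EN-BR` fallback match what this API serves and implements):

```python
import json

# Voices served by GET /v1/audio/voices (must match model.hps.data.spk2id on the server)
KNOWN_VOICES = ["EN-US", "EN-BR", "EN_INDIA", "EN-AU", "EN-Default"]

def pick_voice(requested: str, fallback: str = "EN-BR") -> str:
    """Return the requested voice if available, else the server's fallback."""
    return requested if requested in KNOWN_VOICES else fallback

def parse_voices(response_text: str) -> list[str]:
    """Parse the JSON body returned by GET /v1/audio/voices."""
    return json.loads(response_text)["voices"]

# Example: the server responds with a VoicesResponse-shaped body
sample = '{"voices": ["EN-US", "EN-BR", "EN_INDIA", "EN-AU", "EN-Default"]}'
print(pick_voice("EN-AU"))      # → EN-AU
print(pick_voice("FR"))         # → EN-BR (falls back, like the server does)
print(parse_voices(sample)[0])  # → EN-US
```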

#### Synthesize Speech
```bash
curl -X POST \
http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
--data '{
"input": "Hello, this is a test from the land down under.",
"voice": "EN-AU"
}' \
--output test_audio.mp3
```
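For scripting without `curl`, the same request can be made from Python using only the standard library. This is a hedged sketch (the function names are illustrative, not part of this repo) that assumes the API server above is running on `localhost:8000`:

```python
import json
import urllib.request

def build_speech_request(text: str, voice: str = "EN-AU",
                         speed: float = 0.9,
                         base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request matching the /v1/audio/speech body schema."""
    body = json.dumps({"input": text, "voice": voice, "speed": speed}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def synthesize_to_file(text: str, path: str, voice: str = "EN-AU") -> None:
    """Send the request and save the returned MP3 (server must be running)."""
    req = build_speech_request(text, voice=voice)
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())

# Example (uncomment with the server running):
# synthesize_to_file("Hello, this is a test from the land down under.", "test_audio.mp3")
```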

## Bonus: Running the Original MeloTTS WebUI

These instructions are for running the original Gradio WebUI developed by MyShell-AI, which is separate from the FastAPI server above. It is best to do this in a different folder and a new, clean environment.

#### Clone the Original MeloTTS Repository

```bash
# Make sure you are outside your API project folder (e.g., in your home directory)
git clone https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
```

#### Create and Activate a Separate, Clean Environment

```bash
conda create --name melo_ui_env python=3.11 -y
conda activate melo_ui_env
```

#### Install All WebUI Dependencies

Note: we do not use `pip install melo` here.

```bash
pip install torch torchvision torchaudio gradio librosa tqdm transformers cn2an pypinyin jieba eng_to_ipa inflect unidecode num2words pykakasi fugashi g2p_en anyascii jamo gruut cached_path unidic
```

#### Download NLTK Data and Launch

```bash
python3 -m nltk.downloader averaged_perceptron_tagger punkt
python3 -m melo.app
```

Open `http://localhost:7860` in your browser to use the original MeloTTS WebUI.

## License

This library is under the MIT License, which means it is free for both commercial and non-commercial use.

## Acknowledgements

This implementation is based on [TTS](https://github.com/coqui-ai/TTS), [VITS](https://github.com/jaywalnut310/vits), [VITS2](https://github.com/daniilrobnikov/vits2) and [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2). We appreciate their awesome work.
121 changes: 121 additions & 0 deletions melotts_api.py
@@ -0,0 +1,121 @@
# melotts_api.py
#
# made with Google Gemini
from fastapi import FastAPI, Response, HTTPException
from pydantic import BaseModel
from melo.api import TTS
import tempfile

# Initialize FastAPI app
app = FastAPI(
    title="MeloTTS API",
    description="A simple API to expose MeloTTS for text-to-speech generation.",
    version="1.0.0",
)

# Load the MeloTTS model once at startup.
print("Loading MeloTTS model...")
# Use device='cuda' if you have an NVIDIA GPU, otherwise 'cpu'.
# model = TTS(language='EN', device='cuda')
model = TTS(language='EN', device='cpu')
print("MeloTTS model loaded.")

# Request body for speech synthesis
class SpeechRequest(BaseModel):
    input: str
    voice: str = "EN-BR"
    model: str = "melo-tts-english-us"  # Make the default explicit
    speed: float = 0.9

# Responses for models/voices (mimicking OpenAI's structure)
class ModelData(BaseModel):
    id: str
    object: str = "model"
    created: int = 1677641200
    owned_by: str = "community"

class ModelsResponse(BaseModel):
    data: list[ModelData]
    object: str = "list"

class VoicesResponse(BaseModel):
    voices: list[str]

@app.get("/v1/models", response_model=ModelsResponse)
async def get_models():
    """
    Returns a list of available TTS models.
    Mimics OpenAI's /v1/models endpoint for compatibility.
    """
    return ModelsResponse(data=[
        ModelData(id="EN-US", object="model"),
        ModelData(id="EN-BR", object="model"),
        ModelData(id="EN_INDIA", object="model"),
        ModelData(id="EN-AU", object="model"),
        ModelData(id="EN-Default", object="model"),
        # You might add other language models here if you load them
    ])

@app.get("/v1/audio/voices", response_model=VoicesResponse)
async def get_voices():
    """
    Returns a list of available voices.
    Mimics a common pattern for TTS voice endpoints.
    """
    # IMPORTANT: This list MUST match the keys in model.hps.data.spk2id
    return VoicesResponse(voices=['EN-US', 'EN-BR', 'EN_INDIA', 'EN-AU', 'EN-Default'])

@app.post("/v1/audio/speech")
async def create_speech(request: SpeechRequest):
    """
    Synthesizes speech from text using MeloTTS.
    Mimics OpenAI's /v1/audio/speech endpoint.
    """
    if not request.input:
        raise HTTPException(status_code=400, detail="Input text is required.")

    if not request.voice:
        raise HTTPException(status_code=400, detail="Voice is required.")

    try:
        # Determine the speaker ID from the requested voice.
        speaker_ids = model.hps.data.spk2id
        if request.voice in speaker_ids:
            speaker_id = speaker_ids[request.voice]
        else:
            # Log a warning if the voice is not found and fall back to a default.
            print(f"Warning: Voice '{request.voice}' not found, falling back to 'EN-BR'")
            speaker_id = speaker_ids.get("EN-BR", 0)  # Default to 0 if 'EN-BR' is also missing

        # Use tempfile.NamedTemporaryFile for robust temporary file creation.
        # delete=True removes the file automatically when the 'with' block exits.
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=True) as temp_file:
            temp_audio_file_path = temp_file.name

            # Generate audio and save it to the temporary file.
            model.tts_to_file(
                text=request.input,
                speaker_id=speaker_id,
                output_path=temp_audio_file_path,
                speed=request.speed,
            )

            # Seek to the beginning of the temporary file before reading its content.
            temp_file.seek(0)
            audio_bytes = temp_file.read()

        # Return the audio as MP3.
        return Response(content=audio_bytes, media_type="audio/mpeg")

    except Exception as e:
        # Print the full traceback for debugging.
        import traceback
        traceback.print_exc()
        raise HTTPException(status_code=500, detail=f"Speech synthesis failed: {str(e)}")

if __name__ == "__main__":
    import uvicorn
    # Run the FastAPI app
    uvicorn.run(app, host="0.0.0.0", port=8000)
Binary file added open-webui-settings.png