feat: kokoro tts support #643

Merged
merged 35 commits into from
Jul 16, 2025
Commits
019eb5b
feat: implementation of TTSModel interface for kokoro-tts model
MagdalenaKotynia Jun 25, 2025
3a9a440
refactor: moved initialization of Kokoro model to init of the KokoroTTS
MagdalenaKotynia Jun 25, 2025
77f2976
feat: add methods to get supported languages and voices of Kokoro model
MagdalenaKotynia Jun 25, 2025
9b4b949
feat: add automated model and voices download if it is not yet downlo…
MagdalenaKotynia Jun 25, 2025
60543d9
feat: added support for KokoroTTS in TTSAgent
MagdalenaKotynia Jun 25, 2025
cbed888
build: updated poetry.lock
MagdalenaKotynia Jun 26, 2025
d32b1c9
build: updated s2s pyproject toml, added zarr to pyproject toml to av…
MagdalenaKotynia Jun 26, 2025
9521348
docs: added Kokoro TTS description to README
MagdalenaKotynia Jun 26, 2025
d4bd13a
feat: added KokoroTTS for configurator
MagdalenaKotynia Jun 26, 2025
68b59f6
chore: add KokoroTTS to init and agent initialization as default model
MagdalenaKotynia Jun 26, 2025
e50c706
fix: added resampling for KokoroTTS model to properly use it with TTS…
MagdalenaKotynia Jun 26, 2025
91cc47b
chore: add KokoroTTS import to rai_s2s init
MagdalenaKotynia Jun 26, 2025
9478038
docs: update docs with vendors
MagdalenaKotynia Jun 26, 2025
cea506d
chore: set trim to False to minimize the number of output underflow i…
MagdalenaKotynia Jun 26, 2025
95747f6
feat: add KokoroTTS to example with TTSAgent
MagdalenaKotynia Jun 26, 2025
ad7ee39
test: added tts test for KokoroTTS in configurator
MagdalenaKotynia Jun 26, 2025
9d232eb
fix: apply text preprocessing to remove formatting characters that we…
MagdalenaKotynia Jun 26, 2025
b46bc99
fix: handled '#' signs produced by llm so they are not spelled out by TTS
MagdalenaKotynia Jun 27, 2025
248a8f7
fix: set trim to False to preserve pause between the sentences from t…
MagdalenaKotynia Jun 27, 2025
1623df1
feat: support for quantized models
MagdalenaKotynia Jun 27, 2025
8edcecf
chore: minor tidying up the code
MagdalenaKotynia Jun 27, 2025
5ff9098
docs: added info about models sizes
MagdalenaKotynia Jun 27, 2025
2104543
docs: added warning about using default device name in SoundDeviceConfig
MagdalenaKotynia Jun 27, 2025
a5091db
docs: added quotes
MagdalenaKotynia Jun 27, 2025
c562fda
chore: removed unnecessary multiline flag from text preprocessing
MagdalenaKotynia Jun 27, 2025
aacfc67
Update docs/speech_to_speech/sounddevice.md
MagdalenaKotynia Jul 8, 2025
61ab2b0
docs: added info about potential problems with default audio device
MagdalenaKotynia Jul 8, 2025
72f089a
chore: changed level of phonemizer logger from WARNING to ERROR
MagdalenaKotynia Jul 8, 2025
41e3f23
chore: filtered out the words count mismatch phonemizer warning
MagdalenaKotynia Jul 8, 2025
24410f3
docs: removed deprecated warning
MagdalenaKotynia Jul 8, 2025
984de60
chore: removed not needed filtering of __ and _
MagdalenaKotynia Jul 8, 2025
bfd43cd
build: removed line excluding zarr yanked version
MagdalenaKotynia Jul 14, 2025
6ba0527
feat: added gpu support for kokoro-tts
MagdalenaKotynia Jul 15, 2025
3d587c7
build: generated poetry lock after adding onnxruntime-gpu for s2s
MagdalenaKotynia Jul 15, 2025
1143a79
chore: added error logging when any audio sample value exceeds range …
MagdalenaKotynia Jul 15, 2025
12 changes: 6 additions & 6 deletions docs/setup/vendors.md
@@ -9,12 +9,12 @@ Alternatively vendors can be configured manually in `config.toml` file.

The table summarizes vendor alternative for core AI service and optional RAI modules:

-| Module | Open source | Alternative | Why to consider alternative? | More information |
-| ----------------------------------------------- | ----------- | ----------------------- | ------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------- |
-| [LLM service](#llm-model-configuration-in-rai) | Ollama | OpenAI, Bedrock | Overall performance of the LLM models, supported modalities and features | [LangChain models](https://docs.langchain4j.dev/integrations/language-models/) |
-| **Optional:** [Tracing tool](./tracing.md) | Langfuse | LangSmith | Better integration with LangChain | [Comparison](https://langfuse.com/faq/all/langsmith-alternative) |
-| **Optional:** [Text to speech](#text-to-speech) | OpenTTS | ElevenLabs | Arguably, significantly better voice synthesis | <li> [OpenTTS GitHub](https://github.com/synesthesiam/opentts) </li><li> [RAI voice interface][s2s] </li> |
-| **Optional:** [Speech to text](#speech-to-text) | Whisper | OpenAI Whisper (hosted) | When suitable local GPU is not an option | <li> [Whisper GitHub](https://github.com/openai/whisper) </li><li> [RAI voice interface][s2s] </li> |
+| Module | Open source | Alternative | Why to consider alternative? | More information |
+| ----------------------------------------------- | ------------------ | ----------------------- | ------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| [LLM service](#llm-model-configuration-in-rai) | Ollama | OpenAI, Bedrock | Overall performance of the LLM models, supported modalities and features | [LangChain models](https://docs.langchain4j.dev/integrations/language-models/) |
+| **Optional:** [Tracing tool](./tracing.md) | Langfuse | LangSmith | Better integration with LangChain | [Comparison](https://langfuse.com/faq/all/langsmith-alternative) |
+| **Optional:** [Text to speech](#text-to-speech) | KokoroTTS, OpenTTS | ElevenLabs | Arguably, significantly better voice synthesis | <li> [KokoroTTS](https://huggingface.co/hexgrad/Kokoro-82M#usage) </li><li> [OpenTTS GitHub](https://github.com/synesthesiam/opentts) </li><li> [RAI voice interface][s2s] </li> |
+| **Optional:** [Speech to text](#speech-to-text) | Whisper | OpenAI Whisper (hosted) | When suitable local GPU is not an option | <li> [Whisper GitHub](https://github.com/openai/whisper) </li><li> [RAI voice interface][s2s] </li> |

> [!TIP] Best-performing AI models
>
3 changes: 3 additions & 0 deletions docs/speech_to_speech/sounddevice.md
@@ -31,6 +31,9 @@ connector = SoundDeviceConnector(
)
```

+> [!TIP]
+> If you're experiencing audio issues and device_name is set to 'default', try specifying the exact device name instead, as this often resolves the problem.
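
To act on the tip above, it helps to know which exact device names are available. The following is a minimal sketch, not part of this PR: it assumes the `sounddevice` package is installed and relies on its `query_devices()` call, which returns device entries with `name` and `max_output_channels` keys. The helper names `output_device_names` and `system_output_device_names` are hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def output_device_names(devices: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the names of devices that expose at least one output channel."""
    return [d["name"] for d in devices if d.get("max_output_channels", 0) > 0]


def system_output_device_names() -> List[str]:
    """List playback devices on this machine.

    Assumption: the `sounddevice` package (and PortAudio) is installed.
    The import is local so the filtering helper works without it.
    """
    import sounddevice as sd

    return output_device_names(sd.query_devices())
```

Calling `system_output_device_names()` and copying one of the returned strings into `device_name` is one way to replace `'default'` with an exact name.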

## Message Type: `SoundDeviceMessage`

```python
17 changes: 15 additions & 2 deletions examples/s2s/tts.py
@@ -18,7 +18,7 @@

 import rclpy

-from rai_s2s import OpenTTS, TextToSpeechAgent
+from rai_s2s import KokoroTTS, OpenTTS, TextToSpeechAgent
from rai_s2s.sound_device import SoundDeviceConfig


@@ -35,6 +35,14 @@ def parse_arguments():
         help="Speaker device name (default: 'default')",
     )

+    parser.add_argument(
+        "--tts-model",
+        type=str,
+        choices=["opentts", "kokoro"],
+        default="kokoro",
+        help="TTS model to use: 'opentts' or 'kokoro' (default: 'kokoro')",
+    )
+
     # Use parse_known_args to ignore unknown arguments
     args, unknown = parser.parse_known_args()

@@ -55,7 +63,12 @@ def parse_arguments():
         # device_name="Jabra Speak2 40 MS: USB Audio (hw:2,0)",
         device_name=args.device_name,
     )
-    tts = OpenTTS()
+
+    tts = KokoroTTS()
+    print("Using KokoroTTS model")
+    if args.tts_model == "opentts":
+        tts = OpenTTS()
+        print("Using OpenTTS model")

     agent = TextToSpeechAgent(config, "text_to_speech", tts)
     agent.run()
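
In the hunk above, `KokoroTTS()` is constructed before the `--tts-model` value is checked, so it is built even when `opentts` is selected. If construction is expensive (the commit list suggests model initialization and download happen in the constructor), a lookup table that builds only the requested model may be preferable. The sketch below is one possible alternative, not the code merged in this PR; the two classes are empty stand-ins for `rai_s2s.KokoroTTS` and `rai_s2s.OpenTTS`, and `TTS_FACTORIES` and `build_tts` are hypothetical names.

```python
from typing import Callable, Dict


class KokoroTTS:
    """Hypothetical stand-in for rai_s2s.KokoroTTS (real class not imported here)."""


class OpenTTS:
    """Hypothetical stand-in for rai_s2s.OpenTTS (real class not imported here)."""


# Map each --tts-model choice to a constructor. Nothing is instantiated
# until the chosen factory is called.
TTS_FACTORIES: Dict[str, Callable[[], object]] = {
    "kokoro": KokoroTTS,
    "opentts": OpenTTS,
}


def build_tts(name: str) -> object:
    """Construct only the requested TTS model, or raise on an unknown name."""
    try:
        factory = TTS_FACTORIES[name]
    except KeyError:
        raise ValueError(f"Unknown TTS model: {name!r}") from None
    return factory()
```

With this shape, the script body would reduce to `tts = build_tts(args.tts_model)`, and the `choices` list passed to `argparse` could be derived from `TTS_FACTORIES.keys()` to keep the two in sync.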