Commit a4af54c

feat: kokoro tts support (#643)
Co-authored-by: Maciej Majek <[email protected]>
1 parent d54fffc commit a4af54c

14 files changed: 849 additions, 218 deletions

docs/setup/vendors.md

Lines changed: 6 additions & 6 deletions
@@ -9,12 +9,12 @@ Alternatively vendors can be configured manually in `config.toml` file.

 The table summarizes vendor alternative for core AI service and optional RAI modules:

-| Module | Open source | Alternative | Why to consider alternative? | More information |
-| ----------------------------------------------- | ----------- | ----------------------- | ------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------- |
-| [LLM service](#llm-model-configuration-in-rai) | Ollama | OpenAI, Bedrock | Overall performance of the LLM models, supported modalities and features | [LangChain models](https://docs.langchain4j.dev/integrations/language-models/) |
-| **Optional:** [Tracing tool](./tracing.md) | Langfuse | LangSmith | Better integration with LangChain | [Comparison](https://langfuse.com/faq/all/langsmith-alternative) |
-| **Optional:** [Text to speech](#text-to-speech) | OpenTTS | ElevenLabs | Arguably, significantly better voice synthesis | <li> [OpenTTS GitHub](https://github.com/synesthesiam/opentts) </li><li> [RAI voice interface][s2s] </li> |
-| **Optional:** [Speech to text](#speech-to-text) | Whisper | OpenAI Whisper (hosted) | When suitable local GPU is not an option | <li> [Whisper GitHub](https://github.com/openai/whisper) </li><li> [RAI voice interface][s2s] </li> |
+| Module | Open source | Alternative | Why to consider alternative? | More information |
+| ----------------------------------------------- | ------------------ | ----------------------- | ------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| [LLM service](#llm-model-configuration-in-rai) | Ollama | OpenAI, Bedrock | Overall performance of the LLM models, supported modalities and features | [LangChain models](https://docs.langchain4j.dev/integrations/language-models/) |
+| **Optional:** [Tracing tool](./tracing.md) | Langfuse | LangSmith | Better integration with LangChain | [Comparison](https://langfuse.com/faq/all/langsmith-alternative) |
+| **Optional:** [Text to speech](#text-to-speech) | KokoroTTS, OpenTTS | ElevenLabs | Arguably, significantly better voice synthesis | <li> [KokoroTTS](https://huggingface.co/hexgrad/Kokoro-82M#usage) </li><li> [OpenTTS GitHub](https://github.com/synesthesiam/opentts) </li><li> [RAI voice interface][s2s] </li> |
+| **Optional:** [Speech to text](#speech-to-text) | Whisper | OpenAI Whisper (hosted) | When suitable local GPU is not an option | <li> [Whisper GitHub](https://github.com/openai/whisper) </li><li> [RAI voice interface][s2s] </li> |

 > [!TIP] Best-performing AI models
 >

docs/speech_to_speech/sounddevice.md

Lines changed: 3 additions & 0 deletions
@@ -31,6 +31,9 @@ connector = SoundDeviceConnector(
 )
 ```

+> [!TIP]
+> If you're experiencing audio issues and device_name is set to 'default', try specifying the exact device name instead, as this often resolves the problem.
+
 ## Message Type: `SoundDeviceMessage`

 ```python

examples/s2s/tts.py

Lines changed: 15 additions & 2 deletions
@@ -18,7 +18,7 @@

 import rclpy

-from rai_s2s import OpenTTS, TextToSpeechAgent
+from rai_s2s import KokoroTTS, OpenTTS, TextToSpeechAgent
 from rai_s2s.sound_device import SoundDeviceConfig


@@ -35,6 +35,14 @@ def parse_arguments():
         help="Speaker device name (default: 'default')",
     )

+    parser.add_argument(
+        "--tts-model",
+        type=str,
+        choices=["opentts", "kokoro"],
+        default="kokoro",
+        help="TTS model to use: 'opentts' or 'kokoro' (default: 'kokoro')",
+    )
+
     # Use parse_known_args to ignore unknown arguments
     args, unknown = parser.parse_known_args()

@@ -55,7 +63,12 @@ def parse_arguments():
         # device_name="Jabra Speak2 40 MS: USB Audio (hw:2,0)",
         device_name=args.device_name,
     )
-    tts = OpenTTS()
+
+    tts = KokoroTTS()
+    print("Using KokoroTTS model")
+    if args.tts_model == "opentts":
+        tts = OpenTTS()
+        print("Using OpenTTS model")

     agent = TextToSpeechAgent(config, "text_to_speech", tts)
     agent.run()
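
Note on the `--tts-model` selection in `examples/s2s/tts.py`: as written, the change constructs `KokoroTTS()` unconditionally and then replaces it when `opentts` is requested, so both backends are instantiated in that case. A minimal sketch of one possible alternative follows, assuming a lookup table keyed by the CLI choice. The `KokoroTTS` and `OpenTTS` classes below are hypothetical stand-in stubs (the real ones live in `rai_s2s` and are not imported here), and `TTS_BACKENDS` and `build_tts` are illustrative names that do not appear in the commit.

```python
import argparse


class KokoroTTS:
    """Stand-in stub for rai_s2s.KokoroTTS (assumption, not the real class)."""

    name = "KokoroTTS"


class OpenTTS:
    """Stand-in stub for rai_s2s.OpenTTS (assumption, not the real class)."""

    name = "OpenTTS"


# Hypothetical lookup table: each --tts-model choice maps to its backend class.
TTS_BACKENDS = {"kokoro": KokoroTTS, "opentts": OpenTTS}


def parse_arguments(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--tts-model",
        type=str,
        choices=sorted(TTS_BACKENDS),
        default="kokoro",
        help="TTS model to use: 'opentts' or 'kokoro' (default: 'kokoro')",
    )
    # Use parse_known_args to ignore unknown arguments, as the example script does.
    args, _unknown = parser.parse_known_args(argv)
    return args


def build_tts(model_name):
    """Instantiate only the backend that was selected."""
    tts = TTS_BACKENDS[model_name]()
    print(f"Using {tts.name} model")
    return tts


if __name__ == "__main__":
    build_tts(parse_arguments().tts_model)
```

With this shape, adding a third backend would only require a new entry in the table, since the `choices` list is derived from its keys.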
