feat: kokoro tts support #643


Merged 35 commits into main on Jul 16, 2025

Conversation

@MagdalenaKotynia MagdalenaKotynia commented Jun 25, 2025

Purpose

  • To support the usage of the Kokoro-TTS model. Kokoro-TTS was selected for its high-quality speech output, small size, and potential to run on edge devices (it is distributed in ONNX format).

Proposed Changes

  • Developed a class implementing the TTSModel interface for the Kokoro-TTS model.
  • Updated the docs with the newly supported model.
  • Updated the TTSAgent example to be able to use the newly supported model.

Testing

With TTSAgent

  • Run the TTSAgent example: python examples/s2s/tts.py
  • In another terminal, run the following script to send a ROS2HRIMessage to the ROS 2 topic:
from rai.communication.ros2.connectors import ROS2HRIConnector
from rai.communication.ros2.messages import ROS2HRIMessage
import rclpy
import time

rclpy.init()
my_hri_msg = ROS2HRIMessage(
    text="Hello, human! This is a test message. How are you?",
    message_author="ai",
)

hri_connector = ROS2HRIConnector()

hri_connector.send_message(
    message=my_hri_msg,
    target="/to_human"
)

try:
    print("Sending message... Press Ctrl+C to exit")
    time.sleep(10)
    
except KeyboardInterrupt:
    print("Shutting down...")
finally:
    hri_connector.shutdown()
    rclpy.shutdown()

After a while, you should hear speech output from TTSAgent.

With ROS2S2SAgent

Run the following script and converse with agent:

from rai_s2s.sound_device import SoundDeviceConfig
from rai.communication.ros2 import ROS2Context
from rai_s2s.s2s.agents.s2s_agent import SpeechToSpeechAgent
from rai_s2s.s2s.agents.ros2s2s_agent import ROS2S2SAgent
from rai.agents.langchain.react_agent import ReActAgent
from rai_s2s.asr.models import OpenAIWhisper, SileroVAD
from rai_s2s import KokoroTTS

from rai.agents import AgentRunner


@ROS2Context()
def main():
    speaker_config = SoundDeviceConfig(
        stream=True,
        is_output=True,
        # device_name="EPOS PC 8 USB: Audio (hw:1,0)",
        # device_name="Sennheiser USB headset: Audio (hw:1,0)",
        # device_name="Jabra Speak2 40 MS: USB Audio (hw:2,0)",
        device_name="default",
    )

    microphone_config = SoundDeviceConfig(
        stream=True,
        channels=1,
        device_name="default",
        consumer_sampling_rate=16000,
        dtype="int16",
        is_input=True,
    )

    # whisper = LocalWhisper("tiny", 16000)
    whisper = OpenAIWhisper("gpt-4o-mini-transcribe", 16000)
    vad = SileroVAD(16000, 0.5)
    
    tts = KokoroTTS()

    agent = ROS2S2SAgent(
        from_human_topic="/from_human",
        to_human_topic="/to_human",
        microphone_config=microphone_config,
        speaker_config=speaker_config,
        transcription_model=whisper,
        vad=vad,
        tts=tts,
    )
    from rai.communication.ros2 import ROS2HRIConnector

    hri_connector = ROS2HRIConnector()
    llm = ReActAgent(
        target_connectors={"/to_human": hri_connector},
    )
    llm.subscribe_source("/from_human", hri_connector)
    runner = AgentRunner([agent, llm])
    runner.run_and_wait_for_shutdown()


if __name__ == "__main__":
    main()

The KokoroTTS model works well together with the ROS2S2SAgent.
My UX: it sounds nicer than OpenTTS, and I didn't observe any significant difference in inference time between the models.
The model sometimes does not put a space between sentences. EDIT: this was fixed by setting trim to false in Kokoro's create method.
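Why trim=False restores the pauses can be shown with a toy illustration (synthetic clips only, not Kokoro's actual DSP; the 24 kHz rate is an assumption for the sketch): trimming each sentence clip removes its trailing silence, so concatenated sentences run together.

```python
import numpy as np

SR = 24000  # assumed output sample rate, for illustration only


def fake_sentence(speech_s: float, pad_s: float) -> np.ndarray:
    """Stand-in for one synthesized sentence: 'speech' plus trailing silence."""
    speech = np.ones(int(speech_s * SR), dtype=np.float32)
    silence = np.zeros(int(pad_s * SR), dtype=np.float32)
    return np.concatenate([speech, silence])


def trim_silence(clip: np.ndarray) -> np.ndarray:
    """Drop trailing zeros, mimicking what trimming does to a clip."""
    nonzero = np.flatnonzero(clip)
    return clip[: nonzero[-1] + 1] if nonzero.size else clip


sentences = [fake_sentence(1.0, 0.3), fake_sentence(1.0, 0.3)]

trimmed = np.concatenate([trim_silence(s) for s in sentences])  # sentences run together
untrimmed = np.concatenate(sentences)  # natural 0.3 s pause kept between sentences
```

With trimming, both 0.3 s pauses disappear from the concatenated audio, which matches the "no space between sentences" symptom.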

@MagdalenaKotynia MagdalenaKotynia marked this pull request as ready for review June 26, 2025 13:35
@MagdalenaKotynia MagdalenaKotynia requested review from boczekbartek and removed request for boczekbartek June 26, 2025 17:45
pyproject.toml Outdated
Comment on lines 24 to 26
# To avoid yanked version 3.0.6
zarr = "!=3.0.6"

Member:

Does zarr 3.0.6 break rai?

Member Author:

I didn't test it. Zarr 3.0.6 was selected by poetry when resolving dependencies, and poetry warned that 3.0.6 is a yanked version.

Member:

This may introduce further incompatibilities with packages relying on the yanked version. Please remove this line; we will bump the packages later.

Member Author:

done bfd43cd

)

if samples.dtype == np.float32:
    samples = (samples * 32768).clip(-32768, 32767).astype(np.int16)
Member:

Are we expecting values outside of the provided range?
Clipping audio should only be used as a last resort, as it introduces massive quality degradation.

Member Author:

The clipping ensures that values stay within the [-32768, 32767] range to prevent overflow in case of e.g. numerical errors.

Member:

Please log an error if values of samples exceed the -1 to 1 range.

Member Author:

done 1143a79
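One way to satisfy the reviewer's request is to log an error when input leaves the expected [-1.0, 1.0] range before clipping. This is a minimal sketch, not the exact code merged in the referenced commit:

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def float32_to_int16(samples: np.ndarray) -> np.ndarray:
    """Convert float32 audio in [-1.0, 1.0] to int16 PCM.

    Logs an error when the input exceeds the expected range, so overflow
    is surfaced rather than silently clipped away, then clips as a
    safety net against integer wraparound.
    """
    if samples.size and (samples.min() < -1.0 or samples.max() > 1.0):
        logger.error(
            "Audio samples outside [-1.0, 1.0] (min=%.3f, max=%.3f); "
            "output will be clipped.",
            samples.min(),
            samples.max(),
        )
    return (samples * 32768).clip(-32768, 32767).astype(np.int16)
```

In-range input converts without loss of headroom; out-of-range input still produces valid int16 audio, but the error log makes the upstream bug visible.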

@@ -28,10 +28,12 @@ elevenlabs = { version = "^1.4.1", optional = true }
openai-whisper = { version = "^20231117", optional = true }
faster-whisper = { version = "^1.1.1", optional = true }
openwakeword = { git = "https://github.com/maciejmajek/openWakeWord.git", branch = "chore/remove-tflite-backend", optional = true }
kokoro-onnx = { version = "0.3.3", optional = true }
Member:

This does not install GPU support - the model will run only on CPU.
Please take a look here
and here

Member Author:

Thanks for noticing it. I added the required libraries and instructions on how to run on GPU. 6ba0527
There seems to be a bug in the kokoro-onnx source code here - both onnxruntime and onnxruntime-gpu are imported via import onnxruntime, so automatic detection of available providers will not work. That is why I added instructions to export the ONNX_PROVIDER variable.
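The provider override described above can also be set from Python instead of the shell, as long as it happens before the model is constructed. A sketch (the ONNX_PROVIDER variable name comes from the comment above; "CUDAExecutionProvider" is the standard onnxruntime provider name for CUDA):

```python
import os

# kokoro-onnx cannot auto-detect the GPU build of onnxruntime (both the CPU
# and GPU packages are imported as `import onnxruntime`), so the execution
# provider is selected via this environment variable. It must be set before
# the model is constructed.
os.environ["ONNX_PROVIDER"] = "CUDAExecutionProvider"

# from rai_s2s import KokoroTTS   # import path used in this PR's example
# tts = KokoroTTS()               # would now use the GPU provider,
#                                 # assuming onnxruntime-gpu is installed
```

The equivalent shell form is exporting ONNX_PROVIDER before launching the script.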

@maciejmajek maciejmajek merged commit a4af54c into main Jul 16, 2025
6 checks passed
@maciejmajek maciejmajek deleted the feat/kokoro-tts-support branch July 16, 2025 12:05