feat: add tts to rai core #419

rachwalk · 2025-02-12T10:59:54Z

Purpose

The new design is based on agents and connectors, so rai tts needed to be refactored to reflect these changes.

Proposed Changes

Adds:

TextToSpeech models
TextToSpeech agent
Agent runners

Issues

#399
#309

Testing

CI

Manual tests performed using the following sound devices:

"default" (ALSA - system default)
"Jabra Speak2 40 MS: USB Audio" (conference speaker/microphone)
"Sennheiser USB headset: Audio" (headset)

To run and test this PR, first start the OpenTTS container:

docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak

Then, in 3 separate terminal windows, run:
python examples/s2s/asr.py
python examples/s2s/tts.py
python examples/s2s/conversational.py

To specify a particular device, pass its name, for example:
python ./examples/s2s/asr.py --device-name "Jabra Speak2 40 MS: USB Audio (hw:2,0)"
python examples/s2s/tts.py --device-name "Jabra Speak2 40 MS: USB Audio (hw:2,0)"
python examples/s2s/conversational.py
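
The value passed to `--device-name` can be looked up with the `sounddevice` module. The following is a minimal sketch, not part of this PR: the helpers `device_names` and `list_system_devices` are hypothetical names introduced for illustration. It assumes only that `sd.query_devices()` returns a list of dict-like entries with `name`, `max_input_channels` and `max_output_channels` fields, as described in the `sounddevice` documentation.

```python
from typing import Iterable, List, Mapping


def device_names(devices: Iterable[Mapping], kind: str = "input") -> List[str]:
    """Return the full names of devices that have at least one channel of `kind`.

    `devices` is expected to look like the result of sounddevice.query_devices():
    an iterable of dicts with "name", "max_input_channels", "max_output_channels".
    """
    if kind not in ("input", "output"):
        raise ValueError("kind must be 'input' or 'output'")
    key = f"max_{kind}_channels"
    return [d["name"] for d in devices if d.get(key, 0) > 0]


def list_system_devices(kind: str = "input") -> List[str]:
    """Query the real sound devices (hypothetical helper, requires sounddevice)."""
    import sounddevice as sd  # imported lazily so device_names works without audio hardware

    return device_names(sd.query_devices(), kind=kind)


# Usage on a machine with audio hardware:
#   for name in list_system_devices("input"):
#       print(name)
```

Each printed string is the complete `name` field of a device entry, which is the form used in the `--device-name` examples above.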

Also note: the default values of the agent configurations should work, but this is not guaranteed. The sound profile of a particular device can have a significant effect on the accuracy of voice activity and wake word detection, so for best results it is recommended to experiment with the values until the desired accuracy is achieved. A very common option to change is VAD_DETECTION_THRESHOLD in examples/s2s/asr.py, as VAD performance can differ a lot between devices.
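
To illustrate why the threshold matters, here is a toy sketch. It is an assumption, not the code in this PR: it supposes the VAD model yields one speech probability per audio chunk and that a chunk counts as voiced when the probability reaches the threshold. The function names and the probability values are made up for illustration and do not come from a real recording.

```python
from typing import Dict, List, Sequence


def voiced_chunks(probabilities: Sequence[float], threshold: float) -> List[bool]:
    """Mark each chunk as voiced when its speech probability reaches the threshold."""
    return [p >= threshold for p in probabilities]


def sweep_thresholds(
    probabilities: Sequence[float], thresholds: Sequence[float]
) -> Dict[float, int]:
    """Count how many chunks would be treated as speech for each candidate threshold."""
    return {t: sum(voiced_chunks(probabilities, t)) for t in thresholds}


# Made-up per-chunk probabilities, standing in for the output of a VAD model.
probs = [0.05, 0.20, 0.45, 0.70, 0.90, 0.55, 0.10]
counts = sweep_thresholds(probs, [0.3, 0.5, 0.8])
print(counts)  # {0.3: 4, 0.5: 3, 0.8: 1}
```

A lower threshold lets more chunks through (more false triggers on a noisy device), while a higher one drops quiet speech, which is why the best value depends on the device in use.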

src/rai_core/rai/runners/base.py

boczekbartek · 2025-02-12T12:42:15Z

src/rai_core/rai/runners/s2s.py

+            consumer_sampling_rate=vad.sampling_rate,
+            is_input=True,
+        )
+        asr_agent = VoiceRecognitionAgent(


I would make it more explicit so that the user can tell that ros2 is enabled. Please use keyword arguments, e.g. ros2_name="automatic_speech_recognition".

We need to keep in mind that the default run method of the s2s should not use ROS2 to communicate.
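
The suggestion above can be sketched as follows. `ExampleAgent` is a hypothetical stand-in, not the real `VoiceRecognitionAgent`, whose signature is not shown in this thread. The sketch assumes a keyword-only `ros2_name` parameter that defaults to `None`, so that ROS2 is used only when the caller asks for it by name.

```python
from typing import Optional


class ExampleAgent:
    """Hypothetical stand-in for an agent that can optionally communicate over ROS2."""

    def __init__(self, *, ros2_name: Optional[str] = None) -> None:
        # The bare `*` makes ros2_name keyword-only: a positional call raises TypeError,
        # so every call site that enables ROS2 has to spell out `ros2_name=...`.
        self.ros2_name = ros2_name

    @property
    def uses_ros2(self) -> bool:
        """ROS2 is considered enabled only when a node name was given explicitly."""
        return self.ros2_name is not None


local_agent = ExampleAgent()  # default: no ROS2 involved
ros2_agent = ExampleAgent(ros2_name="automatic_speech_recognition")
print(local_agent.uses_ros2, ros2_agent.uses_ros2)  # False True
```

With this shape, a reader of the call site can see that ROS2 is enabled without opening the class definition, and the default construction stays ROS2-free.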

maciejmajek · 2025-02-26T20:07:34Z

Unfortunately it's still lagging on my setup. The tts is lagging (mic on/mic off). On the first tts launch a loud buzzing could be heard.
As we have found out, the oww is problematic in this implementation, so I've turned it off:
asr.py line 103: # agent.add_detection_model(vad, pipeline="record")
voice_agent.py line 195: should_record = voice_detected  # self._should_record(indata, output_parameters)

Even though the TTS lags, ASR and response stopping work very well with the oww commented out. Well done.

maciejmajek

Well done,
lots of random comments, please clean up the code.
Did you test FasterWhisper and OpenAIWhisper?
Please add ElevenLabsTTS, you can use my commit 33e2a45

maciejmajek · 2025-02-28T10:47:13Z

docs/human_robot_interface/voice_interface.md

-```bash
-python -c 'import sounddevice as sd; print(sd.query_devices())'
+The Agent utilises sounddevice module to access user's microphone, by default the `"default"` sound device is used.
+To get information about available sounddeives use:


maciejmajek · 2025-02-28T10:47:32Z

docs/human_robot_interface/voice_interface.md

 ```

-keep_speaker_busy: some speakers may go into low power mode, which may result in truncated speech beginnings. Set to true to play low frequency, low volume noise to prevent sleep mode.
+The device can be identifed by name and passed to the configuration.


Which part of the name?

What do you mean? All of it

I think only the value to the r'\sALSA' should be passed

Please refer to the sd.query_devices() documentation. It is generally expected that a reasonable user will not run code without understanding its effects on their machine. When referring to "name" in the documentation, I refer to whatever is returned under the "name" field of objects returned by the function called. It would be highly counterintuitive to refer to anything else. Especially to refer using a regex, which are famously counterintuitive - seeing that yours includes the trailing comma.

Just provide an example.

Please refer to line 17 of this file, which provides an example.

docs/human_robot_interface/voice_interface.md

examples/s2s/asr.py

src/rai_core/rai/communication/ros2/connectors.py

docs/human_robot_interface/voice_interface.md


maciejmajek

LGTM

rachwalk requested a review from maciejmajek February 12, 2025 11:00

boczekbartek reviewed Feb 12, 2025


boczekbartek reviewed Feb 12, 2025


rachwalk changed the title ~~Add TTS to RAI Core~~ feat: add tts to rai core Feb 12, 2025

rachwalk requested a review from boczekbartek February 12, 2025 17:00

rachwalk force-pushed the refactor/rai_tts branch from 3c2d44e to 18957ba on February 26, 2025 15:16

maciejmajek requested changes Feb 28, 2025


maciejmajek reviewed Mar 11, 2025


rachwalk force-pushed the refactor/rai_tts branch from 8d002ec to 924047a on March 11, 2025 15:42

rachwalk added 18 commits March 12, 2025 10:38

feat: add base impl of tts agent and start moving tts models into new…

4832493

… format

feat: change connector api to support AudioSegment

194c7eb

feat: working TTS, with pausing

7313a5b

feat: working S2S

3199a23

feat: add agent runner

b7eae2a

chore: add runner to __init__

0ecc1e7

fix: working demo after rebase

7f3fb67

feat: add runners to create configurable, multi-agent deployments

416f826

diocs: add docstrings for affected classes

49bd338

chore: rename runner main method to run

d42a0c0

fix: tts agent support HRI msg

e363e6d

fix: s2s migrate to HRIMessage

92c8bc7

fix: end to end working runner with HRI

da97c8e

test: update tests to support AudioSegment api

a02ea22

feat: working multiterminal version

08df784

feat: working singleterminal setup

5f4d597

feat: remove runner

8a81211

chore: remove trash file

ad35944

rachwalk and others added 13 commits March 12, 2025 10:39

fix: race condition on cancelling speech task

65f9c05

fix: race condition on single transcribe queue

4241d8a

fix: send voice commands only on changes

7aff7cf

Revert "fix: send voice commands only on changes"

e16150b

This reverts commit cb60dbc.

fix: minimise ros2 traffic

a624d1b

docs: add S2S docs

339c05b

fix: minimise ros2 traffic -- add missing if

e0dee3f

fix: conversational example use history

bfab263

docs: fix typos

3861410

chore: add comments on example

70268d9

chore: remove useless comment

746997d

feat: add ElevenLabsTTS

0ad433c

docs: change the commant for device query

efefb9a

rachwalk force-pushed the refactor/rai_tts branch from 924047a to efefb9a on March 12, 2025 09:41

chore: pre-commit

0a37810

maciejmajek approved these changes Mar 12, 2025


maciejmajek merged commit 1695191 into development Mar 12, 2025
5 checks passed

maciejmajek deleted the refactor/rai_tts branch March 12, 2025 10:17

rachwalk mentioned this pull request Mar 25, 2025

Add TTS agent utilising connectors #399

Closed

8 tasks
