feat: add tts to rai core #419
src/rai_core/rai/runners/s2s.py
Outdated
consumer_sampling_rate=vad.sampling_rate,
is_input=True,
)
asr_agent = VoiceRecognitionAgent(
I would make it more explicit so that the user can guess that ros2 is enabled. Please use keyword arguments e.g. ros2_name="automatic_speech_recognition"
We need to keep in mind that the default run method of the s2s should not use ROS2 to communicate.
This comment was marked as outdated.
3c2d44e to 18957ba
Unfortunately, it's still lagging on my setup: the TTS lags (mic on/mic off). On the first TTS launch, a loud buzzing could be heard. Even though the TTS lags, ASR and response stopping work very well in the scenario with oww commented out. Well done.
Well done,
lots of random comments, please clean up the code.
Did you test FasterWhisper and OpenAIWhisper?
Please add ElevenLabsTTS, you can use my commit 33e2a45
```bash
python -c 'import sounddevice as sd; print(sd.query_devices())'
The Agent utilises sounddevice module to access user's microphone, by default the `"default"` sound device is used.
To get information about available sounddeives use:
typo
```

keep_speaker_busy: some speakers may go into low power mode, which may result in truncated speech beginnings. Set to true to play low frequency, low volume noise to prevent sleep mode.
The device can be identifed by name and passed to the configuration.
Which part of the name?
What do you mean? All of it
Please refer to the sd.query_devices() documentation. It is generally expected that a reasonable user will not run code without understanding its effects on their machine. When referring to "name" in the documentation, I refer to whatever is returned under the "name" field of objects returned by the function called. It would be highly counterintuitive to refer to anything else. Especially to refer using a regex, which are famously counterintuitive - seeing that yours includes the trailing comma.
Just provide an example.
Please refer to line 17 of this file, which provides an example.
8d002ec to 924047a
This reverts commit cb60dbc.
924047a to efefb9a
LGTM
Purpose
With the new design based on agents and connectors, there was a need to refactor rai tts to reflect these changes.
Proposed Changes
Adds:
Issues
#399
#309
Testing
CI
Manual tests performed using sound devices:
"default" (ALSA - system default)
"Jabra Speak2 40 MS: USB Audio" (conference speaker/microphone)
"Sennheiser USB headset: Audio" (headset)
To run and test this PR run:
docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak
and then in 3 separate terminal windows run:
python examples/s2s/asr.py
python examples/s2s/tts.py
python examples/s2s/conversational.py
If you want to specify a particular device (example):
python ./examples/s2s/asr.py --device-name "Jabra Speak2 40 MS: USB Audio (hw:2,0)"
python examples/s2s/tts.py --device-name "Jabra Speak2 40 MS: USB Audio (hw:2,0)"
python examples/s2s/conversational.py
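As a rough illustration of how a `--device-name` value could be matched against the output of `sd.query_devices()`, here is a minimal sketch. It assumes the whole `"name"` field is compared exactly, as discussed in the review thread. The helper `find_device_index` and the sample entries are hypothetical and are not part of rai; the entries only mimic the shape of the dicts that sounddevice returns.

```python
from typing import Mapping, Sequence


def find_device_index(devices: Sequence[Mapping], name: str, need_input: bool = False) -> int:
    """Return the index of the first device whose full "name" field equals `name`.

    Hypothetical helper for illustration only; it is not part of rai.
    """
    for index, device in enumerate(devices):
        if device["name"] != name:
            continue
        # Optionally skip devices that cannot record (e.g. output-only speakers).
        if need_input and device.get("max_input_channels", 0) < 1:
            continue
        return index
    raise ValueError(f"no matching device named {name!r}")


# Illustrative entries shaped like the dicts returned by sd.query_devices();
# the values are made up, not captured from a real machine.
sample_devices = [
    {"name": "default", "max_input_channels": 32, "max_output_channels": 32},
    {
        "name": "Jabra Speak2 40 MS: USB Audio (hw:2,0)",
        "max_input_channels": 1,
        "max_output_channels": 2,
    },
]

print(find_device_index(sample_devices, "Jabra Speak2 40 MS: USB Audio (hw:2,0)"))
```

On a real machine the list would come from `sounddevice.query_devices()` instead of `sample_devices`. Under the exact-match assumption, a partial name such as "Jabra Speak2 40 MS: USB Audio" without the `(hw:2,0)` suffix would not match.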
Also note: the default values of the agent configurations should work OK, but this is not guaranteed. Particular sound data profiles can have a significant effect on the accuracy of voice activity and wake word detection, so for best results it is recommended to experiment with the values until the desired accuracy is achieved. A very common option to change is VAD_DETECTION_THRESHOLD in examples/s2s/asr.py, as VAD performance with a given device can differ a lot.
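To get a feel for why the threshold matters, the sketch below sweeps a few candidate values over a list of per-frame speech probabilities. It assumes the VAD emits one probability per audio frame and flags a frame as speech when the probability exceeds the threshold. That model, the function names, and the sample numbers are all illustrative assumptions, not the rai implementation or measured data.

```python
def detected_frames(speech_probabilities, threshold):
    """Count frames whose speech probability exceeds the threshold."""
    return sum(1 for p in speech_probabilities if p > threshold)


def sweep(speech_probabilities, thresholds):
    """Map each candidate threshold to the number of frames it would flag as speech."""
    return {t: detected_frames(speech_probabilities, t) for t in thresholds}


# Made-up probabilities standing in for a short recording from one device.
recorded = [0.05, 0.10, 0.35, 0.55, 0.80, 0.90, 0.40, 0.08]

for threshold, count in sweep(recorded, [0.3, 0.5, 0.7]).items():
    print(f"threshold={threshold}: {count} of {len(recorded)} frames flagged as speech")
```

Running a sweep like this on recordings from the actual microphone is one way to pick a starting value before fine-tuning by ear.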