Helper tools to enable AsTeRICS Grid to perform actions on the operating system or integrate with external services, which isn't possible from within the browser. Currently two types of helper applications are provided: one for speech and one for interaction (e.g. with COM/Serial ports), see the folder `interaction`.
Normally AsTeRICS Grid uses the Web Speech API and therefore voices that are installed on the operating system (e.g. SAPI voices on Windows, or voices coming from a TTS module on Android). Sometimes it can be useful to use voices that aren't available as system voices. This section describes how to use an external custom speech service implemented in Python.
- Speech provider: a Python module that implements access to a speech generating service like MS Azure, Amazon Polly, Piper, MycroftAI mimic3 or any other. Speech providers can be of two types:
  - type "playing": a speech provider where playing the audio file is done internally. Using a speech provider of this type only makes sense if it's used on the same machine as AsTeRICS Grid.
  - type "data": a speech provider that generates the speech audio data, which is then used by AsTeRICS Grid and played within the browser. This type is preferable, because it makes it possible to run the speech service on any device or server and also allows caching of the data.
These steps are necessary to start the speech service that can be used by AsTeRICS Grid:
- `pip install flask flask_cors` - installs Flask, which is needed for providing the REST API
- `pip install pyttsx3` - only needed if you want to try the speech provider `provider_pytts_playing.py`, which is configured by default in `config.py`; otherwise install any other dependencies needed by the used speech providers, see the predefined speech providers below
- adapt `config.py` for using the desired speech providers by importing them and adding them to the list `speechProviderList` (see the sketch below)
- `python start.py` - starts the REST API
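For illustration, an adapted `config.py` could look like the following sketch. The module names follow the `provider_<name>_<type>.py` naming scheme described below and the list name `speechProviderList` is taken from this README; check the shipped `config.py` for the exact structure:

```python
# config.py - sketch: import the desired speech providers and
# register them in speechProviderList
import provider_pytts_playing  # default "playing" provider based on pyttsx3
import provider_piper_data     # "data" provider, requires piper-tts

# every provider in this list is exposed through the REST API
speechProviderList = [
    provider_pytts_playing,
    provider_piper_data,
]
```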
In AsTeRICS Grid, do the following steps to use the external speech provider:
- Go to `Settings -> General Settings -> Advanced general settings`
- Configure the `External speech service URL` with the IP/host where the API is running, port `5555`. If the speech service is running on the same computer, use `http://localhost:5555`.
- Reload AsTeRICS Grid (`F5`)
- Go to `Settings -> User settings -> Voice` and enable `Show all voices`
- Verify that the additional voices are selectable and working, see also the check below. For the default `provider_pytts_playing` speech provider, some voices like `<voice name>, pytts_playing` should be listed.
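You can also check that the service is reachable independently of AsTeRICS Grid by querying its `/voices` endpoint directly, e.g. with this small snippet (assumes the `requests` package and the default port `5555`):

```python
# list the voices offered by the running speech service
import requests

response = requests.get("http://localhost:5555/voices")
response.raise_for_status()
for voice in response.json():  # expected to be a list of voice entries
    print(voice)
```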
For speech providers of type "data", all generated speech data is automatically cached to the folder `speech/temp`. If you want to cache the speech data for a whole AsTeRICS Grid configuration, follow these steps:
- configure AsTeRICS Grid to use your desired speech provider / voice (see steps above)
- go to `Settings -> User settings -> Voice -> Advanced voice settings` and click the button `Cache all texts of current configuration using external voice`. This operation may take some time for big AsTeRICS Grid configurations. A scripted alternative is sketched below.
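The `/cache` endpoint (see the REST API description below) can also be used to warm up the cache from a script. A minimal sketch, assuming the service runs locally; the provider and voice ids are placeholders and must be taken from `/voices`:

```python
# pre-cache a few texts so they can be spoken faster or offline later
import requests
from urllib.parse import quote

BASE_URL = "http://localhost:5555"
PROVIDER_ID = "<providerId>"  # placeholder, take a real id from /voices
VOICE_ID = "<voiceId>"        # placeholder, take a real id from /voices

for text in ["hello", "thank you", "I need help"]:
    url = f"{BASE_URL}/cache/{quote(text)}/{PROVIDER_ID}/{VOICE_ID}"
    requests.get(url).raise_for_status()  # audio is now stored in speech/temp
```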
These are the important files within the folder speech of this repository:
- `config.py` - configuration file where it's possible to define which speech providers should be used
- `provider_<name>_playing.py` - implementation of a speech provider which generates speech and plays the audio on its own
- `provider_<name>_data.py` - implementation of a speech provider which generates speech audio data and returns the binary data, which is then played by AsTeRICS Grid within the browser
- `start.py` - main script providing the REST API which can be used by AsTeRICS Grid
- `speechManager.py` - script which manages the different speech providers and is used by the API defined in `start.py` to access them
This is a list of predefined speech providers with installation hints:
- `mimic3_data`: see the Mimic 3 installation steps; install it in any way which provides `mimic3` as a CLI tool, which is used by the speech provider. The current implementation only uses the voice `en_UK/apope_low`; for further voices the file `provider_mimic3_data.py` must be adapted.
- `msazure_data`, `msazure_playing`:
  - run `pip install azure-cognitiveservices-speech`, for further information see the MS Azure TTS quickstart
  - to get API credentials, you have to sign up at MS Azure and create a `SpeechServices` resource
  - create a file `speech/credentials.py` including the two lines `AZURE_KEY_1 = "<your-key>"` and `AZURE_REGION = "<your-region>"` (see the combined sketch after this list)
- `piper_data`: run `pip install piper-tts`, for more information see Running Piper in Python
- `pytts_playing`: run `pip install pyttsx3`
- `elevenlabs_data`: run `pip install requests` and create a file `speech/credentials.py` with `ELEVENLABS_KEY = "<your-key>"`. Read here how to get the API key.
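Taken together, a `speech/credentials.py` that serves both the MS Azure and the ElevenLabs providers could look like this (all values are placeholders):

```python
# speech/credentials.py - API credentials, keep this file out of version control
AZURE_KEY_1 = "<your-key>"      # key of your MS Azure SpeechServices resource
AZURE_REGION = "<your-region>"  # e.g. "westeurope"
ELEVENLABS_KEY = "<your-key>"   # your ElevenLabs API key
```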
See `config.py`, where the speech providers to use can be imported and added to the list `speechProviderList`.
Use the template `provider_template_data.py` or `provider_template_playing.py`, depending on which type of speech provider you want to add, and implement the predefined methods.
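As an illustration, a new "data" provider could roughly look like the sketch below. The function names, return structures and the TTS service URL are assumptions for the sake of the example; the authoritative signatures are those predefined in `provider_template_data.py`:

```python
# provider_mytts_data.py - sketch of a speech provider of type "data"
# (function names are assumptions, copy the real ones from provider_template_data.py)
import requests

def getVoices():
    # return the voices this provider offers; the exact structure
    # expected by speechManager.py is defined in the template
    return [{"id": "my_voice", "name": "My voice"}]

def getSpeechData(text, voiceId):
    # generate speech audio for the given text, e.g. by calling an
    # external TTS HTTP service (hypothetical URL), and return the bytes
    response = requests.post(
        "https://my-tts-service.example/api/tts",
        json={"text": text, "voice": voiceId},
    )
    response.raise_for_status()
    return response.content  # raw audio data, played by AsTeRICS Grid
```

The new module then only has to be imported in `config.py` and added to `speechProviderList`, as described above.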
The file `speech/start.py` starts the REST API with the following endpoints:
- `/voices` - returns a list of the voices that exist within the current configuration
- `/speak/<text>/<providerId>/<voiceId>` - speaks the given text using the given provider and voice
- `/speakdata/<text>/<providerId>/<voiceId>` - returns the binary audio data for the given text, provider and voice (example below)
- `/cache/<text>/<providerId>/<voiceId>` - caches the audio data for the given parameters to a file in `speech/temp`, in order to be able to use it faster or without an internet connection afterwards
- `/speaking` - returns `true` if the system is currently speaking (only applicable for speech providers of type "playing")
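As an example, fetching the audio for a text via `/speakdata` and saving it to a file could look like this sketch (provider and voice ids are placeholders, and the audio format depends on the provider):

```python
# download the generated audio for a text from the speech service
import requests
from urllib.parse import quote

text = "hello world"
url = (
    "http://localhost:5555/speakdata/"
    f"{quote(text)}/<providerId>/<voiceId>"  # placeholders, see /voices
)

response = requests.get(url)
response.raise_for_status()

with open("hello.wav", "wb") as f:  # format depends on the provider
    f.write(response.content)
```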