New Services to Provide Speech-To-Text and Text-To-Speech Functionality from Aristech #35

ajgolledge · 2025-06-03T14:33:40Z

This PR provides two new services from Aristech:

Speech-To-Text

This service is named "aristech-transcribe" and can be invoked from the Call-API "startConversation" using this name together with the following JSON parameter:

{ "language": "de_DE" }

Note that this uses locale format (de_DE), not BCP 47 (de-DE). A bare language code such as "de" also works; I have not noticed any difference between specifying a region and omitting it, and the same holds for English ("en").
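Since the service expects locale format, a caller that only has a BCP 47 tag needs to swap the separator and normalize casing. Here is a minimal sketch, assuming a hypothetical helper `bcp47_to_locale` that is not part of the service; it handles only the simple language[-region] form.

```rust
/// Hypothetical helper (not part of the aristech service): turn a BCP 47
/// tag such as "de-DE" into the locale form the service expects ("de_DE").
/// Only the simple language[-region] form is handled; tags with script or
/// variant subtags are out of scope for this sketch.
fn bcp47_to_locale(tag: &str) -> String {
    // Accept either separator so already-converted input passes through.
    let mut parts = tag.split(|c: char| c == '-' || c == '_');
    // Language subtag, lowercased ("DE" -> "de").
    let language = parts.next().unwrap_or("").to_ascii_lowercase();
    match parts.next() {
        // Second subtag assumed to be a region: uppercase it, join with '_'.
        Some(region) if !region.is_empty() => {
            format!("{}_{}", language, region.to_ascii_uppercase())
        }
        // Bare language code: "de" is accepted by the service as-is.
        _ => language,
    }
}

fn main() {
    for tag in ["de-DE", "de", "en-GB"] {
        println!("{} -> {}", tag, bcp47_to_locale(tag));
    }
}
```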

An entry like the following in ivr.toml provides the credentials needed for authentication:

[[contextSwitch.service]]
name = "aristech-transcribe"
params = { apiKey = "an-apikey" }
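The two ivr.toml examples in this description pass credentials in two different shapes: an `apiKey` for transcription, and `token`/`secret` (plus `endpoint`) for synthesis. A minimal sketch of telling those shapes apart, using hypothetical names (`Credentials`, `credentials_from_params`) and a plain string map in place of the parsed TOML table; the services' actual deserialization may be structured differently.

```rust
use std::collections::HashMap;

/// Hypothetical model of the two credential styles shown in ivr.toml.
/// Names are illustrative, not the crate's actual types.
#[derive(Debug, PartialEq)]
enum Credentials {
    ApiKey(String),
    TokenSecret { token: String, secret: String },
}

/// Pick a credential variant from a service's `params` table.
/// `apiKey` wins if present; otherwise both `token` and `secret` are
/// required. Other keys (such as `endpoint`) are ignored here.
fn credentials_from_params(params: &HashMap<&str, &str>) -> Result<Credentials, String> {
    if let Some(key) = params.get("apiKey") {
        return Ok(Credentials::ApiKey(key.to_string()));
    }
    match (params.get("token"), params.get("secret")) {
        (Some(token), Some(secret)) => Ok(Credentials::TokenSecret {
            token: token.to_string(),
            secret: secret.to_string(),
        }),
        _ => Err("expected either `apiKey` or both `token` and `secret`".to_string()),
    }
}

fn main() {
    let stt: HashMap<&str, &str> = HashMap::from([("apiKey", "an-apikey")]);
    let tts: HashMap<&str, &str> = HashMap::from([
        ("endpoint", "https://example.com"),
        ("token", "a-valid-token"),
        ("secret", "a-valid-secret"),
    ]);
    println!("{:?}", credentials_from_params(&stt));
    println!("{:?}", credentials_from_params(&tts));
}
```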

The following are still open issues:

Determine whether credentials authentication will be needed in future, or whether we can reliably use apiKey alone.
Is there a silence timeout and if so, is it configurable? Does the silence_timeout field in EndpointSpec have any effect?
When the example uses the default microphone settings (rather than explicitly requesting 16 kHz), does the conversion function currently in use (audio::into_i16) get in the way? In other words, would dropping it improve the example's performance?
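For context on the third question: assuming `audio::into_i16` converts floating-point microphone samples to 16-bit PCM, such a conversion changes only the sample format, not the sample rate, so a mismatch between the microphone's default rate and 16 kHz would be a separate concern. A minimal sketch of this kind of conversion; `f32_to_i16` is a hypothetical stand-in, not the actual helper, whose signature may differ.

```rust
/// Hypothetical stand-in for a sample conversion like `audio::into_i16`:
/// maps f32 samples in [-1.0, 1.0] to signed 16-bit PCM. Out-of-range
/// input is clamped first. Scaling by i16::MAX keeps the mapping
/// symmetric, so -1.0 becomes -32767. The sample rate is untouched.
fn f32_to_i16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)
        .collect()
}

fn main() {
    let pcm = f32_to_i16(&[0.0, 0.5, -1.0, 1.5]);
    println!("{:?}", pcm);
}
```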

Text-To-Speech

This service is named "aristech-synthesize" and can be invoked from the Call-API "startConversation" using this name together with the following JSON parameter:

{ "voice": "anne_de_DE" }

Currently the only alternative voice available to us is "tom_de_DE".

An entry like the following in ivr.toml provides the credentials needed for authentication:

[[contextSwitch.service]]
name = "aristech-synthesize"
params = { endpoint = "https://example.com", token = "a-valid-token", secret = "a-valid-secret" }
sampleRate = 22050

Both voices currently available to us use a sample rate of 22050 Hz. Omitting this setting can lead to amusing results 😄
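The "amusing results" come from interpreting samples at the wrong rate: 22050 Hz audio played back as 16000 Hz runs about 22050 / 16000 ≈ 1.38 times longer and sounds lower in pitch. According to the commit history, the service overrides `sample_rate` in `AudioFormat` with the selected voice's value. A minimal sketch of that idea; the `AudioFormat` fields, `voice_sample_rate`, and `format_for_voice` here are hypothetical, and the hard-coded voice table only illustrates the two voices mentioned above.

```rust
/// Hypothetical audio format; the real `AudioFormat` may look different.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AudioFormat {
    channels: u16,
    sample_rate: u32,
}

/// Illustrative lookup for the two voices mentioned in this PR.
/// A real implementation would take the rate from the voice metadata.
fn voice_sample_rate(voice: &str) -> Option<u32> {
    match voice {
        "anne_de_DE" | "tom_de_DE" => Some(22_050),
        _ => None,
    }
}

/// Replace the requested sample rate with the voice's native rate so the
/// samples are not played back at the wrong speed.
fn format_for_voice(requested: AudioFormat, voice: &str) -> AudioFormat {
    match voice_sample_rate(voice) {
        Some(rate) => AudioFormat { sample_rate: rate, ..requested },
        None => requested,
    }
}

fn main() {
    let requested = AudioFormat { channels: 1, sample_rate: 16_000 };
    let fixed = format_for_voice(requested, "anne_de_DE");
    println!("{:?}", fixed);
    // Ratio by which playback would be stretched without the override.
    let stretch = fixed.sample_rate as f64 / requested.sample_rate as f64;
    println!("without the override: {:.2}x longer, lower pitch", stretch);
}
```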

Open Issues

Are any other voices available to us apart from "tom_de_DE" and "anne_de_DE"?


pragmatrix · 2025-06-11T06:41:57Z

Just minor changes. In transcribe.rs I've removed the "" empty-string default for model / prompt and adjusted the test cases. I like the deserialization of the different credential options; I'll adopt it for Azure.
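Why an `Option` instead of an empty-string default: `None` distinguishes "not configured" from "configured as empty", so a request can omit the field rather than send `""`. A minimal sketch with a hypothetical `TranscribeParams` struct and a hypothetical `request_fields` function; the actual field names and request type in transcribe.rs may differ, and "example-model" is a placeholder.

```rust
/// Hypothetical parameter struct for illustration only: `model` and
/// `prompt` are `Option`s rather than defaulting to "".
#[derive(Debug, Default)]
struct TranscribeParams {
    language: String,
    model: Option<String>,
    prompt: Option<String>,
}

/// Hypothetical request builder: emits a field only when it was provided,
/// so an unset model or prompt is omitted instead of sent as "".
fn request_fields(params: &TranscribeParams) -> Vec<String> {
    let mut fields = vec![format!("language={}", params.language)];
    if let Some(model) = &params.model {
        fields.push(format!("model={}", model));
    }
    if let Some(prompt) = &params.prompt {
        fields.push(format!("prompt={}", prompt));
    }
    fields
}

fn main() {
    let minimal = TranscribeParams {
        language: "de_DE".to_string(),
        ..Default::default()
    };
    println!("{:?}", request_fields(&minimal));

    let with_model = TranscribeParams {
        model: Some("example-model".to_string()),
        ..minimal
    };
    println!("{:?}", request_fields(&with_model));
}
```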

pragmatrix · 2025-06-11T06:50:50Z

As discussed, merging even though some open issues remain.

andrew-golledge added 19 commits June 3, 2025 10:38

Initial commit. Set up credentials and create initial client.

7eb51de

Use aristech_stt_client examples/live.rs to implement initial recognizer client.

c328904

Move aristech stt code out of lib.rs to reside in separate transcribe.rs file.

b9b783d

Add Aristech speech recognition to Context Switch registry.

ba6ca1b

Convert to anyhow errors to avoid using unwrap.

7522b00

Add code for Aristech TTS.

0da2b04

Make auth_config field and AuthConfig enum public, to enable use from example code.

af80025

Initial checkin of example app for aristech stt.

5273e56

Read token and secret from environment for tts client credentials. Override the `sample_rate` in `AudioFormat` with the value given in the selected `voice`.

656b798

New aristech TTS example, loosely based on the azure example.

c1cd736

Use end_of_utterance flag instead of is_final when transcribing.

a07723e

Force microphone to use 16_000 Hz sample rate instead of the default setting.

59933eb

Add the aristech synthesize service.

1003b80

Fix problems with authentication parameter serialization.

ac815ba

Rebase on master.

28a388b

Use async_stream instead of starting a separate task to read the audio stream.

3f89e62

Changes to synthesize example after rebase.

540e2b7

Make use of RecognitionSpec::default() and a spot of comment polishing.

4c5c018

Comment polishing.

1f536cf

pragmatrix marked this pull request as draft June 4, 2025 05:27

pragmatrix requested changes Jun 4, 2025


services/aristech/src/synthesize.rs (outdated, resolved)

services/aristech/src/transcribe.rs (outdated, resolved)

andrew-golledge and others added 5 commits June 4, 2025 09:35

Changed some parameter names in Aristech transcribe and synthesize services after PR review suggestion.

2fc1607

Install protobuf-compiler in CI build and keep cargo fmt happy.

a41b83a

aristech:synthesize: Minor changes

8c88c9e

Extend README.md with Aristech prerequisites

e273aad

aristech:transcribe: Convert model and prompt to an option

8249620

pragmatrix added 2 commits June 11, 2025 08:43

Remove separated Registry impl block

5b9a625

Review aristech examples

edd71cb

pragmatrix marked this pull request as ready for review June 11, 2025 06:50

pragmatrix merged commit 1acd92b into pragmatrix:master Jun 11, 2025
3 checks passed

pragmatrix mentioned this pull request Jun 11, 2025

Aristech Open Issues #36 (open, 5 tasks)
