standalone-soda

Google Chrome includes a Live Caption feature that uses the Speech On-Device API (SODA) to transcribe audio. The feature can be turned on in Chrome's settings.

When enabled, Chrome fetches SODA as an extension containing the necessary binary and language models.

By default, the SODA binary verifies its execution context, so it cannot be used outside of Chrome. This project provides a framework for using a modified version of the binary on macOS (M1) to transcribe audio files directly, without live audio input:

demo.mp4

I cannot provide a modified version of the binary; however, download_soda.py will fetch libsoda.so (the SODA binary) as well as the English language models. From there, you can use a tool like Ghidra to modify the binary yourself.

SODA supports many languages. Chrome can download additional language packs for you from the Live Caption settings; you can then find them in ~/Library/Application Support/Google/Chrome/SODALanguagePacks and copy them over for use here.
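If you have already downloaded extra languages through Chrome, a small helper can list them and copy one into this project. This is a sketch under two assumptions that you should check against your own installation and recognition.py: that each language pack is a subdirectory of SODALanguagePacks, and that this project looks for it at models/<name> (mirroring models/en-US). The function names are hypothetical.

```python
import shutil
from pathlib import Path

# Chrome's language pack location on macOS, as given above.
CHROME_PACKS = Path.home() / "Library/Application Support/Google/Chrome/SODALanguagePacks"


def list_language_packs(root: Path = CHROME_PACKS) -> list[str]:
    """Return the sorted names of the subdirectories of `root`.

    Assumption: each subdirectory is one language pack.
    """
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def copy_language_pack(name: str, src_root: Path = CHROME_PACKS,
                       dest_root: Path = Path("models")) -> Path:
    """Copy one pack to dest_root/name (assumed layout, mirroring models/en-US)."""
    src = src_root / name
    dest = dest_root / name
    # dirs_exist_ok lets you re-run the copy after Chrome updates a pack.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest


if __name__ == "__main__":
    for pack in list_language_packs():
        print(pack)
```

Copying rather than symlinking keeps the project working even if Chrome later removes or updates a pack.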

Usage

Install dependencies
```
pip install -r requirements.txt
```
Download SODA
```
python3 download_soda.py
```
The binary will be downloaded to lib/libsoda.so, and the language models to models/en-US.
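Before patching, you may want to confirm the download landed where expected. The sketch below only checks that the two paths named above exist; it does not validate the files' contents, the function name is hypothetical, and the models/<language> pattern for other languages is an assumption.

```python
from pathlib import Path


def missing_soda_files(project_root: Path = Path("."), language: str = "en-US") -> list[Path]:
    """Return the expected download locations that do not exist yet.

    lib/libsoda.so holds the binary; models/<language> holds the models.
    """
    expected = [
        project_root / "lib" / "libsoda.so",
        project_root / "models" / language,
    ]
    return [p for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_soda_files()
    if missing:
        print("Missing:", ", ".join(str(p) for p in missing))
    else:
        print("SODA binary and en-US models are in place.")
```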
Patch the binary using Ghidra or a similar tool. See the notes below for a hint :)
Set the appropriate sample rate and channel count in config.py, as well as the path to the patched binary.
- These settings are not selected automatically, because in some cases I've found that the binary performs its own conversion (e.g. 48kHz -> 16kHz). It is still recommended, however, to set the exact values that match the audio file you are transcribing. You can view them with ffprobe (part of ffmpeg) or with the -p flag of recognition.py.
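Python's standard-library wave module can read the same two values, which is handy if you want to script the config.py step. This is an independent sketch, not the code recognition.py uses for -p, and the wave module only understands plain PCM WAV files.

```python
import sys
import wave


def wav_format(path: str) -> tuple[int, int]:
    """Return (sample_rate_hz, channel_count) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


if __name__ == "__main__":
    rate, channels = wav_format(sys.argv[1])
    print(f"sample rate: {rate} Hz, channels: {channels}")
```

The ffprobe equivalent is `ffprobe -v error -show_entries stream=sample_rate,channels -of default=noprint_wrappers=1 audio_file.wav`.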
Run
```
python3 recognition.py <audio_file.wav> [-l <language>] [-p]
```
where <language> is the language of the audio file (English if omitted). See recognition.py for the supported languages; you must have the language models downloaded for the language you are using. The -p flag prints the audio file's sample rate and channel count.
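To transcribe several files, you can drive the command above from a small wrapper. This sketch assumes it is run from the project directory (it calls recognition.py by relative path) and that recognition.py prints its transcript to stdout; the function names are hypothetical, and the language string is passed through unchanged, so use whatever format recognition.py accepts.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def build_command(audio_file: str, language: Optional[str] = None,
                  print_info: bool = False) -> list[str]:
    """Assemble the documented recognition.py command line."""
    cmd = [sys.executable, "recognition.py", audio_file]
    if language is not None:
        cmd += ["-l", language]
    if print_info:
        cmd.append("-p")
    return cmd


def transcribe_folder(folder: str, language: Optional[str] = None) -> dict[str, str]:
    """Run recognition.py once per .wav file in `folder` and collect stdout.

    Assumption: the transcript is written to stdout. Exit codes are not checked.
    """
    results = {}
    for wav in sorted(Path(folder).glob("*.wav")):
        proc = subprocess.run(build_command(str(wav), language),
                              capture_output=True, text=True)
        results[wav.name] = proc.stdout
    return results


if __name__ == "__main__":
    for name, text in transcribe_folder(sys.argv[1]).items():
        print(f"== {name}\n{text}")
```

Using sys.executable instead of a hard-coded python3 keeps the wrapper inside the same virtual environment where the requirements were installed.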

Helpful notes:

The binary fetched by download_soda.py is the one Google Chrome fetches for macOS on M1 devices. Other architectures will not work, although you can find a version for your architecture if you know where to look.
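A quick way to catch an unsupported machine early is to check the host before loading the binary. This is a hedged sketch (the function name is hypothetical); it treats "macOS on arm64" as the only supported combination, per the note above.

```python
import platform
from typing import Optional


def is_supported_host(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """True on macOS running natively on Apple Silicon (arm64).

    A Python interpreter running under Rosetta 2 reports "x86_64" here even
    on an M1 Mac, and an x86_64 process cannot load an arm64-only library.
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    if not is_supported_host():
        print(f"Warning: {platform.system()}/{platform.machine()} is not macOS/arm64; "
              "the binary fetched by download_soda.py is unlikely to load.")
```

The optional parameters exist only so the check can be exercised without the matching hardware.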
Patching hint: the binary calls some functions that verify execution context by checking the call stack.
Even if patched correctly, I've found that the binary might not work unless it is re-signed using macOS's codesign tool.
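The notes do not say which signing options worked. A common choice for a locally modified binary is an ad-hoc signature, which is what this sketch assumes; it is equivalent to running `codesign --force --sign - lib/libsoda.so` in a terminal. The helper names are hypothetical.

```python
import subprocess
import sys


def codesign_command(binary_path: str) -> list[str]:
    """Build an ad-hoc re-sign command ("-" as identity means no certificate)."""
    # --force replaces the existing signature, which patching has invalidated.
    return ["codesign", "--force", "--sign", "-", binary_path]


def resign(binary_path: str) -> None:
    # check=True raises CalledProcessError if codesign reports a failure.
    subprocess.run(codesign_command(binary_path), check=True)


if __name__ == "__main__":
    resign(sys.argv[1] if len(sys.argv) > 1 else "lib/libsoda.so")
```

Afterwards, `codesign --verify --verbose lib/libsoda.so` reports whether the new signature is valid.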
Unfortunately, the binary does not give particularly useful debug output, so errors are difficult to track down. If you still want to see its output, you can change the stderr redirect in recognition.py to point to a file instead of /dev/null.
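Assuming the binary is loaded into the Python process (for example through ctypes), its messages go straight to file descriptor 2, so swapping Python's sys.stderr object is not enough; the descriptor itself has to be redirected. The context manager below is a sketch of that technique. Its name is hypothetical, and recognition.py may implement its /dev/null redirect differently.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def redirect_native_stderr(log_path: str):
    """Temporarily point file descriptor 2 (stderr) at `log_path`.

    Native code writes to fd 2 directly, so the descriptor is duplicated
    and replaced rather than just reassigning sys.stderr.
    """
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keep a handle on the real stderr
    with open(log_path, "ab") as log_file:
        os.dup2(log_file.fileno(), 2)  # fd 2 now refers to the log file
        try:
            yield
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)  # restore the original stderr
            os.close(saved_fd)
```

Wrap whichever call drives the SODA binary in `with redirect_native_stderr("soda_stderr.log"):`; the file is opened in append mode, so output from several runs accumulates.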
I've tested it with WAV files at different sample rates, in both mono and stereo. Higher sample rates, e.g. 48kHz, seem to produce better results, even though as far as I can tell the binary converts input audio to 16kHz. It's recommended to leave the settings as is.

Special thanks to this blog post for the idea.

nickin1/standalone-soda
