Speech-to-text (STT) translation implemented as part of a Deep Speaking Avatar BSc project.
In the baseline version, translation from speech to text is done with the open-source DeepSpeech speech-to-text engine.
The NVIDIA NeMo toolkit offers pre-built Jasper and QuartzNet speech recognition models, both of which achieve better word error rates than DeepSpeech. QuartzNet is used in the Deep Speaking Avatar project because it is more parameter-efficient than the Jasper model.
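As a rough illustration (not part of this repository), a pre-trained QuartzNet checkpoint can be loaded and run through NeMo along the following lines; the model name `QuartzNet15x5Base-En` and the `paths2audio_files` keyword follow the NeMo 1.x API and may differ in other releases:

```python
import nemo.collections.asr as nemo_asr

# Fetch the pre-trained QuartzNet checkpoint (downloaded and cached by NeMo).
quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(
    model_name="QuartzNet15x5Base-En"
)

# Transcribe a 16 kHz mono WAV file; transcribe() returns a list of strings.
transcripts = quartznet.transcribe(paths2audio_files=["stt_input.wav"])
print(transcripts[0])
```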
- Python 3.8
- Pip3
Run 'setup.sh' to install all needed dependencies and pre-trained models:
$ ./setup.sh
Evaluating the models also requires the LibriSpeech dev and test datasets, available at https://www.openslr.org/12.
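For example, the dev-clean and test-clean subsets can be fetched and unpacked as follows (other subsets are available on the same page):

$ wget https://www.openslr.org/resources/12/dev-clean.tar.gz
$ wget https://www.openslr.org/resources/12/test-clean.tar.gz
$ tar -xzf dev-clean.tar.gz
$ tar -xzf test-clean.tar.gz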
usage: main.py [-h] [--i [INPUT]] [--o [OUTPUT]] [-jasper] [-deepspeech] [-evaluate]
Translate speech to text and save text to file
optional arguments:
  -h, --help    show this help message and exit
  --i [INPUT]   Path to the input audio file (default: /home/avatar/integration/stt_input.wav)
  --o [OUTPUT]  Path to the output text file (default: /home/avatar/integration/stt_output.txt)
  -jasper       Use Jasper model (default: QuartzNet)
  -deepspeech   Use DeepSpeech model (default: QuartzNet)
  -evaluate     Evaluate model word error rate and time consumption. Given INPUT and/or OUTPUT
                will be ignored
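For example, to transcribe a WAV file with the default QuartzNet model (the file names below are placeholders):

$ python3 main.py --i speech.wav --o transcript.txt

or to evaluate the DeepSpeech model on the LibriSpeech datasets:

$ python3 main.py -deepspeech -evaluate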