This repo walks through the basic speech-processing workflow by building an Automatic Speech Recognition (ASR) system. The workflow covers data preprocessing, model training, and evaluation.
The dataset used to train the ASR model is the VIVOS Corpus. You can download the dataset from here.
Here is a DeepSpeech2 model trained on VIVOS with a batch size of 8 for 200 epochs, which reached a word error rate (WER) of 0.4390 on the VIVOS test set. You can download the model from this link.
- Take a look at audio processing: audio processing
- Train an ASR model with DeepSpeech2 and CTC Loss: training
- Examine edit distance for word_error_rate: word error rate
- Use the trained model as the ASR component of a virtual assistant: virtual assistant demo
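
As a rough illustration of the evaluation step, the sketch below computes WER as the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The name `word_error_rate` comes from the list above, but its signature here, the `edit_distance` helper, and the whitespace tokenization are assumptions for illustration, not necessarily how this repo implements them.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (hypothetical helper)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, 1):
        curr = [i]  # turning i reference tokens into an empty hypothesis costs i deletions
        for j, hyp_tok in enumerate(hyp, 1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER for one utterance, assuming whitespace-separated words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions. A test-set score is commonly reported as total edits divided by total reference words across all utterances, rather than as the mean of per-utterance values.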