We use the voice pack from League of Legends: based on the audio of 142 heroes, we classify the clips and identify which hero is speaking from the extracted audio features. Our project is based on Baidu's paper "Deep Speaker: an End-to-End Neural Speaker Embedding System".
Theory behind the implementation: https://arxiv.org/pdf/1705.02304.pdf
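As a rough illustration of how identification from speaker embeddings can work, the sketch below picks the enrolled hero whose reference embedding has the highest cosine similarity to a query embedding. The hero names, the 3-dimensional vectors, and the `identify` helper are hypothetical and are not taken from this repository; real embeddings would come from the trained network.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(embedding, enrolled):
    """Return the enrolled name whose reference embedding is most similar."""
    return max(enrolled, key=lambda name: cosine_similarity(embedding, enrolled[name]))


# Hypothetical toy embeddings (assumption: real ones are produced by the model).
enrolled = {
    "hero_a": [1.0, 0.0, 0.0],
    "hero_b": [0.0, 1.0, 0.0],
}
query = [0.9, 0.1, 0.0]
best = identify(query, enrolled)
print(best)
```

Cosine similarity ignores vector length, so only the direction of the embedding matters when comparing two clips.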
We packaged the audio data and stored it on Baidu Cloud. Please scan the QR code with WeChat to download it. The extraction code is “tt8w”.
- Install Python 3.
- Install the latest version of TensorFlow for your platform. For better performance, install with GPU support if it's available. This code works with TensorFlow 1.3 and later.
- Install requirements:
pip install -r requirements.txt
python SpeakerRecog.pyw
We packaged the model data and stored it on Baidu Cloud. Please scan the QR code with WeChat to download it. The extraction code is “6umw”.
After unpacking, your model directory tree should look like this:
model
|- pre-model
|- train-model
|- best_checkpoint
|- GRU
|- ResidualCNN
- Download the speech dataset.
Unzip the dataset, then either set DATASET_DIR in constants.py to the absolute path of the unzipped folder, or unzip it directly to the path that DATASET_DIR already points to. See “Audio Samples” for the download link.
- Preprocess the data
python pre_process.py
- Train a model
python train.py
Note: Pre-training before training is recommended to reduce training time.
Pre-train:
python pretraining.py
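For context, the Deep Speaker paper pre-trains with a softmax classification loss and then trains with a triplet loss on cosine similarities. Whether pretraining.py and train.py follow exactly that split is an assumption here. The sketch below shows the triplet loss for a single (anchor, positive, negative) triple in plain Python; the function names and the toy vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge loss on cosine similarities.

    The loss is zero once the anchor is more similar to the positive
    (same speaker) than to the negative (different speaker) by at
    least `margin`.
    """
    s_ap = cosine_similarity(anchor, positive)
    s_an = cosine_similarity(anchor, negative)
    return max(0.0, s_an - s_ap + margin)


# Toy 2-dimensional embeddings (assumption: real ones come from the network).
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])  # already well separated
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])  # positive and negative swapped
print(easy, hard)
```

A triple that is already well separated contributes no loss, which is one reason starting from a pre-trained model can shorten training: fewer triples are badly violated at the start.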
We rename the data after processing it into npy format; see rename.py for the detailed code. Following the same naming will help you avoid unnecessary problems when importing new data. If you train on new data, you can also adjust the parameters in constants.py.
Because the training data is relatively clean, the training accuracy is already high even without the GRU.
Moreover, we found that the distance between the sound source and the microphone affects the detection quality, and that a noisy environment also reduces accuracy. The spectrum diagrams of the audio in different scenarios are as follows:
Original sound:
Far from the microphone, quiet:
Far from the microphone, noisy:
Close to the microphone, quiet:
Close to the microphone, noisy:
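To reproduce this kind of comparison, one option is to compute a log-magnitude spectrogram with NumPy. The sketch below is not the repository's preprocessing code: the synthetic 440 Hz tone, the frame sizes, and the crude "far and noisy" model (attenuation plus white noise) are all assumptions chosen for illustration.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT: rows are frames, columns are frequency bins."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-8)  # small offset avoids log(0)


rng = np.random.default_rng(0)
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate

# Stand-in for a close, quiet recording: one second of a pure tone.
clean = np.sin(2 * np.pi * 440 * t)
# Stand-in for a far, noisy recording: attenuated tone plus white noise.
far_noisy = 0.2 * clean + 0.1 * rng.standard_normal(len(t))

spec_clean = log_spectrogram(clean)
spec_noisy = log_spectrogram(far_noisy)
print(spec_clean.shape, spec_noisy.shape)
```

In the noisy version the energy away from the tone rises, which narrows the gap between speech content and background that the model relies on.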
The ROC curve of our model is particularly good:
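For readers who want to compute such a curve on their own data, the sketch below builds ROC points and the area under the curve from similarity scores and same/different-speaker labels. The scores and labels are made-up toy values, not results from this project, and the helper names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs.

    scores: similarity scores, higher means "same speaker".
    labels: 1 for same-speaker pairs, 0 for different-speaker pairs.
    Tied scores are not merged into a single step in this simple sketch.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the decision threshold one score at a time, highest first.
    for _score, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up toy data for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
print(curve[-1], round(area, 4))
```

An area close to 1 means same-speaker pairs almost always score higher than different-speaker pairs.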