Commit 6b71651: 174 changed files with 18,653 additions and 0 deletions.
### Project ignore
/checkpoints/*
!/checkpoints/.gitkeep
/data/*
!/data/.gitkeep
infer_out
rsync
.idea
.DS_Store
bak
tmp
*.tar.gz
mos
nbs
/configs_usr/*
!/configs_usr/.gitkeep
/egs_usr/*
!/egs_usr/.gitkeep
/rnnoise
#/usr/*
#!/usr/.gitkeep
scripts_usr

# Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
MIT License

Copyright (c) 2022 Yi Ren

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
<p align="center">
    <br>
    <img src="assets/logo.png" width="200"/>
    <br>
</p>

<h2 align="center">
<p> NATSpeech: A Non-Autoregressive Text-to-Speech Framework</p>
</h2>

<div align="center">

[](https://github.com/NATSpeech/NATSpeech)
[](https://github.com/NATSpeech/NATSpeech)
[](https://github.com/NATSpeech/NATSpeech/blob/main/LICENSE)
[](https://github.com/NATSpeech/NATSpeech/releases/tag/pretrained_models) | [English README](./README.md)

</div>

This repository contains the official PyTorch implementation of the following works:

- [PortaSpeech: Portable and High-Quality Generative Text-to-Speech](https://proceedings.neurips.cc/paper/2021/file/748d6b6ed8e13f857ceaa6cfbdca14b8-Paper.pdf) (NeurIPS 2021) | [Demo page](https://portaspeech.github.io/) | [HuggingFace🤗 Demo](https://huggingface.co/spaces/NATSpeech/PortaSpeech)
- [DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism](https://arxiv.org/abs/2105.02446) (DiffSpeech) (AAAI 2022) | [Demo page](https://diffsinger.github.io/) | [Project page](https://github.com/MoonInTheRiver/DiffSinger) | [HuggingFace🤗 Demo](https://huggingface.co/spaces/NATSpeech/DiffSpeech)

## Key Features

This framework implements the following features:

- A non-autoregressive TTS data-processing pipeline based on the [Montreal Forced Aligner](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner);
- An easy-to-use and extensible training and testing framework;
- A simple but efficient random-access dataset implementation.
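The random-access dataset idea can be sketched in a few lines: items are serialized back-to-back into one byte buffer, and a separate offset table makes any item reachable in O(1) without deserializing the whole file. This is an illustrative sketch only, not the framework's actual dataset API; the class and method names here are hypothetical.

```python
import io
import pickle


class IndexedDataset:
    """Minimal sketch of a random-access binarized dataset (hypothetical
    names): items are pickled back-to-back into one byte stream, and an
    offset table gives O(1) lookup of any item."""

    def __init__(self, data: bytes, offsets):
        self.data = data
        self.offsets = offsets  # bytes offsets[i]..offsets[i+1] hold item i

    @classmethod
    def build(cls, items):
        buf = io.BytesIO()
        offsets = [0]
        for item in items:
            buf.write(pickle.dumps(item))
            offsets.append(buf.tell())
        return cls(buf.getvalue(), offsets)

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        start, end = self.offsets[i], self.offsets[i + 1]
        return pickle.loads(self.data[start:end])


ds = IndexedDataset.build([{"id": i, "mel_len": 100 + i} for i in range(5)])
print(len(ds), ds[3]["mel_len"])  # → 5 103
```

In practice the byte stream would live in a file on disk and the offset table in a small sidecar file, so a DataLoader worker can seek directly to any training item.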

## Install Dependencies

```bash
## Tested on Linux/Ubuntu 18.04
## Python 3.6+ is required (Anaconda recommended)

export PYTHONPATH=.
# build a virtual environment (recommended)
python -m venv venv
source venv/bin/activate
# install requirements
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0  # torch >= 1.9.0 is recommended
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh  # install the forced-alignment tool
pip install dgl-cu102 dglgo -f https://data.dgl.ai/wheels/repo.html
```

## Documentation

- [About the framework](./docs/zh/framework.md)
- [Run PortaSpeech](./docs/portaspeech.md)
- [Run DiffSpeech](./docs/diffspeech.md)

## Citation

If this repository is useful for your research or work, please cite the following papers:

- PortaSpeech

```bib
@article{ren2021portaspeech,
  title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},
  author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
```

- DiffSpeech

```bib
@article{liu2021diffsinger,
  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2105.02446},
  volume={2},
  year={2021}
}
```

## Acknowledgements

Our code is inspired by the following repositories:

- [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning)
- [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN)
- [Hifi-GAN](https://github.com/jik876/hifi-gan)
- [espnet](https://github.com/espnet/espnet)
- [Glow-TTS](https://github.com/jaywalnut310/glow-tts)
- [DiffSpeech](https://github.com/MoonInTheRiver/DiffSinger)
# SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech

[](https://arxiv.org/abs/2204.11792) [](https://github.com/yerfor/SyntaSpeech) [](https://github.com/yerfor/SyntaSpeech/releases) | [](https://huggingface.co/spaces/yerfor/SyntaSpeech) | [Chinese README](README-zh.md)

This repository is the official PyTorch implementation of our IJCAI 2022 [paper](https://arxiv.org/abs/2204.11792), in which we propose **SyntaSpeech**, a syntax-aware non-autoregressive text-to-speech model.

<p align="center">
    <br>
    <img src="assets/SyntaSpeech.png" width="1000"/>
    <br>
</p>

SyntaSpeech is built on [PortaSpeech](https://github.com/NATSpeech/NATSpeech) (NeurIPS 2021) with three new features:

1. We propose the **Syntactic Graph Builder (Sec. 3.1)** and **Syntactic Graph Encoder (Sec. 3.2)**, which prove effective at extracting syntactic features that improve the prosody modeling and duration accuracy of the TTS model.
2. We introduce **Multi-Length Adversarial Training (Sec. 3.3)**, which replaces the flow-based post-net in PortaSpeech, speeding up inference and improving audio naturalness.
3. We support three datasets: [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) (single-speaker English), [Biaobei](https://www.data-baker.com/open%20source.html) (single-speaker Chinese), and [LibriTTS](http://www.openslr.org/60) (multi-speaker English).
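As a rough illustration of what a syntactic graph builder does, the sketch below turns a dependency parse (each token's head index) into a bidirectional edge list with self-loops, which is the kind of word-level graph a graph encoder consumes. This is a simplified stand-in for Sec. 3.1, not the repository's actual implementation; the function name and graph representation are assumptions.

```python
def build_syntactic_graph(heads):
    """Given dependency heads (heads[i] is the index of token i's head,
    -1 for the root), return (src, dst) edge lists: one edge in each
    direction per dependency arc, plus a self-loop on every token."""
    src, dst = [], []
    for child, head in enumerate(heads):
        if head >= 0:
            src += [head, child]   # head -> child and child -> head
            dst += [child, head]
    for i in range(len(heads)):    # self-loops keep each node's own feature
        src.append(i)
        dst.append(i)
    return src, dst


# "the cat sat": "the" attaches to "cat", "cat" to "sat", "sat" is root
src, dst = build_syntactic_graph([1, 2, -1])
print(sorted(zip(src, dst)))
```

In a DGL-based pipeline, such edge lists would then be wrapped into a graph object and fed to a graph neural network over the word-level hidden states.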

## Environments

```bash
conda create -n synta python=3.7
source activate synta
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0
pip install -r requirements.txt
# install dgl for graph neural networks; dgl-cu102 supports RTX 2080, dgl-cu113 supports RTX 3090
pip install dgl-cu102 dglgo -f https://data.dgl.ai/wheels/repo.html
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh  # install forced-alignment tools
```

## Run SyntaSpeech!

**Please follow the steps below to run this repo.**

### 1. Preparation

#### Data Preparation

You can directly use our binarized datasets for LJSpeech and Biaobei. Download them from [this link]() and unzip them into the `data/binary/` folder.

As for LibriTTS, you can download the raw dataset and process it with our `data_gen` modules. Detailed instructions can be found in [docs/prepare_data](docs/prepare_data.md).

#### Vocoder Preparation

We provide pre-trained vocoders for the three datasets: Hifi-GAN for [LJSpeech]() and [Biaobei](), and ParallelWaveGAN for [LibriTTS](). Download and unzip them into the `checkpoints/` folder.

### 2. Training Example

You can then train SyntaSpeech on the three datasets.

```bash
cd <the root_dir of your SyntaSpeech folder>
export PYTHONPATH=./
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/lj/synta.yaml --exp_name lj_synta --reset            # train on LJSpeech
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/biaobei/synta.yaml --exp_name biaobei_synta --reset  # train on Biaobei
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/libritts/synta.yaml --exp_name libritts_synta --reset  # train on LibriTTS
```
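During these training runs, the multi-length adversarial training of Sec. 3.3 feeds several discriminators random mel-spectrogram segments of different lengths. A minimal sketch of that window-sampling step is below; it is an assumption-laden illustration (hypothetical function name, plain Python lists standing in for mel frames), not the repo's discriminator code.

```python
import random


def sample_windows(mel, win_lengths, rng):
    """For each discriminator window length, crop one random contiguous
    segment of that length from the mel sequence; sequences shorter than
    the window are returned whole."""
    windows = []
    for w in win_lengths:
        if len(mel) <= w:
            windows.append(mel)
        else:
            start = rng.randrange(len(mel) - w + 1)
            windows.append(mel[start:start + w])
    return windows


rng = random.Random(0)
mel = list(range(200))          # stand-in for a 200-frame mel-spectrogram
wins = sample_windows(mel, [32, 64, 128], rng)
print([len(w) for w in wins])   # → [32, 64, 128]
```

Each cropped window would then go to a discriminator sized for that length, so the generator receives adversarial feedback at several temporal scales.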

### 3. Tensorboard

```bash
tensorboard --logdir=checkpoints/lj_synta
tensorboard --logdir=checkpoints/biaobei_synta
tensorboard --logdir=checkpoints/libritts_synta
```

### 4. Inference Example

```bash
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/lj/synta.yaml --exp_name lj_synta --reset --infer            # inference on LJSpeech
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/biaobei/synta.yaml --exp_name biaobei_synta --reset --infer  # inference on Biaobei
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/libritts/synta.yaml --exp_name libritts_synta --reset --infer  # inference on LibriTTS
```

## Audio Demos

Audio samples can be found on our [demo page](https://syntaspeech.github.io/).

## Citation

```bib
@article{ye2022syntaspeech,
  title={SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech},
  author={Ye, Zhenhui and Zhao, Zhou and Ren, Yi and Wu, Fei},
  journal={arXiv preprint arXiv:2204.11792},
  year={2022}
}
```

## Acknowledgements

**Our codes are based on the following repos:**

* [NATSpeech](https://github.com/NATSpeech/NATSpeech)
* [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning)
* [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN)
* [HifiGAN](https://github.com/jik876/hifi-gan)
* [espnet](https://github.com/espnet/espnet)
* [Glow-TTS](https://github.com/jaywalnut310/glow-tts)
* [DiffSpeech](https://github.com/MoonInTheRiver/DiffSinger)