Commit 6b71651: init
yerfor committed May 12, 2022 · 0 parents
Showing 174 changed files with 18,653 additions and 0 deletions.
151 changes: 151 additions & 0 deletions .gitignore
@@ -0,0 +1,151 @@
### Project ignore
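# note: a "dir/*" pattern plus "!dir/.gitkeep" ignores a directory's contents while keeping the directory itself tracked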

/checkpoints/*
!/checkpoints/.gitkeep
/data/*
!/data/.gitkeep
infer_out
rsync
.idea
.DS_Store
bak
tmp
*.tar.gz
mos
nbs
/configs_usr/*
!/configs_usr/.gitkeep
/egs_usr/*
!/egs_usr/.gitkeep
/rnnoise
#/usr/*
#!/usr/.gitkeep
scripts_usr

# Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
datasets/remi/test/
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 Yi Ren

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
97 changes: 97 additions & 0 deletions README-zh.md
@@ -0,0 +1,97 @@
<p align="center">
<br>
<img src="assets/logo.png" width="200"/>
<br>
</p>

<h2 align="center">
<p> NATSpeech: A Non-Autoregressive Text-to-Speech Framework</p>
</h2>

<div align="center">

[![](https://img.shields.io/github/stars/NATSpeech/NATSpeech)](https://github.com/NATSpeech/NATSpeech)
[![](https://img.shields.io/github/forks/NATSpeech/NATSpeech)](https://github.com/NATSpeech/NATSpeech)
[![](https://img.shields.io/github/license/NATSpeech/NATSpeech)](https://github.com/NATSpeech/NATSpeech/blob/main/LICENSE)
[![](https://img.shields.io/github/downloads/NATSpeech/NATSpeech/total?label=pretrained+model+downloads)](https://github.com/NATSpeech/NATSpeech/releases/tag/pretrained_models) | [English README](./README.md)

</div>

This repository contains the official PyTorch implementation of the following works:

- [PortaSpeech: Portable and High-Quality Generative Text-to-Speech](https://proceedings.neurips.cc/paper/2021/file/748d6b6ed8e13f857ceaa6cfbdca14b8-Paper.pdf) (NeurIPS 2021) [Demo page](https://portaspeech.github.io/) | [HuggingFace🤗 Demo](https://huggingface.co/spaces/NATSpeech/PortaSpeech)
- [DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism](https://arxiv.org/abs/2105.02446) (DiffSpeech) (AAAI 2022) [Demo page](https://diffsinger.github.io/) | [Project page](https://github.com/MoonInTheRiver/DiffSinger) | [HuggingFace🤗 Demo](https://huggingface.co/spaces/NATSpeech/DiffSpeech)

## Key Features

We implement the following features in this framework:

- A non-autoregressive TTS data processing pipeline based on the [Montreal Forced Aligner](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner);
- An easy-to-use and extensible training and inference framework;
- A simple but efficient implementation of a random-access dataset class (see the sketch below).
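
To make the last point concrete, here is a minimal sketch of how a random-access (indexed) dataset can work: items are pickled back-to-back in a data file, and a separate offset index turns every `__getitem__` into a single seek-and-read. The class and file names here are illustrative assumptions, not this framework's actual API.

```python
# Illustrative sketch of a random-access "indexed" dataset -- names and file
# layout are assumptions, not this framework's actual classes.
import pickle

class IndexedDatasetWriter:
    """Append pickled items to <path>.data and record their byte offsets."""
    def __init__(self, path):
        self.path = path
        self.f = open(path + ".data", "wb")
        self.offsets = [0]

    def add_item(self, item):
        self.f.write(pickle.dumps(item))
        self.offsets.append(self.f.tell())

    def close(self):
        self.f.close()
        with open(self.path + ".idx", "wb") as idx:
            pickle.dump(self.offsets, idx)

class IndexedDataset:
    """Random access to the items above: one seek + one read per item."""
    def __init__(self, path):
        with open(path + ".idx", "rb") as idx:
            self.offsets = pickle.load(idx)
        self.f = open(path + ".data", "rb")

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        self.f.seek(self.offsets[i])
        return pickle.loads(self.f.read(self.offsets[i + 1] - self.offsets[i]))
```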

## Install Dependencies

```bash
## Tested on Linux/Ubuntu 18.04
## Python 3.6+ is required first (Anaconda recommended)

export PYTHONPATH=.
# Create a virtual environment (recommended).
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0 # torch >= 1.9.0 is recommended
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install the forced alignment tool
pip install dgl-cu102 dglgo -f https://data.dgl.ai/wheels/repo.html
```

## Documentation

- [About the framework](./docs/zh/framework.md)
- [Run PortaSpeech](./docs/portaspeech.md)
- [Run DiffSpeech](./docs/diffspeech.md)

## Citation

If this repo is useful for your research and work, please cite the following papers:

- PortaSpeech

```bib
@article{ren2021portaspeech,
title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},
author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},
journal={Advances in Neural Information Processing Systems},
volume={34},
year={2021}
}
```

- DiffSpeech

```bib
@article{liu2021diffsinger,
title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
journal={arXiv preprint arXiv:2105.02446},
volume={2},
year={2021}
}
```

## Acknowledgements

Our code is inspired by the following code and repos:

- [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning)
- [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN)
- [Hifi-GAN](https://github.com/jik876/hifi-gan)
- [espnet](https://github.com/espnet/espnet)
- [Glow-TTS](https://github.com/jaywalnut310/glow-tts)
- [DiffSpeech](https://github.com/MoonInTheRiver/DiffSinger)
104 changes: 104 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
# SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%3CCOLOR%3E.svg)](https://arxiv.org/abs/2204.11792)[![GitHub Stars](https://img.shields.io/github/stars/yerfor/SyntaSpeech)](https://github.com/yerfor/SyntaSpeech)[![downloads](https://img.shields.io/github/downloads/yerfor/SyntaSpeech/total.svg)](https://github.com/yerfor/SyntaSpeech/releases) | [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/yerfor/SyntaSpeech) | [中文文档](README-zh.md)

This repository is the official PyTorch implementation of our IJCAI-2022 [paper](https://arxiv.org/abs/2204.11792), in which we propose **SyntaSpeech** for syntax-aware non-autoregressive Text-to-Speech.

<p align="center">
<br>
<img src="assets/SyntaSpeech.png" width="1000"/>
<br>
</p>

Our SyntaSpeech is built on top of [PortaSpeech](https://github.com/NATSpeech/NATSpeech) (NeurIPS 2021) with three new features:

1. We propose the **Syntactic Graph Builder (Sec. 3.1)** and **Syntactic Graph Encoder (Sec. 3.2)**, which prove to be effective modules for extracting syntactic features that improve the prosody modeling and duration accuracy of the TTS model (see the sketch after this list).
2. We introduce **Multi-Length Adversarial Training (Sec. 3.3)**, which replaces the flow-based post-net in PortaSpeech, speeding up inference and improving audio naturalness.
3. We support three datasets: [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) (single-speaker English), [Biaobei](https://www.data-baker.com/open%20source.html) (single-speaker Chinese), and [LibriTTS](http://www.openslr.org/60) (multi-speaker English).
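
As a rough illustration of the general idea behind encoding a syntactic graph with a graph neural network, here is a minimal, self-contained sketch using the DGL library installed below. The edge list, feature sizes, and the single `GraphConv` layer are illustrative assumptions, not the actual SyntaSpeech modules.

```python
# Minimal sketch of syntax-graph encoding with DGL -- illustrative only,
# NOT the actual Syntactic Graph Builder/Encoder from this repo.
import torch
import dgl
from dgl.nn import GraphConv

# Toy dependency-style tree for a 4-word sentence: edges run head -> dependent.
src = torch.tensor([0, 0, 2])  # head word indices (hypothetical parse)
dst = torch.tensor([1, 2, 3])  # dependent word indices
g = dgl.graph((src, dst), num_nodes=4)
g = dgl.add_self_loop(g)  # avoid zero-in-degree errors in GraphConv

word_feats = torch.randn(4, 16)     # stand-in word-level encoder outputs
conv = GraphConv(16, 16)
syntax_feats = conv(g, word_feats)  # (4, 16) syntax-aware word features
print(syntax_feats.shape)
```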

## Environments

```bash
conda create -n synta python=3.7
source activate synta
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0
pip install -r requirements.txt
# install dgl for the graph neural network; dgl-cu102 supports RTX 2080, dgl-cu113 supports RTX 3090
pip install dgl-cu102 dglgo -f https://data.dgl.ai/wheels/repo.html
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install force alignment tools
```
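
After installation, an optional sanity check (our suggestion, not part of the original instructions) confirms that PyTorch and DGL are importable and that a GPU is visible:

```python
# Optional environment sanity check (not part of the original instructions).
import torch
import dgl

print(torch.__version__, torch.version.cuda)  # expect 1.9.0 with a CUDA build
print(dgl.__version__)                        # expect a dgl-cu102 build
print(torch.cuda.is_available())              # True if a GPU is visible
```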

## Run SyntaSpeech!

**Please follow the steps below to run this repo.**

### 1. Preparation

#### Data Preparation

You can directly use our binarized datasets for LJSpeech and Biaobei. Download them from [this link]() and unzip them into the `data/binary/` folder.

As for LibriTTS, you can download the raw dataset and process it with our `data_gen` modules. Detailed instructions can be found in [docs/prepare_data](docs/prepare_data.md).

#### Vocoder Preparation

We provide pre-trained vocoders for the three datasets: HiFi-GAN for [LJSpeech]() and [Biaobei](), and ParallelWaveGAN for [LibriTTS](). Download and unzip them into the `checkpoints/` folder.

### 2. Training Example

Then you can train SyntaSpeech on the three datasets.

```bash
cd <the root_dir of your SyntaSpeech folder>
export PYTHONPATH=./
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/lj/synta.yaml --exp_name lj_synta --reset # train on LJSpeech
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/biaobei/synta.yaml --exp_name biaobei_synta --reset # train on Biaobei
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/libritts/synta.yaml --exp_name libritts_synta --reset # train on LibriTTS
```

### 3. Tensorboard

```bash
tensorboard --logdir=checkpoints/lj_synta
tensorboard --logdir=checkpoints/biaobei_synta
tensorboard --logdir=checkpoints/libritts_synta
```

### 4. Inference Example

```bash
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/lj/synta.yaml --exp_name lj_synta --reset --infer # inference on LJSpeech
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/biaobei/synta.yaml --exp_name biaobei_synta --reset --infer # inference on Biaobei
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/libritts/synta.yaml --exp_name libritts_synta --reset --infer # inference on LibriTTS
```

## Audio Demos

Audio samples can be found on our [demo page](https://syntaspeech.github.io/).

## Citation

```bib
@article{ye2022syntaspeech,
title={SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech},
author={Ye, Zhenhui and Zhao, Zhou and Ren, Yi and Wu, Fei},
journal={arXiv preprint arXiv:2204.11792},
year={2022}
}
```

## Acknowledgements

**Our codes are based on the following repos:**

* [NATSpeech](https://github.com/NATSpeech/NATSpeech)
* [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning)
* [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN)
* [HifiGAN](https://github.com/jik876/hifi-gan)
* [espnet](https://github.com/espnet/espnet)
* [Glow-TTS](https://github.com/jaywalnut310/glow-tts)
* [DiffSpeech](https://github.com/MoonInTheRiver/DiffSinger)
Binary file added assets/SyntaSpeech.png
Empty file added checkpoints/.gitkeep
Empty file.
Empty file added data/.gitkeep
Empty file.