Commit 0ffda74: update link for vocoders

yerfor committed May 13, 2022 · 1 parent 8e4249e
Showing 5 changed files with 110 additions and 26 deletions.
19 changes: 10 additions & 9 deletions README-zh.md
@@ -18,9 +18,9 @@

## Environments

-```
+```bash
conda create -n synta python=3.7
-source activate synta
+conda activate synta
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0
@@ -29,7 +29,6 @@ pip install -r requirements.txt
pip install dgl-cu102 dglgo -f https://data.dgl.ai/wheels/repo.html
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install force alignment tools
```

## Run SyntaSpeech!
@@ -40,19 +39,19 @@ bash mfa_usr/install_mfa.sh # install force alignment tools

#### Dataset Preparation

-You can directly use our processed [LJSpeech](https://drive.google.com/file/d/1WfErAxKqMluQU3vupWS6VB6NdehXwCKM/view?usp=sharing) and [Biaobei](https://drive.google.com/file/d/1-ApEbBrW5kfF0jM18EmW7DCsll-c1ROp/view?usp=sharing) datasets. Download them from the given Google Drive links and unzip them into the `data/binary/` folder.
+You can directly use our processed [LJSpeech](https://drive.google.com/file/d/1WfErAxKqMluQU3vupWS6VB6NdehXwCKM/view?usp=sharing) and [Biaobei](https://drive.google.com/file/d/1n_7NaGCiyieG5TTsPznI1tpHE9q3x9yt/view?usp=sharing) datasets. Download them from the given Google Drive links and unzip them into the `data/binary/` folder.

As for LibriTTS, you can download the raw dataset and process it with our `data_gen` modules. Detailed instructions can be found in [docs/prepare_data](docs/prepare_data.md).
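
As a rough sketch of that processing step (the script path and config name below follow the NATSpeech-style `data_gen` layout and are assumptions, not commands taken from this commit; `docs/prepare_data.md` is authoritative):

```bash
export PYTHONPATH=./
# assumed script and config paths, following the NATSpeech-style data_gen layout
python data_gen/tts/runs/align_and_binarize.py --config egs/tts/libritts/synta.yaml
```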

#### Vocoder Preparation

-We provide pre-trained vocoder models for the three datasets. Specifically, HiFi-GAN for [LJSpeech]() and [Biaobei](), and ParallelWaveGAN for [LibriTTS](). Download and unzip them into the `checkpoints/` folder.
+We provide pre-trained vocoder models for the three datasets. Specifically, HiFi-GAN for [LJSpeech](https://drive.google.com/file/d/1D8ABD4fa7TK6t_ymzzhtxsWHPhg7OXcG/view?usp=sharing) and [Biaobei](https://drive.google.com/file/d/1onZbPA7rjR1ibmyV1Z-7G22j2Nekiic5/view?usp=sharing), and ParallelWaveGAN for [LibriTTS](https://drive.google.com/file/d/1AziBns4R6UDtrAWaIBRm5hWg9io38EWh/view?usp=sharing). Download and unzip them into the `checkpoints/` folder.

### 2. Start Training!

Then you can train SyntaSpeech on the three datasets.

-```
+```bash
cd <the root_dir of your SyntaSpeech folder>
export PYTHONPATH=./
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/lj/synta.yaml --exp_name lj_synta --reset # training in LJSpeech
@@ -62,23 +61,25 @@ CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/biaobei/synta.yaml -

### 3. Tensorboard

-```
+```bash
tensorboard --logdir=checkpoints/lj_synta
tensorboard --logdir=checkpoints/biaobei_synta
tensorboard --logdir=checkpoints/libritts_synta
```

### 4. Model Inference

-```
+```bash
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/lj/synta.yaml --exp_name lj_synta --reset --infer # inference in LJSpeech
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/biaobei/synta.yaml --exp_name biaobei_synta --reset --infer # inference in Biaobei
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/libritts/synta.yaml --exp_name libritts_synta --reset --infer # inference in LibriTTS
```

## Audio Demos

-Audio samples can be found on our [demo page](https://syntaspeech.github.io/).
+Audio samples in the paper can be found on our [demo page](https://syntaspeech.github.io/).

We also provide a [HuggingFace demo page](https://huggingface.co/spaces/NATSpeech/PortaSpeech) for LJSpeech. Feel free to try your own sentences there!

## Citation

17 changes: 9 additions & 8 deletions README.md
@@ -18,9 +18,9 @@ Our SyntaSpeech is built on the basis of [PortaSpeech](https://github.com/NATSp

## Environments

-```
+```bash
conda create -n synta python=3.7
-source activate synta
+conda activate synta
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0
@@ -29,7 +29,6 @@ pip install -r requirements.txt
pip install dgl-cu102 dglgo -f https://data.dgl.ai/wheels/repo.html
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install force alignment tools
```

## Run SyntaSpeech!
@@ -46,13 +45,13 @@ As for LibriTTS, you can download the raw datasets and process them with our `da

#### Vocoder Preparation

-We provide pre-trained vocoder models for the three datasets. Specifically, HiFi-GAN for [LJSpeech]() and [Biaobei](), and ParallelWaveGAN for [LibriTTS](). Download and unzip them into the `checkpoints/` folder.
+We provide pre-trained vocoder models for the three datasets. Specifically, HiFi-GAN for [LJSpeech](https://drive.google.com/file/d/1D8ABD4fa7TK6t_ymzzhtxsWHPhg7OXcG/view?usp=sharing) and [Biaobei](https://drive.google.com/file/d/1onZbPA7rjR1ibmyV1Z-7G22j2Nekiic5/view?usp=sharing), and ParallelWaveGAN for [LibriTTS](https://drive.google.com/file/d/1AziBns4R6UDtrAWaIBRm5hWg9io38EWh/view?usp=sharing). Download and unzip them into the `checkpoints/` folder.
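
For example, a minimal sketch of the unpacking step (the archive names are placeholders; use whatever the Google Drive links download as):

```bash
mkdir -p checkpoints
# placeholder archive names standing in for the Google Drive downloads
unzip hifi_lj.zip -d checkpoints/
unzip hifi_biaobei.zip -d checkpoints/
unzip pwg_libritts.zip -d checkpoints/
```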

### 2. Training Example

Then you can train SyntaSpeech on the three datasets.

-```
+```bash
cd <the root_dir of your SyntaSpeech folder>
export PYTHONPATH=./
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/lj/synta.yaml --exp_name lj_synta --reset # training in LJSpeech
@@ -62,23 +61,25 @@ CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/biaobei/synta.yaml -

### 3. Tensorboard

-```
+```bash
tensorboard --logdir=checkpoints/lj_synta
tensorboard --logdir=checkpoints/biaobei_synta
tensorboard --logdir=checkpoints/libritts_synta
```

### 4. Inference Example

-```
+```bash
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/lj/synta.yaml --exp_name lj_synta --reset --infer # inference in LJSpeech
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/biaobei/synta.yaml --exp_name biaobei_synta --reset --infer # inference in Biaobei
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/libritts/synta.yaml --exp_name libritts_synta --reset --infer # inference in LibriTTS
```

## Audio Demos

-Audio samples can be found on our [demo page](https://syntaspeech.github.io/).
+Audio samples in the paper can be found on our [demo page](https://syntaspeech.github.io/).

We also provide a [HuggingFace demo page](https://huggingface.co/spaces/NATSpeech/PortaSpeech) for LJSpeech. Feel free to try your own sentences there!

## Citation

14 changes: 10 additions & 4 deletions inference/tts/base_tts_infer.py
@@ -62,7 +62,8 @@ def preprocess_input(self, inp):
        ph_token = self.ph_encoder.encode(ph)
        spk_id = self.spk_map[spk_name]
        item = {'item_name': item_name, 'text': txt, 'ph': ph, 'spk_id': spk_id,
-                'ph_token': ph_token, 'word_token': word_token, 'ph2word': ph2word}
+                'ph_token': ph_token, 'word_token': word_token, 'ph2word': ph2word,
+                'ph_words': ph_gb_word, 'words': word}
        item['ph_len'] = len(item['ph_token'])
        return item

@@ -105,9 +106,14 @@ def example_run(cls):
        from utils.audio.io import save_wav

        set_hparams()
-        inp = {
-            'text': 'the invention of movable metal letters in the middle of the fifteenth century may justly be considered as the invention of the art of printing.'
-        }
+        if hp['ds_name'] in ['lj', 'libritts']:
+            inp = {
+                'text': 'the invention of movable metal letters in the middle of the fifteenth century may justly be considered as the invention of the art of printing.'
+            }
+        elif hp['ds_name'] in ['biaobei']:
+            inp = {
+                'text': '如果我想你三遍,天上乌云就散一片。'
+            }
        infer_ins = cls(hp)
        out = infer_ins.infer_once(inp)
        os.makedirs('infer_out', exist_ok=True)
10 changes: 5 additions & 5 deletions inference/tts/gradio/gradio_settings.yaml
@@ -1,12 +1,12 @@
-title: 'NATSpeech/PortaSpeech'
+title: 'yerfor/SyntaSpeech'
 description: |
-  Gradio demo for NATSpeech/PortaSpeech. To use it, simply add your audio, or click one of the examples to load them. Note: This space is running on CPU, inference times will be higher.
+  Gradio demo for yerfor/SyntaSpeech. To use it, simply add your audio, or click one of the examples to load them. Note: This space is running on CPU, inference times will be higher.
 article: |
-  Link to <a href='https://github.com/NATSpeech/NATSpeech/blob/main/docs/portaspeech.md' style='color:blue;' target='_blank\'>Github REPO</a>
+  Link to <a href='https://github.com/yerfor/SyntaSpeech' style='color:blue;' target='_blank\'>Github REPO</a>
 example_inputs:
   - |-
     the invention of movable metal letters in the middle of the fifteenth century may justly be considered as the invention of the art of printing.
   - |-
     produced the block books, which were the immediate predecessors of the true printed book,
-inference_cls: inference.tts.ps_flow.PortaSpeechFlowInfer
-exp_name: ps_normal_exp
+inference_cls: inference.tts.synta.SyntaSpeechInfer
+exp_name: lj_synta
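
These settings are presumably consumed by a NATSpeech-style Gradio launcher; a hedged launch sketch, assuming the `inference/tts/gradio/infer.py` entry point carried over from NATSpeech:

```bash
# assumed entry point, inherited from the NATSpeech gradio layout
python inference/tts/gradio/infer.py
```
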
76 changes: 76 additions & 0 deletions inference/tts/synta.py
@@ -0,0 +1,76 @@
import torch

from inference.tts.base_tts_infer import BaseTTSInfer
from modules.tts.syntaspeech.syntaspeech import SyntaSpeech
from modules.tts.syntaspeech.syntactic_graph_buider import Sentence2GraphParser
from utils.commons.ckpt_utils import load_ckpt
from utils.commons.hparams import hparams


class SyntaSpeechInfer(BaseTTSInfer):
    def __init__(self, hparams, device=None):
        super().__init__(hparams, device)
        # pick the syntactic parser that matches the dataset's language
        if hparams['ds_name'] in ['biaobei']:
            self.syntactic_graph_builder = Sentence2GraphParser(language='zh')
        elif hparams['ds_name'] in ['lj', 'ljspeech', 'libritts']:  # accept both 'lj' and 'ljspeech', matching example_run in base_tts_infer
            self.syntactic_graph_builder = Sentence2GraphParser(language='en')

    def build_model(self):
        ph_dict_size = len(self.ph_encoder)
        word_dict_size = len(self.word_encoder)
        model = SyntaSpeech(ph_dict_size, word_dict_size, self.hparams)
        load_ckpt(model, hparams['work_dir'], 'model')
        model.to(self.device)
        with torch.no_grad():
            model.store_inverse_all()  # pre-compute inverses of the flow's invertible layers for faster inference
        model.eval()
        return model

    def input_to_batch(self, item):
        item_names = [item['item_name']]
        text = [item['text']]
        ph = [item['ph']]
        txt_tokens = torch.LongTensor(item['ph_token'])[None, :].to(self.device)
        txt_lengths = torch.LongTensor([txt_tokens.shape[1]]).to(self.device)
        word_tokens = torch.LongTensor(item['word_token'])[None, :].to(self.device)
        word_lengths = torch.LongTensor([word_tokens.shape[1]]).to(self.device)
        ph2word = torch.LongTensor(item['ph2word'])[None, :].to(self.device)
        # wrap the int id in a list so LongTensor builds a 1-element tensor
        spk_ids = torch.LongTensor([item['spk_id']])[None, :].to(self.device)
        # build the syntactic graph for the input sentence
        dgl_graph, etypes = self.syntactic_graph_builder.parse(
            item['text'], words=item['words'].split(" "), ph_words=item['ph_words'].split(" "))
        dgl_graph = dgl_graph.to(self.device)
        etypes = etypes.to(self.device)
        batch = {
            'item_name': item_names,
            'text': text,
            'ph': ph,
            'txt_tokens': txt_tokens,
            'txt_lengths': txt_lengths,
            'word_tokens': word_tokens,
            'word_lengths': word_lengths,
            'ph2word': ph2word,
            'spk_ids': spk_ids,
            'graph_lst': [dgl_graph],
            'etypes_lst': [etypes]
        }
        return batch

    def forward_model(self, inp):
        sample = self.input_to_batch(inp)
        with torch.no_grad():
            output = self.model(
                sample['txt_tokens'],
                sample['word_tokens'],
                ph2word=sample['ph2word'],
                word_len=sample['word_lengths'].max(),
                infer=True,
                forward_post_glow=True,
                spk_id=sample.get('spk_ids'),
                graph_lst=sample['graph_lst'],
                etypes_lst=sample['etypes_lst']
            )
        mel_out = output['mel_out']
        wav_out = self.run_vocoder(mel_out)  # vocoder: mel-spectrogram -> waveform
        wav_out = wav_out.cpu().numpy()
        return wav_out[0]


if __name__ == '__main__':
    SyntaSpeechInfer.example_run()
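
Since `example_run` calls `set_hparams()`, which parses `--config` and `--exp_name` from the command line, the new class can be exercised directly; a hedged invocation sketch mirroring the README's commands (the exact flags are assumptions):

```bash
# assumed flags, mirroring the README's inference commands; output lands in infer_out/
CUDA_VISIBLE_DEVICES=0 python inference/tts/synta.py --config egs/tts/lj/synta.yaml --exp_name lj_synta
```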
