-
@cyanbx thanks for this! I'll review the fine transformer code today; I'm sure it is something minor.
-
@cyanbx Hi, I am now also using the Encodec codec (Facebook's pretrained version) to train a coarse2fine model. My problem is that training works on a single GPU, but with multiple GPUs I cannot start training. The details are in #128.
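For context, the usual way Facebook's pretrained codec is wired into audiolm-pytorch is through `EncodecWrapper`, which can be passed to the trainers in place of a trained SoundStream. A minimal sketch, assuming a recent version of the library:

```python
from audiolm_pytorch import EncodecWrapper

# Facebook's pretrained Encodec, wrapped so it exposes the interface
# the audiolm-pytorch trainers expect from a codec (instead of SoundStream)
encodec = EncodecWrapper()

# the wrapped codec is then handed to CoarseTransformerTrainer / FineTransformerTrainer
# via their `codec =` argument (older versions used `soundstream =`)
```

The trainers are built on top of HuggingFace Accelerate, so a multi-GPU run is normally started with `accelerate config` followed by `accelerate launch train.py` rather than plain `python train.py`.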
-
@cyanbx Hi, could you share your training details for the CoarseTransformer and some audio samples generated by it? I'm trying to train a CoarseTransformer on the 'dev-clean' set of LibriSpeech and I only get noise, even if I use a training sample as input.
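For reference, the CoarseTransformer pipeline being discussed here roughly follows the audiolm-pytorch README recipe, sketched below; the checkpoint paths, dataset path, and hyperparameters are placeholders, not necessarily the settings actually used in this thread:

```python
from audiolm_pytorch import HubertWithKmeans, EncodecWrapper, CoarseTransformer, CoarseTransformerTrainer

# semantic tokens come from HuBERT + k-means (placeholder checkpoint paths)
wav2vec = HubertWithKmeans(
    checkpoint_path = './hubert/hubert_base_ls960.pt',
    kmeans_path = './hubert/hubert_base_ls960_L9_km500.bin'
)

# acoustic tokens come from the codec (pretrained Encodec here, or a trained SoundStream)
codec = EncodecWrapper()

# predicts coarse acoustic tokens conditioned on the semantic tokens
coarse_transformer = CoarseTransformer(
    num_semantic_tokens = wav2vec.codebook_size,
    codebook_size = 1024,
    num_coarse_quantizers = 3,
    dim = 512,
    depth = 6
)

trainer = CoarseTransformerTrainer(
    transformer = coarse_transformer,
    codec = codec,                          # `soundstream =` in older versions
    wav2vec = wav2vec,
    folder = './LibriSpeech/dev-clean',     # placeholder path to the training audio
    batch_size = 1,
    data_max_length = 320 * 32,
    num_train_steps = 1_000_000
)

trainer.train()
```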
-
I have successfully trained a CoarseTransformer which can generate intelligible speech from semantic tokens. However, although the FineTransformer seems to have a similar architecture and training pipeline, I can't reconstruct high-quality audio with it. Its output audio doesn't seem to be any better than audio reconstructed from the coarse tokens alone, and the spectrograms look blurrier. Is there any advice on improving the training or inference process? (A rough sketch of the training setup is at the end of this post.)
My FineTransformer hparams:
Here is a comparison of the spectrograms.
Reconstructed from COARSE TOKENS:
Reconstructed from COARSE TOKENS and inferred FINE TOKENS:
Reconstructed from GROUND TRUTH TOKENS:
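For reference, the training setup described above follows the standard FineTransformer / FineTransformerTrainer pattern from the audiolm-pytorch README; the sketch below uses placeholder values rather than my exact hparams:

```python
from audiolm_pytorch import EncodecWrapper, FineTransformer, FineTransformerTrainer

# codec that produced the coarse + fine acoustic tokens
codec = EncodecWrapper()

# predicts the fine codebooks given the coarse ones (placeholder hyperparameters)
fine_transformer = FineTransformer(
    num_coarse_quantizers = 3,
    num_fine_quantizers = 5,
    codebook_size = 1024,
    dim = 512,
    depth = 6
)

trainer = FineTransformerTrainer(
    transformer = fine_transformer,
    codec = codec,                          # `soundstream =` in older versions
    folder = '/path/to/audio/files',        # placeholder dataset path
    batch_size = 1,
    data_max_length = 320 * 32,
    num_train_steps = 1_000_000
)

trainer.train()
```

At inference time the fine tokens are sampled from this model conditioned on the coarse tokens, and the codec then decodes the concatenated coarse + fine codebooks back to a waveform, which is what the spectrograms above compare.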