Audio & Speech Generation Paper List

A curated list of papers and resources related to speech and audio generation. This project is just getting started and still needs a lot of work, so feel free to contribute!

Paper

Text to Speech (TTS)

Voice Conversion (VC)

Audio Generation and Text to Audio (TTA)

Note that many audio generation models can also generate speech.

Singing Voice Synthesis (SVS)

Speech to Speech Translation (S2ST/STST)

Streaming & Simultaneous Translation

Speech Translation Dataset

Text to Music (TTM)

Large Language Model (LLM)

Software/Libraries

Speech Synthesis

  • BERT-VITS2: A TTS tool that performs well on Chinese speech synthesis.
  • Amphion: An open-source audio, music, and speech generation toolkit. The goal of Amphion is to offer a platform for studying the conversion of any input into audio (TTS, SVS, VC, SVC, TTA, TTM). [Paper, Video (Chinese)]
  • SpeechBrain: A PyTorch-based speech toolkit.
  • ESPnet: An end-to-end speech processing toolkit.