ESP-SR Release v2.0.0

@feizi feizi released this 20 Feb 03:25

We are excited to announce the release of ESP-SR V2.0, which brings significant improvements and new features to our Audio Front-end Framework.

Major Updates

AFE V2.0

  • Pipeline Optimization: The pipeline for AEC (Acoustic Echo Cancellation), BSS (Blind Source Separation), NS (Noise Suppression), VAD (Voice Activity Detection), and WakeNet has been completely restructured to improve efficiency and performance.
  • Breaking Changes: AFE V2.0 introduces breaking changes and is not backward compatible with V1.0. Please refer to the Migration Guide for detailed instructions on upgrading.

New VADNet Model

  • Enhanced VAD Algorithm: A new VADNet model has been introduced, trained on nearly 15,000 hours of data. This model significantly outperforms the WebRTC VAD in filtering out noise.
  • Customizable Settings: New settings such as vad_min_noise_ms, vad_min_speech_ms, and vad_mode have been added to allow users to fine-tune VAD behavior for various real-world scenarios.
  • VAD Cache: A vad_cache feature has been added to address potential audio data truncation caused by the first-frame trigger delay in VAD.
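
To make the effect of these settings more concrete, here is a minimal sketch of one plausible way minimum-duration thresholds can smooth raw per-frame VAD decisions, and why a cache helps with the trigger delay. It is not the ESP-SR implementation or API: every name in it (sketch_vad_t, sketch_vad_init, sketch_vad_feed, replay_frames) is hypothetical, and the exact semantics of vad_min_speech_ms, vad_min_noise_ms, and vad_cache in ESP-SR may differ, so check the VADNet documentation for the real behavior.

```c
#include <stdbool.h>

/* Hypothetical types and names for illustration only; not the ESP-SR API. */
typedef enum {
    SKETCH_VAD_SILENCE = 0,
    SKETCH_VAD_SPEECH = 1
} sketch_vad_state_t;

typedef struct {
    int frame_ms;             /* duration of one audio frame */
    int min_speech_ms;        /* speech must persist this long before SPEECH is reported */
    int min_noise_ms;         /* non-speech must persist this long before SILENCE is reported */
    sketch_vad_state_t state; /* current smoothed state */
    int run_ms;               /* length of the current run of frames that disagree with state */
} sketch_vad_t;

void sketch_vad_init(sketch_vad_t *v, int frame_ms, int min_speech_ms, int min_noise_ms)
{
    v->frame_ms = frame_ms;
    v->min_speech_ms = min_speech_ms;
    v->min_noise_ms = min_noise_ms;
    v->state = SKETCH_VAD_SILENCE;
    v->run_ms = 0;
}

/*
 * Feed one raw per-frame decision and return the smoothed state.
 * On a SILENCE -> SPEECH transition, *replay_frames is set to the number of
 * frames (including the current one) consumed while waiting for the
 * threshold. A pre-roll cache holding at least that many frames lets the
 * caller recover the start of the utterance. Otherwise it is set to 0.
 */
sketch_vad_state_t sketch_vad_feed(sketch_vad_t *v, bool raw_speech, int *replay_frames)
{
    *replay_frames = 0;

    bool in_speech = (v->state == SKETCH_VAD_SPEECH);
    if (raw_speech == in_speech) {
        /* The frame agrees with the current state: reset the pending run. */
        v->run_ms = 0;
        return v->state;
    }

    /* The frame disagrees: extend the run and compare with the threshold. */
    v->run_ms += v->frame_ms;
    int needed_ms = in_speech ? v->min_noise_ms : v->min_speech_ms;
    if (v->run_ms >= needed_ms) {
        if (!in_speech) {
            v->state = SKETCH_VAD_SPEECH;
            *replay_frames = v->run_ms / v->frame_ms;
        } else {
            v->state = SKETCH_VAD_SILENCE;
        }
        v->run_ms = 0;
    }
    return v->state;
}
```

In this sketch, a larger min_speech_ms rejects short noise bursts at the cost of a longer trigger delay, which is the kind of delay a cache of recent frames compensates for. A larger min_noise_ms keeps brief pauses inside an utterance from ending the speech segment.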

Training Wake Words by TTS Sample V2.0

  • Improved TTS: A more powerful TTS (Text-to-Speech) model has been integrated, improving the quality of the synthesized samples used for wake word training.
  • Improved Accuracy: The updated system now achieves accuracy comparable to models trained with human samples, with performance ranging from 95% to 98%.

Documentation and Resources

ESP-SR Documentation: ESP-SR Documentation
Migration Guide: Migration from V1.* to V2.*
Audio Front-end Framework: Audio Front-end Framework
VADNet Model: Voice Activity Detection Model
Wake Word Training: Wake Word Training by TTS Sample V2.0
Examples: esp-skainet/examples

We hope these updates bring significant improvements to your projects.