We are excited to announce the release of ESP-SR V2.0, which brings significant improvements and new features to our Audio Front-end Framework.
Major Updates
AFE V2.0
- Pipeline Optimization: The pipeline for AEC (Acoustic Echo Cancellation), BSS (Blind Source Separation), NS (Noise Suppression), VAD (Voice Activity Detection), and WakeNet has been completely restructured to improve efficiency and performance.
- Breaking Changes: AFE V2.0 introduces breaking changes and is not backward compatible with V1.0. Please refer to the Migration Guide for detailed instructions on upgrading.
New VADNet Model
- Enhanced VAD Algorithm: A new VADNet model has been introduced, trained on nearly 15,000 hours of data. This model significantly outperforms the WebRTC VAD in filtering out noise.
- Customizable Settings: New settings, including vad_min_noise_ms, vad_min_speech_ms, and vad_mode, have been added so that users can fine-tune VAD behavior for different real-world scenarios.
- VAD Cache: A vad_cache feature has been added to address potential truncation of audio data caused by the first-frame trigger delay in VAD.
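This announcement does not say how vad_min_speech_ms and vad_min_noise_ms are applied internally. As a rough illustration of what minimum-duration settings of this kind typically do, the C sketch below debounces a raw per-frame speech/noise decision. The smoothed state switches to speech only after speech has persisted for a minimum time, and switches back to noise only after noise has persisted for a minimum time. All names (vad_smoother_t, vad_smoother_update, the frame_ms field) are hypothetical, vad_mode is not modeled, and this is not the ESP-SR implementation or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debounce/hangover sketch for a frame-based VAD.
 * Generic illustration only; NOT the ESP-SR implementation or API. */
typedef enum { VAD_STATE_NOISE = 0, VAD_STATE_SPEECH = 1 } vad_state_t;

typedef struct {
    int frame_ms;       /* duration of one audio frame in milliseconds (assumed) */
    int min_speech_ms;  /* raw speech must persist this long to switch to SPEECH */
    int min_noise_ms;   /* raw noise must persist this long to switch back to NOISE */
    vad_state_t state;  /* current smoothed decision */
    int pending_ms;     /* how long the raw decision has disagreed with state */
} vad_smoother_t;

void vad_smoother_init(vad_smoother_t *s, int frame_ms,
                       int min_speech_ms, int min_noise_ms)
{
    assert(s != NULL && frame_ms > 0);
    s->frame_ms = frame_ms;
    s->min_speech_ms = min_speech_ms;
    s->min_noise_ms = min_noise_ms;
    s->state = VAD_STATE_NOISE;
    s->pending_ms = 0;
}

/* Feed one raw per-frame decision (true = speech); returns the smoothed state. */
vad_state_t vad_smoother_update(vad_smoother_t *s, bool raw_is_speech)
{
    assert(s != NULL);
    vad_state_t raw = raw_is_speech ? VAD_STATE_SPEECH : VAD_STATE_NOISE;

    if (raw == s->state) {
        /* Raw decision agrees with the current state: cancel any pending switch. */
        s->pending_ms = 0;
        return s->state;
    }

    /* Raw decision disagrees: accumulate time until the threshold is reached. */
    s->pending_ms += s->frame_ms;
    int needed = (raw == VAD_STATE_SPEECH) ? s->min_speech_ms : s->min_noise_ms;
    if (s->pending_ms >= needed) {
        s->state = raw;
        s->pending_ms = 0;
    }
    return s->state;
}
```

For example, with 32 ms frames and a 128 ms minimum speech duration, speech must be reported for four consecutive frames before the smoothed state changes, so a short click or bump is ignored. The 32 ms frame length is an arbitrary choice for the example. The trade-off is that larger minimums reject more noise but delay the decision.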
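A VAD can only report speech after it has seen some of it, so the first frames of an utterance have already gone by when the trigger fires. A common remedy, and plausibly the idea behind vad_cache, is a pre-roll buffer: keep the most recent audio in a ring buffer and hand it to the consumer when speech is detected, so the start of the utterance is not lost. The C sketch below is a generic ring buffer built on that assumption. preroll_cache_t, its functions, and the 8192-sample capacity are hypothetical and not part of ESP-SR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-roll cache: a ring buffer of the most recent samples,
 * so audio captured before the VAD trigger is not lost.
 * Generic sketch; NOT the ESP-SR vad_cache implementation. */
#define PREROLL_CAPACITY 8192 /* samples; about 0.5 s at 16 kHz (assumed) */

typedef struct {
    int16_t data[PREROLL_CAPACITY];
    size_t head;  /* index where the next sample will be written */
    size_t count; /* number of valid samples stored (<= PREROLL_CAPACITY) */
} preroll_cache_t;

void preroll_reset(preroll_cache_t *c)
{
    assert(c != NULL);
    c->head = 0;
    c->count = 0;
}

/* Append n samples, overwriting the oldest ones once the buffer is full. */
void preroll_push(preroll_cache_t *c, const int16_t *samples, size_t n)
{
    assert(c != NULL && (samples != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        c->data[c->head] = samples[i];
        c->head = (c->head + 1) % PREROLL_CAPACITY;
        if (c->count < PREROLL_CAPACITY) {
            c->count++;
        }
    }
}

/* Copy cached samples into out, oldest first, and empty the cache.
 * If more than max_out samples are cached, only the newest max_out are copied.
 * Returns the number of samples copied. */
size_t preroll_drain(preroll_cache_t *c, int16_t *out, size_t max_out)
{
    assert(c != NULL && (out != NULL || max_out == 0));
    size_t n = c->count < max_out ? c->count : max_out;
    /* Start n samples behind head so the newest n samples come out in order. */
    size_t start = (c->head + PREROLL_CAPACITY - n) % PREROLL_CAPACITY;
    for (size_t i = 0; i < n; i++) {
        out[i] = c->data[(start + i) % PREROLL_CAPACITY];
    }
    c->count = 0; /* the cached audio is consumed once handed over */
    return n;
}
```

In use, every incoming frame would be pushed into the cache; on the noise-to-speech transition, preroll_drain() returns the buffered samples (oldest first), which are forwarded ahead of the live audio. The capacity only needs to cover the worst-case trigger delay, which keeps the RAM cost bounded.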
Training Wake Words by TTS Sample V2.0
- Improved TTS: A more powerful TTS (Text-to-Speech) model has been integrated, enhancing the performance of wake word training.
- Improved Accuracy: The updated system now achieves accuracy comparable to that of models trained on real human recordings, ranging from 95% to 98%.
Documentation and Resources
- ESP-SR Documentation
- Migration Guide: Migration from V1.* to V2.*
- Audio Front-end Framework
- VADNet Model: Voice Activity Detection Model
- Wake Word Training: Wake Word Training by TTS Sample V2.0
- Examples: esp-skainet/examples
We hope these updates bring significant improvements to your projects.