ESP-SR Release v2.0.0 · espressif/esp-sr

We are excited to announce the release of ESP-SR V2.0, which brings significant improvements and new features to our Audio Front-end Framework.

Pipeline Optimization: The pipeline for AEC (Acoustic Echo Cancellation), BSS (Blind Source Separation), NS (Noise Suppression), VAD (Voice Activity Detection), and WakeNet has been completely restructured to improve efficiency and performance.
Breaking Changes: AFE V2.0 introduces breaking changes and is not backward compatible with V1.0. Please refer to the Migration Guide for detailed instructions on upgrading.

Enhanced VAD Algorithm: A new VADNet model has been introduced, trained on nearly 15,000 hours of data. This model significantly outperforms the WebRTC VAD in filtering out noise.
Customizable Settings: New settings such as vad_min_noise_ms, vad_min_speech_ms, and vad_mode have been added to allow users to fine-tune VAD behavior for various real-world scenarios.
VAD Cache: A vad_cache feature has been added to address potential audio data truncation caused by the first-frame trigger delay in VAD.
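
To illustrate how minimum-duration settings and a cache can interact, here is a minimal Python sketch. It is not the ESP-SR implementation: the class `VadSmoother`, the parameters `frame_ms` and `cache_frames`, and the exact semantics given to `vad_min_speech_ms` and `vad_min_noise_ms` are assumptions made for illustration, and `vad_mode` is not modeled. The idea is that speech is only reported after it has persisted for a minimum time, which delays the trigger, so a small buffer of recent frames is replayed at the trigger to avoid cutting off the start of the utterance.

```python
from collections import deque


class VadSmoother:
    """Hypothetical frame-level VAD smoother with a pre-trigger cache."""

    def __init__(self, frame_ms=32, vad_min_speech_ms=64,
                 vad_min_noise_ms=96, cache_frames=3):
        self.frame_ms = frame_ms                    # assumed frame length
        self.vad_min_speech_ms = vad_min_speech_ms  # speech needed to trigger
        self.vad_min_noise_ms = vad_min_noise_ms    # silence needed to release
        self.in_speech = False
        self._run_ms = 0                            # length of the current run
        self._cache = deque(maxlen=cache_frames)    # most recent idle frames

    def process(self, frame, raw_is_speech):
        """Feed one frame and its raw VAD decision; return frames to emit."""
        if not self.in_speech:
            # While idle, keep the latest frames so they can be replayed.
            self._cache.append(frame)
            self._run_ms = self._run_ms + self.frame_ms if raw_is_speech else 0
            if self._run_ms >= self.vad_min_speech_ms:
                self.in_speech = True
                self._run_ms = 0
                emitted = list(self._cache)  # includes frames before the trigger
                self._cache.clear()
                return emitted
            return []
        # While in speech, count consecutive non-speech frames.
        self._run_ms = 0 if raw_is_speech else self._run_ms + self.frame_ms
        if self._run_ms >= self.vad_min_noise_ms:
            self.in_speech = False
            self._run_ms = 0
        return [frame]


if __name__ == "__main__":
    vad = VadSmoother()
    raw = [0, 1, 0, 1, 1, 1, 0, 0, 0, 0]  # raw per-frame decisions
    out = []
    for index, decision in enumerate(raw):
        out.extend(vad.process(index, bool(decision)))
    print(out)
```

In this sketch the isolated speech frame at index 1 never triggers, while the run starting at index 3 triggers one frame later. The cache replays the frames just before the trigger, so the first speech frame of the run is not lost.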

Improved TTS: A more powerful TTS (Text-to-Speech) model has been integrated, enhancing the performance of wake word training.
Improved Accuracy: The updated system now achieves accuracy comparable to models trained with human samples, with performance ranging from 95% to 98%.

We hope these updates bring significant improvements to your projects.
