-
Yes, I actually just looked into Willow and started a discussion. I'm all for different software being deployable on this hardware, and fortunately we both happened to choose the ESP32-S3. I'm happy to support whoever is willing to help, as I'd love to test the Willow Inference Server on this hardware.

Re: wake words, yes, happy to entertain that. Creating one through Espressif is a bit of a black box (you send them audio files and wait a few weeks, and I'm not sure how they'd prioritize hobbyist users vs. production products). I think there are a few options worth trying if you'd like. My thinking, though, is that VAD can be used to detect speech, and a wake word, or a more loosely defined "wake phrase" or "wake embedding/intent", can be deployed more easily and flexibly on the local "server". That could use a classifier, or even speaker diarization to identify the user, with models from Hugging Face etc.
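To make the server-side "wake phrase" idea a bit more concrete, here's a minimal sketch, assuming the server already has a text transcript from its speech recognizer (for example, the Willow Inference Server). The phrase list, `detect_wake_phrase`, and the 0.8 threshold are all hypothetical. It swaps in plain fuzzy string matching from Python's `difflib` instead of a trained classifier or diarization model, so it only illustrates the gating logic, not a production detector.

```python
import difflib
import re

# Hypothetical list of accepted wake phrases; on the server these can be anything.
WAKE_PHRASES = ["hey assistant", "okay house", "computer"]


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so 'Hey, Assistant!' matches 'hey assistant'."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()


def detect_wake_phrase(transcript: str, threshold: float = 0.8):
    """Return (phrase, command) if the transcript starts with a wake phrase, else None.

    The first N words of the transcript (N = number of words in the phrase) are
    compared with each wake phrase using difflib's similarity ratio, so small
    transcription errors ("assistent") still match.
    """
    words = normalize(transcript).split()
    best = None  # (score, phrase, remaining command text)
    for phrase in WAKE_PHRASES:
        n = len(phrase.split())
        if len(words) < n:
            continue
        head = " ".join(words[:n])
        score = difflib.SequenceMatcher(None, head, phrase).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, phrase, " ".join(words[n:]))
    if best is None:
        return None
    return best[1], best[2]
```

For example, `detect_wake_phrase("Hey, Assistant! turn on the lights")` returns `("hey assistant", "turn on the lights")`, while a transcript without a wake phrase returns `None` and can simply be ignored. Because the phrase list lives on the server, changing it doesn't require reflashing or retraining anything on the device.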
-
Awesome. Yeah, I was thinking it'd be more efficient not to stream audio over the network until the wake word is detected, but if you use VAD on the ESP, then audio wouldn't be streamed 24/7 anyway. And if it's all self-hosted, privacy becomes less of a concern. I do see the benefit of not having to say "Hey Assistant" or whatever, although that might come at the cost of it responding to commands unintentionally. I agree that keeping as much as possible on the server simplifies things and obviously gives you the most flexibility in terms of compute. I'm going to try out ESP-ADF and see how it compares with other AFE (DSP, wake word, VAD) solutions I've tested.
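A rough sketch of the "only stream while someone is talking" gating, written in Python for readability even though on the ESP32-S3 it would be C, and the AFE solutions mentioned here include their own VAD. This swaps in a simple energy (RMS) threshold rather than a real VAD model; the `vad_gate` name and the threshold, hangover, and pre-roll values are hypothetical and would need tuning for a real mic.

```python
import math
from collections import deque
from typing import Iterable, Iterator, List

Frame = List[int]  # one frame of 16-bit PCM samples, e.g. 20 ms at 16 kHz = 320 samples


def rms(frame: Frame) -> float:
    """Root-mean-square amplitude of a frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def vad_gate(frames: Iterable[Frame], threshold: float = 500.0,
             hangover: int = 10, preroll: int = 5) -> Iterator[Frame]:
    """Yield only the frames that should be streamed to the server.

    - A frame whose RMS exceeds `threshold` counts as speech.
    - After speech stops, keep streaming for `hangover` frames so word endings aren't cut.
    - The last `preroll` silent frames are buffered and sent when speech starts,
      so the start of the utterance (e.g. a wake phrase) isn't clipped.
    """
    buffer: deque = deque(maxlen=preroll)
    remaining = 0  # frames still to stream after the most recent speech frame
    for frame in frames:
        if rms(frame) > threshold:
            if remaining == 0:
                # Speech just started: flush the pre-roll buffer first.
                yield from buffer
                buffer.clear()
            remaining = hangover
            yield frame
        elif remaining > 0:
            remaining -= 1
            yield frame
        else:
            buffer.append(frame)
```

With the defaults, an input of 20 silent frames, 3 loud frames, and 20 silent frames produces 18 streamed frames (5 pre-roll, 3 speech, 10 hangover), and pure silence produces nothing, so the network stays idle until someone actually talks.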
-
I also agree on the nice work! I haven't found any good-looking local smart speakers, so this project fills a clear gap. I was wondering whether you think it will be possible to use the voice assistant feature from ESPHome with your hardware, or whether you see any fundamental issues?
-
First off, great work!
I have been following other projects like this, including Willow and the work synesthesiam has been doing on Home Assistant voice. I have been testing the Willow Inference Server, and running it locally has been really fast.
I see a lot of similarities in what these communities are doing and wonder if there'd be a way to combine the best of all of them.
I have recently been experimenting with DSP Concepts and microphone arrays, and I'd be happy to help with hardware design where I can if people are interested in a smaller or more purpose-built mic or speakerphone.
Also, it doesn't seem like you're a huge fan of wake words, but I think it'd be cool to have custom ones. You can generate them for free for various targets using Sensory or Picovoice. I'm not 100% sure what their licensing would be for something like this, and I don't think they support ESP, but I'd prefer that over ESP's "Alexa" or "Hi ESP".