Synergies with Willow or Home Assistant voice? #3

hornej · 2023-08-11T05:11:36Z

First off, great work!

I have been following similar projects, including Willow and the work synesthesiam has been doing on Home Assistant Voice. I have been testing the Willow Inference Server, and it has been really fast while running entirely locally.

I see a lot of similarities in what these communities are doing and wonder if there'd be a way to combine the best from all of them.

I have recently been messing around with DSP Concepts and microphone arrays, and I would be happy to help with hardware design where I can if people are interested in a smaller or more purpose-built mic or speakerphone.

Also, it doesn't seem like you're a huge fan of wake words, but I think it'd be cool to have custom wake words. You can generate them for free for various targets using Sensory or Picovoice. I'm not 100% sure what their licensing would be like for something like this, and I don't think they support ESP, but I'd prefer that over ESP's "Alexa" or "Hi ESP".

justLV · 2023-08-12T19:29:00Z

Maintainer

Yes, I actually just looked into Willow and created a discussion. I'm all for different software being deployable on this solution, and fortunately we both happened to choose the ESP32-S3.
toverainc/willow#237

I'm happy to support whoever is willing to help, as I'd love to test out Willow Inference Server with this hardware.

Re: wake words - yes, happy to entertain them. The process of creating one through Espressif is a bit of a black box (send them audio files, wait a few weeks, and I'm not sure how they'd prioritize hobbyist users vs. production products). I think there are a few options that are worth trying if you'd like.

But my thinking is that VAD can be used to detect talking, and a wake word, or a more loosely defined "wake phrase" or "wake embedding/intent", can be deployed more easily and flexibly on the local "server", using a classifier or even speaker diarization to detect the user, with models from Hugging Face etc.
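
As a rough illustration of the server-side "wake phrase" idea, here is a minimal sketch that fuzzy-matches the start of a transcript against a list of phrases. It assumes the server already has a text transcript from some speech-to-text step (not shown); `match_wake_phrase`, `normalize`, and the 0.8 threshold are hypothetical names and values chosen for the example, not part of this project, Willow, or Home Assistant. A real deployment might use an embedding or intent classifier instead of string similarity.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional


def normalize(text: str) -> str:
    """Lowercase the text and drop punctuation so comparisons are lenient."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()


def match_wake_phrase(
    transcript: str,
    phrases: Iterable[str],
    min_ratio: float = 0.8,  # hypothetical similarity threshold
) -> Optional[str]:
    """Return the first phrase that loosely matches the start of the transcript."""
    words = normalize(transcript).split()
    for phrase in phrases:
        target = normalize(phrase)
        # Compare only as many leading words as the wake phrase contains.
        head = " ".join(words[: len(target.split())])
        if head and SequenceMatcher(None, head, target).ratio() >= min_ratio:
            return phrase
    return None
```

Because the check runs on the server, the phrase list can be changed at any time without retraining or reflashing anything on the device.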

hornej · 2023-08-13T05:27:49Z

Author

Awesome.

Yeah, I was thinking it'd be more efficient to not stream audio data over the network until the wake word was detected. But if you use VAD on the ESP, then it wouldn't be streaming 24/7. And if it's all self-hosted, privacy becomes less of a concern, and I do see the benefit in not having to say "Hey Assistant" or whatever, although that might come at the cost of it responding to commands unintentionally. I agree that keeping as much as possible on the server simplifies things and obviously gives you the most flexibility in terms of compute.
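
To make the "only stream when someone is talking" idea concrete, here is a minimal sketch of a VAD gate, written in Python for readability rather than as ESP firmware. It assumes a simple energy-threshold VAD over frames of 16-bit PCM samples; the names `rms` and `gate_frames`, the threshold of 500, and the two-frame hangover are all hypothetical, and a real AFE would use a far more robust detector.

```python
import math
from typing import Iterable, Iterator, List


def rms(frame: List[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gate_frames(
    frames: Iterable[List[int]],
    threshold: float = 500.0,  # hypothetical energy threshold
    hangover: int = 2,         # quiet frames still sent after speech ends
) -> Iterator[List[int]]:
    """Yield only the frames that would be streamed to the server."""
    remaining = 0
    for frame in frames:
        if rms(frame) >= threshold:
            # Speech-like energy: send the frame and re-arm the hangover.
            remaining = hangover
            yield frame
        elif remaining > 0:
            # Quiet frame shortly after speech: send it so word endings survive.
            remaining -= 1
            yield frame
        # Otherwise the frame is dropped and nothing goes over the network.
```

The hangover keeps a short tail after the energy drops, which trades a little extra traffic for fewer clipped utterances.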

I'm going to try out ESP-ADF and see how it compares with other AFE (DSP, Wake Word, VAD) solutions I've tested.

jtwild · 2023-12-17T16:40:50Z

Also agree on the nice work! I haven't found any good-looking, local smart speakers, so this project fills a clear gap!

I was wondering if you think it will be possible to use the ESP Voice function from ESPHome with your hardware, or if you see any fundamental issues?
