Get audio data in real-time #122

t41372 · 2024-07-02T03:35:06Z

Is there a way to get the audio data while speech is still active, before it ends? I want to get the audio data when the speech starts, stream it to the back end in real time, and stop streaming when it ends. It seems like the onFrameProcessed callback only has a probability property.
Thanks


rahulbansal16 · 2024-08-06T08:40:52Z

Which model will you use on the back end to transcribe it? Does the word error rate increase when transcribing that way?

JettScythe · 2024-10-10T15:35:13Z

Bumping, as this is exactly what I need. I already have Whisper instances available for transcription and translation on the back end, but reducing response latency means getting chunks transcribed as they appear. I suspect a reasonable "chunk" is one sentence.

Oudwins · 2025-01-21T15:02:34Z

Bumping this too; I'm also interested. I tried to implement it manually, but when I concatenate all the frames from onFrameProcessed (as a test), the resulting audio is quite bad. It seems that this package merges frames under the hood with a sliding-window approach, so simply concatenating the frames causes echo and generally poor audio.

ricky0123 · 2025-01-22T01:50:56Z

This is something I would like to add better support for in the future, but it will be a significant addition and I'm still not sure what form I want the API to take. In the meantime, have you tried writing separate code to stream audio to your servers and just using the VAD callbacks for voice activity signals?
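
A minimal sketch of that suggestion, assuming the audio is captured by a separate path (for example MediaRecorder or an AudioWorklet) and the VAD callbacks only open and close a gate. `GatedSender` and its method names are hypothetical and not part of the library; the wiring to real callbacks is described in comments only.

```typescript
// Hypothetical gate between an independent capture path and the server.
// Nothing here is part of the VAD library's API.
class GatedSender<T> {
  private active = false;
  private send: (chunk: T) => void;

  constructor(send: (chunk: T) => void) {
    this.send = send;
  }

  // Call from the VAD's speech-start callback.
  open(): void {
    this.active = true;
  }

  // Call from the VAD's speech-end callback.
  close(): void {
    this.active = false;
  }

  // Call from the capture path for every chunk it produces.
  push(chunk: T): void {
    if (this.active) this.send(chunk);
  }
}

// Demo with strings standing in for audio chunks.
const forwarded: string[] = [];
const gate = new GatedSender<string>((c) => {
  forwarded.push(c);
});
gate.push("a"); // dropped: gate is closed
gate.open();
gate.push("b"); // forwarded
gate.close();
gate.push("c"); // dropped
console.log(forwarded);
```

Because the gate opens only once the VAD has reported speech, audio captured before that moment is dropped, so the very start of an utterance may be clipped unless some pre-roll is buffered.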

Oudwins · 2025-01-22T07:30:13Z

@ricky0123 Yes, I tried to implement streaming using the VAD callbacks, with an algorithm that I think could work quite well for the library:

Keep track of the padding frames.
onSpeechStart: start keeping track of speech frames.
Once speech exceeds the minimum number of frames: flush the padding and saved speech frames to the server, then start streaming all new frames to the server.
onSpeechEnd: stop streaming new frames to the server.
onVADMisfire: empty the buffered speech frames.

But while implementing this, I ran into the issue described in #186.
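
The steps listed in this comment can be sketched roughly as the state machine below. This is an illustration under assumptions, not the library's implementation: `SpeechStreamer`, `padFrames`, and `minSpeechFrames` are hypothetical names, frames are assumed to arrive one at a time from some per-frame callback, and they are assumed not to overlap (which, per the discussion above, may not hold in practice).

```typescript
// Hypothetical helper; none of these names come from the VAD library.
type Frame = Float32Array;
type StreamState = "idle" | "pending" | "streaming";

class SpeechStreamer {
  private padding: Frame[] = []; // most recent frames seen before speech
  private pending: Frame[] = []; // speech frames held until the segment is confirmed
  private state: StreamState = "idle";
  private send: (frame: Frame) => void;
  private padFrames: number;
  private minSpeechFrames: number;

  constructor(send: (frame: Frame) => void, padFrames = 3, minSpeechFrames = 3) {
    this.send = send;
    this.padFrames = padFrames;
    this.minSpeechFrames = minSpeechFrames;
  }

  // Call for every audio frame, whether or not speech is active.
  onFrame(frame: Frame): void {
    if (this.state === "idle") {
      // Keep only the last `padFrames` frames as pre-speech padding.
      this.padding.push(frame);
      if (this.padding.length > this.padFrames) this.padding.shift();
    } else if (this.state === "pending") {
      this.pending.push(frame);
      if (this.pending.length >= this.minSpeechFrames) {
        // Segment confirmed: flush padding, then buffered speech, then go live.
        for (const f of this.padding) this.send(f);
        for (const f of this.pending) this.send(f);
        this.padding = [];
        this.pending = [];
        this.state = "streaming";
      }
    } else {
      this.send(frame);
    }
  }

  onSpeechStart(): void {
    if (this.state === "idle") this.state = "pending";
  }

  onSpeechEnd(): void {
    this.reset();
  }

  onVADMisfire(): void {
    this.reset();
  }

  // Drops buffered speech frames and returns to collecting padding.
  private reset(): void {
    this.pending = [];
    this.state = "idle";
  }
}

// Demo: each frame carries a single sample used as an id.
const sent: number[] = [];
const streamer = new SpeechStreamer((f) => {
  sent.push(f[0]);
}, 2, 2);
const makeFrame = (id: number): Frame => Float32Array.of(id);

[1, 2, 3].forEach((id) => streamer.onFrame(makeFrame(id))); // padding keeps 2, 3
streamer.onSpeechStart();
streamer.onFrame(makeFrame(4)); // buffered
streamer.onFrame(makeFrame(5)); // minimum reached: 2, 3, 4, 5 are flushed
streamer.onFrame(makeFrame(6)); // streamed live
streamer.onSpeechEnd();
streamer.onFrame(makeFrame(7)); // back to padding, not sent
console.log(sent); // [2, 3, 4, 5, 6]
```

In this sketch a misfire simply discards the buffered speech frames; whether those frames should instead be recycled as padding for the next segment is a design choice the comment leaves open.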

gencerege added a commit to gencerege/vad that referenced this issue Feb 22, 2025

added onEmitChunk callback to extract audio before onSpeechEnd ricky0…

470ebbc

…123#122

gencerege mentioned this issue Feb 22, 2025

added onEmitChunk callback to extract audio before onSpeechEnd. live audio #122 #191

Open
