Get audio data in real-time #122
Which model will you use in the backend to transcribe it? Does the word error rate increase this way?
Bumping, as this is exactly what I need. I already have Whisper instances available for transcription/translation on the back end, but reducing response latency means transcribing chunks as they arrive. I suspect a reasonable "chunk" is one sentence.
Bumping this as well; I'm also interested. I tried to implement it manually, but when I concatenate all the frames from `onFrameProcessed` (as a test), the resulting audio is quite bad. It seems this package merges frames under the hood with a sliding-window approach, so simply concatenating the frames causes echo and generally poor audio.
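If the frames really do overlap as suspected above (an assumption; nothing in this thread confirms how the package frames audio), naive concatenation repeats the overlapping samples, which would sound like echo. A minimal sketch of overlap-aware concatenation, assuming equal-length frames that each advance by a known `hopSize`. `concatOverlappingFrames` and `hopSize` are hypothetical names, not part of the library:

```typescript
// Hypothetical helper, not part of @ricky0123/vad-web.
// Assumes equal-length frames, each advancing by `hopSize` samples, so the
// first (frameLength - hopSize) samples of a frame repeat the previous one.
function concatOverlappingFrames(frames: Float32Array[], hopSize: number): Float32Array {
  if (frames.length === 0) return new Float32Array(0);
  const frameLength = frames[0].length;
  if (hopSize <= 0 || hopSize > frameLength) {
    throw new RangeError("hopSize must be in (0, frameLength]");
  }
  // Total length: one full frame plus hopSize new samples per later frame.
  const out = new Float32Array(frameLength + (frames.length - 1) * hopSize);
  out.set(frames[0], 0);
  for (let i = 1; i < frames.length; i++) {
    // Only the last hopSize samples of each later frame are new audio.
    const fresh = frames[i].subarray(frameLength - hopSize);
    out.set(fresh, frameLength + (i - 1) * hopSize);
  }
  return out;
}
```

When `hopSize` equals the frame length this reduces to plain concatenation, so if the audio still sounds wrong in that case, overlap is probably not the cause.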
This is something I'd like to support better in the future, but it will be a significant addition and I'm still not sure what form the API should take. In the meantime, have you tried writing separate code to stream audio to your servers and using the VAD callbacks only as voice-activity signals?
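A minimal sketch of that approach, assuming you capture audio yourself as `Float32Array` chunks and call `handleSpeechStart()`/`handleSpeechEnd()` from whatever speech-start/speech-end callbacks your VAD setup provides. `SpeechGatedStreamer`, `send`, and `preRollChunks` are hypothetical names, not the library's API. It keeps a few chunks of pre-roll so the start of an utterance, which a VAD typically flags slightly late, is not clipped:

```typescript
// Hypothetical gate, not part of @ricky0123/vad-web: forwards audio only
// between speech-start and speech-end signals, with a small pre-roll buffer.
type SendFn = (chunk: Float32Array, isFinal: boolean) => void;

class SpeechGatedStreamer {
  private readonly send: SendFn;
  private readonly preRollChunks: number;
  private preRoll: Float32Array[] = [];
  private streaming = false;

  constructor(send: SendFn, preRollChunks = 3) {
    this.send = send;
    this.preRollChunks = preRollChunks;
  }

  // Call for every captured chunk, whether or not speech is active.
  pushAudio(chunk: Float32Array): void {
    if (this.streaming) {
      this.send(chunk, false);
      return;
    }
    // Outside speech: keep only the most recent chunks as pre-roll.
    this.preRoll.push(chunk);
    if (this.preRoll.length > this.preRollChunks) this.preRoll.shift();
  }

  // Wire to the VAD's speech-start signal.
  handleSpeechStart(): void {
    if (this.streaming) return;
    this.streaming = true;
    // Flush pre-roll first so the backend receives the utterance onset.
    for (const chunk of this.preRoll) this.send(chunk, false);
    this.preRoll = [];
  }

  // Wire to the VAD's speech-end signal.
  handleSpeechEnd(): void {
    if (!this.streaming) return;
    this.streaming = false;
    // An empty final chunk marks the end of the utterance.
    this.send(new Float32Array(0), true);
  }
}
```

Here `send` could wrap a WebSocket or any other transport; the empty final chunk is just one way to tell the backend an utterance is complete.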
@ricky0123 Yes, I tried implementing streaming using the VAD callbacks, with an algorithm that I think could work quite well for the library:
But while implementing this, I ran into the issue described in #186.
Is there a way to get the audio data while speech is still active, before it ends? I want to get the audio data when speech starts, stream it to the back end in real time, and stop streaming when it ends. It seems like the `onFrameProcessed` callback only has a `probability` property. Thanks!
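On the transport side, a common choice (assumed here, not something the library prescribes) is to send 16-bit PCM rather than raw 32-bit floats, which halves the payload. A sketch of the conversion; `floatTo16BitPCM` is a hypothetical helper, and whether your backend expects this format and sample rate is something to check:

```typescript
// Hypothetical helper: converts Float32 samples in [-1, 1] to little-endian
// 16-bit signed PCM. Sample rate is unchanged; resample separately if needed.
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, sample));
    // Scale so -1 maps to -32768 and 1 maps to 32767.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  });
  return buffer;
}
```

Each chunk you stream could be passed through this function and then to `WebSocket.send`, which accepts an `ArrayBuffer`.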