Set _speech_start_time when VAD START_OF_SPEECH activates #5027
Open
hudson-worden wants to merge 4 commits into livekit:main from
…overwrite the time set by INFERENCE_DONE, and it should because it's more accurate.
This is the related issue thread on this
This image attempts to convey the issue. I have plotted where the ChatMessage was placed (determined by MetricsReport; the box shows it designated across the whole span) vs. where I believe it should be (on the right).
Basically, it should align with the custom span I added, which represents the VAD start of speech.
I believe the reason this is happening is b/c we're setting _speech_start_time during INFERENCE_DONE. When VAD picks up speech (even though it doesn't match up with the activation), _speech_start_time is set. Instead it should be reset when a START_OF_SPEECH event occurs. This would align it with the activation threshold and also with a few other places, here and here.
We're not removing the INFERENCE_DONE branch b/c it handles the case where no START_OF_SPEECH event follows. I'm open to removing that too, though.
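To make the intended behavior concrete, here is a minimal, self-contained sketch of the logic described above. It is not the actual livekit code: `SpeechTracker`, the stand-in `VADEvent` / `VADEventType` types, and the `timestamp` and `speech_detected` fields are all hypothetical names chosen for illustration. It only shows the ordering rule: INFERENCE_DONE sets a provisional start time if none exists, and START_OF_SPEECH always overwrites it.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class VADEventType(Enum):
    # Hypothetical stand-in for the real VAD event types; not imported from livekit.
    START_OF_SPEECH = auto()
    INFERENCE_DONE = auto()


@dataclass
class VADEvent:
    type: VADEventType
    timestamp: float               # seconds; assumed field for this sketch
    speech_detected: bool = False  # assumed: raw VAD saw speech in this window


class SpeechTracker:
    """Toy model of the _speech_start_time handling discussed in this PR."""

    def __init__(self) -> None:
        self._speech_start_time: Optional[float] = None

    def on_vad_event(self, ev: VADEvent) -> None:
        if ev.type is VADEventType.INFERENCE_DONE:
            # Fallback branch: record a provisional start only if nothing is
            # set yet, so it still works when no START_OF_SPEECH ever arrives.
            if ev.speech_detected and self._speech_start_time is None:
                self._speech_start_time = ev.timestamp
        elif ev.type is VADEventType.START_OF_SPEECH:
            # Always overwrite: this event lines up with the activation
            # threshold, so it is treated as the more accurate start time.
            self._speech_start_time = ev.timestamp


if __name__ == "__main__":
    tracker = SpeechTracker()
    tracker.on_vad_event(VADEvent(VADEventType.INFERENCE_DONE, 1.0, speech_detected=True))
    tracker.on_vad_event(VADEvent(VADEventType.START_OF_SPEECH, 1.3))
    print(tracker._speech_start_time)
```

In this sketch the provisional value from INFERENCE_DONE (1.0) is replaced by the START_OF_SPEECH value (1.3), which is the alignment the description asks for. If START_OF_SPEECH never fires, the provisional value is kept.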