-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Fix VAD eou detection is exiting early #4988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
044a8a2
43096b7
b2221c0
a25deee
6f0da8e
222f92a
5b9b4d9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -69,7 +69,8 @@ async def predict_end_of_turn( | |
| ) -> float: ... | ||
|
|
||
|
|
||
| TurnDetectionMode = Literal["stt", "vad", "realtime_llm", "manual"] | _TurnDetector | ||
| TurnDetectionType = Literal["stt", "vad", "realtime_llm", "manual"] | ||
| TurnDetectionMode = TurnDetectionType | _TurnDetector | ||
| """ | ||
| The mode of turn detection to use. | ||
|
|
||
|
|
@@ -121,7 +122,9 @@ def __init__( | |
| self._turn_detector = turn_detection if not isinstance(turn_detection, str) else None | ||
| self._stt = stt | ||
| self._vad = vad | ||
| self._turn_detection_mode = turn_detection if isinstance(turn_detection, str) else None | ||
| self._turn_detection_mode: TurnDetectionType | None = ( | ||
| turn_detection if isinstance(turn_detection, str) else None | ||
| ) | ||
| self._vad_base_turn_detection = self._turn_detection_mode in ("vad", None) | ||
| self._user_turn_committed = False # true if user turn ended but EOU task not done | ||
|
|
||
|
|
@@ -521,13 +524,30 @@ async def _on_vad_event(self, ev: vad.VADEvent) -> None: | |
| chat_ctx = self._hooks.retrieve_chat_ctx().copy() | ||
| self._run_eou_detection(chat_ctx) | ||
|
|
||
| def _eou_requires_transcript(self) -> bool: | ||
| if self._stt: | ||
| # while we aren't checking _turn_detector here, | ||
| # _turn_detector and _turn_detection_mode are mutually exclusive (such that if one is provided, the other must be None) | ||
| # e.g. if _turn_detector is provided, _turn_detection_mode is None, and vice versa | ||
| match self._turn_detection_mode: | ||
| case "stt" | "realtime_llm" | None: | ||
| return True | ||
| case "manual" | "vad": | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The addition of |
||
| return False | ||
|
Comment on lines
+535
to
+536
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 VAD mode with STT can send empty user messages to the LLM The new How this path is reachedWhen VAD mode is used with STT enabled:
The Prompt for agentsWas this helpful? React with 👍 or 👎 to provide feedback. |
||
| case _: | ||
| # If not specified then we assume it requires transcript | ||
| return True | ||
| else: | ||
| return False | ||
|
Comment on lines
+527
to
+541
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok made this into it's own function to enumerate all of the possible values
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Before everything except for manual was essentially going through |
||
|
|
||
| def _run_eou_detection(self, chat_ctx: llm.ChatContext, skip_reply: bool = False) -> None: | ||
| if self._stt and not self._audio_transcript and self._turn_detection_mode != "manual": | ||
| # stt enabled but no transcript yet | ||
| if not self._audio_transcript and self._eou_requires_transcript(): | ||
| return | ||
|
|
||
| chat_ctx = chat_ctx.copy() | ||
| chat_ctx.add_message(role="user", content=self._audio_transcript) | ||
| if self._audio_transcript != "": | ||
| # only append when we have a transcript so we don't inject blank user messages | ||
| chat_ctx.add_message(role="user", content=self._audio_transcript) | ||
| turn_detector = ( | ||
| self._turn_detector | ||
| if self._audio_transcript and self._turn_detection_mode != "manual" | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Missing empty-transcript guard at second
_scheduling_pausedcheck during session closingThe PR adds an
info.new_transcriptguard at line 1456 to prevent injecting blank user messages into the chat context when scheduling is paused and the session is closing. However, an analogous code path at line 1549 in_user_turn_completed_taskwas not updated with the same guard.Root Cause and Impact
With the
audio_recognition.pychange (line 525),_run_eou_detectionno longer returns early in VAD mode when STT is enabled but no transcript has arrived yet. This meanson_end_of_turncan now be called withinfo.new_transcript == "". The PR correctly guards line 1456:But the same pattern at
agent_activity.py:1549is left unguarded:This path is reachable when
_scheduling_pausedisFalseduring the synchronouson_end_of_turn(line 1449), allowing the async_user_turn_completed_taskto be created at line 1484, but then_scheduling_pausedbecomesTruebefore the async task executes line 1544. In that case, auser_messagewith empty content (content=[""]) is appended to the chat context, which is exactly the blank-message problem the PR is trying to fix.Note that line 1523 already has the correct guard (
if info.new_transcript != "":), confirming this is the intended pattern.(Refers to lines 1549-1551)
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy to add it, but need some more context from a reviewer if this is necessary