fix(fallback_adapter): update stream start time offset in stt fallbackadapter#6246
chenghao-mou wants to merge 3 commits into
…k adapter plus a defensive fix for barge-in so that transcripts are only held with a tighter wall-clock window
    """Reset relevant states for adaptive interruption detection."""
    self._transcript_buffer.clear()
    self._ignore_user_transcript_until = NOT_GIVEN
    self._agent_speech_started_at = None
🚩 Potential race between async flush task and rapid agent re-speech
The _flush_held_transcripts task created at audio_recognition.py:438 runs asynchronously and eventually calls _reset_interruption_detection() (audio_recognition.py:604-608), which clears _agent_speech_started_at. If a new agent speech cycle starts (on_start_of_agent_speech at lines 385-387 sets _agent_speech_started_at = started_at) before the previous flush task completes, the flush task can later overwrite the new value with None. _within_ignore_window then falls back to lower = 0.0 for the new cycle, losing the lower-bound protection. This is a pre-existing pattern: the same race existed for _ignore_user_transcript_until and _transcript_buffer before this PR, so the PR is consistent with the existing design. In practice, rapid back-to-back agent speech cycles with non-empty transcript buffers are rare, and the degradation is graceful (it falls back to the old unbounded behavior).
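The interleaving described in this comment can be reproduced in isolation. The following is a minimal sketch, not the project's code: `RecognizerSketch`, its method bodies, and the timestamps are hypothetical stand-ins that only mimic the shape of the race (an async flush that awaits, then clears the anchor unconditionally).

```python
import asyncio
from typing import Optional


class RecognizerSketch:
    """Hypothetical stand-in for the recognizer; not the real class."""

    def __init__(self) -> None:
        self.agent_speech_started_at: Optional[float] = None

    def on_start_of_agent_speech(self, started_at: float) -> None:
        # Each agent speech cycle records its own anchor.
        self.agent_speech_started_at = started_at

    async def flush_held_transcripts(self, cooldown: float) -> None:
        # The await is the gap in which a newer cycle can start.
        await asyncio.sleep(cooldown)
        # Unguarded reset: clears whichever anchor is stored right now.
        self.agent_speech_started_at = None


async def demo_race() -> Optional[float]:
    rec = RecognizerSketch()
    rec.on_start_of_agent_speech(started_at=10.0)  # cycle 1
    flush = asyncio.create_task(rec.flush_held_transcripts(cooldown=0.01))
    await asyncio.sleep(0)  # let the flush task start and suspend
    rec.on_start_of_agent_speech(started_at=20.0)  # cycle 2 starts early
    await flush  # the stale flush now finishes
    return rec.agent_speech_started_at


race_result = asyncio.run(demo_race())
print(race_result)  # the anchor set by cycle 2 has been overwritten
```

In this sketch the second cycle's anchor is lost even though the flush belonged to the first cycle, which is the degradation the comment describes.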
🔴 Stale transcript flush can erase the timing anchor for a newer agent speech cycle
The agent-speech timing anchor is unconditionally cleared (_reset_interruption_detection() at audio_recognition.py:590) because the agent_speech_started_at parameter is not forwarded, so the staleness guard is bypassed and a concurrent new speech cycle loses its lower-bound protection.
Impact: A rapid agent speech restart can cause user transcripts to be incorrectly held or swallowed because the bounded ignore-window falls back to a lower bound of zero.
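To make the impact concrete, here is a rough sketch of a bounded window check. The function name, parameters, inclusive bounds, and timestamps are all assumptions for illustration; only the fallback to a lower bound of 0.0 when the anchor is missing is taken from the review text.

```python
from typing import Optional


def within_ignore_window_sketch(
    event_time: float,
    ignore_until: float,
    agent_speech_started_at: Optional[float],
) -> bool:
    """Hypothetical window check; bounds and names are assumed."""
    # Without an anchor the lower bound degrades to 0.0 (unbounded below).
    lower = agent_speech_started_at if agent_speech_started_at is not None else 0.0
    return lower <= event_time <= ignore_until


# A transcript timestamped before the agent started speaking.
early_transcript = 95.0

# With the anchor intact, the transcript falls outside the window.
held_with_anchor = within_ignore_window_sketch(early_transcript, 105.0, 100.0)

# With the anchor cleared, the same transcript falls inside the window.
held_without_anchor = within_ignore_window_sketch(early_transcript, 105.0, None)

print(held_with_anchor, held_without_anchor)
```

Under these assumptions, losing the anchor turns a transcript that should pass through into one that is held.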
All other early-return paths forward the anchor but the final path does not
In _flush_held_transcripts, every call to _reset_interruption_detection on lines 541, 547, 557, and 568 correctly passes agent_speech_started_at=agent_speech_started_at. However, the "normal flush" path at line 590 calls self._reset_interruption_detection() without the parameter.
Because _reset_interruption_detection defaults agent_speech_started_at to None (audio_recognition.py:615), the guard at audio_recognition.py:620-624:
    if (
        agent_speech_started_at is None
        or self._agent_speech_started_at == agent_speech_started_at
    ):
        self._agent_speech_started_at = None

always evaluates to True (first branch), unconditionally clearing the anchor.
This defeats the protection added by audio_recognition.py:437-438 which captures the anchor specifically so a stale flush cannot clear one that a newer cycle has set. If a new on_start_of_agent_speech fires while the flush task (created at audio_recognition.py:439-443) is still pending, the flush will overwrite the new cycle's _agent_speech_started_at with None. Subsequent calls to _within_ignore_window (audio_recognition.py:641) will then use lower = 0.0, making the window overly broad and potentially holding transcripts that should pass through.
(Refers to line 590)
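The effect of omitting the parameter can be checked with a small synchronous sketch. `AnchorSketch` and `run_stale_flush` are hypothetical names; only the guard condition mirrors the snippet quoted in this comment, and the timestamps are made up.

```python
from typing import Optional


class AnchorSketch:
    """Hypothetical holder for the timing anchor; not the real class."""

    def __init__(self) -> None:
        self.agent_speech_started_at: Optional[float] = None

    def reset(self, agent_speech_started_at: Optional[float] = None) -> None:
        # Same shape as the quoted guard: clear only when no anchor was
        # forwarded, or when the forwarded anchor is still the current one.
        if (
            agent_speech_started_at is None
            or self.agent_speech_started_at == agent_speech_started_at
        ):
            self.agent_speech_started_at = None


def run_stale_flush(forward_anchor: bool) -> Optional[float]:
    state = AnchorSketch()
    state.agent_speech_started_at = 10.0  # cycle 1 sets its anchor
    captured = state.agent_speech_started_at  # flush captures cycle 1's anchor
    state.agent_speech_started_at = 20.0  # cycle 2 starts before the flush ends
    if forward_anchor:
        state.reset(agent_speech_started_at=captured)  # stale, so guard skips
    else:
        state.reset()  # default None takes the first branch and clears
    return state.agent_speech_started_at


kept = run_stale_flush(forward_anchor=True)
cleared = run_stale_flush(forward_anchor=False)
print(kept, cleared)
```

Forwarding the captured anchor leaves the newer cycle's value in place, while the parameterless call clears it, which matches the behavior attributed to the normal flush path.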
🚩 Flush from _on_stt_event also omits agent_speech_started_at but is less risky
At livekit-agents/livekit/agents/voice/audio_recognition.py:1067, _flush_held_transcripts(cooldown=end_cooldown) is called without agent_speech_started_at. This will also unconditionally clear _agent_speech_started_at via the same path at line 590. However, this call site is reached only when _agent_speaking is False (line 650 returns True and holds the event when the agent is speaking), so it is less likely to race with a new agent speech cycle. Still, since the method is async and awaits inside, a new on_start_of_agent_speech could theoretically fire between the check and the reset. Consider passing self._agent_speech_started_at here for consistency.
(Refers to line 1067)
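The suggestion above amounts to capturing the anchor at the call site, before anything is awaited, and forwarding it to the reset. A minimal sketch under that assumption follows; `GuardedRecognizerSketch`, its method names, and the timings are hypothetical and do not reproduce the real signatures.

```python
import asyncio
from typing import Optional


class GuardedRecognizerSketch:
    """Hypothetical stand-in showing capture-before-await; not the real class."""

    def __init__(self) -> None:
        self.agent_speech_started_at: Optional[float] = None

    def on_start_of_agent_speech(self, started_at: float) -> None:
        self.agent_speech_started_at = started_at

    def reset(self, anchor: Optional[float] = None) -> None:
        # Clear only if no anchor was forwarded or it is still current.
        if anchor is None or self.agent_speech_started_at == anchor:
            self.agent_speech_started_at = None

    async def flush(self, cooldown: float, anchor: Optional[float] = None) -> None:
        await asyncio.sleep(cooldown)  # a new cycle may start during this gap
        self.reset(anchor)


async def demo_forwarded_anchor(restart: bool) -> Optional[float]:
    rec = GuardedRecognizerSketch()
    rec.on_start_of_agent_speech(started_at=10.0)
    # The anchor is read here, at call time, before the flush awaits.
    task = asyncio.create_task(
        rec.flush(cooldown=0.01, anchor=rec.agent_speech_started_at)
    )
    await asyncio.sleep(0)  # let the flush task start and suspend
    if restart:
        rec.on_start_of_agent_speech(started_at=20.0)  # newer cycle
    await task
    return rec.agent_speech_started_at


after_restart = asyncio.run(demo_forwarded_anchor(restart=True))
after_quiet = asyncio.run(demo_forwarded_anchor(restart=False))
print(after_restart, after_quiet)
```

When a newer cycle starts during the await, the stale flush leaves its anchor alone; when nothing changes, the anchor is cleared as before, so forwarding it does not alter the common path in this sketch.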