fix(fallback_adapter): update stream start time offset in stt fallbackadapter#6246

Open
chenghao-mou wants to merge 3 commits into main from fix/stt-fallback-timestamp-anchoring

Conversation

@chenghao-mou chenghao-mou commented Jun 26, 2026

Member
  • Update the stream start time offset in the STT fallback adapter so that the offset persists across stream changes (today, transcripts after a stream change carry timestamps from the past).
  • Add a defensive fix for barge-in: transcripts are held only within a tighter wall-clock window, so outdated transcripts (from before the agent speech started) are no longer held.
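
The first bullet can be pictured with a minimal sketch. It assumes a replacement stream reports word timings relative to its own start, so they must be shifted by a stored offset to land on the session timeline. `TimedWord`, `rebase_words`, and `start_time_offset` are hypothetical names for illustration, not the adapter's actual API.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class TimedWord:
    """Hypothetical word timing, in seconds relative to the stream that produced it."""
    text: str
    start_time: float
    end_time: float


def rebase_words(words: List[TimedWord], start_time_offset: float) -> List[TimedWord]:
    """Shift stream-relative timings onto the session timeline.

    Assumption: start_time_offset is the session time at which the
    current (possibly replacement) stream started.
    """
    return [
        replace(
            w,
            start_time=w.start_time + start_time_offset,
            end_time=w.end_time + start_time_offset,
        )
        for w in words
    ]


# A replacement stream that started 12 s into the session reports a word at 0.5 s.
words = [TimedWord("hello", 0.5, 1.0)]
rebased = rebase_words(words, start_time_offset=12.0)
```

Without the offset, the word above would appear to have been spoken at 0.5 s of session time, which is the "timestamps from the past" symptom described in the bullet.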

…k adapter

plus a defensive fix for barge-in so that transcripts are only held with a tighter wall clock window
@chenghao-mou chenghao-mou requested a review from a team as a code owner June 26, 2026 13:47

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

Devin Review found 1 potential issue.

Comment on lines +605 to +608
"""Reset relevant states for adaptive interruption detection."""
self._transcript_buffer.clear()
self._ignore_user_transcript_until = NOT_GIVEN
self._agent_speech_started_at = None

@devin-ai-integration devin-ai-integration Bot Jun 26, 2026

Contributor

🚩 Potential race between async flush task and rapid agent re-speech

The _flush_held_transcripts task created at audio_recognition.py:438 runs asynchronously and eventually calls _reset_interruption_detection() (audio_recognition.py:604-608), which clears _agent_speech_started_at. If a new agent speech cycle starts (on_start_of_agent_speech at lines 385-387 sets _agent_speech_started_at = started_at) before the previous flush task completes, the flush task can later overwrite the new value with None. _within_ignore_window would then fall back to lower = 0.0 for the new cycle, losing the lower-bound protection. This is a pre-existing pattern: the same race existed for _ignore_user_transcript_until and _transcript_buffer before this PR, so the PR is consistent with the existing design. In practice, rapid back-to-back agent speech cycles with non-empty transcript buffers are rare, and the degradation is graceful (it falls back to the old unbounded behavior).
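
The interleaving described above can be reproduced with a toy model. This is only a sketch: `Recognizer` and its methods are simplified stand-ins, not the real class in audio_recognition.py, and the sleep stands in for whatever the real flush awaits.

```python
import asyncio
from typing import Optional


class Recognizer:
    """Toy stand-in for the recognition state; not the real class."""

    def __init__(self) -> None:
        self.agent_speech_started_at: Optional[float] = None

    def on_start_of_agent_speech(self, started_at: float) -> None:
        self.agent_speech_started_at = started_at

    async def flush_held_transcripts(self, cooldown: float) -> None:
        # Simulate awaiting work before the state reset.
        await asyncio.sleep(cooldown)
        # Unconditional reset: clears whichever anchor is current by now.
        self.agent_speech_started_at = None


async def demo() -> Optional[float]:
    rec = Recognizer()
    rec.on_start_of_agent_speech(10.0)  # cycle 1
    flush = asyncio.create_task(rec.flush_held_transcripts(0.01))
    await asyncio.sleep(0)              # let the flush task start running
    rec.on_start_of_agent_speech(20.0)  # cycle 2 begins before the flush ends
    await flush
    return rec.agent_speech_started_at


result = asyncio.run(demo())
print(result)  # the anchor set by cycle 2 has been cleared
```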

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

Devin Review found 2 new potential issues.

Contributor

🔴 Stale transcript flush can erase the timing anchor for a newer agent speech cycle

The agent-speech timing anchor is unconditionally cleared (_reset_interruption_detection() at audio_recognition.py:590) because the agent_speech_started_at parameter is not forwarded, so the staleness guard is bypassed and a concurrent new speech cycle loses its lower-bound protection.

Impact: A rapid agent speech restart can cause user transcripts to be incorrectly held or swallowed because the bounded ignore-window falls back to a lower bound of zero.
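
To make the impact concrete, here is a minimal sketch of a bounded ignore window. The function name mirrors `_within_ignore_window`, but the signature and the inclusive bounds are assumptions made for illustration, not the actual implementation.

```python
from typing import Optional


def within_ignore_window(
    transcript_time: float,
    ignore_until: float,
    agent_speech_started_at: Optional[float],
) -> bool:
    """Return True if a transcript at transcript_time should be held.

    Assumption: with no anchor, the lower bound falls back to 0.0,
    which matches the fallback described in the review comment.
    """
    lower = agent_speech_started_at if agent_speech_started_at is not None else 0.0
    return lower <= transcript_time <= ignore_until


# Transcript from t=5 s, agent speech started at t=20 s, window ends at t=30 s.
held_with_anchor = within_ignore_window(5.0, 30.0, 20.0)    # outside the window
held_without_anchor = within_ignore_window(5.0, 30.0, None)  # inside the widened window
```

With the anchor lost, a transcript that predates the agent speech is treated as falling inside the window, which is the over-broad holding the comment warns about.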

All other early-return paths forward the anchor but the final path does not

In _flush_held_transcripts, every call to _reset_interruption_detection on lines 541, 547, 557, and 568 correctly passes agent_speech_started_at=agent_speech_started_at. However, the "normal flush" path at line 590 calls self._reset_interruption_detection() without the parameter.

Because _reset_interruption_detection defaults agent_speech_started_at to None (audio_recognition.py:615), the guard at audio_recognition.py:620-624:

if (
    agent_speech_started_at is None
    or self._agent_speech_started_at == agent_speech_started_at
):
    self._agent_speech_started_at = None

always evaluates to True (first branch), unconditionally clearing the anchor.

This defeats the protection added by audio_recognition.py:437-438 which captures the anchor specifically so a stale flush cannot clear one that a newer cycle has set. If a new on_start_of_agent_speech fires while the flush task (created at audio_recognition.py:439-443) is still pending, the flush will overwrite the new cycle's _agent_speech_started_at with None. Subsequent calls to _within_ignore_window (audio_recognition.py:641) will then use lower = 0.0, making the window overly broad and potentially holding transcripts that should pass through.
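
The effect of forwarding the captured anchor can be checked with a small sketch built around the guard quoted above. `InterruptionState` and `reset` are hypothetical stand-ins; only the condition itself is taken from the review comment.

```python
from typing import Optional


class InterruptionState:
    """Toy holder for the anchor; not the real class."""

    def __init__(self) -> None:
        self.agent_speech_started_at: Optional[float] = None

    def reset(self, agent_speech_started_at: Optional[float] = None) -> None:
        # Same condition as the guard quoted in the comment: clear only when
        # no anchor is passed, or when the passed anchor is still current.
        if (
            agent_speech_started_at is None
            or self.agent_speech_started_at == agent_speech_started_at
        ):
            self.agent_speech_started_at = None


state = InterruptionState()
state.agent_speech_started_at = 10.0      # cycle 1 starts
captured = state.agent_speech_started_at  # flush for cycle 1 snapshots the anchor
state.agent_speech_started_at = 20.0      # cycle 2 starts before the flush resets

state.reset(agent_speech_started_at=captured)
after_forwarded = state.agent_speech_started_at  # stale anchor: cycle 2 value kept

state.reset()
after_default = state.agent_speech_started_at    # no anchor passed: always cleared
```

Passing the snapshot makes the stale flush a no-op for the anchor, while the bare call clears it regardless of which cycle set it.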

(Refers to line 590)

Contributor

🚩 Flush from _on_stt_event also omits agent_speech_started_at but is less risky

At livekit-agents/livekit/agents/voice/audio_recognition.py:1067, _flush_held_transcripts(cooldown=end_cooldown) is called without agent_speech_started_at. This will also unconditionally clear _agent_speech_started_at via the same path at line 590. However, this call site is reached only when _agent_speaking is False (line 650 returns True and holds the event when agent is speaking), so it's less likely to race with a new agent speech cycle. Still, since the method is async and awaits inside, a new on_start_of_agent_speech could theoretically fire between the check and the reset. Consider passing self._agent_speech_started_at here for consistency.

(Refers to line 1067)
