@@ -540,7 +540,7 @@ def _process_stream_event(self, data: dict) -> None:
)
self._event_ch.send_nowait(interim_event)

- if utterance:
+ if utterance and timed_words:
if self._last_preflight_start_time == 0.0:
self._last_preflight_start_time = start_time

@@ -555,12 +555,19 @@ def _process_stream_event(self, data: dict) -> None:
len(utterance_words), 1
)

+ # Use the cumulative words text (same as INTERIM) instead of the
+ # chunk-based utterance field. Both INTERIM and PREFLIGHT events
+ # flow through on_interim_transcript in the framework and are
+ # rendered in replacement mode (is_delta_stream=False). Using the
+ # chunk-based utterance here would cause the displayed text to
+ # regress/jump when the shorter chunk overwrites the longer
+ # cumulative text for the same segment ID. See #4779.
final_event = stt.SpeechEvent(
type=stt.SpeechEventType.PREFLIGHT_TRANSCRIPT,
alternatives=[
stt.SpeechData(
language=language,
- text=utterance,
+ text=interim_text,
start_time=self._last_preflight_start_time,
end_time=end_time,
words=utterance_words,
Comment on lines +570 to 573

🟡 PREFLIGHT_TRANSCRIPT text/words mismatch: cumulative text paired with chunk-based words

The PREFLIGHT_TRANSCRIPT SpeechData now has text=interim_text (cumulative text built from ALL timed_words in the turn) but words=utterance_words (filtered to only the chunk's words where word.start_time >= self._last_preflight_start_time). Before this PR, both fields were chunk-based (text=utterance, words=utterance_words), so they were consistent. Now text represents the entire turn's words while words is only a subset, violating the implicit contract that words should correspond to text. This can cause incorrect behavior for any consumer that reconstructs or validates text from the words array (e.g., word-level alignment or highlighting on the client).

Suggested change
- text=interim_text,
- start_time=self._last_preflight_start_time,
- end_time=end_time,
- words=utterance_words,
+ text=interim_text,
+ start_time=self._last_preflight_start_time,
+ end_time=end_time,
+ words=timed_words,
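
To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use the plugin's real classes: `TimedWord` and `words_match_text` are hypothetical stand-ins, and it assumes `interim_text` is built by joining the text of every entry in `timed_words` with spaces, which may differ from the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """Hypothetical stand-in for the plugin's timed-word objects."""

    text: str
    start_time: float
    end_time: float


def words_match_text(text: str, words: list[TimedWord]) -> bool:
    """Return True when joining the words reproduces the transcript text."""
    return " ".join(w.text for w in words) == text


# All words recognized so far in the turn (cumulative).
timed_words = [
    TimedWord("turn", 0.0, 0.3),
    TimedWord("on", 0.3, 0.5),
    TimedWord("the", 0.5, 0.6),
    TimedWord("lights", 0.6, 1.0),
]

# Assumption: cumulative text is the space-joined text of all timed words.
interim_text = " ".join(w.text for w in timed_words)

# Chunk-based subset, filtered the way the comment above describes.
last_preflight_start_time = 0.5
utterance_words = [
    w for w in timed_words if w.start_time >= last_preflight_start_time
]

# Cumulative text paired with chunk words: the two fields disagree.
mismatch = not words_match_text(interim_text, utterance_words)

# Cumulative text paired with all timed words: the fields agree.
consistent = words_match_text(interim_text, timed_words)

print(mismatch, consistent)
```

In this sketch the chunk filter keeps only "the" and "lights", so a consumer that rebuilds text from `words` would not recover the full cumulative string. Passing `timed_words` keeps the two fields aligned under the stated assumption.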