@seyeong-han seyeong-han commented Nov 13, 2025

Issue

While running the openai/whisper-tiny model with whisper_runner, I found from the transcription result that the tokenizer setup was wrong.

  • Expected
<|en|><|transcribe|><|notimestamps|> This week, I traveled to Chicago to deliver my final farewell address to the nation, following in the tradition of presidents before me. It was not opportunity to say thank you. Whether we've seen IDI or rarely agreed at all, my conversations with you, the American people, in living rooms and schools,<|endoftext|>
  • Result
<|startoftranscript|><|translate|><|10.00|> So.<|24.00|><|endoftext|>

Since HuggingFace has updated all Whisper model tokenizers to the v3 format, we no longer need to set decoder_start_token_id manually for each model.

Solution

  • Removed the model_name argument from run.sh and main.cpp
  • Hardcoded decoder_start_token_id=50258 for all models
  • Fixes the tokenizer compatibility issue: all Whisper models from HuggingFace now use the v3 tokenizer format
  • Eliminates confusion about which model name to pass at runtime
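
The change listed above can be pictured with a minimal sketch. This is not the actual diff from main.cpp: the constant name `kDecoderStartTokenId` and the helper `make_decoder_prompt` are hypothetical and used only for illustration. The value 50258 comes from this PR description, and the sketch assumes it is valid for every exported Whisper model.

```cpp
#include <cstdint>
#include <vector>

// Start token id stated in this PR description. Assumption: it applies to
// every Whisper checkpoint exported from HuggingFace, so no per-model
// lookup is needed.
constexpr int64_t kDecoderStartTokenId = 50258;

// Hypothetical helper (not the runner's real API): builds the initial
// decoder input. It takes no model name, which mirrors the removal of the
// model_name argument.
std::vector<int64_t> make_decoder_prompt() {
  return {kDecoderStartTokenId};
}
```

With a single constant there is no model-name branch left that could drift out of sync with the tokenizer.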

@manuelcandales

pytorch-bot bot commented Nov 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15798

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ed8786f with merge base ed72daf (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Nov 13, 2025
@seyeong-han seyeong-han changed the title from "feat: no need to specify decoder_start_token_id" to "Remove model_name param from Whisper-Metal" Nov 13, 2025
@seyeong-han
Author

@pytorchbot label "release notes: desktop"

@pytorch-bot pytorch-bot bot added the release notes: desktop label Nov 13, 2025