@seyeong-han seyeong-han commented Nov 13, 2025

Summary

This PR enhances Whisper model flexibility during export while simplifying the runtime interface by removing the model_name argument from the runner.

Changes

Export Scripts Enhancement

  • Added model_name argument to export.sh
    • Allows specifying any HuggingFace Whisper model (tiny, base, small, medium, large, large-v3, large-v3-turbo)
    • Defaults to openai/whisper-large-v3-turbo if not specified
  • Automatic feature size detection based on model variant
    • Uses 128 mel features for large-v3/large-v3-turbo models
    • Uses 80 mel features for all other models
    • Prevents tensor shape mismatch errors by correctly configuring the preprocessor
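The feature-size rule described above can be sketched in a few lines. This is a minimal illustration, not the code in export.sh: the function name `mel_feature_size` and the substring match on the model name are assumptions made for the example, and only the 128/80 split comes from this PR.

```python
def mel_feature_size(model_name: str) -> int:
    """Return the number of mel features the preprocessor should use.

    Hypothetical helper: the real detection lives in export.sh and may be
    implemented differently.
    """
    # Drop an optional organization prefix such as "openai/".
    variant = model_name.rsplit("/", 1)[-1].lower()
    # "large-v3" also matches "large-v3-turbo"; every other variant uses 80.
    return 128 if "large-v3" in variant else 80


if __name__ == "__main__":
    for name in ("openai/whisper-tiny", "openai/whisper-large-v3-turbo"):
        print(name, mel_feature_size(name))
```

Matching on the variant name keeps the export step free of any model download; reading the feature size from the model's own configuration would be an alternative if the name ever stops being a reliable signal.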

Runtime Simplification

  • Removed model_name argument from run.sh and main.cpp
    • Hardcoded decoder_start_token_id=50258 for all models
    • Fixes a tokenizer compatibility issue: all Whisper models from HuggingFace now use the v3 tokenizer format, so a single start token ID is valid for every variant
    • Eliminates confusion about which model name to pass at runtime

E2E Script Updates

  • Updated e2e.sh to support --model-name flag during export
  • Simplified run step to no longer pass model name

Documentation

  • Updated the README with a model comparison table
  • Added examples for different model variants
  • Documented mel features and tokenizer configuration

Why These Changes?

  1. Export Flexibility: Users can now easily export any Whisper model variant without modifying scripts
  2. Correct Preprocessing: Automatic feature size detection ensures the preprocessor matches the model's requirements
  3. Tokenizer Fix: All HuggingFace Whisper models now use the updated tokenizer format (token 50257 = <|endoftext|>, token 50258 = <|startoftranscript|>), so hardcoding 50258 works universally
  4. Simplified UX: Removing the model_name runtime argument reduces user confusion and potential errors
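To make the tokenizer point concrete, here is a small sanity check one could run against a tokenizer vocabulary before relying on the hardcoded start token. It is a sketch under assumptions: `mismatched_special_tokens` and the plain `dict` vocabulary are hypothetical and not part of this PR; only the two token IDs are taken from the description above.

```python
from typing import Dict, Optional

# Token IDs stated in this PR for the updated tokenizer format.
EXPECTED_SPECIAL_TOKENS: Dict[str, int] = {
    "<|endoftext|>": 50257,
    "<|startoftranscript|>": 50258,
}

# Value the runner now hardcodes as decoder_start_token_id.
DECODER_START_TOKEN_ID = EXPECTED_SPECIAL_TOKENS["<|startoftranscript|>"]


def mismatched_special_tokens(vocab: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Return the special tokens whose ID in `vocab` differs from the expected one.

    Hypothetical helper: `vocab` maps token strings to IDs. A missing token is
    reported with the value None. An empty result means the hardcoded start
    token is safe to use with this vocabulary.
    """
    return {
        token: vocab.get(token)
        for token, expected_id in EXPECTED_SPECIAL_TOKENS.items()
        if vocab.get(token) != expected_id
    }


if __name__ == "__main__":
    vocab = {"<|endoftext|>": 50257, "<|startoftranscript|>": 50258}
    print(mismatched_special_tokens(vocab))
```

A check like this fits at export time, where a mismatch can fail early instead of surfacing as garbled transcription at runtime.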

Testing

Tested with whisper-tiny and whisper-large-v3-turbo models to verify correct transcription output.

cc. @manuelcandales

@meta-cla meta-cla bot added the CLA Signed label Nov 13, 2025
@seyeong-han seyeong-han changed the title from "docs: update model_name requirement, different FEATURE_SIZE and various model support" to "Support various Whisper model with Metal backend" Nov 13, 2025
@manuelcandales manuelcandales merged commit 58e8f3a into meta-pytorch:main Nov 15, 2025
1 check passed