Voxtral #20970

patrickvonplaten · 2025-07-15T08:55:26Z

gemini-code-assist

Summary of Changes

Hello @patrickvonplaten, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the Voxtral audio-language model into vLLM, allowing for multimodal inference that processes both audio and text. It involves significant updates to the input handling, model architecture, and configuration system to support this new capability, building upon the mistral_common library for audio processing and tokenization.

Highlights

New Model Support: This pull request introduces comprehensive support for the 'Voxtral' audio-language model, enabling multimodal inference capabilities within vLLM that combine both audio and text inputs.
Enhanced Multimodal Input Handling: The input processing path now handles audio data natively alongside text prompts, including tokenization and embedding generation for multimodal inputs.
Deep mistral_common Integration: The changes rely on the mistral_common library for audio processing, tokenization, and the chat completion and transcription protocols, keeping the integration compatible with Mistral's multimodal model ecosystem.
Whisper Encoder Reusability: The existing Whisper encoder implementation has been adapted and modified to function as a reusable, standalone component within the Voxtral model. This involved adjustments to its attention mechanisms and weight loading procedures.
Flexible Configuration System: The configuration parsing logic has been extended to accurately identify and set up audio-language models like Voxtral. This includes remapping Mistral-specific multimodal arguments to vLLM's internal model configurations for proper initialization.

gemini-code-assist

Code Review

This pull request introduces support for the Voxtral model, a multimodal audio-language model. The changes are comprehensive and well-structured. I have provided feedback to improve code clarity and maintainability and to address a few dependency updates.

examples/offline_inference/audio_language.py

vllm/model_executor/models/voxtral.py

vllm/model_executor/models/whisper.py

mgoin · 2025-07-15T13:25:20Z

vllm/model_executor/models/voxtral.py

+    # @cached_property
+    # def begin_transcript_token_id(self) -> int:
+    #     return self._audio_processor.special_ids.begin_transcript
+
+    # @cached_property
+    # def end_transcript_token_id(self) -> int:
+    #     return self._audio_processor.special_ids.end_transcript


Is this needed for transcription?

Ah, if it's commented out, probably not. I can kill it.
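The commented-out lines in the diff would have exposed the begin- and end-of-transcript special token ids as lazily computed attributes read from the audio processor's `special_ids`. A minimal sketch of that `functools.cached_property` pattern follows, assuming a processor object with a `special_ids` attribute; `SpecialIds`, `StubAudioProcessor`, `TranscriptTokenIds`, and the id values 10 and 11 are hypothetical stand-ins, not the actual Voxtral or mistral_common classes.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)
class SpecialIds:
    # Hypothetical container mirroring `special_ids` on the audio processor.
    begin_transcript: int
    end_transcript: int


class StubAudioProcessor:
    # Hypothetical stand-in for the mistral_common audio processor.
    def __init__(self, special_ids: SpecialIds) -> None:
        self.special_ids = special_ids


class TranscriptTokenIds:
    def __init__(self, audio_processor: StubAudioProcessor) -> None:
        self._audio_processor = audio_processor

    @cached_property
    def begin_transcript_token_id(self) -> int:
        # Looked up on first access, then stored in the instance __dict__.
        return self._audio_processor.special_ids.begin_transcript

    @cached_property
    def end_transcript_token_id(self) -> int:
        return self._audio_processor.special_ids.end_transcript


tokens = TranscriptTokenIds(
    StubAudioProcessor(SpecialIds(begin_transcript=10, end_transcript=11))
)
print(tokens.begin_transcript_token_id, tokens.end_transcript_token_id)  # prints: 10 11
```

Because `cached_property` writes the computed value into the instance `__dict__`, the processor is consulted at most once per attribute per instance.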

vllm/model_executor/models/voxtral.py

mgoin

LGTM, thanks for the clean work! Just a few sanity-check questions.

Signed-off-by: Patrick von Platen <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Patrick von Platen <[email protected]>

patrickvonplaten added 23 commits July 4, 2025 16:19

WIP · e886051
WIP · 0ed1c04
WIP · 868aa9d
WIP · 2f072ff
WIP · 966f21b
WIP · 955ea03
WIP · a0e6ccd
WIP · 175ca9b
Merge branch 'vllm-project:main' into add_voxtral · 9b63b9e
WIP · ac7317f
WIP · 61cfc09
WIP · af188d8
WIP · 9cd95c2
WIP · 5727fbc
WIP · 218424d
WIP · 2942922
WIP · d48b2e0
WIP · c0dc455
WIP · 9d60d04
WIP · c2dae7a
WIP · 4c95480
WIP · 8498ad4
WIP · 0d4c6d9

patrickvonplaten requested a review from aarnphm as a code owner July 15, 2025 08:55

gemini-code-assist bot reviewed Jul 15, 2025

mergify bot added the documentation, ci/build, frontend, and new-model labels Jul 15, 2025

gemini-code-assist bot reviewed Jul 15, 2025

mgoin reviewed Jul 15, 2025

mgoin approved these changes Jul 15, 2025

mgoin enabled auto-merge (squash) July 15, 2025 13:34

Revert "Update tests/models/registry.py" · a446aee

This reverts commit 6a699ec.

auto-merge was automatically disabled July 15, 2025 14:18
Head branch was pushed to by a user without write access

DarkLight1337 removed the v1 and tool-calling labels Jul 15, 2025

DarkLight1337 removed request for zou3519, comaniac, LucasWilkinson, njhill, zhuohan123, youkaichao, houseroad, WoosukKwon, alexm-redhat, hmellor, jeejeelee, russellb and tlrmchlsmth July 15, 2025 14:21

WIP · 995a9ba

DarkLight1337 approved these changes Jul 15, 2025

vllm-bot merged commit e7e3e6d into vllm-project:main Jul 15, 2025
6 of 12 checks passed

github-project-automation bot moved this to Done in Tool Calling Jul 15, 2025

github-project-automation bot moved this to Done in Structured Output Jul 15, 2025

patrickvonplaten mentioned this pull request Jul 15, 2025

[WIP] Voxtral tests #20999

Closed

patrickvonplaten mentioned this pull request Jul 15, 2025

[Voxtral] Add more tests #21010

Merged

hj-mistral pushed a commit to hj-mistral/vllm that referenced this pull request Jul 19, 2025

Voxtral (vllm-project#20970) · ad7499b

Signed-off-by: Patrick von Platen <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Himanshu Jaju <[email protected]>

LyrisZhong pushed a commit to LyrisZhong/vllm that referenced this pull request Jul 23, 2025

Voxtral (vllm-project#20970) · 165e158

Signed-off-by: Patrick von Platen <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
