
[VLM] Qwen2.5-VL #12604

Open
ywang96 wants to merge 54 commits into main from ywang96:qwen2_5_vl

Conversation

@ywang96 (Member) commented Jan 31, 2025

FIXES: #12486, #12532

TODO:

To run this model before the transformers 4.49 release, install transformers from source:
pip install git+https://github.com/huggingface/transformers
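
To confirm that the source install is actually the one being picked up (the dev version reported later in this thread is 4.49.0.dev0), a quick check:

python -c "import transformers; print(transformers.__version__)"  # expect a 4.49 dev version or newer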

Co-authored-by: @yixqiao (UC Berkeley), @wulipc (Qwen Team)

Signed-off-by: Roger Wang <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can do one of the following:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@ywang96 ywang96 mentioned this pull request Jan 31, 2025
@DarkLight1337 DarkLight1337 self-assigned this Jan 31, 2025
Co-authored-by: Yixuan Qiao <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
@mergify mergify bot added the frontend label Jan 31, 2025
@ywang96 ywang96 mentioned this pull request Jan 31, 2025
@mergify mergify bot added the v1 label Feb 1, 2025
@mergify mergify bot added the documentation label Feb 2, 2025
Signed-off-by: DarkLight1337 <[email protected]>
@DarkLight1337 (Member)

I have updated it for you.

Signed-off-by: DarkLight1337 <[email protected]>
@jeejeelee (Collaborator)

This PR is ready for review. A few notes:

  1. To run this model, we need a new transformers release so that we can reuse the HF input processor, so for now this model will only work with pip install git+https://github.com/huggingface/transformers. I've added a note about this in the doc.
  2. I've verified that inputting fps indeed changes second_per_grid_ts, so the input processing should be aligned with Hugging Face (see the sketch after this comment). cc @ShuaiBai623
  3. I've removed embeddings as input to ease the review process, but they should be easy to add back based on the Qwen2-VL implementation. I will leave this to a future PR. cc @imkero
  4. Since this PR was iterated on top of Qwen2-VL, I'm not sure whether LoRA will work out of the box either. cc @jeejeelee

I will verify LoRA ASAP.
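
Regarding note 2 above, here is a minimal sketch of what passing fps could look like from the vLLM side. This is illustrative only: it assumes mm_processor_kwargs forwards fps to the HF processor (which then derives second_per_grid_ts), and the prompt template and dummy frames are placeholders, not taken from this PR.

import numpy as np
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    limit_mm_per_prompt={"video": 1},
    # Assumption: fps is forwarded to the HF processor and used to compute second_per_grid_ts.
    mm_processor_kwargs={"fps": 2.0},
)

# 16 dummy RGB frames; a real call would decode an actual video clip.
frames = np.zeros((16, 224, 224, 3), dtype=np.uint8)

prompt = (
    "<|im_start|>user\n<|vision_start|><|video_pad|><|vision_end|>"
    "Describe the video.<|im_end|>\n<|im_start|>assistant\n"
)

outputs = llm.generate({"prompt": prompt, "multi_modal_data": {"video": frames}})
print(outputs[0].outputs[0].text)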


mergify bot commented Feb 4, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ywang96.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 4, 2025
@mergify mergify bot removed the needs-rebase label Feb 4, 2025
@dprokhorov17

I have built vLLM from source from this branch, and I get the following error:

TypeError: Unknown image model type: qwen2_5_vl

I am serving the model as follows:

vllm serve Qwen/Qwen2.5-VL-72B-Instruct --quantization bitsandbytes --load-format bitsandbytes --pipeline_parallel_size 2 --max_model_len 10000

@DarkLight1337 (Member)

You need to install the latest code of transformers from their main branch.

@jjovalle99 commented Feb 4, 2025

You need to install the latest code of transformers from their main branch.

After doing this, is the branch already usable?

@dprokhorov17

You need to install the latest code of transformers from their main branch.

I am on

Name: transformers
Version: 4.49.0.dev0

@jjovalle99 commented Feb 4, 2025

You need to install the latest code of transformers from their main branch.

I am on

Name: transformers
Version: 4.49.0.dev0

I was able to make it work in a fresh EC2 instance with NVIDIA drivers with the following:

uv venv --python 3.12.8
source .venv/bin/activate
uv pip install "git+https://github.com/ywang96/vllm@qwen2_5_vl"
uv pip install "git+https://github.com/huggingface/transformers"
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype bfloat16

@ywang96 (Member, Author) commented Feb 4, 2025

You need to install the latest code of transformers from their main branch.

I am on

Name: transformers
Version: 4.49.0.dev0

I was able to make it work in a fresh EC2 instance with NVIDIA drivers with the following:

uv venv --python 3.12.8
source .venv/bin/activate
uv pip install "git+https://github.com/ywang96/vllm@qwen2_5_vl"
uv pip install "git+https://github.com/huggingface/transformers"
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype bfloat16

@jjovalle99 Thanks for testing this branch! I also strongly encourage you to try out our V1 re-arch (by simply specifying VLLM_USE_V1=1); the inference performance will be much better!
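
For reference, enabling V1 is just a matter of setting that environment variable on the same serve command used above:

VLLM_USE_V1=1 vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype bfloat16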

@jeejeelee (Collaborator)

There are still some minor issues with the LoRA part, which can be resolved in a separate PR later.

@pbarker commented Feb 5, 2025

@jeejeelee are these the kind of issues that prevent LoRA from working?

@ywang96 (Member, Author) commented Feb 5, 2025

Going to merge main again and kick off a fresh CI run to make sure everything looks good; then we should be able to merge this PR!

There are still some minor issues with the LoRA part, which can be resolved in a separate PR later.

Will update the doc to indicate this.

@PkuDavidGuan

You need to install the latest code of transformers from their main branch.

I am on

Name: transformers
Version: 4.49.0.dev0

I was able to make it work in a fresh EC2 instance with NVIDIA drivers with the following:

uv venv --python 3.12.8
source .venv/bin/activate
uv pip install "git+https://github.com/ywang96/vllm@qwen2_5_vl"
uv pip install "git+https://github.com/huggingface/transformers"
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype bfloat16

@jjovalle99 Thanks for testing this branch! I also strongly encourage you to try out our V1 re-arch (by simply specifying VLLM_USE_V1=1); the inference performance will be much better!

I can run inference with Qwen2.5-VL-7B using this config, but it fails for the Qwen2.5-VL-72B model. It seems the model cannot be served with tensor-parallel-size > 1. I tried two usages and neither worked.

Usage 1: command line

vllm serve Qwen/Qwen2.5-VL-72B-Instruct  --port 8000 --host 0.0.0.0 --dtype bfloat16 --tensor-parallel-size 4

Usage 2: pure Python function call

llm = LLM(
    model=model_dir,
    limit_mm_per_prompt={"image": 10, "video": 10},
    tensor_parallel_size=4,
)

@ywang96 (Member, Author) commented Feb 5, 2025

vllm serve Qwen/Qwen2.5-VL-72B-Instruct  --port 8000 --host 0.0.0.0 --dtype bfloat16 --tensor-parallel-size 4

@PkuDavidGuan Can you share the error message you get?

Edit: FWIW - I was able to run Qwen/Qwen2.5-VL-72B-Instruct with TP4 on both V0 and V1 in my environment.
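
For anyone following along, commands along these lines (an illustration assembled from this thread, not necessarily the exact invocation used) exercise TP4 on V0 and V1:

vllm serve Qwen/Qwen2.5-VL-72B-Instruct --tensor-parallel-size 4 --dtype bfloat16
VLLM_USE_V1=1 vllm serve Qwen/Qwen2.5-VL-72B-Instruct --tensor-parallel-size 4 --dtype bfloat16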

@rstone3017

You need to install the latest code of transformers from their main branch.

I am on

Name: transformers
Version: 4.49.0.dev0

I was able to make it work in a fresh EC2 instance with NVIDIA drivers with the following:

uv venv --python 3.12.8
source .venv/bin/activate
uv pip install "git+https://github.com/ywang96/vllm@qwen2_5_vl"
uv pip install "git+https://github.com/huggingface/transformers"
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype bfloat16

I am not able to run this on my machine (A100-80GB).
Commands to replicate:

paperspace@psv5mz18va6k:~/venv$ uv venv --python 3.12.8 qwen25
Using CPython 3.12.8
Creating virtual environment at: qwen25
Activate with: source qwen25/bin/activate

paperspace@psv5mz18va6k:~/venv$ source qwen25/bin/activate

(qwen25) paperspace@psv5mz18va6k:~/venv$ uv pip install "git+https://github.com/huggingface/transformers"
Using Python 3.12.8 environment at: qwen25
Resolved 17 packages in 475ms
Installed 17 packages in 28ms

(qwen25) paperspace@psv5mz18va6k:~/venv$ uv pip install "git+https://github.com/ywang96/vllm@qwen2_5_vl"
Using Python 3.12.8 environment at: qwen25

error: The build backend returned an error
Caused by: Call to setuptools.build_meta.build_wheel failed (exit status: 1)

[stderr]
Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "/home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=[])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
    self.run_setup()
  File "/home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/setuptools/build_meta.py", line 320, in run_setup
    exec(code, locals())
  File "<string>", line 14, in <module>
  File "/home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/torch/__init__.py", line 367, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12

hint: This usually indicates a problem with the package or the build environment.


Additional information:
(qwen25) paperspace@psv5mz18va6k:~/venv$ nvidia-smi
Wed Feb 5 07:30:03 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:00:05.0 Off | 0 |
| N/A 36C P0 52W / 500W | 148MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1934 G /usr/lib/xorg/Xorg 60MiB |
| 0 N/A N/A 2067 G /usr/bin/gnome-shell 78MiB |


I am able to run Qwen2.5-VL 7B inference using HF, but not with the vLLM branch above.

@ywang96 (Member, Author) commented Feb 5, 2025

@rstone3017 I don't think your issue is with this PR; rather, something is wrong with your local environment in particular:

error: The build backend returned an error
Caused by: Call to setuptools.build_meta.build_wheel failed (exit status: 1)
...
ImportError: /home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12

Labels
documentation · frontend · ready · v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[New Model]: Qwen2.5-VL