
[VLM] Qwen2.5-VL #12604

Open
ywang96 wants to merge 54 commits into main from ywang96:qwen2_5_vl

Conversation

@ywang96 (Member) commented Jan 31, 2025

FIXES: #12486, #12532

TODO:

To run this model before the transformers 4.49 release, install transformers from source:
pip install git+https://github.com/huggingface/transformers
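
To confirm that the source install is actually the one being picked up (the dev version reported later in this thread is 4.49.0.dev0), a quick check:

python -c "import transformers; print(transformers.__version__)"  # expect a 4.49 dev version or newer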

Co-authored-by: @yixqiao (UC Berkeley), @wulipc (Qwen Team)

Signed-off-by: Roger Wang <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can do one of the following:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@ywang96 ywang96 mentioned this pull request Jan 31, 2025
@DarkLight1337 DarkLight1337 self-assigned this Jan 31, 2025
Co-authored-by: Yixuan Qiao <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
@mergify mergify bot added the frontend label Jan 31, 2025
@ywang96 ywang96 mentioned this pull request Jan 31, 2025
@mergify mergify bot added the v1 label Feb 1, 2025
@mergify mergify bot added the documentation label Feb 2, 2025
Signed-off-by: DarkLight1337 <[email protected]>
@DarkLight1337 (Member)

I have updated it for you.

Signed-off-by: DarkLight1337 <[email protected]>
@jeejeelee (Collaborator)

This PR is ready for review. A few notes:

  1. To run this model, we need a new transformers release so that we can reuse the HF input processor, so for now this model will only work with pip install git+https://github.com/huggingface/transformers. I've added a note about this in the doc.
  2. I've verified that inputting fps indeed changes second_per_grid_ts, so the input processing should be aligned with Hugging Face (see the sketch after this comment). cc @ShuaiBai623
  3. I've removed embeddings as input to ease the review process, but they should be easy to add back based on the Qwen2-VL implementation. I will leave this to a future PR. cc @imkero
  4. Since this PR was iterated on top of Qwen2-VL, I'm not sure whether LoRA will work out of the box either. cc @jeejeelee

I will verify LoRA ASAP.
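
Regarding note 2 above, here is a minimal sketch of what passing fps could look like from the vLLM side. This is illustrative only: it assumes mm_processor_kwargs forwards fps to the HF processor (which then derives second_per_grid_ts), and the prompt template and dummy frames are placeholders, not taken from this PR.

import numpy as np
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    limit_mm_per_prompt={"video": 1},
    # Assumption: fps is forwarded to the HF processor and used to compute second_per_grid_ts.
    mm_processor_kwargs={"fps": 2.0},
)

# 16 dummy RGB frames; a real call would decode an actual video clip.
frames = np.zeros((16, 224, 224, 3), dtype=np.uint8)

prompt = (
    "<|im_start|>user\n<|vision_start|><|video_pad|><|vision_end|>"
    "Describe the video.<|im_end|>\n<|im_start|>assistant\n"
)

outputs = llm.generate({"prompt": prompt, "multi_modal_data": {"video": frames}})
print(outputs[0].outputs[0].text)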


mergify bot commented Feb 4, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ywang96.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 4, 2025
@mergify mergify bot removed the needs-rebase label Feb 4, 2025
@dprokhorov17

I have built vLLM from source from this branch, and I get the following error:

TypeError: Unknown image model type: qwen2_5_vl

I am serving the model as follows:

vllm serve Qwen/Qwen2.5-VL-72B-Instruct --quantization bitsandbytes --load-format bitsandbytes --pipeline_parallel_size 2 --max_model_len 10000

@DarkLight1337 (Member)

You need to install the latest code of transformers from their main branch.

@jjovalle99 commented Feb 4, 2025

You need to install the latest code of transformers from their main branch.

After doing this, is the branch already usable?

@dprokhorov17

You need to install the latest code of transformers from their main branch.

I am on

Name: transformers
Version: 4.49.0.dev0

@jjovalle99 commented Feb 4, 2025

You need to install the latest code of transformers from their main branch.

I am on

Name: transformers
Version: 4.49.0.dev0

I was able to make it work in a fresh EC2 instance with NVIDIA drivers with the following:

uv venv --python 3.12.8
source .venv/bin/activate
uv pip install "git+https://github.com/ywang96/vllm@qwen2_5_vl"
uv pip install "git+https://github.com/huggingface/transformers"
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype bfloat16

@ywang96 (Member, Author) commented Feb 4, 2025

You need to install the latest code of transformers from their main branch.

I am on

Name: transformers
Version: 4.49.0.dev0

I was able to make it work in a fresh EC2 instance with NVIDIA drivers with the following:

uv venv --python 3.12.8
source .venv/bin/activate
uv pip install "git+https://github.com/ywang96/vllm@qwen2_5_vl"
uv pip install "git+https://github.com/huggingface/transformers"
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype bfloat16

@jjovalle99 Thanks for testing this branch! I also strongly encourage you to try out our V1 re-arch (by simply specifying VLLM_USE_V1=1); the inference performance will be much better!
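
For reference, enabling V1 is just a matter of setting that environment variable on the same serve command used above:

VLLM_USE_V1=1 vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype bfloat16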

@jeejeelee (Collaborator)

There are still some minor issues with the LoRA part, which can be resolved in a separate PR later.

@pbarker commented Feb 5, 2025

@jeejeelee are these the kind of issues that prevent LoRA from working?

@ywang96 (Member, Author) commented Feb 5, 2025

Going to merge main again and kick off a fresh CI run to make sure everything looks good; then we should be able to merge this PR!

There are still some minor issues with the LoRA part, which can be resolved in a separate PR later.

Will update the doc to indicate this.

@PkuDavidGuan

You need to install the latest code of transformers from their main branch.

I am on

Name: transformers
Version: 4.49.0.dev0

I was able to make it work in a fresh EC2 instance with NVIDIA drivers with the following:

uv venv --python 3.12.8
source .venv/bin/activate
uv pip install "git+https://github.com/ywang96/vllm@qwen2_5_vl"
uv pip install "git+https://github.com/huggingface/transformers"
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype bfloat16

@jjovalle99 Thanks for testing this branch! I also strongly encourage you to try out our V1 re-arch (by simply specifying VLLM_USE_V1=1); the inference performance will be much better!

I can run inference with Qwen2.5-VL-7B using this config, but it fails for the Qwen2.5-VL-72B model. It seems the model cannot be served with tensor-parallel-size > 1. I tried two usages and neither worked.

Usage 1: command line

vllm serve Qwen/Qwen2.5-VL-72B-Instruct  --port 8000 --host 0.0.0.0 --dtype bfloat16 --tensor-parallel-size 4

Usage 2: pure Python function call

llm = LLM(
    model=model_dir,
    limit_mm_per_prompt={"image": 10, "video": 10},
    tensor_parallel_size=4,
)

@ywang96 (Member, Author) commented Feb 5, 2025

vllm serve Qwen/Qwen2.5-VL-72B-Instruct  --port 8000 --host 0.0.0.0 --dtype bfloat16 --tensor-parallel-size 4

@PkuDavidGuan Can you share the error message you get?

Edit: FWIW - I was able to run Qwen/Qwen2.5-VL-72B-Instruct with TP4 on both V0 and V1 in my environment.
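
For anyone following along, commands along these lines (an illustration assembled from this thread, not necessarily the exact invocation used) exercise TP4 on V0 and V1:

vllm serve Qwen/Qwen2.5-VL-72B-Instruct --tensor-parallel-size 4 --dtype bfloat16
VLLM_USE_V1=1 vllm serve Qwen/Qwen2.5-VL-72B-Instruct --tensor-parallel-size 4 --dtype bfloat16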

@rstone3017

You need to install the latest code of transformers from their main branch.

I am on

Name: transformers
Version: 4.49.0.dev0

I was able to make it work in a fresh EC2 instance with NVIDIA drivers with the following:

uv venv --python 3.12.8
source .venv/bin/activate
uv pip install "git+https://github.com/ywang96/vllm@qwen2_5_vl"
uv pip install "git+https://github.com/huggingface/transformers"
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype bfloat16

I am not able to run this on my machine (A100-80GB).
Commands to replicate:

paperspace@psv5mz18va6k:~/venv$ uv venv --python 3.12.8 qwen25
Using CPython 3.12.8
Creating virtual environment at: qwen25
Activate with: source qwen25/bin/activate

paperspace@psv5mz18va6k:~/venv$ source qwen25/bin/activate

(qwen25) paperspace@psv5mz18va6k:~/venv$ uv pip install "git+https://github.com/huggingface/transformers"
Using Python 3.12.8 environment at: qwen25
Resolved 17 packages in 475ms
Installed 17 packages in 28ms

(qwen25) paperspace@psv5mz18va6k:~/venv$ uv pip install "git+https://github.com/ywang96/vllm@qwen2_5_vl"
Using Python 3.12.8 environment at: qwen25

error: The build backend returned an error
Caused by: Call to setuptools.build_meta.build_wheel failed (exit status: 1)

[stderr]
Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "/home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=[])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
    self.run_setup()
  File "/home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/setuptools/build_meta.py", line 320, in run_setup
    exec(code, locals())
  File "<string>", line 14, in <module>
  File "/home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/torch/__init__.py", line 367, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12

hint: This usually indicates a problem with the package or the build environment.


Additional information:
(qwen25) paperspace@psv5mz18va6k:~/venv$ nvidia-smi
Wed Feb 5 07:30:03 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:00:05.0 Off | 0 |
| N/A 36C P0 52W / 500W | 148MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1934 G /usr/lib/xorg/Xorg 60MiB |
| 0 N/A N/A 2067 G /usr/bin/gnome-shell 78MiB |


I am able to run Qwen2.5-VL 7B inference using HF, but not with the vLLM branch above.

@ywang96 (Member, Author) commented Feb 5, 2025

@rstone3017 I don't think your issue is with this PR; rather, something is wrong with your local environment in particular:

error: The build backend returned an error
Caused by: Call to setuptools.build_meta.build_wheel failed (exit status: 1)
...
ImportError: /home/paperspace/.cache/uv/builds-v0/.tmp6i0wFU/lib/python3.12/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12

Labels
documentation · frontend · ready · v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[New Model]: Qwen2.5-VL