Is it normal that Qwen/Qwen2.5-VL-7B-Instruct takes nearly 24 GB of GPU memory? #16133
Closed · ChenZhongPu announced in Q&A · Replies: 1 comment, 1 reply
-
You should factor in the KV cache.
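The bf16 weights alone are on the order of 14–16 GB, but by default vLLM also pre-allocates most of the remaining GPU memory for the KV cache: `gpu_memory_utilization` defaults to 0.9, so on a 24 GB card the process will reserve close to 24 GB regardless of the weight size. A minimal sketch of how you could shrink that reservation (the flag values below are illustrative, not tuned recommendations):

```bash
# Cap the fraction of GPU memory vLLM reserves and limit the maximum
# context length, so less space is set aside for the KV cache.
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
  --gpu-memory-utilization 0.7 \
  --max-model-len 8192
```

Lowering `--gpu-memory-utilization` caps how much memory vLLM grabs up front, and a smaller `--max-model-len` reduces the KV-cache space needed per sequence; the trade-off is fewer concurrent requests and shorter contexts.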
-
I used the command `vllm serve Qwen/Qwen2.5-VL-7B-Instruct`, whose config has `"torch_dtype": "bfloat16"`. But it takes nearly 24 GB of GPU memory. Is that normal? In my understanding, it should only take about 14 GB (roughly 7B parameters × 2 bytes each in bf16).