How to offload mmproj to a specific GPU (other than main GPU 0)? #16984
Hello, I have two GPUs, one with much more compute power and memory capacity than the other, and I'd like to run Qwen3-VL 8B with as much context as I can. Ideally, I'd load the language model on the main GPU (GPU 0) and the vision part (mmproj) on the second GPU (GPU 1). I couldn't find anything in the documentation on how to do this. I looked into the tensor-override option, but couldn't find much information on whether it works on the mmproj, and even if it did, I couldn't figure it out anyway. Has anyone figured out how to do this? It would let me run the model as fast as my hardware allows, with the largest context window my hardware can support.
Replies: 1 comment 1 reply
Try setting the environment variable:

MTMD_BACKEND_DEVICE=CUDA1
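A sketch of what the full invocation might look like with llama-server (the GGUF file names here are placeholders, not from the thread; adjust paths and flags for your setup):

```shell
# MTMD_BACKEND_DEVICE=CUDA1 asks the multimodal (mtmd) code to place the
# vision projector on the second CUDA device, while the language model is
# offloaded as usual, with GPU 0 as the main GPU.
MTMD_BACKEND_DEVICE=CUDA1 llama-server \
  -m Qwen3-VL-8B-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-Qwen3-VL-8B-Instruct-F16.gguf \
  --main-gpu 0 \
  -ngl 99
```

You can confirm where the projector landed by checking the device reported in the mtmd/clip loading lines of the server log at startup.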