How to offload mmproj to a specific GPU (other than main GPU 0)? #16984
Hello, I have two GPUs, one with much more compute power and memory capacity than the other, and I'd like to run Qwen3-VL 8B with as much context as I can. Ideally, I'd load the language model on the main GPU (GPU 0) and the vision part (mmproj) on the second GPU (GPU 1). I couldn't find anything in the documentation on how to do this. I looked into the tensor-override option, but couldn't find much information on whether it works on the mmproj, and even if it did, I couldn't figure it out anyway. Has anyone figured out how to do this? It would let me run the model as fast as my hardware allows, with the largest context window my hardware can support.
Replies: 1 comment 1 reply
Try setting the environment variable:

MTMD_BACKEND_DEVICE=CUDA1
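A sketch of what the full invocation might look like with llama-server (the GGUF file names here are placeholders, not from the thread; adjust paths and flags for your setup):

```shell
# MTMD_BACKEND_DEVICE=CUDA1 asks the multimodal (mtmd) code to place the
# vision projector on the second CUDA device, while the language model is
# offloaded as usual, with GPU 0 as the main GPU.
MTMD_BACKEND_DEVICE=CUDA1 llama-server \
  -m Qwen3-VL-8B-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-Qwen3-VL-8B-Instruct-F16.gguf \
  --main-gpu 0 \
  -ngl 99
```

You can confirm where the projector landed by checking the device reported in the mtmd/clip loading lines of the server log at startup.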