RPC with tensor override #17075
Unanswered · yggdrasil75 asked this question in Q&A
I have a system with two 3090s and a system with a 395+. The 395+ doesn't have PCIe slots, but it runs Vulkan at decent speeds.

I want to run a larger MoE split across the two systems. With GLM Air I'm getting 191 t/s pp and 17 t/s tg, which is quite good for a 106B model at Q6_K.

I figured I could do better by using the 3090s as CUDA devices and putting ffn_up on the 395+, but that falls apart: I get 3 t/s pp and 4 t/s tg.
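For context, the failing run looked something like this (a reconstruction, not my exact command: the model path and address are placeholders, and I'm assuming the RPC buffer type is addressed by the `RPC[host:port]` name that `--list-devices` reports):

```sh
# Placeholder path/address. The 3090s are local CUDA0/CUDA1; the 395+ runs
# rpc-server, and every tensor whose name contains "ffn_up" is forced onto it.
llama-cli -m glm-air-q6_k.gguf \
  --rpc 192.168.1.50:50052 \
  -ngl 99 \
  -ot 'ffn_up=RPC[192.168.1.50:50052]'
```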
Is it possible to offload the tensors by layer per device? I.e., layers 1-14 on rpc0 with the ffn_up tensors from those layers in rpc0's system RAM, layers 15-28 on rpc1 with those layers' ffn_up in rpc1's system RAM, and the remaining layers entirely on vulkan0?
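The sketch below is the kind of per-layer `--override-tensor` mapping I have in mind, using regexes against the `blk.N` tensor names (illustrative only: the addresses are made up, the blk indices are 0-based, and I don't know whether an RPC host's system RAM can be targeted separately from its GPU buffer at all — that's essentially the question):

```sh
# Hypothetical: ffn_up for blk.0-blk.13 pinned to rpc0, blk.14-blk.27 to rpc1;
# everything not overridden is placed by the normal offload logic.
llama-cli -m glm-air-q6_k.gguf \
  --rpc 10.0.0.1:50052,10.0.0.2:50052 \
  -ngl 99 \
  -ot 'blk\.([0-9]|1[0-3])\.ffn_up=RPC[10.0.0.1:50052]' \
  -ot 'blk\.(1[4-9]|2[0-7])\.ffn_up=RPC[10.0.0.2:50052]'
```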