I am trying to run a model with llama.cpp on a Jetson Orin Nano, but I get an OOM error every time I try to run the full model.
I used llama.cpp at commit 03792ad, built with:
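(The exact build flags are not shown above; a typical CUDA-enabled build of llama.cpp for a Jetson Orin, assuming the Orin GPU's compute capability 8.7, looks roughly like this:)

```shell
# Sketch of a CUDA build for Jetson Orin (sm_87).
# The original build flags were not captured, so treat this as an assumption.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=87
cmake --build build --config Release -j
```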
Then, I tested llama-cli with:
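(The invocation itself was not captured either; a minimal llama-cli run that offloads all 31 layers to the GPU would look something like this, where the model path is a placeholder:)

```shell
# Hypothetical invocation: the model filename is a placeholder;
# -ngl 31 asks llama.cpp to offload all 31 layers to the GPU.
./build/bin/llama-cli -m models/gemma-3n.gguf -ngl 31 -p "Hello"
```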
Using the whole model (31 layers) gave me this output:
As you can see, the device reports `device CUDA0 (Orin) (0000:00:00.0) - 6687 MiB free`, and the model should only require 3573.76 MiB. If someone has run into the same problem, could you guide me through it? Also, I tried jetson-containers, but it seems outdated (I got a "no gemma-3n model available" error).

BTW, I also tried running with `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 llama-cli` but got the same error. (I'm not sure whether unified memory is actually working; I have 128 GB of swap.)