LocalAI version:
v4.1.3 (fdc9f7b)
Environment, CPU architecture, OS, and Version:
Docker, X64, Ubuntu 24.04
Describe the bug
(I'm scripting an install to deploy to multiple people.)
If you curl the backends endpoint to install opus (which is required for realtime), and opus loads, then any LLM model you subsequently script that would normally use llama.cpp tries to use opus instead. The model is never assigned llama.cpp, so the backend it actually requires is never downloaded.
To Reproduce
```shell
# Install the opus backend (required for realtime)
payload='{"id":"opus"}'
response="$(curl -fsS -X POST "${LOCALAI_BASE_URL}/backends/apply" \
  "${AUTH_HEADER[@]}" \
  -H "Content-Type: application/json" \
  -d "${payload}")"

# Apply an LLM model that would normally use llama.cpp
payload='{"id":"Qwen3.5-9b"}'
response="$(curl -fsS -X POST "${LOCALAI_BASE_URL}/models/apply" \
  "${AUTH_HEADER[@]}" \
  -H "Content-Type: application/json" \
  -d "${payload}")"
```
Expected behavior
The model should use the proper backend from its default configuration.
One would also expect that, if a model supports multiple backends (e.g. llama.cpp and vllm-omni), you could choose the backend as part of the request payload.
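As a possible workaround until this is addressed, pinning the backend explicitly in the model's YAML configuration may avoid the fallback. This is only a sketch: the filename, model name, and the exact backend identifier (`llama-cpp` here) are assumptions to verify against your LocalAI installation's docs and installed backends.

```yaml
# Hypothetical model config (e.g. models/qwen3.5-9b.yaml) that pins the
# backend explicitly so the loader does not fall back to the
# last-installed backend (opus).
name: Qwen3.5-9b
backend: llama-cpp   # assumed backend identifier; check your installed backends
parameters:
  model: Qwen3.5-9b  # assumed model reference; adjust to your local file/gallery id
```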
Logs
N/A.