-
I solved the problem: there was a difference in the model name when running locally vs. on Hugging Face.
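In case it helps anyone else, the quickest check is to compare the "id" the server reports against the "model" field sent in the request:

curl http://localhost:18888/v1/models

The "id" in the JSON response has to match the "model" field of the chat request exactly, otherwise the POST to /v1/chat/completions fails with exactly this kind of 404.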
-
I'm trying to start an OpenAI-compatible server using the command:
sudo docker run --runtime nvidia --gpus all -v /root/.cache/huggingface -p 18888:18888 vllm/vllm-openai --model TheBloke/openchat-3.5-0106-AWQ --host 0.0.0.0 --enforce-eager --port 18888
But when I try to make a request, I get the error:
"POST /v1/chat/completions HTTP/1.1" 404 Not Found
Doing
wget localhost:18888/v1/models
works, and I get: "GET /v1/models HTTP/1.1" 200 OK
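For reference, the request I'm making looks roughly like this:

curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "TheBloke/openchat-3.5-0106-AWQ", "messages": [{"role": "user", "content": "Hello"}]}'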
If I run
/usr/bin/python3 -m ochat.serving.openai_api_server --model TheBloke/openchat-3.5-0106-AWQ --host 0.0.0.0
the request works. I wonder if I'm making a mistake in how I use Docker, or whether the container is setting up something other than the OpenAI-compatible endpoint.
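(For reference, the vLLM docs use an explicit host:container mapping for the cache volume, i.e.:

sudo docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 18888:18888 vllm/vllm-openai \
  --model TheBloke/openchat-3.5-0106-AWQ --host 0.0.0.0 --enforce-eager --port 18888

so I'm not sure if my -v /root/.cache/huggingface flag mounts the host cache at all.)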