Replies: 1 comment
-
I did not see a way to set truncate_prompt_tokens at server start time. I would consider submitting a feature request, or modifying the code myself for my container.
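For context, here is a minimal sketch of where the parameter does exist today: as a per-request field on SamplingParams in the offline Python API, rather than as a launch flag. This is only an illustration; the model name, tensor_parallel_size and token limit are placeholders, not anything from your setup.

```python
# Sketch (assumption): truncate_prompt_tokens is a per-request SamplingParams
# field in vLLM's offline Python API, not a server start-up option.
from vllm import LLM, SamplingParams

# Placeholder model / parallelism values for illustration only.
llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct", tensor_parallel_size=4)

params = SamplingParams(
    max_tokens=256,
    truncate_prompt_tokens=8000,  # keep only the last 8000 prompt tokens
)

outputs = llm.generate(["A very long prompt ..."], params)
print(outputs[0].outputs[0].text)
```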
-
Hi,
I am using the following docker-compose for deploying my Llama 3.1 70B model with vLLM:
I would like the backend to truncate any incoming prompt that overflows the context window, e.g. by being able to use `truncate_prompt_tokens` from SamplingParams. I am using this with Open-WebUI, so truncating on the front end is not an option for me. Thank you!
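To illustrate the behaviour I am after, here is a rough sketch of sending `truncate_prompt_tokens` per request to the OpenAI-compatible endpoint via `extra_body`. This assumes the server accepts that field on chat completion requests; the base URL, model name and token limit are placeholders. Since Open-WebUI does not let me attach such fields, I would like this applied on the server side instead.

```python
# Sketch (assumption): passing truncate_prompt_tokens per request through the
# OpenAI-compatible API. Open-WebUI cannot add this field, hence the request
# for a server-side option.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "A very long prompt ..."}],
    # vLLM-specific extension field; keep only the last 8000 prompt tokens.
    extra_body={"truncate_prompt_tokens": 8000},
)
print(response.choices[0].message.content)
```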