How to increase context length and make things work #3823
kresimirfijacko asked this question in Q&A (Unanswered)
I am struggling to understand KV cache handling and the limits on concurrent requests.
For example, I am using Qwen--Qwen1.5-72B-Chat-GPTQ-Int4 on an H100 80GB instance.
I've tried vLLM v0.3.3 and v0.4.0, and I am seeing behaviour I don't quite understand:
I didn't specify --max-num-seqs, so it should be the default of 256.
That is also something I don't understand: even with smaller 7B models, running requests are always capped at 100.
With this Qwen 72B model the number of concurrent requests is also limited to 100, and as requests are processed, KV cache usage climbs to 99%; even though there are pending requests, they are not scheduled until KV cache usage drops.
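For reference, here is a minimal sketch of the kind of launch I mean, using the Python LLM entry point. The Hugging Face model id and the explicitly spelled-out defaults are my assumptions; the real deployment may use the OpenAI-compatible server with the equivalent flags.

```python
from vllm import LLM

# Minimal sketch of the setup described above (assumptions: Hugging Face id
# for the model, single GPU, everything else left at documented defaults,
# i.e. the same as not passing --max-num-seqs at all).
llm = LLM(
    model="Qwen/Qwen1.5-72B-Chat-GPTQ-Int4",
    quantization="gptq",
    max_num_seqs=256,             # documented default, yet only ~100 requests ever run
    gpu_memory_utilization=0.90,  # documented default
)
```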
What I don't understand is how to actually increase the context length.
Example:
ValueError: The model's max seq len (16384) is larger than the maximum number of tokens that can be stored in KV cache (11648). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.

I understand there is a memory limit, but I was reasoning from this formula:
2 (K,V) * precision * hidden_layers * hidden_size * seq_len * batch_size

(the model-dependent factors are fixed; only seq_len and batch_size can be varied)
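To make that concrete, here is a quick back-of-the-envelope sketch. The layer count and hidden size are assumptions based on Qwen1.5-72B's published config, the KV cache is assumed to be fp16 (GPTQ quantizes the weights, not the cache), and the 11648 figure is taken from the error above:

```python
# Rough KV-cache arithmetic for the formula above (assumed values).
bytes_per_value = 2      # fp16 KV cache
num_hidden_layers = 80   # assumed from Qwen1.5-72B config.json
hidden_size = 8192       # assumed from Qwen1.5-72B config.json

# 2 (K and V) * precision * hidden_layers * hidden_size = bytes per cached token
bytes_per_token = 2 * bytes_per_value * num_hidden_layers * hidden_size
print(bytes_per_token / 2**20)           # 2.5 MiB per token

# The error says 11648 tokens fit in the KV cache, i.e. roughly:
print(11648 * bytes_per_token / 2**30)   # ~28.4 GiB of KV cache
```

That ~28 GiB seems consistent with 0.9 * 80 GB minus roughly 40 GB of GPTQ weights and some activation headroom, which is presumably how the engine arrives at 11648.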
I was thinking I could trade seq_len against batch_size, turning (8192 * 256) into something like (16384 * 128); in other words, to get a bigger context size I would reduce the batch size (sketched below). So far my experiments haven't given any result; is this the right approach?
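Concretely, this is the kind of change I have been experimenting with (a sketch only; the values are illustrative and the model id is the assumed Hugging Face one):

```python
from vllm import LLM

# Sketch of the trade-off described above: fewer concurrent sequences plus a
# higher gpu_memory_utilization, hoping the KV cache then covers 16384 tokens.
llm = LLM(
    model="Qwen/Qwen1.5-72B-Chat-GPTQ-Int4",
    quantization="gptq",
    max_model_len=16384,           # the context length I actually want
    max_num_seqs=128,              # half the default 256
    gpu_memory_utilization=0.95,   # up from the 0.90 default
)
```

The equivalent OpenAI-server flags would be --max-model-len, --max-num-seqs and --gpu-memory-utilization.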