
[Feature] Support setting an absolute value for GPU memory usage #3804

@lh9171338

Description

Motivation

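The cache_max_entry_count parameter sets the fraction of the remaining free GPU memory that is used for the KV cache. When several models are launched on the same GPU, the amount of free GPU memory changes depending on what is already running. This can cause some models to fail at startup, and it can also cause inference to fail later because the memory allocated to the KV cache is too small.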
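To make the failure mode concrete, here is a toy Python simulation. It is not lmdeploy code. It assumes, as the motivation states, that the KV cache budget is cache_max_entry_count times the GPU memory still free after a model's weights are loaded. The `Gpu` class, the `launch_model` function, the `absolute_gib` option, and the 80 GiB / 8 GiB figures are all hypothetical and only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gpu:
    """Toy model of one GPU; tracks free memory in GiB."""
    free_gib: float


def launch_model(gpu: Gpu, weights_gib: float,
                 ratio: Optional[float] = None,
                 absolute_gib: Optional[float] = None) -> float:
    """Load weights, then reserve a KV cache; returns the KV cache size in GiB.

    ratio mimics cache_max_entry_count (a fraction of the memory free after
    the weights are loaded); absolute_gib is the hypothetical fixed-size
    option requested in this issue.
    """
    if weights_gib > gpu.free_gib:
        raise MemoryError("not enough free GPU memory to load the weights")
    gpu.free_gib -= weights_gib
    if absolute_gib is not None:
        # Fixed budget: independent of what other processes already hold,
        # and it fails fast at startup if it cannot be satisfied.
        if absolute_gib > gpu.free_gib:
            raise MemoryError("requested KV cache does not fit")
        kv_gib = absolute_gib
    elif ratio is not None:
        # Ratio budget: depends on how much memory happens to be free now.
        kv_gib = gpu.free_gib * ratio
    else:
        raise ValueError("set either ratio or absolute_gib")
    gpu.free_gib -= kv_gib
    return kv_gib


if __name__ == "__main__":
    # Two identical 8 GiB models on an 80 GiB GPU, ratio 0.5.
    gpu = Gpu(free_gib=80.0)
    first = launch_model(gpu, 8.0, ratio=0.5)   # 0.5 * (80 - 8) = 36.0
    second = launch_model(gpu, 8.0, ratio=0.5)  # 0.5 * (36 - 8) = 14.0
    print(first, second)  # 36.0 14.0

    # Same models with a hypothetical absolute budget of 20 GiB each.
    gpu = Gpu(free_gib=80.0)
    print(launch_model(gpu, 8.0, absolute_gib=20.0),
          launch_model(gpu, 8.0, absolute_gib=20.0))  # 20.0 20.0
```

With the same ratio, the model launched first gets 36 GiB of KV cache and the second only 14 GiB, so each model's capacity depends on launch order. A fixed budget gives both models the same 20 GiB and surfaces any shortage as a startup error rather than a failure during inference.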

Related resources

No response

Additional context

No response
