server: is there a way to get peak actual kv cache size? #15353

okuvshynov · 2025-08-15T19:02:34Z


Good afternoon,

Is there a way to get peak KV cache usage? I would like to understand how large the KV cache needs to be for my realistic workloads.

There's a mention of llamacpp:kv_cache_tokens in the server docs, but it doesn't seem to be exported at the moment.

If there's currently no way, would it be OK to add tracking of the peak size and export it to /metrics?

Thank you!
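
Until something like that exists server-side, a client could approximate the peak by scraping /metrics periodically and keeping the maximum itself. Below is a minimal sketch of that idea, assuming the value is exposed as an unlabeled gauge in Prometheus text format. The metric name is the one mentioned in the docs; the sample scrapes are made up for illustration, and `read_metric` and `PeakTracker` are hypothetical helpers, not part of llama.cpp.

```python
from typing import Optional


def read_metric(exposition: str, name: str) -> Optional[float]:
    """Return the first sample value for `name` in Prometheus text format, or None."""
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        # Only handles unlabeled samples of the form "<name> <value>".
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


class PeakTracker:
    """Remembers the largest value observed so far."""

    def __init__(self) -> None:
        self.peak: Optional[float] = None

    def observe(self, value: Optional[float]) -> None:
        if value is not None and (self.peak is None or value > self.peak):
            self.peak = value


# Made-up scrapes; real ones would be the body of GET /metrics on the server.
scrapes = [
    "# TYPE llamacpp:kv_cache_tokens gauge\nllamacpp:kv_cache_tokens 512\n",
    "llamacpp:kv_cache_tokens 2048\n",
    "llamacpp:kv_cache_tokens 1024\n",
]

tracker = PeakTracker()
for text in scrapes:
    tracker.observe(read_metric(text, "llamacpp:kv_cache_tokens"))

print(tracker.peak)
```

Polling like this can miss a spike that happens between two scrapes, which is the main reason tracking the peak inside the server and exporting it would be more reliable.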

Answered by okuvshynov

okuvshynov · 2025-08-16T14:46:37Z

Is #15361 something we could add?

1 reply

Merged