server: is there a way to get peak actual kv cache size? #15353
Good afternoon, is there a way to get peak KV cache usage? I would like to understand how large the KV cache needs to be for my realistic workloads. The server docs mention llamacpp:kv_cache_tokens, but it doesn't seem to be exported at the moment. If there is currently no way, would it be OK to add tracking of the peak size and export it? Thank you!
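To illustrate what is being asked for, here is a rough client-side sketch: sample Prometheus-style metrics text periodically and keep the maximum value seen. It assumes a gauge named llamacpp:kv_cache_tokens is actually exported, which does not seem to be the case at the moment. The sample text and the helper names (read_gauge, PeakTracker) are hypothetical and not part of the server.

```python
from typing import Optional


def read_gauge(metrics_text: str, name: str) -> Optional[float]:
    """Return the value of an unlabeled metric from Prometheus text format, or None."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


class PeakTracker:
    """Keeps the largest value observed across samples."""

    def __init__(self) -> None:
        self.peak: Optional[float] = None

    def update(self, value: Optional[float]) -> None:
        # Ignore samples where the metric was missing.
        if value is None:
            return
        if self.peak is None or value > self.peak:
            self.peak = value


# Hypothetical scrapes; a real setup would fetch this text from the server
# at a fixed interval while the workload runs.
samples = [
    "# TYPE llamacpp:kv_cache_tokens gauge\nllamacpp:kv_cache_tokens 512\n",
    "llamacpp:kv_cache_tokens 2048\n",
    "llamacpp:kv_cache_tokens 1024\n",
]

tracker = PeakTracker()
for text in samples:
    tracker.update(read_gauge(text, "llamacpp:kv_cache_tokens"))

print(tracker.peak)
```

Polling like this can miss a peak that occurs between two samples, which is why tracking the peak inside the server would be more reliable.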
Answered by
okuvshynov
Aug 16, 2025
#15361 is this something we could add?
Answer selected by
okuvshynov