[V1][Metrics] Add GPU prefix cache hit rate % gauge #12592

comaniac · 2025-01-30T23:08:43Z

github-actions · 2025-01-30T23:08:54Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

markmc

Thanks Cody!

vllm/v1/core/kv_cache_manager.py

vllm/v1/metrics/loggers.py

vllm/v1/core/scheduler.py

comaniac · 2025-01-31T19:37:38Z

On second thought, I feel it might be easier to simply maintain the cache hit rate of the most recent N (e.g., 1k) requests. This aligns better with the logging flow, because we take a snapshot of the scheduler stats in every step. It also makes more sense to me because the cache hit rate isn't associated with time, only with requests. WDYT?
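
For illustration, the "most recent N requests" idea could look roughly like the sketch below. This is not the code in this PR: the class name `RecentRequestsHitRate`, the `observe` method, and the per-request `(queries, hits)` token counts are all hypothetical names chosen for the example. It keeps a bounded window of per-request counts plus running totals, so reading the hit rate at each logging step is cheap.

```python
from collections import deque


class RecentRequestsHitRate:
    """Hypothetical helper: hit rate over the most recent N requests."""

    def __init__(self, max_recent_requests: int = 1000) -> None:
        self.max_recent_requests = max_recent_requests
        # Each entry is (queried_tokens, hit_tokens) for one request.
        self._window = deque()
        # Running totals over the window, so hit_rate is O(1) to read.
        self._queries = 0
        self._hits = 0

    def observe(self, queries: int, hits: int) -> None:
        """Record one request and evict requests that fall out of the window."""
        self._window.append((queries, hits))
        self._queries += queries
        self._hits += hits
        while len(self._window) > self.max_recent_requests:
            old_queries, old_hits = self._window.popleft()
            self._queries -= old_queries
            self._hits -= old_hits

    @property
    def hit_rate(self) -> float:
        """Fraction in [0, 1]; 0.0 when nothing has been queried yet."""
        if self._queries == 0:
            return 0.0
        return self._hits / self._queries
```

The property returns a fraction, so a "%" gauge would need to multiply by 100 (or document the unit). Because eviction is by request count rather than by timestamp, the value does not drift while the server is idle, which matches the point that the hit rate depends on requests and not on time.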

comaniac · 2025-02-04T16:53:38Z

Gentle ping @markmc

Signed-off-by: Cody Yu <[email protected]>

comaniac requested review from DarkLight1337, robertgshaw2-redhat, simon-mo, WoosukKwon, njhill, ywang96 and alexm-redhat as code owners January 30, 2025 23:08

comaniac force-pushed the v1-cache-metric-2 branch from 2bdfb1e to a768360 January 30, 2025 23:09

comaniac mentioned this pull request Jan 30, 2025

[V1] Add prefix caching hit rate #11942

Closed

comaniac added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jan 30, 2025

markmc reviewed Jan 31, 2025

vllm/v1/core/kv_cache_manager.py (outdated, resolved)

vllm/v1/metrics/loggers.py (resolved)

markmc reviewed Jan 31, 2025

vllm/v1/core/scheduler.py (outdated, resolved)

comaniac force-pushed the v1-cache-metric-2 branch from c5196d9 to 4c85956 January 31, 2025 19:32

mergify bot added the v1 label Feb 1, 2025

markmc mentioned this pull request Feb 4, 2025

[Feature][v1]: Add metrics support #10582

Open

1 task

comaniac added 4 commits February 4, 2025 17:15

done

bd53ed7

Signed-off-by: Cody Yu <[email protected]>

reset

68565b6

Signed-off-by: Cody Yu <[email protected]>

improve

91da711

Signed-off-by: Cody Yu <[email protected]>

test

e2aa6de

Signed-off-by: Cody Yu <[email protected]>

comaniac force-pushed the v1-cache-metric-2 branch from 4c85956 to e2aa6de February 5, 2025 01:15
