-
-
Notifications
You must be signed in to change notification settings - Fork 9k
Fix minor docs issues and fix metric requests #21040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request includes minor documentation fixes and updates to how request metrics are handled. The change from time.time()
to time.monotonic()
for recording arrival times is a good improvement for accuracy in time-interval measurements.
However, I've identified a critical issue in vllm/v1/engine/output_processor.py
. The new logic for populating request metrics does not account for the case where statistics logging is disabled, which will cause a crash. I've provided a suggestion to fix this by adding a conditional check. Please address this to ensure the stability of the system.
request_output.metrics = RequestMetrics( | ||
arrival_time=req_state.stats.arrival_time, | ||
last_token_time=req_state.stats.last_token_ts, | ||
first_scheduled_time=req_state.stats.scheduled_ts, | ||
first_token_time=req_state.stats.first_token_ts, | ||
time_in_queue=req_state.stats.scheduled_ts - req_state.stats.arrival_time, | ||
finished_time=time.monotonic() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new code to populate request_output.metrics
assumes that req_state.stats
is always available. However, req_state.stats
is initialized to None
if log_stats
is False
(see RequestState.__init__
).
This will lead to an AttributeError
when trying to access req_state.stats.arrival_time
, causing a crash when log_stats
is disabled.
To prevent this, you should add a check to ensure req_state.stats
is not None
before attempting to access its attributes.
request_output.metrics = RequestMetrics( | |
arrival_time=req_state.stats.arrival_time, | |
last_token_time=req_state.stats.last_token_ts, | |
first_scheduled_time=req_state.stats.scheduled_ts, | |
first_token_time=req_state.stats.first_token_ts, | |
time_in_queue=req_state.stats.scheduled_ts - req_state.stats.arrival_time, | |
finished_time=time.monotonic() | |
) | |
if req_state.stats: | |
request_output.metrics = RequestMetrics( | |
arrival_time=req_state.stats.arrival_time, | |
last_token_time=req_state.stats.last_token_ts, | |
first_scheduled_time=req_state.stats.scheduled_ts, | |
first_token_time=req_state.stats.first_token_ts, | |
time_in_queue=req_state.stats.scheduled_ts - req_state.stats.arrival_time, | |
finished_time=time.monotonic() | |
) |
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
Fixes some minor issues in the docs, issue #15394, and issue #21014.