
Conversation

@ajcasagrande (Contributor) commented Dec 5, 2025

  • Add total_token_throughput metric
  • Rename prefill_throughput to prefill_throughput_per_user for clarity
# benchmark_duration is converted to seconds
total_token_throughput = (total_isl + total_osl) / benchmark_duration 
# time_to_first_token is converted to seconds, per request
prefill_throughput_per_user = input_sequence_length / time_to_first_token
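
For reference, a tiny runnable sketch of both formulas with made-up numbers (the values are hypothetical, not aiperf output):

# A benchmark that processed 1_000 input and 2_000 output tokens in 10 s:
total_isl, total_osl, benchmark_duration = 1_000, 2_000, 10.0
total_token_throughput = (total_isl + total_osl) / benchmark_duration  # 300.0 tokens/s

# One request with a 512-token prompt and a 0.25 s time-to-first-token:
input_sequence_length, time_to_first_token = 512, 0.25
prefill_throughput_per_user = input_sequence_length / time_to_first_token  # 2048.0 tokens/s/user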

github-actions bot commented Dec 5, 2025

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@337283b48215270edab950575919b53026ab70c2

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@337283b48215270edab950575919b53026ab70c2

Last updated for commit: 337283b

@ajcasagrande ajcasagrande changed the title from "feat: add total_token_throughput metric" to "feat: add total_token_throughput metric, rename prefill_throughput_per_user" on Dec 5, 2025
@github-actions github-actions bot added the feat label Dec 5, 2025
coderabbitai bot commented Dec 5, 2025

Walkthrough

Renamed PrefillThroughputMetric to PrefillThroughputPerUserMetric with updated semantics and identifiers. Added new TotalTokenThroughputMetric derived metric class computing combined input/output token throughput. Included comprehensive unit tests validating throughput calculations and error handling.

Changes

  • Per-user metric refactoring (src/aiperf/metrics/types/prefill_throughput_per_user.py): Renamed the class from PrefillThroughputMetric to PrefillThroughputPerUserMetric; updated the tag, headers, unit, and flags to reflect per-user semantics; modified docstrings and error messages to align with the per-user calculation context.
  • Total token throughput metric (src/aiperf/metrics/types/total_token_throughput.py): Added the new derived metric class TotalTokenThroughputMetric, computing throughput as (total input + output tokens) / benchmark duration, with a zero-duration guard that raises NoMetricValue and with metadata declarations. A simplified sketch follows this list.
  • Total token throughput tests (tests/unit/metrics/test_total_token_throughput_metric.py): Added the unit test class TestTotalTokenThroughputMetric with parametrized tests validating throughput calculations across input/output/duration combinations and error handling for zero/None durations.
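
To make the new metric concrete, here is a minimal, self-contained sketch of the derivation logic described above. It is not the actual aiperf class (the real implementation derives from the framework's metric base class, declares required_metrics dependencies, and resolves values via get_or_raise/get_converted_or_raise); the dict keys and simplified structure here are assumptions for illustration.

class NoMetricValue(Exception):
    """Stand-in for aiperf.common.exceptions.NoMetricValue."""

class TotalTokenThroughputMetric:
    """Simplified sketch: (total input tokens + total output tokens) / duration."""

    def _derive_value(self, results: dict[str, float]) -> float:
        # Hypothetical keys; the real class pulls these from its
        # required metric dependencies with unit conversion.
        total_isl = results["total_input_sequence_length"]
        total_osl = results["total_output_sequence_length"]
        duration = results["benchmark_duration"]  # seconds

        # Zero/None-duration guard: without it the division below
        # would fail or report a meaningless throughput.
        if not duration:
            raise NoMetricValue("Benchmark duration is zero or missing")
        return (total_isl + total_osl) / duration

# 1_000 input + 2_000 output tokens over 10 s -> 300.0 tokens/s
metric = TotalTokenThroughputMetric()
print(metric._derive_value({
    "total_input_sequence_length": 1_000,
    "total_output_sequence_length": 2_000,
    "benchmark_duration": 10.0,
}))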

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • New metric implementation requires verification of _derive_value logic and metric dependency declarations
  • Ensure zero-duration guard and NoMetricValue exception handling are correctly implemented
  • Validate test coverage comprehensively exercises all calculation paths and edge cases
  • Verify consistency of per-user metric naming changes with framework conventions

Poem

🐰 A metric hops in, measuring tokens with care,
Throughput per user, a fresh calculation rare,
Total tokens flowing, in and out they stream,
Per-second precision fuels our performance dream! 📊✨

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 75.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Title Check ✅ Passed: The pull request title accurately summarizes the two main changes: adding a total_token_throughput metric and renaming prefill_throughput to prefill_throughput_per_user.
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.


coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 669fb62 and ec4f211.

📒 Files selected for processing (3)
  • src/aiperf/metrics/types/prefill_throughput_per_user.py (3 hunks)
  • src/aiperf/metrics/types/total_token_throughput.py (1 hunks)
  • tests/unit/metrics/test_total_token_throughput_metric.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use async/await for all I/O operations; never use time.sleep() or blocking calls
Always use orjson for JSON operations: orjson.loads(s) and orjson.dumps(d)
All functions must have type hints on parameters and return types
Use Python 3.10+ union syntax (|) instead of typing.Union; use match/case for pattern matching; use @dataclass(slots=True) (a toy sketch follows the file list below)

Files:

  • src/aiperf/metrics/types/prefill_throughput_per_user.py
  • tests/unit/metrics/test_total_token_throughput_metric.py
  • src/aiperf/metrics/types/total_token_throughput.py
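
A toy illustration of those conventions (hypothetical class and function, not from the repo):

from dataclasses import dataclass

import orjson

@dataclass(slots=True)
class Payload:
    kind: str
    value: int | None  # 3.10+ union syntax instead of typing.Optional

def describe(raw: bytes) -> str:
    data = Payload(**orjson.loads(raw))  # orjson for JSON parsing
    match data.kind:  # match/case for pattern matching
        case "token":
            return f"token={data.value}"
        case _:
            return "unknown"

print(describe(b'{"kind": "token", "value": 42}'))  # token=42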
**/*test*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test files must use pytest with fixtures, helpers, and @pytest.mark.parametrize; import statements at the top; use # fmt: skip for long parameterize blocks

Files:

  • tests/unit/metrics/test_total_token_throughput_metric.py
🧬 Code graph analysis (2)
src/aiperf/metrics/types/prefill_throughput_per_user.py (1)
src/aiperf/common/enums/metric_enums.py (2)
  • MetricOverTimeUnit (338-396)
  • MetricFlags (602-698)
tests/unit/metrics/test_total_token_throughput_metric.py (4)
src/aiperf/common/exceptions.py (1)
  • NoMetricValue (168-169)
src/aiperf/metrics/metric_dicts.py (1)
  • MetricResultsDict (120-140)
src/aiperf/metrics/types/input_sequence_length_metric.py (1)
  • TotalInputSequenceLengthMetric (45-63)
src/aiperf/metrics/types/total_token_throughput.py (1)
  • TotalTokenThroughputMetric (17-57)
🪛 Ruff (0.14.7)
src/aiperf/metrics/types/total_token_throughput.py

35-39: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)
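
For reference, RUF012 asks for annotations of this shape (hypothetical class; the review below treats the hint as a false positive for the established codebase pattern):

from typing import ClassVar

class ExampleMetric:
    # Annotating the class-level constant tells the linter it is
    # intentionally shared across instances, not per-instance state.
    required_metrics: ClassVar[set[str]] = {"benchmark_duration"}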


54-56: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: build (macos-latest, 3.12)
  • GitHub Check: build (macos-latest, 3.13)
  • GitHub Check: build (macos-latest, 3.11)
  • GitHub Check: build (ubuntu-latest, 3.12)
  • GitHub Check: build (ubuntu-latest, 3.10)
  • GitHub Check: build (ubuntu-latest, 3.11)
  • GitHub Check: build (ubuntu-latest, 3.13)
  • GitHub Check: integration-tests (ubuntu-latest, 3.13)
  • GitHub Check: integration-tests (ubuntu-latest, 3.11)
  • GitHub Check: integration-tests (ubuntu-latest, 3.12)
  • GitHub Check: integration-tests (ubuntu-latest, 3.10)
🔇 Additional comments (6)
src/aiperf/metrics/types/total_token_throughput.py (3)

17-23: LGTM!

The class definition and docstring clearly explain the metric's purpose and formula.


25-39: LGTM!

The metadata attributes are well-defined:

  • Appropriate unit (TOKENS_PER_SECOND) and flags for a throughput metric
  • Correct required_metrics dependencies

Note: The static analysis hint RUF012 suggesting ClassVar is a false positive; these are immutable class-level constants that follow the established pattern in the codebase.


41-57: LGTM!

The implementation correctly calculates throughput with proper error handling:

  • Appropriate use of get_or_raise and get_converted_or_raise
  • Zero-duration validation prevents division by zero
  • Correct formula: (input_tokens + output_tokens) / duration

The type: ignore comments are justified due to the complex type unions in the metric framework. The TRY003 static analysis hint is a nitpick and can be safely ignored.

src/aiperf/metrics/types/prefill_throughput_per_user.py (2)

13-31: LGTM!

The rename from PrefillThroughputMetric to PrefillThroughputPerUserMetric is consistent across all identifiers:

  • Class name, tag, header, and short_header all updated
  • Correct unit (TOKENS_PER_SECOND_PER_USER) for per-user semantics
  • STREAMING_TOKENS_ONLY flag appropriately replaces STREAMING_ONLY

This clarification improves metric naming consistency and makes the per-user nature explicit.


42-53: LGTM!

Documentation and error messages properly updated to reflect per-user semantics. The calculation logic remains correct and unchanged.

tests/unit/metrics/test_total_token_throughput_metric.py (1)

19-45: LGTM!

The test structure follows pytest best practices:

  • Proper use of @pytest.mark.parametrize with comprehensive test cases
  • Correct application of # fmt: skip per coding guidelines
  • Type hints on test method parameters
  • Good coverage of edge cases (zero tokens, fractional duration, large numbers)
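
A parametrized test in that style might look like the following (a sketch of the pattern only, with invented case values, not the repository's actual test file):

import pytest

@pytest.mark.parametrize(
    ("total_isl", "total_osl", "duration_s", "expected"),
    [
        (100, 100, 2.0, 100.0),       # basic case
        (0, 50, 1.0, 50.0),           # zero input tokens
        (30, 30, 0.5, 120.0),         # fractional duration
        (10**9, 10**9, 10.0, 2.0e8),  # large numbers
    ],
)  # fmt: skip
def test_total_token_throughput(
    total_isl: int, total_osl: int, duration_s: float, expected: float
) -> None:
    assert (total_isl + total_osl) / duration_s == pytest.approx(expected)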

@ajcasagrande ajcasagrande force-pushed the ajc/throughput-metrics branch from ec4f211 to 337283b on December 5, 2025 at 04:49
codecov bot commented Dec 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

