fix: enable token usage tracking for streaming LLM calls #1264
Conversation
Set `stream_usage=True` by default in model kwargs to ensure token usage metadata is included during streaming operations. This allows the `LoggingCallbackHandler` to properly track and report token statistics for streaming LLM calls.

Without this parameter, streaming responses don't include `usage_metadata`, causing token usage tracking to fail during streaming operations and breaking accurate usage reporting and monitoring.

Fixes token usage tracking when using streaming with LangChain chat models.
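To see why the flag matters, here is what the behavior looks like at the plain LangChain level (a minimal sketch, assuming a recent `langchain-openai` and an `OPENAI_API_KEY` in the environment; the model name is illustrative):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", stream_usage=True)

usage = None
for chunk in llm.stream("Write a haiku about logging"):
    # With stream_usage=True, a final chunk arrives whose usage_metadata
    # carries the token counts; without the flag it stays None throughout.
    if chunk.usage_metadata is not None:
        usage = chunk.usage_metadata

# e.g. {'input_tokens': 13, 'output_tokens': 17, 'total_tokens': 30}
print(usage)
```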
Codecov Report

All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##           develop    #1264      +/-   ##
===========================================
+ Coverage    69.59%   69.78%   +0.19%
===========================================
  Files          161      161
  Lines        16029    16057      +28
===========================================
+ Hits         11156    11206      +50
+ Misses        4873     4851      -22
```
Add integration tests to verify token usage tracking with streaming and non-streaming LLMs, including multiple calls and unsupported providers. Update `FakeLLM` and `TestChat` to simulate `stream_usage` and token usage behavior for supported engines.
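A sketch of the kind of integration test this describes (the `TestChat` usage below mirrors the existing test helpers, but the exact knobs this PR adds for simulated token usage are assumptions, as is the shape of the `explain()` output):

```python
from nemoguardrails import RailsConfig
from tests.utils import TestChat

YAML_CONFIG = """
models:
  - type: main
    engine: openai
    model: gpt-4o-mini
streaming: True
"""

def test_streaming_token_usage_is_tracked():
    config = RailsConfig.from_content(yaml_content=YAML_CONFIG)
    chat = TestChat(config, llm_completions=["Hello there!"])
    chat >> "Hi"
    chat << "Hello there!"
    # With stream_usage enabled for a supported engine, the recorded LLM
    # call should carry non-zero token counts.
    info = chat.app.explain()
    assert info.llm_calls[0].total_tokens > 0
```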
Pull Request Overview

This PR ensures that `stream_usage=True` is enabled by default for streaming LLM calls with supported engines, to capture token usage metadata during streaming operations.

- Enables `stream_usage` in model kwargs when `config.streaming` is `True` for supported engines (see the sketch after this list)
- Introduces the `STREAM_USAGE_SUPPORTED_ENGINES` constant and updates tests to simulate token usage in streaming scenarios
- Adds comprehensive integration and callback tests for token usage tracking
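A sketch of that gating logic (the constant's import path matches this PR's file list, but the helper's signature is a guess at `_prepare_model_kwargs`, not its actual code):

```python
from nemoguardrails.llm.types import STREAM_USAGE_SUPPORTED_ENGINES

def _prepare_model_kwargs(engine: str, streaming: bool, kwargs: dict) -> dict:
    """Hypothetical shape of the kwarg preparation step."""
    kwargs = dict(kwargs)
    if streaming and engine in STREAM_USAGE_SUPPORTED_ENGINES:
        # Ask the chat model to attach usage_metadata to streamed chunks.
        kwargs.setdefault("stream_usage", True)
    return kwargs
```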
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/utils.py | Extended `FakeLLM` and `TestChat` to support token usage simulation |
| tests/test_token_usage_integration.py | New integration tests covering streaming token usage |
| tests/test_llmrails.py | Tests for the `stream_usage` flag in `LLMRails` initialization |
| tests/test_callbacks.py | Callback handler tests for token usage metadata |
| nemoguardrails/rails/llm/llmrails.py | Sets `stream_usage=True` in `_prepare_model_kwargs` |
| nemoguardrails/llm/types.py | Defines the `STREAM_USAGE_SUPPORTED_ENGINES` constant |
Comments suppressed due to low confidence (2)
nemoguardrails/llm/types.py:19

- Consider adding a docstring or inline comment explaining that this constant lists the LLM engines which support the `stream_usage` parameter.

```python
STREAM_USAGE_SUPPORTED_ENGINES = ["openai", "azure_openai", "nim"]
```
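Applied, the suggestion might read something like this (the comment wording is illustrative):

```python
# LLM engine providers whose LangChain chat models accept the `stream_usage`
# parameter and emit `usage_metadata` on streamed chunks.
STREAM_USAGE_SUPPORTED_ENGINES = ["openai", "azure_openai", "nim"]
```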
tests/utils.py:94
- The
current_token_usage
variable in_acall
is computed but never used afterwards. Remove it or wire it into the response logic to avoid dead code.
current_token_usage = None
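Of the two options, here is a sketch of the "wire it in" route (the `FakeLLM` internals referenced here, `self.responses`, `self.i`, `self.token_usage`, are assumptions for illustration):

```python
async def _acall(self, prompt, stop=None, run_manager=None, **kwargs):
    response = self.responses[self.i]
    current_token_usage = (
        self.token_usage[self.i] if self.token_usage else None
    )
    self.i += 1
    if current_token_usage is not None:
        # Stash the simulated usage where test assertions can read it,
        # instead of leaving the assignment dead.
        self.last_token_usage = current_token_usage
    return response
```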
Looks good! It's great that LangChain implemented this feature in the end. While some chat models have been returning token usage metadata in their own APIs for a while, even with streaming, LangChain did not have a standard way to return that metadata.

Should we put a link (in the docs or even in the code) to the relevant LangChain docs for streaming token stats? Noting the required LangChain package versions would also be useful for developers, since not all `langchain_openai` versions support token stats in streaming mode.
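If the link goes into the code, it could sit next to where the flag is set; a sketch (placement and comment wording are mine; the URL is LangChain's token-usage tracking guide):

```python
# Streamed token stats require support in both the provider API and the
# LangChain integration; see:
# https://python.langchain.com/docs/how_to/chat_token_usage_tracking/
# Older langchain-openai releases do not emit usage_metadata on streamed
# chunks, so pin a sufficiently recent version.
kwargs["stream_usage"] = True
```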
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Pouyan <[email protected]>