
Conversation

@Pouyanpi (Collaborator) commented Jul 4, 2025

Set stream_usage=True by default in model kwargs to ensure token usage metadata is included during streaming operations. This allows the LoggingCallbackHandler to properly track and report token statistics for streaming LLM calls.

Without this parameter, streaming responses don't include usage_metadata, so token usage tracking fails during streaming and usage reporting and monitoring become inaccurate.
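
For illustration, here's a minimal sketch (not this PR's code) of how the flag surfaces usage_metadata when streaming with langchain_openai; the model name, prompt, and token counts are placeholders:

```python
# Minimal sketch, assuming langchain_openai's ChatOpenAI; the model name,
# prompt, and printed numbers are placeholders, not taken from this PR.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", stream_usage=True)

usage = None
for chunk in llm.stream("Write a haiku about token counting."):
    # With stream_usage=True, the final chunk carries usage_metadata;
    # without it, usage_metadata is never populated while streaming.
    if chunk.usage_metadata:
        usage = chunk.usage_metadata

print(usage)  # e.g. {'input_tokens': 14, 'output_tokens': 17, 'total_tokens': 31}
```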

Fixes token usage tracking when using streaming with LangChain chat models.
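
On the consumer side, a callback in the spirit of LoggingCallbackHandler can only aggregate tokens if usage_metadata made it into the result; a hedged sketch (not the actual NeMo Guardrails handler):

```python
# Sketch of a usage-aggregating callback; the actual LoggingCallbackHandler
# in NeMo Guardrails differs in structure and in what it reports.
from langchain_core.callbacks import AsyncCallbackHandler

class TokenUsageLogger(AsyncCallbackHandler):
    def __init__(self):
        self.total_tokens = 0

    async def on_llm_end(self, response, **kwargs):
        # For chat models, each generation wraps a message that may carry
        # usage_metadata; without stream_usage=True it stays empty on streams.
        for generations in response.generations:
            for gen in generations:
                message = getattr(gen, "message", None)  # ChatGeneration only
                usage = getattr(message, "usage_metadata", None) or {}
                self.total_tokens += usage.get("total_tokens", 0)
```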

  • unit tests
  • integration tests

Update (based on this comment):

graph TD
    A[User Configuration] -->|streaming=true| B[LLMRails._prepare_model_kwargs]
    B --> C{streaming enabled?}
    C -->|Yes| D[add stream_usage=True to kwargs]
    C -->|No| E[don't add stream_usage]
    D --> F[pass kwargs to LLM Provider]
    E --> F
    F --> G{provider supports<br/>stream_usage?}
    G -->|Yes| H[returns token usage data]
    G -->|No| I[ignores parameter]
    H --> J[token usage tracked]
    I --> K[no token usage data]
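
In code, the gating in the diagram might look roughly like this (the engine list matches the STREAM_USAGE_SUPPORTED_ENGINES constant quoted in the review below; the actual implementation in llmrails.py may be shaped differently):

```python
# Rough sketch of the gating shown above; the engine list matches the
# constant quoted later in this thread, but the real _prepare_model_kwargs
# in llmrails.py may differ.
STREAM_USAGE_SUPPORTED_ENGINES = ["openai", "azure_openai", "nim"]

def _prepare_model_kwargs(engine: str, streaming: bool, kwargs: dict) -> dict:
    kwargs = dict(kwargs)  # don't mutate the caller's dict
    if streaming and engine in STREAM_USAGE_SUPPORTED_ENGINES:
        # Set a default only; an explicit user-provided value wins.
        kwargs.setdefault("stream_usage", True)
    return kwargs
```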

@Pouyanpi Pouyanpi requested review from trebedea and removed request for trebedea July 4, 2025 13:44
@Pouyanpi (Collaborator, Author) commented Jul 4, 2025

@trebedea It seems this was the only missing parameter and you've done all the work in #953 😃

I'll update this PR; it's incomplete as is.

@codecov-commenter commented Jul 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.78%. Comparing base (6f58062) to head (d4684b1).
Report is 7 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1264      +/-   ##
===========================================
+ Coverage    69.59%   69.78%   +0.19%     
===========================================
  Files          161      161              
  Lines        16029    16057      +28     
===========================================
+ Hits         11156    11206      +50     
+ Misses        4873     4851      -22     
Flag     Coverage Δ
python   69.78% <100.00%> (+0.19%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines               Coverage Δ
nemoguardrails/rails/llm/llmrails.py   88.92% <100.00%> (+0.09%) ⬆️

... and 5 files with indirect coverage changes


@Pouyanpi Pouyanpi added the bug Something isn't working label Jul 4, 2025
@Pouyanpi Pouyanpi self-assigned this Jul 4, 2025
@Pouyanpi Pouyanpi added this to the v0.15.0 milestone Jul 4, 2025
Pouyanpi added 2 commits July 7, 2025 13:38
Add integration tests to verify token usage tracking with streaming and non-streaming LLMs, including multiple calls and unsupported providers.

Update FakeLLM and TestChat to simulate stream_usage and token usage
behavior for supported engines.
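
A sketch of what that simulation can look like (the real FakeLLM/TestChat in tests/utils.py are structured differently):

```python
# Sketch of simulating streamed usage_metadata in a fake model for tests;
# the actual FakeLLM/TestChat in tests/utils.py differ.
from langchain_core.messages import AIMessageChunk

def fake_stream(text: str, stream_usage: bool):
    words = text.split()
    for word in words:
        yield AIMessageChunk(content=word + " ")
    if stream_usage:
        # Mimic providers that attach usage to a final, empty chunk.
        yield AIMessageChunk(
            content="",
            usage_metadata={
                "input_tokens": 10,  # placeholder count
                "output_tokens": len(words),
                "total_tokens": 10 + len(words),
            },
        )
```
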
@Pouyanpi Pouyanpi marked this pull request as ready for review July 7, 2025 11:53
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR ensures that stream_usage=True is enabled by default for streaming LLM calls with supported engines to capture token usage metadata during streaming operations.

  • Enables stream_usage in model kwargs when config.streaming is True for supported engines
  • Introduces STREAM_USAGE_SUPPORTED_ENGINES constant and updates tests to simulate token usage in streaming scenarios
  • Adds comprehensive integration and callback tests for token usage tracking

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File                                    Description
tests/utils.py                          Extended FakeLLM and TestChat to support token usage simulation
tests/test_token_usage_integration.py   New integration tests covering streaming token usage
tests/test_llmrails.py                  Tests for stream_usage flag in LLMRails init
tests/test_callbacks.py                 Callback handler tests for token usage metadata
nemoguardrails/rails/llm/llmrails.py    Set stream_usage=True in _prepare_model_kwargs
nemoguardrails/llm/types.py             Defined STREAM_USAGE_SUPPORTED_ENGINES constant

Comments suppressed due to low confidence (2)

nemoguardrails/llm/types.py:19

  • Consider adding a docstring or inline comment explaining that this constant lists LLM engines which support the stream_usage parameter.
STREAM_USAGE_SUPPORTED_ENGINES = ["openai", "azure_openai", "nim"]

tests/utils.py:94

  • The current_token_usage variable in _acall is computed but never used afterwards. Remove it or wire it into the response logic to avoid dead code.
        current_token_usage = None

@trebedea (Collaborator) left a comment

Looks good! It's great that LangChain implemented this feature in the end. While some chat models have returned token usage metadata in their own APIs for a while, even with streaming, LangChain had no standard way to surface that metadata.

Should we put a link (in the docs or even code) to the relevant LangChain docs for streaming token stats? Noting the required LangChain package versions would also be useful for developers, as not all langchain_openai versions support token stats in streaming mode.
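
As a hedged illustration of that version caveat, a docs snippet could show a guard like this (the minimum version below is an assumption to verify against the LangChain changelog):

```python
# Illustrative guard only: the 0.1.9 threshold is an assumption; check the
# langchain_openai release notes for when stream_usage actually landed.
from importlib.metadata import version
from packaging.version import Version  # needs the 'packaging' package

if Version(version("langchain_openai")) < Version("0.1.9"):
    print(
        "This langchain_openai may be too old for stream_usage; "
        "streaming token stats could be missing."
    )
```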

@Pouyanpi Pouyanpi merged commit ef97795 into develop Jul 10, 2025
17 checks passed
@Pouyanpi Pouyanpi deleted the fix/token-usage-streaming branch July 10, 2025 11:40