fix: enable token usage tracking for streaming LLM calls #1264
Conversation
Set `stream_usage=True` by default in model kwargs to ensure token usage metadata is included during streaming operations. This allows the `LoggingCallbackHandler` to properly track and report token statistics for streaming LLM calls.

Without this parameter, streaming responses don't include `usage_metadata`, causing token usage tracking to fail during streaming operations and breaking accurate usage reporting and monitoring.

Fixes token usage tracking when using streaming with LangChain chat models.
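To see why the flag matters, here is what the behavior looks like at the plain LangChain level (a minimal sketch, assuming a recent `langchain-openai` and an `OPENAI_API_KEY` in the environment; the model name is illustrative):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", stream_usage=True)

usage = None
for chunk in llm.stream("Write a haiku about logging"):
    # With stream_usage=True, a final chunk arrives whose usage_metadata
    # carries the token counts; without the flag it stays None throughout.
    if chunk.usage_metadata is not None:
        usage = chunk.usage_metadata

# e.g. {'input_tokens': 13, 'output_tokens': 17, 'total_tokens': 30}
print(usage)
```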
Codecov Report

All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##           develop    #1264      +/-   ##
===========================================
+ Coverage    69.59%   69.78%   +0.19%
===========================================
  Files          161      161
  Lines        16029    16057      +28
===========================================
+ Hits         11156    11206      +50
+ Misses        4873     4851      -22
```
Add integration tests to verify token usage tracking with streaming and non-streaming LLMs, including multiple calls and unsupported providers. Update `FakeLLM` and `TestChat` to simulate `stream_usage` and token usage behavior for supported engines.
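A sketch of the kind of integration test this describes (the `TestChat` usage below mirrors the existing test helpers, but the exact knobs this PR adds for simulated token usage are assumptions, as is the shape of the `explain()` output):

```python
from nemoguardrails import RailsConfig
from tests.utils import TestChat

YAML_CONFIG = """
models:
  - type: main
    engine: openai
    model: gpt-4o-mini
streaming: True
"""

def test_streaming_token_usage_is_tracked():
    config = RailsConfig.from_content(yaml_content=YAML_CONFIG)
    chat = TestChat(config, llm_completions=["Hello there!"])
    chat >> "Hi"
    chat << "Hello there!"
    # With stream_usage enabled for a supported engine, the recorded LLM
    # call should carry non-zero token counts.
    info = chat.app.explain()
    assert info.llm_calls[0].total_tokens > 0
```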
Pull Request Overview

This PR ensures that `stream_usage=True` is enabled by default for streaming LLM calls with supported engines, to capture token usage metadata during streaming operations.

- Enables `stream_usage` in model kwargs when `config.streaming` is `True` for supported engines (see the sketch after this list)
- Introduces the `STREAM_USAGE_SUPPORTED_ENGINES` constant and updates tests to simulate token usage in streaming scenarios
- Adds comprehensive integration and callback tests for token usage tracking
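A sketch of that gating logic (the constant's import path matches this PR's file list, but the helper's signature is a guess at `_prepare_model_kwargs`, not its actual code):

```python
from nemoguardrails.llm.types import STREAM_USAGE_SUPPORTED_ENGINES

def _prepare_model_kwargs(engine: str, streaming: bool, kwargs: dict) -> dict:
    """Hypothetical shape of the kwarg preparation step."""
    kwargs = dict(kwargs)
    if streaming and engine in STREAM_USAGE_SUPPORTED_ENGINES:
        # Ask the chat model to attach usage_metadata to streamed chunks.
        kwargs.setdefault("stream_usage", True)
    return kwargs
```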
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/utils.py | Extended `FakeLLM` and `TestChat` to support token usage simulation |
| tests/test_token_usage_integration.py | New integration tests covering streaming token usage |
| tests/test_llmrails.py | Tests for the `stream_usage` flag in `LLMRails` initialization |
| tests/test_callbacks.py | Callback handler tests for token usage metadata |
| nemoguardrails/rails/llm/llmrails.py | Sets `stream_usage=True` in `_prepare_model_kwargs` |
| nemoguardrails/llm/types.py | Defines the `STREAM_USAGE_SUPPORTED_ENGINES` constant |
Comments suppressed due to low confidence (2)
nemoguardrails/llm/types.py:19

- Consider adding a docstring or inline comment explaining that this constant lists the LLM engines which support the `stream_usage` parameter.

```python
STREAM_USAGE_SUPPORTED_ENGINES = ["openai", "azure_openai", "nim"]
```
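Applied, the suggestion might read something like this (the comment wording is illustrative):

```python
# LLM engine providers whose LangChain chat models accept the `stream_usage`
# parameter and emit `usage_metadata` on streamed chunks.
STREAM_USAGE_SUPPORTED_ENGINES = ["openai", "azure_openai", "nim"]
```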
tests/utils.py:94
- The
current_token_usage
variable in_acall
is computed but never used afterwards. Remove it or wire it into the response logic to avoid dead code.
current_token_usage = None
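Of the two options, here is a sketch of the "wire it in" route (the `FakeLLM` internals referenced here, `self.responses`, `self.i`, `self.token_usage`, are assumptions for illustration):

```python
async def _acall(self, prompt, stop=None, run_manager=None, **kwargs):
    response = self.responses[self.i]
    current_token_usage = (
        self.token_usage[self.i] if self.token_usage else None
    )
    self.i += 1
    if current_token_usage is not None:
        # Stash the simulated usage where test assertions can read it,
        # instead of leaving the assignment dead.
        self.last_token_usage = current_token_usage
    return response
```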
Looks good! It's great that LangChain implemented this feature in the end. While some chat models have been returning token usage metadata in their own APIs for a while, even with streaming, LangChain did not have a standard way to return that metadata.

Should we put a link (in the docs or even in the code) to the relevant LangChain docs for streaming token stats? Noting the required LangChain package versions would also be useful for developers, since not all `langchain_openai` versions support token stats in streaming mode.
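If the link goes into the code, it could sit next to where the flag is set; a sketch (placement and comment wording are mine; the URL is LangChain's token-usage tracking guide):

```python
# Streamed token stats require support in both the provider API and the
# LangChain integration; see:
# https://python.langchain.com/docs/how_to/chat_token_usage_tracking/
# Older langchain-openai releases do not emit usage_metadata on streamed
# chunks, so pin a sufficiently recent version.
kwargs["stream_usage"] = True
```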
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Pouyan <[email protected]>