Add cached input token tracking to Usage reporting #2394
base: canary
Conversation
- Add `cached_input_tokens` field to core Usage struct
- Update LLMUsage and LLMCompleteResponseMetadata to include cached tokens
- Implement token extraction for all LLM providers:
  - Anthropic: sum of `cache_read_input_tokens` + `cache_creation_input_tokens`
  - OpenAI: extract from `input_tokens_details.cached_tokens`
  - Google/Vertex: use `cached_content_token_count` field
  - AWS: set to None (no cached token support)
- Update token aggregation logic in Collector and FunctionLog
- Add cached token support to all language bindings:
  - TypeScript: add `cachedInputTokens` getter
  - Python: add `cached_input_tokens` property
  - Go: add `CachedInputTokens()` method
  - Ruby: add `cached_input_tokens` method
- Update RPC types and converters for cached token reporting
- Add integration tests for cached token tracking
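To make the mapping concrete, here is a minimal, hypothetical Rust sketch of how the per-provider counters could fold into the shared field; the struct layout and helper name are assumptions for illustration, not the PR's actual code:

```rust
// Illustrative only: how provider usage payloads might map onto a shared
// `cached_input_tokens` field. Names and types are assumptions.
#[derive(Debug, Default, Clone)]
pub struct Usage {
    pub input_tokens: Option<i64>,
    pub output_tokens: Option<i64>,
    pub cached_input_tokens: Option<i64>,
}

/// Anthropic reports cache reads and cache writes separately, so the two
/// counters are summed when either is present.
fn anthropic_cached_tokens(
    cache_read_input_tokens: Option<i64>,
    cache_creation_input_tokens: Option<i64>,
) -> Option<i64> {
    match (cache_read_input_tokens, cache_creation_input_tokens) {
        (None, None) => None,
        (read, write) => Some(read.unwrap_or(0) + write.unwrap_or(0)),
    }
}

fn main() {
    // e.g. an Anthropic response that read 1,000 cached tokens and wrote 100;
    // OpenAI and Google/Vertex would map their single counter through directly,
    // and AWS Bedrock would leave the field as None.
    let usage = Usage {
        input_tokens: Some(1_200),
        output_tokens: Some(80),
        cached_input_tokens: anthropic_cached_tokens(Some(1_000), Some(100)),
    };
    println!("{usage:?}"); // cached_input_tokens: Some(1100)
}
```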
@lukeramsden is attempting to deploy a commit to the Boundary Team on Vercel. A member of the Team first needs to authorize it.
Important
Looks good to me! 👍
Reviewed everything up to 8fa77ed in 2 minutes and 9 seconds.
- Reviewed 1233 lines of code in 26 files
- Skipped 0 files when reviewing
- Skipped posting 7 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. integ-tests/typescript/tests/providers/caching.test.ts:4
- Draft comment: Good integration tests covering various caching scenarios for each provider. The tests verify that for providers such as OpenAI and Gemini, the 'cachedInputTokens' field is defined and aggregated correctly, while for AWS it remains null. This meets the requirement to track cached input tokens.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%. None.
2. integ-tests/typescript/tests/providers/caching.test.ts:17
- Draft comment: Using expect(...).toBeDefined() for cached tokens is appropriate. Consider also verifying that if a provider supports caching, the value is a number rather than just defined.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% The comment misunderstands the API - cachedInputTokens is intentionally nullable, not always a number. The current test is actually more correct as it allows for both valid states (number when cached, null when not cached). The suggestion would make the test more restrictive than the actual API contract. Could there be value in having separate test cases - one for verifying the field exists, and another for type checking when we know caching occurred? While separate test cases could be valuable, that would be a larger refactoring. The current approach correctly tests the API contract where the field can be null. The comment should be deleted as it suggests a change that would make the tests incorrect by not allowing valid null values.
3. integ-tests/typescript/tests/providers/caching.test.ts:37
- Draft comment: The test calculates totalCachedTokens using a reduce. Good approach to verify that the collector aggregates cached tokens correctly. Ensure that using the nullish operator (|| 0) is consistent with the intended type.
- Reason this comment was not posted: Comment did not seem useful. Confidence is useful = 50% <= threshold 50%. The comment starts with a purely informative statement about the test using a reduce function, which is not allowed. However, it ends with a suggestion to ensure that the use of the nullish operator is consistent with the intended type, which is a valid comment as it asks for confirmation on a specific implementation detail.
4. integ-tests/typescript/tests/providers/caching.test.ts:74
- Draft comment: The tests for repeated large content for both OpenAI and Gemini are comprehensive. One suggestion is to encapsulate repeated logic for making calls and verifying caching into helper functions to reduce duplication and improve maintainability.
- Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%. None.
5. integ-tests/typescript/tests/providers/caching.test.ts:115
- Draft comment: The test for Google/Vertex caching correctly checks that the cached tokens field is defined and that the collector-level usage matches. It confirms that caching is provider-specific.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%. None.
6. integ-tests/typescript/tests/providers/caching.test.ts:124
- Draft comment: The test for Vertex AI streaming with cached token tracking uses async streaming and validates that the log type is 'stream'. It is good that you verify that cached tokens are captured even during streaming. Consider also asserting that the stream's chunks are non-empty if appropriate.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment is suggesting additional validation that's not central to the test's purpose. The test is specifically focused on cached token tracking during streaming operations. The final response length check is already sufficient for ensuring the stream produced content. Adding chunk-level validation would be redundant and distract from the main purpose. Perhaps checking chunk content could catch subtle streaming-specific issues that might be missed by only checking the final response. The test's purpose is to verify cached token tracking works with streaming, not to validate streaming functionality itself. The final response check is sufficient as a sanity check that streaming worked. Delete the comment. It suggests adding validation that's not relevant to the test's core purpose of verifying cached token tracking during streaming operations.
7. integ-tests/typescript/tests/providers/caching.test.ts:204
- Draft comment: The mixed providers test is solid, ensuring that when providers with caching (OpenAI, Gemini) are mixed with one without caching (AWS), the aggregated cached tokens equal the sum from providers with caching only. This confirms that the aggregation logic in usage reporting is working as intended.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%. None.
Workflow ID: wflow_Aw1igTGWk7ymd8uD
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
🔒 Entelligence AI Vulnerability Scanner ✅ No security vulnerabilities found! Your code passed our comprehensive security analysis. 📊 Files Analyzed: 3 files
```diff
 pub fn __repr__(&self) -> String {
     format!(
-        "Usage(input_tokens={}, output_tokens={})",
+        "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
         self.inner
             .input_tokens
             .map_or_else(|| "None".to_string(), |v| v.to_string()),
         self.inner
             .output_tokens
             .map_or_else(|| "None".to_string(), |v| v.to_string()),
+        self.inner
+            .cached_input_tokens
+            .map_or_else(|| "None".to_string(), |v| v.to_string())
     )
```
correctness: The `__repr__` method for `Usage` will print `cached_input_tokens` as the third argument, but if `self.inner.input_tokens` or `self.inner.output_tokens` is `None`, the output will be misaligned and misleading.
🤖 AI Agent Prompt for Cursor/Windsurf
📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue
In engine/language_client_python/src/types/log_collector.rs, lines 405-417, the __repr__ method for Usage prints input_tokens, output_tokens, and cached_input_tokens. However, if any of these are None, the output may be misleading or misaligned. Please ensure that each field is printed in the correct order and that None values are clearly represented, so the output always matches the format Usage(input_tokens=..., output_tokens=..., cached_input_tokens=...).
📝 Committable Code Suggestion
‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.
```suggestion
        format!(
            "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
            self.inner
                .input_tokens
                .map_or_else(|| "None".to_string(), |v| v.to_string()),
            self.inner
                .output_tokens
                .map_or_else(|| "None".to_string(), |v| v.to_string()),
            self.inner
                .cached_input_tokens
                .map_or_else(|| "None".to_string(), |v| v.to_string())
        )
```
```diff
 pub fn to_s(&self) -> String {
     format!(
-        "Usage(input_tokens={}, output_tokens={})",
+        "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
         self.inner
             .input_tokens
             .map_or_else(|| "null".to_string(), |v| v.to_string()),
         self.inner
             .output_tokens
             .map_or_else(|| "null".to_string(), |v| v.to_string()),
+        self.inner
+            .cached_input_tokens
+            .map_or_else(|| "null".to_string(), |v| v.to_string())
     )
```
correctness: The `to_s` method will panic at runtime if `self.inner` is None, as it assumes `self.inner` is always present.
🤖 AI Agent Prompt for Cursor/Windsurf
📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue
In engine/language_client_ruby/ext/ruby_ffi/src/types/log_collector.rs, lines 384-396, the `to_s` method assumes `self.inner` is always present, which can cause a panic if it is None. Update the method to safely handle the case where `self.inner` is None, returning a string with all fields as 'null' in that case.
📝 Committable Code Suggestion
‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.
```suggestion
    pub fn to_s(&self) -> String {
        match &self.inner {
            Some(inner) => format!(
                "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
                inner.input_tokens.map_or_else(|| "null".to_string(), |v| v.to_string()),
                inner.output_tokens.map_or_else(|| "null".to_string(), |v| v.to_string()),
                inner.cached_input_tokens.map_or_else(|| "null".to_string(), |v| v.to_string())
            ),
            None => "Usage(input_tokens=null, output_tokens=null, cached_input_tokens=null)".to_string(),
        }
    }
```
```diff
 pub fn to_string(&self) -> String {
     format!(
-        "Usage(input_tokens={}, output_tokens={})",
+        "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
         self.inner
             .input_tokens
             .map_or_else(|| "null".to_string(), |v| v.to_string()),
         self.inner
             .output_tokens
             .map_or_else(|| "null".to_string(), |v| v.to_string()),
+        self.inner
+            .cached_input_tokens
+            .map_or_else(|| "null".to_string(), |v| v.to_string())
     )
```
performance: The `to_string` method uses chained `.map_or_else` for each token field, but this is not a significant performance issue given the small, fixed number of fields and negligible runtime impact.
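If the repetition ever became a maintenance concern, a small helper could centralize the Option formatting. This is only a sketch under that assumption, with a hypothetical `fmt_opt` helper rather than anything from this PR:

```rust
// Hypothetical helper to avoid repeating the map_or_else pattern for each
// token field; the function names and placement are illustrative only.
fn fmt_opt(value: Option<i64>, missing: &str) -> String {
    value.map_or_else(|| missing.to_string(), |v| v.to_string())
}

fn usage_string(
    input_tokens: Option<i64>,
    output_tokens: Option<i64>,
    cached_input_tokens: Option<i64>,
) -> String {
    format!(
        "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
        fmt_opt(input_tokens, "null"),
        fmt_opt(output_tokens, "null"),
        fmt_opt(cached_input_tokens, "null"),
    )
}

fn main() {
    assert_eq!(
        usage_string(Some(1200), Some(80), None),
        "Usage(input_tokens=1200, output_tokens=80, cached_input_tokens=null)"
    );
}
```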
checking this now
Summary
This PR adds comprehensive cached input token tracking to the BAML Usage reporting system. The Collector and Usage now track and report cached input tokens alongside the existing input and output tokens.
Changes Made
- Added a `cached_input_tokens: Option<i64>` field to track cached tokens
- Anthropic: extracted from `cache_read_input_tokens`
- OpenAI: extracted from `input_tokens_details.cached_tokens`
- Google/Vertex: extracted from the `cached_content_token_count` field
- TypeScript: exposed as `usage.cachedInputTokens`
- Python: exposed as `usage.cached_input_tokens`
- Go: exposed as `usage.CachedInputTokens()`
- Ruby: exposed as `usage.cached_input_tokens`
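As a rough illustration of the aggregation behavior the integration tests rely on (None meaning "no data" rather than zero, so a provider without caching does not zero out the total), a hypothetical summing helper might look like the following; it is a sketch, not the Collector's actual implementation:

```rust
// Illustrative only: summing an Option-valued counter across calls, treating
// None as "no data" so the aggregate stays None when nothing was reported.
fn sum_cached_tokens<I: IntoIterator<Item = Option<i64>>>(calls: I) -> Option<i64> {
    calls.into_iter().fold(None, |acc, c| match (acc, c) {
        (None, None) => None,
        (a, b) => Some(a.unwrap_or(0) + b.unwrap_or(0)),
    })
}

fn main() {
    // e.g. two calls that report cached tokens (OpenAI, Gemini) and one that
    // does not (AWS Bedrock): the total reflects only the caching providers.
    let per_call = vec![Some(1_024), Some(512), None];
    assert_eq!(sum_cached_tokens(per_call), Some(1_536));
}
```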
Test Plan
Technical Notes
Closes #2349
Important
Add cached input token tracking to BAML Usage reporting, updating core structures, provider integrations, token aggregation, language bindings, and tests.
- Adds a `cached_input_tokens` field to `LLMUsage` in `events.rs`, `trace_event.rs`, and `mod.rs` to track cached tokens.
- Extracts cached tokens per provider: `Anthropic` (from `cache_read_input_tokens`), `OpenAI` (from `input_tokens_details.cached_tokens`), `Google/Vertex` (from `cached_content_token_count`), and `AWS Bedrock` (set to None).
- Updates `storage.rs` and `llm_response_to_log_event.rs` to sum cached tokens.
- Exposes the field in the language bindings: TypeScript (`native.d.ts`), Python (`log_collector.rs`), Go (`rawobjects_public.go`), and Ruby (`log_collector.rs`).
- Updates `trace_data.rs` to include cached token data.
- Adds tests in `test_collector.py` and `collector.test.ts` to verify cached token tracking for various providers and scenarios.

This description was created by Ellipsis for 8fa77ed. You can customize this summary. It will automatically update as commits are pushed.