
Conversation

@lukeramsden (Contributor) commented Aug 30, 2025

Summary

This PR adds comprehensive cached input token tracking to the BAML Usage reporting system. The Collector and Usage now track and report cached input tokens alongside the existing input and output tokens.

Changes Made

  • Core Usage struct: Added cached_input_tokens: Option<i64> field to track cached tokens (see the struct sketch after this list)
  • LLM Provider Integration: Implemented cached token extraction for all supported providers:
    • Anthropic: Extracts from cache_read_input_tokens
    • OpenAI: Extracts from input_tokens_details.cached_tokens
    • Google/Vertex: Uses cached_content_token_count field
    • AWS Bedrock: Set to None (the SDK version BAML currently uses does not expose cached token counts, and upgrading is blocked by a dependency issue; see Cargo.toml)
  • Token Aggregation: Updated all token aggregation logic in Collector and FunctionLog to sum cached tokens
  • Language Bindings: Added cached token support to all client libraries:
    • TypeScript: usage.cachedInputTokens
    • Python: usage.cached_input_tokens
    • Go: usage.CachedInputTokens()
    • Ruby: usage.cached_input_tokens
  • RPC Integration: Updated RPC types and converters to include cached token data
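
As a point of reference, the core change can be pictured with the minimal sketch below. The field names follow the PR description, but the derives and any other fields on the real Usage struct in the BAML engine are assumptions, not the actual source.

// Rough sketch of the core Usage struct after this PR (illustrative, not BAML's code).
#[derive(Debug, Clone, Default)]
pub struct Usage {
    pub input_tokens: Option<i64>,
    pub output_tokens: Option<i64>,
    /// New in this PR: input tokens served from the provider's prompt cache.
    /// None when the provider (e.g. AWS Bedrock here) does not report them.
    pub cached_input_tokens: Option<i64>,
}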

Test Plan

  • Core library compilation verified
  • All provider response handlers updated with cached token extraction
  • Language binding interfaces expanded with cached token accessors
  • Token aggregation logic preserves cached token counts across multiple calls (see the summing sketch after this list)
  • RPC serialization includes cached token data
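
To illustrate the aggregation behaviour called out above, the sketch below sums cached tokens across several calls' Usage values (using the struct sketched earlier) while keeping Option semantics: the total stays None only when no call reported cached tokens. This is an illustration of the intent, not the actual code in Collector or FunctionLog.

// Sketch: Option-aware sum of cached tokens across multiple calls.
fn sum_cached_tokens<'a>(usages: impl IntoIterator<Item = &'a Usage>) -> Option<i64> {
    usages
        .into_iter()
        .filter_map(|u| u.cached_input_tokens)
        .fold(None, |acc, n| Some(acc.unwrap_or(0) + n))
}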

Technical Notes

  • Cached tokens are tracked separately from input/output tokens for better cost analysis (see the cost sketch after this list)
  • Provider-specific token extraction handles cases where cached token data is unavailable
  • All changes are backward compatible with existing Usage API
  • Language bindings maintain consistent naming conventions across all supported languages
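
To make the cost-analysis point concrete, here is a hedged sketch built on the Usage sketch above. The estimate_cost helper, the hypothetical rates, and the assumption that input_tokens already includes the cached portion are all illustrative and not part of BAML.

// Hypothetical cost estimate: cached input tokens are typically billed at a
// discount, so tracking them separately lets callers price calls more accurately.
fn estimate_cost(usage: &Usage, input_rate: f64, cached_rate: f64, output_rate: f64) -> f64 {
    let cached = usage.cached_input_tokens.unwrap_or(0);
    // Assumption: input_tokens includes the cached portion, so subtract it out.
    let uncached = usage.input_tokens.unwrap_or(0).saturating_sub(cached);
    let output = usage.output_tokens.unwrap_or(0);
    uncached as f64 * input_rate + cached as f64 * cached_rate + output as f64 * output_rate
}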

Closes #2349


Important

Add cached input token tracking to BAML Usage reporting, updating core structures, provider integrations, token aggregation, language bindings, and tests.

  • Behavior:
    • Added cached_input_tokens field to LLMUsage in events.rs, trace_event.rs, and mod.rs to track cached tokens.
    • Implemented cached token extraction for providers: Anthropic (from cache_read_input_tokens), OpenAI (from input_tokens_details.cached_tokens), Google/Vertex (from cached_content_token_count), and AWS Bedrock (set to None).
    • Updated token aggregation logic in storage.rs and llm_response_to_log_event.rs to sum cached tokens.
  • Language Bindings:
    • Added cached token support to TypeScript (native.d.ts), Python (log_collector.rs), Go (rawobjects_public.go), and Ruby (log_collector.rs).
  • RPC Integration:
    • Updated RPC types and converters in trace_data.rs to include cached token data.
  • Tests:
    • Added tests in test_collector.py and collector.test.ts to verify cached token tracking for various providers and scenarios.

This description was created by Ellipsis for 8fa77ed.

- Add `cached_input_tokens` field to core Usage struct
- Update LLMUsage and LLMCompleteResponseMetadata to include cached tokens
- Implement token extraction for all LLM providers:
  - Anthropic: Sum of cache_read_input_tokens + cache_creation_input_tokens (see the extraction sketch after this list)
  - OpenAI: Extract from input_tokens_details.cached_tokens
  - Google/Vertex: Use cached_content_token_count field
  - AWS: Set to None (no cached token support)
- Update token aggregation logic in Collector and FunctionLog
- Add cached token support to all language bindings:
  - TypeScript: Add cachedInputTokens getter
  - Python: Add cached_input_tokens property
  - Go: Add CachedInputTokens() method
  - Ruby: Add cached_input_tokens method
- Update RPC types and converters for cached token reporting
- Add some integration tests for cached token tracking
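
Because the commit notes name the exact Anthropic fields, the extraction can be sketched as below against a deserialized usage JSON object. The real handler works on BAML's typed response structs, so the serde_json-based shape here is an assumption for illustration only.

use serde_json::Value;

// Sketch: derive cached input tokens from an Anthropic response's "usage" object
// by summing cache_read_input_tokens and cache_creation_input_tokens. Returning
// None when neither field is present keeps "unknown" distinct from zero.
fn anthropic_cached_input_tokens(usage: &Value) -> Option<i64> {
    let read = usage.get("cache_read_input_tokens").and_then(Value::as_i64);
    let created = usage.get("cache_creation_input_tokens").and_then(Value::as_i64);
    match (read, created) {
        (None, None) => None,
        (r, c) => Some(r.unwrap_or(0) + c.unwrap_or(0)),
    }
}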

vercel bot commented Aug 30, 2025

@lukeramsden is attempting to deploy a commit to the Boundary Team on Vercel.

A member of the Team first needs to authorize it.

@ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed everything up to 8fa77ed in 2 minutes and 9 seconds.
  • Reviewed 1233 lines of code in 26 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. integ-tests/typescript/tests/providers/caching.test.ts:4
  • Draft comment:
    Good integration tests covering various caching scenarios for each provider. The tests verify that for providers such as OpenAI and Gemini, the 'cachedInputTokens' field is defined and aggregated correctly, while for AWS it remains null. This meets the requirement to track cached input tokens.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. integ-tests/typescript/tests/providers/caching.test.ts:17
  • Draft comment:
    Using expect(...).toBeDefined() for cached tokens is appropriate. Consider also verifying that if a provider supports caching, the value is a number rather than just defined.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%. The comment misunderstands the API - cachedInputTokens is intentionally nullable, not always a number. The current test is actually more correct as it allows for both valid states (number when cached, null when not cached). The suggestion would make the test more restrictive than the actual API contract. Could there be value in having separate test cases - one for verifying the field exists, and another for type checking when we know caching occurred? While separate test cases could be valuable, that would be a larger refactoring. The current approach correctly tests the API contract where the field can be null. The comment should be deleted as it suggests a change that would make the tests incorrect by not allowing valid null values.
3. integ-tests/typescript/tests/providers/caching.test.ts:37
  • Draft comment:
    The test calculates totalCachedTokens using a reduce. Good approach to verify that the collector aggregates cached tokens correctly. Ensure that using the nullish operator (|| 0) is consistent with the intended type.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 50% <= threshold 50%. The comment starts with a purely informative statement about the test using a reduce function, which is not allowed. However, it ends with a suggestion to ensure that the use of the nullish operator is consistent with the intended type, which is a valid comment as it asks for confirmation on a specific implementation detail.
4. integ-tests/typescript/tests/providers/caching.test.ts:74
  • Draft comment:
    The tests for repeated large content for both OpenAI and Gemini are comprehensive. One suggestion is to encapsulate repeated logic for making calls and verifying caching into helper functions to reduce duplication and improve maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
5. integ-tests/typescript/tests/providers/caching.test.ts:115
  • Draft comment:
    The test for Google/Vertex caching correctly checks that the cached tokens field is defined and that the collector-level usage matches. It confirms that caching is provider-specific.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
6. integ-tests/typescript/tests/providers/caching.test.ts:124
  • Draft comment:
    The test for Vertex AI streaming with cached token tracking uses async streaming and validates that the log type is 'stream'. It is good that you verify that cached tokens are captured even during streaming. Consider also asserting that the stream's chunks are non-empty if appropriate.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. The comment is suggesting additional validation that's not central to the test's purpose. The test is specifically focused on cached token tracking during streaming operations. The final response length check is already sufficient for ensuring the stream produced content. Adding chunk-level validation would be redundant and distract from the main purpose. Perhaps checking chunk content could catch subtle streaming-specific issues that might be missed by only checking the final response. The test's purpose is to verify cached token tracking works with streaming, not to validate streaming functionality itself. The final response check is sufficient as a sanity check that streaming worked. Delete the comment. It suggests adding validation that's not relevant to the test's core purpose of verifying cached token tracking during streaming operations.
7. integ-tests/typescript/tests/providers/caching.test.ts:204
  • Draft comment:
    The mixed providers test is solid, ensuring that when providers with caching (OpenAI, Gemini) are mixed with one without caching (AWS), the aggregated cached tokens equal the sum from providers with caching only. This confirms that the aggregation logic in usage reporting is working as intended.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_Aw1igTGWk7ymd8uD

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


🔒 Entelligence AI Vulnerability Scanner

No security vulnerabilities found!

Your code passed our comprehensive security analysis.

📊 Files Analyzed: 3 files


Comment on lines 405 to 417

 pub fn __repr__(&self) -> String {
     format!(
-        "Usage(input_tokens={}, output_tokens={})",
+        "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
         self.inner
             .input_tokens
             .map_or_else(|| "None".to_string(), |v| v.to_string()),
         self.inner
             .output_tokens
             .map_or_else(|| "None".to_string(), |v| v.to_string()),
         self.inner
             .cached_input_tokens
             .map_or_else(|| "None".to_string(), |v| v.to_string())
     )


correctness: __repr__ method for Usage will print cached_input_tokens as the third argument, but if self.inner.input_tokens or self.inner.output_tokens is None, the output will be misaligned and misleading.

🤖 AI Agent Prompt for Cursor/Windsurf

📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue

In engine/language_client_python/src/types/log_collector.rs, lines 405-417, the __repr__ method for Usage prints input_tokens, output_tokens, and cached_input_tokens. However, if any of these are None, the output may be misleading or misaligned. Please ensure that each field is printed in the correct order and that None values are clearly represented, so the output always matches the format Usage(input_tokens=..., output_tokens=..., cached_input_tokens=...).
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

-pub fn __repr__(&self) -> String {
-    format!(
-        "Usage(input_tokens={}, output_tokens={})",
-        "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
-        self.inner
-            .input_tokens
-            .map_or_else(|| "None".to_string(), |v| v.to_string()),
-        self.inner
-            .output_tokens
-            .map_or_else(|| "None".to_string(), |v| v.to_string()),
-        self.inner
-            .cached_input_tokens
-            .map_or_else(|| "None".to_string(), |v| v.to_string())
-    )
+format!(
+    "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
+    self.inner
+        .input_tokens
+        .map_or_else(|| "None".to_string(), |v| v.to_string()),
+    self.inner
+        .output_tokens
+        .map_or_else(|| "None".to_string(), |v| v.to_string()),
+    self.inner
+        .cached_input_tokens
+        .map_or_else(|| "None".to_string(), |v| v.to_string())
+)

Comment on lines 384 to 396

 pub fn to_s(&self) -> String {
     format!(
-        "Usage(input_tokens={}, output_tokens={})",
+        "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
         self.inner
             .input_tokens
             .map_or_else(|| "null".to_string(), |v| v.to_string()),
         self.inner
             .output_tokens
             .map_or_else(|| "null".to_string(), |v| v.to_string()),
         self.inner
             .cached_input_tokens
             .map_or_else(|| "null".to_string(), |v| v.to_string())
     )


correctness: to_s method will panic at runtime if self.inner is None, as it assumes self.inner is always present.

🤖 AI Agent Prompt for Cursor/Windsurf

📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue

In engine/language_client_ruby/ext/ruby_ffi/src/types/log_collector.rs, lines 384-396, the `to_s` method assumes `self.inner` is always present, which can cause a panic if it is None. Update the method to safely handle the case where `self.inner` is None, returning a string with all fields as 'null' in that case.
📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

-pub fn to_s(&self) -> String {
-    format!(
-        "Usage(input_tokens={}, output_tokens={})",
-        "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
-        self.inner
-            .input_tokens
-            .map_or_else(|| "null".to_string(), |v| v.to_string()),
-        self.inner
-            .output_tokens
-            .map_or_else(|| "null".to_string(), |v| v.to_string()),
-        self.inner
-            .cached_input_tokens
-            .map_or_else(|| "null".to_string(), |v| v.to_string())
-    )
+pub fn to_s(&self) -> String {
+    match &self.inner {
+        Some(inner) => format!(
+            "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
+            inner.input_tokens.map_or_else(|| "null".to_string(), |v| v.to_string()),
+            inner.output_tokens.map_or_else(|| "null".to_string(), |v| v.to_string()),
+            inner.cached_input_tokens.map_or_else(|| "null".to_string(), |v| v.to_string())
+        ),
+        None => "Usage(input_tokens=null, output_tokens=null, cached_input_tokens=null)".to_string(),
+    }
+}

Comment on lines 307 to 319

 pub fn to_string(&self) -> String {
     format!(
-        "Usage(input_tokens={}, output_tokens={})",
+        "Usage(input_tokens={}, output_tokens={}, cached_input_tokens={})",
         self.inner
             .input_tokens
             .map_or_else(|| "null".to_string(), |v| v.to_string()),
         self.inner
             .output_tokens
             .map_or_else(|| "null".to_string(), |v| v.to_string()),
         self.inner
             .cached_input_tokens
             .map_or_else(|| "null".to_string(), |v| v.to_string())
     )


performance: to_string method uses chained .map_or_else for each token field, but this is not a significant performance issue given the small, fixed number of fields and negligible runtime impact.

@aaronvg (Contributor) commented Sep 4, 2025

checking this now

Successfully merging this pull request may close these issues: [feat] Cached tokens reported in Usage