Python: surface Gemini cached and thinking token counts in usage details #6638
Open
he-yufeng wants to merge 1 commit into
Pull request overview
This PR updates the Python Gemini connector to surface additional token-usage metadata (cache-read and reasoning/thinking tokens) into the framework’s canonical UsageDetails fields, bringing Gemini in line with the existing OpenAI and Anthropic connectors for more accurate accounting.
Changes:
- Map `GenerateContentResponseUsageMetadata.cached_content_token_count` → `usage_details["cache_read_input_token_count"]`.
- Map `GenerateContentResponseUsageMetadata.thoughts_token_count` → `usage_details["reasoning_output_token_count"]`.
- Extend the Gemini client unit tests to include these new usage fields.
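The mappings listed above can be sketched in isolation as follows. This is a minimal illustration rather than the connector's actual code: `UsageMetadata` is a hypothetical stand-in for `GenerateContentResponseUsageMetadata`, `parse_usage` is a hypothetical free function, and the three pre-existing key names (`input_token_count`, `output_token_count`, `total_token_count`) are assumed, since the PR only names the two new keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageMetadata:
    """Hypothetical stand-in for GenerateContentResponseUsageMetadata."""

    prompt_token_count: Optional[int] = None
    candidates_token_count: Optional[int] = None
    total_token_count: Optional[int] = None
    cached_content_token_count: Optional[int] = None
    thoughts_token_count: Optional[int] = None


# Source attribute -> usage_details key.
# Only the last two pairs are taken from the PR; the first three key names are assumptions.
_FIELD_MAP = {
    "prompt_token_count": "input_token_count",
    "candidates_token_count": "output_token_count",
    "total_token_count": "total_token_count",
    "cached_content_token_count": "cache_read_input_token_count",
    "thoughts_token_count": "reasoning_output_token_count",
}


def parse_usage(metadata: Optional[UsageMetadata]) -> dict[str, int]:
    """Copy every reported count into a plain dict, skipping unreported fields."""
    details: dict[str, int] = {}
    if metadata is None:
        return details
    for attr, key in _FIELD_MAP.items():
        value = getattr(metadata, attr)
        if value is not None:  # keeps a genuine 0, skips fields the API did not report
            details[key] = value
    return details


usage = parse_usage(
    UsageMetadata(
        prompt_token_count=20,
        candidates_token_count=8,
        total_token_count=28,
        cached_content_token_count=12,
        thoughts_token_count=6,
    )
)
```

Checking `is not None` rather than truthiness means a reported count of `0` still produces a key, while a field the API left unset produces none, so consumers can tell "zero tokens" apart from "not reported".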
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `python/packages/gemini/agent_framework_gemini/_chat_client.py` | Adds the two missing usage metadata mappings into `_parse_usage`. |
| `python/packages/gemini/tests/test_gemini_client.py` | Updates the response helper to include cached/thinking token fields and adds a test validating the new usage keys. |
Comment on lines +381 to +401
```python
async def test_get_response_usage_details_includes_cached_and_reasoning_tokens() -> None:
    """Surfaces Gemini cached-content and thinking token counts into the canonical usage fields."""
    client, mock = _make_gemini_client()
    mock.aio.models.generate_content = AsyncMock(
        return_value=_make_response(
            [_make_part(text="Hi")],
            prompt_tokens=20,
            output_tokens=8,
            total_tokens=28,
            cached_tokens=12,
            thoughts_tokens=6,
        )
    )

    response = await client.get_response(messages=[Message(role="user", contents=[Content.from_text("Hi")])])

    assert response.usage_details is not None
    assert response.usage_details["cache_read_input_token_count"] == 12
    assert response.usage_details["reasoning_output_token_count"] == 6
```
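The test above relies on a `_make_response` helper whose body is not shown in this view. A helper of that kind might look like the sketch below. Everything in it is an assumption: `make_response` is a hypothetical name, its signature only mirrors the keyword arguments used in the test, and `SimpleNamespace` stands in for the `google-genai` response objects.

```python
from types import SimpleNamespace
from typing import Any, Optional


def make_response(
    parts: list[Any],
    *,
    prompt_tokens: Optional[int] = None,
    output_tokens: Optional[int] = None,
    total_tokens: Optional[int] = None,
    cached_tokens: Optional[int] = None,
    thoughts_tokens: Optional[int] = None,
) -> SimpleNamespace:
    """Build a fake response object carrying usage metadata (hypothetical helper)."""
    usage_metadata = SimpleNamespace(
        prompt_token_count=prompt_tokens,
        candidates_token_count=output_tokens,
        total_token_count=total_tokens,
        # The two fields the PR adds to the helper; None means "not reported".
        cached_content_token_count=cached_tokens,
        thoughts_token_count=thoughts_tokens,
    )
    candidate = SimpleNamespace(content=SimpleNamespace(parts=parts))
    return SimpleNamespace(candidates=[candidate], usage_metadata=usage_metadata)


response = make_response(
    [SimpleNamespace(text="Hi")],
    prompt_tokens=20,
    output_tokens=8,
    total_tokens=28,
    cached_tokens=12,
    thoughts_tokens=6,
)
```

Defaulting the new keyword arguments to `None` keeps existing call sites unchanged and lets a test exercise the case where Gemini does not report cached or thinking tokens.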
Motivation & Context
The Gemini chat client only surfaces input, output, and total token counts in `usage_details`. Gemini's `GenerateContentResponseUsageMetadata` also reports `cached_content_token_count` (tokens served from context cache) and `thoughts_token_count` (tokens spent thinking by reasoning models), and `_parse_usage` drops both. For cached prompts and thinking models, cache and reasoning usage silently read as zero, which throws off cost and token accounting. `UsageDetails` already defines canonical fields for these (`cache_read_input_token_count`, `reasoning_output_token_count`), and the OpenAI and Anthropic connectors already populate them, so Gemini was the odd one out.
Description & Review Guide
`RawGeminiChatClient._parse_usage` now maps `cached_content_token_count` to `cache_read_input_token_count` and `thoughts_token_count` to `reasoning_output_token_count`, following the same `is not None` guard pattern as the existing three fields. `google-genai`'s usage metadata and the target keys match the `UsageDetails` contract.
Added `test_get_response_usage_details_includes_cached_and_reasoning_tokens` and extended the `_make_response` test helper with the two fields. The full `test_gemini_client.py` suite passes locally (113 passed, 8 integration skipped).
Related Issue
Fixes #6637
Contribution Checklist