Python: surface Gemini cached and thinking token counts in usage details#6638

Open
he-yufeng wants to merge 1 commit into microsoft:main from he-yufeng:fix/gemini-usage-cache-reasoning-tokens

@he-yufeng

Motivation & Context

The Gemini chat client only surfaces input, output, and total token counts in usage_details. Gemini's GenerateContentResponseUsageMetadata also reports cached_content_token_count (tokens served from context cache) and thoughts_token_count (tokens spent thinking by reasoning models), and _parse_usage drops both. For cached prompts and thinking models, cache and reasoning usage silently read as zero, which throws off cost and token accounting.

UsageDetails already defines canonical fields for these (cache_read_input_token_count, reasoning_output_token_count), and the OpenAI and Anthropic connectors already populate them, so Gemini was the odd one out.

Description & Review Guide

  • What are the major changes? RawGeminiChatClient._parse_usage now maps cached_content_token_count to cache_read_input_token_count and thoughts_token_count to reasoning_output_token_count, following the same is not None guard pattern as the existing three fields.
  • What is the impact of these changes? Cache-read and reasoning token counts are now reported for Gemini, consistent with the OpenAI and Anthropic connectors. Responses that omit these fields are unchanged (the values stay unset).
  • What do you want reviewers to focus on? That the source field names match google-genai's usage metadata and the target keys match the UsageDetails contract.
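The mapping described above can be sketched as follows. This is a minimal, standalone illustration rather than the actual `_parse_usage` body: the function name `parse_usage` and the `SimpleNamespace` stand-in for the usage metadata are hypothetical, and the three pre-existing target keys (`input_token_count`, `output_token_count`, `total_token_count`) are assumed rather than taken from this PR.

```python
from types import SimpleNamespace
from typing import Any


def parse_usage(usage_metadata: Any) -> dict[str, int]:
    """Copy each reported Gemini token count into its canonical usage key."""
    # (source attribute on the Gemini usage metadata, target usage-details key).
    # The last two pairs are the ones this PR adds; the first three target
    # keys are assumptions made for this sketch.
    field_map = (
        ("prompt_token_count", "input_token_count"),
        ("candidates_token_count", "output_token_count"),
        ("total_token_count", "total_token_count"),
        ("cached_content_token_count", "cache_read_input_token_count"),
        ("thoughts_token_count", "reasoning_output_token_count"),
    )
    details: dict[str, int] = {}
    for source, target in field_map:
        value = getattr(usage_metadata, source, None)
        # Same guard for every field: a count that was not reported stays unset.
        if value is not None:
            details[target] = value
    return details


# Stand-in for a response that reports cache and thinking usage.
with_extras = SimpleNamespace(
    prompt_token_count=20,
    candidates_token_count=8,
    total_token_count=28,
    cached_content_token_count=12,
    thoughts_token_count=6,
)
# Stand-in for a response that omits the two new fields.
without_extras = SimpleNamespace(
    prompt_token_count=20,
    candidates_token_count=8,
    total_token_count=28,
    cached_content_token_count=None,
    thoughts_token_count=None,
)

details_with = parse_usage(with_extras)
details_without = parse_usage(without_extras)
```

With the `is not None` guard, a reported count of `0` is still recorded, while an omitted count leaves the key absent instead of defaulting to zero.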

Added test_get_response_usage_details_includes_cached_and_reasoning_tokens and extended the _make_response test helper with the two fields. The full test_gemini_client.py suite passes locally (113 passed, 8 integration skipped).
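One detail matters if the helper builds its response from `unittest.mock.MagicMock` (an assumption; the real `_make_response` is not shown here): attributes that are never set on a mock are auto-created as non-`None` child mocks, so the two new fields need explicit `None` defaults for the "omitted" case to exercise the `is not None` guard. A minimal sketch, using a hypothetical `make_usage_metadata` helper:

```python
from typing import Optional
from unittest.mock import MagicMock


def make_usage_metadata(
    prompt_tokens: Optional[int] = None,
    output_tokens: Optional[int] = None,
    total_tokens: Optional[int] = None,
    cached_tokens: Optional[int] = None,
    thoughts_tokens: Optional[int] = None,
) -> MagicMock:
    """Build a fake usage-metadata object (hypothetical helper for this sketch)."""
    usage = MagicMock()
    # Assign every field explicitly so omitted counts are None,
    # not auto-created MagicMock children.
    usage.prompt_token_count = prompt_tokens
    usage.candidates_token_count = output_tokens
    usage.total_token_count = total_tokens
    usage.cached_content_token_count = cached_tokens
    usage.thoughts_token_count = thoughts_tokens
    return usage


with_extras = make_usage_metadata(
    prompt_tokens=20,
    output_tokens=8,
    total_tokens=28,
    cached_tokens=12,
    thoughts_tokens=6,
)
without_extras = make_usage_metadata(prompt_tokens=20, output_tokens=8, total_tokens=28)

# A bare mock, for contrast: the attribute exists and is not None.
bare_mock = MagicMock()
```

Here `without_extras.thoughts_token_count` is `None`, whereas `bare_mock.thoughts_token_count` is a `MagicMock` and would pass an `is not None` check.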

Related Issue

Fixes #6637

Contribution Checklist

  • The code builds clean without any errors or warnings
  • All unit tests pass, and I have added new tests where possible
  • The PR follows the Contribution Guidelines
  • This PR is linked to an issue and there is no other open PR for this issue.
  • This is not a breaking change.

Copilot AI review requested due to automatic review settings June 20, 2026 00:07
@moonbox3 moonbox3 added the python Issues related to the Python codebase label Jun 20, 2026

Copilot AI left a comment

Pull request overview

This PR updates the Python Gemini connector to surface additional token-usage metadata (cache-read and reasoning/thinking tokens) into the framework’s canonical UsageDetails fields, bringing Gemini in line with the existing OpenAI and Anthropic connectors for more accurate accounting.

Changes:

  • Map GenerateContentResponseUsageMetadata.cached_content_token_count → usage_details["cache_read_input_token_count"].
  • Map GenerateContentResponseUsageMetadata.thoughts_token_count → usage_details["reasoning_output_token_count"].
  • Extend the Gemini client unit tests to include these new usage fields.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/packages/gemini/agent_framework_gemini/_chat_client.py | Adds the two missing usage metadata mappings into _parse_usage. |
| python/packages/gemini/tests/test_gemini_client.py | Updates the response helper to include cached/thinking token fields and adds a test validating the new usage keys. |

Comment on lines +381 to +401
async def test_get_response_usage_details_includes_cached_and_reasoning_tokens() -> None:
    """Surfaces Gemini cached-content and thinking token counts into the canonical usage fields."""
    client, mock = _make_gemini_client()
    mock.aio.models.generate_content = AsyncMock(
        return_value=_make_response(
            [_make_part(text="Hi")],
            prompt_tokens=20,
            output_tokens=8,
            total_tokens=28,
            cached_tokens=12,
            thoughts_tokens=6,
        )
    )

    response = await client.get_response(messages=[Message(role="user", contents=[Content.from_text("Hi")])])

    assert response.usage_details is not None
    assert response.usage_details["cache_read_input_token_count"] == 12
    assert response.usage_details["reasoning_output_token_count"] == 6


Development

Successfully merging this pull request may close these issues.

Python: Gemini connector drops cached-content and thinking token counts from usage details