ac-machache

Summary

This PR adds support for ContextWindowCompressionConfig in RunConfig.
This enables context window compression using a trigger_tokens threshold and a sliding window with a target_tokens limit.

This feature is useful for managing long-running audio inputs.

Related Issue

Closes #2188

Testing Plan

  • Added new unit test: test_streaming_with_context_window_compression_config
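For context, here is a minimal sketch of what the new configuration might look like. The field names (`trigger_tokens`, `sliding_window`, `target_tokens`) come from this PR's description; the dataclasses below are illustrative stand-ins, not the real `google.genai` or ADK types.

```python
from dataclasses import dataclass

# Illustrative stand-ins mirroring the field names in the PR description;
# the real types live in the google.genai SDK and are wired into RunConfig.
@dataclass
class SlidingWindow:
    target_tokens: int  # size the context is reduced toward after a trigger

@dataclass
class ContextWindowCompressionConfig:
    trigger_tokens: int            # compression fires once context exceeds this
    sliding_window: SlidingWindow  # strategy used to shrink the context

config = ContextWindowCompressionConfig(
    trigger_tokens=512,
    sliding_window=SlidingWindow(target_tokens=256),
)
```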

@ac-machache
Author

@hangfei, could you please take a look at this and review?

Regarding the session resumption feature — I don’t think it's sufficient to rely solely on transparent=True.

Based on my tests:

  • The transparent mode appears to be available only in the Vertex AI API, not in the AI Studio Gemini API.
  • Additionally, live models such as gemini-live-2.5-flash-preview are not available in all regions of the Vertex AI API.
  • Since we also have to manually manage the session handle for resumption in both cases, relying only on transparent=True might introduce inconsistencies or limitations.

Suggestion: It might be better not to use session resumption via transparent=True at all — or at least to fall back to explicit session handle management in all cases for consistency.

Also, we need another way to handle session resumption in voice mode after 24 hours, since Live API session resumption handles are only valid for that duration.
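To make the explicit-handle suggestion concrete, here is a sketch of what manual handle management could look like. `SessionHandleStore` is a hypothetical helper, not an ADK or google-genai API; it only illustrates capturing the latest handle from `session_resumption_update` messages so a dropped connection can be re-opened without relying on transparent=True.

```python
class SessionHandleStore:
    """Hypothetical helper: remembers the newest resumption handle so a
    dropped live connection can be re-opened explicitly instead of
    relying on transparent=True."""

    def __init__(self) -> None:
        self.handle = None

    def observe(self, message) -> None:
        # Live API servers periodically send session_resumption_update
        # messages; new_handle supersedes any previously stored handle.
        update = getattr(message, "session_resumption_update", None)
        if update is not None and getattr(update, "new_handle", None):
            self.handle = update.new_handle

    def latest(self):
        # Pass this handle in the resumption config when reconnecting.
        return self.handle
```

On reconnect, the stored handle would be supplied in the new connection's session resumption config; messages without an update leave the stored handle untouched.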

@hangfei
Collaborator

hangfei commented Aug 1, 2025


@ac-machache good point. I have a fix here, PTAL: https://github.com/google/adk-python/pull/2270/files.

Regarding the 24-hour limit: do you have use cases that span more than 24 hours?

@hangfei
Collaborator

hangfei commented Aug 1, 2025


thanks!

I was wondering if there is any way to test if this actually works or not.

One challenge we found is that sometimes, due to bugs on our side or in other dependencies, a feature doesn't actually work. So it would be good to have a way to verify that the window actually gets compressed.
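One lightweight verification, assuming nothing other than compression shrinks the prompt between turns: record `prompt_token_count` per turn and flag any turn where it drops. This is a sketch of the check, not an existing test helper.

```python
def compression_events(prompt_token_counts):
    """Return the 0-based turn indices where the reported prompt size
    dropped relative to the previous turn; in a monotonically growing
    conversation, a drop indicates the window was compressed."""
    return [
        i
        for i in range(1, len(prompt_token_counts))
        if prompt_token_counts[i] < prompt_token_counts[i - 1]
    ]

# Per-turn prompt_token_count values from the Run 1 measurements in this thread:
print(compression_events([612, 689, 758, 1009, 1148, 813]))  # → [5]
```

A unit test could feed recorded token counts through this check and assert that at least one drop occurs after the trigger threshold is crossed.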

@hangfei added labels on Aug 1, 2025: live [Component] (related to live, voice and video chat) and wip [Status] (being worked on; there is a pending PR or a planned fix).
@ac-machache
Author

@hangfei,

Okay, regarding testing the feature window compression — I’ll try it tomorrow and see if it works in a real use case in addition to the unit tests.

As for the 24-hour limit in the agentic workflow: a user might want to resume an old session even 3 days later. Since we can’t really predict this behavior, it might be better to support longer session resumption, even if implementing it is a bit more complex.

@ac-machache
Author

ac-machache commented Aug 2, 2025

@hangfei ,

Regarding context_window_compression Behavior in Gemini Live API

Through my recent tests with the Gemini API, I can confirm that the context_window_compression feature, when configured with trigger_tokens and sliding_window(target_tokens), manages the conversational context size as expected.

Testing Methodology

My testing approach involved analyzing the prompt_token_count reported by the Gemini Live API. This revealed a crucial structure to the total prompt size:

Token Structure

  • Fixed Base Tokens (FBT):
    I consistently observed approximately 581 tokens acting as a baseline in prompt_token_count. This appears to be non-user-controlled overhead from the Gemini API, likely covering:

    • Internal session state
    • Additional system instructions
    • Foundational model setup

    These tokens are not compressible.
  • Effective Context Window (ECW):
    This dynamic portion includes the actual conversation history (user input, model responses, function calls).
    context_window_compression settings apply only to this portion.
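Under the ~581-token baseline assumption, the ECW and the trigger check reduce to simple arithmetic. A sketch of the bookkeeping used below, not an API:

```python
FIXED_BASE_TOKENS = 581  # empirically observed overhead; may vary by model

def effective_context_window(prompt_token_count: int,
                             fixed_base: int = FIXED_BASE_TOKENS) -> int:
    """Strip the non-compressible baseline from the reported prompt size."""
    return prompt_token_count - fixed_base

def compression_due(prompt_token_count: int, trigger_tokens: int,
                    fixed_base: int = FIXED_BASE_TOKENS) -> bool:
    """Compression should fire once the ECW exceeds trigger_tokens."""
    return effective_context_window(prompt_token_count, fixed_base) > trigger_tokens
```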


Test Run 1: trigger_tokens=512, sliding_window(target_tokens=256)

Objective:

Trigger compression when ECW exceeds 512, aiming to reduce it near 256.

| Turn | prompt_token_count | ECW (−581 FBT) | Observation |
|------|-------------------:|---------------:|-------------|
| 1 | 612 | 31 | Initial turn |
| 2 | 689 | 108 | Growing ECW |
| 3 | 758 | 177 | Growing ECW |
| 4 | 1009 | 428 | Nearing trigger |
| 5 | 1148 | 567 | ECW > trigger (512) |
| 6 | 813 | 232 | Compression activated |

This run confirmed compression kicked in once ECW > trigger_tokens.


Test Run 2: trigger_tokens=1024, sliding_window(target_tokens=512)

| Turn | prompt_token_count | ECW (−581 FBT) | Observation |
|------|-------------------:|---------------:|-------------|
| 1 | 600 | 19 | Start |
| 2 | 697 | 116 | Growing |
| 9 | 1484 | 903 | Still under trigger |
| 10 | 1618 | 1037 | Trigger Point 1 |
| 11 | 1738 | 1157 | Compression pending |
| 12 | 1102 | 521 | Compression Effect 1 |
| 15 | 1580 | 999 | ECW regrowth |
| 16 | 1826 | 1245 | Trigger Point 2 |
| 17 | 1065 | 484 | Compression Effect 2 |

Conclusion

These results demonstrate that:

  • The context_window_compression mechanism works predictably, managing dynamic content in long conversations.
  • Compression activates after the ECW (prompt minus FBT) crosses the trigger_tokens threshold.
  • Once triggered, the ECW shrinks close to target_tokens, validating the sliding window mechanism.

Note: The Fixed Base Tokens (~581) ensure that prompt_token_count will always be higher than target_tokens, but this does not affect the effectiveness of compression.
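The observed behavior is consistent with a simple sliding-window model: once the history exceeds trigger_tokens, the oldest turns are dropped until the total is at or below target_tokens. The following is a simulation of that presumed mechanism, not the API's actual implementation (which may truncate at finer granularity than whole turns):

```python
def simulate_sliding_window(turn_tokens, trigger_tokens, target_tokens):
    """Simulate sliding-window compression over per-turn token counts:
    once the history exceeds the trigger, drop the oldest turns until
    the remaining history fits within the target."""
    history = list(turn_tokens)
    if sum(history) > trigger_tokens:
        while history and sum(history) > target_tokens:
            history.pop(0)  # discard the oldest turn first
    return history

# Five turns totaling 550 tokens exceed a 512-token trigger; the three
# oldest turns are dropped to land at or below the 256-token target.
print(simulate_sliding_window([100, 100, 100, 100, 150], 512, 256))  # → [100, 150]
```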


@ac-machache
Author

@hangfei, is there any blocking point holding this up? I'd be happy to help unblock or address any feedback if needed.

Successfully merging this pull request may close these issues:

  • ADK Live: Support SlidingWindow for ADK Live