ac-machache

Summary

This PR adds support for ContextWindowCompressionConfig in RunConfig.
This enables context window compression using a trigger_tokens threshold and a sliding window with a target_tokens limit.

This feature is useful for managing long-running audio inputs.

Related Issue

Closes #2188

Testing Plan

  • Added new unit test: test_streaming_with_context_window_compression_config
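For context, here is a minimal sketch of what the new configuration might look like. The field names (`trigger_tokens`, `sliding_window`, `target_tokens`) come from this PR's description; the dataclasses below are illustrative stand-ins, not the real `google.genai` or ADK types.

```python
from dataclasses import dataclass

# Illustrative stand-ins mirroring the field names in the PR description;
# the real types live in the google.genai SDK and are wired into RunConfig.
@dataclass
class SlidingWindow:
    target_tokens: int  # size the context is reduced toward after a trigger

@dataclass
class ContextWindowCompressionConfig:
    trigger_tokens: int            # compression fires once context exceeds this
    sliding_window: SlidingWindow  # strategy used to shrink the context

config = ContextWindowCompressionConfig(
    trigger_tokens=512,
    sliding_window=SlidingWindow(target_tokens=256),
)
```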

@ac-machache
Author

@hangfei, could you please take a look at this and review?

Regarding the session resumption feature — I don’t think it's sufficient to rely solely on transparent=True.

Based on my tests:

  • The transparent mode appears to be available only in the Vertex AI API, not in the AI Studio Gemini API.
  • Additionally, live models such as gemini-live-2.5-flash-preview are not available in all regions of the Vertex AI API.
  • Since we also have to manually manage the session handle for resumption in both cases, relying only on transparent=True might introduce inconsistencies or limitations.

Suggestion: It might be better not to use session resumption via transparent=True at all — or at least to fall back to explicit session handle management in all cases for consistency.

Also, we need another way to handle session resumption in voice mode after 24 hours, since Live API session resumption handles are only valid for that duration.
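To make the explicit-handle suggestion concrete, here is a sketch of what manual handle management could look like. `SessionHandleStore` is a hypothetical helper, not an ADK or google-genai API; it only illustrates capturing the latest handle from `session_resumption_update` messages so a dropped connection can be re-opened without relying on transparent=True.

```python
class SessionHandleStore:
    """Hypothetical helper: remembers the newest resumption handle so a
    dropped live connection can be re-opened explicitly instead of
    relying on transparent=True."""

    def __init__(self) -> None:
        self.handle = None

    def observe(self, message) -> None:
        # Live API servers periodically send session_resumption_update
        # messages; new_handle supersedes any previously stored handle.
        update = getattr(message, "session_resumption_update", None)
        if update is not None and getattr(update, "new_handle", None):
            self.handle = update.new_handle

    def latest(self):
        # Pass this handle in the resumption config when reconnecting.
        return self.handle
```

On reconnect, the stored handle would be supplied in the new connection's session resumption config; messages without an update leave the stored handle untouched.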

@hangfei
Collaborator

hangfei commented Aug 1, 2025


@ac-machache good point. I have a fix here, PTAL: https://github.com/google/adk-python/pull/2270/files.

Regarding the 24-hour limit: do you have use cases that span more than 24 hours?

@hangfei
Collaborator

hangfei commented Aug 1, 2025


thanks!

I was wondering if there is any way to test if this actually works or not.

One challenge we found is that sometimes, due to bugs on our side or in other dependencies, a feature doesn't actually work. So it would be good to have a way to verify that the window actually gets compressed.
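One lightweight verification, assuming nothing other than compression shrinks the prompt between turns: record `prompt_token_count` per turn and flag any turn where it drops. This is a sketch of the check, not an existing test helper.

```python
def compression_events(prompt_token_counts):
    """Return the 0-based turn indices where the reported prompt size
    dropped relative to the previous turn; in a monotonically growing
    conversation, a drop indicates the window was compressed."""
    return [
        i
        for i in range(1, len(prompt_token_counts))
        if prompt_token_counts[i] < prompt_token_counts[i - 1]
    ]

# Per-turn prompt_token_count values from the Run 1 measurements in this thread:
print(compression_events([612, 689, 758, 1009, 1148, 813]))  # → [5]
```

A unit test could feed recorded token counts through this check and assert that at least one drop occurs after the trigger threshold is crossed.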

@hangfei added labels on Aug 1, 2025: live [Component] (related to live, voice and video chat) and wip [Status] (being worked on; there is a pending PR or a planned fix).
@ac-machache
Author

@hangfei,

Okay, regarding testing the feature window compression — I’ll try it tomorrow and see if it works in a real use case in addition to the unit tests.

As for the 24-hour limit in the agentic workflow: a user might want to resume an old session even 3 days later. Since we can’t really predict this behavior, it might be better to support longer session resumption, even if implementing it is a bit more complex.

@ac-machache
Author

ac-machache commented Aug 2, 2025

@hangfei ,

Regarding context_window_compression Behavior in Gemini Live API

Through my recent tests with the Gemini API, I can confirm that the context_window_compression feature, when configured with trigger_tokens and sliding_window(target_tokens), manages the conversational context size as expected.

Testing Methodology

My testing approach involved analyzing the prompt_token_count reported by the Gemini Live API. This revealed a crucial structure to the total prompt size:

Token Structure

  • Fixed Base Tokens (FBT):
    I consistently observed approximately 581 tokens acting as a baseline in prompt_token_count. This appears to be non-user-controlled overhead from the Gemini API, likely covering:

    • Internal session state
    • Additional system instructions
    • Foundational model setup

    These tokens are not compressible.
  • Effective Context Window (ECW):
    This dynamic portion includes the actual conversation history (user input, model responses, function calls).
    context_window_compression settings apply only to this portion.
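Under the ~581-token baseline assumption, the ECW and the trigger check reduce to simple arithmetic. A sketch of the bookkeeping used below, not an API:

```python
FIXED_BASE_TOKENS = 581  # empirically observed overhead; may vary by model

def effective_context_window(prompt_token_count: int,
                             fixed_base: int = FIXED_BASE_TOKENS) -> int:
    """Strip the non-compressible baseline from the reported prompt size."""
    return prompt_token_count - fixed_base

def compression_due(prompt_token_count: int, trigger_tokens: int,
                    fixed_base: int = FIXED_BASE_TOKENS) -> bool:
    """Compression should fire once the ECW exceeds trigger_tokens."""
    return effective_context_window(prompt_token_count, fixed_base) > trigger_tokens
```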


Test Run 1: trigger_tokens=512, sliding_window(target_tokens=256)

Objective:

Trigger compression when ECW exceeds 512, aiming to reduce it near 256.

| Turn | prompt_token_count | ECW (−581 FBT) | Observation |
|------|-------------------:|---------------:|-------------|
| 1 | 612 | 31 | Initial turn |
| 2 | 689 | 108 | Growing ECW |
| 3 | 758 | 177 | Growing ECW |
| 4 | 1009 | 428 | Nearing trigger |
| 5 | 1148 | 567 | ECW > trigger (512) |
| 6 | 813 | 232 | Compression activated |

This run confirmed compression kicked in once ECW > trigger_tokens.


Test Run 2: trigger_tokens=1024, sliding_window(target_tokens=512)

| Turn | prompt_token_count | ECW (−581 FBT) | Observation |
|------|-------------------:|---------------:|-------------|
| 1 | 600 | 19 | Start |
| 2 | 697 | 116 | Growing |
| 9 | 1484 | 903 | Still under trigger |
| 10 | 1618 | 1037 | Trigger Point 1 |
| 11 | 1738 | 1157 | Compression pending |
| 12 | 1102 | 521 | Compression Effect 1 |
| 15 | 1580 | 999 | ECW regrowth |
| 16 | 1826 | 1245 | Trigger Point 2 |
| 17 | 1065 | 484 | Compression Effect 2 |

Conclusion

These results demonstrate that:

  • The context_window_compression mechanism works predictably, managing dynamic content in long conversations.
  • Compression activates after the ECW (prompt minus FBT) crosses the trigger_tokens threshold.
  • Once triggered, the ECW shrinks close to target_tokens, validating the sliding window mechanism.

Note: The Fixed Base Tokens (~581) ensure that prompt_token_count will always be higher than target_tokens, but this does not affect the effectiveness of compression.
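The observed behavior is consistent with a simple sliding-window model: once the history exceeds trigger_tokens, the oldest turns are dropped until the total is at or below target_tokens. The following is a simulation of that presumed mechanism, not the API's actual implementation (which may truncate at finer granularity than whole turns):

```python
def simulate_sliding_window(turn_tokens, trigger_tokens, target_tokens):
    """Simulate sliding-window compression over per-turn token counts:
    once the history exceeds the trigger, drop the oldest turns until
    the remaining history fits within the target."""
    history = list(turn_tokens)
    if sum(history) > trigger_tokens:
        while history and sum(history) > target_tokens:
            history.pop(0)  # discard the oldest turn first
    return history

# Five turns totaling 550 tokens exceed a 512-token trigger; the three
# oldest turns are dropped to land at or below the 256-token target.
print(simulate_sliding_window([100, 100, 100, 100, 150], 512, 256))  # → [100, 150]
```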


@ac-machache
Author

@hangfei, is there any blocking point holding this up? I'd be happy to help unblock or address any feedback if needed.

Successfully merging this pull request may close these issues:

  • ADK Live: Support SlidingWindow for ADK Live