Live API: does ContextWindowCompression `target_tokens` affect the post-compression window for audio (S2S) sessions?

vitbramm · May 22, 2026, 6:09am

I’m using the Live API (gemini-3.1-flash-live-preview) for a speech-to-speech audio session and I’m trying to understand how ContextWindowCompressionConfig / SlidingWindow actually behaves.

Setup — two sessions, identical scripted ~25-turn audio conversation, same audio input, with exactly one parameter changed between them:

trigger_tokens = 25000 (held constant)
Session A: SlidingWindow(target_tokens=512)
Session B: SlidingWindow(target_tokens=8000)
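
For reference, here is a minimal sketch of the two configurations written as plain dicts. The key names `context_window_compression` and `sliding_window` are my assumption about the snake_case dict shape of the config types named above, and `build_live_config` is just a hypothetical helper for this post, not part of any SDK:

```python
def build_live_config(target_tokens: int, trigger_tokens: int = 25_000) -> dict:
    """Build a dict-shaped session config for one run.

    Key names are an assumed dict form of ContextWindowCompressionConfig
    and SlidingWindow; check them against the SDK version in use.
    """
    return {
        "context_window_compression": {
            "trigger_tokens": trigger_tokens,
            "sliding_window": {"target_tokens": target_tokens},
        }
    }


session_a = build_live_config(target_tokens=512)
session_b = build_live_config(target_tokens=8000)

# Collect the compression fields whose values differ between the two runs.
cwc_a = session_a["context_window_compression"]
cwc_b = session_b["context_window_compression"]
changed = [key for key in cwc_a if cwc_a[key] != cwc_b[key]]

# Ratio between the two targets.
ratio = cwc_b["sliding_window"]["target_tokens"] / cwc_a["sliding_window"]["target_tokens"]
print(changed, ratio)
```

Only the `sliding_window` entry differs between the two dicts, which is the single-variable comparison described above.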

So target_tokens differs by roughly 16x between the two runs (8000 / 512 ≈ 15.6).

Observed (reading usage_metadata.prompt_token_count per turn):

Compression clearly fires in both runs — there are visible post-trigger drops in the prompt token count.
But the post-compression “landing” sizes differ by only ~10-20% between A and B — nowhere near 16x.
In both sessions the post-compression size keeps escalating turn over turn, and the window ends at ~71,000 prompt tokens by turn 25.
Cumulative billed input tokens were nearly the same — in fact the target_tokens=512 run was ~9% HIGHER, not lower.
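
The per-turn reading described above can be sketched roughly as follows. The trace is made-up illustrative data, not the numbers from my sessions, and treating cumulative input as the plain sum of per-turn `prompt_token_count` values is an assumption about how billing adds up:

```python
def find_compression_events(prompt_tokens):
    """Return (turn_index, before, after) wherever the prompt count dropped.

    turn_index is the 0-based position in the list; a drop between
    consecutive turns is taken as a sign that compression fired.
    """
    events = []
    for i in range(1, len(prompt_tokens)):
        before, after = prompt_tokens[i - 1], prompt_tokens[i]
        if after < before:
            events.append((i, before, after))
    return events


def cumulative_input(prompt_tokens):
    """Sum the per-turn prompt token counts (assumed proxy for billed input)."""
    return sum(prompt_tokens)


# Hypothetical trace for illustration only.
trace = [4_000, 9_000, 15_000, 22_000, 27_000, 18_000, 24_000, 30_000, 21_000]

events = find_compression_events(trace)
landings = [after for _, _, after in events]
total = cumulative_input(trace)
print(events, landings, total)
```

Comparing `landings` and `total` between Session A and Session B is how I arrived at the differences quoted above.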

So a roughly 16x reduction in target_tokens produced essentially no reduction in realized window size or cost (if anything, slightly the opposite).

Questions:

1. For audio / S2S sessions, is target_tokens expected to influence the post-compression window size at all, or is it effectively a soft hint?

2. Is there a documented incompressible floor — e.g. system instruction + tools + the most recent un-discardable user turn(s)? If audio turns are large, is a small target_tokens simply unreachable?

3. Is SlidingWindow discard quantized to whole turns (and aligned to a user-turn boundary)? That would explain why a small target cannot be reached.
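
To make questions 2 and 3 concrete, here is a toy model of the behavior I am hypothesizing. It is not the service's actual algorithm. It assumes a fixed overhead (system instruction plus tools), discards only whole turns, and always cuts at a user-turn boundary. All token sizes are invented:

```python
def slide_window(turns, fixed_tokens, target_tokens):
    """Toy sliding window: keep the longest suffix that starts at a user
    turn and fits in target_tokens; if none fits, keep the latest user
    turn onward (the hypothesized incompressible floor).

    turns: list of (role, tokens) tuples, oldest first.
    """
    user_starts = [i for i, (role, _) in enumerate(turns) if role == "user"]
    kept = list(turns)
    size = fixed_tokens + sum(tokens for _, tokens in kept)
    for start in user_starts:
        kept = turns[start:]
        size = fixed_tokens + sum(tokens for _, tokens in kept)
        if size <= target_tokens:
            break  # largest suffix that fits the target
    return kept, size


# Invented sizes: six user/model exchanges of 1000 + 1500 tokens each,
# plus 1200 tokens of fixed overhead.
history = [("user", 1000), ("model", 1500)] * 6
kept_a, size_a = slide_window(history, fixed_tokens=1200, target_tokens=512)
kept_b, size_b = slide_window(history, fixed_tokens=1200, target_tokens=8000)
print(size_a, size_b)  # 3700 6200
```

In this toy model the 512 target is unreachable and lands at 3700 tokens, while the 8000 target lands at 6200, so a ~16x difference in target yields well under a 2x difference in realized window. If the real implementation works anything like this, it would be consistent with what I observed.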

I’ve verified the config is constructed and sent correctly (the SlidingWindow.target_tokens field is populated). I’m trying to understand the intended behavior — not just whether this is a bug — so I can decide whether target_tokens is a usable knob for controlling cost in long audio sessions.

Pointers to docs or clarification from the team would be very helpful.

Thanks!
