Hey all, I’m working with the Gemini Live API (via WebSockets) and using it to stream LLM output in real time. I understand that the API exposes signals like `generationComplete` and `turnComplete`, which tell me when the model has finished its current output. I can react to those cleanly in my client or backend.
What I need is something a bit different:
Instead of waiting until the model is done, I want a way to detect when the model is getting close to done — so I can call a function, update my UI, prep the next turn, or transition my state before the final completion happens.
Right now my model pipeline looks like this:
- Client opens a `gemini.live.connect` session.
- I stream text/audio and receive chunks back from the model.
- I watch for `server_content.generation_complete` or `server_content.turn_complete` to know the reply is finished.
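For reference, here's a minimal sketch of that receive loop. I'm using plain dicts as stand-ins for the actual server messages (the real SDK delivers typed objects, and the field names here are just taken from my description above), so treat this as pseudocode for the flow, not the real wire format:

```python
def process_stream(chunks, on_text, on_done):
    """Feed streamed chunks to callbacks; fire on_done at completion.

    `chunks` is an iterable of dicts shaped like the server_content
    messages described above -- a stand-in, not the real SDK types.
    """
    for chunk in chunks:
        content = chunk.get("server_content", {})
        text = content.get("text")
        if text:
            on_text(text)
        # generation_complete / turn_complete are the documented
        # end-of-output signals -- the only "done" events I know of.
        if content.get("generation_complete") or content.get("turn_complete"):
            on_done()
            return
```

The point being: the loop only learns the reply is over at the moment the completion flag arrives, with no advance warning.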
But there doesn’t seem to be any built-in “N tokens left” or “almost done” event in the Gemini Live spec that gets emitted before `generationComplete`. The API docs only define the normal completion flags — no progress percentage or remaining-token info.
Before I build a heuristic (like counting streamed tokens/chars and calling my callback when some threshold is met), I wanted to check:
- Has anyone seen undocumented or hidden signals that indicate “approaching end of generation”?
- Are there better client-side heuristics people use with Gemini Live when they need early notice that a reply is ending?
- Or is the community just using `generationComplete` as the de facto only reliable signal?
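In case it helps the discussion, here's roughly what I mean by the counting heuristic. Everything here is my own invention (names, threshold, and the moving-average length estimate are all guesses, not anything the API provides):

```python
class AlmostDoneDetector:
    """Client-side guess at 'approaching end of generation'.

    Fires a callback once the streamed character count crosses a
    fraction of the expected reply length, where the expectation is
    an exponential moving average over previous replies. Pure
    heuristic -- the Live API emits no such signal itself.
    """

    def __init__(self, on_almost_done, threshold=0.8, initial_estimate=400):
        self.on_almost_done = on_almost_done
        self.threshold = threshold        # fraction of expected length
        self.expected = initial_estimate  # running estimate, in chars
        self.seen = 0
        self.fired = False

    def feed(self, text_chunk):
        """Call for every streamed text chunk."""
        self.seen += len(text_chunk)
        if not self.fired and self.seen >= self.threshold * self.expected:
            self.fired = True
            self.on_almost_done()

    def complete(self):
        """Call when generation_complete arrives; update the estimate."""
        # Blend the just-observed reply length into the running average.
        self.expected = 0.7 * self.expected + 0.3 * self.seen
        self.seen = 0
        self.fired = False
```

Obvious weakness: reply lengths vary a lot, so this fires early on long replies and late (or never, before completion) on short ones. That's why I'd rather use a real signal if one exists.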
For context: I’m aware this isn’t about end-of-turn detection or voice activity detection — I’m talking strictly about approaching the end of the model’s text/audio generation while it’s still streaming.
Thanks!