How do I prevent the Live API from discarding audio when it's given audio while it speaks?

bobber · March 20, 2025, 11:50pm

At the moment when I talk over it, it stops talking and discards what it was saying. I don’t want that. I want it to continue listening and continue producing audio without interruption in a continuous way.

Any way to achieve that? I don’t want it to be interrupted ever. Just keep going in a queue non stop.

from the docs:


### Handle interruptions

Users can interrupt the model's output at any time. When [Voice activity detection](https://ai.google.dev/gemini-api/docs/live#voice-activity-detection) (VAD) detects an interruption, the ongoing generation is canceled and discarded. Only the information already sent to the client is retained in the session history. The server then sends a [BidiGenerateContentServerContent](https://ai.google.dev/gemini-api/docs/live#bidigeneratecontentservercontent) message to report the interruption.

In addition, the Gemini server discards any pending function calls and sends a `BidiGenerateContentServerContent` message with the IDs of the canceled calls.

async for response in session.receive():
    if response.server_content and response.server_content.interrupted:
        # The generation was interrupted
        ...

Govind_Keshari · March 21, 2025, 5:00am

Hi @bobber, Welcome to the forum!!!

I don’t think you can achieve this, because interruption is built in. VAD is always enabled, and its parameters aren’t configurable. This is one of the limitations mentioned in the docs.
I guess that’s because human-like, back-and-forth conversation is the nature of the Live API.

Thanks.

bobber · March 21, 2025, 10:39am

Can we please add a toggle to disable interruptions? It’s OK that it has internal VAD, but I really need it to not drop what is being said, and to continue processing and talking over you regardless of interruptions. My understanding is that only the Live API is capable of processing audio in all languages, and in any case I need the low latency of the Live API.

Sergey_Tokarev · March 21, 2025, 1:01pm

It seems like this can be worked around on the client side by collecting input from the microphone, but only sending it as a batch after serverTurn.

But I haven’t tested this idea yet.
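The buffering idea above can be sketched roughly as follows. This is a minimal, untested-against-the-API illustration: `TurnGatedBuffer` and its method names are hypothetical, and the caller is assumed to invoke `on_model_output()` when model audio starts arriving and `on_turn_complete()` when the server signals the end of its turn. The sketch only gates when audio is sent; it does nothing about how the server segments the held audio once it arrives.

```python
from collections import deque
from typing import Deque, List


class TurnGatedBuffer:
    """Hypothetical helper: holds microphone chunks while the model is speaking."""

    def __init__(self) -> None:
        self._pending: Deque[bytes] = deque()
        self.model_speaking = False

    def on_model_output(self) -> None:
        # Assumed to be called by the client when model audio starts arriving.
        self.model_speaking = True

    def on_mic_chunk(self, chunk: bytes) -> List[bytes]:
        """Return the chunks that may be sent to the server right now."""
        if self.model_speaking:
            # Hold the chunk back so the server never sees input mid-response.
            self._pending.append(chunk)
            return []
        return [chunk]

    def on_turn_complete(self) -> List[bytes]:
        """Release everything held during the model's turn, in arrival order."""
        self.model_speaking = False
        held = list(self._pending)
        self._pending.clear()
        return held
```

The list returned by `on_turn_complete()` would then be sent as one batch before normal streaming resumes.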

bobber · March 21, 2025, 1:19pm

The problem with that is that if your batched audio contains sentences separated by silences long enough to trigger Gemini’s internal VAD, it will drop all of the sentences except the last one, since it will treat the last sentence as an interruption of the previous sentences in the same batch.

I’m not completely sure this is the problem but it seems likely. So far I can’t get it to work.

Hopefully Google simply adds a toggle to set interrupt to false. I don’t understand why they have to break the entire thing by forcing a “feature” into the API.

Alternatively, if we could manually set the end of turn, that would also work reasonably well, instead of the API deciding on its own, at some unpredictable point, that the turn has ended.

Govind_Keshari · March 25, 2025, 1:21pm

Hey @bobber,

I will escalate this feature request to the team. Let’s see whether it gets implemented.
There are some feature requests in the queue already, such as saving live conversations and a pause button. Hopefully we will see these changes in an updated version.

Thanks.

bobber · March 25, 2025, 4:06pm

It would be absolutely phenomenal if you could disable auto interrupt and just let it keep going, @Govind_Keshari, basically continually receiving and producing a stream, even if in chunks via the automatic internal VAD, but without stopping and without discarding data that was sent before the interruption!

Truly phenomenal if you add an option to disable interruptions and just keep it all in a queue.

Absolutely key for transcription and translation applications!

bobber · April 10, 2025, 2:31pm

Any luck? We really need this to produce live translation.

Govind_Keshari · April 11, 2025, 5:44am

Hey @bobber, You can now configure VAD settings. Follow this doc.

bobber · April 11, 2025, 6:10pm

Amazing, thank you @Govind_Keshari . I tried implementing it in Python and it didn’t work. I assume for this reason? “SDK support for this feature will be available in the coming weeks.”

If so, is there example non-SDK code I could adapt, so that my code bypasses the SDK and uses the disabled VAD? Or is there any other way to actually use the disabled VAD?

For example, is it available on any of the other SDKs?

I have no problem writing the code directly, with no SDK, but I need to see an example or detailed docs to do it.
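As a rough starting point for the no-SDK route, the sketch below only builds the JSON messages a raw WebSocket client would send; it does not open a connection. Every field name here (`realtimeInputConfig`, `automaticActivityDetection`, `activityHandling`, `NO_INTERRUPTION`, `activityStart`, `activityEnd`, `audio`) is an assumption based on one reading of the Live API reference and should be checked against the current docs, as should the audio MIME type. The helper function names and the model string passed in are purely illustrative.

```python
import base64
import json


def setup_message(model: str) -> str:
    """First message on the socket: session setup with automatic VAD disabled."""
    return json.dumps({
        "setup": {
            "model": model,
            "generationConfig": {"responseModalities": ["AUDIO"]},
            "realtimeInputConfig": {
                # Assumed field: turns off the server's automatic VAD.
                "automaticActivityDetection": {"disabled": True},
                # Assumed enum value: new input should not cancel generation.
                "activityHandling": "NO_INTERRUPTION",
            },
        }
    })


def activity_start() -> str:
    """With automatic VAD off, the client marks where user speech begins."""
    return json.dumps({"realtimeInput": {"activityStart": {}}})


def audio_chunk(pcm: bytes, sample_rate: int = 16000) -> str:
    """Wrap raw PCM bytes as a base64 payload (MIME type is an assumption)."""
    return json.dumps({
        "realtimeInput": {
            "audio": {
                "data": base64.b64encode(pcm).decode("ascii"),
                "mimeType": f"audio/pcm;rate={sample_rate}",
            }
        }
    })


def activity_end() -> str:
    """The client marks the end of user speech, i.e. a manual end of turn."""
    return json.dumps({"realtimeInput": {"activityEnd": {}}})
```

The intended order would be `setup_message(...)` once, then `activity_start()`, any number of `audio_chunk(...)` messages, and `activity_end()` for each stretch of user speech.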

Govind_Keshari · June 24, 2025, 6:34am

Hey @bobber, Can you please share your code if possible? Just want to check how you are implementing VAD in your code.

Thanks.
