At the moment, when I talk over it, it stops talking and discards what it was saying. I don’t want that. I want it to keep listening and keep producing audio continuously, without interruption.
Is there any way to achieve that? I don’t want it to be interrupted, ever. It should just keep going through a queue, nonstop.
From the docs:
### Handle interruptions
Users can interrupt the model's output at any time. When [Voice activity detection](https://ai.google.dev/gemini-api/docs/live#voice-activity-detection) (VAD) detects an interruption, the ongoing generation is canceled and discarded. Only the information already sent to the client is retained in the session history. The server then sends a [BidiGenerateContentServerContent](https://ai.google.dev/gemini-api/docs/live#bidigeneratecontentservercontent) message to report the interruption.
In addition, the Gemini server discards any pending function calls and sends a `BidiGenerateContentServerContent` message with the IDs of the canceled calls.
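```python
async for response in session.receive():
    # server_content can be absent on other message types, so check it first
    if response.server_content and response.server_content.interrupted:
        # The generation was interrupted
        print("Generation interrupted")
```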
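On the client side, one thing you can at least control is what happens to audio you have already received. Below is a minimal sketch, not an official pattern, that keeps queued audio instead of flushing it when the `interrupted` flag arrives. `drain_responses` and the fake responses are hypothetical stand-ins, and reading audio bytes from `response.data` is an assumption about the SDK's message shape. It does not bring back generation that the server has already discarded.

```python
from collections import deque
from types import SimpleNamespace


def drain_responses(responses, playback_queue):
    """Queue received audio and count interruptions without flushing the queue."""
    interruptions = 0
    for response in responses:
        content = getattr(response, "server_content", None)
        if content is None:
            continue  # e.g. a message that carries no server content
        if getattr(content, "interrupted", False):
            interruptions += 1  # note it, but keep audio that already arrived
            continue
        audio = getattr(response, "data", None)  # assumed location of audio bytes
        if audio:
            playback_queue.append(audio)
    return interruptions


# Hypothetical stand-ins for the messages yielded by session.receive().
fake_responses = [
    SimpleNamespace(server_content=SimpleNamespace(interrupted=None), data=b"chunk-1"),
    SimpleNamespace(server_content=SimpleNamespace(interrupted=True), data=None),
    SimpleNamespace(server_content=SimpleNamespace(interrupted=None), data=b"chunk-2"),
]
playback = deque()
interruption_count = drain_responses(fake_responses, playback)
```

In a real session the same logic would sit inside the `async for` loop, with a separate task playing audio from the queue.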
I don’t think you can achieve it, because interruption is built in. VAD is always enabled, and its parameters aren’t configurable. This is one of the limitations mentioned in the docs.
I guess that’s because human-like, back-and-forth conversation is the nature of the Live API.
Can we please add a toggle to disable interruptions? It’s OK that it has internal VAD, but I really need it not to drop what is being said, and to keep processing and talking over you regardless of interruptions. My understanding is that only the Live API is capable of processing audio in all languages, and in any case I need its low latency.
The problem with that is that if your batched audio contains sentences with silences long enough to activate Gemini’s internal VAD, it will drop all of the sentences except the last one, since it will treat the last sentence as an interruption of the previous sentences in the same batch.
I’m not completely sure this is the problem but it seems likely. So far I can’t get it to work.
Hopefully Google simply adds a toggle to set interrupt to false. I don’t understand why they have to break the entire thing by forcing a “feature” into the API.
Alternatively, if we could manually set the end of turn, that would also work reasonably well, instead of the model deciding on its own when the turn ends.
I will escalate this feature request to the team. Let’s see whether it gets implemented.
There are some feature requests in the queue already, such as saving live conversations and a pause button. Hopefully we will see these changes in an updated version.
It would be absolutely phenomenal if you could disable auto interrupt and just let it keep going, @Govind_Keshari, basically continually receiving and producing a stream, even if in chunks via the automatic internal VAD, but without stopping and without discarding data that was sent before the interruption!
Truly phenomenal if you add an option to disable interruptions and just keep it all in a queue.
Absolutely key for transcription and translation applications!
Amazing, thank you @Govind_Keshari. I tried implementing it in Python and it didn’t work. I assume this is the reason? “SDK support for this feature will be available in the coming weeks.”
If so, is there example non-SDK code I can use to adapt my code to not use the SDK and use the disabled VAD? Or otherwise any other way to in fact use the disabled VAD?
For example, is it available on any of the other SDKs?
I have no problem writing the code directly, with no SDK, but I need to see an example or detailed docs to do it.
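Not an official answer, but here is a rough sketch of what the raw JSON messages might look like without the SDK. The field names (`realtimeInputConfig`, `automaticActivityDetection`, `activityHandling`, `activityStart`, `activityEnd`, `audio`) are written from my recollection of the Live API WebSocket reference and should be treated as assumptions to check against the current docs. The model name is a placeholder, and the helper functions are hypothetical. The sketch only builds the messages; opening the WebSocket and sending them is left out.

```python
import base64
import json


def setup_message(model):
    """First message on the socket: turn off automatic VAD (assumed field names)."""
    return json.dumps({
        "setup": {
            "model": model,
            "realtimeInputConfig": {
                "automaticActivityDetection": {"disabled": True},
                # Assumed enum value asking the server not to interrupt output.
                "activityHandling": "NO_INTERRUPTION",
            },
        }
    })


def activity_start_message():
    """Mark the start of user speech manually, since automatic VAD is off."""
    return json.dumps({"realtimeInput": {"activityStart": {}}})


def activity_end_message():
    """Mark the end of user speech manually (a manual end of turn)."""
    return json.dumps({"realtimeInput": {"activityEnd": {}}})


def audio_chunk_message(pcm_bytes, sample_rate=16000):
    """Wrap raw PCM bytes as a base64 audio chunk (assumed message shape)."""
    return json.dumps({
        "realtimeInput": {
            "audio": {
                "mimeType": f"audio/pcm;rate={sample_rate}",
                "data": base64.b64encode(pcm_bytes).decode("ascii"),
            }
        }
    })


# Placeholder model name; substitute whichever Live-capable model you use.
setup = json.loads(setup_message("models/your-live-model"))
chunk = json.loads(audio_chunk_message(b"\x00\x01"))
```

The intended order would be: send the setup message, then for each utterance send `activityStart`, the audio chunks, and `activityEnd`.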