Disable interruptions for audio streaming for multimodal live api

blue1 · January 20, 2025, 3:59am

Hi,

I was wondering if there was a way to disable interruptions for audio streaming for multimodal live api.

Thanks

godeskAI · January 20, 2025, 9:02am

Hi,

Actually I was looking for the opposite of this, meaning, I wanted interruption to work in the multimodal live api. However, by default this functionality is not working for me using the python genai library. (Posted about it here:
Interrupting Gemini 2 Flash Multimodal Live API seem not to work as expected)

Does the output audio stream from the model get interrupted for you by default when it detects an input? Also, does the ‘interrupted’ flag get set in the ‘server_content’ dict from the server? If so, could you please share your config?

Thanks!

blue1 · January 20, 2025, 4:37pm

Yes to both. However, I’m currently using the api through my own web socket connection, not through the genai library, similar to here:

github.com/google-gemini/cookbook

gemini-2/websockets/live_api_starter.py

main

# -*- coding: utf-8 -*-
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
## Setup

To install the dependencies for this script, run:

This file has been truncated. show original

godeskAI · January 21, 2025, 4:29pm

Thanks for the info. Will try that out.

Regarding your requirement of uninterrupted audio streaming, guess you can try sending data
through BidiGenerateContentRealtimeInput instead of BidiGenerateContentClientContent, which according to this below link does not interrupt the model generation.

Cheers!

bobber · March 20, 2025, 11:58pm

Were you able to achieve it? I still can’t.

GUNAND_MAYANGLAMBAM · June 24, 2025, 10:12am

Hi, we have introduced a new Voice Activity Detection feature that enables the model to identify when someone is speaking. If VAD detects an interruption, it cancels and discards the current generation. Hope this helps answer your question.

Topic		Replies	Views
How do I prevent the Live API from discarding audio when it's given audio while it speaks? Gemini API api , gemini-api	10	236	June 24, 2025
Interrupting Gemini 2 Flash Multimodal Live API seem not to work as expected Gemini API gemini-flash	1	319	June 16, 2025
Gemini Live API: token generation suddenly stops Gemini API ai-studio , api , audio , live-streaming	7	166	July 25, 2025
Live API - PTT with external STT & Interruptions Gemini API gemini-api , prompt	0	28	July 18, 2025
Will it be possible to receive text and audio data in the multimodal API? Gemini API models , gemini-api	13	785	July 22, 2025

Disable interruptions for audio streaming for multimodal live api

Related topics