Live API Hangs When Using System Prompt with Audio-Only Response Modality

Mahmoud_Ashraf · March 15, 2025, 5:17pm

Hi everyone,

I’m running into an issue with the Live API (using the gemini-2.0-flash-exp model) where it hangs when I include a system prompt, but works fine without one. I’m hoping someone can shed light on whether this is expected behavior, a bug, or if I’m configuring something incorrectly.

What I’m Trying to Do

I’m building an audio-to-audio translation service that takes English audio input and returns Egyptian Arabic audio output. My goal is to set a system instruction like “You are a translator” to guide the model’s behavior.

Setup

Model: gemini-2.0-flash-exp
Config: LiveConnectConfig with response_modalities=["AUDIO"] and a speech_config for output voice.
Input: Mono, 16kHz, 16-bit PCM audio (verified to work without the prompt).
Code: Using the Python async client (client.aio.live.connect).
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
        )
    ),
    system_instruction=types.Content(
        parts=[types.Part.from_text(
            text="You are a translation engine. Your sole purpose is to translate between English and Egyptian Arabic (Egyptian dialect). Do not add any explanations or conversation."
        )],
        role="user"
    )
)

async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
    await session.send(input={"data": raw_audio, "mime_type": "audio/pcm"}, end_of_turn=True)
    async for response in session.receive():
        ...
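
For anyone trying to reproduce this, the input format mentioned in the setup (mono, 16kHz, 16-bit PCM) can be checked before sending. Below is a minimal sketch using only the standard-library wave module. It assumes the audio starts out as a WAV file in memory; the helper name read_pcm_16k_mono is made up for illustration and is not part of any SDK.

```python
import io
import wave


def read_pcm_16k_mono(wav_bytes: bytes) -> bytes:
    """Return the raw PCM frames of a WAV file held in memory.

    Hypothetical helper: raises ValueError unless the audio is
    mono, 16-bit, 16000 Hz, i.e. the format described in the setup.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        if (channels, sample_width, rate) != (1, 2, 16000):
            raise ValueError(
                "expected mono/16-bit/16000 Hz, got "
                f"{channels} ch, {sample_width * 8}-bit, {rate} Hz"
            )
        # Strip the WAV header and return only the sample data.
        return wav.readframes(wav.getnframes())
```

The returned bytes would play the role of raw_audio in the snippet above. A check like this only rules out a format mismatch; it does not explain why the session stalls once a system instruction is set.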

The Issue

With System Prompt: The code sends the audio successfully (logged as Sending input audio data: X bytes), but it hangs indefinitely at Waiting for audio response… No chunks are received, and it never progresses.
Without System Prompt: If I remove the system_instruction (or a similar turns message), it works perfectly—audio is sent, and I get a response (though it’s not translated, just echoed or processed differently).
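
To make the hang easier to diagnose, the receive loop can be wrapped so that a stalled stream raises an error instead of waiting forever. The following is a rough sketch using only asyncio. The helper iter_with_timeout is a hypothetical name, not an SDK function, and it assumes session.receive() behaves like an ordinary async iterator.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


async def iter_with_timeout(
    source: AsyncIterator[T], timeout_s: float
) -> AsyncIterator[T]:
    """Yield items from `source`, failing if the next item is too slow.

    Hypothetical helper: raises asyncio.TimeoutError when no item
    arrives within `timeout_s` seconds of the previous one.
    """
    iterator = source.__aiter__()
    while True:
        try:
            # Bound the wait for each individual item, not the whole stream.
            item = await asyncio.wait_for(iterator.__anext__(), timeout=timeout_s)
        except StopAsyncIteration:
            return  # the source finished normally
        yield item
```

Under that assumption, the loop would become `async for response in iter_with_timeout(session.receive(), 30.0):`, with 30 seconds being an arbitrary illustrative value. A timeout here confirms that no chunks arrive at all, as opposed to a slow first response.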

Pannaga_J · June 19, 2025, 9:41am

Hi @Mahmoud_Ashraf, apologies for the late response.
Hopefully this issue has been resolved by now. If not, could you please try a more recent model such as 2.5 Flash or 2.5 Flash-Lite and let us know if the issue still persists? Thank you!
