Hi everyone,
I’m running into an issue with the Live API (using the gemini-2.0-flash-exp model) where it hangs when I include a system prompt, but works fine without one. I’m hoping someone can shed light on whether this is expected behavior, a bug, or if I’m configuring something incorrectly.
What I’m Trying to Do
I’m building an audio-to-audio translation service that takes English audio input and returns Egyptian Arabic audio output. My goal is to set a system instruction like “You are a translator” to guide the model’s behavior.
Setup
- Model: gemini-2.0-flash-exp
- Config: LiveConnectConfig with response_modalities=["AUDIO"] and a speech_config for output voice.
- Input: Mono, 16kHz, 16-bit PCM audio (verified to work without the prompt).
- Code: Using the Python async client (client.aio.live.connect).
from google.genai import types

# Live session config: audio-only responses, a prebuilt voice, and the system prompt.
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
        )
    ),
    system_instruction=types.Content(
        parts=[types.Part.from_text(
            text="You are a translation engine. Your sole purpose is to translate between English and Egyptian Arabic (Egyptian dialect). Do not add any explanations or conversation."
        )],
        role="user"
    )
)

# Excerpt from inside an async function; client and raw_audio are defined elsewhere.
async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
    # Send the whole PCM buffer as a single turn.
    await session.send(input={"data": raw_audio, "mime_type": "audio/pcm"}, end_of_turn=True)
    async for response in session.receive():
        ...  # consume the audio chunks here
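For reference, the input format listed under Setup can be checked before sending. This is a minimal sketch, assuming the audio comes from a WAV file; load_pcm16_mono_16k is a hypothetical helper name and is not part of the SDK:

```python
import wave


def load_pcm16_mono_16k(path: str) -> bytes:
    """Return the raw PCM frames of a WAV file after checking its format.

    Raises ValueError unless the file is mono, 16-bit, 16 kHz.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        # Header is stripped by the wave module; only sample bytes are returned.
        return wav.readframes(wav.getnframes())
```

The returned bytes are what would be passed as raw_audio in the snippet above.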
The Issue
- With System Prompt: The code sends the audio successfully (logged as "Sending input audio data: X bytes"), but then hangs indefinitely at "Waiting for audio response…". No chunks are received, and the receive loop never progresses.
- Without System Prompt: If I remove the system_instruction (or a similar instruction sent as a turns message), it works: audio is sent and I get a response, though it is not translated, just echoed or processed differently.
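
To make the hang easier to diagnose, the receive loop can be given a per-chunk timeout so that a stalled stream raises an error instead of blocking forever. This is a minimal sketch using only the standard library; receive_with_timeout is a hypothetical helper name and is not part of the google-genai SDK:

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


async def receive_with_timeout(
    stream: AsyncIterator[T], timeout_s: float
) -> AsyncIterator[T]:
    """Yield items from stream, raising asyncio.TimeoutError if the next
    item does not arrive within timeout_s seconds."""
    iterator = stream.__aiter__()
    while True:
        try:
            # Bound the wait for each individual item, not the whole stream.
            item = await asyncio.wait_for(iterator.__anext__(), timeout=timeout_s)
        except StopAsyncIteration:
            return  # the underlying stream finished normally
        yield item
```

In the snippet above this would wrap the existing loop, for example `async for response in receive_with_timeout(session.receive(), 30):`, where the 30-second limit is an arbitrary choice. A timeout on the very first chunk would confirm that the server sends nothing at all when the system prompt is set.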