Found the correct way to get both text and audio output from gemini-2.5-flash-preview-native-audio-dialog.
Working Solution
Don’t use `response_modalities=["AUDIO", "TEXT"]`. This causes errors, because the Live API accepts only one response modality per session.
Instead, use `output_audio_transcription`:
```python
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],  # audio only here
    output_audio_transcription=types.AudioTranscriptionConfig(),  # also streams a text transcript of the audio
)
```
Then handle both outputs in your receive loop:
```python
async for response in session.receive():
    content = response.server_content
    if content is None:
        continue

    # Text: the transcript of the spoken reply, delivered in chunks
    if content.output_transcription and content.output_transcription.text:
        display_subtitles(content.output_transcription.text)  # your own UI function; perfect for subtitles!

    # Audio: raw PCM bytes are carried in part.inline_data
    if content.model_turn:
        for part in content.model_turn.parts or []:
            if part.inline_data and part.inline_data.data:
                play_audio(part.inline_data.data)  # your own playback function
```
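If you want the whole subtitle line or a saved audio file per reply rather than handling each chunk as it streams in, you can buffer one turn. This is a minimal sketch, not part of the SDK: `TurnBuffer` and `fake_response` are hypothetical names. It assumes the google-genai response fields used above plus `server_content.turn_complete`, and that the output audio is raw 16-bit mono PCM at 24 kHz (what the Live API docs describe; check `part.inline_data.mime_type` if in doubt). It also assumes each transcript chunk carries its own spacing, so chunks are joined as-is.

```python
import io
import wave
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class TurnBuffer:
    """Hypothetical helper: collects transcript text and PCM audio for one model turn."""

    text_chunks: list[str] = field(default_factory=list)
    pcm: bytearray = field(default_factory=bytearray)

    def feed(self, response) -> bool:
        """Add one Live API response; return True once the turn is complete."""
        content = getattr(response, "server_content", None)
        if content is None:
            return False
        # Assumed field: output_transcription.text (text transcript chunk)
        transcription = getattr(content, "output_transcription", None)
        if transcription is not None and transcription.text:
            self.text_chunks.append(transcription.text)
        # Assumed field: model_turn.parts[i].inline_data.data (raw PCM bytes)
        model_turn = getattr(content, "model_turn", None)
        if model_turn is not None:
            for part in model_turn.parts or []:
                blob = getattr(part, "inline_data", None)
                if blob is not None and blob.data:
                    self.pcm.extend(blob.data)
        return bool(getattr(content, "turn_complete", False))

    @property
    def text(self) -> str:
        # Chunks are concatenated without adding separators (assumption above)
        return "".join(self.text_chunks)

    def to_wav(self, sample_rate: int = 24000) -> bytes:
        """Wrap the buffered 16-bit mono PCM in a WAV container."""
        out = io.BytesIO()
        with wave.open(out, "wb") as wf:
            wf.setnchannels(1)       # mono
            wf.setsampwidth(2)       # 16-bit samples
            wf.setframerate(sample_rate)
            wf.writeframes(bytes(self.pcm))
        return out.getvalue()


def fake_response(text=None, audio=None, done=False):
    """Hypothetical stand-in mimicking the response shape, for offline testing."""
    parts = [SimpleNamespace(inline_data=SimpleNamespace(data=audio))] if audio else None
    return SimpleNamespace(server_content=SimpleNamespace(
        output_transcription=SimpleNamespace(text=text) if text else None,
        model_turn=SimpleNamespace(parts=parts) if parts else None,
        turn_complete=done,
    ))


if __name__ == "__main__":
    turn = TurnBuffer()
    for r in [fake_response("Hello", b"\x00\x00" * 10),
              fake_response(" there", b"\x01\x00" * 5, done=True)]:
        if turn.feed(r):
            print(turn.text)  # prints "Hello there"
```

In the real loop, call `turn.feed(response)` for each response; when it returns `True`, pass `turn.text` to `display_subtitles`, write `turn.to_wav()` to disk if needed, and start a fresh `TurnBuffer()` for the next turn.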
Hope this helps others facing the same issue!