I’m using Gemini Live with ephemeral tokens and the audio stream works correctly — I receive audio replies from Gemini without issues.
However, I never receive input or output transcription events from the server.
Environment

- Model: gemini-2.5-flash-native-audio-preview-09-2025
- Backend: Python (ephemeral token generation)
- Frontend: @google/genai, Live API, v1alpha
Ephemeral token creation (Python)
```python
setup_config = {
    "model": "gemini-2.5-flash-native-audio-preview-09-2025",
    "config": {
        "session_resumption": {},
        "temperature": 0.7,
        "responseModalities": ["AUDIO"],
        "systemInstruction": {"parts": [{"text": system_instruction_text}]},
        "speechConfig": {"voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Zephyr"}}},
        "inputAudioTranscription": {},
        "outputAudioTranscription": {},
        "audioConfig": {"targetSampleRate": 16000},
    },
}
```
Frontend live session (JS)
```javascript
const session = await ai.live.connect({
  model: "gemini-2.5-flash-native-audio-preview-09-2025",
  config: {
    responseModalities: ["AUDIO"],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
    audioConfig: { targetSampleRate: 16000 },
  },
  callbacks: {
    onmessage: (msg) => console.log("Message:", msg),
    onerror: (err) => console.error(err),
  },
});
```
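For reference, here is where I would expect the transcription text to show up in a server message, written as a small Python helper over the parsed JSON. The `serverContent.inputTranscription` / `outputTranscription` field names are my assumption from the Live API docs:

```python
# Sketch: extracting transcription text from a parsed Live server message.
# The serverContent.inputTranscription / outputTranscription shape is an
# assumption based on the Live API message format.
def extract_transcripts(msg: dict) -> dict:
    """Return any input/output transcription text found in one message."""
    sc = msg.get("serverContent") or {}
    found = {}
    if "inputTranscription" in sc:
        found["input"] = sc["inputTranscription"].get("text", "")
    if "outputTranscription" in sc:
        found["output"] = sc["outputTranscription"].get("text", "")
    return found

# Example with an assumed message shape:
msg = {"serverContent": {"outputTranscription": {"text": "Hello!"}}}
print(extract_transcripts(msg))  # {'output': 'Hello!'}
```

In my sessions this helper would always return an empty dict, because no message ever carries either field.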
Issue

- I receive audio output from Gemini normally.
- I receive no input_transcription or output_transcription messages at any time.
- No transcription fields appear in any `onmessage` event.
Expected

- The server should send input transcription events when I speak.
- The server should send output transcription events for the model's audio response.
Questions

- Is transcription supported for this preview model?
- Are additional fields required in the session config to enable transcriptions?
- Is this a known issue with the Live API, ephemeral tokens, or the preview SDK?