Live API + Ephemeral Token: No Input/Output Transcription (Audio replies work but no transcription events)

I’m using Gemini Live with ephemeral tokens. The audio stream works correctly: I receive audio replies from Gemini without issues.
However, I never receive input or output transcription events from the server.

Environment

  • Model: gemini-2.5-flash-native-audio-preview-09-2025

  • Backend: Python (ephemeral token generation)

  • Frontend: @google/genai, Live API, v1alpha

Ephemeral token creation (Python)

setup_config = {
    "model": "gemini-2.5-flash-native-audio-preview-09-2025",
    "config": {
        "session_resumption": {},
        "temperature": 0.7,
        "responseModalities": ["AUDIO"],
        "systemInstruction": {"parts": [{"text": system_instruction_text}]},
        "speechConfig": {"voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Zephyr"}}},
        "inputAudioTranscription": {},
        "outputAudioTranscription": {},
        "audioConfig": {"targetSampleRate": 16000}
    },
}

Frontend live session (JS)

const session = await ai.live.connect({
  model: "gemini-2.5-flash-native-audio-preview-09-2025",
  config: {
    responseModalities: ["AUDIO"],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
    audioConfig: { targetSampleRate: 16000 },
  },
  callbacks: {
    onmessage: (msg) => console.log("Message:", msg),
    onerror: (err) => console.error(err),
  },
});

Issue

  • I receive audio output from Gemini normally.

  • I receive no input_transcription or output_transcription messages at any time.

  • No transcription fields appear in any “onmessage” event.

Expected

  • The server should send input transcription events when I speak.

  • The server should send output transcription events for the model’s audio response.

Questions

  1. Is transcription supported for this preview model?

  2. Are additional fields required in the session config to enable transcriptions?

  3. Is this a known issue with the Live API, ephemeral tokens, or the preview SDK?

Hi @Karan_Dumbre,
Welcome to the AI Forum. Thanks for reporting the issue.
The gemini-2.5-flash-native-audio-preview-09-2025 model supports both input and output audio transcription.
When creating the ephemeral token in Python, try renaming the inputAudioTranscription and outputAudioTranscription keys to input_audio_transcription and output_audio_transcription, so that the server correctly provisions the token’s permissions.
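
As a rough sketch of that change, the token config from the question might look like the following. Only the two transcription keys are renamed; every other key is copied unchanged from the original post. The helper name build_setup_config and the sample instruction text are hypothetical, used here for illustration only, and this has not been verified against the live service.

```python
def build_setup_config(system_instruction_text: str) -> dict:
    """Return the token setup config from the question, with the two
    transcription keys written in snake_case.

    build_setup_config is a hypothetical helper name; the dict contents
    otherwise mirror the original post.
    """
    return {
        "model": "gemini-2.5-flash-native-audio-preview-09-2025",
        "config": {
            "session_resumption": {},
            "temperature": 0.7,
            "responseModalities": ["AUDIO"],
            "systemInstruction": {"parts": [{"text": system_instruction_text}]},
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Zephyr"}}
            },
            # Renamed from inputAudioTranscription / outputAudioTranscription,
            # as suggested in the reply above. Empty dicts request
            # transcription with default settings.
            "input_audio_transcription": {},
            "output_audio_transcription": {},
            "audioConfig": {"targetSampleRate": 16000},
        },
    }


# Placeholder instruction text; substitute your own system prompt.
setup_config = build_setup_config("You are a helpful voice assistant.")
```

The resulting setup_config can then be passed to the token creation call exactly as before. If transcription events still do not appear, it may be worth confirming that the frontend connects with the newly issued token rather than a cached one.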