Transcript on live audio not being passed back during conversation (ephemeral tokens auth)

scaraliu · October 7, 2025, 2:26pm

Hi there.

During a live audio session with Gemini 2.0 Flash, the transcript is not available, even when requested.

Audio streams back in real time, but there is no sign of the transcript.

I have tried every possible (and impossible) type of request, and every way of reading the responses over the WebSocket.

Is it because of ephemeral token authentication?

Shivam_Singh2 · October 8, 2025, 10:20am

Hi @scaraliu
Welcome to the forum!!!

Could you please share the complete payload details along with the steps to reproduce the issue? This will help us investigate it more accurately and provide a more precise response.

scaraliu · October 8, 2025, 10:33am

Hi,

I was just working on it right now.

Here is the object that is sent as the first message after the connection via ws is opened:

{
    "setup": {
        "model": "models/gemini-2.0-flash-live-001",
        "generationConfig": {
            "responseModalities": [
                "TEXT",
                "AUDIO"
            ]
        },
        "systemInstruction": {
            "parts": [
                {
                    "text": "You are ....................., a helpful and friendly AI assistant. You can answer questions, help with tasks, and for complex queries or bookings, you can use the ……………tool which has access to the full conversation history."
                }
            ]
        }
    }
}

scaraliu · October 8, 2025, 10:35am

The request is sent to wss://generativelanguage.googleapis.com//ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContentConstrained?access_token=auth_tokens

scaraliu · October 8, 2025, 10:41am

On the front end, with ephemeral tokens, we use this version of the library:

import { GoogleGenAI } from 'https://cdn.jsdelivr.net/npm/@google/genai@1.22.0/+esm';

scaraliu · October 8, 2025, 10:50am

The weirdest part is that on some occasions I do get something like this back in the audio stream:

{
    "serverContent": {
        "turnComplete": true
    },
    "usageMetadata": {
        "promptTokenCount": 605,
        "responseTokenCount": 11,
        "totalTokenCount": 616,
        "promptTokensDetails": [
            {
                "modality": "TEXT",
                "tokenCount": 604
            },
            {
                "modality": "AUDIO",
                "tokenCount": 1
            }
        ],
        "responseTokensDetails": [
            {
                "modality": "TEXT",
                "tokenCount": 11
            }
        ]
    }
}

Still, there is no sign of text or a transcript, and it is taking me hours and hours to fix this.

It's very, very painful, as I saw that things work smoothly with OpenAI's live audio API, and I'm stuck with this Google live audio.
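For checking whether any transcript text is present in messages like the one above, a small reader along these lines may help. It is only a sketch: it assumes transcript fragments arrive under `serverContent.outputTranscription.text` and `serverContent.inputTranscription.text`, and that plain text replies arrive as `serverContent.modelTurn.parts[].text`. The helper names `extractTranscript` and `createTranscriptCollector` are made up for illustration. If the frames arrive as Blob or ArrayBuffer in the browser, they need decoding to a string first; that step is left out here.

```javascript
// Pull any transcript-like text out of one Live API server message.
// `raw` may be a JSON string or an already-parsed object.
// Field names are assumptions about the server message shape, see the note above.
function extractTranscript(raw) {
  const message = typeof raw === "string" ? JSON.parse(raw) : raw;
  const content = message.serverContent ?? {};

  // Text parts inside the model turn (audio parts carry no `text` field).
  const parts = content.modelTurn?.parts ?? [];
  const modelText = parts
    .map((part) => part.text)
    .filter((text) => typeof text === "string")
    .join("");

  return {
    inputText: content.inputTranscription?.text ?? "",
    outputText: content.outputTranscription?.text ?? "",
    modelText,
    turnComplete: content.turnComplete === true,
  };
}

// Accumulate transcript fragments across messages until the turn completes.
function createTranscriptCollector() {
  let input = "";
  let output = "";
  return {
    // Returns { input, output } when turnComplete arrives, otherwise null.
    push(raw) {
      const piece = extractTranscript(raw);
      input += piece.inputText;
      output += piece.outputText;
      if (!piece.turnComplete) return null;
      const turn = { input, output };
      input = "";
      output = "";
      return turn;
    },
  };
}
```

Calling `collector.push(event.data)` from the WebSocket `onmessage` handler and logging every non-null result would show whether transcript fragments are arriving at all. A turn that completes with both strings empty, as in the message above, means no transcription fields were sent for that turn.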

Shivam_Singh2 · October 13, 2025, 7:33am

Hi @scaraliu

Could you please try using our new model gemini-live-2.5-flash-preview-native-audio?

It offers more reliable transcription compared to the earlier 2.0 releases. If you are still experiencing the same issue, please let me know.
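When retesting, it may also be worth requesting the transcript explicitly in the setup message. The following is a minimal sketch, not a confirmed fix: it assumes the setup object accepts `outputAudioTranscription` and `inputAudioTranscription` as empty config objects, and it requests a single response modality (`AUDIO`) on the assumption that a live session does not return `TEXT` and `AUDIO` together. `buildSetupMessage` is a hypothetical helper name. Whether the constrained endpoint used with ephemeral tokens honors these fields when the client sends them is not confirmed in this thread.

```javascript
// Build the first WebSocket message for a live session that asks for transcripts.
// `model` and `systemText` are supplied by the caller.
function buildSetupMessage(model, systemText) {
  return {
    setup: {
      model,
      generationConfig: {
        // Assumption: one modality per session; the transcript is requested separately below.
        responseModalities: ["AUDIO"],
      },
      systemInstruction: { parts: [{ text: systemText }] },
      // Assumption: an empty object enables transcription of the model's audio output.
      outputAudioTranscription: {},
      // Assumption: an empty object enables transcription of the user's audio input.
      inputAudioTranscription: {},
    },
  };
}

// Example: serialize the message exactly as it would be sent after the socket opens.
const setupJson = JSON.stringify(
  buildSetupMessage("models/gemini-2.0-flash-live-001", "You are a helpful assistant.")
);
// ws.send(setupJson);  // `ws` would be the already-open WebSocket
```

The model string in the example is the one from the original payload; swapping in the newer model only changes the first argument.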
