Hi there.
During a live audio session with Gemini 2.0 Flash, the transcript is not available, even when requested.
Audio streams back in real time, but there is no sign of the transcript.
I have tried every type of request I could think of, and every way of reading the responses over the WebSocket.
Is it because of ephemeral token authentication?
Hi @scaraliu
Welcome to the forum!!!
Could you please share the complete payload details along with the steps to reproduce the issue? This will help us investigate it more accurately and provide a more precise response.
Hi,
I was just working on it right now.
Here is the object that is sent as the first message after the WebSocket connection is opened:
{
  "setup": {
    "model": "models/gemini-2.0-flash-live-001",
    "generationConfig": {
      "responseModalities": [
        "TEXT",
        "AUDIO"
      ]
    },
    "systemInstruction": {
      "parts": [
        {
          "text": "You are ....................., a helpful and friendly AI assistant. You can answer questions, help with tasks, and for complex queries or bookings, you can use the ……………tool which has access to the full conversation history."
        }
      ]
    }
  }
}
The request is sent to wss://generativelanguage.googleapis.com//ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContentConstrained?access_token=auth_tokens
On the front end we use this version of the library, with ephemeral tokens:
import { GoogleGenAI } from 'https://cdn.jsdelivr.net/npm/@google/genai@1.22.0/+esm';
The weirdest part is that on some occasions I get back something like this in the audio stream, reporting TEXT response tokens:
{
  "serverContent": {
    "turnComplete": true
  },
  "usageMetadata": {
    "promptTokenCount": 605,
    "responseTokenCount": 11,
    "totalTokenCount": 616,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 604
      },
      {
        "modality": "AUDIO",
        "tokenCount": 1
      }
    ],
    "responseTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 11
      }
    ]
  }
}
Still, there is no sign of the text / transcript, and I have spent hours and hours trying to fix this.
It is very, very painful, as I saw that with OpenAI things work smoothly on the live audio API, and I'm stuck with Google's live audio.
Hi @scaraliu
Could you please try using our new model gemini-live-2.5-flash-preview-native-audio?
It offers more reliable transcription compared to the earlier 2.0 releases. If you are still experiencing the same issue, please let me know.
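In case it helps while you test, here is a minimal sketch of a setup message and a transcript reader for the raw WebSocket. It rests on assumptions you should check against the current Live API reference: that a session accepts only one response modality (so "AUDIO" alone rather than "TEXT" plus "AUDIO"), that transcripts are requested with the `outputAudioTranscription` and `inputAudioTranscription` setup fields, and that they arrive under `serverContent.outputTranscription.text` and `serverContent.inputTranscription.text`. The helper names `buildSetupMessage` and `extractTranscript` are hypothetical, and the `models/` prefix on the new model name is also an assumption.

```typescript
// Sketch only: field names are assumptions to verify against the Live API reference.
type SetupMessage = {
  setup: {
    model: string;
    generationConfig: { responseModalities: string[] };
    outputAudioTranscription: Record<string, never>;
    inputAudioTranscription: Record<string, never>;
    systemInstruction?: { parts: { text: string }[] };
  };
};

// Build the first message sent after the WebSocket opens.
// Assumption: a single response modality plus explicit transcription configs.
function buildSetupMessage(model: string, systemText?: string): SetupMessage {
  const msg: SetupMessage = {
    setup: {
      model,
      generationConfig: { responseModalities: ["AUDIO"] },
      outputAudioTranscription: {}, // ask for a transcript of the model's audio
      inputAudioTranscription: {},  // ask for a transcript of the user's audio
    },
  };
  if (systemText !== undefined) {
    msg.setup.systemInstruction = { parts: [{ text: systemText }] };
  }
  return msg;
}

// Pull transcript fragments out of one server message, if any are present.
function extractTranscript(raw: string): { input?: string; output?: string } {
  const msg = JSON.parse(raw);
  const content = msg?.serverContent ?? {};
  const result: { input?: string; output?: string } = {};
  if (typeof content.inputTranscription?.text === "string") {
    result.input = content.inputTranscription.text;
  }
  if (typeof content.outputTranscription?.text === "string") {
    result.output = content.outputTranscription.text;
  }
  return result;
}

// Usage: serialize the setup message; it would be sent with ws.send(setupJson).
const setupJson = JSON.stringify(
  buildSetupMessage("models/gemini-live-2.5-flash-preview-native-audio")
);
```

Transcript fragments, if they are sent at all, would come in separate messages from the audio chunks, so each incoming message has to be checked, not only the one carrying `turnComplete`.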