I’m trying to add transcription to the Gemini Live demo code here, following Google’s official guide: Live API capabilities guide | Gemini API | Google AI for Developers
But the transcription is a mess, like below. Am I missing anything? Any extra flags to set?
[Model Transcript]: Ca
[Model Transcript]: n I
[Model Transcript]: pl
[Model Transcript]: eas
[Model Transcript]: e h
[Model Transcript]: ave
[Model Transcript]: yo
[Model Transcript]: ur
[Model Transcript]: acc
[Model Transcript]: oun
[Model Transcript]: t n
[Model Transcript]: umb
[Model Transcript]: er
The behavior you’re seeing is expected for a streaming API. To provide real-time feedback, the API sends back interim (partial) transcripts as it processes the audio. You are printing every one of these partial results.
To fix this, you need to filter the responses and only use the transcript when the API flags it as final.
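In pseudocode, the pattern looks roughly like this (a minimal sketch; chunk.text and chunk.is_final are illustrative names, not the actual SDK attributes):

buffer = ""
for chunk in stream:
    if chunk.text:          # interim (partial) transcript fragment
        buffer += chunk.text
    if chunk.is_final:      # flush only once the API marks the result final
        print(buffer)
        buffer = ""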
Ok, how do I mark it as final?
Yeah it’s useless right now. We just need to print it when it ends
Based on your request, it seems you want to “mark” the previous response as “final” or indicate that the conversation has concluded, and you also mention “it’s useless right now. We just need to print it when it ends.”
Could you please clarify what you mean by “mark it as final”? Are you:
- Requesting a specific output format? For example, you want me to add a phrase like “[END OF RESPONSE]” or “[FINAL]” to the end of my replies.
- Trying to end the current conversation? In this case, you can simply stop asking questions.
- Referring to a feature or command for a specific application or process? If so, please provide more context about the application you are using.
The phrase “We just need to print it when it ends” suggests you might be part of a larger process where my response is an interim step, and the final output is generated later.
Please provide more detail about what you are trying to achieve so I can give you a more accurate and helpful response.
This is what I mean. How can we do that?
Hello
Welcome to the forum!!
I ran into this too. Since the API streams the text bit-by-bit for speed, you just need to buffer those fragments in a variable and only print the result when the API sends the turn_complete signal.
Here is the code snippet:
import asyncio
import google.genai as genai
from google.colab import userdata

# Initialize the client (the Live API currently requires the v1alpha endpoint)
client = genai.Client(
    api_key=userdata.get("your Key"),  # replace "your Key" with your Colab secret name
    http_options={"api_version": "v1alpha"},
)

async def main():
    model_id = "gemini-2.0-flash-exp"
    # output_audio_transcription is required to receive text chunks
    config = {"response_modalities": ["AUDIO"], "output_audio_transcription": {}}

    async with client.aio.live.connect(model=model_id, config=config) as session:
        # Send a prompt to trigger audio
        await session.send(input="Can I please have your account number?", end_of_turn=True)

        full_transcript = ""
        async for response in session.receive():
            server_content = response.server_content
            if server_content is None:
                continue

            # 1. Accumulate text chunks silently (don't print partials)
            if server_content.output_transcription:
                full_transcript += server_content.output_transcription.text

            # 2. Print ONLY when the turn is complete
            if server_content.turn_complete:
                print(f"Final Transcript: {full_transcript}")
                full_transcript = ""

await main()  # top-level await works in Colab/Jupyter
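One note on the last line: await main() only works where an event loop is already running, such as Colab or Jupyter. In a plain Python script, run the coroutine like this instead:

asyncio.run(main())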
Thanks