Realtime Transcription in Multimodal Live API

yingchs · December 20, 2024, 3:12am

The Stream Realtime tab in Google AI Studio displays both the audio and the transcript of the model’s response, but the Multimodal Live API demo on GitHub does not display the transcript along with the audio response. How do I adjust the code to make it display the transcript too? Is there an example somewhere that I can check out?

KStarobinets · January 9, 2025, 1:15pm

I have the same question.

This is what I get from the API:

interface GenerativeContentBlob {
  mimeType: string;
  data: string;
}

So it seems the API does not return a transcript, only audio.

I have thought about sending the audio to a separate speech-to-text API to get a transcript, but I don’t like this idea. OpenAI’s API simply returns the transcript together with the audio.

Have you found a more elegant solution?

chirag1 · April 19, 2025, 8:28pm

Hello, I am facing the same issue. Did you find a fix, or any method to receive audio and text simultaneously? Please help.

Shubham_Sahu · May 6, 2025, 2:20pm

It gives you the transcript. You can find the solution here: Will it be possible to receive text and audio data in the multimodal API?
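
For anyone landing here, a minimal sketch of what the client-side handling might look like once transcription is enabled. It assumes the session is opened with `outputAudioTranscription` in its config and that transcript fragments then arrive under `serverContent.outputTranscription.text`; please verify both field names against the current Live API reference. The types below are simplified local stand-ins, and `TurnCollector` is a hypothetical helper, not part of any SDK.

```typescript
// Simplified local stand-ins for the Live API message shape (assumed, not imported from the SDK).
interface GenerativeContentBlob {
  mimeType: string;
  data: string; // base64-encoded audio
}

interface LiveServerMessage {
  serverContent?: {
    modelTurn?: { parts?: { inlineData?: GenerativeContentBlob }[] };
    outputTranscription?: { text?: string }; // assumed field name
    turnComplete?: boolean;
  };
}

// Assumed session config: audio responses plus a transcript of that audio.
const liveConfig = {
  responseModalities: ["AUDIO"],
  outputAudioTranscription: {},
};

// Hypothetical helper: gathers audio chunks and transcript fragments for one model turn.
class TurnCollector {
  private transcript = "";
  private audioChunks: GenerativeContentBlob[] = [];

  // Returns the finished turn when turnComplete arrives, otherwise null.
  handle(
    msg: LiveServerMessage
  ): { transcript: string; audio: GenerativeContentBlob[] } | null {
    const content = msg.serverContent;
    if (!content) return null;

    // Audio arrives as inline data parts of the model turn.
    for (const part of content.modelTurn?.parts ?? []) {
      if (part.inlineData) this.audioChunks.push(part.inlineData);
    }

    // Transcript text arrives in fragments, so append rather than overwrite.
    const fragment = content.outputTranscription?.text;
    if (fragment) this.transcript += fragment;

    if (!content.turnComplete) return null;

    const turn = { transcript: this.transcript, audio: this.audioChunks };
    this.transcript = "";
    this.audioChunks = [];
    return turn;
  }
}
```

In a real client you would pass something like `liveConfig` when connecting and call `handle` from the message callback, playing the audio chunks as they arrive and rendering the transcript when the turn completes.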
