Realtime Transcription in Multimodal Live API

The Stream Realtime tab in Google AI Studio can display both the audio and the transcript of the model's response, but the Multimodal Live API demo on GitHub does not display the transcript along with the audio response. How do I adjust the code to make it display the transcript too? Is there somewhere I can check this out?

I have the same question.

This is what I get from API:

interface GenerativeContentBlob {
  mimeType: string;
  data: string;
}

So it seems the API does not return a transcript, only audio.

I thought about sending the audio to some speech-to-text API to get a transcript, but I don't like that idea. OpenAI's API just returns the transcript together with the audio.
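For what it's worth, that fallback would look something like this: collect the base64 PCM chunks the Live API streams back as `GenerativeContentBlob`s, decode them into one buffer, and hand that buffer to an external speech-to-text service. This is only a sketch; `transcribeAudio` is a hypothetical stand-in for whatever STT API you pick.

```typescript
interface GenerativeContentBlob {
  mimeType: string;
  data: string; // base64-encoded audio chunk
}

// Concatenate the streamed chunks into a single PCM buffer.
function collectAudio(chunks: GenerativeContentBlob[]): Buffer {
  return Buffer.concat(chunks.map((c) => Buffer.from(c.data, "base64")));
}

// Hypothetical STT call — replace with a real speech-to-text service.
async function transcribeAudio(pcm: Buffer): Promise<string> {
  // ... send `pcm` to your chosen speech-to-text endpoint ...
  return "";
}

// Example: two fake "audio" chunks that decode to the bytes of "hi".
const chunks: GenerativeContentBlob[] = [
  { mimeType: "audio/pcm", data: Buffer.from("h").toString("base64") },
  { mimeType: "audio/pcm", data: Buffer.from("i").toString("base64") },
];
const pcm = collectAudio(chunks);
```

It works, but as noted above it costs an extra round trip to a second service, so a native option from the API itself is preferable.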

Have you found a more elegant solution?

Hello, I'm facing the same issue. Did you find a fix, or any method by which we can get audio and text simultaneously? Please help.

It gives you the transcript. You can find the solution here: Will it be possible to receive text and audio data in the multimodal API?
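In short, the fix from that thread is to request transcription in the session setup rather than transcribing the audio yourself. A minimal sketch against the Live API's WebSocket protocol is below; it assumes the `output_audio_transcription` setup field and the `outputTranscription` server-content field, and the model name is just an example — check the current Live API reference, since field and model names have changed across versions.

```typescript
// Setup message sent once after the WebSocket opens. The empty
// `output_audio_transcription` object asks the server to also send a
// transcript of the model's audio output.
const setupMessage = {
  setup: {
    model: "models/gemini-2.0-flash-live-001", // example model name
    generation_config: { response_modalities: ["AUDIO"] },
    output_audio_transcription: {},
  },
};

// Minimal shape of the incoming server messages we care about.
interface ServerMessage {
  serverContent?: {
    outputTranscription?: { text?: string };
  };
}

// Pull the transcript fragment (if any) out of one server message.
function transcriptOf(msg: ServerMessage): string {
  return msg.serverContent?.outputTranscription?.text ?? "";
}

// Example of a parsed server message carrying a transcript fragment.
const msg: ServerMessage = {
  serverContent: { outputTranscription: { text: "Hello there" } },
};
const fragment = transcriptOf(msg);
```

Transcript text arrives in fragments interleaved with the audio chunks, so in the demo you would append each `transcriptOf` result to the displayed text as messages come in.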