Realtime Transcription in Multimodal Live API

The Stream Realtime tab in Google AI Studio displays both the audio and the transcript of the model’s response, but the Multimodal Live API demo on GitHub does not display the transcript alongside the audio response. How can I adjust the code so that it displays the transcript too? Is there an example somewhere that I can check out?

I have the same question.

This is what I get from the API:

interface GenerativeContentBlob {
  mimeType: string;
  data: string;
}

So it seems the API does not return a transcript, only audio.

I am thinking of sending the audio to a separate audio-to-text API to get a transcript, but I don’t like this idea. OpenAI’s API simply returns the transcript together with the audio.
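
For reference, here is a rough sketch of what that workaround might look like. It rests on several assumptions: that `data` holds base64-encoded audio, that the chunks of one turn are raw PCM and can therefore be concatenated byte for byte, and that some speech-to-text service is available. The `Transcriber` callback and the helper names are hypothetical placeholders, not part of the Live API or any real library.

```typescript
// Shape of the audio part returned by the API (copied from above).
interface GenerativeContentBlob {
  mimeType: string;
  data: string; // ASSUMPTION: base64-encoded audio bytes
}

// HYPOTHETICAL callback: stands in for whichever speech-to-text service is used.
type Transcriber = (audio: Uint8Array, mimeType: string) => Promise<string>;

// Decode one base64 payload into raw bytes.
function decodeBase64(data: string): Uint8Array {
  const binary = atob(data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Concatenate the decoded chunks of one model turn into a single buffer.
// ASSUMPTION: chunks are headerless PCM, so plain concatenation is valid.
function joinAudioChunks(blobs: GenerativeContentBlob[]): Uint8Array {
  const parts = blobs.map((b) => decodeBase64(b.data));
  const total = parts.reduce((n, p) => n + p.length, 0);
  const joined = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    joined.set(p, offset);
    offset += p.length;
  }
  return joined;
}

// Transcribe a finished turn with the supplied (hypothetical) transcriber.
async function transcribeTurn(
  blobs: GenerativeContentBlob[],
  transcribe: Transcriber
): Promise<string> {
  if (blobs.length === 0) return "";
  return transcribe(joinAudioChunks(blobs), blobs[0].mimeType);
}
```

The idea would be to collect the blobs as they arrive, call `transcribeTurn` once the model's turn is complete, and show the returned text next to the audio. The obvious downsides are the extra latency and the cost of a second API call, which is why I don’t like it.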

Have you found a more elegant solution?