Real-Time Speech-to-Text

HungTruong · May 29, 2024, 9:42am

Dear Guys,

I hope this message finds you well.

I am currently working with Gemini 1.5 Pro, the newest multimodal model, and am looking for a feature similar to the speech-to-text conversion available in the Vertex AI playground. The current Gemini API examples I have found run inference in batch mode after a file is uploaded, whereas my requirement is real-time processing.

Could you please guide me on how to achieve real-time speech-to-text conversion using Gemini 1.5 Pro? Any assistance or direction you can provide would be greatly appreciated.

Thank you very much for your help.

Best regards,

Hung Truong

afirstenberg · May 29, 2024, 11:29am

Welcome to the forums!

Gemini 1.5 doesn’t offer real-time audio streaming through the public interface. (It isn’t even clear whether it does through internal interfaces, but that seems to be what Project Astra is working on.)

If you want streaming transcription, your best bet is to use the Google Cloud Speech-to-Text API, which lets you choose the recognition model that works best for your purposes.
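
For reference, here is a minimal sketch of what streaming transcription with that API might look like in Python. It assumes the `google-cloud-speech` client library is installed, Application Default Credentials are configured, and the audio is 16 kHz, 16-bit mono PCM. The helper names `pcm_chunks` and `transcribe_stream` and the 100 ms chunk size are illustrative choices, not anything prescribed by the API, and the byte-slicing helper merely stands in for a live microphone feed.

```python
from typing import Iterator

# 100 ms of 16 kHz, 16-bit mono PCM: 16000 samples/s * 2 bytes * 0.1 s (assumed format)
CHUNK_BYTES = 3200


def pcm_chunks(audio: bytes, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield fixed-size slices of raw PCM audio, standing in for a live audio source."""
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]


def transcribe_stream(chunks: Iterator[bytes], language_code: str = "en-US") -> None:
    """Send audio chunks to Speech-to-Text and print transcripts as they arrive."""
    # Imported lazily so the chunking helper works without the client library.
    from google.cloud import speech

    client = speech.SpeechClient()  # relies on Application Default Credentials
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=language_code,
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config,
        interim_results=True,  # also return partial hypotheses before a result is final
    )
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in chunks
    )
    responses = client.streaming_recognize(config=streaming_config, requests=requests)
    for response in responses:
        for result in response.results:
            if not result.alternatives:
                continue
            label = "final" if result.is_final else "interim"
            print(f"[{label}] {result.alternatives[0].transcript}")
```

With a raw PCM file on disk (the file name here is hypothetical), a call such as `transcribe_stream(pcm_chunks(open("sample.raw", "rb").read()))` would print interim and final transcripts as responses arrive. For true real-time use, the generator would be replaced by one that yields buffers from a microphone as they are captured.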
