Real-Time Speech-to-Text

Dear Guys,

I hope this message finds you well.

I am currently working with the newest Gemini 1.5 Pro multimodal model and am looking for a feature similar to the speech-to-text conversion available in the Vertex AI playground. The current Gemini API examples I have found run inference in batch after a file is uploaded, but my requirement is real-time processing.

Could you please guide me on how to achieve real-time speech-to-text conversion using Gemini 1.5 Pro? Any assistance or direction you can provide would be greatly appreciated.

Thank you very much for your help.

Best regards,

Hung Truong

Welcome to the forums!

Gemini 1.5 doesn’t offer real-time audio streaming through its public interface. (It isn’t even clear whether it supports this through internal interfaces, although this seems to be what Project Astra is working on.)

If you want streaming transcription, your best bet is to use the Google Cloud Speech-to-Text API, which lets you choose the model that works best for your purposes.
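
As a rough sketch of what that could look like in Python, assuming the `google-cloud-speech` client library is installed and Application Default Credentials are configured: the helper names `chunk_audio` and `stream_transcribe`, the 16 kHz 16-bit mono PCM format, the 100 ms chunk size, and the `en-US` language code are all my own assumptions for illustration, so adjust them to your audio source.

```python
from typing import Iterable, Iterator


def chunk_audio(pcm: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Yield fixed-size slices of raw PCM audio.

    3200 bytes is 100 ms of 16 kHz, 16-bit mono audio (an assumed format).
    The last chunk may be shorter.
    """
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def stream_transcribe(chunks: Iterable[bytes],
                      sample_rate_hz: int = 16000,
                      language_code: str = "en-US") -> Iterator[str]:
    """Send audio chunks to Speech-to-Text and yield transcripts as they arrive."""
    # Imported here so chunk_audio stays usable without the client library.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hz,
        language_code=language_code,
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config,
        interim_results=True,  # also return partial hypotheses while speaking
    )
    # One request per audio chunk; the config is passed separately below.
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk)
        for chunk in chunks
    )
    responses = client.streaming_recognize(
        config=streaming_config, requests=requests
    )
    for response in responses:
        for result in response.results:
            if result.alternatives:
                yield result.alternatives[0].transcript
```

For a live use case you would replace `chunk_audio` with a generator that reads from a microphone or network stream, then loop over `stream_transcribe(...)` and print each transcript as it comes in.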
