Gemini 2.5 Flash Preview TTS

I am working on a project and am thinking of using Gemini 2.5 Flash Preview TTS, but I am not sure about the pricing model.


Here, you can see the output price listed as $10 per 1M tokens (USD). I just want to confirm whether this refers to audio tokens (where 32 tokens = 1 second of audio). If so, I could generate around 520 minutes of content with 1M tokens (roughly 31 seconds per 1K tokens).

Thank You!

Hi @Anirudh_Singh, welcome to the forum.

You are right: gemini-2.5-flash-preview-tts uses 32 tokens per second of audio, so 1M tokens would yield approximately 520 minutes of content (1,000,000 / 32 = 31,250 seconds). Also, please note that a TTS session has a context window limit of 32k tokens. Refer to the official docs for more details and limitations.
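
If it helps, here is a rough sketch of the arithmetic in Python. It assumes the 32 tokens per second rate and the $10 per 1M output tokens price quoted in this thread; the function names are made up for illustration, and the current price should be checked against the official pricing page before budgeting.

```python
# Assumptions taken from this thread, not from an API call:
TOKENS_PER_SECOND = 32            # audio tokens per second of generated speech
PRICE_PER_MILLION_USD = 10.0      # quoted output price per 1M tokens


def audio_minutes(tokens: int) -> float:
    """Minutes of audio that a given number of output tokens represents."""
    return tokens / TOKENS_PER_SECOND / 60


def cost_usd(minutes: float) -> float:
    """Estimated output cost in USD for a given audio duration."""
    tokens = minutes * 60 * TOKENS_PER_SECOND
    return tokens / 1_000_000 * PRICE_PER_MILLION_USD


if __name__ == "__main__":
    print(f"1M tokens is about {audio_minutes(1_000_000):.1f} minutes of audio")
    print(f"1 minute of audio costs about ${cost_usd(1):.4f}")
```

Under those assumptions, 1M tokens works out to roughly 520.8 minutes, and one minute of audio costs a little under two cents in output tokens. Input (text) tokens are billed separately and are not included in this estimate.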

Thank you
