I'm a longtime ElevenLabs user who was using their fastest turbo TTS model.
I was excited to switch my project to Google TTS for the cost savings, since ElevenLabs was eating me alive.
But the documentation is kind of slim. I've implemented streaming and such, but I'm looking for best practices. Right now the start of speech for the first sentence lags ElevenLabs by maybe 2 seconds, which is fine,
but during streaming, while it's talking, it will sometimes pause at the end of a couple of sentences, as if playback ran ahead of its buffer.
Since there's so much realtime Gemini voice work out there, I figure there are probably some tips/tricks to get the best latency.
Hi @codertradergamer,
Welcome to the Google AI Forum!

Thank you for sharing your experience. Latency and buffering issues with Gemini Flash TTS are known challenges. They can stem from several factors, including network latency, server load, and the inherent processing time of the TTS model.
Here are some best practices:
Adjust Thinking Budget:
By configuring a thinking budget, you can control how much time the model spends reasoning before generating speech. This setting helps balance latency against response quality. For more details, refer to the Gemini 2.5 Flash documentation.
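As a rough sketch only (this assumes the google-genai Python SDK; the model name and budget value are placeholders, and not every model variant exposes a thinking budget, so please check the docs for the model you're actually calling):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

config = types.GenerateContentConfig(
    # A budget of 0 asks the model to skip extended thinking so output starts
    # sooner; raise it if response quality drops.
    thinking_config=types.ThinkingConfig(thinking_budget=0),
)

# Stream the response so downstream audio/TTS handling can start on the first
# chunk instead of waiting for the full generation.
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",  # placeholder: use the model you call today
    contents="Read this aloud: Hello, welcome back!",
    config=config,
):
    if chunk.text:
        print(chunk.text, end="", flush=True)
```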
Optimize Audio Buffering:
Implement a dynamic audio buffer that adjusts based on how quickly audio chunks arrive: pre-buffer a short amount of audio before starting playback, and grow the buffer slightly if you detect underruns. This approach smooths out the pauses you're seeing.
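As a rough illustration only (the chunk source and the audio sink here are assumptions on my side, not part of any Google SDK):

```python
import queue
import threading
import time

class PlaybackBuffer:
    """Collects PCM chunks from the TTS stream and only starts playback once
    a minimum amount of audio is queued, so brief stalls upstream don't turn
    into audible pauses. Raising min_buffer_sec trades a little startup
    latency for fewer mid-stream gaps."""

    def __init__(self, sample_rate=24000, bytes_per_sample=2, min_buffer_sec=0.5):
        self.chunks = queue.Queue()
        self.min_bytes = int(sample_rate * bytes_per_sample * min_buffer_sec)
        self.buffered = 0
        self.finished = False
        self.lock = threading.Lock()

    def push(self, pcm_chunk: bytes):
        # Called from the streaming/network side as chunks arrive.
        with self.lock:
            self.buffered += len(pcm_chunk)
        self.chunks.put(pcm_chunk)

    def finish(self):
        self.finished = True
        self.chunks.put(None)  # sentinel: no more audio coming

    def play(self, write_pcm):
        # write_pcm is whatever actually plays raw PCM (e.g. a sounddevice or
        # pyaudio stream); it is a placeholder here.
        while self.buffered < self.min_bytes and not self.finished:
            time.sleep(0.05)  # pre-buffer before the first sample plays
        while True:
            chunk = self.chunks.get()
            if chunk is None:
                break
            with self.lock:
                self.buffered -= len(chunk)
            write_pcm(chunk)
```

You would call push() from your streaming callback and run play() on a separate thread; the pre-buffer wait is what absorbs uneven chunk arrival from the model.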
Utilize Native Audio Output:
Gemini 2.5 Flash supports native audio output over the Live API, which can improve both the quality and the responsiveness of speech. For more info, please refer to the Gemini 2.5 native audio documentation.
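A minimal sketch of what that can look like, again assuming the google-genai Python SDK; the model name is illustrative and may differ from what your account has access to, and handle_pcm_chunk is just a hypothetical playback hook:

```python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Ask the Live API session to respond with speech rather than text.
config = types.LiveConnectConfig(response_modalities=["AUDIO"])

def handle_pcm_chunk(pcm: bytes):
    # Placeholder: feed the bytes into your playback buffer/player here.
    print(f"received {len(pcm)} bytes of audio")

async def main():
    # Model name is a placeholder; use the native-audio model available to you.
    async with client.aio.live.connect(
        model="gemini-2.5-flash-preview-native-audio-dialog",
        config=config,
    ) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="Say hello to the listener.")],
            )
        )
        # Audio arrives as raw PCM chunks; hand them to your playback buffer.
        async for message in session.receive():
            if message.data:
                handle_pcm_chunk(message.data)

asyncio.run(main())
```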
Manage Server Load:
High load on the service side can also contribute to latency; if the pauses are intermittent rather than constant, this may be a factor, and a slightly larger client-side buffer (as above) is the most practical way to absorb it.
If you continue to experience issues or need further assistance, please provide additional details about your implementation, and I’ll be glad to help.