What are you guys' thoughts on Google's 2.5 TTS lineup?
I've been toying with it, using it for streaming (before I hit quota, of course)
and comparing it to my previous ElevenLabs stack.
Things I've noticed so far:
The voice is nice but really inconsistent. It can sound like separate people between clips. They need a seed-type option to lock a voice down to a preferred output.
Latency is pretty high. When it comes to streaming, there's often a noticeable delay between segments, whereas the ElevenLabs stack was basically smooth as butter, so it's going to take some effort to get that down.
Price: this is where they win and why I was so eager to try it. It's basically a 7x savings over even the cheapest ElevenLabs model, which is why I immediately jumped ship.
You’ve hit on the exact pros and cons. The price is tempting, but the developer experience has some serious hurdles.
Regarding the latency you mentioned for streaming, you're probably using the standard TTS API. Google also has a Live API, where you can use the native audio dialog model, which is designed specifically for real-time, low-latency interaction. It's much closer to the "smooth as butter" experience you got from ElevenLabs.
However, and this is the huge deal-breaker you mentioned, the quota limits are a massive problem. The limits on the API are likely there to prevent abuse in AI Studio, but they're counterproductive for anyone trying to build a real product. The quotas for the low-latency Live API are even more restrictive, making it almost impossible to scale a startup or even a moderately popular stream on it.
So you’re stuck in a frustrating spot: use the high-latency API with slightly better (but still low) limits, or use the fantastic low-latency API that you can’t actually scale.
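If you stay on the standard API, one thing that might soften the gap between segments is requesting the next segment while the current one is still playing. Here's a minimal sketch of that idea, not tied to any particular SDK: `prefetch_segments` and the `synthesize` callable are hypothetical names, and `synthesize` stands in for whatever function you already use to turn one text segment into audio bytes.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def prefetch_segments(
    segments: Iterable[str],
    synthesize: Callable[[str], bytes],  # hypothetical: your own TTS call
    lookahead: int = 2,
) -> Iterator[bytes]:
    """Yield audio for each segment in order, keeping up to
    `lookahead` later segments in flight while earlier ones play."""
    with ThreadPoolExecutor(max_workers=lookahead) as pool:
        pending = deque()
        for text in segments:
            # Start synthesis in the background right away.
            pending.append(pool.submit(synthesize, text))
            # Once enough work is queued ahead, hand back the oldest
            # result; later segments keep synthesizing meanwhile.
            if len(pending) > lookahead:
                yield pending.popleft().result()
        # Drain whatever is still in flight, preserving order.
        while pending:
            yield pending.popleft().result()
```

You would loop over `prefetch_segments(chunks, my_tts_call)` and play each result as it arrives. This doesn't reduce the delay before the first segment, and it makes requests burstier, so it may hit rate limits sooner even though the total number of requests is unchanged.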
Hello,
Could you please confirm whether you are facing this issue with AI Studio or Gemini API?
I played around with the Gemini Flash TTS and wasn't able to make it stream its response, so the latency is very high.
Of course, the Live API is much better in this regard.
Could you please share your code, so that we can try to reproduce your issue?