Gemini 2.5 Pro TTS (Text-to-Speech) Review & Urgent Request for Commercial GA

jinheei · October 22, 2025, 12:50am

Subject: Gemini 2.5 Pro TTS is Groundbreaking, Solves Industry-Wide Problems – URGENT Need for Consistency Fix and Commercial GA Schedule.

Dear Google Gemini Development Team,

I’m a content creator who specializes in long-form narratives (drama and high-quality narration). After extensively testing almost every major commercial TTS service (Clova, Typecast, ElevenLabs, ConanVoice, etc.), I found that all of them were fundamentally unsuitable as substitutes for professional voice actors.

Using the Gemini 2.5 Pro Preview TTS feature in AI Studio, I tested over 150,000 characters day and night for three days, and I want to make one thing clear:

The audio quality and narrative comprehension of this new TTS feature are absolutely innovative and outperform all other competitors in the market.

1. Key technical achievements (why this is a ‘masterpiece’)

Perfect long-form consistency: The model cleanly handles narratives of up to 2,000 characters (excluding spaces) without unnatural sagging in delivery, loss of emotion, or the robotic intonation common to other AI voices.

Perfect Conversation/Narrative Delivery: The model delivers dialogue and complex narrative passages with near-perfect contextual flow and emotional authenticity across scripts of roughly 1,500 characters. It understands both the intent of the text as a whole and the intent of each sentence.

Foreign Words/Self-directed Problem Solving: It demonstrates superior language modeling by correctly pronouncing foreign proper nouns (e.g., ‘Sonoma County’), something other Korean AI voices struggle with badly.

1.5. Current Minor Flaws & The Consistency Crisis

But I can’t say it’s 100% perfect yet. The AI narrator works flawlessly over a few hundred characters, but beyond that length minor glitches can appear, such as mispronouncing a few characters or substituting other words or phrases for parts of a sentence. The substitutions are generally acceptable because they don’t significantly change the overall meaning. The real problem is fixing mispronunciations: it is virtually impossible to patch the faulty segment smoothly, because you have to regenerate only that short part, and the voice changes with every new request.

2. The single biggest barrier to commercial use

Despite its incredible quality, it can’t yet be adopted commercially because of one fundamental flaw in the preview:

Problem: Voice characteristics (tone, pitch, certain voice textures) change with almost every new call, or after even a slight edit to the text.

Implications: Due to this lack of consistency, it cannot be used in continuous projects (e.g., dramas or multi-part documentaries) that require a single, stable speaker's voice.


3. Urgent requests and conclusions

Voice inconsistency is the one issue that must be solved, simply and fundamentally, before this can be a commercial product (i.e., speaker ID locking). Once consistency is guaranteed, we firmly believe this feature is ready to go to market.

    Request 1: Please fix Gemini 2.5 Pro TTS voice consistency (speaker ID locking) first.

    Request 2: Please share the general availability (GA) schedule for this specific TTS feature within the Gemini API.

This technology is ready to completely revolutionize the high-end audio content industry. My entire project and business plan depend on the commercial launch. Thank you for developing this groundbreaking technology.

benniepie · October 31, 2025, 1:33am

Pro TTS has so much potential. I came here to see if anything had changed since the TTS model came out (in May, I think), and it hasn’t. Because the voice is inconsistent between runs, you can’t use it for anything beyond ~10 minutes of standalone audio (the max output tokens). There’s no continuity if you want to use the model for a longer narration or a second chapter, as the voice will be different, especially if your prompt specifies an accent.

jinheei · October 31, 2025, 3:05am

Thank you for sharing your experience! I completely agree with everything you said.

I’ve tested this Pro TTS with over 150,000 characters for a drama project, and the quality is truly groundbreaking, solving the narrative issues all competitors have experienced.

However, the lack of consistency (no speaker ID locking) is the only critical flaw preventing commercial adoption for major serialized content. I 100% agree with your comment that it breaks down as soon as you reach a second chapter.

Thank you for your valuable feedback. Have a great day!

I sincerely hope the Gemini team recognizes that solving this consistency crisis is their only remaining challenge. Once this model stabilizes, there’s no doubt it will dominate the professional narration market.

@GoogleDevelopers: Please prioritize fixing the consistency issue and share the GA schedule immediately!

Mrinal_Ghosh · November 10, 2025, 8:00am

Hi @jinheei,

Thank you for your feedback. We appreciate you taking the time to share your thoughts with us.

jinheei · November 10, 2025, 8:52am

Thank you for taking the time to share your thoughts!

I’m glad to see that others feel the same way about the potential for Gemini 2.5 Pro Preview TTS. I truly believe that addressing those limitations will be a huge step forward for creators. Hopefully, our feedback helps bring a better commercial product to market soon!

Thanks again for the support!

J_Louise · November 18, 2025, 3:03am

So, how did you make it create consistent voices?

jinheei · November 19, 2025, 2:58pm

I’m still having trouble getting a consistent voice with the Gemini 2.5 Pro Preview TTS. My request for a commercial release was conditional on these issues being fixed first. I think it would be a fantastic TTS if they could just fix the subtle changes in voice and tone that happen every time I re-input the text. It’s frustrating that the dev team hasn’t been able to address that problem.
Thanks for your comment, and have a nice day!

J_Louise · November 27, 2025, 6:06pm

I agree with your assessment. Thanks for responding.

Take Care!!!

Kes · February 20, 2026, 10:19pm

If you’re generating TTS files in a batch or pipeline, you can use tsaudit to automatically detect the speech files that don’t sound like the others and then regenerate just those ones.
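The thread doesn’t describe how tsaudit works or what its interface looks like, so the following is not its API. It is a minimal, hypothetical sketch of the same general idea in plain Python (standard library only), assuming you have already extracted a few numeric features per take; the feature choice, file names, and the `flag_outlier_takes` function are all illustrative. Each take is compared with the batch median, and takes whose distance is an outlier under a robust (MAD-based) z-score are flagged for regeneration.

```python
from statistics import median


def _mad(values, centre):
    """Median absolute deviation of values around centre."""
    return median(abs(v - centre) for v in values)


def flag_outlier_takes(features: dict[str, list[float]], threshold: float = 3.5) -> list[str]:
    """Return the names of takes that sound unlike the rest of the batch.

    Hypothetical input: features maps a file name to a fixed-length list of
    numbers describing that take (e.g. mean pitch in Hz, spectral centroid
    in Hz, speaking rate). Extracting these from audio is out of scope here.
    """
    names = list(features)
    dims = len(features[names[0]])

    # Per-feature median and MAD, so features in different units are comparable.
    centres, scales = [], []
    for d in range(dims):
        column = [features[n][d] for n in names]
        c = median(column)
        centres.append(c)
        scales.append(_mad(column, c) or 1.0)  # avoid dividing by zero

    # Euclidean distance of each take from the batch's "typical voice".
    dist = {
        n: sum(((features[n][d] - centres[d]) / scales[d]) ** 2 for d in range(dims)) ** 0.5
        for n in names
    }

    # Robust z-score of the distances (1.4826 makes MAD comparable to a std dev).
    d_med = median(dist.values())
    d_scale = 1.4826 * _mad(dist.values(), d_med) or 1e-9
    return [n for n in names if (dist[n] - d_med) / d_scale > threshold]


# Illustrative, made-up feature values for six takes of one chapter.
batch = {
    "ch01_001.wav": [180.0, 2100.0, 4.1],
    "ch01_002.wav": [182.0, 2080.0, 4.0],
    "ch01_003.wav": [179.0, 2120.0, 4.2],
    "ch01_004.wav": [181.0, 2090.0, 4.1],
    "ch01_005.wav": [230.0, 2600.0, 4.9],  # this take's voice has drifted
    "ch01_006.wav": [180.5, 2110.0, 4.05],
}
print(flag_outlier_takes(batch))  # ['ch01_005.wav']
```

Median/MAD statistics are used instead of mean/standard deviation so that one badly drifted take doesn’t pull the "typical voice" toward itself. A handful of coarse features like these only catch gross drift; subtle timbre changes would likely need richer features.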
