Gemini 2.5 Pro TTS (Text-to-Speech) Review & Urgent Request for Commercial GA

Subject: Gemini 2.5 Pro TTS is Groundbreaking, Solves Industry-Wide Problems – URGENT Need for Consistency Fix and Commercial GA Schedule.

Dear Google Gemini Development Team,

I’m a content creator who specializes in long-form narrative audio (drama and high-quality narration). After extensively testing almost all major commercial TTS services (Clova, Taipeicast, Eleven Labs, ConanVoice, etc.), I found that all of them were fundamentally unsuitable for professional-grade voice acting.

Using the Gemini 2.5 Pro Preview TTS feature in AI Studio, I tested over 150,000 characters day and night for three days, and I want to make this clear:

The audio quality and narrative comprehension of this new TTS feature are truly innovative and outperform every competitor I tested.

  1. Key technical achievements (why this is a ‘masterpiece’)

    Perfect long-form consistency: The model cleanly handles narratives of up to 2,000 characters (excluding spaces) without unnatural slackening, loss of emotion, or the robotic intonation common to AI voices.

    Perfect conversation/narrative delivery: The model delivers dialogue and complex narrative passages with near-perfect contextual flow and emotional authenticity for approximately 1,500 scripts. It understands both the intent of the text as a whole and the intent of each sentence.

    Foreign-language handling/self-directed problem solving: The model demonstrates superior language modeling by correctly pronouncing foreign proper nouns (e.g., ‘Sonoma County’) that other Korean AI services severely struggle with.

1.5. Current Minor Flaws & The Consistency Crisis

But I can’t say it’s 100% perfect yet. The AI narrator works perfectly on a few hundred characters, but beyond that length minor glitches can appear, such as mispronouncing a few characters or replacing parts of a sentence with other words or sentences. The latter (word/sentence replacement) is generally fine because it doesn’t significantly change the overall context. The real problem is fixing the mispronounced characters: it is virtually impossible to patch in a replacement segment smoothly, because only the short affected part needs to be regenerated, yet the voice changes with every new request.

  2. The single significant barrier to commercial use

Despite its incredible quality, it cannot yet be adopted commercially because of a fundamental flaw in the preview:

Problem: Voice characteristics (tone, pitch, and certain voice textures) change with almost every subsequent call or slight text edit.

Implications: Due to this lack of consistency, it cannot be used in continuous projects (e.g., dramas or multi-part documentaries) that require a single, stable speaker's voice.


3. Urgent requests and conclusions

The inconsistency problem simply has to be solved at a fundamental level for commercial products (i.e., through speaker ID locking). Once consistency is guaranteed, I firmly believe this feature is ready to go to market.

    Request 1: Please fix Gemini 2.5 Pro TTS voice consistency (speaker ID locking) first.

    Request 2: Please share the general availability (GA) schedule for this specific TTS feature within the Gemini API.

This technology is ready to completely revolutionize the high-end audio content industry. My entire project and business plan depend on its commercial launch. Thank you for developing this groundbreaking technology.

Pro TTS has so much potential. I came here to see if anything had changed since the TTS model came out (May, I think), and it hasn’t. The inconsistency of the voices between runs means you can’t use it for anything other than ~10 minutes of standalone audio (max output tokens). There’s no continuity if you want to use the model for a longer narration or add a second chapter, as the voice will be different, especially if your prompt specifies an accent.


Thank you for sharing your experience! I completely agree with everything you said.

I’ve tested this Pro TTS with over 150,000 characters for a drama project, and the quality is truly groundbreaking, solving the narrative issues all competitors have experienced.

However, the lack of speaker ID locking, and the resulting inconsistency, is the only critical flaw that prevents commercial adoption for major serialized content. I 100% agree with your comment that it breaks down as soon as you need a second chapter.

Thank you for your valuable feedback. Have a great day!


I sincerely hope the Gemini team recognizes that solving this consistency crisis is their only remaining challenge. Once this model stabilizes, there’s no doubt it will dominate the professional narration market.

@GoogleDevelopers: Please prioritize fixing the consistency issue and share the GA schedule immediately!

Hi @jinheei,

Thank you for your feedback. We appreciate you taking the time to share your thoughts with us.

Thank you for taking the time to share your thoughts!

I’m glad to see that others feel the same way about the potential for Gemini 2.5 Pro Preview TTS. I truly believe that addressing those limitations will be a huge step forward for creators. Hopefully, our feedback helps bring a better commercial product to market soon!

Thanks again for the support!