Metallic sounds using gemini-2.5-flash-preview-tts

Since today (11-12-2025) I have a different voice (not really a problem) but also an annoying metallic pitch in the generated WAV. This was not a problem before today.

Today, 12-12-2025, about 8 minutes into a generated WAV file, I thought I heard a fire engine outside. The sound was actually in the WAV, not outside.

Is this related to the change quoted below, and can I solve it myself or is it already being resolved?

effective December 10, 2025.

Starting on this date, you will automatically get significant improvements in expressivity, pacing, and overall audio quality. To ensure a seamless transition, the new models maintain many characteristics of the previous version.

What you need to do

No action is required from you.

No code changes required: Your existing API calls to gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts will automatically begin using the new model.

Kind regards,

Chris


Seeing the same issue here. The voice timbre has changed overall as well.


Also getting a metallic sound here, mostly after a couple of minutes rather than right away.

Same here. When generating several minutes of speech, the audio sounds more metallic / degraded after a few minutes. ALSO, the PACE increases as the speech progresses. I gave very specific performance notes instructing that the pace should stay even throughout the entire script, but it does not work. The metallic sound and uneven pace need to be addressed.

Same here regarding the increasing speed.

I’ve also noticed those issues.

I am facing the same issues too.

Hi @Chris_Dool,
Thank you for bringing this to our attention. We truly appreciate you flagging this issue, and we have escalated it to the relevant team for further investigation.


@Pooja_Kapse could you give some sort of timeline for a fix? This has been a problem since November now.

Thanks!

The TTS output has been totally unusable since the Dec 10th update.


I agree, the output is unusable.

Yes, the problems have persisted for months now, as if Google never tests their own product. I generate TTS a couple of times a day, and the results are almost always the same, for any voice used. I typically generate a few minutes of audio per prompt:

  1. The voice quality and pace sound great for the first 25-50% of the output.
  2. The voice quality then gets worse, as if the voice is coming from a tin can, AND the pace quickens very noticeably.

The fact that this occurs constantly, with no fix after months, really makes it seem as if Google does not use or support this model, or perhaps it is even intentional for some reason?

I’ve noticed similar behavior as well, especially on longer generations. The gradual drop in audio quality and change in pacing feels consistent enough to suggest a systemic issue rather than random errors. It would be interesting to hear if others see this too, or if Google has acknowledged it anywhere.
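For anyone who wants numbers to attach to a report, here is a rough sketch for checking whether a generated file drifts over time. It counts zero crossings per second in consecutive windows of the WAV, which is only a crude proxy for how much high-frequency content there is; a rising value later in the file may line up with the tin-can sound, but that link is my assumption, not something confirmed. The function name and the 30-second window are made up for illustration, and the code assumes the file is 16-bit PCM.

```python
import struct
import wave


def zero_crossing_rate_per_window(path, window_seconds=30.0):
    """Return zero crossings per second for each consecutive window of a WAV.

    Assumes 16-bit PCM (an assumption about the file, checked below).
    Only the first channel is analysed.
    """
    rates = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        frame_rate = wav.getframerate()
        frames_per_window = max(1, int(window_seconds * frame_rate))
        while True:
            raw = wav.readframes(frames_per_window)
            if not raw:
                break
            samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
            mono = samples[::channels]  # keep the first channel only
            if len(mono) < 2:
                break
            # A crossing is a sign change between neighbouring samples.
            crossings = sum(
                1 for a, b in zip(mono, mono[1:]) if (a < 0) != (b < 0)
            )
            rates.append(crossings * frame_rate / len(mono))
    return rates
```

Calling `zero_crossing_rate_per_window("output.wav")` on a saved generation (the file name is a placeholder) gives one number per 30 seconds, so a good and a bad generation of the same script can be compared side by side.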

Here is a prime example of the two main problems I see fairly consistently with longer TTS generations: eventual quality degradation, and a faster pace as the speech progresses.

Same issue here! The voice gets quite bad after 2-3 minutes, and if I try to chunk audio into smaller pieces, the voice totally changes for the next chunk.
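For reference, the chunking mentioned above could look roughly like this: split the script on sentence boundaries so that each request stays short. This is only a sketch. `split_script` and the 1500-character budget are made-up values for illustration, not anything from the API docs, and it does nothing about the voice changing between chunks.

```python
import re


def split_script(text, max_chars=1500):
    """Split text into chunks of whole sentences.

    Each chunk is at most max_chars long, except when a single sentence
    is longer than max_chars, in which case it is kept whole.
    max_chars=1500 is an arbitrary illustrative default.
    """
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be sent as its own TTS request and the resulting audio joined afterwards.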

Day after day, it’s the same result. No word from Google, no change in the model’s behavior. At this point it’s no longer worth commenting on, and I will just assume this is an abandoned model.
