Why was the TTS model nerfed on December 10th?

Chuck · December 16, 2025, 2:10pm

It’s very clear that the updated TTS model is inferior to its predecessor on multiple fronts.

Metallic noise in the generated audio files
Expressivity and prompt adherence have degraded. The Pro model is also worse, yet somehow slower than before.
Worst problem of all: voice identity and timbre have both changed.

Anyone else facing the same issues?

Srikanta_K_N · December 18, 2025, 10:18am

Hi @Chuck, thanks for reaching out!

Could you please share which model you are using and, if possible, the exact prompt that causes the problems, so that we can analyze this better?

Pavlin_Stoichev · December 18, 2025, 11:00am

Yes, the voices now sound more robotic and fake, and they don’t change tone properly.
How do we revert to the older models?

I was doing a podcast and it sounded really good and natural. Now it sounds fake, like “AI slop”.

Alnilam is severely downgraded and changed. Achernar less so, but they both sound inadequate.
Worst is the lack of consistency not only between chunks (some faster, some slower…) but also within a single response.

Here are two examples. Both files contain snippets of pre- and post-nerf recordings, using the flash models.

The first one is about the change of personality.

Notice how the lively, natural conversation is replaced by pretend suspense-building, à la a children’s television show.

This one is more problematic:

Alnilam changes voice completely within the single API response
Giant pause with awkward resolution.

The new voices generally sound fake and awkward. Pre Dec 10 it was a dynamic, natural, easy-to-follow chat. Now it is like a neighborhood theater open-mic event. People used to ask me if it was really AI, as it sounded so good; now it is obviously fake, with some kind of amateurish pathos.

P.S. This is the prompt for the second file

SARAH
This episode releases on December 1st, and as the community knows, that means it’s World AIDS Day. Victor, what’s on your mind today as we mark this date?

VICTOR
(Pauses reflectively)
It’s a powerful day for memory, of course. For remembering the resilience we built out of necessity. But a day like this has to be about accountability, too. We’re digging into the ECDC’s 2025 HIV Report today—it’s a major community check-in.

SARAH
(Seriously)
And the news is… complicated. The report essentially warns us about a “Hidden Crisis” in Europe. Victor, the number that slapped me in the face was this: 54% of all new HIV diagnoses in 2024 were late.

VICTOR
Exactly. Over half of the people diagnosed are already immune-compromised. That’s why we’re breaking this data down today: to understand the gap between our amazing success in treatment and our massive failure in testing.

SARAH
We’ll simplify the numbers, talk about the barriers like stigma and PrEP access, and—most importantly—what we can practically do this week. Let’s dive in.

Srikanta_K_N · January 2, 2026, 5:13am

Hi,

Are you still facing this issue?

Chuck · January 2, 2026, 6:06am

Yes we are. This happens for ALL the models, for EVERY generation, for EVERY prompt.

Please do me a favor: deploy your older models internally and simply listen to how the audio has changed. It’s extremely obvious.

Pavlin_Stoichev · January 2, 2026, 5:30pm

Did some tests today and actually recorded a full podcast episode to get a feel for how the model behaves now. For me it is way better!

The pro voices still sound mostly flu-sick and depressed, but the flash is quite good. The experience will probably vary by voice, like I noticed Orus is now completely different.
From the voices that I use:

Achernar (flash) is better than ever! Has the old liveliness back with a polish to sharpen the edges.
Alnilam (flash) is still somewhat unstable, not as much as before, so in a dynamic podcast-form dialogue it is mostly OK, but there are still inconsistencies - sometimes he sounds younger, sometimes older, sometimes the melancholic monotonic performance from the past weeks comes back for a set of lines.
Sometimes background noise appears mid-chunk.

On one hand, mixing a bit of acting into the natural talk is OK, but since the “AI director” operates within the chunk’s context, this creates continuity problems: the start of a new chunk sounds radically different. I’m not sure what the balance here should be, as the added “acting” does fix some of the rough edges of the voices from before December.
Maybe an option would be to send the full text (or the previous and next paragraph) for context, plus an instruction saying which part to actually voice.
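A rough sketch of what that could look like on the client side, in Python. It only builds the prompt strings; the section labels (`CONTEXT`, `VOICE THIS`) and the function name are made up for illustration, and whether the TTS model actually respects a "do not speak" instruction is an untested assumption.

```python
from typing import List


def build_context_prompts(paragraphs: List[str]) -> List[str]:
    """Build one prompt per paragraph, with its neighbours included as context only.

    The labels below are hypothetical; nothing here is a documented prompt format.
    """
    prompts = []
    for i, target in enumerate(paragraphs):
        prev_par = paragraphs[i - 1] if i > 0 else ""
        next_par = paragraphs[i + 1] if i + 1 < len(paragraphs) else ""
        parts = [
            "Use the CONTEXT sections only to keep voice, pace and tone consistent.",
            "Speak ONLY the text in the VOICE THIS section.",
        ]
        if prev_par:
            parts.append("CONTEXT (previous, do not speak):\n" + prev_par)
        parts.append("VOICE THIS:\n" + target)
        if next_par:
            parts.append("CONTEXT (next, do not speak):\n" + next_par)
        prompts.append("\n\n".join(parts))
    return prompts
```

Each returned string would then be sent as the text of one TTS request, in order, with the same voice selected every time.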

Here is a short clip:

Pavlin_Stoichev · January 15, 2026, 8:07pm

Voices keep changing and mutating not only between chunks, but within a single chunk. Changes from younger to older sounding, or other similar variations that create a creepy uncanny valley experience. So unfortunately the nerf in December is still pretty much an issue.

Shrestha_Basu_Mallic · January 28, 2026, 8:36am

Apologies for the inconvenience. We have passed this feedback to the model team.

phil_swenson · January 28, 2026, 4:01pm

@Shrestha_Basu_Mallic is solving these issues on the roadmap? When converting text to speech for long audio, the text must be split into many pieces, so we really need a) consistent quality across “chunks” and b) a voice ID we can pass to Gemini with every successive API call for consistency.

Is this on the roadmap? If not, please let us know and we will try OpenAI or another approach.
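For anyone unsure what the splitting step looks like, here is a minimal Python sketch, assuming a simple character budget per request. The `max_chars` default is an arbitrary placeholder, not a documented limit, and the sentence splitter is a naive regex that will mis-split abbreviations such as "e.g.".

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 1000) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after ., ! or ? followed by whitespace (naive sentence detection).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting only at sentence boundaries keeps each request self-contained, but it does nothing about the voice drifting between requests, which is the problem described in this thread.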

Pavlin_Stoichev · February 8, 2026, 4:32pm

3 months now since the nerfing. Nothing gets better, voices keep mutating, not only across chunks but within a single chunk, just a sentence apart.

nicholsss · February 19, 2026, 3:57pm

Any news on this? I have not tried Gemini TTS for a while. Is the audio generation any better?
