TTS audio generation background noise

Since Sept 15th (one month and thirteen days ago as I write this post), TTS audio generation quality has been noticeably poorer. The generated audio has persistent background noise, like radio interference. Both Gemini 2.5 Flash Preview and Gemini 2.5 Pro Preview have the same issue. I’ve tried everything I can to improve the audio, but whichever tools I use to correct the issue, the outcome is never satisfactory.

The previous quality was way better.

I’m getting a gradually increasing metallic ‘echo’ type of sound in the background (like Babylon 5 ‘Shadow’ talk), and eventually an echoed voice (like it’s speaking in a tube). It gets worse the longer the audio goes; around the 5-minute mark and beyond it’s very noticeable (using Gemini 2.5 Pro Preview TTS).

I’m kind of losing hope that this issue will ever be fixed, since it’s been two months now since it started and nothing has been done yet.

Hi @Raff_Silver, apologies for the late response.
I am trying to replicate the issue. Could you please provide the following details: whether the setup was single-speaker or multi-speaker, which specific voice and language were selected, and the exact number of tokens used in the prompt? Additionally, are there any other steps required for reproduction?
Also, if possible, could you share a sample audio file that has the background noise?

@Pannaga_J Thank you for your reply.

  • Both “single” and “multi” speaker have the same issue.
  • The selected language is English.
  • All voices have background noise.
  • Any audio length is affected.

Thank you very much. Looking forward to hearing from you.

Hi @Raff_Silver,

Could you please provide sample audio and prompt to replicate the issue?

Sure. How can I share it with you guys?

Hi @Raff_Silver,
You can DM me with all the required details.

Done already! Thank you!

Hi @Raff_Silver,

Thank you for providing the sample audio. We have escalated this issue to our engineering team to investigate the noise and echo. We appreciate your patience as we work to resolve these audio quality concerns.

Well, for local processing of voice assistants that can run even on a GTX 1080 Ti, you can look at this: https://www.youtube.com/watch?v=oZ9YpYAowt8

I’m having the same issue. My hope was to use AI Studio as a replacement for ElevenLabs. The voice quality is initially amazing, but once the audio goes past a minute or two, it starts to get a metallic echo, as the OP described here. By the end it is so bad that it sounds like I’m hearing a phone call through a tin can. I’d love to ditch ElevenLabs for AI Studio. Any update on this?

I am the author of a book, and I have tested Gemini 2.5 Pro TTS. At first it seemed perfect. I ran multiple tests on two paragraphs, trying nearly all the different voices, and every test came back perfect, so I decided to move forward. I figured the best approach would be one page at a time. The first chapter has 15 pages, each between 600 and 700 words (3,900 to 4,250 characters). The first two were perfectly fine. I ran twelve pages and discovered what is being reported here: all but two others were unusable. I spoke with Google Billing, and they credited my account. I appreciate that. But, more importantly, I would really like to be able to use Gemini TTS.
I thought I might try two paragraphs at a time, but reading this thread, others seem to have problems with that, too. Is there any ETA for a fix? The recordings I made are really bad. Some are, eh, not horrible, but definitely not usable in a professional recording. Most are just horrific. @Raff_Silver, I am more than happy to provide the recordings to you for review. This seems to have been reported over four months ago with no update.
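
For anyone who wants to experiment with the "a couple of paragraphs at a time" idea mentioned above, here is a minimal Python sketch that groups paragraphs into short chunks before sending each one to TTS separately. It only does the text splitting and makes no TTS calls. The function name split_into_chunks and the 1500-character default are hypothetical choices for illustration, not anything recommended by Google, and nothing in this thread confirms that shorter inputs avoid the noise (the OP reports that any audio length is affected).

```python
def split_into_chunks(text: str, max_chars: int = 1500) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    Paragraphs are separated by blank lines. A single paragraph longer than
    max_chars is kept whole rather than being cut mid-sentence.
    The 1500-character default is an arbitrary assumption, not a known limit.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be submitted as its own TTS request and the resulting audio files joined afterwards. Whether that actually reduces the echo is something you would have to test yourself.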