Since September 15th (one month and thirteen days ago as I write this post), TTS audio generation quality has been noticeably worse. The generated audio has persistent background noise, similar to radio interference. Both Gemini 2.5 Flash Preview and Gemini 2.5 Pro Preview have the same issue. I’ve tried everything I can to improve the audio, but whatever tools I use to correct the issue, the outcome is never satisfactory.
I’m getting a gradually increasing metallic ‘echo’ sound in the background (like Babylon 5 ‘Shadow’ talk), and eventually an echoed voice (like it’s speaking in a tube). It gets worse the longer the audio runs, and from around the 5-minute mark onward it’s very noticeable (using Gemini TTS….2.5 Pro preview).
Hi @Raff_Silver, apologies for the late response.
I am trying to replicate the issue. Could you please provide the following details: whether the setup was single-speaker or multi-speaker, which specific voice and language were selected, and the exact number of tokens used in the prompt? Additionally, are there any other steps required for reproduction?
Also, if possible, could you share a sample audio file that has the background noise?
Thank you for providing the sample audio. We have escalated this issue to our engineering team to investigate the noise and echo. We appreciate your patience as we work to resolve these audio quality concerns.
I’m having the same issue. My hope was to use AI Studio as a replacement for ElevenLabs. The voice quality is initially amazing, but as the audio goes past a minute or two, it starts to get a metallic echo, as the OP described here. By the end it is so bad that it sounds like I’m hearing a phone call through a tin can. I’d love to ditch ElevenLabs for AI Studio. Any update on this?
I am the author of a book, and I have tested Gemini 2.5 Pro TTS. At first it seemed perfect. I ran multiple tests on two paragraphs, trying nearly all the different voices, and every test came back perfect, so I decided to move forward. I figured the best approach would be one page at a time. The first chapter has 15 pages, each between 600 and 700 words (3900-4250 characters). The first two were perfectly fine. I then ran twelve pages and discovered what is being reported here: all but two of the others were unusable. I spoke with Google Billing, and they credited my account. I appreciate that. But, more importantly, I would really like to be able to use Gemini TTS.
I thought I would try maybe two paragraphs at a time, but reading this thread, others seem to have problems with that, too. Is there any ETA for a fix? The recordings I made are really bad. Some are so-so: not horrible, but definitely not usable in a professional recording. Most are just horrific. @Raff_Silver, I am more than happy to provide the recordings to you for review. This seems to have been reported over four months ago with no update.
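For anyone who wants to test the shorter-input idea mentioned above, here is a minimal sketch of grouping a page into paragraph-sized chunks before sending each one to TTS separately. Nothing in this thread confirms that shorter inputs avoid the noise; the function name `chunk_paragraphs` and the `max_chars=1500` default are assumptions chosen for illustration, not documented limits, and the sketch deliberately makes no TTS API call.

```python
import re


def chunk_paragraphs(text: str, max_chars: int = 1500) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    Hypothetical helper: max_chars is an arbitrary value to experiment with.
    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    # Split on blank lines and drop empty fragments.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    page = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for i, chunk in enumerate(chunk_paragraphs(page, max_chars=40), start=1):
        print(f"--- chunk {i} ({len(chunk)} chars) ---")
        print(chunk)
```

Each chunk would then be generated as its own audio file and the files joined afterwards. Comparing the output at a few different `max_chars` values could also help narrow down at what input length the echo starts to appear.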