Periodically, when generating audio from a short text (e.g., 500 characters), the resulting audio file is approximately 10 minutes and 55 seconds long: after a few spoken words, the rest is silence. Repeating the same request produces normal audio. I have reproduced this with Python, Ruby, and cURL. As I understand it, 10 minutes and 55 seconds corresponds to the 16,000-token limit, after which Google forcibly terminates the generation. Has anyone else encountered this?
Hi @user2794,
Welcome to the Google AI Forum!
Thanks for reporting the issue.
Can you share the steps to reproduce, including the input prompts and file information?
That will help us reproduce the issue and report it accordingly.
Hello, here is the request I’m sending to Gemini: https://pastebin.com/26jWahmW (text ~750 characters)
And this is the response I’m getting from Gemini: https://drive.google.com/file/d/1E3CTbhsHcmDl2GPI-YWygIx6kpKV2hac/view?usp=sharing (it’s too large for Pastebin).
The resulting audio file is 10 minutes and 55 seconds long, but the sound disappears after 17 seconds, followed by silence.
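One way to catch such a response before saving or serving it is to measure how much trailing silence the decoded audio contains. The following is a minimal sketch, not an official workaround. It assumes the response has already been base64-decoded into raw 16-bit little-endian mono PCM at 24 kHz; the function names, the amplitude threshold of 200, and the 5-second limit are illustrative choices.

```python
import array
import sys

SAMPLE_RATE = 24000   # assumption: 24 kHz mono output
BYTES_PER_SAMPLE = 2  # assumption: signed 16-bit little-endian samples


def pcm_duration_seconds(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> float:
    """Total length of the raw PCM buffer in seconds."""
    return len(pcm) / (BYTES_PER_SAMPLE * sample_rate)


def last_audible_second(pcm: bytes, sample_rate: int = SAMPLE_RATE,
                        threshold: int = 200, window_s: float = 0.1) -> float:
    """Time (in seconds) at which the last non-silent window ends.

    Scans backwards in windows of `window_s` seconds and returns the end of
    the first window that contains a sample louder than `threshold`.
    Returns 0.0 if the whole buffer is silent.
    """
    samples = array.array("h")
    # Drop a dangling odd byte, if any, so the buffer is a whole number of samples.
    samples.frombytes(pcm[: len(pcm) - len(pcm) % BYTES_PER_SAMPLE])
    if sys.byteorder == "big":
        samples.byteswap()  # the data is little-endian; fix it up on big-endian hosts
    window = max(1, int(sample_rate * window_s))
    end = len(samples)
    while end > 0:
        start = max(0, end - window)
        chunk = samples[start:end]
        if max(chunk) > threshold or min(chunk) < -threshold:
            return end / sample_rate
        end = start
    return 0.0


def looks_truncated(pcm: bytes, max_trailing_silence: float = 5.0) -> bool:
    """True if the audio ends with more than `max_trailing_silence` seconds of silence."""
    trailing = pcm_duration_seconds(pcm) - last_audible_second(pcm)
    return trailing > max_trailing_silence
```

A caller could retry the request when `looks_truncated(pcm)` returns `True`, since repeating the same request reportedly yields normal audio.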
Yes, this is a persistent problem. I actually switched to ElevenLabs because I couldn’t solve it and it was driving me crazy. I regularly got 10 minutes and 55 seconds of audio, while at other times the exact same code, script, and text would work, so it doesn’t seem to have anything to do with the code or the input text.
Is there any update on this issue? I’m having the same problem with gemini-2.5-pro-preview-tts. I wonder if all those minutes of silence are billed as useful output tokens…
Not sure, but it seems like my issue is related. I am using the GenerateContentStream API in Go and logging the token UsageMetadata that I get when the stream is complete. My text input is 342 tokens and the output is approximately 2,500 tokens, which is just over a minute of audio, as expected. However, for about a week ago, on Nov 13, I see a charge of $135 for 9.7M output tokens! I am on the free trial, so nothing was actually billed, but this is concerning. I was testing things and could not have run the test more than 30-50 times, so no more than 100-125k output tokens. That’s roughly 100x what I should have been charged. Since I am streaming, the API call ends and there is no additional silence at the end, but it looks like GCP is still charging internally for hidden tokens.
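To compare logged usage with the billing report, the per-request token counts can be accumulated on the client side. This is only a rough sketch using the figures from the post above: `UsageLog` and `billed_to_logged_ratio` are hypothetical helpers, and the counts passed to `record` are assumed to come from the prompt and candidate token counts in the usage metadata of each completed stream.

```python
from dataclasses import dataclass


@dataclass
class UsageLog:
    """Running total of token counts reported by the API (hypothetical helper)."""
    requests: int = 0
    prompt_tokens: int = 0
    output_tokens: int = 0

    def record(self, prompt_tokens: int, output_tokens: int) -> None:
        """Add the counts from one completed request."""
        self.requests += 1
        self.prompt_tokens += prompt_tokens
        self.output_tokens += output_tokens


def billed_to_logged_ratio(billed_output_tokens: int, log: UsageLog) -> float:
    """How many times larger the billed output is than what the client logged."""
    if log.output_tokens == 0:
        raise ValueError("no output tokens logged yet")
    return billed_output_tokens / log.output_tokens


# Upper-bound figures from the post: 50 runs of about 2,500 output tokens each.
log = UsageLog()
for _ in range(50):
    log.record(prompt_tokens=342, output_tokens=2500)

ratio = billed_to_logged_ratio(9_700_000, log)
print(f"{log.output_tokens} output tokens logged, billed/logged ratio {ratio:.1f}x")
```

With the upper estimate of 125k logged output tokens the ratio is about 78x, and with the lower estimate of 100k it is 97x, so the billed amount is close to two orders of magnitude above the logged usage either way.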
Hi @user2794,
The audio gets generated successfully with the provided prompt on my end.
To help troubleshoot, could you please share the relevant code snippet you are using?
I encountered the same silence issue in both gemini-2.5-pro-preview-tts and gemini-2.5-flash-preview-tts, where the audio starts normally and then becomes silent.
After testing, the issue stopped once I removed my custom temperature value (returned to default) and controlled output only using topK / topP. With that setup, audio generation has been stable on both models.
This suggests the silence behavior may be triggered specifically when a non-default temperature is used.
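For anyone who wants to try the same setup, here is a rough sketch of a REST-style request body that leaves `temperature` unset and only passes `topP` / `topK`. The helper name `build_tts_request`, the voice name `Kore`, and the sampling values are assumptions for illustration; whether this actually avoids the silence has not been confirmed beyond the observation above.

```python
import json
from typing import Optional


def build_tts_request(text: str, voice_name: str = "Kore",
                      top_p: Optional[float] = None,
                      top_k: Optional[int] = None) -> dict:
    """Build a generateContent request body for TTS without a temperature field.

    Hypothetical helper: temperature is deliberately never set, so the
    model default applies. topP / topK are added only when provided.
    """
    generation_config = {
        "responseModalities": ["AUDIO"],
        "speechConfig": {
            "voiceConfig": {
                "prebuiltVoiceConfig": {"voiceName": voice_name}
            }
        },
    }
    if top_p is not None:
        generation_config["topP"] = top_p
    if top_k is not None:
        generation_config["topK"] = top_k
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": generation_config,
    }


# Illustrative values only; tune topP / topK for your own use case.
body = build_tts_request("Say cheerfully: have a wonderful day!", top_p=0.95, top_k=40)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON payload with any HTTP client or translated to the equivalent SDK configuration.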