Gemini 2.5 Pro Preview TTS: Inconsistent Voice and Tone Output

Anbu_Studioz · October 30, 2025, 8:17am

I’m having an issue with the Gemini 2.5 Pro Preview TTS model. When I send a single API request with the same text, selected voice (user name), and temperature, the generated audio sometimes changes in tone, and occasionally the voice sounds slightly different too.

Shivam_Singh2 · November 19, 2025, 12:54pm

Hi @Anbu_Studioz,

Thank you for bringing this to our attention.
Could you please share the full payload details along with a sample of the code that you are using? We would like to reproduce the issue.

Arthur_morgan · May 31, 2026, 9:46am

Hi @Shivam_Singh2,

I am experiencing this exact same issue with the Gemini TTS endpoints. In our application, when we send technical text inputs, the audio style processor destabilizes mid-output, causing severe pitch distortion, drops in tone, and an unexpected gender flip (the male voice profile shifts completely into a female voice register).

Per Google’s official speech generation limitations documentation, this seems to be a variation of the known “Voice inconsistency with prompt instructions” bug. We’ve tested both prompt optimization and application-side text-chunking (sentence slicing), but the stateless nature of the requests causes the voice profile to randomly re-initialize its acoustic parameters across sequential chunks, creating a highly disjointed “multiverse of voices” effect.
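For readers who want to try the sentence-slicing approach described above, here is a minimal sketch of one way to do it. It is not the poster's actual implementation: the helper names (`split_sentences`, `build_chunk_prompts`), the `max_chars` limit, and the shortened preamble are all hypothetical. The idea is only that every stateless request carries an identical instruction preamble, so the chunks differ in text alone. This does not guarantee a stable voice across chunks.

```python
import re

# Hypothetical shortened preamble; in practice this would be the full
# instruction text the application already sends with each request.
STYLE_PREAMBLE = "Speak the following text naturally as speech.\n\nText to speak:\n"


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?' (a naive splitter)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_chunk_prompts(text: str, max_chars: int = 200,
                        preamble: str = STYLE_PREAMBLE) -> list[str]:
    """Group whole sentences into chunks of at most max_chars where possible,
    then prefix each chunk with the same preamble."""
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [preamble + chunk for chunk in chunks]
```

A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence. Each returned string would be sent as the `contents` of a separate request with the same voice configuration.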

Below are our exact system details, code implementation, and reconstructed API payload for replication.

Environment & Configuration Details

Target Model Endpoint: gemini-2.5-flash-preview-tts
Voice Preset Profile: Puck (Male)
Generation Parameters: Default system values (No explicit temperature, top_p, or top_k are defined in the config)

Reconstructed E2E API Request Payload

{
  "model": "gemini-2.5-flash-preview-tts",
  "contents": "Speak the following text naturally as speech. Follow these guidelines:\n- Language: English\n- For multilingual text (mixing English with Hindi, Punjabi, Tamil...), pronounce each word in its native language naturally\n- Ignore and skip over special characters like quotes, asterisks, hashtags...\n- Convert numbers to their word equivalents\n- Maintain natural pauses at commas and periods\n- Use appropriate intonation and emotion based on context\n\nText to speak:\nCan you describe a time when you used boundary value analysis in manual testing, and explain how it helped you identify defects or improve test coverage?",
  "config": {
    "response_modalities": ["AUDIO"],
    "speech_config": {
      "voice_config": {
        "prebuilt_voice_config": {
          "voice_name": "Puck"
        }
      }
    }
  }
}
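The payload above can be sent from Python roughly as follows. This is a sketch for reproduction purposes, not the poster's code. It assumes the `google-genai` package is installed, that an API key is available in the environment, and that the audio comes back as raw 16-bit mono PCM at 24 kHz; the function names are hypothetical.

```python
import wave


def build_tts_config(voice_name: str = "Puck") -> dict:
    """Return the same config structure as the payload above."""
    return {
        "response_modalities": ["AUDIO"],
        "speech_config": {
            "voice_config": {
                "prebuilt_voice_config": {"voice_name": voice_name}
            }
        },
    }


def save_pcm_as_wav(path: str, pcm: bytes, rate: int = 24000,
                    channels: int = 1, sample_width: int = 2) -> None:
    """Wrap raw PCM bytes in a WAV container so the audio can be played.
    The defaults (24 kHz, mono, 16-bit) are an assumption about the output."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)


def synthesize(text: str, voice_name: str = "Puck",
               model: str = "gemini-2.5-flash-preview-tts") -> bytes:
    """Send one TTS request and return the raw audio bytes."""
    # Imported here so the helpers above work without the SDK installed.
    from google import genai

    client = genai.Client()  # assumes the API key is set in the environment
    response = client.models.generate_content(
        model=model,
        contents=text,
        config=build_tts_config(voice_name),
    )
    return response.candidates[0].content.parts[0].inline_data.data
```

Calling `synthesize` several times with the identical prompt and saving each result with `save_pcm_as_wav` under a different file name gives a set of files that can be compared by ear for the tone and voice changes described in this thread.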

Any insights on how to enforce voice consistency or stabilize the speaker profile across long-form/technical token payloads on the preview tier would be highly appreciated. Thank you!

