Gemini 3.1 Flash TTS SSE sometimes returns exactly 20s / 1,280,000 base64 chars and truncated audio

Soro · May 14, 2026, 11:21am

Hi,

I’m using Gemini TTS with the SSE streaming endpoint:

POST https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-tts-preview:streamGenerateContent?alt=sse

Request shape is roughly:

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Hebrew text prompt here..."
        }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["AUDIO"],
    "speechConfig": {
      "voiceConfig": {
        "prebuiltVoiceConfig": {
          "voiceName": "Kore"
        }
      }
    }
  }
}

Most calls work correctly, but sometimes I get HTTP 200 with only a single audio chunk. The audio is valid but truncated.

Example response, API key removed:

endpoint: streamGenerateContent?alt=sse
url: https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-tts-preview:streamGenerateContent?alt=sse&key=<REDACTED>
http_status: 200

Response headers:
{
  "Content-Type": "text/event-stream",
  "Content-Disposition": "attachment",
  "Vary": "Origin, X-Origin, Referer",
  "Transfer-Encoding": "chunked",
  "Date": "Thu, 14 May 2026 10:52:34 GMT",
  "Server": "scaffolding on HTTPServer2",
  "X-XSS-Protection": "0",
  "X-Frame-Options": "SAMEORIGIN",
  "X-Content-Type-Options": "nosniff",
  "Server-Timing": "gfet4t7; dur=7309",
  "Alt-Svc": "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000"
}

SSE data:
data: {"candidates":[{"content":{"parts":[{"inlineData":{"mimeType":"audio/l16; rate=24000; channels=1","data":"<BASE64_AUDIO_REDACTED chars=1280000>"}}],"role":"model"},"index":0}],"usageMetadata":{"promptTokenCount":217,"candidatesTokenCount":640,"totalTokenCount":857,"promptTokensDetails":[{"modality":"TEXT","tokenCount":217}],"candidatesTokensDetails":[{"modality":"AUDIO","tokenCount":640}],"serviceTier":"standard"},"modelVersion":"gemini-3.1-flash-tts-preview","responseId":"a6kFauDlAozunsEPzaPjkAw"}

data: {"candidates":[{"content":{},"finishReason":"OTHER","index":0}],"usageMetadata":{"promptTokenCount":217,"candidatesTokenCount":640,"totalTokenCount":857,"promptTokensDetails":[{"modality":"TEXT","tokenCount":217}],"candidatesTokensDetails":[{"modality":"AUDIO","tokenCount":640}],"serviceTier":"standard"},"modelVersion":"gemini-3.1-flash-tts-preview","responseId":"a6kFauDlAozunsEPzaPjkAw"}

The suspicious pattern is:

inlineData.data base64 length = 1,280,000 chars
decoded PCM bytes ≈ 960,000 (1,280,000 × 3/4)
audio format = audio/l16; rate=24000; channels=1
duration = 960,000 / (24,000 * 2) = exactly 20.00 seconds
candidatesTokenCount = 640
audio_chunks = 1
finishReason = OTHER
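
The arithmetic above, as a small Python sketch. It assumes 16-bit (2-byte) samples, which is what audio/l16 denotes, and takes the sample rate and channel count from the mimeType; `pcm_duration_seconds` is an illustrative helper name.

```python
def pcm_duration_seconds(
    base64_chars: int,
    sample_rate: int = 24000,
    channels: int = 1,
    bytes_per_sample: int = 2,
    padding_chars: int = 0,
) -> float:
    """Duration of raw PCM audio given the length of its base64 encoding."""
    # Every 4 base64 characters encode 3 bytes; each '=' padding char removes one byte.
    decoded_bytes = base64_chars // 4 * 3 - padding_chars
    return decoded_bytes / (sample_rate * channels * bytes_per_sample)


duration = pcm_duration_seconds(1_280_000)
print(duration)        # 20.0
print(640 / duration)  # 32.0 audio tokens per second in this one response
```

The 32 tokens per second figure is only what this single response implies, not a documented rate.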

When this happens and there is no second inlineData.data event after it, the generated audio is usually cut off around 20 seconds. If I retry the exact same request, I often get a longer response with two audio chunks and finishReason: STOP, for example around 29–31 seconds, and the full text is spoken.

Questions:

1. Is 1,280,000 base64 chars / 20.00 seconds / 640 audio tokens a known internal chunk limit or partial-response behavior for Gemini TTS SSE?
2. Is finishReason: OTHER expected for successful TTS streams, or should it be treated as a retry signal when the response has only one 20-second chunk?
3. Is there a recommended way to detect this reliably without running speech-to-text verification on the returned audio?
4. Should the client retry automatically when it sees this exact pattern?

Currently I treat this as suspicious and verify the audio transcript before accepting it, but I’d like to know whether there is an official signal or recommended client behavior for this case.
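
A rough Python sketch of the client-side check being asked about, in case it helps the discussion. This is a heuristic built only from the pattern observed above (finishReason other than STOP, or a single chunk of exactly 960,000 PCM bytes), not an official signal; `looks_truncated`, `synthesize_with_retry`, and the `synthesize` callable are hypothetical names.

```python
from typing import Callable, Optional, Tuple

# 20 s of 24 kHz mono 16-bit PCM, the size seen in the truncated responses.
SUSPECT_PCM_BYTES = 960_000

Result = Tuple[bytes, int, Optional[str]]  # (pcm, audio_chunks, finish_reason)


def looks_truncated(pcm_len: int, audio_chunks: int, finish_reason: Optional[str]) -> bool:
    """Heuristic only: flag responses matching the pattern described above."""
    if finish_reason != "STOP":
        return True
    return audio_chunks == 1 and pcm_len == SUSPECT_PCM_BYTES


def synthesize_with_retry(
    synthesize: Callable[[], Result], max_attempts: int = 3
) -> Tuple[Result, bool]:
    """Call `synthesize` until a response passes the heuristic.

    Returns the last result and whether it passed; a caller may still want
    transcript verification when the flag is False.
    """
    result = synthesize()
    for _ in range(max_attempts - 1):
        pcm, chunks, reason = result
        if not looks_truncated(len(pcm), chunks, reason):
            return result, True
        result = synthesize()
    pcm, chunks, reason = result
    return result, not looks_truncated(len(pcm), chunks, reason)
```

Here `synthesize` would wrap the actual HTTP call and return the decoded audio, the number of audio chunks, and the final finishReason.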

Thanks!
