gemini-3.1-flash-tts-preview: streamGenerateContent returns truncated audio with finishReason: OTHER beyond ~60s, while non-streaming generateContent works

Summary: On gemini-3.1-flash-tts-preview, the SSE streaming endpoint (:streamGenerateContent?alt=sse) returns partial audio with finishReason: OTHER (HTTP 200) once the generation exceeds roughly 60s of audio. This happens intermittently at ~70-90s and on every trial from ~106s. The same prompt sent to non-streaming :generateContent returns the full audio with finishReason: STOP every time. The truncated responses are still billed for AUDIO tokens, and no error is surfaced to the client.

Repro (raw REST, single-speaker, fr-FR voice “Leda”). Same request body, only the endpoint differs:

| Text length | Approx. audio | :streamGenerateContent (3 trials) | :generateContent |
|---|---|---|---|
| 50 words | ~20s | STOP / STOP / STOP | STOP |
| 100 words | ~40s | STOP / STOP / STOP | STOP |
| 150 words | ~57s | STOP / STOP / STOP | STOP |
| 200 words | ~70s | OTHER / STOP / STOP | STOP |
| 250 words | ~89s | OTHER / STOP / STOP | STOP |
| 300 words | ~106s | OTHER / OTHER / OTHER | STOP |
| 350+ words | ~125s | OTHER / OTHER / OTHER | STOP (full ~136s) |

Streaming starts truncating once the audio passes ~60-70s and fails on every trial from ~106s. Non-streaming shows no such cliff. Failed streams deliver one or a few PCM chunks and then end with finishReason: OTHER and HTTP 200.
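Because a truncated stream still ends with HTTP 200, the client has to inspect the payload itself to notice the failure. Below is a minimal sketch of such a check. It assumes the SSE body is read line by line and that the audio is raw 16-bit mono PCM at 24 kHz (the format the TTS docs' examples write to WAV). `collect_sse_audio`, `audio_seconds` and `is_truncated` are hypothetical helper names, not part of any SDK.

```python
import base64
import json
from typing import Iterable, Optional, Tuple

# Assumption: TTS output is raw 16-bit (2-byte) mono PCM at 24 kHz.
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2


def collect_sse_audio(lines: Iterable[str]) -> Tuple[bytes, Optional[str]]:
    """Concatenate inlineData PCM from SSE `data:` lines; return (pcm, last finishReason)."""
    pcm = bytearray()
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and non-data SSE fields
        chunk = json.loads(line[len("data:"):])
        for cand in chunk.get("candidates", []):
            for part in cand.get("content", {}).get("parts", []):
                inline = part.get("inlineData")
                if inline and "data" in inline:
                    pcm += base64.b64decode(inline["data"])
            if cand.get("finishReason"):
                finish_reason = cand["finishReason"]
    return bytes(pcm), finish_reason


def audio_seconds(pcm: bytes) -> float:
    """Duration of the decoded PCM under the 24 kHz / 16-bit mono assumption."""
    return len(pcm) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def is_truncated(finish_reason: Optional[str]) -> bool:
    # HTTP 200 alone is not proof of success; only STOP indicates a complete generation.
    return finish_reason != "STOP"
```

Treating anything other than STOP as a failure is a deliberately conservative choice, since OTHER carries no further detail about what went wrong.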

The confusing part: the Gemini API TTS docs state “TTS does not support streaming” under Limitations, yet :streamGenerateContent accepts the request, returns HTTP 200, and bills AUDIO tokens, only to deliver truncated output. What is the supported production path for long-form streaming TTS?

Environment: model gemini-3.1-flash-tts-preview; reproduced on both Vertex AI (generateContentStream) and the Gemini Developer API (streamGenerateContent?alt=sse); single speaker; reproduces both with temperature omitted and with temperature set to 0.6.
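For reference, the Developer API request shape used in the trials can be sketched as follows. This is a minimal sketch that assumes the public v1beta REST base URL and the documented TTS `speechConfig` layout. `build_tts_request` is a hypothetical helper, and the Vertex AI variant (different URL and auth) is not shown.

```python
from typing import Optional, Tuple

# Assumption: public Gemini Developer API v1beta base URL (Vertex AI uses a different URL and auth).
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"
MODEL = "gemini-3.1-flash-tts-preview"


def build_tts_request(text: str, voice: str = "Leda", streaming: bool = False,
                      temperature: Optional[float] = None) -> Tuple[str, dict]:
    """Return (url, body) for one single-speaker TTS call.

    The body is identical for both endpoints; only the method suffix differs.
    """
    generation_config = {
        "responseModalities": ["AUDIO"],
        "speechConfig": {
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
        },
    }
    if temperature is not None:  # the truncation reproduces with and without this field
        generation_config["temperature"] = temperature
    body = {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": generation_config,
    }
    method = "streamGenerateContent?alt=sse" if streaming else "generateContent"
    return f"{BASE_URL}/{MODEL}:{method}", body
```

The body is POSTed as JSON with the API key in the `x-goog-api-key` header. Keeping the body construction in one place guarantees that the two endpoints really receive the same payload.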

Impact: this is a production museum audio-guide product with long-form narration, and we cannot ship the streaming path. Non-streaming works, but a single ~136s generation takes ~77s of wall-clock time before any audio is returned, which is too slow for interactive playback. So today neither path is viable for narration longer than about a minute.

Related reports:

Could the Gemini API / TTS team confirm whether streaming TTS is supported, and route the truncation issue to the right owners? Happy to share full request/response payloads and responseIds.