I am calling `gemini-3.1-flash-tts-preview` for Hebrew TTS generation.
The API returns HTTP 200 and reports generated AUDIO tokens, but the candidate content is empty and the finishReason is `OTHER`.
This makes the request look successful at the HTTP level, but there is no usable audio/content in the response.
Model
`gemini-3.1-flash-tts-preview`
Issue
The response contains:
- HTTP status: `200`
- `candidates[0].content`: `{}`
- `finishReason`: `OTHER`
- `candidatesTokensDetails`: includes `AUDIO`
- `modelVersion`: `gemini-3.1-flash-tts-preview`
Prompt
I’m sending a Hebrew bedtime-story TTS prompt with voice/performance instructions:
# AUDIO PROFILE: A
## "The Gentle"
## THE SCENE: A Quiet Bedroom at Night
### DIRECTOR'S NOTES
Style: Soft, intimate, warm Hebrew bedtime storytelling. Gentle vocal smile, never theatrical.
Pace: Slow and calm bedtime pace, with natural pauses between sentences and paragraphs.
Accent: Natural Israeli Hebrew.
Volume: Soft and close, like a parent reading beside a child's bed.
Performance: Subtle, safe, soothing, gradually sleepier toward the end.
#### TRANSCRIPT
[softly] [medium pause]
בערב שקט אחד, כשהשמיים נצבעו בכחול עמוק והכוכבים התחילו לנצנץ,
ישב יואב ליד החלון והביט החוצה בשקט.
על המדף לידו עמד פנס קטן,
והאור שלו רעד בעדינות על הקיר,
כאילו גם הוא מתכונן להירדם.
[curiosity]
פתאום הבחין יואב באור קטן ורחוק,
אור חלש שנדלק וכבה ליד קצה השביל.
Actual response
{
  "candidates": [
    {
      "content": {},
      "finishReason": "OTHER",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 258,
    "candidatesTokenCount": 1385,
    "totalTokenCount": 1643,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 258
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 1385
      }
    ],
    "serviceTier": "standard"
  },
  "modelVersion": "gemini-3.1-flash-tts-preview",
  "responseId": "SuMAarXfJMSokdUPxd2PmA0"
}
Expected behavior
The API should return usable audio content, or return a clear error / safety / validation message explaining why no audio content was returned.
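Until the API reports this case explicitly, the silent HTTP 200 can at least be turned into a clear error on the client side. The following is a minimal Python sketch, assuming the response has already been parsed into a dict with the field names shown in the responses in this post. `EmptyAudioError` and `check_tts_response` are hypothetical names for illustration, not part of any SDK.

```python
class EmptyAudioError(RuntimeError):
    """Hypothetical error: the request succeeded at HTTP level but carried no audio."""


def check_tts_response(response: dict) -> dict:
    """Return the first non-empty inlineData dict, or raise a descriptive error."""
    candidates = response.get("candidates") or []
    if not candidates:
        raise EmptyAudioError("response has no candidates")

    candidate = candidates[0]
    # `content` may be missing or `{}`, so fall back to an empty parts list.
    parts = (candidate.get("content") or {}).get("parts") or []
    for part in parts:
        inline = part.get("inlineData")
        if inline and inline.get("data"):
            return inline

    # No audio found: collect the fields that help when reporting the failure.
    usage = response.get("usageMetadata") or {}
    details = usage.get("candidatesTokensDetails") or []
    audio_tokens = sum(
        d.get("tokenCount", 0) for d in details if d.get("modality") == "AUDIO"
    )
    raise EmptyAudioError(
        "no audio in response: "
        f"finishReason={candidate.get('finishReason')!r}, "
        f"AUDIO tokens reported={audio_tokens}, "
        f"responseId={response.get('responseId')!r}"
    )
```

Calling this on every response makes the empty-content case fail loudly, with the `finishReason`, the reported AUDIO token count, and the `responseId` in the message.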
Questions
- What does `finishReason: OTHER` mean in this TTS context?
- Why are AUDIO tokens counted if `content` is empty?
- Is Hebrew TTS fully supported for `gemini-3.1-flash-tts-preview`?
- Are bracketed performance tags like `[softly]`, `[medium pause]`, and `[curiosity]` supported, ignored, or likely to cause empty output?
- Is there a recommended prompt format for Hebrew long-form TTS with style instructions?
Additional data point: successful response example
I also have a successful response from the same TTS flow, where the API returns usable audio content as expected.
In the successful case, the response includes:
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "inlineData": {
              "mimeType": "audio/L16; rate=24000; channels=1",
              "data": "AAAA..."
            }
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 315,
    "candidatesTokenCount": 2572,
    "totalTokenCount": 2887,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 315
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 2572
      }
    ],
    "serviceTier": "standard"
  },
  "modelVersion": "gemini-3.1-flash-tts-preview",
  "responseId": ".........."
}
So the expected response shape for my use case is:
- `content.parts[0].inlineData`
- `mimeType`: `audio/L16; rate=24000; channels=1`
- non-empty base64 audio `data`
- `finishReason`: `STOP`
The problematic case instead returns:
- HTTP status: `200`
- `candidatesTokensDetails`: includes `AUDIO`
- `content`: `{}`
- `finishReason`: `OTHER`
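For reference, this is roughly how the `inlineData` from the successful case can be turned into a playable file. It is a minimal sketch using only the Python standard library. `parse_l16_mime` and `inline_data_to_wav` are hypothetical helper names. It assumes the decoded bytes are 16-bit PCM samples that can be written into a WAV container unchanged, and that `channels` falls back to 1 when the MIME type omits it.

```python
import base64
import io
import wave


def parse_l16_mime(mime_type: str) -> tuple[int, int]:
    """Parse 'audio/L16; rate=24000; channels=1' into (sample_rate, channels)."""
    kind, *params = [p.strip() for p in mime_type.split(";")]
    if kind.lower() != "audio/l16":
        raise ValueError(f"unexpected mime type: {mime_type!r}")
    values = dict(p.split("=", 1) for p in params if "=" in p)
    if "rate" not in values:
        raise ValueError(f"mime type has no rate parameter: {mime_type!r}")
    # Assumption: a missing channels parameter means mono.
    return int(values["rate"]), int(values.get("channels", 1))


def inline_data_to_wav(inline_data: dict) -> bytes:
    """Wrap the base64 PCM payload of an inlineData dict in a WAV header."""
    rate, channels = parse_l16_mime(inline_data["mimeType"])
    pcm = base64.b64decode(inline_data["data"])
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # L16 means 16-bit samples, i.e. 2 bytes each
        wav.setframerate(rate)
        # Assumption: samples are written as-is, with no byte-order conversion.
        wav.writeframes(pcm)
    return buf.getvalue()
```

With a successful response, `inline_data_to_wav(response["candidates"][0]["content"]["parts"][0]["inlineData"])` returns bytes that can be written to a `.wav` file. With the problematic response, the `parts` lookup fails because `content` is `{}`.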