[BUG] gemini-3.1-flash-tts-preview returns HTTP 200 with empty content and finishReason OTHER for Hebrew TTS prompt

I am calling gemini-3.1-flash-tts-preview for Hebrew TTS generation.
The API returns HTTP 200 and reports generated AUDIO tokens, but the candidate content is empty and the finishReason is `OTHER`.

This makes the request look successful at the HTTP level, but there is no usable audio/content in the response.

Model

gemini-3.1-flash-tts-preview

Issue

The response contains:

  • HTTP status: `200`
  • `candidates[0].content`: `{}`
  • `finishReason`: `OTHER`
  • `candidatesTokensDetails`: includes `AUDIO`
  • `modelVersion`: `gemini-3.1-flash-tts-preview`
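
Because the HTTP status is 200, a client has to inspect the candidate itself to notice this case. The following is a minimal sketch, assuming the response body has already been parsed into a dict; the helper name `has_usable_audio` is hypothetical and not part of any SDK.

```python
def has_usable_audio(response: dict) -> bool:
    """Return True only if the first candidate carries non-empty inline audio data."""
    candidates = response.get("candidates") or []
    if not candidates:
        return False
    content = candidates[0].get("content") or {}
    for part in content.get("parts") or []:
        inline = part.get("inlineData") or {}
        if inline.get("data"):
            return True
    return False


# Shape of the failing response described in this report (usageMetadata omitted).
failing = {"candidates": [{"content": {}, "finishReason": "OTHER", "index": 0}]}
print(has_usable_audio(failing))  # False
```

A check like this lets the caller treat the empty-content case as a failure instead of passing an empty result downstream.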

Prompt

I’m sending a Hebrew bedtime-story TTS prompt with voice/performance instructions:


# AUDIO PROFILE: Nira
## "The Gentle Bedtime Storyteller"

## THE SCENE: A Quiet Bedroom at Night
It is bedtime in a calm, dim room. The listener is a child who is safe, comfortable, and slowly getting sleepy. The story should feel warm, protective, and peaceful.

### DIRECTOR'S NOTES
Style: Soft, intimate, warm Hebrew bedtime storytelling. Gentle vocal smile, never theatrical.
Pace: Slow and calm bedtime pace, with natural pauses between sentences and paragraphs.
Accent: Natural Israeli Hebrew.


#### TRANSCRIPT

[softly] [medium pause]
בערב שקט אחד, כשהשמיים נצבעו בכחול עמוק והכוכבים התחילו לנצנץ,
ישב יואב ליד החלון והביט החוצה בשקט.

על המדף לידו עמד פנס קטן,
והאור שלו רעד בעדינות על הקיר,
כאילו גם הוא מתכונן להירדם.

[curiosity]
פתאום הבחין יואב באור קטן ורחוק,
אור חלש שנדלק וכבה ליד קצה השביל.


Actual response

HTTP status: 200

Headers:
{
  "X-Gemini-Service-Tier": "standard",
  "Content-Type": "application/json; charset=UTF-8",
  "Vary": "Origin, X-Origin, Referer",
  "Content-Encoding": "gzip",
  "Date": "Mon, 11 May 2026 10:41:03 GMT",
  "Server": "scaffolding on HTTPServer2",
  "X-XSS-Protection": "0",
  "X-Frame-Options": "SAMEORIGIN",
  "X-Content-Type-Options": "nosniff",
  "Server-Timing": "gfet4t7; dur=20393",
  "Alt-Svc": "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000",
  "Transfer-Encoding": "chunked"
}

Body:
{
  "candidates": [
    {
      "content": {},
      "finishReason": "OTHER",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 267,
    "candidatesTokenCount": 1199,
    "totalTokenCount": 1466,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 267
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 1199
      }
    ],
    "serviceTier": "standard"
  },
  "modelVersion": "gemini-3.1-flash-tts-preview",
  "responseId": "KrIBaufXPOu0kdUP9KSwmQk"
}

Another example:

# AUDIO PROFILE: Nira
## "The Gentle Bedtime Storyteller"

## THE SCENE: A Quiet Bedroom at Night
It is bedtime in a calm, dim room. The listener is a child who is safe, comfortable, and slowly getting sleepy. The story should feel warm, protective, and peaceful.

### DIRECTOR'S NOTES
Style: Soft, intimate, warm Hebrew bedtime storytelling. Gentle vocal smile, never theatrical.
Pace: Slow and calm bedtime pace, with natural pauses between sentences and paragraphs.
Accent: Natural Israeli Hebrew.


#### TRANSCRIPT

[softly] [medium pause]
בערב שקט אחד, כשהשמיים נצבעו בכחול עמוק והכוכבים התחילו לנצנץ,
ישב יואב ליד החלון והביט החוצה בשקט.

על המדף לידו עמד פנס קטן,
והאור שלו רעד בעדינות על הקיר,
כאילו גם הוא מתכונן להירדם.

Response:

HTTP status: 200

Headers:
{
  "X-Gemini-Service-Tier": "standard",
  "Content-Type": "application/json; charset=UTF-8",
  "Vary": "Origin, X-Origin, Referer",
  "Content-Encoding": "gzip",
  "Date": "Mon, 11 May 2026 10:43:51 GMT",
  "Server": "scaffolding on HTTPServer2",
  "X-XSS-Protection": "0",
  "X-Frame-Options": "SAMEORIGIN",
  "X-Content-Type-Options": "nosniff",
  "Server-Timing": "gfet4t7; dur=16086",
  "Alt-Svc": "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000",
  "Transfer-Encoding": "chunked"
}

Body:
{
  "candidates": [
    {
      "content": {},
      "finishReason": "OTHER",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 225,
    "candidatesTokenCount": 953,
    "totalTokenCount": 1178,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 225
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 953
      }
    ],
    "serviceTier": "standard"
  },
  "modelVersion": "gemini-3.1-flash-tts-preview",
  "responseId": "17IBav3ULOj7xN8Pv92J2Ac"
}

Expected behavior

The API should return usable audio content, or return a clear error / safety / validation message explaining why no audio content was returned.

Questions

  1. What does finishReason: OTHER mean in this TTS context?

  2. Why are AUDIO tokens counted if content is empty?

  3. How can I solve it?

Additional data point: successful response example

I also have a successful response from the same TTS flow, where the API returns usable audio content as expected.

In the successful case, the response includes:

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "inlineData": {
              "mimeType": "audio/L16; rate=24000; channels=1",
              "data": "AAAA..."
            }
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 315,
    "candidatesTokenCount": 2572,
    "totalTokenCount": 2887,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 315
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 2572
      }
    ],
    "serviceTier": "standard"
  },
  "modelVersion": "gemini-3.1-flash-tts-preview",
  "responseId": ".........."
}

So the expected response shape for my use case is:

content.parts[0].inlineData
mimeType: audio/L16; rate=24000; channels=1
non-empty base64 audio data
finishReason: STOP

The problematic case instead returns:

HTTP 200
candidatesTokensDetails with AUDIO
but content: {}
and finishReason: OTHER
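
For reference, this is roughly how I consume the successful shape, and where the failing shape breaks. It is a minimal sketch using only the Python standard library. The helper names `parse_l16_mime` and `inline_audio_to_wav` are hypothetical. It assumes `audio/L16` means 16-bit samples that can be written to a WAV file as-is; the byte order is an assumption worth verifying against real output.

```python
import base64
import io
import wave


def parse_l16_mime(mime_type: str) -> tuple[int, int]:
    """Extract (sample_rate, channels) from 'audio/L16; rate=24000; channels=1'."""
    params = {}
    for field in mime_type.split(";")[1:]:
        key, _, value = field.strip().partition("=")
        params[key] = value
    return int(params["rate"]), int(params["channels"])


def inline_audio_to_wav(response: dict) -> bytes:
    """Wrap the first candidate's inline PCM audio in a WAV container.

    Raises ValueError when the candidate has no audio part (the failing case).
    """
    candidate = response["candidates"][0]
    parts = candidate.get("content", {}).get("parts", [])
    if not parts or "inlineData" not in parts[0]:
        raise ValueError(f"no audio content; finishReason={candidate.get('finishReason')}")
    inline = parts[0]["inlineData"]
    rate, channels = parse_l16_mime(inline["mimeType"])
    pcm = base64.b64decode(inline["data"])
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # assumption: L16 = 16-bit samples, written unchanged
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


# The failing shape from this report raises instead of producing a WAV.
failing = {"candidates": [{"content": {}, "finishReason": "OTHER", "index": 0}]}
try:
    inline_audio_to_wav(failing)
except ValueError as err:
    print(err)  # no audio content; finishReason=OTHER
```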

Additional reproducible examples using curl.

Curl command

$ curl -i \
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-tts-preview:generateContent?key=$GEMINI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(jq -Rs '{
    contents: [
      {
        parts: [
          {
            text: .
          }
        ]
      }
    ],
    generationConfig: {
      responseModalities: ["AUDIO"],
      speechConfig: {
        voiceConfig: {
          prebuiltVoiceConfig: {
            voiceName: "Kore"
          }
        }
      }
    }
  }' story.txt)"
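
For anyone who prefers to reproduce this without curl and jq, the same request can be sketched in Python with only the standard library. The endpoint, payload fields, and voice name are copied from the curl command above. The function names `build_payload` and `generate` are hypothetical, and the network call is defined but not executed here.

```python
import json
import urllib.request

MODEL = "gemini-3.1-flash-tts-preview"
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"


def build_payload(prompt: str, voice_name: str = "Kore") -> dict:
    """Build the same JSON body that the jq filter in the curl command produces."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
            },
        },
    }


def generate(prompt: str, api_key: str) -> dict:
    """POST the prompt to the generateContent endpoint and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{URL}?key={api_key}",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage would be along the lines of `generate(open("story.txt", encoding="utf-8").read(), os.environ["GEMINI_API_KEY"])`, then checking `candidates[0]["finishReason"]` and whether `candidates[0]["content"]` is empty.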


Example 1: story.txt


$ cat story.txt 
# AUDIO PROFILE: Nira

## THE SCENE: A Quiet Bedroom at Night
It is bedtime in a calm, dim room.

### DIRECTOR'S NOTES
Style: Soft, intimate, warm Hebrew bedtime storytelling. Gentle vocal smile, never theatrical.
Accent: Natural Israeli Hebrew.


#### TRANSCRIPT

[softly] [medium pause]
ישב יואב ליד החלון והביט החוצה בשקט.


Response

HTTP/2 200 
x-gemini-service-tier: standard
content-type: application/json; charset=UTF-8
vary: X-Origin
vary: Referer
vary: Origin,Accept-Encoding
date: Wed, 13 May 2026 09:03:31 GMT
server: scaffolding on HTTPServer2
x-xss-protection: 0
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
server-timing: gfet4t7; dur=6207
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
accept-ranges: none

{
  "candidates": [
    {
      "content": {},
      "finishReason": "OTHER",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 94,
    "candidatesTokenCount": 218,
    "totalTokenCount": 312,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 94
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 218
      }
    ],
    "serviceTier": "standard"
  },
  "modelVersion": "gemini-3.1-flash-tts-preview",
  "responseId": "XT4EasXaD8PxnsEPkMne2Ak"
}


Example 2: story.txt


$ cat story.txt 
# AUDIO PROFILE: Nira

## THE SCENE: A Quiet Bedroom at Night
It is bedtime in a calm, dim room.

### DIRECTOR'S NOTES
Style: Soft, intimate, warm Hebrew bedtime storytelling. Gentle vocal smile, never theatrical.
Accent: Natural Israeli Hebrew.


#### TRANSCRIPT

[softly] [medium pause]
בערב שקט אחד, כשהשמיים נצבעו בכחול עמוק והכוכבים התחילו לנצנץ,
ישב יואב ליד החלון והביט החוצה בשקט.



Response

HTTP/2 200 
x-gemini-service-tier: standard
content-type: application/json; charset=UTF-8
vary: X-Origin
vary: Referer
vary: Origin,Accept-Encoding
date: Wed, 13 May 2026 09:01:50 GMT
server: scaffolding on HTTPServer2
x-xss-protection: 0
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
server-timing: gfet4t7; dur=10901
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
accept-ranges: none

{
  "candidates": [
    {
      "content": {},
      "finishReason": "OTHER",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 128,
    "candidatesTokenCount": 470,
    "totalTokenCount": 598,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 128
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 470
      }
    ],
    "serviceTier": "standard"
  },
  "modelVersion": "gemini-3.1-flash-tts-preview",
  "responseId": "8z0EapyaGd_0kdUPmPqQwQU"
}


Example 3: story.txt


$ cat story.txt 
# AUDIO PROFILE: Nira
## "The Gentle Bedtime Storyteller"

## THE SCENE: A Quiet Bedroom at Night
It is bedtime in a calm, dim room.

### DIRECTOR'S NOTES
Style: Soft, intimate, warm Hebrew bedtime storytelling. Gentle vocal smile, never theatrical.
Accent: Natural Israeli Hebrew.


#### TRANSCRIPT

[softly] [medium pause]
בערב שקט אחד, כשהשמיים נצבעו בכחול עמוק והכוכבים התחילו לנצנץ,
ישב יואב ליד החלון והביט החוצה בשקט.



Response

HTTP/2 200 
x-gemini-service-tier: standard
content-type: application/json; charset=UTF-8
vary: X-Origin
vary: Referer
vary: Origin,Accept-Encoding
date: Wed, 13 May 2026 08:31:40 GMT
server: scaffolding on HTTPServer2
x-xss-protection: 0
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
server-timing: gfet4t7; dur=9523
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
accept-ranges: none

{
  "candidates": [
    {
      "content": {},
      "finishReason": "OTHER",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 138,
    "candidatesTokenCount": 489,
    "totalTokenCount": 627,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 138
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 489
      }
    ],
    "serviceTier": "standard"
  },
  "modelVersion": "gemini-3.1-flash-tts-preview",
  "responseId": "4zYEar2kH9rYkdUP9KiSuAw"
}

Example 4: story.txt


$ cat story.txt 
# AUDIO PROFILE: Nira
## "The Gentle Bedtime Storyteller"

## THE SCENE: A Quiet Bedroom at Night
It is bedtime in a calm, dim room. The listener is a child who is safe, comfortable, and slowly getting sleepy. The story should feel warm, protective, and peaceful.

### DIRECTOR'S NOTES
Style: Soft, intimate, warm Hebrew bedtime storytelling. Gentle vocal smile, never theatrical.



#### TRANSCRIPT

[softly] [medium pause]
בערב שקט אחד, כשהשמיים נצבעו בכחול עמוק והכוכבים התחילו לנצנץ,
ישב יואב ליד החלון והביט החוצה בשקט.


Response

HTTP/2 200 
x-gemini-service-tier: standard
content-type: application/json; charset=UTF-8
vary: X-Origin
vary: Referer
vary: Origin,Accept-Encoding
date: Wed, 13 May 2026 07:06:15 GMT
server: scaffolding on HTTPServer2
x-xss-protection: 0
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
server-timing: gfet4t7; dur=10287
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
accept-ranges: none

{
  "candidates": [
    {
      "content": {},
      "finishReason": "OTHER",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 158,
    "candidatesTokenCount": 544,
    "totalTokenCount": 702,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 158
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 544
      }
    ],
    "serviceTier": "standard"
  },
  "modelVersion": "gemini-3.1-flash-tts-preview",
  "responseId": "3SIEao3tBfTk7M8PlLCYuAo"
}

Example 5: story.txt


$ cat story.txt 
# AUDIO PROFILE: Nira
## "The Gentle Bedtime Storyteller"

## THE SCENE: A Quiet Bedroom at Night
It is bedtime in a calm, dim room. The listener is a child who is safe, comfortable, and slowly getting sleepy. The story should feel warm, protective, and peaceful.

### DIRECTOR'S NOTES
Style: Soft, intimate, warm Hebrew bedtime storytelling. Gentle vocal smile, never theatrical.
Accent: Natural Israeli Hebrew.


#### TRANSCRIPT

[softly] [medium pause]
בערב שקט אחד, כשהשמיים נצבעו בכחול עמוק והכוכבים התחילו לנצנץ,
ישב יואב ליד החלון והביט החוצה בשקט.

Response

HTTP/2 200 
x-gemini-service-tier: standard
content-type: application/json; charset=UTF-8
vary: X-Origin
vary: Referer
vary: Origin,Accept-Encoding
date: Wed, 13 May 2026 06:08:51 GMT
server: scaffolding on HTTPServer2
x-xss-protection: 0
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
server-timing: gfet4t7; dur=10458
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
accept-ranges: none

{
  "candidates": [
    {
      "content": {},
      "finishReason": "OTHER",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 165,
    "candidatesTokenCount": 548,
    "totalTokenCount": 713,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 165
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 548
      }
    ],
    "serviceTier": "standard"
  },
  "modelVersion": "gemini-3.1-flash-tts-preview",
  "responseId": "aRUEau2aDaSkkdUPmYycuAU"
}

Example 6: story.txt


$ cat story.txt 
# AUDIO PROFILE: Nira
## "The Gentle Bedtime Storyteller"

## THE SCENE: A Quiet Bedroom at Night
It is bedtime in a calm, dim room. The listener is a child who is safe, comfortable, and slowly getting sleepy.

### DIRECTOR'S NOTES
Style: Soft, intimate, warm Hebrew bedtime storytelling. Gentle vocal smile, never theatrical.
Accent: Natural Israeli Hebrew.


#### TRANSCRIPT

[softly] [medium pause]
בערב שקט אחד, כשהשמיים נצבעו בכחול עמוק והכוכבים התחילו לנצנץ,
ישב יואב ליד החלון והביט החוצה בשקט.

Response

HTTP/2 200 
x-gemini-service-tier: standard
content-type: application/json; charset=UTF-8
vary: X-Origin
vary: Referer
vary: Origin,Accept-Encoding
date: Wed, 13 May 2026 06:14:14 GMT
server: scaffolding on HTTPServer2
x-xss-protection: 0
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
server-timing: gfet4t7; dur=11296
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
accept-ranges: none

{
  "candidates": [
    {
      "content": {},
      "finishReason": "OTHER",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 154,
    "candidatesTokenCount": 544,
    "totalTokenCount": 698,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 154
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 544
      }
    ],
    "serviceTier": "standard"
  },
  "modelVersion": "gemini-3.1-flash-tts-preview",
  "responseId": "qxYEav6VCPTk7M8PibCYuAo"
}
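
As a sanity check on the six curl responses above, the following sketch tabulates their reported token counts and checks whether `totalTokenCount` equals `promptTokenCount` plus `candidatesTokenCount` in each failing case. The figures are copied from the responses in this report; nothing else is assumed.

```python
# (promptTokenCount, candidatesTokenCount, totalTokenCount) for curl examples 1-6
failing_runs = [
    (94, 218, 312),
    (128, 470, 598),
    (138, 489, 627),
    (158, 544, 702),
    (165, 548, 713),
    (154, 544, 698),
]

for number, (prompt, audio, total) in enumerate(failing_runs, start=1):
    consistent = prompt + audio == total
    print(f"example {number}: prompt={prompt} audio={audio} total={total} consistent={consistent}")
```

In all six cases the totals add up, so AUDIO tokens are being counted in the usage metadata even though `content` comes back empty.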