Unstable but reproducible constrained generation errors with gemini-2.5-flash-lite-preview-06-17 using very simple prompts

Problem
We are running some early tests with gemini-2.5-flash-lite-preview-06-17 and hitting some peculiar behavior when using structured output, even for very simple toy examples.

Reproducible Example
We defined a simple response model as follows:

from pydantic import BaseModel, Field

class GeminiResponse(BaseModel):
    response: str = Field(description="The response from the Gemini model")

The user prompt is similarly basic:

user_prompt = "What is the capital of france?"

When we run the above with gemini-2.5-flash-lite-preview-06-17, the request hangs for roughly a minute or more before returning the following error indicating a structured output generation failure (note: we are using PydanticAI as a wrapper here, calling the Vertex AI endpoints for Gemini, but the error appears to be on the Gemini side):

UnexpectedModelBehavior: Content field missing from Gemini response, body:
candidates=[Candidate(content=Content(parts=None, role=None), citation_metadata=None, finish_message='Malformed function call: call:final_result{response:<ctrl46>The capital of France is', token_count=None, finish_reason=<FinishReason.MALFORMED_FUNCTION_CALL: 'MALFORMED_FUNCTION_CALL'>, url_context_metadata=None, avg_logprobs=None, grounding_metadata=None, index=None, logprobs_result=None, safety_ratings=None)] create_time=datetime.datetime(2025, 6, 20, 19, 1, 46, 250104, tzinfo=TzInfo(UTC)) response_id='GrBVaPihD-ijgLUPxIOokQs' model_version='gemini-2.5-flash-lite-preview-06-17' prompt_feedback=None usage_metadata=GenerateContentResponseUsageMetadata(cache_tokens_details=None, cached_content_token_count=None, candidates_token_count=None, candidates_tokens_details=None, prompt_token_count=27, prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=27)], thoughts_token_count=None, tool_use_prompt_token_count=None, tool_use_prompt_tokens_details=None, total_token_count=27, traffic_type=<TrafficType.ON_DEMAND: 'ON_DEMAND'>) automatic_function_calling_history=[] parsed=None

What is peculiar about this error is that it is reliably reproducible on my end, and notably disappears under any of the following conditions:

  1. Capitalizing ‘h’ in ‘What’ in the prompt to change it to “WHat is the capital of france?”
  2. Switching to “gemini-2.0-flash”
  3. Removing structured output generation as a requirement of the answer.

Upon making any of those changes, we receive the expected structured output (shown as text):

'{"response":"The capital of France is Paris."}'

Question
In the meantime we are continuing to test and compare 2.5-flash-lite against 2.0-flash, but is there any guidance on why such errors might happen even in simple cases, what rules we can follow in more complex prompt design to avoid them, or whether this is simply a general bug?
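
For what it's worth, one workaround we plan to try (untested so far, so take it as a sketch rather than a fix) is to skip the function-call-based structured output path entirely and request JSON directly through the google-genai SDK's response schema support; project and location below are placeholders:

from google import genai
from google.genai import types

# Placeholders: substitute your own Vertex AI project and location.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-flash-lite-preview-06-17",
    contents="What is the capital of france?",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=GeminiResponse,  # the Pydantic model defined above
    ),
)
print(response.text)  # JSON string conforming to the schema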


Hello! Welcome to the Forum!

You mentioned that simply changing one letter in the prompt makes the error disappear. I was just wondering, did you happen to make any other changes besides that one letter?

We are having the same problem when using structured outputs with gemini-2.5-flash-lite: it produces artifacts like <ctrl46>, <ctrl95>, etc. instead of quotation marks, and sometimes it omits a quotation mark entirely.
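
A crude stopgap (assuming, which may not always hold, that each <ctrlNN> token stands in for a missing double quote) is to patch the raw text before JSON parsing, something like:

import re

def strip_ctrl_tokens(text: str) -> str:
    # Replace <ctrlNN> artifacts with a double quote before handing the text
    # to a JSON parser; this is a workaround, not a fix, and assumes each
    # artifact took the place of a quotation mark.
    return re.sub(r"<ctrl\d+>", '"', text)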

Also experiencing similar issues with gemini-2.5-flash-lite: mostly just spam in the message content parts. This is with direct API requests.
I've shortened the thought blocks and removed some from the middle, but the response is essentially this pattern (including the double thought at the end):

"data": {
   "candidates": [
    {
     "content": {
      "role": "model",
      "parts": [
       {
        "text": "<ctrl46>"
       },
       {
        "text": "Okay, let me think through this. The user wants to know two...",
        "thought": true
       },
       {
        "text": "<ctrl46>"
       },
       {
        "text": "Okay, I understand. You're asking about your spending...",
        "thought": true
       },
       {
        "text": "<ctrl46>"
       },
       {
        "text": "Okay, I understand the request. I need to figure out my daily...",
        "thought": true
       },
       {
        "text": "<ctrl46>"
       },
       {
        "text": "I understand you're looking for a breakdown of...",
        "thought": true
       },
       {
        "text": "It looks like you're asking about...",
        "thought": true
       }
      ]
     },
     "finishReason": "STOP",
     "avgLogprobs": -209.95114135742188
    }
   ],
   "usageMetadata": {
    "promptTokenCount": 3735,
    "candidatesTokenCount": 8,
    "totalTokenCount": 7260,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
     {
      "modality": "TEXT",
      "tokenCount": 3735
     }
    ],
    "candidatesTokensDetails": [
     {
      "modality": "TEXT",
      "tokenCount": 8
     }
    ],
    "thoughtsTokenCount": 3517
   },
   "modelVersion": "gemini-2.5-flash-lite",
   "createTime": "2025-07-30T14:32:48.897953Z",
   "responseId": "EC2KaKHnNtqp2PgPhcbJiAM"
  }

Curiously, candidatesTokenCount is 8, and there were 8 <ctrl46> tokens in the full response.