Unstable but reproducible constrained generation errors with gemini-2.5-flash-lite-preview-06-17 using very simple prompts

Problem
We are running some early tests with gemini-2.5-flash-lite-preview-06-17 and hitting some peculiar behavior when using structured output, even for very simple toy examples.

Reproducible Example
We defined a simple response model as follows:

from pydantic import BaseModel, Field

class GeminiResponse(BaseModel):
    response: str = Field(description="The response from the Gemini model")

The user prompt is similarly basic:

user_prompt = "What is the capital of france?"

When we run the above with gemini-2.5-flash-lite-preview-06-17, the request hangs for roughly a minute or more before returning the following error indicating a structured output generation failure (note: we are using PydanticAI as a wrapper here, calling the Vertex AI endpoints for Gemini, but the error appears to be on the Gemini side):

UnexpectedModelBehavior: Content field missing from Gemini response, body:
candidates=[Candidate(content=Content(parts=None, role=None), citation_metadata=None, finish_message='Malformed function call: call:final_result{response:<ctrl46>The capital of France is', token_count=None, finish_reason=<FinishReason.MALFORMED_FUNCTION_CALL: 'MALFORMED_FUNCTION_CALL'>, url_context_metadata=None, avg_logprobs=None, grounding_metadata=None, index=None, logprobs_result=None, safety_ratings=None)] create_time=datetime.datetime(2025, 6, 20, 19, 1, 46, 250104, tzinfo=TzInfo(UTC)) response_id='GrBVaPihD-ijgLUPxIOokQs' model_version='gemini-2.5-flash-lite-preview-06-17' prompt_feedback=None usage_metadata=GenerateContentResponseUsageMetadata(cache_tokens_details=None, cached_content_token_count=None, candidates_token_count=None, candidates_tokens_details=None, prompt_token_count=27, prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=27)], thoughts_token_count=None, tool_use_prompt_token_count=None, tool_use_prompt_tokens_details=None, total_token_count=27, traffic_type=<TrafficType.ON_DEMAND: 'ON_DEMAND'>) automatic_function_calling_history=[] parsed=None

What is peculiar about this error is that it is reliably reproducible on my end, and notably disappears under any of the following conditions:

  1. Capitalizing ‘h’ in ‘What’ in the prompt to change it to “WHat is the capital of france?”
  2. Switching to “gemini-2.0-flash”
  3. Removing structured output generation as a requirement of the answer.

Upon making any of those changes, we receive the expected structured output (shown as text):

'{"response":"The capital of France is Paris."}'

Question
In the meantime we are continuing to test and compare 2.5-flash-lite against 2.0-flash, but is there any guidance on why such errors might happen even in simple cases, what rules we can follow in more complex prompt design to avoid them, or whether this is simply a general bug?
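
For what it's worth, one workaround we plan to try (untested so far, so take it as a sketch rather than a fix) is to skip the function-call-based structured output path entirely and request JSON directly through the google-genai SDK's response schema support; project and location below are placeholders:

from google import genai
from google.genai import types

# Placeholders: substitute your own Vertex AI project and location.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-flash-lite-preview-06-17",
    contents="What is the capital of france?",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=GeminiResponse,  # the Pydantic model defined above
    ),
)
print(response.text)  # JSON string conforming to the schema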


Hello! Welcome to the Forum!

You mentioned that simply changing one letter in the prompt makes the error disappear. I was just wondering, did you happen to make any other changes besides that one letter?

We are having the same problem when using structured outputs with gemini-2.5-flash-lite: it produces artifacts like <ctrl46>, <ctrl95>, etc. instead of quotation marks, and sometimes it omits a quotation mark entirely.
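
A crude stopgap (assuming, which may not always hold, that each <ctrlNN> token stands in for a missing double quote) is to patch the raw text before JSON parsing, something like:

import re

def strip_ctrl_tokens(text: str) -> str:
    # Replace <ctrlNN> artifacts with a double quote before handing the text
    # to a JSON parser; this is a workaround, not a fix, and assumes each
    # artifact took the place of a quotation mark.
    return re.sub(r"<ctrl\d+>", '"', text)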

Also experiencing similar issues with gemini-2.5-flash-lite: mostly just spam in the message content parts. This is with direct API requests.
I've shortened the thought blocks and removed some from the middle, but the response is essentially this pattern (including the double thought at the end):

"data": {
   "candidates": [
    {
     "content": {
      "role": "model",
      "parts": [
       {
        "text": "<ctrl46>"
       },
       {
        "text": "Okay, let me think through this. The user wants to know two...",
        "thought": true
       },
       {
        "text": "<ctrl46>"
       },
       {
        "text": "Okay, I understand. You're asking about your spending...",
        "thought": true
       },
       {
        "text": "<ctrl46>"
       },
       {
        "text": "Okay, I understand the request. I need to figure out my daily...",
        "thought": true
       },
       {
        "text": "<ctrl46>"
       },
       {
        "text": "I understand you're looking for a breakdown of...",
        "thought": true
       },
       {
        "text": "It looks like you're asking about...",
        "thought": true
       }
      ]
     },
     "finishReason": "STOP",
     "avgLogprobs": -209.95114135742188
    }
   ],
   "usageMetadata": {
    "promptTokenCount": 3735,
    "candidatesTokenCount": 8,
    "totalTokenCount": 7260,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
     {
      "modality": "TEXT",
      "tokenCount": 3735
     }
    ],
    "candidatesTokensDetails": [
     {
      "modality": "TEXT",
      "tokenCount": 8
     }
    ],
    "thoughtsTokenCount": 3517
   },
   "modelVersion": "gemini-2.5-flash-lite",
   "createTime": "2025-07-30T14:32:48.897953Z",
   "responseId": "EC2KaKHnNtqp2PgPhcbJiAM"
  }

Curiously, candidatesTokenCount is 8, and there were 8 <ctrl46> tokens in the full response.