Truncated Response Issue with Gemini 2.5 Flash Preview

We are also seeing the same error while using 2.5 Flash. Randomly but very frequently, the output is cut off in the middle. What's weird is that we are using structured output with function calling on LangChain, and the JSON output is complete (we can parse it, the braces are balanced, etc.), but the response we are aiming to receive, located at a key of the JSON, is truncated mid-sentence. We have also implemented a recheck mechanism to regenerate when this occurs, but the regenerated output is truncated at approximately the same point. The output is around 2k tokens, the input around 12k, and thinking around 1k, so it is not a max-token issue. The finish reason is "STOP". Is there a planned update or a proposed solution from Google for this? This is a breaking condition for our system!
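For context, our recheck is essentially a heuristic like the sketch below (the key name "answer" and the punctuation list are placeholders, not our real schema):

import json

SENTENCE_ENDINGS = ('.', '!', '?', '."', '!"', '?"')

def looks_truncated(raw_json: str, key: str = "answer") -> bool:
    """Heuristic: the JSON parses cleanly, so the only truncation signal
    is a field value that stops without sentence-final punctuation."""
    value = json.loads(raw_json)[key].rstrip()
    return bool(value) and not value.endswith(SENTENCE_ENDINGS)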


We have exactly the same issue.
The value of a key is abruptly cut. The JSON itself is intact, so it never gets caught by validations based on the JSON schema.

I really wish someone from Google would acknowledge this thread and provide an update.

Exactly, so I doubt this is just truncation or halting due to token limits.

It builds VALID JSON with incomplete text in its fields. It is not even random for me anymore; I tried changing prompts, and almost 80% of the time I get truncated JSON fields in the same place. I really need this to be fixed, because it is genuinely unusable for me.


It's really frustrating when something goes wrong in the middle of a response, because it's not in our control. Gemini is becoming so good in its capabilities, but these kinds of issues push us toward other models.

I found that it probably relates to structured output, because as soon as I turn it off, it starts working without issues. But of course, then we are back to the problem of frequently malformed JSON output.
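For anyone needing a stopgap, the workaround looks roughly like this (a sketch with the google.genai Python SDK; the extraction regex and model name are my assumptions, and you lose schema enforcement):

import json
import re

from google import genai

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Return a JSON object with keys 'title' and 'summary' for ...",
)
# Without response_mime_type the model may wrap the JSON in a markdown fence,
# so pull out the first-to-last brace span before parsing.
match = re.search(r"\{.*\}", response.text, re.DOTALL)
data = json.loads(match.group(0)) if match else None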

I'm experiencing the exact same issue. I've been trying for days with different prompt schemas, but there's no difference. I've attempted to generate around 2,000 summaries of tax cases, and in approximately 15% of cases the model just stops, yet still leaves the JSON output intact.
The finish reason is just "STOP".
When I run the same prompts again with the same input text, it stops at roughly the same point. Markdown tables are also hit and miss: in about 5% of cases, the model hangs and throws an error after 2 minutes.

I'm using the AI package from Vercel. It happens with both 2.5-flash and 2.5-pro, though less with the Pro version.

The model is tasked with generating JSON matching a schema with a title and a summary. The input length of the cases ranges from 1k to 25k tokens.

Hello everyone,

Has anyone tried the model gemini-2.5-flash-preview-05-20 for JSON output (responseMimeType: 'application/json')? I've executed queries against this model multiple times and haven't encountered any problems, even with long contexts.
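For reference, the equivalent setup with the Python SDK would look roughly like this (a sketch only; my own testing has been with the JS SDK, and the prompt is a placeholder):

from google import genai
from google.genai import types

client = genai.Client()  # expects GOOGLE_API_KEY in the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",
    contents="Return a JSON object with keys 'title' and 'summary' for ...",
    config=types.GenerateContentConfig(response_mime_type="application/json"),
)
print(response.text)  # should be a single, complete JSON document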

Could you try using the model above and let me know if your issue is resolved? If you're still experiencing truncated responses with this model, kindly provide the following details:

  1. Input prompt: file (if applicable) and the prompt.
  2. Minimal code snippet to reproduce the issue (my testing has been with the JS SDK).

Thank you for your patience.

Just tried the responseMimeType, and saw no difference.
My prompt is:

import { generateObject } from 'ai';
import { google } from '@ai-sdk/google';
// improvedSummarySchema is our local Zod schema (title + summary fields)
import { improvedSummarySchema } from './improvedSummarySchema';

export const generateImprovedSummary = async (
  fullCaseText: string,
  lawItems: string[]
) => {
  const systemPrompt = `You are tasked with creating a summary of legal texts from Skatterådet, the Danish Tax Assessment Council, for a legal information website. Write clear, structured summaries in Danish using focused Markdown formatting.

AVAILABLE LAW ITEMS FOR REFERENCES:
The following is a list of law provisions that have been identified as relevant to this case. You may ONLY create /ref links for laws that appear in this list. Each item represents a specific law and paragraph that can be referenced in your summary:

${lawItems.map((item) => `- ${item}`).join('\n')}

CRITICAL FORMATTING REQUIREMENT:
NEVER START THE SUMMARY WITH ANY HEADER OR SUBHEADER (### or ####). 
- The summary must begin directly with paragraph text
- The title field serves as the main header for the article
- Subheaders (\n\n ### or \n\n ####) should be used later in the summary to organize additional sections
- Make sure the summary is comprehensive and includes all relevant information from the case text


MARKDOWN FORMATTING GUIDELINES:
- Start the summary with a direct paragraph of text - NO HEADERS
- Make effective use of \n\n ### or \n\n #### subheaders to organize different aspects of the case (facts, arguments, decisions)
- Use bullet points (-) for lists if it improves clarity
- When relevant, include tables using markdown table format
- When using tables remember to include a header row with column names

MARKDOWN TABLE EXAMPLE:
When presenting structured data (such as multiple tax years, calculations, or comparisons), use markdown tables like this:

| Skatteår | Indtægt (kr.) | Fradrag (kr.) | Skattepligtig indkomst (kr.) | Bemærkninger |
|----------|---------------|---------------|------------------------------|--------------|
| 2020     | 500.000       | 150.000       | 350.000                      | Accepteret   |
| 2021     | 600.000       | 180.000       | 420.000                      | Under behandling |
| 2022     | 550.000       | 160.000       | 390.000                      | Afvist       |

Tables should be used when they make information clearer and more organized, such as:
- Multiple years of financial data
- Comparisons between different calculations or methods
- Multiple parties with different outcomes
- Timeline of events with dates and details

REFERENCE RULES:

1. Law References (ONLY from the lawItems list above):
Format: [Full Law Name § Number, stk. X, litra Y](/ref/law_name_§_number)

CRITICAL REFERENCE LINK REQUIREMENT:
The reference link part (/ref/...) must ALWAYS use the singular form of the law name (without 's' ending):
CORRECT: /ref/selskabsskatteloven_§_13
INCORRECT: /ref/selskabsskattelovens_§_13

The displayed text can use any grammatically correct form, but the link MUST use singular form:
CORRECT: [Selskabsskattelovens § 13](/ref/selskabsskatteloven_§_13)
INCORRECT: [Selskabsskattelovens § 13](/ref/selskabsskattelovens_§_13)

This rule applies to ALL law references - never use the possessive/genitive form (-ns, -s) in the reference link part.

IMPORTANT: In reference links (/ref/...), ALWAYS include Danish letters (æ, ø, å) if they are part of the law name. DO NOT convert these to ae, oe, or aa.

REFERENCE CREATION RULES:
- You may ONLY create /ref links for laws that appear in the "AVAILABLE LAW ITEMS FOR REFERENCES" list above
- If you want to reference a law that is NOT in the list, write it as plain text without any /ref link
- Each law reference from the list should be used ONCE only at its most relevant point
- Use full law names, not abbreviations
- Create separate references for each paragraph, even from same law

2. SKM References:
- Format: [SKM2023.276.SR](https://info.skat.dk/data.aspx?oid=2387060)
- Must use info.skat.dk domain
- Example: "Som fastslået i [SKM2015.341.SR](https://info.skat.dk/data.aspx?oid=2176440) havde skatteyder mulighed for..."

3. Other References:
- Use standard markdown: [text](url)
- Example: "Dette fremgår også af [Ligningsvejledningen](https://www.example.com/link)"

Examples of Law References (only if they appear in the lawItems list):
1. "Ligningsrådet fandt, at salget var omfattet af [Ejendomsavancebeskatningsloven § 8, stk. 1](/ref/ejendomsavancebeskatningsloven_§_8), hvilket betyder, at gevinsten var skattefri."

2. "I henhold til [Selskabsskatteloven § 13, stk. 1, nr. 2](/ref/selskabsskatteloven_§_13) og [Opkrævningsloven § 4, stk. 3](/ref/opkrævningsloven_§_4) blev det fastslået at..."

Other Legal References (when NOT in lawItems list):
1. EU Directives: Use bold markdown
   Example: "i henhold til **Direktiv 2006/112/EF artikel 28**" or "jf. **Forordning (EU) nr. 282/2011**"
2. Other Legal References: Use plain text for references without actual law names (e.g., "§ 6, stk. 1, i lov nr. 569 af 24. juni 1992")
3. Laws not in the list: Plain text only (e.g., "Bekendtgørelse nr. 103 af 26/01/2024 § 15 a, stk. 3")


The text provided below comes directly from Skatterådet. Response must be in Danish.`;

  const { object } = await generateObject({
    model: google('gemini-2.5-flash-preview-05-20'),
    system: systemPrompt,
    prompt: fullCaseText,
    schema: improvedSummarySchema,
    temperature: 0,
    headers: {
      // NB: this is sent as an HTTP header; generateObject already requests
      // JSON output itself, so this setting likely has no effect here
      responseMimeType: 'application/json'
    }
  });
  return object;
};

And the fullCaseText and lawItems are Danish cases that look like:
“Modified by moderator”

Other previous models worked fine; 2.0 Flash also worked okay. The quality of its summaries was not as good as the new model's, but 2.0 Flash did not stop in the middle of generation.


Hi @Oscar_Hoffmann,

Thank you for your reproduction code. Could you please share the entire code, along with your prompt, system instructions (which you've already provided), the file (if applicable), and the config you are using? I'm still unable to replicate the behavior.

This is the entire code I use to generate the summary. The rest of the code is just a basic scraper that extracts the HTML from the website I linked and converts it into Markdown, which is what I pass to the AI model as input alongside my system prompt. I have checked my tracing app and can confirm the input is as expected: a Markdown version of the web pages.

Hope this helps 🙂


@Oscar_Hoffmann , Thank you for sharing the detailed prompt. I am now able to replicate the issue with the details you’ve provided. I’ve already escalated this issue to the team, and I’ll share the repro details as well.


Hi everyone,

Adding my findings to this thread, as I'm facing the exact same truncation issue, but with gemini-2.5-pro and specifically when using it via the LangChain wrapper (langchain-google-genai).

My use case involves submitting a complex network diagram for visual analysis. The expected full text response is ~1400 tokens, but it’s consistently being truncated at a much lower, seemingly default limit.

Just like others in this thread, I've confirmed this is not a client-side timeout or a simple max_tokens setting error. The issue appears to be in how the parameter is processed or passed to the API.

Here’s a summary of my debugging, which might help the dev team narrow down the cause:

1. Failure within the LangChain Wrapper:
I’ve tested every possible way to set the token limit through ChatGoogleGenerativeAI, and all of them fail for multi-modal (vision) calls:

  • Using the direct parameter: ChatGoogleGenerativeAI(model="gemini-2.5-pro", max_output_tokens=8192) → Truncated

  • Using the generic parameter: ChatGoogleGenerativeAI(model="gemini-2.5-pro", max_tokens=8192) → Truncated

  • Using the pass-through dictionary: ChatGoogleGenerativeAI(model="gemini-2.5-pro", model_kwargs={"max_output_tokens": 8192}) → Truncated

2. Success with the Native Python SDK:
When I bypass the LangChain wrapper completely and use the native google-generativeai SDK, the problem is 100% resolved. The following code works perfectly and returns the full, non-truncated response every time:


import google.generativeai as genai

# ... (configure api_key) ...

model = genai.GenerativeModel('gemini-2.5-pro')

# This generation_config is correctly honored by the native SDK
generation_config = genai.types.GenerationConfig(
    max_output_tokens=8192,
    temperature=0.0
)

# (inside an async function; `prompt` and `image` are defined elsewhere)
response = await model.generate_content_async(
    contents=[prompt, image],
    generation_config=generation_config
)

# `response.text` is complete.
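
As a sanity check on either path, it also helps to inspect the finish reason and the token accounting on the raw response (a short sketch against the same native-SDK response object):

# Distinguish a real token-limit stop from a silent truncation.
candidate = response.candidates[0]
print(candidate.finish_reason)   # MAX_TOKENS here would implicate the limit
print(response.usage_metadata)   # prompt / candidates / total token counts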

Conclusion:

This provides strong evidence that the issue is not with the Gemini API backend itself, but rather with how client libraries—in my case, specifically langchain-google-genai—are constructing the request for multi-modal calls. It seems the generationConfig containing max_output_tokens is either not being included or is being incorrectly formatted by the wrapper when an image is part of the payload.

Hope this detailed case helps the team in debugging. For now, my workaround is to use the native Python SDK for any vision-related calls.

Thanks


Thanks, @Yilmaz_Tandogan! That’s a helpful finding for debugging this issue.

Could you please test if gemini-2.5-flash-preview-05-20 is truncating the response? I can reproduce the problem with gemini-2.5-flash-preview-04-17 but not with gemini-2.5-flash-preview-05-20 using the JS SDK.

We are facing the same issue with gemini-2.5-flash via langchain-google-vertexai.

Here is how we are using it. Please help:

from langchain_google_vertexai import VertexAI
from langchain.chains.question_answering import load_qa_chain
from langchain_core.documents import Document

llm_gemini_2_5_flash_big_document_chat = VertexAI(
    model_name="gemini-2.5-flash",
    temperature=0.1,
    max_output_tokens=65000,
    project="bryckel",
    location="us-central1",
    response_mime_type="application/json",
)

# ---------- 3. QA chain with Gemini ----------
qa_chain = load_qa_chain(
    prompt=prompt,  # prompt template built elsewhere
    llm=llm_gemini_2_5_flash_big_document_chat,
    chain_type="stuff",
)

chain_response = qa_chain.invoke(
    input={
        "input_documents": (
            relevant_docs if relevant_docs else [Document(page_content=" ")]
        ),
        "question": question,
        "chat_history": "",  # no prior history in this flow
        "format_instructions": document_chat_output_parser.get_format_instructions(),
    }
)

Update: Tried the same thing using the google.genai package. Facing the exact same issue of truncation with the same config.

Here is the code for your reference

import json

from google import genai
from google.genai import types

# Initialize the GenAI client (same as ai_news.py)
genai_client = genai.Client(
    vertexai=True,
    project="bryckel",
    location="us-central1"
)

# Configure the generation settings with the JSON schema
# (document_chat_response_schema is defined elsewhere in our codebase)
generate_content_config = types.GenerateContentConfig(
    temperature=0.1,
    max_output_tokens=65000,
    response_mime_type="application/json",
    response_schema=document_chat_response_schema
)

response = genai_client.models.generate_content(
    model="gemini-2.5-flash",
    contents=contents,  # contents built elsewhere
    config=generate_content_config,
)
response_text = response.candidates[0].content.parts[0].text
parsed_response = json.loads(response_text)

I've also encountered this issue with the GA version of gemini-2.5-flash. Whether I call the Gemini API through LangChain or the native Google Python SDK, at some point the text field of the candidate contains a large chunk of spaces. It's been an annoying issue for our production app. Any updates at all?
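For now we guard against it with a crude check before accepting a response (a heuristic sketch; the threshold is arbitrary):

def has_space_run(text: str, threshold: int = 50) -> bool:
    """Flag responses that contain an unusually long run of spaces."""
    return " " * threshold in text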

Same issue. Anything that takes more than 40 seconds to generate gets truncated.

So we switched from SDK 0.24.1 to 0.1.3, and truncation issues dropped by almost 95%.

Same issue. Tried with gemini-2.5-flash-preview-05-20 as well. The stream returns empty chunks mid-generation, and I don't know why. For me it's very frequent. I noticed it started when I updated my prompt to say "return image URLs as Markdown in your response". If I don't mention this, the agent doesn't include any URLs in the response and it works fine. However, I need the URLs I am supplying in the data to be part of the response, and as soon as the response is about to generate a Markdown link with an image URL, the streaming fails. You can see in the logs below that after the words "bedtime routine" it was supposed to return a URL link, preferably as Markdown, but it fails.

Please fix this as soon as possible. It's important.
@Shivam_Mishra

2025-07-12 20:31:18,751 - DEBUG - Streaming delta:  can be incredibly effective for autistic children to understand and follow routines. You can create a visual timetable showing each step of the bedtime routine. This helps David know what to expect next, reducing anxiety.
    *   For example, this image shows
2025-07-12 20:31:18,884 - DEBUG - Streaming delta:  a visual timetable for a bedtime routine:

Step run_agent_step produced event AgentOutput
Running step parse_agent_output
Step parse_agent_output produced event StopEvent
2025-07-12 20:31:18,893 - INFO - The stream has been stopped!
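
The empty chunks can also be watched for outside the agent framework with a plain streaming call (a hedged sketch using the google.genai SDK; the model and prompt are placeholders, not my production setup):

import logging

from google import genai

logging.basicConfig(level=logging.DEBUG)
client = genai.Client()  # API key from the environment

stream = client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Suggest a bedtime routine and include the image URL as Markdown.",
)
for chunk in stream:
    if not chunk.text:
        logging.warning("Empty streaming chunk received mid-generation")
    else:
        logging.debug("Streaming delta: %s", chunk.text)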

Can we get an update from Google? This is a critical issue.
