Gemini 2.5 Flash stuck in a tool-call loop when using both tools and structured output

I’m observing rare (but deterministic) cases where Gemini is “looping”: endlessly calling the same tool with the same parameters.

What I’m seeing: when providing Gemini with a set of tools and asking for a structured output, sometimes Gemini “loops” on one tool: it calls it, gets the response, calls it again with the same parameters, gets the response, …

A few more details:

  • model: Gemini 2.5 Flash
  • modality: both the GenerateContent endpoint and the OpenAI-compatible chat completions endpoint
  • seed: 42, temperature: 0
  • switching off reasoning seems to lower the occurrences

I know tools + structured output wasn’t supposed to work for Gemini 2.0, but nothing in the docs mentions a problem for 2.5. And it works most of the time.

Hello @jb_m,

Thanks for flagging this. That “endless loop” behavior with Gemini 2.5 Flash is definitely a known edge case when strict JSON schemas interact with function calling.

You could try the following steps:

  1. Run the prompt with tools enabled but no response_schema. Let the model gather the necessary data.
  2. Feed the conversation history (including tool outputs) back into the model with the response_schema enabled, but disable the tools.
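A minimal sketch of this two-phase flow. The `client.generate` call, the message shapes, and the `tools` mapping below are illustrative stand-ins, not the exact SDK surface:

```python
# Sketch of the two-phase workflow. `client.generate` is a hypothetical
# stand-in for your SDK call (e.g. GenerateContent); `tools` maps tool
# names to Python callables.

def run_two_phase(client, prompt, tools, schema):
    # Phase 1: tools enabled, no response_schema -- let the model
    # call tools until it stops asking for them.
    history = [{"role": "user", "content": prompt}]
    while True:
        reply = client.generate(history, tools=list(tools), response_schema=None)
        call = reply.get("tool_call")
        if call is None:
            history.append({"role": "model", "content": reply["text"]})
            break
        result = tools[call["name"]](**call["args"])
        history.append({"role": "tool", "name": call["name"], "content": result})

    # Phase 2: tools disabled, response_schema enabled -- ask the model
    # to format the gathered context as the final JSON.
    final = client.generate(history, tools=None, response_schema=schema)
    return final["text"]
```

The key point is that a single request never mixes tool declarations with a strict schema, so the model is never forced to reconcile the two.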
1 Like

Thanks, we ended up implementing a “finishing tool” (validated with Pydantic) with some retry logic. That way we only provide tools to the model, and no structured output.
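For anyone else landing here, a stdlib-only sketch of that finishing-tool pattern (a real implementation would typically validate with a Pydantic model; the `FinalAnswer` shape and `finish` handler are hypothetical):

```python
# Sketch of the "finishing tool" pattern: instead of response_schema,
# the model is given one extra tool, `finish`, whose arguments ARE the
# desired structured output. Invalid calls return an error string that
# is fed back as the tool response so the model can retry.

from dataclasses import dataclass

@dataclass
class FinalAnswer:          # hypothetical output shape
    summary: str
    severity: str

ALLOWED_SEVERITIES = {"Low", "Medium", "High"}

def finish(summary: str, severity: str):
    """Tool handler: returns (answer, None) on success, or
    (None, error_message) to trigger a retry by the model."""
    if severity not in ALLOWED_SEVERITIES:
        return None, f"severity must be one of {sorted(ALLOWED_SEVERITIES)}"
    return FinalAnswer(summary=summary, severity=severity), None
```

Since only tools are ever in play, the model never sees a strict schema alongside its tool declarations.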

1 Like

Thank you for the update. Great to know things are working for you.

@Abhijit_Pramanik Is this also a known issue with Gemini 3.0 Flash? We’re seeing similar weird looping behavior in Gemini 3.0 Flash with structured output & tool calling.

Hello @Kody_Kendall,

Welcome to the AI Forum!

Although Gemini 3.0 Flash features improved reasoning capabilities, looping can still occur when combining tools with strict Structured Outputs, usually because the model detects a conflict between the raw tool data and the schema constraints. To resolve this, we recommend separating the workflow: first, allow the model to execute tool calls without a response_schema to gather the necessary information. Then, pass the resulting conversation history back to the model—with tools disabled and the schema enabled—to generate the final JSON.

2 Likes

Hi @Abhijit_Pramanik, thanks for this proposed solution. We managed to solve the infinite loop issue that was happening on our end using Gemini 3 Flash.

However, I do have one question. This solution splits the workflow into requests with different configurations (one with tools disabled, another with the schema enabled), so the context we send to Gemini changes between requests. This makes the implicit caching offered by Gemini stop working entirely. I tried experimenting with explicit caching, hoping it could work with only the system prompt and input prefix, but I get an error saying this isn’t currently possible with Gemini:

[Playground] Error in handleSendMessage: ApiError: {"error":{"code":400,"message":"Tool config, tools and system instruction should not be set in therequest when using cached content.","status":"INVALID_ARGUMENT"}}

That being said, does implementing this workflow mean sacrificing our current caching capability? Is there any workaround? Caching is crucial for us: after doing some math, we could potentially save 50-80% of current costs, but our cache hit rate dropped from 42% before the fix to 9% after it (i.e., the fix sacrificed the caching).

Appreciate any help, thanks!

Hey @Brandon3 - great question! In order to use explicit cache, you would need to re-set the cache with the tools and system instructions included in the cache creation configuration, as they cannot be overridden or toggled in subsequent requests when using cached content.

However, you will still benefit from implicit caching (which is enabled by default and best-effort). As long as your large context (e.g., system instructions or documents) remains consistent at the beginning of your prompt, the model should cache that prefix. This allows you to effectively separate the workflow as suggested without completely losing caching benefits.
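For reference, creating an explicit cache with the tools and system instruction baked in looks roughly like this with the `google-genai` Python SDK (the model name, TTL, and the `my_tools` / `big_shared_context` variables are placeholders; verify field names against the current SDK docs):

```python
# Sketch: bake the system instruction and tool declarations into the
# explicit cache at creation time, then reference the cache in later
# requests WITHOUT re-sending them. Requires API credentials to run;
# `my_tools` and `big_shared_context` are placeholders you would define.
from google import genai
from google.genai import types

client = genai.Client()

cache = client.caches.create(
    model="gemini-2.5-flash",  # placeholder model name
    config=types.CreateCachedContentConfig(
        system_instruction="You are a research agent...",
        tools=my_tools,              # same tool declarations every request uses
        contents=big_shared_context, # the large stable prefix
        ttl="3600s",
    ),
)

# Subsequent requests must NOT set tools/system_instruction again:
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="user question here",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
```

Note this only helps the phase of the workflow whose tool configuration matches the cache; the schema-only phase would need its own cache (or rely on implicit prefix caching).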

1 Like

We’re still having issues. It seems to have gotten worse in the last 3-4 days. Here’s an example of the loop our agent will get stuck in:

...

Wait, I'll check the severity.It's a "Medium" severity (UX improvement).
Ready. :)
Wait, I'll use 2026-01-30_BUG_auto_collapse_rate_buildup_on_save.md.
Let's go. :)
Wait, I'll check the filename format: YYYY-MM-DD_TYPE_DESCRIPTION.md.2026-01-30_BUG_auto_collapse_rate_buildup_on_save.md. Correct.
Ready. :)
Wait, I'll call write_file now.
Wait, I have to provide the research summary in the response.So I will do that.
One detail: Root Cause Classification.Primary layer: Views.Secondary layers: Controller.Is this a DATA problem or DISPLAY problem? DISPLAY.Evidence: update.turbo_stream.erb surgical updates omit the toggle checkbox.
Ready. :)
Wait, I'll also check app/models/tender.rb for recalculate_grand_total!.Actually, I don't need to, it's irrelevant to the collapse.
Ready. :)
Wait, I'll use 2026-01-30_BUG_auto_collapse_rate_buildup_on_save.md.
Let's go. :)
Wait, I'll call write_file.
Wait, I forgot the "Split Check".1 model, 1 screen. No split.
Ready. :)
Wait, I see Code Health Observations table.I'll include it.
Ready. :)
Wait, I see Implementation Considerations.I'll include them.
Ready. :)
Wait, I'll use 2026-01-30_BUG_auto_collapse_rate_buildup_on_save.md.
Let's go. :)
Wait, I'll call write_file.
Wait, I have to provide the research report in the response.I'll do that now.
Wait, I'll use 2026-01-30_BUG_auto_collapse_rate_buildup_on_save.md.
Let's go. :)
Wait, I'll call write_file.
Wait, I forgot to check app/views/tender_line_items/_tender_line_item.html.erb line 108:peer-checked:max-h-[2000px]If the checkbox is unchecked, the height becomes max-h-0. This is how it collapses.
Ready. :)
Wait, I'll use 2026-01-30_BUG_auto_collapse_rate_buildup_on_save.md.
Let's go. :)
Wait, I'll call write_file.
Wait, I have to provide the research report in the response.I'll do that now.
Wait, I'll use 2026-01-30_BUG_auto_collapse_rate_buildup_on_save.md.
Let's go. :)
Wait, I'll call write_file.
Wait, I forgot to check if the chevron rotation is currently handled by breakdown_editor_controller.js.No, I checked it. It's not.
Okay. :)
Wait, I'll use `20

You Said: stop

Leonardo Said: 26-01-30_BUG_auto_collapse_rate_buildup_on_save.md`.
Let's go. :)
Wait, I'll call write_file.
WaitI've received the file context for the builder page. I am waiting for the technical research to complete so I can identify why the breakdown is staying open during save. 
Should I continue with the research, or did you want to change the scope?

So Gemini 3.0 Flash gets stuck saying: “Wait, I’ll call write_file. Let’s go. :)” Then it will just loop that message multiple times.

Any advice for us? @Logan_Kilpatrick

We’re using the latest LangChain implementation for Gemini 3.0, which is following best practices. @Abhijit_Pramanik