Problems With gemini-2.0-flash Tool Calling

inspight · February 16, 2025, 11:31pm

Greetings, everyone. I’m working on a project that uses the Vercel AI SDK, which has been very stable with tool calling on GPT-4o and the new Gemini 2 Pro experimental model. With gemini-2.0-flash, however, once the message thread contains several tool calls, and particularly the same tool call repeated, the model starts outputting markdown Python code (see below) instead of actually calling the tool. Reprompting sometimes makes it call the tool, but the problem tends to get worse as the thread grows. I’m trying to determine whether this is a problem with the model itself or with accessing GenAI through the Vercel SDK rather than the Google Node library. If you are seeing this with other libraries or have insights on how to resolve it, your input would be greatly appreciated.

Example Response
I have opened the homepage in a new window. Now I need to extract all URLs in the navigation at the top. I will get the HTML of the page.

print(default_api.getBrowswerWindowHtml(BrowserWindowResource = "4f9608a5-55c8-457d-854d-1292b9b44010"))
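If you want to salvage the intended call rather than only reprompt, the leaked line above is valid Python, so it can be parsed with the standard `ast` module to recover the tool name and keyword arguments. This is a minimal sketch that assumes the leak always looks like the example (a `default_api.<tool>(name = literal, ...)` call, optionally inside a markdown fence); `recover_leaked_call` and `allowed_tools` are hypothetical names, not part of any SDK. It refuses anything it cannot read as plain literals.

```python
from __future__ import annotations

import ast
import re
from typing import Any, Optional

# Body of a markdown fence such as ```python ... ``` (assumed leak format).
_FENCE_BODY = re.compile(r"```\w*\n(.*?)```", re.DOTALL)


def _snippets(text: str) -> list[str]:
    """Return fenced code bodies if any, otherwise lines mentioning default_api."""
    fenced = _FENCE_BODY.findall(text)
    if fenced:
        return fenced
    return [line for line in text.splitlines() if "default_api." in line]


def recover_leaked_call(
    text: str, allowed_tools: Optional[set[str]] = None
) -> Optional[tuple[str, dict[str, Any]]]:
    """Turn leaked text like print(default_api.tool(arg = "x")) into
    (tool_name, kwargs), or return None if nothing safe can be recovered."""
    for snippet in _snippets(text):
        try:
            tree = ast.parse(snippet.strip())
        except SyntaxError:
            continue  # prose or a partial call: skip rather than guess
        for node in ast.walk(tree):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "default_api"
            ):
                name = node.func.attr
                if allowed_tools is not None and name not in allowed_tools:
                    return None  # the model may invent tools; never run those
                if node.args:
                    return None  # positional args: parameter names are unknown
                try:
                    kwargs = {
                        kw.arg: ast.literal_eval(kw.value)
                        for kw in node.keywords
                        if kw.arg is not None
                    }
                except (ValueError, TypeError, SyntaxError):
                    return None  # non-literal argument: do not evaluate it
                return name, kwargs
    return None


leaked = (
    "I will get the HTML of the page.\n\n"
    'print(default_api.getBrowswerWindowHtml(BrowserWindowResource = "4f9608a5-55c8-457d-854d-1292b9b44010"))'
)
print(recover_leaked_call(leaked, allowed_tools={"getBrowswerWindowHtml"}))
```

Passing your declared tool names as `allowed_tools` matters here, since later replies in this thread report the model printing tools that do not exist.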

Matias_Sidler · February 26, 2025, 7:36pm

I’m having the same issue. It’s not related to the Vercel AI SDK, since I’m making plain HTTP requests to the API and sometimes get the same response. It looks like an issue with the model itself. Have you found a workaround for this?

Stephen_Solka · February 27, 2025, 5:10am

We see the same thing. A temporary workaround we are using is to filter these out with a regex (this is Elixir code, but you get the idea):

String.replace(response, ~r/```tool[\w]*[\s\S]*?```/m, "")
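The same filtering idea in Python, as a hedged sketch rather than a drop-in fix: it assumes the leaked text arrives either in a fence whose info string starts with `tool` (e.g. tool_code) or as a bare `print(default_api.<tool>(...))` line like the one in the first post. `strip_leaked_tool_text` is a hypothetical helper name.

```python
import re

# A fenced block whose info string starts with "tool" (e.g. ```tool_code),
# removed lazily up to the next closing fence.
TOOL_FENCE = re.compile(r"```tool\w*[\s\S]*?```")
# A whole line containing a pseudo-call such as print(default_api.someTool(...)).
DEFAULT_API_LINE = re.compile(r"^[^\n]*\bdefault_api\.\w+\([^\n]*(?:\n|$)", re.MULTILINE)
# A fence left empty once its only line was removed, e.g. "```python\n```".
EMPTY_FENCE = re.compile(r"^```\w*[ \t]*\n```[ \t]*$", re.MULTILINE)


def strip_leaked_tool_text(text: str) -> str:
    """Remove tool-call text the model printed instead of emitting a real function call."""
    text = TOOL_FENCE.sub("", text)
    text = DEFAULT_API_LINE.sub("", text)
    text = EMPTY_FENCE.sub("", text)
    return text.strip()


reply = (
    "I will get the HTML of the page.\n"
    "```python\n"
    'print(default_api.getBrowswerWindowHtml(BrowserWindowResource = "4f96"))\n'
    "```"
)
print(strip_leaked_tool_text(reply))
```

Note that the line rule drops the whole line, so any prose the model put on the same line is lost too; filtering only hides the symptom, which is why it is usually paired with a retry.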

inspight · February 27, 2025, 4:15pm

Thanks for the insight on this.

Vishal · March 25, 2025, 2:09pm

Hey folks, thanks for flagging this! Just to confirm, what models are you seeing this behavior with? Is it mainly with 2.0 Flash, or with other models, too?

inspight · March 26, 2025, 2:50pm

I have really only encountered it on Flash 2.0; 1.5 Pro seems fine, and I wasn’t having issues with 2.0 Pro experimental, though I have not used it heavily.

Jason_Roell · April 29, 2025, 4:00pm

I’ve seen it happen with EVERY Gemini model, but 2.0 and 2.5 (Flash and Pro) actually seem worse.

Vishal · May 1, 2025, 1:36am

Would you be able to share a couple examples to help us with investigating?

Justin_Schroeder · May 1, 2025, 3:52pm

In our implementation this happens constantly, even when the thread isn’t all that long. It will even begin to hallucinate tools that don’t exist or that it does not have access to. It happens frequently when the model has no tools available to it but can “see” that tools were called earlier in the thread. We filter these out and retry, which sometimes works.
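The filter-and-retry approach described above can be sketched as a small loop. This is a sketch under stated assumptions: `generate` stands in for whatever call your client makes (Vercel AI SDK, REST, or an official SDK), and the `leaked_call` check simply looks for leaked `default_api.` text; both names are hypothetical, not from any library.

```python
from __future__ import annotations

from typing import Callable, TypeVar

R = TypeVar("R")


def leaked_call(text: str) -> bool:
    """Heuristic: the model printed a tool call instead of making one."""
    return "default_api." in text


def generate_with_retry(
    generate: Callable[[], R],
    is_bad: Callable[[R], bool],
    max_attempts: int = 3,
) -> tuple[R, int]:
    """Call generate() until is_bad(result) is False or attempts run out.

    Returns the last result and how many attempts were used; the caller
    decides what to do if the final result is still bad."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        result = generate()
        if not is_bad(result):
            return result, attempt
    return result, max_attempts


# Fake model that leaks twice and then answers normally.
replies = iter(["print(default_api.lookup(q = 'x'))", "print(default_api.lookup(q = 'x'))", "done"])
print(generate_with_retry(lambda: next(replies), leaked_call, max_attempts=5))
```

Capping the attempts keeps cost bounded, since every retry resends the whole thread; the function returns the attempt count so you can log how often retries were needed.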

I would be happy to provide any reproductions you need, although I’m not entirely sure how to do that. I would even be happy to jump on a Zoom call.

Vishal · May 1, 2025, 4:47pm

Thanks for these details - just sent you a DM

Gauransh_Soni · May 6, 2025, 11:36am

Hi Vishal, we are seeing multiple calls to the same tool even though the tool has already been called, especially with 2.5 Pro. Can you help us with this?
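One client-side mitigation for duplicate calls, offered as a hedged sketch and not as a fix for the model: cache results by tool name plus canonicalized arguments, so a repeated identical call returns the earlier result instead of running the tool again. The `{"name": ..., "args": ...}` shape mirrors the `functionCall` parts in the Gemini REST response; `run_calls_once` and `execute` are hypothetical names. This is only safe for read-only or idempotent tools.

```python
from __future__ import annotations

import json
from typing import Any, Callable


def run_calls_once(
    calls: list[dict[str, Any]],
    execute: Callable[[str, dict[str, Any]], Any],
    cache: dict[str, Any],
) -> list[Any]:
    """Execute each {"name", "args"} call, reusing the cached result when the
    same tool was already called with identical arguments in this thread."""
    results = []
    for call in calls:
        args = call.get("args") or {}
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} share one key.
        key = call["name"] + ":" + json.dumps(args, sort_keys=True)
        if key not in cache:
            cache[key] = execute(call["name"], args)
        results.append(cache[key])
    return results


executed = []


def fake_execute(name: str, args: dict[str, Any]) -> str:
    executed.append(name)  # record real executions only
    return f"{name} result"


cache: dict[str, Any] = {}
run_calls_once([{"name": "getHtml", "args": {"id": "1"}}], fake_execute, cache)
second = run_calls_once(
    [{"name": "getHtml", "args": {"id": "1"}}, {"name": "getHtml", "args": {"id": "2"}}],
    fake_execute,
    cache,
)
print(executed, second)
```

The model still requests the duplicate; this only avoids executing it twice while still giving you a result to send back for every call it made.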

Vishal · May 6, 2025, 7:30pm

Hey Gauransh, can you retry with the model we released today, gemini-2.5-pro-preview-05-06? The new model is much better at function calling, so I’d be curious to hear whether it resolves your issue.

Spencer_Uresk · May 6, 2025, 10:56pm

I was running into this a lot with gemini-2.5-pro-preview-03-25, and it seems to happen with gemini-2.5-pro-preview-05-06 as well. It’s hard to say whether it is more or less frequent: on the old model it sometimes happened 20%+ of the time and at other times closer to 1%, but it definitely got worse over the past few days.

There seem to be three main failure modes:

Output going beyond MAX_TOKENS:

Unexpected response: candidates {
  content {
    role: "model"
    parts {
      text: ""
    }
  }
  finish_reason: MAX_TOKENS
}
usage_metadata {
  prompt_token_count: 3869
  total_token_count: 12061
}

Malformed Function Call:

candidates {
  content {
  }
  finish_reason: MALFORMED_FUNCTION_CALL
  finish_message: "Malformed function call:  ..."
}
usage_metadata {
  prompt_token_count: 3869
  total_token_count: 3869
}

Pretty-printing instead of doing a tool call:

candidates {
  content {
    role: "model"
    parts {
      text: "```python\nprint(default_api.extract_document_info(<..>))\n```"
    }
  }
  finish_reason: STOP
  avg_logprobs: -0.0032701773621211542
}
usage_metadata {
  prompt_token_count: 2321
  candidates_token_count: 107
  total_token_count: 2428
}

All of these happen somewhat randomly on inputs that work most of the time.
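Because the three modes show up as different finish reasons, a small classifier can bucket responses so you can measure how often each one occurs. The dumps also hint at how they differ: in the first, total_token_count minus prompt_token_count is 12061 − 3869 = 8192 tokens counted even though the only text part is empty, while in the second the total equals the prompt (3869), so no output tokens were counted at all. The sketch below is an assumption-based illustration: `classify` and `Outcome` are hypothetical, `finish_reason` is expected as the enum name shown in the dumps (converting your client’s enum to that string is up to you), and the printed-call check is the same `default_api.` heuristic used earlier in the thread.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    EMPTY_MAX_TOKENS = "empty_max_tokens"  # mode 1: empty text, MAX_TOKENS
    MALFORMED_CALL = "malformed_call"      # mode 2: MALFORMED_FUNCTION_CALL
    PRINTED_CALL = "printed_call"          # mode 3: STOP, call printed as text
    OTHER = "other"


def classify(finish_reason: str, text: str, has_function_call: bool) -> Outcome:
    """Map one candidate onto the failure modes listed in this post."""
    if finish_reason == "MALFORMED_FUNCTION_CALL":
        return Outcome.MALFORMED_CALL
    if finish_reason == "MAX_TOKENS":
        # Only the empty-output case matches the dump; truncated text is "other".
        if text.strip() or has_function_call:
            return Outcome.OTHER
        return Outcome.EMPTY_MAX_TOKENS
    if finish_reason == "STOP":
        if not has_function_call and "default_api." in text:
            return Outcome.PRINTED_CALL
        return Outcome.OK
    return Outcome.OTHER


print(classify("MAX_TOKENS", "", False))
```

Counting these outcomes per model version would turn rough estimates like "20%+ versus closer to 1%" into numbers that can be attached to a bug report.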
