Gemini-2.5-pro calls functions again and again endlessly

Hi. I'm using an LLM in an event-based agent application that depends heavily on function calling. When an event is created, I feed it to the LLM, and the LLM calls various functions; some of these functions produce their responses asynchronously, with a delay (the response is created after a while). When the LLM calls one of these asynchronous functions, the function immediately returns a string message like “Asynchronous request is successfully sent. The response will be notified as event at the later”. Later, the actual response is created as an event and fed into the LLM again.
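For clarity, here is a rough sketch (Python, with hypothetical names) of the pattern I am describing: the tool returns an acknowledgement string immediately, and the real result is appended to the conversation later as a new event message.

import uuid

def start_async_job(args: dict) -> str:
    """Tool handler: kick off the long-running work and return an acknowledgement."""
    job_id = str(uuid.uuid4())
    # ... enqueue the real work somewhere (queue, background thread, etc.) ...
    return (f"Asynchronous request {job_id} was sent successfully. "
            "The response will be delivered later as an event.")

def on_async_result(messages: list, job_id: str, result: str) -> list:
    """When the delayed result arrives, feed it back to the LLM as a new event."""
    messages.append({
        "role": "user",
        "content": f"[event] Result of async job {job_id}: {result}",
    })
    return messages  # the caller re-invokes the model with the extended history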

This works quite well with gpt-4.1, gpt-4.1-mini, grok3, grok3-mini, and other models. But with Gemini 2.5 (both Flash and Pro), the LLM calls the asynchronous function again and again endlessly and never produces a final response message. So I added an additional prompt like “You must complete your output after calling async function for receiving result event later.”, and then gemini-2.5-flash stops the endless calling and works well like the other models.

But gemini-2.5-pro still doesn’t work well: it still calls functions repeatedly, endlessly. Is there any suggested prompt for async function calls with gemini-2.5-pro?

Thanks.

I’ve been battling the same issue for the last day as well, but with the latest Flash model (2025-05).

The following is an example messages array going through LiteLLM (hence the OpenAI schema); it is using the server-everything MCP server.

[
    {"role": "developer", "content": "..."},
    {"role": "user", "content": "get tiny image"},
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "efc063d5-5e13-43ad-ab45-16b314fbcfa4",
                "function": {
                    "arguments": "{}",
                    "name": "getTinyImage"
                },
                "type": "function",
                "index": 0
            }
        ]
    },
    {
        "role": "tool",
        "name": "getTinyImage",
        "tool_call_id": "efc063d5-5e13-43ad-ab45-16b314fbcfa4",
        "content": "This is a tiny image:\n<ENCODED_IMAGE/>"
    },
    {
        "role": "tool",
        "name": "getTinyImage",
        "tool_call_id": "efc063d5-5e13-43ad-ab45-16b314fbcfa4",
        "content": "The image above is the MCP tiny image."
    },
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "5f271080-b74d-45f6-8b9b-11f77c51aabf",
                "function": {
                    "arguments": "{}",
                    "name": "getTinyImage"
                },
                "type": "function",
                "index": 0
            }
        ]
    }
]

Actually, here’s a clearer example that uses the Gemini schema. Note that the model produces a tool call once again.

Request:

{
    "system_instruction": {
        "parts": [{"text": "You operate according to the \"MOO_INSTRUCTION\" declaration. They are as follows:\n<MOO_INSTRUCTIONS>\n    <INSTRUCTION>Treat response parts namespaced with \"MOO_\" as developer constructs. E.g. \\<MOO_ENCODED_IMAGE/\\> returned by a tool/function call means that an image will get rendered in that place, that does not mean that the text \\<MOO_ENCODED_IMAGE/\\> should ever be shown to the user</MOO_INSTRUCTION>\n    <INSTRUCTION>If a question is unrelated to functions provided to you, use your intrinsic knowledge to anwer the question (you don't have to issue tool use every time).</MOO_INSTRUCTION>\n</MOO_INSTRUCTIONS>\n"}]
    },
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "get tiny image"}]
        },
        {
            "role": "model",
            "parts": [{"function_call": {"name": "getTinyImage","args": {}}}]
        },
        {
            "parts": [
                {"function_response": {"name": "getTinyImage","response": {"content": "This is a tiny image:\n<MOO_ENCODED_IMAGE/>"}}},
                {"function_response": {"name": "getTinyImage","response": {"content": "The image above is the MCP tiny image."}}}
            ]
        }
    ],
    "tools": [{"function_declarations": [
        {
            "name": "getTinyImage",
            "description": "Returns the MCP_TINY_IMAGE",
            "parameters": {
                "type": "object",
                "properties": {}
            }
        }
    ]}]
}

Response:

{
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "functionCall": {
                            "name": "getTinyImage_LFOG_u",
                            "args": {}
                        }
                    }
                ],
                "role": "model"
            },
            "finishReason": "STOP",
            "index": 0
        }
    ]
}

@KichangKim @sakalys

Thank you for the detailed output. Gemini has rolled out new stable endpoints for the 2.5 Pro and Flash models; have you tested this on the latest ones?

Also, how frequently is this issue reproducible?

@Akhilesh_Kambhampati
Thanks for the information.

I tested the stable gemini-2.5-flash, and it still sometimes repeats function calls endlessly. It’s quite random: almost identical input sometimes triggers repeated calls and sometimes does not.

It also sometimes outputs internal thought text like “I failed to calling tool xxxxx. User requests xxxxxxxx but I can’t perform action bla bla …”, even though I set thinkingBudget=0 and includeThought=false.

Additionally, gemini-2.5-flash sometimes fails to call functions with enum-style parameters because it uses an invalid enum value (the function schema is provided correctly).

For example, for a function like this:

Speak(string message, SpeechStyle style)

enum SpeechStyle
{
    Default,
    Happy,
    Excited,
    Calm,
}

Gemini sometimes calls:

Speak("example message ...", SpeechStyle.Shy);

Then it simply fails, because SpeechStyle.Shy does not exist as a value.
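As a defensive workaround, here is one way to declare the enum parameter in the function schema and validate the model-supplied value before dispatch (a Python sketch with hypothetical wrapper names):

SPEECH_STYLES = {"Default", "Happy", "Excited", "Calm"}

speak_declaration = {
    "name": "Speak",
    "description": "Speaks a message in the given style.",
    "parameters": {
        "type": "object",
        "properties": {
            "message": {"type": "string"},
            "style": {"type": "string", "enum": sorted(SPEECH_STYLES)},
        },
        "required": ["message", "style"],
    },
}

def handle_speak_call(args: dict) -> str:
    """Validate the model-supplied arguments before calling the real Speak()."""
    style = args.get("style", "Default")
    if style not in SPEECH_STYLES:
        # The model occasionally invents values such as "Shy"; fall back instead of failing.
        style = "Default"
    return f"Speak({args['message']!r}, SpeechStyle.{style})"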

These issues do not occur when I use the gpt-4.1 family or the grok3 family.

Yes, it’s still problematic…

Here, please pass this on to the developers; it’s reproducible with an example as simple as this one…

curl --location 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-05-20:generateContent?key=<your key>' \
--header 'Content-Type: application/json' \
--data '{
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "get weather"}]
        },
        {
            "role": "model",
            "parts": [{"function_call": {"name": "getWeather","args": {}}}]
        },
        {
            "parts": [
                {"function_response": {"name": "getWeather","response": {"content": "The weather is great"}}}
            ]
        }
    ],
    "tools": [{"function_declarations": [
        {
            "name": "getWeather",
            "description": "Gets the weather",
            "parameters": {
                "type": "object",
                "properties": {}
            }
        }
    ]}]
}'

{
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "functionCall": {
                            "name": "getWeather",
                            "args": {}
                        }
                    }
                ],
                "role": "model"
            },
            "finishReason": "STOP",
            "index": 0
        }
    ],
    "usageMetadata": {
        "promptTokenCount": 50,
        "candidatesTokenCount": 9,
        "totalTokenCount": 116,
        "promptTokensDetails": [
            {
                "modality": "TEXT",
                "tokenCount": 50
            }
        ],
        "thoughtsTokenCount": 57
    },
    "modelVersion": "models/gemini-2.5-flash-preview-05-20",
    "responseId": "RahTaNuiG9y9xN8Pjc_MKQ"
}

I am experiencing the same problem with gemini-2.5-flash-preview-05-20. A simple function call is repeated over and over. In my case, turning off thinking by setting thinkingBudget to 0 made it work, but that should not be required.
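For reference, here is a minimal sketch of that workaround as a raw v1beta request (Python; the generationConfig.thinkingConfig field names follow the public docs as I understand them, and the model name and API key are placeholders for your own setup):

import os
import requests

MODEL = "gemini-2.5-flash-preview-05-20"
URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent?key={os.environ['GEMINI_API_KEY']}"
)

payload = {
    "contents": [{"role": "user", "parts": [{"text": "get weather"}]}],
    "tools": [{"function_declarations": [{
        "name": "getWeather",
        "description": "Gets the weather",
        "parameters": {"type": "object", "properties": {}},
    }]}],
    # Disable thinking for this request; this should not be required, but it
    # stopped the repeated getWeather calls in my case.
    "generationConfig": {
        "thinkingConfig": {"thinkingBudget": 0, "includeThoughts": False}
    },
}

response = requests.post(URL, json=payload, timeout=60)
print(response.json())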

BTW, the getWeather example above DOES WORK for me in stable 2.5 Flash and Pro. My case works in Pro, but not in Flash.

Update: It works with gemini-2.5-flash-lite-preview-06-17

Today I tested the stable gemini-2.5-flash again, and it is even more unstable for function calling compared to its preview version. It seems that the Gemini 2.5 family is extremely weak/useless at function calling. Prompts like “Do not call function repeatedly” or “You must call function only once” do not work anymore.

Of course, with exactly the same input prompt, the GPT family and the Grok family do not have this issue, but they are very expensive. So I hope this bug will be fixed soon.

I found that my code has a problem with the functionResponse message. I set the role of the functionResponse message to “tool”, like the OpenAI API, but the sample code in the documentation uses the “user” role for functionResponse. I think this may affect the infinite function calling… (not sure).

I’ll fix my code and check whether this issue goes away.
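For reference, the corrected functionResponse turn would look roughly like this (a Python dict sketch, reusing the getWeather example from above):

function_response_turn = {
    "role": "user",  # Gemini's docs use "user" here; "tool" is the OpenAI convention
    "parts": [{
        "function_response": {
            "name": "getWeather",
            "response": {"content": "The weather is great"},
        }
    }],
}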