Model cannot focus on most recent user request when function calling

Hi, I’m currently testing out Gemini function calling in a UX workflow.

The issue I’m having, however, is that the model has a hard time differentiating between previously completed requests and the current user request. Because of this, previous function call requests are executed more than once.

For example, if I start a conversation with “add a box in the center”, the model will usually call the appropriate createRenderable function. If I then ask for the box to be moved or its color changed, the model will sometimes respond correctly. But sometimes it just creates another box in the center. The model doesn’t realize that the function request was in the past.

Has anyone else run up against this with function calling? Or does anyone have links to documentation/blogs/videos related to this, or just thoughts in general? It occurs with Flash 1.5, Flash 2.0, and Pro 1.5. I have tried various system prompt instructions to ameliorate this, with only varying success. And I have more ideas to “fix” it (I can update the original text, marking it as “complete”; maybe that will work). It just seems like a fundamental issue that the Gemini team would have run across.

Thank you!


Try adding state management to your conversation by:

  1. Maintaining a list of executed functions
  2. Including this context in your prompts
  3. Using a system prompt like:
const systemPrompt = `
Track previously executed actions. Current scene state:
- Existing objects: [{id, type, position, color}]
- Last action: {actionType, timestamp}
Only modify existing objects or create new ones when explicitly requested.
`

This helps Gemini understand the current state and avoid duplicate actions. You can also explore using chat history or conversation tokens to maintain context between interactions.
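A minimal sketch of that state-management idea in plain Node (recordAction and buildSystemPrompt are hypothetical helper names, not SDK APIs): record each executed call in a scene-state object and rebuild the system prompt from it on every turn.

```javascript
// Sketch only: track executed function calls and regenerate the
// system prompt from the current scene state on each turn.
const sceneState = { objects: [], lastAction: null };

// Hypothetical handler invoked after each successful function call.
function recordAction(actionType, object) {
  if (actionType === "createRenderable") {
    sceneState.objects.push(object);
  }
  sceneState.lastAction = { actionType, timestamp: Date.now() };
}

function buildSystemPrompt() {
  return [
    "Track previously executed actions. Current scene state:",
    `- Existing objects: ${JSON.stringify(sceneState.objects)}`,
    `- Last action: ${JSON.stringify(sceneState.lastAction)}`,
    "Only modify existing objects, or create new ones when explicitly requested.",
  ].join("\n");
}

recordAction("createRenderable", { id: "box1", type: "box", position: "center", color: "red" });
console.log(buildSystemPrompt());
```

Because the prompt is rebuilt from state each turn rather than appended to, the model never sees stale instructions from earlier turns.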


Hi @KRows , thanks for the quick reply.

I have tried various ways of adding state to the conversation. The problem in the past was that if I included any metadata with an individual prompt entry, the model tried to mimic that metadata instead of just adding text. So I would type “add a box” but the model would see “[metadata here] add a box”, and it then thought its own responses should look like “[hallucinated metadata] their response”. However, now that I mention it, I realize I am trying a new method that constrains function calling mode to ANY and includes a tellUser function, so perhaps this won’t manifest itself.

That said, I see you are including the metadata in the system prompt instead of the chat history. Is this kind of metadata associated with function calling supposed to go in the system prompt?

As for the “chat history”, I am currently using the node front end and have started using startChat instead of generateContent, which allows a history to be included. However, looking at the raw request, this seems to do nothing special in terms of letting the model know of “history” vs “current”.
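For what it’s worth, here is a sketch of the Content[] shape involved (the values are made up): as far as I can tell, the SDK simply prepends the history entries to the contents array of each request, which matches the observation that there is no separate “history” channel in the raw request.

```javascript
// Sketch: history entries and the current turn are all plain Content
// objects; the request just concatenates them in order.
const history = [
  { role: "user", parts: [{ text: "add a box in the center" }] },
  {
    role: "model",
    parts: [
      { functionCall: { name: "createRenderable", args: { type: "box", position: "center" } } },
    ],
  },
];

// A new user turn is simply appended after the history entries:
const contents = [...history, { role: "user", parts: [{ text: "make the box blue" }] }];
console.log(contents.length); // 3
```

So any “past vs. current” signal has to come from the content itself (or the system prompt), not from the transport.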

The context token was my initial thought (I call it a saga id), however this would take me back to the issue of the model wanting to mimic metadata.

So, all of that said, I’ve just been looking again into my other, different-but-related question about how to handle multi-function-call response flows, and I noticed something about the API’s history?: Content[] definition, and specifically the content’s parts: Part[] definition: there is actually a FunctionResponsePart, and it sits in an array.

So maybe my answer to that other question will be my answer to this question. As I mentioned in that other question, I couldn’t figure out how to give responses to the model when there were multiple function call requests spawned, so I was not using the function response mechanism. But it looks like hopefully I will be able to do that now and kill two birds with one fava bean.

Thanks again for your reply @KRows, maybe a rubber ducky moment! Hopefully anyway. I’ll post here and that other thread if that works.

No problem @ibgib, glad to help! :cold_face::ok_hand:


I have some links that may or may not help you so:

Another Topic


Ah, that last topic looks interesting WRT function calling. Especially the OP’s error message after upgrading:

Invalid argument provided to Gemini: 400 Please ensure that function response turn comes immediately after a function call turn. And the number of function response parts should be equal to number of function call parts of the function call turn.

The “And the number of function response parts should be equal to number of function call parts of the function call turn.” is a good clue to answer my question about multiple function calls workflow.

Thanks again :+1:


OK, so I’ve implemented the plumbing that always keeps parity between the FunctionCallParts and FunctionResponseParts in the Content. Now the model is behaving as expected. So it appears that when a FunctionCallPart is not matched with a FunctionResponsePart, the model sometimes believes the function needs to be called again.

FunctionResponsePart quirk

There is also a quirk regarding the response value, which I’m including here to help others get this workflow executing properly. In code (in the @google/generative-ai node package), FunctionResponsePart.functionResponse is typed as follows:

export declare interface FunctionResponse {
    name: string;
    response: object;
}

The request will error out if response is an array, even though in JavaScript an array is an object (typeof [] === 'object'). So FunctionResponse is expecting a POJO, I imagine stemming from the actual API not being JS-specific. So I simply wrapped my raw function call result in a wrapper object, e.g. response: { value: rawResponse }, and the model, AFAICT, understands this. This also wraps raw response values that may be undefined or null.
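A small sketch of that wrapper (toFunctionResponse is a hypothetical helper name, not an SDK function):

```javascript
// Wrap any raw function result (array, primitive, undefined/null) in a
// plain object so FunctionResponse.response is always a POJO.
function toFunctionResponse(name, rawResponse) {
  return {
    functionResponse: {
      name,
      // "response" must be a plain object, not an array or primitive:
      response: { value: rawResponse ?? null },
    },
  };
}

const part = toFunctionResponse("createRenderable", ["box1"]);
console.log(JSON.stringify(part.functionResponse.response)); // {"value":["box1"]}
```

The ?? null also normalizes undefined results, which would otherwise disappear entirely when the part is serialized to JSON.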

summary

So here is a short summary for multi-turn and multi-function calling, and this applies to the contents array when sending a message to the model:

  • a content entry part of type FunctionCallPart must correspond to a FunctionResponsePart.
  • the response content entry must…
    1. immediately follow the function call entry in the contents sent to the model.
    2. match in parts array length and function name in the parts array with the function call entry.
    3. have role set to "function" and NOT "user" as is given in the official example documentation.
    4. functionResponse.response must be a POJO, not merely typeof === 'object'.
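To make those rules concrete, here is a sketch of a contents array for a turn that spawned two function calls (the function names and response values are illustrative, not from my actual app):

```javascript
// A model turn with two function call parts, immediately followed by a
// "function" turn whose parts array matches it in length and names,
// with each response wrapped as a POJO.
const contents = [
  { role: "user", parts: [{ text: "add a red box and a blue sphere" }] },
  {
    role: "model",
    parts: [
      { functionCall: { name: "createRenderable", args: { type: "box", color: "red" } } },
      { functionCall: { name: "createRenderable", args: { type: "sphere", color: "blue" } } },
    ],
  },
  {
    role: "function", // NOT "user"
    parts: [
      { functionResponse: { name: "createRenderable", response: { value: "box1" } } },
      { functionResponse: { name: "createRenderable", response: { value: "sphere1" } } },
    ],
  },
];

// Parity check: same number of parts, matching names, in order.
const calls = contents[1].parts;
const responses = contents[2].parts;
console.log(
  calls.length === responses.length &&
    calls.every((p, i) => p.functionCall.name === responses[i].functionResponse.name)
); // true
```

A parity check like this right before sending the request is an easy way to catch the 400 error from the quoted message before it happens.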

So it looks like a documentation tweak could have helped a lot here: include a multi-function-call, multi-turn example; correct the role for the FunctionResponsePart; and include info on the response “object” issue.