Model cannot focus on most recent user request when function calling

Hi, I’m currently testing out Gemini function calling in a UX workflow.

The issue I’m having, however, is that the model has a hard time differentiating between previously completed requests and the current user request. Because of this, previous function call requests are executed more than once.

For example, if I start a conversation with “add a box in the center”, the model will usually call the appropriate createRenderable function. If I then ask for the box to be moved or its color changed, the model will sometimes respond correctly. But sometimes it just creates another box in the center. The model doesn’t realize that the function request was in the past.

Has anyone else run up against this with function calling? Or does anyone have links to documentation/blogs/videos related to this, or just thoughts in general? It occurs with Flash 1.5, Flash 2.0, and Pro 1.5. I have tried various system prompt instructions to ameliorate this, with only varying success. And I have more ideas to “fix” it (I can update the original text, marking it as “complete”; maybe that will work). It just seems like a fundamental issue that the Gemini team would have run across.

Thank you!


Try adding state management to your conversation by:

  1. Maintaining a list of executed functions
  2. Including this context in your prompts
  3. Using a system prompt like:
const systemPrompt = `
Track previously executed actions. Current scene state:
- Existing objects: [{id, type, position, color}]
- Last action: {actionType, timestamp}
Only modify existing objects or create new ones when explicitly requested.
`

This helps Gemini understand the current state and avoid duplicate actions. You can also explore using chat history or conversation tokens to maintain context between interactions.
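A minimal sketch of that state-management idea in plain Node (recordAction and buildSystemPrompt are hypothetical helper names, not SDK APIs): record each executed call in a scene-state object and rebuild the system prompt from it on every turn.

```javascript
// Sketch only: track executed function calls and regenerate the
// system prompt from the current scene state on each turn.
const sceneState = { objects: [], lastAction: null };

// Hypothetical handler invoked after each successful function call.
function recordAction(actionType, object) {
  if (actionType === "createRenderable") {
    sceneState.objects.push(object);
  }
  sceneState.lastAction = { actionType, timestamp: Date.now() };
}

function buildSystemPrompt() {
  return [
    "Track previously executed actions. Current scene state:",
    `- Existing objects: ${JSON.stringify(sceneState.objects)}`,
    `- Last action: ${JSON.stringify(sceneState.lastAction)}`,
    "Only modify existing objects, or create new ones when explicitly requested.",
  ].join("\n");
}

recordAction("createRenderable", { id: "box1", type: "box", position: "center", color: "red" });
console.log(buildSystemPrompt());
```

Because the prompt is rebuilt from state each turn rather than appended to, the model never sees stale instructions from earlier turns.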


Hi @KRows , thanks for the quick reply.

I have tried various ways of adding state to the conversation. The problem in the past was that if I included any metadata with an individual prompt entry, the model tried to mimic that metadata instead of just adding text. So I would type “add a box” but the model would see “[metadata here] add a box”, and it then thought its own responses should look like “[hallucinated metadata] their response”. However, now that I mention it, I realize I am trying a new method that constrains function calling mode to ANY and includes a tellUser function, so perhaps this won’t manifest itself.

That said, I see you are including the metadata in the system prompt instead of the chat history. Is this kind of metadata associated with function calling supposed to go in the system prompt?

As for the “chat history”, I am currently using the node front end and have started using startChat instead of generateContent, which allows a history to be included. However, looking at the raw request, this seems to do nothing special in terms of letting the model know of “history” vs “current”.
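For what it’s worth, here is a sketch of the Content[] shape involved (the values are made up): as far as I can tell, the SDK simply prepends the history entries to the contents array of each request, which matches the observation that there is no separate “history” channel in the raw request.

```javascript
// Sketch: history entries and the current turn are all plain Content
// objects; the request just concatenates them in order.
const history = [
  { role: "user", parts: [{ text: "add a box in the center" }] },
  {
    role: "model",
    parts: [
      { functionCall: { name: "createRenderable", args: { type: "box", position: "center" } } },
    ],
  },
];

// A new user turn is simply appended after the history entries:
const contents = [...history, { role: "user", parts: [{ text: "make the box blue" }] }];
console.log(contents.length); // 3
```

So any “past vs. current” signal has to come from the content itself (or the system prompt), not from the transport.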

The context token was my initial thought (I call it a saga id), however this would take me back to the issue of the model wanting to mimic metadata.

So, all of that said, I’ve just been looking again into my other, different-but-related question about how to handle multi-function-call response flows, and I noticed something about the API’s history?: Content[] definition, and specifically the content’s parts: Part[] definition: there is actually a FunctionResponsePart, and it sits in an array.

So maybe my answer to that other question will be my answer to this question. As I mentioned in that other question, I couldn’t figure out how to give responses to the model when there were multiple function call requests spawned, so I was not using the function response mechanism. But it looks like hopefully I will be able to do that now and kill two birds with one fava bean.

Thanks again for your reply @KRows, maybe a rubber ducky moment! Hopefully anyway. I’ll post here and that other thread if that works.

No problem @ibgib, glad to help! :cold_face::ok_hand:


I have some links that may or may not help you so:

Another Topic


Ah, that last topic looks interesting WRT function calling. Especially the OP’s error message after upgrading:

Invalid argument provided to Gemini: 400 Please ensure that function response turn comes immediately after a function call turn. And the number of function response parts should be equal to number of function call parts of the function call turn.

The “And the number of function response parts should be equal to number of function call parts of the function call turn.” is a good clue to answer my question about multiple function calls workflow.

Thanks again :+1:


OK, so I’ve implemented the plumbing that always keeps parity between the FunctionCallParts and FunctionResponseParts in the Content. Now the model is behaving as expected. So it appears that when a FunctionCallPart is not matched with a FunctionResponsePart, the model sometimes believes the function needs to be called again.

FunctionResponsePart quirk

There is also a quirk regarding the response value, which I’m including here to help others get this workflow executing properly. In code (in the @google/generative-ai node package), FunctionResponsePart.functionResponse is typed as follows:

export declare interface FunctionResponse {
    name: string;
    response: object;
}

The request will error out if response is an array, even though in JavaScript an array is an object (typeof [] === 'object'). So FunctionResponse is expecting a POJO, I imagine stemming from the actual API not being JS-specific. So I simply wrapped my raw function call result in a wrapper object, e.g. response: { value: rawResponse }, and the model, AFAICT, understands this. This also wraps raw response values that may be undefined or null.
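A small sketch of that wrapper (toFunctionResponse is a hypothetical helper name, not an SDK function):

```javascript
// Wrap any raw function result (array, primitive, undefined/null) in a
// plain object so FunctionResponse.response is always a POJO.
function toFunctionResponse(name, rawResponse) {
  return {
    functionResponse: {
      name,
      // "response" must be a plain object, not an array or primitive:
      response: { value: rawResponse ?? null },
    },
  };
}

const part = toFunctionResponse("createRenderable", ["box1"]);
console.log(JSON.stringify(part.functionResponse.response)); // {"value":["box1"]}
```

The ?? null also normalizes undefined results, which would otherwise disappear entirely when the part is serialized to JSON.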

summary

So here is a short summary for multi-turn and multi-function calling, and this applies to the contents array when sending a message to the model:

  • a content entry part of type FunctionCallPart must correspond to a FunctionResponsePart.
  • the response content entry must…
    1. immediately follow the function call entry in the contents sent to the model.
    2. match in parts array length and function name in the parts array with the function call entry.
    3. have role set to "function" and NOT "user" as is given in the official example documentation.
    4. functionResponse.response must be a POJO, not merely typeof === 'object'.
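To make those rules concrete, here is a sketch of a contents array for a turn that spawned two function calls (the function names and response values are illustrative, not from my actual app):

```javascript
// A model turn with two function call parts, immediately followed by a
// "function" turn whose parts array matches it in length and names,
// with each response wrapped as a POJO.
const contents = [
  { role: "user", parts: [{ text: "add a red box and a blue sphere" }] },
  {
    role: "model",
    parts: [
      { functionCall: { name: "createRenderable", args: { type: "box", color: "red" } } },
      { functionCall: { name: "createRenderable", args: { type: "sphere", color: "blue" } } },
    ],
  },
  {
    role: "function", // NOT "user"
    parts: [
      { functionResponse: { name: "createRenderable", response: { value: "box1" } } },
      { functionResponse: { name: "createRenderable", response: { value: "sphere1" } } },
    ],
  },
];

// Parity check: same number of parts, matching names, in order.
const calls = contents[1].parts;
const responses = contents[2].parts;
console.log(
  calls.length === responses.length &&
    calls.every((p, i) => p.functionCall.name === responses[i].functionResponse.name)
); // true
```

A parity check like this right before sending the request is an easy way to catch the 400 error from the quoted message before it happens.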

So it looks like a documentation tweak could have helped a lot here: include a multi-function-call, multi-turn example; correct the role for the FunctionResponsePart; and include info on the response “object” issue.