Issue with Gemini-1.5-Flash-002 Function Calling (Mode: ANY) – Sometimes Returns Text Instead

Hello,

I am using gemini-1.5-flash-002 with function calling mode set to “ANY”, but the model does not always call the function as expected. Instead, it sometimes returns raw text or JSON without calling the function explicitly.

For example, I provide the following structured prompt:

let prompt = 'You are an AI specialized in analyzing product catalogs in PDF format. Your role is to extract the most relevant category from the provided list, based on the content of the PDF.  **You MUST call the function `categorize_pdf_catalog` to return level 3 category IDs.**'

prompt += `
### 🔹 **Context**:
The provided PDF is a catalog of products supplied by a company. You must analyze its content and extract the most relevant products presented in this catalog.
`

prompt += `
### 🔹 **Instructions**:
- Based on the content of the PDF catalog, determine the most relevant level 3 categories.
- Carefully analyze the PDF to identify the products it contains.
- Match these products with the most relevant category from the provided list.
- Do NOT list products that are only present for decorative purposes.
- **You MUST call the function \`categorize_pdf_catalog\` to return the extracted Category IDs in JSON format.**
`

prompt += `
### 🔹 **Category List**:
`
structure.forEach((level1) => {
  prompt += `- ${level1.level1_name}:\n`;
  level1.subcategories.forEach((level2) => {
    prompt += `  - ${level2.level2_name}:\n`;
    level2.sub_subcategories.forEach((level3) => {
      prompt += `    - ${level3.name} (ID: ${level3.id})\n`;
    });
  });
});

prompt += `
### 🔹 **Output Format**:
Return a flat JSON array with relevant level_3_ids:

\`\`\`json
["ID1", "ID2", "ID3"]
\`\`\`
`

And I define my function call like this:

const functionDefinitionLevel3 = {
  name: "categorize_pdf_catalog",
  description: "Analyzes the PDF catalog and determines the most relevant level 3 category IDs.",
  parameters: {
    type: "object",
    properties: {
      level_3_ids: {
        type: "array",
        description: "List of the most relevant level 3 category IDs",
        items: { type: "string", pattern: "^[0-9]+$" },
        maxItems: 3,
      },
    },
    required: ["level_3_ids"],
  },
};

I configure function calling with:

const toolConfig = {
  function_calling_config: {
    mode: 'ANY',
    allowed_function_names: ["categorize_pdf_catalog", "get_level4_categories"],
  },
};

Expected Behavior:

  • Gemini must always return a function call to categorize_pdf_catalog with category IDs.
  • No raw text or JSON should be returned directly.

Actual Behavior:

  1. Sometimes, Gemini returns plain JSON instead of calling the function:
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "```json\n[\"729\"]\n```"
          }
        ]
      }
    }
  ]
}
  2. Other times, it mixes text and a function call:

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Based on the analysis of the PDF catalog, the most relevant level 3 categories are Door, Window & Accessories (ID: 969), Furniture Hardware (ID: 1305), and Hardware (ID: 2196).\n\n"
          },
          {
            "functionCall": {
              "name": "categorize_pdf_catalog",
              "args": {
                "level_3_ids": ["969", "1305", "2196"]
              }
            }
          }
        ]
      }
    }
  ]
}

This is problematic because my system expects only a function call, but the response sometimes contains unexpected raw text.

Hi @Baptiste_Richetin,

Lowering the temperature parameter (e.g., to 0.1 or 0) might reduce the variability in the model’s responses and make it more likely to call the function. However, it’s not a guaranteed solution with mode: ANY.
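
For reference, a minimal sketch of what that could look like, assuming your request accepts a `generationConfig` object the way the Gemini API does (the variable name is only illustrative, and where you pass it depends on your client):

```javascript
// Assumption: this object is sent alongside your tools and toolConfig,
// either in the request body or when the model is set up.
const generationConfig = {
  temperature: 0, // lower values reduce sampling variability
};
```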

As a workaround, you could implement logic in your code to:

  • Check if the response contains a functionCall object.
  • If it does, extract the function name and arguments.
  • If it doesn’t, handle the raw text/JSON output appropriately (e.g., by attempting to parse it as JSON and extracting the category IDs).
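
The steps above could be sketched roughly like this. It assumes the response has the same shape as the examples in your post (`candidates[0].content.parts`); `extractLevel3Ids` is a hypothetical helper name, not part of any SDK, and you would want to adapt the fallback to whatever text the model actually returns for you:

```javascript
// Hypothetical helper: pulls level 3 IDs out of a response shaped like the
// examples in this thread (candidates[0].content.parts).
function extractLevel3Ids(response) {
  const parts = response?.candidates?.[0]?.content?.parts ?? [];

  // 1. Prefer an explicit function call and ignore any text parts around it.
  const call = parts.find(
    (p) => p.functionCall && p.functionCall.name === "categorize_pdf_catalog"
  );
  if (call) {
    const ids = call.functionCall.args?.level_3_ids;
    if (Array.isArray(ids)) return ids.map(String);
  }

  // 2. Fall back to text parts: strip an optional Markdown code fence,
  //    then try to parse what is left as JSON.
  for (const part of parts) {
    if (typeof part.text !== "string") continue;
    const cleaned = part.text
      .replace(/^\s*`{3}(?:json)?\s*/i, "")
      .replace(/\s*`{3}\s*$/, "");
    try {
      const parsed = JSON.parse(cleaned);
      if (Array.isArray(parsed)) return parsed.map(String);
      if (Array.isArray(parsed?.level_3_ids)) {
        return parsed.level_3_ids.map(String);
      }
    } catch {
      // Not JSON (e.g. an explanatory sentence); try the next part.
    }
  }

  // 3. Nothing usable was found; the caller can decide whether to retry.
  return null;
}
```

Returning `null` rather than throwing keeps the retry decision in your own code, so a response with only free text can simply trigger another request.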

In cases where your calls fail, this workaround might help.

Cheers!