Hello,
I am using `gemini-1.5-flash-002` with function calling mode set to `ANY`, but the model does not always call the function as expected. Instead, it sometimes returns raw text or JSON, either in place of the function call or alongside it.
For example, I provide the following structured prompt:
```javascript
let prompt = 'You are an AI specialized in analyzing product catalogs in PDF format. Your role is to extract the most relevant category from the provided list, based on the content of the PDF. **You MUST call the function `categorize_pdf_catalog` to return level 3 category IDs.**'
prompt += `
### 🔹 **Context**:
The provided PDF is a catalog of products supplied by a company. You must analyze its content and extract the most relevant products presented in this catalog.
`
prompt += `
### 🔹 **Instructions**:
- Based on the content of the PDF catalog, determine the most relevant level 3 categories.
- Carefully analyze the PDF to identify the products it contains.
- Match these products with the most relevant category from the provided list.
- Do NOT list products that are only present for decorative purposes.
- **You MUST call the function \`categorize_pdf_catalog\` to return the extracted Category IDs in JSON format.**
`
prompt += `
### 🔹 **Category List**:
`
structure.forEach((level1) => {
  prompt += `- ${level1.level1_name}:\n`;
  level1.subcategories.forEach((level2) => {
    prompt += `  - ${level2.level2_name}:\n`;
    level2.sub_subcategories.forEach((level3) => {
      prompt += `    - ${level3.name} (ID: ${level3.id})\n`;
    });
  });
});
prompt += `
### 🔹 **Output Format**:
Return a flat JSON array with relevant level_3_ids:
\`\`\`json
["ID1", "ID2", "ID3"]
\`\`\`
`
```
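The category loop assumes `structure` is an array of level 1 entries, each holding `subcategories`, which in turn hold `sub_subcategories`. A standalone sketch of that rendering with hypothetical sample data: the level 1 and level 2 names are placeholders, only the two level 3 names and IDs are borrowed from the response shown further down, and the two- and four-space indents are an assumption about the intended nesting.

```javascript
// Hypothetical sample data: field names match the prompt loop,
// but the level 1 / level 2 names are placeholders.
const sampleStructure = [
  {
    level1_name: "Example Level 1",
    subcategories: [
      {
        level2_name: "Example Level 2",
        sub_subcategories: [
          { name: "Door, Window & Accessories", id: "969" },
          { name: "Furniture Hardware", id: "1305" },
        ],
      },
    ],
  },
];

// Renders the three-level tree as an indented list, one line per category,
// with each level 3 ID in parentheses (same logic as the prompt loop).
function renderCategoryList(structure) {
  let out = "";
  for (const level1 of structure) {
    out += `- ${level1.level1_name}:\n`;
    for (const level2 of level1.subcategories) {
      out += `  - ${level2.level2_name}:\n`;
      for (const level3 of level2.sub_subcategories) {
        out += `    - ${level3.name} (ID: ${level3.id})\n`;
      }
    }
  }
  return out;
}

console.log(renderCategoryList(sampleStructure));
```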
And I define my function declaration like this:
```javascript
const functionDefinitionLevel3 = {
  name: "categorize_pdf_catalog",
  description: "Analyzes the PDF catalog and determines the most relevant level 3 category IDs.",
  parameters: {
    type: "object",
    properties: {
      level_3_ids: {
        type: "array",
        description: "List of the most relevant level 3 category IDs",
        items: { type: "string", pattern: "^[0-9]+$" },
        maxItems: 3,
      },
    },
    required: ["level_3_ids"],
  },
};
```
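The schema declares `pattern` and `maxItems`, but nothing here establishes whether the model actually honors those keywords, so a client-side check of the returned `args` can be a cheap safeguard. A minimal sketch mirroring the declared constraints; `validateCategorizeArgs` is a hypothetical name.

```javascript
// Mirrors the constraints declared in functionDefinitionLevel3:
// level_3_ids must be an array of digit-only strings with at most 3 items.
// Assumption: the model may not strictly respect `pattern` / `maxItems`,
// so the caller re-checks them.
const ID_PATTERN = /^[0-9]+$/;
const MAX_IDS = 3;

function validateCategorizeArgs(args) {
  if (args === null || typeof args !== "object") {
    return { ok: false, errors: ["args must be an object"] };
  }
  const errors = [];
  const ids = args.level_3_ids;
  if (!Array.isArray(ids)) {
    errors.push("level_3_ids must be an array");
  } else {
    if (ids.length > MAX_IDS) {
      errors.push(`level_3_ids has ${ids.length} items (max ${MAX_IDS})`);
    }
    ids.forEach((id, i) => {
      if (typeof id !== "string" || !ID_PATTERN.test(id)) {
        errors.push(`level_3_ids[${i}] is not a digit-only string`);
      }
    });
  }
  return { ok: errors.length === 0, errors };
}
```

For example, `validateCategorizeArgs({ level_3_ids: ["969", "1305", "2196"] })` returns `{ ok: true, errors: [] }`.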
I configure function calling with:
```javascript
const toolConfig = {
  function_calling_config: {
    mode: 'ANY',
    allowed_function_names: ["categorize_pdf_catalog", "get_level4_categories"],
  },
};
```
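For context, a sketch of how the prompt, the PDF, the declaration and this `toolConfig` might be combined into a REST `generateContent` request body. The snake_case layout (`contents`, `inline_data`, `tools[].function_declarations`, `tool_config`) is an assumption based on the public REST examples, and `buildRequestBody` is a hypothetical helper. It also reports names in `allowed_function_names` that have no declaration in `tools`; with mode `ANY` the model may call any allowed function, and `get_level4_categories` is allowed here.

```javascript
// Hypothetical helper: assembles a generateContent request body.
// Field names follow the snake_case REST JSON layout (an assumption to
// check against the current API reference).
function buildRequestBody(prompt, pdfBase64, functionDeclarations, toolConfig) {
  const declared = new Set(functionDeclarations.map((fd) => fd.name));
  const allowed = toolConfig.function_calling_config.allowed_function_names ?? [];
  // Sanity check only: allowed names with no matching declaration.
  const undeclared = allowed.filter((name) => !declared.has(name));
  return {
    body: {
      contents: [
        {
          role: "user",
          parts: [
            { inline_data: { mime_type: "application/pdf", data: pdfBase64 } },
            { text: prompt },
          ],
        },
      ],
      tools: [{ function_declarations: functionDeclarations }],
      tool_config: toolConfig,
    },
    undeclared,
  };
}
```

With only `functionDefinitionLevel3` passed as the declaration list, `undeclared` would be `["get_level4_categories"]`.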
Expected Behavior:
- Gemini must always return a function call to `categorize_pdf_catalog` with category IDs.
- No raw text or JSON should be returned directly.
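To check a response against this expected behavior automatically, the parts of the first candidate can be inspected. A minimal sketch, assuming the REST response shape used in the examples in this post (`candidates[0].content.parts`); the function name `classifyResponse` and its labels are hypothetical.

```javascript
// Classifies a generateContent response by the kinds of parts in its
// first candidate:
//   "function_call_only": expected call present, no non-empty text parts
//   "mixed":              expected call present alongside text
//   "text_only":          text but no expected call
//   "other":              anything else (no parts, or only other calls)
function classifyResponse(response, expectedName = "categorize_pdf_catalog") {
  const parts = response?.candidates?.[0]?.content?.parts ?? [];
  const hasText = parts.some(
    (p) => typeof p.text === "string" && p.text.trim() !== ""
  );
  const hasExpectedCall = parts.some(
    (p) => p.functionCall && p.functionCall.name === expectedName
  );
  if (hasExpectedCall) return hasText ? "mixed" : "function_call_only";
  return hasText ? "text_only" : "other";
}
```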
Actual Behavior:
- Sometimes, Gemini returns the JSON array as a plain `text` part instead of calling the function:
```json
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "```json\n[\"729\"]\n```"
          }
        ]
      }
    }
  ]
}
```
- Other times, it mixes text and a function call:
```json
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Based on the analysis of the PDF catalog, the most relevant level 3 categories are Door, Window & Accessories (ID: 969), Furniture Hardware (ID: 1305), and Hardware (ID: 2196).\n\n"
          },
          {
            "functionCall": {
              "name": "categorize_pdf_catalog",
              "args": {
                "level_3_ids": ["969", "1305", "2196"]
              }
            }
          }
        ]
      }
    }
  ]
}
```
This is problematic because my system expects only a function call, but the response sometimes contains unexpected raw text.
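Until the cause is clear, the calling code can be made tolerant of both failure modes shown above. A minimal sketch, not a fix for the model behavior itself: it prefers the `functionCall` part, ignores accompanying text, and only falls back to parsing a fenced JSON array from a text part when no call is present. `extractLevel3Ids` is a hypothetical name, and only the first candidate is inspected.

```javascript
// Hypothetical workaround: prefer the function call's args; if no call is
// present, try to parse a JSON array (optionally inside a ```json fence)
// from a text part. Inspects the first candidate only.
function extractLevel3Ids(response, expectedName = "categorize_pdf_catalog") {
  const parts = response?.candidates?.[0]?.content?.parts ?? [];

  // 1. Preferred path: the expected function call, text parts ignored.
  for (const part of parts) {
    if (part.functionCall && part.functionCall.name === expectedName) {
      const ids = part.functionCall.args?.level_3_ids;
      return { ids: Array.isArray(ids) ? ids : [], source: "functionCall" };
    }
  }

  // 2. Fallback: a text part holding a JSON array of strings.
  for (const part of parts) {
    if (typeof part.text !== "string") continue;
    const fenced = part.text.match(/```(?:json)?\s*([\s\S]*?)```/);
    const payload = (fenced ? fenced[1] : part.text).trim();
    try {
      const value = JSON.parse(payload);
      if (Array.isArray(value) && value.every((v) => typeof v === "string")) {
        return { ids: value, source: "text" };
      }
    } catch {
      // Not valid JSON; try the next text part.
    }
  }
  return { ids: [], source: null };
}
```

For the first response above it returns `{ ids: ["729"], source: "text" }`; for the mixed response, `{ ids: ["969", "1305", "2196"], source: "functionCall" }`.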