Hey team,
We’ve been running an agentic system on Gemini 3 Flash Preview and wanted to ask about parallel function calling
behavior.
We see from the docs that Gemini supports returning multiple function calls in a single response when they’re
independent. Our framework parses these correctly — if the model returns multiple functionCall parts, we handle them
all and return results. We also explicitly set toolConfig.functionCallingConfig.mode to "AUTO" on every request. But in
practice, Gemini 3 Flash Preview consistently returns one function call per response, even when the calls are clearly
independent.
This adds up fast. A typical turn where the agent reads a couple of files and saves a note turns into 4-5 sequential
API round-trips, each re-sending the full context. On a ~14K token context, that’s 56K-70K input tokens for what could
have been 1-2 round-trips (14K-28K input tokens) with parallel calls.
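
The estimate above can be reproduced with a short sketch. It assumes the context stays at roughly 14K tokens on every round-trip. In practice the context grows as tool results are appended, so the real totals would be somewhat higher.

```python
def input_tokens(context_tokens: int, round_trips: int) -> int:
    """Total input tokens when every round-trip re-sends the full context.

    Simplifying assumption: the context size is constant across round-trips.
    """
    return context_tokens * round_trips


CONTEXT = 14_000  # approximate context size quoted in the post

# One function call per response: 4-5 sequential round-trips.
sequential = [input_tokens(CONTEXT, n) for n in (4, 5)]

# Parallel function calls: 1-2 round-trips.
parallel = [input_tokens(CONTEXT, n) for n in (1, 2)]

print(sequential)  # [56000, 70000]
print(parallel)    # [14000, 28000]
```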
We think we may have found a related cause. There’s an existing report about Gemini 3 Flash Preview inconsistently
generating thought_signature fields for parallel function calls, which causes 400 errors and potential silent data
loss. The report is titled "[Gemini 3 Flash Preview] Inconsistent thought_signature generation in parallel function
calls causes 400 errors and potential silent data loss". If parallel calls are what trigger the signature issues, the
model may have learned to avoid generating parallel calls entirely.
What we’ve verified on our end:
- functionCallingConfig.mode is explicitly set to AUTO on every request
- Our response parser correctly handles multiple functionCall parts (unique IDs, thought_signature passthrough)
- The behavior is consistent across hundreds of turns — zero parallel calls observed
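
To illustrate the kind of check described in the list above, here is a minimal sketch. It is not the actual framework code. It assumes the REST-style JSON shape (candidates, content, parts, functionCall) and that the signature arrives as a thoughtSignature field on the part. The helper name and the sample tool names (read_file, save_note) are hypothetical.

```python
from typing import Any


def extract_function_calls(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect every functionCall part from the first candidate of a response."""
    candidates = response.get("candidates") or []
    if not candidates:
        return []
    parts = (candidates[0].get("content") or {}).get("parts") or []
    calls = []
    for part in parts:
        call = part.get("functionCall")
        if call is None:
            continue  # text or other non-call part
        calls.append({
            "name": call.get("name"),
            "args": call.get("args", {}),
            # Assumed field name; passed through unchanged when present.
            "thought_signature": part.get("thoughtSignature"),
        })
    return calls


# Hypothetical response containing two independent calls in a single turn.
sample = {
    "candidates": [{
        "content": {
            "parts": [
                {"functionCall": {"name": "read_file", "args": {"path": "a.txt"}},
                 "thoughtSignature": "sig-a"},
                {"functionCall": {"name": "save_note", "args": {"text": "hi"}}},
            ]
        }
    }]
}

calls = extract_function_calls(sample)
print(len(calls), [c["name"] for c in calls])
```

Logging len(calls) for each response is one simple way to confirm whether any response ever contains more than one call.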
Questions:
- Is the thought_signature bug causing the model to avoid parallel function calls? If so, is there a timeline for a fix?
- Is there anything else in the request format that encourages batching? System instruction hints, toolConfig options we’re missing?
- Do the Pro models or Gemini 2.5 series parallelize more aggressively, or is this a general limitation?
The cost impact is significant — 3-4x more input tokens than necessary on every tool-using turn. Any guidance would be
huge.
Thanks!
Max