Hi @modaly — we’re seeing the same pattern and I want to add a concrete data point that might help pinpoint what’s going on.
Same model (`gemini-3-flash-preview`) with `tools: [{ googleSearch: {} }]` enabled. Over April 10–11, 2026 we made a few hundred `GenerateContent` calls from a single service. What we saw when we cross-referenced Cloud Monitoring against the `Generate content search query Gemini 3` SKU in the billing console:
- **Actual `GenerateContent` API calls** (from `serviceruntime.googleapis.com/api/request_count` for `generativelanguage.googleapis.com`): **268** total across the two days
- **Billed “search query” count** on SKU `4E4D-442A-64CA`: **30,573** across the same two days
- **Ratio: ~114 billed search queries per `GenerateContent` call** (30,573 / 268), remarkably consistent across both days
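For anyone who wants to run the same check on their own project, the ratio is just billed query units divided by request count. A minimal sketch using our two-day totals (the variable names are mine, not anything from the billing export):

```python
# Two-day totals quoted above (April 10-11, 2026).
api_calls = 268          # GenerateContent requests, from Cloud Monitoring
billed_queries = 30_573  # "search query" units billed on the grounding SKU

queries_per_call = billed_queries / api_calls
print(f"{queries_per_call:.1f} billed search queries per call")  # 114.1
```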
Our prompts are research-style but not unusually complex — single-turn, no agent loop on our side, just one `generate_content` request per user query. Before this, my working mental model was “maybe 3–10 searches per call on a research prompt” — not 100+.
This lines up with what you’re describing: request count did not increase, but grounding cost exploded. If the model is autonomously fanning out to ~100 searches per call, that alone would be enough to explain a ~10–50× cost increase with *flat or decreasing* API traffic. The per-search billing model on Gemini 3 (vs. per-prompt on 2.5) then translates that fanout directly into the bill.
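As a rough sanity check on that range, here is the multiplier relative to the 3–10 searches per call I had been assuming. This is only a sketch: the baseline is my own guess, not a documented figure, and it compares query counts rather than actual invoices.

```python
# Observed fanout vs. the 3-10 searches per call I had assumed beforehand.
observed_per_call = 30_573 / 268   # ~114, from the two-day totals above
assumed_low, assumed_high = 3, 10  # my earlier mental model, not a documented figure

# With per-search billing, grounding cost scales linearly with query count,
# so the cost multiplier equals the query-count multiplier.
multiplier_min = observed_per_call / assumed_high
multiplier_max = observed_per_call / assumed_low
print(f"{multiplier_min:.0f}x to {multiplier_max:.0f}x more billed queries than assumed")
```

For our numbers that works out to roughly 11x to 38x, which sits inside the ~10–50× range reported in this thread.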
Things I’ve already checked and ruled out as mitigations:
- `GoogleSearch()` tool in the Python / JS SDKs takes no parameters
- `dynamicRetrievalConfig` that existed on Gemini 2.5 grounding appears to have been removed or is unsupported on Gemini 3
- No documented `maxGroundingQueries` or equivalent cap
- Docs only say “the model may issue one or more searches” — no upper bound given
Questions I’d love a Google team response on, echoing @modaly and @junkx:
1. Is ~100+ search queries per single `GenerateContent` call within the expected/intended range for `gemini-3-flash-preview`? Or is this a regression / model behavior change that started around mid-March?
2. Is there *any* server- or request-side way to cap the search fanout per call? A replacement for `dynamicRetrievalConfig` on Gemini 3 would be extremely valuable.
3. When grounding fires, are the billed “search queries” always distinct intents, or can they include retries/internal loops that the client can’t see?
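On question 3, one thing other affected teams could do in the meantime is log how many search queries are visible to the client per call and compare that to the billed count. A minimal sketch, assuming the google-genai Python SDK response shape (`candidates[i].grounding_metadata.web_search_queries`); the helper name and the stand-in response are hypothetical, and I don't know whether that field reflects every billed query, which is exactly the open question:

```python
from types import SimpleNamespace


def visible_search_queries(response):
    """Return the search queries exposed in a response's grounding metadata.

    Assumes the google-genai Python SDK response shape:
    response.candidates[i].grounding_metadata.web_search_queries
    """
    queries = []
    for candidate in getattr(response, "candidates", None) or []:
        metadata = getattr(candidate, "grounding_metadata", None)
        queries.extend(getattr(metadata, "web_search_queries", None) or [])
    return queries


# Stand-in object so the sketch runs without an API key; with the real SDK
# you would pass the object returned by client.models.generate_content(...).
fake_response = SimpleNamespace(candidates=[SimpleNamespace(
    grounding_metadata=SimpleNamespace(web_search_queries=["q1", "q2", "q1"]))])

queries = visible_search_queries(fake_response)
print(len(queries), "visible,", len(set(queries)), "distinct")  # 3 visible, 2 distinct
```

If the per-call visible count comes out far below ~114, that would suggest the billed queries include retries or internal loops the client never sees.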
Happy to share more details privately with the Gemini team if helpful. Given this has been open across multiple threads for about a month now, an official acknowledgement or ETA would really help teams like ours plan around it. Thanks!