Sudden Cost Spike with gemini-3-flash-preview Despite Decreased Usage (April 2026)

Hi,

I’ve been experiencing an unexpected and significant spike in Gemini API billing starting around April 7, 2026, despite my actual API usage decreasing during the same period.

My setup:

  • Model: gemini-3-flash-preview

  • Google Search Grounding is enabled on all requests

  • No changes to code, traffic, or configuration around the time the spike began

What I observed:

  • Daily costs jumped from a baseline of roughly a few thousand KRW to ~₩150,000–200,000 per day around April 7–10

  • Total cost for April 5 – April 11: ₩694,000

  • API request count did not increase — it actually slightly decreased

  • The cost chart shows a clear and sudden spike that does not correlate with request volume

I’ve reviewed my code and the only model being called is gemini-3-flash-preview with GoogleSearch grounding enabled on every request. There were no deployments or config changes around April 7.

Has anyone else seen a similar pattern recently? Is there any known billing issue or model update for gemini-3-flash-preview around early April 2026?

Any guidance from the Google team would be appreciated.

Thanks

Other devs and I have had a similar issue since March 16th. It’s been almost a month, and even though we have asked the Google team for help several times, we haven’t gotten a clear answer and it still hasn’t been fixed.

So I have to pay 4x as much as before. I don’t know what to do.

you can see these topics:

and

Exactly the same issue here. Costs have exploded, which will render my business unprofitable if not addressed.

Hi, I experienced a very similar issue.
My billing suddenly spiked by about ₩340,000 in just two days, even though my usage pattern stayed about the same.
The increase seems to be coming mainly from “Generate content search query Gemini 3”.
I was also using gemini-3-flash-preview with Google Search Grounding enabled.
Did anyone else see the same issue?

Hi @modaly — we’re seeing the same pattern and I want to add a concrete data point that might help pinpoint what’s going on.

Same model (`gemini-3-flash-preview`) with `tools: [{ googleSearch: {} }]` enabled. Over April 10–11, 2026 we made a few hundred `GenerateContent` calls from a single service. What we saw when we cross-referenced Cloud Monitoring against the `Generate content search query Gemini 3` SKU in the billing console:

- **Actual `GenerateContent` API calls** (from `serviceruntime.googleapis.com/api/request_count` for `generativelanguage.googleapis.com`): **268** total across the two days

- **Billed “search query” count** on SKU `4E4D-442A-64CA`: **30,573** across the same two days

- **Ratio: ~114 internal Google searches per single `GenerateContent` call** — remarkably consistent across both days
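For anyone who wants to run the same sanity check on their own numbers, here is a minimal Python sketch of the ratio above. It uses only the two figures quoted in this post; the function name `searches_per_call` is just an illustrative helper, not part of any SDK.

```python
# Figures quoted in this post (April 10-11, 2026).
generate_content_calls = 268     # Cloud Monitoring request_count
billed_search_queries = 30_573   # billing report, SKU 4E4D-442A-64CA


def searches_per_call(billed_queries: int, api_calls: int) -> float:
    """Average number of billed search queries per GenerateContent call."""
    if api_calls <= 0:
        raise ValueError("api_calls must be positive")
    return billed_queries / api_calls


ratio = searches_per_call(billed_search_queries, generate_content_calls)
print(f"{ratio:.1f} billed searches per call")  # prints "114.1 billed searches per call"
```

Swap in your own request count and billed query count for the same period to see whether your ratio looks similar.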

Our prompts are research-style but not unusually complex — single-turn, no agent loop on our side, just one `generate_content` request per user query. Before this, my working mental model was “maybe 3–10 searches per call on a research prompt” — not 100+.

This lines up with what you’re describing: request count did not increase, but grounding cost exploded. If the model is autonomously fanning out to ~100 searches per call, that fully explains a ~10–50× cost increase with *flat or decreasing* API traffic. The per-search billing model on Gemini 3 (vs. per-prompt on 2.5) then translates that fanout directly into the bill.

Things I’ve already checked and ruled out as mitigations:

- `GoogleSearch()` tool in the Python / JS SDKs takes no parameters

- `dynamicRetrievalConfig` that existed on Gemini 2.5 grounding appears removed / unsupported on Gemini 3

- No documented `maxGroundingQueries` or equivalent cap

- Docs only say “the model may issue one or more searches” — no upper bound given

Questions I’d love a Google team response on, echoing @modaly and @junkx:

1. Is ~100+ search queries per single `GenerateContent` call within the expected/intended range for `gemini-3-flash-preview`? Or is this a regression / model behavior change that started around mid-March?

2. Is there *any* server- or request-side way to cap the search fanout per call? A replacement for `dynamicRetrievalConfig` on Gemini 3 would be extremely valuable.

3. When grounding fires, are the billed “search queries” always distinct intents, or can they include retries/internal loops that the client can’t see?

Happy to share more details privately with the Gemini team if helpful. Given this has been open across multiple threads for ~a month now, an official acknowledgement or ETA would really help teams like ours plan around it. Thanks!

Hi,

In my case it’s related to gemini-3-flash-preview output text tokens increasing 4x overnight, without any changes on my part. It’s not specifically related to the googleSearch tool.

I am still waiting for an answer about my 4x cost increase, which has been going on for almost a month.

I’ve contacted billing support twice, and I’ve also sent messages on this forum and on X to the Google Developers, but the only responses I’ve received are “our team will look into it” with no follow-up, or a generic reply from billing support.

My initial suspicion was also that Google Search Grounding might be the culprit — since it’s enabled on all my requests and the cost per grounding call can vary. However, when I actually checked my grounding call frequency, it had gone down during the same period the costs spiked. So grounding frequency alone doesn’t explain it.

Something seems to have changed on Google’s end around this time — possibly in how grounding tokens are counted or billed, even when the number of grounding calls decreases.

A clear and official response from Google would be greatly appreciated.

Same for us! SKU 4E4D-442A-64CA - Generate content search query Gemini 3 - skyrocketed 10-15x since April 7th on Gemini 3 grounded queries.

@modaly
Source of truth: console.cloud.google.com/billing - Your Project - /reports - then click on grouping by SKU (the first setting above the diagram).

Some important information:
We are using about 250 different Gemini-based agents in our application. As each one of them has function calling on, they cannot use GoogleSearch grounding (and it is not enabled). We have a single agent, exposed as a web_search tool, that has GoogleSearch grounding and does web searches for the other agents if they deem it necessary, which is rarely.

To test, we disabled the tool over the weekend. There was not a single GoogleSearch grounding call from Saturday to Sunday, yet we were billed for roughly 20,000 calls. This corresponds to 4x the total amount of gemini-3-flash-preview + gemini-3.1-flash-lite calls we had in the same amount of time.

The issue: Google billing counts EACH gemini-3-flash-preview and each gemini-3.1-flash-lite call as GoogleSearchGrounding.

As I do not trust Google billing support at this point, here is my workaround, which we have tested and which I can now confirm: if you change models back to gemini-2.5-flash-lite or gemini-2.5-flash, the issue no longer exists.

While this is only a temporary fix and not very satisfying, I suggest it to people who might have bills they would otherwise be unable to afford.

I have contacted @Logan_Kilpatrick via X, as I guess many have, and I trust that he will look into it (I got a short response on Saturday).

From my side, I’ve actually figured out that we never moved the project to our organisation, where we could actually pay for premium support, but we will do that today. I suppose we will get compensation.

What I can also recommend: in Google AI Studio, under Logs and Datasets, turn on the logs (if you are not processing sensitive data) and check your Gemini calls. You will be able to see the total output, including every single web search call. That way you can confirm that the billed web search calls are not actual calls; they do not exist and are an error on Google’s billing side.

Hope my insights helped at all. This shit stressed me out all weekend; we had €1,000 in additional Gemini costs per day during working hours, each day…

Hi folks,

In order to do individual investigations, it would be helpful if you could share your project ID. Feel free to send this via direct message.

As described in the Gemini API pricing documentation, search grounding is free up to 5000 queries per month. The sudden spike described here could be the project going over the 5000 limit, or there could be some other issue, but it will be hard for us to triage without looking at a specific project.
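To check whether a project has simply crossed the free allowance, a rough sketch of the arithmetic could look like the following. It assumes the 5,000 figure quoted above is counted per billed search query and per calendar month. The helper names are hypothetical, and `price_per_1000` is deliberately left as an input to be read from the current pricing page rather than hard-coded.

```python
FREE_QUERIES_PER_MONTH = 5_000  # free allowance quoted in this reply


def billable_queries(total_queries: int, free: int = FREE_QUERIES_PER_MONTH) -> int:
    """Number of search queries above the monthly free allowance."""
    return max(0, total_queries - free)


def estimated_cost(total_queries: int, price_per_1000: float) -> float:
    """Estimated grounding cost; price_per_1000 must come from the pricing page."""
    return billable_queries(total_queries) / 1000 * price_per_1000


# The 30,573 billed queries reported earlier in this thread would leave
# 25,573 queries above the free allowance.
print(billable_queries(30_573))  # prints 25573
```

A project making a few hundred grounded calls a month would stay under the allowance only if each call is billed as a handful of queries, which is why the per-call ratio matters here.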

Hi all, just so you know, we’ve identified the problem and are actively working on it. It is however still very useful to provide your project IDs to help us fix this faster.

thanks for looking into this! I’m not comfortable sharing my project ID here and I don’t have enough trust level to send a DM. Is it possible for you to look up from [feipeng@clarifisg for the project ID or send an email to me on the email address above so I can share the project ID and the investigation on my side?

I enabled my project logs yesterday and the issue is still happening. On average, one Gemini API call triggered ~114 search grounding API calls on the billing side, far above what I’d expect and what I experienced previously. When I look at the logs, it’s only 5 calls (unless the search query fanout is also counted).
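One way to compare logged searches against the bill is to count the queries reported in each response. The sketch below assumes the REST-style response shape, where each candidate carries `groundingMetadata.webSearchQueries`. Whether billing counts exactly these queries is not confirmed anywhere in this thread, and the sample response is hand-written for illustration, not real API output.

```python
from typing import Any


def count_search_queries(response: dict[str, Any]) -> int:
    """Count the search queries reported in one GenerateContent response.

    Assumes the shape candidates[].groundingMetadata.webSearchQueries.
    Candidates without grounding metadata contribute zero.
    """
    total = 0
    for candidate in response.get("candidates", []):
        metadata = candidate.get("groundingMetadata") or {}
        total += len(metadata.get("webSearchQueries") or [])
    return total


# Hand-written sample for illustration only.
sample = {
    "candidates": [
        {"groundingMetadata": {"webSearchQueries": ["query a", "query b", "query c"]}}
    ]
}
print(count_search_queries(sample))  # prints 3
```

Summing this over a day of logged responses and comparing it with the billed count for the same day gives a per-project version of the 5 vs. ~114 discrepancy described above.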

Just sent you an email on that address, feel free to respond with your ID there.

Can I also share my project ID via email?

yes please send it over, alicevik@google.com

hi, just throwing my 2 cents here as well, been seeing insane cost spikes in the Gemini API Spend section of the Dashboard on AI Studio. and the problem is indeed solely with this one model (gemini 3 flash). frankly i think the charts are completely off. it basically says €0 spent until the past few days when all of a sudden they spike like crazy even though my usage has been completely consistent over a good month or two now. im not running any service or whatever; i literally just chat a bit here and there, at most a few tens of thousands of tokens a day, simply because i prefer AI Studio over the Gemini app. im also nowhere near the 5k search queries monthly threshold for sure.

i thought, maybe somehow my API key got leaked, though i find it very unlikely because i dont use it anywhere else. i even deleted it on saturday and made a new one, but the problem persists. im completely refraining from using AI Studio at all today, just to see what will happen.

Hey everyone, due to a misconfiguration of the Gemini API billing system, users of the Google Search grounding tool were billed for more searches than they actually consumed.

This is now fixed going forward and we are in the process of correcting the billed amount for the affected users and will post updates here on what this will look like.

I’ve got many emails and unfortunately I can’t respond to you all, but please don’t worry, we will fix this.

This cost spike (€290 for search queries tied to €10 of LLM queries) was not caught by my spend cap, which still doesn’t show the overage. The spend cap functionality is not working as intended.

Google preemptively charged my card €200, and I was told by a customer service agent that I will get some money back by the end of the month – I was never told how much. @Ali_Cevik, you and the team need to do a much better job of making things right for customers. Google is holding my money and I don’t even have a guarantee of how much I’ll get back. The erroneous charge should be refunded immediately.

I never received a response from you or Logan, nor did I see anyone from Google post about this outside of this forum.

This incident and the failure of the various systems involved has destroyed my trust in Gemini. There’s not even a way for me to switch to pre-paid billing, to prevent overages in the future. The entire Google AI Studio billing and spend functionality is a UX nightmare. You have a lot of work to do to improve things on this front. But given that people have been complaining about this for years, my impression is that Google is such a bureaucratic nightmare that this will never get fully solved.

@Ali_Cevik

Hi Ali, thank you for confirming the billing misconfiguration and for working on corrections for affected users.

I was affected by this issue - my charges spiked overnight on April 17 while I wasn’t actively using the service. I had set a spend cap of $30 SGD and received a cost estimate of ~$5, but was billed over $100 SGD.

I contacted Google Cloud Billing Support (Case #70171096) and received a partial refund. However, I’m concerned about how this was handled:

  1. The support representative never acknowledged the billing bug that you confirmed publicly

  2. The refund was characterized as a “one-time exception” rather than a correction for the known billing system error

  3. My direct questions about the discrepancies were repeatedly avoided during the chat session

This is concerning because if future billing bugs occur, the “one-time exception” framing suggests I may not receive corrections since this was supposedly already a goodwill gesture.

Can you confirm:

  • Are all affected users being told their corrections are billing bug fixes, or are some being told they’re “one-time exceptions”?

  • Should affected users expect their support cases to acknowledge this was a system error?

I want to ensure I have proper documentation in case similar issues occur in the future, especially since my spend cap and cost estimates clearly did not function as expected.

Thank you for any clarification you can provide.

Case ID: 70171096
Billing Account: 019E0D-F3F4B0-BCD907

@Logan_Kilpatrick, please get on this. Why should I keep using Gemini when Google can just hold my money after a huge cost spike that is due to a mistake made by Google’s engineers? Why does the Gemini spend cap not work? Why do I have no option to switch to prepaid billing?