URGENT: Huge cache cost increase issue

SKU: E181-DFF8-56CF
I am having a serious problem with cache cost calculation.

As a test, I created 5 cached contents with a 5-second time to live. All of those caches were invalidated/expired after 5 seconds (which is fine).

Cache check on 17 March 2026:

curl -s "https://generativelanguage.googleapis.com/v1beta/cachedContents?key={MY_KEYS}"

output:

{}, meaning no cached contents were registered the whole day.

And yet right now I see HUGE NUMBERS in billing for 17 March 2026: 3,418,811.34 hours for $15.38. I'm 1000% sure I didn't use the cache again.

How can I resolve this and get my money back?

On 16 March I hit the same cache issue, which cost me $35 for nothing.

1 Like

I don't know what has been happening with the API pricing over the last 2 days, but something is really wrong. My costs went up 4x overnight.

2 Likes

My cache cost is still increasing. It looks like something is stuck on the server side.

  1. I have removed all keys.
  2. Zero API calls.
  3. There is still NO cached content in v1beta/cachedContents?key={MY_KEYS}, so I can't delete anything either.

So, how can I stop this background activity?
Or how can I reach tech support?
It is not fun to pay $36 per day for nothing…
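For anyone else stuck here, this is a minimal sketch of how to enumerate and delete whatever cachedContents the API still reports. Assumptions: a GEMINI_API_KEY environment variable and jq installed (in my case the list was already empty, so there was nothing left to delete):

```shell
#!/usr/bin/env bash
# Sketch: list and delete all remaining cachedContents via the REST API.
# Assumes GEMINI_API_KEY is exported and jq is installed.
BASE="https://generativelanguage.googleapis.com/v1beta"

# Extract resource names (e.g. "cachedContents/abc123") from a list response.
extract_names() {
  jq -r '.cachedContents[]?.name // empty'
}

if [ -n "${GEMINI_API_KEY:-}" ]; then
  curl -s "${BASE}/cachedContents?key=${GEMINI_API_KEY}" \
    | extract_names \
    | while read -r name; do
        # DELETE each cache by its full resource name.
        curl -s -X DELETE "${BASE}/${name}?key=${GEMINI_API_KEY}"
      done
fi
```

Note that an empty list response is just `{}`, so the loop simply does nothing in the "phantom cache" situation described above, which is exactly the problem.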

1 Like

My findings:

  1. When I first noticed this billing anomaly, the very first thing I did was use the Gemini API to FIND and DELETE all existing caches. For the last 3 days there have been NO caches in the list whatsoever; every query of the cache list via the API returned completely empty. Yet the charges continued to accumulate over those same 3 days.
  2. The Gemini Cache API showed me an empty list, leaving me completely powerless to stop whatever was consuming my funds.
  3. It is charging me $36 per day; I now assume this is an infinite zombie/phantom cache.
  4. Yesterday, as a last resort, I completely disabled the Gemini API in my Cloud Console, hoping this would force-kill the phantom resource. I will wait a few days to account for the 32-hour propagation delay you mentioned, and then reactivate the API to see if the phantom returns.
1 Like

My project has had the same issue: no increase in token usage but a 20x increase in prices, unfortunately destroying the economics of my business. I hope this is an error, not an unannounced price increase.

2 Likes

Check the Billing Reports in your Google Cloud project.
Group by SKU.
And look for the anomaly.

This is an example of my anomaly (blue). Maybe you will see something similar.
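If you also have Cloud Billing export to BigQuery enabled, the same per-SKU grouping can be pulled with a query instead of the console UI. This is a sketch under assumptions: the project/dataset/table name is a placeholder for your own billing export table, and it requires the bq CLI to be installed and configured.

```shell
# Sketch: sum cost per SKU from a standard Cloud Billing BigQuery export.
# The project/dataset/table below is a PLACEHOLDER -- substitute your own.
QUERY='
SELECT sku.id, sku.description, SUM(cost) AS total_cost
FROM `my-project.billing.gcp_billing_export_v1_XXXXXX`
WHERE usage_start_time >= TIMESTAMP("2026-03-15")
GROUP BY sku.id, sku.description
ORDER BY total_cost DESC'

# Only run if the bq CLI is available and configured.
if command -v bq >/dev/null 2>&1; then
  bq query --use_legacy_sql=false "$QUERY"
fi
```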

1 Like

Here is my chart. Even if it's not related to your cache problem, the API pricing has been a major issue for the past few days. You can see it started on the same day, March 16th.

The day I migrated from 2.5 Flash Lite to 3.1 Flash Lite Preview (dark orange → orange), my Gemini 3 Flash costs (light blue) also exploded: 4x overnight, despite lower token usage.

So I’m hit by two separate problems at once:

  1. Forced migration from 2.5 Flash Lite to 3.1 Flash Lite Preview (2.5 will be deleted on March 31): the output price jumps from $0.34/M to $1.27/M
  2. Unexplained cost explosion on Gemini 3 Flash Preview: nothing changed on my end, yet costs quadrupled

In both cases, the culprit is the same: text output token pricing.

I suspect a silent cost increase on Google's side. Apparently we pay the same (high) output cost whether thinking is enabled or not.

1 Like

As you can see, with no change in usage my pricing went up 25-50x. I hope this is an error, as it renders my product unviable.

1 Like

I also noticed this increase in costs starting on March 16, without using a cache or Google search; I wrote a post about it: URGENT: The cost of API access has increased since March 16-17, 2026

Please change "Group by" to "SKU" (the top-left filter) and check what is eating that much and what it is related to.

Since no one at Google seems to care, the only solution is to make a post and tag them on X.

It's crazy how a spend limit update on March 16th broke the API cost calculation, and it's not the first time. What are you doing at this point…

1 Like

You can report the anomaly to Billing Support.

Further findings:

I am writing to share new data and findings. Here is the second reason why I am absolutely certain that a “phantom” resource was stuck on your side, and why this case needs to be reviewed by your engineering team.

Currently, my Gemini API has been completely Disabled for three days (since March 18). Interestingly, day by day, those massive cache storage charges are now literally disappearing from my billing graphs.

If you look at the pre-disable graphs for March 15-17, the system was eating about $36 per day. However, after I force-disabled the API, the billing system started retroactively “recalculating” the cache costs back to the actual, correct data.

Billing Support mentioned a 32-hour propagation delay, but for some reason this synchronization lag took about 3 days to correct itself. Possibly that is because I disabled the API: the cleanup started ONLY after I DISABLED the API.
Furthermore, in a live production environment, I wouldn’t be able to simply disable the entire API to force the billing system to sync. So please check and fix this issue.

My current graph looks like this (orange).

I really hope your engineers can fix this frightening delay issue for the Gemini Cache API.

1 Like


Still not fixed after 5 days…

Hi everyone,

I’m following up on the ongoing billing and cache issues mentioned in this thread. To help investigate this further, could you provide more details regarding your account setups?

Specifically, please confirm if you are signed in using your individual primary accounts or if you are accessing the API through a static or shared credential. Any additional information about your authentication method and project configuration would be very helpful.

Best regards,

Google LLC

1 Like

Hi,

It's not only a cache issue; Gemini 3 Flash Preview and 3.1 Flash Lite Preview (and maybe more) are also using 4x/5x MORE OUTPUT TEXT tokens overnight. I didn't touch any code or any prompt: same number of calls, same JSON output size.

Other devs are reporting here:


please confirm if you are signed in using your individual primary accounts or if you are accessing the API through a static or shared credential

I don't know what that means. I'm a solo developer with a basic Google Cloud account. I created a project, then linked it in Google AI Studio, where I got my API key.

Despite being on the “Free Tier” for experimentation, I am being hit with “Quota Exceeded” errors after only a few interactions. More alarmingly, the API costs associated with these requests have spiked unnaturally, far exceeding the expected billing for the volume of tokens processed.

Given that these are Preview models, I suspect there is a synchronization error between the experimental model’s token counting and the production billing/quota engine. This behavior is inconsistent with the standard limits previously applied to my account.

I am experiencing this issue too on my production application.
I can confirm my application is using an API key assigned in Google AI Studio to make these requests. A previous answer mentioned that force-disabling the API made the charges disappear, but that is not an option for me at all. The price continues to skyrocket despite request volumes being equal to before March 16th.

Any help would be much appreciated

Thank you for looking into this. I am happy to provide the exact details regarding my setup, as well as a timeline of a test I ran to track this billing anomaly.

Account & Setup Details:

  • Account Type: This is my individual, personal test project. I am the only developer working on it.

  • Authentication Method: I am accessing the Gemini API using a static API Key generated in Google AI Studio. I am not using OAuth for end-users or a Service Account JSON file.

  • Shared Credentials: The API key is strictly private and has never been shared.

  • Tech Stack: Rust backend using the reqwest HTTP client with connection pooling, calling the REST endpoint (generativelanguage.googleapis.com/v1beta).

  • Model: models/gemini-3.1-pro-preview

Cache Test & Logs:
To understand the issue, I ran a strict test using curl. I created a cache with a short TTL (60 seconds) and monitored the API and billing.

1. Creating the cache (TTL: 60s):

% ./cache_create.sh 60s
{
  "name": "cachedContents/frp5xlu6h48e7w2ui9g82imzcoods3e9gb3aonft",
  "model": "models/gemini-3.1-pro-preview",
  "createTime": "2026-03-22T09:34:49.342985Z",
  "updateTime": "2026-03-22T09:34:49.342985Z",
  "expireTime": "2026-03-22T09:35:48.044411163Z",
  "displayName": "",
  "usageMetadata": {
    "totalTokenCount": 4100
  }
}

2. Querying the cache list (Cache exists):

% ./cache_list.sh  
{
  "cachedContents": [
    {
      "name": "cachedContents/frp5xlu6h48e7w2ui9g82imzcoods3e9gb3aonft",
      "model": "models/gemini-3.1-pro-preview",
      "createTime": "2026-03-22T09:34:49.342985Z",
      "updateTime": "2026-03-22T09:34:49.342985Z",
      "expireTime": "2026-03-22T09:35:48.044411163Z",
      "displayName": "",
      "usageMetadata": {
        "totalTokenCount": 4100
      }
    }
  ]
}

3. Querying the cache list after TTL expired (Cache is gone):

% ./cache_list.sh
{}
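For reproducibility, here is roughly what such helper scripts can look like. This is a hedged sketch, not the poster's actual scripts (which are not shown): the model name and text payload are illustrative placeholders, and GEMINI_API_KEY is assumed to be exported.

```shell
#!/usr/bin/env bash
# cache_create.sh (sketch) -- create a cachedContents entry with a given TTL,
# then list the caches (the cache_list.sh equivalent is just the GET below).
# Usage: ./cache_create.sh 60s
TTL="${1:-60s}"

# Build the JSON request body for the cachedContents create call.
build_body() {
  printf '{"model":"models/gemini-3.1-pro-preview","ttl":"%s","contents":[{"role":"user","parts":[{"text":"<large context to cache>"}]}]}' "$1"
}

if [ -n "${GEMINI_API_KEY:-}" ]; then
  # Create the cache with the requested TTL.
  build_body "$TTL" | curl -s -X POST \
    -H 'Content-Type: application/json' \
    -d @- \
    "https://generativelanguage.googleapis.com/v1beta/cachedContents?key=${GEMINI_API_KEY}"

  # List current caches (what cache_list.sh does).
  curl -s "https://generativelanguage.googleapis.com/v1beta/cachedContents?key=${GEMINI_API_KEY}"
fi
```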

Billing Observations Timeline:
After confirming the cache was gone via the API, I checked the billing dashboard every hour. Here is what happened with SKU E181-DFF8-56CF:

  • First 12 hours: The SKU did not appear at all (expected delay).

  • Next 12 hours: The SKU appeared, but showed a massive, highly inflated spike of 2801 hours (incorrect).

  • Next Day (24h+ later): The SKU retroactively auto-corrected itself down to 66.85 hours (correct).

My Hypotheses regarding the issue:
Based on these observations, I have two thoughts on what is happening on your backend:

  1. Initial Miscalculation Bug: for some reason, the billing system initially drastically overestimates the cache duration (showing a huge spike on day 1 every time), and only triggers a reconciliation/correction job the following day to fix the data.

  2. The “Zombie Cache” Scenario (My original issue): In the case where my bill was growing infinitely for 3 days, a cache was expired/deleted, but got stuck in your system. So the billing system assumed it was still active, charging me every hour. It seems the backend only triggered the reconciliation process after I completely Disabled the Gemini API. Once disabled, the system spent the next 3 days (day by day) retroactively correcting the previous 3 days of phantom growth back to the accurate values.

I hope these logs and timeline help your engineering team pinpoint the bug.

After 8 days, still no fix from Google; hopefully the engineers are working on resolving the issue and we’ll get refunds + free credits.

Another dev is reporting the issue:

1 Like