Hi folks,
Thank you for your patience while we investigated the recent cost surge reported on the Gemini 3 Flash model. We have completed our root cause analysis, and we’ve identified that the increase is due to a recently fixed billing bug where we were previously undercharging for certain usage. What you are seeing now is the correct, intended billing amount.
To clarify exactly what changed, here is the nuance behind the behavior:
- By default, if you don't explicitly specify a thinking level in your request, the Gemini 3 Flash model defaults to HIGH (dynamic) thinking. This is intended behavior: it has been in place since we launched the model and will remain so.
- However, there was previously a bug where users who relied on this default behavior (leaving the thinking configuration blank) were not billed for the thinking tokens those requests generated. On March 16th, we deployed a fix to our billing system. As of that date, default requests are correctly billed for the thinking tokens they generate.
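To make the billing change concrete, here is a small sketch of how thinking tokens factor into a request's cost. The per-token prices and token counts below are hypothetical, purely for illustration; check the official pricing page for real rates.

```python
def estimate_cost(prompt_tokens: int, output_tokens: int, thinking_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Estimate request cost in USD. Thinking tokens are billed at the
    output-token rate; before the March 16th fix, default requests were
    effectively billed as if the thinking term were zero."""
    return (prompt_tokens * input_price
            + (output_tokens + thinking_tokens) * output_price)

# Hypothetical prices (USD per token) and counts, for illustration only.
before_fix = estimate_cost(1_000, 500, 0, 1e-7, 4e-7)      # thinking not billed
after_fix = estimate_cost(1_000, 500, 2_000, 1e-7, 4e-7)   # thinking now billed

print(f"before: ${before_fix:.4f}, after: ${after_fix:.4f}")
```

The only thing that changed on March 16th is that the `thinking_tokens` term is no longer dropped; the rates themselves did not change.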
We will not retroactively charge for the tokens that were served for free, so you don't have to worry about back-charges. We are also not planning to offer refunds, since the bug caused undercharging rather than overcharging.
Your options going forward:
If minimizing cost is a strict priority for you, you can adjust the model's thinking behavior.
I'd also like to offer a small correction to previous guidance from Nabila. First, I recommend caution before minimizing thinking: Gemini models are trained for thinking and deliver better results with it enabled, so depending on your specific use case, minimizing thinking may lower performance and response quality. Experiment with different settings (e.g., low or medium thinking levels), and run a small evaluation on your end before pushing any change to production. You can also inspect your thinking token usage to get a better sense of the cost-to-performance tradeoff for your specific configuration.
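As a starting point for that inspection, here is a sketch of computing what share of your completion tokens are thinking tokens. The field names follow my understanding of the API's usage metadata (`thoughts_token_count` / `candidates_token_count` in the Python SDK's snake_case form); verify them against the current docs, and note the stub below stands in for a real response object.

```python
from types import SimpleNamespace

def thinking_share(usage) -> float:
    """Fraction of billed completion tokens that were thinking tokens.
    `usage` mirrors the shape of a response's usage metadata."""
    thinking = usage.thoughts_token_count or 0
    visible = usage.candidates_token_count or 0
    total = thinking + visible
    return thinking / total if total else 0.0

# With the google-genai SDK you would pass response.usage_metadata here;
# a stub stands in for a live API response in this sketch.
usage = SimpleNamespace(thoughts_token_count=1500, candidates_token_count=500)
print(f"{thinking_share(usage):.0%} of completion tokens were thinking tokens")
```

Tracking this ratio across your real traffic tells you how much of the cost change comes from thinking tokens specifically, and whether a lower thinking level is worth evaluating.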
Second, if you still want to minimize thinking behavior, you need to explicitly set the thinking level to MINIMAL (it's a level, not a boolean flag). The linked documentation walks you through this.
I also need to note that you can't disable thinking entirely; the model is trained to think, so there's no way to turn it off. Setting the level to MINIMAL limits thinking as much as possible: in most cases the model won't think at all, but for some complex tasks like coding, it may still generate some thinking tokens.
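For reference, here is roughly what the request looks like with the thinking level set explicitly. This is a sketch of a REST-style request body; the exact field names (`generationConfig.thinkingConfig.thinkingLevel`) are my best understanding, so treat the linked documentation as authoritative.

```python
# Hypothetical REST request body for generateContent; verify field names
# against the current API docs before relying on this shape.
request_body = {
    "contents": [{"parts": [{"text": "Summarize this ticket."}]}],
    "generationConfig": {
        # A level, not a boolean flag: MINIMAL rather than "thinking: false".
        "thinkingConfig": {"thinkingLevel": "MINIMAL"},
    },
}
```

Leaving `thinkingConfig` out entirely gives you the HIGH/dynamic default described above, which is now billed for the thinking tokens it generates.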
Thank you again for raising this. Feel free to reach out here or to me directly with questions and feedback; I do my best to read and respond to all of them.
Thanks,
Ali - Product Lead for Gemini API