URGENT: The cost of API access has increased since March 16–17, 2026

Hello,

Since March 16–17, 2026, I have noticed a significant increase in the costs for API calls using the gemini-3-flash-preview model. We do not use the caching system or the Google search system.

Our cost calculations (in user credits, UCR) based on the token counts (promptTokenCount, thoughtsTokenCount, and candidatesTokenCount) no longer match the costs displayed in Google AI Studio since March 16–17, 2026. The number of requests is unchanged.
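For context, this is roughly the kind of reconciliation we do: estimate the dollar cost of each response from its usageMetadata token counts and compare it with what AI Studio reports. The per-token rates below are placeholders I made up for illustration, not official Gemini pricing, and the assumption that thinking tokens bill at the output rate is exactly the point in question here:

```python
# Sketch of reconciling billed cost against usageMetadata token counts.
# The rates below are ASSUMED placeholders, not official Gemini pricing.
PROMPT_RATE_PER_M = 0.10   # $ per 1M input tokens (assumed)
OUTPUT_RATE_PER_M = 0.40   # $ per 1M output tokens (assumed); thinking
                           # tokens are billed at the output rate

def estimated_cost(prompt_tokens: int, thoughts_tokens: int,
                   candidates_tokens: int) -> float:
    """Estimate the dollar cost of one response from its usageMetadata counts."""
    input_cost = prompt_tokens * PROMPT_RATE_PER_M / 1_000_000
    output_cost = (thoughts_tokens + candidates_tokens) * OUTPUT_RATE_PER_M / 1_000_000
    return input_cost + output_cost

# Before March 16, thoughtsTokenCount was effectively billed at zero, which
# would explain observed costs being far below this estimate.
cost = estimated_cost(prompt_tokens=2_000, thoughts_tokens=5_000,
                      candidates_tokens=1_000)  # ≈ $0.0026
```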

Before March 16, we had approximately 100k UCR available for a $1 spend.

On March 16, this value dropped to 35k UCR per $1.

From March 17 to today, we are now at 8k UCR for $1.

Details:

March 14: $1 => 100k UCR

March 15: $1 => 100k UCR

March 16: $1 => 35k UCR

March 17: $1 => 8k UCR

March 18: $1 => 8k UCR

March 19: $1 => 8k UCR
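For what it's worth, the figures above work out to roughly a 2.9x cost jump on March 16 and a 12.5x cumulative increase from March 17 onward:

```python
# Effective cost multiplier implied by the UCR-per-dollar figures above.
ucr_per_dollar = {"Mar 14": 100_000, "Mar 16": 35_000, "Mar 17": 8_000}

baseline = ucr_per_dollar["Mar 14"]
multipliers = {day: baseline / ucr for day, ucr in ucr_per_dollar.items()}
# Mar 16: 100000 / 35000 ≈ 2.86x; Mar 17: 100000 / 8000 = 12.5x
```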

Cost chart:

What has happened since March 16 with the gemini-3-flash-preview model?

Change your Grouping filter (top-left) to SKU and see which SKU is eating so much.

Same for me here, on the same date:

For me it's still output tokens, literally a 6x cost increase out of nowhere.

What do you mean? I didn’t understand where to look.

Actually, I get the impression that before March 16 we weren't charged the actual rate for the gemini-3-flash-preview model, but a lower one. I just recalculated the cost from the tokens sent and consumed, and the price today actually seems fair. It was the price displayed before March 16 that was much lower than what we should have paid for this model.

Have you calculated the actual expected cost based on the tokens sent and used, using the current rate schedule?

No, it seems not. When I analyze my SKUs on Google Cloud from March 16th, the Flash 3 model used way more output tokens than it did before, around 4x–5x compared to the previous period. Same thing for the 3.1 Flash Lite preview model.

I didn’t touch any code or any prompt.

Thanks for flagging, we’re looking into this. Could you please DM your project number to help with the investigation?

I have the same problem

(Note that token use yesterday increased 15% compared to March 5, while costs have increased almost 6x.)

(Note 2: I use the gemini-3-flash-preview model.)

My Gemini API costs have suddenly skyrocketed. This is serious.

The same here. Since the 16th of March, the cost increased 4x.

I've had to beg on Twitter about this issue, ongoing for 2 weeks now, tagging Gemini devs, but since I don't have many followers, everyone ignores me :upside_down_face:

oh BTW :upside_down_face: :

Hi, we are investigating the cost surge on gemini-3-flash-preview. Initial analysis shows that requests with default configurations began generating and billing thinking tokens around this timeframe. We will follow up with next steps once this is confirmed.
In the meantime, please explicitly disable thinking in your API calls by setting “thinking”: false in your generation_config.

Thank you. Glad it will be fixed

Can you folks send me your project ids at alicevik@google.com please?

Hi folks,

Thank you for your patience while we investigated the recent cost surge reported on the Gemini 3 Flash model. We have completed our root cause analysis, and we’ve identified that the increase is due to a recently fixed billing bug where we were previously undercharging for certain usage. What you are seeing now is the correct, intended billing amount.

To clarify exactly what changed, here is the nuance behind the behavior:

  1. By default, if you don't explicitly specify any thinking level settings in your request, the Gemini 3 Flash model defaults to the HIGH / dynamic thinking level. This is intended behavior, has been the same since we launched this model, and will continue to be.

  2. However, previously, there was a bug where users who relied on this default behavior (leaving the thinking configuration blank) were receiving those generated thinking tokens for free. On March 16th, we deployed a fix to our billing system. As of that date, default requests are now accurately being billed for the thinking tokens they generate.

We will not be retroactively charging for the tokens that were offered for free, so you don't have to worry about additional charges, but we are also not planning to offer refunds, as we were undercharging.

Your options going forward:
If cost is a strict priority for you, you can adjust the model’s thinking behavior.

I would also like to offer a small correction to previous guidance from Nabila. First, I recommend caution: experiment with other settings before trying to minimize thinking. Gemini models are trained for thinking and deliver better results with thinking enabled, so depending on your specific use case, minimizing thinking may result in lower performance and response quality. You can experiment with different settings (e.g., low thinking, medium thinking), and I would suggest running a small evaluation on your end before pushing this change to production. You can also inspect your thinking token usage to get a better sense of the cost-to-performance tradeoff for your specific configuration.

Second, if you still would like to minimize thinking behavior, you need to explicitly set the thinking level to MINIMAL (it's not a boolean flag). The linked documentation should walk you through this.

I also need to note that you can't actually disable thinking entirely (there is no way to do so, as the model is trained to think), but setting it to MINIMAL limits it as far as possible: in most cases the model won't think, though for some complex tasks like coding it may still generate a few thinking tokens.
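As a rough sketch, a request body pinning the thinking level to MINIMAL might look like the following. The field names here follow the usual generationConfig camelCase convention but are my assumption for this preview model; verify the exact spelling against the linked documentation or your SDK before relying on it:

```python
import json

# Sketch of a REST-style request body that sets the thinking level to
# MINIMAL. Field names ("thinkingConfig", "thinkingLevel") are assumed;
# check the official docs for the exact shape.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Summarize this ticket."}]}
    ],
    "generationConfig": {
        # Note: an enum-valued thinking level, not the boolean
        # "thinking": false suggested earlier in this thread.
        "thinkingConfig": {"thinkingLevel": "MINIMAL"}
    },
}

print(json.dumps(request_body, indent=2))
```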

Thank you again for raising this. Feel free to reach out here or directly to me with questions and feedback; I do my best to read and respond to them.

Thanks,
Ali - Product Lead for Gemini API

Thank you for your investigation

We received an unexpected bill for $13,000 USD on April 17th. We have only integrated Firebase Remote Templates for AI logic into our Android app, which is currently in its early testing phase with zero external users. During our tests, we generated only about 10 pieces of text and images. If this charge cannot be waived, we will be forced to shut down all services, terminate our relationship with Google, and dissolve our studio.

What was supposed to be Firebase AI Remote Templates feels more like ‘Remote Terminate Everything’. This unexpected $13,000 bill is effectively a kill switch for our business.

Hi Adam, please send over your project ID, I’ve listed my email above.