Tier 2 paid project hits 429 RESOURCE_EXHAUSTED on first API request, quota usage shows 0%

Problem

Our project (Paid Tier 2, gemini-2.0-flash) consistently returns HTTP 429 RESOURCE_EXHAUSTED on the very first API request of each execution, starting 2 days ago. GCP Console quota page shows all relevant quotas at 0% usage, but actual calls fail with 429.

What I’ve tested

| Setup | Result |
| --- | --- |
| Original GCP project (Paid Tier 2) + original API key | 429 |
| Same project + brand new API key (no restrictions) | 429 |
| Brand new GCP project (Paid Tier 1) + brand new key | 429 |

All 3 combinations fail with 429 on the first request, which strongly suggests the limit is account-level, not project-level.

Quota status (verified on the project)

For gemini-2.0-flash:

  • RPM Paid Tier 2: 10,000 (0% usage)
  • TPM input Paid Tier 2: 3,000,000 (0% usage)
  • RPD: unlimited (0% usage)

API keys: no IP / Referrer / Application restrictions set.

Reproduction details

  • Caller: Google Apps Script via UrlFetchApp
  • Model: gemini-2.0-flash
  • Per-request token size: ~30,000-50,000
  • First request fails immediately with 429
  • Retries (3x with 20s delay) sometimes succeed; sometimes all 3 fail
  • Same key from local curl works fine (10x burst at 40k tokens, all 200)
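
For anyone trying to reproduce this, here is a minimal sketch of the retry loop described above, written so that the status, headers, and body of every attempt are kept and can be compared against the curl results. It is not the original script: `fetchWithRetry` and `callWithRetryFromAppsScript` are hypothetical names, the request URL and payload are left to the caller, and the fixed 20s delay is swapped for exponential backoff starting at 20s.

```javascript
// Generic retry loop. doFetch and sleep are injected so the same logic can run
// inside Apps Script or in a local test harness.
function fetchWithRetry(doFetch, sleep, maxAttempts, baseDelayMs) {
  const attempts = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = doFetch();
    // Keep status, headers, and body of every attempt for later comparison.
    attempts.push({ attempt: attempt, status: res.status, headers: res.headers, body: res.body });
    if (res.status !== 429) {
      return { response: res, attempts: attempts };
    }
    if (attempt < maxAttempts) {
      // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
      sleep(baseDelayMs * Math.pow(2, attempt - 1));
    }
  }
  // Every attempt returned 429.
  return { response: null, attempts: attempts };
}

// Apps Script adapter (hypothetical helper). url and payload come from the
// caller; muteHttpExceptions stops UrlFetchApp from throwing on a 429 so the
// response body can still be read.
function callWithRetryFromAppsScript(url, payload) {
  return fetchWithRetry(
    function () {
      const r = UrlFetchApp.fetch(url, {
        method: "post",
        contentType: "application/json",
        payload: JSON.stringify(payload),
        muteHttpExceptions: true
      });
      return { status: r.getResponseCode(), headers: r.getAllHeaders(), body: r.getContentText() };
    },
    function (ms) { Utilities.sleep(ms); },
    3,     // attempts, as in the post
    20000  // 20 s base delay, as in the post
  );
}
```

Logging `attempts` with `console.log(JSON.stringify(result.attempts))` after a failed run gives a per-attempt record that can be attached to a support ticket.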

What I’d like to know

  1. Why does a Tier 2 paid project hit 429 immediately when GCP Console shows 0% usage for the corresponding project/model?
  2. Are there account-level quotas that aren’t visible in the per-project quota page?
  3. Why does the same key work from local curl but fail from GAS UrlFetchApp? Is there a different quota bucket or rate limit when called from Google’s own infrastructure?
  4. The 429 response body doesn’t include quotaMetric details. How can I identify the exact quota being violated?
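
On question 4, a small sketch for pulling whatever structured detail the 429 body does carry. It assumes the body follows the usual Google API JSON error shape (`error.status`, `error.message`, `error.details[]` with an `@type` on each entry); the `violations` and `retryDelay` field names are assumptions taken from the `google.rpc` error model, and the response may omit them entirely, as observed here. `summarizeApiError` is a hypothetical helper name.

```javascript
// Summarise a Google-style JSON error body. Returns empty fields rather than
// throwing when the body is not JSON or carries no details.
function summarizeApiError(bodyText) {
  let parsed;
  try {
    parsed = JSON.parse(bodyText);
  } catch (e) {
    return { status: null, message: null, detailTypes: [], violations: [], retryDelay: null, raw: bodyText };
  }
  const err = (parsed && parsed.error) || {};
  const details = Array.isArray(err.details) ? err.details : [];
  // Assumed field: a retry hint such as "20s" on one of the detail entries.
  const retryEntry = details.find(function (d) { return d && d.retryDelay; });
  return {
    status: err.status || null,
    message: err.message || null,
    // Which detail messages are present at all, e.g. a QuotaFailure entry.
    detailTypes: details.map(function (d) { return (d && d["@type"]) || "(untyped)"; }),
    // Assumed field: quota violations, flattened across all detail entries.
    violations: details.flatMap(function (d) {
      return d && Array.isArray(d.violations) ? d.violations : [];
    }),
    retryDelay: retryEntry ? retryEntry.retryDelay : null,
    raw: bodyText
  };
}
```

If `detailTypes` comes back empty for the Apps Script calls, that is worth stating explicitly in a bug report alongside the raw body, since it confirms the server sent no quota identifier rather than the client dropping it.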

Any guidance appreciated.

I think the answer to this is that Google has pretty … communication and is shutting down this model. Lots of users have reported this on Gemini 2.0 Flash and the only solution has been to move to the 3.x series models.

It’s mind-boggling because 2.0 Flash is not supposed to be deprecated yet and they don’t have any actual replacement for its performance/latency, but here we are. Basically they want people on models marked “preview” rather than on models tagged production-stable.

Probably not the answer you’re hoping for, but we’ve seen this behaviour with this exact model and Google has said nothing publicly. This is just based on what I and many other users have experienced, unfortunately.

I have been reporting this exact issue for a month through various channels, yet there has been zero response.

I am at a loss as to what Google is doing. Both the decision to deprecate models and the ongoing maintenance have been handled so poorly. It is deeply disappointing to see a tech giant show such blatant disregard for its users’ workflows and data consistency.

How can we trust Google’s APIs in the future? Look at how they treat their users.