FYI, we were experiencing periodic service outages (429 or 503 errors) on a “pay as you go” plan; it looks like the cause is Google’s new Dynamic Shared Quota (DSQ) “feature”.
DSQ affects the latest versions of Gemini 1.5 Flash (gemini-1.5-flash-002) and Gemini 1.5 Pro (gemini-1.5-pro-002). It distributes on-demand capacity across all queries processed by Google Cloud services for the 1.5-002 and 2.0 models, which means heavy demand within the same region can produce 429 “resource exhausted” errors even for pay-as-you-go accounts (see “Error code 429 | Generative AI on Vertex AI | Google Cloud”).
While exponential backoff helps mitigate this, it’s not a foolproof solution. Currently, the only reliable option for production workloads seems to be migrating to the Vertex API directly and switching regions.
It’s worth noting that reducing input token count also improves success rates. Keeping input under 20k tokens and using exponential backoff has given me a consistent 70-80% success rate in recent tests; inputs over 20k tokens become significantly less reliable (at current region utilization).
I see a couple of other threads about 429s appearing as if you’re on the free tier even when you’re on a paid one, and I’m running into the same issue: 429 Too Many Requests on a paid account, with no signs of going anywhere near quota in the dashboard, while making fewer than 5 RPM, each request only a few hundred tokens. Clearly some people are not having this issue, but I haven’t been able to narrow down anything that changed or that I’m doing wrong.
I have found some improvement by sticking strictly to the Gemini conversation format (Text generation | Gemini API | Google AI for Developers), which uses the multipart shape {"role": "user", "parts": [{"text": "text"}]} instead of the {"role": "user", "content": "text"} shape I was using before.
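For reference, a minimal sketch of that translation: converting OpenAI-style `{"role", "content"}` messages into the Gemini `contents` shape with a `parts` list. The helper name is my own, and this only handles plain-text parts.

```python
def to_gemini_contents(messages):
    """Convert [{"role": ..., "content": ...}] messages into the
    Gemini request shape [{"role": ..., "parts": [{"text": ...}]}]."""
    return [
        {"role": m["role"], "parts": [{"text": m["content"]}]}
        for m in messages
    ]


# Example: a single user turn in each shape.
openai_style = [{"role": "user", "content": "Hello!"}]
gemini_style = to_gemini_contents(openai_style)
# gemini_style == [{"role": "user", "parts": [{"text": "Hello!"}]}]
```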
Got a reply from Google about the 429 error with Gemini 2.0 Flash in Vertex AI
I got the chance to talk to our business partner at Google this week and told him about the 429 quota exceeded error in Vertex AI, even though we are on paid Tier 1 (don’t ask me what the difference is or how to change it) with 2,000 requests per minute. The error appeared after 5 requests…
tl;dr: The quota is not guaranteed, so you should consider purchasing “Provisioned Throughput”, BUT Provisioned Throughput is not supported at launch for Gemini 2.0 Flash (nor are Fine-Tuning, Context Caching, and the Batch API). So we need to wait A COUPLE OF WEEKS for this to be resolved…
My hope is that it’s currently a resource problem that might resolve itself in the next few days, and that they’ll allocate more resources to us. It’s a real bummer, as we were really looking forward to using 2.0 Flash and the results look promising.
Why isn’t anyone from Google replying to these? There are thousands of us dealing with this complete BS issue of “[429 too many requests] resource has been exhausted” after a minuscule number of requests, even on a paid tier. One of the largest companies in the world, and they have the worst console UI: multiple seemingly disconnected portals that are all strangely linked together with no clear explanation of what’s what, and every time I use the site it leaks memory and crashes after 6 GB+ of usage in my browser.
LMAO. PAID users OPENING CASES on Twitter for these 429 errors, while no one from GCP at Google is replying to these “changes” (pay as you go with Provisioned Throughput).
I AGREE WITH Dylan’s opinion from the Lex Fridman podcast earlier this Feb:
Dylan Patel (04:04:27) … if there’s no revenue for AI stuff or not enough revenue, then obviously, it’s going to blow up. People won’t continue to spend on GPUs forever. And NVIDIA is trying to move up the stack with software that they’re trying to sell and licensed and stuff. But Google has never had that DNA of like, “This is a product we should sell.” The Google Cloud, which is a separate organization from the TPU team, which is a separate organization from the DeepMind team, which is a separate organization from the Search team. There’s a lot of bureaucracy here.
Lex Fridman (04:04:52) Wait. Google Cloud is a separate team than the TPU team?
Dylan Patel (04:04:55) Technically, TPU sits under infrastructure, which sits under Google Cloud. But Google Cloud, for renting stuff-
Dylan Patel (04:05:00) … But Google cloud for renting stuff and TPU architecture are very different goals, and hardware and software, all of this, right? The Jax XLA teams do not serve Google’s customers externally. Whereas NVIDIA’s various CUDA teams for things like NCCL serve external customers. The internal teams like Jax and XLA and stuff, they more so serve DeepMind and Search, right? And so their customer is different. They’re not building a product for them.
Lex Fridman (04:05:27) Do you understand why AWS keeps winning versus Azure for cloud versus Google Cloud?
Dylan Patel (04:05:34) Yeah, there’s-
Lex Fridman (04:05:35) Google Cloud is tiny, isn’t it, relative to AWS?
Dylan Patel (04:05:37) Google Cloud is third. Yeah. Microsoft is the second biggest, but Amazon is the biggest, right?
Lex Fridman (04:05:37) Yeah.
from: https://lexfridman.com/deepseek-dylan-patel-nathan-lambert-transcript#chapter17_ai_megaclusters
I’m a paying customer of GCP and I’m getting rate limited after 2 requests to Gemini Flash and other models on Vertex AI. This has been going on for over 2 weeks. I specifically bought a support tier to handle it. After 2 weeks of pointless back and forth, during which I spent many hours testing and documenting, I was informed that I should buy Provisioned Throughput. Yeah, the one that doesn’t exist.
Great, let me build my startup with 2 requests per minute to a small model.
Did I mention that these are ~20-token requests/responses?
It’s outrageous. I just tested the free tier of the public Gemini API and got 7 times more throughput.
Hey everyone, sorry for the trouble. If you’re experiencing issues, please DM me your GCP project number (9-13 digits) and whether you’re using the Developer API or Vertex AI for debugging purposes.
This started happening today: on a paid account, we’re being limited to almost 1 request per minute… And there is no GCP project to report against. GCP customer service is requesting a meeting… This is a production service…
@Vishal We’re also constantly getting 429 errors despite being on a paid plan through the Developer API (which we’ve been using fine for months). I believe this is a new issue on Gemini’s side due to the new Dynamic Shared Quota allocation.
I don’t see any way to DM you on this discuss.ai.google.dev portal. Do you have an email that I can contact? Thanks!