Vertex API: Gemini 2.0 Flash returns 429 errors after minimal traffic

Hi there,

I’m using Gemini 2.0 Flash through the Vertex AI API and getting HTTP 429 (Too Many Requests) errors when scaling up traffic. By “scaling up,” I mean reaching less than 1% of the documented quota of 30,000 online prediction requests per minute per region, i.e. fewer than 300 requests per minute.

Given that we’re operating well below the specified limits, I’m unsure what might be causing these rate limit errors.

The account is on a paid tier and has a billing account linked.

Could someone from Google please take a look?

Project ID: 218588543062

Hi @DanielvA,

Welcome to the forum.

Thank you for flagging this issue. To help us prioritize, please share the entire error message.

Thanks for your response. The error message is:
{ "error": { "code": 429, "message": "Resource exhausted. Please try again later. Please refer to https://clou (truncated…)

Hi @DanielvA,

Vertex AI handles rate limits differently from a fixed per-project quota. By default, all users are on dynamic shared quota, meaning each user can draw on as much capacity as is currently available globally rather than on a reserved allotment. If no capacity is available globally, requests can return 429 errors even at low usage. For guaranteed throughput, users can pre-purchase Provisioned Throughput.
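
Since 429s under dynamic shared quota reflect temporary global capacity rather than a per-project limit, a common client-side mitigation is to retry with exponential backoff and jitter. The snippet below is only a minimal sketch of that idea, not an official recommendation: `ResourceExhausted` is a hypothetical stand-in for whatever exception your client library raises on HTTP 429, and `call_with_backoff`, `request_fn`, and the delay values are illustrative names and defaults chosen for this example.

```python
import random
import time


class ResourceExhausted(Exception):
    """Hypothetical stand-in for the client library's HTTP 429 exception."""


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call request_fn, retrying on ResourceExhausted.

    Waits a random time between 0 and min(max_delay, base_delay * 2**attempt)
    seconds before each retry ("full jitter"), and re-raises the error once
    max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

To use it, wrap the actual model call in a zero-argument function (for example a lambda around your request) and catch your library's real 429 exception in place of the placeholder class. Backoff smooths over short capacity dips; it does not guarantee throughput the way Provisioned Throughput is intended to.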

Thank you!