Hello everyone,
I’m seeking insights into recurring 429 (Too Many Requests) errors I’ve been encountering with Gemini models via Vertex AI, even though my Google Cloud Quota dashboard shows negligible usage (<0.1%). While the errors have become rare after my initial mitigations, they still appear occasionally, and I haven’t been able to determine whether this is an instantaneous quota limit or some other rate-limiting behavior.
Technical Environment
- Platform: React Native Expo (client)
- Backend: Firebase Functions v2 (Node.js 22)
- API: Vertex AI (direct integration)
- Models: Gemini 2.5 Flash Lite and Gemini 2.0 Flash Lite
- Region: switched from `europe-west1` to the Global Endpoint after the initial errors
Use Case & Latency Details
- File Processing (Gemini 2.5 Flash Lite): two functions send file URIs from Cloud Storage to Gemini for analysis. Each request typically takes 10–15 seconds.
- Chatbot (Gemini 2.0 Flash Lite): a single function where the first request includes a file URI (latency ~5 s), and subsequent turns are text-only (latency 1–3 s, sometimes near-instant).
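For context, the file-processing requests pass the Cloud Storage URI directly to the model rather than uploading bytes. A minimal sketch of how such a request body is assembled is below; `buildFileRequest` is an illustrative helper of my own (not an SDK function), and the bucket path is a made-up example:

```javascript
// Illustrative helper: builds a generateContent-style request body that
// references a file by its Cloud Storage URI via a fileData part.
// This is a sketch, not part of any SDK.
function buildFileRequest(fileUri, mimeType, prompt) {
  return {
    contents: [
      {
        role: 'user',
        parts: [
          // The file is referenced by URI; Gemini fetches it from Cloud Storage.
          { fileData: { fileUri, mimeType } },
          { text: prompt },
        ],
      },
    ],
  };
}

const req = buildFileRequest(
  'gs://example-bucket/report.pdf', // assumed example path
  'application/pdf',
  'Summarize this document.'
);
```

The actual model invocation (and its latency) then wraps this request object; only the request shape is shown here.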
Implemented Solutions & The “Testing” Anomaly
After encountering the initial 429 errors, I moved all models to the Global Endpoint and implemented a jittered backoff with increasing retry delays (1, 2, 3, then 4 s). This seemed to resolve the issue initially.
However, during a migration to Node.js 22, I deployed the same three functions as new instances with a “-testing” suffix for verification. Surprisingly, I started receiving 429 errors on these “testing” functions even with fewer than 10 concurrent users. After refining the jitter mechanism (randomized increments between 1 and 2 s per attempt) and updating my original production functions with the same logic, the 429s disappeared entirely.
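For reference, the refined retry logic described above looks roughly like this. It is a minimal sketch, assuming retries only on 429 and a delay that grows by a random 1–2 s increment per attempt; `callModel` is a placeholder for the real Vertex AI call, and `stepMs` is a parameter I added so the step size is configurable:

```javascript
// Jittered backoff sketch: each failed attempt adds a random increment of
// between 1x and 2x stepMs to the cumulative delay before the next retry.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withJitteredBackoff(callModel, { maxAttempts = 4, stepMs = 1000 } = {}) {
  let delayMs = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await callModel();
    } catch (err) {
      // Only retry on 429 (Too Many Requests); rethrow anything else,
      // and give up once the attempt budget is exhausted.
      if (err.code !== 429 || attempt === maxAttempts) throw err;
      delayMs += stepMs + Math.random() * stepMs; // + randomized 1-2x step
      await sleep(delayMs);
    }
  }
}
```

Because the increments are randomized per function instance, concurrent cold-started instances retry at slightly different times instead of hammering the endpoint in lockstep, which is presumably why this version behaved better than the fixed 1/2/3/4 s schedule.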
My Questions
- Since my total quota usage is under 0.1%, could the 429s on the newly named functions be caused by cold-start-related concurrency bursts or instantaneous rate limits?
- Is there a known warm-up period for newly created function names or endpoints on the Vertex AI side during which rate limits are more restrictive?
- Beyond the standard Quota dashboard, is there a specific metric area in the Google Cloud Console for monitoring instantaneous throttling (RPM/TPM) specifically for Vertex AI?
I would greatly appreciate any technical insights or experiences from anyone who has encountered similar “new deploy/new name” anomalies.