Gemini 2.0 Async Endpoint leading to 429, but Sync doesn't

I am currently using the Vertex AI API and have synchronous scripts set up and working well. However, when I switch them over to the asynchronous endpoints, I get repeated 429 errors even on my very first call, which makes me wonder whether the models are down or the endpoints are different. My example code is the following.

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(
    vertexai=True,
    http_options=HttpOptions(api_version="v1"),
    location="us-east5",
    project="proj-name",
)

response = await client.aio.models.generate_content(
    model="gemini-2.0-flash",
    ...
)

The exact same code with client.models.generate_content() and the same API account runs smoothly, so I'm confused why I'm getting a 429 RESOURCE_EXHAUSTED error even on my first run, especially since I'm using a semaphore to limit the concurrent async calls as well.
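For reference, the semaphore pattern I'm using looks roughly like the sketch below. The `fetch` coroutine is a stand-in for the real `client.aio.models.generate_content(...)` call (names here are illustrative, not from my actual script):

```python
import asyncio

async def bounded_call(sem: asyncio.Semaphore, coro_fn, *args):
    # Acquire the semaphore so at most N calls are in flight at once.
    async with sem:
        return await coro_fn(*args)

async def main():
    sem = asyncio.Semaphore(3)  # cap concurrency at 3 requests

    async def fetch(i):
        # Placeholder for client.aio.models.generate_content(...);
        # sleeps briefly to simulate network I/O.
        await asyncio.sleep(0)
        return i * 2

    # gather preserves input order, so results line up with the inputs.
    return await asyncio.gather(
        *(bounded_call(sem, fetch, i) for i in range(5))
    )

results = asyncio.run(main())
print(results)  # → [0, 2, 4, 6, 8]
```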

Hi @Tyler_Zhu

Welcome to the forum.

Apologies, I’m confused by your source code.
Why do you specify the http_options parameter? Using Vertex AI implicitly uses the v1 version of the API. However, the use of client.aio suggests that you're experimenting with the latest v1alpha version. Furthermore, AFAIK, streaming is done with the generate_content_stream method.

The HTTP 429 response indicates a quota issue. Try a different model first; if the issue persists, request a quota increase in the Google Cloud console.
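Until the quota is sorted out, you can also make the client tolerate transient 429s with exponential backoff. A minimal sketch, assuming the SDK raises an exception you can catch (here `RateLimitError` is a stand-in for the actual error class, and `flaky` simulates a call that fails twice before succeeding):

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for the SDK error raised on HTTP 429."""

async def with_backoff(fn, retries=4, base=0.01):
    # Retry fn() up to `retries` times, sleeping exponentially longer
    # (with jitter) after each rate-limit failure.
    for attempt in range(retries):
        try:
            return await fn()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries, propagate the error
            await asyncio.sleep(base * 2 ** attempt * (1 + random.random()))

calls = {"n": 0}

async def flaky():
    # Simulated API call: raises 429 twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 RESOURCE_EXHAUSTED")
    return "ok"

result = asyncio.run(with_backoff(flaky))
print(result)  # → ok
```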

Cheers.