Gemini 2.0 Async Endpoint leading to 429, but Sync doesn't

I am currently using the Vertex AI API and have synchronous scripts set up and working well. However, when I switch them over to the asynchronous endpoints, I get repeated 429 errors even on my very first call, which makes me wonder whether the models are down or the endpoints are different. My example code is the following.

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(
    vertexai=True,
    http_options=HttpOptions(api_version="v1"),
    location="us-east5",
    project="proj-name",
)

response = await client.aio.models.generate_content(
    model="gemini-2.0-flash",
    ...
)

The exact same code with client.models.generate_content() and the same API account runs smoothly, so I'm confused why I'm getting a 429 RESOURCE_EXHAUSTED error even on my first run, especially since I'm using a semaphore to limit the concurrent async calls as well.
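For reference, the semaphore pattern I'm using looks roughly like the sketch below. The `fetch` coroutine is a stand-in for the real `client.aio.models.generate_content(...)` call (names here are illustrative, not from my actual script):

```python
import asyncio

async def bounded_call(sem: asyncio.Semaphore, coro_fn, *args):
    # Acquire the semaphore so at most N calls are in flight at once.
    async with sem:
        return await coro_fn(*args)

async def main():
    sem = asyncio.Semaphore(3)  # cap concurrency at 3 requests

    async def fetch(i):
        # Placeholder for client.aio.models.generate_content(...);
        # sleeps briefly to simulate network I/O.
        await asyncio.sleep(0)
        return i * 2

    # gather preserves input order, so results line up with the inputs.
    return await asyncio.gather(
        *(bounded_call(sem, fetch, i) for i in range(5))
    )

results = asyncio.run(main())
print(results)  # → [0, 2, 4, 6, 8]
```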

Hi @Tyler_Zhu

Welcome to the forum.

Apologies, I’m confused by your source code.
Why do you specify the http_options parameter? Using Vertex AI implicitly uses the v1 version of the API. However, the use of client.aio suggests that you're experimenting with the latest v1alpha version. Furthermore, AFAIK, streaming is done with the generate_content_stream method.

The HTTP 429 response indicates a quota issue. Try a different model first; if the issue persists, request a quota increase in the Google Cloud console.
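Until the quota is sorted out, you can also make the client tolerate transient 429s with exponential backoff. A minimal sketch, assuming the SDK raises an exception you can catch (here `RateLimitError` is a stand-in for the actual error class, and `flaky` simulates a call that fails twice before succeeding):

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for the SDK error raised on HTTP 429."""

async def with_backoff(fn, retries=4, base=0.01):
    # Retry fn() up to `retries` times, sleeping exponentially longer
    # (with jitter) after each rate-limit failure.
    for attempt in range(retries):
        try:
            return await fn()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries, propagate the error
            await asyncio.sleep(base * 2 ** attempt * (1 + random.random()))

calls = {"n": 0}

async def flaky():
    # Simulated API call: raises 429 twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 RESOURCE_EXHAUSTED")
    return "ok"

result = asyncio.run(with_backoff(flaky))
print(result)  # → ok
```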

Cheers.