Gemini Multimodal Live API test code sample raises RESOURCE_EXHAUSTED

Hello, I'm testing the Multimodal Live API based on the sample provided in the documentation (Multimodal Live API | Generative AI on Vertex AI | Google Cloud):

import asyncio
from google import genai
from google.genai.live import AsyncSession
from typing import cast

client = genai.Client(
    vertexai=True,
    project="my-project",
    location="my-location"
)
model_id = "gemini-1.5-flash-002"
config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model_id, config=config) as session:
        session = cast(AsyncSession, session)
        message = "Hello? Gemini, are you there?"
        print("> ", message, "\n")
        await session.send(input=message, end_of_turn=True)

        async for response in session.receive():
            print(response.text)

if __name__ == "__main__":
    asyncio.run(main())

According to the documentation, there is a limit of three concurrent requests per account, and I'm curious about the criteria used to define this concurrency. When I run the sample code exactly four times, the first three runs work fine, but on the fourth attempt the following error occurs. (I ran the scripts sequentially, starting each run only after the previous one had completely finished.)

websockets.exceptions.ConnectionClosedError: received 1011 (internal error) Request trace id: 849e20aca9201aad, [ORIGINAL ERROR] generic::resource_exhausted: RESOURCE_EXHAUSTED: Maximum concurrent se; then sent 1011 (internal error) Request trace id: 849e20aca9201aad, [ORIGINAL ERROR] generic::resource_exhausted: RESOURCE_EXHAUSTED: Maximum concurrent se

Am I doing something wrong?

My Environment:
Python 3.11.7
google-genai 0.5.0

Hi @blue_hope, welcome to the forum.

So, basically, when you run the same code multiple times, you're creating a new session on each run. According to the documentation, the maximum is 3 concurrent sessions.

Instead, you can process multiple requests within the same session.

Below is a code sample:

import asyncio
from IPython.display import Markdown, display

async def interact_with_gemini(session, text_input):
    """Sends a message to Gemini and displays the response."""
    display(Markdown(f"**Input:** {text_input}"))

    await session.send(input=text_input, end_of_turn=True)

    # Collect the streamed text chunks for this turn
    response = []
    async for message in session.receive():
        if message.text:
            response.append(message.text)

    display(Markdown(f"**Response >** {''.join(response)}"))

# Establish the connection once, outside the loop
# (top-level await works in a notebook; in a script, put this inside an
# async main() and call asyncio.run(main()))
async with client.aio.live.connect(
    model=MODEL_ID,
    config=config,
) as session:
    # Process multiple requests within the same session
    for i in range(10):
        text_input = "Hello? Gemini, are you there?"
        await interact_with_gemini(session, text_input)

        # Pause between requests without blocking the event loop
        await asyncio.sleep(15)

Thanks!

Hello, thank you for your response.
I've essentially finished the code, but I'm wondering why the number of concurrent sessions does not decrease.
For the line async with client.aio.live.connect(...) as session, I've confirmed that the connection is closed in the SDK's __aexit__ when the context ends, so I expected the session count to decrease by one.
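For reference, here is a minimal sketch of the pattern I'm describing, with an explicit close added just to make the teardown visible (I'm assuming the SDK's AsyncSession exposes a close() coroutine, which is what I understood __aexit__ to be calling):

import asyncio
from google import genai

client = genai.Client(vertexai=True, project="my-project", location="my-location")

async def single_turn(message: str):
    # Each run opens exactly one session and closes it before the script exits,
    # so I would expect the concurrent-session count to drop back down afterwards.
    async with client.aio.live.connect(
        model="gemini-1.5-flash-002",
        config={"response_modalities": ["TEXT"]},
    ) as session:
        await session.send(input=message, end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text)
        # Redundant with the context manager's __aexit__, but makes the
        # intended teardown explicit.
        await session.close()

asyncio.run(single_turn("Hello? Gemini, are you there?"))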

Hey @blue_hope, you are right: there seems to be an issue when we initialize the client using OAuth.

When we use an API key, it works as expected. Thanks for pointing out the issue; I'll escalate it.
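In the meantime, here is a minimal sketch of the API-key initialization path, in case it's useful (this targets the Gemini Developer API rather than Vertex AI; GOOGLE_API_KEY is a placeholder environment variable):

import os
from google import genai

# Client authenticated with an API key (Gemini Developer API) rather than
# Vertex AI / OAuth; substitute your own key-management mechanism.
client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])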