import asyncio
from google import genai
from google.genai.live import AsyncSession
from typing import cast

client = genai.Client(
    vertexai=True,
    project="my-project",
    location="my-location"
)
model_id = "gemini-1.5-flash-002"
config = {"response_modalities": ["TEXT"]}


async def main():
    async with client.aio.live.connect(model=model_id, config=config) as session:
        session = cast(AsyncSession, session)
        message = "Hello? Gemini, are you there?"
        print("> ", message, "\n")
        await session.send(input=message, end_of_turn=True)
        async for response in session.receive():
            print(response.text)


if __name__ == "__main__":
    asyncio.run(main())

According to the documentation, there is a limit of three concurrent requests per account. What criteria are used to define this concurrency? When I run the sample code above four times, the first three runs work fine, but the fourth fails with the error below. I ran the script sequentially, starting each run only after the previous one had completely finished.
websockets.exceptions.ConnectionClosedError: received 1011 (internal error) Request trace id: 849e20aca9201aad, [ORIGINAL ERROR] generic::resource_exhausted: RESOURCE_EXHAUSTED: Maximum concurrent se; then sent 1011 (internal error) Request trace id: 849e20aca9201aad, [ORIGINAL ERROR] generic::resource_exhausted: RESOURCE_EXHAUSTED: Maximum concurrent se
When you run the same script multiple times, each run creates a new session. According to the documentation, the maximum is 3 concurrent sessions.
Instead, you can process multiple requests within the same session.
Here is some sample code:
import asyncio

from IPython.display import Markdown, display


async def interact_with_gemini(text_input):
    """Sends a message to Gemini and displays the response."""
    display(Markdown(f"**Input:** {text_input}"))
    await session.send(input=text_input, end_of_turn=True)
    response = []
    # Collect the streamed chunks for this turn before rendering them
    async for message in session.receive():
        if message.text:
            response.append(message.text)
    display(Markdown(f"**Response >** {''.join(response)}"))


# Establish the connection outside the loop.
# Top-level `async with` works in a notebook cell; in a script, wrap this
# in an async function and run it with asyncio.run().
async with client.aio.live.connect(
    model=model_id,  # same variable as in the question's snippet
    config=config,
) as session:
    # Process multiple requests within the same session
    for i in range(10):
        text_input = "Hello? Gemini are you there?"
        await interact_with_gemini(text_input)
        # Pause between requests without blocking the event loop
        await asyncio.sleep(15)

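If a single process ever needs more than one session at a time, a semaphore is one way to keep it under the limit. This is only a rough sketch: `MAX_SESSIONS`, `SessionGate`, `fake_session` and `demo` are names I made up for illustration, the value 3 is just the figure quoted in this thread, and the fake session stands in for the real `client.aio.live.connect(...)` call so the example runs without credentials. It only guards the sessions opened by this one process; it knows nothing about sessions opened elsewhere or still counted on the server side.

```python
import asyncio
from contextlib import asynccontextmanager

MAX_SESSIONS = 3  # assumption: the limit quoted in this thread


class SessionGate:
    """Caps how many sessions this process holds open at the same time."""

    def __init__(self, limit: int = MAX_SESSIONS):
        self._sem = asyncio.Semaphore(limit)
        self.active = 0  # sessions currently open
        self.peak = 0    # highest value `active` has reached

    @asynccontextmanager
    async def slot(self):
        # Wait here until fewer than `limit` sessions are open.
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                yield
            finally:
                self.active -= 1


async def fake_session(gate: SessionGate, hold: float = 0.01) -> None:
    # Hypothetical stand-in for `async with client.aio.live.connect(...)`.
    async with gate.slot():
        await asyncio.sleep(hold)


async def demo(n: int = 10) -> int:
    # Start n "sessions" at once and report how many overlapped at most.
    gate = SessionGate()
    await asyncio.gather(*(fake_session(gate) for _ in range(n)))
    return gate.peak
```

In real code the body of `fake_session` would open the live connection inside `async with gate.slot():`, so the slot is released only after the connection's own context manager has exited.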
Hello, thank you for your response.
I have essentially finished my code, but I'm wondering why the number of concurrent sessions does not decrease.
In the line async with client.aio.live.connect(...) as session, I've confirmed that the SDK closes the connection in __aexit__ when the context exits, so I expected the session count to drop by one.
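Nothing in this thread confirms why the count stays high after the connection is closed. If the slot is simply released with some delay on the server side (an assumption, not something I have verified), retrying with a backoff may get a script past the error. The sketch below is not part of the SDK: `is_quota_error` and `with_backoff` are hypothetical helpers, and recognizing the failure by the text `RESOURCE_EXHAUSTED` is based only on the traceback quoted earlier.

```python
import asyncio
import random


def is_quota_error(exc: BaseException) -> bool:
    # Assumption: the quota failure is recognizable from the error text,
    # as in the traceback quoted earlier in this thread.
    return "RESOURCE_EXHAUSTED" in str(exc)


async def with_backoff(make_call, retries=5, base_delay=1.0, max_delay=30.0):
    """Await make_call(), retrying with exponential backoff on quota errors."""
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except Exception as exc:
            # Re-raise anything that is not a quota error, or the last failure.
            if not is_quota_error(exc) or attempt == retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little jitter so parallel scripts do not retry in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
```

With the script from the first post, the entry point would become `asyncio.run(with_backoff(main))`, since `main` is a coroutine function that opens a fresh session on each call.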