Hello, I’m testing the Multimodal Live API, based on the sample provided in the documentation (Multimodal Live API | Generative AI on Vertex AI | Google Cloud):
import asyncio
from typing import cast

from google import genai
from google.genai.live import AsyncSession

client = genai.Client(
    vertexai=True,
    project="my-project",
    location="my-location"
)
model_id = "gemini-1.5-flash-002"
config = {"response_modalities": ["TEXT"]}


async def main():
    async with client.aio.live.connect(model=model_id, config=config) as session:
        session = cast(AsyncSession, session)
        message = "Hello? Gemini, are you there?"
        print("> ", message, "\n")
        await session.send(input=message, end_of_turn=True)
        async for response in session.receive():
            print(response.text)


if __name__ == "__main__":
    asyncio.run(main())
According to the documentation, there is a limit of three concurrent requests per account. I’d like to understand how concurrency is counted for this limit. When I run the sample code four times in a row, the first three runs work fine, but the fourth fails with the error below. I ran the script sequentially, starting each run only after the previous one had completely finished, so no two sessions should have been open at the same time.
websockets.exceptions.ConnectionClosedError: received 1011 (internal error) Request trace id: 849e20aca9201aad, [ORIGINAL ERROR] generic::resource_exhausted: RESOURCE_EXHAUSTED: Maximum concurrent se; then sent 1011 (internal error) Request trace id: 849e20aca9201aad, [ORIGINAL ERROR] generic::resource_exhausted: RESOURCE_EXHAUSTED: Maximum concurrent se
Am I doing something wrong?
My Environment:
Python 3.11.7
google-genai 0.5.0
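As a possible workaround while the cause is unclear, here is a minimal sketch of retrying with exponential backoff when the error text contains RESOURCE_EXHAUSTED. It assumes that the slot used by a finished session is released with some delay on the server side, which is only my guess and not something the documentation states. The names `is_resource_exhausted` and `run_with_retry` are hypothetical helpers, not part of google-genai, and the string match is based only on the error message shown above.

```python
import asyncio


def is_resource_exhausted(exc: Exception) -> bool:
    # Hypothetical check: matches the text seen in the error message above.
    return "RESOURCE_EXHAUSTED" in str(exc)


async def run_with_retry(make_attempt, retries=3, base_delay=1.0):
    """Await make_attempt(); on a RESOURCE_EXHAUSTED error, wait and try again.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    Any other exception, or the last failed attempt, is re-raised unchanged.
    """
    for attempt in range(retries + 1):
        try:
            return await make_attempt()
        except Exception as exc:
            if not is_resource_exhausted(exc) or attempt == retries:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
```

With the sample above, this would replace the last line with `asyncio.run(run_with_retry(main))`. If the fourth run still fails after several retries, that would suggest the limit is not simply about sessions that are open at the same moment.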