Any tip to reduce response time and allow more users at the same time without crashing?

Hi everyone,

I’m a PhD student who developed an app that works with the Gemini API through Google AI Studio for my research, thanks to Google Education Teams credit (I’m on paid Tier 1). I know basically nothing about coding or game development, but Google AI Studio allowed me to build a pretty cool AI-narrated interactive role-playing adventure game (D&D-like) for my students. The only two main problems I have so far are:

  • If more than 5 students are playing at the same time, the game crashes or doesn’t even start (error 500). I can have only 4-5 people connected and making requests at most, while I need at least 10 users connected and playing simultaneously.
  • The AI response time is way too long. The aim of the app is to improve students’ French oral skills, so the conversation between the user and the AI narrator (plus Imagen generating an illustration based on the new narration each time) has to be fluid. My students have to wait 1 or 2 minutes between each short oral response they give, which kills the immersion, frustrates them, and makes me lose data, since they spend more time waiting for Gemini to answer than actually interacting with the app or speaking French.

How could I improve that? As I said, I don’t know anything about coding, the Gemini API, or anything else that could help me understand where the problem comes from by looking at the metrics, so any tip is appreciated.

Hi @Anna_Laura,

For the error 500, you’re likely hitting the default API rate limit (60 requests/minute). Request a quota increase for the Gemini API in your Google Cloud Console to support more students (e.g., request 120 or 180 requests/minute to support 10+ users).
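While waiting for a quota increase, it may also help to retry a failed request after a short, growing pause instead of letting the game crash. The sketch below is only an illustration of that idea: `call_with_backoff` and its parameters are hypothetical names, not part of the Gemini SDK, and it assumes your API call can be wrapped in a function that raises an exception when the request fails.

```python
import random
import time


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request(); if it fails, wait about 1s, 2s, 4s, ... and try again.

    `request` is any zero-argument function that performs one API call.
    Assumption: every exception is treated as retryable. In a real app you
    would probably retry only rate-limit and server errors.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Exponential delay plus a little random jitter, so that ten
            # students do not all retry at exactly the same moment.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)
```

With the call from AI Studio this could look like `call_with_backoff(lambda: model.generate_content(prompt))`, where `model` and `prompt` come from your existing code.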

To fix the slow responses, have the text come back word by word: if you are calling the API directly, look for a “streaming” option. If you are working with a code snippet from AI Studio, modify the generation call: response = model.generate_content(prompt, stream=True)
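To make the streaming suggestion more concrete, here is a minimal sketch of how the streamed response could be displayed piece by piece. `show_stream` is a hypothetical helper, and the sketch assumes each streamed chunk exposes its text as a `.text` attribute, as the Python SDK's streamed chunks do; the model name in the usage comment is also an assumption.

```python
import sys


def show_stream(chunks, write=sys.stdout.write):
    """Display each piece of text as soon as it arrives; return the full text."""
    parts = []
    for chunk in chunks:
        text = getattr(chunk, "text", "")
        if text:
            write(text)          # the student sees this piece immediately
            parts.append(text)
    return "".join(parts)


# Hypothetical usage with the call from the reply above
# (needs an API key; the model name is an assumption):
#
#   import google.generativeai as genai
#   genai.configure(api_key="YOUR_API_KEY")
#   model = genai.GenerativeModel("gemini-1.5-flash")
#   response = model.generate_content(prompt, stream=True)
#   narration = show_stream(response)
```

The total generation time stays about the same, but the student starts reading after the first chunk instead of after the last one.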

To separate text and image generation, change your app’s logic to generate the text narration first and display it to the student; once the text is displayed, make a separate API call in the background to generate the image with Imagen. This way, the student can start reading and preparing their response while the image loads, rather than waiting for both to finish.
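The text-first, image-later flow described above could be sketched roughly like this. All four function arguments are placeholders for whatever your app already does (calling Gemini, calling Imagen, updating the screen); `play_turn` itself is a hypothetical name, and a background thread is just one possible way to run the image call without blocking.

```python
import threading


def play_turn(generate_text, generate_image, show_text, show_image):
    """Show the narration right away, then build the image in the background."""
    narration = generate_text()   # wait only for the text
    show_text(narration)          # the student can start reading now

    def image_job():
        # Runs in a separate thread, so the slower image call
        # does not hold up the narration.
        show_image(generate_image(narration))

    worker = threading.Thread(target=image_job, daemon=True)
    worker.start()
    return worker                 # call worker.join() if you need to wait
```

If the image call fails or is slow, the student still has the narration and can keep talking, which matters more for oral practice than the illustration.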