I built a mobile app that uses Nano Banana Pro through the Gemini API (with an API key) to test prompt generation and image responses.
Initial Setup (Testing Phase)
I integrated the Gemini API directly into the frontend, and everything worked perfectly during testing. Response time was under 1 minute per image.
Production Issues
When I moved the app to production, I started facing serious issues. After a few production requests, response time increased to 7–8 minutes per image. This slowdown happened even though the same setup worked fine during testing.
I initially deployed the backend using Firebase App Hosting. After seeing the delays, I switched to another backend provider, but the issue persisted: after a few requests, response times again increased to 7–8 minutes.
It seems like Google might be detecting the request source and throttling requests dynamically, or enforcing some hidden production limits.
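To pin down exactly which request the slowdown starts on, per-request latency can be logged. This is a minimal Python sketch, not the code from my app: `timed_call` and `first_slow_request` are hypothetical helper names, and `fn` stands in for whatever function sends the image request.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def timed_call(fn: Callable[[], T], log: List[float]) -> T:
    """Run fn, append its wall-clock duration (seconds) to log, return its result."""
    start = time.monotonic()
    try:
        return fn()
    finally:
        # Recorded even if fn raises, so failed requests are timed too.
        log.append(time.monotonic() - start)


def first_slow_request(log: List[float], threshold_s: float) -> int:
    """Return the index of the first request slower than threshold_s, or -1 if none."""
    for index, duration in enumerate(log):
        if duration > threshold_s:
            return index
    return -1
```

With a 60-second threshold (the "under 1 minute" seen in testing), `first_slow_request` gives the position of the first request that fell outside the normal range, which makes it easier to compare runs across backend providers.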
Errors Encountered (Gemini API)
After continued usage, I started receiving the following error:
{
  "error": {
    "code": 503,
    "message": "The model is overloaded. Please try again later.",
    "status": "UNAVAILABLE"
  }
}
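Since the message says to try again later, one common way to handle a 503 is to retry with exponential backoff and jitter. The sketch below is illustrative only and is not a confirmed fix for this issue. `call_with_backoff` is a hypothetical helper, `send` stands in for whatever function performs the request and returns an HTTP status code plus a parsed JSON body, and the delay values are assumptions rather than limits documented by Google.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay_s: float = 2.0,   # assumed starting delay
    max_delay_s: float = 60.0,   # assumed cap on a single wait
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send(), retrying on HTTP 503 with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts; return the last 503 as-is
        # Delay ceiling doubles each attempt: 2s, 4s, 8s, ... up to max_delay_s.
        ceiling = min(max_delay_s, base_delay_s * (2 ** attempt))
        # Full jitter: wait a random time up to the ceiling to avoid bursts.
        sleep(random.uniform(0, ceiling))
    return status, body
```

The `sleep` parameter is injectable so the wrapper can be tested without actually waiting.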
Switch to Vertex AI
Due to the instability, I migrated the app to Vertex AI, and similar problems occurred there as well:
Response times again increased to 7–8 minutes, and after a few requests I started receiving 503 errors.
Example error response:
{
  "error": {
    "code": 503,
    "message": "The service is currently unavailable.",
    "status": "UNAVAILABLE"
  }
}
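The Gemini API and Vertex AI error bodies above share the same shape, so a single check can recognize both. This is a rough sketch under that assumption: `is_overloaded` is a hypothetical helper name, and it only looks at the `code` and `status` fields that appear in the two responses shown here.

```python
import json


def is_overloaded(response_text: str) -> bool:
    """Return True if the body matches the 503 / UNAVAILABLE error shape shown above."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return False  # not JSON at all, so not the error shape we expect
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return False
    return error.get("code") == 503 or error.get("status") == "UNAVAILABLE"
```

Counting how often this returns True for each provider would show whether the 503s start at the same point as the 7–8 minute responses.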