Hi everyone,
I am experiencing a persistent production issue. I recently migrated my agentic workflow from Google AI Studio to Google Vertex AI (GCP) to ensure production-grade stability. My stack consists of the Agno framework, and the application is hosted on Railway.
Despite being on a paid GCP project, I am constantly hitting 429: Resource has been exhausted errors, leaving my customers without service.
Error Log:
[2026-03-16 22:40:11,497: WARNING] Malu returned unexpected content type str for session... "code": 429, "message": "Resource has been exhausted (e.g. check quota).", "status": "RESOURCE_EXHAUSTED"
Key Details:
- Framework: Agno (my main agent, “Malu”, manages complex tasks).
- Infrastructure: Hosted on Railway.
- Context: This started happening more frequently after the migration to Vertex AI.
I need urgent help with:
- Quota Increases: Which specific entries under “Quotas & System Limits” in the GCP Console should I look for to raise the requests-per-minute (RPM) and tokens-per-minute (TPM) limits for Gemini on Vertex AI?
- Environment Conflict: Could the fact that I am running on Railway (a cloud environment) be affecting how Vertex AI throttles my requests?
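For reference, the usual client-side stopgap for a 429 like the one in the log is to retry with exponential backoff and jitter. Below is a minimal sketch of that pattern, not what Agno or the Vertex AI SDK does internally. `ResourceExhausted` and `call_model` are hypothetical placeholders for whatever exception and call the real client exposes, and the delay values are arbitrary assumptions.

```python
import random
import time


class ResourceExhausted(Exception):
    """Hypothetical stand-in for the exception the client raises on a 429."""


def call_with_backoff(call_model, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call call_model(), retrying on ResourceExhausted with full jitter.

    call_model is a hypothetical zero-argument callable wrapping the
    actual agent or model request.
    """
    for attempt in range(max_attempts):
        try:
            return call_model()
        except ResourceExhausted:
            # Out of attempts: let the caller see the error.
            if attempt == max_attempts - 1:
                raise
            # The delay cap doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time in [0, cap] so that
            # concurrent workers do not retry in lockstep.
            sleep(random.uniform(0, cap))
```

This only smooths over short bursts. If the sustained request rate is above the project's quota, retries will keep failing and the quota itself still has to be raised.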
I am at a point where I’ll have to migrate to Claude or OpenAI if I can’t stabilize this, as the complaints from my users are piling up. Any help would be greatly appreciated.
Best regards,