Gemini 2.5 pro and 1.5 pro APIs take forever to respond if input tokens/min > 15K to 16K

shivanis · April 21, 2025, 7:54pm

15K is far lower than the tokens-per-minute limit listed for Gemini 2.5 Pro and 1.5 Pro on the rate limits page (Rate limits | Gemini API | Google AI for Developers), but API calls stall once the input exceeds roughly 15K tokens per minute. No error or exception is returned; the code just keeps waiting for the Gemini API to return a response.

The code is running in a Docker environment on EC2 (t3.large).
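
For anyone trying to reproduce or work around this, here is a minimal sketch, not taken from the original code, of two client-side helpers: a sliding-window counter to see how many input tokens went out in the last minute, and a wrapper that stops waiting after a deadline instead of hanging. It uses only the Python standard library. The names `TokenWindow` and `call_with_timeout` are hypothetical, and the token counts are assumed to be supplied by the caller.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


class TokenWindow:
    """Tracks input tokens sent during the trailing `window_s` seconds."""

    def __init__(self, window_s=60.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self.events = deque()  # (timestamp, token_count) pairs, oldest first

    def record(self, tokens):
        """Note that `tokens` input tokens were just sent."""
        self.events.append((self.clock(), tokens))

    def used(self):
        """Return the tokens sent within the window, dropping older entries."""
        cutoff = self.clock() - self.window_s
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return sum(count for _, count in self.events)


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a worker thread and stop waiting after timeout_s.

    The worker thread is not killed; it keeps running in the background
    until fn returns. Only the caller is unblocked.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"no response after {timeout_s}s") from None
    finally:
        pool.shutdown(wait=False)
```

A possible use: call `window.record(n)` before each request, log `window.used()` to check whether stalls really begin near 15K tokens per minute, and wrap the request as `call_with_timeout(send_request, 120, prompt)`, where `send_request` stands in for whatever function actually calls the API. A stalled call then surfaces as a `TimeoutError` that can be logged and retried.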

GUNAND_MAYANGLAMBAM · May 18, 2025, 4:35am

Hey @shivanis, just following up: are you still seeing the same issue?
