Persistent High Latency with `gemini-2.5-pro`

We are experiencing significant latency issues with the `gemini-2.5-pro` model when calling the `generate_content` endpoint.

Configuration:

genai_config = {
    'automatic_function_calling': {'disable': True},
    'tool_config': {'function_calling_config': {'mode': 'auto'}},
    # thinking_budget=-1 enables dynamic thinking; include_thoughts returns thought summaries
    'thinking_config': {'thinking_budget': -1, 'include_thoughts': True}
}

Example Call:

response = await client.aio.models.generate_content(
    model="gemini-2.5-pro",
    config=genai_config,
    contents=messages
)

Observed Latency:

  • Request duration: ~7 minutes

  • Usage metadata:

    prompt_token_count: 3736  
    candidates_token_count: 132  
    thoughts_token_count: 150  
    total_token_count: 4018  
    

Dashboard Metrics:

  • Average latency: ~35 seconds
  • 99th percentile latency: 8 minutes

These latency issues have persisted for the past 3–4 days and are affecting nearly all requests at the tail end. Is this a known issue, and is any mitigation currently underway? Has anyone else observed similar behavior?
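
Until the tail latency is resolved, one client-side mitigation (our own workaround, not an official fix) is to cap each request with a timeout and retry, so a stuck call fails fast instead of hanging for minutes:

```python
import asyncio

async def call_with_timeout(make_request, timeout_s=60.0, retries=2):
    """Cancel a request that exceeds timeout_s and retry up to `retries` times.

    `make_request` is a zero-argument callable returning a fresh coroutine,
    e.g. lambda: client.aio.models.generate_content(...).
    """
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(make_request(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == retries:
                raise
            # Brief exponential pause before retrying the timed-out request.
            await asyncio.sleep(2 ** attempt)
```

Note this trades extra token spend on retried requests for bounded user-facing latency.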


Hi @Khachik_Smbatyan,

Thanks for providing your configuration settings. Yes, I have observed the same behavior as well. I will try to find the root cause of this problem and resolve it soon.

In the meantime, please try again in a few days and let me know if you are still observing such latencies.


Hi @Krish_Varnakavi1,
We are still encountering the same issue. Have you had a chance to investigate it?
Thank you in advance.


Facing the same issue: for requests with ~10k tokens, latency reaches 8–10 minutes, completely destroying the user experience in our app.

Also, 429 errors randomly pop up even at 1 RPM.
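
For the sporadic 429s, we smooth them over with jittered exponential backoff. A minimal sketch; which exception class actually signals a rate limit depends on the SDK version, so matching on the message here is an assumption:

```python
import asyncio
import random

async def retry_on_429(make_request, max_attempts=5):
    """Retry a request on rate-limit errors with jittered exponential backoff.

    `make_request` is a zero-argument callable returning a fresh coroutine.
    Any exception whose message mentions 429 is treated as retryable
    (an assumption; check your SDK's actual exception types).
    """
    for attempt in range(max_attempts):
        try:
            return await make_request()
        except Exception as exc:
            if "429" not in str(exc) or attempt == max_attempts - 1:
                raise
            # Backoff: 1s, 2s, 4s, ... plus random jitter to avoid thundering herd.
            await asyncio.sleep(2 ** attempt + random.random())
```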

I'm facing the same issue as well. Any news?