Persistent High Latency with `gemini-2.5-pro`

We are experiencing significant latency issues with the `gemini-2.5-pro` model when calling the `generate_content` endpoint.

Configuration:

genai_config = {
    'automatic_function_calling': {'disable': True},
    'tool_config': {'function_calling_config': {'mode': 'auto'}},
    # thinking_budget=-1 enables dynamic thinking; include_thoughts returns thought summaries
    'thinking_config': {'thinking_budget': -1, 'include_thoughts': True}
}

Example Call:

response = await client.aio.models.generate_content(
    model="gemini-2.5-pro",
    config=genai_config,
    contents=messages
)

Observed Latency:

  • Request duration: ~7 minutes

  • Usage metadata:

    prompt_token_count: 3736  
    candidates_token_count: 132  
    thoughts_token_count: 150  
    total_token_count: 4018  
    

Dashboard Metrics:

  • Average latency: ~35 seconds
  • 99th percentile latency: 8 minutes

These latency issues have persisted for the past 3–4 days and are affecting nearly all requests at the tail end. Is this a known issue, and is any mitigation currently underway? Has anyone else observed similar behavior?
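
Until the tail latency is resolved, one client-side mitigation (our own workaround, not an official fix) is to cap each request with a timeout and retry, so a stuck call fails fast instead of hanging for minutes:

```python
import asyncio

async def call_with_timeout(make_request, timeout_s=60.0, retries=2):
    """Cancel a request that exceeds timeout_s and retry up to `retries` times.

    `make_request` is a zero-argument callable returning a fresh coroutine,
    e.g. lambda: client.aio.models.generate_content(...).
    """
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(make_request(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == retries:
                raise
            # Brief exponential pause before retrying the timed-out request.
            await asyncio.sleep(2 ** attempt)
```

Note this trades extra token spend on retried requests for bounded user-facing latency.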


Hi @Khachik_Smbatyan,

Thanks for providing your configuration settings. Yes, I have observed the same behavior as well. I will try to find the root cause of this problem and resolve it soon.

In the meantime, please try again in a few days and let me know if you are still observing such latencies.


Hi @Krish_Varnakavi1,
We are still encountering the same issue. Have you had a chance to investigate it?
Thank you in advance.


Facing the same issue: for requests with ~10k tokens, latency reaches 8–10 minutes, completely destroying the user experience in our app.

Also, 429 errors randomly pop up even at 1 RPM.
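
For the sporadic 429s, we smooth them over with jittered exponential backoff. A minimal sketch; which exception class actually signals a rate limit depends on the SDK version, so matching on the message here is an assumption:

```python
import asyncio
import random

async def retry_on_429(make_request, max_attempts=5):
    """Retry a request on rate-limit errors with jittered exponential backoff.

    `make_request` is a zero-argument callable returning a fresh coroutine.
    Any exception whose message mentions 429 is treated as retryable
    (an assumption; check your SDK's actual exception types).
    """
    for attempt in range(max_attempts):
        try:
            return await make_request()
        except Exception as exc:
            if "429" not in str(exc) or attempt == max_attempts - 1:
                raise
            # Backoff: 1s, 2s, 4s, ... plus random jitter to avoid thundering herd.
            await asyncio.sleep(2 ** attempt + random.random())
```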

I'm facing the same issue as well. Any news?