Gemini 2.5 Pro accessed over https://generativelanguage.googleapis.com/v1beta/openai/ has a dramatic latency increase

I’m using Gemini 2.5 Pro extensively in an agentic workflow that accesses the model through the OpenAI compatibility endpoint (so that the model can be used with the OpenAI Agents SDK).
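
For context, the compatibility approach means pointing the standard OpenAI client at the Gemini endpoint, roughly like the sketch below (the API key handling and the prompt are placeholders, not my actual workflow):

from openai import OpenAI

# Standard OpenAI client pointed at Google's OpenAI-compatible endpoint
client = OpenAI(
    api_key="GEMINI_API_KEY",  # placeholder, use your actual Gemini API key
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Explain how AI works"}],
)
print(response.choices[0].message.content)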

Since yesterday (26.06), model latency has increased significantly: requests that previously took 10-15 seconds now take multiple minutes, leading to timeouts and retries, and the retries also fail in more than half of the cases.

The timing of the issue lines up with Logan Kilpatrick’s message on X about redirecting requests for older 2.5 Pro model versions to the current one: https://x.com/OfficialLoganK/status/1937286302614929409

Switching to any of the preview versions of Gemini 2.5 Pro didn’t help.

gemini-2.5-flash doesn’t seem to be affected and works as expected under the same circumstances, but it is not an option for my particular agentic workflow because of its lower capabilities.

I haven’t tried accessing the Gemini API directly instead of using the compatibility approach, so I cannot say whether the Pro model has acceptable latency in that case.

Anyone else facing the same?

These are the actual latency numbers for 2.5-pro
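
For anyone who wants to reproduce such measurements, here is a minimal timing sketch against the same compatibility endpoint (the prompt is just a placeholder):

import time

from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",  # placeholder
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

start = time.perf_counter()
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Explain how AI works"}],
)
elapsed = time.perf_counter() - start

# Wall-clock latency of the full (non-streaming) request, plus token usage
print(f"latency: {elapsed:.1f}s")
print(f"usage: {response.usage}")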

The token counts the model is working with are very modest.

[screenshot: Bildschirmfoto 2025-06-27 um 08.48.12]

I’m having the same issue, but with both 2.5 Pro and 2.5 Flash.

When running the model and calling it with the API key, I also encountered this problem. You’re going to have to run pip install rust and pip install cryptography to fix the error.

@Clintin_Brummer, how would that fix affect latency that is most likely caused by a new model deployment on the API side?

I’m encountering the same issue across all versions of gemini-2.5-pro
[screenshot: Screenshot 2025-07-01 at 15.27.51]

@Khachik_Smbatyan, what I’ve just discovered to mitigate this is to switch the Runner in the OpenAI Agents SDK to streaming mode – that magically fixed the issue.

See Streaming - OpenAI Agents SDK for details.
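
For reference, a minimal sketch of a streamed run, assuming the SDK is wired to the Gemini compatibility endpoint through an AsyncOpenAI client (the agent name, instructions, and prompt below are placeholders, not my actual workflow):

import asyncio

from openai import AsyncOpenAI
from openai.types.responses import ResponseTextDeltaEvent
from agents import Agent, OpenAIChatCompletionsModel, Runner

# OpenAI-compatible client pointed at the Gemini endpoint
gemini_client = AsyncOpenAI(
    api_key="GEMINI_API_KEY",  # placeholder
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
    model=OpenAIChatCompletionsModel(model="gemini-2.5-pro", openai_client=gemini_client),
)

async def main():
    # run_streamed starts the run immediately and returns a streaming result
    result = Runner.run_streamed(agent, input="Explain how AI works")
    async for event in result.stream_events():
        # Raw response events carry the incremental text deltas
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)

asyncio.run(main())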

Hi Viktor, thank you, I’ll give it a try. The main challenge is that we currently interact with the API using the genai client, in a manner similar to the example below. I’ll look into updating this to support streaming.

from google import genai
from google.genai import types

# Picks up the API key from the environment (GEMINI_API_KEY)
client = genai.Client()

# Single blocking (non-streaming) request
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=["Explain how AI works"],
    config=types.GenerateContentConfig(
        temperature=0.1
    )
)
print(response.text)
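
Based on the docs, the streaming variant of that call should look something like this sketch (using the google-genai client’s generate_content_stream, with the same placeholder prompt; untested here):

from google import genai
from google.genai import types

client = genai.Client()

# Iterate over partial responses instead of waiting for the full completion
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents=["Explain how AI works"],
    config=types.GenerateContentConfig(temperature=0.1),
):
    print(chunk.text, end="")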

Good morning, I do apologize for the late reply. I’m currently using the Gemini CLI setup and calling the model with an API key. As an example, here is my launch prompt:

llm -m gemini-2.5-pro -c | tee >(espeak -v en-208 -p 0 -s 168 -g2 -k1)

Please keep in mind that before the pipe (the straight vertical line), you have to put your message to the model in quotation marks (e.g. llm -m gemini-2.5-pro -c "your message here" before the | tee part).
In terms of launching a new model, I experience very little added latency compared to running the model in an external environment.
The error comes from the fact that the call was made via the beta version. I suggest using the Gemini CLI approach instead, especially when you’re running a device with limited performance and/or hardware capabilities.