Gemini 2.5 Pro has become unusable. When 03-25 was released, it could complete my long-context summarization task in under 3 minutes. When it was updated to 05-06, that time went up to 8 minutes, with very frequent timeouts at the 10-minute mark. Now, 06-05 is never able to finish the same task, even with the thinking budget set to the minimum.
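For reference, the task looks roughly like this (a minimal sketch using the google-genai Python SDK; the file name, document contents, and budget value are placeholders rather than my actual setup):

```python
import time
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

# Placeholder for the real input: a large body of proprietary documents.
documents = open("documents.txt", encoding="utf-8").read()

start = time.perf_counter()
response = client.models.generate_content(
    model="gemini-2.5-pro-preview-06-05",
    contents=documents + "\n\nSummarize the documents above.",
    config=types.GenerateContentConfig(
        system_instruction="You are a helpful assistant.",
        # 128 is, as far as I know, the lowest thinking budget 2.5 Pro accepts;
        # even with it, the request never finishes for me on 06-05.
        thinking_config=types.ThinkingConfig(thinking_budget=128),
    ),
)
print(f"Finished in {time.perf_counter() - start:.1f}s")
print(response.text[:500])
```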
Hi Joao_Lazzaro,
With the release of new models and the deprecation of older ones, we have seen a huge bump in daily active users, which could be one of the reasons for higher processing times.
Could you please share some of the long-context prompts that were used to test the model, along with their processing times? That would let us compare results on our end and rule out other performance causes.
I cannot share the prompt, since it consists mostly of proprietary documents (around 190k tokens), plus a system prompt in the style of “You are a helpful assistant” and a user question. I tried 06-05 again yesterday with no success, so we’re still running on 05-06.
Hello! I don’t want to hijack the post, but we’re experiencing something very similar.
After the new Gemini 2.5 Pro 06-05 was released, I wanted to add support for it in an open-source tool I created.
Once I did, I ran a few easy evaluations we always execute when adding support for new models. Unfortunately, for the prompts this tool uses, the previous version of the model (05-06) took approximately 1 minute, while the newest version took 5-10 minutes (!).
The prompts are exactly the same for both models, and here is a summary of the token counts (maybe that helps):
- New Gemini 06-05
  - Call 1
    - Request count: 1
    - Request tokens: 2393
    - Response tokens: 97
    - Total tokens: 4584
  - Call 2
    - Request count: 1
    - Request tokens: 2104
    - Response tokens: 97
    - Total tokens: 3272
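For completeness, those numbers are simply what the SDK reports back per request. A rough sketch of how they are collected, assuming the google-genai Python SDK (the prompt here is a placeholder); the gap between Total and Request + Response appears to be the model’s internal thinking tokens:

```python
from google import genai

client = genai.Client()

prompt_text = "..."  # placeholder for the tool's actual evaluation prompt

response = client.models.generate_content(
    model="gemini-2.5-pro-preview-06-05",
    contents=prompt_text,
)

usage = response.usage_metadata
print("Request tokens: ", usage.prompt_token_count)
print("Response tokens:", usage.candidates_token_count)
print("Thinking tokens:", usage.thoughts_token_count)  # not shown in the summary above
print("Total tokens:   ", usage.total_token_count)
```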
I’ve noticed, after also retrying the old model, that its latency has increased too (though definitely not as much, around 2 minutes per call), which makes me suspect this might not be only about the model version. Switching to gemini-2.5-flash-preview-05-20 makes it fast again, with latencies of <3s.
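In case anyone wants to reproduce the comparison, a simple loop along these lines should show the difference (a sketch using the google-genai Python SDK; the prompt string is a placeholder, and the model IDs are just the versions discussed in this thread):

```python
import time
from google import genai

client = genai.Client()

prompt_text = "..."  # the same evaluation prompt sent to every model

for model in (
    "gemini-2.5-pro-preview-05-06",
    "gemini-2.5-pro-preview-06-05",
    "gemini-2.5-flash-preview-05-20",
):
    start = time.perf_counter()
    response = client.models.generate_content(model=model, contents=prompt_text)
    elapsed = time.perf_counter() - start
    print(f"{model}: {elapsed:.1f}s, {response.usage_metadata.total_token_count} total tokens")
```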
I just want to post an update here: with the general availability release, the new model now has acceptable latency, so the issue is solved.