Gemini 2.5 Pro has become unusable. When 03-25 was released, it could complete my long-context summarization task in under 3 minutes. When it was updated to 05-06, that time went up to 8 minutes, with very frequent timeouts at the 10-minute mark. Now, 06-05 is never able to finish the same task, even with the thinking budget set to the minimum.
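For reference, the task looks roughly like this (a minimal sketch using the google-genai Python SDK; the file name, document contents, and budget value are placeholders rather than my actual setup):

```python
import time
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

# Placeholder for the real input: a large body of proprietary documents.
documents = open("documents.txt", encoding="utf-8").read()

start = time.perf_counter()
response = client.models.generate_content(
    model="gemini-2.5-pro-preview-06-05",
    contents=documents + "\n\nSummarize the documents above.",
    config=types.GenerateContentConfig(
        system_instruction="You are a helpful assistant.",
        # 128 is, as far as I know, the lowest thinking budget 2.5 Pro accepts;
        # even with it, the request never finishes for me on 06-05.
        thinking_config=types.ThinkingConfig(thinking_budget=128),
    ),
)
print(f"Finished in {time.perf_counter() - start:.1f}s")
print(response.text[:500])
```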
Hi Joao_Lazzaro,
With the release of new models and the deprecation of older ones, we have seen a huge bump in daily active users, which could be one of the reasons for higher processing times.
Could you please share some of the long-context prompts that were used to test the model, along with their processing times? That would let us compare results on our end and rule out other performance causes.
I cannot share the prompt, since it consists mostly of proprietary documents (around 190k tokens), plus a system prompt in the style of “You are a helpful assistant” and a user question. I tried 06-05 again yesterday with no success, so we’re still running on 05-06.
Hello! I don’t want to hijack the post, but we’re experiencing something very similar.
After the new Gemini 2.5 Pro 06-05 was released, I wanted to add support for it in an open-source tool I created.
Once I did, I ran a few easy evaluations we always execute when adding support for new models. Unfortunately, for the prompts this tool uses, the previous version of the model (05-06) took approximately 1 minute, while the newest version took 5-10 minutes (!).
The prompts are exactly the same for both models, and here is a summary of the token counts (maybe that helps):
- New Gemini 06-05
  - Call 1
    - Request count: 1
    - Request tokens: 2393
    - Response tokens: 97
    - Total tokens: 4584
  - Call 2
    - Request count: 1
    - Request tokens: 2104
    - Response tokens: 97
    - Total tokens: 3272
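For completeness, those numbers are simply what the SDK reports back per request. A rough sketch of how they are collected, assuming the google-genai Python SDK (the prompt here is a placeholder); the gap between Total and Request + Response appears to be the model’s internal thinking tokens:

```python
from google import genai

client = genai.Client()

prompt_text = "..."  # placeholder for the tool's actual evaluation prompt

response = client.models.generate_content(
    model="gemini-2.5-pro-preview-06-05",
    contents=prompt_text,
)

usage = response.usage_metadata
print("Request tokens: ", usage.prompt_token_count)
print("Response tokens:", usage.candidates_token_count)
print("Thinking tokens:", usage.thoughts_token_count)  # not shown in the summary above
print("Total tokens:   ", usage.total_token_count)
```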
I’ve noticed, after also retrying the old model, that its latency has increased too (though definitely not as much, around 2 minutes per call), which makes me suspect this might not be only about the model version. Switching to gemini-2.5-flash-preview-05-20 makes it fast again, with latencies of <3s.
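In case anyone wants to reproduce the comparison, a simple loop along these lines should show the difference (a sketch using the google-genai Python SDK; the prompt string is a placeholder, and the model IDs are just the versions discussed in this thread):

```python
import time
from google import genai

client = genai.Client()

prompt_text = "..."  # the same evaluation prompt sent to every model

for model in (
    "gemini-2.5-pro-preview-05-06",
    "gemini-2.5-pro-preview-06-05",
    "gemini-2.5-flash-preview-05-20",
):
    start = time.perf_counter()
    response = client.models.generate_content(model=model, contents=prompt_text)
    elapsed = time.perf_counter() - start
    print(f"{model}: {elapsed:.1f}s, {response.usage_metadata.total_token_count} total tokens")
```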
I just want to post an update here: with the general availability release, the new model now has acceptable latency, so the issue is solved.