Gemini models are good, among the best. However, Google's management, products, API… everything else is mediocre. It's frustrating working with their API, even with production endpoints, not just preview or experimental ones.
Does anyone have the following problem?
I am calling the API with the model "gemini-2.5-pro-preview-03-25", but the response I receive comes back from gemini-2.5-pro-preview-05-06:
…
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 101992,
    "candidatesTokenCount": 507,
    "totalTokenCount": 104322,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 101992
      }
    ],
    "thoughtsTokenCount": 1823
  },
  "modelVersion": "models/gemini-2.5-pro-preview-05-06",
The response text was cut short and is unusable; this has been happening to me for the last week.
I am on the paid tier.
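For anyone who wants to reproduce this, here is a minimal sketch, assuming the public REST generateContent endpoint, the requests library, and an API key in a GEMINI_API_KEY environment variable (the prompt and variable names are just placeholders). It simply compares the model you asked for against the modelVersion field that comes back:

import os
import requests

# Minimal repro sketch: ask for one preview model and print which model
# version the API says actually served the request.
REQUESTED_MODEL = "gemini-2.5-pro-preview-03-25"
API_KEY = os.environ["GEMINI_API_KEY"]  # assumption: key stored in this env var

url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{REQUESTED_MODEL}:generateContent?key={API_KEY}"
)
payload = {"contents": [{"parts": [{"text": "Say hello."}]}]}

resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
data = resp.json()

served = data.get("modelVersion", "<missing>")
print("requested:", REQUESTED_MODEL)
print("served:   ", served)
if REQUESTED_MODEL not in served:
    print("WARNING: response reports a different model version than requested")

If the warning fires on a production key, it at least rules out a bug in your own request code.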
@MichaelAi The entire thread you are posting in is dedicated to discussing this. Please read it.
I thought people were complaining about the new 05-06 model, not about this “error.” I assumed I was the only one experiencing this “problem,” since normally, entering an incorrect model name returns an error response.
Honestly, this is really wrong on Google’s part, especially since I spent hours trying to figure out what was wrong with my code.
They went from having mediocre models to having arguably the best in like six months flat. They have also launched like 20 entire products, revamped their entire branding and marketing, dealt with major legal issues, AND continue to give away massive amounts of LLM compute for free.
I’m the first to wax poetic about the “don’t be evil” days but if it weren’t for Google, I literally couldn’t afford to have built up my AI skills as I’ve done with their resources.
Guys, don’t panic, everything will be fine
My early tests show that the new checkpoint, which is supposed to become the GA release, improves upon the 03-25 model.
Google likely listened to everyone’s thoughts and feedback.
That said, they still haven't clarified what their model naming means or whether we can rely on it, which is what this whole post is about.
I have noticed how thoroughly 06-05 smashes the benchmarks. But in my use case (academic text comprehension, summarization, legal reasoning, etc.), 06-05 seems to perform worse than 05-06: for example, it loses information from the original text in the summaries it produces, gives less detailed references, and misses legal facts that trigger liabilities.
The same performance degradation for me. 06-05 feels like a Flash version rather than Pro.
I think it's mostly because 06-05 thinks 2-3x less, even with the thinking budget maxed out.
The worst part is that it's not possible to debug the thinking process, because it's basically censored now.
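For reference, here is a rough sketch of how the thinking budget and thought summaries can be requested over REST. The field names follow the published thinkingConfig options, but the budget value, prompt, and key handling are my assumptions, so treat it as a starting point rather than a verified workaround:

import os
import requests

# Sketch: request a large thinking budget plus thought summaries, then print
# how many thinking tokens were actually spent.
MODEL = "gemini-2.5-pro-preview-06-05"
API_KEY = os.environ["GEMINI_API_KEY"]  # assumption: key stored in this env var

url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent?key={API_KEY}"
)
payload = {
    "contents": [{"parts": [{"text": "Summarize the key legal facts in this passage."}]}],
    "generationConfig": {
        "thinkingConfig": {
            "thinkingBudget": 32768,  # assumed maximum budget for 2.5 Pro
            "includeThoughts": True,  # returns thought summaries, not raw traces
        }
    },
}

data = requests.post(url, json=payload, timeout=300).json()
print("thoughtsTokenCount:", data.get("usageMetadata", {}).get("thoughtsTokenCount"))

# Parts flagged with "thought": true hold the summarized reasoning; the raw
# chain of thought is not exposed, which is the "censored" part mentioned above.
for part in data["candidates"][0]["content"]["parts"]:
    label = "THOUGHT" if part.get("thought") else "ANSWER"
    print(label + ":", (part.get("text") or "")[:200])

Comparing thoughtsTokenCount across 05-06 and 06-05 on the same prompt is a simple way to quantify the "thinks 2-3x less" impression.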