Hi All,
In my team's stack, Gemini 2.0 is a very important model because it is so consistent in scoring outputs and incredibly fast. We ran many benchmarks, and none of your other models comes even close in consistency (or in accuracy, using human judgement as the target).
I saw that the recommendation is to just use Gemini 2.5. This will be a disaster for our workflow (and for many other people's as well), because we already know it won't work for us. Google has no equivalent model. Our only option will be to look elsewhere for an equivalent, which means going through the pain of many, many iterations of testing and benchmarking.
Please be like OpenAI and listen to your users!
All the best