Model review, gemini-exp-1114

By now, some people will have had an opportunity to test the quality of gemini-exp-1114. I shared some initial thoughts here: How is new Gemini EXP [1114] model different from the regular 1.5 pro? - #2 by OrangiaNebula. With more testing, I have more thoughts to share, and I obviously welcome feedback from fellow developers.

Things that seem broken:
Video is apparently broken: I only get “Internal error” in AI Studio when presenting the model with a 35-second clip and the prompt “Describe how this video was made and briefly what it depicts.” The same video and prompt work with all the Gemini 1.5 models (Pro, Flash, 001, 002, and even Gemini 1.5 Pro Experimental 0801). This is probably a bug. If someone else has managed to get a video processed by gemini-exp-1114 in AI Studio, please say so.

Things that work well:
The model exhibits superior quality compared to the other Gemini models so far, with very few exceptions. It can handle problems that have stymied the other Gemini models because it has superior self-evaluation and moves on to alternative solution templates when the current one isn’t working. Tip: you can significantly enhance the model’s output quality by including actionable solution-evaluation criteria in the prompt. For example, in a scenario where a function is to be determined, you might add to the prompt: ‘Because of (put the reason here), eligible solutions will asymptotically converge towards 1 for t->∞’. The model will use the criteria you provided to quickly eliminate wrong solutions and will move on to the next solution template faster.
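As a minimal sketch of this tip, here is how one might append evaluation criteria to a prompt before sending it. The helper function and the criterion text are illustrative placeholders, not part of any SDK; the resulting string is what you would paste into AI Studio or pass to `GenerativeModel.generate_content` in the google.generativeai Python SDK.

```python
# Sketch: augmenting a prompt with actionable solution-evaluation criteria,
# so the model can eliminate wrong solution templates early.
# Helper name and criterion wording are illustrative, not from any SDK.

def add_evaluation_criteria(prompt: str, criteria: list[str]) -> str:
    """Append explicit solution-evaluation criteria to a prompt."""
    lines = [prompt, "", "Eligible solutions must satisfy all of the following:"]
    lines += [f"- {c}" for c in criteria]
    return "\n".join(lines)

prompt = add_evaluation_criteria(
    "Determine a closed-form function f(t) that fits the data described above.",
    [
        "Because the process saturates, eligible solutions will "
        "asymptotically converge towards 1 for t -> infinity."
    ],
)
print(prompt)
```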

Where Google might go astray:
The quality of the model improves with the amount of time it is given to try out solutions; that is what test-time (some call it inference-time) scaling research has shown. Based on my testing, gemini-exp-1114 is given one minute of think time. That is enough to make gemini-exp-1114 the highest-quality Gemini so far, but not enough for many problems the model could have handled if it had been given more time to try things out. I very much understand that more thinking time means less throughput and comes at a cost, and giving the model more inference time is no guarantee that it will find the solution to the task presented to it. Still, it is clear to me that one minute isn’t enough to get the model to awe-inspiring quality.

I think it will take a creative approach to billing, with a user option (possibly set in google.generativeai.GenerationConfig) to choose between (a) extended inference time, with a user-specified time limit, billed at a higher rate or by the think-minute, and (b) the default behavior with the current 60 s limit. That way, people concerned about billing cost will get a better, but not spectacularly better, model, and people with genuinely difficult problems will have the option to hand them to the model.

Where Google might go astray is initially picking a generous inference-time limit and then, a few weeks or months down the road, throttling it to a lower level because data center costs are spiraling. That would be a surefire way to lose customers.
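To make the proposal concrete, here is a hypothetical sketch of what such an option could look like. Neither `billing_mode` nor `think_time_limit_s` exists in the real google.generativeai.GenerationConfig; this is purely an illustration of the two modes described above.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical sketch of the proposed billing option.
# NOT part of the real google.generativeai.GenerationConfig API.

@dataclass
class ThinkTimeConfig:
    # "default":  current behavior, capped at 60 s of think time.
    # "extended": user-specified limit, billed at a higher rate
    #             or by the think-minute.
    billing_mode: Literal["default", "extended"] = "default"
    think_time_limit_s: int = 60

    def __post_init__(self) -> None:
        if self.billing_mode == "default" and self.think_time_limit_s > 60:
            raise ValueError(
                "default mode is capped at 60 s; use extended billing mode"
            )

# A user with a hard problem opts into five minutes of think time:
cfg = ThinkTimeConfig(billing_mode="extended", think_time_limit_s=300)
print(cfg)
```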


oh that’s Gemini 2.0 Pro

This behavior has been fixed: Video can now be processed as well.