I am using the Gemini 2.0 Flash Thinking model, and I have noticed something interesting: when the temperature is set to 1, Gemini performs better, whether in coding or everyday responses. The officially recommended default temperature for this model is 0.7, while the default temperature for Gemini 2.5 Pro is 1. I want to know whether the lower default was chosen because Gemini 2.0 Flash Thinking is unstable at higher temperatures, or whether the official default simply hasn't been updated.
A February 2025 guide from the Gemini developers recommends trying several temperatures to find the best one for your particular task: Prompt Engineering | Kaggle
As a general starting point, a temperature of .2, top-P of .95, and top-K of 30 will give you relatively coherent results that can be creative but not excessively so. If you want especially creative results, try starting with a temperature of .9, top-P of .99, and top-K of 40. And if you want less creative results, try starting with a temperature of .1, top-P of .9, and top-K of 20. Finally, if your task always has a single correct answer (e.g., answering a math problem), start with a temperature of 0.
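If it helps, here is a minimal sketch of passing the guide's "general starting point" settings through the google-generativeai Python SDK. The model name, prompt, and API key are placeholders for illustration, not values from the guide:

```python
# Minimal sketch: passing the guide's starting point
# (temperature 0.2, top-P 0.95, top-K 30) to a Gemini model.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")  # placeholder model name
response = model.generate_content(
    "Explain top-k sampling in one paragraph.",  # placeholder prompt
    generation_config=genai.GenerationConfig(
        temperature=0.2,
        top_p=0.95,
        top_k=30,
    ),
)
print(response.text)
```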
These recommended temperatures are notably lower than either of the 0.7 or 1 defaults; I'm not sure what to make of that.
There is no single best setting; just use whatever suits your use case.
So I researched the problem a bit more. It's not Gemini-specific: the common opinion among ML practitioners on r/LocalLLaMA is that if you want to avoid hallucinations, you generally shouldn't set the temperature as high as 0.7 or above. Some users there prefer temperatures around 0.5-0.6, while others are closer to the advice in the guide I cited.
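For some intuition on why this happens: the sampler draws from softmax(logits / T), so a higher temperature flattens the distribution and gives low-likelihood (and often wrong) tokens more probability mass. A toy illustration with made-up logits:

```python
# Toy demo: softmax(logits / T) flattens as T grows, so unlikely tokens
# become more probable. The logits are invented for illustration.
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical next-token logits
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}:", [round(p, 3) for p in probs])
```

At T=0.2 nearly all the mass sits on the top token, while at T=1.0 the lower-ranked tokens together get a meaningful share, which is exactly where off-distribution completions come from.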
Depending on your task and context length, hallucinations might be a larger or smaller problem, so I recommend experimenting for yourself; see the sketch below. If you don't run into such issues, then the current default temperature should work for you. (RLHF-tuned chat models behave, in a sense, more flexibly with respect to temperature than pre-trained "base" models.)
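For that experiment, a simple sweep over a few temperatures on your own prompt might look like this (again with the google-generativeai SDK; model name and prompt are placeholders):

```python
# Sketch: compare outputs across a few temperatures on your own prompt.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")  # placeholder model name

prompt = "Summarize the trade-offs of a high sampling temperature."
for temperature in (0.1, 0.5, 0.7, 1.0):
    response = model.generate_content(
        prompt,
        generation_config=genai.GenerationConfig(temperature=temperature),
    )
    print(f"--- temperature={temperature} ---")
    print(response.text)
```

Run it on a prompt representative of your real workload and judge the outputs side by side; that will tell you more than any default.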