Hi everyone,
I am conducting a comparative study of how prompt engineering affects LLM reasoning capabilities, so I need the model version to stay consistent across experiments. I have already generated experimental results with Gemini-3, and I need to keep that dataset. However, Gemini-3 has since been deprecated, so all subsequent experiments can only use Gemini-3.1 for comparison.
Could anyone clarify the key architectural or parameter differences between these two versions? I’m particularly concerned about whether changes in 3.1 might act as confounding variables in my reasoning benchmarks.
Thanks!