Spatial Reasoning in gemini-1.5-pro-exp-0801

Microsoft Research published the paper “Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models” in May 2024 (arXiv:2404.03622, “Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models”). Their results are easy to reproduce, but they had not tested the Google models, so I did. The results I obtained were quite disappointing, so I did not publish them at the time.

The natural language navigation task using a 3x3 grid has a baseline success rate of one in nine, about 11%: a blindfolded chicken randomly pecking at answers would be expected to hit that. Gemini 1.5 Flash achieved 24%. The Microsoft paper reports success rates for GPT-4 at well over 50%. Clearly the 1.5 models were not doing well on this task. The difficulty with spatial direction was obvious in other tests as well (visual navigation tasks, images with upside-down numbers, and the model's answers about the relative positions of objects in an image). The Gemini 1.5 models, both Flash and Pro, were directionally challenged.
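
For anyone who wants to sanity-check the 11% figure, here is a minimal sketch of the random baseline. The move set, walk length, and clamping to the grid are my own assumptions for illustration, not the exact setup from the paper, but a uniform guess over the 9 cells lands on the correct one 1/9 of the time regardless of how the walk is generated:

```python
import random

MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def random_walk(n_moves=4, size=3):
    """Start in a random cell, apply n_moves random moves (clamped to the grid),
    and return the true final cell. Grid size and walk length are assumptions."""
    x, y = random.randrange(size), random.randrange(size)
    for _ in range(n_moves):
        dx, dy = MOVES[random.choice(list(MOVES))]
        x = min(max(x + dx, 0), size - 1)
        y = min(max(y + dy, 0), size - 1)
    return x, y

def blindfolded_chicken(trials=100_000, size=3):
    """Expected accuracy of guessing a cell uniformly at random: 1/size**2."""
    hits = 0
    for _ in range(trials):
        target = random_walk(size=size)
        guess = (random.randrange(size), random.randrange(size))
        hits += guess == target
    return hits / trials

print(f"random-guess baseline on a 3x3 grid: {blindfolded_chicken():.1%}")  # ~11%
```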

I repeated the natural language navigation task with the experimental 0801 model. The preliminary results are encouraging: the success rate worked out to 78%, though with large error bars, which translates into success somewhere between 3 out of 4 and 4 out of 5 times. That is a big step up from the roughly 1-in-4 success rate of the non-experimental models, which is only about double what the blindfolded chicken manages. Just to make sure, I re-ran the test on gemini-1.5-flash-001, and it showed the same disappointing success rate I had measured in May.
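
For context on the error bars: a standard way to turn a measured success rate into a confidence interval is the Wilson score interval. The success and trial counts below are illustrative placeholders, not my actual run; the point is only to show how such error bars can be computed:

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

# Placeholder numbers: 39 successes out of 50 prompts is 78%, with a
# 95% interval of roughly 65%-87%.
lo, hi = wilson_interval(39, 50)
print(f"95% CI: {lo:.0%} - {hi:.0%}")
```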

Model gemini-1.5-pro-exp-0801 also shows significant improvement when dealing with images containing upside-down numbers and the like. It seems to have overcome the directional challenges of the previous model variants.

There are other differences in performance as well: a few math problems that Gemini 1.5 Pro had difficulty with, the 0801 model solved. But it works both ways: a few math problems that Gemini 1.5 Pro solves, the 0801 model has difficulty with.

Conclusion: the most dramatic improvement in performance between Gemini 1.5 Pro and the 0801 model that I have observed in testing so far is in spatial navigation and direction. The 0801 model does not show the performance degradation from excessive alignment training that I am convinced afflicted the 1.5 models, and that nobody at Google will ever fess up to (since who in their right mind would want to admit to such a potentially career-limiting mistake).