Difference between Gemini API and Google Search AI Mode in visual recognition?

Hi everyone,
I’m developing an Android app prototype to recognize LEGO sets from a photo of the box or the assembled set.
The prototype works well, but mostly with older LEGO sets; with newer sets, identification becomes inaccurate or fails entirely.

However, I’ve noticed that if I upload the same images to Google Search AI Mode in the browser, it recognizes the sets correctly.

My question is: shouldn’t the Gemini API use the same visual recognition capabilities as Google’s AI Mode?
Are there known differences between AI Mode in Search and the Gemini API?

Any suggestions on how to improve recognition accuracy or which model would be better suited for visual matching tasks like this?

Thank you very much!

Hi @Massimiliano_Gianni

My suggestion for improving recognition accuracy with the Gemini API is to use the gemini-3-pro (Preview) or gemini-2.5-pro models, as they have the strongest visual reasoning and multimodal understanding capabilities. That is crucial for distinguishing between highly similar LEGO sets or identifying new ones from complex box art.
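
As a starting point, here is a minimal Python sketch of a multimodal identification call using the google-genai SDK. It assumes your API key is set in the `GEMINI_API_KEY` environment variable; the filename `lego_box.jpg` and the prompt wording are just placeholders:

```python
from google import genai
from google.genai import types

# The client picks up the API key from the GEMINI_API_KEY environment variable.
client = genai.Client()

# Load the photo of the box or assembled set (hypothetical filename).
with open("lego_box.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-pro",  # strong visual reasoning for fine-grained matching
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Identify this LEGO set. Return the set number, set name, and theme, "
        "and note any details you are unsure about.",
    ],
)
print(response.text)
```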

Also, implement Grounding with Google Search, as it lets the model perform a web search to ground its response in up-to-date, factual information. This is likely why AI Mode in Search handles newer sets better: sets released after the model's training cutoff may simply not be in its knowledge, and grounding lets it look them up. Better prompting techniques will also help in guiding the model's focus.
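
Here is a sketch of the same call with Grounding with Google Search enabled via the `tools` config (again with the hypothetical `lego_box.jpg`; adapt the prompt to your use case):

```python
from google import genai
from google.genai import types

client = genai.Client()  # API key from the GEMINI_API_KEY environment variable

with open("lego_box.jpg", "rb") as f:  # hypothetical filename
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Identify this LEGO set (set number, name, theme). "
        "Use web search to verify recently released sets.",
    ],
    config=types.GenerateContentConfig(
        # Enable Grounding with Google Search so the model can look up
        # sets released after its training cutoff.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```
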
Thanks