Gemini Vision API Pricing

I have a question about Gemini Vision. According to the documentation, each uploaded image consumes approximately 258 tokens (though this can vary with image dimensions). Currently, vision input works with gemini-1.0-pro-vision, gemini-1.5-flash, gemini-1.5-pro, and gemini-2.0-flash.
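For reference, the Gemini docs describe a tiling rule for 1.5-generation and later models: an image whose dimensions are both at or under 384 px costs a flat 258 tokens, while larger images are processed as 768×768 crops at 258 tokens each. Below is a rough estimating sketch based on that rule; the ceil-division tile count is my assumption about how crops are counted, and the exact scaling the API applies may differ, so use the official `count_tokens` endpoint for billing-accurate numbers.

```python
import math

TOKENS_PER_TILE = 258  # documented flat cost per small image / per tile
SMALL_IMAGE_MAX = 384  # both dimensions at or under this -> single flat charge
TILE_SIZE = 768        # larger images are handled as 768x768 crops

def estimate_image_tokens(width: int, height: int) -> int:
    """Rough token estimate for one image under the documented tiling scheme.

    Approximation only: the API's actual scaling/cropping may differ,
    so verify against the count_tokens endpoint before relying on it.
    """
    if width <= SMALL_IMAGE_MAX and height <= SMALL_IMAGE_MAX:
        return TOKENS_PER_TILE
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return tiles * TOKENS_PER_TILE

# A 300x300 scan of a math problem fits in one flat charge:
print(estimate_image_tokens(300, 300))   # 258
# A 1024x768 page spans two 768px tiles horizontally:
print(estimate_image_tokens(1024, 768))  # 516
```

The token count itself is model-independent under this scheme; what changes per model is the per-token price applied to those tokens.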

In this case, are these 258 tokens billed at each model's own API pricing? If so, how do these models differ in terms of vision understanding? Does gemini-2.0-flash have better vision understanding than gemini-1.5-pro?

gemini-2.0-flash-lite has also been added to the API. Is it possible to use vision with this model as well? I think Google doesn’t pay enough attention to details regarding vision capabilities.

Actually, I’m going to use it as OCR for mathematical problems. I want to decide which one would be best for me.

I couldn’t find any benchmarks or resources about this.

Thank you for your support.

Hi @Ozgur_Ugurlu, welcome to the community. Apologies for the late response.
We have released a new model, Gemini 2.5 Pro, which is positioned as Google's most intelligent model, with state-of-the-art performance in areas requiring advanced reasoning, including multimodal understanding. Please try it for your use case and let us know if you run into any issues.
Thank you