Image Classification

Hello everyone, I have thousands of images from a mega civil-engineering project. I want to classify them into categories such as foundation cleaning, pre-pour activities, concrete pour activities, crushing plants, and stockpile maintenance. I am not an expert and do not know where to start.

Luis

Happy to help. This is a very simple project to do. One of my own projects is also focused on vision, and Gemini is very good for the type of use case you mention.

I can provide some sample code if you want, but the essence is this: give the model system-prompt instructions describing what to look for in an image in order to classify it, and instruct it to respond for each image with a fixed, deterministic class label. Your code can then look for that label in the response and use it to process the image after the model has classified it.
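A minimal sketch of the "instructions plus deterministic label" idea, assuming Python. The label names, the visual cues, and the `parse_label` helper are hypothetical illustrations (not part of any Gemini API) that you would adapt to your own categories. The parser maps the model's free-text reply onto a fixed set of labels so downstream code can branch on it safely.

```python
# Hypothetical labels based on the categories in the question; the cue text
# is only an example of what you might tell the model to look for.
CUES = {
    "FOUNDATION_CLEANING": "an excavated foundation surface being cleaned "
                           "(e.g. air or water jetting, hand tools)",
    "PREPOUR": "formwork, rebar, or embedded items being set up before concrete",
    "CONCRETE_POUR": "concrete being placed (pump booms, buckets, chutes, vibrators)",
    "CRUSHING_PLANT": "crushers, screens, or conveyors producing aggregate",
    "STOCKPILE_MAINTENANCE": "loaders or dozers shaping or moving material piles",
    "OTHER": "anything that fits none of the categories above",
}
LABELS = list(CUES)

# System prompt: describe each class, then demand a single label as the reply.
SYSTEM_PROMPT = (
    "You classify photos from a large civil construction project.\n"
    + "\n".join(f"- {label}: {cue}" for label, cue in CUES.items())
    + "\nReply with exactly one label from this list and nothing else: "
    + ", ".join(LABELS)
)


def parse_label(response_text: str) -> str:
    """Map the model's reply onto one of LABELS, falling back to OTHER."""
    # Tolerate stray whitespace, quotes, backticks, a trailing period,
    # lower case, and spaces used instead of underscores.
    cleaned = response_text.strip().strip("`'\".").strip()
    cleaned = cleaned.upper().replace(" ", "_")
    return cleaned if cleaned in CUES else "OTHER"


print(parse_label(" concrete pour.\n"))  # → CONCRETE_POUR
```

Falling back to `OTHER` instead of raising keeps a batch of thousands of images running even when the model occasionally replies with extra words; those images can be reviewed by hand afterwards.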

After the instructions, you convert each image to base64 and send it to Gemini like a normal prompt with system instructions. You receive the one-word deterministic classification (or a structured JSON response), and your code then performs an action based on how the image was classified. That action would be your own code's job, unless you were also using tool calling, which I don't think is necessary for what you're doing. Let me know if you want sample code that does this, and whether you will be loading these images from a filesystem directory, a database, or a front-end web framework like Flask.
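A minimal standard-library sketch of the convert, send, and act loop for images loaded from a filesystem directory, assuming the Gemini API's REST `generateContent` endpoint with inline base64 image data. The model name, the helper names `build_request`, `classify`, and `file_by_label`, and copying each file into a per-label folder as the "action" are assumptions for illustration; check the current Gemini API documentation for the exact endpoint and model names.

```python
import base64
import json
import shutil
import urllib.request
from pathlib import Path

# Assumption: REST endpoint and model name; verify against the current docs.
ENDPOINT = ("https://generativelanguage.googleapis.com/v1beta/models/"
            "gemini-1.5-flash:generateContent?key={key}")
MIME_TYPES = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png"}


def build_request(image_path: Path, system_prompt: str) -> dict:
    """Base64-encode one image and wrap it in a generateContent request body."""
    data = base64.b64encode(image_path.read_bytes()).decode("ascii")
    return {
        "system_instruction": {"parts": [{"text": system_prompt}]},
        "contents": [{
            "parts": [
                {"inline_data": {"mime_type": MIME_TYPES[image_path.suffix.lower()],
                                 "data": data}},
                {"text": "Classify this image."},
            ]
        }],
    }


def classify(image_path: Path, system_prompt: str, api_key: str) -> str:
    """Send one image to Gemini and return the text of the first candidate."""
    body = json.dumps(build_request(image_path, system_prompt)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT.format(key=api_key), data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["candidates"][0]["content"]["parts"][0]["text"]


def file_by_label(image_path: Path, label: str, out_dir: Path) -> Path:
    """The 'action' step: copy the image into a folder named after its label."""
    target = out_dir / label
    target.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(image_path, target / image_path.name))


# Example driver (hypothetical folder names; makes real API calls, so it is
# left commented out). Validate the label against your fixed list before use.
# for path in sorted(Path("site_photos").iterdir()):
#     if path.suffix.lower() in MIME_TYPES:
#         label = classify(path, "your system prompt", "YOUR_API_KEY").strip().upper()
#         file_by_label(path, label, Path("sorted"))
```

Copying rather than moving keeps the original photo archive untouched while you check how well the classification works.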

Also, the great news is that Gemini is the cheapest of the LLMs for this: about $0.07 per 3,500 images of any size with Gemini 1.5 Flash and about $0.10 per 3,500 images with Gemini 2.0 Flash. Every image, regardless of size or resolution, counts as only 258 tokens (assuming you're using the Gemini API rather than Vertex), so it's quite a steal.
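The per-image prices follow from the 258-token figure: 3,500 images × 258 tokens = 903,000 input tokens, so a rate of about $0.075 per million input tokens works out to roughly $0.068, in line with the $0.07 quoted. A small sketch of that arithmetic, assuming the 258-token figure above; the price is a parameter you should take from the current pricing page, and prompt text and output tokens are not included.

```python
# Token count per image as quoted above; verify it for your model and images.
TOKENS_PER_IMAGE = 258


def estimate_input_cost(num_images: int, usd_per_million_tokens: float) -> float:
    """Estimate the image-input cost only; prompt text and output are extra."""
    total_tokens = num_images * TOKENS_PER_IMAGE
    return total_tokens / 1_000_000 * usd_per_million_tokens


print(round(estimate_input_cost(3500, 0.075), 3))  # → 0.068
```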

Thanks so much, Jami:

In parallel with my request, I have started looking into other options. Once I have tried them, I'll get back here if they don't work out.

Regards,

Luis