I would like to know whether it's possible to fine-tune a multimodal model, for example on images of parts labeled as defected or non-defected, using multimodal example prompts. I have read the fine-tuning documentation, but as far as I understand it only covers text prompts.
That’s correct. At this time, tuning only works with text and not with other modalities.
Will fine-tuning be possible for other modalities such as vision? In an industrial setting I would like to use it with custom vision datasets.
Good to know, thanks for sharing!
Not possible as of now, but we’ll let you know if things change in the future.
I made a mobile application for waste classification at our plant to help people find the right place to throw returnable waste. However, users wanted to classify multiple items at once, for example 100 items, while my application could only classify one at a time. I think Gemini 1.5 Pro will let me classify many items in a single request, so I will try that, and there should be no need to fine-tune on a custom dataset for it.
I have gotten a lot of mileage out of models that create embedding vectors from images.
Useful for building your own custom image classifiers.
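As a rough illustration of that approach: embed a few labelled images per class, average the embeddings into a centroid per class, and assign a new image to the class whose centroid is most similar. The sketch below stubs out the embedding step with made-up vectors so it runs standalone; in a real pipeline, `embed_image` would call a vision embedding model (e.g. a CLIP-style encoder), and all names and numbers here are hypothetical.

```python
import numpy as np

def embed_image(image_id: str) -> np.ndarray:
    # Stub: returns hand-made vectors. A real system would call a
    # vision embedding model here instead of a lookup table.
    fake_embeddings = {
        "defect_01": np.array([0.9, 0.1, 0.0]),
        "defect_02": np.array([0.8, 0.2, 0.1]),
        "ok_01":     np.array([0.1, 0.9, 0.2]),
        "ok_02":     np.array([0.0, 0.8, 0.3]),
        "query":     np.array([0.85, 0.15, 0.05]),
    }
    return fake_embeddings[image_id]

def centroid(image_ids):
    # Average the embeddings of a handful of labelled examples.
    return np.mean([embed_image(i) for i in image_ids], axis=0)

# One centroid per class, built from a few labelled images each.
centroids = {
    "defected": centroid(["defect_01", "defect_02"]),
    "ok":       centroid(["ok_01", "ok_02"]),
}

def classify(image_id: str) -> str:
    v = embed_image(image_id)
    # Cosine similarity between the query embedding and each centroid.
    sims = {
        label: float(v @ c / (np.linalg.norm(v) * np.linalg.norm(c)))
        for label, c in centroids.items()
    }
    return max(sims, key=sims.get)

print(classify("query"))  # → defected
```

The nice property of this setup is that adding a new waste category only requires embedding a few example images for it, with no model retraining at all.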