I would like to know whether it's possible to fine-tune a multimodal model, for example on images of parts labeled as defected or non-defected, using multimodal example prompts. I have read the fine-tuning documentation, but as far as I understand it only covers text prompts.
That’s correct. At this time, tuning only works with text and not with other modalities.
Will fine-tuning be possible for other modalities such as vision? In an industrial setting I would like to use it with custom vision datasets.
Good to know, thanks for sharing!
Not possible as of now, but we’ll let you know if things change in the future.
I made a mobile application for waste classification at our plant to help people find the right place to throw returnable waste. However, users wanted to classify multiple items at once, for example 100 items, while my application could only classify one at a time. I think Gemini 1.5 Pro will let me classify many items in a single request, so I will try that, and there should be no need to fine-tune on a custom dataset for it.
I have gotten a lot of mileage out of models that create embedding vectors from images.
Useful for building your own custom image classifiers.
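As a rough illustration of that approach: embed a few labelled images per class, average the embeddings into a centroid per class, and assign a new image to the class whose centroid is most similar. The sketch below stubs out the embedding step with made-up vectors so it runs standalone; in a real pipeline, `embed_image` would call a vision embedding model (e.g. a CLIP-style encoder), and all names and numbers here are hypothetical.

```python
import numpy as np

def embed_image(image_id: str) -> np.ndarray:
    # Stub: returns hand-made vectors. A real system would call a
    # vision embedding model here instead of a lookup table.
    fake_embeddings = {
        "defect_01": np.array([0.9, 0.1, 0.0]),
        "defect_02": np.array([0.8, 0.2, 0.1]),
        "ok_01":     np.array([0.1, 0.9, 0.2]),
        "ok_02":     np.array([0.0, 0.8, 0.3]),
        "query":     np.array([0.85, 0.15, 0.05]),
    }
    return fake_embeddings[image_id]

def centroid(image_ids):
    # Average the embeddings of a handful of labelled examples.
    return np.mean([embed_image(i) for i in image_ids], axis=0)

# One centroid per class, built from a few labelled images each.
centroids = {
    "defected": centroid(["defect_01", "defect_02"]),
    "ok":       centroid(["ok_01", "ok_02"]),
}

def classify(image_id: str) -> str:
    v = embed_image(image_id)
    # Cosine similarity between the query embedding and each centroid.
    sims = {
        label: float(v @ c / (np.linalg.norm(v) * np.linalg.norm(c)))
        for label, c in centroids.items()
    }
    return max(sims, key=sims.get)

print(classify("query"))  # → defected
```

The nice property of this setup is that adding a new waste category only requires embedding a few example images for it, with no model retraining at all.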