Help Needed: Converting Fine-Tuned Gemma 3 Vision Model to TensorFlow Lite (TFLite)

Hi everyone,

I’ve successfully fine-tuned the Gemma3ForConditionalGeneration model and have been getting great results. My goal now is to deploy this model on mobile devices for offline use, which requires converting it to the TensorFlow Lite (TFLite) format.

I’ve tried several standard conversion methods, but I’m running into challenges, likely due to the model’s complex multimodal architecture. I’m looking for a reliable workflow or script to handle this conversion.

Key Details:

  • Model Architecture: Gemma3ForConditionalGeneration
  • Special Tokens: The model uses several special tokens, including <bos> (ID: 2), <image_soft_token> (ID: 262144), (ID: 255999), and <end_of_image> (ID: 256000).
  • Input Format: The model expects a specific input sequence combining text and image tokens. Each image is represented by 256 image tokens.
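For reference, the input layout described above can be sketched roughly as follows. This is only an illustration built from the IDs listed in this post, not the processor's actual implementation. The name `IMAGE_START_ID` for the unnamed token 255999, the helper `build_input_ids`, and the ordering (image span before the text) are all assumptions.

```python
from typing import List

# Token IDs as listed in the post above.
BOS_ID = 2
IMAGE_SOFT_TOKEN_ID = 262144
END_OF_IMAGE_ID = 256000
# Assumption: the post lists ID 255999 without a name; here it is
# treated as the token that opens an image span.
IMAGE_START_ID = 255999
TOKENS_PER_IMAGE = 256  # each image is represented by 256 image tokens


def build_input_ids(text_ids: List[int], num_images: int = 1) -> List[int]:
    """Hypothetical helper: lay out <bos>, image spans, then text token IDs."""
    ids = [BOS_ID]
    for _ in range(num_images):
        ids.append(IMAGE_START_ID)
        ids.extend([IMAGE_SOFT_TOKEN_ID] * TOKENS_PER_IMAGE)
        ids.append(END_OF_IMAGE_ID)
    # Assumption: text tokens follow the image span(s).
    ids.extend(text_ids)
    return ids
```

A converted model needs to accept sequences of this shape, so a layout like this is useful for building sample inputs when tracing or validating the export.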

Has anyone successfully converted a fine-tuned Gemma 3 vision model (or a similar multimodal model like PaliGemma) to TFLite? Any scripts, tutorials, or guidance on the correct process would be extremely helpful.

Thank you in advance for your help!

Can you check this procedure?

I'm going by the instructions from this:

I don't see a way to convert Gemma 3n yet, though. The .task file contains several TFLite files bundled together.
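If you want to see which files are inside a .task bundle, something like the sketch below may help. It assumes the .task file is a ZIP-compatible archive, which may not hold for every bundle, so it checks first. The helper name `list_task_contents` is hypothetical.

```python
import zipfile
from typing import List


def list_task_contents(task_path: str) -> List[str]:
    """Hypothetical helper: return the file names stored in a .task bundle.

    Assumption: the bundle is a ZIP-compatible archive.
    """
    if not zipfile.is_zipfile(task_path):
        raise ValueError(f"{task_path} is not a ZIP-compatible archive")
    with zipfile.ZipFile(task_path) as bundle:
        return bundle.namelist()
```

Calling `list_task_contents("model.task")` (with your own path) returns the names of the bundled files, so you can check which TFLite components are present.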

Thanks, George. I tried this route but couldn’t get it to work. I might take a second look. Thanks!

Hi @Telli_Koroma ,

To convert Gemma models to TFLite format, you can use the MediaPipe ai-edge conversion script from the following GitHub repository. The example script provided there is for PaliGemma. For Gemma2 or Gemma3, see the Gemma and Gemma3 packages under the examples package in the same GitHub repo.

Thanks.

Thanks! This approach worked!