How to convert trained HuggingFace PaliGemma Model to TFLite?

Hi there! Recently I managed to fine-tune and train the PaliGemma VLM released by Google and achieved great results.

I was wondering if anyone knows how to convert this PaliGemma model into TensorFlow Lite format so that it can be deployed to mobile devices and run efficiently offline? I can’t seem to find any working methods online so far.

Any help is GREATLY appreciated! Thank you!

Hi,

To convert smoothly and run efficiently with TF Lite, you will need to re-author the model using our Torch Generative API. Some examples can be found here:

To convert the PaliGemma VLM model to TensorFlow Lite format, follow these steps:

  1. Export to TensorFlow SavedModel: Ensure your PaliGemma model is in TensorFlow SavedModel format.

# Assumes `model` is a TensorFlow/Keras model object
model.save('path/to/saved_model')

  2. Install TensorFlow: If you haven’t already, install TensorFlow, which includes the TensorFlow Lite converter.

pip install tensorflow

  3. Convert to TensorFlow Lite: Use the TensorFlow Lite Converter to convert the SavedModel to TFLite format.

import tensorflow as tf

# Load the SavedModel
converter = tf.lite.TFLiteConverter.from_saved_model('path/to/saved_model')

# Set conversion parameters if needed
# converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Convert the model
tflite_model = converter.convert()

# Save the TFLite model
with open('path/to/model.tflite', 'wb') as f:
    f.write(tflite_model)

  4. Test and Deploy: Test the TFLite model using the TensorFlow Lite interpreter and deploy it on your mobile device.

For specific issues or advanced configurations, refer to TensorFlow Lite documentation and support forums.

Hi, thanks for your input! I have actually tried that before, but I can’t find any examples for PaliGemma, only for Gemma, and I ran into a lot of errors when re-authoring the model myself. Do you have a working solution available, or could you guide me through it in more detail? Thanks, I appreciate it.

Hi, thanks for your help! I have tried this, but it doesn’t work because of incompatible layers in the model architecture.

Hi, did you get a solution for this? I am also looking for something similar.

Hey @Dario_Prawara_Teh and @jegaths,

The standard TensorFlow converter fails on PaliGemma because, as reported above, it runs into incompatible layers in the model architecture.
As @Haoliang_Zhang noted, the only working method is to use Google’s ai-edge-torch library. You have to re-author the model to make it compatible with the TFLite conversion process.

To quote the official documentation:
“Note that PaliGemma models can be converted to TFLite only with the ODML Torch conversion backend.” This directly mirrors @Haoliang_Zhang’s instructions and is the only compatible way.

This means a standard tf.Module wrapper or SavedModel conversion will not work. You must use the ai-edge-torch toolchain with this specific backend.

Here is the correct workflow based on that official requirement. Note that this is merely a demonstration meant to guide you through the process of re-authoring and is in no way, shape or form a copy-and-paste solution.

How to Re-author PaliGemma for Conversion

The goal is to create a new PyTorch nn.Module that wraps your PaliGemma model and has a forward method that ai-edge-torch can trace. You should adapt the Gemma example for PaliGemma’s structure.

Here are the required steps:

1. Create a Wrapper Class
Define a new class that loads your fine-tuned Hugging Face PaliGemma model.

import torch
from transformers import PaliGemmaForConditionalGeneration

class PaliGemmaForTFLite(torch.nn.Module):
    def __init__(self, model_path: str):
        super().__init__()
        self.paligemma = PaliGemmaForConditionalGeneration.from_pretrained(
            model_path
        ).eval()

2. Implement a Traceable forward Method
This is the most critical step. You must manually define the data flow from image and text inputs to the final logits. This replaces the complex generate() method with a single, static forward pass.

    def forward(self, pixel_values: torch.Tensor, input_ids: torch.Tensor):
        # The goal is to replicate the logic inside the model's forward pass.
        # This is a conceptual example: `_prepare_input_embeds` is a placeholder
        # name, not a confirmed transformers method. Check the model's source
        # code for the actual logic that merges image and text embeddings.
        
        # Manually create the combined image and text embeddings
        inputs_embeds = self.paligemma._prepare_input_embeds(
            pixel_values=pixel_values, input_ids=input_ids
        )
        
        # Pass the combined embeddings through the language model
        outputs = self.paligemma.language_model(inputs_embeds=inputs_embeds)
        
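        # Return the logits for the next token prediction. Depending on the
        # transformers version, `language_model` may return hidden states
        # instead, in which case the LM head has to be applied here.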
        return outputs.logits
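
To make the "combined image and text embeddings" step a bit more concrete, here is a toy, pure-Python sketch of the merge as I understand it: the token sequence contains placeholder image tokens, and the embedding at each placeholder position is swapped for the next projected image feature. The function name, the placeholder ID 99, and the tiny 2-dimensional vectors are all assumptions for illustration. In the real model this happens on tensors, and the actual placeholder ID comes from the model config.

```python
from typing import List

Vector = List[float]


def merge_image_and_text(
    input_ids: List[int],
    text_embeds: List[Vector],
    image_embeds: List[Vector],
    image_token_id: int,
) -> List[Vector]:
    """Replace the embedding at every image-placeholder position with the
    next image feature, keeping all other text embeddings unchanged."""
    if len(input_ids) != len(text_embeds):
        raise ValueError("input_ids and text_embeds must have the same length")
    n_slots = sum(1 for tok in input_ids if tok == image_token_id)
    if n_slots != len(image_embeds):
        raise ValueError(
            f"{n_slots} placeholder tokens but {len(image_embeds)} image features"
        )
    image_iter = iter(image_embeds)
    return [
        next(image_iter) if tok == image_token_id else emb
        for tok, emb in zip(input_ids, text_embeds)
    ]


IMAGE_TOKEN_ID = 99  # hypothetical placeholder ID, for illustration only
ids = [99, 99, 5, 7]  # two image slots followed by two text tokens
text = [[0.0, 0.0], [0.0, 0.0], [0.5, 0.5], [0.7, 0.7]]
image = [[1.0, 1.0], [2.0, 2.0]]
merged = merge_image_and_text(ids, text, image, IMAGE_TOKEN_ID)
print(merged)
```

The length check is worth keeping in a real wrapper too: a mismatch between the number of placeholder tokens and the number of image features is an easy mistake to make when building inputs by hand.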

3. Convert Using ai_edge_torch
Use the convert function with the required odml_torch backend.

import ai_edge_torch

# Initialize your wrapper model
model = PaliGemmaForTFLite("path/to/your/finetuned-model")

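# Create example inputs with the correct shape and type.
# 224x224 matches the 224-resolution checkpoints; adjust for other variants.
example_pixel_values = torch.randn(1, 3, 224, 224)
example_input_ids = torch.randint(0, 100, (1, 50))

# Convert the model. The `backend` keyword below is illustrative: verify it
# against the `ai_edge_torch.convert` signature of your installed version,
# since the backend may be selected differently there.
edge_model = ai_edge_torch.convert(
    model,
    (example_pixel_values, example_input_ids),
    backend="odml_torch"  # This is the essential flag
)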

# Export the TFLite file
edge_model.export("paligemma.tflite")
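
Before moving the exported file to a device, a quick sanity check can save some time. The sketch below only uses the standard library and checks that the file carries the "TFL3" FlatBuffer identifier that TFLite models use, as far as I know. The helper name `looks_like_tflite` is my own, and passing this check does not prove the model runs correctly; it only catches empty or obviously wrong files.

```python
import os

TFLITE_IDENTIFIER = b"TFL3"  # file identifier declared in the TFLite schema


def looks_like_tflite(path: str) -> bool:
    """Return True if `path` exists and carries the TFLite FlatBuffer identifier.

    FlatBuffers store a 4-byte root offset first, then the optional 4-byte
    file identifier, so the identifier sits at bytes 4..8.
    """
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        header = f.read(8)
    return len(header) == 8 and header[4:8] == TFLITE_IDENTIFIER


if __name__ == "__main__":
    path = "paligemma.tflite"  # the file name used in the export step above
    if looks_like_tflite(path):
        size_mb = os.path.getsize(path) / 1e6
        print(f"{path}: looks like a TFLite model ({size_mb:.1f} MB)")
    else:
        print(f"{path}: missing or not a TFLite FlatBuffer")
```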

In summary: You are not converting the original PaliGemma directly. You are building a new, simplified PyTorch module around it that ai-edge-torch can understand and convert.
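
One consequence of replacing generate() with a single static forward pass is that text generation has to be driven by a loop outside the model, over inputs of a fixed length (50 token positions in the example inputs above). Here is a minimal sketch of greedy decoding under those constraints. Everything in it is an assumption for illustration: `forward_fn` stands in for whatever invokes your converted model (with the image already bound), the pad and EOS IDs are made up, and the toy model exists only so the loop can run. Whether right-padding is safe without an attention mask depends on how you re-authored the model.

```python
from typing import Callable, List, Sequence

Logits = List[List[float]]  # one row of vocabulary scores per position


def pad_to_length(ids: Sequence[int], length: int, pad_id: int) -> List[int]:
    """Right-pad `ids` to the fixed sequence length the model was traced with."""
    if len(ids) > length:
        raise ValueError(f"{len(ids)} tokens do not fit into {length} positions")
    return list(ids) + [pad_id] * (length - len(ids))


def greedy_decode(
    forward_fn: Callable[[List[int]], Logits],
    prompt_ids: Sequence[int],
    seq_len: int,
    pad_id: int,
    eos_id: int,
    max_new_tokens: int,
) -> List[int]:
    """Call the static forward pass repeatedly, appending the top token each time."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        if len(ids) >= seq_len:
            break  # no free position left in the fixed-size input
        logits = forward_fn(pad_to_length(ids, seq_len, pad_id))
        last = logits[len(ids) - 1]  # scores at the last real (non-pad) token
        next_id = max(range(len(last)), key=last.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


def toy_forward(padded_ids: List[int]) -> Logits:
    """Toy stand-in for the converted model: always prefers token + 1."""
    vocab = 6
    return [
        [1.0 if v == (tok + 1) % vocab else 0.0 for v in range(vocab)]
        for tok in padded_ids
    ]


result = greedy_decode(
    toy_forward, [2], seq_len=8, pad_id=0, eos_id=5, max_new_tokens=10
)
print(result)
```

Re-running the full sequence on every step is slow for long outputs. The Gemma examples in the generative API use a KV cache for that reason, which is one more thing the re-authored model would need to expose.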

If you encounter specific errors during conversion, the best place for support is the ai-edge-torch GitHub issue tracker. Include step-by-step instructions to reproduce the error, along with any error messages and logs, so the team there can help debug problems with their library.
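
When writing such a report, it usually helps to include the versions involved. The small script below is just a convenience sketch using the standard library; the package list reflects the PyPI distribution names as I know them and is an assumption you may want to adjust for your setup.

```python
import platform
import sys
from importlib import metadata
from typing import Sequence

# Distribution names to report; adjust to whatever your environment uses.
PACKAGES = ("ai-edge-torch", "torch", "transformers", "tensorflow")


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages: Sequence[str] = PACKAGES) -> str:
    """Build a short plain-text summary to paste into a bug report."""
    lines = [
        f"python: {sys.version.split()[0]}",
        f"platform: {platform.platform()}",
    ]
    lines += [f"{name}: {package_version(name)}" for name in packages]
    return "\n".join(lines)


print(environment_report())
```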

I hope this helps you!