Hey @Dario_Prawara_Teh and @jegaths,
The standard TensorFlow converter fails because PaliGemma’s architecture is too complex for it.
As @Haoliang_Zhang noted, the only working method is to use Google’s ai-edge-torch library. You have to re-author the model to make it compatible with the TFLite conversion process.
To quote the official documentation:
“Note that PaliGemma models can be converted to TFLite only with the ODML Torch conversion backend.”
This mirrors @Haoliang_Zhang’s instructions and is the only compatible route. A standard tf.Module wrapper or SavedModel conversion will not work; you must use the ai-edge-torch toolchain with this specific backend.
Here is the correct workflow based on that official requirement. Note that this is merely a demonstration meant to guide you through the process of re-authoring and is in no way, shape or form a copy-and-paste solution.
How to Re-author PaliGemma for Conversion
The goal is to create a new PyTorch nn.Module that wraps your PaliGemma model and has a forward method that ai-edge-torch can trace. You should adapt the Gemma example for PaliGemma’s structure.
Here are the required steps:
1. Create a Wrapper Class
Define a new class that loads your fine-tuned Hugging Face PaliGemma model.
import torch
from transformers import PaliGemmaForConditionalGeneration


class PaliGemmaForTFLite(torch.nn.Module):
    def __init__(self, model_path: str):
        super().__init__()
        self.paligemma = PaliGemmaForConditionalGeneration.from_pretrained(
            model_path
        ).eval()
2. Implement a Traceable forward Method
This is the most critical step. You must manually define the data flow from image and text inputs to the final logits. This replaces the complex generate() method with a single, static forward pass.
    # Method of PaliGemmaForTFLite (indent it inside the class from step 1).
    def forward(self, pixel_values: torch.Tensor, input_ids: torch.Tensor):
        # The goal is to replicate the logic inside the model's forward pass.
        # This is a conceptual example: `_prepare_input_embeds` is a placeholder
        # name, not a confirmed transformers method. Check the model's source
        # code for the logic that merges image features into the text embeddings.

        # Manually create the combined image and text embeddings
        inputs_embeds = self.paligemma._prepare_input_embeds(
            pixel_values=pixel_values, input_ids=input_ids
        )
        # Pass the combined embeddings through the language model
        outputs = self.paligemma.language_model(inputs_embeds=inputs_embeds)
        # Return the logits for the next token prediction
        return outputs.logits
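Because the wrapper returns logits for a single pass, the token-by-token loop that generate() used to run has to live in your own code after conversion. Here is a minimal, framework-free sketch of greedy decoding around such a forward call. The names `greedy_decode`, `ForwardFn`, and `toy_forward` are hypothetical and only for illustration. The image input is left out for brevity, and a real version would call the converted model and deal with its fixed input shape (for example by padding).

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in type: takes the token ids so far and returns one row
# of logits per position (the last row scores the next token).
ForwardFn = Callable[[Sequence[int]], List[List[float]]]


def greedy_decode(forward_fn: ForwardFn, prompt_ids: Sequence[int],
                  eos_id: int, max_new_tokens: int = 16) -> List[int]:
    """Repeatedly run one forward pass and append the highest-scoring token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        last_logits = forward_fn(ids)[-1]
        next_id = max(range(len(last_logits)), key=last_logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


def toy_forward(ids: Sequence[int], vocab_size: int = 5) -> List[List[float]]:
    """Toy model for demonstration only: always prefers (token + 1) % vocab_size."""
    return [[1.0 if v == (t + 1) % vocab_size else 0.0 for v in range(vocab_size)]
            for t in ids]


print(greedy_decode(toy_forward, prompt_ids=[0], eos_id=4))  # [0, 1, 2, 3, 4]
```

Greedy argmax is just the simplest choice here; sampling or beam search would slot into the same loop.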
3. Convert Using ai_edge_torch
Use the convert function with the required odml_torch backend.
import ai_edge_torch

# Initialize your wrapper model
model = PaliGemmaForTFLite("path/to/your/finetuned-model")

# Create example inputs with the correct shape and type.
# 224x224 matches the 224 px checkpoints; use your checkpoint's resolution.
example_pixel_values = torch.randn(1, 3, 224, 224)
example_input_ids = torch.randint(0, 100, (1, 50))

# Convert the model
edge_model = ai_edge_torch.convert(
    model,
    (example_pixel_values, example_input_ids),
    # Meant to select the ODML Torch backend. Unverified: check the
    # convert() signature in your installed ai-edge-torch version, since
    # this keyword may not be accepted there.
    backend="odml_torch",
)

# Export the TFLite file
edge_model.export("paligemma.tflite")
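Since the example inputs determine the shapes the model is traced with, it helps to know how many sequence positions the image takes up. PaliGemma's SigLIP vision encoder uses 14x14 patches, so a 224 px image becomes 256 image tokens. The sketch below only does that arithmetic; the function names are hypothetical, and it assumes square inputs and that patch size 14 applies to your checkpoint. In the Hugging Face implementation the processor places one placeholder token per image token into input_ids, so a realistic input_ids length would already include them.

```python
def num_image_tokens(image_size: int, patch_size: int = 14) -> int:
    """Number of patch embeddings for a square image (non-overlapping patches)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def total_sequence_length(image_size: int, num_text_tokens: int) -> int:
    """Image tokens plus text tokens in the merged sequence."""
    return num_image_tokens(image_size) + num_text_tokens


for size in (224, 448, 896):
    print(size, num_image_tokens(size))  # 256, 1024, 4096 tokens respectively

print(total_sequence_length(224, 50))  # 306
```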
In summary: You are not converting the original PaliGemma directly. You are building a new, simplified PyTorch module around it that ai-edge-torch can understand and convert.
If you encounter specific errors during conversion, the best place for support is the ai-edge-torch GitHub issues. Include a step-by-step guide to reproduce the error along with any error messages or logs, so the team there can help debug problems with their library.
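When you file an issue, version numbers are usually the first thing you get asked for. Here is a small sketch that collects them using only the standard library. The package list is my assumption about what is relevant, and `environment_report` is a hypothetical helper name; adjust both to your setup.

```python
import platform
from importlib import metadata

# Distribution names are assumptions; adjust them to what you installed.
PACKAGES = ("ai-edge-torch", "torch", "transformers", "tensorflow")


def environment_report(packages=PACKAGES) -> dict:
    """Map each package name to its installed version, or 'not installed'."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Paste the output next to the full traceback and the exact conversion script you ran.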
I hope this helps you!