Hi,
I am totally new to AI and ML SDK frameworks, and I hope I can get some guidance here.
The purpose of converting different models to TFLite is to run a speech-to-text app on-device on Android (and later also iOS).
I have followed the official MediaPipe documentation and some other step-by-step guides, but I can't get the TFLite conversion to work with the recommended settings:
https://ai.google.dev/edge/mediapipe/solutions/guide
https://medium.com/@areebbashir13/running-a-llm-on-device-using-googles-mediapipe-c48c5ad816c6
The conversion script is taken directly from the tutorials:
from mediapipe.tasks.python.genai import converter

def gemma_convert_config(backend):
    input_ckpt = '/home/me/gemma-2b-it/'
    vocab_model_file = '/home/me/gemma-2b-it/'
    output_dir = '/home/me/gemma-2b-it/intermediate/'
    output_tflite_file = f'/home/me/gemma-2b-it-{backend}.tflite'
    return converter.ConversionConfig(
        input_ckpt=input_ckpt,
        ckpt_format='safetensors',
        model_type='GEMMA_2B',
        backend=backend,
        output_dir=output_dir,
        combine_file_only=False,
        vocab_model_file=vocab_model_file,
        output_tflite_file=output_tflite_file)

config = gemma_convert_config("cpu")
converter.convert_checkpoint(config)
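For what it's worth, a quick pre-flight check like the sketch below shows the checkpoint files in place. Note that tokenizer.model is my guess at the vocab file the converter expects when vocab_model_file points at a directory; the docs aren't explicit about it:

import glob
import os

ckpt_dir = '/home/me/gemma-2b-it/'

# List the safetensors shards the converter should pick up from input_ckpt.
print('safetensors shards:', glob.glob(os.path.join(ckpt_dir, '*.safetensors')))

# tokenizer.model is my assumption about which vocab file the converter wants;
# adjust if your checkpoint layout differs.
print('tokenizer.model present:',
      os.path.exists(os.path.join(ckpt_dir, 'tokenizer.model')))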
The conversion with the cpu backend always fails with a runtime error:
python3.12/site-packages/mediapipe/tasks/python/genai/converter/llm_converter.py", line 220, in combined_weight_bins_to_tflite
model_ckpt_util.GenerateCpuTfLite(
RuntimeError: INTERNAL: ; RET_CHECK failure (external/odml/odml/infra/genai/inference/utils/xnn_utils/model_ckpt_util.cc:116) tensor
Every setup I have tried, on different flavors of Ubuntu (WSL2 on Windows 10, a VM with Ubuntu 24), hits the same runtime error. I was able to convert with the gpu backend, and that model actually loads in the Android app.
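For contrast, the gpu conversion that succeeds is the exact same script with only the backend string changed:

# Same ConversionConfig as above, only the backend differs; this run completes
# and produces /home/me/gemma-2b-it-gpu.tflite, which loads in the app.
config = gemma_convert_config("gpu")
converter.convert_checkpoint(config)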
Is there anything I am missing to get the cpu backend to work? And will the cpu backend conversion really show obvious performance advantages (as the documentation suggests)?
Any help is welcome.