AI Edge Torch API-converted Gemma 2B inference via MediaPipe on Android

Hi,

We have followed the steps to convert Gemma 2B into TFLite format. The conversion succeeds (including incorporating tokenizer.model), and we can also run inference through MediaPipe on an Android phone, but it outputs garbled characters.
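For reference, this is roughly the conversion path we followed. The module paths and helper names below come from the ai-edge-torch generative examples and may differ between releases, so treat this as a sketch rather than our exact script:

```python
import torch
import ai_edge_torch
# Gemma example re-authoring helper from the ai-edge-torch generative
# examples; the exact module path can vary between releases.
from ai_edge_torch.generative.examples.gemma import gemma

# Build the example PyTorch Gemma 2B model from a local checkpoint
# (the checkpoint path is illustrative).
pytorch_model = gemma.build_2b_model("/path/to/gemma-2b").eval()

# Sample prefill inputs (token IDs plus positions), following the shapes
# used in the example convert scripts; dtypes are assumptions.
prefill_seq_len = 512
tokens = torch.full((1, prefill_seq_len), 0, dtype=torch.long)
input_pos = torch.arange(0, prefill_seq_len)

# Convert to TFLite and export the flatbuffer.
edge_model = ai_edge_torch.convert(pytorch_model, (tokens, input_pos))
edge_model.export("gemma_2b.tflite")
```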

This seems to be an issue in the MediaPipe SDK's handling of such AI Edge Torch-converted models, since the same TFLite model produces normal output on our server. We get the same garbled result with Phi-3-mini.

(screenshot of the garbled output)
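In case it helps narrow things down, this is roughly how we incorporated tokenizer.model into the bundle for the MediaPipe LLM Inference API. The API is from mediapipe.tasks.python.genai; the token strings and the bytes-to-unicode flag below are the values we believe apply to Gemma, so treat them as assumptions:

```python
from mediapipe.tasks.python.genai import bundler

# Package the converted TFLite model and tokenizer.model into a .task
# bundle consumable by the MediaPipe LLM Inference API on Android.
config = bundler.BundleConfig(
    tflite_model="gemma_2b.tflite",
    tokenizer_model="/path/to/tokenizer.model",
    start_token="<bos>",          # assumed Gemma start token
    stop_tokens=["<eos>"],        # assumed Gemma stop token
    output_filename="gemma_2b.task",
    enable_bytes_to_unicode_mapping=False,  # assumed False for Gemma
)
bundler.create_bundle(config)
```

If the garbling happens at detokenization, the enable_bytes_to_unicode_mapping setting seems like a plausible place to look, though we have not confirmed this.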

Additionally, the AI Edge Torch API supports quantization to int8; are there any plans to extend this to support int4 quantization in the near future?
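For context, this is how we apply int8 quantization today; the recipe name follows the ai-edge-torch generative quantization docs and may vary by version:

```python
import ai_edge_torch
from ai_edge_torch.generative.quantize import quant_recipes

# pytorch_model, tokens, and input_pos are built as in the conversion
# sketch earlier in this post.
quant_config = quant_recipes.full_int8_dynamic_recipe()
edge_model = ai_edge_torch.convert(
    pytorch_model, (tokens, input_pos), quant_config=quant_config
)
edge_model.export("gemma_2b_int8.tflite")
```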

Thanks
