Poor quality of text embeddings with MediaPipe

Hi all,
Today I tried MediaPipe for the first time.
My goal is to compute cosine similarities between a text query and a set of FAQ entries.
The corpus is in French and contains some industrial technical terms (mainly acronyms and abbreviations).
I am a little disappointed with the results: top-1 accuracy/precision is 0.18 and MRR is 0.26.
I ran some experiments with the model ‘embedder.tflite’. I also tried ‘universal_sentence_encoder.tflite’, but the results are no better.
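For reference, this is how I score the retrieval — a minimal pure-Python sketch of cosine similarity plus top-1 accuracy and MRR (the function names and the toy vectors are just for illustration, not my real data):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def evaluate(query_embs, faq_embs, gold):
    """Top-1 accuracy and MRR of queries ranked against FAQ entries.

    gold[i] is the index of the correct FAQ entry for query i.
    """
    hits, rr_sum = 0, 0.0
    for i, q in enumerate(query_embs):
        sims = [cosine_similarity(q, f) for f in faq_embs]
        ranking = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
        rank = ranking.index(gold[i]) + 1  # 1-based rank of the right entry
        hits += (rank == 1)
        rr_sum += 1.0 / rank
    return hits / len(query_embs), rr_sum / len(query_embs)

# Toy example:
# evaluate([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]], gold=[0, 1])
# returns (1.0, 1.0)
```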

Is there any other model I can try with MediaPipe to generate my embeddings and get better results?

Outside the MediaPipe ecosystem, I used all-MiniLM-L6-v2 / multilingual-e5. The results are very good, but quantizing the model and porting the tokenizer to Android (my target) is not easy.

Hi @Lake6985,

The model accuracy is expected to be low: MediaPipe solutions target edge devices and are benchmarked for runtime on CPUs. If anyone has a different view on this, please let me know.

As you said, performance increases with model size. I checked all-MiniLM-L6-v2 / multilingual-e5: they have TensorFlow versions, so you can convert them to TFLite and use them.
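A minimal conversion sketch, assuming you already have the model exported as a TensorFlow SavedModel on disk (the paths are placeholders, and the `tensorflow` package is required):

```python
def convert_saved_model_to_tflite(saved_model_dir, output_path):
    """Convert a TensorFlow SavedModel to a .tflite flatbuffer."""
    import tensorflow as tf  # imported lazily so the sketch loads without TF

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Optional: dynamic-range quantization to shrink the model for edge devices.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Transformer embedding models often use ops outside the TFLite built-in set.
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS,  # fall back to TF ops where needed
    ]
    tflite_model = converter.convert()
    with open(output_path, "wb") as f:
        f.write(tflite_model)

# Example (placeholder paths):
# convert_saved_model_to_tflite("minilm_saved_model", "minilm.tflite")
```

Whether the converted model works well with MediaPipe also depends on the quantization settings, so it is worth comparing the float and quantized versions on your French corpus.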

Hi Joel,
Thank you for your prompt response.
I fully understand that performance on edge devices is the priority. In my work I mainly target edge devices for our application, and it is tough work.
As you suggest, I will look into converting all-MiniLM-L6-v2 to TFLite so that I can use it with MediaPipe. But what about tokenization? I can export the tokenizer's config files (*.json) and model (sentencepiece.bpe.model), but how do I build a SentencePiece tokenizer from those files with MediaPipe? Is there any way to use Google's SentencePiece tokenizer with MediaPipe? Could you please tell me if you see any way to do that?
I really hope I will be able to use MediaPipe to build my FAQ.

Thx