Quantization: slower inference on Android phone

AssiaN17 · June 2, 2022, 11:13am

Hello,
I have quantized a mobilenet_v2 model using the ‘dynamic range’ and ‘full integer’ quantization techniques, converted the models to TFLite (using the same code as the TensorFlow tutorial), and benchmarked inference time (with benchmark_model) on the CPU and GPU of an Android mobile phone.
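For reference, the two conversions described above usually look something like the following with the TFLite converter Python API. This is a minimal sketch modeled on the TensorFlow post-training quantization tutorials, not the exact code used here. `TinyModel` is a hypothetical stand-in for MobileNetV2 (so the example runs without downloading weights), and the random `representative_dataset` is placeholder calibration data; a real model should be calibrated with actual preprocessed input images.

```python
import numpy as np
import tensorflow as tf


class TinyModel(tf.Module):
    """Hypothetical stand-in for MobileNetV2: one dense layer followed by ReLU."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([64, 32]))
        self.b = tf.Variable(tf.zeros([32]))

    @tf.function(input_signature=[tf.TensorSpec([1, 64], tf.float32)])
    def __call__(self, x):
        return tf.nn.relu(tf.matmul(x, self.w) + self.b)


def make_converter(model):
    # For a real MobileNetV2 export, tf.lite.TFLiteConverter.from_saved_model(path)
    # would typically be used instead of a concrete function.
    fn = model.__call__.get_concrete_function()
    return tf.lite.TFLiteConverter.from_concrete_functions([fn], model)


def representative_dataset():
    # Placeholder calibration data (assumption): random tensors with the
    # model's input shape. Replace with real, preprocessed samples.
    for _ in range(20):
        yield [np.random.rand(1, 64).astype(np.float32)]


model = TinyModel()

# 1) Dynamic range quantization: weights stored as int8,
#    model inputs and outputs stay float32.
converter = make_converter(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
dynamic_tflite = converter.convert()

# 2) Full integer quantization: weights and activations quantized to 8 bits,
#    using the representative dataset to calibrate activation ranges.
converter = make_converter(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
full_int_tflite = converter.convert()

# Write both flatbuffers to disk so they can be pushed to the device.
with open("model_dynamic.tflite", "wb") as f:
    f.write(dynamic_tflite)
with open("model_full_int.tflite", "wb") as f:
    f.write(full_int_tflite)
```

The two resulting .tflite files are what would then be copied to the phone and timed with benchmark_model on the CPU and GPU.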
My results, compared with the unquantized (float) model, are the following:

On GPU:

Dynamic range quantization is slightly faster: 0.50 ms faster
Full integer quantization is slightly slower: 0.20 ms slower

On CPU (4 threads):

Dynamic range quantization is much slower: 8 ms slower
Full integer quantization is slightly slower: 0.30 ms slower

Can anyone explain why quantization does not speed up the model in this case? Has anyone encountered the same problem with this or other models?

Thank you


Tina_Sabri · June 3, 2022, 6:38am

Both approaches work on the Android device and yield the expected results. Yet, to my great surprise, the inference with the TensorFlow Lite interpreter takes at least twice as long as the inference with the TensorFlowInterface (on the same device, of course). I checked this on various devices, and the results are similar in all cases.

AssiaN17 · June 3, 2022, 7:43am

Did you get any explanation of why the quantized models take this long with the TFLite Interpreter, or is it still unexplained?
Thank you
