Recently I have been benchmarking the inference speed of a quantized and a non-quantized TFLite model, both converted from the same pre-trained TensorFlow model (a *.pb file).
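For context, the quantized model was produced with post-training integer quantization roughly along the lines of the sketch below; the file names, input/output tensor names, input shape, and the representative dataset are placeholders rather than my exact setup.

```python
import numpy as np
import tensorflow as tf

# Placeholder calibration data; the shape is illustrative, not my real input shape.
def representative_dataset():
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="model.pb",        # placeholder path
    input_arrays=["input"],           # placeholder tensor names
    output_arrays=["output"])

# Non-quantized (float) TFLite model.
open("model_float.tflite", "wb").write(converter.convert())

# Post-training full-integer quantization.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
open("model_int8.tflite", "wb").write(converter.convert())
```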
The thing is, when I compare the inference speed of these two on Android phones, the integer-quantized version is always slower than the non-quantized one. One thing I’ve learned is that the benchmark may need to run on ARM processors so the quantized kernels can use NEON-based optimizations (INT TFLITE very much slower than FLOAT TFLITE · Issue #21698 · tensorflow/tensorflow · GitHub). Since these phones are ARM devices, I don’t think that’s the problem in my case, yet the issue persists. Does anyone have any comments? Thanks!
PS:
Sorry to bother you again so soon.
Just now I figured out something that could serve as an initial answer to my own question. I set the single command-line option --use_xnnpack=false when running inference on both the quantized and non-quantized TFLite models, and with that flag the results look as expected: the quantized version takes less time than the non-quantized one. As far as I know, XNNPACK can speed up inference for floating-point models, but does it support integer-quantized models as well? According to this post (support for quantized tflite models · Issue #999 · google/XNNPACK · GitHub), quantized support is disabled by default. Is there any more recent information on integer support?
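In case it helps anyone reproduce the comparison outside the Android benchmark tool, here is a rough Python timing sketch. The model paths are placeholders, and I'm assuming that the experimental_op_resolver_type option of tf.lite.Interpreter (BUILTIN_WITHOUT_DEFAULT_DELEGATES, available in recent TF releases) skips the default delegates, including XNNPACK, which should roughly correspond to --use_xnnpack=false:

```python
import time
import numpy as np
import tensorflow as tf

def benchmark(model_path, use_default_delegates, runs=50):
    # BUILTIN_WITHOUT_DEFAULT_DELEGATES skips delegates applied by default,
    # which (as I understand it) includes XNNPACK in recent TF releases.
    resolver = (tf.lite.experimental.OpResolverType.AUTO
                if use_default_delegates
                else tf.lite.experimental.OpResolverType.BUILTIN_WITHOUT_DEFAULT_DELEGATES)
    interpreter = tf.lite.Interpreter(model_path=model_path,
                                      experimental_op_resolver_type=resolver)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    # Dummy input is enough for a rough latency comparison.
    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs * 1e3  # ms per inference

# Placeholder file names.
for path in ("model_float.tflite", "model_int8.tflite"):
    for default_delegates in (True, False):
        label = "default delegates" if default_delegates else "no default delegates"
        print(f"{path} ({label}): {benchmark(path, default_delegates):.2f} ms")
```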
Any feedback is appreciated!