Hi everyone,
We are working on the efficient deployment of AI models on various devices such as smartphones and smart TVs.
We observed an interesting phenomenon:
- a significant speed-up when moving from float32 models to int8 models on Cortex-A55 (25ms → 5ms inference time)
- a much smaller speed-up when moving from float32 models to int8 models on Cortex-A73 (23ms → 18ms inference time)
Do you have any idea what the reason for this might be? It looks as if XNNPACK (with quantized-model support) does not activate correctly on the A73 CPU.
Some technical details:
- we use TFLite with XNNPACK (latest TFLite version); we tested both uint8 and int8 models
- we use per-tensor quantization
- our convolutional architectures are based on FSMN (https://arxiv.org/pdf/1512.08301) and DeepFSMN (https://arxiv.org/pdf/1803.05030)
- we use the TFLite benchmark tool for our comparisons (Performance measurement | TensorFlow Lite) as well as our own compiled binaries (C++ environment); both benchmarks show similar results (a simplified sketch of how we attach the XNNPACK delegate in the C++ setup is below)
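
For reference, this is roughly how we attach the delegate in our C++ test binary (a simplified sketch, not our exact code; the model path and thread count are placeholders, and the explicit QS8/QU8 flags are our attempt to force quantized operator support on):

```cpp
// Sketch: load a quantized model, attach the XNNPACK delegate with QS8/QU8
// support requested, and check whether the delegate was actually applied.
#include <cstdio>
#include <memory>

#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/optional_debug_tools.h"

int main() {
  // "model_int8.tflite" is a placeholder for the quantized model file.
  auto model = tflite::FlatBufferModel::BuildFromFile("model_int8.tflite");
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);

  // Explicitly request signed (int8) and unsigned (uint8) quantized operator
  // support, in case the default delegate options do not enable it.
  TfLiteXNNPackDelegateOptions opts = TfLiteXNNPackDelegateOptionsDefault();
  opts.num_threads = 4;  // placeholder thread count
  opts.flags |= TFLITE_XNNPACK_DELEGATE_FLAG_QS8 | TFLITE_XNNPACK_DELEGATE_FLAG_QU8;

  TfLiteDelegate* xnnpack = TfLiteXNNPackDelegateCreate(&opts);
  if (interpreter->ModifyGraphWithDelegate(xnnpack) != kTfLiteOk) {
    std::fprintf(stderr, "XNNPACK delegate was not applied\n");
  }
  interpreter->AllocateTensors();

  // Prints the node-to-delegate assignment, so we can see whether the
  // quantized conv / memory-block nodes actually run under XNNPACK.
  tflite::PrintInterpreterState(interpreter.get());

  // ... fill inputs, interpreter->Invoke(), time it, read outputs ...

  // The delegate must outlive the interpreter.
  interpreter.reset();
  TfLiteXNNPackDelegateDelete(xnnpack);
  return 0;
}
```

If the printed interpreter state shows the quantized nodes still assigned to the default CPU kernels on the A73 (while they are delegated on the A55), that would at least confirm where the difference comes from.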
Best regards,
Michal