I'm using the TI Linux SDK for the AM62A (4× Arm Cortex-A53) to run YOLO models (e.g. YOLOv9 tiny) with the TFLite interpreter. For float32 models, XNNPACK and NEON SIMD instructions are enabled and inference runs multi-threaded on all cores. When I use the int8 quantized model (quantized with the Ultralytics export mode), the TFLite interpreter falls back to its GEMM kernels (confirmed with perf) instead of XNNPACK, while still running multi-threaded. With GEMM and NEON SIMD, the int8 model takes almost twice as long as the float32 model, whereas I would normally expect quantized models to be faster than float32 models. Are there any ideas for improving int8 performance on this setup?
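For reference, this is roughly how I invoke the interpreter for both models (a minimal sketch; the model path, dummy input, and iteration count are placeholders, and I'm assuming the standard tflite_runtime Interpreter with num_threads=4):

import time
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="yolov9t_float32.tflite", num_threads=4)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Dummy input matching the model's expected shape and dtype (float32 or int8).
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])

# Warm-up run, then average a few timed invocations.
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
start = time.perf_counter()
for _ in range(20):
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
print("mean inference time [s]:", (time.perf_counter() - start) / 20)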
Hi @Franks, Welcome to the TensorFlow Forum!
Could you please try this with the latest stable version of TensorFlow, and provide a sample code snippet showing how you're quantizing the model so we can better understand the issue? Thank you!
Hi @Divya_Sree_Kayyuri, thanks for your response! The latest TFLite Runtime version I can use is 2.12.0 when using the newest Linux image provided by TI for that SDK. The quantization is done as follows:
from ultralytics import YOLO
model = YOLO("yolov9t.pt")
model.export(format="tflite", int8=True, data="coco8.yaml")
This snippet exports five TFLite models, of which I used yolov9_full_integer_quant.tflite.
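For completeness, this is roughly how I load and run the full-integer model (a minimal sketch; the input is a placeholder, and the real pre- and post-processing is omitted):

import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="yolov9_full_integer_quant.tflite", num_threads=4)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Quantize a normalized float image into the model's int8 input range.
scale, zero_point = inp["quantization"]
image = np.zeros(inp["shape"], dtype=np.float32)  # placeholder for a preprocessed frame
interpreter.set_tensor(inp["index"], (image / scale + zero_point).astype(inp["dtype"]))
interpreter.invoke()

# Dequantize the raw int8 output back to float for post-processing.
out_scale, out_zero_point = out["quantization"]
raw = interpreter.get_tensor(out["index"]).astype(np.float32)
detections = (raw - out_zero_point) * out_scale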