Performance issues with tflite quantized models on ARM Cortex-A73 CPU

Hi everyone,

We are working on the efficient deployment of AI models on various devices such as smartphones and smart TVs.

We observed an interesting phenomenon:

  • a significant speed-up when moving from float32 models to int8 models on Cortex-A55 (25ms → 5ms inference time)
  • a much lower speed-up when moving from float32 models to int8 models on Cortex_A73 (23ms → 18ms inference time)
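For reference, the timings above work out to roughly a 5x speedup on the Cortex-A55 but only about a 1.3x speedup on the Cortex-A73:

```python
# Speedups implied by the measured inference times above.
a55_speedup = 25 / 5    # float32 ms / int8 ms on Cortex-A55
a73_speedup = 23 / 18   # float32 ms / int8 ms on Cortex-A73
print(f"A55: {a55_speedup:.1f}x, A73: {a73_speedup:.2f}x")
```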

Do you have any idea what the reason for this might be? It seems that XNNPACK (with quantized model support) does not activate correctly on the A73 CPU?

Some technical details:

Best regards,
Michal

Hi @Michal_Kudelski,

There are several factors at play when running on resource-constrained devices. From what I see in the ARM comparison PDF table, the Cortex-A55 is a more recent design (Armv8.2-A) than the Cortex-A73 (Armv8-A).
The Cortex-A55 also explicitly supports the Dot Product instructions (SDOT/UDOT), which can be highly beneficial for int8 computations in neural networks. This feature may not be as optimized, or present in the same form, on the Cortex-A73.
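One way to see whether a given core can use the faster int8 path is to inspect the CPU feature flags. A minimal sketch (assuming a Linux device, where the Armv8.2 Dot Product extension is reported as the `asimddp` flag in `/proc/cpuinfo`):

```python
def has_dot_product(cpuinfo_text: str) -> bool:
    """Return True if any 'Features' line lists the asimddp flag
    (the Armv8.2 Dot Product extension used by int8 kernels)."""
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("features"):
            if "asimddp" in line.split(":")[-1].split():
                return True
    return False

# On-device usage:
# with open("/proc/cpuinfo") as f:
#     print(has_dot_product(f.read()))
```

On a Cortex-A55 device this flag should be present; on a Cortex-A73 it should be absent, which would be consistent with the smaller int8 speedup you measured.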

Thank you.