TFLite Batchnorm layer: Quantized models

After conversion, the Batchnorm layer appears as Mul and Add nodes. However, when the Mul and Add functionality is replicated using the gamma, beta, and input tensors, there is a discrepancy for quantized models. Is there any additional computation beyond these two operations that could be causing this issue?
The FP32 model works fine when all parameters are extracted and used as a Batchnorm layer, but the same is not true for INT8/UINT8 models.
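For reference, here is a minimal sketch of how the converter typically folds an inference-time BatchNorm into the constants of the Mul and Add nodes. The parameter values below are made up for illustration; the point is that the folded constants include the moving mean, moving variance, and epsilon, not just gamma and beta:

```python
import numpy as np

# Hypothetical per-channel BatchNorm parameters extracted from a model.
gamma = np.array([1.2, 0.8], dtype=np.float32)        # scale
beta = np.array([0.1, -0.3], dtype=np.float32)        # offset
moving_mean = np.array([0.5, -0.2], dtype=np.float32)
moving_var = np.array([0.9, 1.1], dtype=np.float32)
eps = 1e-3

# At inference, BatchNorm reduces to an affine transform, which the converter
# expresses as a Mul followed by an Add:
#   y = gamma * (x - mean) / sqrt(var + eps) = mul_const * x + add_const
mul_const = gamma / np.sqrt(moving_var + eps)
add_const = beta - moving_mean * mul_const

x = np.random.randn(4, 2).astype(np.float32)          # dummy activations
y = mul_const * x + add_const                          # matches the FP32 Mul/Add output
```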


Hi @Sundari_Swathy_Meena,

The difference you are observing between your FP32 and int8/uint8 models after converting batch normalization layers is due to the quantization process used in TFLite conversion. FP32 models keep full floating-point precision, whereas quantized models (int8/uint8) scale and shift the floating-point values to fit within the 8-bit range, which introduces quantization error. Please refer to the quantization documentation. However, if the difference between the models is large, please share reproducible code so we can inspect the possible reasons.
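To make the error source concrete, here is a minimal sketch of the affine quantize/dequantize round trip TFLite-style quantization performs (the scale and zero-point values are illustrative). The same rounding and clamping applied to the inputs and outputs of the Mul/Add nodes is what moves the int8/uint8 result away from the FP32 reference:

```python
import numpy as np

def quantize(x, scale, zero_point):
    # Affine quantization: q = round(x / scale) + zero_point, clamped to int8.
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Hypothetical tensor values and quantization parameters.
x = np.array([0.12, -0.47, 0.90], dtype=np.float32)
scale, zero_point = 0.0075, 3

x_q = quantize(x, scale, zero_point)
x_hat = dequantize(x_q, scale, zero_point)

# The round trip is not exact; each value can be off by up to scale / 2.
print(x - x_hat)
```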

Thank you