TFLite Micro Add Layer - INT8 models

The general approach in TFLite Micro (quantized) for layers such as Convolution and Multiplication is that the operation is performed first, and only the resulting tensor is scaled/saturated (SaturatingRoundingDoublingHighMul followed by RoundingDivideByPOT).
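(Roughly speaking, those two primitives together implement one fixed-point rescale: quantized_multiplier is a Q31 encoding of the fractional part of the real rescaling factor, and the remaining power of two becomes a rounding shift. The sketch below follows the gemmlowp fixed-point reference rather than being copied from the TFLite Micro sources.)

    #include <cstdint>
    #include <limits>

    // Returns the high 32 bits of (a * b * 2), rounded, saturating on the one overflow
    // case; in effect this multiplies a by a Q31 fixed-point value in [-1, 1).
    inline std::int32_t SaturatingRoundingDoublingHighMul(std::int32_t a, std::int32_t b) {
      const bool overflow = (a == b) && (a == std::numeric_limits<std::int32_t>::min());
      const std::int64_t ab_64 = static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
      const std::int32_t nudge = ab_64 >= 0 ? (1 << 30) : (1 - (1 << 30));
      const std::int32_t ab_x2_high32 =
          static_cast<std::int32_t>((ab_64 + nudge) / (INT64_C(1) << 31));
      return overflow ? std::numeric_limits<std::int32_t>::max() : ab_x2_high32;
    }

    // Divides by 2^exponent, rounding to nearest with ties away from zero.
    inline std::int32_t RoundingDivideByPOT(std::int32_t x, int exponent) {
      const std::int32_t mask = static_cast<std::int32_t>((INT64_C(1) << exponent) - 1);
      const std::int32_t remainder = x & mask;
      const std::int32_t threshold = (mask >> 1) + (x < 0 ? 1 : 0);
      return (x >> exponent) + (remainder > threshold ? 1 : 0);
    }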

However, the Add operation seems to take a different approach:
the input tensors are first scaled/saturated, and the resulting tensor after the addition is scaled/saturated again.
Is there a mathematical reason why the Add layer needs this preprocessing of its inputs?
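To make the contrast concrete, below is a minimal sketch of both element-wise flows. The names (Requantize, in1_multiplier, left_shift, and so on) are illustrative placeholders rather than the exact TFLite Micro symbols, and left_shift = 20 is only the value typically used for int8; the requantize step reuses the two primitives sketched above.

    #include <algorithm>
    #include <cstdint>

    // One requantize step: rescale a 32-bit value by (multiplier * 2^-right_shift),
    // built from SaturatingRoundingDoublingHighMul / RoundingDivideByPOT above.
    inline std::int32_t Requantize(std::int32_t x, std::int32_t multiplier, int right_shift) {
      return RoundingDivideByPOT(SaturatingRoundingDoublingHighMul(x, multiplier), right_shift);
    }

    inline std::int8_t ClampToInt8(std::int32_t v) {
      return static_cast<std::int8_t>(std::max<std::int32_t>(-128, std::min<std::int32_t>(127, v)));
    }

    // MUL-style flow: form the raw integer product first; a single output multiplier
    // (encoding s1*s2/s3) is applied to the result only.
    std::int8_t MulElement(std::int8_t q1, std::int8_t q2,
                           std::int32_t in1_offset, std::int32_t in2_offset,
                           std::int32_t out_offset,
                           std::int32_t out_multiplier, int out_shift) {
      const std::int32_t raw = (q1 + in1_offset) * (q2 + in2_offset);
      return ClampToInt8(Requantize(raw, out_multiplier, out_shift) + out_offset);
    }

    // ADD-style flow: each input is first rescaled onto a shared intermediate scale
    // (after a left shift for precision headroom); only then are the integers summed,
    // and the sum is requantized once more to the output scale.
    std::int8_t AddElement(std::int8_t q1, std::int8_t q2,
                           std::int32_t in1_offset, std::int32_t in2_offset,
                           std::int32_t in1_multiplier, int in1_shift,
                           std::int32_t in2_multiplier, int in2_shift,
                           std::int32_t out_offset,
                           std::int32_t out_multiplier, int out_shift,
                           int left_shift /* typically 20 for int8 */) {
      const std::int32_t shifted1 = (q1 + in1_offset) * (1 << left_shift);
      const std::int32_t shifted2 = (q2 + in2_offset) * (1 << left_shift);
      const std::int32_t scaled1 = Requantize(shifted1, in1_multiplier, in1_shift);
      const std::int32_t scaled2 = Requantize(shifted2, in2_multiplier, in2_shift);
      const std::int32_t raw_sum = scaled1 + scaled2;
      return ClampToInt8(Requantize(raw_sum, out_multiplier, out_shift) + out_offset);
    }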

The same computational approach is also followed in CMSIS-NN for the Add layer
(the arm_nn_requantize function from arm_nnsupportfunctions.h).

Curious to know the mathematical reasoning behind this.

Hi @Sundari_Swathy_Meena,

Sorry for the delayed response. The main reason to rescale the input tensors before the add operation is to avoid numerical overflow and underflow in TFLite Micro and CMSIS-NN, especially when working with low-precision fixed-point arithmetic, and to bring both inputs onto a common scale so that their integer values can actually be added. In the case of multiplication this per-input step is not needed, because the combined scale is already folded into a single quantized_multiplier applied to the result:

    RoundingDivideByPOT(
        SaturatingRoundingDoublingHighMul(x * (1 << left_shift), quantized_multiplier),
        right_shift);
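Spelling out the algebra (standard uniform-quantization notation with scales s_i, zero points z_i and integer values q_i; this is my own summary rather than a quote from the TFLite documentation):

    % Add: the inputs carry different scales s1 and s2, so each integer value
    % needs its own rescaling before the sum is meaningful:
    \[
    s_3 (q_3 - z_3) = s_1 (q_1 - z_1) + s_2 (q_2 - z_2)
    \quad\Longrightarrow\quad
    q_3 = z_3 + \frac{s_1}{s_3}(q_1 - z_1) + \frac{s_2}{s_3}(q_2 - z_2)
    \]

    % Mul: the scales combine into one factor, so the raw integer product can be
    % formed first and a single multiplier applied to the result:
    \[
    s_3 (q_3 - z_3) = s_1 (q_1 - z_1)\, s_2 (q_2 - z_2)
    \quad\Longrightarrow\quad
    q_3 = z_3 + \frac{s_1 s_2}{s_3}(q_1 - z_1)(q_2 - z_2)
    \]

Unless s_1 = s_2, no single multiplier can be factored out of the sum, so each input gets its own quantized_multiplier and shift before the addition and the sum gets one more rescale to the output scale; the initial x * (1 << left_shift) just gives those per-input rescalings enough headroom that precision is not lost to rounding, which ties back to the overflow/precision point above.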

Thank You