I have an FP16 TFLite model running on an ARM CPU. When I run inference on a batch of N (N > 1) identical input vectors, I get N output vectors that are all identical to each other, as expected. However, when I compare this output vector with the one obtained by running inference on a single input vector, the two are slightly different.
For example, in the worst case in my test, one component of the output vector is 0.00019744 (single input) vs 0.0001975 (N > 1), a difference of about 0.03%.
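For reference, the percentage quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and the two values are the ones from the post.

```python
# Values taken from the post; the names are illustrative only.
single = 0.00019744   # component from single-input inference
batched = 0.0001975   # same component from batched inference (N > 1)

# Relative difference with respect to the single-input result.
rel_diff = abs(batched - single) / abs(single)
print(f"relative difference: {rel_diff:.2%}")  # about 0.03%
```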
You may have worked out the reason by now, but here are a few more details. Your observation is right: it is very common to see slight differences between the outputs for a batch of identical input vectors and the output for a single input vector, particularly at FP16 precision. One commonly cited reason is batch normalization: when its statistics (mean, variance) are computed per batch, they depend on the batch size. Note, though, that a converted TFLite model normally uses the fixed statistics learned during training, so at inference time the more likely cause is that the kernels used for different batch sizes perform the floating-point operations in a different order, and the rounding differs slightly as a result.
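One general way such small differences can arise, independent of any particular layer, is that floating-point addition is not associative: if the batched and single-input code paths add the same terms in a different order, the rounded results can differ. Whether that happens in this specific model is an assumption, not something confirmed in this thread. A minimal NumPy sketch of the effect in FP16:

```python
import numpy as np

# Three float16 values; float16 has an 11-bit significand, so above 2048
# only even integers are representable.
a = np.float16(2048)
b = np.float16(1)
c = np.float16(1)

# Same three terms, two different summation orders.
left = (a + b) + c    # 2048 + 1 rounds back to 2048, then again
right = a + (b + c)   # 1 + 1 = 2 exactly, then 2048 + 2 = 2050

print(left, right)    # the two orders give different float16 results
```

The same reordering in float32 gives 2050 both ways, which is why a higher precision tends to shrink (but not always eliminate) this kind of discrepancy.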
For your use case the difference is acceptable. To reduce it further, you can also use FP32 precision.
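To judge whether a difference like this is within what a given precision can resolve, one option is to compare the two outputs with a relative tolerance equal to that precision's machine epsilon. The helper below is a hypothetical sketch (the name `within_rounding` and the choice of one epsilon as the tolerance are assumptions, not part of TFLite), applied to the values quoted in the question:

```python
import numpy as np

def within_rounding(single, batched, dtype):
    """Return True if the two values agree to within one machine epsilon
    (relative) of the given floating-point dtype. Hypothetical helper."""
    rtol = float(np.finfo(dtype).eps)
    return bool(np.isclose(single, batched, rtol=rtol, atol=0.0))

single = 0.00019744   # single-input result from the question
batched = 0.0001975   # batched result from the question

print(within_rounding(single, batched, np.float16))  # within FP16 resolution
print(within_rounding(single, batched, np.float32))  # not within FP32 resolution
```

In practice a tolerance of a few epsilons is often more realistic, since rounding errors accumulate across layers.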