Hi, I am new to TensorFlow Lite. I plan to deploy an 8-bit LSTM on a self-developed chip, and I would like to use full integer quantization for simplicity. However, I cannot fully understand how full integer quantization works. Can I extract the quantized weights of the LSTM and use them for my own computation on the chip?
The problem is that each tensor seems to have its own scale and zero point, and I do not understand how an operation between two such tensors works. Take multiplication as an example: is the product of two 8-bit quantized tensors simply the product of their integer values, or do the scales and zero points play a role here?
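To make the question concrete, here is a small NumPy sketch of what I think the math might look like for an elementwise multiply, assuming the affine scheme real = scale * (q - zero_point). The function names and the rescaling step are just my guesses, not anything I pulled from the TFLite kernels, so please correct me if this is wrong:

```python
import numpy as np

# My assumption: each int8 tensor represents real values as
#   real = scale * (q - zero_point)   (affine quantization)
def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.int32) - zero_point)

# Hypothetical elementwise multiply of two quantized tensors.
# Is this roughly what the full-integer kernel has to do, i.e.
# subtract the zero points, multiply in a wider accumulator,
# then rescale by (s1 * s2 / s3) into the output quantization?
def quantized_multiply(q1, s1, z1, q2, s2, z2, s3, z3):
    # Integer part: (q1 - z1) * (q2 - z2) in an int32 accumulator
    acc = (q1.astype(np.int32) - z1) * (q2.astype(np.int32) - z2)
    # Rescale and shift by the output zero point. On real hardware I
    # assume this would be a fixed-point multiplier plus shift rather
    # than a float multiply; the float here is only to show the idea.
    q3 = np.round(acc * (s1 * s2 / s3)) + z3
    return np.clip(q3, -128, 127).astype(np.int8)
```

Is this the right mental model, or does the actual multiplication work differently?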
Thanks in advance.