I’m moving the discussion of this issue here.
Briefly, I’m trying to convert a TensorFlow model to TFLite. The model maintains a non-trainable state vector that is updated at every inference, similar to an RNN hidden state. My issue is that during conversion this state vector doesn’t seem to be interpreted as a quantized value, so the converted network inserts quantize and dequantize operations every time it reads from or writes to it.
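To illustrate the pattern, here is a minimal sketch of what I mean by a stateful model (names, shapes, and the computation are made up for illustration; the actual model is in the gist):

```python
import tensorflow as tf

class StatefulModel(tf.Module):
    """Toy model that keeps a non-trainable state vector, updated at every inference."""

    def __init__(self, state_size=16):
        super().__init__()
        # Persistent state, analogous to an RNN hidden state.
        self.state = tf.Variable(tf.zeros([state_size]), trainable=False, name="state")
        self.kernel = tf.Variable(tf.random.normal([state_size, state_size]), name="kernel")

    @tf.function(input_signature=[tf.TensorSpec([1, 16], tf.float32)])
    def __call__(self, x):
        # Read the state, mix it with the input, and write the result back.
        new_state = tf.tanh(tf.matmul(x, self.kernel))[0] + self.state
        self.state.assign(new_state)
        return new_state
```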
The model and the corresponding Netron graph are shown in the referenced GitHub issue, and both can be reproduced with this gist.
Ultimately, I want to deploy this on a Coral EdgeTPU, so I’d like to eliminate unnecessary ops such as these quantize/dequantize blocks. This should be possible, since the state vector should be int8 rather than float32 in the TFLite model. How can I achieve that?
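For context, this is the kind of full-integer conversion I have in mind (a sketch continuing the `StatefulModel` example above, with a placeholder representative dataset, not the exact script from the gist):

```python
import numpy as np
import tensorflow as tf

# `StatefulModel` is the sketch from the previous snippet.
model = StatefulModel()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model.__call__.get_concrete_function()], model
)

def representative_dataset():
    # Placeholder calibration data; real calibration would use representative inputs.
    for _ in range(100):
        yield [np.random.rand(1, 16).astype(np.float32)]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```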