Hello, I have the following questions related to Quantization Aware Training (QAT).
In the documentation for QAT configs: tfmot.quantization.keras.QuantizeConfig | TensorFlow Model Optimization
it says:
"In most cases, a layer outputs only a single tensor so it should only have one quantizer."
In the case where a single layer has multiple outputs, should I return a list where each element is the quantizer for the corresponding output, in order?
- How can I simulate exactly the way TFLite performs quantization during training? There are several quantizer options (LastValueQuantizer, AllValuesQuantizer and MovingAverageQuantizer): Module: tfmot.quantization.keras.quantizers | TensorFlow Model Optimization
In the case of convolution, dense, batch normalization or simple max layers, which of these techniques does TFLite use to quantize the respective layer?
How do I configure this properly? Why would I need MovingAverage for the activations and LastValue for the weights? Is there any documentation I can read?
- Does it make sense to annotate certain operations?
For example, suppose I have element-wise max activations instead of ReLU. Does it make sense to quantize their output? The output of the previous layer is probably already int8, so if the op applies no further transformation to the values, just a slice, a selection or a max, do I really need to quantize these ops? What about max pooling or padding?