Hello, I have the following questions related to Quantization Aware Training (QAT).
In the documentation for QAT configs: tfmot.quantization.keras.QuantizeConfig | TensorFlow Model Optimization
it says:
"In most cases, a layer outputs only a single tensor so it should only have one quantizer."
In the case where a single layer has multiple outputs, should I return a list where each element is the quantizer for the corresponding output, in order?
- How can I simulate exactly the way TFLite performs quantization during training? There are several quantizer options (LastValueQuantizer, AllValuesQuantizer and MovingAverageQuantizer): Module: tfmot.quantization.keras.quantizers | TensorFlow Model Optimization
In the case of convolution, dense, batch normalization or simple max layers, which of these techniques does TFLite use to quantize the respective layer?
How do I configure this properly? Why would I need MovingAverage for the activations and LastValue for the weights? Is there any documentation I can read?
- Does it make sense to annotate certain operations?
For example, suppose I have element-wise max activations instead of ReLU. Does it make sense to quantize their output? The output of the previous layer is probably already int8, so if the op applies no further transformation to the values, just a slice, a selection or a max, do I really need to quantize these ops? What about max pooling or padding?