Post-training quantization. Where to learn more?

Hi
I am using post-training quantization to create 8-bit quantized TensorFlow models. The documentation is easy to follow, and I am happy with the results I get.

I want to learn more about what actually happens under the hood in post-training quantization. I have read https://arxiv.org/pdf/1712.05877.pdf, which I found cited as a reference for the quantization scheme. My impression is that it mostly describes quantization-aware training, but I could be wrong.

Do you have any recommendations for where I can learn more about post-training quantization? I assume the source code is complex to read. I need to understand the algorithm for my master's thesis.

My current, limited understanding is that parameters such as weights are fixed, so they can be quantized directly from their min and max values. Activations and model inputs, on the other hand, are dynamic, so their min/max ranges need to be estimated by running inference on a representative dataset.
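To check my own understanding, here is a small numpy sketch of the affine (asymmetric) scheme from the paper above: a scale and zero-point are derived from the tensor's min/max, and real zero is kept exactly representable. The function names are my own, not TFLite's:

```python
import numpy as np

def quantize_affine(x, num_bits=8):
    """Affine quantization: map [min(x), max(x)] onto the int8 range."""
    qmin, qmax = -2 ** (num_bits - 1), 2 ** (num_bits - 1) - 1  # -128, 127
    x_min, x_max = float(x.min()), float(x.max())
    # Extend the range to include 0 so that real zero maps to an integer
    # exactly (important e.g. for zero padding).
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize_affine(q, scale, zero_point):
    """Recover an approximation of the original float tensor."""
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(3, 3).astype(np.float32)
q, s, z = quantize_affine(w)
w_hat = dequantize_affine(q, s, z)
# Reconstruction error is at most half a quantization step.
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

For fixed weights this is a one-shot transformation; for activations the same formula applies, but min/max must first be estimated from the representative dataset, as described above.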

Hi @aqaaqa, thank you for showing interest in post-training quantization.

Here is some recommended documentation for TFLite post-training quantization:

https://www.tensorflow.org/lite/performance/model_optimization
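In practice, the workflow those docs describe boils down to a few converter settings. A minimal sketch of full-integer post-training quantization with a representative dataset (the saved-model path and calibration data here are placeholders you would replace with your own):

```python
import numpy as np
import tensorflow as tf

# Placeholder: path to your own SavedModel directory.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")

# Enable post-training quantization.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Without a representative dataset this yields dynamic range quantization
# (int8 weights, float activations). Providing one lets the converter
# calibrate activation min/max ranges for full-integer quantization.
def representative_dataset():
    for _ in range(100):
        # Placeholder calibration input; use real samples with your
        # model's input shape.
        yield [np.random.randn(1, 224, 224, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
tflite_model = converter.convert()
```

The key point relative to the question above: the representative dataset is only needed to estimate activation ranges; the weights are quantized from their own min/max regardless.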

Hi @battery
Thank you for the helpful links.
There are some vague points for me in these links. Consider a model with a single Convolution layer followed by ReLU activation (or any other activation) and I want to apply post-training Dynamic range quantization on that. As it has been mentioned in those links, in DRQ, weights are going to be saved in int8 and activations will be saved in floating point. Also, it has mentioned that Activations will be quantized and dequantized on the fly for the int8 supported operations. My questions:
1- What does "The activations are always stored in floating point" mean in those links? Does it mean that the output tensor of the activation function is stored in floating point, or is there something else related to the activation that is stored in floating point?
2- Since the input of the model is in floating point, I suppose the Convolution computation is done in floating point after the weights are dequantized (or perhaps there is no need to dequantize the weights)? Can you give me some information about these computations?
3- Is the output of the Convolution computation (before applying the activation) in floating point?
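For what it's worth, here is how I currently understand the dynamic-range path, sketched in numpy on a plain matmul instead of a full convolution. The per-tensor symmetric scales and the int32 accumulator are my assumptions based on the docs above, not the actual TFLite kernels:

```python
import numpy as np

np.random.seed(0)

def quantize_symmetric(x, num_bits=8):
    """Per-tensor symmetric quantization: scale from max |x|, zero-point 0."""
    scale = float(np.max(np.abs(x))) / (2 ** (num_bits - 1) - 1)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Weights are quantized once, at conversion time, and stored as int8.
w = np.random.randn(4, 3).astype(np.float32)
w_q, w_scale = quantize_symmetric(w)

# The model input arrives in float32 (question 2) ...
x = np.random.randn(2, 4).astype(np.float32)

# ... and is quantized "on the fly" just before the int8 op.
x_q, x_scale = quantize_symmetric(x)

# Integer matmul with an int32 accumulator, then dequantized back to
# float, so the conv/matmul output is float again (question 3).
acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
y = acc.astype(np.float32) * (x_scale * w_scale)

# ReLU runs in float, and the activation tensor that gets stored is the
# float one (question 1, as I read it).
y = np.maximum(y, 0.0)

# The integer path closely approximates the pure-float path.
y_ref = np.maximum(x @ w, 0.0)
assert np.allclose(y, y_ref, atol=0.25)
```

So under this reading, "activations are stored in floating point" means the tensors flowing between ops are float; the int8 form of an activation only exists transiently inside a supported op. I would still appreciate confirmation from someone who knows the kernels.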

Someone reply please :weary: :sob: