Hi guys,
Recently I have been studying quantization in TensorFlow Lite. I noticed that quantization code appears in several places in the TensorFlow code base.
The first is under “tensorflow\compiler\mlir\lite\quantization”. As I understand it, this is the “new” way to quantize a TensorFlow model for TFLite.
The second is under “tensorflow\lite\tools\optimize”. As far as I can tell, this may be the “old” quantization path, and it might be deprecated in the future.
The third one I found is under “tensorflow\lite\toco”. It looks like this one can be built into a command-line tool that performs quantization, whereas the first and second can be used from Python scripts.
My question is: how are the TFLite quantization implementations in these three places related? Which one is used most often? And do they essentially share the same mechanism?
I would appreciate any information anyone can provide.
Overview: integer quantization is an optimization strategy that converts 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers. This results in a smaller model and faster inference, which is valuable for low-power devices such as microcontrollers.
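To make the "nearest 8-bit" mapping concrete, here is a minimal pure-Python sketch of affine (scale and zero-point) quantization, where real_value = scale * (int8_value - zero_point). This is the general scheme TFLite's integer quantization is built on, but the helpers choose_qparams, quantize and dequantize are hypothetical illustrations, not TensorFlow APIs. The sketch assumes a single per-tensor range; the actual converter obtains ranges differently (for example, by calibrating activations on a representative dataset) and may use per-channel scales for weights.

```python
def choose_qparams(x_min, x_max, qmin=-128, qmax=127):
    """Hypothetical helper: derive (scale, zero_point) mapping [x_min, x_max] onto [qmin, qmax]."""
    # Extend the range to include 0.0 so that real zero is exactly representable.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range: any positive scale works
    # Integer that real 0.0 maps to, clamped into the int8 range.
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Hypothetical helper: map each float to the nearest int8 step, clamped to [qmin, qmax]."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Hypothetical helper: recover approximate floats via real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
    scale, zp = choose_qparams(min(weights), max(weights))
    q = quantize(weights, scale, zp)
    print(scale, zp, q)
    print(dequantize(q, scale, zp))
```

Each value is stored as one int8 instead of a 4-byte float32, and the round-trip error is at most about half a quantization step (scale / 2), which is where the size reduction and the small accuracy cost come from.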
Hi @Amin_Jabari, thank you for your feedback!
I understand what integer quantization does in TFLite. My real question is about the codebase: there appear to be several pieces of code that can perform quantization. What is the relationship between them? Put differently, if I convert a TensorFlow model to a TFLite model with quantization enabled, where is the entry point for quantization? Is it “tensorflow\compiler\mlir\lite\quantization” or “tensorflow\lite\tools\optimize”?
One thing I noticed is that TOCO is a deprecated API, which suggests that quantization will not use TOCO as its entry point.