Hello,
I applied QAT-based training following the TensorFlow guidelines (Quantization aware training comprehensive guide | TensorFlow Model Optimization).
After that, I converted the quantized model to the TFLite format following the TF guideline (see below).
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Load the QAT model inside quantize_scope so the quantize wrappers resolve.
with tfmot.quantization.keras.quantize_scope():
    quant_aware_model = tf.keras.models.load_model(keras_qat_model_file)

# Convert the quantization-aware model to TFLite with default optimizations.
converter = tf.lite.TFLiteConverter.from_keras_model(quant_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

with open(self.output_model_path, 'wb') as f:
    f.write(quantized_tflite_model)
After converting the quantized model to TFLite, the output model is, as expected, about 4x smaller than the original model, because the weights are stored as 8-bit integers instead of 32-bit floats.
However, the input and output tensors are still float32. What is the recommended way to make the input and output 8-bit integers instead of float32?
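For reference, here is a minimal sketch of how I check the input/output tensor types of the converted model (using the standard TFLite Interpreter and the same output path as above):

import tensorflow as tf

# Inspect the converted model's input/output tensor types.
interpreter = tf.lite.Interpreter(model_path=self.output_model_path)
interpreter.allocate_tensors()

print(interpreter.get_input_details()[0]['dtype'])   # currently float32
print(interpreter.get_output_details()[0]['dtype'])  # currently float32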
Do I need to follow the same procedure as the one described for post-training quantization?
(Post-training integer quantization | TensorFlow Lite)
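In other words, is it just a matter of setting converter.inference_input_type and converter.inference_output_type as in the post-training integer quantization guide? Something like the sketch below (untested on my side, just my guess from that guide; I am also assuming that a QAT model does not need a representative_dataset since the ranges come from training):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

with tfmot.quantization.keras.quantize_scope():
    quant_aware_model = tf.keras.models.load_model(keras_qat_model_file)

converter = tf.lite.TFLiteConverter.from_keras_model(quant_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Guess: request integer input/output tensors instead of the default float32.
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
quantized_tflite_model = converter.convert()

Please correct me if this is not the recommended approach for QAT models.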