Hey,
First, the post-training quantization works fine! Now I want to try quantization aware training to convert the model to int8, so I followed the basic quantization aware training TensorFlow tutorial. Basically, I have the following code:
import tensorflow as tf
from tensorflow import keras
import tensorflow_model_optimization as tfmot

# Plain float Keras model, as in the tutorial
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])

# Wrap the model with fake-quantization nodes for quantization aware training
quantize_model = tfmot.quantization.keras.quantize_model
q_aware_model = quantize_model(model)

q_aware_model.compile(optimizer='adam',
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                      metrics=['accuracy'])

# Convert the quantization-aware model to TFLite
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model)
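Concretely, I am checking the converted model roughly like this (the values in the comments are what I see on my end; the prints are simplified):

# Inspect the converted model's input/output tensors
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
print(input_details['dtype'])         # <class 'numpy.float32'>
print(input_details['quantization'])  # (0.0, 0) -> no quantization parameters
print(output_details['dtype'])        # <class 'numpy.float32'>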
Looking at, e.g., interpreter.get_input_details()[0]['dtype'] gives me np.float32, and the 'quantization' entry is empty, whereas full-integer post-training quantization correctly shows dtype np.uint8 with the corresponding parameters. So the input and output tensors, and apparently the entire model, are still in float32. What is the problem?
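For comparison, my working full-integer post-training quantization conversion looks roughly like this (representative_images is just a placeholder for the calibration data I use, and the exact generator may differ):

# Full-integer post-training quantization of the same float model, for comparison
def representative_dataset():
    # representative_images is a placeholder for my calibration samples
    for image in representative_images[:100]:
        yield [image.reshape(1, 28, 28).astype('float32')]

ptq_converter = tf.lite.TFLiteConverter.from_keras_model(model)
ptq_converter.optimizations = [tf.lite.Optimize.DEFAULT]
ptq_converter.representative_dataset = representative_dataset
ptq_converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
ptq_converter.inference_input_type = tf.uint8
ptq_converter.inference_output_type = tf.uint8
ptq_tflite_model = ptq_converter.convert()

ptq_interpreter = tf.lite.Interpreter(model_content=ptq_tflite_model)
print(ptq_interpreter.get_input_details()[0]['dtype'])  # <class 'numpy.uint8'>, with quantization parameters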