Quantization aware training -> In/Output still float32?

Hey,
First of all, post-training quantization works fine for me. Now I want to try quantization aware training to convert the model to int8, so I followed the basic quantization aware training TensorFlow tutorial. Basically, I have the following code:

import tensorflow as tf
from tensorflow import keras
import tensorflow_model_optimization as tfmot

# Base float model
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Wrap the model with fake-quantization nodes for QAT
quantize_model = tfmot.quantization.keras.quantize_model
q_aware_model = quantize_model(model)
q_aware_model.compile(optimizer='adam',
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                      metrics=['accuracy'])

# (fine-tuning with q_aware_model.fit(...) omitted here)

converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model)

Looking at e.g. interpreter.get_input_details()[0]['dtype'] gives me np.float32 (and there are no quantization parameters), whereas full-integer post-training quantization correctly shows dtype np.uint8 with the corresponding parameters. So the input and output tensors, and apparently the entire model, are still float32. What is the problem?
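
For completeness, this is roughly how I inspect the converted model (get_input_details() returns a list of dicts, so the fields are accessed by key):

print(interpreter.get_input_details()[0]['dtype'])         # -> <class 'numpy.float32'>
print(interpreter.get_input_details()[0]['quantization'])  # -> (0.0, 0), i.e. no quantization parameters
print(interpreter.get_output_details()[0]['dtype'])        # -> <class 'numpy.float32'>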

Hi,

The QAT API only simulates int8 quantization: the model, including its input and output, stays float32, but during training the values are constrained to ranges that can be represented with 8 bits.
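
For example, if you inspect the wrapped Keras model from your snippet (a quick sketch, reusing your q_aware_model), you can see that the layers are only wrapped with fake-quantization nodes and the underlying variables are still float:

# The conv/dense kernels and biases are still float32; the wrapper mainly adds
# min/max range variables (plus a step counter) used for the fake quantization.
for layer in q_aware_model.layers:
    print(layer.name, [(w.name, w.dtype.name) for w in layer.weights])

# A forward pass also consumes and produces float32 tensors.
print(q_aware_model(tf.zeros([1, 28, 28])).dtype)  # tf.float32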

If you want a real int8 network from QAT, you still need to convert the TF graph to a TFLite model with the appropriate converter settings, as described in the Quantization aware training comprehensive guide (TensorFlow Model Optimization).
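
As a rough sketch (not the exact code from the guide), converting the trained q_aware_model to a model with real int8 input/output looks something like this, assuming a recent TF 2.x converter that supports inference_input_type / inference_output_type:

converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Restrict the converter to integer-only kernels and ask for int8 (or uint8) I/O tensors.
# The quantization ranges were learned during QAT, so a representative dataset
# should not be needed here (this can depend on the TF version).
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # tf.uint8 may also be supported
converter.inference_output_type = tf.int8
quantized_tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model)
print(interpreter.get_input_details()[0]['dtype'])   # now <class 'numpy.int8'>
print(interpreter.get_output_details()[0]['dtype'])  # now <class 'numpy.int8'>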