Extract quantization parameters used for convolution layer weights and perform layer-by-layer inference in C++

Hi everyone,

The aim of the project is to extract the weights from a trained and quantized TFLite model and perform the inference layer by layer in a C++ program, instead of using the TensorFlow library for the inference.

I am able to extract the model details and the layer weights using the interpreter functions below. Also, by setting the flag experimental_preserve_all_tensors=True I can keep the intermediate tensor results.

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path=model_path, experimental_preserve_all_tensors=True)
tensor_details = interpreter.get_tensor_details()
```
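Roughly, I inspect the reported quantization values by iterating over the returned details (this is where the (0, 0) pair for the convolution kernel shows up):

```python
for d in tensor_details:
    # d['quantization'] is the (scale, zero_point) pair listed in the tables below
    print(d['index'], d['name'], d['shape'], d['dtype'], d['quantization'])
```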

I first run the inference with the TensorFlow library in a Python script and record the details (input tensor, weight tensor, output tensor, quantization parameters) for each layer.
Then, using the extracted weights and the same input, I run the inference again, this time with my own implementations of the layer operations (convolution, maxpool, dense) instead of the TensorFlow library.
I implemented the convolution, dense, maxpool and flatten functions in a C++ program, and I pass the preprocessed input through these functions to perform the layer-by-layer inference.
For the maxpool and dense operations both runs (TensorFlow library and my custom implementations) produce the same results, but for the convolution they differ. A sketch of the reference run is shown below.
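For reference, the first (TensorFlow) run looks roughly like this; the tensor indices are the ones from the tables at the end of this post, and `input_image` stands for my preprocessed input:

```python
interpreter.allocate_tensors()
interpreter.set_tensor(interpreter.get_input_details()[0]['index'], input_image)
interpreter.invoke()

# experimental_preserve_all_tensors=True keeps the intermediate activations
# readable, so each layer can be compared against my C++ results.
conv_out_ref = interpreter.get_tensor(12)   # conv2d output
conv_kernel  = interpreter.get_tensor(11)   # int8 kernel weights
conv_bias    = interpreter.get_tensor(10)   # int32 bias
```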

My C++ convolution implementation gives a different result from the TensorFlow library. I think the difference arises because the quantization parameters (scale and zero point) are not provided for the convolution layer weights.
In the output of interpreter.get_tensor_details(), the quantization parameters for the convolution weights tensor are (0, 0), which I believe must be wrong, since a scale of 0 turns the entire kernel into 0 once it is multiplied by the quantization parameter.
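For context, the convolution arithmetic I am trying to reproduce in C++ corresponds to the NumPy sketch below (stride 1, VALID padding, and a single per-tensor weight scale `s_w` assumed, which is exactly the value I cannot find): zero points are subtracted, the accumulation is done in int32, and the result is rescaled to the output scale.

```python
import numpy as np

def quantized_conv2d(x_q, w_q, b_q, s_x, z_x, s_w, z_w, s_y, z_y):
    # x_q: int8 input [1, H, W, Cin]; w_q: int8 kernel [Cout, KH, KW, Cin]; b_q: int32 bias [Cout]
    _, H, W, _ = x_q.shape
    c_out, kh, kw, _ = w_q.shape
    out = np.zeros((1, H - kh + 1, W - kw + 1, c_out), dtype=np.int8)
    for oy in range(out.shape[1]):
        for ox in range(out.shape[2]):
            patch = x_q[0, oy:oy + kh, ox:ox + kw, :].astype(np.int32) - z_x
            for oc in range(c_out):
                acc = np.sum(patch * (w_q[oc].astype(np.int32) - z_w)) + b_q[oc]
                # real value = s_x * s_w * acc; requantize to the output scale (s_y, z_y)
                y = int(np.round(acc * (s_x * s_w) / s_y)) + z_y
                out[0, oy, ox, oc] = np.clip(y, -128, 127)
    return out
```

I use the same approach (subtract zero points, accumulate in int32, requantize) for the dense layer, and there the results match the TensorFlow output.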

Can anyone please provide me with some insights on:

  1. how to implement the convolution function the way the TensorFlow library does it, and
  2. whether the scaling factor of the convolution weights tensor has to be calculated, or whether it is already defined in the model (and if so, where to find it)?

The model details are given below. The dense layer kernel weights have quantization parameters (0.003373592859134078, 0), but the convolution layer kernel weights have quantization parameters of (0, 0).

| Layer/Op | Input Tensors | Output Tensors | Input Shape | Output Shape | Quantization (scale, zero_point) | Datatype |
|---|---|---|---|---|---|---|
| Conv2D (conv2d) | T#0 (input), T#11 (kernel), T#10 (bias) | T#12 (conv2d output) | [1, 28, 28, 1] | [1, 26, 26, 32] | input: (0.00392157, -128), output: (0.00292, -128) | int8 (input/output), int8 (kernel), int32 (bias) |
| Dense (dense) | T#17 (flatten output), T#5 (kernel), T#4 (bias) | T#18 (dense output) | [1, 576] | [1, 64] | input: (0.02445, -128), output: (0.04887, -128) | int8 (input/output), int8 (kernel), int32 (bias) |

The tensor details are:

| Layer Name | Tensor Index | Input Shape / Weights Shape | Output Shape | Quantization (scale, zero_point) | Datatype |
|---|---|---|---|---|---|
| tfl.pseudo_qconst8 | 10 | [32] | - | (0.0, 0) | numpy.int32 |
| tfl.pseudo_qconst9 | 11 | [32, 3, 3, 1] | - | (0.0, 0) | numpy.int8 |
| sequential_1/conv2d_1 | 12 | [1, 26, 26, 32] | - | (0.0029200201388448477, -128) | numpy.int8 |
| sequential_1/flatten_1 | 17 | [1, 576] | - | (0.024446312338113785, -128) | numpy.int8 |
| sequential_1/dense_1 | 18 | [1, 64] | - | (0.048872217535972595, -128) | numpy.int8 |
| tfl.pseudo_qconst2 | 4 | [64] | - | (8.247190271504223e-05, 0) | numpy.int32 |
| tfl.pseudo_qconst3 | 5 | [64, 576] | - | (0.003373592859134078, 0) | numpy.int8 |