Full integer conversion of LSTM cell

Can anyone help me understand what exactly is done when quantizing an LSTM layer so that it uses only integers? I’m very familiar with the mathematical operations performed in a classic LSTM cell, but I’m having trouble understanding how the sigmoids and tanhs are computed with lookup tables, and when rescaling is performed to keep values from growing too large and overflowing. I also don’t understand what is meant by "The input activations are quantized as uint8 on the interval [-1, 127/128]". I think I’ve found the source code here: https://github.com/tensorflow/tflite-micro/blob/be11bd79e4a8b28c9ee92c6f02ca0e85414fb768/tensorflow/lite/kernels/internal/reference/lstm_cell.h#L143

Hi @Lisa, Quantization is the process of reducing the number of bits used to represent the weights and activations of a neural network. Each real value is divided by a scale factor (and usually offset by a zero point), then rounded to the nearest integer.
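To make the scale-and-round step concrete, here is a minimal sketch. The helper names QuantizeToInt8 and DequantizeFromInt8 are made up for illustration and are not taken from the TensorFlow Lite sources; the sketch assumes the common affine scheme where real = scale * (q - zero_point).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: q = round(x / scale) + zero_point, saturated to int8.
inline int8_t QuantizeToInt8(float x, float scale, int zero_point) {
  const int q = static_cast<int>(std::lround(x / scale)) + zero_point;
  // Saturate instead of wrapping around when the value does not fit.
  return static_cast<int8_t>(std::max(-128, std::min(127, q)));
}

// Hypothetical inverse: recover the approximate real value from the integer.
inline float DequantizeFromInt8(int8_t q, float scale, int zero_point) {
  return scale * static_cast<float>(static_cast<int>(q) - zero_point);
}
```

With scale = 1/128 and zero_point = 0, the value 0.5 is stored as 64, and anything at or above 1.0 saturates to 127.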

The lookup table contains precomputed values of the activation function for a range of possible inputs. At runtime, the function value for a given input is read from this table instead of being computed. Please refer to this document to learn more about lookup tables.

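If the scaled values are too large for the target data type, they overflow. Rescaling ensures that the values stay within the representable limits. Typically, the sum of many 8-bit products is accumulated in a wider type such as int32 and then rescaled back to the narrower type.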
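As a rough illustration of that rescaling step, the sketch below accumulates int8 products in int32 and then brings the result back to int8 by multiplying with an integer and shifting right. DotInt8 and RescaleToInt8 are hypothetical names, and the multiplier and shift are assumed to be chosen ahead of time from the layer's scales; this is not the exact routine used in the linked kernel.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: dot product of two int8 vectors, accumulated in int32
// because a sum of 8-bit products quickly exceeds the int8 range.
inline int32_t DotInt8(const int8_t* a, const int8_t* b, int n) {
  int32_t acc = 0;
  for (int i = 0; i < n; ++i) {
    acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  }
  return acc;
}

// Hypothetical helper: scale acc by multiplier / 2^shift, round to nearest,
// and saturate to int8. Assumes 0 <= shift <= 62 and an arithmetic right
// shift for negative values (true on mainstream compilers).
inline int8_t RescaleToInt8(int32_t acc, int32_t multiplier, int shift) {
  // Widen to 64 bits so the product itself cannot overflow.
  const int64_t product = static_cast<int64_t>(acc) * multiplier;
  // Adding half of the divisor before shifting rounds to nearest.
  const int64_t rounding = (shift > 0) ? (int64_t{1} << (shift - 1)) : 0;
  const int64_t scaled = (product + rounding) >> shift;
  return static_cast<int8_t>(
      std::max<int64_t>(-128, std::min<int64_t>(127, scaled)));
}
```

For instance, an accumulator of 1000 with multiplier 1 and shift 3 becomes 125, while 100000 with the same parameters saturates to 127 instead of wrapping around.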

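All the floating-point input activations are represented in the range -1 to 127/128. With a scale of 1/128, an 8-bit value covers exactly this range in steps of 1/128; for uint8 this corresponds to a zero point of 128, so the stored values 0 and 255 stand for -1 and 127/128. Thank you.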

Example with LSTM Cell

In an LSTM cell, the matrix multiplications, element-wise additions, and nonlinear functions are all quantized. The snippet below shows only the lookup-table part, for the sigmoid. Note that it stores the table entries as floats for readability:

#include <cmath>
#include <cstdint>
#include <vector>

// Precompute the sigmoid for every possible 8-bit index
std::vector<float> sigmoid_table(256);
for (int i = 0; i < 256; ++i) {
  // Map the uint8 index [0, 255] to a real input in [-1, 127/128]
  float x = (i - 128) / 128.0f;
  sigmoid_table[i] = 1.0f / (1.0f + std::exp(-x));
}

// During inference, use the quantized value to index into the lookup table
uint8_t quantized_value = ...; // Some quantized value
float sigmoid_result = sigmoid_table[quantized_value];
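
Since the question is about integer-only inference, here is a sketch of a variant where the table itself holds integers. It is an illustration, not the method used in the linked lstm_cell.h: BuildSigmoidTable and SigmoidLookup are hypothetical names, the input scale is whatever the caller passes in, and the output scale of 1/256 is an assumption. Floating point is used only while building the table, which happens once and not per inference.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Hypothetical helper: build a 256-entry sigmoid table for int8 inputs.
// Entry i corresponds to the int8 value (i - 128), i.e. real input
// input_scale * (i - 128). Outputs use an assumed scale of 1/256.
inline std::array<uint8_t, 256> BuildSigmoidTable(float input_scale) {
  std::array<uint8_t, 256> table{};
  for (int i = 0; i < 256; ++i) {
    const float x = input_scale * static_cast<float>(i - 128);
    const float y = 1.0f / (1.0f + std::exp(-x));
    // Round to the nearest output step and saturate at 255.
    const long q = std::lround(y * 256.0f);
    table[i] = static_cast<uint8_t>(q > 255 ? 255 : q);
  }
  return table;
}

// Runtime lookup: integer arithmetic only. The int8 input is shifted by 128
// to obtain a table index in [0, 255].
inline uint8_t SigmoidLookup(const std::array<uint8_t, 256>& table,
                             int8_t q_in) {
  return table[static_cast<int>(q_in) + 128];
}
```

With an input scale of 1/16 (inputs covering roughly [-8, 8)), an input of 0 returns 128, which stands for 128/256 = 0.5. The trade-off is resolution: the table is small because the input has only 256 possible values, so the input scale decides how much of the sigmoid curve is covered and how finely.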