TFLite batch_matmul fails on large inputs

I'm finding that my seq2seq model (with dynamic input sizes) runs well in the TFLite benchmark tool and with the Python TFLite interpreter, but on mobile with the C API it fails with the following error once the input size becomes too large.
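For reference, this is roughly what the working Python-interpreter path looks like on my end; the model path and shapes below are placeholders, not my actual values:

```python
import numpy as np
import tensorflow as tf

# Placeholder model path; substitute the real .tflite file.
interpreter = tf.lite.Interpreter(model_path="seq2seq_dynamic.tflite")

# Resize the input to a large sequence length before allocation,
# mirroring the dynamic-input case that fails via the C API.
input_details = interpreter.get_input_details()
interpreter.resize_tensor_input(input_details[0]["index"], [1, 2048])
interpreter.allocate_tensors()

interpreter.set_tensor(
    input_details[0]["index"],
    np.zeros([1, 2048], dtype=input_details[0]["dtype"]),
)
interpreter.invoke()

output_details = interpreter.get_output_details()
result = interpreter.get_tensor(output_details[0]["index"])
```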

```
tflite/tensorflow/tensorflow/lite/kernels/batch_matmul.cc:466 scaling_factor_size >= num_batches_to_quantize was not true.
```

The model was generated from TensorFlow using dynamic range quantization (the default); the issue does not appear with no quantization or with strict int8 quantization.
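The conversion was along these lines (the SavedModel path is a placeholder); with `Optimize.DEFAULT` and no representative dataset, the converter applies dynamic range quantization:

```python
import tensorflow as tf

# Placeholder path to the exported seq2seq SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("seq2seq_saved_model")

# Optimize.DEFAULT with no representative dataset yields dynamic
# range quantization (int8 weights, float activations at runtime).
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()
with open("seq2seq_dynamic.tflite", "wb") as f:
    f.write(tflite_model)
```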

Hi @ameenba,

You might have resolved the issue by now; sorry for the delayed response. First of all, the error you encountered may stem from the C API not having the same level of dynamic allocation flexibility as the Python API, so the space allocated for the scaling factors can be insufficient once the number of batches to quantize grows too large. Next, if possible, limit the maximum batch size supported on mobile. When dynamic input sizes are a concern, it is recommended to use static quantization for inference in a mobile environment, as sketched below.
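A minimal sketch of that static (full-integer) conversion, assuming a SavedModel export with integer token inputs; the representative dataset generator, shapes, and paths here are hypothetical and should be replaced with samples covering your real input range:

```python
import numpy as np
import tensorflow as tf

# Placeholder path to the exported seq2seq SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("seq2seq_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Hypothetical representative dataset; replace with samples that
# cover the input shapes and values seen in production.
def representative_dataset():
    for _ in range(100):
        yield [np.random.randint(0, 1000, size=(1, 64)).astype(np.int32)]

converter.representative_dataset = representative_dataset

# Force full-integer kernels so no dynamic-range path remains.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_model = converter.convert()
with open("seq2seq_static_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

With activation ranges fixed at conversion time, the runtime no longer computes per-batch scaling factors, which sidesteps the check that fails in batch_matmul.cc.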

Thank You