Hi all,
I am quantizing models to 16x8-bit precision, but I cannot find any information on the actual spec for the quantized ops. There is a nice overview of the spec for the 8-bit version here: LiteRT 8-bit quantization specification | Google AI Edge | Google AI for Developers
What I would like to know is whether there is some blanket format that applies to all ops in 16x8 mode, e.g. 8-bit symmetric weights with 16-bit symmetric activations for every op, or 8-bit symmetric weights with 16-bit asymmetric activations, etc.
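In case it's useful, one way I've been trying to probe this empirically is to convert a tiny model with the 16x8 ops set and inspect the per-tensor zero points (symmetric tensors should have zero point 0). A minimal sketch, assuming the standard `tf.lite.TFLiteConverter` API; the toy model and shapes are just placeholders:

```python
import numpy as np
import tensorflow as tf

# Toy model: a single matmul against a fixed weight constant.
w = tf.constant(np.random.rand(8, 4).astype(np.float32))

@tf.function(input_signature=[tf.TensorSpec([1, 8], tf.float32)])
def model(x):
    return tf.matmul(x, w)

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model.get_concrete_function()])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Request the 16x8 scheme: 16-bit activations, 8-bit weights.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8]

def representative_dataset():
    # Dummy calibration data matching the input signature.
    for _ in range(10):
        yield [np.random.rand(1, 8).astype(np.float32)]

converter.representative_dataset = representative_dataset
tflite_model = converter.convert()

# Inspect each tensor's dtype and zero point(s) in the converted model.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
for d in interpreter.get_tensor_details():
    zp = d['quantization_parameters']['zero_points']
    print(d['name'], d['dtype'].__name__, zp)
```

This at least shows what the converter produces in practice, though it obviously doesn't confirm what the spec guarantees, which is what I'm really after.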
Would anyone know where to find this information? Thanks!