Why do we apply a standard compression algorithm (e.g. gzip) after pruning or weight clustering?
To use the model, we need to unzip it anyway, right?
So what difference does it make?
Hi Mohanish.
This is because the tflite model stores the pruned elements without changing their data type. But modern compression algorithms compress zero values very efficiently, so compression can reduce the model size.
Yes, the compression may need to be undone before execution. However, since tflite is designed for mobile/IoT devices, the model is usually compressed while being delivered to and stored on the device, so a reduced (compressed) model size is still meaningful.
Hope this helps; any further discussion is welcome!
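To see why pruning helps gzip even though the stored tensors keep the same data type and size, here is a minimal sketch (not from the thread; the sizes and the 90% sparsity level are illustrative assumptions) comparing gzip on a random "dense" weight buffer versus the same buffer with most values zeroed out, as pruning would leave it:

```python
import gzip
import random

random.seed(0)
SIZE = 100_000  # hypothetical weight tensor size in bytes

# "Dense" weights: essentially incompressible random bytes.
dense = bytes(random.getrandbits(8) for _ in range(SIZE))

# "Pruned" weights: same size and dtype, but ~90% of the values are
# zeroed out -- pruning writes zeros in place, it does not shrink
# the stored tensor.
pruned = bytes(b if random.random() > 0.9 else 0 for b in dense)

dense_gz = gzip.compress(dense)
pruned_gz = gzip.compress(pruned)

print("original size:     ", SIZE)
print("dense after gzip:  ", len(dense_gz))   # roughly no savings
print("pruned after gzip: ", len(pruned_gz))  # large savings
```

Both buffers are the same size on disk before compression; only the pruned one shrinks substantially after gzip, which is exactly the gap the tutorials are demonstrating.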
Hello @Rino_Lee ,
So, please correct my understanding of your reply.
Are the models compressed automatically by tflite when stored on a microcontroller,
or should it be done manually?
That’s not done by tflite. It should be done manually by the application developer.
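A manual flow could look like the sketch below (filenames and the dummy model bytes are hypothetical, for illustration only): compress the flatbuffer at build/packaging time, then decompress on the device before handing the raw bytes to the interpreter, which expects an uncompressed flatbuffer.

```python
import gzip
import shutil

MODEL = "pruned_model.tflite"  # hypothetical filename
PACKED = MODEL + ".gz"

# Stand-in bytes for a real flatbuffer so the sketch runs end to end.
with open(MODEL, "wb") as f:
    f.write(b"\x00" * 10_000)

# Build time: compress the model before shipping it to the device.
with open(MODEL, "rb") as src, gzip.open(PACKED, "wb") as dst:
    shutil.copyfileobj(src, dst)

# On the device / in the app: decompress before loading, e.g. with
# tf.lite.Interpreter(model_content=model_bytes).
with gzip.open(PACKED, "rb") as src:
    model_bytes = src.read()

print(len(model_bytes))  # same bytes as the original model
```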
But when converting the tflite model to a C array for deployment on microcontrollers, no zip is used.
So is the gzip step only for theoretical use/results, with no practical meaning?
I ask because I cannot find any practical flow for this.
An example of a practical flow is ARM’s “VELA” tooling for the U55 NPU. This post-processes tflite flatbuffers, compressing constant weight tensors (the U55 supports on-the-fly HW weight decompression).