Packing of pruned model with 65% sparsity

We have designed a custom packer to pack the sparsity in pruned models for specific target hardware. After pruning, when the model is exported to .h5 format, the model size drops significantly.

  1. Why is this the case, given that the number of weights is still the same (though 65% of them are 0s) and the precision is float32?
  2. Is there any compression inherently happening when the data is stored in .h5 format?
  3. If some compression/sparsity packing is happening while saving the model to .h5, how can we stop it so the model retains its original size?
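For reference, here is a minimal sketch of how we check the exported model (`pruned_model.h5` is a placeholder path; we assume the model was stripped with `tfmot.sparsity.keras.strip_pruning` before export):

```python
import numpy as np
import tensorflow as tf

# Load the exported pruned model and inspect its weights.
model = tf.keras.models.load_model("pruned_model.h5")  # placeholder path
weights = np.concatenate([w.flatten() for w in model.get_weights()])

print("dtype:", weights.dtype)                       # still float32
print("sparsity:", float(np.mean(weights == 0.0)))   # ~0.65 after pruning
```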

Thanks in advance. :blush:

Hi @Swaraj_Badhei

Sorry for the delayed response. You may have figured this out by now, but here are a few more pointers. Your observation is correct: when a pruned model is exported to .h5 format, the model size is reduced, and this is intended.

  • A pruned model typically contains a large number of zero weights, so the non-zero weights, along with their indices, need to be stored efficiently; this is where a sparse matrix representation plays an important role. Pruned models are stored in a sparse matrix representation, unlike other TF or Keras models, which are stored in dense format (see the first sketch after this list).

  • No explicit compression takes place; the pruning itself reduces the model size.

  • To retain the original size of your custom model, you can manually save the model weights to .npy format (see the second sketch after this list).
    Please follow the link for further information
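To illustrate the sparse-representation point from the first bullet, here is a minimal sketch using a `scipy.sparse` CSR matrix (this only illustrates the storage idea, not the exact on-disk format Keras uses):

```python
import numpy as np
from scipy import sparse

# Build a dense float32 weight matrix with ~65% zeros,
# similar to what a pruned layer produces.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
w[rng.random(w.shape) < 0.65] = 0.0

# CSR keeps only the non-zero values plus their column indices and
# row pointers, so storage scales with the number of non-zeros.
w_csr = sparse.csr_matrix(w)

dense_bytes = w.nbytes
csr_bytes = w_csr.data.nbytes + w_csr.indices.nbytes + w_csr.indptr.nbytes
print(f"dense: {dense_bytes} bytes, CSR: {csr_bytes} bytes")
```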
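And a sketch of the manual .npy export from the last bullet (`pruned_model.h5` is a placeholder path; `np.save` writes the dense arrays in full, zeros included, so the file keeps the original dense footprint):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("pruned_model.h5")  # placeholder path

# get_weights() returns a list of dense numpy arrays with differing
# shapes, so wrap them in an object array before saving as .npy.
weights = model.get_weights()
arr = np.empty(len(weights), dtype=object)
arr[:] = weights
np.save("pruned_weights.npy", arr, allow_pickle=True)
```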

Thank you