One popular approach for reducing resource requirements at test time is neural network pruning: systematically removing parameters (neurons, connections, etc.) from an existing network to reduce its size. The TensorFlow Model Optimization Toolkit makes it easy to apply optimization strategies such as weight pruning, quantization and weight clustering. For example, this code snippet prunes the model weights to 30% sparsity:
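import tempfile

import tensorflow_model_optimization as tfmot
from wandb.keras import WandbCallback

# Hold sparsity constant at 30%, starting from training step 0.
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.ConstantSparsity(0.3, 0),
    'block_size': (1, 1),
    'block_pooling_type': 'AVG'
}
# Wrap the existing Keras model so low-magnitude weights are masked during training.
model_thirty = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
log_dir_thirty = tempfile.mkdtemp()
callbacks = [
    # Required when training a pruned model: advances the pruning step each batch.
    tfmot.sparsity.keras.UpdatePruningStep(),
    # Logs sparsity summaries for TensorBoard.
    tfmot.sparsity.keras.PruningSummaries(log_dir=log_dir_thirty),
    WandbCallback(data_type="image",
                  validation_data=(x_valid, y_valid),
                  save_model=True)
]
model_thirty.compile(...)
model_thirty.fit(...)  # pass callbacks=callbacks here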
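To make the idea concrete, here is a minimal NumPy sketch of what magnitude pruning at a fixed sparsity does to a single weight matrix. It is only an illustration under my own assumptions, not the toolkit's implementation: the helper name `prune_low_magnitude_dense` and the random weights are made up for this example, and the sketch ignores the training-time mask updates that the toolkit performs.

```python
import numpy as np


def prune_low_magnitude_dense(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero.

    Hypothetical helper for illustration only; not part of tfmot.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    magnitudes = np.abs(weights).ravel()
    n_prune = int(round(sparsity * magnitudes.size))
    if n_prune == 0:
        return weights.copy()
    # Indices of the n_prune smallest magnitudes (order within them is irrelevant).
    prune_idx = np.argpartition(magnitudes, n_prune - 1)[:n_prune]
    mask = np.ones(magnitudes.size, dtype=bool)
    mask[prune_idx] = False
    return weights * mask.reshape(weights.shape)


# Made-up weights standing in for one layer's kernel.
rng = np.random.default_rng(0)
w = rng.normal(size=(10, 10))
w_pruned = prune_low_magnitude_dense(w, 0.3)
achieved = float(np.mean(w_pruned == 0.0))
print(f"achieved sparsity: {achieved:.2f}")
```

With a block size of (1, 1), as in the snippet above, each weight is considered individually, which is what this sketch mimics.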
But what about its impact on model performance? Not only on top-level metrics such as top-k accuracy, but also on the underrepresented classes in the dataset? The paper "What Do Compressed Deep Neural Networks Forget?" by Sara Hooker et al. tackles this question.
Check out my minimal reproducibility study verifying the claims of the paper. To minimally reproduce the results, instead of using ResNet-18, I ran multiple experiments with the InceptionV3 architecture using a constant-sparsity pruning schedule with s ∈ {0, 0.3, 0.5, 0.7, 0.9, 0.99}, a block size of (1, 1) and average block pooling, implemented using the TensorFlow Model Optimization Toolkit. The models were trained for binary image classification (blonde vs. non-blonde); blonde is an under-represented group in the CelebA dataset (what is sometimes referred to as the “long-tail” in the literature). I did not experiment with quantization in my work.
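To look beyond overall accuracy, one can compare per-class recall for the dense and pruned models. The sketch below is a hypothetical illustration: the helper `per_class_recall` and the toy labels are invented for this example and are not results from my experiments or from the paper.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Map each class label to the fraction of its examples predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


def accuracy(y_true, y_pred):
    """Fraction of all examples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up labels: 1 = blonde (minority class), 0 = non-blonde.
y_true = [0] * 8 + [1] * 2
dense_pred = [0] * 8 + [1, 1]   # hypothetical unpruned model
pruned_pred = [0] * 8 + [1, 0]  # hypothetical pruned model

for name, pred in [("dense", dense_pred), ("pruned", pruned_pred)]:
    print(name, accuracy(y_true, pred), per_class_recall(y_true, pred))
```

In this toy case overall accuracy only drops from 1.0 to 0.9, while recall on the minority class drops from 1.0 to 0.5, which is the kind of gap an aggregate metric can hide.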
- GitHub Repository:
Would love to hear some feedback from the community.