Activation Pruning — Reduce Neural Network Size Without Significant Performance Drop

Get rid of useless neurons.

Once we complete network training, we are almost always left with plenty of useless neurons: ones that contribute nearly nothing to the network’s performance yet still consume memory.

In other words, a high percentage of neurons can be removed from the trained network without noticeably affecting its performance:

And, of course, this is not a random, uninformed claim.

I have experimentally verified this over and over across my projects.

Here’s the core idea.

After training is complete, we run the dataset through the model (no backpropagation this time) and analyze the average activation of individual neurons.

Here, we often observe that many neurons’ activations remain near zero across the entire dataset.

Thus, they can be pruned from the network, as they will have very little impact on the model’s output.
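Here’s a minimal sketch of how this measurement step could look in PyTorch. It assumes a hypothetical feed-forward classifier `model` built from `nn.Linear` + `nn.ReLU` layers and a `val_loader` DataLoader (neither is from the article); forward hooks record each ReLU neuron’s mean absolute activation over the data:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def average_activations(model, loader, device="cpu"):
    """Run data through the trained model (no backprop) and record the
    mean absolute activation of every neuron in each ReLU layer."""
    sums, counts = {}, {}

    def make_hook(name):
        def hook(module, inputs, output):
            # Per-neuron mean |activation| over the batch dimension.
            sums[name] = sums.get(name, 0) + output.abs().mean(dim=0)
            counts[name] = counts.get(name, 0) + 1
        return hook

    handles = [m.register_forward_hook(make_hook(name))
               for name, m in model.named_modules()
               if isinstance(m, nn.ReLU)]

    model.eval()
    for x, _ in loader:
        model(x.to(device))

    for h in handles:
        h.remove()

    # Average the per-batch means into one value per neuron.
    return {name: sums[name] / counts[name] for name in sums}
```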

For pruning, we can decide on a pruning threshold (λ) and prune all neurons whose average activations fall below this threshold.

This makes intuitive sense as well.

More specifically, if a neuron rarely produces a high activation, it is fair to assume that it isn’t contributing to the model’s output, and we can safely prune it.
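And here’s a hedged sketch of the pruning step itself, under the same assumptions: a flat `nn.Sequential` of `Linear`/`ReLU` pairs and the `stats` dictionary returned by `average_activations` above. Rather than physically shrinking the weight matrices, it zeroes the rows that produce the low-activation neurons, which makes them inert; a full implementation would also remove those rows and the matching columns of the next layer.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_by_activation(model, stats, lam=0.4):
    """Zero out neurons whose average activation falls below the threshold λ."""
    children = list(model.named_children())
    for (lin_name, lin), (relu_name, relu) in zip(children, children[1:]):
        if isinstance(lin, nn.Linear) and isinstance(relu, nn.ReLU):
            dead = stats[relu_name] < lam   # mask of neurons to prune
            lin.weight[dead] = 0.0          # rows that produce those neurons
            if lin.bias is not None:
                lin.bias[dead] = 0.0
            print(f"layer {lin_name}: pruned {int(dead.sum())}/{dead.numel()} neurons")
```

PyTorch’s `torch.nn.utils.prune` module offers similar mask-based utilities if you prefer not to roll this by hand.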

The following table compares the accuracy of the pruned model with the original (full) model across a range of pruning thresholds (λ):

Notice something here.

At a pruning threshold λ=0.4, the validation accuracy of the model drops by just 0.62%, but the number of parameters drops by 72%.

That is a huge reduction, while both models remain almost equally good!
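To produce a table like this yourself, you could sweep λ and record the validation accuracy and the fraction of zeroed weights at each value. A rough sketch, reusing the hypothetical `prune_by_activation` and loaders from the earlier snippets:

```python
import copy
import torch

@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Plain top-1 classification accuracy on a validation loader."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        preds = model(x.to(device)).argmax(dim=1)
        correct += (preds == y.to(device)).sum().item()
        total += y.numel()
    return correct / total

def threshold_sweep(model, stats, loader, thresholds=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Compare the pruned model against the full model across thresholds."""
    base_acc = accuracy(model, loader)
    for lam in thresholds:
        pruned = copy.deepcopy(model)   # keep the original model intact
        prune_by_activation(pruned, stats, lam)
        zeroed = sum((p == 0).sum().item() for p in pruned.parameters())
        total = sum(p.numel() for p in pruned.parameters())
        print(f"λ={lam}: pruned acc {accuracy(pruned, loader):.2%} "
              f"vs. full acc {base_acc:.2%}, {zeroed / total:.0%} of weights zeroed")
```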

Of course, there is a trade-off: the pruned model does not perform quite as well as the original.

But in many cases, especially when deploying ML models, accuracy is not the only metric that drives these decisions.

Instead, operational metrics like efficiency, speed, and memory consumption are also key deciding factors.

That is why model compression techniques are so crucial in such cases.

If you want to learn more, we discussed them in this deep dive: Model Compression: A Critical Step Towards Efficient Machine Learning.

While we only discussed one such technique today (activation pruning), the article covers 6 model compression techniques, with PyTorch implementations.

👉 Over to you: What are some other ways to make ML models more production-friendly?

👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights.

The button is located towards the bottom of this email.

Thanks for reading!

Latest full articles

If you’re not a full subscriber, here’s what you missed last month:

To receive all full articles and support the Daily Dose of Data Science, consider subscribing:

👉 Tell the world what makes this newsletter special for you by leaving a review here :)

👉 If you love reading this newsletter, feel free to share it with friends!
