- Daily Dose of Data Science
- Posts
- Feature Scaling is NOT Always Necessary
Feature Scaling is NOT Always Necessary
Here's when it is not needed.
Feature scaling is commonly used to improve the performance and stability of ML models.
This is because it scales the data to a standard range. This prevents a specific feature from having a strong influence on the model’s output.
For instance, in the image above, the scale of Income could massively impact the overall prediction. Scaling both features to the same range can mitigate this and improve the model’s performance.
But is it always necessary?
While feature scaling is often crucial, knowing when to do it is also equally important.
Note that many ML algorithms are unaffected by scale. This is evident from the image below.
As shown above:
Logistic regression, SVM Classifier, MLP, and kNN do better with feature scaling.
Decision trees, Random forests, Naive bayes, and Gradient boosting are unaffected.
Consider a decision tree, for instance. It splits the data based on thresholds determined solely by the feature values, regardless of their scale.
Thus, it’s important to understand the nature of your data and the algorithm you intend to use.
You may never need feature scaling if the algorithm is insensitive to the scale of the data.
👉 Over to you: What other algorithms typically work well without scaling data? Let me know :)
👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights. The button is located towards the bottom of this email.
Thanks for reading!
Latest full articles
If you’re not a full subscriber, here’s what you missed:
DBSCAN++: The Faster and Scalable Alternative to DBSCAN Clustering
Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning
You Cannot Build Large Data Projects Until You Learn Data Version Control!
Why Bagging is So Ridiculously Effective At Variance Reduction?
Sklearn Models are Not Deployment Friendly! Supercharge Them With Tensor Computations.
Deploy, Version Control, and Manage ML Models Right From Your Jupyter Notebook with Modelbit
Gaussian Mixture Models (GMMs): The Flexible Twin of KMeans.
To receive all full articles and support the Daily Dose of Data Science, consider subscribing:
👉 Tell the world what makes this newsletter special for you by leaving a review here :)
👉 If you love reading this newsletter, feel free to share it with friends!
Reply