The Limitation of Linear Regression Which is Often Overlooked By Many

...And how to address it.

Linear Regression is the most widely used ML algorithm.

But it is sensitive to outliers.

In fact, even a few outliers can significantly impact Linear Regression performance.

Instead, try RANSAC Regression. It is

  • non-deterministic,

  • iterative, and

  • robust to outliers.

It works as follows:

  • Select a subset of data

  • Fit a model

  • Calculate residuals

  • Classify points as outliers/inliers based on thresholds applied to residuals

  • Repeat (until max iterations or when a condition is met)

As shown above, while Linear Regression is influenced by outliers, RANSAC Regression isn't.

Nonetheless, it is always recommended to experiment with many robust methods and see which one fits your data best.

Having said that, what are some other popular models that are robust to outliers? Let me know :)

👉 Get started with RANSAC: Sklearn Docs.

👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights.

The button is located towards the bottom of this email.

Thanks for reading!

Latest full articles

If you’re not a full subscriber, here’s what you missed last month:

To receive all full articles and support the Daily Dose of Data Science, consider subscribing:

👉 Tell the world what makes this newsletter special for you by leaving a review here :)

👉 If you love reading this newsletter, feel free to share it with friends!

Reply

or to participate.