Theil-Sen Regression: The Robust Twin of Linear Regression

Addressing a key limitation of Linear Regression.

Linear Regression is one of the most widely used ML algorithms.

But it is sensitive to outliers.

In fact, even a few outliers can significantly impact its performance.

Instead, try TheilSenRegressor. It is an outlier-robust regression algorithm.

It works as follows:

  • Select a subset of data

  • Fit a least squares model

  • Record model weights

  • Repeat

The final weights are the spatial median (or L1 Median) of all models.
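The steps above can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's implementation: the helper names (`spatial_median`, `theil_sen_sketch`), the number of subsets, the subset size, and the use of Weiszfeld's iteration to approximate the spatial median are all assumptions made for this example.

```python
import numpy as np


def spatial_median(points, n_iter=200, tol=1e-9):
    """Approximate the spatial (L1) median of the rows of `points`
    using Weiszfeld's iteration (an assumed choice for this sketch)."""
    median = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(n_iter):
        dist = np.linalg.norm(points - median, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        inv = 1.0 / dist
        new_median = (inv[:, None] * points).sum(axis=0) / inv.sum()
        if np.linalg.norm(new_median - median) < tol:
            return new_median
        median = new_median
    return median


def theil_sen_sketch(X, y, n_subsets=500, subset_size=None, seed=0):
    """Fit least squares on random subsets, then combine the weight
    vectors with the spatial median. Returns [intercept, coef_1, ...]."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    if subset_size is None:
        subset_size = n_features + 1  # smallest subset that fixes all weights
    X1 = np.column_stack([np.ones(n_samples), X])  # add an intercept column
    all_weights = []
    for _ in range(n_subsets):
        # Select a subset of data
        idx = rng.choice(n_samples, size=subset_size, replace=False)
        # Fit a least squares model on that subset
        w, *_ = np.linalg.lstsq(X1[idx], y[idx], rcond=None)
        # Record model weights
        all_weights.append(w)
    # Final weights: spatial median of all recorded weight vectors
    return spatial_median(np.array(all_weights))


# Synthetic data: y = 3x + 2 plus noise, with 10% of targets corrupted
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=100)
y[:10] += 50.0

intercept, slope = theil_sen_sketch(x.reshape(-1, 1), y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The true slope in this synthetic data is 3, so the printed slope gives a quick check on how well the sketch resists the corrupted points.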

The spatial median represents the “middle” or central location in a multidimensional space.

Essentially, the objective is to find a point in the same multidimensional space that minimizes the sum of the Euclidean distances between itself and all other points (weight vectors, in this case). In one dimension, this reduces to the ordinary median.
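Written out, under the usual definition of the spatial median (with distance measured by the Euclidean norm), the objective looks like this. The symbols are notation introduced here, not taken from the post: w_1, ..., w_m are the m recorded weight vectors, p is the number of weights per model, and ŵ is the final weight vector.

```latex
\hat{w} \;=\; \arg\min_{w \in \mathbb{R}^{p}} \; \sum_{i=1}^{m} \lVert w - w_i \rVert_2
```

For p = 1 the norm is just the absolute value, and ŵ is the ordinary median of the recorded weights.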

As shown above, while Linear Regression is influenced by outliers, Theil-Sen Regression is far less affected.
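To try this comparison yourself, here is a small sketch using scikit-learn's `LinearRegression` and `TheilSenRegressor`. The synthetic data (true slope 3, intercept 2), the outlier placement, and the seed are assumptions chosen for illustration, not the data behind the post's figure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, TheilSenRegressor

# Synthetic data (assumed for illustration): y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(0, 0.5, size=200)

# Corrupt the 20 points with the largest x so the outliers tilt the fit
outlier_idx = np.argsort(X.ravel())[-20:]
y[outlier_idx] -= 40.0

# Fit both models on the same contaminated data
ols = LinearRegression().fit(X, y)
theil_sen = TheilSenRegressor(random_state=0).fit(X, y)

print("True slope:      3.0")
print(f"OLS slope:       {ols.coef_[0]:.2f}")
print(f"Theil-Sen slope: {theil_sen.coef_[0]:.2f}")
```

With the outliers concentrated at one end of the x range, the least squares slope is pulled away from the true value, while the Theil-Sen slope should stay much closer to it.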

Having said that, it is always recommended to experiment with many robust methods and see which one fits your data best.

👉 Get started with Theil-Sen Estimator: Sklearn Docs.

👉 Over to you: What are some other popular models that are robust to outliers? Let me know :)
