The Biggest Limitation of Pearson Correlation Which Many Overlook
...And what to use instead, along with some genuine advice on summary statistics.
Pearson correlation is commonly used to determine the association between two continuous variables.
Many frameworks (in Pandas, for instance) have it as their default correlation metric.
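For instance, pandas' DataFrame.corr() computes Pearson correlation unless you explicitly request another method. A quick sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "y": [2, 3, 5, 8, 12]})

# Pearson is the default method in pandas
print(df.corr())                    # same as df.corr(method="pearson")

# Spearman (or Kendall) must be requested explicitly
print(df.corr(method="spearman"))
```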
Yet, unknown to many, Pearson correlation:
Only measures the strength of a linear relationship.
Underestimates an association that is non-linear yet monotonic.
Spearman correlation is a better alternative. It assesses monotonicity, which can be linear as well as non-linear.
This is evident from the illustration below:
Pearson and Spearman correlations are the same on linear data.
But Pearson correlation underestimates a non-linear association.
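Here's a minimal sketch of this effect with scipy, using an exponential curve as the (assumed) non-linear but monotonic example:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.linspace(1, 10, 100)

# Linear relationship: both metrics agree
y_linear = 2 * x + 1
print(pearsonr(x, y_linear)[0])    # 1.0
print(spearmanr(x, y_linear)[0])   # 1.0

# Monotonic but non-linear: Pearson drops, Spearman stays at 1
y_exp = np.exp(x)
print(pearsonr(x, y_exp)[0])       # noticeably below 1
print(spearmanr(x, y_exp)[0])      # 1.0
```

Because Spearman works on ranks, any strictly increasing transformation of y leaves it unchanged.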
Spearman correlation is also useful when data is ranked or ordinal. If you want to learn more about this, we covered it in this issue: The Limitation of Pearson Correlation While Using It With Ordinal Categorical Data.
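Since Spearman only needs ranks, it applies directly to ordinal encodings. A small sketch with hypothetical survey data (the encoding, where higher means more satisfied, is an assumption):

```python
import pandas as pd

# Hypothetical ordinal ratings: 1 = very unsatisfied ... 5 = very satisfied
satisfaction = pd.Series([1, 2, 2, 3, 4, 5])
monthly_spend = pd.Series([10, 25, 20, 40, 55, 90])

# Spearman compares ranks, so the ordinal scale is handled naturally
print(satisfaction.corr(monthly_spend, method="spearman"))
```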
Also, before I end, remember to always be cautious when drawing conclusions from summary statistics.
While analyzing data, many people are tempted to draw conclusions based solely on its statistics. Yet the actual data might be telling a totally different story.
This is also evident from the image below:
All nine datasets have approximately zero correlation between the two variables. However, the summary statistic (Pearson correlation, in this case) gives no clue about what's inside the data because it is always zero.
In fact, this is not just about Pearson correlation but applies to all summary statistics. The idea is that whenever you generate any summary statistic, you lose essential information.
Thus, the importance of looking at the data cannot be stressed enough. It saves us from the wrong conclusions we might otherwise draw by looking at the statistics alone.
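To make this concrete, here is a small sketch of two very different datasets that share the same near-zero Pearson correlation (the symmetric parabola is an assumed example):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)

# Dataset 1: pure noise, no relationship at all
y_noise = rng.normal(size=x.size)

# Dataset 2: a perfect, but symmetric, non-linear relationship
y_parabola = x ** 2

print(pearsonr(x, y_noise)[0])     # ~0
print(pearsonr(x, y_parabola)[0])  # ~0, despite the obvious structure

# The statistic alone can't tell these apart; plotting can.
```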
I have written more posts on this topic in the past, and I would highly encourage you to read them next.
👉 Over to you: What are some other alternatives that address Pearson’s limitations?
👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights.
The button is located towards the bottom of this email.
Thanks for reading!
Latest full articles
If you’re not a full subscriber, here’s what you missed last month:
DBSCAN++: The Faster and Scalable Alternative to DBSCAN Clustering
Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning
You Cannot Build Large Data Projects Until You Learn Data Version Control!
Sklearn Models are Not Deployment Friendly! Supercharge Them With Tensor Computations.
Deploy, Version Control, and Manage ML Models Right From Your Jupyter Notebook with Modelbit
Gaussian Mixture Models (GMMs): The Flexible Twin of KMeans.
To receive all full articles and support the Daily Dose of Data Science, consider subscribing:
👉 Tell the world what makes this newsletter special for you by leaving a review here :)
👉 If you love reading this newsletter, feel free to share it with friends!