Ridgeline Plots: An Underrated Gem of Data Visualisation
A pretty useful plot that simplifies visualizing data distributions.
Understanding how a variable's distribution differs across distinct groups is quite useful for uncovering insights around:
behavioral disparities,
feature engineering,
predictive modeling, and more.
In such analyses, however, many data scientists overlay group-level distribution plots (histograms or density plots) on a single axis and compare them.
While this is (somewhat) okay with only a few groups, with many groups it produces cluttered plots that reveal little about the distributional differences:
Ridgeline plots (shown below) are a pretty compact and elegant way to visualize the distribution of different variables (or categories of a variable).
More specifically, each group's distribution sits on its own row, and the rows are stacked vertically over a common x-axis. This allows us to compare the distributions of multiple groups at a glance and reveals insights into their shape and variation that would otherwise be difficult to spot.
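To make the mechanics concrete, here is a minimal from-scratch sketch using NumPy (it does not use JoyPy or Seaborn). It assumes each group is a 1-D array of samples, estimates each density with a hand-written Gaussian kernel density estimate on one shared x grid, and lifts each curve onto its own baseline. The function names, the fixed `bandwidth`, and the `overlap` scaling are illustrative choices, not any library's API.

```python
import numpy as np


def gaussian_kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at every point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distances between each grid point and each sample: shape (len(grid), len(samples))
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth


def ridgeline_layers(groups, grid, bandwidth=0.5, overlap=1.5):
    """Return (label, baseline, top) per group, ordered top row first.

    All densities share the same x grid (the common axis). They are scaled so the
    tallest curve has height `overlap`; baselines are one unit apart, so any curve
    taller than 1 spills into the row above, which gives the "ridge" look.
    """
    densities = {label: gaussian_kde(values, grid, bandwidth) for label, values in groups.items()}
    peak = max(d.max() for d in densities.values())
    n = len(densities)
    layers = []
    for i, (label, density) in enumerate(densities.items()):
        baseline = float(n - 1 - i)  # first group on top, last group at y = 0
        layers.append((label, baseline, baseline + overlap * density / peak))
    return layers


def plot_ridgeline(layers, grid):
    """Draw the layers with matplotlib and return the Figure."""
    import matplotlib.pyplot as plt  # imported lazily: only needed for drawing

    fig, ax = plt.subplots(figsize=(6, 4))
    # Layers run top to bottom, so each ridge is drawn over the one above it
    for label, baseline, top in layers:
        ax.fill_between(grid, baseline, top, alpha=0.8)
        ax.plot(grid, top, color="black", linewidth=0.8)
    ax.set_yticks([baseline for _, baseline, _ in layers])
    ax.set_yticklabels([label for label, _, _ in layers])
    return fig
```

For example, `plot_ridgeline(ridgeline_layers(groups, grid), grid)` with a dict of sample arrays and `grid = np.linspace(...)` draws one ridge per group; the dict's insertion order sets the top-to-bottom order of the ridges.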
The image below is another classic example of Ridgeline plots. It depicts the search interest in various events that happened in 2023, and the timeline is remarkably easy to follow:
I have been wanting to write about Ridgeline plots for quite some time now. But I intentionally saved it for this time of the year because the above plot depicts a pretty neat usage of these plots, which you can easily relate to.
While Seaborn provides a way to create Ridgeline plots, I have often found the JoyPy library to be pretty useful and easy to use:
When to consider a Ridgeline plot?
Typically, a Ridgeline plot makes sense when the variable has more than 3-4 groups. With that many groups, overlaying all the distributions in a single plot produces heavy overlap:
Also, Ridgeline plots are more useful when the continuous variable shows a clear pattern and/or ranking across groups, such as:
monotonically increasing,
monotonically decreasing,
increasing then decreasing (and so on)…
decreasing then increasing (and so on)…
That is why the order in which you vertically stack the groups' distributions matters so much.
For instance, consider the above “Search trends” plot again but with a random arrangement of groups:
I don’t think I have to ask you which of the two makes the flow of events in 2023 easier to understand.
These points should help you decide whether a Ridgeline plot is a good fit for visualizing your data.
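In code, choosing the stacking order can be a short pandas step. The sketch below, a minimal example assuming long-format data, sorts groups by their median (a hypothetical choice; for a search-trends plot, the week of peak interest would be a natural alternative) and turns the group column into an ordered categorical. pandas `groupby` and `sort_values` then follow that order; whether a particular plotting library does too depends on how it groups the data, so check its output. The column names `event` and `week` and the data are made up.

```python
import numpy as np
import pandas as pd


def order_groups(df, group_col, value_col, stat="median"):
    """Return the group labels sorted (ascending) by a per-group summary statistic."""
    summary = df.groupby(group_col)[value_col].agg(stat)
    return summary.sort_values().index.tolist()


def apply_order(df, group_col, order):
    """Return a copy where `group_col` is an ordered categorical following `order`."""
    out = df.copy()
    out[group_col] = pd.Categorical(out[group_col], categories=order, ordered=True)
    # Stable sort keeps the original row order within each group
    return out.sort_values(group_col, kind="stable").reset_index(drop=True)


# Hypothetical data: three events whose interest peaks in different weeks,
# deliberately listed out of chronological order
rng = np.random.default_rng(1)
peak_weeks = {"Event C": 40.0, "Event A": 5.0, "Event B": 20.0}
df = pd.DataFrame({
    "event": np.repeat(list(peak_weeks), 100),
    "week": np.concatenate([rng.normal(w, 2.0, 100) for w in peak_weeks.values()]),
})

order = order_groups(df, "event", "week")
df_ordered = apply_order(df, "event", order)
print(order)  # ['Event A', 'Event B', 'Event C']
```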
I created this notebook for you to get started with Ridgeline plots using JoyPy: Ridgeline Plots Notebook.
👉 Over to you: What are some other gems of data visualization that deserve more attention?
Thanks for reading!