KMeans vs. Gaussian Mixture Models

Addressing the major limitation of KMeans.

I like to think of Gaussian Mixture Models as a generalization of KMeans.

While we have covered all the necessary conceptual and practical details here: Gaussian Mixture Models

…let me walk you through some limitations of KMeans that you might not be aware of.

Limitations of KMeans

To begin:

  • It can only produce globular (round) clusters. For instance, as shown below, even when the data forms non-circular clusters, KMeans still partitions it into round ones.

  • It performs a hard assignment. There are no probabilistic estimates of each data point belonging to each cluster.

  • It only relies on distance-based measures to assign data points to clusters.

    • To understand better, consider two clusters in 2D, A and B, where cluster A has a higher spread than B.

    • Now consider the line that lies mid-way between the centroids of A and B.

    • Although A has a higher spread, a point even slightly to the right of this midline will get assigned to cluster B.

    • Ideally, however, cluster A should have had a larger area of influence. The sketch below makes this concrete.
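Here is a minimal sketch of that exact scenario (the synthetic data and the test point are my own assumptions, purely illustrative): KMeans assigns the point to the tighter cluster B based on distance alone, while a GMM's soft assignment accounts for cluster A's larger spread.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Hypothetical 2D data: cluster A is wide, cluster B is tight.
rng = np.random.default_rng(0)
A = rng.normal(loc=[0.0, 0.0], scale=3.0, size=(500, 2))
B = rng.normal(loc=[10.0, 0.0], scale=0.5, size=(500, 2))
X = np.vstack([A, B])

# A point slightly to the right of the midline between the two centers.
point = np.array([[5.5, 0.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(kmeans.predict(point))     # hard label, decided by distance alone
print(gmm.predict_proba(point))  # probabilities reflecting each cluster's spread
```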

Gaussian Mixture Models

These limitations often make KMeans a non-ideal choice for clustering.

Gaussian Mixture Models are often a superior algorithm in these respects.

As the name suggests, they model the dataset as a mixture of several Gaussian distributions, one per cluster.
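Concretely, this is the standard mixture density (a textbook formulation, stated here for completeness): each of the K clusters contributes one Gaussian, weighted by a mixing proportion:

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1$$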

They can be thought of as a more flexible twin of KMeans.

The primary difference is that:

  • KMeans learns centroids.

  • Gaussian Mixture Models learn a distribution for each cluster: a mean, a covariance, and a mixing weight (as the sketch below shows).
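To see what each model actually learns, here is a short sketch (with synthetic data of my own, purely illustrative): after fitting, KMeans exposes only centroids, while the GMM exposes a full distribution per component.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Two hypothetical, well-separated 2D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(300, 2)),
               rng.normal([6.0, 6.0], 1.0, size=(300, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(kmeans.cluster_centers_)  # 2 centroids: that is all KMeans knows
print(gmm.means_)               # per-component mean...
print(gmm.covariances_)         # ...plus a full 2x2 covariance matrix each...
print(gmm.weights_)             # ...and a mixing weight (they sum to 1)
```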

For instance, in 2 dimensions:

  • KMeans can only create circular clusters.

  • GMM can create oval-shaped clusters.

This is illustrated in the animation below:

The effectiveness of GMMs over KMeans is evident from the image below.

  • KMeans just relies on distance and ignores the distribution of each cluster.

  • GMM learns the distribution and produces better clustering.
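You can reproduce this behavior yourself. Below is a hypothetical setup (using sklearn's make_blobs, not the data from the image above): stretching round blobs with a linear transform makes the clusters oval, and the GMM typically recovers the true labels far more accurately than KMeans.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Generate round blobs, then stretch them so the clusters become oval.
X, y = make_blobs(n_samples=600, centers=3, random_state=170)
X = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# Agreement with the true labels (1.0 = perfect recovery).
print("KMeans ARI:", adjusted_rand_score(y, km_labels))
print("GMM ARI:   ", adjusted_rand_score(y, gmm_labels))
```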

If you want to get into more detail, I covered GMMs in depth here: Gaussian Mixture Models.

It covers:

  • What is the motivation and intuition behind GMMs?

  • The end-to-end mathematical formulation of GMMs.

  • How to use Expectation-Maximization to model data using GMMs?

  • Coding a GMM from scratch (no sklearn).

  • Comparing results of GMMs with KMeans.

  • How to determine the optimal number of clusters for GMMs?

  • Some practical use cases of GMMs.

  • Takeaways.

👉 Over to you: What are some other shortcomings of KMeans?

Thanks for reading!

Whenever you are ready, here’s one more way I can help you:

Every week, I publish 1-2 in-depth deep dives (typically 20+ mins long). Here are some of the latest ones that you will surely like:

To receive all full articles and support the Daily Dose of Data Science, consider subscribing:

👉 If you love reading this newsletter, feel free to share it with friends!

👉 Tell the world what makes this newsletter special for you by leaving a review here :)
