The Advantages and Disadvantages of PCA To Consider Before Using It

A quick pros and cons summary of PCA.

FREE 3-Day Object Detection Challenge

⭐️ Build your own object detection model from start to finish!

Hey friends! Lately, I have been in touch with Data Driven Science. They offer self-paced, hands-on learning through practical data science challenges.

A 3-day object detection challenge is available for free. Here, you’ll get to train an end-to-end ML model for object detection using computer vision techniques.

The challenge is guided, meaning you don’t need any prior expertise. Instead, you will learn as you follow the challenge.

Also, you’ll get to apply many of my previous tips around image augmentation, run-time optimization, and more.

All in all, it will be an awesome learning experience.

Let’s get to today’s post now.

PCA is possibly the most popular dimensionality reduction technique.

If you wish to know how PCA works, I have a highly simplified post here: A Visual and Overly Simplified Guide to PCA.

Yet, it is equally important to be aware of what we get vs. what we compromise when we use PCA.

The visual above depicts five common advantages and five common disadvantages of using PCA.

Advantages

  1. PCA enables visualization. By reducing the data to two (or three) dimensions, you can easily plot and inspect it, as shown in the sketch after this list.

  2. PCA removes multicollinearity. Multicollinearity arises when two or more features are highly correlated. PCA produces a set of new orthogonal axes to represent the data, which, as the name suggests, are uncorrelated.

  3. PCA removes noise. By reducing the number of dimensions in the data, PCA can help remove noisy and irrelevant features.

  4. PCA reduces model parameters. Fewer input dimensions mean fewer parameters for a downstream machine learning model to fit.

  5. PCA reduces model training time. By reducing the number of dimensions, PCA simplifies the calculations involved in a model, leading to faster training times.
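To make advantages 1 to 3 concrete, here is a minimal sketch using scikit-learn’s PCA. The iris dataset is purely an illustrative stand-in, not something referenced above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                          # 150 samples, 4 correlated features
X_scaled = StandardScaler().fit_transform(X)  # PCA expects centered (ideally scaled) data

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)            # reduce 4 dimensions to 2

# Advantage 1: two dimensions are trivially plottable, e.g., plt.scatter(*X_2d.T).
# Advantage 2: the new axes are uncorrelated by construction.
print(np.corrcoef(X_2d, rowvar=False).round(3))  # off-diagonals are ~0
# Advantage 3: the retained components capture most of the variance.
print(pca.explained_variance_ratio_.sum())       # roughly 0.96 here
```

The near-zero off-diagonal entries in the correlation matrix are precisely why PCA eliminates multicollinearity.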

Disadvantages

  1. The run-time of PCA is cubic in the number of dimensions of the data: the eigendecomposition of the d × d covariance matrix costs O(d³). This can be computationally expensive for large, high-dimensional datasets.

  2. PCA transforms the original input variables into new principal components (or dimensions). Each component is a linear combination of all the original features, so the new dimensions rarely have a clear real-world meaning.

  3. While PCA simplifies the data and removes noise, it always leads to some loss of information when we reduce dimensions (see the sketch after this list).

  4. PCA is a linear dimensionality reduction technique, but many real-world datasets are non-linear. Read more about this in my previous post here: The Limitation of PCA Which Many Folks Often Ignore.

  5. PCA is sensitive to outliers. Because the principal components are derived from the data’s mean and covariance, a few extreme points can distort them and affect the accuracy of the results.
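Similarly, here is a small sketch of disadvantages 3 and 5. The synthetic data and the magnitude of the outlier are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[:, 0] *= 5                         # most of the variance lies along feature 0

# Disadvantage 3: dropping components always discards some information.
pca = PCA(n_components=2).fit(X)
X_restored = pca.inverse_transform(pca.transform(X))
print("reconstruction error:", ((X - X_restored) ** 2).mean())  # strictly > 0

# Disadvantage 5: a single extreme point can redirect the components.
X_out = X.copy()
X_out[0] = 100.0                     # one corrupted sample
pca_out = PCA(n_components=2).fit(X_out)
alignment = abs(pca.components_[0] @ pca_out.components_[0])
print("alignment of first components:", alignment)  # well below 1.0
```

An alignment close to 1.0 would mean the leading component was unchanged; here, a single corrupted row is enough to pull it far away from the true direction of maximum variance.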

Over to you: What are some points that I have missed here? Let me know :)

👉 Read what others are saying about this post on LinkedIn and Twitter.

👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights. The button is located towards the bottom of this email.

👉 If you love reading this newsletter, feel free to share it with friends!

Find the code for my tips here: GitHub.

I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn and Twitter.
