Avoid Using Pandas' apply() Method At All Times

Clearing a common misconception about a popular method.

The apply() method is the most common way to apply a function along an axis of a Pandas DataFrame (or to each element of a Series).

But contrary to common belief, Pandas' apply() method:

  • is NOT vectorized

  • instead, it's a glorified for-loop

Thus, it offers no inherent optimization: your function is invoked once per row (or element), and the code runs at native Python speed.
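To see the "glorified for-loop" behavior for yourself, a small sketch like the one below counts how often apply() calls a plain Python function when axis=1. The DataFrame, the column names `a` and `b`, and the function `row_sum` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": range(1_000), "b": range(1_000)})

calls = 0


def row_sum(row: pd.Series) -> int:
    """Plain Python function; apply(axis=1) hands it one row at a time."""
    global calls
    calls += 1  # count every Python-level invocation
    return row["a"] + row["b"]


# apply() with axis=1: one Python call per row.
via_apply = df.apply(row_sum, axis=1)
print("Python calls made by apply():", calls)

# Conceptually the same work, written as an explicit for-loop over the rows.
via_loop = pd.Series([row["a"] + row["b"] for _, row in df.iterrows()], index=df.index)

print("Same result as the loop:", via_apply.equals(via_loop))
```

On recent pandas versions the counter should match the number of rows, which is exactly the amount of Python-level work an explicit loop over `iterrows()` does.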

One solution is to eliminate the apply() method by using a vectorized approach.
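As a minimal sketch of that idea, assuming a hypothetical rule over two made-up columns `a` and `b`, the row-wise function below is rewritten with `numpy.where` so that whole columns are processed in compiled code instead of one row at a time. The printed timings depend on your machine and library versions, so treat them as a rough comparison rather than a benchmark.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=100_000), "b": rng.normal(size=100_000)})


def score(row: pd.Series) -> float:
    # Hypothetical row-wise rule: a**2 + b when a is positive, otherwise just b.
    return row["a"] ** 2 + row["b"] if row["a"] > 0 else row["b"]


start = time.perf_counter()
slow = df.apply(score, axis=1)  # Python-level loop over 100,000 rows
t_apply = time.perf_counter() - start

start = time.perf_counter()
# Same rule on whole columns: NumPy computes both branches in compiled code,
# and np.where picks the right value element-wise.
fast = pd.Series(np.where(df["a"] > 0, df["a"] ** 2 + df["b"], df["b"]), index=df.index)
t_vec = time.perf_counter() - start

print(f"apply(): {t_apply:.3f}s   vectorized: {t_vec:.4f}s")
print("Results match:", np.allclose(slow.to_numpy(), fast.to_numpy()))
```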

That said, coming up with a vectorized approach can be difficult at times. (Here’s one of my previous guides on this: If You Are Not Able To Code A Vectorized Approach, Try This)

Another solution is to parallelize the apply() method by using external libraries.

The image above compares the run-time of alternatives that support parallelization.

It is evident that Pandas’ apply() is not the optimal way to apply a function.
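The comparison above covers dedicated external libraries. As a library-free illustration of the same chunk-and-parallelize idea (a swapped-in technique, not one of the libraries from the comparison), the sketch below splits the rows into chunks, runs the ordinary apply() on each chunk in a separate process using the standard library's `concurrent.futures.ProcessPoolExecutor`, and concatenates the results. `score`, `apply_chunk`, and `parallel_apply` are hypothetical names chosen for this example.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import pandas as pd


def score(row: pd.Series) -> float:
    # Hypothetical row-wise rule standing in for a function that is hard to vectorize.
    return row["a"] ** 2 + row["b"] if row["a"] > 0 else row["b"]


def apply_chunk(chunk: pd.DataFrame) -> pd.Series:
    # Each worker process runs the ordinary, serial apply() on its own slice.
    return chunk.apply(score, axis=1)


def parallel_apply(df: pd.DataFrame, n_workers: int = 4) -> pd.Series:
    # Split row positions into n_workers roughly equal chunks, skipping empty ones.
    positions = np.array_split(np.arange(len(df)), n_workers)
    chunks = [df.iloc[pos] for pos in positions if len(pos) > 0]
    # Functions sent to worker processes must be defined at module level (picklable).
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(apply_chunk, chunks))
    # pool.map preserves input order, so concatenation restores the original row order.
    return pd.concat(parts)


if __name__ == "__main__":
    # The __main__ guard is required where worker processes are started with "spawn".
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"a": rng.normal(size=100_000), "b": rng.normal(size=100_000)})
    result = parallel_apply(df, n_workers=4)
    print("Matches serial apply():", result.equals(df.apply(score, axis=1)))
```

Starting processes and pickling each chunk adds overhead, so this kind of parallelization tends to pay off only when the per-row function is expensive; for cheap functions, a vectorized rewrite is usually the better option.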

Get started with these libraries here:

👉 Over to you: What are some other techniques you commonly use to optimize Pandas’ operations?

👉 Read what others are saying about this post on LinkedIn and Twitter.

👉 Tell the world what makes this newsletter special for you by leaving a review here :)

👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights.

👉 If you love reading this newsletter, feel free to share it with friends!

👉 Sponsor the Daily Dose of Data Science Newsletter. More info here: Sponsorship details.

Find the code for my tips here: GitHub.

I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn and Twitter.
