Avoid Using Pandas' Apply() Method At All Times
Clearing a common misconception about a popular method.
The apply() method in Pandas is the most common way to apply a function along an axis of a DataFrame/Series.
But contrary to common belief, Pandas' apply() method is NOT vectorized. Instead, it's a glorified for-loop.
Thus, it does not offer any inherent optimization, and the code runs at the speed of a plain Python loop.
One solution is to eliminate the apply() method by using a vectorized approach.
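To make the difference concrete, here is a minimal sketch that computes the same column both ways and times each. The DataFrame, its column names (price, quantity), and the row count are made up for illustration; exact timings will vary by machine.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical example data: 50,000 rows with made-up column names.
rng = np.random.default_rng(0)
n_rows = 50_000
df = pd.DataFrame(
    {
        "price": rng.uniform(1, 100, n_rows),
        "quantity": rng.integers(1, 10, n_rows),
    }
)

# Row-wise apply(): pandas calls the Python function once per row.
start = time.perf_counter()
total_apply = df.apply(lambda row: row["price"] * row["quantity"], axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized: a single multiplication over whole columns.
start = time.perf_counter()
total_vectorized = df["price"] * df["quantity"]
vectorized_seconds = time.perf_counter() - start

print(f"apply():    {apply_seconds:.4f} s")
print(f"vectorized: {vectorized_seconds:.4f} s")
```

Both versions produce the same values; only the way the work is carried out differs.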
But it is understandable that at times, coming up with a vectorized approach is difficult. (Here’s one of my previous guides on this: If You Are Not Able To Code A Vectorized Approach, Try This)
Another solution is to parallelize the apply() method by using external libraries.
The image above compares the run-time of alternatives that support parallelization.
It is evident that Pandas’ apply() is not the fastest way to apply a function.
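The general idea behind parallelizing apply() can be sketched with the standard library alone: split the DataFrame into chunks, run apply() on each chunk in a separate process, and concatenate the results. This is only an illustration of the idea, not a description of how any particular library is implemented. The helper names (row_total, _apply_chunk, parallel_apply) and the example data are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat

import numpy as np
import pandas as pd


def row_total(row):
    # Must be a module-level function (not a lambda) so it can be pickled
    # and sent to worker processes.
    return row["price"] * row["quantity"]


def _apply_chunk(chunk, func):
    # Runs inside a worker: an ordinary row-wise apply() on one chunk.
    return chunk.apply(func, axis=1)


def parallel_apply(df, func, n_workers=4, executor_cls=ProcessPoolExecutor):
    # Split the rows into up to n_workers contiguous, non-empty chunks.
    bounds = np.linspace(0, len(df), n_workers + 1, dtype=int)
    chunks = [
        df.iloc[start:stop]
        for start, stop in zip(bounds[:-1], bounds[1:])
        if stop > start
    ]
    # Nothing to parallelize: fall back to a regular apply().
    if len(chunks) <= 1:
        return df.apply(func, axis=1)
    with executor_cls(max_workers=n_workers) as executor:
        # map() preserves chunk order, so the row order is kept.
        results = list(executor.map(_apply_chunk, chunks, repeat(func)))
    return pd.concat(results)


if __name__ == "__main__":
    # Made-up data for illustration only.
    df = pd.DataFrame(
        {"price": [2.0, 3.5, 10.0, 1.25], "quantity": [3, 2, 1, 8]}
    )
    print(parallel_apply(df, row_total, n_workers=2))
```

Starting processes and copying chunks between them has a cost, so a sketch like this only pays off when the DataFrame is large or the applied function is slow.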
Get started with these libraries here:
Pandarallel: https://github.com/nalepae/pandarallel
Parallel Pandas: https://pypi.org/project/parallel-pandas/
Mapply: https://pypi.org/project/mapply/
👉 Over to you: What are some other techniques you commonly use to optimize Pandas’ operations?