40 Open-Source Tools to Supercharge Your Pandas Workflow

Pandas receives over 3M downloads per day, yet most of its users never use it to its full potential.

Here are 40 open-source gems that can supercharge your Pandas workflow the moment you start using them.

  1. Jupyter-Datatables: Enrich the default preview of a DataFrame in Jupyter Notebook.

    1. Link: https://bit.ly/jupy-dtables

  2. SummaryTools: Supercharge the describe() method in Pandas.

    1. Link: https://bit.ly/summarytools

  3. Sidetable: Supercharge the value_counts() method in Pandas.

    1. Link: https://bit.ly/py-sidetable

  4. Sketch: Generate code/insights about data by asking questions in natural language.

    1. Link: https://bit.ly/py-sketch

  5. Deepchecks: Generate a comprehensive validation report of your data.

    1. Link: https://bit.ly/deepchks

  6. Pandas Flavor: Extend Pandas to attach methods to the dataframe object.

    1. Link: https://bit.ly/pd-flavor

  7. Pandarallel: Parallelize Pandas across multiple CPU cores.

    1. Link: https://bit.ly/pandarallel

  8. PandasML: Pandas, sklearn and matplotlib integrated.

    1. Link: https://bit.ly/pandasml

  9. GeoPandas: Work with geospatial data in Pandas.

    1. Link: https://bit.ly/geo-pd

  10. DuckDB: Run SQL queries on dataframes.

    1. Link: https://bit.ly/duckdb

  11. Modin: Boost Pandas' performance by up to 70x by changing only the import.

    1. Link: https://bit.ly/modin-guide

  12. PivotTableJS: Create pivot tables with drag and drop.

    1. Link: https://bit.ly/pivottablejs

  13. Missingno: Visualize missing values in your dataset.

    1. Link: https://bit.ly/py-missingno

  14. Pandas Alive: Create animated charts for pandas dataframes.

    1. Link: https://bit.ly/pd-alive

  15. Skimpy: Supercharge the describe() method in Pandas.

    1. Link: https://bit.ly/py-skimpy

  16. Pandas-log: Debug pandas pipelines with step-by-step logging.

    1. Link: https://bit.ly/py-log

  17. tsflex: Process time series and perform feature extraction.

    1. Link: https://bit.ly/tsflex

  18. pandas-profiling (now ydata-profiling): Generate an EDA report of your data in one line of code.

    1. Link: https://bit.ly/pd-profiling

  19. Mars: A tensor-based framework for scaling numpy, pandas, scikit-learn, and Python functions.

    1. Link: https://bit.ly/py-mars

  20. nptyping: Add type hints to Pandas dataframes.

    1. Link: https://bit.ly/nptyping

  21. popmon: Profile your data to determine its stability.

    1. Link: https://bit.ly/py-popmon

  22. Gspread-pandas: Interact with Google Sheets through pandas dataframes.

    1. Link: https://bit.ly/pd-gsheets

  23. pdpipe: Create pandas pipelines easily and intuitively.

    1. Link: https://bit.ly/py-pdpipe

  24. PrettyPandas: Prettify the dataframe when printed.

    1. Link: https://bit.ly/PrettyPandas

  25. Dora: An intuitive API for data cleaning, processing, feature selection, visualization, etc.

    1. Link: https://bit.ly/py-dora

  26. Pandapy: The speed of NumPy combined with Pandas' elegance.

    1. Link: https://bit.ly/pandapy

  27. PyJanitor: A clean API for cleaning data.

    1. Link: https://bit.ly/pyjanitor

  28. swifter: Speed up the apply() method in Pandas.

    1. Link: https://bit.ly/py-swifter

  29. Mito: Analyze data in Jupyter by editing a spreadsheet.

    1. Link: https://bit.ly/mito-ds

  30. Visual Python: GUI-based Python code generator for data science.

    1. Link: https://bit.ly/visual-py

  31. tqdm: Add progress bars to Pandas methods.

    1. Link: https://bit.ly/tqdm-pd

  32. Lux: Automatic data visualization.

    1. Link: https://bit.ly/pd-lux

  33. D-Tale: Visualizer for pandas dataframes.

    1. Link: https://bit.ly/py-dtale

  34. AutoClean: Automated data preprocessing & cleaning.

    1. Link: https://bit.ly/py-autoclean

  35. pytablewriter: Write a dataframe in various formats: AsciiDoc / CSV / HTML / JSON / LaTeX / Markdown / Excel / TOML / TSV / YAML, etc.

    1. Link: https://bit.ly/pytablewriter

  36. itables: Pandas dataframes as interactive datatables.

    1. Link: https://bit.ly/itables

  37. PandasGUI: A GUI for Pandas dataframes.

    1. Link: https://bit.ly/PandasGUI

  38. tabula-py: Extract tables from PDFs into Pandas dataframes.

    1. Link: https://bit.ly/tabulapy

  39. Pingouin: Perform statistical tests on Pandas dataframes.

    1. Link: https://bit.ly/pypingouin

  40. Dexplot: Create many types of beautiful data visualizations with a simple, consistent, and intuitive API.

    1. Link: https://bit.ly/dexplot
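
Several tools in this list (Sidetable, Pandas Flavor, PyJanitor) work by attaching new methods to the dataframe object. To get a feel for the idea before installing anything, here is a minimal sketch that uses only pandas' own extension hook, register_dataframe_accessor. It is not the API of any library above: the accessor name "counts" and the method freq() are hypothetical names made up for this illustration, and the sample data is invented.

```python
import pandas as pd


# "counts" is a hypothetical accessor name chosen for this sketch.
@pd.api.extensions.register_dataframe_accessor("counts")
class CountsAccessor:
    """Adds a small frequency-table helper to every DataFrame."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._df = pandas_obj

    def freq(self, column: str) -> pd.DataFrame:
        """Return the count and percentage of each value in `column`."""
        counts = self._df[column].value_counts()
        percent = (counts / counts.sum() * 100).round(1)
        return pd.DataFrame({"count": counts, "percent": percent})


# Invented sample data, only for demonstration.
df = pd.DataFrame({"city": ["Paris", "Paris", "Rome", "Oslo"]})

# The new method is now available on any DataFrame via the accessor.
summary = df.counts.freq("city")
print(summary)
```

Once the accessor is registered, every dataframe in the session picks up df.counts.freq(...). The libraries above go much further, but the basic mechanism of extending the dataframe object looks roughly like this.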

That’s a wrap!!

What cool Python libraries would you add to this list?

👇 Drop your suggestions in the replies below 👇

Find the code for my tips here: GitHub.

I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn.
