Automated EDA Tool Stack

8 automated EDA tools in a single frame.

Below are 8 EDA tools that automate many repetitive EDA steps and help you profile your data quickly.

Before I begin:

Please note that these tools are not a complete replacement for manual EDA, and they will not answer every question you have about the dataset.

But the preliminary EDA steps are the same in almost all projects: plotting the response variable, checking for class imbalance, running correlation analysis, analyzing missing values, and more. In my opinion, these tools automate those steps quite well.

Also, manual EDA is prone to human error, and it is easy to miss a check or two.

Automated tools reduce these risks and provide a standardized report across all projects.
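To make the preliminary steps concrete, here is a minimal plain-Python sketch of three of them: missing value counts, class balance of the response variable, and a pairwise correlation. The toy `rows` data and the helper names (`missing_counts`, `class_balance`, `pearson`) are hypothetical and only for illustration. This is not how any of the tools below work internally.

```python
import math
from collections import Counter

# Hypothetical toy dataset: one dict per row, None marks a missing value.
rows = [
    {"age": 25, "income": 40.0, "churned": "no"},
    {"age": 32, "income": None, "churned": "no"},
    {"age": 47, "income": 82.0, "churned": "yes"},
    {"age": 51, "income": 90.0, "churned": "no"},
    {"age": None, "income": 61.0, "churned": "no"},
]

def missing_counts(rows):
    """Count missing (None) entries per column."""
    counts = Counter()
    for row in rows:
        for col, value in row.items():
            counts[col] += value is None  # True adds 1, False adds 0
    return dict(counts)

def class_balance(rows, target):
    """Share of each class in the target column, ignoring missing labels."""
    labels = [row[target] for row in rows if row[target] is not None]
    return {label: n / len(labels) for label, n in Counter(labels).items()}

def pearson(rows, x, y):
    """Pearson correlation over the rows where both columns are present."""
    pairs = [(r[x], r[y]) for r in rows if r[x] is not None and r[y] is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    denom = math.sqrt(var_x * var_y)
    # A constant column has zero variance, so the correlation is undefined.
    return cov / denom if denom else float("nan")

print(missing_counts(rows))
print(class_balance(rows, "churned"))
print(round(pearson(rows, "age", "income"), 3))
```

Even this small version takes some care to get right (for instance, skipping rows with missing values before computing the correlation), which is the kind of detail that is easy to overlook when done by hand on every project.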

  • SweetViz

    • Creates a variety of data visualizations.

    • Covers information about missing values, data statistics, etc.

    • Integrates with Jupyter Notebook.

    • Get started: GitHub.

  • ydata-profiling

    • Covers info about missing values, data statistics, correlation, etc.

    • Produces data alerts.

    • Plots data feature interactions.

    • Get started: GitHub.

  • DataPrep

    • Produces interactive visualizations.

    • Typically faster than other common tools.

    • Supports Pandas and Dask DataFrames.

    • Covers info about missing values, data statistics, correlation, etc.

    • Plots data feature interactions.

    • Get started: GitHub.

  • AutoViz

    • Supports CSV, TXT, and JSON input files.

    • Renders interactive Bokeh charts.

    • Covers info about missing values, data statistics, correlation, etc.

    • Presents data cleaning suggestions.

    • Get started: GitHub.

  • D-Tale

    • Allows you to run many common Pandas operations with no code.

    • Exports the code behind your analysis.

    • Integrates with Jupyter Notebook.

    • Covers info about missing values, data statistics, correlation, etc.

    • Highlights duplicates, outliers, etc.

    • Get started: GitHub.

  • dabl

    • Primarily provides visualizations.

    • Covers a wide range of plots:

      • Target distribution.

      • Scatter pair plots.

      • Histograms.

    • Get started: GitHub.

  • QuickDA

    • Generates an overview report of the dataset.

    • Covers info about missing values, data statistics, correlation, etc.

    • Produces data alerts.

    • Plots data feature interactions.

    • Get started: GitHub.

  • Lux

    • Integrates with Jupyter Notebook.

    • Provides visualization recommendations.

    • Supports EDA on a subset of columns.

    • Get started: GitHub.
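Several of these tools follow the same basic pattern: hand them a Pandas DataFrame and get a report back. As a rough sketch of that pattern, the snippet below uses ydata-profiling's `ProfileReport` and its `to_file` method. The wrapper `profile_to_html`, the default file name, and the report title are my own assumptions for illustration, and the function simply returns `None` when the package is not installed.

```python
import importlib.util

def profile_to_html(df, path="eda_report.html"):
    """Write a ydata-profiling HTML report for `df` to `path`.

    Returns the path, or None when ydata-profiling is not installed.
    (Hypothetical helper, not part of the library.)
    """
    if importlib.util.find_spec("ydata_profiling") is None:
        return None  # package missing, so no report is written
    from ydata_profiling import ProfileReport
    ProfileReport(df, title="Preliminary EDA").to_file(path)
    return path

# Usage, assuming `df` is a Pandas DataFrame you have already loaded:
# profile_to_html(df, "my_report.html")
```

Check each tool's GitHub page for its exact entry point, since the other tools differ in the details.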

👉 Over to you: What are some other automated EDA tools that you are aware of?
