Daily Dose of Data Science
Posts
GROUPING SETS — A HIGHLY Underrated Technique to Run Multiple Aggregations While Scanning the Table Only Once

GROUPING SETS — A HIGHLY Underrated Technique to Run Multiple Aggregations While Scanning the Table Only Once

A lesser-known and much efficient way to run multiple aggregations.

December 09, 2023 • Reading Time: 5 minutes

Whenever running multiple aggregations on some table, most people write multiple SQL queries — one query per aggregation.

Finally, if the results are to be gathered in a single table, they aggregate all the above tables using UNION or UNION ALL (as needed).

But this is among the most inefficient ways to approach this.

Let me explain with an example.

Consider an organization that has the following Employees table in their database:

The task is to get the following information in the same table:

The total employees in each City → This will involve an aggregation on the City column.
The total full-time employees and interns in each City → This will involve an aggregation on the City and Status column.

So the final output must look like this:

Every city has three records:

One for total employees.
One for total full-time employees.
One for total interns.

The standard approach

The most common approach of doing this involves the following SQL query:

We have one query for finding the total employees in each City.
We have another query for finding the total full-time employees and interns in each city by aggregating on both City and Status columns.

Finally, we take a UNION of both results to get the desired results:

Cool! This works as expected.

But the biggest bottleneck in this approach is that it involves scanning the same table twice.

Can we do better?

Of course we can!

The smart approach — Grouping Sets

Grouping sets is a great way to run multiple aggregations on the same table by scanning the table just once instead of multiple times.

Quite clearly, this makes our query much more efficient.

This is demonstrated below:

In this query:

We specify all aggregation columns we want to group on using the GROUPING SETS keyword.

From the above query, it is pretty clear that we are only scanning the whole table once.

Also, just like the UNION approach, it produces the desired results:

The SQL query with GROUPING SETS is much more efficient, elegant, and shorter.

Isn’t that cool?

Try it out by downloading this Jupyter Notebook: Grouping Sets Notebook.

👉 Over to you: What are some other cool ways of using GROUPING SETS?

👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights.

The button is located towards the bottom of this email.

Thanks for reading!

Latest full articles

If you’re not a full subscriber, here’s what you missed:

To receive all full articles and support the Daily Dose of Data Science, consider subscribing:

👉 Tell the world what makes this newsletter special for you by leaving a review here :)

👉 If you love reading this newsletter, feel free to share it with friends!

Reply

or to participate.