Grouping Sets, Rollup and Cube in SQL

Three lesser-known grouping operations.

A typical GroupBy SQL query aggregates on just one set of columns.

For instance:

  • Grouping data on column “A” will require one query.

  • Grouping data on columns “A” and “B” will require a separate query.

Next, if the two outputs must be gathered in a single table, we use UNION or UNION ALL (as needed).

But this is not efficient as it scans the same table twice.

Instead, there are three ways to run multiple aggregations on the same table by scanning the table just once.

This makes our query much more efficient.

These are Grouping Sets, Rollup, and Cube, and their usage has been depicted in the image below:

#1) Grouping sets

The GROUPING SETS clause allows us to define multiple groupings in a single query.

Each grouping set defines a combination of columns by which the data is grouped.

Given the above query, here are the group aggregations that will be created:

  • (A): Aggregated by A, counting all rows across all other columns.

  • (A, B): Aggregated by (A, B), counting all rows across all other columns.

  • (C): Aggregated by C, counting all rows across all other columns.

A demonstration is shown below:

The above query:

  • Generates an aggregation on City column.

  • Generates another aggregation on Fruit column.

#2) ROLLUP

ROLLUP creates a result set that includes subtotals and a grand total in addition to the regular grouped results.

It does this by grouping the data at multiple levels of aggregation.

Given the above query, here are the group aggregations that will be created:

  • (A): Subtotal for each A, aggregated across all B and C.

  • (A, B): Subtotal for each A and B combination, aggregated across all C.

  • (A, B, C): Regular group by all three columns.

  • (): Grand total, aggregated across all A, B, and C.

A demonstration is shown below:

The above query:

  • Generates an aggregation on Fruit column.

  • Generates another aggregation on (Fruit, City) column.

  • Generates a grand total.

Unlike Grouping Sets, the order is important in ROLLUP.

More specifically, ROLLUP (A, B) will not be the same as ROLLUP (B, A).

#3) CUBE

Finally, CUBE creates a result set that includes all possible combinations of aggregations for the specified columns.

Given the above query, here are the group aggregations that will be created:

  • (A, B, C): Regular group by all three columns.

  • (A, B): Subtotal for each A and B combination, aggregated across all C.

  • (A, C): Subtotal for each A and C combination, aggregated across all B.

  • (B, C): Subtotal for each B and C combination, aggregated across all A.

  • (A): Subtotal for each A, aggregated across all B and C.

  • (B): Subtotal for each B, aggregated across all A and C.

  • (C): Subtotal for each C, aggregated across all A and B.

  • (): Grand total, aggregated across all A, B, and C.

A demonstration is shown below:

The above query:

  • Generates an aggregation on Fruit column.

  • Generates an aggregation on City column.

  • Generates another aggregation on (Fruit, City) column.

  • Generates a grand total.

Done!

Try it out by downloading this Jupyter Notebook: Grouping Sets, ROLLUP, CUBE Notebook.

👉 Over to you: What are some other lesser-known SQL keywords?

Let me help you more…

Every week, I publish in-depth ML deep dives. The topics align with the practical skills that typical ML/DS roles demand.

Join below to unlock all full articles:

Here are some of the top articles:

Join below to unlock all full articles:

SPONSOR US

Get your product in front of more than 76,000 data scientists and other tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., who have influence over significant tech decisions and big purchases.

To ensure your product reaches this influential audience, reserve your space here or reply to this email.

Reply

or to participate.