Grouping is one of the most fundamental operations in SQL. Whether you are counting transactions, averaging prices, or summarising volumes, GROUP BY lets you collapse rows into categories and compute an aggregate for each one.

Unlike Python libraries where grouping syntax varies between pandas, Polars, and other tools, SQL’s GROUP BY has been standard for decades. Learn it once and it works in SQLite, PostgreSQL, MySQL, SQL Server, and every other relational database.

This article covers five patterns, each demonstrated on an interactive dataset you can edit and run directly in your browser:

  1. GROUP BY with COUNT(*) — count rows per group
  2. Column aliases with AS — name the output column
  3. Multiple aggregations in one SELECT
  4. HAVING — filter groups after aggregation
  5. Grouping by multiple columns

The dataset

We will use a small petrol station dataset. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel_type, litres sold, price_per_litre, and the state where the station is located. Some values are intentionally NULL to mirror real-world data quality issues.

Click Run on the first block to create the table and see the data:

SQL — editable
Figure 1: Fuel transactions — 15 rows, 5 columns.

The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). A natural question is: how many transactions were recorded at each station?

SQL — editable

GROUP BY station collapses all rows with the same station name into a single group, and COUNT(*) counts how many rows fall into each group. The result has one row per station.

The only issue is the column name — COUNT(*) isn’t very descriptive. Let’s fix that with an alias.

Naming columns with AS

In SQL, you name output columns using AS. This is called a column alias:

SQL — editable
Figure 2: The count column is now labeled "transactions".

AS transactions gives the aggregated column a meaningful name. This works with any aggregate function — SUM(litres) AS total_litres, AVG(price_per_litre) AS avg_price, and so on.

Multiple aggregations in one query

You can compute several aggregations in a single SELECT. Just list them all, each with its own alias:

SQL — editable
Figure 3: Three aggregations in one query — count, average, and max.

Each aggregate function produces its own column. ROUND(AVG(litres), 1) rounds the average to one decimal place. Note that AVG automatically ignores NULL values — you don’t need to handle them explicitly.

Filtering groups with HAVING

WHERE filters individual rows before grouping. HAVING filters groups after aggregation. This is one of the most important distinctions in SQL:

SQL — editable
Figure 4: Only stations with more than 4 transactions.

HAVING transactions > 4 removes any group where the count is 4 or fewer. Only BP Southbank (6 transactions) and Caltex Bondi (5 transactions) survive. Shell Fortitude Valley had only 3 transactions and was filtered out.

Grouping by multiple columns

List multiple columns in GROUP BY to create finer groups — one for each unique combination:

SQL — editable
Figure 5: Transactions by station and fuel type.

Each combination of station and fuel_type becomes its own group. We added ORDER BY to sort the results for readability — without it, the order is not guaranteed.

SQL vs pandas vs Polars: grouping compared

If you are coming from Python, here is how the key operations map:

The core advantage of SQL: GROUP BY is declarative. You describe what you want, not how to compute it. The database engine optimises the execution plan for you.

Try editing the code blocks above — change the grouping column to fuel_type or state, swap COUNT(*) for SUM(litres), or add a HAVING clause to see how each pattern behaves.

References

Suhith Illesinghe
Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.