How do you group data in SQL?

Use GROUP BY with an aggregate function: SELECT column, COUNT(*) FROM table GROUP BY column. You can use COUNT, SUM, AVG, MIN, or MAX to summarise each group.

What is the difference between WHERE and HAVING in SQL?

WHERE filters individual rows before grouping. HAVING filters groups after aggregation. For example, WHERE price_per_litre > 1.90 removes rows first, while HAVING COUNT(*) > 3 removes groups with three or fewer rows.

Can you GROUP BY multiple columns in SQL?

Yes. List multiple columns separated by commas: GROUP BY station, fuel_type. This creates a group for each unique combination of the listed columns.

How do you name an aggregated column in SQL?

Use AS to create a column alias: SELECT station, COUNT(*) AS transaction_count FROM fuel GROUP BY station. The alias gives the output column a readable name and lets later clauses such as ORDER BY refer to it.

How to Group Data with SQL — GROUP BY, COUNT, SUM, AVG, HAVING

Grouping is one of the most fundamental operations in SQL. Whether you are counting transactions, averaging prices, or summarising volumes, GROUP BY lets you collapse rows into categories and compute an aggregate for each one.

Unlike Python libraries, where grouping syntax varies between pandas, Polars, and other tools, SQL’s GROUP BY has been part of the SQL standard for decades. The core syntax shown here works in SQLite, PostgreSQL, MySQL, SQL Server, and most other relational databases, with only minor dialect differences.

This article covers five patterns, each demonstrated on an interactive dataset you can edit and run directly in your browser:

GROUP BY with COUNT(*) — count rows per group
Column aliases with AS — name the output column
Multiple aggregations in one SELECT
HAVING — filter groups after aggregation
Grouping by multiple columns

The dataset

We will use a small petrol station dataset. Each row represents one fuel transaction recorded at a petrol station in Australia. The five columns are station (the station name), fuel_type, litres (litres sold), price_per_litre, and state (where the station is located). Some values are intentionally NULL to mirror real-world data quality issues.

Click Run on the first block to create the table and see the data:

SQL — editable

SELECT * FROM fuel;

Figure 1: Fuel transactions — 15 rows, 5 columns.
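The interactive block creates the table behind the scenes, so its schema is not shown. As a rough sketch of what an equivalent setup could look like outside the browser, the following uses Python's built-in sqlite3 module. The column names come from the description above; the column types, the fuel type labels, and the three sample rows are assumptions made up for illustration, not the article's actual 15 transactions.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Column names follow the article; the column types are an assumption.
conn.execute(
    """
    CREATE TABLE fuel (
        station         TEXT,
        fuel_type       TEXT,
        litres          REAL,
        price_per_litre REAL,
        state           TEXT
    )
    """
)

# Hypothetical rows for illustration only.
# Python's None is stored as SQL NULL, mimicking a missing measurement.
sample_rows = [
    ("Caltex Bondi", "Unleaded 91", 42.5, 1.89, "NSW"),
    ("BP Southbank", "Diesel", None, 2.05, "VIC"),
    ("Shell Fortitude Valley", "Premium 98", 35.0, None, "QLD"),
]
conn.executemany("INSERT INTO fuel VALUES (?, ?, ?, ?, ?)", sample_rows)

cur = conn.execute("SELECT * FROM fuel")
column_names = [col[0] for col in cur.description]  # item 0 of each entry is the name
all_rows = cur.fetchall()

print(column_names)
for row in all_rows:
    print(row)
```

NULL values come back to Python as None, which makes it easy to spot the incomplete rows before aggregating.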

The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). A natural question is: how many transactions were recorded at each station?

SQL — editable

SELECT station, COUNT(*)
FROM fuel
GROUP BY station;

GROUP BY station collapses all rows with the same station name into a single group, and COUNT(*) counts how many rows fall into each group. The result has one row per station.
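To make the collapsing step concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table is cut down to two columns and the five rows are hypothetical, chosen so the counts are easy to check by eye. ORDER BY is added only so the printed order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fuel (station TEXT, litres REAL)")

# Hypothetical rows: three for one station, two for another.
conn.executemany(
    "INSERT INTO fuel VALUES (?, ?)",
    [
        ("BP Southbank", 40.0),
        ("BP Southbank", 35.5),
        ("BP Southbank", 52.0),
        ("Caltex Bondi", 30.0),
        ("Caltex Bondi", 44.0),
    ],
)

# Five input rows collapse into one output row per distinct station.
counts = conn.execute(
    "SELECT station, COUNT(*) FROM fuel GROUP BY station ORDER BY station"
).fetchall()

print(counts)  # [('BP Southbank', 3), ('Caltex Bondi', 2)]
```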

The only issue is the column name — COUNT(*) isn’t very descriptive. Let’s fix that with an alias.

Naming columns with `AS`

In SQL, you name output columns using AS. This is called a column alias:

SQL — editable

SELECT station, COUNT(*) AS transactions
FROM fuel
GROUP BY station;

Figure 2: The count column is now labeled "transactions".

AS transactions gives the aggregated column a meaningful name. This works with any aggregate function — SUM(litres) AS total_litres, AVG(price_per_litre) AS avg_price, and so on.
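One way to see the effect of an alias from client code is to inspect the column names a query returns. The sketch below assumes Python's built-in sqlite3 module and a few hypothetical rows; cursor.description exposes the output column names, which match the aliases given with AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fuel (station TEXT, litres REAL)")

# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO fuel VALUES (?, ?)",
    [("BP Southbank", 40.0), ("BP Southbank", 35.5), ("Caltex Bondi", 30.0)],
)

cur = conn.execute(
    """
    SELECT station,
           COUNT(*)    AS transactions,
           SUM(litres) AS total_litres
    FROM fuel
    GROUP BY station
    ORDER BY station
    """
)

# cursor.description has one entry per output column; item 0 is its name.
headers = [col[0] for col in cur.description]
result = cur.fetchall()

print(headers)  # ['station', 'transactions', 'total_litres']
print(result)   # [('BP Southbank', 2, 75.5), ('Caltex Bondi', 1, 30.0)]
```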

Multiple aggregations in one query

You can compute several aggregations in a single SELECT. Just list them all, each with its own alias:

SQL — editable

SELECT station,
       COUNT(*) AS transactions,
       ROUND(AVG(litres), 1) AS avg_litres,
       MAX(price_per_litre) AS max_price
FROM fuel
GROUP BY station;

Figure 3: Three aggregations in one query — count, average, and max.

Each aggregate function produces its own column. ROUND(AVG(litres), 1) rounds the average to one decimal place. Note that AVG automatically ignores NULL values — you don’t need to handle them explicitly.
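NULL handling is worth checking on a tiny example, because COUNT(*) and COUNT(column) behave differently: COUNT(*) counts every row in the group, while COUNT(litres) counts only rows where litres is not NULL. The sketch below uses Python's built-in sqlite3 module with three hypothetical rows, one of which has a missing litres value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fuel (station TEXT, litres REAL)")

# Hypothetical rows; the middle one has no litres value (NULL).
conn.executemany(
    "INSERT INTO fuel VALUES (?, ?)",
    [("Caltex Bondi", 40.0), ("Caltex Bondi", None), ("Caltex Bondi", 50.0)],
)

null_demo = conn.execute(
    """
    SELECT COUNT(*)      AS all_rows,
           COUNT(litres) AS rows_with_litres,
           AVG(litres)   AS avg_litres
    FROM fuel
    GROUP BY station
    """
).fetchone()

# AVG divides by the two non-NULL values, not by all three rows.
print(null_demo)  # (3, 2, 45.0)
```

If a missing value should count as zero instead of being skipped, that has to be stated explicitly, for example with COALESCE(litres, 0).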

Filtering groups with `HAVING`

WHERE filters individual rows before grouping. HAVING filters groups after aggregation. This is one of the most important distinctions in SQL:

SQL — editable

SELECT station, COUNT(*) AS transactions
FROM fuel
GROUP BY station
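HAVING COUNT(*) > 4;

Figure 4: Only stations with more than 4 transactions.

HAVING COUNT(*) > 4 removes any group where the count is 4 or fewer. Only BP Southbank (6 transactions) and Caltex Bondi (5 transactions) survive. Shell Fortitude Valley had only 3 transactions and was filtered out. SQLite and MySQL also accept the alias here (HAVING transactions > 4), but PostgreSQL and SQL Server do not, so repeating the aggregate is the portable form.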
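WHERE and HAVING can appear in the same query, and the order in which they apply changes the result. The following sketch, using Python's built-in sqlite3 module and hypothetical prices, runs the same HAVING condition with and without a WHERE clause. The threshold of 2 is chosen to suit the small sample, not taken from the article's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fuel (station TEXT, price_per_litre REAL)")

# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO fuel VALUES (?, ?)",
    [
        ("BP Southbank", 1.95), ("BP Southbank", 2.05),
        ("BP Southbank", 1.85), ("BP Southbank", 2.10),
        ("Caltex Bondi", 1.99), ("Caltex Bondi", 1.80), ("Caltex Bondi", 1.75),
        ("Shell Fortitude Valley", 2.00), ("Shell Fortitude Valley", 2.02),
    ],
)

# HAVING alone: every station has at least two rows, so all groups survive.
having_only = conn.execute(
    """
    SELECT station, COUNT(*) AS transactions
    FROM fuel
    GROUP BY station
    HAVING COUNT(*) >= 2
    ORDER BY station
    """
).fetchall()

# WHERE runs first and drops cheap transactions row by row;
# HAVING then sees smaller groups and removes Caltex Bondi (one row left).
where_then_having = conn.execute(
    """
    SELECT station, COUNT(*) AS transactions
    FROM fuel
    WHERE price_per_litre > 1.90
    GROUP BY station
    HAVING COUNT(*) >= 2
    ORDER BY station
    """
).fetchall()

print(having_only)
print(where_then_having)
```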

Grouping by multiple columns

List multiple columns in GROUP BY to create finer groups — one for each unique combination:

SQL — editable

SELECT station, fuel_type, COUNT(*) AS transactions
FROM fuel
GROUP BY station, fuel_type
ORDER BY station, fuel_type;

Figure 5: Transactions by station and fuel type.

Each combination of station and fuel_type becomes its own group. We added ORDER BY to sort the results for readability — without it, the order is not guaranteed.
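As a small illustration of how combinations form groups, the sketch below uses Python's built-in sqlite3 module with five hypothetical rows. The fuel type labels are made up for the example. Two stations and two fuel types produce only three groups, because combinations with no rows do not appear in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fuel (station TEXT, fuel_type TEXT)")

# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO fuel VALUES (?, ?)",
    [
        ("BP Southbank", "Diesel"),
        ("BP Southbank", "Diesel"),
        ("BP Southbank", "Unleaded 91"),
        ("Caltex Bondi", "Unleaded 91"),
        ("Caltex Bondi", "Unleaded 91"),
    ],
)

by_station_and_fuel = conn.execute(
    """
    SELECT station, fuel_type, COUNT(*) AS transactions
    FROM fuel
    GROUP BY station, fuel_type
    ORDER BY station, fuel_type
    """
).fetchall()

# There is no (Caltex Bondi, Diesel) row, so that combination is absent.
for row in by_station_and_fuel:
    print(row)
```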

SQL vs pandas vs Polars: grouping compared

If you are coming from Python, here is how the key operations map:

df.groupby('col').size() (pandas) → SELECT col, COUNT(*) FROM t GROUP BY col
df.group_by('col').len() (Polars) → SELECT col, COUNT(*) FROM t GROUP BY col
df.groupby('col').agg({'val': 'mean'}) (pandas) → SELECT col, AVG(val) FROM t GROUP BY col
df.group_by('col').agg(pl.col('val').mean()) (Polars) → SELECT col, AVG(val) FROM t GROUP BY col
No equivalent of reset_index() needed — SQL results are always flat tables
HAVING has no direct pandas equivalent — you would filter the aggregated DataFrame with boolean indexing

The core advantage of SQL: GROUP BY is declarative. You describe what you want, not how to compute it. The database engine optimises the execution plan for you.
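The contrast between "how" and "what" can be sketched by computing the same counts twice: once with an explicit Python loop, once with a GROUP BY query run through the built-in sqlite3 module. The station list is hypothetical, and the loop is just one of many ways to tally values by hand.

```python
import sqlite3
from collections import Counter

# Hypothetical station names, one per transaction.
stations = ["BP Southbank", "Caltex Bondi", "BP Southbank", "BP Southbank"]

# Imperative: spell out how to tally, one row at a time.
manual_counts = Counter()
for name in stations:
    manual_counts[name] += 1

# Declarative: state what is wanted and let the engine decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fuel (station TEXT)")
conn.executemany("INSERT INTO fuel VALUES (?)", [(s,) for s in stations])
sql_counts = dict(
    conn.execute("SELECT station, COUNT(*) FROM fuel GROUP BY station").fetchall()
)

print(sql_counts == dict(manual_counts))  # True
```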

Try editing the code blocks above — change the grouping column to fuel_type or state, swap COUNT(*) for SUM(litres), or add a HAVING clause to see how each pattern behaves.



Suhith Illesinghe

Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.
