Grouping is one of the most fundamental operations in SQL. Whether you are counting transactions, averaging prices, or summarising volumes, GROUP BY lets you collapse rows into categories and compute an aggregate for each one.
Unlike Python libraries where grouping syntax varies between pandas, Polars, and other tools, SQL’s GROUP BY has been standard for decades. Learn it once and it works in SQLite, PostgreSQL, MySQL, SQL Server, and every other relational database.
This article covers five patterns, each demonstrated on an interactive dataset you can edit and run directly in your browser:
GROUP BYwithCOUNT(*)— count rows per group- Column aliases with
AS— name the output column - Multiple aggregations in one
SELECT HAVING— filter groups after aggregation- Grouping by multiple columns
The dataset
We will use a small petrol station dataset. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel_type, litres sold, price_per_litre, and the state where the station is located. Some values are intentionally NULL to mirror real-world data quality issues.
Click Run on the first block to create the table and see the data:
The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). A natural question is: how many transactions were recorded at each station?
GROUP BY station collapses all rows with the same station name into a single group, and COUNT(*) counts how many rows fall into each group. The result has one row per station.
The only issue is the column name — COUNT(*) isn’t very descriptive. Let’s fix that with an alias.
Naming columns with AS
In SQL, you name output columns using AS. This is called a column alias:
AS transactions gives the aggregated column a meaningful name. This works with any aggregate function — SUM(litres) AS total_litres, AVG(price_per_litre) AS avg_price, and so on.
Multiple aggregations in one query
You can compute several aggregations in a single SELECT. Just list them all, each with its own alias:
Each aggregate function produces its own column. ROUND(AVG(litres), 1) rounds the average to one decimal place. Note that AVG automatically ignores NULL values — you don’t need to handle them explicitly.
Filtering groups with HAVING
WHERE filters individual rows before grouping. HAVING filters groups after aggregation. This is one of the most important distinctions in SQL:
HAVING transactions > 4 removes any group where the count is 4 or fewer. Only BP Southbank (6 transactions) and Caltex Bondi (5 transactions) survive. Shell Fortitude Valley had only 3 transactions and was filtered out.
Grouping by multiple columns
List multiple columns in GROUP BY to create finer groups — one for each unique combination:
Each combination of station and fuel_type becomes its own group. We added ORDER BY to sort the results for readability — without it, the order is not guaranteed.
SQL vs pandas vs Polars: grouping compared
If you are coming from Python, here is how the key operations map:
df.groupby('col').size()(pandas) →SELECT col, COUNT(*) FROM t GROUP BY coldf.group_by('col').len()(Polars) →SELECT col, COUNT(*) FROM t GROUP BY col.agg({'col': 'mean'})(pandas) →SELECT col, AVG(col) FROM t GROUP BY col.agg(pl.col('col').mean())(Polars) →SELECT col, AVG(col) FROM t GROUP BY col- No equivalent of
reset_index()needed — SQL results are always flat tables HAVINGhas no direct pandas equivalent — you would filter the aggregated DataFrame with boolean indexing
The core advantage of SQL: GROUP BY is declarative. You describe what you want, not how to compute it. The database engine optimises the execution plan for you.
Try editing the code blocks above — change the grouping column to fuel_type or state, swap COUNT(*) for SUM(litres), or add a HAVING clause to see how each pattern behaves.
References
- SQLite documentation: SELECT — GROUP BY
- SQLite documentation: Aggregate Functions
- sql.js: SQLite compiled to WebAssembly
- Pandas equivalent: How to Group Data in Pandas
- Polars equivalent: How to Group Data with Polars