Grouping is one of the most common operations in data analysis. Whether you are counting transactions, averaging prices, or summarising volumes, grouping rows by category and computing an aggregate is almost always the starting point.
If you have used pandas, you may have wrestled with groupby returning unnamed Series, group columns buried in the index, and the chain of to_frame() and reset_index() needed to get a clean DataFrame. Polars avoids all of that. There is no index concept — group_by always returns a tidy DataFrame, and you name columns with .alias() inside expressions.
This article covers four patterns, each demonstrated on an interactive dataset you can edit and run directly in your browser:
- group_by().len(): count rows per group
- group_by().agg() with .alias(): name the output column
- Multiple aggregations in a single .agg() call
- Grouping by multiple columns
The dataset
We will use a small petrol station dataset. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel type, litres sold, price per litre, and the state where the station is located. Some values are intentionally missing (null) to mirror real-world data quality issues.
The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). A natural question is: how many transactions were recorded at each station?
Notice how clean the result is compared to pandas. Polars returns a proper DataFrame — not a Series — with two columns: station and len. There is no index to reset, no unnamed column to fix. The group column is just a regular column.
The only thing you might want to change is the column name len. Let's fix that.
Naming columns with alias()
In Polars, you name output columns using .alias() inside an expression. Instead of the shortcut .len(), we switch to the full .agg() syntax:
The pl.len() expression counts rows per group, and .alias('transactions') gives the output column a meaningful name. This is the Polars equivalent of the pandas chain groupby().size().to_frame('transactions').reset_index() — but in a single, readable call.
You can also use .rename() on the result if you prefer renaming after the fact:
Multiple aggregations in one call
One of Polars' biggest strengths is running multiple aggregations in a single .agg() call. Each expression produces a column in the output:
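In pandas, passing a dictionary with several functions per column to .agg() returns multi-level column names that then need flattening. Named aggregation avoids that, but the group keys still land in the index unless you pass as_index=False. Polars keeps it flat and explicit: each expression produces exactly one named column.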
Grouping by multiple columns
Pass a list of column names to group_by() to group by more than one column:
Again, both station and fuel type are regular columns in the output — no multi-level index to flatten.
Polars vs pandas: grouping compared
If you are coming from pandas, here is how the key operations map:
- df.groupby('col').size() → df.group_by('col').len()
- .to_frame('name').reset_index() → .agg(pl.len().alias('name'))
- as_index=False → not needed (Polars never uses an index)
- .agg({'col': 'mean'}) → .agg(pl.col('col').mean())
- .rename(columns={...}) → .rename({...}) or .alias() inside expressions
The core advantage: Polars eliminates the index complexity that causes most of the friction in pandas grouping. You never need reset_index(), to_frame(), or as_index=False. The result is always a flat, named DataFrame.
Try editing the code blocks above — change the grouping column to fuel type or state, swap pl.len() for pl.col('litres').sum(), or add your own stations to see how each pattern behaves.
References
- Polars documentation: polars.DataFrame.group_by
- Polars documentation: polars.Expr.alias
- Polars user guide: Aggregation
- Pandas equivalent: How to Group Data in Pandas