How do you group data in Polars?

Use df.group_by('column').agg(...) to group data in Polars. For counting rows, use df.group_by('column').len(). Unlike pandas, Polars has no index concept, so the result is always a clean DataFrame.

What is the difference between Polars group_by and pandas groupby?

Polars group_by always returns a DataFrame (never a Series) and never puts group columns in an index. You name output columns with .alias() inside expressions rather than using to_frame() or reset_index(). Polars also supports multiple aggregations in a single .agg() call using expression syntax.

How do you name a grouped column in Polars?

Use .alias('name') on the aggregation expression inside .agg(). For example: df.group_by('station').agg(pl.len().alias('transactions')). You can also use .rename() on the resulting DataFrame.

Can Polars run multiple aggregations at once?

Yes. Pass multiple expressions to .agg(): df.group_by('station').agg(pl.col('litres').mean().alias('avg_litres'), pl.col('litres').sum().alias('total_litres')). Each expression produces a named column in the output.

How to Group Data with Polars — group

Grouping is one of the most common operations in data analysis. Whether you are counting transactions, averaging prices, or summarising volumes, grouping rows by category and computing an aggregate is almost always the starting point.

If you have used pandas, you may have wrestled with groupby returning unnamed Series, group columns buried in the index, and the chain of to_frame() and reset_index() needed to get a clean DataFrame. Polars avoids all of that. There is no index concept — group_by always returns a tidy DataFrame, and you name columns with .alias() inside expressions.

This article covers four patterns, each demonstrated on an interactive dataset you can edit and run directly in your browser:

group_by().len() — count rows per group
group_by().agg() with .alias() — name the output column
Multiple aggregations in a single .agg() call
Grouping by multiple columns

The dataset

We will use a small petrol station dataset. Each row is a single fuel transaction recorded at one of three stations in Australia. The columns hold the station name, fuel type, litres sold, price per litre, and the state where the station is located. Some values are intentionally missing (null) to mirror real-world data quality issues.

Python — editable

import polars as pl

fuel = {
    "station": [
        'Caltex Bondi','Caltex Bondi','Caltex Bondi',
        'BP Southbank','BP Southbank','BP Southbank','BP Southbank',
        'Shell Fortitude Valley','Shell Fortitude Valley','Shell Fortitude Valley',
        'Caltex Bondi','Caltex Bondi',
        'BP Southbank','BP Southbank','BP Southbank'
    ],
    "fuel type": [
        'Unleaded','Diesel','Premium',
        'Unleaded','Unleaded','Diesel','Premium',
        'Diesel','Unleaded','Premium',
        'Diesel','Unleaded',
        'Diesel','Premium','Unleaded'
    ],
    "litres": [
        45.2, 60.0, 38.5,
        52.1, 47.8, None, 41.0,
        55.3, 44.9, None,
        58.7, 40.1,
        63.2, 35.6, 49.0
    ],
    "price per litre": [
        1.89, 1.95, 2.12,
        1.85, 1.85, 1.92, 2.09,
        1.93, None, 2.15,
        1.95, 1.89,
        1.92, 2.09, 1.85
    ],
    "state": [
        'NSW','NSW','NSW',
        'VIC','VIC','VIC','VIC',
        'QLD','QLD', None,
        'NSW','NSW',
        'VIC','VIC','VIC'
    ]
}

fuel_df = pl.DataFrame(fuel)
fuel_df

Figure 1: Fuel transactions — 15 rows, 5 columns.

The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). A natural question is: how many transactions were recorded at each station?

Python — editable

fuel_df.group_by('station').len()

Notice how clean the result is compared to pandas. Polars returns a proper DataFrame — not a Series — with two columns: station and len. There is no index to reset, no unnamed column to fix. The group column is just a regular column.

The only thing you might want to change is the column name len. Let's fix that.

Naming columns with `alias()`

In Polars, you name output columns using .alias() inside an expression. Instead of the shortcut .len(), we switch to the full .agg() syntax:

Python — editable

fuel_df.group_by('station').agg(pl.len().alias('transactions'))

Figure 2: The count column is now labeled "transactions".

The pl.len() expression counts rows per group, and .alias('transactions') gives the output column a meaningful name. This is the Polars equivalent of the pandas chain groupby().size().to_frame('transactions').reset_index() — but in a single, readable call.

You can also use .rename() on the result if you prefer renaming after the fact:

Python — editable

fuel_df.group_by('station').len().rename({'len': 'transactions'})

Figure 3: Same result using .rename() after .len().

Multiple aggregations in one call

One of Polars' biggest strengths is running multiple aggregations in a single .agg() call. Each expression produces a column in the output:

Python — editable

fuel_df.group_by('station').agg(
    pl.len().alias('transactions'),
    pl.col('litres').mean().alias('avg_litres'),
    pl.col('price per litre').max().alias('max_price')
)

Figure 4: Three aggregations in one call — count, mean, and max.

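In pandas, the dictionary form of .agg() produces multi-level column names as soon as one column gets more than one aggregation, and getting flat names means switching to named aggregation, for example .agg(avg_litres=('litres', 'mean')). Polars keeps it flat and explicit by design: each expression produces exactly one named column.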

Grouping by multiple columns

Pass a list of column names to group_by() to group by more than one column:

Python — editable

fuel_df.group_by(['station', 'fuel type']).agg(
    pl.len().alias('transactions')
)

Figure 5: Transactions by station and fuel type.

Again, both station and fuel type are regular columns in the output — no multi-level index to flatten.

Polars vs pandas: grouping compared

If you are coming from pandas, here is how the key operations map:

df.groupby('col').size() → df.group_by('col').len()
.to_frame('name').reset_index() → .agg(pl.len().alias('name'))
as_index=False → not needed (Polars never uses an index)
.agg({'col': 'mean'}) → .agg(pl.col('col').mean())
.rename(columns={...}) → .rename({...}) or .alias() inside expressions

The core advantage: Polars removes the index handling that causes much of the friction in pandas grouping. You never need reset_index(), to_frame(), or as_index=False. The result is always a flat DataFrame with named columns.

Try editing the code blocks above — change the grouping column to fuel type or state, swap pl.len() for pl.col('litres').sum(), or add your own stations to see how each pattern behaves.


Suhith Illesinghe

Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.
