How do you group data in pandas?

Use the groupby() method on a DataFrame followed by an aggregation function. For example, df.groupby('column').size() groups the data by the specified column and counts the rows in each group. Other common aggregations include .mean(), .sum(), and .count().
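Because .size() and .count() both produce counts, they are easy to confuse. The sketch below rebuilds only the station and litres columns of the fuel dataset used later in this article (same values, same row order). It shows that .size() counts every row in a group, while .count() on a column skips missing values. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Station and litres columns copied from the article's fuel dataset, in row order
stations = (['Caltex Bondi'] * 3 + ['BP Southbank'] * 4
            + ['Shell Fortitude Valley'] * 3 + ['Caltex Bondi'] * 2
            + ['BP Southbank'] * 3)
litres = [45.2, 60.0, 38.5, 52.1, 47.8, np.nan, 41.0,
          55.3, 44.9, np.nan, 58.7, 40.1, 63.2, 35.6, 49.0]
fuel_df = pd.DataFrame({'station': stations, 'litres': litres})

# .size(): number of rows per station, missing values included
rows_per_station = fuel_df.groupby('station').size()
# .count(): number of non-missing litres values per station
litres_recorded = fuel_df.groupby('station')['litres'].count()

print(rows_per_station)
print(litres_recorded)
```

BP Southbank and Shell Fortitude Valley each have one missing litres value, so their .count() result is one lower than their .size() result.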

How do you name a grouped column in pandas?

There are three common ways: (1) Use .to_frame('name') to convert the Series result into a DataFrame with a named column. (2) Pass as_index=False to groupby(), call .size(), then .rename(columns={'size': 'name'}). (3) Use .reset_index(name='name') on the Series result for the most concise one-liner.

What is the difference between to_frame and as_index=False in pandas groupby?

to_frame() is called on the aggregated Series (for example, after .size()) and converts it into a DataFrame with the column name you pass, but the group column stays in the index. as_index=False is passed to groupby() itself and keeps the group column as a regular column from the start, but the aggregated column gets a default name (for .size(), that name is 'size') that you may need to rename.
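A minimal side-by-side sketch of the two approaches, assuming a reasonably recent pandas release and using only the station column of the fuel dataset introduced below. The names via_to_frame and via_as_index are illustrative.

```python
import pandas as pd

# Station column copied from the article's fuel dataset, in row order
stations = (['Caltex Bondi'] * 3 + ['BP Southbank'] * 4
            + ['Shell Fortitude Valley'] * 3 + ['Caltex Bondi'] * 2
            + ['BP Southbank'] * 3)
fuel_df = pd.DataFrame({'station': stations})

# to_frame(): called on the aggregated Series; 'station' stays in the index
via_to_frame = fuel_df.groupby('station').size().to_frame('transactions')
# as_index=False: passed to groupby(); 'station' is a column, counts are named 'size'
via_as_index = fuel_df.groupby('station', as_index=False).size()

print(via_to_frame.index.name, list(via_to_frame.columns))  # station ['transactions']
print(list(via_as_index.columns))                            # ['station', 'size']
```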

Why does pandas groupby return a Series instead of a DataFrame?

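When you call .size() on a groupby object, or aggregate a single selected column such as df.groupby('column')['value'].mean(), pandas returns a Series because the result has only one column of values, and the group column becomes the index. Aggregating the whole grouped DataFrame, as in df.groupby('column').sum(), returns a DataFrame instead. To turn a Series result into a DataFrame, use .to_frame(), pass as_index=False to groupby(), or call .reset_index().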
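The sketch below checks which calls return a Series and which return a DataFrame. It rebuilds only the station and litres columns of the fuel dataset introduced below, so that .sum() has a single numeric column to add up. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Station and litres columns copied from the article's fuel dataset, in row order
stations = (['Caltex Bondi'] * 3 + ['BP Southbank'] * 4
            + ['Shell Fortitude Valley'] * 3 + ['Caltex Bondi'] * 2
            + ['BP Southbank'] * 3)
litres = [45.2, 60.0, 38.5, 52.1, 47.8, np.nan, 41.0,
          55.3, 44.9, np.nan, 58.7, 40.1, 63.2, 35.6, 49.0]
fuel_df = pd.DataFrame({'station': stations, 'litres': litres})

# Aggregating the whole grouped frame keeps one result column per value column -> DataFrame
totals = fuel_df.groupby('station').sum()
# Selecting a single column before aggregating leaves one column of values -> Series
avg_litres = fuel_df.groupby('station')['litres'].mean()
# .size() counts rows rather than values, so it also returns a Series
rows = fuel_df.groupby('station').size()

print(type(totals).__name__, type(avg_litres).__name__, type(rows).__name__)  # DataFrame Series Series
```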

How to Group Data in Pandas — groupby, to_frame, reset

Grouping is one of the most common operations in data analysis. Whether you are counting transactions, averaging prices, or summarising volumes, pandas groupby is almost always the starting point. But the output of a groupby call often comes back as an unnamed Series with the grouping column buried in the index — not the tidy DataFrame you need for downstream merges, charts, or exports.

This article shows you how to group data in pandas and, crucially, how to name and reshape the result into a clean DataFrame. We cover three approaches, each demonstrated on the same small dataset:

to_frame() — convert the Series to a DataFrame and name the column explicitly
as_index=False — prevent the group column from becoming the index in the first place
reset_index(name=) — the most concise one-liner that does both in a single call

The dataset

We will use a small petrol station dataset. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel type, litres sold, price per litre, and the state where the station is located. Some values are intentionally missing to mirror real-world data quality issues.

```python
import pandas as pd
import numpy as np

fuel = {
    "station": [
        'Caltex Bondi','Caltex Bondi','Caltex Bondi',
        'BP Southbank','BP Southbank','BP Southbank','BP Southbank',
        'Shell Fortitude Valley','Shell Fortitude Valley','Shell Fortitude Valley',
        'Caltex Bondi','Caltex Bondi',
        'BP Southbank','BP Southbank','BP Southbank'
    ],
    "fuel type": [
        'Unleaded','Diesel','Premium',
        'Unleaded','Unleaded','Diesel','Premium',
        'Diesel','Unleaded','Premium',
        'Diesel','Unleaded',
        'Diesel','Premium','Unleaded'
    ],
    "litres": [
        45.2, 60.0, 38.5,
        52.1, 47.8, np.nan, 41.0,
        55.3, 44.9, np.nan,
        58.7, 40.1,
        63.2, 35.6, 49.0
    ],
    "price per litre": [
        1.89, 1.95, 2.12,
        1.85, 1.85, 1.92, 2.09,
        1.93, np.nan, 2.15,
        1.95, 1.89,
        1.92, 2.09, 1.85
    ],
    "state": [
        'NSW','NSW','NSW',
        'VIC','VIC','VIC','VIC',
        'QLD','QLD',np.nan,
        'NSW','NSW',
        'VIC','VIC','VIC'
    ]
}

fuel_df = pd.DataFrame(fuel)
fuel_df
```

Figure 1: Fuel transactions — 15 rows, 5 columns.

The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). A natural question is: how many transactions were recorded at each station? Let's find out using groupby.

```python
fuel_df.groupby('station').size()
```

Notice two things about the result. First, it is a Series, not a DataFrame — there is no column header for the counts. Second, the station column has become the index rather than a regular column. Both of these quirks make the output harder to work with if you need to merge, export, or chart the data. Let's fix that.
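Both quirks can be confirmed directly. A minimal sketch, rebuilding only the station column of the dataset above; the name counts is illustrative.

```python
import pandas as pd

# Station column copied from the fuel dataset above, in row order
stations = (['Caltex Bondi'] * 3 + ['BP Southbank'] * 4
            + ['Shell Fortitude Valley'] * 3 + ['Caltex Bondi'] * 2
            + ['BP Southbank'] * 3)
fuel_df = pd.DataFrame({'station': stations})

counts = fuel_df.groupby('station').size()

print(type(counts).__name__)  # Series: there is no column header for the counts
print(counts.index.name)      # station: the group column became the index
print(counts.name)            # None: the values have no label yet
```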

Method 1: `to_frame()`

The to_frame() method converts a Series into a DataFrame and accepts a string argument that becomes the column name.
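The string argument matters because the Series returned by size() has no name. A minimal sketch, rebuilding only the station column of the dataset above, comparing to_frame() with and without it; the names unnamed and named are illustrative.

```python
import pandas as pd

# Station column copied from the fuel dataset above, in row order
stations = (['Caltex Bondi'] * 3 + ['BP Southbank'] * 4
            + ['Shell Fortitude Valley'] * 3 + ['Caltex Bondi'] * 2
            + ['BP Southbank'] * 3)
fuel_df = pd.DataFrame({'station': stations})

counts = fuel_df.groupby('station').size()
# Without an argument, the unnamed Series becomes a column labelled 0
unnamed = counts.to_frame()
# With an argument, the string becomes the column label
named = counts.to_frame('transactions')

print(list(unnamed.columns))  # [0]
print(list(named.columns))    # ['transactions']
```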

```python
fuel_df.groupby('station').size().to_frame('transactions')
```

Figure 2: The grouped column is now labeled "transactions".

We now have a proper DataFrame with a meaningful column name. However, station is still sitting in the index. If you want it back as a regular column, chain reset_index().

```python
fuel_df.groupby('station').size().to_frame('transactions').reset_index()
```

Figure 3: A clean DataFrame with a default integer index.

This is a clean, merge-ready result. The downside is the two-step chain — to_frame() then reset_index(). The next method achieves the same thing differently.
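To make "merge-ready" concrete, here is a minimal sketch that attaches each station's transaction count back onto every row of the original data. It rebuilds only the station column of the dataset above; the name enriched is illustrative.

```python
import pandas as pd

# Station column copied from the fuel dataset above, in row order
stations = (['Caltex Bondi'] * 3 + ['BP Southbank'] * 4
            + ['Shell Fortitude Valley'] * 3 + ['Caltex Bondi'] * 2
            + ['BP Southbank'] * 3)
fuel_df = pd.DataFrame({'station': stations})

counts = (fuel_df.groupby('station').size()
          .to_frame('transactions')
          .reset_index())

# 'station' is now a regular column, so it can serve directly as the merge key
enriched = fuel_df.merge(counts, on='station', how='left')
print(enriched.head())
```

A left merge keeps the rows of fuel_df in their original order, so each transaction simply gains a transactions column.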

Method 2: `as_index=False`

Passing as_index=False directly inside groupby() tells pandas to keep the grouping column as a regular column instead of promoting it to the index.

```python
fuel_df.groupby('station', as_index=False).size()
```

Figure 4: as_index=False produces a DataFrame directly.

This already returns a DataFrame with a normal integer index — no reset_index() needed. The trade-off is that the aggregated column is automatically named size, which may not be the label you want. A quick rename() fixes that.

```python
fuel_df.groupby('station', as_index=False).size().rename(columns={'size': 'transactions'})
```

Figure 5: Renamed from "size" to "transactions".

Same result, different route. Whether you prefer this over Method 1 is largely a matter of taste.
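The default label size is specific to size(). When you select a column and aggregate it with as_index=False, the result column keeps the source column's name, and rename() handles it the same way. A minimal sketch, rebuilding only the station and litres columns of the dataset above; avg_litres is an illustrative label.

```python
import numpy as np
import pandas as pd

# Station and litres columns copied from the fuel dataset above, in row order
stations = (['Caltex Bondi'] * 3 + ['BP Southbank'] * 4
            + ['Shell Fortitude Valley'] * 3 + ['Caltex Bondi'] * 2
            + ['BP Southbank'] * 3)
litres = [45.2, 60.0, 38.5, 52.1, 47.8, np.nan, 41.0,
          55.3, 44.9, np.nan, 58.7, 40.1, 63.2, 35.6, 49.0]
fuel_df = pd.DataFrame({'station': stations, 'litres': litres})

# The mean column inherits the name of the column it was computed from
avg = fuel_df.groupby('station', as_index=False)['litres'].mean()
print(list(avg.columns))  # ['station', 'litres']

# rename() relabels it exactly as it relabels 'size'
avg = avg.rename(columns={'litres': 'avg_litres'})
print(avg)
```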

Method 3: `reset_index(name=)`

The name parameter of Series.reset_index() sets the label of the column that holds the Series values and moves the index back to a regular column, all in one call.

```python
fuel_df.groupby('station').size().reset_index(name='transactions')
```

Figure 6: The shortest approach — one chained call.

Method 3 is the most concise. A single chained call after size() handles both the naming and the index reset. When readability and brevity both matter, this is usually the best choice.
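The same one-liner works for any aggregation that returns a Series, not just size(). Because name belongs to Series.reset_index(), and DataFrame.reset_index() has no such parameter, it does not apply after as_index=False. A minimal sketch, rebuilding only the station and litres columns of the dataset above; avg_litres is an illustrative label.

```python
import numpy as np
import pandas as pd

# Station and litres columns copied from the fuel dataset above, in row order
stations = (['Caltex Bondi'] * 3 + ['BP Southbank'] * 4
            + ['Shell Fortitude Valley'] * 3 + ['Caltex Bondi'] * 2
            + ['BP Southbank'] * 3)
litres = [45.2, 60.0, 38.5, 52.1, 47.8, np.nan, 41.0,
          55.3, 44.9, np.nan, 58.7, 40.1, 63.2, 35.6, 49.0]
fuel_df = pd.DataFrame({'station': stations, 'litres': litres})

# A single-column aggregation returns a Series named after its source column ('litres')
avg_litres = fuel_df.groupby('station')['litres'].mean()
# name= overrides that label while moving 'station' out of the index
avg_litres = avg_litres.reset_index(name='avg_litres')
print(list(avg_litres.columns))  # ['station', 'avg_litres']
```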

Which method should you use?

All three methods produce an identical DataFrame. The decision comes down to context:

Method 1 (to_frame + reset_index) — clearest when you want to separate the "convert to DataFrame" step from the "fix the index" step, which can be useful in longer chains.
Method 2 (as_index=False) — best when you know upfront that you never want the group column in the index. Pair it with rename() if the default name isn't suitable.
Method 3 (reset_index(name=)) — fewest characters, least cognitive overhead. Ideal for quick exploratory work and scripts.
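The claim that all three methods agree can be checked with pandas' own testing helper. A minimal sketch, rebuilding only the station column of the dataset above; m1, m2 and m3 are illustrative names.

```python
import pandas as pd

# Station column copied from the fuel dataset above, in row order
stations = (['Caltex Bondi'] * 3 + ['BP Southbank'] * 4
            + ['Shell Fortitude Valley'] * 3 + ['Caltex Bondi'] * 2
            + ['BP Southbank'] * 3)
fuel_df = pd.DataFrame({'station': stations})

grouped = fuel_df.groupby('station')

m1 = grouped.size().to_frame('transactions').reset_index()               # Method 1
m2 = (fuel_df.groupby('station', as_index=False).size()
      .rename(columns={'size': 'transactions'}))                         # Method 2
m3 = grouped.size().reset_index(name='transactions')                      # Method 3

# Raises AssertionError if values, dtypes, column labels or index differ
pd.testing.assert_frame_equal(m1, m2)
pd.testing.assert_frame_equal(m1, m3)
print(m3)
```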

Pick the one that fits your workflow, and experiment with the examples above: change the grouping column to fuel type or state, swap size() for mean() on the litres column, or add your own stations to see how each method behaves.
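One thing to watch when grouping by state: one row has a missing state, and groupby() drops missing keys by default, so the counts no longer add up to 15. A minimal sketch, rebuilding only the state column of the dataset above and using the dropna parameter of groupby(); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# State column copied from the fuel dataset above, in row order (one value is missing)
states = (['NSW'] * 3 + ['VIC'] * 4 + ['QLD', 'QLD', np.nan]
          + ['NSW'] * 2 + ['VIC'] * 3)
fuel_df = pd.DataFrame({'state': states})

# Default dropna=True: the row with a missing state is left out of every group
by_state = fuel_df.groupby('state').size().reset_index(name='transactions')
# dropna=False keeps missing keys as a group of their own
by_state_all = (fuel_df.groupby('state', dropna=False).size()
                .reset_index(name='transactions'))

print(by_state)
print(by_state['transactions'].sum(), by_state_all['transactions'].sum())  # 14 15
```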


References

Original article: How to name grouped data in Pandas? — Medium
pandas documentation: pandas.DataFrame.groupby
pandas documentation: pandas.Series.to_frame
pandas documentation: pandas.DataFrame.reset_index

Suhith Illesinghe

Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.
