How to group data in pandas

A practical guide to groupby in pandas — aggregate, name, and reshape grouped data using to_frame(), as_index=False, and reset_index(name=).
Suhith Illesinghe · Apr 5, 2026

Grouping is one of the most common operations in data analysis. Whether you are counting transactions, averaging prices, or summarising volumes, pandas groupby is almost always the starting point. But the output of a groupby call often comes back as an unnamed Series with the grouping column buried in the index — not the tidy DataFrame you need for downstream merges, charts, or exports.

This article shows you how to group data in pandas and, crucially, how to name and reshape the result into a clean DataFrame. We cover three approaches, each demonstrated on an interactive dataset you can edit and run directly in your browser:

  1. to_frame() — convert the Series to a DataFrame and name the column explicitly
  2. as_index=False — prevent the group column from becoming the index in the first place
  3. reset_index(name=) — the most concise one-liner that does both in a single call

The dataset

We will use a small petrol station dataset. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel type, litres sold, price per litre, and the state where the station is located. Some values are intentionally missing to mirror real-world data quality issues.
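
The original interactive table is not reproduced here, so the sketch below builds a stand-in with the same shape. The station names and states come from the article; the column names (station, fuel_type, litres, price_per_litre, state) and every value in them are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: station names and states follow the article,
# but the row counts per station, fuel types, volumes and prices are invented.
state_of = {
    "Caltex Bondi": "NSW",
    "BP Southbank": "VIC",
    "Shell Fortitude Valley": "QLD",
}
station_col = (["Caltex Bondi"] * 6
               + ["BP Southbank"] * 5
               + ["Shell Fortitude Valley"] * 4)

df = pd.DataFrame({
    "station": station_col,
    "fuel_type": ["Unleaded 91", "Diesel", "Premium 95", "Unleaded 91", "Diesel",
                  "Premium 95", "Unleaded 91", "Diesel", "Unleaded 91", "Premium 95",
                  "Diesel", "Unleaded 91", "Diesel", "Premium 95", "Unleaded 91"],
    # np.nan marks the intentionally missing values
    "litres": [42.5, 60.0, np.nan, 35.2, 55.1, 40.0, 38.7, np.nan, 45.0, 50.3,
               62.4, 30.0, 58.8, np.nan, 41.6],
    "price_per_litre": [1.89, 2.05, 2.10, np.nan, 2.07, 2.12, 1.85, 2.01, 1.87,
                        np.nan, 2.03, 1.92, 2.09, 2.15, 1.90],
    "state": [state_of[name] for name in station_col],
})

print(df.shape)         # (15, 5)
print(df.isna().sum())  # missing values per column
```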

Figure 1: Fuel transactions — 15 rows, 5 columns.

The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). A natural question is: how many transactions were recorded at each station? Let's find out using groupby.
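
A minimal sketch of that count, assuming the station column is called station. Only that column is needed here, and the 6/5/4 split of rows across stations is invented.

```python
import pandas as pd

# Hypothetical station column; the rows per station are made-up counts.
df = pd.DataFrame({
    "station": ["Caltex Bondi"] * 6
               + ["BP Southbank"] * 5
               + ["Shell Fortitude Valley"] * 4
})

# size() counts the rows in each group, including rows with missing values
counts = df.groupby("station").size()

print(counts)
print(type(counts).__name__)  # Series
print(counts.name)            # None (the values have no label)
print(counts.index.name)      # station (the group key sits in the index)
```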

Notice two things about the result. First, it is a Series, not a DataFrame — there is no column header for the counts. Second, the station column has become the index rather than a regular column. Both of these quirks make the output harder to work with if you need to merge, export, or chart the data. Let's fix that.

Method 1: to_frame()

The to_frame() method converts a Series into a DataFrame and accepts a string argument that becomes the column name.
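
As a rough sketch, again on an invented station column, the call might look like this:

```python
import pandas as pd

# Hypothetical station column; the rows per station are made-up counts.
df = pd.DataFrame({
    "station": ["Caltex Bondi"] * 6
               + ["BP Southbank"] * 5
               + ["Shell Fortitude Valley"] * 4
})

# to_frame() wraps the Series in a DataFrame and names its single column
counts_df = df.groupby("station").size().to_frame("transactions")

print(counts_df)
print(counts_df.columns.tolist())  # ['transactions']
print(counts_df.index.name)        # station
```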

Figure 2: The grouped column is now labeled "transactions".

We now have a proper DataFrame with a meaningful column name. However, station is still sitting in the index. If you want it back as a regular column, chain reset_index().
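
A sketch of the full chain, using the same invented station column:

```python
import pandas as pd

# Hypothetical station column; the rows per station are made-up counts.
df = pd.DataFrame({
    "station": ["Caltex Bondi"] * 6
               + ["BP Southbank"] * 5
               + ["Shell Fortitude Valley"] * 4
})

method1 = (
    df.groupby("station")
      .size()
      .to_frame("transactions")  # step 1: name the count column
      .reset_index()             # step 2: move station out of the index
)

print(method1)
print(method1.columns.tolist())  # ['station', 'transactions']
```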

Figure 3: A clean DataFrame with a default integer index.

This is a clean, merge-ready result. The downside is the two-step chain — to_frame() then reset_index(). The next method achieves the same thing differently.

Method 2: as_index=False

Passing as_index=False directly inside groupby() tells pandas to keep the grouping column as a regular column instead of promoting it to the index.
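
A sketch on the same invented station column. It assumes pandas 1.1 or later, where size() on a group created with as_index=False returns a DataFrame.

```python
import pandas as pd

# Hypothetical station column; the rows per station are made-up counts.
df = pd.DataFrame({
    "station": ["Caltex Bondi"] * 6
               + ["BP Southbank"] * 5
               + ["Shell Fortitude Valley"] * 4
})

# as_index=False keeps station as an ordinary column in the result
sizes = df.groupby("station", as_index=False).size()

print(sizes)
print(sizes.columns.tolist())  # ['station', 'size']
```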

Figure 4: as_index=False produces a DataFrame directly.

This already returns a DataFrame with a normal integer index — no reset_index() needed. The trade-off is that the aggregated column is automatically named size, which may not be the label you want. A quick rename() fixes that.
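
The rename step can be sketched as follows, on the same invented station column:

```python
import pandas as pd

# Hypothetical station column; the rows per station are made-up counts.
df = pd.DataFrame({
    "station": ["Caltex Bondi"] * 6
               + ["BP Southbank"] * 5
               + ["Shell Fortitude Valley"] * 4
})

method2 = (
    df.groupby("station", as_index=False)
      .size()
      .rename(columns={"size": "transactions"})  # replace the default label
)

print(method2)
print(method2.columns.tolist())  # ['station', 'transactions']
```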

Figure 5: Renamed from "size" to "transactions".

Same result, different route. Whether you prefer this over Method 1 is largely a matter of taste.

Method 3: reset_index(name=)

The name parameter of Series.reset_index() names the column that will hold the Series values while moving the index back to a regular column, all in one call.
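
A sketch of the one-liner, using the same invented station column as the other examples:

```python
import pandas as pd

# Hypothetical station column; the rows per station are made-up counts.
df = pd.DataFrame({
    "station": ["Caltex Bondi"] * 6
               + ["BP Southbank"] * 5
               + ["Shell Fortitude Valley"] * 4
})

# name= labels the values column; the station index becomes a regular column
method3 = df.groupby("station").size().reset_index(name="transactions")

print(method3)
print(method3.columns.tolist())  # ['station', 'transactions']
```

Note that name= is accepted only when reset_index() is called on a Series, which is what size() returns here.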

Figure 6: The shortest approach — one chained call.

Method 3 is the most concise. A single chained call after size() handles both the naming and the index reset. When readability and brevity both matter, this is usually the best choice.

Which method should you use?

All three methods produce an identical DataFrame, so the decision comes down to context.
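
The equivalence is easy to check. The sketch below runs all three routes on an invented station column and compares the results with DataFrame.equals(); the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical station column; the rows per station are made-up counts.
df = pd.DataFrame({
    "station": ["Caltex Bondi"] * 6
               + ["BP Southbank"] * 5
               + ["Shell Fortitude Valley"] * 4
})

# Method 1: to_frame() followed by reset_index()
m1 = df.groupby("station").size().to_frame("transactions").reset_index()

# Method 2: as_index=False followed by rename()
m2 = (df.groupby("station", as_index=False)
        .size()
        .rename(columns={"size": "transactions"}))

# Method 3: reset_index(name=)
m3 = df.groupby("station").size().reset_index(name="transactions")

# equals() compares values, dtypes and labels
print(m1.equals(m2) and m1.equals(m3))
```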

Pick the one that fits your workflow, and try editing the code blocks above to experiment — change the grouping column to fuel type or state, swap size() for mean() on the litres column, or add your own stations to see how each method behaves.
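
As one possible variation, the sketch below groups by state and averages litres instead of counting rows. The column names state and litres and all values are assumptions, and avg_litres is just an illustrative label.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing volume, invented for illustration.
df = pd.DataFrame({
    "state": ["NSW", "NSW", "VIC", "VIC", "QLD", "QLD"],
    "litres": [40.0, 60.0, 30.0, np.nan, 50.0, 70.0],
})

# mean() skips missing values by default; reset_index(name=) works the same
# way as it does after size()
avg_litres = df.groupby("state")["litres"].mean().reset_index(name="avg_litres")

print(avg_litres)
```

Unlike size(), which counts every row in a group, mean() ignores the missing litres value, so the VIC average is based on a single transaction.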
