How do you create a new column in a pandas DataFrame?

Use df['new_column'] = value to create a new column. The value can be a constant (e.g. 0), a list, a Series, or a computed expression like df['col_a'] + df['col_b']. You can also use df.loc[:, 'new_column'] = value.

What is pd.cut in pandas?

pd.cut() bins continuous values into discrete categories. You specify bin edges and optional labels. For example, pd.cut(df['price'], bins=[0, 1.90, 2.00, 2.20], labels=['Low', 'Mid', 'High']) creates a categorical column from price ranges.
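
As a minimal sketch of the manual-bin call above, the snippet below applies it to a small, made-up price Series (the values are illustrative only). By default each bin is open on the left and closed on the right, so a price sitting exactly on an edge such as 1.90 falls into the lower bin.

```python
import pandas as pd

# Hypothetical prices per litre, chosen only for illustration
prices = pd.Series([1.85, 1.90, 1.95, 2.12])

# Bins are (0, 1.90], (1.90, 2.00], (2.00, 2.20] because right=True by default
price_band = pd.cut(
    prices,
    bins=[0, 1.90, 2.00, 2.20],
    labels=['Low', 'Mid', 'High'],
)

print(price_band.tolist())
```

Values outside the outermost edges (for example a price above 2.20) would come back as NaN, so it is worth checking that the edges cover the full range of the data.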

How do you one-hot encode a column in pandas?

Use pd.get_dummies(df['column'], prefix='prefix') to convert a categorical column into indicator columns. Each unique value becomes its own column. Recent pandas versions (2.0 and later) return True/False by default; pass dtype=int to get 1 and 0. Add dummy_na=True to include a column for NaN values.

How do you handle NaN when creating computed columns in pandas?

Use fillna(0) or fillna(value) to replace NaN before performing arithmetic. For example, df['total'] = df['col_a'].fillna(0) + df['col_b'].fillna(0). Without this, any row with NaN will produce NaN in the result.

What is the difference between df['col'] and df.loc[:, 'col'] for creating columns?

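Both create a new column, and for a whole-column assignment they behave the same. df['col'] = value is the most common shorthand. df.loc[:, 'col'] = value uses label-based indexing. The SettingWithCopyWarning comes from chained indexing on a filtered DataFrame, such as df[mask]['col'] = value. Writing df.loc[mask, 'col'] = value on the original DataFrame, or calling .copy() on the filtered result before assigning, avoids it.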
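
The sketch below shows two patterns commonly used to sidestep the warning when only some rows are involved. The DataFrame and the column names is_premium and price_cents are invented for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'fuel_type': ['Unleaded', 'Diesel', 'Premium'],
    'price': [1.89, 1.95, 2.12],
})

# Option 1: assign through a row mask on the original DataFrame
df['is_premium'] = False
df.loc[df['fuel_type'] == 'Premium', 'is_premium'] = True

# Option 2: take an explicit copy of the filtered rows, then add columns to it
diesel_df = df[df['fuel_type'] == 'Diesel'].copy()
diesel_df['price_cents'] = diesel_df['price'] * 100

print(df)
print(diesel_df)
```

Option 1 changes the original DataFrame in a single indexing step. Option 2 makes it explicit that diesel_df is independent, so the new column never appears in df.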

How to Create Columns in Pandas — New Variables, pd.cut, pd.get

Creating new columns is one of the most important skills in data analysis. Whether you're computing a derived metric, binning continuous data into categories, or preparing features for a machine learning model — you'll be adding columns to a DataFrame constantly.

In this article you'll learn three things:

How to create and assign values to a column
How to create a categorical column with pd.cut()
How to create one-hot encodings with pd.get_dummies()

1. The Dataset

We'll use a petrol station dataset with fuel transactions across Australia. Each row has a station, fuel type, litres sold, price per litre, and the state. Some values are intentionally missing.

Python — editable

import pandas as pd
import numpy as np

fuel_df = pd.DataFrame({
    'station': ['Caltex Bondi','Caltex Bondi','Caltex Bondi',
                'BP Southbank','BP Southbank','BP Southbank','BP Southbank',
                'Shell Fortitude Valley','Shell Fortitude Valley','Shell Fortitude Valley',
                'Caltex Bondi','Caltex Bondi',
                'BP Southbank','BP Southbank','BP Southbank'],
    'fuel_type': ['Unleaded','Diesel','Premium',
                  'Unleaded','Unleaded','Diesel','Premium',
                  'Diesel','Unleaded','Premium',
                  'Diesel','Unleaded',
                  'Diesel','Premium','Unleaded'],
    'litres': [45.2, 60.0, 38.5, 52.1, 47.8, np.nan, 41.0,
               55.3, 44.9, np.nan, 58.7, 40.1, 63.2, 35.6, 49.0],
    'price_per_litre': [1.89, 1.95, 2.12, 1.85, 1.85, 1.92, 2.09,
                        1.93, np.nan, 2.15, 1.95, 1.89, 1.92, 2.09, 1.85],
    'state': ['NSW','NSW','NSW','VIC','VIC','VIC','VIC',
              'QLD','QLD',np.nan,'NSW','NSW','VIC','VIC','VIC']
})

fuel_df

Figure 1: Fuel transactions — 15 rows, 5 columns.

2. Create a Column with Bracket Assignment

The simplest way to create a new column is bracket assignment. Let's add a revenue column initialised to zero:

Python — editable

# Create a new column filled with zeros
fuel_df['revenue'] = 0.0
fuel_df

Figure 2: New 'revenue' column filled with zeros.

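You can also use loc to achieve the same thing. For a whole-column assignment like this one it behaves the same as bracket assignment. Where loc helps with the SettingWithCopyWarning is when you assign to a subset of rows: df.loc[mask, 'col'] = value on the original DataFrame does the work in one indexing step, instead of chaining df[mask]['col'] = value.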

Python — editable

# Alternative: create with loc
fuel_df.loc[:, 'discount'] = 0.0
fuel_df[['station', 'fuel_type', 'revenue', 'discount']]

Figure 3: Both 'revenue' and 'discount' columns created.

3. Populate with Random Values

Zeros aren't very interesting. Let's populate the discount column with random values between 0 and 15 (cents off per litre) using np.random.randint():

Python — editable

# Populate with random discount values (0-15 cents)
np.random.seed(42)
fuel_df['discount'] = np.random.randint(0, 16, size=len(fuel_df)).astype('float64')

fuel_df[['station', 'fuel_type', 'price_per_litre', 'discount']]

Figure 4: Random discount values between 0 and 15.
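
As an alternative sketch, NumPy's Generator API (np.random.default_rng) keeps the seed local to one object instead of setting global state. The small demo_df below is a stand-in so the snippet runs on its own, and the numbers it draws will differ from those produced by np.random.seed(42) with randint.

```python
import numpy as np
import pandas as pd

# Small stand-in DataFrame so the snippet runs on its own
demo_df = pd.DataFrame({'price_per_litre': [1.89, 1.95, 2.12, 1.85]})

# A local generator with its own seed; no global random state is touched
rng = np.random.default_rng(42)

# integers() excludes the upper bound by default, so 16 gives values 0 to 15
demo_df['discount'] = rng.integers(0, 16, size=len(demo_df)).astype('float64')

print(demo_df)
```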

4. Compute a Derived Column

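Now let's compute the actual revenue column: revenue = litres x price per litre. But there's a catch: some rows have NaN in litres or price_per_litre, and any arithmetic involving NaN returns NaN. Calling fillna(0) first makes those rows come out as 0.00 instead. Keep in mind that this records missing data as zero revenue, which is a modelling choice rather than a fact about the transaction: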

Python — editable

# Compute revenue — handle NaN with fillna
fuel_df['revenue'] = (
    fuel_df['litres'].fillna(0) * fuel_df['price_per_litre'].fillna(0)
).round(2)

fuel_df[['station', 'fuel_type', 'litres', 'price_per_litre', 'revenue']]

Figure 5: Revenue computed from litres x price. NaN rows produce 0.00.
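
If you want to tell a real zero apart from a filled-in one, one option is to flag the affected rows before filling. The sketch below uses three illustrative rows; the revenue_raw and revenue_missing column names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Three illustrative rows: complete, missing litres, missing price
demo_df = pd.DataFrame({
    'litres': [45.2, np.nan, 44.9],
    'price_per_litre': [1.89, 1.92, np.nan],
})

# Without fillna, NaN propagates through the multiplication
demo_df['revenue_raw'] = (demo_df['litres'] * demo_df['price_per_litre']).round(2)

# Flag rows where either input was missing, then fill the result with 0
demo_df['revenue_missing'] = demo_df[['litres', 'price_per_litre']].isna().any(axis=1)
demo_df['revenue'] = demo_df['revenue_raw'].fillna(0)

print(demo_df)
```

The flag column lets later steps, such as an average or a model, exclude the filled rows if treating them as zero would distort the result.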

Let's also compute the discounted price by subtracting the discount (in cents) from the price per litre:

Python — editable

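# Discounted price = price - discount/100 (discount is in cents, price in dollars)
# Leave a missing price as NaN: filling it with 0 would yield a negative price
fuel_df['discounted_price'] = (
    fuel_df['price_per_litre'] - fuel_df['discount'] / 100
).round(2)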

fuel_df[['station', 'price_per_litre', 'discount', 'discounted_price']]

Figure 6: Discounted price derived from price minus discount.

Let's clean up the extra columns and keep the ones we need going forward:

Python — editable

# Drop temporary columns
fuel_df.drop(columns=['discount', 'discounted_price'], inplace=True)
fuel_df

Figure 7: Cleaned DataFrame with the revenue column retained.

5. Create Categorical Columns with `pd.cut()`

You have a continuous revenue column. What if the business wants to categorise transactions into groups — say Low, Medium, and High revenue? This is where pd.cut() comes in. It takes a continuous column and bins it into discrete categories.

First, let's calculate the bins dynamically from the data:

Python — editable

# Calculate bins dynamically
number_of_groups = 3

# Round outward to whole numbers so every value, including the max, fits
bin_start = int(np.floor(fuel_df['revenue'].min()))
bin_end = int(np.ceil(fuel_df['revenue'].max()))

# number_of_groups bins need number_of_groups + 1 evenly spaced edges
bins = (
    np.linspace(bin_start, bin_end, number_of_groups + 1)
    .round().astype(int).tolist()
)

print(f"Bins: {bins}")

Figure 8: Dynamically calculated bin edges.

Now create labels for each bin range and apply pd.cut():

Python — editable

# Create labels from bin edges
labels = [f"{bins[i]}-{bins[i+1]}" for i in range(len(bins)-1)]
print(f"Labels: {labels}")

# Apply pd.cut to create the categorical column
fuel_df['revenue_category'] = pd.cut(
    fuel_df['revenue'], bins=bins,
    labels=labels, include_lowest=True
)

fuel_df[['station', 'fuel_type', 'revenue', 'revenue_category']]

Figure 9: Revenue categorised into bins.
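
Equal-width bins can leave some categories nearly empty when the data is skewed. As a hedged alternative, the sketch below compares pd.cut with pd.qcut, which picks edges from quantiles so each bin holds a similar number of rows. The revenue values are illustrative and are not the article's full dataset.

```python
import pandas as pd

# Illustrative revenue values, including one zero from a missing-data row
revenue = pd.Series([0.0, 74.4, 81.62, 85.43, 96.39, 106.73, 114.47, 117.0, 121.34])

# Equal-width bins: every bin spans the same revenue range
equal_width = pd.cut(revenue, bins=3, labels=['Low', 'Medium', 'High'])

# Quantile bins: every bin holds roughly the same number of rows
equal_count = pd.qcut(revenue, q=3, labels=['Low', 'Medium', 'High'])

print(equal_width.value_counts(sort=False))
print(equal_count.value_counts(sort=False))
```

Which one fits depends on the question: equal-width bins keep the category boundaries easy to explain, while quantile bins give balanced group sizes.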

Now you can group by the category to see how many transactions fall into each bucket:

Python — editable

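# Count transactions per revenue category
# observed=False keeps empty categories in the result and makes the default explicit
fuel_df.groupby('revenue_category', observed=False).size().reset_index(name='count')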

Figure 10: Transaction volume per revenue category.

6. One-Hot Encoding with `pd.get_dummies()`

Categorical columns are great for analysis, but machine learning models typically need numerical inputs. One-hot encoding converts each category into its own binary column (1 or 0). Use pd.get_dummies() to do this in one line:

Python — editable

# One-hot encode the revenue_category column
# dtype=int returns 1/0; pandas 2.0+ would otherwise return True/False
dummies = pd.get_dummies(
    fuel_df['revenue_category'],
    prefix='rev_cat', dummy_na=True, dtype=int
)
dummies

Figure 11: One-hot encoded columns — one per category plus NaN.

The dummy_na=True parameter adds a column for NaN values — useful if missing data is meaningful. Now merge the dummies back into the original DataFrame:

Python — editable

# Merge one-hot columns back into the DataFrame
fuel_df = pd.merge(fuel_df, dummies, left_index=True, right_index=True)

fuel_df[['station', 'revenue', 'revenue_category'] + list(dummies.columns)]

Figure 12: Original data combined with one-hot encoded columns.
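
Merging on the index is one way to attach the indicator columns. Two other options are sketched below on a small stand-in DataFrame: pd.concat along the columns, and passing the whole DataFrame to pd.get_dummies with the columns argument. The fuel prefix is an arbitrary choice for the example.

```python
import pandas as pd

demo_df = pd.DataFrame({
    'station': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley'],
    'fuel_type': ['Unleaded', 'Diesel', 'Premium'],
})

# Build the indicator columns; dtype=int gives 1/0 instead of True/False
dummies = pd.get_dummies(demo_df['fuel_type'], prefix='fuel', dtype=int)

# Option 1: concatenate side by side (rows are aligned on the index)
combined = pd.concat([demo_df, dummies], axis=1)

# Option 2: let get_dummies encode the column in place of the original
encoded = pd.get_dummies(demo_df, columns=['fuel_type'], prefix='fuel', dtype=int)

print(combined)
print(encoded)
```

Option 1 keeps the original fuel_type column next to its indicators, while Option 2 drops it, which is often what a model input table needs.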

The DataFrame now has binary indicator columns ready for any machine learning pipeline or statistical model.

Summary

You've learned three essential column creation techniques:

Direct assignment — df['col'] = value or df.loc[:, 'col'] = value to create new columns with constants, random values, or computed expressions
Categorical binning — pd.cut() to bin continuous values into labelled categories with dynamic or manual bins
One-hot encoding — pd.get_dummies() to convert categorical columns into binary indicators for ML models

Try editing the code blocks above — change the number of bins, use fuel_type instead of revenue_category for one-hot encoding, or compute new derived columns like revenue per litre.
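
As a starting point for the revenue-per-litre idea, here is a minimal sketch on a three-row stand-in DataFrame (the values are illustrative). Where litres is missing, the division returns NaN rather than raising an error.

```python
import numpy as np
import pandas as pd

demo_df = pd.DataFrame({
    'litres': [45.2, np.nan, 60.0],
    'revenue': [85.43, 0.0, 117.0],
})

# Dividing by a missing litres value yields NaN for that row
demo_df['revenue_per_litre'] = (demo_df['revenue'] / demo_df['litres']).round(2)

print(demo_df)
```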


References

Original article: How to create columns in Pandas? — Medium
pandas documentation: pandas.cut
pandas documentation: pandas.get_dummies
pandas documentation: pandas.DataFrame.fillna

Suhith Illesinghe

