Creating new columns is one of the most fundamental operations in data analysis. Whether you are adding a constant label, computing a derived value from existing columns, or categorising rows based on conditions, column creation is something you will do in virtually every project.

If you have used pandas, you are probably used to assigning columns with df['new'] = value or df.assign(). Polars takes a different approach. Instead of assigning into the DataFrame, you pass expressions to with_columns(), which returns a new DataFrame with the added columns and leaves the original unchanged. This might feel unusual at first, but it leads to cleaner, more composable code.

This article covers seven patterns for creating columns in Polars, each demonstrated on an interactive dataset you can edit and run directly in your browser:

  1. pl.DataFrame() — create the dataset
  2. pl.lit() — add a constant column
  3. Arithmetic expressions — compute a column from existing columns
  4. pl.when().then().otherwise() — conditional column
  5. .cut() — bin continuous values into categories
  6. .to_dummies() — one-hot encoding
  7. Multiple columns in a single with_columns() call

The dataset

We will use a small petrol station dataset. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel type, litres sold, price per litre, and the state where the station is located. Some values are intentionally missing (None) to mirror real-world data quality issues.

Figure 1: Fuel transactions — 15 rows, 5 columns.

The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). Now let's start adding new columns.

Adding a constant column with pl.lit()

The simplest column creation is adding a constant value to every row. In Polars, you wrap scalar values with pl.lit() (short for "literal") and give the column a name with .alias():

Figure 2: Every row now has a "currency" column set to "AUD".

The pl.lit('AUD') expression creates a column where every row contains the string "AUD", and .alias('currency') names it. This is the Polars equivalent of df['currency'] = 'AUD' in pandas — but instead of mutating in place, with_columns() returns a new DataFrame. The original df remains unchanged.

Computing a column from existing columns

You can create columns by combining existing columns with arithmetic expressions. Here we multiply litres by price per litre to calculate the total cost of each transaction:

Figure 3: A new "total_cost" column computed from litres and price per litre.

Notice that rows where either litres or price per litre is null produce a null in total_cost. Polars propagates nulls through arithmetic automatically — no special handling needed. In pandas, you would write df['total_cost'] = df['litres'] * df['price per litre'], which looks simpler but mutates the DataFrame in place.

Conditional columns with when/then/otherwise

For if/else logic, Polars provides pl.when(), .then(), and .otherwise(). It plays the same role as numpy.where() or a row-wise pandas .apply() with a lambda, and unlike .apply() it runs as a native Polars expression with no Python loop overhead:

Figure 4: Transactions classified as "Large" or "Small" based on litres.

Rows where litres exceeds 50 are labelled "Large"; everything else is "Small". Rows with null litres also end up as "Small": null > 50 evaluates to null rather than true, so the condition is not met and the row falls through to .otherwise(). If you need to handle nulls separately, you can chain an additional .when(pl.col('litres').is_null()).then(pl.lit('Unknown')) before .otherwise().

Binning with cut()

When you need to bucket continuous values into labelled categories, Polars offers .cut() on column expressions. This is the equivalent of pd.cut() in pandas:

Figure 5: Litres binned into volume bands — Small, Medium, Large, XL.

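The breaks [30, 45, 60] create four bins: up to and including 30 (Small), above 30 up to 45 (Medium), above 45 up to 60 (Large), and above 60 (XL). By default each break value belongs to the bin below it, so a fill of exactly 30 litres is Small. The labels parameter assigns human-readable names to each bin and must supply one more label than there are breaks. Null values in litres will produce null in volume_band.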

One-hot encoding with to_dummies()

For machine learning pipelines, you often need to convert categorical columns into binary indicator columns. Polars provides to_dummies() for this:

Figure 6: One-hot encoded fuel type columns — each unique value becomes a binary column.

Each unique value in fuel type becomes its own column (fuel type_Diesel, fuel type_Premium, fuel type_Unleaded) with a 1 where the row matches and 0 otherwise. This is the Polars equivalent of pd.get_dummies(df, columns=['fuel type']) in pandas.

Multiple columns in one with_columns() call

One of Polars' biggest strengths is creating multiple columns in a single with_columns() call. Each expression produces a column in the output, and Polars can optimise them together:

Figure 7: Four new columns created in a single with_columns call.

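All four columns (currency, total_cost, fill_size, and volume_band) are added in one operation. This is more efficient than chaining multiple with_columns() calls because Polars can run the expressions in parallel internally. The trade-off is that every expression in the call is evaluated against the input DataFrame, so one expression cannot refer to a column created by another in the same call; a column that depends on total_cost, for example, needs a second with_columns(). In pandas, you would typically need separate assignment statements or use df.assign() with multiple keyword arguments.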

Polars vs pandas: creating columns compared

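If you are coming from pandas, here is how the column-creation operations covered above map:

  1. Constant column: df['currency'] = 'AUD' in pandas; pl.lit('AUD').alias('currency') inside with_columns() in Polars
  2. Derived column: df['total_cost'] = df['litres'] * df['price per litre'] in pandas; the same arithmetic on pl.col() expressions inside with_columns() in Polars
  3. Conditional column: numpy.where() or .apply() with a lambda in pandas; pl.when().then().otherwise() in Polars
  4. Binning: pd.cut() in pandas; .cut() on a column expression in Polars
  5. One-hot encoding: pd.get_dummies(df, columns=['fuel type']) in pandas; to_dummies() in Polars
  6. Several columns at once: separate assignments or df.assign() with multiple keyword arguments in pandas; a single with_columns() call in Polars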

The core difference: pandas code usually assigns into an existing DataFrame and mutates it in place (some methods return copies instead), while with_columns() in Polars returns a new DataFrame and leaves the original unchanged. This makes Polars code easier to reason about and debug: column creation has no accidental side effects, and there is no SettingWithCopyWarning to worry about.

Try editing the code blocks above — change the threshold in the when() condition, add new bin boundaries to cut(), or compute your own derived columns to see how each pattern behaves.


Suhith Illesinghe
Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.