Creating new columns is one of the most fundamental operations in data analysis. Whether you are adding a constant label, computing a derived value from existing columns, or categorising rows based on conditions, column creation is something you will do in virtually every project.
If you have used pandas, you are probably used to assigning columns with df['new'] = value or df.assign(). Polars takes a different approach. Because Polars DataFrames are immutable, you use with_columns() to return a new DataFrame with the added columns. This might feel unusual at first, but it leads to cleaner, more composable code.
This article covers seven patterns for creating columns in Polars, each demonstrated on an interactive dataset you can edit and run directly in your browser:
- pl.DataFrame(): create the dataset
- pl.lit(): add a constant column
- Arithmetic expressions: compute a column from existing columns
- pl.when().then().otherwise(): conditional column
- .cut(): bin continuous values into categories
- .to_dummies(): one-hot encoding
- Multiple columns in a single with_columns() call
The dataset
We will use a small petrol station dataset. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel type, litres sold, price per litre, and the state where the station is located. Some values are intentionally missing (None) to mirror real-world data quality issues.
The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). Now let's start adding new columns.
Adding a constant column with pl.lit()
The simplest column creation is adding a constant value to every row. In Polars, you wrap scalar values with pl.lit() (short for "literal") and give the column a name with .alias():
The pl.lit('AUD') expression creates a column where every row contains the string "AUD", and .alias('currency') names it. This is the Polars equivalent of df['currency'] = 'AUD' in pandas — but instead of mutating in place, with_columns() returns a new DataFrame. The original df remains unchanged.
Computing a column from existing columns
You can create columns by combining existing columns with arithmetic expressions. Here we multiply litres by price per litre to calculate the total cost of each transaction:
Notice that rows where either litres or price per litre is null produce a null in total_cost. Polars propagates nulls through arithmetic automatically — no special handling needed. In pandas, you would write df['total_cost'] = df['litres'] * df['price per litre'], which looks simpler but mutates the DataFrame in place.
Conditional columns with when/then/otherwise
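For if/else logic, Polars provides pl.when(), .then(), and .otherwise(). This plays the same role as numpy.where() or a pandas .apply() with a lambda, but unlike .apply() it runs as a native Polars expression with no Python loop overhead. Wrap string results in pl.lit(), because a bare string passed to .then() or .otherwise() is interpreted as a column name: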
Rows where litres exceeds 50 are labelled "Large"; everything else is "Small". Rows with null litres also end up as "Small": the comparison null > 50 evaluates to null rather than true, and when() sends every row whose condition is not true to the .otherwise() branch. If you need to handle nulls separately, you can chain an additional .when(pl.col('litres').is_null()).then(pl.lit('Unknown')) before .otherwise().
Binning with cut()
When you need to bucket continuous values into labelled categories, Polars offers .cut() on column expressions. This is the equivalent of pd.cut() in pandas:
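The breaks [30, 45, 60] create four bins: up to and including 30 (Small), above 30 up to 45 (Medium), above 45 up to 60 (Large), and above 60 (XL). Bins are right-closed by default, so a value of exactly 30 lands in Small; pass left_closed=True to include the lower edge instead. The labels parameter assigns human-readable names to each bin. Null values in litres will produce null in volume_band.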
One-hot encoding with to_dummies()
For machine learning pipelines, you often need to convert categorical columns into binary indicator columns. Polars provides to_dummies() for this:
Each unique value in fuel type becomes its own column (fuel type_Diesel, fuel type_Premium, fuel type_Unleaded) with a 1 where the row matches and 0 otherwise. This is the Polars equivalent of pd.get_dummies(df, columns=['fuel type']) in pandas.
Multiple columns in one with_columns() call
One of Polars' biggest strengths is creating multiple columns in a single with_columns() call. Each expression produces a column in the output, and Polars can optimise them together:
All four columns — currency, total_cost, fill_size, and volume_band — are added in one operation. This is more efficient than chaining multiple with_columns() calls because Polars can run the expressions in parallel internally. In pandas, you would typically need separate assignment statements or use df.assign() with multiple keyword arguments.
Polars vs pandas: creating columns compared
If you are coming from pandas, here is how the key column-creation operations map:
- df['new'] = value → df.with_columns(pl.lit(value).alias('new'))
- df.assign(new=...) → df.with_columns(...)
- df['a'] * df['b'] → pl.col('a') * pl.col('b') inside with_columns
- pd.cut(df['col'], bins) → pl.col('col').cut(breaks)
- pd.get_dummies(df) → df.to_dummies()
- np.where(cond, a, b) → pl.when(cond).then(a).otherwise(b)
The core difference: pandas mutates DataFrames in place (or returns copies depending on the method), while Polars always returns a new DataFrame. This immutability makes Polars code easier to reason about and debug — you never have to worry about accidental side effects or SettingWithCopyWarning.
Try editing the code blocks above — change the threshold in the when() condition, add new bin boundaries to cut(), or compute your own derived columns to see how each pattern behaves.
References
- Polars documentation: polars.DataFrame.with_columns
- Polars documentation: polars.lit
- Polars documentation: polars.when
- Polars documentation: polars.Expr.cut
- Polars documentation: polars.DataFrame.to_dummies
- Polars user guide: Column selections