SQL is not just for retrieving data — it is also a powerful tool for transforming data on the fly. Every SELECT statement can include expressions that derive new columns from existing ones, without modifying the underlying table.

Whether you need to calculate a total cost, categorise rows based on a condition, handle NULL values, round numbers, or convert types, SQL provides built-in functions and syntax to do it all inside the query itself.

If you are coming from Python, think of this as the SQL equivalent of df.assign() or df.apply() in pandas, or .with_columns() and .when().then().otherwise() in Polars. The difference is that SQL’s syntax is standard across every relational database.

This article covers seven patterns, each demonstrated on an interactive dataset you can edit and run directly in your browser:

  1. Viewing the dataset with SELECT *
  2. Arithmetic expressions with AS — computed columns
  3. Conditional logic with CASE WHEN
  4. NULL handling with COALESCE
  5. Rounding with ROUND
  6. String functions — UPPER and LENGTH
  7. Type conversion with CAST

The dataset

We will use a small petrol station dataset. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel_type, litres sold, price_per_litre, and the state where the station is located. Some values are intentionally NULL to mirror real-world data quality issues.

Click Run on the first block to see the data:

SQL — editable
Figure 1: Fuel transactions — 15 rows, 5 columns.

The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC), and Shell Fortitude Valley (QLD). Notice the NULL values in litres, price_per_litre, and state — we will handle those later with COALESCE.

Computed columns with arithmetic and AS

The simplest way to create a new column is to write an expression in the SELECT clause and give it a name with AS. Here we multiply litres by price_per_litre to calculate the total cost of each transaction:

SQL — editable
Figure 2: A new "total_cost" column computed from litres and price.

The expression litres * price_per_litre is evaluated for every row, and the result appears as a new column named total_cost. Notice that when either operand is NULL, the result is also NULL — this is standard SQL behaviour. The original table is not modified; the computed column exists only in the query result.

In pandas, the equivalent would be df.assign(total_cost=df['litres'] * df['price_per_litre']). In Polars, you would write df.with_columns((pl.col('litres') * pl.col('price_per_litre')).alias('total_cost')).

Conditional logic with CASE WHEN

CASE WHEN is SQL’s equivalent of if-elif-else. It evaluates conditions in order and returns the value for the first true condition. This is perfect for creating categorical columns from numeric data:

SQL — editable
Figure 3: A "price_tier" column based on conditional logic.

The conditions are evaluated top to bottom. A row with price_per_litre = 2.12 matches the first condition and gets labelled “Premium”. A row with price_per_litre = 1.95 fails the first but matches the second, so it becomes “Mid-range”. Everything else falls through to ELSE and becomes “Budget”. When price_per_litre is NULL, no WHEN condition is true, so the ELSE branch applies.

The pandas equivalent is np.select() or pd.cut(). In Polars, you would use pl.when().then().when().then().otherwise() chained inside .with_columns().

Handling NULLs with COALESCE

NULL values propagate through calculations — any arithmetic with NULL produces NULL. COALESCE returns the first non-NULL argument, making it the standard way to provide fallback values:

SQL — editable
Figure 4: NULL litres replaced with 0 using COALESCE.

COALESCE(litres, 0) checks if litres is NULL. If it is, the function returns 0 instead. If litres has a value, that value is returned unchanged. You can chain multiple fallbacks: COALESCE(a, b, c) returns the first non-NULL among a, b, and c.

SQLite also provides IFNULL(a, b), which is equivalent to COALESCE(a, b) with exactly two arguments. However, COALESCE is the SQL standard and works across all databases, so it is the recommended choice.

In pandas, the equivalent is df['litres'].fillna(0). In Polars, you would use pl.col('litres').fill_null(0).

Rounding with ROUND

ROUND controls the number of decimal places in numeric output. This is useful for reporting and readability:

SQL — editable
Figure 5: Prices rounded to one decimal place.

ROUND(price_per_litre, 1) rounds to one decimal place. The second argument specifies the number of digits after the decimal point. Use ROUND(value, 0) to round to the nearest integer, or ROUND(value, 2) for two decimal places. When the input is NULL, the result is NULL.

In pandas, the equivalent is df['price_per_litre'].round(1). In Polars, you would use pl.col('price_per_litre').round(1).

String functions — UPPER and LENGTH

SQL provides a rich set of string functions for transforming text columns. Here we use UPPER to convert station names to uppercase and LENGTH to count the number of characters:

SQL — editable
Figure 6: Uppercase station names and character counts.

UPPER(station) converts every character to uppercase, while LENGTH(station) returns the number of characters in the string. Other useful string functions include LOWER(), SUBSTR(), REPLACE(), and TRIM(). These work in SQLite and most other SQL databases.

In pandas, the equivalent would be df.assign(station_upper=df['station'].str.upper(), name_length=df['station'].str.len()). In Polars, you would use pl.col('station').str.to_uppercase() and pl.col('station').str.len_chars().

Type conversion with CAST

CAST explicitly converts a value from one data type to another. This is essential when you need integer division, want to store results in a specific format, or need to inspect the original type:

SQL — editable
Figure 7: Litres cast to integer and original type inspection.

CAST(litres AS INTEGER) truncates the decimal part, converting 45.2 to 45. The typeof() function (SQLite-specific) shows the storage type of the original value, confirming it is real (floating point). We added WHERE litres IS NOT NULL to exclude rows where the cast would produce NULL.

Common CAST targets include INTEGER, REAL, TEXT, and NUMERIC. Unlike implicit conversion (which happens automatically during comparisons), CAST makes your intention explicit and avoids surprises across different database engines.

In pandas, the equivalent is df['litres'].astype(int). In Polars, you would use pl.col('litres').cast(pl.Int64).

SQL vs pandas vs Polars: creating columns compared

If you are coming from Python, here is how the key operations map:

The core advantage of SQL: computed columns are declarative. You describe what you want, not how to compute it. The database engine optimises the execution plan for you. And unlike pandas, the original data is never mutated — every query produces a fresh result set.

Try editing the code blocks above — change the CASE WHEN thresholds, swap UPPER for LOWER, combine COALESCE with arithmetic, or chain ROUND with CAST to see how each pattern behaves.

Suhith Illesinghe
Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.