Filtering is the most common operation in SQL. Every time you query a database, you almost always need to narrow down the results — keep only rows that match a condition, exclude missing values, or grab a specific page of results.

SQL gives you a rich set of filtering tools: WHERE for conditions, IN for matching against a list, BETWEEN for ranges, LIKE for pattern matching, IS NULL for missing data, and LIMIT/OFFSET for pagination. If you have used pandas or Polars, think of WHERE as the SQL equivalent of boolean indexing (df[df['col'] > value]) or the .filter() method.

This article covers seven patterns, each demonstrated on an interactive dataset you can edit and run directly in your browser:

  1. Viewing the dataset with SELECT *
  2. Basic filtering with WHERE
  3. Combining conditions with AND
  4. Matching a list with IN
  5. Selecting a range with BETWEEN
  6. Finding missing values with IS NULL
  7. Paginating results with LIMIT and OFFSET

The dataset

We will use the same petrol station dataset from across this series. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel_type, litres sold, price_per_litre, and the state where the station is located. Some values are intentionally NULL to mirror real-world data quality issues.

Click Run on the first block to create the table and see the data:

SQL — editable
Figure 1: Fuel transactions — 15 rows, 5 columns.

The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). Notice the NULL values in the litres, price_per_litre, and state columns — we will use those later when we explore IS NULL.

Filtering rows with WHERE

WHERE is the workhorse of SQL filtering. It goes after the FROM clause and keeps only rows where the condition evaluates to TRUE. You can use comparison operators like =, !=, <, >, <=, and >=.

Let’s find all transactions where the price per litre exceeded $1.90:

SQL — editable
Figure 2: Only rows where price_per_litre exceeds 1.90.

The query scans every row in fuel and returns only those where price_per_litre > 1.90. Rows with a NULL price are automatically excluded because any comparison with NULL evaluates to NULL, not TRUE.

In pandas, the equivalent would be df[df['price_per_litre'] > 1.90]. In Polars, you would write df.filter(pl.col('price_per_litre') > 1.90).

Combining conditions with AND and OR

You can chain multiple conditions using AND (both must be true) or OR (at least one must be true). Let’s find Unleaded fuel transactions at BP Southbank:

SQL — editable
Figure 3: Unleaded transactions at BP Southbank only.

Both conditions must be satisfied for a row to appear. AND narrows the results; OR widens them. When mixing AND and OR, use parentheses to make the logic explicit: WHERE (a = 1 OR b = 2) AND c = 3.

In pandas this is df[(df['station'] == 'BP Southbank') & (df['fuel_type'] == 'Unleaded')]. In Polars: df.filter((pl.col('station') == 'BP Southbank') & (pl.col('fuel_type') == 'Unleaded')).

Matching a list of values with IN

When you need to match a column against several possible values, IN is cleaner than writing multiple OR conditions. Think of it as asking “is this value in my list?”

SQL — editable
Figure 4: Transactions from Caltex Bondi and Shell Fortitude Valley.

This is equivalent to WHERE station = 'Caltex Bondi' OR station = 'Shell Fortitude Valley', but far more readable — especially when the list grows. You can also negate it with NOT IN to exclude specific values.

The pandas equivalent is df[df['station'].isin(['Caltex Bondi', 'Shell Fortitude Valley'])]. In Polars: df.filter(pl.col('station').is_in(['Caltex Bondi', 'Shell Fortitude Valley'])).

Selecting a range with BETWEEN

BETWEEN filters rows where a value falls within an inclusive range. It is shorthand for column >= low AND column <= high:

SQL — editable
Figure 5: Prices in the $1.85 to $1.95 range (inclusive).

BETWEEN is inclusive on both ends — rows with exactly 1.85 or 1.95 are included. This is important to remember because some programming languages use half-open ranges by default. You can also use NOT BETWEEN to exclude a range.

In pandas: df[df['price_per_litre'].between(1.85, 1.95)]. In Polars: df.filter(pl.col('price_per_litre').is_between(1.85, 1.95)).

Finding missing values with IS NULL

NULL in SQL represents a missing or unknown value. You cannot use = NULL to find it — that always returns no rows because NULL is not equal to anything, not even itself. Instead, use IS NULL:

SQL — editable
Figure 6: Rows where litres is missing (NULL).

Two rows have a NULL litres value: the BP Southbank Diesel transaction and the Shell Fortitude Valley Premium transaction. To find rows that do have a value, use IS NOT NULL.

This is one of the most common mistakes for beginners: writing WHERE litres = NULL and getting zero rows. SQL uses three-valued logic (TRUE, FALSE, NULL), and any comparison with NULL yields NULL, which is treated as not-true by WHERE.

In pandas the equivalent is df[df['litres'].isna()]. In Polars: df.filter(pl.col('litres').is_null()).

Paginating results with LIMIT and OFFSET

LIMIT restricts the number of rows returned. OFFSET skips a number of rows before starting to return results. Together they enable pagination — showing results page by page:

SQL — editable
Figure 7: 5 rows starting from the 3rd row (skipping the first 2).

LIMIT 5 OFFSET 2 skips the first 2 rows and returns the next 5. This is the standard pattern for pagination: page 1 is LIMIT 5 OFFSET 0, page 2 is LIMIT 5 OFFSET 5, and so on.

Note that without an ORDER BY, the row order is not guaranteed, so pagination results could vary between runs. In practice, always combine LIMIT/OFFSET with ORDER BY for predictable pages.

In pandas: df.iloc[2:7] (positional slicing). In Polars: df.slice(2, 5).

SQL vs pandas vs Polars: filtering compared

If you are coming from Python, here is how the key filtering operations map:

The core advantage of SQL: WHERE is declarative. You describe what rows you want, not how to scan for them. The database engine chooses the fastest execution plan, potentially using indexes to skip millions of rows.

Try editing the code blocks above — change > to <, swap column names, add OR conditions, or combine WHERE with LIMIT to see how each pattern behaves.

References

Suhith Illesinghe
Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.