What is the difference between WHERE and HAVING in SQL?

WHERE filters individual rows before any grouping or aggregation occurs. HAVING filters groups after a GROUP BY aggregation. For example, WHERE price > 1.90 removes rows before grouping, while HAVING COUNT(*) > 3 removes groups that have too few rows after aggregation.

Should I use IN or multiple OR conditions in SQL?

IN is preferred over chaining multiple OR conditions. WHERE station IN ('Caltex Bondi', 'Shell Fortitude Valley') is cleaner and easier to read than WHERE station = 'Caltex Bondi' OR station = 'Shell Fortitude Valley'. Both produce the same result, but IN scales better when the list grows.

Why does = NULL not work in SQL?

In SQL, NULL represents an unknown value. Any comparison with NULL using = returns NULL (not TRUE or FALSE), so WHERE column = NULL matches zero rows. You must use IS NULL or IS NOT NULL instead. This is part of SQL's three-valued logic: TRUE, FALSE, and NULL.

What do the % and _ wildcards mean in SQL LIKE?

In a LIKE pattern, % matches zero or more characters and _ matches exactly one character. For example, LIKE 'B%' matches any string starting with B, LIKE '%bank' matches any string ending with bank, and LIKE '_P%' matches strings where the second character is P.

How to Filter Data with SQL — WHERE, IN, BETWEEN, LIKE, IS NULL

Filtering is the most common operation in SQL. Every time you query a database, you almost always need to narrow down the results — keep only rows that match a condition, exclude missing values, or grab a specific page of results.

SQL gives you a rich set of filtering tools: WHERE for conditions, IN for matching against a list, BETWEEN for ranges, LIKE for pattern matching, IS NULL for missing data, and LIMIT/OFFSET for pagination. If you have used pandas or Polars, think of WHERE as the SQL equivalent of boolean indexing (df[df['col'] > value]) or the .filter() method.

This article covers seven patterns, each demonstrated on an interactive dataset you can edit and run directly in your browser:

Viewing the dataset with SELECT *
Basic filtering with WHERE
Combining conditions with AND
Matching a list with IN
Selecting a range with BETWEEN
Finding missing values with IS NULL
Paginating results with LIMIT and OFFSET

The dataset

We will use the same petrol station dataset from across this series. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel_type, litres sold, price_per_litre, and the state where the station is located. Some values are intentionally NULL to mirror real-world data quality issues.

Click Run on the first block to create the table and see the data:

SQL — editable

SELECT * FROM fuel;

Figure 1: Fuel transactions — 15 rows, 5 columns.

The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). Notice the NULL values in the litres, price_per_litre, and state columns — we will use those later when we explore IS NULL.

Filtering rows with `WHERE`

WHERE is the workhorse of SQL filtering. It goes after the FROM clause and keeps only rows where the condition evaluates to TRUE. You can use comparison operators like =, !=, <, >, <=, and >=.

Let’s find all transactions where the price per litre exceeded $1.90:

SQL — editable

SELECT * FROM fuel
WHERE price_per_litre > 1.90;

Figure 2: Only rows where price_per_litre exceeds 1.90.

The query scans every row in fuel and returns only those where price_per_litre > 1.90. Rows with a NULL price are automatically excluded because any comparison with NULL evaluates to NULL, not TRUE.

In pandas, the equivalent would be df[df['price_per_litre'] > 1.90]. In Polars, you would write df.filter(pl.col('price_per_litre') > 1.90).

Combining conditions with `AND` and `OR`

You can chain multiple conditions using AND (both must be true) or OR (at least one must be true). Let’s find Unleaded fuel transactions at BP Southbank:

SQL — editable

SELECT * FROM fuel
WHERE station = 'BP Southbank'
  AND fuel_type = 'Unleaded';

Figure 3: Unleaded transactions at BP Southbank only.

Both conditions must be satisfied for a row to appear. AND narrows the results; OR widens them. When mixing AND and OR, use parentheses to make the logic explicit: WHERE (a = 1 OR b = 2) AND c = 3.

In pandas this is df[(df['station'] == 'BP Southbank') & (df['fuel_type'] == 'Unleaded')]. In Polars: df.filter((pl.col('station') == 'BP Southbank') & (pl.col('fuel_type') == 'Unleaded')).

Matching a list of values with `IN`

When you need to match a column against several possible values, IN is cleaner than writing multiple OR conditions. Think of it as asking “is this value in my list?”

SQL — editable

SELECT * FROM fuel
WHERE station IN ('Caltex Bondi', 'Shell Fortitude Valley');

Figure 4: Transactions from Caltex Bondi and Shell Fortitude Valley.

This is equivalent to WHERE station = 'Caltex Bondi' OR station = 'Shell Fortitude Valley', but far more readable — especially when the list grows. You can also negate it with NOT IN to exclude specific values.

The pandas equivalent is df[df['station'].isin(['Caltex Bondi', 'Shell Fortitude Valley'])]. In Polars: df.filter(pl.col('station').is_in(['Caltex Bondi', 'Shell Fortitude Valley'])).

Selecting a range with `BETWEEN`

BETWEEN filters rows where a value falls within an inclusive range. It is shorthand for column >= low AND column <= high:

SQL — editable

SELECT * FROM fuel
WHERE price_per_litre BETWEEN 1.85 AND 1.95;

Figure 5: Prices in the $1.85 to $1.95 range (inclusive).

BETWEEN is inclusive on both ends — rows with exactly 1.85 or 1.95 are included. This is important to remember because some programming languages use half-open ranges by default. You can also use NOT BETWEEN to exclude a range.

In pandas: df[df['price_per_litre'].between(1.85, 1.95)]. In Polars: df.filter(pl.col('price_per_litre').is_between(1.85, 1.95)).

Finding missing values with `IS NULL`

NULL in SQL represents a missing or unknown value. You cannot use = NULL to find it — that always returns no rows because NULL is not equal to anything, not even itself. Instead, use IS NULL:

SQL — editable

SELECT * FROM fuel
WHERE litres IS NULL;

Figure 6: Rows where litres is missing (NULL).

Two rows have a NULL litres value: the BP Southbank Diesel transaction and the Shell Fortitude Valley Premium transaction. To find rows that do have a value, use IS NOT NULL.

This is one of the most common mistakes for beginners: writing WHERE litres = NULL and getting zero rows. SQL uses three-valued logic (TRUE, FALSE, NULL), and any comparison with NULL yields NULL, which is treated as not-true by WHERE.

In pandas the equivalent is df[df['litres'].isna()]. In Polars: df.filter(pl.col('litres').is_null()).

Paginating results with `LIMIT` and `OFFSET`

LIMIT restricts the number of rows returned. OFFSET skips a number of rows before starting to return results. Together they enable pagination — showing results page by page:

SQL — editable

SELECT * FROM fuel
LIMIT 5 OFFSET 2;

Figure 7: 5 rows starting from the 3rd row (skipping the first 2).

LIMIT 5 OFFSET 2 skips the first 2 rows and returns the next 5. This is the standard pattern for pagination: page 1 is LIMIT 5 OFFSET 0, page 2 is LIMIT 5 OFFSET 5, and so on.

Note that without an ORDER BY, the row order is not guaranteed, so pagination results could vary between runs. In practice, always combine LIMIT/OFFSET with ORDER BY for predictable pages.

In pandas: df.iloc[2:7] (positional slicing). In Polars: df.slice(2, 5).

SQL vs pandas vs Polars: filtering compared

If you are coming from Python, here is how the key filtering operations map:

df[df['col'] > val] (pandas) → SELECT * FROM t WHERE col > val
df.filter(pl.col('col') > val) (Polars) → SELECT * FROM t WHERE col > val
df[df['col'].isin([...]) ] (pandas) → WHERE col IN (...)
df['col'].between(a, b) (pandas) → WHERE col BETWEEN a AND b
df[df['col'].isna()] (pandas) → WHERE col IS NULL
df.iloc[offset:offset+limit] (pandas) → LIMIT n OFFSET m
df.slice(offset, limit) (Polars) → LIMIT n OFFSET m

The core advantage of SQL: WHERE is declarative. You describe what rows you want, not how to scan for them. The database engine chooses the fastest execution plan, potentially using indexes to skip millions of rows.

Try editing the code blocks above — change > to <, swap column names, add OR conditions, or combine WHERE with LIMIT to see how each pattern behaves.

Data Science SQL WHERE Filtering

References

SQLite documentation: WHERE clause
sql.js: SQLite compiled to WebAssembly
Pandas equivalent: How to Filter and Slice Data with Pandas
Polars equivalent: How to Filter and Slice Data with Polars

Suhith Illesinghe

Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.

How to filter data with SQL?

The dataset

Filtering rows with WHERE

Combining conditions with AND and OR

Matching a list of values with IN

Selecting a range with BETWEEN

Finding missing values with IS NULL

Paginating results with LIMIT and OFFSET

SQL vs pandas vs Polars: filtering compared

References

Related Articles

Filtering rows with `WHERE`

Combining conditions with `AND` and `OR`

Matching a list of values with `IN`

Selecting a range with `BETWEEN`

Finding missing values with `IS NULL`

Paginating results with `LIMIT` and `OFFSET`