Filtering is the most common operation in SQL. Every time you query a database, you almost always need to narrow down the results — keep only rows that match a condition, exclude missing values, or grab a specific page of results.
SQL gives you a rich set of filtering tools: WHERE for conditions, IN for matching against a list, BETWEEN for ranges, LIKE for pattern matching, IS NULL for missing data, and LIMIT/OFFSET for pagination. If you have used pandas or Polars, think of WHERE as the SQL equivalent of boolean indexing (df[df['col'] > value]) or the .filter() method.
This article covers seven patterns, each demonstrated on an interactive dataset you can edit and run directly in your browser:
- Viewing the dataset with
SELECT * - Basic filtering with
WHERE - Combining conditions with
AND - Matching a list with
IN - Selecting a range with
BETWEEN - Finding missing values with
IS NULL - Paginating results with
LIMITandOFFSET
The dataset
We will use the same petrol station dataset from across this series. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel_type, litres sold, price_per_litre, and the state where the station is located. Some values are intentionally NULL to mirror real-world data quality issues.
Click Run on the first block to create the table and see the data:
The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). Notice the NULL values in the litres, price_per_litre, and state columns — we will use those later when we explore IS NULL.
Filtering rows with WHERE
WHERE is the workhorse of SQL filtering. It goes after the FROM clause and keeps only rows where the condition evaluates to TRUE. You can use comparison operators like =, !=, <, >, <=, and >=.
Let’s find all transactions where the price per litre exceeded $1.90:
The query scans every row in fuel and returns only those where price_per_litre > 1.90. Rows with a NULL price are automatically excluded because any comparison with NULL evaluates to NULL, not TRUE.
In pandas, the equivalent would be df[df['price_per_litre'] > 1.90]. In Polars, you would write df.filter(pl.col('price_per_litre') > 1.90).
Combining conditions with AND and OR
You can chain multiple conditions using AND (both must be true) or OR (at least one must be true). Let’s find Unleaded fuel transactions at BP Southbank:
Both conditions must be satisfied for a row to appear. AND narrows the results; OR widens them. When mixing AND and OR, use parentheses to make the logic explicit: WHERE (a = 1 OR b = 2) AND c = 3.
In pandas this is df[(df['station'] == 'BP Southbank') & (df['fuel_type'] == 'Unleaded')]. In Polars: df.filter((pl.col('station') == 'BP Southbank') & (pl.col('fuel_type') == 'Unleaded')).
Matching a list of values with IN
When you need to match a column against several possible values, IN is cleaner than writing multiple OR conditions. Think of it as asking “is this value in my list?”
This is equivalent to WHERE station = 'Caltex Bondi' OR station = 'Shell Fortitude Valley', but far more readable — especially when the list grows. You can also negate it with NOT IN to exclude specific values.
The pandas equivalent is df[df['station'].isin(['Caltex Bondi', 'Shell Fortitude Valley'])]. In Polars: df.filter(pl.col('station').is_in(['Caltex Bondi', 'Shell Fortitude Valley'])).
Selecting a range with BETWEEN
BETWEEN filters rows where a value falls within an inclusive range. It is shorthand for column >= low AND column <= high:
BETWEEN is inclusive on both ends — rows with exactly 1.85 or 1.95 are included. This is important to remember because some programming languages use half-open ranges by default. You can also use NOT BETWEEN to exclude a range.
In pandas: df[df['price_per_litre'].between(1.85, 1.95)]. In Polars: df.filter(pl.col('price_per_litre').is_between(1.85, 1.95)).
Finding missing values with IS NULL
NULL in SQL represents a missing or unknown value. You cannot use = NULL to find it — that always returns no rows because NULL is not equal to anything, not even itself. Instead, use IS NULL:
Two rows have a NULL litres value: the BP Southbank Diesel transaction and the Shell Fortitude Valley Premium transaction. To find rows that do have a value, use IS NOT NULL.
This is one of the most common mistakes for beginners: writing WHERE litres = NULL and getting zero rows. SQL uses three-valued logic (TRUE, FALSE, NULL), and any comparison with NULL yields NULL, which is treated as not-true by WHERE.
In pandas the equivalent is df[df['litres'].isna()]. In Polars: df.filter(pl.col('litres').is_null()).
Paginating results with LIMIT and OFFSET
LIMIT restricts the number of rows returned. OFFSET skips a number of rows before starting to return results. Together they enable pagination — showing results page by page:
LIMIT 5 OFFSET 2 skips the first 2 rows and returns the next 5. This is the standard pattern for pagination: page 1 is LIMIT 5 OFFSET 0, page 2 is LIMIT 5 OFFSET 5, and so on.
Note that without an ORDER BY, the row order is not guaranteed, so pagination results could vary between runs. In practice, always combine LIMIT/OFFSET with ORDER BY for predictable pages.
In pandas: df.iloc[2:7] (positional slicing). In Polars: df.slice(2, 5).
SQL vs pandas vs Polars: filtering compared
If you are coming from Python, here is how the key filtering operations map:
df[df['col'] > val](pandas) →SELECT * FROM t WHERE col > valdf.filter(pl.col('col') > val)(Polars) →SELECT * FROM t WHERE col > valdf[df['col'].isin([...]) ](pandas) →WHERE col IN (...)df['col'].between(a, b)(pandas) →WHERE col BETWEEN a AND bdf[df['col'].isna()](pandas) →WHERE col IS NULLdf.iloc[offset:offset+limit](pandas) →LIMIT n OFFSET mdf.slice(offset, limit)(Polars) →LIMIT n OFFSET m
The core advantage of SQL: WHERE is declarative. You describe what rows you want, not how to scan for them. The database engine chooses the fastest execution plan, potentially using indexes to skip millions of rows.
Try editing the code blocks above — change > to <, swap column names, add OR conditions, or combine WHERE with LIMIT to see how each pattern behaves.
References
- SQLite documentation: WHERE clause
- sql.js: SQLite compiled to WebAssembly
- Pandas equivalent: How to Filter and Slice Data with Pandas
- Polars equivalent: How to Filter and Slice Data with Polars