Filtering is the most fundamental operation in data analysis. Whether you need transactions from a single state, rows matching multiple criteria, or just the first few records, Polars gives you a clean, expression-based API that avoids the quirks of pandas indexing.
If you have used pandas, you have probably written df.loc[df['col'] > 50] or wrestled with .iloc vs .loc distinctions. Polars replaces all of that with .filter() for row selection, .select() for column selection, and .slice() for positional access — no index to worry about.
This article covers seven patterns, each demonstrated on an interactive dataset you can edit and run directly in your browser:
- filter() with a single condition: basic row filtering
- filter() with multiple conditions using &: AND logic
- filter() with is_in(): membership testing
- select(): column selection
- slice() and bracket indexing: row slicing by position
- filter() with is_null(): finding missing values
- Combining filter() and select(): rows and columns together
The dataset
We will use the same petrol station dataset from the Polars grouping tutorial. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel type, litres sold, price per litre, and the state where the station is located. Some values are intentionally missing (null) to mirror real-world data quality issues.
The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). Let's start by filtering rows for a single state.
Basic filtering with filter()
The simplest filter selects rows where a column matches a value. Use pl.col() to reference the column and a comparison operator to define the condition:
This returns all rows where state equals 'NSW'. The syntax is similar to pandas (df[df['state'] == 'NSW']), but Polars wraps it in a dedicated .filter() method that takes an expression. The result is always a DataFrame — never a copy-on-write view or an ambiguous slice.
Multiple conditions with & (AND)
To combine conditions, use & for AND and | for OR. Each condition must be wrapped in parentheses:
The parentheses around each condition are required because Python's & operator has higher precedence than ==. Without them, Python applies & to the two operands on either side of it before evaluating any comparison, so a == 'x' & b > 1 is read as a == ('x' & b) > 1, and you will get a confusing error. This is the same rule as pandas, but Polars makes it slightly more readable by keeping everything inside .filter() rather than using bracket notation.
You can also use | for OR logic. For example, df.filter((pl.col('state') == 'NSW') | (pl.col('state') == 'QLD')) would return rows from either state — though for multiple values, is_in() is cleaner.
Membership testing with is_in()
When you need to check if a column value is in a list of options, use .is_in() instead of chaining multiple OR conditions:
This is the Polars equivalent of pandas df[df['fuel type'].isin(['Diesel', 'Premium'])]. The method name is is_in (with an underscore) rather than isin. It accepts any iterable — a list, tuple, or even a Polars Series.
Column selection with select()
To pick specific columns from a DataFrame, use .select(). Pass a list of column names:
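Bracket indexing also works in Polars: df[['col1', 'col2']] returns a DataFrame with those columns, much as it does in pandas. The explicit .select() method is still the idiomatic choice, because it accepts expressions and behaves the same way on lazy frames, whereas bracket-based column selection is only available on eager DataFrames. Keeping column selection in .select() and row conditions in .filter() also avoids the ambiguity of a single bracket that could mean either rows or columns.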
You can also use expressions inside .select() for computed columns, but for simple column picking, a list of names is all you need.
Row slicing with slice()
To get rows by position (like pandas .iloc), use .slice(offset, length) or Python's bracket notation:
df.slice(0, 3) takes two arguments: the starting offset and the number of rows. The bracket notation df[0:3] works the same way as standard Python slicing. Both are the Polars equivalent of pandas df.iloc[0:3].
Note that Polars does not have .iloc or .loc — it uses .slice() for positional access and .filter() for conditional access. This simpler model eliminates the common pandas confusion between label-based and position-based indexing.
Filtering for null values
Real-world data has missing values. Use .is_null() to find rows where a column is null, or .is_not_null() to exclude them:
This is the Polars equivalent of pandas df[df['litres'].isna()]. Polars uses null (not NaN) for missing data, and the method names use underscores: is_null() and is_not_null(). You can combine null checks with other conditions — for example, df.filter(pl.col('litres').is_not_null() & (pl.col('state') == 'VIC')) to get VIC rows with valid litre values.
Combining filter() and select()
The real power comes from chaining operations. Filter rows first, then select columns — or vice versa:
This is the Polars equivalent of the pandas pattern df.loc[mask, ['col1', 'col2']]. In Polars you chain .filter() for the row condition and .select() for the column list. The result is always a clean DataFrame, and the order of operations is explicit and readable.
Polars vs pandas: filtering compared
If you are coming from pandas, here is how the key filtering operations map:
- df.loc[df['col'] > 50] → df.filter(pl.col('col') > 50)
- df[df['col'].isin([...])] → df.filter(pl.col('col').is_in([...]))
- df.loc[mask, ['col1','col2']] → df.filter(mask).select(['col1','col2'])
- df.iloc[0:3] → df.slice(0, 3) or df[0:3]
The core advantage: Polars eliminates the .loc vs .iloc confusion. There is one method for conditional row selection (.filter()), one for column selection (.select()), and one for positional slicing (.slice()). There is no index to manage, and each method name says what the operation does.
Try editing the code blocks above — change the filter column to price per litre, add an OR condition with |, or combine is_in() with select() to see how each pattern behaves.