Joining two datasets on a shared key is one of the most fundamental operations in data analysis. Whether you are enriching a transaction table with customer details, combining survey responses with demographic data, or linking sensor readings to location metadata, joins are how you bring related tables together.
If you have used pandas, you know pd.merge() well. It works, but it only offers four join types: inner, left, right, and outer. Polars goes further. On top of the standard joins, Polars provides anti joins and semi joins as native, first-class operations — no workarounds, no indicator columns, no multi-step filtering.
This article covers seven patterns, each demonstrated with interactive code you can edit and run directly in your browser:
- Create two DataFrames with a shared key
join()— inner join (default)how='left'— left joinhow='full'— full/outer joinhow='anti'— anti join (rows NOT in right)how='semi'— semi join (rows in right, no extra columns)left_on/right_on— join on differently named columns
The datasets
We will use two small DataFrames inspired by Australian petrol stations. The first, stations, contains station metadata: name, state, and pump capacity. The second, fuel_sales, contains fuel sales records: station name, fuel type, and total litres sold. The key column they share is station, but not every station appears in both tables — this is intentional so we can see how different join types handle mismatches.
Notice that Ampol Hobart exists in stations but has no sales records, and 7-Eleven Parramatta exists in fuel_sales but has no station record. These mismatches will reveal the difference between each join type.
Inner join (default)
An inner join returns only the rows where the key exists in both DataFrames. This is the default behaviour of .join():
Ampol Hobart is dropped (no sales data), and 7-Eleven Parramatta is dropped (no station record). Only the three stations that appear in both tables make it through. This is identical to pd.merge(left, right, on='key') in pandas.
Left join
A left join keeps all rows from the left DataFrame and fills in null where there is no match on the right:
Ampol Hobart now appears with null for fuel_type and total_litres because there is no matching row in fuel_sales. 7-Eleven Parramatta is still excluded because it does not exist in the left table (stations).
Full join
A full join keeps all rows from both DataFrames. Important: Polars uses how='full', not how='outer' as in pandas:
Now both Ampol Hobart and 7-Eleven Parramatta appear. Ampol Hobart has nulls for the sales columns, and 7-Eleven Parramatta has nulls for the station metadata columns. In a full join with Polars, you will also see a station_right column — this is because the key column from the right DataFrame is preserved separately when rows do not match.
Anti join — a Polars superpower
An anti join returns rows from the left DataFrame that have no match in the right DataFrame. This is incredibly useful for finding missing data, orphan records, or rows that failed to link. In pandas, you would need a multi-step workaround: pd.merge() with indicator='_merge', then filter for 'left_only'. Polars makes it a single, native operation:
Only Ampol Hobart is returned — it is the only station with no matching row in fuel_sales. No columns from the right DataFrame are added. This is one of the clearest advantages Polars has over pandas: what takes three lines in pandas is a single, expressive call in Polars.
Semi join
A semi join is the opposite of an anti join: it returns rows from the left DataFrame that do have a match in the right DataFrame, but — crucially — it does not add any columns from the right. Think of it as a filter: "keep only the left rows that exist in the right table."
Three stations are returned: Caltex Bondi, BP Southbank, and Shell Fortitude Valley. These are the stations that have at least one record in fuel_sales. Ampol Hobart is excluded. Notice that no columns from the right DataFrame appear — the result has the same schema as stations. There is no direct equivalent in pandas.
Joining on different column names
Sometimes the key columns have different names in each DataFrame. Instead of on, use left_on and right_on:
When using left_on and right_on, Polars keeps only the left key column in the result by default. This avoids the duplicate key column problem you sometimes get with pd.merge().
Polars vs pandas: joining compared
If you are coming from pandas, here is how the key join operations map:
pd.merge(left, right, on='key')→left.join(right, on='key')how='outer'→how='full'(Polars renamed this for clarity)indicator='_merge'+ filter for'left_only'→how='anti'(native!)- No pandas equivalent →
how='semi'(native!) suffixes=('_x', '_y')→suffix='_right'left_on='a', right_on='b'→left_on='a', right_on='b'(same pattern)
The most significant advantages are anti and semi joins. In pandas, an anti join requires merging with an indicator column and then filtering — three operations that are easy to get wrong. In Polars, it is a single parameter: how='anti'. Similarly, a semi join in pandas requires either a merge-and-drop or an isin() filter on the key column. In Polars, it is simply how='semi'.
Try editing the code blocks above — swap stations and fuel_sales to see how the join direction changes, add new rows to either DataFrame, or change how to explore each join type.
References
- Polars documentation: polars.DataFrame.join
- Polars user guide: Joins
- Polars documentation: polars.LazyFrame.join (lazy equivalent)
- Pandas equivalent: pandas.DataFrame.merge