How do you join two DataFrames in Polars?

Use left_df.join(right_df, on='key_column') for an inner join. You can specify the join type with the how parameter: 'inner' (default), 'left', 'full', 'anti', or 'semi'.

What is the difference between Polars join and pandas merge?

Polars uses left.join(right, on='key') instead of pd.merge(left, right, on='key'). Polars uses how='full' instead of how='outer'. Polars natively supports anti and semi joins which have no direct pandas equivalent. Polars uses suffix='_right' instead of suffixes=('_x','_y').

What is an anti join in Polars?

An anti join returns rows from the left DataFrame that have no matching key in the right DataFrame. Use stations.join(fuel_sales, on='station', how='anti'). In pandas, you would need pd.merge() with indicator='_merge' and then filter for 'left_only', but Polars makes this a single native operation.

What is a semi join in Polars?

A semi join returns rows from the left DataFrame that have a matching key in the right DataFrame, but does not add any columns from the right DataFrame. Use stations.join(fuel_sales, on='station', how='semi'). There is no direct pandas equivalent.

How do you join on columns with different names in Polars?

Use left_on and right_on instead of on: left.join(right, left_on='col_a', right_on='col_b'). This is equivalent to pd.merge(left, right, left_on='col_a', right_on='col_b') in pandas.

How to Join Data with Polars — join, left, full, anti, semi Guide

Joining two datasets on a shared key is one of the most fundamental operations in data analysis. Whether you are enriching a transaction table with customer details, combining survey responses with demographic data, or linking sensor readings to location metadata, joins are how you bring related tables together.

If you have used pandas, you know pd.merge() well. It works, but it only offers four join types: inner, left, right, and outer. Polars goes further. On top of the standard joins, Polars provides anti joins and semi joins as native, first-class operations — no workarounds, no indicator columns, no multi-step filtering.

This article covers seven patterns, each demonstrated with interactive code you can edit and run directly in your browser:

Create two DataFrames with a shared key
join() — inner join (default)
how='left' — left join
how='full' — full/outer join
how='anti' — anti join (rows NOT in right)
how='semi' — semi join (rows in right, no extra columns)
left_on / right_on — join on differently named columns

The datasets

We will use two small DataFrames inspired by Australian petrol stations. The first, stations, contains station metadata: name, state, and pump capacity. The second, fuel_sales, contains fuel sales records: station name, fuel type, and total litres sold. The key column they share is station, but not every station appears in both tables — this is intentional so we can see how different join types handle mismatches.

Python — editable

import polars as pl

stations = pl.DataFrame({
    "station": [
        "Caltex Bondi", "BP Southbank",
        "Shell Fortitude Valley", "Ampol Hobart"
    ],
    "state": ["NSW", "VIC", "QLD", "TAS"],
    "capacity": [8, 12, 6, 4]
})

fuel_sales = pl.DataFrame({
    "station": [
        "Caltex Bondi", "Caltex Bondi",
        "BP Southbank", "BP Southbank",
        "Shell Fortitude Valley",
        "7-Eleven Parramatta"
    ],
    "fuel_type": [
        "Unleaded", "Diesel",
        "Unleaded", "Premium",
        "Diesel",
        "Unleaded"
    ],
    "total_litres": [
        4520, 3100,
        5200, 1800,
        2750,
        6100
    ]
})

print("stations:")
print(stations)
print("\nfuel_sales:")
print(fuel_sales)

Figure 1: Two DataFrames — stations (4 rows) and fuel_sales (6 rows). Ampol Hobart has no sales; 7-Eleven Parramatta has no station record.

Notice that Ampol Hobart exists in stations but has no sales records, and 7-Eleven Parramatta exists in fuel_sales but has no station record. These mismatches will reveal the difference between each join type.

Inner join (default)

An inner join returns only the rows where the key exists in both DataFrames. This is the default behaviour of .join():

Python — editable

stations.join(fuel_sales, on="station")

Figure 2: Inner join — only stations present in both DataFrames appear.

Ampol Hobart is dropped (no sales data), and 7-Eleven Parramatta is dropped (no station record). Only the three stations that appear in both tables make it through. This is identical to pd.merge(left, right, on='key') in pandas.

Left join

A left join keeps all rows from the left DataFrame and fills in null where there is no match on the right:

Python — editable

stations.join(fuel_sales, on="station", how="left")

Figure 3: Left join — Ampol Hobart is kept with null sales values. 7-Eleven Parramatta is excluded.

Ampol Hobart now appears with null for fuel_type and total_litres because there is no matching row in fuel_sales. 7-Eleven Parramatta is still excluded because it does not exist in the left table (stations).

Full join

A full join keeps all rows from both DataFrames. Important: Polars uses how='full', not how='outer' as in pandas:

Python — editable

stations.join(fuel_sales, on="station", how="full")

Figure 4: Full join — every row from both DataFrames appears. Nulls fill gaps on either side.

Now both Ampol Hobart and 7-Eleven Parramatta appear. Ampol Hobart has nulls for the sales columns, and 7-Eleven Parramatta has nulls for the station metadata columns. In a full join with Polars, you will also see a station_right column — this is because the key column from the right DataFrame is preserved separately when rows do not match.

Anti join — a Polars superpower

An anti join returns rows from the left DataFrame that have no match in the right DataFrame. This is incredibly useful for finding missing data, orphan records, or rows that failed to link. In pandas, you would need a multi-step workaround: pd.merge() with indicator='_merge', then filter for 'left_only'. Polars makes it a single, native operation:

Python — editable

stations.join(fuel_sales, on="station", how="anti")

Figure 5: Anti join — only Ampol Hobart appears because it has no sales records.

Only Ampol Hobart is returned — it is the only station with no matching row in fuel_sales. No columns from the right DataFrame are added. This is one of the clearest advantages Polars has over pandas: what takes three lines in pandas is a single, expressive call in Polars.

Semi join

A semi join is the opposite of an anti join: it returns rows from the left DataFrame that do have a match in the right DataFrame, but — crucially — it does not add any columns from the right. Think of it as a filter: "keep only the left rows that exist in the right table."

Python — editable

stations.join(fuel_sales, on="station", how="semi")

Figure 6: Semi join — three stations returned (those with sales), but no sales columns added.

Three stations are returned: Caltex Bondi, BP Southbank, and Shell Fortitude Valley. These are the stations that have at least one record in fuel_sales. Ampol Hobart is excluded. Notice that no columns from the right DataFrame appear — the result has the same schema as stations. There is no direct equivalent in pandas.

Joining on different column names

Sometimes the key columns have different names in each DataFrame. Instead of on, use left_on and right_on:

Python — editable

# Rename the key column in fuel_sales to simulate different names
fuel_sales_renamed = fuel_sales.rename({"station": "site_name"})

print("fuel_sales_renamed:")
print(fuel_sales_renamed)
print()

# Join using left_on and right_on
stations.join(
    fuel_sales_renamed,
    left_on="station",
    right_on="site_name"
)

Figure 7: Joining on differently named columns with left_on and right_on.

When using left_on and right_on, Polars keeps only the left key column in the result by default. This avoids the duplicate key column problem you sometimes get with pd.merge().

Polars vs pandas: joining compared

If you are coming from pandas, here is how the key join operations map:

pd.merge(left, right, on='key') → left.join(right, on='key')
how='outer' → how='full' (Polars renamed this for clarity)
indicator='_merge' + filter for 'left_only' → how='anti' (native!)
No pandas equivalent → how='semi' (native!)
suffixes=('_x', '_y') → suffix='_right'
left_on='a', right_on='b' → left_on='a', right_on='b' (same pattern)

The most significant advantages are anti and semi joins. In pandas, an anti join requires merging with an indicator column and then filtering — three operations that are easy to get wrong. In Polars, it is a single parameter: how='anti'. Similarly, a semi join in pandas requires either a merge-and-drop or an isin() filter on the key column. In Polars, it is simply how='semi'.

Try editing the code blocks above — swap stations and fuel_sales to see how the join direction changes, add new rows to either DataFrame, or change how to explore each join type.

Data Science Polars Python joins merge

References

Polars documentation: polars.DataFrame.join
Polars user guide: Joins
Polars documentation: polars.LazyFrame.join (lazy equivalent)
Pandas equivalent: pandas.DataFrame.merge

Suhith Illesinghe

Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.

How to Join Data with Polars?

The datasets

Inner join (default)

Left join

Full join

Anti join — a Polars superpower

Semi join

Joining on different column names

Polars vs pandas: joining compared

References

Related Articles