What is the difference between INNER JOIN and LEFT JOIN in SQL?

INNER JOIN returns only rows that have a matching value in both tables. LEFT JOIN returns every row from the left table and fills in NULLs for columns from the right table where there is no match. Use INNER JOIN when you only want matched data; use LEFT JOIN when you need to keep all rows from one table regardless of whether a match exists.

What is an anti-join in SQL?

An anti-join finds rows in one table that have no matching row in another table. SQL has no dedicated ANTI JOIN keyword — instead you use LEFT JOIN combined with WHERE right_table.key IS NULL. For example: SELECT s.* FROM stations s LEFT JOIN fuel f ON s.station = f.station WHERE f.station IS NULL returns stations with no fuel transactions.

When would you use a CROSS JOIN?

CROSS JOIN produces the Cartesian product — every row from one table paired with every row from the other. Common uses include generating test data, creating combinations for reporting grids (e.g. every product with every region), and building date scaffolds. Be careful with large tables as the result set grows multiplicatively.

What is the difference between ON and USING in SQL joins?

Both ON and USING specify the join condition. ON is more flexible and lets you join on columns with different names or use complex conditions: ON a.id = b.a_id. USING is shorthand when both tables have a column with the same name: USING (station). With USING, the shared column appears only once in the result set.

How does SQL JOIN compare to pandas merge and Polars join?

SQL INNER JOIN is equivalent to pandas pd.merge(df1, df2, on='key') or Polars df1.join(df2, on='key'). SQL LEFT JOIN corresponds to pandas merge with how='left' or Polars join with how='left'. The concepts are identical — only the syntax differs. SQL uses ON to specify keys, pandas uses on= or left_on=/right_on=, and Polars uses on= or left_on=/right_on=.

How to Join Data with SQL — INNER, LEFT, CROSS JOIN and Anti-Join

When your data lives in more than one table, you need a way to combine them. That is exactly what JOIN does — it links rows from two (or more) tables based on a column they share.

Joins are the backbone of relational databases. A well-normalised schema splits data across tables to avoid duplication — stations in one table, transactions in another — and JOIN stitches them back together at query time.

This article covers six patterns, each demonstrated with interactive examples you can edit and run directly in your browser:

Exploring the fuel transactions table
Exploring the stations reference table
INNER JOIN — return only matching rows
LEFT JOIN — keep all rows from the left table
Anti-join — find rows with no match
CROSS JOIN — every combination of rows
Self-join — compare rows within the same table

The dataset

We use two tables. The fuel table records petrol transactions — each row captures the station name, fuel_type, litres sold, and price_per_litre. The stations table is a reference table with metadata about each station: its suburb, brand, and state.

Crucially, the fuel table does not contain state or brand information. To get that, you need to join it to stations. Also note that 7-Eleven Parramatta appears in the stations table but has no transactions in fuel — this will be useful for demonstrating LEFT JOIN and anti-join behaviour.

Click Run on the first block to see the fuel transactions:

SQL — editable

SELECT * FROM fuel LIMIT 5;

Figure 1: First 5 rows of the fuel transactions table — no state or brand column.

The fuel table has 14 transactions across three stations: Caltex Bondi, BP Southbank, and Shell Fortitude Valley. Notice there is no state or brand column — that information lives in the stations table.

The stations table

Now let’s look at the reference table that holds station metadata:

SQL — editable

SELECT * FROM stations;

Figure 2: The stations reference table — 4 stations, including 7-Eleven with no transactions.

Four stations are registered. Three of them — Caltex Bondi, BP Southbank, and Shell Fortitude Valley — have matching rows in the fuel table. The fourth, 7-Eleven Parramatta, has no transactions at all. This mismatch is what makes joins interesting.

INNER JOIN — only matching rows

INNER JOIN returns only the rows where the join key exists in both tables. Think of it as the intersection — if a station appears in fuel and in stations, it is included. If it appears in only one table, it is excluded.

Visually, imagine two overlapping circles (a Venn diagram). INNER JOIN returns only the overlap.

SQL — editable

SELECT f.station, f.fuel_type, f.litres, s.suburb, s.brand
FROM fuel f
INNER JOIN stations s ON f.station = s.station;

Figure 3: INNER JOIN — only stations present in both tables appear.

The ON f.station = s.station clause specifies the join key. We use table aliases (f for fuel, s for stations) to keep the query concise. Notice that 7-Eleven Parramatta is absent from the result — it has no matching fuel transactions.

In pandas, this is equivalent to pd.merge(fuel, stations, on='station') (which defaults to an inner join). In Polars, it is fuel.join(stations, on='station').

LEFT JOIN — keep everything from one table

LEFT JOIN returns every row from the left table, regardless of whether a match exists in the right table. Where there is no match, the right-side columns are filled with NULL.

This is the most common join in practice — you almost always want to preserve one table completely and enrich it with data from another.

SQL — editable

SELECT s.station, s.brand, f.fuel_type, f.litres
FROM stations s
LEFT JOIN fuel f ON s.station = f.station;

Figure 4: LEFT JOIN — 7-Eleven Parramatta appears with NULLs for fuel columns.

7-Eleven Parramatta now appears in the result with NULL values for fuel_type and litres, because there are no matching transactions. Every other station shows its transactions normally.

In pandas: pd.merge(stations, fuel, on='station', how='left'). In Polars: stations.join(fuel, on='station', how='left').

Anti-join — find rows with no match

An anti-join finds rows in one table that have no corresponding row in another table. SQL has no dedicated ANTI JOIN keyword — instead you combine LEFT JOIN with a WHERE ... IS NULL filter on the right table’s key:

SQL — editable

SELECT s.station, s.brand
FROM stations s
LEFT JOIN fuel f ON s.station = f.station
WHERE f.station IS NULL;

Figure 5: Anti-join — stations with zero fuel transactions.

Only 7-Eleven Parramatta is returned — it is the only station with no fuel transactions. This pattern is extremely useful for finding orphaned records, missing data, or unmatched keys. You can also write this using NOT EXISTS or NOT IN, but the LEFT JOIN + IS NULL pattern is the most readable and portable.

CROSS JOIN — every combination

CROSS JOIN produces the Cartesian product — every row from the left table paired with every row from the right table. With 14 fuel rows and 4 station rows, that gives 56 combinations. Be careful with large tables.

Common uses include generating test data, building reporting scaffolds (every product with every region), and creating date dimension tables.

SQL — editable

SELECT f.station, f.fuel_type, s.brand
FROM fuel f
CROSS JOIN stations s
LIMIT 10;

Figure 6: CROSS JOIN — first 10 rows of all possible combinations.

Every fuel transaction is paired with every station — even stations that have nothing to do with that transaction. The LIMIT 10 keeps the output manageable. Remove it to see all 56 rows.

Self-join — compare rows in the same table

A self-join joins a table to itself. This is useful when you want to compare rows within the same table — for example, comparing fuel prices across different fuel types at the same station.

You must use different aliases for the two copies of the table (f1 and f2). The condition f1.fuel_type < f2.fuel_type avoids duplicate pairs and self-comparisons:

SQL — editable

SELECT f1.station,
       f1.fuel_type, f1.price_per_litre,
       f2.fuel_type AS compared_to,
       f2.price_per_litre AS other_price
FROM fuel f1
INNER JOIN fuel f2 ON f1.station = f2.station AND f1.fuel_type < f2.fuel_type;

Figure 7: Self-join — comparing fuel type prices at the same station.

Each row shows a pair of fuel types at the same station with their respective prices. The < comparison on fuel_type ensures each pair appears only once (Diesel vs Premium, not also Premium vs Diesel) and prevents a row from joining with itself.

SQL vs pandas vs Polars: joins compared

If you are coming from Python, here is how the join operations map across tools:

pd.merge(df1, df2, on='key') (pandas) → SELECT * FROM t1 INNER JOIN t2 ON t1.key = t2.key
df1.join(df2, on='key') (Polars) → SELECT * FROM t1 INNER JOIN t2 ON t1.key = t2.key
pd.merge(df1, df2, on='key', how='left') (pandas) → LEFT JOIN
df1.join(df2, on='key', how='left') (Polars) → LEFT JOIN
df1.join(df2, on='key', how='cross') (Polars) → CROSS JOIN (pandas uses pd.merge(df1, df2, how='cross'))
df1.join(df2, on='key', how='anti') (Polars) → LEFT JOIN ... WHERE t2.key IS NULL (anti-join)
Polars has a dedicated how='anti' parameter; pandas requires boolean masking after merge

The key concept is the same across all three: you specify the key column(s), the join type (inner, left, cross, anti), and the engine figures out how to match rows. SQL uses ON to specify the key, pandas uses on= (or left_on=/right_on=), and Polars uses on= (or left_on=/right_on=).

One difference: SQL also supports USING (column) as shorthand when both tables share the same column name. This is cleaner than ON t1.column = t2.column and avoids duplicating the join column in the result set.

Try editing the code blocks above — change INNER JOIN to LEFT JOIN, swap the table order, add a WHERE clause, or combine a join with GROUP BY to aggregate across tables.

Data Science SQL JOIN Merging

References

SQLite documentation: SELECT — JOIN
SQLite documentation: FROM clause and JOIN types
sql.js: SQLite compiled to WebAssembly
Pandas equivalent: How to Merge Data with Pandas
Polars equivalent: How to Join Data with Polars

Suhith Illesinghe

Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.

How to join data with SQL?

The dataset

The stations table

INNER JOIN — only matching rows

LEFT JOIN — keep everything from one table

Anti-join — find rows with no match

CROSS JOIN — every combination

Self-join — compare rows in the same table

SQL vs pandas vs Polars: joins compared

References

Related Articles