When your data lives in more than one table, you need a way to combine them. That is exactly what JOIN does — it links rows from two (or more) tables based on a column they share.

Joins are the backbone of relational databases. A well-normalised schema splits data across tables to avoid duplication — stations in one table, transactions in another — and JOIN stitches them back together at query time.

This article covers six patterns, each demonstrated with interactive examples you can edit and run directly in your browser:

  1. Exploring the fuel transactions table
  2. Exploring the stations reference table
  3. INNER JOIN — return only matching rows
  4. LEFT JOIN — keep all rows from the left table
  5. Anti-join — find rows with no match
  6. CROSS JOIN — every combination of rows
  7. Self-join — compare rows within the same table

The dataset

We use two tables. The fuel table records petrol transactions — each row captures the station name, fuel_type, litres sold, and price_per_litre. The stations table is a reference table with metadata about each station: its suburb, brand, and state.

Crucially, the fuel table does not contain state or brand information. To get that, you need to join it to stations. Also note that 7-Eleven Parramatta appears in the stations table but has no transactions in fuel — this will be useful for demonstrating LEFT JOIN and anti-join behaviour.

Click Run on the first block to see the fuel transactions:

SQL — editable
Figure 1: First 5 rows of the fuel transactions table — no state or brand column.

The fuel table has 14 transactions across three stations: Caltex Bondi, BP Southbank, and Shell Fortitude Valley. Notice there is no state or brand column — that information lives in the stations table.

The stations table

Now let’s look at the reference table that holds station metadata:

SQL — editable
Figure 2: The stations reference table — 4 stations, including 7-Eleven with no transactions.

Four stations are registered. Three of them — Caltex Bondi, BP Southbank, and Shell Fortitude Valley — have matching rows in the fuel table. The fourth, 7-Eleven Parramatta, has no transactions at all. This mismatch is what makes joins interesting.

INNER JOIN — only matching rows

INNER JOIN returns only the rows where the join key exists in both tables. Think of it as the intersection — if a station appears in fuel and in stations, it is included. If it appears in only one table, it is excluded.

Visually, imagine two overlapping circles (a Venn diagram). INNER JOIN returns only the overlap.

SQL — editable
Figure 3: INNER JOIN — only stations present in both tables appear.

The ON f.station = s.station clause specifies the join key. We use table aliases (f for fuel, s for stations) to keep the query concise. Notice that 7-Eleven Parramatta is absent from the result — it has no matching fuel transactions.

In pandas, this is equivalent to pd.merge(fuel, stations, on='station') (which defaults to an inner join). In Polars, it is fuel.join(stations, on='station').

LEFT JOIN — keep everything from one table

LEFT JOIN returns every row from the left table, regardless of whether a match exists in the right table. Where there is no match, the right-side columns are filled with NULL.

This is the most common join in practice — you almost always want to preserve one table completely and enrich it with data from another.

SQL — editable
Figure 4: LEFT JOIN — 7-Eleven Parramatta appears with NULLs for fuel columns.

7-Eleven Parramatta now appears in the result with NULL values for fuel_type and litres, because there are no matching transactions. Every other station shows its transactions normally.

In pandas: pd.merge(stations, fuel, on='station', how='left'). In Polars: stations.join(fuel, on='station', how='left').

Anti-join — find rows with no match

An anti-join finds rows in one table that have no corresponding row in another table. SQL has no dedicated ANTI JOIN keyword — instead you combine LEFT JOIN with a WHERE ... IS NULL filter on the right table’s key:

SQL — editable
Figure 5: Anti-join — stations with zero fuel transactions.

Only 7-Eleven Parramatta is returned — it is the only station with no fuel transactions. This pattern is extremely useful for finding orphaned records, missing data, or unmatched keys. You can also write this using NOT EXISTS or NOT IN, but the LEFT JOIN + IS NULL pattern is the most readable and portable.

CROSS JOIN — every combination

CROSS JOIN produces the Cartesian product — every row from the left table paired with every row from the right table. With 14 fuel rows and 4 station rows, that gives 56 combinations. Be careful with large tables.

Common uses include generating test data, building reporting scaffolds (every product with every region), and creating date dimension tables.

SQL — editable
Figure 6: CROSS JOIN — first 10 rows of all possible combinations.

Every fuel transaction is paired with every station — even stations that have nothing to do with that transaction. The LIMIT 10 keeps the output manageable. Remove it to see all 56 rows.

Self-join — compare rows in the same table

A self-join joins a table to itself. This is useful when you want to compare rows within the same table — for example, comparing fuel prices across different fuel types at the same station.

You must use different aliases for the two copies of the table (f1 and f2). The condition f1.fuel_type < f2.fuel_type avoids duplicate pairs and self-comparisons:

SQL — editable
Figure 7: Self-join — comparing fuel type prices at the same station.

Each row shows a pair of fuel types at the same station with their respective prices. The < comparison on fuel_type ensures each pair appears only once (Diesel vs Premium, not also Premium vs Diesel) and prevents a row from joining with itself.

SQL vs pandas vs Polars: joins compared

If you are coming from Python, here is how the join operations map across tools:

The key concept is the same across all three: you specify the key column(s), the join type (inner, left, cross, anti), and the engine figures out how to match rows. SQL uses ON to specify the key, pandas uses on= (or left_on=/right_on=), and Polars uses on= (or left_on=/right_on=).

One difference: SQL also supports USING (column) as shorthand when both tables share the same column name. This is cleaner than ON t1.column = t2.column and avoids duplicating the join column in the result set.

Try editing the code blocks above — change INNER JOIN to LEFT JOIN, swap the table order, add a WHERE clause, or combine a join with GROUP BY to aggregate across tables.

References

Suhith Illesinghe
Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.