LEFT and RIGHT JOIN
Keep unmatched rows — every LEFT JOIN pattern from basic optional enrichment to anti-join gap analysis, plus when RIGHT JOIN makes sense and the ON vs WHERE trap
// Part 01
The Core Difference — Inclusion of Unmatched Rows
INNER JOIN keeps only rows that match on both sides. But many analytical questions specifically need rows that have no match on one side — customers who have never ordered, products that have never been sold, stores with no employees. These questions cannot be answered with INNER JOIN because unmatched rows are excluded by definition.
LEFT JOIN solves this. It keeps every row from the left table — whether or not it has a match in the right table. For rows that do have a match, right-side columns are populated normally. For rows with no match, right-side columns are filled with NULL. The left table is never filtered — every one of its rows appears in the result exactly once per matching right-side row (or once with NULLs if no match exists).
// Part 02
LEFT JOIN — Basic Syntax and Behaviour
// Part 03
The Anti-Join Pattern — LEFT JOIN + IS NULL
The most powerful LEFT JOIN pattern is the anti-join — finding rows in the left table that have no corresponding row in the right table. The structure is always the same: LEFT JOIN, then WHERE right_table.primary_key IS NULL. The IS NULL filter keeps only the rows where no match was found — the NULLs that LEFT JOIN added.
Customers who have never placed an order
Products that have never been sold
Employees with no store assignment
🎯 Pro Tip
The anti-join (LEFT JOIN + IS NULL) is one of the five patterns every SQL analyst must have memorised. It answers "which X has no Y?" — and that question appears constantly in production: churned users, unsold inventory, incomplete data pipelines, orphaned records. Whenever a business question contains the words "never", "no", "missing", or "without", reach for LEFT JOIN + IS NULL.
// Part 04
LEFT JOIN for Optional Enrichment — Adding Data Where Available
Not all LEFT JOINs are anti-joins. A common pattern is optional enrichment — adding data from a right table where it exists, while keeping all left rows regardless. The right-side columns may be NULL for some rows and populated for others, and both are valid in the result.
Orders with optional delivery details
Products with sales metrics — zero for unsold
Customers with loyalty tier benefits (optional reference table)
// Part 05
The ON vs WHERE Trap with LEFT JOIN
This is the most important technical subtlety about LEFT JOIN — placing a filter in WHERE instead of ON accidentally converts the LEFT JOIN to an INNER JOIN. Understanding why prevents one of the most common silent query bugs in production SQL.
How the trap works
Run both queries and compare the row counts. The WRONG version returns fewer rows — it excluded customers with no delivered orders. The RIGHT version returns all customers with 0 for those who have no delivered orders.
The decision rule
| Question | Condition goes in | Why |
|---|---|---|
| Should unmatched left rows appear with NULLs when this condition fails? | ON | Filter is part of the join relationship — unmatched rows are still kept |
| Should unmatched left rows be excluded when this condition is NULL? | WHERE | If exclusion is desired, WHERE is correct — but this converts LEFT to INNER |
| Is this condition on a left-table column only? | WHERE (safe) | Left-table conditions in WHERE never affect NULL row preservation |
| Is this condition on a right-table column? | ON (for LEFT JOIN) | Right-table conditions in WHERE kill NULL rows — move to ON to preserve them |
// Part 06
Multiple LEFT JOINs — Chaining Optional Relationships
Multiple LEFT JOINs chain together just like INNER JOINs. Each LEFT JOIN independently preserves all rows from the preceding result — adding NULLs for each table that has no match.
Mixing INNER and LEFT JOINs
You can mix INNER JOIN and LEFT JOIN in the same query. INNER JOIN for required relationships (the match is guaranteed), LEFT JOIN for optional ones (the match may not exist).
// Part 07
COALESCE with LEFT JOIN — Replacing NULLs
LEFT JOIN produces NULLs for unmatched rows. COALESCE replaces those NULLs with meaningful defaults — turning NULL into 0 for counts and sums, or into a label like 'No orders' for display.
// Part 08
RIGHT JOIN — The Mirror of LEFT JOIN
RIGHT JOIN keeps every row from the right table — whether or not it matches the left table. Left-side columns are NULL for unmatched right rows. RIGHT JOIN is the exact mirror of LEFT JOIN with the tables swapped.
In practice, RIGHT JOIN is almost never used in production code. Any RIGHT JOIN can be rewritten as a LEFT JOIN by simply swapping the table positions — and the LEFT JOIN version is universally considered more readable because most developers read queries left-to-right and expect the "anchor" table to appear first.
When RIGHT JOIN has legitimate use
RIGHT JOIN can be useful when refactoring a complex query where swapping tables is difficult — for example, when a CTE or subquery is already in the left position and adding a RIGHT JOIN to an existing chain avoids restructuring the whole query. It is also occasionally used in data pipeline code where the right table is always the "source of truth" that must be fully preserved.
// Part 09
LEFT JOIN for Data Quality Checks
LEFT JOIN with IS NULL is the standard technique for finding data quality issues — orphaned records, missing references, incomplete data pipelines, and referential integrity violations that should have been caught by FK constraints.
Find orders referencing non-existent customers
Find order items with no matching order
Find stores with no employees assigned
🎯 Pro Tip
Run LEFT JOIN + IS NULL checks on every table after a bulk data import or migration. These queries catch referential integrity violations that FK constraints would prevent in normal operation — but bulk imports sometimes bypass constraints. A 5-minute data quality check after any migration saves hours of debugging downstream when reports start showing wrong numbers.
// Part 10
LEFT JOIN vs NOT EXISTS vs NOT IN — Three Ways to Anti-Join
The anti-join (find rows with no match) can be written three ways. Each produces the same result for non-NULL values but differs in performance and NULL handling.
Method 1 — LEFT JOIN + IS NULL
Method 2 — NOT EXISTS
Method 3 — NOT IN
| Method | NULL safe? | Performance | Readability | Preferred when |
|---|---|---|---|---|
| LEFT JOIN + IS NULL | Yes | Good — uses index on join key | Moderate | Default choice for anti-join |
| NOT EXISTS | Yes | Good — short-circuits on first match | High | Readable intent, correlated check |
| NOT IN | No — NULLs break it | Worse on large lists | High | Only when subquery is guaranteed NULL-free |
// Part 11
What This Looks Like at Work
You are a data analyst at Sephora, India's beauty and personal care platform. The retention team runs a monthly lapsed-customer campaign. They need three lists: (1) customers who registered but never purchased, (2) customers who purchased at least once but not in the last 90 days, and (3) products that have been added to carts but never purchased. All three are anti-join or conditional LEFT JOIN queries — adapted here for FreshCart.
🎯 Pro Tip
Retention analytics is almost entirely built on LEFT JOIN patterns. The questions "who hasn't done X?", "who did X but not recently?", and "what hasn't been Y?" all require keeping rows that have no match in the activity table. Build these three query patterns into your personal SQL library — you will write variations of them every week in product analytics.
// Part 12
Interview Prep — 5 Questions With Complete Answers
LEFT JOIN returns every row from the left table whether or not it has a matching row in the right table. For rows that have a match, right-side columns are populated with the matched values. For rows with no match, right-side columns are filled with NULL. INNER JOIN returns only rows where the join condition is satisfied in both tables — unmatched rows from either side are excluded.
The key behavioural difference: INNER JOIN filters the left table — only rows with a match survive. LEFT JOIN never filters the left table — every row appears at least once. With a customers-to-orders LEFT JOIN, every customer appears in the result. With an INNER JOIN, only customers who have placed at least one order appear.
When to choose which: use INNER JOIN when you only care about rows that have related data on both sides — orders with known customers, order items with known products. The FK constraints in a well-designed schema mean INNER JOIN and LEFT JOIN produce the same result for required relationships. Use LEFT JOIN when you need all rows from the left table regardless of whether a match exists — all products including unsold, all customers including those who have never ordered, all stores including those with no employees. The presence of NULL values in the right-side columns after LEFT JOIN carries meaning: it signals the absence of a relationship that the query was looking for.
The LEFT JOIN + WHERE IS NULL pattern is called an anti-join. It finds rows in the left table that have no corresponding row in the right table. The structure is: SELECT left.* FROM left_table LEFT JOIN right_table ON left.key = right.key WHERE right.key IS NULL.
Why it works: LEFT JOIN produces NULLs in all right-side columns for every left row that found no match. The WHERE right.key IS NULL filter then keeps only those NULL rows — the unmatched ones. Rows that did match have a real key value (not NULL) and are excluded by the IS NULL filter. The result is exclusively the left rows with no right-side match.
This pattern answers "which X has no Y?" — customers who have never ordered, products that have never been sold, employees without store assignments, stores with no employees, orders with no line items. It is one of the five SQL patterns every analyst must know. The anti-join is functionally equivalent to NOT EXISTS and NOT IN (for non-NULL cases), but LEFT JOIN + IS NULL is often the most readable and performs well with proper indexing on the join column. It appears constantly in retention analysis (lapsed users), inventory management (unsold stock), and data quality checks (orphaned records).
LEFT JOIN adds NULL values to right-side columns for unmatched left rows. When a WHERE condition references a right-table column, it evaluates that condition on the NULL values. In SQL's three-valued logic, any comparison with NULL returns NULL — not TRUE and not FALSE. WHERE discards all rows where the condition does not evaluate to TRUE — which includes NULL. So rows with NULL right-side columns (the unmatched rows LEFT JOIN was supposed to preserve) are discarded by the WHERE filter.
Example: customers LEFT JOIN orders — all customers appear, with NULL order columns for unordered customers. Adding WHERE o.order_status = 'Delivered' evaluates 'Delivered' = NULL for unordered customers, which returns NULL, and WHERE discards those rows. The final result contains only customers with at least one delivered order — identical to INNER JOIN.
The fix: move the right-table filter to the ON clause. ON o.order_status = 'Delivered' is applied during the join itself, before NULL rows are added. It restricts which right-table rows can match, but does not filter the left table. Customers with no delivered orders still appear — they just have NULL order columns because no delivered order matched. The decision rule: for LEFT JOIN, any condition that references right-table columns and should preserve unmatched left rows belongs in ON. Only left-table conditions and aggregate-level conditions belong in WHERE.
To count records per group including groups with a count of zero, use LEFT JOIN from the group dimension table to the fact table, then COUNT the fact table's primary key (not COUNT(*)), combined with GROUP BY on the dimension table columns.
The key is COUNT(fact_table.pk) — not COUNT(*). COUNT(*) counts all rows including the NULL rows that LEFT JOIN adds for unmatched groups — it would return 1 for groups with no facts, not 0. COUNT(fact_table.pk) counts non-NULL values of the fact table's primary key. For unmatched groups, the pk is NULL, so COUNT returns 0. For matched groups, COUNT returns the number of matched rows.
Example: to count orders per store including stores with no orders — SELECT s.store_id, s.city, COUNT(o.order_id) AS order_count FROM stores AS s LEFT JOIN orders AS o ON s.store_id = o.store_id GROUP BY s.store_id, s.city. Stores with no orders have o.order_id = NULL, so COUNT(o.order_id) = 0. If you use COUNT(*) instead, stores with no orders would incorrectly show 1 (the NULL row is still a row). The same principle applies to SUM — COALESCE(SUM(o.total_amount), 0) returns 0 instead of NULL for groups with no matching rows, since SUM of an empty set returns NULL.
All three find rows in the left set that have no match in the right set, but they differ in NULL handling, performance, and readability. NOT IN is the most dangerous: NOT IN (subquery) returns zero rows if the subquery returns any NULL value — because NOT IN (1, 2, NULL) evaluates as NOT (col = 1 OR col = 2 OR col = NULL), and since col = NULL is always NULL, the entire expression is NULL for every row. This is a silent correctness bug. Only use NOT IN when you are absolutely certain the subquery column has no NULLs and will remain NULL-free.
NOT EXISTS is NULL-safe and usually the most semantically clear — "return rows where no matching row exists." The correlated subquery checks each left row individually. Most query optimisers convert NOT EXISTS to an efficient anti-join plan internally. NOT EXISTS short-circuits — it stops looking as soon as one match is found, making it efficient when matches are common.
LEFT JOIN + IS NULL is also NULL-safe and usually the most familiar pattern for analysts who think in terms of table joins. It produces an explicit NULL for unmatched rows that the IS NULL filter then selects. Performance is generally equivalent to NOT EXISTS on modern optimisers — both are translated to anti-join plans internally. LEFT JOIN + IS NULL has one practical advantage: it allows you to also SELECT the right-table columns (to see the NULLs themselves) which is useful for debugging. NOT EXISTS cannot return right-table columns at all since the subquery's SELECT is not referenced. Preference in production code: NOT EXISTS for maximum semantic clarity, LEFT JOIN + IS NULL for analytical queries where seeing the NULL columns is useful for verification.
// Part 13
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓LEFT JOIN keeps every row from the left table. Right-side columns are NULL for rows with no match. The left table is never filtered — every row appears at least once in the result.
- ✓Anti-join pattern: LEFT JOIN + WHERE right_table.pk IS NULL finds left rows with no match. This is the standard SQL idiom for "which X has no Y?" — customers with no orders, products never sold.
- ✓The ON vs WHERE trap: a WHERE condition on a right-table column converts LEFT JOIN to INNER JOIN by filtering out NULL rows. Move right-table filters to the ON clause to preserve unmatched left rows.
- ✓Left-table conditions in WHERE are always safe — they never affect NULL row preservation. Only right-table conditions in WHERE cause the trap.
- ✓COUNT(right_table.pk) returns 0 for unmatched groups. COUNT(*) returns 1 — it counts the NULL row. Always use COUNT(column) not COUNT(*) when counting after a LEFT JOIN with GROUP BY.
- ✓COALESCE wraps NULL aggregate results from LEFT JOIN: COALESCE(SUM(col), 0) and COALESCE(AVG(col), 0) replace NULL with zero for cleaner display.
- ✓RIGHT JOIN is LEFT JOIN with tables swapped — always rewritable as LEFT JOIN with the anchor table on the left. Prefer LEFT JOIN for readability.
- ✓NOT IN is dangerous when the subquery can return NULLs — it silently returns zero rows. Use LEFT JOIN + IS NULL or NOT EXISTS instead.
- ✓Multiple LEFT JOINs chain independently — each preserves all rows from the preceding result. INNER JOIN and LEFT JOIN can be mixed in the same query.
- ✓LEFT JOIN + IS NULL is the essential tool for data quality checks: orphaned records, missing references, incomplete pipeline outputs — all are found with this pattern.
What comes next
In Module 33, you learn FULL OUTER JOIN — combining both LEFT and RIGHT behaviour to keep all rows from both tables, with real use cases for reconciliation, gap analysis, and symmetric difference queries.
Module 33 → FULL OUTER JOINDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.