SQL — Module 34Intermediate

SELF JOIN

Join a table to itself — manager-employee hierarchies, comparing rows within the same table, finding duplicates, and every pattern where two rows of the same table need to be compared

12–16 min April 2026

Section 7 · Joins

Joins · 6 modulesModule 34

Introduction INNER LEFT FULL SELF CROSS

// Part 01

What a SELF JOIN Is and Why You Need It

Every JOIN you have written so far combines two different tables. A SELF JOIN joins a table to itself — the same table appears twice in the FROM clause under two different aliases. The database treats each alias as an independent copy of the table, allowing rows within the same table to be compared or related to each other.

This sounds unusual but it solves a class of problems that no other SQL construct handles cleanly:

👤

Hierarchical relationships

An employee's manager is also an employee in the same table. The only way to retrieve both the employee's data and their manager's data in one row is to join employees to itself.

🔍

Row-to-row comparison

Finding orders placed by the same customer on the same day, products in the same price range, or employees hired on the same date — all require comparing rows within the same table.

🔄

Duplicate detection

Finding rows that share the same values across multiple columns requires joining the table to itself and comparing each pair.

📊

Sequential analysis

Comparing each row to the previous or next row — price changes, order sequences, event timelines — uses SELF JOIN with an inequality condition.

// Part 02

Syntax — Aliases Make SELF JOIN Possible

A SELF JOIN requires two aliases for the same table. Without aliases, the database cannot distinguish which copy of the table each column reference belongs to. The aliases are what make the table appear as two independent sources.

SELF JOIN syntax — aliases are mandatory

-- Self join: same table, two aliases
SELECT
  a.column1  AS col_from_left_copy,
  b.column2  AS col_from_right_copy
FROM table_name AS a
JOIN table_name AS b ON a.key = b.other_key;

-- The database sees this as joining two independent tables
-- 'a' is one copy, 'b' is another copy
-- Any JOIN type works: INNER, LEFT, RIGHT, FULL OUTER

-- Mandatory: descriptive alias names
-- BAD:  employees AS e1, employees AS e2  (not descriptive)
-- GOOD: employees AS emp, employees AS mgr  (tells you the role of each copy)

// Part 03

Manager-Employee Hierarchy — The Classic SELF JOIN

The most common SELF JOIN use case is an organisational hierarchy stored in a single table. The employees table has a manager_id column that references the employee_id of another employee in the same table. To get both the employee's details and their manager's name in one row, you join employees to itself.

Basic manager lookup

Loading FreshCart DB…
-- Each employee with their manager's name
-- emp = the employee row
-- mgr = the manager row (same table, different alias)
SELECT
  emp.employee_id,
  emp.first_name || ' ' || emp.last_name    AS employee,
  emp.role,
  emp.salary,
  mgr.first_name || ' ' || mgr.last_name   AS manager_name,
  mgr.role                                  AS manager_role
FROM employees AS emp
JOIN employees AS mgr ON emp.manager_id = mgr.employee_id
ORDER BY emp.salary DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

Including employees with no manager — LEFT SELF JOIN

INNER SELF JOIN excludes employees with no manager (manager_id IS NULL — top-level employees). Use LEFT JOIN to include them with NULL in the manager columns.

Loading FreshCart DB…
-- All employees including top-level (no manager)
-- TOP-LEVEL employees: manager columns will be NULL
SELECT
  emp.employee_id,
  emp.first_name || ' ' || emp.last_name    AS employee,
  emp.role,
  emp.salary,
  emp.store_id,
  COALESCE(
    mgr.first_name || ' ' || mgr.last_name,
    'No manager (top level)'
  )                                         AS manager_name
FROM employees AS emp
LEFT JOIN employees AS mgr ON emp.manager_id = mgr.employee_id
ORDER BY mgr.employee_id NULLS FIRST, emp.salary DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

Three-level hierarchy — employee, manager, and manager's manager

Loading FreshCart DB…
-- Three-level hierarchy in one query
-- Chain: emp → direct_manager → senior_manager
SELECT
  emp.first_name || ' ' || emp.last_name         AS employee,
  emp.role                                        AS employee_role,
  mgr.first_name || ' ' || mgr.last_name         AS direct_manager,
  mgr.role                                        AS manager_role,
  senior.first_name || ' ' || senior.last_name   AS senior_manager,
  senior.role                                     AS senior_role
FROM employees AS emp
LEFT JOIN employees AS mgr    ON emp.manager_id = mgr.employee_id
LEFT JOIN employees AS senior ON mgr.manager_id = senior.employee_id
ORDER BY senior.employee_id NULLS LAST,
         mgr.employee_id    NULLS LAST,
         emp.salary DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

💡 Note

The three-level hierarchy query with explicit JOINs works well for shallow trees (2-3 levels). For trees of arbitrary depth — organisational charts, file systems, category hierarchies — you need recursive CTEs (covered in Module 56). Self JOIN handles a fixed known depth; recursive CTE handles unlimited depth.

// Part 04

Finding Direct Reports — Who Manages Whom

Loading FreshCart DB…
-- For each manager: list their direct reports
SELECT
  mgr.employee_id                             AS manager_id,
  mgr.first_name || ' ' || mgr.last_name      AS manager_name,
  mgr.role                                    AS manager_role,
  COUNT(emp.employee_id)                      AS direct_reports,
  GROUP_CONCAT(
    emp.first_name || ' ' || emp.last_name,
    ', '
  )                                           AS report_names
FROM employees AS mgr
JOIN employees AS emp ON emp.manager_id = mgr.employee_id
GROUP BY mgr.employee_id, mgr.first_name, mgr.last_name, mgr.role
ORDER BY direct_reports DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Managers whose direct reports earn more than them on average
SELECT
  mgr.employee_id,
  mgr.first_name || ' ' || mgr.last_name    AS manager,
  mgr.salary                                AS manager_salary,
  ROUND(AVG(emp.salary), 0)                 AS avg_report_salary,
  COUNT(emp.employee_id)                    AS report_count
FROM employees AS mgr
JOIN employees AS emp ON emp.manager_id = mgr.employee_id
GROUP BY mgr.employee_id, mgr.first_name, mgr.last_name, mgr.salary
HAVING AVG(emp.salary) > mgr.salary
ORDER BY avg_report_salary DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 05

Row-to-Row Comparison — Comparing Rows Within the Same Table

SELF JOIN with an inequality condition pairs each row with every other row that satisfies a condition — enabling comparisons, difference calculations, and relationship detection between rows of the same table.

Customers in the same city

Loading FreshCart DB…
-- Pairs of customers in the same city
-- a.customer_id < b.customer_id prevents duplicate pairs (A,B) and (B,A)
SELECT
  a.customer_id                             AS customer_a_id,
  a.first_name || ' ' || a.last_name        AS customer_a,
  b.customer_id                             AS customer_b_id,
  b.first_name || ' ' || b.last_name        AS customer_b,
  a.city
FROM customers AS a
JOIN customers AS b
  ON a.city = b.city
  AND a.customer_id < b.customer_id   -- prevent duplicate pairs and self-match
ORDER BY a.city, a.customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Products in the same price range

Loading FreshCart DB…
-- Pairs of products in the same category with similar prices
-- (within ₹20 of each other)
SELECT
  a.product_name                            AS product_a,
  b.product_name                            AS product_b,
  a.category,
  a.unit_price                              AS price_a,
  b.unit_price                              AS price_b,
  ABS(a.unit_price - b.unit_price)          AS price_difference
FROM products AS a
JOIN products AS b
  ON a.category = b.category
  AND a.product_id < b.product_id          -- no duplicates or self-match
  AND ABS(a.unit_price - b.unit_price) <= 20
ORDER BY a.category, price_difference;Ctrl + Enter to run
Loading FreshCart database in your browser…

Orders placed by the same customer on the same day

Loading FreshCart DB…
-- Customers who placed multiple orders on the same day
-- (potential duplicate order alert)
SELECT
  a.customer_id,
  a.order_id                                AS order_a,
  b.order_id                                AS order_b,
  a.order_date,
  a.total_amount                            AS amount_a,
  b.total_amount                            AS amount_b,
  a.payment_method                          AS method_a,
  b.payment_method                          AS method_b
FROM orders AS a
JOIN orders AS b
  ON a.customer_id = b.customer_id
  AND a.order_date = b.order_date
  AND a.order_id < b.order_id              -- prevent duplicates
ORDER BY a.customer_id, a.order_date;Ctrl + Enter to run
Loading FreshCart database in your browser…

The pair-deduplication rule

When comparing all pairs within a table, always add a.id < b.id to the ON condition. Without this, every pair (A, B) also appears as (B, A) — doubling the result. It also prevents rows from matching themselves (A, A). The less-than condition keeps only the canonical ordering of each pair.

Pair deduplication — why a.id < b.id is essential

-- WITHOUT deduplication:
-- Customer 1 and Customer 2 in Seattle appear as:
-- (1, 2, Seattle) ← pair
-- (2, 1, Seattle) ← duplicate of the same pair
-- (1, 1, Seattle) ← self-match (same customer!)
-- Total: 3 rows for 1 unique pair

-- WITH a.customer_id < b.customer_id:
-- Only (1, 2, Seattle) survives — unique, no self-match
-- Total: 1 row for 1 unique pair

-- Use < for unordered pairs (A,B same as B,A)
-- Use <> to exclude self-matches but keep both directions

// Part 06

Duplicate Detection Using SELF JOIN

SELF JOIN is a classic approach for finding duplicate records — rows that share the same values in key columns but have different primary keys. It is useful after bulk imports or when investigating data quality issues.

Loading FreshCart DB…
-- Find customers with the same email address
-- (should not exist with UNIQUE constraint, but useful post-migration check)
SELECT
  a.customer_id                             AS id_a,
  b.customer_id                             AS id_b,
  a.first_name || ' ' || a.last_name        AS name_a,
  b.first_name || ' ' || b.last_name        AS name_b,
  a.email,
  a.joined_date                             AS joined_a,
  b.joined_date                             AS joined_b
FROM customers AS a
JOIN customers AS b
  ON a.email = b.email
  AND a.customer_id < b.customer_id         -- one pair per duplicate set
ORDER BY a.email;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Find employees with identical salary AND same role
-- Could indicate cloned records from bulk import
SELECT
  a.employee_id                             AS emp_a,
  b.employee_id                             AS emp_b,
  a.first_name || ' ' || a.last_name        AS name_a,
  b.first_name || ' ' || b.last_name        AS name_b,
  a.role,
  a.salary,
  a.hire_date                               AS hired_a,
  b.hire_date                               AS hired_b
FROM employees AS a
JOIN employees AS b
  ON a.role   = b.role
  AND a.salary = b.salary
  AND a.employee_id < b.employee_id
ORDER BY a.role, a.salary;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 07

Sequential Comparison — Each Row vs the Previous

SELF JOIN with inequality conditions on an ordered column (like a date or ID) pairs each row with adjacent rows — allowing you to compute differences, detect sequences, and identify consecutive events.

Compare each order to the customer's previous order

Loading FreshCart DB…
-- Each customer's order compared to their immediately previous order
-- Shows order-to-order growth/decline per customer
SELECT
  curr.customer_id,
  curr.order_id                             AS current_order,
  curr.order_date                           AS current_date,
  curr.total_amount                         AS current_amount,
  prev.order_id                             AS previous_order,
  prev.order_date                           AS previous_date,
  prev.total_amount                         AS previous_amount,
  ROUND(curr.total_amount - prev.total_amount, 2) AS amount_change
FROM orders AS curr
JOIN orders AS prev
  ON curr.customer_id = prev.customer_id
  AND prev.order_date = (
    -- The most recent previous order date for this customer
    SELECT MAX(o2.order_date)
    FROM orders AS o2
    WHERE o2.customer_id = curr.customer_id
      AND o2.order_date < curr.order_date
  )
WHERE curr.order_status = 'Delivered'
  AND prev.order_status = 'Delivered'
ORDER BY curr.customer_id, curr.order_date;Ctrl + Enter to run
Loading FreshCart database in your browser…

Finding the next higher-paid employee in the same role

Loading FreshCart DB…
-- For each employee: find the next person in their role with a higher salary
SELECT
  emp.first_name || ' ' || emp.last_name    AS employee,
  emp.role,
  emp.salary,
  next_up.first_name || ' ' || next_up.last_name AS next_higher,
  next_up.salary                            AS next_salary,
  next_up.salary - emp.salary               AS salary_gap
FROM employees AS emp
JOIN employees AS next_up
  ON emp.role = next_up.role
  AND next_up.salary = (
    SELECT MIN(e2.salary)
    FROM employees AS e2
    WHERE e2.role = emp.role
      AND e2.salary > emp.salary
  )
ORDER BY emp.role, emp.salary;Ctrl + Enter to run
Loading FreshCart database in your browser…

🎯 Pro Tip

Sequential row comparison with SELF JOIN works but can be verbose — especially when finding "the immediately next row." Window functions (LAG and LEAD, Module 54) are the modern, cleaner solution for this pattern. SELF JOIN is worth knowing for databases that do not support window functions and for situations where the "next row" definition is complex enough that a subquery is clearer than a window expression.

// Part 08

Salary Comparison — SELF JOIN for Benchmarking

Loading FreshCart DB…
-- Employees earning more than the average salary in their department
-- Using SELF JOIN to compute department average
SELECT
  emp.employee_id,
  emp.first_name || ' ' || emp.last_name    AS employee,
  emp.role,
  emp.salary,
  dept_avg.avg_dept_salary,
  ROUND(emp.salary - dept_avg.avg_dept_salary, 0) AS above_avg_by
FROM employees AS emp
JOIN (
  SELECT
    department,
    ROUND(AVG(salary), 0)  AS avg_dept_salary
  FROM employees
  GROUP BY department
) AS dept_avg ON emp.department = dept_avg.department
WHERE emp.salary > dept_avg.avg_dept_salary
ORDER BY above_avg_by DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Employees earning less than their manager
SELECT
  emp.first_name || ' ' || emp.last_name    AS employee,
  emp.role                                  AS emp_role,
  emp.salary                                AS emp_salary,
  mgr.first_name || ' ' || mgr.last_name   AS manager,
  mgr.salary                                AS mgr_salary,
  mgr.salary - emp.salary                   AS salary_gap
FROM employees AS emp
JOIN employees AS mgr ON emp.manager_id = mgr.employee_id
WHERE emp.salary < mgr.salary
ORDER BY salary_gap DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 09

SELF JOIN for Network and Relationship Analysis

SELF JOIN enables relationship analysis — finding all pairs of entities that share a property, are connected, or should be grouped together. This is the SQL equivalent of a graph adjacency traversal at one hop.

Customers who share the same loyalty tier and city

Loading FreshCart DB…
-- Customer peer groups: same loyalty tier AND same city
-- Groups customers who could be targeted with identical campaigns
SELECT
  a.city,
  a.loyalty_tier,
  COUNT(*) OVER (PARTITION BY a.city, a.loyalty_tier) AS group_size,
  a.customer_id                             AS customer_a,
  a.first_name || ' ' || a.last_name        AS name_a,
  b.customer_id                             AS customer_b,
  b.first_name || ' ' || b.last_name        AS name_b
FROM customers AS a
JOIN customers AS b
  ON a.city = b.city
  AND a.loyalty_tier = b.loyalty_tier
  AND a.customer_id < b.customer_id
ORDER BY a.loyalty_tier, a.city, a.customer_id
LIMIT 15;Ctrl + Enter to run
Loading FreshCart database in your browser…

Products that always appear together in orders

Loading FreshCart DB…
-- Product co-occurrence: which products are ordered together?
-- Join order_items to itself on order_id to find co-purchased pairs
SELECT
  p1.product_name                           AS product_a,
  p2.product_name                           AS product_b,
  COUNT(DISTINCT oi1.order_id)              AS orders_together
FROM order_items AS oi1
JOIN order_items AS oi2
  ON oi1.order_id = oi2.order_id
  AND oi1.product_id < oi2.product_id       -- no duplicates
JOIN products AS p1 ON oi1.product_id = p1.product_id
JOIN products AS p2 ON oi2.product_id = p2.product_id
GROUP BY p1.product_name, p2.product_name
HAVING COUNT(DISTINCT oi1.order_id) >= 2
ORDER BY orders_together DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 10

What This Looks Like at Work

You are a people analytics engineer at DoorDash. The HR team needs an org chart report for a performance review cycle. They need: each employee listed with their direct manager, the manager's manager, how many peers (colleagues with the same manager) each employee has, and whether they earn more or less than their manager. Everything from a single self-referencing employees table.

2:00 PM

Requirements arrive

Org chart with four data points per employee: manager name, manager's manager name, peer count, and salary comparison vs manager.

2:20 PM

You break it into two parts

Part 1: three-alias self join for employee + manager + senior manager. Part 2: peer count subquery or aggregation. Combine with CTE.

Loading FreshCart DB…
-- Full org chart report: employee, manager, senior manager,
-- peer count, and salary comparison
WITH hierarchy AS (
  SELECT
    emp.employee_id,
    emp.first_name || ' ' || emp.last_name    AS employee_name,
    emp.role,
    emp.department,
    emp.salary                                AS emp_salary,
    mgr.employee_id                           AS manager_id,
    mgr.first_name || ' ' || mgr.last_name   AS manager_name,
    mgr.salary                                AS mgr_salary,
    senior.first_name || ' ' || senior.last_name AS senior_manager_name
  FROM employees AS emp
  LEFT JOIN employees AS mgr    ON emp.manager_id = mgr.employee_id
  LEFT JOIN employees AS senior ON mgr.manager_id = senior.employee_id
),
peer_counts AS (
  SELECT
    manager_id,
    COUNT(*) AS peer_count
  FROM employees
  WHERE manager_id IS NOT NULL
  GROUP BY manager_id
)
SELECT
  h.employee_name,
  h.role,
  h.department,
  h.emp_salary,
  COALESCE(h.manager_name, 'No manager')    AS manager,
  COALESCE(h.senior_manager_name, '—')      AS senior_manager,
  COALESCE(p.peer_count, 0)                 AS peers_same_manager,
  CASE
    WHEN h.mgr_salary IS NULL       THEN 'Top level'
    WHEN h.emp_salary > h.mgr_salary THEN 'Earns more than manager'
    WHEN h.emp_salary = h.mgr_salary THEN 'Same as manager'
    ELSE 'Earns less than manager'
  END                                       AS salary_vs_manager
FROM hierarchy AS h
LEFT JOIN peer_counts AS p ON h.manager_id = p.manager_id
ORDER BY h.department, h.mgr_salary DESC NULLS LAST, h.emp_salary DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

2:55 PM

Report delivered

One query — three self-join aliases, one peer-count CTE, one salary comparison CASE. The HR team gets a complete org chart view with all requested data points. The query runs in under 100ms on the employee table.

🎯 Pro Tip

When writing multi-level self joins (employee → manager → senior manager), always use LEFT JOIN not INNER JOIN between levels. INNER JOIN excludes employees whose manager has no manager — which would drop every second-level employee from a two-level tree. LEFT JOIN preserves all employees at every level, with NULL for levels that do not exist for that person.

// Part 11

Interview Prep — 5 Questions With Complete Answers

Q: What is a SELF JOIN and when would you use it?

A SELF JOIN joins a table to itself — the same table appears twice in the FROM clause under two different aliases. The database treats each alias as an independent copy, allowing rows within the same table to be related to or compared with each other. Without aliases, the database cannot distinguish which "copy" of the table each column reference belongs to, so aliases are mandatory.

The most common use case is hierarchical data stored in a single table — organisational charts where an employees table has a manager_id that references another row in the same table. To retrieve an employee alongside their manager's name, you join employees AS emp to employees AS mgr on emp.manager_id = mgr.employee_id. Both references go to the same physical table, but the aliases create two logical copies.

Other use cases: row-to-row comparison within the same table (finding customers in the same city by joining customers to itself on city = city and customer_id < customer_id), duplicate detection (same email or same name), product co-occurrence analysis (which products appear together in orders by joining order_items to itself on order_id), and sequential row comparison (comparing each order to the previous order for the same customer). The pattern to reach for: whenever you need to compare two rows of the same table, or when a foreign key in a table references the same table's primary key, SELF JOIN is the tool.

Q: How do you prevent duplicate pairs when using SELF JOIN for comparison?

When joining a table to itself to compare all pairs of rows, each pair (A, B) would appear twice — once as (A, B) and once as (B, A) — plus each row would match itself (A, A). Adding a.primary_key < b.primary_key to the ON condition eliminates both problems in one constraint.

The less-than condition works because it enforces that the left alias always has a smaller ID than the right alias. For any two rows 1 and 2: only (1, 2) survives — (2, 1) is excluded because 2 < 1 is false. Self-matches (1, 1) are excluded because 1 < 1 is false. The result contains each unique pair exactly once.

Choose the condition based on what you need: use a.id < b.id to get each unordered pair once — use this when (A,B) and (B,A) represent the same pair and you only need one row. Use a.id <> b.id to exclude self-matches but keep both directions — use this when order matters, such as "find all pairs where employee A earns more than employee B" (A earns more than B is different from B earns more than A). Use a.id < b.id for symmetric relationships (same city, same salary, same category); use a.id <> b.id for directional relationships (earns more than, reports to).

Q: How do you write a query to find an employee and their manager's name from a self-referencing table?

Use a SELF JOIN on the employees table with two aliases — one for the employee row and one for the manager row. The join condition connects the employee's manager_id foreign key to the manager row's employee_id primary key: FROM employees AS emp JOIN employees AS mgr ON emp.manager_id = mgr.employee_id.

SELECT emp.employee_id, emp.first_name AS employee_name, emp.role, mgr.first_name AS manager_name, mgr.role AS manager_role FROM employees AS emp JOIN employees AS mgr ON emp.manager_id = mgr.employee_id ORDER BY emp.employee_id. This INNER JOIN returns only employees who have a manager — employees with NULL manager_id are excluded.

To include employees with no manager (top-level executives), change INNER JOIN to LEFT JOIN: FROM employees AS emp LEFT JOIN employees AS mgr ON emp.manager_id = mgr.employee_id. Employees with no manager appear with NULL in all manager columns. Use COALESCE(mgr.first_name, 'No manager') to display a meaningful label. For a three-level hierarchy, add a third alias: LEFT JOIN employees AS senior ON mgr.manager_id = senior.employee_id. Each additional level adds one more alias and one more LEFT JOIN. For unlimited depth, use recursive CTEs (WITH RECURSIVE) instead of fixed-depth self joins.

Q: What is the difference between using a self join and a subquery for the manager lookup?

A self join retrieves manager data as additional columns in the same result row — you get employee columns and manager columns side by side. A correlated subquery in SELECT retrieves one value per employee row — useful for fetching a single manager attribute but not for fetching multiple manager attributes cleanly.

Self join for multiple attributes: SELECT emp.name, mgr.name AS manager, mgr.salary AS mgr_salary, mgr.role AS mgr_role FROM employees AS emp LEFT JOIN employees AS mgr ON emp.manager_id = mgr.employee_id. This fetches name, salary, and role from the manager row in one join. Clean and efficient — one table scan per alias.

Subquery for a single attribute: SELECT emp.name, (SELECT mgr.name FROM employees AS mgr WHERE mgr.employee_id = emp.manager_id) AS manager_name FROM employees AS emp. This is a correlated subquery — it executes once per employee row, which is O(n) subquery executions. For fetching one attribute it is readable, but for fetching multiple manager attributes it requires multiple correlated subqueries (one per attribute), each executing O(n) times — significantly less efficient than a single self join. The self join is the preferred approach for any query that needs more than one column from the related row. Use a correlated subquery only when you need one specific value and the join would be complex or unintuitive.

Q: How would you find all pairs of customers who placed orders on the same day?

This requires comparing orders rows within the same table — specifically joining orders to itself on both customer conditions and date equality. The self join: FROM orders AS a JOIN orders AS b ON a.order_date = b.order_date AND a.customer_id <> b.customer_id AND a.order_id < b.order_id.

The three ON conditions serve different purposes. a.order_date = b.order_date finds orders on the same day. a.customer_id <> b.customer_id ensures we are comparing different customers — not the same customer's two orders (though you might want those too if the question is about same-customer same-day). a.order_id < b.order_id deduplicates pairs so (order 5, order 8) appears once, not as both (5,8) and (8,5).

The full query: SELECT a.customer_id AS customer_a, b.customer_id AS customer_b, a.order_date, a.order_id AS order_a, b.order_id AS order_b, a.total_amount AS amount_a, b.total_amount AS amount_b FROM orders AS a JOIN orders AS b ON a.order_date = b.order_date AND a.customer_id <> b.customer_id AND a.order_id < b.order_id ORDER BY a.order_date, customer_a. This can then be joined to the customers table using both customer_a and customer_b IDs to retrieve customer names. This pattern — cross-table pair detection through a junction table self join — also underlies product co-occurrence analysis (which products appear in the same order) by joining order_items to itself on order_id with product_id deduplication.

// Part 12

Errors You Will Hit — And Exactly Why They Happen

SELF JOIN returns n² rows — every row matched with every other row

Cause: The ON condition is missing or too broad. Without a selective join condition, every row in alias 'a' matches every row in alias 'b' — producing a Cartesian product. With 20 employees, this gives 400 rows (20×20) instead of the expected 20. Forgetting the primary key deduplication condition (a.id < b.id) when doing pair comparison also inflates results by 2x.

Fix: Add the correct join condition. For hierarchy: ON emp.manager_id = mgr.employee_id — this is highly selective (each employee has at most one manager). For pair comparison: ON a.shared_column = b.shared_column AND a.id < b.id — both the shared value and the deduplication condition are required. Run SELECT COUNT(*) first to detect unexpected row counts before fetching all results.

Column reference is ambiguous — both aliases have the same column name

Cause: Without table alias prefixes, the database cannot determine which copy of the table a column belongs to. In a self join, both aliases share all the same column names. SELECT employee_id without a prefix is ambiguous between emp.employee_id and mgr.employee_id.

Fix: Prefix every column reference with its alias: emp.employee_id, mgr.employee_id. In a self join, every single column reference must be prefixed — there are no unambiguous columns because both sides have identical column names. Use descriptive aliases (emp/mgr not e1/e2) so the prefix communicates meaning: mgr.salary is immediately clear; e2.salary is not.

SELF JOIN for hierarchy excludes top-level employees — missing C-suite or senior managers

Cause: INNER JOIN on the manager relationship excludes employees whose manager_id is NULL (employees with no manager). Top-level executives, founders, or any employee without a manager record are silently dropped from the result. This is a common mistake — the intent is to see all employees with their manager, but the result is only employees who have a manager.

Fix: Use LEFT JOIN instead of INNER JOIN: FROM employees AS emp LEFT JOIN employees AS mgr ON emp.manager_id = mgr.employee_id. All employees appear; manager columns are NULL for top-level employees. Wrap manager columns with COALESCE: COALESCE(mgr.first_name || ' ' || mgr.last_name, 'No manager') for clean display.

Three-level self join drops employees at certain hierarchy levels

Cause: INNER JOIN between the manager and senior manager aliases excludes all employees whose manager has no manager. In a two-level organisation where most managers report to a top-level executive, INNER JOIN on the third level drops everyone whose manager is top-level (manager.manager_id IS NULL). The deeper the hierarchy chain with INNER JOINs, the more rows are dropped.

Fix: Use LEFT JOIN for every level beyond the first: LEFT JOIN employees AS mgr ON emp.manager_id = mgr.employee_id AND LEFT JOIN employees AS senior ON mgr.manager_id = senior.employee_id. INNER JOIN only the first level if the relationship is mandatory (every employee must have a manager). LEFT JOIN all subsequent levels so employees at the top of each sub-hierarchy are preserved with NULLs in the missing level columns.

Product co-occurrence self join on order_items is extremely slow

Cause: The self join on order_items compares every (item, item) pair within the same order. With 100 orders each containing 5 items, there are 500 item rows. The self join on order_id produces 500×500 = 250,000 candidate pairs, filtered down to 1,000 unique pairs per order (5×4/2 = 10 pairs × 100 orders). On a real e-commerce table with millions of order items, this becomes billions of candidate pairs — too expensive.

Fix: Limit the scope with WHERE before the self join: restrict to a specific time window (order_date >= '2024-01-01'), a specific category, or a specific store. This reduces the order_items rows before the self join multiplies them. Also ensure order_id is indexed — the join ON oi1.order_id = oi2.order_id must use an index. For production co-occurrence analysis at scale, pre-aggregate pairs into a summary table using a batch job rather than computing them on the fly.

Try It Yourself

Write two queries using SELF JOIN: (1) A hierarchy query showing each employee's name, role, salary, their direct manager's name and salary, and a 'salary_relationship' column showing 'Higher than manager', 'Lower than manager', 'Same as manager', or 'No manager' as appropriate. Include all employees. Sort by department then salary descending. (2) A pair comparison query finding all pairs of products in the same category where both products are in stock and their price difference is less than ₹50. Show product_a name, product_b name, category, price_a, price_b, and price_difference. Sort by category and price_difference ascending.

🎯 Key Takeaways

✓SELF JOIN joins a table to itself using two different aliases. The database treats each alias as an independent copy, enabling row-to-row comparisons within the same table.
✓Aliases are mandatory in SELF JOIN — without them the database cannot distinguish which "copy" of the table each column reference belongs to. Use descriptive aliases (emp/mgr, not e1/e2).
✓The classic use case: manager-employee hierarchy where manager_id is a FK to employee_id in the same table. JOIN employees AS emp to employees AS mgr on emp.manager_id = mgr.employee_id.
✓Use LEFT JOIN not INNER JOIN for hierarchy queries — INNER JOIN silently excludes employees with no manager (NULL manager_id), dropping top-level employees from the result.
✓For pair comparison: add a.id < b.id to the ON condition. This prevents duplicate pairs (A,B) and (B,A) and self-matches (A,A). Use <> instead of < when direction matters.
✓Three-level hierarchy: chain three aliases with LEFT JOINs — emp → mgr → senior. Use LEFT JOIN at every level to preserve employees at the top of each sub-hierarchy.
✓Product co-occurrence: JOIN order_items AS oi1 to order_items AS oi2 on order_id AND oi1.product_id < oi2.product_id. Then join each alias to products to get names. This finds which products are purchased together.
✓SELF JOIN for sequential comparison (comparing each row to its predecessor) is verbose. Window functions LAG and LEAD (Module 54) are the cleaner modern solution for this pattern.
✓Duplicate detection: JOIN table AS a to itself AS b on the columns that define a duplicate AND a.id < b.id. Rows that match on those columns are potential duplicates.
✓For arbitrary-depth hierarchies (unknown number of levels), SELF JOIN with fixed aliases handles only known depth. Recursive CTEs (Module 56) handle unlimited depth.

What comes next

In Module 35, you learn CROSS JOIN — generating all combinations, calendar tables, test data grids, and the scenarios where the Cartesian product is intentional and powerful.

Module 35 → CROSS JOIN

Discussion

Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.

Continue with GitHub