SELF JOIN
Join a table to itself — manager-employee hierarchies, comparing rows within the same table, finding duplicates, and every pattern where two rows of the same table need to be compared
// Part 01
What a SELF JOIN Is and Why You Need It
Every JOIN you have written so far combines two different tables. A SELF JOIN joins a table to itself — the same table appears twice in the FROM clause under two different aliases. The database treats each alias as an independent copy of the table, allowing rows within the same table to be compared or related to each other.
This sounds unusual but it solves a class of problems that no other SQL construct handles cleanly:
// Part 02
Syntax — Aliases Make SELF JOIN Possible
A SELF JOIN requires two aliases for the same table. Without aliases, the database cannot distinguish which copy of the table each column reference belongs to. The aliases are what make the table appear as two independent sources.
// Part 03
Manager-Employee Hierarchy — The Classic SELF JOIN
The most common SELF JOIN use case is an organisational hierarchy stored in a single table. The employees table has a manager_id column that references the employee_id of another employee in the same table. To get both the employee's details and their manager's name in one row, you join employees to itself.
Basic manager lookup
Including employees with no manager — LEFT SELF JOIN
INNER SELF JOIN excludes employees with no manager (manager_id IS NULL — top-level employees). Use LEFT JOIN to include them with NULL in the manager columns.
Three-level hierarchy — employee, manager, and manager's manager
// Part 04
Finding Direct Reports — Who Manages Whom
// Part 05
Row-to-Row Comparison — Comparing Rows Within the Same Table
SELF JOIN with an inequality condition pairs each row with every other row that satisfies a condition — enabling comparisons, difference calculations, and relationship detection between rows of the same table.
Customers in the same city
Products in the same price range
Orders placed by the same customer on the same day
The pair-deduplication rule
When comparing all pairs within a table, always add a.id < b.id to the ON condition. Without this, every pair (A, B) also appears as (B, A) — doubling the result. It also prevents rows from matching themselves (A, A). The less-than condition keeps only the canonical ordering of each pair.
// Part 06
Duplicate Detection Using SELF JOIN
SELF JOIN is a classic approach for finding duplicate records — rows that share the same values in key columns but have different primary keys. It is useful after bulk imports or when investigating data quality issues.
// Part 07
Sequential Comparison — Each Row vs the Previous
SELF JOIN with inequality conditions on an ordered column (like a date or ID) pairs each row with adjacent rows — allowing you to compute differences, detect sequences, and identify consecutive events.
Compare each order to the customer's previous order
Finding the next higher-paid employee in the same role
🎯 Pro Tip
Sequential row comparison with SELF JOIN works but can be verbose — especially when finding "the immediately next row." Window functions (LAG and LEAD, Module 54) are the modern, cleaner solution for this pattern. SELF JOIN is worth knowing for databases that do not support window functions and for situations where the "next row" definition is complex enough that a subquery is clearer than a window expression.
// Part 08
Salary Comparison — SELF JOIN for Benchmarking
// Part 09
SELF JOIN for Network and Relationship Analysis
SELF JOIN enables relationship analysis — finding all pairs of entities that share a property, are connected, or should be grouped together. This is the SQL equivalent of a graph adjacency traversal at one hop.
Customers who share the same loyalty tier and city
Products that always appear together in orders
// Part 10
What This Looks Like at Work
You are a people analytics engineer at DoorDash. The HR team needs an org chart report for a performance review cycle. They need: each employee listed with their direct manager, the manager's manager, how many peers (colleagues with the same manager) each employee has, and whether they earn more or less than their manager. Everything from a single self-referencing employees table.
🎯 Pro Tip
When writing multi-level self joins (employee → manager → senior manager), always use LEFT JOIN not INNER JOIN between levels. INNER JOIN excludes employees whose manager has no manager — which would drop every second-level employee from a two-level tree. LEFT JOIN preserves all employees at every level, with NULL for levels that do not exist for that person.
// Part 11
Interview Prep — 5 Questions With Complete Answers
A SELF JOIN joins a table to itself — the same table appears twice in the FROM clause under two different aliases. The database treats each alias as an independent copy, allowing rows within the same table to be related to or compared with each other. Without aliases, the database cannot distinguish which "copy" of the table each column reference belongs to, so aliases are mandatory.
The most common use case is hierarchical data stored in a single table — organisational charts where an employees table has a manager_id that references another row in the same table. To retrieve an employee alongside their manager's name, you join employees AS emp to employees AS mgr on emp.manager_id = mgr.employee_id. Both references go to the same physical table, but the aliases create two logical copies.
Other use cases: row-to-row comparison within the same table (finding customers in the same city by joining customers to itself on city = city and customer_id < customer_id), duplicate detection (same email or same name), product co-occurrence analysis (which products appear together in orders by joining order_items to itself on order_id), and sequential row comparison (comparing each order to the previous order for the same customer). The pattern to reach for: whenever you need to compare two rows of the same table, or when a foreign key in a table references the same table's primary key, SELF JOIN is the tool.
When joining a table to itself to compare all pairs of rows, each pair (A, B) would appear twice — once as (A, B) and once as (B, A) — plus each row would match itself (A, A). Adding a.primary_key < b.primary_key to the ON condition eliminates both problems in one constraint.
The less-than condition works because it enforces that the left alias always has a smaller ID than the right alias. For any two rows 1 and 2: only (1, 2) survives — (2, 1) is excluded because 2 < 1 is false. Self-matches (1, 1) are excluded because 1 < 1 is false. The result contains each unique pair exactly once.
Choose the condition based on what you need: use a.id < b.id to get each unordered pair once — use this when (A,B) and (B,A) represent the same pair and you only need one row. Use a.id <> b.id to exclude self-matches but keep both directions — use this when order matters, such as "find all pairs where employee A earns more than employee B" (A earns more than B is different from B earns more than A). Use a.id < b.id for symmetric relationships (same city, same salary, same category); use a.id <> b.id for directional relationships (earns more than, reports to).
Use a SELF JOIN on the employees table with two aliases — one for the employee row and one for the manager row. The join condition connects the employee's manager_id foreign key to the manager row's employee_id primary key: FROM employees AS emp JOIN employees AS mgr ON emp.manager_id = mgr.employee_id.
SELECT emp.employee_id, emp.first_name AS employee_name, emp.role, mgr.first_name AS manager_name, mgr.role AS manager_role FROM employees AS emp JOIN employees AS mgr ON emp.manager_id = mgr.employee_id ORDER BY emp.employee_id. This INNER JOIN returns only employees who have a manager — employees with NULL manager_id are excluded.
To include employees with no manager (top-level executives), change INNER JOIN to LEFT JOIN: FROM employees AS emp LEFT JOIN employees AS mgr ON emp.manager_id = mgr.employee_id. Employees with no manager appear with NULL in all manager columns. Use COALESCE(mgr.first_name, 'No manager') to display a meaningful label. For a three-level hierarchy, add a third alias: LEFT JOIN employees AS senior ON mgr.manager_id = senior.employee_id. Each additional level adds one more alias and one more LEFT JOIN. For unlimited depth, use recursive CTEs (WITH RECURSIVE) instead of fixed-depth self joins.
A self join retrieves manager data as additional columns in the same result row — you get employee columns and manager columns side by side. A correlated subquery in SELECT retrieves one value per employee row — useful for fetching a single manager attribute but not for fetching multiple manager attributes cleanly.
Self join for multiple attributes: SELECT emp.name, mgr.name AS manager, mgr.salary AS mgr_salary, mgr.role AS mgr_role FROM employees AS emp LEFT JOIN employees AS mgr ON emp.manager_id = mgr.employee_id. This fetches name, salary, and role from the manager row in one join. Clean and efficient — one table scan per alias.
Subquery for a single attribute: SELECT emp.name, (SELECT mgr.name FROM employees AS mgr WHERE mgr.employee_id = emp.manager_id) AS manager_name FROM employees AS emp. This is a correlated subquery — it executes once per employee row, which is O(n) subquery executions. For fetching one attribute it is readable, but for fetching multiple manager attributes it requires multiple correlated subqueries (one per attribute), each executing O(n) times — significantly less efficient than a single self join. The self join is the preferred approach for any query that needs more than one column from the related row. Use a correlated subquery only when you need one specific value and the join would be complex or unintuitive.
This requires comparing orders rows within the same table — specifically joining orders to itself on both customer conditions and date equality. The self join: FROM orders AS a JOIN orders AS b ON a.order_date = b.order_date AND a.customer_id <> b.customer_id AND a.order_id < b.order_id.
The three ON conditions serve different purposes. a.order_date = b.order_date finds orders on the same day. a.customer_id <> b.customer_id ensures we are comparing different customers — not the same customer's two orders (though you might want those too if the question is about same-customer same-day). a.order_id < b.order_id deduplicates pairs so (order 5, order 8) appears once, not as both (5,8) and (8,5).
The full query: SELECT a.customer_id AS customer_a, b.customer_id AS customer_b, a.order_date, a.order_id AS order_a, b.order_id AS order_b, a.total_amount AS amount_a, b.total_amount AS amount_b FROM orders AS a JOIN orders AS b ON a.order_date = b.order_date AND a.customer_id <> b.customer_id AND a.order_id < b.order_id ORDER BY a.order_date, customer_a. This can then be joined to the customers table using both customer_a and customer_b IDs to retrieve customer names. This pattern — cross-table pair detection through a junction table self join — also underlies product co-occurrence analysis (which products appear in the same order) by joining order_items to itself on order_id with product_id deduplication.
// Part 12
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓SELF JOIN joins a table to itself using two different aliases. The database treats each alias as an independent copy, enabling row-to-row comparisons within the same table.
- ✓Aliases are mandatory in SELF JOIN — without them the database cannot distinguish which "copy" of the table each column reference belongs to. Use descriptive aliases (emp/mgr, not e1/e2).
- ✓The classic use case: manager-employee hierarchy where manager_id is a FK to employee_id in the same table. JOIN employees AS emp to employees AS mgr on emp.manager_id = mgr.employee_id.
- ✓Use LEFT JOIN not INNER JOIN for hierarchy queries — INNER JOIN silently excludes employees with no manager (NULL manager_id), dropping top-level employees from the result.
- ✓For pair comparison: add a.id < b.id to the ON condition. This prevents duplicate pairs (A,B) and (B,A) and self-matches (A,A). Use <> instead of < when direction matters.
- ✓Three-level hierarchy: chain three aliases with LEFT JOINs — emp → mgr → senior. Use LEFT JOIN at every level to preserve employees at the top of each sub-hierarchy.
- ✓Product co-occurrence: JOIN order_items AS oi1 to order_items AS oi2 on order_id AND oi1.product_id < oi2.product_id. Then join each alias to products to get names. This finds which products are purchased together.
- ✓SELF JOIN for sequential comparison (comparing each row to its predecessor) is verbose. Window functions LAG and LEAD (Module 54) are the cleaner modern solution for this pattern.
- ✓Duplicate detection: JOIN table AS a to itself AS b on the columns that define a duplicate AND a.id < b.id. Rows that match on those columns are potential duplicates.
- ✓For arbitrary-depth hierarchies (unknown number of levels), SELF JOIN with fixed aliases handles only known depth. Recursive CTEs (Module 56) handle unlimited depth.
What comes next
In Module 35, you learn CROSS JOIN — generating all combinations, calendar tables, test data grids, and the scenarios where the Cartesian product is intentional and powerful.
Module 35 → CROSS JOINDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.