Python · SQL · Web Dev · Java · AI/ML tracks launching soon — your one platform for all of IT
SQL — Module 56Advanced
Recursive CTEs
Query hierarchical and graph data without fixed-depth self-joins — org charts, category trees, bill-of-materials, path finding, and number generation using WITH RECURSIVE
Most SQL queries work on flat data — each row is independent. But real-world data is often hierarchical: an employee reports to a manager who reports to a director who reports to a VP. A product category contains subcategories which contain sub-subcategories. A delivery route visits nodes that connect to other nodes.
Querying these structures with ordinary SQL requires knowing the depth in advance. To find all employees under a VP you would write a chain of self-joins — one join per level. This breaks the moment the hierarchy deepens or shallows. Recursive CTEs solve this cleanly: they repeat a query against their own previous output until no new rows are produced, traversing hierarchies of any depth without knowing the depth in advance.
The problem a recursive CTE solves
-- WITHOUT recursive CTE: fixed-depth self-joins (brittle)
-- Works only if hierarchy is exactly 3 levels deep
SELECT e1.name AS level1, e2.name AS level2, e3.name AS level3
FROM employees e1
JOIN employees e2 ON e2.manager_id = e1.employee_id
JOIN employees e3 ON e3.manager_id = e2.employee_id
WHERE e1.manager_id IS NULL;
-- Breaks when someone adds a 4th level
-- WITH RECURSIVE: traverses any depth automatically
WITH RECURSIVE org_tree AS (
-- Anchor: start at the root (no manager)
SELECT employee_id, name, manager_id, 0 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: join each employee to their manager's row
SELECT e.employee_id, e.name, e.manager_id, ot.depth + 1
FROM employees AS e
JOIN org_tree AS ot ON e.manager_id = ot.employee_id
)
SELECT * FROM org_tree ORDER BY depth, name;
-- Works for 1 level or 100 levels — same query
// Part 02
Anatomy of a Recursive CTE
A recursive CTE has exactly two parts separated by UNION ALL: the anchor member (a regular SELECT that starts the recursion) and the recursive member (a SELECT that references the CTE itself, joining the previous iteration's output to the base table). The database alternates between running the recursive member and accumulating results until the recursive member produces zero new rows.
Recursive CTE structure — every part explained
WITH RECURSIVE cte_name AS (
-- ── ANCHOR MEMBER ───────────────────────────────────────────
-- Runs exactly ONCE. Seeds the recursion.
-- Returns the starting rows (roots, first elements, initial values).
-- Must NOT reference cte_name.
SELECT col1, col2, 1 AS depth, col1::TEXT AS path
FROM base_table
WHERE starting_condition
UNION ALL -- always UNION ALL (not UNION) in recursive CTEs
-- ── RECURSIVE MEMBER ────────────────────────────────────────
-- Runs REPEATEDLY until it produces 0 new rows.
-- References cte_name — joins the PREVIOUS iteration's output.
-- Each iteration sees only the rows added by the PREVIOUS iteration.
SELECT bt.col1, bt.col2, r.depth + 1, r.path || ' > ' || bt.col1
FROM base_table AS bt
JOIN cte_name AS r -- r = previous iteration's rows
ON bt.parent_id = r.col1
WHERE r.depth < 10 -- ALWAYS add a depth/cycle guard
)
-- Final SELECT: reads ALL accumulated rows from all iterations
SELECT * FROM cte_name ORDER BY depth;
-- Execution model:
-- Iteration 0: anchor runs → produces root rows → added to result
-- Iteration 1: recursive runs with iteration 0 rows → adds children
-- Iteration 2: recursive runs with iteration 1 rows → adds grandchildren
-- ...continues until recursive produces 0 rows
-- Final result: UNION ALL of all iterations
⚠️ Important
Always include a termination condition in the recursive member — either a WHERE depth < N guard or a WHERE NOT EXISTS cycle check. Without it, a cycle in the data (A → B → A) causes infinite recursion. PostgreSQL enforces a maximum recursion depth (default 100) but hitting it raises an error rather than returning partial results.
// Part 03
Number and Date Series Generation
The simplest recursive CTE generates a series of numbers or dates — anchoring at a start value and incrementing each iteration. This is a clean alternative to generate_series() and works across all SQL databases.
Integer series
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Date series — fill calendar gaps
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 04
Hierarchical Data — Org Charts and Category Trees
The classic recursive CTE use case: traversing a parent-child relationship stored in a self-referencing table. The same table has both the child rows and the parent rows — the recursive CTE walks from root to leaves (top-down) or from leaf to root (bottom-up).
FreshCart employee org chart — top-down traversal
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Bottom-up traversal — find all ancestors of an employee
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Count reports at each level
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 05
Product Category Trees — Multi-Level Navigation
E-commerce category hierarchies are a textbook recursive CTE use case. Electronics → Mobiles → Smartphones → Android Phones. The same self-referencing table stores every level. The recursive CTE walks the tree to produce breadcrumb paths, flattened lists, or roll-up totals.
Category tree — schema and recursive traversal
-- Self-referencing categories table:
-- CREATE TABLE categories (
-- category_id INTEGER PRIMARY KEY,
-- category_name TEXT NOT NULL,
-- parent_id INTEGER REFERENCES categories(category_id),
-- level INTEGER DEFAULT 0
-- );
-- Top-down traversal with breadcrumb path
WITH RECURSIVE category_tree AS (
-- Anchor: root categories (no parent)
SELECT
category_id,
category_name,
parent_id,
0 AS depth,
category_name::TEXT AS breadcrumb,
category_id::TEXT AS sort_path
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT
c.category_id,
c.category_name,
c.parent_id,
ct.depth + 1,
ct.breadcrumb || ' > ' || c.category_name,
ct.sort_path || '.' || LPAD(c.category_id::TEXT, 6, '0')
FROM categories AS c
JOIN category_tree AS ct ON c.parent_id = ct.category_id
WHERE ct.depth < 8
)
SELECT
REPEAT(' ', depth) || category_name AS indented_name,
breadcrumb,
depth AS level
FROM category_tree
ORDER BY sort_path;
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 06
Path Finding — Shortest Route Between Nodes
Recursive CTEs can traverse graph structures — tables where rows represent edges (connections between nodes). The classic application is finding paths between stores, delivery waypoints, or network nodes. The recursive member builds paths by adding one edge per iteration.
Graph traversal — path finding between nodes
-- Delivery route graph:
-- CREATE TABLE routes (
-- from_node TEXT,
-- to_node TEXT,
-- distance INTEGER
-- );
WITH RECURSIVE paths AS (
-- Anchor: start at the origin node, zero distance
SELECT
from_node,
to_node,
distance,
ARRAY[from_node, to_node] AS visited_nodes,
distance AS total_distance,
1 AS hops
FROM routes
WHERE from_node = 'Seattle'
UNION ALL
SELECT
r.from_node,
r.to_node,
r.distance,
p.visited_nodes || r.to_node,
p.total_distance + r.distance,
p.hops + 1
FROM routes AS r
JOIN paths AS p ON r.from_node = p.to_node
-- Cycle prevention: don't revisit a node already in the path
WHERE NOT (r.to_node = ANY(p.visited_nodes))
AND p.hops < 10
)
SELECT
visited_nodes[1] AS origin,
to_node AS destination,
array_to_string(visited_nodes, ' → ') AS full_path,
total_distance,
hops
FROM paths
WHERE to_node = 'New York'
ORDER BY total_distance
LIMIT 5;
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 07
Bill of Materials — Exploding Nested Components
A bill of materials (BOM) is a hierarchical parts list: a product contains components, each component contains sub-components, and so on. Recursive CTEs explode BOMs to compute total quantities and costs at every level — without knowing how many levels deep the nesting goes.
Bill of materials explosion
-- BOM table: each row = one parent contains N units of one child
-- CREATE TABLE bom (
-- parent_id INTEGER,
-- child_id INTEGER,
-- quantity NUMERIC,
-- unit_cost NUMERIC
-- );
WITH RECURSIVE bom_explosion AS (
-- Anchor: start with the top-level product (no parent)
SELECT
child_id AS component_id,
child_id::TEXT AS path,
quantity,
quantity AS total_quantity,
unit_cost,
quantity * unit_cost AS total_cost,
0 AS depth
FROM bom
WHERE parent_id = 101 -- product ID 101
UNION ALL
SELECT
b.child_id,
be.path || ' > ' || b.child_id::TEXT,
b.quantity,
-- Multiply child quantity by parent's total quantity
be.total_quantity * b.quantity,
b.unit_cost,
be.total_quantity * b.quantity * b.unit_cost,
be.depth + 1
FROM bom AS b
JOIN bom_explosion AS be ON b.parent_id = be.component_id
WHERE be.depth < 8
)
SELECT
component_id,
path,
depth,
quantity AS qty_per_parent,
total_quantity AS total_qty_needed,
unit_cost,
ROUND(total_cost, 2) AS total_cost
FROM bom_explosion
ORDER BY path;
-- Roll up: total cost of the entire product
SELECT SUM(total_cost) AS total_product_cost
FROM bom_explosion;
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 08
Cycle Detection and Prevention
A cycle occurs when following parent-child links eventually leads back to a node already visited — A is parent of B, B is parent of C, C is parent of A. Without protection, the recursive CTE loops infinitely. Two mechanisms prevent this: a depth limit (simple but imprecise) and a visited-array check (precise cycle detection).
Cycle prevention — depth limit vs visited array
-- Method 1: depth limit (simple — stops after N iterations)
WITH RECURSIVE safe_traverse AS (
SELECT id, parent_id, name, 0 AS depth
FROM nodes WHERE parent_id IS NULL
UNION ALL
SELECT n.id, n.parent_id, n.name, st.depth + 1
FROM nodes AS n
JOIN safe_traverse AS st ON n.parent_id = st.id
WHERE st.depth < 20 -- stops at depth 20 regardless of cycles
)
SELECT * FROM safe_traverse;
-- Risk: if a cycle exists deeper than 20 levels, it is silently cut off
-- But for most real hierarchies, depth 20 is a safe upper bound
-- Method 2: visited array (precise — detects actual cycles)
WITH RECURSIVE cycle_safe AS (
SELECT id, parent_id, name,
ARRAY[id] AS visited -- track IDs seen in this path
FROM nodes WHERE parent_id IS NULL
UNION ALL
SELECT n.id, n.parent_id, n.name,
cs.visited || n.id
FROM nodes AS n
JOIN cycle_safe AS cs ON n.parent_id = cs.id
WHERE NOT (n.id = ANY(cs.visited)) -- stop if we've seen this ID before
)
SELECT * FROM cycle_safe;
-- Precise: stops exactly when a cycle is detected
-- Cost: array grows with depth — slight overhead for deep hierarchies
-- Method 3: both (belt and suspenders for production)
WHERE NOT (n.id = ANY(cs.visited)) AND cs.depth < 50
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 09
Aggregating Over Hierarchies — Rollup Totals
Once a recursive CTE has mapped the full hierarchy, joining it back to fact data enables rollup aggregations — total sales for a category including all its subcategories, total headcount under each manager including all indirect reports.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 10
What This Looks Like at Work
You are a data engineer at Amazon. The supply chain team needs a complete product component cost report — for any product, show every raw component it contains at any depth, the cumulative quantity needed, and the total cost contribution. The product hierarchy is 3–5 levels deep and changes weekly. They need this as an on-demand SQL query, not a hardcoded report.
2:00 PM
Requirement: explode any product's BOM on demand
Given a product_id, return all components at every depth, their cumulative quantities (multiplied through the hierarchy), and total cost. Works for any product regardless of depth.
2:30 PM
Design: recursive CTE anchors at the product, recurses into components
Anchor = direct components of the product. Recursive = sub-components of each component. Each iteration multiplies the child quantity by the parent's cumulative quantity. Depth guard prevents runaway recursion.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
3:30 PM
Report complete — works for any product, any depth
The recursive CTE handles a 3-level hierarchy today and a 6-level hierarchy next quarter without any query changes. The supply chain team parameterises it by changing the basket_id value. The cumulative quantity multiplication propagates correctly through every level — 3 Juice Packs × 3 Bottles each = 9 total bottles, correctly computed at depth 1.
🎯 Pro Tip
When building recursive CTEs for BOM explosion, always track cumulative_qty by multiplying the child's qty_per_parent by the parent's cumulative_qty at each level — not just the qty_at_level. A 2-unit parent containing 3-unit children means 6 children total, not 3. The multiplication must compound through every level of the hierarchy. Test the cumulative_qty column first on a small known BOM before running on production data.
// Part 11
Interview Prep — 5 Questions With Complete Answers
Q: What is a recursive CTE and when would you use one?
A recursive CTE is a Common Table Expression that references itself in its own definition. It has two parts separated by UNION ALL: an anchor member (a regular SELECT that runs once and seeds the recursion) and a recursive member (a SELECT that references the CTE by name and joins the previous iteration's output to the base table). The database alternates between running the recursive member against the previous iteration's rows and accumulating results, until the recursive member produces zero new rows.
Use a recursive CTE when the data has a hierarchical or graph structure stored in a self-referencing table — a parent_id column that points back to the same table's primary key. The key advantage over self-joins: recursive CTEs traverse hierarchies of unknown depth. A fixed chain of self-joins works only when you know the exact number of levels in advance. Recursive CTEs adapt automatically to any depth.
Common use cases: organisational hierarchies (employee → manager relationships), product category trees (category → parent category), bill of materials (product → components → sub-components), network/graph traversal (finding paths between nodes), and generating series (sequences of numbers or dates). Any time you find yourself writing a chain of self-joins to handle levels of a hierarchy, replace it with a recursive CTE — the query becomes shorter, more correct, and depth-independent.
Q: Explain the anchor and recursive members of a recursive CTE.
The anchor member is the first SELECT in the recursive CTE — the one that does not reference the CTE itself. It runs exactly once when the CTE begins executing and produces the initial set of rows that seeds the recursion. For a top-down hierarchy traversal, the anchor typically selects root nodes (WHERE parent_id IS NULL). For a bottom-up traversal, the anchor selects a specific leaf node to start from. For a number series, the anchor selects the starting value (SELECT 1 AS n).
The recursive member is the second SELECT — it references the CTE by name and joins the CTE's rows (which represent the previous iteration's output) to the base table to find the next level of rows. Each iteration of the recursive member sees only the rows produced by the immediately preceding iteration — not all accumulated rows. The recursive member must produce fewer and fewer rows as the recursion progresses, eventually producing zero rows to terminate.
The two members are connected by UNION ALL (not UNION). Using UNION would deduplicate across iterations, which is both semantically wrong for most hierarchies and significantly slower. The final result of the CTE is the UNION ALL of all rows produced by all iterations — the anchor's rows plus every recursive iteration's rows combined. The outer SELECT that queries the CTE reads this complete accumulated result set.
Q: How do you prevent infinite recursion in a recursive CTE?
Infinite recursion occurs when following parent-child links leads back to a node already visited (a cycle: A → B → C → A) or when the termination condition is never satisfied. Without protection, the recursive member keeps producing rows forever — PostgreSQL enforces a default max_recursion_depth of 100 and raises an error when hit, but the error is less useful than intentional termination.
Two protection mechanisms. Method 1: depth limit — add a depth counter to the anchor (0 AS depth) and increment it in the recursive member (depth + 1). Add WHERE depth < N to the recursive member. The recursion stops after N iterations regardless of data. Simple and cheap, but N must be chosen carefully — too low and valid deep hierarchies are cut off, too high and cycles spin for N iterations before stopping. For most org charts N = 10–20 is safe; for BOMs N = 5–8 is typical.
Method 2: visited array — maintain an array of node IDs seen in the current path. In the anchor, initialise it with ARRAY[id]. In the recursive member, append the new node's ID and add WHERE NOT (new_id = ANY(visited_array)). This stops the moment a cycle is detected, at exactly the right point regardless of depth. Cost: the array grows with depth and IS DISTINCT FROM comparisons are slightly more expensive than integer comparison. For production code on data that might have cycles, use both: the visited array for correctness and a depth limit as a safety backstop.
Q: How do you generate a date series using a recursive CTE?
The pattern: anchor on the start date, recursive member adds one day per iteration, terminate when the date reaches the end date. WITH RECURSIVE date_series AS (SELECT '2024-01-01'::DATE AS dt UNION ALL SELECT dt + 1 FROM date_series WHERE dt < '2024-01-31') SELECT dt FROM date_series.
The primary use of date series in analytics is filling gaps in time-series data. When querying daily revenue, days with no orders produce no rows — the result has gaps. A date series CTE produces every calendar date, and a LEFT JOIN to the revenue data fills in zero for missing days. This ensures reports show a complete timeline without gaps, which is essential for charts and moving averages that depend on consecutive date sequences.
For monthly series, add INTERVAL '1 month' instead of +1: SELECT '2024-01-01'::DATE AS m UNION ALL SELECT (m + INTERVAL '1 month')::DATE FROM months WHERE m < '2024-12-01'. PostgreSQL also provides the built-in generate_series(start, end, interval) function which is more concise for this purpose, but the recursive CTE pattern is portable across all SQL databases that support WITH RECURSIVE (PostgreSQL, SQL Server, MySQL 8+, SQLite 3.35+).
Q: What is the difference between a recursive CTE and a self-join for hierarchical queries?
A self-join joins a table to itself on a specific relationship — typically parent_id = id. A single self-join produces one level of the hierarchy: employees joined to their direct managers. Two self-joins produce two levels. N self-joins produce N levels. The critical limitation: the number of self-joins must be known and hardcoded. If the hierarchy deepens from 4 to 5 levels, the query must be rewritten to add another self-join. This is brittle — any schema change to the depth of the hierarchy requires a query change.
A recursive CTE traverses any depth automatically. The recursive member defines the relationship between one level and the next. The database applies this definition repeatedly until no new rows are produced. A 4-level hierarchy and a 40-level hierarchy use the identical query — only the data depth differs, not the SQL. This makes recursive CTEs the correct tool for any hierarchy where the depth is variable, unknown at query time, or changes over time.
Performance comparison: for shallow, fixed-depth hierarchies (always exactly 2-3 levels), self-joins can be marginally faster because the query plan is simpler and the planner can optimise fully. For variable-depth hierarchies, recursive CTEs are both more correct (they handle any depth) and practically faster (a 5-level self-join requires 5 table scans with Cartesian intermediate results; a recursive CTE performs one scan per level with only the relevant rows). The readability advantage strongly favours recursive CTEs — 10 lines replacing 50 lines of chained self-joins, with no depth limitation.
// Part 12
Errors You Will Hit — And Exactly Why They Happen
ERROR: WITH RECURSIVE query has no self-reference — recursive term missing
Cause: The recursive member of the CTE does not reference the CTE's own name. A recursive CTE requires the second SELECT (after UNION ALL) to join to the CTE itself — that is what makes it recursive. Without the self-reference, PostgreSQL raises this error because the CTE structure is syntactically recursive (WITH RECURSIVE) but semantically it is just a regular CTE.
Fix: Ensure the recursive member contains a FROM or JOIN that references the CTE by name: WITH RECURSIVE org AS (SELECT ... FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.* FROM employees AS e JOIN org ON e.manager_id = org.employee_id). The JOIN org ON ... is the self-reference. If the query genuinely does not need recursion, remove RECURSIVE from WITH RECURSIVE — a non-recursive CTE does not require the keyword and can use UNION instead of UNION ALL.
ERROR: maximum recursion depth exceeded (100)
Cause: The recursive CTE ran 100 iterations without the recursive member producing zero rows. Either the data contains a cycle (A is parent of B, B is parent of A), there is no termination condition in the WHERE clause of the recursive member, or the hierarchy is genuinely deeper than 100 levels.
Fix: Add a depth counter and termination condition: 0 AS depth in the anchor, depth + 1 in the recursive member, and WHERE depth < 50 in the recursive member's WHERE clause. For cycle detection, add a visited array: ARRAY[id] in the anchor, visited || new_id in the recursive member, and WHERE NOT (new_id = ANY(visited)) in the WHERE clause. To increase the limit temporarily (not recommended for production): SET max_recursion_depth = 200. Always prefer fixing the data or adding a proper termination condition over raising the limit.
Recursive CTE returns duplicate rows — same node appears multiple times
Cause: The hierarchy data contains a node with multiple parents (a diamond pattern: A → B → D and A → C → D), causing the recursive member to reach node D twice via different paths. In a true tree (each node has exactly one parent), this should not happen — its presence indicates either bad data or that the data is a directed acyclic graph (DAG) rather than a tree.
Fix: If the data is a DAG (multiple parents allowed), use UNION instead of UNION ALL between anchor and recursive members — UNION deduplicates and ensures each node appears only once in the result. Note: UNION is slower than UNION ALL because deduplication requires sorting or hashing. Alternatively, use a visited array to skip nodes already accumulated. If the data should be a tree and duplicates indicate bad data, audit the parent_id assignments: SELECT child_id, COUNT(DISTINCT parent_id) FROM hierarchy_table GROUP BY child_id HAVING COUNT(DISTINCT parent_id) > 1 — any result here is a node with multiple parents.
Cause: The cumulative quantity is not being multiplied through the hierarchy correctly. A common mistake: using qty_per_parent instead of cumulative_qty * qty_per_parent for the child's cumulative quantity. If a product contains 3 packs and each pack contains 4 bottles, the correct total is 12 bottles — but if the recursive member uses only 4 (qty_per_parent of the pack) instead of 3 × 4 = 12 (parent's cumulative_qty × child's qty_per_parent), the total is wrong.
Fix: In the recursive member, compute the child's cumulative quantity as parent.cumulative_qty * child.qty_per_parent — always multiply by the parent's cumulative quantity, not just the level's quantity. Verify with a small known BOM: manually calculate the expected quantities for each component and compare to the CTE output. Add a CHECK column to the output showing the multiplication: parent_cumulative_qty || ' × ' || qty_per_parent || ' = ' || cumulative_qty to make the calculation visible and auditable.
Date series CTE generates incorrect dates — skips days or produces wrong range
Cause: The termination condition uses a strict less-than (<) when it should include the end date, or the date arithmetic is using INTERVAL in a way that skips non-existent dates (e.g., adding months to January 31 produces March 2 instead of February 28). Alternatively, the cast from INTERVAL addition back to DATE is missing, causing a TIMESTAMP result that does not match a DATE column in the LEFT JOIN.
Fix: For daily series: use WHERE dt <= '2024-01-31' (inclusive) or WHERE dt < '2024-02-01' (exclusive boundary). Both include January 31. For monthly series: use (month_start + INTERVAL '1 month')::DATE — the ::DATE cast is essential. PostgreSQL's INTERVAL '1 month' handles month-end correctly: '2024-01-31'::DATE + INTERVAL '1 month' = '2024-02-29' (in a leap year), not March 2. Always cast the result back to DATE to ensure the join condition with DATE columns works correctly.
Try It Yourself
Build two recursive CTEs using FreshCart data. (1) Write a recursive CTE that generates every date in the range of FreshCart's order data (from the MIN order_date to the MAX order_date), then LEFT JOIN it to orders to produce a complete daily revenue calendar showing: calendar_date, daily_revenue (0 if no orders that day), order_count (0 if none), and a 3-day moving average of daily_revenue using a window function on the result. (2) Write a recursive CTE that traverses the FreshCart employee table from root (manager_id IS NULL) downward, producing: employee_id, full_name, job_title, depth (0 for root), indented_name (REPEAT spaces + name), full_path (names joined by ' → '), and direct_report_count (how many employees have this person as their manager_id — compute with a subquery or join). Order the org chart by the path so siblings appear together.
🎯 Key Takeaways
✓A recursive CTE has two parts joined by UNION ALL: an anchor member (runs once, seeds the recursion) and a recursive member (references the CTE itself, runs repeatedly until it produces zero rows).
✓The recursive member sees only the rows from the immediately preceding iteration — not all accumulated rows. Each iteration adds a new level of the hierarchy to the accumulated result.
✓Always include a termination condition: a depth counter with WHERE depth < N (simple) or a visited array with WHERE NOT (id = ANY(visited)) (precise cycle detection). Both together is safest for production.
✓Top-down traversal: anchor on roots (WHERE parent_id IS NULL), recursive member joins children to parents. Bottom-up traversal: anchor on a specific leaf, recursive member joins parent to child via manager_id = previous.employee_id.
✓Date series: anchor on start date, recursive increments by +1 or INTERVAL '1 month'. LEFT JOIN to fact data fills gaps with zero — essential for complete time-series charts and moving averages.
✓Build path strings by concatenating in the recursive member: path || ' → ' || new_name. Build sort keys with LPAD(id, 6, '0') so siblings sort correctly by numeric ID while maintaining lexicographic path order.
✓BOM explosion: cumulative quantity must multiply through every level — child.cumulative_qty = parent.cumulative_qty × child.qty_per_parent. Not just qty_per_parent alone.
✓Cycle prevention with visited array: ARRAY[id] in anchor, visited || new_id in recursive member, WHERE NOT (new_id = ANY(visited)) to stop. Use UNION (not UNION ALL) for DAGs where a node can have multiple parents.
✓Recursive CTEs replace chains of self-joins for hierarchical data. Self-joins require knowing the depth in advance; recursive CTEs handle any depth with the same query.
✓PostgreSQL default max_recursion_depth is 100. Hit it by accident means a cycle or missing termination condition — fix the query, not the limit. SET max_recursion_depth is a last resort.
What comes next
In Module 57, you learn EXPLAIN and Query Optimisation — reading execution plans, identifying bottlenecks, rewriting slow queries, and the systematic approach to making any query faster.