CTE — WITH Clause
Named intermediate results that make complex queries readable — single CTEs, chained multi-step CTEs, reuse within one query, DML with CTEs, and performance considerations
// Part 01
What a CTE Is and Why It Exists
A CTE (Common Table Expression) is a named temporary result set defined at the top of a query using the WITH keyword. It works like a derived table — it computes a SELECT and makes the result available as a named table — but it appears before the main query instead of inline within it, and it can be referenced multiple times.
The problem CTEs solve is readability and reuse. As queries grow in complexity — multiple pre-aggregations, sequential analytical steps, values needed in several places — inline subqueries and nested derived tables become hard to read and maintain. CTEs let you write each logical step once, give it a descriptive name, and refer to it cleanly by name wherever it is needed.
// Part 02
Single CTE — The Basic Pattern
Replacing a derived table with a CTE
CTE for pre-filtering
// Part 03
Multiple CTEs — Sequential Step-by-Step Logic
Multiple CTEs chain together — each CTE can reference CTEs defined before it. This enables expressing complex multi-step analytical logic as a sequence of clearly named transformations, each building on the previous. The main SELECT at the end assembles the final result from the named CTEs.
Three-step customer analysis
Four-step product performance pipeline
// Part 04
CTE Reuse — Referencing the Same CTE Multiple Times
The key advantage of CTEs over derived tables: a CTE can be referenced multiple times in the main query or in subsequent CTEs. This eliminates the need to repeat the same subquery in multiple places — define it once, use it everywhere.
Using the same CTE in multiple JOINs
CTE used in a subsequent CTE and in the main SELECT
// Part 05
CTEs with INSERT, UPDATE, and DELETE — DML CTEs
In PostgreSQL, CTEs are not limited to SELECT queries. They can be used with INSERT, UPDATE, and DELETE — enabling multi-step data modifications in a single atomic statement. The RETURNING clause from DML CTEs makes this especially powerful.
INSERT using CTE result
UPDATE with CTE — complex conditional update
DELETE with RETURNING — archive-then-delete
// Part 06
CTE for Deduplication and Data Cleaning
CTEs make deduplication queries readable by separating the identification of duplicates from the deletion or selection logic.
// Part 07
Performance — Materialisation vs Inlining
CTEs and derived tables are logically equivalent but may differ in execution. PostgreSQL (before version 12) always materialised CTEs — computed them once and stored the result. PostgreSQL 12+ made CTEs inlined by default unless they contain side effects. Understanding this helps you write CTEs that perform well.
Materialised vs inlined — what changes
When to force materialisation
Force MATERIALIZED when the CTE is referenced multiple times in the query and the computation is expensive — materialising once is cheaper than recomputing N times. Force NOT MATERIALIZED when a filter in the outer query is highly selective — allowing the optimiser to push that filter into the CTE avoids computing the full CTE result.
// Part 08
CTEs for Report Building — The Complete Pattern
The full CTE-based report pattern: define each data dimension as its own named CTE, then assemble the final report in the main SELECT by joining those CTEs. Each CTE is independently testable — run it alone to verify its output before the full query.
// Part 09
What This Looks Like at Work
You are a senior analyst at DoorDash. The weekly business review requires a single comprehensive report: customer cohort analysis — how customers who joined in January 2024 have behaved over time. For each customer: their join month, total orders, lifetime value, average order value, days since last order, and a lifecycle stage label. The query needs multiple intermediate computations — a textbook CTE use case.
🎯 Pro Tip
The best CTEs are independently testable. During development, run each CTE as a standalone SELECT to verify it produces the expected rows and values before adding the next CTE. This incremental verification pattern — run one CTE, confirm it, add the next — catches bugs at the step where they occur rather than having to untangle the full query.
// Part 10
Interview Prep — 5 Questions With Complete Answers
A CTE (Common Table Expression) is a named temporary result set defined before the main query using the WITH keyword. It creates a named intermediate result that can be referenced anywhere in the subsequent query — in the main SELECT, in subsequent CTEs, in JOIN conditions, in WHERE clauses, and in subqueries within the main query. A subquery or derived table is defined inline — inside the WHERE, FROM, or another clause — and exists only at that single location in the query.
The key practical differences: CTEs can be referenced multiple times in the same query; a derived table can only be used once at its inline position. CTEs appear before the main query and read top-to-bottom in logical order; derived tables are read inside-out which degrades readability for complex logic. CTEs give each step a descriptive name that documents what it computes; derived tables have aliases but the computation is interleaved with the outer query's structure. CTEs support recursive queries (WITH RECURSIVE) for hierarchical data; derived tables cannot be recursive.
Computationally, CTEs and derived tables produce identical results — both define a temporary result set the outer query uses as a virtual table. The choice is primarily about readability and reuse. Use a derived table for a simple, short, single-use pre-aggregation where the inline position makes context clear. Use a CTE for anything complex, multi-step, reused more than once, or deserving of a descriptive name that documents its purpose.
Multiple CTEs are chained with commas — each CTE definition is separated from the next by a comma. The WITH keyword appears only once at the very beginning: WITH first_cte AS (...), second_cte AS (...), third_cte AS (...) SELECT ... The chain ends with the main query — no comma before the final SELECT.
The dependency rule: a CTE can reference any CTE that was defined before it in the chain. second_cte can reference first_cte. third_cte can reference both first_cte and second_cte. But first_cte cannot reference second_cte — references must go forward only (except in recursive CTEs, which have their own rules). The main SELECT at the end can reference any CTE in the chain.
The naming scope rule: CTE names are scoped to the entire WITH block. A CTE name defined in the WITH block shadows any base table with the same name for the duration of the query — be careful not to accidentally reuse a base table name as a CTE alias. Column names within each CTE are independent — two CTEs can have columns with the same name without conflict, as long as the outer query references them with the correct CTE prefix. The practical pattern: define CTEs in the order of their dependencies — base computations first, enrichments next, final transformations last. The main SELECT reads the final result from the last CTE or assembles from multiple CTEs.
The "top N per group" query (find the top 2 orders per store, or the highest-paid employee per department) is typically solved with a window function (ROW_NUMBER or RANK) inside a CTE or derived table, followed by filtering on the rank in the outer query. The CTE and window function work together — the CTE enables filtering on the window function result.
A CTE is needed because window functions cannot be referenced in WHERE of the same query where they are defined — WHERE runs before SELECT in the logical execution order. The pattern: WITH ranked AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY total_amount DESC) AS rn FROM orders WHERE order_status = 'Delivered') SELECT * FROM ranked WHERE rn <= 2. The CTE computes the rank, the outer query filters by it.
A correlated subquery alternative exists (WHERE (SELECT COUNT(*) FROM orders o2 WHERE o2.store_id = o.store_id AND o2.total_amount > o.total_amount) < 2) but is generally slower for large tables — it runs once per outer row. The CTE + window function approach makes a single pass through the data with the window function, then one filtering pass — O(n log n) for the sort inside the window function, O(n) for the filter. For large tables, the CTE + window function approach is significantly faster than the correlated subquery alternative.
In PostgreSQL 12 and later, CTEs default to NOT MATERIALIZED behaviour — the optimiser treats the CTE as an inline view, exactly like a derived table. It can push predicates from the outer query into the CTE, reorder joins involving the CTE, and optimise the CTE's execution as part of the overall query plan. This is equivalent to writing the CTE as a derived table inline in the FROM clause.
MATERIALIZED forces the CTE to be computed once, stored in memory, and reused for all references. The CTE becomes a fence for optimisation — predicates from the outer query cannot be pushed into it. The result is computed in full and cached. This is the PostgreSQL behaviour before version 12 (and is still the default in some other databases).
When to use each: force MATERIALIZED when the CTE is referenced multiple times in the query and the computation is expensive — materialising once prevents the expensive computation from being repeated once per reference. Use MATERIALIZED when you want the CTE to be an optimisation boundary — for example, when you want to ensure the CTE computes its result on the full data before outer-query filters are applied (important for correctness when the CTE has side effects or when the optimiser is making bad decisions). Force NOT MATERIALIZED when the CTE has a highly selective outer query filter that the optimiser should be able to push in — for example, when the outer query filters by customer_id = 5 and you want the CTE to only scan orders for customer 5 rather than all customers. In practice, most CTEs work well with the default (NOT MATERIALIZED / inlined) behaviour — only reach for these hints when EXPLAIN ANALYZE shows the optimiser making a suboptimal choice.
In PostgreSQL, CTEs can contain DML statements (INSERT, UPDATE, DELETE) using the RETURNING clause to capture the affected rows. This enables atomic multi-step operations — the canonical example being archive-then-delete: delete rows from one table and simultaneously insert them into another, all in one transaction.
The pattern: WITH deleted_rows AS (DELETE FROM source_table WHERE condition RETURNING *) INSERT INTO archive_table SELECT *, NOW() AS archived_at FROM deleted_rows. The DELETE CTE runs first, capturing all deleted rows via RETURNING *. The main INSERT then uses those captured rows to populate the archive table. Because the entire statement executes in a single transaction, either both the DELETE and INSERT succeed together or neither does — there is no risk of deleting without archiving.
The critical constraint: DML CTEs can only appear in PostgreSQL (and SQL Server with different syntax) — MySQL does not support DML in CTEs. Also, a DML CTE always executes even if the main query references it zero times — it has side effects by definition. This is different from SELECT CTEs which are only computed when referenced. The archive-then-delete pattern is the safest approach for data retention cleanup jobs: wrap the statement in an explicit transaction for additional protection, log the operation to an audit table, and verify the RETURNING clause captures the expected rows before deploying to production.
// Part 11
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓A CTE (WITH clause) defines a named temporary result set before the main query. It reads top-to-bottom — each step builds on the previous — unlike nested subqueries which read inside-out.
- ✓CTEs can be referenced multiple times in the same query; derived tables cannot. This is the key advantage that makes CTEs superior for complex multi-step logic.
- ✓Multiple CTEs chain with commas. Each CTE can reference only CTEs defined before it — forward-only dependencies. The main SELECT can reference any CTE in the chain.
- ✓CTEs make complex queries independently testable: run each CTE as a standalone SELECT during development to verify its output before adding the next step.
- ✓CTEs support DML in PostgreSQL: a DELETE or UPDATE CTE with RETURNING captures the affected rows for use by the main query. Enables atomic archive-then-delete in a single statement.
- ✓PostgreSQL 12+ defaults to NOT MATERIALIZED (inline) CTEs — the optimiser can push predicates in. Use AS MATERIALIZED to force one-time computation when the CTE is expensive and referenced multiple times.
- ✓CTEs are the correct solution for filtering on window function results: compute the window function in the CTE, filter in the outer query WHERE. Window functions cannot be referenced in WHERE of the same query.
- ✓Naming matters: customer_spend, tier_averages, store_combined are informative. t1, t2, sub are not. Good CTE names make a complex query self-documenting.
- ✓CTE vs derived table decision: derived table for short, single-use, inline. CTE for multi-step, reused results, anything deserving a descriptive name.
- ✓CTEs do not improve performance by themselves — they are organisational tools. Use EXPLAIN ANALYZE to verify the query plan and add MATERIALIZED hints only when the optimiser is making provably suboptimal choices.
What comes next
In Module 42, you learn string functions — CONCAT, SUBSTRING, TRIM, UPPER, LOWER, REPLACE, LIKE pattern matching, REGEXP, and every text manipulation tool you need for real data cleaning and formatting.
Module 42 → String FunctionsDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.