Derived Tables
Subqueries in FROM as reusable virtual tables — pre-aggregation, multi-step analytics, fan-out prevention, filtering before joining, and when to choose derived tables over CTEs
// Part 01
What a Derived Table Is
A derived table is a subquery that appears in the FROM clause — treated by the outer query exactly like a regular table. It has columns, it has rows, it can be JOINed, filtered, grouped, and aggregated. The difference from a base table is that it is computed on the fly from a SELECT — it does not exist in storage, only in memory during query execution.
The term "derived table" and "inline view" mean the same thing. In some contexts people also call it a "subquery in FROM." All three phrases describe the same construct: a SELECT inside parentheses in the FROM clause, given a mandatory alias.
// Part 02
Why Derived Tables Exist — The Two-Step Problem
Many analytical questions require two steps that cannot be expressed in a single SELECT. Step 1 computes something — a per-group aggregate, a filtered subset, a ranked result. Step 2 uses that computed result to filter, join, or aggregate further. Without derived tables (or CTEs), you would need two separate queries and manual data transfer between them.
The canonical example: find customers whose total spend is above the 75th percentile. Step 1 computes total spend per customer. Step 2 computes the 75th percentile of those totals. Step 3 filters customers above it. All three steps in one query require derived tables or CTEs.
The problem without derived tables
// Part 03
Pre-Aggregation — The Most Common Pattern
Pre-aggregation is the most frequent use of derived tables: aggregate a table first, then join the aggregated result to other tables. This prevents the fan-out problem (Module 31) and ensures aggregates are computed at the correct granularity before joining.
Per-store revenue joined to store details
Per-product revenue joined to product catalogue
// Part 04
Two-Level Aggregation — Aggregate of Aggregates
SQL does not allow directly aggregating an aggregate: AVG(SUM(col)) is not valid syntax. A derived table solves this — compute the inner aggregate in the derived table, then aggregate those results in the outer query.
// Part 05
Pre-Filtering — Reduce Rows Before Joining
A derived table can filter a table before it is joined — reducing the number of rows the join must process. This is particularly useful when only a small subset of a large table is relevant to the join, and pushing the filter into a derived table allows the database to reduce rows early.
// Part 06
Fan-Out Prevention — The Critical Use Case
The fan-out problem (Module 31): joining a one-to-many relationship before aggregating causes the aggregate to count the "one" side multiple times. Derived tables prevent this by aggregating the "many" side first — before the join — so the join is one-to-one at the order level.
The fan-out bug and its fix
// Part 07
Derived Tables as Ranking Sources
Derived tables that use window functions to compute rankings can then be filtered in the outer query — the classic "filter by rank" pattern. This is how you find the top-N per group cleanly.
// Part 08
Multiple Derived Tables in One Query
A single outer query can join multiple derived tables — each one a pre-computed virtual table. This allows assembling a complex report from several independent pre-aggregations without nested CTEs.
// Part 09
Derived Tables vs CTEs — When to Use Each
Derived tables and CTEs (WITH clause) produce identical results — both create named intermediate result sets. The choice between them is about readability, reuse, and style.
| Criterion | Derived Table | CTE |
|---|---|---|
| Definition location | Inline — inside FROM or JOIN | Before the main query — WITH clause |
| Reusability in same query | Cannot be referenced more than once | Can be referenced multiple times |
| Readability | Harder to read when complex (nested) | Cleaner — each step named at the top |
| Self-referencing | Cannot reference itself | Can (recursive CTEs) |
| Naming | Anonymous until aliased inline | Named at definition, used by name throughout |
| Performance | Identical — same execution plan | May be materialised in PostgreSQL (controllable) |
| Best for | Short, single-use transformations; when CTE overhead feels excessive | Multi-step logic; reused results; self-documenting complex queries |
Same query — derived table vs CTE
Both produce identical results. The derived table version is more compact — good for simple pre-aggregations. The CTE version is more readable for complex logic — each step is named and the main SELECT reads clearly. The decision is style: use whichever makes the query easier for the next person to understand and maintain.
🎯 Pro Tip
The practical rule: if the subquery in FROM is 3 lines or fewer and used only once, keep it inline as a derived table. If it is longer, used more than once, or benefits from a descriptive name that documents what it computes, extract it to a CTE. When in doubt, prefer CTE — the naming overhead is small and the readability benefit is large.
// Part 10
What This Looks Like at Work
You are a data analyst at Gopuff, a quick-commerce platform. The operations team needs a daily metrics dashboard that shows, for each delivery partner (adapted here as FreshCart store): their total deliveries, average delivery time, their fastest and slowest delivery, how they compare to the overall average, and their rank among all partners. This requires multiple derived tables at different aggregation levels.
🎯 Pro Tip
CROSS JOIN to a single-row derived table is the standard SQL pattern for "broadcast a single computed value to every row." The single-row subquery (SELECT AVG(...) FROM ...) returns one row, and CROSS JOIN attaches it to every row of the other table — making the global average available for comparison in every row without a correlated subquery.
// Part 11
Interview Prep — 5 Questions With Complete Answers
A derived table is a subquery that appears in the FROM clause of an outer query. The database computes the subquery first and presents its result as a virtual table that the outer query can JOIN, filter, aggregate, and otherwise treat exactly like a base table. The derived table exists only for the duration of the query — it is not stored in the database.
Three mandatory rules: First, the subquery must be enclosed in parentheses. Second, the derived table must have an alias — FROM (SELECT ...) AS alias_name. Without the alias, the query produces a syntax error. Third, the outer query must reference the derived table by its alias — any column the outer query wants to use must be included in the derived table's SELECT list.
The practical rules: derived tables cannot be referenced more than once in the same query (use a CTE for reuse), they cannot reference each other (derived table A cannot reference derived table B in the same FROM clause — use CTEs for sequential dependencies), and they cannot contain ORDER BY in most databases unless LIMIT is also specified. Beyond these restrictions, derived tables are fully functional SQL — they can contain JOINs, GROUP BY, HAVING, window functions, and any other SQL construct. The power comes from their composability: the outer query treats the result of any arbitrary SELECT as a clean, queryable table.
A correlated subquery in WHERE executes once per outer row — N executions for N outer rows. This can be O(n²) in complexity for large tables. A derived table in FROM is computed once — it runs the subquery a single time, produces a result set, and the outer query joins to that result. For aggregations that would otherwise be correlated, the derived table approach is dramatically faster at scale.
The specific case where derived table beats correlated subquery: "for each employee, find those whose salary is above the department average." Correlated subquery runs SELECT AVG(salary) FROM employees WHERE department = e.department once per employee — N subquery executions. Derived table runs SELECT department, AVG(salary) FROM employees GROUP BY department once total — one execution — and the outer query JOINs employees to these pre-computed averages. For 10,000 employees across 20 departments: correlated = 10,000 subquery executions, derived table = 1 execution producing 20 rows, then one JOIN pass through 10,000 employee rows.
Additional reasons to prefer derived table over correlated subquery: when the outer query needs multiple columns from the subquery result (correlated scalar subquery returns one value — you need one correlated subquery per column; derived table provides all columns at once), when the same computation is needed in multiple places in the outer query (derived table computes it once and it is reusable via alias; correlated subquery must be repeated), and when the subquery logic is complex enough that seeing it inline in WHERE hurts readability. Use correlated subquery when the logic is simple, involves existence checks (EXISTS), or when the per-row dependency is genuinely required and cannot be expressed as a join.
The fan-out problem occurs when joining a one-to-many relationship before aggregating — each "one" side row is duplicated once per "many" side row, causing aggregates on the one-side column to count the same value multiple times. SUM(orders.total_amount) after joining orders to order_items triple-counts each order's total_amount for an order with three items.
The derived table solution: pre-aggregate the many-side table before joining. Instead of joining orders to order_items and then aggregating, first aggregate order_items to order-level metrics in a derived table (one row per order_id with SUM(line_total) and SUM(quantity)), then join that one-row-per-order result to orders. The resulting join is one-to-one at the order level — no fan-out, no inflated aggregates.
Implementation: SELECT o.store_id, SUM(o.total_amount) AS revenue, SUM(item_stats.total_qty) AS total_items FROM orders AS o JOIN (SELECT order_id, SUM(quantity) AS total_qty FROM order_items GROUP BY order_id) AS item_stats ON o.order_id = item_stats.order_id GROUP BY o.store_id. The derived table item_stats has exactly one row per order — the join is safe. Using COUNT(DISTINCT o.order_id) instead of COUNT(*) is the other approach when pre-aggregation is not possible — it counts unique orders rather than order-item rows. The derived table approach is preferred when multiple many-side metrics are needed, as it is cleaner and more efficient.
Window functions are computed at the SELECT stage of query execution — after FROM, WHERE, GROUP BY, and HAVING have already been processed. WHERE runs before window functions are computed, so referencing a window function result in WHERE is a reference to a value that does not yet exist. This is why WHERE rank_in_store <= 2 fails — rank_in_store is a window function result that has not been computed when WHERE evaluates.
The derived table solution: compute the window function in an inner query, then filter on its result in the outer query's WHERE. The inner query: SELECT *, RANK() OVER (PARTITION BY store_id ORDER BY total_amount DESC) AS rank_in_store FROM orders. This produces a result set where every row has its rank attached. The outer query: SELECT * FROM (inner_query) AS ranked WHERE rank_in_store <= 2. The outer WHERE now filters on rank_in_store which exists as a column in the derived table — it was computed in the inner query and is just a regular column from the outer query's perspective.
This pattern — window function in derived table, filter in outer WHERE — is one of the most common uses of derived tables. It appears every time you need to filter by rank, percentile, running total threshold, or any other window-computed value. The same pattern works for HAVING (cannot filter on window function results in HAVING of the same query) and for ORDER BY (though ORDER BY can reference window function columns in the same query without a derived table). An equivalent approach uses CTEs: WITH ranked AS (SELECT *, RANK() OVER (...) AS rn FROM orders) SELECT * FROM ranked WHERE rn <= 2. Both derived table and CTE are correct — the CTE version is more readable for complex ranking queries.
A derived table is defined inline within the FROM clause — it appears as a parenthesised subquery immediately where it is used. A CTE (Common Table Expression) is defined before the main query using the WITH keyword — it is given a name and can be referenced by that name anywhere in the subsequent query. Both produce the same intermediate result set and both are computed at query execution time (not stored).
The key functional differences: CTEs can be referenced multiple times in the same query; a derived table can only be referenced once at the location it is defined. CTEs can be recursive (WITH RECURSIVE) enabling hierarchical queries; derived tables cannot. CTEs read top-to-bottom in logical order — each step builds on the previous; derived tables are read inside-out which can be harder to follow for complex logic. In PostgreSQL, CTEs are materialised by default (computed once and cached) while derived tables are typically inlined into the main query plan; this difference is controllable with the MATERIALIZED and NOT MATERIALIZED hints.
Choosing between them: use a derived table when the subquery is short (under 5 lines), used exactly once, and the inline position makes the query's logic clear — a simple pre-aggregation that is directly joined to one other table. Use a CTE when the intermediate result has a meaningful name that documents what it computes, when the same result is referenced more than once, when the logic has multiple sequential steps where each builds on the previous, or when the derived table would be long enough to obscure the outer query's logic. For simple cases, derived tables are more compact. For complex analytical queries, CTEs produce code that is significantly easier to read, review, and maintain.
// Part 12
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓A derived table is a subquery in the FROM clause — treated by the outer query as a virtual table. Every derived table must have an alias. No alias = syntax error.
- ✓Pre-aggregation is the primary use: compute per-group aggregates in a derived table first, then join the aggregated result to other tables. This prevents fan-out and ensures correct aggregation granularity.
- ✓Two-level aggregation — AVG(SUM(col)) — is impossible in one flat SELECT but clean with a derived table: inner query computes the SUM per group, outer query AVGs those sums.
- ✓Derived tables prevent the fan-out bug: aggregate the many-side (order_items) to match the one-side granularity (orders) before joining. The derived table has one row per order — no row multiplication.
- ✓Window function results cannot be filtered in WHERE of the same query. Wrap the window function in a derived table, then filter on the result column in the outer WHERE.
- ✓CROSS JOIN to a single-row derived table broadcasts a global value (overall average, total count) to every row — the clean alternative to a correlated scalar subquery.
- ✓Multiple derived tables can be joined in one outer query: three derived tables joined to stores gives per-store order stats, employee stats, and product variety — all in one query.
- ✓Derived table vs CTE: derived tables are inline and compact — best for short, single-use transformations. CTEs are named and reusable — best for multi-step logic, repeated references, and self-documenting complex queries.
- ✓Derived tables cannot reference each other in the same FROM clause (use CTEs for sequential dependencies) and cannot be referenced more than once (use CTEs for reuse).
- ✓Always verify derived table row count: SELECT COUNT(*) FROM (derived table subquery). The count should equal the expected granularity — one row per store, per customer, per product. More rows than expected signals a join or missing GROUP BY inside the derived table.
What comes next
In Module 41, you learn the String Function in full depth — syntax, multi-step composition, reuse within one query, performance considerations, and every pattern where CTEs outshine derived tables and subqueries.
Module 41 → String FunctionDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.