Window Functions
Compute aggregates across related rows without collapsing them — ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, running totals, moving averages, and the PARTITION BY / ORDER BY frame
// Part 01
What Window Functions Are — and Why They Change Everything
Standard aggregate functions (SUM, AVG, COUNT) collapse a group of rows into one row. This is useful but limiting — you can compute the total revenue per store, but you lose the individual order rows. Window functions break this constraint. They compute an aggregate across a set of related rows while keeping every row in the result.
This single capability — computing group-level metrics alongside individual row data — unlocks an entire class of analytical queries that are either impossible or extremely verbose without window functions: ranking rows within groups, computing running totals, finding each row's value relative to the previous row, and identifying what percentage of the total each row represents.
// Part 02
The OVER() Clause — Defining the Window
The OVER() clause is the heart of every window function. It defines three things: which rows belong to the same group (PARTITION BY), how those rows are ordered within the group (ORDER BY), and which subset of the ordered rows to include in the calculation (frame clause — ROWS or RANGE BETWEEN).
// Part 03
Ranking Functions — ROW_NUMBER, RANK, DENSE_RANK, NTILE
ROW_NUMBER — unique sequential number per partition
RANK vs DENSE_RANK — handling ties differently
NTILE — split into equal buckets
// Part 04
Top-N Per Group — The Essential Pattern
The most frequently used window function pattern in production analytics: find the top N rows within each group. ROW_NUMBER assigns ranks within each partition; a wrapping query filters to the top N. This is cleaner and more flexible than any alternative.
// Part 05
Aggregate Window Functions — SUM, AVG, COUNT with OVER()
All standard aggregate functions work as window functions when paired with OVER(). They compute the aggregate over the defined window while keeping every row. This is the mechanism for running totals, percentage-of-total, and comparing each row to its group's aggregate.
Running total and running average
Percentage of total
Comparing each row to its group aggregate
// Part 06
LAG and LEAD — Accessing Adjacent Rows
LAG and LEAD access a value from a row that is N positions before or after the current row within the window. They are the essential functions for period-over-period comparison — comparing each row to the previous row without a self-join.
// Part 07
Moving Averages — The Frame Clause
The frame clause defines which rows within the ordered partition are included in the window function calculation for the current row. It is the mechanism for rolling/moving calculations — a 7-day moving average, a 3-row moving sum, a cumulative total.
// Part 08
FIRST_VALUE, LAST_VALUE, NTH_VALUE — Value Navigation
These functions return specific values from the ordered window — the first, last, or Nth value — making it easy to attach a reference value (like the store's first order amount) to every row in the partition.
// Part 09
WINDOW Alias — Reusing Window Definitions
When multiple window functions in the same query share the same OVER() definition, the WINDOW clause names the window once and each function references it by name — eliminating repetition and making the query shorter and easier to maintain.
// Part 10
What This Looks Like at Work
You are a data analyst at Shopify. The growth team needs a weekly seller performance report for the last month: each seller's daily revenue, their 7-day moving average, their rank by revenue within their city, how their current week compares to last week (WoW change), and their cumulative revenue for the month. Without window functions this would require multiple queries and a complex join. With window functions it is one query.
🎯 Pro Tip
When a query needs multiple window functions on the same data, group them into a CTE that pre-aggregates to the right granularity first (daily store revenue in the example above). Then apply all window functions in the outer SELECT. This prevents the window functions from running on the raw row-level data and ensures each window computation operates on the correct granularity.
// Part 11
Interview Prep — 5 Questions With Complete Answers
A window function computes an aggregate or ranking value across a set of related rows — the window — while preserving every row in the result. GROUP BY collapses rows into one row per group, losing the individual row detail. Window functions never collapse rows — the result set has the same number of rows as the input, with the computed window value added as an extra column alongside the original data.
The defining syntax is the OVER() clause. Every window function requires OVER() — it is what makes the function "windowed." Inside OVER(), PARTITION BY defines which rows form the window for each row (analogous to GROUP BY), ORDER BY defines the order within the partition, and the optional frame clause defines which subset of the ordered partition to include. A bare OVER() with nothing inside uses all rows as the single window.
The power this unlocks: questions that aggregate data and need row-level detail simultaneously. "What is each order's value as a percentage of its store's total revenue?" cannot be answered with GROUP BY because after grouping, the individual orders are gone. With a window function — total_amount / SUM(total_amount) OVER (PARTITION BY store_id) — each order row retains its total_amount AND gains the store-level sum for comparison, all in the same query without subqueries or joins.
All three assign position numbers to rows within a window, but they handle ties differently. ROW_NUMBER assigns a unique sequential integer to every row — even if two rows have identical values in the ORDER BY column, they get different numbers. The ordering of ties is arbitrary (depends on physical row order) but the numbers are always unique: 1, 2, 3, 4. ROW_NUMBER is used when you need unique positions — for deduplication (keep rn = 1 per group) or pagination.
RANK assigns tied rows the same rank but then skips the next rank(s). If two rows tie for rank 2, both get rank 2 and the next row gets rank 4 (rank 3 is skipped). The sequence has gaps: 1, 2, 2, 4, 5. RANK mirrors how sports competitions work — two silver medalists means no bronze. DENSE_RANK also gives tied rows the same rank, but the next rank does not skip — 1, 2, 2, 3, 4. Dense rank is used when gaps are confusing or when you want to count distinct rank levels.
Practical choice: use ROW_NUMBER for deduplication and top-N filtering (WHERE rn = 1 gives exactly one row per group). Use RANK for competitive leaderboards where gaps signal the tied count (two rows at rank 1 followed by rank 3 shows there was no second place). Use DENSE_RANK for tier classification where you want consecutive tier numbers regardless of ties. In a query like "find the top 3 revenue stores" — with RANK, if two stores tie for first, WHERE rank <= 3 returns 4 stores (both first-place stores plus two more). With ROW_NUMBER, exactly 3 stores are returned but the tie-breaking is arbitrary.
LAG(column, offset, default) returns the value of column from the row offset positions before the current row within the window. LEAD(column, offset, default) returns the value from offset positions after the current row. The offset defaults to 1 (immediate predecessor/successor). The default parameter specifies what to return when no preceding or following row exists (the first row has no predecessor for LAG, the last row has no successor for LEAD) — if omitted, NULL is returned.
Both require ORDER BY in the OVER clause — otherwise "previous" and "next" have no defined meaning. PARTITION BY is optional: without it, the entire result set is the window; with it, the preceding/following row is within the same partition (so LAG on a customer-partitioned window gives the previous order for the same customer, not the globally previous order).
Primary use cases: period-over-period comparison — LAG(revenue) OVER (ORDER BY month) gives last month's revenue on the same row as this month's, enabling revenue - LAG(revenue) as the month-over-month change without a self-join. Customer order gap analysis — LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) gives the previous order date for each customer, enabling order_date - LAG(order_date) as the gap in days between orders. Sequential event analysis — detecting state changes (IF NEW.status != LAG(status) OVER (...) THEN ...), finding first occurrences, and computing durations between events. LAG and LEAD eliminate the verbose self-join alternative for all of these patterns.
The frame clause (ROWS BETWEEN ... AND ... or RANGE BETWEEN ... AND ...) defines which rows within the ordered partition are included in the window function calculation for the current row. Without a frame clause, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present — which is usually the intended behaviour for cumulative calculations but not for rolling windows.
ROWS uses physical row offsets — ROWS BETWEEN 6 PRECEDING AND CURRENT ROW includes exactly the 7 rows: the current row and the 6 immediately preceding rows, regardless of their column values. RANGE uses value-based offsets — RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW includes all rows whose ORDER BY column value is within 6 days of the current row's value. RANGE is useful for time-series where "the last 7 days" should include all rows with dates in that range, not exactly 7 rows.
The practical difference becomes important with duplicate ORDER BY values. If 5 orders have the same order_date, ROWS BETWEEN 0 PRECEDING AND CURRENT ROW includes only the current row. RANGE BETWEEN CURRENT ROW AND CURRENT ROW includes all 5 rows with the same date (because they are all within 0 value-distance of the current row). This causes LAST_VALUE to behave unexpectedly with the default frame — LAST_VALUE with the default ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW returns the current row's value (because the current row is the last row in the frame), not the last row of the partition. The fix: explicitly specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to include all rows in the partition.
Use ROW_NUMBER (or RANK) inside a subquery or CTE to assign a rank to each order within its store partition, then filter in the outer query for ranks 1 through 3. The window function cannot be referenced in WHERE of the same query — window functions are evaluated at the SELECT stage, after WHERE. The wrapping subquery makes the rank value a regular column that the outer WHERE can filter.
Pattern: WITH ranked AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY total_amount DESC) AS rn FROM orders WHERE order_status = 'Delivered') SELECT store_id, order_id, order_date, total_amount, rn FROM ranked WHERE rn <= 3 ORDER BY store_id, rn.
Choosing between ROW_NUMBER and RANK for this pattern: ROW_NUMBER always returns exactly 3 rows per store — if two orders tie for the third position, one is arbitrarily included and one is excluded. RANK returns 3 or more rows — if two orders tie for third, both get rank 3 and both are included (WHERE rank <= 3 returns 4 rows for that store). Use ROW_NUMBER when you need exactly N rows per group. Use RANK when ties should be included (a "top 3 by revenue" report where two stores are equally third should show both). DENSE_RANK is rarely needed for this pattern but useful when the rank number itself is meaningful to display (a ranking table where ranks are shown as 1, 2, 3 even with ties).
// Part 12
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓Window functions compute aggregates across related rows without collapsing them — every input row remains in the result, with the computed window value added as an extra column.
- ✓The OVER() clause defines the window: PARTITION BY splits rows into groups (like GROUP BY but rows are kept), ORDER BY orders rows within each partition, and the frame clause selects a subset of the ordered partition.
- ✓ROW_NUMBER: unique sequential integers — no ties. RANK: shared rank with gaps after ties (1,1,3). DENSE_RANK: shared rank without gaps (1,1,2). NTILE(n): assigns rows to n equal-size buckets.
- ✓Top-N per group pattern: ROW_NUMBER() OVER (PARTITION BY grp ORDER BY col DESC) AS rn in a subquery or CTE, then WHERE rn <= N in the outer query. Window functions cannot be filtered in WHERE of the same query.
- ✓LAG(col, n) returns the value n rows before the current row. LEAD(col, n) returns the value n rows after. Both require ORDER BY to define before/after. Use them for period-over-period comparisons without self-joins.
- ✓Frame clause: ROWS BETWEEN N PRECEDING AND CURRENT ROW for rolling windows. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for cumulative totals. Always use ROWS not RANGE for predictable behaviour with duplicate ORDER BY values.
- ✓LAST_VALUE requires explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING — without it, the default frame ends at CURRENT ROW and LAST_VALUE returns the current row's own value.
- ✓WINDOW alias: define a named window once with WINDOW w AS (...) and reference it as OVER w across multiple functions — eliminates duplication and makes complex queries readable.
- ✓Aggregate functions (SUM, AVG, COUNT, MIN, MAX) all work as window functions with OVER(). SUM() OVER (PARTITION BY store ORDER BY date ROWS UNBOUNDED PRECEDING) is a running total.
- ✓Pre-aggregate to the right granularity in a CTE before applying window functions. Window functions on raw row-level data when you need daily totals will produce wrong results — group first, then window.
What comes next
In Module 53, you learn Ranking Functions in depth — advanced RANK patterns, percentile ranks, conditional rankings, multi-level rankings, and every production scenario where ranking drives business decisions.
Module 53 → Ranking FunctionsDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.