Python · SQL · Web Dev · Java · AI/ML tracks launching soon — your one platform for all of IT
Advanced

Analytics with LAG and LEAD

Period-over-period comparison, retention analysis, funnel drop-off, session gap detection, first and last event patterns — every time-series analytics pattern built on offset window functions

14–20 min April 2026
Section 11 · Window Functions & CTEs
Window Functions & CTEs · 5 modulesModule 54

// Part 01

The Core Insight — Comparing Rows Without Self-Joins

Before window functions, comparing each row to the previous row required a self-join — joining the table to itself on a relationship like "previous order date for the same customer." Self-joins are verbose, error-prone, and slow on large tables because they force a Cartesian product before filtering.

LAG and LEAD replace every self-join for adjacent-row comparison with a single function call. LAG accesses the value from N rows before the current row within the window. LEAD accesses the value from N rows after. The window (PARTITION BY + ORDER BY) defines what "before" and "after" mean — previous order for the same customer, previous day's revenue for the same store, previous status in the same event sequence.

LAG / LEAD — complete syntax
-- LAG: value from N rows BEFORE current row in the window
LAG(column)                    OVER (PARTITION BY grp ORDER BY col)
LAG(column, 1)                 OVER (...)  -- same as above (offset=1 is default)
LAG(column, 2)                 OVER (...)  -- 2 rows back
LAG(column, 1, 0)              OVER (...)  -- default value 0 when no prior row
LAG(column, 1, column)         OVER (...)  -- default = current value (no change signal)

-- LEAD: value from N rows AFTER current row in the window
LEAD(column)                   OVER (PARTITION BY grp ORDER BY col)
LEAD(column, 1, 'end')         OVER (...)  -- default 'end' when no following row

-- NULL handling:
-- First row in partition: LAG returns NULL (no prior row)
-- Last row in partition:  LEAD returns NULL (no following row)
-- Use the third argument (default) to substitute NULL with a meaningful value

-- Both require ORDER BY — without it, "before" and "after" are undefined
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 02

Period-Over-Period Comparison — MoM, WoW, YoY

Period-over-period (PoP) comparison is the most common use of LAG in production analytics. Every business dashboard shows "this month vs last month" — LAG makes this a single expression rather than a self-join or two separate subqueries.

Month-over-month revenue change

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Week-over-week per store

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Same period last year — LAG with offset 12 (monthly data)

Year-over-year comparison — LAG with offset
-- YoY: compare to the same month 12 periods ago
WITH monthly AS (
  SELECT
    DATE_TRUNC('month', order_date)::DATE   AS month_start,
    ROUND(SUM(total_amount), 2)             AS revenue
  FROM orders WHERE order_status = 'Delivered'
  GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
  month_start,
  revenue,
  -- LAG with offset 12 = same month last year
  LAG(revenue, 12) OVER (ORDER BY month_start)     AS same_month_last_year,
  ROUND(
    (revenue - LAG(revenue, 12) OVER (ORDER BY month_start))
    / NULLIF(LAG(revenue, 12) OVER (ORDER BY month_start), 0) * 100
  , 1)                                              AS yoy_pct
FROM monthly
ORDER BY month_start;
-- Requires at least 13 months of data for YoY to be non-NULL

// Part 03

Retention Analysis — Did the Customer Come Back?

Retention analysis answers "of the customers who ordered in period N, how many ordered again in period N+1?" LAG and LEAD make this query straightforward — for each customer's order, LEAD shows their next order date, enabling cohort retention measurement without complex subqueries.

Days to next order — repeat purchase gap

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Cohort retention — first order month vs return rate

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 04

Session Gap Detection — Finding Breaks in Event Streams

Session gap detection identifies when a sequence of events has a meaningful gap — defining "sessions" in clickstream data, finding delivery delays, or flagging unusual pauses in order streams. The pattern: use LAG to compute the gap between consecutive events, then flag gaps above a threshold.

Detecting gaps in order sequences

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Customer order gap analysis — churn signal detection

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 05

First and Last Event Patterns — Identifying Boundaries

Identifying the first and last occurrence of an event per entity is a foundational analytics pattern. Combined with LAG and LEAD, it powers acquisition analysis (first order), churn detection (last order), and event sequence boundaries.

First order detection — is this the customer's first?

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

First-order uplift — do first orders differ from repeat orders?

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 06

State Change Detection — Tracking Status Transitions

State change detection identifies when a row's categorical value changes — order status transitions, loyalty tier upgrades, product availability changes. The pattern: LAG the status column and compare to the current value. When they differ, a transition occurred.

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Loyalty tier upgrade detection

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 07

Funnel Analysis — Drop-off Between Steps

Funnel analysis measures how many users/customers complete each step in a sequence and where drop-off occurs. LAG and LEAD make it easy to compare adjacent funnel steps without multiple subqueries.

Funnel analysis pattern — drop-off between steps
-- Define funnel steps with a CTE, then use LAG to compute drop-off
WITH funnel_steps AS (
  SELECT 1 AS step_num, 'Visited' AS step_name, 1000 AS users
  UNION ALL SELECT 2, 'Added to cart', 650
  UNION ALL SELECT 3, 'Started checkout', 420
  UNION ALL SELECT 4, 'Entered payment', 310
  UNION ALL SELECT 5, 'Completed order', 245
)
SELECT
  step_num,
  step_name,
  users,
  -- Previous step's users
  LAG(users) OVER (ORDER BY step_num)              AS prev_step_users,
  -- Drop-off count
  LAG(users) OVER (ORDER BY step_num) - users      AS dropped_off,
  -- Drop-off rate from previous step
  ROUND(
    (LAG(users) OVER (ORDER BY step_num) - users)::NUMERIC
    / NULLIF(LAG(users) OVER (ORDER BY step_num), 0) * 100
  , 1)                                             AS dropoff_pct,
  -- Conversion rate from step 1 (top of funnel)
  ROUND(users::NUMERIC / FIRST_VALUE(users) OVER (
    ORDER BY step_num
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) * 100, 1)                                      AS overall_conversion_pct
FROM funnel_steps
ORDER BY step_num;
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 08

Running Comparisons — Each Row vs Its Previous N Rows

Beyond simple one-step LAG, production analytics often needs to compare a value to an average or sum of the previous N values — a baseline computed from the recent history. The combination of LAG-based values and frame-based window aggregates handles this cleanly.

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 09

Islands and Gaps — Consecutive Sequence Analysis

The islands and gaps problem: given a sequence of events, identify consecutive runs ("islands") and the breaks between them ("gaps"). Classic examples: consecutive days a store was active, runs of increasing revenue, streaks of on-time deliveries. The LAG-based solution is elegant.

Consecutive active days per store

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

🎯 Pro Tip

The islands-and-gaps pattern always follows the same structure: (1) LAG to compute the gap from the previous row, (2) flag rows where gap > threshold as island starts, (3) cumulative SUM of the flag gives each island a unique number, (4) GROUP BY the island number to summarise each island. This pattern works for any "consecutive run" problem — daily activity streaks, consecutive revenue increases, runs of on-time delivery.

// Part 10

What This Looks Like at Work

You are a data analyst at DoorDash. The retention team needs a comprehensive customer health report: for each customer who placed their most recent order within the last 60 days, show their order history summary, their average gap between orders, whether their gaps are increasing or decreasing (churn signal), and their predicted next order window based on their typical gap.

11:00 AM
Requirements: customer order cadence report
Most recent order date, total orders, avg gap between orders, gap trend (increasing/stable/decreasing), and predicted next order window. Only customers whose last order was within 60 days.
11:25 AM
Design: LAG chain computes gaps, trend uses LAG of LAG
Gap = order_date - LAG(order_date). Gap trend = latest gap vs average of previous gaps. Prediction = last order date + avg gap.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
12:20 PM
Report delivered — actionable retention signals
The LAG-powered cadence report gives the retention team three actionable segments: customers ordering faster (reward them), customers ordering slower (trigger a re-engagement campaign), and at-risk customers (urgent outreach). The predicted next order date helps the team time outreach optimally — contact customers whose predicted date has passed.

🎯 Pro Tip

When building predictive signals from order gaps, always filter to customers with at least 2-3 orders before computing gap trends. A customer with one order has no gap; a customer with two orders has one gap but no trend. Meaningful gap trends require at least 3 orders (two gaps to compare). Add a WHERE total_orders >= 3 filter to the final report to avoid misleading signals from customers with limited history.

// Part 11

Interview Prep — 5 Questions With Complete Answers

Q: How do you compute month-over-month growth using SQL window functions?

The pattern: aggregate data to monthly granularity in a CTE using DATE_TRUNC('month', date), then apply LAG(revenue) OVER (ORDER BY month_start) in the outer SELECT to access the previous month's value. The MoM change is current_revenue - LAG(revenue), and the growth percentage is (current - previous) / NULLIF(previous, 0) * 100.

Full implementation: WITH monthly AS (SELECT DATE_TRUNC('month', order_date)::DATE AS month_start, ROUND(SUM(total_amount), 2) AS revenue FROM orders WHERE order_status = 'Delivered' GROUP BY DATE_TRUNC('month', order_date)) SELECT month_start, revenue, LAG(revenue) OVER (ORDER BY month_start) AS prev_revenue, ROUND(revenue - LAG(revenue) OVER (ORDER BY month_start), 2) AS mom_change, ROUND((revenue - LAG(revenue) OVER (ORDER BY month_start)) / NULLIF(LAG(revenue) OVER (ORDER BY month_start), 0) * 100, 1) AS mom_pct FROM monthly ORDER BY month_start.

Important details: NULLIF(LAG(revenue), 0) prevents division by zero if a prior month had zero revenue. The first month's LAG returns NULL — handle this in the display (show '— First period' instead of a percentage). For per-entity PoP (per store, per product), add PARTITION BY entity_id to the OVER clause — LAG then looks at the previous month for the same entity, not the globally previous month. For year-over-year using monthly data, use LAG(revenue, 12) OVER (ORDER BY month_start) — the offset 12 reaches back 12 months.

Q: How do you detect if a customer is churning using SQL?

Churn detection combines LAG (to compute gaps between orders) with aggregation (to compare recent gaps to historical averages). The pattern: for each customer's orders, compute the gap in days between consecutive orders using order_date - LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date). Then aggregate per customer: the latest gap, the average historical gap, and the ratio between them.

A customer shows churn signals when their latest gap significantly exceeds their historical average: latest_gap > avg_gap * 1.5 is "at risk," latest_gap > avg_gap * 2 is "likely churned." Customers whose last order date is beyond their predicted next order date (last_order_date + avg_gap < CURRENT_DATE) are overdue. Combining these signals produces an actionable churn risk score: Green (ordering within normal cadence), Amber (gap longer than usual), Red (significantly overdue).

A complete signal: for customers whose last order was more than 90 days ago AND whose last gap is more than twice their average gap, send a re-engagement campaign. Use LEAD instead of LAG for forward-looking analysis — LEAD(order_date) OVER (...) on the most recent order returns NULL (no next order yet), which is itself a churn signal for customers whose last gap has exceeded their typical cadence. Filter to customers with at least 3 orders before computing trends — fewer orders produce unreliable gap averages.

Q: What is the islands and gaps problem and how do you solve it with LAG?

The islands and gaps problem identifies consecutive runs of events (islands) separated by breaks (gaps). Examples: consecutive days a user was active, consecutive months of positive revenue growth, consecutive on-time deliveries. The challenge is grouping consecutive rows into the same island when there is no explicit island identifier in the data.

The LAG-based solution in four steps. Step 1: use LAG to compute the gap between each row and its predecessor — active_date - LAG(active_date) OVER (PARTITION BY entity ORDER BY active_date). Step 2: flag rows where the gap exceeds the threshold as island starts — CASE WHEN gap > 1 OR LAG is NULL THEN 1 ELSE 0 END AS is_island_start. Step 3: compute a cumulative sum of the island start flag using SUM(is_island_start) OVER (PARTITION BY entity ORDER BY active_date) — each island gets a unique incrementing number. Step 4: GROUP BY the island number to summarise each consecutive run — MIN(date) is the island start, MAX(date) is the island end, COUNT(*) is the island length.

The elegance of this approach: the cumulative SUM of the is_island_start flag increments exactly once per island. All rows in the same consecutive run share the same cumulative sum value — the same island number. Any non-consecutive gap increments the flag, creating a new island number for all subsequent rows. This pattern generalises to any consecutive-run problem: consecutive days of positive revenue (flag when revenue turns negative or date jumps), consecutive on-time deliveries (flag when a late delivery occurs), consecutive months of sales growth (flag when growth rate goes negative).

Q: How do you compute the average gap between a customer's orders?

Use LAG to compute the gap for each order, then aggregate per customer. The gap for each order is order_date - LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date, order_id). The first order per customer has LAG = NULL (no prior order) — this NULL is excluded from AVG automatically, so AVG correctly computes the average gap only across orders that have a preceding order.

Implementation: WITH order_gaps AS (SELECT customer_id, order_date, order_date - LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date, order_id) AS gap_days FROM orders WHERE order_status = 'Delivered') SELECT customer_id, COUNT(*) AS order_count, ROUND(AVG(gap_days), 1) AS avg_gap_days, MIN(gap_days) AS min_gap, MAX(gap_days) AS max_gap FROM order_gaps GROUP BY customer_id HAVING COUNT(*) >= 2. The HAVING COUNT(*) >= 2 excludes single-order customers who have no meaningful gap to compute.

The result is used for: predicted next order date (last_order_date + avg_gap_days::INTEGER), churn signal (CURRENT_DATE - last_order_date > avg_gap_days * 1.5), and segment classification (customers with avg_gap < 7 days are weekly shoppers, avg_gap 7-30 days are monthly shoppers). Always add a tiebreaker to the ORDER BY inside the LAG (order_id is a good choice) — when two orders share the same date, the gap computation must be deterministic.

Q: How would you build a funnel analysis query showing drop-off rates between steps?

A funnel analysis needs: the count at each step, the drop-off from the previous step, and the overall conversion from the top of the funnel. LAG provides the previous step's count, FIRST_VALUE provides the top-of-funnel count, and simple arithmetic computes the rates.

Pattern: define each funnel step as a row in a CTE — either using a VALUES clause for known step counts, or computing counts per step from event tables. Then in the outer SELECT, apply LAG(users) OVER (ORDER BY step_num) for the previous step count. Drop-off = LAG(users) - users. Drop-off rate = drop-off / LAG(users) * 100. Overall conversion = users / FIRST_VALUE(users) OVER (ORDER BY step_num ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) * 100 — this gives the cumulative conversion from step 1.

For a real event-based funnel (not pre-aggregated counts), aggregate each step separately in a CTE — count distinct users who performed action A, action B, action C — then UNION ALL the step counts into one result set that LAG can process. The key is computing steps independently (not chaining — A then B then C — which double-counts and misses users who skip steps) and then joining the step counts. Use DISTINCT user counts at each step, not cumulative — a user who completes step 3 should not be double-counted at step 2. The funnel shows what percentage of the initial population makes it to each step, independent of whether they did all prior steps.

// Part 12

Errors You Will Hit — And Exactly Why They Happen

LAG returns NULL for all rows — expected to see previous values

Cause: The OVER clause is missing ORDER BY. LAG requires ORDER BY to define which row is 'previous' — without it, the concept of row order is undefined and the result is NULL for all rows. Alternatively, PARTITION BY is creating partitions of one row each (all rows have a unique partition key), so every row is alone in its partition and has no predecessor.

Fix: Add ORDER BY to the OVER clause: LAG(col) OVER (PARTITION BY customer_id ORDER BY order_date). Verify PARTITION BY groups multiple rows — check SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id to confirm multiple orders per customer. If PARTITION BY customer_id gives single-row partitions because the data is already one-row-per-customer (after aggregation), the LAG needs to be computed before the aggregation step.

Month-over-month LAG produces wrong values — compares to wrong month

Cause: The ORDER BY in the OVER clause is not using DATE_TRUNC output — it may be using a string representation or a partial date component. If two different months sort identically (e.g., both have month_num = 1 for January across different years), LAG skips to the wrong prior row. Or PARTITION BY is inadvertently splitting months into single-row partitions.

Fix: Use DATE_TRUNC('month', date)::DATE as both the GROUP BY key and the ORDER BY in the OVER clause: LAG(revenue) OVER (ORDER BY DATE_TRUNC('month', order_date)). The DATE comparison is unambiguous — 2024-01-01 sorts before 2024-02-01 regardless of year. If PARTITION BY is present for multi-entity PoP, ensure it is only on the entity column (store_id, product_id) — not on a date component that would create single-month partitions.

Gap calculation produces NULL for the first row — expected 0

Cause: The first row in a partition has no predecessor — LAG returns NULL. When computing gap = date - LAG(date), the first row's gap is NULL. If downstream logic treats NULL as 0 (e.g., a WHERE gap > 5 filter excludes NULL rows), first orders are silently excluded from analysis.

Fix: Use the default parameter of LAG to substitute a value for the first row: LAG(order_date, 1, order_date) OVER (...) returns the current date when no prior row exists — making the first row's gap = 0. Alternatively, use COALESCE: COALESCE(order_date - LAG(order_date) OVER (...), 0). Choose the default based on semantics: 0 days gap for first orders (they have no wait), or keep NULL to explicitly mark first orders and handle them separately in downstream logic.

Islands-and-gaps island number resets to 1 mid-sequence — unexpected new island

Cause: The is_island_start flag is incorrectly set to 1 for rows that should be in the same island. The gap condition (gap > threshold) is triggering for rows that are actually consecutive. Common causes: the ORDER BY column has duplicate values causing non-deterministic ordering, the gap threshold is too tight, or the date arithmetic is wrong (e.g., comparing dates when the column is a TIMESTAMP with time components that differ).

Fix: Verify the gap values: SELECT store_id, active_date, active_date - LAG(active_date) OVER (PARTITION BY store_id ORDER BY active_date) AS gap FROM active_days ORDER BY store_id, active_date. Confirm the gap values match expectations. If dates are TIMESTAMP type, cast to DATE first: active_date::DATE - LAG(active_date::DATE) OVER (...) to avoid sub-day gaps from time components. Add a secondary ORDER BY tiebreaker to resolve ties in the primary sort column.

Funnel drop-off rate exceeds 100% — more users at step N+1 than at step N

Cause: The funnel steps are not properly defined as sequential subsets. A user counted at step 3 was not counted at step 2 — meaning step 3's population is not a subset of step 2's. This happens when steps use different filters that are not strictly nested (e.g., step 2 counts orders in January but step 3 counts all delivered orders regardless of month).

Fix: Ensure each funnel step's population is a strict subset of the previous step. In SQL, this means joining each step to the prior step: users at step 3 must have also completed step 2. Use EXISTS or IN to enforce the sequence: step 3 count = COUNT(DISTINCT user_id) WHERE completed_step3 AND user_id IN (step2_users) AND user_id IN (step1_users). For event-based funnels, filter each step to only users who completed all prior steps before counting — this ensures the funnel is strictly sequential and each step can only have fewer or equal users compared to the prior step.

Try It Yourself

Build a store performance trend report using LAG and LEAD. For each store and each day they had delivered orders, show: store_id, city, order_date, daily_revenue, the previous day's revenue for this store (prev_day_revenue — NULL if no previous day), day-over-day change (dod_change), day-over-day percentage change (dod_pct, rounded to 1dp), the next day's revenue for this store (next_day_revenue), a 3-day trailing average excluding the current day (trailing_3day_avg — use ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING), and a performance label: 'Best day' if daily_revenue = MAX for that store, 'Growing' if dod_change > 0, 'Declining' if dod_change < 0, 'First day' if prev_day_revenue IS NULL, 'Flat' otherwise. Use a CTE for daily aggregation. Sort by store_id then order_date.

🎯 Key Takeaways

  • LAG(col, n, default) returns the value n rows before the current row within the window. LEAD(col, n, default) returns n rows after. Both require ORDER BY — without it, before/after are undefined.
  • LAG replaces every self-join for adjacent-row comparison. The window (PARTITION BY + ORDER BY) defines what adjacent means — previous order for the same customer, previous day for the same store.
  • Period-over-period pattern: aggregate to period granularity in a CTE (DATE_TRUNC), then apply LAG(revenue) OVER (ORDER BY month_start) in the outer SELECT. MoM change = revenue - LAG(revenue). Growth rate uses NULLIF on the denominator.
  • For per-entity PoP (per store, per product), add PARTITION BY entity_id to the OVER clause — LAG looks at the previous period for the same entity, not the globally previous row.
  • Churn signal: compute gap_days = order_date - LAG(order_date), then compare latest_gap to avg_gap per customer. latest_gap > avg_gap * 1.5 is at risk; > 2x is likely churned.
  • First-order detection: LAG(order_id) IS NULL flags the first order per customer (no predecessor). LEAD(order_id) IS NULL flags the last order (no successor).
  • Islands and gaps: (1) LAG to compute gap, (2) flag gap > threshold as island_start = 1, (3) SUM(island_start) OVER (...) cumulative gives each island a unique number, (4) GROUP BY island_num to summarise runs.
  • Funnel drop-off: define step counts in a CTE, then LAG(users) OVER (ORDER BY step_num) gives the prior step count. Drop-off rate = (LAG - current) / LAG * 100. Overall conversion uses FIRST_VALUE.
  • Trailing average excluding current row: ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING. This frame ends one row before the current row — the average is from prior rows only, a true trailing baseline.
  • Always pre-aggregate to the correct granularity in a CTE before applying LAG/LEAD. Window functions on raw row-level data when you need daily or monthly values produce one result per raw row, not per period.

What comes next

In Module 56, you learn Recursive CTEs — querying hierarchical data like org charts, category trees, and bill-of-materials without fixed-depth self-joins, using the WITH RECURSIVE pattern.

Module 56 → Recursive CTEs
Share

Discussion

0

Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.

Continue with GitHub
Loading...