SQL for Data Analysis
Real analytical patterns — revenue trends, customer segmentation, cohort analysis, and RFM scoring — applied to the FreshCart dataset
// Part 01
What Data Analysis Looks Like in SQL
Data analysis in SQL is not about learning new syntax — it is about applying the syntax you already know to answer real business questions. The patterns in this module are used daily by analysts at DoorDash, Instacart, HDFC, and every data-driven company: revenue breakdowns, customer cohorts, retention curves, and RFM segmentation.
We will use the FreshCart dataset — the same one in every playground in this course — so every query you run here is real and produces real results. By the end of this module you will have a toolkit of analytical query patterns you can adapt to any business dataset.
// Part 02
Revenue Analysis — Breakdowns and Trends
The first question any business asks is: where does our money come from? Revenue breakdowns by time, channel, city, and product category reveal which segments drive growth and which are declining. These queries combine GROUP BY with date functions and window functions for trend context.
Revenue by store and city
Revenue by payment method
Monthly revenue trend
Analyst insight
Month-over-month revenue trend tells you whether the business is growing. Running total tells you where you stand against annual targets. Combining both in one query (monthly + cumulative) is the standard format for a revenue dashboard. The window function SUM() OVER (ORDER BY month ROWS UNBOUNDED PRECEDING) computes the running total without a self-join.
// Part 03
Product Analysis — What Sells and What Does Not
Product analysis answers: which products drive the most revenue, which have the highest volume, and which are underperforming relative to their category. These insights drive inventory decisions, promotional strategy, and supplier negotiations.
// Part 04
Customer Segmentation — Who Are Your Best Customers
Customer segmentation divides your customer base into groups with similar characteristics or behaviour. The simplest and most powerful segmentation for e-commerce is loyalty tier analysis — understanding revenue concentration across Bronze, Silver, Gold, and Platinum customers.
Business insight
Loyalty tier analysis almost always reveals the 80/20 rule: Platinum and Gold customers (top 20%) typically generate 70-80% of revenue. This drives decisions like: should we invest more in retaining top-tier customers (cheaper than acquiring new ones) or in upgrading Silver customers to Gold (via targeted promotions)?
City-level customer distribution
// Part 05
RFM Segmentation — Recency, Frequency, Monetary
RFM is the most widely used customer segmentation framework in e-commerce. It scores each customer on three dimensions: Recency (how recently did they order?), Frequency (how often do they order?), Monetary (how much do they spend?). High scores on all three = your best customers. Low scores = at-risk or churned customers.
// Part 06
Cohort Analysis — Retention Over Time
A cohort is a group of customers who share a common characteristic — typically the month they placed their first order. Cohort analysis tracks what percentage of each cohort is still ordering in subsequent months. It is the gold standard for measuring whether customer retention is improving or declining over time.
Reading a cohort table
Each row is one acquisition cohort. M0 = customers who ordered in their first month (always 100% of cohort size). M1 = customers still ordering 1 month later. M2 = 2 months later. A healthy retention curve has M1/M0 above 40% for e-commerce. If M1/M0 is declining cohort-over-cohort, something changed in the customer experience or product quality.
// Part 07
Order Funnel and Conversion Analysis
A funnel tracks how orders move through stages: placed → confirmed → shipped → delivered. Drop-off at each stage reveals operational bottlenecks. High cancellation rate after placement = payment issues. High drop-off before delivery = logistics problems.
// Part 08
Basket Analysis — What Do Customers Buy Together
Basket analysis identifies which products are frequently purchased in the same order. This drives cross-sell recommendations ("customers who bought X also bought Y") and bundle promotions. It uses a self-join on order_items — each order is joined to itself to find co-occurring products.
// Part 09
The Complete FreshCart Business Report
A real business report combines multiple analytical angles into one document. Below is a comprehensive FreshCart Q-period performance report using window functions, CTEs, and aggregations — the kind of query a data analyst would write for a management review.
🎯 Key Takeaways
- ✓Revenue analysis = GROUP BY + SUM + window functions for share percentages and running totals. Always compute revenue_share_pct alongside absolute revenue.
- ✓Product analysis: rank products by revenue, identify zero-sellers with NOT EXISTS, compute category share with SUM() OVER () window total.
- ✓RFM segmentation scores customers on Recency (days since last order), Frequency (order count), Monetary (total spend). High RFM total = Champion; low = Lost or At Risk.
- ✓Cohort analysis: group customers by acquisition month, track how many are still ordering in M1, M2, M3. Declining M1/M0 ratio signals a retention problem.
- ✓Order funnel: COUNT by order_status to find drop-off. Add store and tier breakdowns to identify where cancellations concentrate.
- ✓Basket analysis: self-join order_items on order_id with product_id < product_id to find co-purchased products without duplicates.
- ✓The standard data analysis structure: 2-3 CTEs that prepare sub-aggregations, a final SELECT that joins them. Readable, composable, easy to extend.
- ✓julianday() in SQLite gives the number of days between two dates. strftime() formats dates for monthly grouping. These replace PostgreSQL's DATE_TRUNC and age() functions.
What comes next
In Module 61, you tackle the Top 50 SQL interview questions — every question that appears in data analyst and data engineer interviews, with complete answers and the traps interviewers set.
Module 61 → Top 50 SQL Interview QuestionsDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.