Aggregate Functions
COUNT, SUM, AVG, MIN, MAX — turn raw rows into business metrics, understand NULL behaviour, combine with DISTINCT, and build the analytics queries that power every dashboard
// Part 01
From Rows to Metrics — What Aggregation Does
Every query you have written so far returns individual rows — one row per customer, one row per order, one row per product. Aggregate functions change this completely. They collapse many rows into a single computed value — the count of all orders, the total revenue for a month, the average order value, the most expensive product.
This is the moment SQL becomes the language of analytics. Every dashboard metric, every KPI report, every business intelligence query is built on aggregate functions. How many active users? What is our daily revenue? What is the average delivery time? Which product has the highest margin? All of these are aggregate questions — and aggregate functions answer them.
SQL provides five core aggregate functions:
// Part 02
The Five Aggregate Functions — Reference
// Part 03
COUNT — The Most Used Aggregate
COUNT has three distinct forms that answer three different questions. Understanding the difference is essential — they produce different results on the same data.
COUNT(*) — count every row
COUNT(column) — count non-NULL values
COUNT(DISTINCT column) — count unique values
// Part 04
SUM — Totalling Numeric Values
SUM with expressions
// Part 05
AVG — Calculating Averages
AVG and NULL — the silent exclusion
AVG ignores NULL values — it computes the average only over non-NULL rows. This means AVG gives the average of known values, not the average including unknowns treated as zero. In most cases this is correct, but it can surprise you when NULLs are meaningful.
// Part 06
MIN and MAX — Finding Extremes
MIN and MAX on text — alphabetical extremes
MIN and MAX on dates — timeline extremes
// Part 07
Aggregates Without GROUP BY — Single-Row Results
Aggregate functions without GROUP BY collapse the entire table (or the filtered rows) into a single result row. This is the simplest form and produces one number that summarises the entire dataset.
// Part 08
NULL Behaviour — The Critical Detail
All five aggregate functions ignore NULL values — they compute only over non-NULL rows. This is usually the correct behaviour but produces surprising results in specific situations. You must understand this to interpret aggregate results correctly.
// Part 09
Aggregates with Expressions and CASE WHEN
Aggregate functions do not have to operate on raw columns. You can aggregate expressions — calculations, CASE WHEN results, and any SQL expression that produces a numeric value. This unlocks powerful analytical patterns.
SUM with CASE WHEN — conditional totals
COUNT with CASE WHEN — conditional counts
Aggregate over calculated values
// Part 10
Aggregates Across Multiple Tables — JOINs + Aggregates
The most powerful analytical queries combine JOINs (to bring data from multiple tables together) with aggregate functions (to summarise that combined data). This is the foundation of every real analytics report.
// Part 11
What This Looks Like at Work
You are an analyst at DoorDash. It is Monday morning and the weekly business review is in two hours. Your manager has sent three requests overnight: a summary of last week's GMV by city, the top 5 restaurants by order count, and the average delivery time by restaurant rating band. These are classic aggregate queries that you need to produce in under 30 minutes.
🎯 Pro Tip
Every recurring business report should be a saved, parameterised SQL query — not a one-off command. Before writing each query, ask: "Will someone need this next week?" If yes, write it cleanly with clear aliases, add a comment at the top describing what it does, and save it in your team's shared query library. The 5 minutes of cleanup pays back every time the query is reused.
// Part 12
Interview Prep — 5 Questions With Complete Answers
COUNT(*) counts every row in the result set, including rows where all columns are NULL. It answers "how many rows are there?" It is the correct function for counting the total number of records — orders, customers, transactions — regardless of their values.
COUNT(column) counts rows where the specified column is NOT NULL. If delivery_date is NULL for undelivered orders, COUNT(delivery_date) counts only the delivered orders. This answers "how many rows have a value in this column?" It is useful for understanding data completeness — how many records have a phone number, how many orders have been assigned a delivery date.
COUNT(DISTINCT column) counts the number of unique non-NULL values in the column. COUNT(DISTINCT customer_id) from the orders table counts how many distinct customers have placed at least one order — not how many total orders, and not including customers who have never ordered. This answers "how many unique values exist?" It is the cardinality question. A common analytical pattern combining all three: SELECT COUNT(*) AS total_orders, COUNT(delivery_date) AS delivered, COUNT(DISTINCT customer_id) AS unique_customers FROM orders — three different metrics from the same table in one query.
All five aggregate functions (COUNT, SUM, AVG, MIN, MAX) ignore NULL values — they operate only over non-NULL rows. The one exception is COUNT(*), which counts all rows regardless of NULL content. This behaviour is defined by the SQL standard and is consistent across all databases.
The practical consequence that surprises most people: AVG(column) divides by the count of non-NULL values, not by the total row count. If 20 orders exist and 5 have NULL delivery_date, AVG(delivery_date - order_date) divides the sum by 15 — not 20. This gives the average delivery time for delivered orders, which is usually what you want, but it can mislead if you expect the average to account for all orders.
When NULLs should be treated as a specific value rather than ignored, use COALESCE before the aggregate: AVG(COALESCE(delivery_date - order_date, 0)) treats undelivered orders as 0 days and divides by 20 instead of 15. SUM returns NULL (not 0) if all values in the column are NULL — COALESCE(SUM(column), 0) handles this. The rule: always be explicit about whether NULLs should be ignored (use the aggregate directly) or substituted (use COALESCE inside the aggregate) and document the choice in a SQL comment.
Yes — aggregate functions can be used without GROUP BY. When used without GROUP BY, they collapse the entire result set (or the entire filtered result set after WHERE) into a single row. SELECT COUNT(*), SUM(total_amount), AVG(total_amount) FROM orders returns exactly one row containing three numbers that summarise all orders.
Without GROUP BY, all rows that pass the WHERE filter are treated as one group. This is the simplest and most common form of aggregation — getting a single summary metric for a dataset. Total revenue for the month, average order value, the number of active customers, the highest-paid employee salary — all of these are single-row aggregate queries without GROUP BY.
The important rule: when aggregate functions are used without GROUP BY, you cannot include non-aggregated columns in SELECT. SELECT customer_id, COUNT(*) FROM orders without GROUP BY is an error — the database cannot return one row from COUNT(*) and simultaneously return a different customer_id for each row. Every column in SELECT must either be inside an aggregate function or be part of the GROUP BY clause. The only exception is window functions (covered in later modules), which compute aggregates alongside individual rows without collapsing them.
COUNT counts the number of rows (or non-NULL values, or distinct values). SUM adds up the numeric values in a column. They answer fundamentally different questions: COUNT answers "how many?" and SUM answers "how much total?"
Use COUNT when you want a quantity of records: how many orders were placed, how many customers signed up this month, how many products are in stock. Use SUM when you want a total of a numeric value: total revenue, total units sold, total salary budget. A single query often uses both: SELECT COUNT(*) AS order_count, SUM(total_amount) AS total_revenue FROM orders — count tells you the volume, sum tells you the value.
A common confusion: SUM(CASE WHEN condition THEN 1 ELSE 0 END) is equivalent to COUNT(CASE WHEN condition THEN 1 END) for conditional counting. Both count rows where the condition is true. The SUM version with ELSE 0 is slightly more readable for conditional aggregation alongside other aggregates, and it returns 0 (not NULL) when no rows match — COUNT of all NULLs returns 0 anyway, so the difference is subtle. The key insight: SUM of 1s and 0s is the same as COUNT of 1s. This is the foundation of the conditional aggregation pattern.
The delivery rate percentage requires two quantities: the count of delivered orders (the numerator) and the total count of all orders (the denominator). Divide numerator by denominator and multiply by 100. Several equivalent approaches exist in SQL.
Approach 1 — conditional SUM: ROUND(SUM(CASE WHEN order_status = 'Delivered' THEN 1.0 ELSE 0 END) / COUNT(*) * 100, 1) AS delivery_rate_pct. The 1.0 (not integer 1) forces decimal division. This is the most common and most readable approach for conditional percentages.
Approach 2 — AVG of a boolean expression: ROUND(AVG(CASE WHEN order_status = 'Delivered' THEN 1.0 ELSE 0 END) * 100, 1). AVG of 0s and 1s gives the proportion directly (between 0 and 1), then multiply by 100 for the percentage. This is mathematically identical to approach 1. Both produce the same result. The choice is style preference — approach 1 (SUM / COUNT) is more explicit about what the numerator and denominator are, which aids readability. Add a NULLIF to prevent division by zero when the table might be empty: SUM(...) / NULLIF(COUNT(*), 0) * 100.
// Part 13
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓Five core aggregate functions: COUNT (how many), SUM (total), AVG (mean), MIN (smallest), MAX (largest). They collapse many rows into a single computed value.
- ✓COUNT(*) counts all rows including NULLs. COUNT(column) counts non-NULL values only. COUNT(DISTINCT column) counts unique non-NULL values. Three different answers to three different questions.
- ✓All aggregates except COUNT(*) ignore NULL values. AVG divides by the count of non-NULL rows — not total rows. Use COALESCE inside the aggregate to treat NULLs as a specific value.
- ✓SUM returns NULL (not 0) when no rows match. Wrap in COALESCE(SUM(col), 0) to return 0 for empty result sets.
- ✓Aggregates without GROUP BY: collapse all rows into one result row. Every non-aggregate column in SELECT must be in GROUP BY — or the query errors.
- ✓SUM(CASE WHEN condition THEN 1 ELSE 0 END) counts rows matching the condition. SUM(CASE WHEN condition THEN amount ELSE 0 END) sums values for matching rows. This conditional aggregation pattern replaces multiple queries with one.
- ✓AVG(CASE WHEN condition THEN value END) without ELSE computes the average only for matching rows — non-matching rows return NULL which AVG ignores.
- ✓Combining JOINs with aggregates is the foundation of analytics: JOIN to bring data together, GROUP BY to define the groups, aggregates to summarise each group.
- ✓COUNT(DISTINCT col) can be slow on large tables — requires deduplication across all values. For frequent queries, consider pre-aggregating into a summary table.
- ✓Always alias aggregate results: COUNT(*) AS total_orders, not just COUNT(*). Application code and dashboards reference columns by name — unnamed aggregates produce unreadable or broken references.
What comes next
In Module 28, you learn GROUP BY — the clause that splits rows into groups so aggregate functions compute separately per group. This is where SQL analytics becomes truly powerful.
Module 28 → GROUP BYDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.