Python · SQL · Web Dev · Java · AI/ML tracks launching soon — your one platform for all of IT
Intermediate

Aggregate Functions

COUNT, SUM, AVG, MIN, MAX — turn raw rows into business metrics, understand NULL behaviour, combine with DISTINCT, and build the analytics queries that power every dashboard

12–16 min April 2026
Section 6 · Aggregation
Aggregation · 3 modulesModule 27

// Part 01

From Rows to Metrics — What Aggregation Does

Every query you have written so far returns individual rows — one row per customer, one row per order, one row per product. Aggregate functions change this completely. They collapse many rows into a single computed value — the count of all orders, the total revenue for a month, the average order value, the most expensive product.

This is the moment SQL becomes the language of analytics. Every dashboard metric, every KPI report, every business intelligence query is built on aggregate functions. How many active users? What is our daily revenue? What is the average delivery time? Which product has the highest margin? All of these are aggregate questions — and aggregate functions answer them.

SQL provides five core aggregate functions:

COUNT()
How many rows?
SUM()
What is the total?
AVG()
What is the average?
MIN()
What is the smallest?
MAX()
What is the largest?

// Part 02

The Five Aggregate Functions — Reference

COUNT()— count rows or non-null values

Syntax

COUNT(*) | COUNT(col) | COUNT(DISTINCT col)

NULL handling

COUNT(*) counts all rows including NULLs. COUNT(col) excludes NULL values.

Use for

Number of orders, customers, products, transactions

SUM()— total of numeric values

Syntax

SUM(column) | SUM(expression)

NULL handling

Ignores NULL values. SUM of all NULLs returns NULL (not 0).

Use for

Total revenue, total quantity sold, total salary budget

AVG()— arithmetic mean

Syntax

AVG(column) | AVG(expression)

NULL handling

Ignores NULL values. AVG over all NULLs returns NULL.

Use for

Average order value, average delivery days, average rating

MIN()— smallest value

Syntax

MIN(column)

NULL handling

Ignores NULL values. Works on numbers, text (alphabetical), and dates.

Use for

Cheapest product, first order date, lowest salary

MAX()— largest value

Syntax

MAX(column)

NULL handling

Ignores NULL values. Works on numbers, text (alphabetical), and dates.

Use for

Most expensive product, latest order date, highest salary

// Part 03

COUNT — The Most Used Aggregate

COUNT has three distinct forms that answer three different questions. Understanding the difference is essential — they produce different results on the same data.

COUNT(*) — count every row

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

COUNT(column) — count non-NULL values

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

COUNT(DISTINCT column) — count unique values

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 04

SUM — Totalling Numeric Values

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

SUM with expressions

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 05

AVG — Calculating Averages

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

AVG and NULL — the silent exclusion

AVG ignores NULL values — it computes the average only over non-NULL rows. This means AVG gives the average of known values, not the average including unknowns treated as zero. In most cases this is correct, but it can surprise you when NULLs are meaningful.

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 06

MIN and MAX — Finding Extremes

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

MIN and MAX on text — alphabetical extremes

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

MIN and MAX on dates — timeline extremes

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 07

Aggregates Without GROUP BY — Single-Row Results

Aggregate functions without GROUP BY collapse the entire table (or the filtered rows) into a single result row. This is the simplest form and produces one number that summarises the entire dataset.

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 08

NULL Behaviour — The Critical Detail

All five aggregate functions ignore NULL values — they compute only over non-NULL rows. This is usually the correct behaviour but produces surprising results in specific situations. You must understand this to interpret aggregate results correctly.

NULL behaviour of all five aggregates
-- Given these values: 10, 20, NULL, 30, NULL

COUNT(*) = 5          -- counts all rows including NULLs
COUNT(col) = 3        -- counts only non-NULL: 10, 20, 30
SUM(col) = 60         -- sums only non-NULL: 10+20+30
AVG(col) = 20         -- averages only non-NULL: 60/3 (NOT 60/5)
MIN(col) = 10         -- minimum of non-NULL values
MAX(col) = 30         -- maximum of non-NULL values

-- The dangerous case: AVG
-- If 2 out of 5 values are NULL, AVG divides by 3, not 5
-- This gives a HIGHER average than if NULLs were treated as 0
-- 60/3 = 20 vs 60/5 = 12 — very different answers

-- If NULLs should be treated as 0 for AVG:
AVG(COALESCE(col, 0)) = 12   -- treats NULL as 0, divides by 5
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
⚠️ Important
AVG(column) and AVG(COALESCE(column, 0)) answer fundamentally different business questions. The first asks "what is the average among rows that have this value?" The second asks "what is the average across all rows, treating missing values as zero?" Always be explicit about which question you are answering — and document it in a SQL comment.

// Part 09

Aggregates with Expressions and CASE WHEN

Aggregate functions do not have to operate on raw columns. You can aggregate expressions — calculations, CASE WHEN results, and any SQL expression that produces a numeric value. This unlocks powerful analytical patterns.

SUM with CASE WHEN — conditional totals

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

COUNT with CASE WHEN — conditional counts

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Aggregate over calculated values

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 10

Aggregates Across Multiple Tables — JOINs + Aggregates

The most powerful analytical queries combine JOINs (to bring data from multiple tables together) with aggregate functions (to summarise that combined data). This is the foundation of every real analytics report.

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 11

What This Looks Like at Work

You are an analyst at DoorDash. It is Monday morning and the weekly business review is in two hours. Your manager has sent three requests overnight: a summary of last week's GMV by city, the top 5 restaurants by order count, and the average delivery time by restaurant rating band. These are classic aggregate queries that you need to produce in under 30 minutes.

8:00 AM
Request 1 — GMV by city (adapted for FreshCart)
Adapted to FreshCart: total delivered revenue by store city for the most recent available data.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
8:15 AM
Request 2 — Top stores by order count
Top 5 stores by delivered order count with their revenue.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
8:30 AM
Request 3 — Delivery time by loyalty tier
Adapted: average delivery time grouped by customer loyalty tier.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
8:45 AM
All three reports done, 75 minutes before the meeting
Three queries, three results, delivered 75 minutes before the business review. The queries are clean, well-aliased, and reproducible — the manager can re-run them next week with no modifications and get updated numbers automatically.

🎯 Pro Tip

Every recurring business report should be a saved, parameterised SQL query — not a one-off command. Before writing each query, ask: "Will someone need this next week?" If yes, write it cleanly with clear aliases, add a comment at the top describing what it does, and save it in your team's shared query library. The 5 minutes of cleanup pays back every time the query is reused.

// Part 12

Interview Prep — 5 Questions With Complete Answers

Q: What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?

COUNT(*) counts every row in the result set, including rows where all columns are NULL. It answers "how many rows are there?" It is the correct function for counting the total number of records — orders, customers, transactions — regardless of their values.

COUNT(column) counts rows where the specified column is NOT NULL. If delivery_date is NULL for undelivered orders, COUNT(delivery_date) counts only the delivered orders. This answers "how many rows have a value in this column?" It is useful for understanding data completeness — how many records have a phone number, how many orders have been assigned a delivery date.

COUNT(DISTINCT column) counts the number of unique non-NULL values in the column. COUNT(DISTINCT customer_id) from the orders table counts how many distinct customers have placed at least one order — not how many total orders, and not including customers who have never ordered. This answers "how many unique values exist?" It is the cardinality question. A common analytical pattern combining all three: SELECT COUNT(*) AS total_orders, COUNT(delivery_date) AS delivered, COUNT(DISTINCT customer_id) AS unique_customers FROM orders — three different metrics from the same table in one query.

Q: How do aggregate functions handle NULL values?

All five aggregate functions (COUNT, SUM, AVG, MIN, MAX) ignore NULL values — they operate only over non-NULL rows. The one exception is COUNT(*), which counts all rows regardless of NULL content. This behaviour is defined by the SQL standard and is consistent across all databases.

The practical consequence that surprises most people: AVG(column) divides by the count of non-NULL values, not by the total row count. If 20 orders exist and 5 have NULL delivery_date, AVG(delivery_date - order_date) divides the sum by 15 — not 20. This gives the average delivery time for delivered orders, which is usually what you want, but it can mislead if you expect the average to account for all orders.

When NULLs should be treated as a specific value rather than ignored, use COALESCE before the aggregate: AVG(COALESCE(delivery_date - order_date, 0)) treats undelivered orders as 0 days and divides by 20 instead of 15. SUM returns NULL (not 0) if all values in the column are NULL — COALESCE(SUM(column), 0) handles this. The rule: always be explicit about whether NULLs should be ignored (use the aggregate directly) or substituted (use COALESCE inside the aggregate) and document the choice in a SQL comment.

Q: Can you use aggregate functions without GROUP BY? What do they return?

Yes — aggregate functions can be used without GROUP BY. When used without GROUP BY, they collapse the entire result set (or the entire filtered result set after WHERE) into a single row. SELECT COUNT(*), SUM(total_amount), AVG(total_amount) FROM orders returns exactly one row containing three numbers that summarise all orders.

Without GROUP BY, all rows that pass the WHERE filter are treated as one group. This is the simplest and most common form of aggregation — getting a single summary metric for a dataset. Total revenue for the month, average order value, the number of active customers, the highest-paid employee salary — all of these are single-row aggregate queries without GROUP BY.

The important rule: when aggregate functions are used without GROUP BY, you cannot include non-aggregated columns in SELECT. SELECT customer_id, COUNT(*) FROM orders without GROUP BY is an error — the database cannot return one row from COUNT(*) and simultaneously return a different customer_id for each row. Every column in SELECT must either be inside an aggregate function or be part of the GROUP BY clause. The only exception is window functions (covered in later modules), which compute aggregates alongside individual rows without collapsing them.

Q: What is the difference between SUM and COUNT and when would you use each?

COUNT counts the number of rows (or non-NULL values, or distinct values). SUM adds up the numeric values in a column. They answer fundamentally different questions: COUNT answers "how many?" and SUM answers "how much total?"

Use COUNT when you want a quantity of records: how many orders were placed, how many customers signed up this month, how many products are in stock. Use SUM when you want a total of a numeric value: total revenue, total units sold, total salary budget. A single query often uses both: SELECT COUNT(*) AS order_count, SUM(total_amount) AS total_revenue FROM orders — count tells you the volume, sum tells you the value.

A common confusion: SUM(CASE WHEN condition THEN 1 ELSE 0 END) is equivalent to COUNT(CASE WHEN condition THEN 1 END) for conditional counting. Both count rows where the condition is true. The SUM version with ELSE 0 is slightly more readable for conditional aggregation alongside other aggregates, and it returns 0 (not NULL) when no rows match — COUNT of all NULLs returns 0 anyway, so the difference is subtle. The key insight: SUM of 1s and 0s is the same as COUNT of 1s. This is the foundation of the conditional aggregation pattern.

Q: How would you calculate the percentage of orders delivered successfully?

The delivery rate percentage requires two quantities: the count of delivered orders (the numerator) and the total count of all orders (the denominator). Divide numerator by denominator and multiply by 100. Several equivalent approaches exist in SQL.

Approach 1 — conditional SUM: ROUND(SUM(CASE WHEN order_status = 'Delivered' THEN 1.0 ELSE 0 END) / COUNT(*) * 100, 1) AS delivery_rate_pct. The 1.0 (not integer 1) forces decimal division. This is the most common and most readable approach for conditional percentages.

Approach 2 — AVG of a boolean expression: ROUND(AVG(CASE WHEN order_status = 'Delivered' THEN 1.0 ELSE 0 END) * 100, 1). AVG of 0s and 1s gives the proportion directly (between 0 and 1), then multiply by 100 for the percentage. This is mathematically identical to approach 1. Both produce the same result. The choice is style preference — approach 1 (SUM / COUNT) is more explicit about what the numerator and denominator are, which aids readability. Add a NULLIF to prevent division by zero when the table might be empty: SUM(...) / NULLIF(COUNT(*), 0) * 100.

// Part 13

Errors You Will Hit — And Exactly Why They Happen

ERROR: column 'customer_id' must appear in GROUP BY or be used in an aggregate function

Cause: You included a non-aggregated column in SELECT alongside aggregate functions, without listing that column in GROUP BY. When aggregate functions are used, every column in SELECT must either be inside an aggregate function (COUNT, SUM, AVG, MIN, MAX) or be listed in GROUP BY. The database cannot return one summary row from COUNT(*) and simultaneously return a different customer_id per row.

Fix: Either add the column to GROUP BY: SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id. Or wrap the column in an aggregate function: SELECT MAX(customer_id), COUNT(*) FROM orders. Or remove the column from SELECT if it is not needed. The rule is: aggregate queries either return one summary row (no GROUP BY) or one row per group (with GROUP BY listing all non-aggregated columns).

SUM returns NULL instead of 0 — no rows matched the WHERE condition

Cause: SUM of zero rows returns NULL, not 0. When no rows match the WHERE condition, SUM has no values to add — and the result is NULL rather than the more intuitive 0. This happens when filtering by a date range with no data, a status with no matching rows, or a store with no orders.

Fix: Wrap SUM in COALESCE to replace NULL with 0: COALESCE(SUM(total_amount), 0) AS total_revenue. This handles the empty-set case cleanly. The same applies to AVG and other aggregates: COALESCE(AVG(total_amount), 0) AS avg_revenue. Note that COUNT(*) already returns 0 for empty sets (not NULL) — only SUM, AVG, MIN, and MAX return NULL when no rows match.

AVG gives a higher number than expected — some rows are being excluded

Cause: AVG ignores NULL values. If the column being averaged has NULLs (for example, delivery_date - order_date is NULL for undelivered orders), AVG only divides by the count of non-NULL rows. With 20 orders and 5 undelivered (NULL delivery_date), AVG divides by 15, not 20. The result is the average for the delivered subset only — which is higher than the average including all orders.

Fix: If you want NULLs included as a specific value: AVG(COALESCE(delivery_date - order_date, 0)) treats undelivered orders as 0 days. If you want the average of delivered orders only (NULL exclusion is correct): document this explicitly in a comment and verify the denominator: SELECT COUNT(*) AS total, COUNT(delivery_date) AS non_null_count to confirm how many rows AVG is dividing by. The key is to be intentional about which question you are answering.

COUNT(DISTINCT column) returns a larger number than expected

Cause: The column has more unique values than expected — possibly due to case sensitivity, whitespace, or formatting inconsistencies. 'Seattle' and 'bangalore' and 'Seattle ' (trailing space) are three distinct values in PostgreSQL's case-sensitive comparison. COUNT(DISTINCT city) returns 3 for what you might consider the same city.

Fix: Normalise the values before counting: COUNT(DISTINCT LOWER(TRIM(city))) counts distinct cities after converting to lowercase and removing whitespace — treating 'Seattle', 'bangalore', and 'Seattle ' as the same value. To investigate inconsistencies: SELECT DISTINCT city FROM customers ORDER BY city — examine the result for near-duplicates. Fix at the data level (UPDATE to canonical forms) for a permanent solution.

Aggregate result is wrong — CASE WHEN SUM returns unexpected values

Cause: The CASE WHEN expression inside SUM has a logic error — conditions are in the wrong order, a condition is missing, or the ELSE clause is absent (returning NULL instead of 0 for non-matching rows). SUM(CASE WHEN condition THEN amount END) sums NULL for non-matching rows, which is ignored by SUM — equivalent to ELSE 0. But SUM(CASE WHEN condition THEN amount ELSE amount END) incorrectly counts everything.

Fix: Always include an explicit ELSE 0 in SUM(CASE WHEN): SUM(CASE WHEN order_status = 'Delivered' THEN total_amount ELSE 0 END). This makes the intent clear and prevents NULL accumulation surprises. To debug, run the CASE WHEN as a SELECT column first — no GROUP BY, just the CASE expression per row — and verify that each row produces the expected value before wrapping in SUM.

Try It Yourself

Write a single query that produces the FreshCart monthly executive summary. Show: total orders, unique customers, total delivered revenue (rounded to 2 decimal places), average order value for delivered orders (rounded to 2 decimal places), the count of orders by status (delivered, cancelled, returned, processing as separate columns), percentage of orders delivered (rounded to 1 decimal place), and the most expensive single order. All from the orders table — no GROUP BY needed, one summary row.

🎯 Key Takeaways

  • Five core aggregate functions: COUNT (how many), SUM (total), AVG (mean), MIN (smallest), MAX (largest). They collapse many rows into a single computed value.
  • COUNT(*) counts all rows including NULLs. COUNT(column) counts non-NULL values only. COUNT(DISTINCT column) counts unique non-NULL values. Three different answers to three different questions.
  • All aggregates except COUNT(*) ignore NULL values. AVG divides by the count of non-NULL rows — not total rows. Use COALESCE inside the aggregate to treat NULLs as a specific value.
  • SUM returns NULL (not 0) when no rows match. Wrap in COALESCE(SUM(col), 0) to return 0 for empty result sets.
  • Aggregates without GROUP BY: collapse all rows into one result row. Every non-aggregate column in SELECT must be in GROUP BY — or the query errors.
  • SUM(CASE WHEN condition THEN 1 ELSE 0 END) counts rows matching the condition. SUM(CASE WHEN condition THEN amount ELSE 0 END) sums values for matching rows. This conditional aggregation pattern replaces multiple queries with one.
  • AVG(CASE WHEN condition THEN value END) without ELSE computes the average only for matching rows — non-matching rows return NULL which AVG ignores.
  • Combining JOINs with aggregates is the foundation of analytics: JOIN to bring data together, GROUP BY to define the groups, aggregates to summarise each group.
  • COUNT(DISTINCT col) can be slow on large tables — requires deduplication across all values. For frequent queries, consider pre-aggregating into a summary table.
  • Always alias aggregate results: COUNT(*) AS total_orders, not just COUNT(*). Application code and dashboards reference columns by name — unnamed aggregates produce unreadable or broken references.

What comes next

In Module 28, you learn GROUP BY — the clause that splits rows into groups so aggregate functions compute separately per group. This is where SQL analytics becomes truly powerful.

Module 28 → GROUP BY
Share

Discussion

0

Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.

Continue with GitHub
Loading...