HAVING
Filter groups after aggregation — the clause that completes WHERE + GROUP BY + HAVING, with real business examples of every pattern you will use in production analytics
// Part 01
The Problem HAVING Solves
You now know WHERE (filters rows) and GROUP BY (groups rows for aggregation). Together they answer most analytical questions. But there is a category of question they cannot answer alone:
All four questions require filtering groups based on aggregate values — not individual rows. WHERE cannot do this because WHERE runs before groups are formed, so aggregate values do not exist yet. HAVING is the answer: it runs after GROUP BY, when groups and their aggregates are fully computed, and filters which groups appear in the final result.
// Part 02
Basic HAVING Syntax
Stores with more than 3 delivered orders
Categories with average price above ₹100
Customers who have spent more than ₹1,000
// Part 03
HAVING with COUNT — Frequency Filters
Filtering by COUNT is the most common HAVING pattern. It answers questions about minimum activity thresholds — customers with at least N orders, products ordered at least N times, stores processing at least N transactions.
// Part 04
HAVING with SUM and AVG — Value Thresholds
// Part 05
HAVING with Multiple Conditions
HAVING supports the same AND, OR, and NOT logic as WHERE. You can combine multiple aggregate conditions to define complex group-level filters.
// Part 06
WHERE + GROUP BY + HAVING Together — The Full Pattern
In production analytics queries, WHERE, GROUP BY, and HAVING all appear together. Each does its distinct job at its distinct stage of execution.
// Part 07
HAVING Without GROUP BY — Rare But Valid
HAVING can be used without GROUP BY. In this case, the entire table (or filtered rows) is treated as one group. HAVING then decides whether that single group appears in the result. This is rarely useful in practice — a WHERE condition on a scalar subquery usually achieves the same result more clearly — but it is syntactically valid.
// Part 08
HAVING vs WHERE — Performance and Correctness
Understanding when to use WHERE versus HAVING is both a correctness issue and a performance issue. Using HAVING where WHERE should be used is one of the most common SQL performance mistakes.
Correctness — when only HAVING works
HAVING is required when the filter condition references an aggregate function. WHERE cannot reference aggregates — they do not exist at the WHERE stage. This is a hard requirement, not a preference.
Performance — when WHERE is better than HAVING
For conditions on non-aggregate columns, always use WHERE instead of HAVING. WHERE filters rows before grouping — fewer rows means less grouping work. HAVING filters after grouping — all the grouping work is done before the filter is applied.
| Condition type | Use | Why |
|---|---|---|
| Filter on individual column value | WHERE | Runs before grouping — reduces rows GROUP BY must process |
| Filter on aggregate result (COUNT, SUM, AVG) | HAVING | Aggregates do not exist until after GROUP BY — WHERE cannot access them |
| Filter on GROUP BY column value | WHERE (preferred) | Technically works in HAVING but WHERE is faster — filter before grouping |
| Filter combining row and aggregate conditions | BOTH | Row conditions in WHERE, aggregate conditions in HAVING |
// Part 09
Advanced HAVING — Filtering on Computed Expressions
HAVING can filter on any aggregate expression — not just raw aggregates. This enables sophisticated group-level filters that express complex business rules.
Filter on a ratio or percentage
Filter comparing two aggregates
Filter using HAVING with DISTINCT count ratio
// Part 10
HAVING in Subqueries and CTEs
HAVING queries are frequently used as subqueries or CTEs — first filtering groups with HAVING, then using that filtered result as input for further queries.
HAVING result as a subquery
HAVING in a CTE
// Part 11
What This Looks Like at Work
You are a data analyst at Stripe, India's largest payment gateway. The risk team has asked you to identify merchant accounts showing unusual activity patterns that may indicate fraud or policy violations. They have given you three criteria that define "flagged" merchants.
🎯 Pro Tip
HAVING is the right tool whenever the business question is of the form "show me groups WHERE the aggregate is X." The moment you find yourself wanting to filter by a count, a sum, an average, or any other aggregate result, that is HAVING. The mental model: WHERE is for individual records (rows), HAVING is for groups (buckets). Write WHERE first, GROUP BY next, then add HAVING for any remaining aggregate filters.
// Part 12
Interview Prep — 5 Questions With Complete Answers
HAVING is a filter clause that operates on groups after GROUP BY has formed them and after aggregate functions have been computed. WHERE is a filter clause that operates on individual rows before GROUP BY processes them. Both filter data, but they work at different stages of query execution and on different things.
The distinction follows directly from SQL's execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. WHERE runs at step 2, before groups exist. HAVING runs at step 4, after groups and their aggregates are fully computed. This means WHERE can only reference raw column values — the aggregate functions have not run yet. HAVING can reference aggregate results — the groups exist and their metrics are known.
Concrete example: to find stores with more than 5 delivered orders, you need WHERE order_status = 'Delivered' to filter individual orders (a row-level condition), GROUP BY store_id to group them by store, and HAVING COUNT(*) > 5 to keep only groups with more than 5 rows. Using WHERE COUNT(*) > 5 is a syntax error — WHERE cannot reference aggregates. Using HAVING order_status = 'Delivered' instead of WHERE works technically but is inefficient — it groups all orders including cancelled ones, then discards the non-Delivered groups after the grouping work is done. Always put row-level conditions in WHERE and aggregate conditions in HAVING.
Yes — HAVING without GROUP BY is syntactically valid in SQL. When HAVING appears without GROUP BY, the entire table (or the rows surviving the WHERE filter) is treated as a single group. HAVING then decides whether that single group appears in the result. If the HAVING condition is TRUE for the aggregate over all rows, the query returns one summary row. If FALSE, the query returns zero rows.
Example: SELECT COUNT(*), SUM(total_amount) FROM orders HAVING COUNT(*) > 100 returns the count and sum if there are more than 100 orders, and returns nothing if there are 100 or fewer. This is technically correct SQL.
In practice, HAVING without GROUP BY is rarely used or recommended. The same result can almost always be expressed more clearly with a subquery or WHERE clause. SELECT COUNT(*), SUM(total_amount) FROM orders WHERE (SELECT COUNT(*) FROM orders) > 100 or putting the condition in application code after checking the count. HAVING without GROUP BY exists in the SQL standard and databases support it, but you will rarely see it in production code and it can confuse readers who expect HAVING to accompany GROUP BY.
This requires two aggregate conditions on the same group — COUNT of orders and AVG of order values — so both belong in HAVING. The WHERE clause filters to only delivered orders (a row-level condition that should happen before grouping for efficiency).
The query: SELECT c.customer_id, c.first_name || ' ' || c.last_name AS customer, COUNT(o.order_id) AS order_count, ROUND(AVG(o.total_amount), 2) AS avg_order_value FROM customers AS c JOIN orders AS o ON c.customer_id = o.customer_id WHERE o.order_status = 'Delivered' GROUP BY c.customer_id, c.first_name, c.last_name HAVING COUNT(o.order_id) > 2 AND AVG(o.total_amount) > 500 ORDER BY avg_order_value DESC.
The reasoning for each clause: JOIN brings orders alongside customer data. WHERE filters to delivered orders before grouping — this is a row-level condition. GROUP BY customer_id (plus name columns per the non-aggregate rule) creates one group per customer. HAVING COUNT > 2 filters to customers with more than two orders. HAVING AVG > 500 filters to customers whose average order exceeds ₹500. Both HAVING conditions must be satisfied simultaneously (AND). ORDER BY sorts by average order value descending — highest-value customers first. This query pattern — filter by count AND filter by aggregate value — is one of the most common HAVING patterns in customer analytics.
Using HAVING for a condition that could be expressed in WHERE is a performance mistake that causes unnecessary work. HAVING filters groups after GROUP BY has already processed all rows. If a non-aggregate condition belongs in WHERE but is mistakenly placed in HAVING, the database groups all rows — including those that would be excluded by the condition — before discarding them at the HAVING stage. All the grouping computation for those excluded rows is wasted.
Concrete example with scale: a table with 10 million orders, 2 million of which are 'Delivered'. HAVING order_status = 'Delivered' after GROUP BY processes all 10 million rows and forms groups for all statuses, then discards non-Delivered groups. WHERE order_status = 'Delivered' filters first — GROUP BY processes only 2 million rows. The WHERE version does 80% less grouping work on this data distribution.
The rule: move any condition expressible on individual row values from HAVING to WHERE. Only conditions requiring aggregate function results belong in HAVING. A quick test: if the condition does not contain COUNT, SUM, AVG, MIN, MAX, or any aggregate function, it should be in WHERE. If it does contain an aggregate, it must be in HAVING. Following this rule consistently keeps GROUP BY queries at their most efficient regardless of table size.
This requires comparing a per-product aggregate (how many orders contain this product) against an overall aggregate (total number of orders). It is a HAVING filter where the threshold is itself a computed value — not a hardcoded number.
Approach 1 — subquery in HAVING: SELECT product_id, COUNT(DISTINCT order_id) AS order_count FROM order_items GROUP BY product_id HAVING COUNT(DISTINCT order_id) > (SELECT COUNT(*) * 0.5 FROM orders). The subquery (SELECT COUNT(*) * 0.5 FROM orders) computes half the total order count and is evaluated once. HAVING then compares each product's order count against this threshold.
Approach 2 — window function (more elegant): SELECT product_id, COUNT(DISTINCT order_id) AS order_count, COUNT(DISTINCT order_id) * 1.0 / COUNT(DISTINCT order_id) OVER () AS fraction_of_orders FROM order_items GROUP BY product_id HAVING COUNT(DISTINCT order_id) * 1.0 / (SELECT COUNT(*) FROM orders) > 0.5. The subquery approach is simpler and more universally supported. In practice, "appears in more than X% of all orders" is called market basket penetration rate — a key metric in retail analytics used to identify anchor products that drive traffic. Products with very high penetration (>50%) are typically staples like milk, bread, or cooking oil that customers purchase on almost every visit.
// Part 13
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓HAVING filters groups after GROUP BY has formed them and aggregates have been computed. WHERE filters individual rows before GROUP BY. Same concept — different execution stage, different purpose.
- ✓HAVING is required when the filter condition references an aggregate function (COUNT, SUM, AVG, MIN, MAX). WHERE cannot reference aggregates — they do not exist at the WHERE stage.
- ✓Always put non-aggregate conditions in WHERE, not HAVING. WHERE filters rows before grouping — less work for GROUP BY. HAVING filters after all grouping is done — wasted work on excluded groups.
- ✓Full execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. HAVING sees aggregate results; WHERE does not.
- ✓HAVING supports AND, OR, NOT — the same logical operators as WHERE. Multiple aggregate conditions can be combined: HAVING COUNT(*) > 5 AND AVG(amount) > 1000.
- ✓HAVING can filter on computed aggregate expressions: ratios (SUM/COUNT), comparisons between aggregates (MAX > MIN * 2), and percentage thresholds.
- ✓SELECT aliases cannot be used in HAVING — HAVING runs before SELECT. Repeat the full aggregate expression in HAVING, or use a CTE to define the alias first.
- ✓HAVING without GROUP BY is valid — treats all rows as one group. Rarely needed in practice; a subquery or application logic usually serves the same purpose more clearly.
- ✓HAVING results work as subqueries and CTEs — filter groups first with HAVING, then use the result for further joins, aggregations, or reporting.
- ✓The WHERE + GROUP BY + HAVING triad is the foundation of every analytics query. WHERE selects the rows, GROUP BY defines the dimensions, HAVING filters the metrics.
What comes next
In Module 30, you learn JOINs — the most powerful concept in relational SQL. How to combine data from multiple tables, the difference between INNER, LEFT, RIGHT, and FULL OUTER JOINs, and why joins are the foundation of every non-trivial query.
Module 30 → Introduction to JOINsDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.