Python · SQL · Web Dev · Java · AI/ML tracks launching soon — your one platform for all of IT
Intermediate

HAVING

Filter groups after aggregation — the clause that completes WHERE + GROUP BY + HAVING, with real business examples of every pattern you will use in production analytics

10–14 min April 2026
Section 6 · Aggregation
Aggregation · 3 modulesModule 29

// Part 01

The Problem HAVING Solves

You now know WHERE (filters rows) and GROUP BY (groups rows for aggregation). Together they answer most analytical questions. But there is a category of question they cannot answer alone:

Q1Which stores have processed more than 5 delivered orders?
Q2Which product categories have average price above ₹100?
Q3Which customers have spent more than ₹2,000 in total?
Q4Which payment methods account for more than 20% of revenue?

All four questions require filtering groups based on aggregate values — not individual rows. WHERE cannot do this because WHERE runs before groups are formed, so aggregate values do not exist yet. HAVING is the answer: it runs after GROUP BY, when groups and their aggregates are fully computed, and filters which groups appear in the final result.

WHERE vs HAVING — the fundamental difference
-- WHERE: filters individual rows BEFORE grouping
-- Cannot use aggregate functions here
WHERE order_status = 'Delivered'          -- ✓ row-level value
WHERE COUNT(*) > 5                        -- ✗ aggregate — does not exist yet

-- HAVING: filters groups AFTER grouping and aggregation
-- Can use aggregate functions here
HAVING COUNT(*) > 5                       -- ✓ group-level aggregate
HAVING SUM(total_amount) > 1000           -- ✓ group-level aggregate
HAVING AVG(unit_price) > 100              -- ✓ group-level aggregate
HAVING order_status = 'Delivered'         -- ✗ works but belongs in WHERE (inefficient)

// Part 02

Basic HAVING Syntax

HAVING syntax — position in the full query
SELECT   column, aggregate_function
FROM     table
WHERE    row_level_condition        -- step 2: filter rows
GROUP BY column                    -- step 3: form groups
HAVING   group_level_condition     -- step 4: filter groups
ORDER BY column                    -- step 5: sort results
LIMIT    n;                        -- step 6: truncate

Stores with more than 3 delivered orders

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Categories with average price above ₹100

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Customers who have spent more than ₹1,000

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 03

HAVING with COUNT — Frequency Filters

Filtering by COUNT is the most common HAVING pattern. It answers questions about minimum activity thresholds — customers with at least N orders, products ordered at least N times, stores processing at least N transactions.

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 04

HAVING with SUM and AVG — Value Thresholds

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 05

HAVING with Multiple Conditions

HAVING supports the same AND, OR, and NOT logic as WHERE. You can combine multiple aggregate conditions to define complex group-level filters.

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 06

WHERE + GROUP BY + HAVING Together — The Full Pattern

In production analytics queries, WHERE, GROUP BY, and HAVING all appear together. Each does its distinct job at its distinct stage of execution.

The three-clause filter pattern
SELECT   store_id, COUNT(*), SUM(total_amount)
FROM     orders
WHERE    order_status = 'Delivered'    -- (1) row filter: only delivered orders
  AND    order_date >= '2024-01-01'   -- (2) row filter: only 2024 orders
GROUP BY store_id                      -- (3) group: one bucket per store
HAVING   COUNT(*) >= 3                 -- (4) group filter: stores with 3+ orders
  AND    SUM(total_amount) > 1500      -- (5) group filter: revenue threshold
ORDER BY SUM(total_amount) DESC;       -- (6) sort by revenue
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 07

HAVING Without GROUP BY — Rare But Valid

HAVING can be used without GROUP BY. In this case, the entire table (or filtered rows) is treated as one group. HAVING then decides whether that single group appears in the result. This is rarely useful in practice — a WHERE condition on a scalar subquery usually achieves the same result more clearly — but it is syntactically valid.

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
💡 Note
HAVING without GROUP BY is mostly a curiosity. The more common and clearer pattern for "only return results if the aggregate condition is met" is to use a subquery or CTE. In practice, if you find yourself writing HAVING without GROUP BY, ask whether the business question is better served by a different query structure.

// Part 08

HAVING vs WHERE — Performance and Correctness

Understanding when to use WHERE versus HAVING is both a correctness issue and a performance issue. Using HAVING where WHERE should be used is one of the most common SQL performance mistakes.

Correctness — when only HAVING works

HAVING is required when the filter condition references an aggregate function. WHERE cannot reference aggregates — they do not exist at the WHERE stage. This is a hard requirement, not a preference.

Only HAVING works — aggregate conditions
-- These REQUIRE HAVING (cannot use WHERE):
HAVING COUNT(*) > 5                     -- aggregate: row count per group
HAVING SUM(total_amount) > 1000         -- aggregate: sum per group
HAVING AVG(unit_price) > 100            -- aggregate: average per group
HAVING MAX(salary) > MIN(salary) * 2    -- aggregate: comparison of aggregates

Performance — when WHERE is better than HAVING

For conditions on non-aggregate columns, always use WHERE instead of HAVING. WHERE filters rows before grouping — fewer rows means less grouping work. HAVING filters after grouping — all the grouping work is done before the filter is applied.

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Condition typeUseWhy
Filter on individual column valueWHERERuns before grouping — reduces rows GROUP BY must process
Filter on aggregate result (COUNT, SUM, AVG)HAVINGAggregates do not exist until after GROUP BY — WHERE cannot access them
Filter on GROUP BY column valueWHERE (preferred)Technically works in HAVING but WHERE is faster — filter before grouping
Filter combining row and aggregate conditionsBOTHRow conditions in WHERE, aggregate conditions in HAVING

// Part 09

Advanced HAVING — Filtering on Computed Expressions

HAVING can filter on any aggregate expression — not just raw aggregates. This enables sophisticated group-level filters that express complex business rules.

Filter on a ratio or percentage

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Filter comparing two aggregates

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Filter using HAVING with DISTINCT count ratio

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 10

HAVING in Subqueries and CTEs

HAVING queries are frequently used as subqueries or CTEs — first filtering groups with HAVING, then using that filtered result as input for further queries.

HAVING result as a subquery

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

HAVING in a CTE

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 11

What This Looks Like at Work

You are a data analyst at Stripe, India's largest payment gateway. The risk team has asked you to identify merchant accounts showing unusual activity patterns that may indicate fraud or policy violations. They have given you three criteria that define "flagged" merchants.

3:00 PM
Risk team criteria arrive
Flag any merchant where: (1) average transaction value exceeds ₹5,000 AND total transaction count is below 10 (low volume, high value — unusual), (2) OR cancellation rate exceeds 30% of all transactions, (3) OR they have only 1 distinct payment method used (legitimate merchants typically use multiple methods). Adapted for FreshCart: flag stores matching analogous patterns.
3:20 PM
You build the query
Three OR conditions on aggregate values — this is a HAVING query with multiple conditions.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
3:50 PM
Results reviewed
The query returns the flagged stores with the flag reason labelled via CASE WHEN. The risk team immediately has an actionable list. Two stores show high cancellation rates — these are escalated for review. The pattern-detection query is saved to the shared analytics library and scheduled to run weekly.

🎯 Pro Tip

HAVING is the right tool whenever the business question is of the form "show me groups WHERE the aggregate is X." The moment you find yourself wanting to filter by a count, a sum, an average, or any other aggregate result, that is HAVING. The mental model: WHERE is for individual records (rows), HAVING is for groups (buckets). Write WHERE first, GROUP BY next, then add HAVING for any remaining aggregate filters.

// Part 12

Interview Prep — 5 Questions With Complete Answers

Q: What is HAVING and how does it differ from WHERE?

HAVING is a filter clause that operates on groups after GROUP BY has formed them and after aggregate functions have been computed. WHERE is a filter clause that operates on individual rows before GROUP BY processes them. Both filter data, but they work at different stages of query execution and on different things.

The distinction follows directly from SQL's execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. WHERE runs at step 2, before groups exist. HAVING runs at step 4, after groups and their aggregates are fully computed. This means WHERE can only reference raw column values — the aggregate functions have not run yet. HAVING can reference aggregate results — the groups exist and their metrics are known.

Concrete example: to find stores with more than 5 delivered orders, you need WHERE order_status = 'Delivered' to filter individual orders (a row-level condition), GROUP BY store_id to group them by store, and HAVING COUNT(*) > 5 to keep only groups with more than 5 rows. Using WHERE COUNT(*) > 5 is a syntax error — WHERE cannot reference aggregates. Using HAVING order_status = 'Delivered' instead of WHERE works technically but is inefficient — it groups all orders including cancelled ones, then discards the non-Delivered groups after the grouping work is done. Always put row-level conditions in WHERE and aggregate conditions in HAVING.

Q: Can you use HAVING without GROUP BY?

Yes — HAVING without GROUP BY is syntactically valid in SQL. When HAVING appears without GROUP BY, the entire table (or the rows surviving the WHERE filter) is treated as a single group. HAVING then decides whether that single group appears in the result. If the HAVING condition is TRUE for the aggregate over all rows, the query returns one summary row. If FALSE, the query returns zero rows.

Example: SELECT COUNT(*), SUM(total_amount) FROM orders HAVING COUNT(*) > 100 returns the count and sum if there are more than 100 orders, and returns nothing if there are 100 or fewer. This is technically correct SQL.

In practice, HAVING without GROUP BY is rarely used or recommended. The same result can almost always be expressed more clearly with a subquery or WHERE clause. SELECT COUNT(*), SUM(total_amount) FROM orders WHERE (SELECT COUNT(*) FROM orders) > 100 or putting the condition in application code after checking the count. HAVING without GROUP BY exists in the SQL standard and databases support it, but you will rarely see it in production code and it can confuse readers who expect HAVING to accompany GROUP BY.

Q: Write a query to find customers who have placed more than 2 orders with an average order value above ₹500.

This requires two aggregate conditions on the same group — COUNT of orders and AVG of order values — so both belong in HAVING. The WHERE clause filters to only delivered orders (a row-level condition that should happen before grouping for efficiency).

The query: SELECT c.customer_id, c.first_name || ' ' || c.last_name AS customer, COUNT(o.order_id) AS order_count, ROUND(AVG(o.total_amount), 2) AS avg_order_value FROM customers AS c JOIN orders AS o ON c.customer_id = o.customer_id WHERE o.order_status = 'Delivered' GROUP BY c.customer_id, c.first_name, c.last_name HAVING COUNT(o.order_id) > 2 AND AVG(o.total_amount) > 500 ORDER BY avg_order_value DESC.

The reasoning for each clause: JOIN brings orders alongside customer data. WHERE filters to delivered orders before grouping — this is a row-level condition. GROUP BY customer_id (plus name columns per the non-aggregate rule) creates one group per customer. HAVING COUNT > 2 filters to customers with more than two orders. HAVING AVG > 500 filters to customers whose average order exceeds ₹500. Both HAVING conditions must be satisfied simultaneously (AND). ORDER BY sorts by average order value descending — highest-value customers first. This query pattern — filter by count AND filter by aggregate value — is one of the most common HAVING patterns in customer analytics.

Q: What is the performance implication of using HAVING instead of WHERE for non-aggregate conditions?

Using HAVING for a condition that could be expressed in WHERE is a performance mistake that causes unnecessary work. HAVING filters groups after GROUP BY has already processed all rows. If a non-aggregate condition belongs in WHERE but is mistakenly placed in HAVING, the database groups all rows — including those that would be excluded by the condition — before discarding them at the HAVING stage. All the grouping computation for those excluded rows is wasted.

Concrete example with scale: a table with 10 million orders, 2 million of which are 'Delivered'. HAVING order_status = 'Delivered' after GROUP BY processes all 10 million rows and forms groups for all statuses, then discards non-Delivered groups. WHERE order_status = 'Delivered' filters first — GROUP BY processes only 2 million rows. The WHERE version does 80% less grouping work on this data distribution.

The rule: move any condition expressible on individual row values from HAVING to WHERE. Only conditions requiring aggregate function results belong in HAVING. A quick test: if the condition does not contain COUNT, SUM, AVG, MIN, MAX, or any aggregate function, it should be in WHERE. If it does contain an aggregate, it must be in HAVING. Following this rule consistently keeps GROUP BY queries at their most efficient regardless of table size.

Q: How would you find products that appear in more than half of all orders?

This requires comparing a per-product aggregate (how many orders contain this product) against an overall aggregate (total number of orders). It is a HAVING filter where the threshold is itself a computed value — not a hardcoded number.

Approach 1 — subquery in HAVING: SELECT product_id, COUNT(DISTINCT order_id) AS order_count FROM order_items GROUP BY product_id HAVING COUNT(DISTINCT order_id) > (SELECT COUNT(*) * 0.5 FROM orders). The subquery (SELECT COUNT(*) * 0.5 FROM orders) computes half the total order count and is evaluated once. HAVING then compares each product's order count against this threshold.

Approach 2 — window function (more elegant): SELECT product_id, COUNT(DISTINCT order_id) AS order_count, COUNT(DISTINCT order_id) * 1.0 / COUNT(DISTINCT order_id) OVER () AS fraction_of_orders FROM order_items GROUP BY product_id HAVING COUNT(DISTINCT order_id) * 1.0 / (SELECT COUNT(*) FROM orders) > 0.5. The subquery approach is simpler and more universally supported. In practice, "appears in more than X% of all orders" is called market basket penetration rate — a key metric in retail analytics used to identify anchor products that drive traffic. Products with very high penetration (>50%) are typically staples like milk, bread, or cooking oil that customers purchase on almost every visit.

// Part 13

Errors You Will Hit — And Exactly Why They Happen

ERROR: aggregate functions are not allowed in WHERE

Cause: You tried to use COUNT, SUM, AVG, MIN, or MAX inside a WHERE clause. WHERE runs before GROUP BY in the execution order — aggregate values do not exist yet at the WHERE stage. The database cannot evaluate an aggregate condition on individual rows because aggregates are computed across groups of rows, not single rows.

Fix: Move the aggregate condition to HAVING. HAVING runs after GROUP BY when groups and their aggregates are fully computed. WHERE COUNT(*) > 5 becomes HAVING COUNT(*) > 5. If there is no GROUP BY in your query, add one: GROUP BY the relevant column, then add HAVING. If you genuinely want to filter based on a whole-table aggregate (not per-group), use a subquery in WHERE: WHERE column > (SELECT AVG(column) FROM table).

HAVING clause returns no rows — expected results

Cause: The HAVING condition is too restrictive — no group satisfies it. This happens when the threshold in HAVING is higher than any group's aggregate value, when the WHERE clause already filtered too aggressively leaving insufficient rows in any group, or when the GROUP BY columns create too many small groups (each with a low count) and the HAVING COUNT threshold is never reached.

Fix: Debug by temporarily removing the HAVING clause and examining what the aggregates actually are: SELECT store_id, COUNT(*) AS cnt FROM orders GROUP BY store_id ORDER BY cnt DESC. See the maximum COUNT value across all groups. If the maximum is 3 and your HAVING COUNT(*) > 10 was expecting results, the data does not have groups that large. Adjust the HAVING threshold to match realistic data distributions, or check whether the WHERE clause is too restrictive.

HAVING filters correctly but query is very slow on a large table

Cause: A non-aggregate condition is in HAVING instead of WHERE — forcing the database to group all rows before filtering. For example, HAVING order_status = 'Delivered' instead of WHERE order_status = 'Delivered'. The database must group all rows (including Cancelled, Returned, Processing) into their groups, compute aggregates for all groups, and only then discard the non-Delivered groups. All the grouping work for non-Delivered rows is wasted.

Fix: Move any condition on non-aggregate columns from HAVING to WHERE. Scan the HAVING clause: if any condition does not reference an aggregate function (COUNT, SUM, AVG, MIN, MAX), move it to WHERE. This pre-filters rows before GROUP BY, dramatically reducing the number of rows that need to be grouped. Use EXPLAIN ANALYZE to compare execution plans before and after — the filter-before-group plan will show significantly lower row counts at the GROUP BY node.

HAVING with SELECT alias fails — column 'order_count' does not exist in HAVING

Cause: You tried to use a SELECT alias in HAVING: SELECT COUNT(*) AS order_count ... HAVING order_count > 5. HAVING runs before SELECT in the logical execution order — the alias 'order_count' does not exist yet when HAVING evaluates. This is the same reason SELECT aliases cannot be used in WHERE.

Fix: Repeat the aggregate expression in HAVING: HAVING COUNT(*) > 5. Do not reference the alias. In DuckDB and MySQL, using SELECT aliases in HAVING sometimes works due to implementation-specific extensions, but relying on this is not portable to PostgreSQL or standard SQL. For complex expressions repeated in both SELECT and HAVING, use a CTE: WITH agg AS (SELECT store_id, COUNT(*) AS cnt FROM orders GROUP BY store_id) SELECT store_id, cnt FROM agg WHERE cnt > 5 — the outer query can reference the alias because it is now a proper column in the CTE result.

Query returns wrong counts — HAVING COUNT(*) includes rows that should be excluded

Cause: A condition that should be in WHERE is absent, causing rows that should be excluded to still participate in grouping and be counted by the HAVING aggregate. For example, counting all order statuses when only 'Delivered' orders should count — HAVING COUNT(*) > 3 counts every order, not just delivered ones, so a store with 2 delivered and 2 cancelled orders passes the threshold incorrectly.

Fix: Add the appropriate WHERE condition: WHERE order_status = 'Delivered'. This excludes non-Delivered rows before grouping, so COUNT(*) in HAVING only counts the rows you care about. Alternatively, use conditional counting in HAVING: HAVING SUM(CASE WHEN order_status = 'Delivered' THEN 1 ELSE 0 END) > 3 — this counts only Delivered rows within each group without filtering other rows from the group. Use conditional counting when you need both the filtered count and other metrics (like total orders including all statuses) in the same query.

Try It Yourself

The FreshCart loyalty team needs to identify customers for a targeted re-engagement campaign. Write a query that finds customers who meet ALL of these criteria: placed at least 2 delivered orders, have an average delivered order value between ₹300 and ₹800 (not too cheap, not already high-value), and their most recent delivered order was before '2024-03-01' (lapsed customers). Show customer_id, full name, city, loyalty_tier, order_count, avg_order_value (rounded to 2 decimal places), and last_order_date. Sort by last_order_date ascending (longest lapsed first).

🎯 Key Takeaways

  • HAVING filters groups after GROUP BY has formed them and aggregates have been computed. WHERE filters individual rows before GROUP BY. Same concept — different execution stage, different purpose.
  • HAVING is required when the filter condition references an aggregate function (COUNT, SUM, AVG, MIN, MAX). WHERE cannot reference aggregates — they do not exist at the WHERE stage.
  • Always put non-aggregate conditions in WHERE, not HAVING. WHERE filters rows before grouping — less work for GROUP BY. HAVING filters after all grouping is done — wasted work on excluded groups.
  • Full execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. HAVING sees aggregate results; WHERE does not.
  • HAVING supports AND, OR, NOT — the same logical operators as WHERE. Multiple aggregate conditions can be combined: HAVING COUNT(*) > 5 AND AVG(amount) > 1000.
  • HAVING can filter on computed aggregate expressions: ratios (SUM/COUNT), comparisons between aggregates (MAX > MIN * 2), and percentage thresholds.
  • SELECT aliases cannot be used in HAVING — HAVING runs before SELECT. Repeat the full aggregate expression in HAVING, or use a CTE to define the alias first.
  • HAVING without GROUP BY is valid — treats all rows as one group. Rarely needed in practice; a subquery or application logic usually serves the same purpose more clearly.
  • HAVING results work as subqueries and CTEs — filter groups first with HAVING, then use the result for further joins, aggregations, or reporting.
  • The WHERE + GROUP BY + HAVING triad is the foundation of every analytics query. WHERE selects the rows, GROUP BY defines the dimensions, HAVING filters the metrics.

What comes next

In Module 30, you learn JOINs — the most powerful concept in relational SQL. How to combine data from multiple tables, the difference between INNER, LEFT, RIGHT, and FULL OUTER JOINs, and why joins are the foundation of every non-trivial query.

Module 30 → Introduction to JOINs
Share

Discussion

0

Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.

Continue with GitHub
Loading...