Python · SQL · Web Dev · Java · AI/ML tracks launching soon — your one platform for all of IT
SQL — Module 39Intermediate
UNION, INTERSECT, and EXCEPT
Combine result sets vertically — stack rows from multiple queries, find common rows, subtract one set from another, and every rule about column matching, deduplication, and ordering
Every SQL operation you have learned — JOINs, subqueries, aggregates — combines data horizontally, adding columns. Set operations combine data vertically, stacking rows from multiple queries on top of each other. The result has the same columns as each input query but more (or fewer) rows.
SQL provides three set operations, each from classical set theory:
UNION / UNION ALL
Returns
All rows from both queries combined
Deduplicates
UNION: yes. UNION ALL: no
Think of it as
A ∪ B — all rows from either set
Use for
Combining data from multiple sources, combining multiple segments, stacking historical periods
INTERSECT
Returns
Only rows that appear in BOTH queries
Deduplicates
Yes — always
Think of it as
A ∩ B — rows in common between both sets
Use for
Finding rows that satisfy two independent conditions simultaneously
EXCEPT / MINUS
Returns
Rows from the first query that do NOT appear in the second
Deduplicates
Yes — always
Think of it as
A − B — rows in the first set but not the second
Use for
Subtracting one result set from another, finding what changed between two periods
The fundamental rule — columns must match
All set operations require the same number of columns in the same order with compatible data types across all queries being combined. Column names in the result come from the first query.
Set operation column matching rule
-- VALID: same column count, compatible types
SELECT customer_id, first_name, city FROM customers
UNION ALL
SELECT customer_id, first_name, city FROM former_customers;
-- INVALID: different column counts
SELECT customer_id, first_name FROM customers
UNION ALL
SELECT customer_id, first_name, city FROM former_customers;
-- ERROR: each SELECT must have the same number of columns
-- Column names come from the FIRST query:
SELECT customer_id AS id, first_name AS name FROM customers
UNION ALL
SELECT employee_id, first_name FROM employees;
-- Result columns are named: id, name (from the first SELECT)
// Part 02
UNION vs UNION ALL — Deduplication Cost
UNION removes duplicate rows — any row that appears in both queries appears only once in the result. UNION ALL keeps every row from both queries, including duplicates. UNION ALL is almost always the right choice when you know the result sets are disjoint (no overlapping rows) — it is significantly faster because it skips the deduplication sort.
UNION vs UNION ALL — the difference
-- UNION: deduplicates by sorting and comparing all rows
-- If the same row appears in both queries, it appears once in the result
-- Cost: O(n log n) — requires sorting all rows to find duplicates
-- UNION ALL: no deduplication — just concatenates both result sets
-- If the same row appears in both queries, it appears twice in the result
-- Cost: O(n) — just append rows
-- When to use UNION (with deduplication):
-- When the two queries might return the same row and you want each row once
-- Example: combining active and inactive customer lists where an ID could be in both
-- When to use UNION ALL (without deduplication):
-- When the two queries return disjoint sets (no possible overlap)
-- When you intentionally want duplicates (counting occurrences)
-- When performance matters — UNION ALL is always faster than UNION
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 03
UNION ALL — The Workhorse of Vertical Combination
UNION ALL is the most used set operation in production SQL. It stacks rows from multiple queries without deduplication — exactly what you need when combining disjoint data sources, multiple time periods, or labelled segments.
Combining multiple customer segments into one list
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Combining data from multiple periods
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Combining different tables with compatible structures
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 04
UNION for Report Subtotals — Combining Detail and Summary
A classic UNION ALL pattern in reporting: combine detail rows with summary rows. The result looks like a hierarchical report — individual records at the top (or bottom) with a total row appended.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
💡 Note
The UNION ALL + totals pattern is a manual alternative to ROLLUP (covered in Module 28). ROLLUP is more concise for standard hierarchical totals. UNION ALL gives more control when you need custom labels, different formatting, or non-standard aggregation levels.
// Part 05
INTERSECT — Finding Common Rows
INTERSECT returns only rows that appear in both result sets. It is the SQL equivalent of set intersection — A ∩ B. Rows unique to either side are excluded. INTERSECT always deduplicates.
Customers who have placed both delivered AND cancelled orders
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Products sold in multiple specific stores
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
INTERSECT vs INNER JOIN — when each is appropriate
INTERSECT vs INNER JOIN — different tools
-- INTERSECT: find customer_ids that appear in BOTH queries
-- Works on the result sets — finds common values across two queries
SELECT customer_id FROM orders WHERE order_status = 'Delivered'
INTERSECT
SELECT customer_id FROM orders WHERE order_status = 'Cancelled';
-- Equivalent using EXISTS (more flexible — can filter differently):
SELECT DISTINCT o1.customer_id
FROM orders AS o1
WHERE o1.order_status = 'Delivered'
AND EXISTS (
SELECT 1 FROM orders AS o2
WHERE o2.customer_id = o1.customer_id
AND o2.order_status = 'Cancelled'
);
-- INNER JOIN: combines columns from two different tables
-- Completely different purpose — adds columns, not finds common rows
-- Do not use INNER JOIN where INTERSECT is appropriate
// Part 06
EXCEPT — Subtracting One Set From Another
EXCEPT (called MINUS in Oracle and some other databases) returns rows from the first query that do not appear in the second query. It is the set difference — A − B. Rows unique to the second query are discarded. EXCEPT always deduplicates.
Customers who ordered but never cancelled
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Products sold in January but not in February
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
EXCEPT vs NOT EXISTS vs LEFT JOIN IS NULL
Three ways to find set difference — comparison
-- Goal: customer_ids in delivered orders but not in cancelled orders
-- EXCEPT (cleanest — set operation on result sets):
SELECT customer_id FROM orders WHERE order_status = 'Delivered'
EXCEPT
SELECT customer_id FROM orders WHERE order_status = 'Cancelled';
-- NOT EXISTS (most flexible — any conditions in subquery):
SELECT DISTINCT o.customer_id
FROM orders AS o
WHERE o.order_status = 'Delivered'
AND NOT EXISTS (
SELECT 1 FROM orders AS o2
WHERE o2.customer_id = o.customer_id
AND o2.order_status = 'Cancelled'
);
-- LEFT JOIN IS NULL (when you need right-side columns too):
SELECT DISTINCT o1.customer_id
FROM orders AS o1
LEFT JOIN orders AS o2
ON o1.customer_id = o2.customer_id
AND o2.order_status = 'Cancelled'
WHERE o1.order_status = 'Delivered'
AND o2.order_id IS NULL;
-- All three return the same customer_ids
-- EXCEPT: most concise for pure set difference on complete queries
-- NOT EXISTS: more flexible, handles complex multi-table conditions
-- LEFT JOIN IS NULL: use when right-side columns are needed
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 07
Ordering and LIMIT With Set Operations
ORDER BY and LIMIT apply to the entire result of the set operation — not to individual queries. They must appear after the final query in the set operation chain. Attempting to add ORDER BY to an individual query within a UNION causes an error in most databases.
ORDER BY and LIMIT with set operations
-- CORRECT: ORDER BY after the final query
SELECT customer_id, 'Delivered' AS type FROM orders WHERE order_status = 'Delivered'
UNION ALL
SELECT customer_id, 'Cancelled' AS type FROM orders WHERE order_status = 'Cancelled'
ORDER BY customer_id, type; -- applies to the full combined result
-- CORRECT: LIMIT after ORDER BY
SELECT city FROM customers
UNION
SELECT city FROM stores
ORDER BY city
LIMIT 5;
-- WRONG: ORDER BY inside an individual query
SELECT customer_id FROM orders WHERE order_status = 'Delivered'
ORDER BY customer_id -- ERROR in most databases
UNION ALL
SELECT customer_id FROM orders WHERE order_status = 'Cancelled';
-- WORKAROUND: wrap individual queries in subqueries if you need separate ordering
-- (rarely needed — usually ORDER BY on the final result is what you want)
SELECT * FROM (
SELECT customer_id, total_amount FROM orders WHERE order_status = 'Delivered'
ORDER BY total_amount DESC LIMIT 5
) AS top_delivered
UNION ALL
SELECT * FROM (
SELECT customer_id, total_amount FROM orders WHERE order_status = 'Cancelled'
ORDER BY total_amount DESC LIMIT 5
) AS top_cancelled;
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 08
Practical Patterns — Set Operations in Production Analytics
Full UNION pipeline report — all order statuses in one view
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Change detection — what changed between two snapshots
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Finding symmetric difference with EXCEPT + UNION ALL
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
// Part 09
What This Looks Like at Work
You are a data engineer at Square. Every morning at 7 AM, the reconciliation job produces a report comparing yesterday's transaction data across three systems: the primary OLTP database, the data warehouse, and the partner bank's settlement file. Each source must agree. You use EXCEPT and UNION ALL to identify discrepancies.
7:00 AM
Reconciliation job starts
Three sources need to agree: primary DB transactions, warehouse transactions, settlement file transactions. Adapted for FreshCart: compare ST001 orders, ST002 orders, and a combined view.
7:05 AM
Step 1 — find order IDs in one source but not the other
EXCEPT identifies order_ids present in one system but missing from another.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
7:20 AM
Step 2 — full reconciliation summary
UNION ALL combines: matched records, ST001-only records, and ST002-only records into a complete reconciliation view.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
7:35 AM
Reconciliation report delivered
The UNION ALL of three result sets — matched, source-A-only, source-B-only — produces a complete reconciliation report. Any row with "⚠" status routes to the investigation queue. The report runs in under 5 seconds and is automatically emailed to the operations team.
🎯 Pro Tip
Reconciliation reports are one of the highest-value use cases for UNION ALL + EXCEPT. The pattern: (1) define each source as a CTE, (2) EXCEPT to find rows exclusive to each source, (3) UNION ALL to combine matched and exclusive rows with a status label. This architecture handles any number of sources — add one more CTE and one more UNION ALL block.
// Part 10
Interview Prep — 5 Questions With Complete Answers
Q: What is the difference between UNION and UNION ALL?
Both UNION and UNION ALL combine the result sets of two or more queries by stacking rows vertically. The difference is deduplication. UNION removes duplicate rows — if the same row appears in both queries, it appears only once in the final result. UNION ALL keeps all rows from all queries, including duplicates — if the same row appears in both queries, it appears twice.
The performance implication is significant. UNION must sort or hash all rows to identify and remove duplicates — O(n log n) work. UNION ALL simply appends result sets — O(n) work. UNION ALL is always faster than UNION. On large result sets, this difference can be substantial.
When to use each: use UNION ALL when you know the result sets are disjoint (no possible overlapping rows) — different time periods, different status values, different segments. Use UNION when the same row could appear in multiple queries and you want each unique row once — combining a current customer list with a historical list where some customers appear in both. The common mistake is defaulting to UNION (with deduplication) when UNION ALL is correct and faster. Always ask: "can the same row appear in both queries?" If yes and you want it once, use UNION. If yes and you want it multiple times, or if it is impossible for the same row to appear twice, use UNION ALL.
Q: What requirements must be met for a UNION or set operation to be valid?
Three requirements must be satisfied for any set operation (UNION, UNION ALL, INTERSECT, EXCEPT) to execute successfully. First: all queries must return the same number of columns. A SELECT with 3 columns cannot be combined with a SELECT with 4 columns — the column count must match exactly across all queries in the set operation.
Second: the corresponding columns must have compatible data types. Column 1 of query A must be compatible with column 1 of query B — both numeric, both text, both date, etc. Incompatible types (combining a DATE column with an INTEGER column in the same position) cause a type error. Most databases will attempt implicit type casting for closely related types (INTEGER and DECIMAL are usually compatible) but will error on clearly incompatible combinations (DATE and VARCHAR).
Third: ORDER BY and LIMIT must appear after the final query and apply to the entire combined result — not to individual queries within the set operation. Adding ORDER BY to an individual query within UNION causes a syntax error in most databases (PostgreSQL, MySQL 8+). If you need to sort or limit individual queries before combining, wrap each query in a subquery: SELECT * FROM (SELECT ... ORDER BY ... LIMIT n) AS q1 UNION ALL SELECT * FROM (SELECT ... ORDER BY ... LIMIT n) AS q2. The column names in the result come from the first query — if query A has column named "customer_name" and query B has column named "emp_name", the result column is named "customer_name" throughout.
Q: When would you use INTERSECT instead of a JOIN?
INTERSECT finds rows that appear in both result sets simultaneously — it operates on complete result rows, not on join keys. It is the natural choice when you want to find entities that satisfy two independent conditions that can be expressed as separate queries, and you want the common elements.
INTERSECT is appropriate when: the two queries are logically independent (neither is a "lookup" of the other), both queries naturally return the same columns, and you want the rows that appear in both. Example: customer_ids who placed delivered orders INTERSECT customer_ids who placed cancelled orders — these are two independent queries against the same table, and INTERSECT cleanly finds customers who appear in both.
JOIN is appropriate when you want to combine columns from different tables based on a relationship — horizontally expanding the result. INTERSECT is a vertical set operation — it does not add columns, it finds common rows. A JOIN of customers to orders adds order columns to customer rows. An INTERSECT of delivered customer_ids and cancelled customer_ids finds customer_ids that appear in both sets — no additional columns. When to prefer INTERSECT over a JOIN + filter: when both conditions are expressed on the same table with different WHERE clauses (same column, different values), INTERSECT is more direct. When you need columns from both result sets, EXISTS or JOIN is more appropriate than INTERSECT because INTERSECT does not give you access to the second query's specific rows.
Q: What is the difference between EXCEPT and NOT IN / NOT EXISTS for finding set differences?
All three find rows in one set that are absent from another, but they work at different levels of abstraction and have different capabilities. EXCEPT operates on complete result sets — it compares entire rows across two queries and returns rows from the first query that are not present in the second. It always deduplicates. It works cleanly when both queries return the same columns naturally.
NOT IN operates on a single column — it filters outer rows where a specific column's value does not appear in a list returned by the subquery. NOT IN fails silently when the subquery contains NULL values. NOT EXISTS is a correlated subquery that checks row-by-row whether a related record exists — it is NULL-safe and can handle complex multi-column or multi-table conditions in the subquery.
Choose based on the problem structure: EXCEPT when both sides are naturally expressed as full queries returning the same columns — "customer_ids who ordered in January" EXCEPT "customer_ids who ordered in February" is a natural set difference. NOT EXISTS when the anti-join condition is complex — involves multiple tables, multiple conditions, or you need flexibility beyond what two parallel queries provide. NOT IN when the subquery is simple, small, and guaranteed NULL-free. EXCEPT is cleanest for comparing two entire queries; NOT EXISTS is most flexible for row-level anti-join with complex conditions. Performance: all three produce anti-join plans in the optimiser for most databases — performance is typically equivalent when proper indexes exist.
Q: How do set operations handle NULL values?
Set operations treat NULL values as equal to each other for the purpose of deduplication and comparison — unlike standard SQL WHERE comparisons where NULL ≠ NULL. This is a specific exception to NULL's usual semantics and it is important for understanding UNION, INTERSECT, and EXCEPT behaviour.
In UNION (with deduplication): if both queries return a row where a specific column is NULL, the NULL row appears only once in the result — the database treats the two NULL values as "equal" for deduplication. In INTERSECT: a row with NULL in one query can match a row with NULL in the other query — NULLs are considered equal. In EXCEPT: a row with NULL in the first query is considered matched by a row with NULL in the second query and is therefore excluded from the EXCEPT result.
This NULL-equality behaviour in set operations is defined by the SQL standard and implemented consistently in PostgreSQL, MySQL 8+, and SQL Server. It differs from WHERE NULL = NULL (which evaluates to NULL, not TRUE) and is closer to IS NOT DISTINCT FROM semantics. Practical implication: if you are using EXCEPT to find "new" rows and some rows have NULL keys, those NULL-key rows from the first query will be considered matched by NULL-key rows from the second query and excluded from EXCEPT — which is usually the desired behaviour. When you specifically need NULL to not match NULL in a set operation context, you need a more complex approach using NOT EXISTS with IS NULL checks.
// Part 11
Errors You Will Hit — And Exactly Why They Happen
ERROR: each UNION query must have the same number of columns
Cause: The two queries being combined with UNION, UNION ALL, INTERSECT, or EXCEPT return different numbers of columns. Set operations require exactly the same column count in all queries. This commonly happens when adding a new column to one query in a set operation without updating the others, or when copying a query template and forgetting to match the column count.
Fix: Count the columns in each SELECT clause and ensure they match. Add NULL AS column_name placeholder columns to queries that have fewer columns: SELECT customer_id, first_name, NULL AS city FROM employees UNION ALL SELECT customer_id, first_name, city FROM customers. Each NULL placeholder occupies a column position and is cast to the appropriate type from the other query's column. Alternatively, remove columns from the query that has more until they match.
UNION is very slow — takes 30 seconds for queries that individually take under 1 second
Cause: UNION (without ALL) performs deduplication by sorting all combined rows — O(n log n) work even when the result sets are already disjoint and contain no duplicates. The sort is expensive proportional to the total number of rows across all combined queries. With large result sets, this deduplication sort becomes the dominant cost.
Fix: Use UNION ALL instead of UNION if the result sets are disjoint (which they often are when you are combining queries with mutually exclusive WHERE conditions). UNION ALL skips the deduplication sort entirely — O(n) instead of O(n log n). Only use UNION (with deduplication) when the same row could genuinely appear in multiple queries and you need each row once. Verify the result sets are disjoint by confirming their WHERE conditions are mutually exclusive: WHERE status = 'Active' and WHERE status = 'Inactive' cannot produce the same row — use UNION ALL.
ERROR: ORDER BY clause not allowed at this point — inside a UNION query
Cause: An ORDER BY was placed inside one of the individual queries within a UNION/UNION ALL rather than after the final query. ORDER BY inside a UNION member query is not allowed in standard SQL (PostgreSQL, MySQL 8+) because it would apply only to that query's result before the set operation combines them — which is undefined behaviour since the set operation does not preserve ordering anyway.
Fix: Move ORDER BY to after the last query in the set operation: SELECT ... UNION ALL SELECT ... ORDER BY column. This applies ORDER BY to the entire combined result. If you need to sort and limit individual queries before combining them (for example, top-5 from each source), wrap each query in a subquery: SELECT * FROM (SELECT ... ORDER BY col LIMIT 5) AS q1 UNION ALL SELECT * FROM (SELECT ... ORDER BY col LIMIT 5) AS q2. The subquery wrapping allows ORDER BY and LIMIT to apply to each part independently.
EXCEPT returns zero rows — expected to find differences between two result sets
Cause: The two queries passed to EXCEPT return identical result sets, so EXCEPT eliminates all rows from the first query. Or the column types or values differ slightly between the two queries (different precision, different string formatting, leading/trailing whitespace) preventing rows from being considered equal — so EXCEPT does not eliminate any rows from the first query and returns more rows than expected.
Fix: Run each query independently and compare the results visually. Check column types — EXCEPT considers rows identical only when all column values match including type. A DECIMAL(10,2) value of 5.00 may not match an INTEGER value of 5 in an EXCEPT comparison. Ensure consistent data types and formatting across both queries. Use ROUND() or CAST() to normalise numeric precision before EXCEPT. For string columns, use TRIM() and LOWER() to normalise whitespace and case before comparing.
INTERSECT returns fewer rows than expected — some common rows are missing
Cause: The rows that should be common between the two queries are not considered identical by INTERSECT because of type or value differences in one or more columns. INTERSECT compares entire rows — all columns must match exactly. If column 3 has a different value or type in the two queries (even by one decimal place, trailing space, or NULL vs empty string), the row is not considered a match.
Fix: Simplify by reducing the columns being compared — only include the columns that define the common identity. SELECT customer_id FROM query_a INTERSECT SELECT customer_id FROM query_b is more reliable than comparing all columns. If you need multi-column INTERSECT, ensure consistent formatting: use the same ROUND() precision, the same TRIM() and LOWER() normalisation, and the same NULL handling (COALESCE to a default value) across both queries. Use FULL OUTER JOIN instead of INTERSECT for debugging — it shows matched and unmatched rows with all column values visible.
Try It Yourself
Write three queries demonstrating all three set operations: (1) UNION ALL — produce a combined 'activity feed' showing the 5 most recent delivered orders and the 5 most expensive products, with columns: id (order_id or product_id), name (a description), amount (total_amount or unit_price), and type ('Order' or 'Product'). (2) INTERSECT — find store_ids that appear in BOTH delivered orders AND have at least one employee assigned. Show just the store_id. (3) EXCEPT — find product categories that appear in the products table but have NOT appeared in any delivered order's order_items. Show just the category name, sorted alphabetically.
🎯 Key Takeaways
✓Set operations combine query results vertically — stacking rows. JOINs combine horizontally — adding columns. They solve different problems.
✓Three set operations: UNION (all rows, deduplicates), INTERSECT (common rows only), EXCEPT (first minus second). All require matching column count and compatible types.
✓UNION ALL skips deduplication — O(n) vs UNION's O(n log n). Default to UNION ALL when result sets are disjoint or duplicates are acceptable. Only use UNION when you need deduplication across sources.
✓Column names in the result come from the first query. All subsequent queries contribute rows but their column names are ignored.
✓ORDER BY and LIMIT apply to the entire combined result and must appear after the final query. To sort/limit individual queries, wrap each in a subquery.
✓INTERSECT finds rows that satisfy two independent conditions — entities that appear in both result sets. NULL = NULL for set operation comparisons (unlike standard WHERE).
✓EXCEPT returns rows from the first query not present in the second. Equivalent to NOT EXISTS and LEFT JOIN IS NULL for set differences — choose based on complexity of conditions.
✓Set operations always deduplicate (except UNION ALL). UNION deduplicates across both queries. INTERSECT and EXCEPT deduplicate by definition.
✓Symmetric difference (rows unique to either side): (A EXCEPT B) UNION ALL (B EXCEPT A). This finds everything that does not overlap.
✓Reconciliation reports are the primary production use case: EXCEPT to find missing records, UNION ALL to combine matched and unmatched sets with status labels into one complete reconciliation view.
What comes next
In Module 40, you learn derived tables in depth — subqueries in FROM as building blocks for multi-step analytics, the rules for aliasing and referencing, and when a derived table beats a CTE.