SQL — Module 39Intermediate

UNION, INTERSECT, and EXCEPT

Combine result sets vertically — stack rows from multiple queries, find common rows, subtract one set from another, and every rule about column matching, deduplication, and ordering

12–16 min April 2026

Section 8 · Subqueries & Set Operations

Subqueries & Set Operations · 5 modulesModule 39

Subqueries Correlated EXISTS UNION,Derived

// Part 01

Set Operations — Combining Queries Vertically

Every SQL operation you have learned — JOINs, subqueries, aggregates — combines data horizontally, adding columns. Set operations combine data vertically, stacking rows from multiple queries on top of each other. The result has the same columns as each input query but more (or fewer) rows.

SQL provides three set operations, each from classical set theory:

UNION / UNION ALL

Returns

All rows from both queries combined

Deduplicates

UNION: yes. UNION ALL: no

Think of it as

A ∪ B — all rows from either set

Use for

Combining data from multiple sources, combining multiple segments, stacking historical periods

INTERSECT

Returns

Only rows that appear in BOTH queries

Deduplicates

Yes — always

Think of it as

A ∩ B — rows in common between both sets

Use for

Finding rows that satisfy two independent conditions simultaneously

EXCEPT / MINUS

Returns

Rows from the first query that do NOT appear in the second

Deduplicates

Yes — always

Think of it as

A − B — rows in the first set but not the second

Use for

Subtracting one result set from another, finding what changed between two periods

The fundamental rule — columns must match

All set operations require the same number of columns in the same order with compatible data types across all queries being combined. Column names in the result come from the first query.

Set operation column matching rule

-- VALID: same column count, compatible types
SELECT customer_id, first_name, city FROM customers
UNION ALL
SELECT customer_id, first_name, city FROM former_customers;

-- INVALID: different column counts
SELECT customer_id, first_name FROM customers
UNION ALL
SELECT customer_id, first_name, city FROM former_customers;
-- ERROR: each SELECT must have the same number of columns

-- Column names come from the FIRST query:
SELECT customer_id AS id, first_name AS name FROM customers
UNION ALL
SELECT employee_id, first_name FROM employees;
-- Result columns are named: id, name (from the first SELECT)

// Part 02

UNION vs UNION ALL — Deduplication Cost

UNION removes duplicate rows — any row that appears in both queries appears only once in the result. UNION ALL keeps every row from both queries, including duplicates. UNION ALL is almost always the right choice when you know the result sets are disjoint (no overlapping rows) — it is significantly faster because it skips the deduplication sort.

UNION vs UNION ALL — the difference

-- UNION: deduplicates by sorting and comparing all rows
-- If the same row appears in both queries, it appears once in the result
-- Cost: O(n log n) — requires sorting all rows to find duplicates

-- UNION ALL: no deduplication — just concatenates both result sets
-- If the same row appears in both queries, it appears twice in the result
-- Cost: O(n) — just append rows

-- When to use UNION (with deduplication):
-- When the two queries might return the same row and you want each row once
-- Example: combining active and inactive customer lists where an ID could be in both

-- When to use UNION ALL (without deduplication):
-- When the two queries return disjoint sets (no possible overlap)
-- When you intentionally want duplicates (counting occurrences)
-- When performance matters — UNION ALL is always faster than UNION

Loading FreshCart DB…
-- UNION: deduplicates rows
-- If a customer has both delivered and cancelled orders,
-- their customer_id would only appear once
SELECT DISTINCT customer_id FROM orders WHERE order_status = 'Delivered'
UNION
SELECT DISTINCT customer_id FROM orders WHERE order_status = 'Cancelled'
ORDER BY customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- UNION ALL: keeps all rows including duplicates
-- Customers appearing in both segments appear twice
SELECT DISTINCT customer_id, 'Delivered' AS status_type
FROM orders WHERE order_status = 'Delivered'
UNION ALL
SELECT DISTINCT customer_id, 'Cancelled' AS status_type
FROM orders WHERE order_status = 'Cancelled'
ORDER BY customer_id, status_type;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 03

UNION ALL — The Workhorse of Vertical Combination

UNION ALL is the most used set operation in production SQL. It stacks rows from multiple queries without deduplication — exactly what you need when combining disjoint data sources, multiple time periods, or labelled segments.

Combining multiple customer segments into one list

Loading FreshCart DB…
-- Three customer segments combined into one ranked list
SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name  AS customer,
  c.loyalty_tier,
  'VIP'                               AS segment
FROM customers AS c
WHERE c.loyalty_tier = 'Platinum'

UNION ALL

SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name,
  c.loyalty_tier,
  'High Value'                        AS segment
FROM customers AS c
WHERE c.loyalty_tier = 'Gold'

UNION ALL

SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name,
  c.loyalty_tier,
  'Standard'                          AS segment
FROM customers AS c
WHERE c.loyalty_tier IN ('Silver', 'Bronze')

ORDER BY segment, loyalty_tier, customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Combining data from multiple periods

Loading FreshCart DB…
-- Stack January and February order summaries
SELECT
  'January 2024'                       AS period,
  store_id,
  COUNT(*)                             AS orders,
  ROUND(SUM(total_amount), 2)          AS revenue
FROM orders
WHERE order_status = 'Delivered'
  AND strftime('%m', order_date) = '01'
  AND strftime('%Y', order_date) = '2024'
GROUP BY store_id

UNION ALL

SELECT
  'February 2024'                      AS period,
  store_id,
  COUNT(*)                             AS orders,
  ROUND(SUM(total_amount), 2)          AS revenue
FROM orders
WHERE order_status = 'Delivered'
  AND strftime('%m', order_date) = '02'
  AND strftime('%Y', order_date) = '2024'
GROUP BY store_id

ORDER BY store_id, period;Ctrl + Enter to run
Loading FreshCart database in your browser…

Combining different tables with compatible structures

Loading FreshCart DB…
-- All people in the FreshCart system: customers and employees
-- Both have first_name and last_name — combine into one list
SELECT
  customer_id  AS person_id,
  first_name,
  last_name,
  city,
  'Customer'   AS person_type
FROM customers

UNION ALL

SELECT
  employee_id,
  first_name,
  last_name,
  NULL         AS city,      -- employees have no city column
  'Employee'   AS person_type
FROM employees

ORDER BY person_type, last_name, first_name;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 04

UNION for Report Subtotals — Combining Detail and Summary

A classic UNION ALL pattern in reporting: combine detail rows with summary rows. The result looks like a hierarchical report — individual records at the top (or bottom) with a total row appended.

Loading FreshCart DB…
-- Per-store revenue with a grand total row appended
SELECT
  store_id,
  store_id                             AS label,
  COUNT(*)                             AS orders,
  ROUND(SUM(total_amount), 2)          AS revenue
FROM orders
WHERE order_status = 'Delivered'
GROUP BY store_id

UNION ALL

SELECT
  'TOTAL'                              AS store_id,
  'ALL STORES'                         AS label,
  COUNT(*)                             AS orders,
  ROUND(SUM(total_amount), 2)          AS revenue
FROM orders
WHERE order_status = 'Delivered'

ORDER BY store_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Category breakdown with subtotals per category and grand total
SELECT
  category,
  brand,
  COUNT(*)                             AS products,
  ROUND(AVG(unit_price), 2)            AS avg_price
FROM products
WHERE in_stock = true
GROUP BY category, brand

UNION ALL

-- Category subtotals
SELECT
  category,
  '— SUBTOTAL'                         AS brand,
  COUNT(*)                             AS products,
  ROUND(AVG(unit_price), 2)            AS avg_price
FROM products
WHERE in_stock = true
GROUP BY category

UNION ALL

-- Grand total
SELECT
  '=== GRAND TOTAL'                    AS category,
  ''                                   AS brand,
  COUNT(*)                             AS products,
  ROUND(AVG(unit_price), 2)            AS avg_price
FROM products
WHERE in_stock = true

ORDER BY category, brand;Ctrl + Enter to run
Loading FreshCart database in your browser…

💡 Note

The UNION ALL + totals pattern is a manual alternative to ROLLUP (covered in Module 28). ROLLUP is more concise for standard hierarchical totals. UNION ALL gives more control when you need custom labels, different formatting, or non-standard aggregation levels.

// Part 05

INTERSECT — Finding Common Rows

INTERSECT returns only rows that appear in both result sets. It is the SQL equivalent of set intersection — A ∩ B. Rows unique to either side are excluded. INTERSECT always deduplicates.

Customers who have placed both delivered AND cancelled orders

Loading FreshCart DB…
-- Customers appearing in both delivered AND cancelled order sets
SELECT customer_id FROM orders WHERE order_status = 'Delivered'
INTERSECT
SELECT customer_id FROM orders WHERE order_status = 'Cancelled'
ORDER BY customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Enrich the result — join back to get customer details
SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name  AS customer,
  c.loyalty_tier,
  c.city
FROM customers AS c
WHERE c.customer_id IN (
  SELECT customer_id FROM orders WHERE order_status = 'Delivered'
  INTERSECT
  SELECT customer_id FROM orders WHERE order_status = 'Cancelled'
)
ORDER BY c.loyalty_tier, c.customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Products sold in multiple specific stores

Loading FreshCart DB…
-- Products that have been sold in BOTH ST001 AND ST002
SELECT DISTINCT oi.product_id
FROM order_items AS oi
JOIN orders AS o ON oi.order_id = o.order_id
WHERE o.store_id = 'ST001' AND o.order_status = 'Delivered'

INTERSECT

SELECT DISTINCT oi.product_id
FROM order_items AS oi
JOIN orders AS o ON oi.order_id = o.order_id
WHERE o.store_id = 'ST002' AND o.order_status = 'Delivered'

ORDER BY product_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

INTERSECT vs INNER JOIN — when each is appropriate

INTERSECT vs INNER JOIN — different tools

-- INTERSECT: find customer_ids that appear in BOTH queries
-- Works on the result sets — finds common values across two queries
SELECT customer_id FROM orders WHERE order_status = 'Delivered'
INTERSECT
SELECT customer_id FROM orders WHERE order_status = 'Cancelled';

-- Equivalent using EXISTS (more flexible — can filter differently):
SELECT DISTINCT o1.customer_id
FROM orders AS o1
WHERE o1.order_status = 'Delivered'
  AND EXISTS (
    SELECT 1 FROM orders AS o2
    WHERE o2.customer_id = o1.customer_id
      AND o2.order_status = 'Cancelled'
  );

-- INNER JOIN: combines columns from two different tables
-- Completely different purpose — adds columns, not finds common rows
-- Do not use INNER JOIN where INTERSECT is appropriate

// Part 06

EXCEPT — Subtracting One Set From Another

EXCEPT (called MINUS in Oracle and some other databases) returns rows from the first query that do not appear in the second query. It is the set difference — A − B. Rows unique to the second query are discarded. EXCEPT always deduplicates.

Customers who ordered but never cancelled

Loading FreshCart DB…
-- Customers who have delivered orders
-- MINUS customers who have ever cancelled
-- = customers who have ordered but never cancelled

SELECT customer_id FROM orders WHERE order_status = 'Delivered'
EXCEPT
SELECT customer_id FROM orders WHERE order_status = 'Cancelled'
ORDER BY customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Products sold in January but not in February

Loading FreshCart DB…
-- Products that appeared in January delivered orders
-- but NOT in February delivered orders
SELECT DISTINCT oi.product_id
FROM order_items AS oi
JOIN orders AS o ON oi.order_id = o.order_id
WHERE o.order_status = 'Delivered'
  AND strftime('%m', o.order_date) = '01'
  AND strftime('%Y', o.order_date) = '2024'

EXCEPT

SELECT DISTINCT oi.product_id
FROM order_items AS oi
JOIN orders AS o ON oi.order_id = o.order_id
WHERE o.order_status = 'Delivered'
  AND strftime('%m', o.order_date) = '02'
  AND strftime('%Y', o.order_date) = '2024'

ORDER BY product_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

EXCEPT vs NOT EXISTS vs LEFT JOIN IS NULL

Three ways to find set difference — comparison

-- Goal: customer_ids in delivered orders but not in cancelled orders

-- EXCEPT (cleanest — set operation on result sets):
SELECT customer_id FROM orders WHERE order_status = 'Delivered'
EXCEPT
SELECT customer_id FROM orders WHERE order_status = 'Cancelled';

-- NOT EXISTS (most flexible — any conditions in subquery):
SELECT DISTINCT o.customer_id
FROM orders AS o
WHERE o.order_status = 'Delivered'
  AND NOT EXISTS (
    SELECT 1 FROM orders AS o2
    WHERE o2.customer_id  = o.customer_id
      AND o2.order_status = 'Cancelled'
  );

-- LEFT JOIN IS NULL (when you need right-side columns too):
SELECT DISTINCT o1.customer_id
FROM orders AS o1
LEFT JOIN orders AS o2
  ON o1.customer_id  = o2.customer_id
  AND o2.order_status = 'Cancelled'
WHERE o1.order_status = 'Delivered'
  AND o2.order_id IS NULL;

-- All three return the same customer_ids
-- EXCEPT: most concise for pure set difference on complete queries
-- NOT EXISTS: more flexible, handles complex multi-table conditions
-- LEFT JOIN IS NULL: use when right-side columns are needed

Loading FreshCart DB…
-- Verify: all three approaches return the same result
-- Method 1: EXCEPT
SELECT COUNT(*) AS except_count
FROM (
  SELECT customer_id FROM orders WHERE order_status = 'Delivered'
  EXCEPT
  SELECT customer_id FROM orders WHERE order_status = 'Cancelled'
) AS exc;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Method 2: NOT EXISTS (should match EXCEPT count)
SELECT COUNT(DISTINCT o.customer_id) AS not_exists_count
FROM orders AS o
WHERE o.order_status = 'Delivered'
  AND NOT EXISTS (
    SELECT 1 FROM orders o2
    WHERE o2.customer_id = o.customer_id
      AND o2.order_status = 'Cancelled'
  );Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 07

Ordering and LIMIT With Set Operations

ORDER BY and LIMIT apply to the entire result of the set operation — not to individual queries. They must appear after the final query in the set operation chain. Attempting to add ORDER BY to an individual query within a UNION causes an error in most databases.

ORDER BY and LIMIT with set operations

-- CORRECT: ORDER BY after the final query
SELECT customer_id, 'Delivered' AS type FROM orders WHERE order_status = 'Delivered'
UNION ALL
SELECT customer_id, 'Cancelled' AS type FROM orders WHERE order_status = 'Cancelled'
ORDER BY customer_id, type;    -- applies to the full combined result

-- CORRECT: LIMIT after ORDER BY
SELECT city FROM customers
UNION
SELECT city FROM stores
ORDER BY city
LIMIT 5;

-- WRONG: ORDER BY inside an individual query
SELECT customer_id FROM orders WHERE order_status = 'Delivered'
ORDER BY customer_id   -- ERROR in most databases
UNION ALL
SELECT customer_id FROM orders WHERE order_status = 'Cancelled';

-- WORKAROUND: wrap individual queries in subqueries if you need separate ordering
-- (rarely needed — usually ORDER BY on the final result is what you want)
SELECT * FROM (
  SELECT customer_id, total_amount FROM orders WHERE order_status = 'Delivered'
  ORDER BY total_amount DESC LIMIT 5
) AS top_delivered
UNION ALL
SELECT * FROM (
  SELECT customer_id, total_amount FROM orders WHERE order_status = 'Cancelled'
  ORDER BY total_amount DESC LIMIT 5
) AS top_cancelled;

Loading FreshCart DB…
-- Top 5 products by revenue from delivered orders
-- UNION ALL with per-subquery LIMIT then outer ORDER BY
SELECT * FROM (
  SELECT
    p.product_name,
    'In stock'      AS availability,
    ROUND(SUM(oi.line_total), 2) AS revenue
  FROM products AS p
  JOIN order_items AS oi ON p.product_id = oi.product_id
  JOIN orders      AS o  ON oi.order_id  = o.order_id
  WHERE p.in_stock = true AND o.order_status = 'Delivered'
  GROUP BY p.product_name
  ORDER BY revenue DESC LIMIT 3
) AS in_stock_top

UNION ALL

SELECT * FROM (
  SELECT
    p.product_name,
    'Out of stock'  AS availability,
    ROUND(SUM(oi.line_total), 2) AS revenue
  FROM products AS p
  JOIN order_items AS oi ON p.product_id = oi.product_id
  JOIN orders      AS o  ON oi.order_id  = o.order_id
  WHERE p.in_stock = false AND o.order_status = 'Delivered'
  GROUP BY p.product_name
  ORDER BY revenue DESC LIMIT 3
) AS out_stock_top

ORDER BY availability, revenue DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 08

Practical Patterns — Set Operations in Production Analytics

Full UNION pipeline report — all order statuses in one view

Loading FreshCart DB…
-- Complete order status summary — one row per status
-- with percentage of total
WITH total AS (
  SELECT COUNT(*) AS total_orders, SUM(total_amount) AS total_revenue
  FROM orders
)
SELECT
  order_status,
  COUNT(*)                                          AS orders,
  ROUND(SUM(total_amount), 2)                       AS revenue,
  ROUND(COUNT(*) * 100.0 / t.total_orders, 1)      AS pct_of_orders
FROM orders, total AS t
GROUP BY order_status, t.total_orders

UNION ALL

SELECT
  'TOTAL',
  t.total_orders,
  ROUND(t.total_revenue, 2),
  100.0
FROM total AS t

ORDER BY orders DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

Change detection — what changed between two snapshots

Loading FreshCart DB…
-- Products whose price changed: compare Jan vs Feb unit prices
-- Using EXCEPT to find rows that exist in Jan but not Feb (changed or removed)
-- and Feb but not Jan (new or changed)

-- Products with different prices (using EXCEPT both ways)
SELECT 'In Jan not Feb (removed or changed)' AS change_type, product_id, unit_price
FROM (
  SELECT product_id, ROUND(unit_price, 2) AS unit_price FROM products
  WHERE in_stock = true
  EXCEPT
  SELECT product_id, ROUND(unit_price * 0.95, 2) FROM products  -- simulate Feb price
  WHERE in_stock = true
) AS removed_or_changed

UNION ALL

SELECT 'In Feb not Jan (new or changed)' AS change_type, product_id, unit_price
FROM (
  SELECT product_id, ROUND(unit_price * 0.95, 2) AS unit_price FROM products
  WHERE in_stock = true
  EXCEPT
  SELECT product_id, ROUND(unit_price, 2) FROM products
  WHERE in_stock = true
) AS new_or_changed

ORDER BY change_type, product_id
LIMIT 10;Ctrl + Enter to run
Loading FreshCart database in your browser…

Finding symmetric difference with EXCEPT + UNION ALL

Loading FreshCart DB…
-- Symmetric difference: cities that appear in ONLY ONE of the two tables
-- Cities in customers but not stores, PLUS cities in stores but not customers
SELECT city, 'Customer city only' AS source
FROM (
  SELECT DISTINCT city FROM customers WHERE city IS NOT NULL
  EXCEPT
  SELECT DISTINCT city FROM stores
) AS cust_only

UNION ALL

SELECT city, 'Store city only' AS source
FROM (
  SELECT DISTINCT city FROM stores
  EXCEPT
  SELECT DISTINCT city FROM customers WHERE city IS NOT NULL
) AS store_only

ORDER BY city;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 09

What This Looks Like at Work

You are a data engineer at Square. Every morning at 7 AM, the reconciliation job produces a report comparing yesterday's transaction data across three systems: the primary OLTP database, the data warehouse, and the partner bank's settlement file. Each source must agree. You use EXCEPT and UNION ALL to identify discrepancies.

7:00 AM

Reconciliation job starts

Three sources need to agree: primary DB transactions, warehouse transactions, settlement file transactions. Adapted for FreshCart: compare ST001 orders, ST002 orders, and a combined view.

7:05 AM

Step 1 — find order IDs in one source but not the other

EXCEPT identifies order_ids present in one system but missing from another.

Loading FreshCart DB…
-- Simulate reconciliation: orders in ST001 but not ST002 (and vice versa)
-- In production: primary_db.orders vs warehouse.orders

-- Orders in ST001 that have no equivalent in ST002 (different store, same pattern)
SELECT order_id, total_amount, ROUND(total_amount, 0) AS rounded_total
FROM orders
WHERE store_id = 'ST001' AND order_status = 'Delivered'

EXCEPT

SELECT order_id, total_amount, ROUND(total_amount, 0) AS rounded_total
FROM orders
WHERE store_id = 'ST002' AND order_status = 'Delivered'

ORDER BY order_id
LIMIT 8;Ctrl + Enter to run
Loading FreshCart database in your browser…

7:20 AM

Step 2 — full reconciliation summary

UNION ALL combines: matched records, ST001-only records, and ST002-only records into a complete reconciliation view.

Loading FreshCart DB…
-- Full reconciliation: match status for each order
WITH st001 AS (
  SELECT order_id, ROUND(total_amount, 2) AS amount
  FROM orders WHERE store_id = 'ST001' AND order_status = 'Delivered'
),
st002 AS (
  SELECT order_id, ROUND(total_amount, 2) AS amount
  FROM orders WHERE store_id = 'ST002' AND order_status = 'Delivered'
)
-- Matched orders (in both)
SELECT a.order_id, a.amount AS st001_amount, b.amount AS st002_amount,
       '✓ In both' AS status
FROM st001 AS a
JOIN st002 AS b ON a.order_id = b.order_id

UNION ALL

-- ST001 only
SELECT a.order_id, a.amount, NULL, '⚠ ST001 only'
FROM st001 AS a
WHERE NOT EXISTS (SELECT 1 FROM st002 b WHERE b.order_id = a.order_id)

UNION ALL

-- ST002 only
SELECT b.order_id, NULL, b.amount, '⚠ ST002 only'
FROM st002 AS b
WHERE NOT EXISTS (SELECT 1 FROM st001 a WHERE a.order_id = b.order_id)

ORDER BY status DESC, order_id
LIMIT 15;Ctrl + Enter to run
Loading FreshCart database in your browser…

7:35 AM

Reconciliation report delivered

The UNION ALL of three result sets — matched, source-A-only, source-B-only — produces a complete reconciliation report. Any row with "⚠" status routes to the investigation queue. The report runs in under 5 seconds and is automatically emailed to the operations team.

🎯 Pro Tip

Reconciliation reports are one of the highest-value use cases for UNION ALL + EXCEPT. The pattern: (1) define each source as a CTE, (2) EXCEPT to find rows exclusive to each source, (3) UNION ALL to combine matched and exclusive rows with a status label. This architecture handles any number of sources — add one more CTE and one more UNION ALL block.

// Part 10

Interview Prep — 5 Questions With Complete Answers

Q: What is the difference between UNION and UNION ALL?

Both UNION and UNION ALL combine the result sets of two or more queries by stacking rows vertically. The difference is deduplication. UNION removes duplicate rows — if the same row appears in both queries, it appears only once in the final result. UNION ALL keeps all rows from all queries, including duplicates — if the same row appears in both queries, it appears twice.

The performance implication is significant. UNION must sort or hash all rows to identify and remove duplicates — O(n log n) work. UNION ALL simply appends result sets — O(n) work. UNION ALL is always faster than UNION. On large result sets, this difference can be substantial.

When to use each: use UNION ALL when you know the result sets are disjoint (no possible overlapping rows) — different time periods, different status values, different segments. Use UNION when the same row could appear in multiple queries and you want each unique row once — combining a current customer list with a historical list where some customers appear in both. The common mistake is defaulting to UNION (with deduplication) when UNION ALL is correct and faster. Always ask: "can the same row appear in both queries?" If yes and you want it once, use UNION. If yes and you want it multiple times, or if it is impossible for the same row to appear twice, use UNION ALL.

Q: What requirements must be met for a UNION or set operation to be valid?

Three requirements must be satisfied for any set operation (UNION, UNION ALL, INTERSECT, EXCEPT) to execute successfully. First: all queries must return the same number of columns. A SELECT with 3 columns cannot be combined with a SELECT with 4 columns — the column count must match exactly across all queries in the set operation.

Second: the corresponding columns must have compatible data types. Column 1 of query A must be compatible with column 1 of query B — both numeric, both text, both date, etc. Incompatible types (combining a DATE column with an INTEGER column in the same position) cause a type error. Most databases will attempt implicit type casting for closely related types (INTEGER and DECIMAL are usually compatible) but will error on clearly incompatible combinations (DATE and VARCHAR).

Third: ORDER BY and LIMIT must appear after the final query and apply to the entire combined result — not to individual queries within the set operation. Adding ORDER BY to an individual query within UNION causes a syntax error in most databases (PostgreSQL, MySQL 8+). If you need to sort or limit individual queries before combining, wrap each query in a subquery: SELECT * FROM (SELECT ... ORDER BY ... LIMIT n) AS q1 UNION ALL SELECT * FROM (SELECT ... ORDER BY ... LIMIT n) AS q2. The column names in the result come from the first query — if query A has column named "customer_name" and query B has column named "emp_name", the result column is named "customer_name" throughout.

Q: When would you use INTERSECT instead of a JOIN?

INTERSECT finds rows that appear in both result sets simultaneously — it operates on complete result rows, not on join keys. It is the natural choice when you want to find entities that satisfy two independent conditions that can be expressed as separate queries, and you want the common elements.

INTERSECT is appropriate when: the two queries are logically independent (neither is a "lookup" of the other), both queries naturally return the same columns, and you want the rows that appear in both. Example: customer_ids who placed delivered orders INTERSECT customer_ids who placed cancelled orders — these are two independent queries against the same table, and INTERSECT cleanly finds customers who appear in both.

JOIN is appropriate when you want to combine columns from different tables based on a relationship — horizontally expanding the result. INTERSECT is a vertical set operation — it does not add columns, it finds common rows. A JOIN of customers to orders adds order columns to customer rows. An INTERSECT of delivered customer_ids and cancelled customer_ids finds customer_ids that appear in both sets — no additional columns. When to prefer INTERSECT over a JOIN + filter: when both conditions are expressed on the same table with different WHERE clauses (same column, different values), INTERSECT is more direct. When you need columns from both result sets, EXISTS or JOIN is more appropriate than INTERSECT because INTERSECT does not give you access to the second query's specific rows.

Q: What is the difference between EXCEPT and NOT IN / NOT EXISTS for finding set differences?

All three find rows in one set that are absent from another, but they work at different levels of abstraction and have different capabilities. EXCEPT operates on complete result sets — it compares entire rows across two queries and returns rows from the first query that are not present in the second. It always deduplicates. It works cleanly when both queries return the same columns naturally.

NOT IN operates on a single column — it filters outer rows where a specific column's value does not appear in a list returned by the subquery. NOT IN fails silently when the subquery contains NULL values. NOT EXISTS is a correlated subquery that checks row-by-row whether a related record exists — it is NULL-safe and can handle complex multi-column or multi-table conditions in the subquery.

Choose based on the problem structure: EXCEPT when both sides are naturally expressed as full queries returning the same columns — "customer_ids who ordered in January" EXCEPT "customer_ids who ordered in February" is a natural set difference. NOT EXISTS when the anti-join condition is complex — involves multiple tables, multiple conditions, or you need flexibility beyond what two parallel queries provide. NOT IN when the subquery is simple, small, and guaranteed NULL-free. EXCEPT is cleanest for comparing two entire queries; NOT EXISTS is most flexible for row-level anti-join with complex conditions. Performance: all three produce anti-join plans in the optimiser for most databases — performance is typically equivalent when proper indexes exist.

Q: How do set operations handle NULL values?

Set operations treat NULL values as equal to each other for the purpose of deduplication and comparison — unlike standard SQL WHERE comparisons where NULL ≠ NULL. This is a specific exception to NULL's usual semantics and it is important for understanding UNION, INTERSECT, and EXCEPT behaviour.

In UNION (with deduplication): if both queries return a row where a specific column is NULL, the NULL row appears only once in the result — the database treats the two NULL values as "equal" for deduplication. In INTERSECT: a row with NULL in one query can match a row with NULL in the other query — NULLs are considered equal. In EXCEPT: a row with NULL in the first query is considered matched by a row with NULL in the second query and is therefore excluded from the EXCEPT result.

This NULL-equality behaviour in set operations is defined by the SQL standard and implemented consistently in PostgreSQL, MySQL 8+, and SQL Server. It differs from WHERE NULL = NULL (which evaluates to NULL, not TRUE) and is closer to IS NOT DISTINCT FROM semantics. Practical implication: if you are using EXCEPT to find "new" rows and some rows have NULL keys, those NULL-key rows from the first query will be considered matched by NULL-key rows from the second query and excluded from EXCEPT — which is usually the desired behaviour. When you specifically need NULL to not match NULL in a set operation context, you need a more complex approach using NOT EXISTS with IS NULL checks.

// Part 11

Errors You Will Hit — And Exactly Why They Happen

ERROR: each UNION query must have the same number of columns

Cause: The two queries being combined with UNION, UNION ALL, INTERSECT, or EXCEPT return different numbers of columns. Set operations require exactly the same column count in all queries. This commonly happens when adding a new column to one query in a set operation without updating the others, or when copying a query template and forgetting to match the column count.

Fix: Count the columns in each SELECT clause and ensure they match. Add NULL AS column_name placeholder columns to queries that have fewer columns: SELECT customer_id, first_name, NULL AS city FROM employees UNION ALL SELECT customer_id, first_name, city FROM customers. Each NULL placeholder occupies a column position and is cast to the appropriate type from the other query's column. Alternatively, remove columns from the query that has more until they match.

UNION is very slow — takes 30 seconds for queries that individually take under 1 second

Cause: UNION (without ALL) performs deduplication by sorting all combined rows — O(n log n) work even when the result sets are already disjoint and contain no duplicates. The sort is expensive proportional to the total number of rows across all combined queries. With large result sets, this deduplication sort becomes the dominant cost.

Fix: Use UNION ALL instead of UNION if the result sets are disjoint (which they often are when you are combining queries with mutually exclusive WHERE conditions). UNION ALL skips the deduplication sort entirely — O(n) instead of O(n log n). Only use UNION (with deduplication) when the same row could genuinely appear in multiple queries and you need each row once. Verify the result sets are disjoint by confirming their WHERE conditions are mutually exclusive: WHERE status = 'Active' and WHERE status = 'Inactive' cannot produce the same row — use UNION ALL.

ERROR: ORDER BY clause not allowed at this point — inside a UNION query

Cause: An ORDER BY was placed inside one of the individual queries within a UNION/UNION ALL rather than after the final query. ORDER BY inside a UNION member query is not allowed in standard SQL (PostgreSQL, MySQL 8+) because it would apply only to that query's result before the set operation combines them — which is undefined behaviour since the set operation does not preserve ordering anyway.

Fix: Move ORDER BY to after the last query in the set operation: SELECT ... UNION ALL SELECT ... ORDER BY column. This applies ORDER BY to the entire combined result. If you need to sort and limit individual queries before combining them (for example, top-5 from each source), wrap each query in a subquery: SELECT * FROM (SELECT ... ORDER BY col LIMIT 5) AS q1 UNION ALL SELECT * FROM (SELECT ... ORDER BY col LIMIT 5) AS q2. The subquery wrapping allows ORDER BY and LIMIT to apply to each part independently.

EXCEPT returns zero rows — expected to find differences between two result sets

Cause: The two queries passed to EXCEPT return identical result sets, so EXCEPT eliminates all rows from the first query. Or the column types or values differ slightly between the two queries (different precision, different string formatting, leading/trailing whitespace) preventing rows from being considered equal — so EXCEPT does not eliminate any rows from the first query and returns more rows than expected.

Fix: Run each query independently and compare the results visually. Check column types — EXCEPT considers rows identical only when all column values match including type. A DECIMAL(10,2) value of 5.00 may not match an INTEGER value of 5 in an EXCEPT comparison. Ensure consistent data types and formatting across both queries. Use ROUND() or CAST() to normalise numeric precision before EXCEPT. For string columns, use TRIM() and LOWER() to normalise whitespace and case before comparing.

INTERSECT returns fewer rows than expected — some common rows are missing

Cause: The rows that should be common between the two queries are not considered identical by INTERSECT because of type or value differences in one or more columns. INTERSECT compares entire rows — all columns must match exactly. If column 3 has a different value or type in the two queries (even by one decimal place, trailing space, or NULL vs empty string), the row is not considered a match.

Fix: Simplify by reducing the columns being compared — only include the columns that define the common identity. SELECT customer_id FROM query_a INTERSECT SELECT customer_id FROM query_b is more reliable than comparing all columns. If you need multi-column INTERSECT, ensure consistent formatting: use the same ROUND() precision, the same TRIM() and LOWER() normalisation, and the same NULL handling (COALESCE to a default value) across both queries. Use FULL OUTER JOIN instead of INTERSECT for debugging — it shows matched and unmatched rows with all column values visible.

Try It Yourself

Write three queries demonstrating all three set operations: (1) UNION ALL — produce a combined 'activity feed' showing the 5 most recent delivered orders and the 5 most expensive products, with columns: id (order_id or product_id), name (a description), amount (total_amount or unit_price), and type ('Order' or 'Product'). (2) INTERSECT — find store_ids that appear in BOTH delivered orders AND have at least one employee assigned. Show just the store_id. (3) EXCEPT — find product categories that appear in the products table but have NOT appeared in any delivered order's order_items. Show just the category name, sorted alphabetically.

🎯 Key Takeaways

✓Set operations combine query results vertically — stacking rows. JOINs combine horizontally — adding columns. They solve different problems.
✓Three set operations: UNION (all rows, deduplicates), INTERSECT (common rows only), EXCEPT (first minus second). All require matching column count and compatible types.
✓UNION ALL skips deduplication — O(n) vs UNION's O(n log n). Default to UNION ALL when result sets are disjoint or duplicates are acceptable. Only use UNION when you need deduplication across sources.
✓Column names in the result come from the first query. All subsequent queries contribute rows but their column names are ignored.
✓ORDER BY and LIMIT apply to the entire combined result and must appear after the final query. To sort/limit individual queries, wrap each in a subquery.
✓INTERSECT finds rows that satisfy two independent conditions — entities that appear in both result sets. NULL = NULL for set operation comparisons (unlike standard WHERE).
✓EXCEPT returns rows from the first query not present in the second. Equivalent to NOT EXISTS and LEFT JOIN IS NULL for set differences — choose based on complexity of conditions.
✓Set operations always deduplicate (except UNION ALL). UNION deduplicates across both queries. INTERSECT and EXCEPT deduplicate by definition.
✓Symmetric difference (rows unique to either side): (A EXCEPT B) UNION ALL (B EXCEPT A). This finds everything that does not overlap.
✓Reconciliation reports are the primary production use case: EXCEPT to find missing records, UNION ALL to combine matched and unmatched sets with status labels into one complete reconciliation view.

What comes next

In Module 40, you learn derived tables in depth — subqueries in FROM as building blocks for multi-step analytics, the rules for aliasing and referencing, and when a derived table beats a CTE.

Module 40 → Derived Tables

Discussion

Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.

Continue with GitHub