SQL — Module 38Intermediate

EXISTS and NOT EXISTS

The cleanest existence check in SQL — how EXISTS works, when it beats IN and LEFT JOIN IS NULL, every anti-join pattern, NULL safety, and performance at scale

12–16 min April 2026

Section 8 · Subqueries & Set Operations

Subqueries & Set Operations · 5 modulesModule 38

Subqueries Correlated EXISTS UNION,Derived

// Part 01

What EXISTS Does — True or False, Nothing Else

EXISTS is a predicate — it returns TRUE or FALSE. It takes a subquery as its argument and returns TRUE if the subquery produces at least one row, FALSE if it produces zero rows. That is its entire job. The subquery's column values are completely irrelevant — only whether the subquery returns any rows matters.

This simplicity is its strength. EXISTS answers one question and one question only: does at least one matching row exist? The moment the database finds the first matching row, it stops scanning — it does not need to find all matches, count them, or return their values. This short-circuit behaviour makes EXISTS uniquely efficient for existence checks.

EXISTS — the mechanics

-- EXISTS returns TRUE if subquery returns >= 1 row
-- EXISTS returns FALSE if subquery returns 0 rows
-- The SELECT list inside EXISTS is irrelevant — SELECT 1, SELECT *, SELECT NULL all work

-- Common convention: SELECT 1 (signals intent — result not used)
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.customer_id)

-- Equivalent but unconventional:
WHERE EXISTS (SELECT * FROM orders WHERE customer_id = c.customer_id)
WHERE EXISTS (SELECT order_id FROM orders WHERE customer_id = c.customer_id)
WHERE EXISTS (SELECT NULL FROM orders WHERE customer_id = c.customer_id)

-- All four produce identical results
-- SELECT 1 is the universal convention — it communicates "I only care about existence"

-- NOT EXISTS inverts the result:
-- TRUE if subquery returns 0 rows
-- FALSE if subquery returns >= 1 row

// Part 02

Basic EXISTS — Finding Rows With Related Data

Loading FreshCart DB…
-- Customers who have placed at least one order
-- EXISTS: TRUE as soon as one matching order is found
SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name  AS customer,
  c.city,
  c.loyalty_tier
FROM customers AS c
WHERE EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.customer_id = c.customer_id   -- correlated link
)
ORDER BY c.loyalty_tier, c.customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Stores that have at least one delivered order
SELECT
  s.store_id,
  s.store_name,
  s.city,
  s.monthly_target
FROM stores AS s
WHERE EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.store_id      = s.store_id
    AND o.order_status  = 'Delivered'
)
ORDER BY s.city;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Products that have been ordered at least once
SELECT
  p.product_id,
  p.product_name,
  p.category,
  p.unit_price
FROM products AS p
WHERE EXISTS (
  SELECT 1
  FROM order_items AS oi
  WHERE oi.product_id = p.product_id
)
ORDER BY p.category, p.unit_price DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 03

NOT EXISTS — The Anti-Join

NOT EXISTS finds rows in the outer table that have no matching row in the subquery — the anti-join. It is the safest, most semantically clear way to ask "which X has no Y?" It correctly handles NULLs, unlike NOT IN.

Customers who have never ordered

Loading FreshCart DB…
-- Customers with NO orders at all
SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name  AS customer,
  c.email,
  c.city,
  c.joined_date
FROM customers AS c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.customer_id = c.customer_id
)
ORDER BY c.joined_date DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

Products never sold

Loading FreshCart DB…
-- Products with zero sales history
SELECT
  p.product_id,
  p.product_name,
  p.category,
  p.brand,
  p.unit_price,
  p.in_stock
FROM products AS p
WHERE NOT EXISTS (
  SELECT 1
  FROM order_items AS oi
  WHERE oi.product_id = p.product_id
)
ORDER BY p.category, p.unit_price DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

Employees with no store assignment

Loading FreshCart DB…
-- Employees who are not assigned to any store
SELECT
  e.employee_id,
  e.first_name || ' ' || e.last_name  AS employee,
  e.role,
  e.department,
  e.salary
FROM employees AS e
WHERE NOT EXISTS (
  SELECT 1
  FROM stores AS s
  WHERE s.store_id = e.store_id
)
ORDER BY e.department, e.salary DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 04

EXISTS with Multiple Conditions — Precise Existence Checks

The subquery inside EXISTS can have any WHERE conditions — combining multiple filters to check for a very specific kind of related row. This is where EXISTS becomes more powerful than a simple JOIN or IN check.

Customers with at least one large delivered order

Loading FreshCart DB…
-- Customers who have at least one delivered order above ₹1,000
SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name  AS customer,
  c.loyalty_tier
FROM customers AS c
WHERE EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.customer_id  = c.customer_id
    AND o.order_status = 'Delivered'
    AND o.total_amount > 1000
)
ORDER BY c.loyalty_tier, c.customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Stores that have sold Dairy products

Loading FreshCart DB…
-- Stores that have at least one delivered order containing a Dairy product
SELECT
  s.store_id,
  s.store_name,
  s.city
FROM stores AS s
WHERE EXISTS (
  SELECT 1
  FROM orders      AS o
  JOIN order_items AS oi ON o.order_id    = oi.order_id
  JOIN products    AS p  ON oi.product_id = p.product_id
  WHERE o.store_id     = s.store_id
    AND o.order_status = 'Delivered'
    AND p.category     = 'Dairy'
)
ORDER BY s.city;Ctrl + Enter to run
Loading FreshCart database in your browser…

Products ordered by Platinum customers

Loading FreshCart DB…
-- Products that have been ordered by at least one Platinum customer
SELECT
  p.product_id,
  p.product_name,
  p.category
FROM products AS p
WHERE EXISTS (
  SELECT 1
  FROM order_items AS oi
  JOIN orders      AS o  ON oi.order_id   = o.order_id
  JOIN customers   AS c  ON o.customer_id = c.customer_id
  WHERE oi.product_id   = p.product_id
    AND c.loyalty_tier  = 'Platinum'
    AND o.order_status  = 'Delivered'
)
ORDER BY p.category, p.product_name;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 05

NOT EXISTS vs NOT IN vs LEFT JOIN IS NULL

Three approaches solve the anti-join problem — finding rows with no match. They produce the same results for non-NULL data, but differ critically in NULL handling, readability, and performance.

The NULL trap in NOT IN

Why NOT IN fails with NULLs

-- Scenario: product_id 99 is in products but NOT in order_items
-- order_items.product_id has no NULL values (NOT NULL column)

-- NOT IN works correctly here:
SELECT product_name FROM products
WHERE product_id NOT IN (SELECT product_id FROM order_items);
-- Returns: products never ordered

-- But if order_items had a NULL product_id:
-- NOT IN (1, 2, 3, NULL) evaluates as:
-- NOT (pid=1 OR pid=2 OR pid=3 OR pid=NULL)
-- = NOT (FALSE OR FALSE OR FALSE OR NULL)
-- = NOT NULL = NULL  (not TRUE!)
-- Result: ZERO rows returned — completely wrong

-- NOT EXISTS is immune:
-- It checks row by row whether a match exists
-- NULL in the inner table simply means "this row has no product_id"
-- NOT EXISTS still correctly identifies products with no matches

All three approaches — same result on clean data

Loading FreshCart DB…
-- Method 1: NOT EXISTS (preferred — NULL-safe, semantic)
SELECT product_id, product_name
FROM products AS p
WHERE NOT EXISTS (
  SELECT 1 FROM order_items AS oi
  WHERE oi.product_id = p.product_id
)
ORDER BY product_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Method 2: NOT IN (works here — order_items.product_id is NOT NULL)
SELECT product_id, product_name
FROM products
WHERE product_id NOT IN (
  SELECT product_id FROM order_items
  -- Safe: product_id is NOT NULL in order_items
  -- Would break if product_id could be NULL
)
ORDER BY product_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Method 3: LEFT JOIN IS NULL
SELECT p.product_id, p.product_name
FROM products AS p
LEFT JOIN order_items AS oi ON p.product_id = oi.product_id
WHERE oi.item_id IS NULL
ORDER BY p.product_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

Comparison table

Approach	NULL safe?	Short-circuits?	Can select right-side cols?	Best when
NOT EXISTS	✅ Always	✅ First match found	❌ No — subquery not projected	Default anti-join — clearest semantics
NOT IN	❌ Fails with NULLs	❌ Must build full list	❌ No	Only when 100% sure no NULLs in subquery
LEFT JOIN IS NULL	✅ Always	❌ Processes all rows	✅ Yes — right cols accessible	When you also need right-side data or already in a JOIN chain

⚠️ Important

The NOT IN NULL trap is one of the most common silent correctness bugs in SQL. If a developer adds a row with NULL product_id to order_items, a previously working NOT IN query silently starts returning zero rows — with no error, no warning, and no visible indication that anything changed. NOT EXISTS never has this problem. Default to NOT EXISTS for all anti-join queries.

// Part 06

EXISTS vs IN — The Semi-Join Comparison

Both EXISTS and IN implement the semi-join — finding outer rows that have at least one match in the inner table. They produce the same result for non-NULL data. The choice between them is about semantics, performance, and NULL safety.

EXISTS vs IN — equivalent results, different mechanics

Loading FreshCart DB…
-- IN: collects all matching customer_ids, then filters
SELECT order_id, total_amount, order_status
FROM orders
WHERE customer_id IN (
  SELECT customer_id
  FROM customers
  WHERE city = 'Seattle'
)
ORDER BY total_amount DESC
LIMIT 8;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- EXISTS: checks one customer at a time, stops at first match
SELECT o.order_id, o.total_amount, o.order_status
FROM orders AS o
WHERE EXISTS (
  SELECT 1
  FROM customers AS c
  WHERE c.customer_id = o.customer_id
    AND c.city = 'Seattle'
)
ORDER BY o.total_amount DESC
LIMIT 8;Ctrl + Enter to run
Loading FreshCart database in your browser…

When EXISTS is faster than IN

EXISTS short-circuits — it stops as soon as one matching row is found. IN builds the complete list first, then checks membership. When the inner table is large and matches are common, EXISTS does dramatically less work.

Performance mechanics — EXISTS vs IN

-- Scenario: checking if any of 1,000,000 orders belongs to Seattle customers

-- IN approach:
-- 1. Scan customers table: collect ALL Seattle customer_ids → 5,000 IDs
-- 2. For each of 1,000,000 orders: check if customer_id is in the 5,000-ID set
-- Total: 1,005,000 rows processed + list membership check per order

-- EXISTS approach:
-- For each of 1,000,000 orders:
--   1. Look up customer_id in customers table (index lookup)
--   2. Check if city = 'Seattle'
--   3. If YES → stop immediately (short-circuit), include order
--   4. If NO → exclude order
-- Total: 1,000,000 index lookups (typically fast with index on customer_id)

-- On most modern optimisers:
-- Both are converted to the same JOIN plan internally
-- The logical difference matters most for complex subqueries
-- that the optimiser cannot simplify

// Part 07

EXISTS with Aggregates — Conditional Existence

The subquery inside EXISTS can contain aggregates, GROUP BY, and HAVING — making it possible to check existence based on aggregate conditions, not just row-level conditions.

Loading FreshCart DB…
-- Customers who have placed more than 2 delivered orders
-- EXISTS + HAVING: checks existence of the aggregate condition
SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name  AS customer,
  c.loyalty_tier
FROM customers AS c
WHERE EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.customer_id  = c.customer_id
    AND o.order_status = 'Delivered'
  GROUP BY o.customer_id
  HAVING COUNT(*) > 2
)
ORDER BY c.loyalty_tier;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Stores that have at least one product category with 3+ orders
-- Multi-table EXISTS with aggregation
SELECT
  s.store_id,
  s.city
FROM stores AS s
WHERE EXISTS (
  SELECT 1
  FROM orders      AS o
  JOIN order_items AS oi ON o.order_id    = oi.order_id
  JOIN products    AS p  ON oi.product_id = p.product_id
  WHERE o.store_id     = s.store_id
    AND o.order_status = 'Delivered'
  GROUP BY p.category
  HAVING COUNT(DISTINCT o.order_id) >= 3
)
ORDER BY s.city;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 08

Double NOT EXISTS — Relational Division

Relational division asks "which X has done ALL of Y?" — customers who have ordered every product in a category, employees who have completed every required training, stores that have sold every product. This is one of the hardest patterns in SQL, cleanly solved by double NOT EXISTS.

The logic of double NOT EXISTS

Relational division — the double NOT EXISTS pattern

-- GOAL: find customers who have ordered EVERY product in the Dairy category
-- Read as: customers for whom there does NOT EXIST a Dairy product
--          that they have NOT ordered

SELECT c.customer_id, c.first_name
FROM customers AS c
WHERE NOT EXISTS (
  -- Is there a Dairy product...
  SELECT 1
  FROM products AS p
  WHERE p.category = 'Dairy'
    AND NOT EXISTS (
      -- ...that this customer has NOT ordered?
      SELECT 1
      FROM order_items AS oi
      JOIN orders AS o ON oi.order_id = o.order_id
      WHERE oi.product_id  = p.product_id
        AND o.customer_id  = c.customer_id  -- correlated to outer customer
        AND o.order_status = 'Delivered'
    )
)
-- Logic:
-- Outer NOT EXISTS: no Dairy product satisfies the inner condition
-- Inner NOT EXISTS: the customer has NOT ordered this product
-- Together: no Dairy product exists that the customer hasn't ordered
--           = customer has ordered ALL Dairy products

Loading FreshCart DB…
-- Customers who have ordered at least one product from EVERY category
-- Double NOT EXISTS — relational division
SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name  AS customer,
  c.loyalty_tier
FROM customers AS c
WHERE NOT EXISTS (
  -- Is there a category...
  SELECT 1
  FROM (SELECT DISTINCT category FROM products) AS cats
  WHERE NOT EXISTS (
    -- ...that this customer has NOT ordered from?
    SELECT 1
    FROM order_items AS oi
    JOIN orders      AS o  ON oi.order_id   = o.order_id
    JOIN products    AS p  ON oi.product_id = p.product_id
    WHERE o.customer_id  = c.customer_id    -- correlated
      AND o.order_status = 'Delivered'
      AND p.category     = cats.category   -- correlated to outer category
  )
)
ORDER BY c.customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

🎯 Pro Tip

Double NOT EXISTS (relational division) is one of the few SQL problems that has no clean equivalent using JOINs or window functions. It is the canonical solution for "find X that has done ALL of Y" queries. The nested logic reads inside-out: for a customer to qualify, there must NOT EXIST any category for which it is true that the customer has NOT ordered from that category. Read it slowly, inside-out, and it becomes clear.

// Part 09

EXISTS in UPDATE and DELETE — Conditional DML

EXISTS is not limited to SELECT queries. It appears in UPDATE and DELETE statements to conditionally modify rows based on whether related data exists.

UPDATE rows where related data exists

EXISTS in UPDATE — conditional modification

-- Mark customers as 'Gold' if they have placed 3+ delivered orders
-- Only update customers who currently are 'Silver'
UPDATE customers AS c
SET loyalty_tier = 'Gold'
WHERE c.loyalty_tier = 'Silver'
  AND EXISTS (
    SELECT 1
    FROM orders AS o
    WHERE o.customer_id  = c.customer_id
      AND o.order_status = 'Delivered'
    GROUP BY o.customer_id
    HAVING COUNT(*) >= 3
  );

-- Preview the change before running:
SELECT customer_id, first_name, loyalty_tier
FROM customers AS c
WHERE loyalty_tier = 'Silver'
  AND EXISTS (
    SELECT 1 FROM orders AS o
    WHERE o.customer_id = c.customer_id AND o.order_status = 'Delivered'
    GROUP BY o.customer_id HAVING COUNT(*) >= 3
  );

DELETE rows where related data does NOT exist

NOT EXISTS in DELETE — orphan cleanup

-- Delete order_items that reference non-existent orders
-- (data quality cleanup after migration)
DELETE FROM order_items AS oi
WHERE NOT EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.order_id = oi.order_id
);

-- Always preview with SELECT before DELETE:
SELECT oi.item_id, oi.order_id
FROM order_items AS oi
WHERE NOT EXISTS (
  SELECT 1 FROM orders AS o
  WHERE o.order_id = oi.order_id
);

Loading FreshCart DB…
-- Preview: which customers would get a loyalty upgrade?
-- EXISTS in UPDATE simulation — show affected rows without changing data
SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name  AS customer,
  c.loyalty_tier                       AS current_tier,
  'Gold'                               AS would_become,
  (
    SELECT COUNT(*)
    FROM orders AS o
    WHERE o.customer_id  = c.customer_id
      AND o.order_status = 'Delivered'
  )                                    AS delivered_orders
FROM customers AS c
WHERE c.loyalty_tier = 'Bronze'
  AND EXISTS (
    SELECT 1
    FROM orders AS o
    WHERE o.customer_id  = c.customer_id
      AND o.order_status = 'Delivered'
    GROUP BY o.customer_id
    HAVING COUNT(*) >= 2
  )
ORDER BY c.customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 10

Performance Deep Dive — How EXISTS Executes

EXISTS performance depends on the database's ability to use indexes on the correlated column and how quickly the first matching row is found. Understanding the execution model helps you write EXISTS queries that run fast at any scale.

Index usage with EXISTS

EXISTS performance — index usage

-- FAST: EXISTS with indexed join column
-- orders.customer_id is a FK — should be indexed
WHERE EXISTS (
  SELECT 1 FROM orders AS o
  WHERE o.customer_id = c.customer_id   -- index lookup on customer_id
)
-- For each customer: one index lookup into orders
-- Cost: O(customers) × O(log orders) = very fast

-- SLOW: EXISTS without index on join column
WHERE EXISTS (
  SELECT 1 FROM orders AS o
  WHERE o.payment_method = 'COD'        -- payment_method may not be indexed
    AND o.customer_id = c.customer_id
)
-- For each customer: index lookup on customer_id, then filter payment_method
-- If customer_id index exists, still fast
-- If no index at all: full table scan per customer = O(n²)

-- Rule: the correlated column in EXISTS must be indexed in the inner table
-- Check: SELECT indexname FROM pg_indexes WHERE tablename = 'orders'

EXISTS vs COUNT(*) > 0 — always prefer EXISTS

EXISTS vs COUNT — why EXISTS wins

-- WRONG pattern: use COUNT to check existence
WHERE (SELECT COUNT(*) FROM orders WHERE customer_id = c.customer_id) > 0
-- This scans ALL matching rows and counts them before comparing to 0
-- Wasted work: the count after 1 is irrelevant

-- RIGHT pattern: use EXISTS
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.customer_id)
-- Stops immediately on finding the first row
-- No counting needed

-- Performance difference:
-- Customer with 10,000 orders:
--   COUNT approach: scans all 10,000 rows to count them
--   EXISTS approach: scans 1 row and stops
-- At scale this is the difference between seconds and milliseconds

Loading FreshCart DB…
-- Demonstrate: EXISTS stops at first match
-- Both return same customers but EXISTS is faster at scale

-- EXISTS version
SELECT c.customer_id, c.first_name
FROM customers AS c
WHERE EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.customer_id = c.customer_id
    AND o.order_status = 'Delivered'
    AND o.total_amount > 500
)
ORDER BY c.customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 11

What This Looks Like at Work

You are a backend engineer at DoorDash. The marketing team is planning three campaign segments and needs the customer lists. Each segment uses a different existence condition — perfect for EXISTS and NOT EXISTS queries. The lists must be mutually exclusive and exhaustive.

3:00 PM

Three segment definitions

Segment A: customers who have placed at least one order above ₹1,500 (high-value ever). Segment B: customers who have placed orders but never one above ₹500 (consistent low-value). Segment C: customers who have never placed any delivered order (dormant). Adapted for FreshCart data.

3:20 PM

Segment A — high value customers

Loading FreshCart DB…
-- Segment A: customers with at least one delivered order above ₹1,000
SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name  AS customer,
  c.city,
  c.loyalty_tier,
  'Segment A: High Value'             AS segment
FROM customers AS c
WHERE EXISTS (
  SELECT 1 FROM orders AS o
  WHERE o.customer_id  = c.customer_id
    AND o.order_status = 'Delivered'
    AND o.total_amount > 1000
)
ORDER BY c.loyalty_tier, c.customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

3:35 PM

Segment B — consistent low-value customers

Loading FreshCart DB…
-- Segment B: customers who have ordered but NEVER above ₹500
-- Has orders (EXISTS) BUT has no large order (NOT EXISTS)
SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name  AS customer,
  c.city,
  c.loyalty_tier,
  'Segment B: Low Value'              AS segment
FROM customers AS c
WHERE EXISTS (
  -- Has at least one delivered order
  SELECT 1 FROM orders AS o
  WHERE o.customer_id  = c.customer_id
    AND o.order_status = 'Delivered'
)
AND NOT EXISTS (
  -- But never had one above ₹500
  SELECT 1 FROM orders AS o
  WHERE o.customer_id  = c.customer_id
    AND o.order_status = 'Delivered'
    AND o.total_amount > 500
)
ORDER BY c.customer_id;Ctrl + Enter to run
Loading FreshCart database in your browser…

3:50 PM

Segment C — dormant customers

Loading FreshCart DB…
-- Segment C: customers with NO delivered orders at all
SELECT
  c.customer_id,
  c.first_name || ' ' || c.last_name  AS customer,
  c.city,
  c.loyalty_tier,
  c.joined_date,
  'Segment C: Dormant'                AS segment
FROM customers AS c
WHERE NOT EXISTS (
  SELECT 1 FROM orders AS o
  WHERE o.customer_id  = c.customer_id
    AND o.order_status = 'Delivered'
)
ORDER BY c.joined_date DESC;Ctrl + Enter to run
Loading FreshCart database in your browser…

4:05 PM

All three segments delivered in 45 minutes

Three EXISTS/NOT EXISTS queries — each clean, readable, and correct. The marketing team gets three mutually exclusive lists. Combining EXISTS and NOT EXISTS in Segment B (ordered at least once but never above ₹500) is the clearest way to express that exact business logic — no JOIN or GROUP BY alternative would be as direct.

🎯 Pro Tip

Combining EXISTS and NOT EXISTS in the same WHERE clause is the natural SQL expression for "has done X but never done Y" — a compound existence condition. This pattern appears constantly in retention and segmentation analytics: customers who churned (ordered before but not recently), users who started but never completed, employees who are assigned but have not performed. Build the EXISTS + NOT EXISTS compound pattern into your analytics toolkit.

// Part 12

Interview Prep — 5 Questions With Complete Answers

Q: How does EXISTS work and why does it use SELECT 1 inside the subquery?

EXISTS evaluates a correlated subquery and returns TRUE if the subquery produces at least one row, FALSE if it produces zero rows. It does not examine the values returned by the subquery — only whether any rows exist. The database stops evaluating the subquery as soon as the first matching row is found (short-circuit behaviour), making it efficient for existence checks.

SELECT 1 inside EXISTS is a universal convention that signals intent: "I only care whether rows exist, not what values they contain." The 1 is a constant — it carries no information. Writing SELECT 1 rather than SELECT * or SELECT column_name communicates to any reader that the subquery's output is irrelevant and the purpose is purely an existence check. This convention is widely recognised and expected in professional SQL code.

Technically, SELECT *, SELECT column, SELECT NULL, and SELECT 1 all produce equivalent results inside EXISTS. The database optimiser ignores the SELECT list entirely when evaluating EXISTS — it only cares whether the WHERE conditions inside the subquery are satisfied for at least one row. SELECT 1 is the standard precisely because it makes this intent explicit. Using SELECT * inside EXISTS can mislead readers into thinking the column values matter, and in older database versions, SELECT * could cause unnecessary column retrieval overhead (though modern optimisers eliminate this).

Q: Why is NOT EXISTS safer than NOT IN for anti-join queries?

NOT IN fails silently when the subquery contains NULL values, returning zero rows for every comparison — a critical correctness bug with no error message. The mechanism: NOT IN (1, 2, NULL) is evaluated as NOT (col = 1 OR col = 2 OR col = NULL). Since col = NULL evaluates to NULL in SQL's three-valued logic (not FALSE), the entire OR expression can evaluate to NULL for any col value, and NOT NULL = NULL (still not TRUE). WHERE discards all NULL-result rows, returning zero results for the entire query — even when non-NULL unmatched rows clearly exist.

NOT EXISTS is immune to this problem. It evaluates row-by-row whether a matching row exists. A NULL value in the inner table's column means "this particular row has no value for this column" — it does not affect whether other rows match. The existence check for the outer row is independent of NULLs in non-matching rows.

The silent failure mode of NOT IN makes it particularly dangerous in production. A schema change that adds a nullable foreign key column, a bulk import that inserts a row with NULL in the join column, or a data quality issue that introduces unexpected NULLs — any of these can cause a previously working NOT IN query to silently start returning zero rows with no error, warning, or any visible indication that the query's behaviour changed. NOT EXISTS never exhibits this behaviour. Professional SQL standards at most companies mandate NOT EXISTS (or LEFT JOIN IS NULL) for all anti-join queries, with NOT IN only permitted when a NOT NULL constraint on the join column is verified and documented.

Q: What is relational division and how do you implement it with double NOT EXISTS?

Relational division finds entities that satisfy all members of a set — "which customers have ordered every product in the Dairy category?", "which employees have completed all required trainings?", "which stores have sold every product?" It is the relational equivalent of universal quantification: for all Y in the required set, X has done Y.

The double NOT EXISTS pattern implements this by expressing the condition logically: "find X for which there does NOT EXIST a required Y that X has NOT completed." This translates to two nested NOT EXISTS: the outer NOT EXISTS checks "is there a required item (Y)?", the inner NOT EXISTS checks "has X done Y?" If the outer NOT EXISTS is satisfied — no required item exists that X has not done — then X has done all required items.

SQL implementation: SELECT * FROM customers AS c WHERE NOT EXISTS (SELECT 1 FROM products AS p WHERE p.category = 'Dairy' AND NOT EXISTS (SELECT 1 FROM order_items AS oi JOIN orders AS o ON oi.order_id = o.order_id WHERE oi.product_id = p.product_id AND o.customer_id = c.customer_id AND o.order_status = 'Delivered')). Read inside-out: the innermost subquery asks "has this customer ordered this Dairy product?" The middle NOT EXISTS makes it "is there a Dairy product this customer has NOT ordered?" The outer NOT EXISTS makes it "is there NO Dairy product this customer hasn't ordered?" — which is TRUE only when the customer has ordered all Dairy products. This is the only clean SQL implementation of relational division; JOIN-based alternatives are more complex and fragile.

Q: When would you combine EXISTS and NOT EXISTS in the same WHERE clause?

Combining EXISTS and NOT EXISTS in WHERE expresses compound existence conditions — "has done X but never done Y", "is active but has not completed Z", "was created but has not been processed." These are very common in analytics and CRM segmentation.

The pattern: WHERE EXISTS (...has the required condition...) AND NOT EXISTS (...lacks the disqualifying condition...). Examples: customers who have placed at least one order (EXISTS) but have never cancelled one (NOT EXISTS) — loyal customers. Employees who are assigned to a store (EXISTS) but have never managed a project (NOT EXISTS) — candidates for project lead rotation. Products that have been sold (EXISTS) but never returned (NOT EXISTS) — reliable products for promotion.

The compound pattern is more readable than the JOIN alternative. WHERE EXISTS (...) AND NOT EXISTS (...) reads almost like English — "customers who have ordered AND have not cancelled." The equivalent with JOINs requires careful outer join construction and NULL checking, which is less immediately clear. Use EXISTS + NOT EXISTS whenever a business question involves multiple independent existence conditions that combine with AND — each condition naturally maps to one EXISTS or NOT EXISTS clause.

Q: How does the performance of EXISTS compare to a JOIN for the same query?

For most queries on modern databases, EXISTS and the equivalent JOIN produce the same query plan — the optimiser recognises the semi-join pattern and converts both to the same execution strategy. The logical difference (EXISTS short-circuits, JOIN processes all rows) is typically erased by the optimiser's plan selection.

Where EXISTS has a genuine performance advantage over JOIN: when the subquery's early exit (short-circuit) is preserved by the optimiser and matches are very common. If 90% of customers have orders, EXISTS finds one order per customer and stops — processing on average ~1.1 inner rows per outer row. A JOIN processes all matching orders per customer, which could be 5, 10, or 100 rows per customer. For large tables with many matches per outer row, this can be significant.

Where JOIN can be faster: when the JOIN uses a hash join with a small build side — the optimiser builds a hash table from the inner table and probes it for each outer row in O(1) per probe. For EXISTS on large tables with complex subqueries that the optimiser cannot convert to a hash join, the correlated execution (one subquery per outer row) can be slower. The practical guidance: both approaches are equivalent for most production queries on indexed tables. Use EXISTS when existence semantics are the intent (clearest code), use JOIN when you need columns from the right table (EXISTS cannot project them). If performance matters, measure both with EXPLAIN ANALYZE on realistic data volumes.

// Part 13

Errors You Will Hit — And Exactly Why They Happen

NOT EXISTS returns all rows — the filter appears to have no effect

Cause: The correlated condition inside NOT EXISTS is missing. NOT EXISTS (SELECT 1 FROM orders) without a WHERE clause linking to the outer query checks whether the orders table has any rows at all — which is always TRUE when the table is non-empty. NOT TRUE = FALSE for every outer row, so no customers are excluded. The subquery needs to be correlated to the current outer row.

Fix: Add the correlated WHERE condition: WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.customer_id). The inner WHERE must reference the outer alias (c.customer_id) to make the check specific to the current outer row. Verify by testing the inner query manually: SELECT 1 FROM orders WHERE customer_id = 1 — this should return rows for customers who have ordered and no rows for those who have not.

NOT EXISTS returns zero rows — expected to find some customers with no orders

Cause: The NOT EXISTS subquery has a condition that is always satisfied or references the wrong column. A common mistake: NOT EXISTS (SELECT 1 FROM orders WHERE customer_id IS NOT NULL) — this checks whether any order exists with a non-NULL customer_id, which is almost always TRUE, making NOT EXISTS always FALSE and returning zero rows. Another cause: missing the correlated link so it tests a global condition.

Fix: Verify the correlated condition specifically links to the outer row: WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.customer_id). Test by temporarily changing NOT EXISTS to EXISTS — it should return the rows you want to exclude. If EXISTS returns the expected rows, NOT EXISTS should return the complement. Also check whether the outer table has any rows that genuinely have no matches in the inner table — NOT EXISTS correctly returns zero rows if all outer rows do have matches.

EXISTS subquery runs very slowly — takes minutes on a large table

Cause: The correlated column in the EXISTS subquery's WHERE is not indexed in the inner table. Without an index, each outer row evaluation requires a full scan of the inner table. For N outer rows, total cost is N × full_table_scan — O(n²). With 100,000 customers and 1,000,000 orders, this means 100,000 full scans of 1,000,000 rows = 100 billion row comparisons.

Fix: Create an index on the correlated column in the inner table: CREATE INDEX idx_orders_customer_id ON orders(customer_id). With this index, each EXISTS check is one index lookup — O(log n) instead of O(n). Also add indexes on any additional filter columns used inside the subquery: CREATE INDEX idx_orders_status_customer ON orders(order_status, customer_id) for EXISTS queries that filter on both status and customer_id. Use EXPLAIN ANALYZE to verify the index is being used after creation.

Double NOT EXISTS (relational division) returns no rows — expected some results

Cause: The double NOT EXISTS logic is inverted or the correlated conditions reference the wrong tables. The outer NOT EXISTS should reference the required set (e.g., Dairy products), and the inner NOT EXISTS should reference the action table (e.g., order_items). Common mistake: both NOT EXISTS reference the same table, making the logic wrong. Another cause: the required set is empty — if no Dairy products exist, NOT EXISTS (SELECT 1 FROM products WHERE category = 'Dairy' AND NOT EXISTS (...)) is vacuously TRUE for all customers.

Fix: Read the double NOT EXISTS inside-out to verify logic: inner NOT EXISTS should ask 'has this customer ordered this product?', outer NOT EXISTS should ask 'does a product exist that the customer has NOT ordered?'. Trace through one specific customer manually: pick a customer_id, run the inner NOT EXISTS subquery for each Dairy product and verify it correctly returns rows when the customer hasn't ordered that product. Also verify the required set is non-empty: SELECT COUNT(*) FROM products WHERE category = 'Dairy'.

EXISTS in UPDATE modifies too many rows — more than expected

Cause: The EXISTS subquery is not specific enough — it matches rows that should not be included in the update. A missing filter condition in the subquery, or a correlated condition that is too broad, causes EXISTS to return TRUE for outer rows that should not be updated.

Fix: Always preview an EXISTS UPDATE as a SELECT first: convert UPDATE table SET col = val WHERE EXISTS (...) to SELECT * FROM table WHERE EXISTS (...) and verify exactly which rows would be affected. The SELECT shows you the scope before any modification. If the SELECT returns unexpected rows, tighten the EXISTS subquery's WHERE conditions. Add AND o.order_status = 'Delivered' or other filters to make the existence check more specific. Only run the UPDATE after the SELECT confirms the exact expected rows.

Try It Yourself

Write three queries using EXISTS and NOT EXISTS: (1) Find all stores that have at least one employee AND at least one delivered order worth more than ₹800. Show store_id, city, and store_name. (2) Find all customers who have placed at least one delivered order but have NEVER used COD (Cash on Delivery) as a payment method. Show customer_id, full name, city, and loyalty_tier. (3) Find all products that exist in the products table but have never appeared in any delivered order. Then also find products that appear in delivered orders but have a unit_price below ₹30 AND have been ordered more than once. Combine both using UNION ALL with a 'reason' column labelling each set.

🎯 Key Takeaways

✓EXISTS returns TRUE if the subquery produces at least one row, FALSE if it produces zero rows. It short-circuits — stops scanning on the first match. Column values inside the subquery are irrelevant.
✓SELECT 1 inside EXISTS is the universal convention — it signals that only existence matters, not the subquery's output. SELECT *, SELECT NULL, and SELECT 1 all produce identical results inside EXISTS.
✓NOT EXISTS is the safest anti-join — it correctly handles NULLs. NOT IN fails silently when the subquery contains NULL values, returning zero rows with no error.
✓The correlated condition inside EXISTS must link to the outer query's current row. WITHOUT EXISTS (SELECT 1 FROM orders WHERE customer_id = c.customer_id) — missing this link makes EXISTS a global check, not a per-row check.
✓EXISTS + NOT EXISTS combined in WHERE expresses compound existence: "has done X but never done Y". This is the SQL pattern for segmentation conditions like "ordered but never cancelled."
✓Double NOT EXISTS implements relational division — "find X that has done ALL of Y." Read inside-out: outer NOT EXISTS checks for missing required items, inner NOT EXISTS checks if the specific item was done.
✓EXISTS in UPDATE and DELETE enables conditional DML — update rows that have related data, delete orphaned rows that reference missing parents.
✓Always preview EXISTS-based UPDATE and DELETE as SELECT first — convert to SELECT * WHERE EXISTS (...) and verify the exact rows affected before running the modification.
✓For performance: index the correlated column in the inner table. Without an index, EXISTS is O(n²). With an index, it is O(n log n). Use EXPLAIN ANALYZE to verify index usage.
✓EXISTS vs COUNT(*) > 0: always use EXISTS. COUNT scans all matching rows before comparing. EXISTS stops at the first match. COUNT(*) > 0 is always slower and semantically misleading.

What comes next

In Module 39, you learn UNION, INTERSECT, and EXCEPT — set operations that combine result sets vertically, with every deduplication, ordering, and column-matching rule you need.

Module 39 → UNION, INTERSECT, and EXCEPT

Discussion

Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.

Continue with GitHub