EXISTS and NOT EXISTS
The cleanest existence check in SQL — how EXISTS works, when it beats IN and LEFT JOIN IS NULL, every anti-join pattern, NULL safety, and performance at scale
// Part 01
What EXISTS Does — True or False, Nothing Else
EXISTS is a predicate — it returns TRUE or FALSE. It takes a subquery as its argument and returns TRUE if the subquery produces at least one row, FALSE if it produces zero rows. That is its entire job. The subquery's column values are completely irrelevant — only whether the subquery returns any rows matters.
This simplicity is its strength. EXISTS answers one question and one question only: does at least one matching row exist? The moment the database finds the first matching row, it stops scanning — it does not need to find all matches, count them, or return their values. This short-circuit behaviour makes EXISTS uniquely efficient for existence checks.
// Part 02
Basic EXISTS — Finding Rows With Related Data
// Part 03
NOT EXISTS — The Anti-Join
NOT EXISTS finds rows in the outer table that have no matching row in the subquery — the anti-join. It is the safest, most semantically clear way to ask "which X has no Y?" It correctly handles NULLs, unlike NOT IN.
Customers who have never ordered
Products never sold
Employees with no store assignment
// Part 04
EXISTS with Multiple Conditions — Precise Existence Checks
The subquery inside EXISTS can have any WHERE conditions — combining multiple filters to check for a very specific kind of related row. This is where EXISTS becomes more powerful than a simple JOIN or IN check.
Customers with at least one large delivered order
Stores that have sold Dairy products
Products ordered by Platinum customers
// Part 05
NOT EXISTS vs NOT IN vs LEFT JOIN IS NULL
Three approaches solve the anti-join problem — finding rows with no match. They produce the same results for non-NULL data, but differ critically in NULL handling, readability, and performance.
The NULL trap in NOT IN
All three approaches — same result on clean data
Comparison table
| Approach | NULL safe? | Short-circuits? | Can select right-side cols? | Best when |
|---|---|---|---|---|
| NOT EXISTS | ✅ Always | ✅ First match found | ❌ No — subquery not projected | Default anti-join — clearest semantics |
| NOT IN | ❌ Fails with NULLs | ❌ Must build full list | ❌ No | Only when 100% sure no NULLs in subquery |
| LEFT JOIN IS NULL | ✅ Always | ❌ Processes all rows | ✅ Yes — right cols accessible | When you also need right-side data or already in a JOIN chain |
// Part 06
EXISTS vs IN — The Semi-Join Comparison
Both EXISTS and IN implement the semi-join — finding outer rows that have at least one match in the inner table. They produce the same result for non-NULL data. The choice between them is about semantics, performance, and NULL safety.
EXISTS vs IN — equivalent results, different mechanics
When EXISTS is faster than IN
EXISTS short-circuits — it stops as soon as one matching row is found. IN builds the complete list first, then checks membership. When the inner table is large and matches are common, EXISTS does dramatically less work.
// Part 07
EXISTS with Aggregates — Conditional Existence
The subquery inside EXISTS can contain aggregates, GROUP BY, and HAVING — making it possible to check existence based on aggregate conditions, not just row-level conditions.
// Part 08
Double NOT EXISTS — Relational Division
Relational division asks "which X has done ALL of Y?" — customers who have ordered every product in a category, employees who have completed every required training, stores that have sold every product. This is one of the hardest patterns in SQL, cleanly solved by double NOT EXISTS.
The logic of double NOT EXISTS
🎯 Pro Tip
Double NOT EXISTS (relational division) is one of the few SQL problems that has no clean equivalent using JOINs or window functions. It is the canonical solution for "find X that has done ALL of Y" queries. The nested logic reads inside-out: for a customer to qualify, there must NOT EXIST any category for which it is true that the customer has NOT ordered from that category. Read it slowly, inside-out, and it becomes clear.
// Part 09
EXISTS in UPDATE and DELETE — Conditional DML
EXISTS is not limited to SELECT queries. It appears in UPDATE and DELETE statements to conditionally modify rows based on whether related data exists.
UPDATE rows where related data exists
DELETE rows where related data does NOT exist
// Part 10
Performance Deep Dive — How EXISTS Executes
EXISTS performance depends on the database's ability to use indexes on the correlated column and how quickly the first matching row is found. Understanding the execution model helps you write EXISTS queries that run fast at any scale.
Index usage with EXISTS
EXISTS vs COUNT(*) > 0 — always prefer EXISTS
// Part 11
What This Looks Like at Work
You are a backend engineer at DoorDash. The marketing team is planning three campaign segments and needs the customer lists. Each segment uses a different existence condition — perfect for EXISTS and NOT EXISTS queries. The lists must be mutually exclusive and exhaustive.
🎯 Pro Tip
Combining EXISTS and NOT EXISTS in the same WHERE clause is the natural SQL expression for "has done X but never done Y" — a compound existence condition. This pattern appears constantly in retention and segmentation analytics: customers who churned (ordered before but not recently), users who started but never completed, employees who are assigned but have not performed. Build the EXISTS + NOT EXISTS compound pattern into your analytics toolkit.
// Part 12
Interview Prep — 5 Questions With Complete Answers
EXISTS evaluates a correlated subquery and returns TRUE if the subquery produces at least one row, FALSE if it produces zero rows. It does not examine the values returned by the subquery — only whether any rows exist. The database stops evaluating the subquery as soon as the first matching row is found (short-circuit behaviour), making it efficient for existence checks.
SELECT 1 inside EXISTS is a universal convention that signals intent: "I only care whether rows exist, not what values they contain." The 1 is a constant — it carries no information. Writing SELECT 1 rather than SELECT * or SELECT column_name communicates to any reader that the subquery's output is irrelevant and the purpose is purely an existence check. This convention is widely recognised and expected in professional SQL code.
Technically, SELECT *, SELECT column, SELECT NULL, and SELECT 1 all produce equivalent results inside EXISTS. The database optimiser ignores the SELECT list entirely when evaluating EXISTS — it only cares whether the WHERE conditions inside the subquery are satisfied for at least one row. SELECT 1 is the standard precisely because it makes this intent explicit. Using SELECT * inside EXISTS can mislead readers into thinking the column values matter, and in older database versions, SELECT * could cause unnecessary column retrieval overhead (though modern optimisers eliminate this).
NOT IN fails silently when the subquery contains NULL values, returning zero rows for every comparison — a critical correctness bug with no error message. The mechanism: NOT IN (1, 2, NULL) is evaluated as NOT (col = 1 OR col = 2 OR col = NULL). Since col = NULL evaluates to NULL in SQL's three-valued logic (not FALSE), the entire OR expression can evaluate to NULL for any col value, and NOT NULL = NULL (still not TRUE). WHERE discards all NULL-result rows, returning zero results for the entire query — even when non-NULL unmatched rows clearly exist.
NOT EXISTS is immune to this problem. It evaluates row-by-row whether a matching row exists. A NULL value in the inner table's column means "this particular row has no value for this column" — it does not affect whether other rows match. The existence check for the outer row is independent of NULLs in non-matching rows.
The silent failure mode of NOT IN makes it particularly dangerous in production. A schema change that adds a nullable foreign key column, a bulk import that inserts a row with NULL in the join column, or a data quality issue that introduces unexpected NULLs — any of these can cause a previously working NOT IN query to silently start returning zero rows with no error, warning, or any visible indication that the query's behaviour changed. NOT EXISTS never exhibits this behaviour. Professional SQL standards at most companies mandate NOT EXISTS (or LEFT JOIN IS NULL) for all anti-join queries, with NOT IN only permitted when a NOT NULL constraint on the join column is verified and documented.
Relational division finds entities that satisfy all members of a set — "which customers have ordered every product in the Dairy category?", "which employees have completed all required trainings?", "which stores have sold every product?" It is the relational equivalent of universal quantification: for all Y in the required set, X has done Y.
The double NOT EXISTS pattern implements this by expressing the condition logically: "find X for which there does NOT EXIST a required Y that X has NOT completed." This translates to two nested NOT EXISTS: the outer NOT EXISTS checks "is there a required item (Y)?", the inner NOT EXISTS checks "has X done Y?" If the outer NOT EXISTS is satisfied — no required item exists that X has not done — then X has done all required items.
SQL implementation: SELECT * FROM customers AS c WHERE NOT EXISTS (SELECT 1 FROM products AS p WHERE p.category = 'Dairy' AND NOT EXISTS (SELECT 1 FROM order_items AS oi JOIN orders AS o ON oi.order_id = o.order_id WHERE oi.product_id = p.product_id AND o.customer_id = c.customer_id AND o.order_status = 'Delivered')). Read inside-out: the innermost subquery asks "has this customer ordered this Dairy product?" The middle NOT EXISTS makes it "is there a Dairy product this customer has NOT ordered?" The outer NOT EXISTS makes it "is there NO Dairy product this customer hasn't ordered?" — which is TRUE only when the customer has ordered all Dairy products. This is the only clean SQL implementation of relational division; JOIN-based alternatives are more complex and fragile.
Combining EXISTS and NOT EXISTS in WHERE expresses compound existence conditions — "has done X but never done Y", "is active but has not completed Z", "was created but has not been processed." These are very common in analytics and CRM segmentation.
The pattern: WHERE EXISTS (...has the required condition...) AND NOT EXISTS (...lacks the disqualifying condition...). Examples: customers who have placed at least one order (EXISTS) but have never cancelled one (NOT EXISTS) — loyal customers. Employees who are assigned to a store (EXISTS) but have never managed a project (NOT EXISTS) — candidates for project lead rotation. Products that have been sold (EXISTS) but never returned (NOT EXISTS) — reliable products for promotion.
The compound pattern is more readable than the JOIN alternative. WHERE EXISTS (...) AND NOT EXISTS (...) reads almost like English — "customers who have ordered AND have not cancelled." The equivalent with JOINs requires careful outer join construction and NULL checking, which is less immediately clear. Use EXISTS + NOT EXISTS whenever a business question involves multiple independent existence conditions that combine with AND — each condition naturally maps to one EXISTS or NOT EXISTS clause.
For most queries on modern databases, EXISTS and the equivalent JOIN produce the same query plan — the optimiser recognises the semi-join pattern and converts both to the same execution strategy. The logical difference (EXISTS short-circuits, JOIN processes all rows) is typically erased by the optimiser's plan selection.
Where EXISTS has a genuine performance advantage over JOIN: when the subquery's early exit (short-circuit) is preserved by the optimiser and matches are very common. If 90% of customers have orders, EXISTS finds one order per customer and stops — processing on average ~1.1 inner rows per outer row. A JOIN processes all matching orders per customer, which could be 5, 10, or 100 rows per customer. For large tables with many matches per outer row, this can be significant.
Where JOIN can be faster: when the JOIN uses a hash join with a small build side — the optimiser builds a hash table from the inner table and probes it for each outer row in O(1) per probe. For EXISTS on large tables with complex subqueries that the optimiser cannot convert to a hash join, the correlated execution (one subquery per outer row) can be slower. The practical guidance: both approaches are equivalent for most production queries on indexed tables. Use EXISTS when existence semantics are the intent (clearest code), use JOIN when you need columns from the right table (EXISTS cannot project them). If performance matters, measure both with EXPLAIN ANALYZE on realistic data volumes.
// Part 13
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓EXISTS returns TRUE if the subquery produces at least one row, FALSE if it produces zero rows. It short-circuits — stops scanning on the first match. Column values inside the subquery are irrelevant.
- ✓SELECT 1 inside EXISTS is the universal convention — it signals that only existence matters, not the subquery's output. SELECT *, SELECT NULL, and SELECT 1 all produce identical results inside EXISTS.
- ✓NOT EXISTS is the safest anti-join — it correctly handles NULLs. NOT IN fails silently when the subquery contains NULL values, returning zero rows with no error.
- ✓The correlated condition inside EXISTS must link to the outer query's current row. WITHOUT EXISTS (SELECT 1 FROM orders WHERE customer_id = c.customer_id) — missing this link makes EXISTS a global check, not a per-row check.
- ✓EXISTS + NOT EXISTS combined in WHERE expresses compound existence: "has done X but never done Y". This is the SQL pattern for segmentation conditions like "ordered but never cancelled."
- ✓Double NOT EXISTS implements relational division — "find X that has done ALL of Y." Read inside-out: outer NOT EXISTS checks for missing required items, inner NOT EXISTS checks if the specific item was done.
- ✓EXISTS in UPDATE and DELETE enables conditional DML — update rows that have related data, delete orphaned rows that reference missing parents.
- ✓Always preview EXISTS-based UPDATE and DELETE as SELECT first — convert to SELECT * WHERE EXISTS (...) and verify the exact rows affected before running the modification.
- ✓For performance: index the correlated column in the inner table. Without an index, EXISTS is O(n²). With an index, it is O(n log n). Use EXPLAIN ANALYZE to verify index usage.
- ✓EXISTS vs COUNT(*) > 0: always use EXISTS. COUNT scans all matching rows before comparing. EXISTS stops at the first match. COUNT(*) > 0 is always slower and semantically misleading.
What comes next
In Module 39, you learn UNION, INTERSECT, and EXCEPT — set operations that combine result sets vertically, with every deduplication, ordering, and column-matching rule you need.
Module 39 → UNION, INTERSECT, and EXCEPTDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.