IN and BETWEEN Operators
Clean shorthand for multiple OR conditions and range checks — how IN and BETWEEN work, their equivalents, NULL behaviour, and every production pattern
// Part 01
The Problem They Solve
You have already learned AND and OR from Module 07. They work perfectly — but they get verbose quickly. Consider finding all orders with one of four possible statuses:
Now consider finding orders within a price range:
Both work. But SQL provides two operators specifically designed to make these patterns cleaner, more readable, and less error-prone: IN for matching against a list of values, and BETWEEN for matching a range. Every professional SQL writer uses both constantly.
// Part 02
IN — Match Against a List of Values
The IN operator checks whether a column value matches any value in a specified list. It is exactly equivalent to multiple OR conditions with = — just far more readable.
IN with text values
IN with numeric values
IN with dates
// Part 03
NOT IN — Excluding a List of Values
NOT IN is the inverse of IN — it returns rows where the column value does NOT match any value in the list. Equivalent to multiple <> conditions combined with AND.
// Part 04
IN with a Subquery — Dynamic Lists
The most powerful use of IN is with a subquery — instead of a hardcoded list, the list of values is dynamically computed by another SELECT. This is called a subquery or inner query. You will learn subqueries fully in Module 36, but here is the pattern so you can use it now.
🎯 Pro Tip
When using IN with a subquery, the subquery must return exactly one column — the column you are comparing against. If the subquery returns multiple columns, SQL throws an error. Also add DISTINCT to the subquery when possible: SELECT DISTINCT customer_id FROM orders returns fewer rows for the IN check to process, which improves performance on large tables.
// Part 05
BETWEEN — Match a Range (Inclusive)
BETWEEN checks whether a value falls within a range — inclusive of both endpoints. It is equivalent to >= lower_bound AND <= upper_bound, but cleaner and less error-prone (no risk of accidentally mixing > and >= or forgetting which end is inclusive).
BETWEEN with numbers
BETWEEN with dates
BETWEEN with text — alphabetical range
BETWEEN works on text columns too — it checks alphabetical order. This is less commonly used but valid.
// Part 06
NOT BETWEEN — Excluding a Range
NOT BETWEEN is the inverse — it returns rows where the value falls outside the specified range (strictly less than the lower bound OR strictly greater than the upper bound).
// Part 07
BETWEEN and Date Ranges — The Timestamp Trap
BETWEEN with dates has a critical gotcha when your column stores timestamps (date + time) rather than plain dates. This is one of the most common bugs in analytics queries.
The timestamp boundary problem
In FreshCart, order_date is a DATE column so BETWEEN works correctly. But in any system where timestamps are stored (most production systems log exact timestamps), always use >= start AND < day_after_end instead of BETWEEN for date range filtering.
// Part 08
Combining IN and BETWEEN in the Same Query
IN and BETWEEN combine with AND, OR, and NOT exactly like any other WHERE condition. Real queries often use both together.
// Part 09
Performance — IN vs OR and BETWEEN vs >= AND <=
IN vs multiple OR conditions
For most databases, IN (value1, value2, ...) and the equivalent OR conditions produce identical execution plans — the query optimiser converts them to the same internal representation. There is no performance difference between them for a hardcoded list. Choose IN for readability when you have more than two or three values.
For IN with a subquery, performance depends on the subquery's complexity and size. The database executes the subquery first, builds the result set, then uses it for the IN check. On large subquery results, this can be slower than a JOIN — you will learn how to convert IN subqueries to JOINs in Module 31.
BETWEEN vs >= AND <=
BETWEEN and the equivalent >= AND <= conditions are identical in execution — the optimiser converts BETWEEN to the range comparison internally. Both can use B-tree indexes for range scans equally well. Choose BETWEEN when both endpoints are known and it makes the query more readable. Choose explicit >= AND <= when one endpoint is open (no lower bound, or no upper bound), or when dealing with timestamps where the boundary precision matters.
IN with large lists
IN with a very large hardcoded list (hundreds or thousands of values) can be slow because the database must check each value against the list. For large value sets, a JOIN to a temporary table or a subquery is more efficient. As a rule of thumb: IN with fewer than 100 values is fine; above that, consider an alternative approach.
🎯 Pro Tip
When you find yourself writing IN with a long list of IDs that came from another query, that is almost always a sign you should be writing a JOIN instead. Instead of WHERE customer_id IN (SELECT customer_id FROM orders WHERE ...), write the JOIN directly: FROM customers JOIN orders ON customers.customer_id = orders.customer_id WHERE .... The JOIN is typically faster and more readable. You will learn to make this conversion naturally from Module 30 onwards.
// Part 10
NULL Behaviour — IN, NOT IN, and BETWEEN
NULL in the IN list
If NULL is in the IN list, it does not affect the result for the rows that match other values. NULL simply cannot match anything — so it adds nothing to the result.
NULL in the column being tested with IN
If the column itself is NULL, IN returns NULL (not TRUE), so NULL rows are excluded. NOT IN also returns NULL for NULL column values — same exclusion behaviour.
BETWEEN and NULL
BETWEEN with a NULL column also returns NULL — the row is excluded. BETWEEN with NULL as a bound always returns NULL for every row (since any comparison with NULL is NULL).
// Part 11
What This Looks Like at Work
You are an analyst at Stripe's merchant analytics team. The merchant success team needs three different reports for their quarterly business review. All three need to be ready within an hour.
// Part 12
Interview Prep — 5 Questions With Complete Answers
IN checks whether a column value matches any value in a specified list. WHERE city IN ('Seattle', 'Austin', 'New York') returns rows where city is any of the three cities. It is exactly equivalent to multiple OR conditions using equality: WHERE city = 'Seattle' OR city = 'Austin' OR city = 'New York'. The query optimiser typically converts IN to OR internally, so they produce identical execution plans and have identical performance for hardcoded lists.
IN is preferred over multiple OR conditions for three reasons. First, readability — IN lists the values once, clearly, without repeating the column name before every value. With five values, OR requires writing the column name and = operator five times. IN writes it once. Second, maintainability — adding a new value to an IN list requires adding one value inside the parentheses. Adding a value to OR conditions requires adding a full new OR clause. Third, correctness — IN with many values is harder to accidentally miswrite than a long chain of OR conditions, which requires careful attention to whether each uses = (not <> or >= by mistake).
IN also accepts a subquery instead of a hardcoded list: WHERE customer_id IN (SELECT customer_id FROM orders WHERE ...). This is dynamic IN — the list of values is computed by the subquery at runtime rather than hardcoded in the query. The subquery must return exactly one column. This pattern is a simpler alternative to JOIN for existence checks, though JOINs are often preferred for performance on large datasets.
BETWEEN checks whether a value falls within a range. BETWEEN lower AND upper is exactly equivalent to column >= lower AND column <= upper. Both boundaries are inclusive — the lower bound and the upper bound are included in the result set. A row where unit_price = 50 is returned by WHERE unit_price BETWEEN 50 AND 200, and so is a row where unit_price = 200.
BETWEEN works on numeric, date, and text columns. For numbers and dates, the range semantics are intuitive. For text columns, BETWEEN uses lexicographic (dictionary) order — WHERE name BETWEEN 'A' AND 'M' returns names that fall alphabetically between A and M inclusive. This is valid but less commonly used.
NOT BETWEEN is the inverse: NOT BETWEEN lower AND upper returns rows where the value is strictly less than lower OR strictly greater than upper — the boundaries are excluded from NOT BETWEEN results, consistent with the inclusive definition of BETWEEN itself.
BETWEEN is inclusive on both ends. When the column is a DATE type, '2024-01-31' means the entire day of January 31st, so BETWEEN '2024-01-01' AND '2024-01-31' correctly captures all of January. When the column is a TIMESTAMP or DATETIME type, '2024-01-31' is interpreted as '2024-01-31 00:00:00.000' — midnight at the start of January 31st. Any timestamp from 2024-01-31 00:00:01 to 2024-01-31 23:59:59 is greater than '2024-01-31 00:00:00' and therefore NOT between the two bounds. Records created during January 31st (except at exactly midnight) are silently excluded.
The impact in practice: if you use BETWEEN '2024-01-01' AND '2024-01-31' on a timestamp column for a January monthly report, you get all of January except orders placed on January 31st after midnight — which is most of January 31st. The report shows January 1-30 plus the very first second of January 31st. The discrepancy is subtle and can go unnoticed until someone cross-checks totals.
The correct approach for timestamp columns is always: WHERE timestamp_col >= '2024-01-01' AND timestamp_col < '2024-02-01'. The >= captures everything from the very start of January 1st. The strictly less than '2024-02-01' captures everything up to but not including the first moment of February — which is the entire last second of January 31st. This pattern works identically for DATE and TIMESTAMP columns and is unambiguous about what it includes. Adopt this as a strict professional standard: use >= and < for date range filtering, BETWEEN only when you are certain the column is DATE (not TIMESTAMP) and you need both ends inclusive.
When NULL appears in a NOT IN list, the entire NOT IN condition returns NULL for every row in the outer query — resulting in zero rows in the output. This is one of the most dangerous silent bugs in SQL.
The mechanism: NOT IN (val1, NULL) expands to: column <> val1 AND column <> NULL. The second condition — column <> NULL — always evaluates to NULL because any comparison with NULL returns NULL in three-valued logic. NOT NULL = NULL. Since WHERE discards rows where the condition is NULL, every single row is discarded regardless of what the column contains. The query returns zero rows with no error message.
This becomes a real problem when the NOT IN list comes from a subquery that might return NULLs: WHERE customer_id NOT IN (SELECT customer_id FROM orders). If any order has a NULL customer_id (which it might if customer_id is nullable), the entire NOT IN fails silently. The safe alternatives: use NOT EXISTS instead of NOT IN for subqueries — NOT EXISTS handles NULL correctly because it checks for the presence or absence of rows, not value equality. Or add WHERE customer_id IS NOT NULL to the subquery. As a professional rule: never use NOT IN with a subquery unless you are absolutely certain the subquery cannot return NULL. Default to NOT EXISTS for all subquery-based exclusion checks.
IN with a subquery and a JOIN can often express the same logical operation but have different performance characteristics and readability trade-offs. IN with a subquery is readable and semantically clear: WHERE customer_id IN (SELECT customer_id FROM orders WHERE ...) clearly communicates "find customers whose ID appears in this set." It is a good choice for simple existence checks on small to medium tables where readability matters more than micro-optimisation.
A JOIN is typically preferred over IN with a subquery for three reasons. Performance: most query optimisers convert IN subqueries to semi-joins internally, but explicit JOINs give the optimiser more flexibility to choose the most efficient join algorithm (hash join, merge join, nested loop) based on table statistics. For large tables, an explicit JOIN with proper indexing almost always outperforms IN with a subquery. Flexibility: a JOIN allows you to SELECT columns from both tables in the result. IN subquery only filters based on the subquery result — you cannot include columns from the subquery in the outer SELECT. Clarity for complex logic: multi-table queries with multiple IN subqueries become difficult to read and maintain, while the same logic expressed as JOINs with clear ON conditions and aliases is more organised.
Use IN with a subquery when: the query is simple and the subquery is small, you want to express an existence check without adding JOIN complexity, or you are writing a quick ad-hoc query. Use a JOIN when: tables are large and performance matters, you need columns from both tables in the result, or the query is complex enough that explicit JOIN structure aids maintainability. For production code in a team environment, explicit JOINs are the professional standard — they make the data flow visible and are easier to optimise with EXPLAIN ANALYZE.
// Part 13
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓IN checks whether a column value matches any value in a list. WHERE city IN ('Seattle', 'Austin') is equivalent to WHERE city = 'Seattle' OR city = 'Austin' — just cleaner.
- ✓NOT IN excludes rows matching any value in the list. It is equivalent to multiple <> conditions combined with AND.
- ✓BETWEEN checks for an inclusive range. WHERE price BETWEEN 50 AND 200 includes both 50 and 200. It is equivalent to WHERE price >= 50 AND price <= 200.
- ✓NOT BETWEEN excludes the range. It is equivalent to column < lower OR column > upper — the boundary values themselves are excluded from NOT BETWEEN results.
- ✓BETWEEN requires the lower bound first. BETWEEN 200 AND 50 (reversed) always returns zero rows — no value is simultaneously >= 200 and <= 50.
- ✓BETWEEN on TIMESTAMP columns has a critical bug: '2024-01-31' is interpreted as '2024-01-31 00:00:00', missing the entire last day. Use >= '2024-01-01' AND < '2024-02-01' instead for timestamps.
- ✓NOT IN is dangerous when the list contains NULL — it returns zero rows for the entire query. Use NOT EXISTS for subquery-based exclusions to avoid this trap.
- ✓IN with a subquery (WHERE col IN (SELECT col FROM table)) dynamically computes the list at runtime. The subquery must return exactly one column.
- ✓For large IN lists from subqueries, consider a JOIN instead — it gives the optimiser more flexibility and is often faster on large tables.
- ✓IN and BETWEEN combine with AND, OR, and NOT like any other WHERE condition. Use parentheses when mixing with OR to ensure correct precedence.
What comes next
In Module 16, you learn CASE WHEN — the SQL equivalent of an if-else statement. It lets you create conditional columns, build custom categories, and handle complex branching logic directly inside a query.
Module 16 → CASE WHEN — Conditional LogicDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.