Filtering Rows — WHERE Clause
Get only the rows you actually need — every comparison operator, every data type, and how the database evaluates filters internally
// Part 01
Why WHERE Is the Most Important Clause in SQL
In Module 05 you learned SELECT and FROM — they let you read data from a table. But SELECT without WHERE returns every single row in the table. The customers table has 20 rows. The orders table at a real company has 500 million. Returning all 500 million rows every time you need to answer a question is not just slow — it is impossible. Your screen cannot display it, your network cannot transfer it, and your database server will collapse under the load.
WHERE is the clause that tells the database: "Only give me rows that satisfy this condition." It is the difference between "show me everything" and "show me exactly what I asked for." Every useful SQL query in a production system has a WHERE clause. Mastering WHERE is mastering SQL.
Here is WHERE in its simplest form:
Same table, same columns — but the second query returns only the rows where city equals 'Bangalore'. The WHERE clause evaluated every row and kept only the ones where the condition was true. Everything else was discarded before the result was returned to you.
// Part 02
How WHERE Works Internally
Understanding what the database does when it processes a WHERE clause makes you a dramatically better SQL writer. It explains why some queries are fast and others are slow — and what you can do about it.
The evaluation process — row by row
When the database executes a WHERE clause, it goes through the table row by row. For each row, it evaluates your condition. The condition produces one of three results: TRUE (the row satisfies the condition — include it), FALSE (the row does not satisfy the condition — discard it), or NULL (the condition could not be evaluated because a value is missing — also discarded). Only rows where the condition evaluates to TRUE make it into the result.
This happens before SELECT — remember from Module 05 that the execution order is FROM → WHERE → SELECT. The database identifies all rows that match the WHERE condition first, then extracts the requested columns from only those rows. This is efficient: if WHERE filters 10 million rows down to 100, SELECT only has to process those 100.
Without an index — full table scan
If the column in your WHERE clause has no index, the database must read every single page of the table, check every row, and keep only the ones where the condition is true. This is called a full table scan. On a table with 500 million rows and no index, a full table scan takes minutes to hours.
With an index — direct lookup
If the column in your WHERE clause has an index (a B-tree), the database jumps directly to the matching rows without reading the whole table. Finding 100 rows out of 500 million with an index takes milliseconds — the same query without an index takes minutes. You will learn to create indexes in Module 46. For now, know that WHERE on a primary key or a properly indexed column is always fast regardless of table size.
// Part 03
Comparison Operators — The Six Tools of Filtering
A WHERE condition uses a comparison operator to compare a column value against a test value. There are six comparison operators in SQL, and you will use all six regularly.
| Operator | Meaning | Example | Returns rows where... |
|---|---|---|---|
| = | Equal to | city = 'Bangalore' | city is exactly Bangalore |
| <> or != | Not equal to | city <> 'Bangalore' | city is anything other than Bangalore |
| > | Greater than | unit_price > 100 | unit_price is more than 100 |
| < | Less than | unit_price < 100 | unit_price is less than 100 |
| >= | Greater than or equal to | salary >= 50000 | salary is 50000 or more |
| <= | Less than or equal to | salary <= 50000 | salary is 50000 or less |
Equality — = operator
Not equal — <> operator
Greater than and less than
// Part 04
Filtering Different Data Types — Numbers, Text, and Dates
WHERE works on every data type, but the syntax for each type is slightly different. Getting the syntax wrong is one of the most common beginner mistakes.
Filtering numbers — no quotes
Number values in WHERE conditions are written as-is, without quotes. The database compares the column value numerically.
Filtering text — always use single quotes
Text (VARCHAR, CHAR, TEXT) values in WHERE conditions must be wrapped in single quotes. Double quotes are used for identifiers (column names, table names) in SQL — not for string values. Forgetting this is the most common syntax error beginners make.
String comparison is case-sensitive
In PostgreSQL (and DuckDB, which the playground uses), string comparisons are case-sensitive. 'Bangalore' is not the same as 'bangalore' or 'BANGALORE'. The WHERE condition will only return rows where the value matches exactly — including case.
To do a case-insensitive comparison in PostgreSQL, use the ILIKE operator or wrap the column in LOWER():
Filtering dates — use ISO 8601 format
Date values in WHERE conditions are written as strings in YYYY-MM-DD format, wrapped in single quotes. Always use this format — it is unambiguous across every database and every locale. Never use DD/MM/YYYY or MM-DD-YYYY in SQL queries.
Filtering booleans
// Part 05
WHERE with Calculated Expressions
The condition in WHERE does not have to compare a raw column to a fixed value. You can use calculations on both sides of the comparison. The database evaluates the expression for each row and checks whether the result satisfies the condition.
// Part 06
Filtering NULL Values — IS NULL and IS NOT NULL
NULL is the absence of a value. It is not zero, not an empty string, not false — it means "we do not know" or "this information does not apply." In FreshMart's orders table, delivery_date is NULL for orders that have not been delivered yet — we do not know the delivery date because it has not happened.
NULL has a special property that surprises almost every beginner: you cannot use = to check for NULL. The expression column = NULL does not return TRUE or FALSE — it returns NULL. And remember: WHERE discards rows where the condition is NULL. So WHERE delivery_date = NULL returns zero rows — always, no matter what the data contains.
🎯 Pro Tip
NULL confusion causes silent data quality bugs that are extremely hard to find. Whenever you write a WHERE condition on a column that might contain NULLs, ask yourself: "What do I want to happen with NULL rows?" If you want them included, you need to explicitly handle them. If you want them excluded, your regular condition already excludes them (remember: NULL condition = row discarded). Module 11 covers NULL in full depth — this is a preview of why it matters.
// Part 07
WHERE on Different FreshMart Tables — Real Scenarios
Let us practice WHERE across all six FreshMart tables with real business questions that would come up in day-to-day work.
Customers — targeting by tier and location
Products — filtering by price, category, and stock
Orders — operational filtering
Employees — HR and payroll filtering
// Part 08
Common WHERE Patterns You Will Write Every Day
In real work, certain WHERE patterns appear constantly. Recognising them makes you write queries faster and more accurately.
Exact match on a primary key or ID
The most common query in any application: look up one specific record by its ID. This is always fast because primary keys are always indexed.
Date range filter
Filtering by a date range — this month, last quarter, last 30 days — appears in almost every analytics query.
Status filter
Almost every transactional table has a status column. Filtering by status is one of the most frequent operations.
Amount threshold
Filtering by a monetary amount — high-value transactions, orders above a minimum, products in a price range — is constant in financial and e-commerce analytics.
// Part 09
WHERE vs HAVING — A Critical Distinction
Both WHERE and HAVING filter rows, but they operate at completely different points in query execution and on completely different things. Confusing them causes errors that can be frustrating to debug if you do not understand the distinction.
WHERE filters individual rows — before any grouping or aggregation happens. It operates on the raw column values of each row.
HAVING filters groups — after GROUP BY has aggregated rows into groups. It operates on aggregate values like COUNT(), SUM(), AVG(). You cannot use HAVING without GROUP BY.
You will learn GROUP BY and HAVING in full depth in Modules 28 and 29. For now, remember this one rule: WHERE for rows, HAVING for groups.
// Part 10
Performance — Writing WHERE Clauses That Stay Fast
A WHERE clause that works correctly but runs slowly is still a problem in production. Here are the most important performance rules for WHERE — you will learn the full details in Module 57, but these habits should start now.
Filter on indexed columns
Primary keys are always indexed. Other columns may or may not have indexes — that depends on how the database was designed. When you write WHERE on a column that has an index, the database does an index lookup (fast). When you write WHERE on an unindexed column, the database does a full table scan (slow on large tables). Ask your team which columns are indexed when joining a new project.
Avoid functions on the left side of WHERE
Applying a function to a column in WHERE prevents index usage. The database's index stores raw column values — not the result of functions applied to them.
Put the most selective filter first when possible
The most selective filter is the one that eliminates the most rows. In most databases the query optimiser figures this out automatically, but writing your most restrictive condition first makes the query easier for humans to read and reason about — the first condition narrows the data most aggressively.
🎯 Pro Tip
Every WHERE clause you write in a production environment should be accompanied by a mental question: "Does this column have an index?" If you are filtering on a column that millions of rows will be scanned against and there is no index, the query will be slow. The fix is adding an index — which you will learn to do in Module 46. For now, build the habit of asking the question.
// Part 11
What This Looks Like at Work
You are an analyst at Meesho, a Bangalore-based social commerce platform. It is 2 PM on a Thursday. The customer support team sends an urgent message — a customer is complaining that their order was marked delivered but they never received it. They need you to pull everything about this specific order immediately.
🎯 Pro Tip
This is real SQL work. Not complex analytics — just precise, targeted WHERE queries that narrow from "the whole database" to "the exact rows that explain the problem." The investigation above uses only what you have learned in Modules 01–06. This is how much SQL you need to be genuinely useful on day one.
// Part 12
Interview Prep — 5 Questions With Complete Answers
The WHERE clause filters rows in a SQL query — it specifies conditions that each row must satisfy to be included in the result. Only rows where the WHERE condition evaluates to TRUE are passed forward. Rows where the condition is FALSE or NULL are discarded.
WHERE is executed before SELECT in the query execution order. The full execution order is: FROM (identify the source table), WHERE (filter rows), GROUP BY (group filtered rows), HAVING (filter groups), SELECT (compute output columns from remaining rows), ORDER BY (sort), LIMIT (cut to n rows). This sequence has important consequences: you cannot reference a SELECT alias in a WHERE clause because WHERE runs before SELECT and the alias does not yet exist. You cannot use aggregate functions (COUNT, SUM, AVG) in WHERE because aggregation happens after WHERE — use HAVING instead for filtering aggregate results.
Performance-wise, WHERE is the most important clause for query speed. An effective WHERE condition on an indexed column can reduce the rows the database processes from 500 million to 100 — a 5-million-fold reduction in work before SELECT, ORDER BY, or any other clause runs.
NULL represents the absence of a value — it means "unknown" or "not applicable." In SQL, any comparison involving NULL returns NULL, not TRUE or FALSE. This is called three-valued logic. The expression column = NULL does not return TRUE for rows where column is NULL — it returns NULL. And the WHERE clause discards rows where the condition result is NULL (not just FALSE). So WHERE column = NULL will always return zero rows, regardless of what the data contains.
This behaviour is technically correct but surprises almost every SQL beginner. NULL means "we don't know the value" — and "we don't know" = "we don't know" does not logically produce a definitive TRUE answer, so SQL returns NULL instead.
To check for NULL, SQL provides the IS NULL and IS NOT NULL operators. These are specifically designed to handle the three-valued logic of NULL: IS NULL returns TRUE when the value is NULL, and IS NOT NULL returns TRUE when the value is anything other than NULL. WHERE delivery_date IS NULL finds all rows where delivery_date has not been set. WHERE delivery_date IS NOT NULL finds all rows where it has been set to a real date. These are the only correct ways to filter NULL values.
WHERE and HAVING both filter data, but they operate at different stages of query execution and on different things. WHERE filters individual rows before any grouping or aggregation. It operates on raw column values and runs early in the execution pipeline — immediately after FROM. HAVING filters groups after GROUP BY has aggregated rows, and it operates on aggregate values like COUNT(), SUM(), AVG(). HAVING runs after SELECT in the execution order.
A concrete example: to find the total revenue per city, but only for cities with more than ₹5,000 in delivered orders, you need both. WHERE order_status = 'Delivered' filters individual rows before the city grouping. HAVING SUM(total_amount) > 5000 filters the city groups after their revenue has been computed. You cannot use WHERE SUM(total_amount) > 5000 because SUM() does not exist yet when WHERE runs.
Performance difference: WHERE filters early and reduces the number of rows that GROUP BY must process. HAVING filters late after all aggregation is done. For performance, always move filtering to WHERE when possible and use HAVING only for conditions that genuinely require aggregate values. If you can express a filter in WHERE instead of HAVING, always do — it reduces the work done by GROUP BY.
A full table scan is when the database reads every page of a table from start to finish to find rows that match a WHERE condition. On a small table this is fine. On a table with 500 million rows stored across thousands of disk pages, a full table scan can take minutes or longer — and blocks other queries from running efficiently.
Whether WHERE causes a full table scan depends on whether the filtered column has an index. If the column has a B-tree index, the database uses the index to jump directly to the matching rows without reading the whole table — an index lookup. This turns a minutes-long full scan into a sub-millisecond operation. If the column has no index, the database has no choice but to read every row.
WHERE conditions that prevent index usage include: applying a function to the column (WHERE UPPER(city) = 'BANGALORE' cannot use an index on city), using OR conditions across different columns without a composite index, and using leading wildcards in LIKE (WHERE email LIKE '%gmail.com' cannot use an index because the prefix is unknown). The solution for function-wrapped columns is to rewrite the condition: WHERE city = 'Bangalore' (case-sensitive, uses index) or add a functional index on UPPER(city). Understanding when WHERE triggers a full scan versus an index lookup is the foundation of query optimisation.
String comparison in WHERE checks whether a column value matches a specified string. In PostgreSQL, SQLite, and DuckDB (used in the Chaduvuko playground), string comparisons are case-sensitive by default — the exact characters must match including capitalisation. 'Bangalore' does not equal 'bangalore' or 'BANGALORE'. In MySQL, string comparisons are case-insensitive by default because MySQL's default collation (utf8mb4_general_ci, where ci stands for case-insensitive) treats uppercase and lowercase as equivalent.
A real example of a case-sensitivity issue: an analyst queries WHERE city = 'bangalore' on a PostgreSQL database where cities are stored as 'Bangalore' (capital B). The query returns zero rows — not because there are no customers in Bangalore, but because 'bangalore' and 'Bangalore' are treated as different strings. The analyst might incorrectly conclude the data is missing.
Solutions for case-insensitive matching: use LOWER(city) = 'bangalore' to convert both sides to lowercase before comparing — this works in all databases but prevents index usage on city. In PostgreSQL, use the ILIKE operator (case-insensitive LIKE) as an alternative: WHERE city ILIKE 'bangalore'. For production systems where case-insensitive search is frequent, create a functional index on LOWER(city) so that WHERE LOWER(city) = 'bangalore' can use the index. The safest long-term practice is to standardise data capitalisation on insert — store all cities as 'Bangalore' consistently, then case-sensitive comparison is always correct and fast.
// Part 13
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓WHERE filters rows by evaluating a condition for each row. Only rows where the condition evaluates to TRUE are included. FALSE and NULL both cause the row to be discarded.
- ✓WHERE executes before SELECT in the logical execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. This is why SELECT aliases cannot be used in WHERE.
- ✓The six comparison operators: = (equal), <> or != (not equal), > (greater than), < (less than), >= (greater than or equal), <= (less than or equal). All work on numbers, text, and dates.
- ✓Numbers in WHERE have no quotes: WHERE salary > 50000. Text values always use single quotes: WHERE city = 'Bangalore'. Dates use single quotes in YYYY-MM-DD format: WHERE order_date >= '2024-01-01'.
- ✓String comparison is case-sensitive in PostgreSQL and DuckDB. WHERE city = 'bangalore' finds nothing if values are stored as 'Bangalore'. Use LOWER() or ILIKE for case-insensitive matching.
- ✓You cannot use = NULL to check for NULL values — any comparison involving NULL returns NULL, not TRUE. Always use IS NULL and IS NOT NULL instead.
- ✓WHERE filters rows. HAVING filters groups after GROUP BY. You cannot use aggregate functions (COUNT, SUM, AVG) in WHERE — use HAVING for that.
- ✓WHERE on an indexed column triggers a fast index lookup. WHERE on an unindexed column triggers a slow full table scan. Avoid applying functions to the column side of WHERE conditions — it prevents index usage.
- ✓The debug workflow for unexpected zero results: run SELECT * FROM table LIMIT 10 to see actual stored values, check for case and whitespace mismatches, verify you are using IS NULL not = NULL.
- ✓Before any UPDATE or DELETE, always run the equivalent SELECT with the same WHERE to confirm which rows will be affected. Never run a destructive operation without first seeing the result of the equivalent SELECT.
What comes next
In Module 07, you combine multiple WHERE conditions using AND, OR, and NOT — turning simple single-condition filters into precise multi-condition queries that can answer complex business questions in a single statement.
Module 07 → Multiple Conditions — AND, OR, NOTDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.