SQL — Module 59Advanced

Query Best Practices

Write clean, fast, readable SQL your team will thank you for — SARGability, avoiding SELECT *, NULL handling, and formatting conventions

14–18 min April 2026

Section 12 · Performance & Optimization

Performance & Optimization · 3 modulesModule 59

EXPLAIN Index Query

// Part 01

Why Query Style Matters

Bad SQL is not just ugly — it is slow, hard to debug, and breaks in unexpected ways when the dataset grows. Most of the worst-performing queries in production share a small set of anti-patterns: functions applied to indexed columns, vague column selection, missing aliases, and ambiguous NULLs. Learning to recognize and avoid these patterns will make your queries 10x faster and your code 10x easier to review.

This module covers the patterns senior engineers check for in every SQL code review. They are not stylistic preferences — most of them have direct, measurable performance implications.

Loading FreshCart DB…
-- The FreshCart dataset we will optimize queries against
SELECT
  'customers'   AS tbl, COUNT(*) AS rows FROM customers
UNION ALL SELECT 'orders',      COUNT(*) FROM orders
UNION ALL SELECT 'order_items', COUNT(*) FROM order_items
UNION ALL SELECT 'products',    COUNT(*) FROM products;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 02

Never Use SELECT *

SELECT * seems convenient but causes real problems in production: it reads every column from disk even if you only need two, breaks views and application code when table schemas change (new column added = different column order), and makes queries self-documenting only if you know the table schema by heart.

Avoid

-- Reads all columns even if only 2 are used
SELECT *
FROM orders
WHERE order_status = 'Delivered';

-- application code then does:
-- order.order_id, order.customer_id
-- all other columns: wasted I/O

Prefer

-- Only read what you need
SELECT
  order_id,
  customer_id,
  total_amount,
  delivery_date
FROM orders
WHERE order_status = 'Delivered';

The only acceptable use of SELECT * is in quick exploratory queries directly in a SQL client — never in application code, views, or stored procedures. In CTEs, SELECT * from an inner CTE into a final SELECT is acceptable if you immediately alias what you need.

Loading FreshCart DB…
-- Show the difference: explicit columns vs implicit *
-- Explicit: exactly what we need for a delivery summary
SELECT
  o.order_id,
  o.order_date,
  o.total_amount,
  c.first_name || ' ' || c.last_name AS customer_name,
  c.city
FROM orders     AS o
JOIN customers  AS c ON o.customer_id = c.customer_id
WHERE o.order_status = 'Delivered'
ORDER BY o.order_date DESC
LIMIT 5;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 03

SARGability — Write Index-Friendly WHERE Clauses

SARGable stands for Search ARGument Able — a WHERE clause the database can use an index to satisfy. A non-SARGable condition forces a full table scan even when a perfect index exists. The single most common cause: applying a function to the indexed column.

The rule: never transform the indexed column

An index on a column stores the raw column values in sorted order. If your WHERE clause applies a function to the column, the database cannot use the index — the computed value is not in the index. Move transformations to the comparison value instead.

Avoid

-- Non-SARGable: function applied to indexed column
-- Full scan even with index on order_date
WHERE strftime('%Y', order_date) = '2024'

WHERE UPPER(city) = 'MUMBAI'

WHERE LENGTH(product_name) > 10

WHERE total_amount * 1.18 > 1000

Prefer

-- SARGable: transform the literal, not the column
-- Index on order_date can be used
WHERE order_date >= '2024-01-01'
  AND order_date <  '2025-01-01'

WHERE city = 'New York'   -- or store consistently-cased data

WHERE product_name > ''  -- different logic needed

WHERE total_amount > 847.46  -- pre-compute: 1000 / 1.18

Non-SARGable — LIKE with leading wildcard

-- Full scan: leading wildcard prevents index use
WHERE product_name LIKE '%milk%'
WHERE email LIKE '%@gmail.com'

SARGable — LIKE with trailing wildcard

-- Index usable: trailing wildcard only
WHERE product_name LIKE 'Amul%'
WHERE email LIKE 'rahul%'
-- For contains-search: use full-text indexes

Loading FreshCart DB…
-- SARGable date range (index-friendly) vs function-based (index-killing)
-- SARGable: compare column directly to range boundaries
SELECT COUNT(*) AS delivered_in_2024
FROM orders
WHERE order_date >= '2024-01-01'
  AND order_date <  '2025-01-01'
  AND order_status = 'Delivered';Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 04

Avoid Functions on Indexed Columns in JOINs

The same SARGability rule applies to JOIN conditions. If a JOIN applies a function to either side's join key, the index on that key cannot be used — the database must compute the function for every row and compare, resulting in a nested-loop full scan.

Avoid

-- Non-SARGable JOIN: function kills the FK index
SELECT o.order_id, c.first_name
FROM orders    AS o
JOIN customers AS c
  ON CAST(o.customer_id AS TEXT)
   = CAST(c.customer_id AS TEXT);
-- Computes CAST for every row pair — full scan

Prefer

-- SARGable JOIN: same types, direct comparison
SELECT o.order_id, c.first_name
FROM orders    AS o
JOIN customers AS c
  ON o.customer_id = c.customer_id;
-- Index on customer_id (FK) is used

Avoid

-- Implicit type coercion: date stored as TEXT
-- WHERE CAST(order_date AS DATE) = '2024-03-15'
-- or: WHERE date(order_date) = '2024-03-15'
-- Function on column = no index

Prefer

-- Store dates as DATE/TEXT in ISO format: 'YYYY-MM-DD'
-- Then direct comparison works and is SARGable:
WHERE order_date = '2024-03-15'

Pro Tip

Run EXPLAIN QUERY PLAN before your query to see whether the database is using an index. If you see "SCAN TABLE" instead of "SEARCH TABLE USING INDEX", your WHERE or JOIN condition is likely non-SARGable. Fix the condition and re-run EXPLAIN QUERY PLAN to confirm the index is now being used.

Loading FreshCart DB…
-- Check query plan: index usage vs full scan
EXPLAIN QUERY PLAN
SELECT o.order_id, c.first_name, o.total_amount
FROM orders    AS o
JOIN customers AS c ON o.customer_id = c.customer_id
WHERE o.order_status = 'Delivered'
  AND o.total_amount > 500;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 05

Always Alias Tables in Multi-Table Queries

Unaliased multi-table queries are hard to read and can be ambiguous when two tables share a column name. Always assign short, meaningful aliases to every table in a query with more than one table. Use the table's initials or a short mnemonic — not t1, t2, t3.

Avoid

-- Ambiguous: which table does customer_id belong to?
SELECT order_id, customer_id, first_name, total_amount
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
WHERE order_status = 'Delivered';
-- 'customer_id' is ambiguous in SELECT — whose?

Prefer

-- Clear: every column is table-qualified
SELECT
  o.order_id,
  o.customer_id,
  c.first_name,
  o.total_amount
FROM orders    AS o
JOIN customers AS c ON o.customer_id = c.customer_id
WHERE o.order_status = 'Delivered';

Avoid

-- t1/t2 aliases: meaningless, hard to review
SELECT t1.order_id, t2.product_name, t1.quantity
FROM order_items t1
JOIN products t2 ON t1.product_id = t2.product_id;

Prefer

-- Semantic aliases: self-documenting
SELECT
  oi.order_id,
  p.product_name,
  oi.quantity,
  oi.line_total
FROM order_items AS oi
JOIN products    AS p ON oi.product_id = p.product_id;

// Part 06

Use EXISTS Instead of COUNT(*) for Existence Checks

A common pattern: checking whether any row matches a condition, then branching on the result. The slow way is COUNT(*) > 0 — it scans all matching rows and counts them before you can compare. The correct way is EXISTS — it stops as soon as the first matching row is found.

Avoid

-- Scans all matching rows to get the count
SELECT customer_id
FROM customers
WHERE (
  SELECT COUNT(*)
  FROM orders
  WHERE orders.customer_id = customers.customer_id
    AND order_status = 'Delivered'
) > 0;

Prefer

-- Stops at the first matching row — much faster
SELECT customer_id
FROM customers AS c
WHERE EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.customer_id = c.customer_id
    AND o.order_status = 'Delivered'
);

Slow — scans all to count

-- Checking if a product has any orders
SELECT product_id
FROM products
WHERE (
  SELECT COUNT(*) FROM order_items
  WHERE order_items.product_id = products.product_id
) > 0;

Fast — stops at first match

-- EXISTS is always better for "does any row match?"
SELECT product_id
FROM products AS p
WHERE EXISTS (
  SELECT 1 FROM order_items AS oi
  WHERE oi.product_id = p.product_id
);

Loading FreshCart DB…
-- Find customers who have at least one delivered order
-- Using EXISTS (stops at first match per customer)
SELECT
  c.customer_id,
  c.first_name,
  c.loyalty_tier
FROM customers AS c
WHERE EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.customer_id = c.customer_id
    AND o.order_status = 'Delivered'
)
ORDER BY c.loyalty_tier, c.first_name;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 07

NULL Handling — Three Rules You Must Know

NULL is not a value — it is the absence of a value. It obeys different rules than regular values. Getting NULL handling wrong produces silent incorrect results: rows are silently excluded from aggregations, comparisons return the wrong result, and joins drop rows. Three rules cover 90% of NULL bugs.

Rule 1: NULL comparisons always return NULL (not TRUE or FALSE)

Avoid

-- Both of these return 0 rows — not the rows with NULL delivery_date
SELECT * FROM orders WHERE delivery_date = NULL;
SELECT * FROM orders WHERE delivery_date != NULL;
-- = NULL and != NULL are always NULL (falsy), never TRUE

Prefer

-- IS NULL and IS NOT NULL are the correct tests
SELECT * FROM orders WHERE delivery_date IS NULL;
SELECT * FROM orders WHERE delivery_date IS NOT NULL;

Rule 2: Aggregates silently ignore NULL

Avoid

-- AVG ignores NULL rows — result looks like average of all,
-- but it is only the average of non-NULL rows
SELECT AVG(delivery_date) FROM orders;
-- If 20% of rows have NULL delivery_date, the average
-- is computed over only 80% of rows — no warning given

Prefer

-- Count NULLs explicitly to know what AVG is actually over
SELECT
  COUNT(*)                 AS total_orders,
  COUNT(delivery_date)     AS orders_with_delivery,
  COUNT(*) - COUNT(delivery_date) AS orders_missing_delivery
FROM orders;

Rule 3: Use COALESCE or IFNULL to provide a default

Avoid

-- NULL propagates: any arithmetic with NULL = NULL
SELECT
  first_name,
  discount_pct * total_amount AS discount_amount
-- If discount_pct is NULL → discount_amount is NULL
-- Report shows NULL instead of 0

Prefer

-- Replace NULL with a sensible default before arithmetic
SELECT
  first_name,
  COALESCE(discount_pct, 0) * total_amount AS discount_amount
-- COALESCE returns first non-NULL argument
-- NULL discount_pct treated as 0 → correct ₹0 discount

Loading FreshCart DB…
-- NULL audit on the FreshCart orders table
SELECT
  COUNT(*)                                        AS total_orders,
  COUNT(delivery_date)                            AS with_delivery_date,
  COUNT(*) - COUNT(delivery_date)                 AS missing_delivery_date,
  ROUND(100.0 * (COUNT(*) - COUNT(delivery_date))
        / COUNT(*), 1)                            AS pct_missing
FROM orders;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 08

Formatting and Readability Conventions

SQL has no official style guide, but teams that adopt consistent conventions find code review, debugging, and onboarding dramatically easier. These are the conventions used by most professional SQL teams:

SQL formatting conventions

-- KEYWORDS: uppercase
-- table/column names: lowercase_snake_case
-- aliases: always use AS keyword (never implicit)

-- Structure: one clause per line, aligned
SELECT
  o.order_id,
  o.order_date,
  o.total_amount,
  c.first_name || ' ' || c.last_name  AS customer_name,
  s.store_name,
  COUNT(oi.product_id)                AS item_count
FROM   orders     AS o
JOIN   customers  AS c  ON c.customer_id  = o.customer_id
JOIN   stores     AS s  ON s.store_id     = o.store_id
JOIN   order_items AS oi ON oi.order_id   = o.order_id
WHERE  o.order_status = 'Delivered'
  AND  o.order_date >= '2024-01-01'
GROUP BY
  o.order_id,
  o.order_date,
  o.total_amount,
  customer_name,
  s.store_name
HAVING COUNT(oi.product_id) >= 3
ORDER BY o.order_date DESC
LIMIT 20;

-- CTEs: name them as what they represent (not cte1, subq2)
WITH
delivered_orders AS (
  SELECT order_id, customer_id, total_amount
  FROM   orders
  WHERE  order_status = 'Delivered'
),
high_value_customers AS (
  SELECT customer_id, SUM(total_amount) AS lifetime_value
  FROM   delivered_orders
  GROUP  BY customer_id
  HAVING SUM(total_amount) > 5000
)
SELECT c.first_name, h.lifetime_value
FROM   high_value_customers AS h
JOIN   customers             AS c ON c.customer_id = h.customer_id
ORDER  BY h.lifetime_value DESC;

RULE

Uppercase keywords

SELECT, FROM, WHERE, JOIN, GROUP BY, HAVING, ORDER BY, LIMIT

RULE

One SELECT column per line

Easier to comment out, reorder, or add a column during debugging

RULE

Always use AS for aliases

SELECT total_amount AS revenue — never SELECT total_amount revenue

RULE

Align JOIN ON conditions

Makes foreign key relationships visually obvious

RULE

Name CTEs semantically

delivered_orders not cte1 — the name documents the purpose

RULE

Trailing commas vs leading commas

Team choice — leading commas make it easier to comment out the last column

// Part 09

Common Anti-Patterns to Eliminate

Anti-pattern 1: Implicit GROUP BY

Avoid

-- Selecting non-aggregated columns not in GROUP BY
-- (Some databases allow this; result is non-deterministic)
SELECT customer_id, first_name, SUM(total_amount)
FROM orders
GROUP BY customer_id;
-- first_name is not in GROUP BY — which first_name
-- does the database pick? Undefined.

Prefer

SELECT customer_id, first_name, SUM(total_amount) AS revenue
FROM orders AS o
JOIN customers AS c USING (customer_id)
GROUP BY o.customer_id, c.first_name;
-- Every selected column is either aggregated or in GROUP BY

Anti-pattern 2: Correlated subquery in SELECT (N+1)

Avoid

-- Runs a subquery for EVERY row in orders — O(n) queries
SELECT
  order_id,
  (SELECT first_name FROM customers
   WHERE customer_id = o.customer_id) AS name
FROM orders AS o;

Prefer

-- One JOIN — single pass
SELECT o.order_id, c.first_name AS name
FROM orders    AS o
JOIN customers AS c ON c.customer_id = o.customer_id;

Anti-pattern 3: DISTINCT as a bug-hider

Avoid

-- DISTINCT hides the JOIN that is producing duplicates
SELECT DISTINCT c.customer_id, c.first_name
FROM customers AS c
JOIN orders    AS o ON o.customer_id = c.customer_id;
-- Why are there duplicates? DISTINCT masks the root cause

Prefer

-- Use EXISTS to find customers with at least one order
SELECT c.customer_id, c.first_name
FROM customers AS c
WHERE EXISTS (
  SELECT 1 FROM orders AS o
  WHERE o.customer_id = c.customer_id
);

Anti-pattern 4: OR across different columns in WHERE

Avoid

-- OR prevents index use when columns are different
WHERE city = 'New York' OR store_id = 'ST001'
-- Database cannot satisfy both sides with one index scan

Prefer

-- UNION ALL splits into two index-friendly queries
SELECT * FROM customers WHERE city = 'New York'
UNION ALL
SELECT * FROM customers WHERE store_id = 'ST001'
  AND city != 'New York';  -- avoid duplicates

Loading FreshCart DB…
-- Anti-pattern 3 fixed: EXISTS vs DISTINCT
-- Find customers who placed at least one order in 2024

-- DISTINCT (hides the why):
-- SELECT DISTINCT c.customer_id, c.first_name
-- FROM customers c JOIN orders o ON o.customer_id = c.customer_id
-- WHERE o.order_date >= '2024-01-01'

-- EXISTS (correct intent):
SELECT
  c.customer_id,
  c.first_name,
  c.loyalty_tier
FROM customers AS c
WHERE EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.customer_id = c.customer_id
    AND o.order_date >= '2024-01-01'
)
ORDER BY c.loyalty_tier DESC, c.first_name;Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 10

Interview Prep — 5 Questions With Complete Answers

Q: What is SARGability and why does it matter?

SARGable (Search ARGument Able) describes a WHERE clause condition that allows the database engine to use an index to satisfy it. A condition is SARGable when the indexed column appears alone on one side of the comparison, without any function or transformation applied to it. The database can walk the sorted index to find matching rows rather than scanning the full table.

A condition becomes non-SARGable when a function is applied to the indexed column: WHERE YEAR(order_date) = 2024, WHERE UPPER(city) = 'MUMBAI', or WHERE total_amount * 1.18 > 1000. The database cannot use the index on order_date for the first example because the index stores raw dates, not YEAR-extracted integers. The engine must compute YEAR() for every row and compare — a full table scan.

Fix: transform the comparison value instead of the column. Replace WHERE YEAR(order_date) = 2024 with WHERE order_date > '2024-01-01' AND order_date < '2025-01-01'. The column appears unmodified, the index can be used. SARGability matters because on tables with millions of rows, the difference between an index seek and a full scan is measured in seconds vs. milliseconds. Always check EXPLAIN QUERY PLAN to verify index usage.

Q: Why should you avoid SELECT * in production queries?

SELECT * has three production problems. First, it reads every column from disk even if your application only uses two — unnecessary I/O, more data transferred over the network, more memory used. On wide tables (100+ columns, JSONB blobs, large TEXT fields) this overhead is significant. Second, SELECT * is fragile: when a column is added to the table, SELECT * returns it even if the application does not expect it, breaking deserialization code, ORMs that map columns to object fields, or CSV exports with fixed column order. Third, it makes queries unreadable — a reader cannot determine what data the query produces without knowing the table schema.

The only acceptable context for SELECT * is interactive exploration in a SQL client — never in views, stored procedures, application code, or automated reports. In CTEs, SELECT * from a named CTE into a final outer SELECT is acceptable if the final SELECT explicitly names what it needs. Rule of thumb: if the query will run more than once, name the columns.

Q: What is the difference between WHERE and HAVING?

WHERE filters rows before grouping. HAVING filters groups after grouping. They operate at different stages of query execution. WHERE runs first — it restricts which rows are included in the aggregation. HAVING runs after GROUP BY — it restricts which groups appear in the final result based on aggregated values.

WHERE can only reference columns that exist in the raw table (individual row values). It cannot reference aggregate functions (SUM, COUNT, AVG) because the aggregation has not happened yet. HAVING can reference both the GROUP BY columns and aggregate functions applied to the group.

Performance implication: filter as early as possible using WHERE. If you can eliminate rows before grouping, the aggregation processes fewer rows and runs faster. HAVING should only contain conditions that genuinely require the aggregated value — anything else belongs in WHERE. Example: WHERE order_status = 'Delivered' eliminates non-delivered rows before counting, which is far more efficient than HAVING order_status = 'Delivered' which would group all rows first, then discard non-delivered groups.

Q: When should you use a CTE vs a subquery vs a temporary table?

CTEs (WITH clauses) are best for: breaking a complex query into readable named steps, referencing the same intermediate result multiple times within one query, and recursive queries. Most databases optimize CTEs inline (treating them as subqueries), so there is typically no performance difference between a CTE and an equivalent subquery — choose CTEs for readability. Use meaningful CTE names that describe what the set represents.

Subqueries are best for: simple one-off filtering (EXISTS / IN subqueries), scalar subqueries that return a single value, and cases where the subquery is used in only one place and is simple enough to read inline. Correlated subqueries run once per outer row — avoid them in SELECT lists when the same result can be achieved with a JOIN (the N+1 problem).

Temporary tables are best for: large intermediate results that need to be queried multiple times (CTEs re-materialize each time in some databases), results that need indexes for subsequent queries (you can CREATE INDEX on a temp table), and multi-step data transformation pipelines where each step produces a significant intermediate dataset. Temp tables have setup overhead; for small datasets, a CTE is faster and requires no cleanup.

Q: What is the N+1 query problem and how do you fix it?

The N+1 problem occurs when application code executes 1 query to get N rows, then executes 1 additional query per row to fetch related data — N+1 queries total. Example: fetch 100 orders (1 query), then for each order, fetch the customer name (100 queries) = 101 queries. At 100 orders this is annoying; at 10,000 orders it is catastrophic.

In raw SQL, the N+1 pattern appears as a correlated subquery in the SELECT list: SELECT order_id, (SELECT first_name FROM customers WHERE customer_id = o.customer_id) FROM orders o. This executes one subquery per order row.

Fix: JOIN the related table in the original query. SELECT o.order_id, c.first_name FROM orders o JOIN customers c ON c.customer_id = o.customer_id. The database executes one query with one pass over both tables — O(1) queries regardless of N. In ORM frameworks (Django, Rails, SQLAlchemy), the equivalent fix is eager loading: select_related() in Django, includes() in Rails, joinedload() in SQLAlchemy. The fix is always the same: fetch all needed data in one query, not one query per row.

Try It Yourself

Rewrite the following poorly-written query using all best practices from this module: SELECT *, (SELECT COUNT(*) FROM order_items WHERE order_items.order_id = orders.order_id) as item_count FROM orders WHERE strftime('%Y', order_date) = '2024' AND order_status = 'Delivered'. Fix: (1) remove SELECT *, specify needed columns; (2) fix the non-SARGable date filter; (3) replace the correlated COUNT(*) subquery with a JOIN + GROUP BY; (4) use proper aliases and formatting.

🎯 Key Takeaways

✓Never use SELECT * in production queries — name every column. Explicit columns prevent silent breaks when schema changes and eliminate unnecessary I/O.
✓SARGable WHERE clauses allow index use. Never apply functions to indexed columns: WHERE YEAR(col) = 2024 kills the index; WHERE col >= '2024-01-01' uses it.
✓Trailing LIKE wildcards ('Amul%') are SARGable and use indexes. Leading wildcards ('%milk%') force full scans. Use full-text search for contains-matching.
✓Use EXISTS instead of COUNT(*) > 0 for existence checks. EXISTS stops at the first match; COUNT(*) scans all matching rows.
✓NULL is not a value — it is unknown. Use IS NULL / IS NOT NULL (not = NULL). Aggregates silently skip NULLs. Use COALESCE to provide defaults.
✓Filter with WHERE before grouping; filter groups with HAVING after grouping. Move conditions to WHERE whenever they do not require aggregate values.
✓Correlated subqueries in SELECT are N+1 queries — one per outer row. Replace with a JOIN. The database does one pass regardless of N.
✓Avoid DISTINCT as a bug-hider for duplicate-producing JOINs. Investigate why duplicates appear; usually EXISTS or a better JOIN is the fix.
✓Format SQL with uppercase keywords, one column per line, semantic table aliases, and named CTEs that describe what the set represents.

What comes next

In Module 60, you apply everything to real data analysis — cohort analysis, RFM segmentation, and revenue breakdowns using the full FreshCart dataset.

Module 60 → SQL for Data Analysis

Discussion

Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.

Continue with GitHub