INNER JOIN
Master the most-used JOIN — two-table joins, three and four-table chains, joining on expressions, aggregate queries with joins, and every production pattern you will write daily
// Part 01
INNER JOIN — The Intersection
INNER JOIN is the default JOIN type and the one you will write most often. It returns only rows where the join condition is satisfied in both tables simultaneously. Any row in the left table with no matching row in the right is excluded. Any row in the right table with no matching row in the left is excluded. Only the intersection survives.
This behaviour is exactly what you want for the vast majority of analytics queries: orders with their customers, order items with their products, employees with their stores. In a well-designed schema with foreign key constraints, every FK value is guaranteed to exist in the parent table — so INNER JOIN and LEFT JOIN produce identical results for those relationships. The choice between them becomes meaningful when the relationship is optional or when you specifically need to find unmatched rows.
// Part 02
Two-Table INNER JOIN — Every Core Pattern
Orders → Customers
Orders → Stores
Order Items → Products
Employees → Stores
// Part 03
Three-Table INNER JOIN — Chaining Relationships
Every additional JOIN adds one more table to the combined result. The database processes JOINs left to right — each JOIN combines the current result with the next table. The query optimiser may reorder joins for performance, but logically they chain sequentially.
Orders + Customers + Stores
Order Items + Orders + Products
Employees + Stores + (grouped summary)
// Part 04
Four-Table INNER JOIN — The FreshCart Master Query
The full FreshCart order detail query joins all four core tables: orders, customers, stores, order_items, and products. This five-table join (treating order_items as the bridge between orders and products) is the foundation for almost every sales and customer report.
// Part 05
INNER JOIN on Expressions — Non-Equality Joins
The ON clause is not limited to equality. You can join on any boolean expression — ranges, inequalities, calculated values. Non-equality joins are less common than equality joins but essential for certain analytical patterns.
Range join — joining on a value falling within a range
Joining on a calculated condition
// Part 06
INNER JOIN with Aggregates — The Analytics Core
The combination of INNER JOIN and GROUP BY is the foundation of every analytics report. Join to bring data together, group to define dimensions, aggregate to compute metrics.
Revenue by store
Top products by revenue
Customer lifetime value
// Part 07
INNER JOIN with Subqueries — Joining to Derived Tables
The right side of a JOIN does not have to be a base table. It can be a subquery — a derived table computed on the fly. This pattern is useful when you need to join pre-aggregated data or filter a table before joining it.
Join to a pre-aggregated subquery
// Part 08
INNER JOIN with CTEs — Readable Multi-Step Queries
CTEs (Common Table Expressions) define named intermediate results that can be joined like regular tables. For complex multi-step analytics, CTEs make the logic readable and maintainable — each step builds on the previous.
// Part 09
The Fan-Out Problem — When JOINs Inflate Aggregates
The most common INNER JOIN mistake in analytics: joining a one-to-many relationship before aggregating inflates aggregate values. If one order has three items, joining orders to order_items gives three rows per order. Summing orders.total_amount on this result adds the total_amount three times — once per item row.
Demonstrating the fan-out bug
The rules to prevent fan-out
// Part 10
JOIN Order and Performance
SQL is declarative — you describe what you want, not how to get it. The query optimiser decides the physical join order based on table statistics, indexes, and cardinality estimates. However, understanding how the optimiser thinks helps you write queries that perform well.
How the optimiser chooses join order
The optimiser aims to minimise the number of rows processed at each stage. It typically starts with the most selective filter (the table + WHERE condition that produces the fewest rows) and progressively joins larger tables. On a well-tuned database with up-to-date statistics, the optimiser usually picks the right order automatically.
Index usage in JOINs
The join column in the "inner" table (the one being looked up) should be indexed. When the database evaluates o.customer_id = c.customer_id, it scans orders and looks up each customer_id in customers. If customers has an index on customer_id (it does — it is the primary key), each lookup is O(log n). Without an index, each lookup is a full scan — O(n) per order row — making the join O(n²).
Writing joins for the optimiser
// Part 11
What This Looks Like at Work
You are a senior analyst at Shopify, India's social commerce platform. The growth team needs a weekly seller performance report — for each seller, show total orders handled, revenue generated, unique products listed, average order value, and the top product category. This is a multi-table INNER JOIN with aggregation — the most common production analytics pattern.
🎯 Pro Tip
The incremental build pattern — base join → verify → add aggregation → verify → add enrichment — is the professional workflow for every complex JOIN query. Never write the full query in one shot and hope it is correct. Build it layer by layer, verify the row count and spot-check values at each stage, and add the next layer only when the current one is confirmed correct.
// Part 12
Interview Prep — 5 Questions With Complete Answers
INNER JOIN works by evaluating the ON condition for every candidate pair of rows from the left and right tables. For each row in the left table, the database uses the join column's index (if available) to find matching rows in the right table — rows where the ON condition evaluates to TRUE. When a match is found, the columns from both rows are concatenated into a single output row. Rows from either table that have no matching partner in the other table are excluded from the result.
The database does not literally evaluate every combination of rows (which would be O(n²)). Instead it uses join algorithms: the nested loop join (iterate left rows, index-lookup right), hash join (build a hash table from the smaller table, probe it with each left row), or merge join (sort both tables on the join key, merge-scan). The query optimiser chooses the algorithm based on table sizes, available indexes, and memory.
The result of INNER JOIN is always a subset of the Cartesian product of the two tables — only rows where the ON condition is satisfied. The output has all columns from both tables (as specified in SELECT) for each matched pair. For foreign key relationships in a well-designed schema, INNER JOIN and LEFT JOIN produce the same result — because the FK constraint guarantees every FK value has a matching PK. The difference emerges only for optional relationships or when checking for unmatched rows.
This is the fan-out bug. The orders-to-order_items relationship is one-to-many: one order can have many items. When you JOIN orders to order_items, each order row is duplicated once per item — an order with 3 items appears 3 times in the joined result. When you then SUM(orders.total_amount), you are adding the same total_amount three times for that order.
Example: order 1001 has total_amount = ₹500 and 3 items. After the JOIN, the result has 3 rows all with total_amount = ₹500. SUM(total_amount) for this order contributes ₹1,500 instead of ₹500 — triple-counted. The final sum across all orders is inflated by the average number of items per order.
The three fixes: (1) SUM the item-level column instead — SUM(order_items.line_total) is correct because each line_total is unique to each item row, not duplicated. (2) Use COUNT(DISTINCT order_id) instead of COUNT(*) to count orders, not item rows. (3) Pre-aggregate order_items in a subquery or CTE first (SELECT order_id, SUM(line_total) AS total FROM order_items GROUP BY order_id), then join that aggregated result to orders — one row per order in the joined result, no fan-out. The fan-out bug is especially dangerous because the query runs without errors and returns plausible-looking numbers. Always cross-check JOIN aggregates against simpler single-table queries.
For INNER JOIN specifically, a filter in ON and a filter in WHERE produce identical results. This is because INNER JOIN already excludes unmatched rows — it does not have the asymmetric NULL-preservation behaviour of LEFT JOIN. Whether the filter is applied during the join (ON) or after (WHERE), the same rows end up being excluded.
Example: JOIN customers AS c ON o.customer_id = c.customer_id AND c.city = 'Seattle' produces the same result as JOIN customers AS c ON o.customer_id = c.customer_id WHERE c.city = 'Seattle'. Both return only orders from Seattle customers.
However, there is a convention preference: conditions that define the join relationship (matching keys) belong in ON; conditions that filter the result belong in WHERE. This makes queries more readable and semantically clear. ON answers "how do the tables relate?" WHERE answers "what rows from the joined result do I want?" Mixing business filters into ON makes the join condition harder to scan. The performance implication is theoretically the same for INNER JOIN since the optimiser can push predicates into the join regardless — but readability favours keeping relationship conditions in ON and filter conditions in WHERE.
Joining more than two tables means chaining multiple JOIN clauses. Each JOIN adds one more table to the current result. The syntax is sequential: FROM a JOIN b ON a.key = b.key JOIN c ON b.key = c.key JOIN d ON c.key = d.key. Each JOIN's ON clause can reference columns from any table that has already been introduced — you cannot reference a table before it appears in the FROM/JOIN chain.
Logically, the order of JOINs matters for which columns are available in subsequent ON clauses. You cannot join order_items to products before joining orders to order_items — because order_items is not yet in scope when you try to join products to it. The practical rule: join tables in the order of the relationship chain, following the FK path through the schema.
Physically, the query optimiser reorders JOINs for performance — it does not respect the written order. It evaluates all possible join orderings (up to a threshold) and chooses the one with the lowest estimated cost based on table sizes, indexes, and statistics. For queries with 5+ tables, the optimiser uses heuristics rather than exhaustive search. You can influence the physical order with hints in some databases (e.g., LEADING in PostgreSQL), but this is rarely necessary unless the optimiser is making a provably bad choice. Write JOINs in the logical relationship order for readability; trust the optimiser for physical ordering.
Finding the top seller per store is a "top N per group" problem. The approach requires computing revenue per (store, product) combination, then selecting the product with the highest revenue within each store group.
Step 1 — compute revenue per store per product: SELECT o.store_id, p.product_id, p.product_name, SUM(oi.line_total) AS revenue FROM orders AS o JOIN order_items AS oi ON o.order_id = oi.order_id JOIN products AS p ON oi.product_id = p.product_id WHERE o.order_status = 'Delivered' GROUP BY o.store_id, p.product_id, p.product_name. This gives one row per (store, product) pair with total revenue.
Step 2 — select the top product per store. The cleanest approach uses ROW_NUMBER() window function: wrap Step 1 in a CTE or subquery, add ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY revenue DESC) AS rn, then filter WHERE rn = 1 in the outer query. This assigns rank 1 to the highest-revenue product within each store and returns only those. An alternative without window functions: join the Step 1 result to a subquery that finds MAX(revenue) per store, matching on both store_id and revenue = max_revenue. The ROW_NUMBER approach is cleaner and handles ties more explicitly (ROW_NUMBER picks one; DENSE_RANK preserves ties). Window functions are covered in depth in Module 45.
// Part 13
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓INNER JOIN returns only rows where the ON condition matches in both tables. It is the default JOIN type — writing JOIN without a qualifier means INNER JOIN.
- ✓Chain multiple JOINs sequentially — each JOIN adds one more table. Join in the logical relationship order: parent before child, following the FK path through the schema.
- ✓The fan-out bug: joining a one-to-many relationship before aggregating inflates aggregate values. SUM(orders.total_amount) after joining to order_items triple-counts each order with three items.
- ✓Prevent fan-out: SUM the many-side column (order_items.line_total), use COUNT(DISTINCT pk) for the one-side count, or pre-aggregate the many side in a CTE before joining.
- ✓For INNER JOIN, filters in ON and WHERE produce identical results. By convention: join relationship conditions in ON, business filter conditions in WHERE.
- ✓The right side of JOIN can be a subquery or CTE — join pre-aggregated or filtered data without base table constraints.
- ✓Join columns on the looked-up table must be indexed for performance. Primary keys are always indexed. Add explicit indexes on FK columns for frequently-joined non-PK columns.
- ✓Incremental build pattern: write base JOIN → SELECT COUNT(*) + LIMIT 5 spot-check → add WHERE → add GROUP BY → add HAVING. Never write the full complex query in one shot.
- ✓Verify aggregates from JOIN queries against simpler single-table queries. The fan-out bug is silent — queries run without errors and return plausible but wrong numbers.
- ✓CTEs make complex multi-JOIN queries readable by naming each analytical step. Each CTE is defined once and can be joined like a regular table in subsequent CTEs or the final SELECT.
What comes next
In Module 32, you learn LEFT and RIGHT JOINs in full depth — every pattern for keeping unmatched rows, the IS NULL filter idiom, optional relationships, and when to choose LEFT over INNER.
Module 32 → LEFT and RIGHT JOINDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.