Python · SQL · Web Dev · Java · AI/ML tracks launching soon — your one platform for all of IT
Intermediate

INNER JOIN

Master the most-used JOIN — two-table joins, three and four-table chains, joining on expressions, aggregate queries with joins, and every production pattern you will write daily

14–18 min April 2026
Section 7 · Joins
Joins · 6 modulesModule 31

// Part 01

INNER JOIN — The Intersection

INNER JOIN is the default JOIN type and the one you will write most often. It returns only rows where the join condition is satisfied in both tables simultaneously. Any row in the left table with no matching row in the right is excluded. Any row in the right table with no matching row in the left is excluded. Only the intersection survives.

This behaviour is exactly what you want for the vast majority of analytics queries: orders with their customers, order items with their products, employees with their stores. In a well-designed schema with foreign key constraints, every FK value is guaranteed to exist in the parent table — so INNER JOIN and LEFT JOIN produce identical results for those relationships. The choice between them becomes meaningful when the relationship is optional or when you specifically need to find unmatched rows.

INNER JOIN — syntax reminder
-- Explicit keyword (clearest):
SELECT columns
FROM   left_table AS l
INNER JOIN right_table AS r ON l.key = r.key;

-- Shorthand (JOIN without qualifier = INNER JOIN):
SELECT columns
FROM   left_table AS l
JOIN   right_table AS r ON l.key = r.key;

-- Both are identical — INNER is the default when no type is specified

// Part 02

Two-Table INNER JOIN — Every Core Pattern

Orders → Customers

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Orders → Stores

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Order Items → Products

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Employees → Stores

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 03

Three-Table INNER JOIN — Chaining Relationships

Every additional JOIN adds one more table to the combined result. The database processes JOINs left to right — each JOIN combines the current result with the next table. The query optimiser may reorder joins for performance, but logically they chain sequentially.

Orders + Customers + Stores

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Order Items + Orders + Products

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Employees + Stores + (grouped summary)

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 04

Four-Table INNER JOIN — The FreshCart Master Query

The full FreshCart order detail query joins all four core tables: orders, customers, stores, order_items, and products. This five-table join (treating order_items as the bridge between orders and products) is the foundation for almost every sales and customer report.

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 05

INNER JOIN on Expressions — Non-Equality Joins

The ON clause is not limited to equality. You can join on any boolean expression — ranges, inequalities, calculated values. Non-equality joins are less common than equality joins but essential for certain analytical patterns.

Range join — joining on a value falling within a range

Range join — ON with BETWEEN
-- Classify orders into salary bands defined in a reference table
-- salary_bands(band_name, min_amount, max_amount)
SELECT
  o.order_id,
  o.total_amount,
  sb.band_name
FROM orders AS o
JOIN salary_bands AS sb
  ON o.total_amount BETWEEN sb.min_amount AND sb.max_amount;

-- This is a range join: no single equality condition
-- Each order matches the band whose range contains its total_amount

Joining on a calculated condition

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 06

INNER JOIN with Aggregates — The Analytics Core

The combination of INNER JOIN and GROUP BY is the foundation of every analytics report. Join to bring data together, group to define dimensions, aggregate to compute metrics.

Revenue by store

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Top products by revenue

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

Customer lifetime value

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 07

INNER JOIN with Subqueries — Joining to Derived Tables

The right side of a JOIN does not have to be a base table. It can be a subquery — a derived table computed on the fly. This pattern is useful when you need to join pre-aggregated data or filter a table before joining it.

Join to a pre-aggregated subquery

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 08

INNER JOIN with CTEs — Readable Multi-Step Queries

CTEs (Common Table Expressions) define named intermediate results that can be joined like regular tables. For complex multi-step analytics, CTEs make the logic readable and maintainable — each step builds on the previous.

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 09

The Fan-Out Problem — When JOINs Inflate Aggregates

The most common INNER JOIN mistake in analytics: joining a one-to-many relationship before aggregating inflates aggregate values. If one order has three items, joining orders to order_items gives three rows per order. Summing orders.total_amount on this result adds the total_amount three times — once per item row.

Demonstrating the fan-out bug

Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…

The rules to prevent fan-out

Fan-out prevention rules
-- Rule 1: Only aggregate columns from the many-side table
-- WRONG: SUM(orders.total_amount) after joining to order_items
-- RIGHT:  SUM(order_items.line_total) — aggregate the many-side column

-- Rule 2: Use COUNT(DISTINCT pk) not COUNT(*) on the one-side
-- WRONG: COUNT(*) counts one row per item, not one per order
-- RIGHT:  COUNT(DISTINCT o.order_id) counts distinct orders

-- Rule 3: Pre-aggregate the many side before joining
WITH item_totals AS (
  SELECT order_id, SUM(line_total) AS item_total
  FROM order_items GROUP BY order_id
)
SELECT o.order_id, o.total_amount, it.item_total
FROM orders AS o
JOIN item_totals AS it ON o.order_id = it.order_id;
-- Now one row per order — no fan-out

-- Rule 4: Verify by comparing totals to known values
-- Run the aggregate, then cross-check against a simpler query
⚠️ Important
The fan-out bug is silent — the query runs without errors and returns results that look plausible. The numbers are simply wrong. Always cross-check aggregate results from a JOIN query against a simpler single-table aggregate to detect inflation before trusting the output.

// Part 10

JOIN Order and Performance

SQL is declarative — you describe what you want, not how to get it. The query optimiser decides the physical join order based on table statistics, indexes, and cardinality estimates. However, understanding how the optimiser thinks helps you write queries that perform well.

How the optimiser chooses join order

The optimiser aims to minimise the number of rows processed at each stage. It typically starts with the most selective filter (the table + WHERE condition that produces the fewest rows) and progressively joins larger tables. On a well-tuned database with up-to-date statistics, the optimiser usually picks the right order automatically.

Index usage in JOINs

The join column in the "inner" table (the one being looked up) should be indexed. When the database evaluates o.customer_id = c.customer_id, it scans orders and looks up each customer_id in customers. If customers has an index on customer_id (it does — it is the primary key), each lookup is O(log n). Without an index, each lookup is a full scan — O(n) per order row — making the join O(n²).

Index usage in JOINs
-- Fast: joining on primary keys and indexed columns
JOIN customers AS c ON o.customer_id = c.customer_id
-- customer_id is the PK of customers — always indexed

JOIN products AS p ON oi.product_id = p.product_id
-- product_id is the PK of products — always indexed

-- Potentially slow: joining on non-indexed columns
JOIN orders AS o ON o.payment_method = p.category
-- payment_method in orders may not be indexed
-- Consider adding: CREATE INDEX idx_orders_payment ON orders(payment_method)

-- Use EXPLAIN ANALYZE to see the actual join plan and identify slow joins:
EXPLAIN ANALYZE
SELECT * FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
WHERE o.order_status = 'Delivered';

Writing joins for the optimiser

Performance-friendly JOIN patterns
-- Filter early: put selective WHERE conditions to reduce rows before joining
SELECT ...
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2024-01-01'   -- reduces orders table first
  AND c.loyalty_tier = 'Platinum';   -- further reduces matched customers

-- Avoid functions on join columns — prevents index use
-- WRONG:
JOIN customers AS c ON LOWER(o.customer_email) = LOWER(c.email)
-- The LOWER() call prevents the index on email from being used

-- RIGHT: normalise data at write time, join on raw values
JOIN customers AS c ON o.customer_id = c.customer_id

-- Avoid OR conditions in ON — harder to optimise
-- Use UNION ALL instead of OR in join conditions when possible

// Part 11

What This Looks Like at Work

You are a senior analyst at Shopify, India's social commerce platform. The growth team needs a weekly seller performance report — for each seller, show total orders handled, revenue generated, unique products listed, average order value, and the top product category. This is a multi-table INNER JOIN with aggregation — the most common production analytics pattern.

9:00 AM
Requirements received
Weekly seller performance: seller_id, seller_name, total orders, total revenue, unique products, avg order value, top category. Adapted for FreshCart: store performance with product category breakdown.
9:20 AM
Step 1 — build the base join
Start with the master join covering all required tables.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
9:35 AM
Step 2 — add aggregation
Base join is correct. Now add GROUP BY and aggregates.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
9:55 AM
Step 3 — add top category per store
Use a CTE to find each store's top category, then join to the main report.
Loading FreshCart DB…
Ctrl + Enter to run
Loading FreshCart database in your browser…
10:20 AM
Report delivered — one hour from brief to delivery
Three incremental steps: base join verification → aggregation → CTE enrichment. Each step was verified before the next was built. The final report has every metric the growth team requested, in one query.

🎯 Pro Tip

The incremental build pattern — base join → verify → add aggregation → verify → add enrichment — is the professional workflow for every complex JOIN query. Never write the full query in one shot and hope it is correct. Build it layer by layer, verify the row count and spot-check values at each stage, and add the next layer only when the current one is confirmed correct.

// Part 12

Interview Prep — 5 Questions With Complete Answers

Q: How does INNER JOIN work internally and what does it return?

INNER JOIN works by evaluating the ON condition for every candidate pair of rows from the left and right tables. For each row in the left table, the database uses the join column's index (if available) to find matching rows in the right table — rows where the ON condition evaluates to TRUE. When a match is found, the columns from both rows are concatenated into a single output row. Rows from either table that have no matching partner in the other table are excluded from the result.

The database does not literally evaluate every combination of rows (which would be O(n²)). Instead it uses join algorithms: the nested loop join (iterate left rows, index-lookup right), hash join (build a hash table from the smaller table, probe it with each left row), or merge join (sort both tables on the join key, merge-scan). The query optimiser chooses the algorithm based on table sizes, available indexes, and memory.

The result of INNER JOIN is always a subset of the Cartesian product of the two tables — only rows where the ON condition is satisfied. The output has all columns from both tables (as specified in SELECT) for each matched pair. For foreign key relationships in a well-designed schema, INNER JOIN and LEFT JOIN produce the same result — because the FK constraint guarantees every FK value has a matching PK. The difference emerges only for optional relationships or when checking for unmatched rows.

Q: When you join orders to order_items, why does SUM(orders.total_amount) give wrong results?

This is the fan-out bug. The orders-to-order_items relationship is one-to-many: one order can have many items. When you JOIN orders to order_items, each order row is duplicated once per item — an order with 3 items appears 3 times in the joined result. When you then SUM(orders.total_amount), you are adding the same total_amount three times for that order.

Example: order 1001 has total_amount = ₹500 and 3 items. After the JOIN, the result has 3 rows all with total_amount = ₹500. SUM(total_amount) for this order contributes ₹1,500 instead of ₹500 — triple-counted. The final sum across all orders is inflated by the average number of items per order.

The three fixes: (1) SUM the item-level column instead — SUM(order_items.line_total) is correct because each line_total is unique to each item row, not duplicated. (2) Use COUNT(DISTINCT order_id) instead of COUNT(*) to count orders, not item rows. (3) Pre-aggregate order_items in a subquery or CTE first (SELECT order_id, SUM(line_total) AS total FROM order_items GROUP BY order_id), then join that aggregated result to orders — one row per order in the joined result, no fan-out. The fan-out bug is especially dangerous because the query runs without errors and returns plausible-looking numbers. Always cross-check JOIN aggregates against simpler single-table queries.

Q: What is the difference between writing a filter in ON vs WHERE for an INNER JOIN?

For INNER JOIN specifically, a filter in ON and a filter in WHERE produce identical results. This is because INNER JOIN already excludes unmatched rows — it does not have the asymmetric NULL-preservation behaviour of LEFT JOIN. Whether the filter is applied during the join (ON) or after (WHERE), the same rows end up being excluded.

Example: JOIN customers AS c ON o.customer_id = c.customer_id AND c.city = 'Seattle' produces the same result as JOIN customers AS c ON o.customer_id = c.customer_id WHERE c.city = 'Seattle'. Both return only orders from Seattle customers.

However, there is a convention preference: conditions that define the join relationship (matching keys) belong in ON; conditions that filter the result belong in WHERE. This makes queries more readable and semantically clear. ON answers "how do the tables relate?" WHERE answers "what rows from the joined result do I want?" Mixing business filters into ON makes the join condition harder to scan. The performance implication is theoretically the same for INNER JOIN since the optimiser can push predicates into the join regardless — but readability favours keeping relationship conditions in ON and filter conditions in WHERE.

Q: How do you join more than two tables and does the order matter?

Joining more than two tables means chaining multiple JOIN clauses. Each JOIN adds one more table to the current result. The syntax is sequential: FROM a JOIN b ON a.key = b.key JOIN c ON b.key = c.key JOIN d ON c.key = d.key. Each JOIN's ON clause can reference columns from any table that has already been introduced — you cannot reference a table before it appears in the FROM/JOIN chain.

Logically, the order of JOINs matters for which columns are available in subsequent ON clauses. You cannot join order_items to products before joining orders to order_items — because order_items is not yet in scope when you try to join products to it. The practical rule: join tables in the order of the relationship chain, following the FK path through the schema.

Physically, the query optimiser reorders JOINs for performance — it does not respect the written order. It evaluates all possible join orderings (up to a threshold) and chooses the one with the lowest estimated cost based on table sizes, indexes, and statistics. For queries with 5+ tables, the optimiser uses heuristics rather than exhaustive search. You can influence the physical order with hints in some databases (e.g., LEADING in PostgreSQL), but this is rarely necessary unless the optimiser is making a provably bad choice. Write JOINs in the logical relationship order for readability; trust the optimiser for physical ordering.

Q: How would you find the top-selling product for each store using INNER JOIN?

Finding the top seller per store is a "top N per group" problem. The approach requires computing revenue per (store, product) combination, then selecting the product with the highest revenue within each store group.

Step 1 — compute revenue per store per product: SELECT o.store_id, p.product_id, p.product_name, SUM(oi.line_total) AS revenue FROM orders AS o JOIN order_items AS oi ON o.order_id = oi.order_id JOIN products AS p ON oi.product_id = p.product_id WHERE o.order_status = 'Delivered' GROUP BY o.store_id, p.product_id, p.product_name. This gives one row per (store, product) pair with total revenue.

Step 2 — select the top product per store. The cleanest approach uses ROW_NUMBER() window function: wrap Step 1 in a CTE or subquery, add ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY revenue DESC) AS rn, then filter WHERE rn = 1 in the outer query. This assigns rank 1 to the highest-revenue product within each store and returns only those. An alternative without window functions: join the Step 1 result to a subquery that finds MAX(revenue) per store, matching on both store_id and revenue = max_revenue. The ROW_NUMBER approach is cleaner and handles ties more explicitly (ROW_NUMBER picks one; DENSE_RANK preserves ties). Window functions are covered in depth in Module 45.

// Part 13

Errors You Will Hit — And Exactly Why They Happen

JOIN returns 0 rows — both tables have data but nothing matches

Cause: The ON condition references the wrong columns, uses mismatched data types, or the FK values in one table do not exist as PK values in the other. Common causes: joining on the wrong column (o.store_id = c.customer_id by mistake), type mismatch where one column is VARCHAR and the other is INTEGER, or the tables genuinely have no overlapping values (staging data that has not been linked yet).

Fix: Test each table independently: SELECT DISTINCT customer_id FROM orders LIMIT 5 and SELECT customer_id FROM customers LIMIT 5 — confirm the values overlap. Check data types: SELECT pg_typeof(o.customer_id), pg_typeof(c.customer_id) FROM orders AS o, customers AS c LIMIT 1. If types differ, cast one: ON o.customer_id = c.customer_id::INTEGER. If no values overlap, investigate the data pipeline — FK values in the child table must exist in the parent table for INNER JOIN to return rows.

JOIN is correct but very slow — taking 30+ seconds on a medium table

Cause: The join column on the looked-up table (the inner table) is not indexed. Without an index, each row from the outer table requires a full sequential scan of the inner table to find matches. On a 100,000-row table joined to a 10,000-row table without indexes, this means up to 100,000 × 10,000 = 1 billion comparisons in the worst case.

Fix: Add an index on the join column: CREATE INDEX idx_orders_customer_id ON orders(customer_id). For primary keys and unique columns, indexes already exist. Check with: SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'orders'. Use EXPLAIN ANALYZE to confirm the query plan shows Index Scan (fast) rather than Sequential Scan (slow) on the inner table. For very large tables, also consider composite indexes that cover both the join column and the most common WHERE columns.

Three-table JOIN returns wrong row count — far more rows than expected

Cause: A many-to-many relationship in the join chain is creating a multiplicative fan-out. If orders join to order_items (one-to-many) AND order_items join to products (many-to-one), the result has one row per order-item-product combination — which may be more rows than you expected if you were thinking in terms of orders or products.

Fix: Always count rows at each JOIN step: add each JOIN one at a time and SELECT COUNT(*) after each one. When the count jumps unexpectedly, the last added JOIN is causing the fan-out. Examine the relationship: is it truly one-to-many? Does the ON condition match multiple rows? The solution is usually to aggregate one side before joining: pre-aggregate order_items to order-level totals, then join that aggregated result to orders — removing the item-level multiplication.

Column 'order_id' is ambiguous after joining orders and order_items

Cause: Both orders and order_items have an order_id column. Without a table alias prefix, SELECT order_id is ambiguous — the database does not know which table's order_id you mean.

Fix: Always prefix columns with their table alias in multi-table queries: SELECT o.order_id, oi.order_id AS item_order_id. In practice, for a shared key column like order_id, you usually only need one copy: SELECT o.order_id — prefix with the alias of the table that owns the column as a PK. Establish the habit of prefixing every column reference with a table alias in any query involving JOINs — this prevents both ambiguity errors and confusion about which table a column comes from.

CTE join returns different results than expected — CTE seems to not filter correctly

Cause: A filter inside the CTE is not behaving as expected, or the outer query's WHERE is conflicting with the CTE's logic. CTEs in PostgreSQL are not always inlined by the optimiser — they may be materialised (computed once, cached), which means subsequent filters in the outer query cannot be pushed into the CTE for optimisation. In some versions, the CTE materialises all its rows even if the outer query only needs a few.

Fix: Verify the CTE result independently: WITH my_cte AS (...) SELECT * FROM my_cte LIMIT 20. Confirm the CTE produces the expected rows. If the outer join to the CTE returns wrong results, check that the ON condition references the correct column names in the CTE (alias names, not original table names). In PostgreSQL 12+, use MATERIALIZED or NOT MATERIALIZED hints to control whether the CTE is inlined: WITH cte AS NOT MATERIALIZED (...) — NOT MATERIALIZED allows the optimiser to push filters into the CTE for better performance.

Try It Yourself

Write a query that produces a category performance report — one row per product category. Show: category, number of distinct products in that category, number of distinct orders that contained at least one product from that category, total units sold, total revenue (from order_items.line_total, rounded to 2 decimal places), average selling price (rounded to 2 decimal places), and the category's share of total delivered revenue as a percentage (rounded to 1 decimal place). Only include delivered orders. Sort by total revenue descending.

🎯 Key Takeaways

  • INNER JOIN returns only rows where the ON condition matches in both tables. It is the default JOIN type — writing JOIN without a qualifier means INNER JOIN.
  • Chain multiple JOINs sequentially — each JOIN adds one more table. Join in the logical relationship order: parent before child, following the FK path through the schema.
  • The fan-out bug: joining a one-to-many relationship before aggregating inflates aggregate values. SUM(orders.total_amount) after joining to order_items triple-counts each order with three items.
  • Prevent fan-out: SUM the many-side column (order_items.line_total), use COUNT(DISTINCT pk) for the one-side count, or pre-aggregate the many side in a CTE before joining.
  • For INNER JOIN, filters in ON and WHERE produce identical results. By convention: join relationship conditions in ON, business filter conditions in WHERE.
  • The right side of JOIN can be a subquery or CTE — join pre-aggregated or filtered data without base table constraints.
  • Join columns on the looked-up table must be indexed for performance. Primary keys are always indexed. Add explicit indexes on FK columns for frequently-joined non-PK columns.
  • Incremental build pattern: write base JOIN → SELECT COUNT(*) + LIMIT 5 spot-check → add WHERE → add GROUP BY → add HAVING. Never write the full complex query in one shot.
  • Verify aggregates from JOIN queries against simpler single-table queries. The fan-out bug is silent — queries run without errors and return plausible but wrong numbers.
  • CTEs make complex multi-JOIN queries readable by naming each analytical step. Each CTE is defined once and can be joined like a regular table in subsequent CTEs or the final SELECT.

What comes next

In Module 32, you learn LEFT and RIGHT JOINs in full depth — every pattern for keeping unmatched rows, the IS NULL filter idiom, optional relationships, and when to choose LEFT over INNER.

Module 32 → LEFT and RIGHT JOIN
Share

Discussion

0

Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.

Continue with GitHub
Loading...