How Databases Work
Tables, rows, columns, data types, primary keys, foreign keys, constraints — the complete building blocks
// Part 01
What Actually Happens When You Run a Query
Before learning the building blocks of a database, it helps to understand what a database is actually doing every time you interact with it. Most tutorials skip this and jump straight to CREATE TABLE. That creates a mental model of a database as a passive container — a box you put data into and take data out of. The reality is far more active, and understanding it will make every concept in this course easier to grasp.
Here is the complete journey of a single query — what happens in the milliseconds between you pressing Enter and the result appearing on your screen.
// Part 02
Tables — The Complete Rules
A table is the fundamental unit of storage in a relational database. Every piece of data you ever store lives in a table. Understanding what makes a good table — and what makes a bad one — is the most important design skill in all of SQL. Bad table design causes problems that no amount of clever querying can fix.
Rule 1: One table, one thing
The single most important rule in database design: each table stores exactly one type of entity. FreshMart has a customers table that stores only customers. A products table that stores only products. An orders table that stores only orders. You would never put customer data and product data in the same table, even though both are used when a customer buys a product.
When beginners design databases for the first time they often try to put everything in one table — one row per order, with the customer's name, address, and loyalty tier repeated in every single row. This is called data redundancy and it causes serious problems: if Aisha Khan moves from Bangalore to Hyderabad, you have to update her address in every single order row. Miss one row and your data is inconsistent. In a properly designed database you update her address in exactly one place — the customers table — and every order automatically reflects it through the foreign key relationship.
Rule 2: Every column stores exactly one piece of information
Each column must represent one atomic, indivisible piece of data. A column called address that stores "12 Koramangala, Bangalore, 560034" is a bad design — city, street, and pincode are three different pieces of information crammed into one column. When you later want to find all customers in Bangalore, you would have to use string pattern matching on the whole address, which is slow and error-prone. Correct design separates them: street, city, state, pincode — four columns, four clean pieces of data.
Rule 3: Table and column names must be clear, lowercase, and use underscores
The universal convention for SQL tables and columns: all lowercase letters, words separated by underscores, no spaces, no special characters. customers not Customers. order_date not OrderDate or orderDate. unit_price not UnitPrice or unitprice. This matters because SQL is case-sensitive in some databases for table names, and mixed-case names require quoting everywhere — adding noise to every query you write.
The FreshMart tables examined
| Table | What one thing it stores | Why it is separate |
|---|---|---|
| customers | One row per person who shops at FreshMart | Customer details (name, city, tier) should only exist in one place — update once, reflected everywhere |
| orders | One row per purchase transaction | An order has its own attributes (date, status, total) that are not about the customer or the products inside it |
| order_items | One row per product line within an order | One order can have many products — storing them in the orders table would require an unknown number of columns |
| products | One row per thing FreshMart sells | Product details (name, price, category) should not be repeated in every order_items row — update once in products |
| stores | One row per physical FreshMart location | Store details (city, manager, target) are independent of any specific order or customer |
| employees | One row per person who works at FreshMart | Staff data (salary, role, hire date) is sensitive and operationally separate from customer or product data |
// Part 03
Data Types — Choosing the Right Container for Every Column
When you create a table, every column must be given a data type — a declaration of what kind of data that column will hold. This is not just a label. The database uses the data type to: allocate the right amount of storage space, enforce that only valid values can be inserted, and choose the right comparison rules when you filter or sort.
Choosing the wrong data type is a mistake you cannot fix cheaply. Changing a column from VARCHAR(50) to VARCHAR(255) on a table with 500 million rows can take hours and lock the table. Getting it right at design time is worth the extra five minutes of thought.
| Type | Storage | Example value | Use in FreshMart |
|---|---|---|---|
| INTEGER | 4 bytes | 1, 42, 1001 | customer_id, product_id, order_id — any whole number ID or count |
| BIGINT | 8 bytes | 9876543210 | phone numbers, large counters — when INTEGER's max of ~2 billion is not enough |
| DECIMAL(10,2) | variable | 340.00, 28.50 | unit_price, total_amount — money. NEVER use FLOAT for money — floating point arithmetic introduces rounding errors |
| VARCHAR(n) | up to n chars | 'Aisha', 'Bangalore' | first_name, city, product_name — variable-length text up to n characters |
| CHAR(n) | exactly n chars | 'ST001' | store_id — fixed-length codes where every value is always the same length |
| TEXT | unlimited | long descriptions | product descriptions, notes — when you cannot predict maximum length. Slower to index than VARCHAR |
| DATE | 3 bytes | 2024-01-05 | order_date, joined_date, hire_date — date without time, stored as YYYY-MM-DD |
| TIMESTAMP | 8 bytes | 2024-01-05 14:32:11 | created_at, updated_at — exact moment in time including hours, minutes, seconds |
| BOOLEAN | 1 byte | true, false | in_stock — yes/no flags. MySQL stores as TINYINT(1), PostgreSQL has a native BOOL type |
The money mistake — why FLOAT is wrong for prices
This is one of the most common beginner mistakes. If you define a price column as FLOAT or DOUBLE, you will eventually get values like 28.499999999998 instead of 28.5. This happens because floating-point numbers are stored in binary and cannot represent most decimal fractions exactly — the same way 1/3 cannot be written as a finite decimal. For currency you must always use DECIMAL(precision, scale). In FreshMart, DECIMAL(10,2) means up to 10 total digits with exactly 2 decimal places — sufficient for prices up to ₹99,999,999.99.
VARCHAR vs CHAR — when to use which
VARCHAR(n) stores variable-length strings — it only uses as much space as the actual string needs, plus 1–2 bytes to record the length. A VARCHAR(100) column storing the word "Amul" uses 5 bytes, not 100. CHAR(n) stores fixed-length strings — it always uses exactly n bytes, padding shorter values with spaces. Use CHAR for values that are always the same length: country codes (IN, US), store IDs (ST001), status codes. Use VARCHAR for everything else.
// Part 04
Primary Keys — Every Row's Unique Identity
A primary key is a column (or combination of columns) that uniquely identifies every row in a table. It is the most fundamental constraint in all of SQL. Every table you will ever create should have one.
What the database enforces automatically
When you declare a column as the PRIMARY KEY, the database automatically enforces two rules that you can never break: uniqueness — no two rows can have the same primary key value, and NOT NULL — the primary key column can never be empty. You do not need to write separate constraints for these. The PRIMARY KEY declaration implies both.
The database also automatically creates an index on the primary key column. This means lookups by primary key — the most common type of lookup in any application — are always fast, regardless of how large the table gets.
Auto-increment — let the database generate IDs
In almost every table you build, the primary key will be an auto-incrementing integer. This means the database generates the next value automatically whenever you insert a new row. In MySQL this is done with AUTO_INCREMENT. In PostgreSQL it is done with SERIAL or GENERATED ALWAYS AS IDENTITY. You never manually type a primary key value — the database handles it.
Composite primary keys — when one column is not enough
Sometimes a single column is not sufficient to uniquely identify a row, and you need a combination of two columns to serve as the primary key. This is called a composite primary key. The order_items table is a good example: item_id is a dedicated surrogate key here, but you could also uniquely identify each item by the combination of (order_id, product_id) — one order cannot contain the same product twice. Composite primary keys are common in junction tables (many-to-many relationships). In FreshMart we use a surrogate integer key for simplicity, but both approaches are correct.
🎯 Pro Tip
Always use a dedicated surrogate integer key (an auto-increment ID) as the primary key, even when another column or combination of columns could uniquely identify rows. Phone numbers change. Email addresses change. Business rules change. An integer ID never needs to change, which means every foreign key referencing it also never needs to change. This saves enormous pain when requirements evolve.
// Part 05
Foreign Keys — How Tables Connect and Why It Matters
A foreign key is a column in one table that stores the primary key value of a row in another table. It creates an enforced, permanent link between the two tables. Foreign keys are the mechanism that makes a database "relational" in practice — they are what allow you to ask questions that span multiple tables.
What referential integrity means
When you declare a foreign key, the database enforces a rule called referential integrity: every foreign key value must either match an existing primary key value in the referenced table, or be NULL. This has two practical consequences. First, you cannot insert an order with customer_id = 500 if no customer with id 500 exists — the database rejects the insert with an error. Second, you cannot delete a customer who still has orders — the database rejects the delete because it would leave orphaned orders pointing to nobody.
ON DELETE behaviour — what happens when the parent is deleted
You can control what happens to child rows when their parent is deleted by specifying a behaviour on the foreign key constraint:
| Option | What happens | When to use |
|---|---|---|
| RESTRICT (default) | Prevents deleting the parent if any child rows exist. The delete fails with an error. | Most situations — you want to know before deleting a customer that they have 8 orders. |
| CASCADE | Automatically deletes all child rows when the parent is deleted. | Dependent data that has no meaning without the parent — e.g. deleting a user account deletes their session tokens. |
| SET NULL | Sets the foreign key column to NULL in all child rows. | When the relationship becomes optional — e.g. an employee's manager is deleted but the employee stays. |
| SET DEFAULT | Sets the foreign key column to its default value in all child rows. | Rare. When child rows should fall back to a default parent. |
The FreshMart foreign key map in code
Notice the last one: employees.manager_id references employees.employee_id — a table referencing itself. This is called a self-referencing foreign key and it is how org charts and hierarchy data are stored. Priya Sharma (employee_id = 1) is the Store Manager with no manager above her (manager_id = NULL). Rahul Verma (employee_id = 2) reports to Priya, so his manager_id = 1. You will learn to query this hierarchy using SELF JOINs in Module 34.
// Part 06
Constraints — Rules the Database Enforces Automatically
Constraints are rules you attach to columns (or tables) that the database checks on every INSERT and UPDATE. If a value violates a constraint, the operation fails with an error — the data is never saved. Constraints are your first line of defence against bad data. They are far more reliable than checking validity in application code, because application code can have bugs, can be bypassed, and does not cover every path data enters through.
NOT NULL — the column must always have a value
By default, any column can contain NULL — the absence of a value. Adding NOT NULL means the column must always have a real value. Inserting a row without providing a value for a NOT NULL column causes an error. In FreshMart, first_name, last_name, and order_date are NOT NULL — it makes no sense to have a customer without a name or an order without a date.
UNIQUE — no two rows can have the same value
The UNIQUE constraint prevents duplicate values in a column across all rows. In FreshMart, the email column on customers is UNIQUE — no two customers can share the same email address. Unlike PRIMARY KEY, a UNIQUE column can contain NULL (and multiple NULLs are allowed, since NULL is not equal to NULL in SQL — more on this surprising behaviour in Module 11).
DEFAULT — the value used when nothing is provided
The DEFAULT constraint specifies what value the database should use when a row is inserted without providing a value for that column. In FreshMart, loyalty_tier has a DEFAULT of 'Bronze' — when a new customer is added without specifying their tier, they automatically start at Bronze. Defaults keep INSERT statements cleaner and reduce the chance of NULL slipping into columns that should always have a value.
CHECK — custom rule for valid values
The CHECK constraint lets you define an arbitrary condition that every value in a column must satisfy. For example, discount_pct in order_items should always be between 0 and 100. A CHECK constraint enforces this: CHECK (discount_pct >= 0 AND discount_pct <= 100). If you try to insert a discount of 150 or -5, the database rejects it immediately.
// Part 07
The FreshMart Schema — Every Table Examined Line by Line
Now that you understand tables, data types, primary keys, foreign keys, and constraints — let us look at the full FreshMart schema in one place. You will use this schema for every query in every module. Read it carefully and notice how every design decision reflects the rules covered in this module.
Notice how the second query uses JOIN to combine orders with customer names. The order table only stores customer_id — a number. To get the actual name, you join to the customers table using the foreign key relationship. This is the relational model working as designed. You will learn exactly how to write JOINs — including multiple-table JOINs — in Modules 30 through 35.
// Part 08
What This Looks Like at Work
You are hired as a junior backend developer at a Bangalore fintech startup. On your second week, you are asked to review a new table that a colleague designed and give feedback before it goes to production. Here is what that day looks like.
🎯 Pro Tip
Knowing database design principles — not just SQL syntax — is what separates developers who write queries from developers who design systems. A PR review comment that catches a FLOAT money column or a missing NOT NULL constraint demonstrates understanding that takes most engineers years to develop. It comes from understanding the why, not just the how.
// Part 09
Interview Prep — 5 Questions With Complete Answers
Both PRIMARY KEY and UNIQUE prevent duplicate values in a column. The differences are three. First, a table can have only one PRIMARY KEY but multiple UNIQUE constraints — you could have unique constraints on both email and phone independently. Second, PRIMARY KEY columns implicitly have NOT NULL — they can never contain a null value. UNIQUE columns can contain NULL, and in most databases multiple NULLs are allowed in a UNIQUE column because NULL is not considered equal to NULL. Third, the PRIMARY KEY is the canonical identifier for the row and is the column other tables reference in their foreign keys. UNIQUE constraints enforce business rules about uniqueness but do not serve as the relational identifier.
In FreshMart's customers table: customer_id is the PRIMARY KEY — it is the row's identity that orders reference. email has a UNIQUE constraint — we want no two customers sharing the same email, but email is not the identifier we use in foreign keys.
All three remove data but at very different levels. DELETE removes specific rows from a table based on a WHERE condition. It is a transactional operation — it can be rolled back if inside a transaction. It fires triggers. It is slow on large tables because it logs each deleted row. DELETE without a WHERE clause removes all rows but is still transactional and logged.
TRUNCATE removes all rows from a table much faster than DELETE because it does not log individual row deletions — it deallocates the data pages directly. It cannot be filtered with WHERE. In most databases it cannot be rolled back (though PostgreSQL is an exception — TRUNCATE is transactional there). It resets auto-increment counters. It does not fire row-level triggers.
DROP removes the table itself — not just the data, but the entire structure, all its constraints, indexes, and any foreign keys referencing it. The table no longer exists after DROP. You need to recreate it from scratch to use it again. DROP is irreversible and should never be run in production without a backup strategy.
Referential integrity is the guarantee that every foreign key value in a table corresponds to an existing primary key value in the referenced table — no row can reference a parent that does not exist. The database enforces this automatically when you declare a FOREIGN KEY constraint.
Enforcement happens at two moments. On INSERT or UPDATE: if you try to insert a row with a foreign key value that does not exist in the parent table, the database rejects the operation with an error. On DELETE or UPDATE of the parent: if you try to delete or change the primary key of a parent row that has child rows referencing it, the database either rejects the operation (RESTRICT) or handles it according to the ON DELETE behaviour you specified (CASCADE, SET NULL, SET DEFAULT).
Without referential integrity, orphaned records accumulate silently — orders pointing to deleted customers, items pointing to discontinued products. When you join these tables, the orphaned rows silently disappear from results with no error, causing data loss that is difficult to diagnose. Referential integrity prevents this class of problem entirely at the engine level.
FLOAT and DOUBLE are floating-point types that store numbers in binary representation. Most decimal fractions cannot be represented exactly in binary — the same way 1/3 cannot be written as a finite decimal. This means a value you insert as 199.99 might be stored and retrieved as 199.98999999999999. For arbitrary calculations this imprecision is acceptable. For financial data, it is catastrophic.
The consequences compound. If you sum 10,000 transactions each worth ₹199.99, the result might be ₹1,999,899.98 instead of ₹1,999,900. The discrepancy is small on each row but accumulates across millions of transactions. Accounting reconciliation fails. Regulatory audits find inconsistencies. Customers are charged or refunded wrong amounts.
The correct type for money is DECIMAL(precision, scale) — also called NUMERIC. DECIMAL stores values as exact decimal digits, with no binary conversion. DECIMAL(10,2) stores values from -99,999,999.99 to 99,999,999.99 with perfect precision. This is the universal standard for financial data across every production database. MySQL, PostgreSQL, Oracle, and SQL Server all support it identically.
A self-referencing foreign key is a foreign key in a table that references the primary key of the same table. It is used to represent hierarchical relationships where entities of the same type relate to each other — most commonly parent-child or manager-employee relationships.
In FreshMart's employees table, each employee has an employee_id (primary key) and a manager_id (foreign key). The manager_id column references employees.employee_id — a manager is also an employee. Priya Sharma is the Store Manager with employee_id = 1 and manager_id = NULL (she reports to nobody in this dataset). Rahul Verma is her Assistant Manager with employee_id = 2 and manager_id = 1 — he reports to Priya. This single foreign key declaration captures the entire org chart with no additional tables.
Self-referencing foreign keys appear in: organisational hierarchies (employees and managers), geographic hierarchies (countries containing states containing cities), category trees (a category that has a parent_category_id pointing to another category), and comment threads (a comment with a parent_comment_id pointing to the comment it replies to). You query this structure using a SELF JOIN or, for arbitrary depth, a recursive CTE — covered in Module 34 and Module 56 respectively.
// Part 10
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓When you run a query, six components work in sequence: Parser (validates syntax), Query Planner (chooses the best execution plan), Executor (runs the plan), Buffer Pool (serves from memory), Storage Engine (reads disk), Transaction Manager (enforces ACID).
- ✓The single most important table design rule: one table stores exactly one type of entity. Mixing customer and order data in one table causes redundancy that leads to inconsistency when data changes.
- ✓Every column must store one atomic piece of data. An address column storing "12 Koramangala, Bangalore, 560034" is bad design — split into street, city, state, pincode.
- ✓Use DECIMAL(precision, scale) for all money columns. FLOAT and DOUBLE introduce binary rounding errors that accumulate into accounting discrepancies on financial data.
- ✓PRIMARY KEY automatically enforces uniqueness and NOT NULL, and creates a B-tree index. Every table needs one. Use auto-increment integers in almost all cases.
- ✓UNIQUE allows NULL values (multiple NULLs permitted). PRIMARY KEY does not allow NULL. A table can have one PRIMARY KEY and multiple UNIQUE constraints.
- ✓Foreign keys enforce referential integrity — you cannot insert a child row referencing a non-existent parent, and you cannot delete a parent with existing children (by default). Use ON DELETE CASCADE only when child rows have no meaning without their parent.
- ✓Constraints (NOT NULL, UNIQUE, DEFAULT, CHECK) enforce data quality at the engine level — they work regardless of which application, script, or person inserts data. Application-level validation can be bypassed; database constraints cannot.
- ✓A self-referencing foreign key (manager_id referencing employee_id in the same table) is how hierarchies — org charts, category trees, comment threads — are stored in relational databases.
- ✓DROP removes the entire table structure. TRUNCATE removes all rows fast but cannot be filtered by WHERE. DELETE removes specific rows, is transactional, and can be rolled back.
What comes next
In Module 03, you learn about the three types of databases — Relational, NoSQL, and NewSQL — what problems each one was built to solve, and how to decide which one is right for any situation. This is the context that makes every tool decision in your career make sense.
Module 03 → Types of DatabasesDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.