Relational Model & Keys
The mathematical bedrock of every relational database — from Codd's formal definitions to every key type, every integrity constraint, and every design decision that flows from them.
The Relational Model — Built on Set Theory, Not Intuition
Most students learn the relational model by looking at tables in a spreadsheet and saying "oh, it's just rows and columns." That understanding is correct but dangerously shallow. The reason the relational model has endured for 55 years while every other data model has come and gone is not because tables are intuitive — it is because the relational model is built on a rigorous mathematical foundation that gives it provably correct semantics. Understanding that foundation is what separates someone who uses a database from someone who designs one correctly.
Edgar Codd was a mathematician before he was a computer scientist. When he designed the relational model in 1970, he deliberately grounded it in two branches of established mathematics: set theory(specifically, the theory of relations between sets) and first-order predicate logic(the language of logical conditions over variables). This choice was not accidental — it gave every database operation a provable meaning, and it gave the query language (eventually SQL) a mathematical model that could be formally analysed.
Mathematical Relation vs Relational Table — The Precise Connection
In mathematics, a relation between sets D₁, D₂, ..., Dₙ is a subset of the Cartesian product D₁ × D₂ × ... × Dₙ. The Cartesian product D₁ × D₂ × ... × Dₙ is the set of all possible ordered n-tuples where the first element comes from D₁, the second from D₂, and so on. A relation is a selected subset of those possible tuples — specifically, the tuples that represent true facts about the real world.
// Domain definitions:
D_customer_id = { all strings matching pattern 'C' followed by digits }
D_name = { all non-empty strings up to 100 characters }
D_city = { 'Bengaluru', 'Hyderabad', 'Mumbai', 'Pune', 'Chennai', 'Delhi', ... }
D_age = { integers from 0 to 150 }
// Cartesian product D_customer_id × D_name × D_city × D_age:
// contains EVERY possible combination:
// ('C001', 'Rahul Sharma', 'Bengaluru', 28)
// ('C001', 'Rahul Sharma', 'Bengaluru', 29)
// ('C001', 'Rahul Sharma', 'Mumbai', 28)
// ... (infinite combinations)
// The CUSTOMERS RELATION is a specific SUBSET of this Cartesian product:
// Only the tuples that represent real customers:
CUSTOMERS = {
('C001', 'Rahul Sharma', 'Bengaluru', 28),
('C002', 'Priya Reddy', 'Hyderabad', 31),
('C003', 'Arjun Nair', 'Mumbai', 24),
('C004', 'Kavya Krishnan','Bengaluru', 35),
}
// This set of 4 tuples IS the relation.
// It represents the true facts about customers at this moment in time.
// KEY MATHEMATICAL PROPERTY: It is a SET — which means:
// 1. No two identical elements (no duplicate tuples)
// 2. The elements have no inherent order (sets are unordered)
// These two properties have profound implications for database design.This mathematical grounding explains several features of SQL that seem arbitrary until you understand the set theory behind them. Why does SELECT DISTINCT exist? Because SQL relations are technically multisets (they can have duplicates, which pure sets cannot) — DISTINCT converts a multiset back to a proper set. Why does ORDER BY only affect display and not the underlying data? Because a set has no inherent order — ORDER BY is a display instruction, not a property of the relation itself. Why can relational algebra operations compose arbitrarily? Because every operation takes a relation (set) as input and produces a relation (set) as output — closure under composition.
Every Term in the Relational Model — Precise Definitions
Every technical domain has precise vocabulary — and the relational model is particularly careful about its terminology. The same concept has both a formal mathematical name and an informal everyday name. You must know both because different contexts use different names. A textbook uses "tuple." A job description says "row." An interview question might use either. Knowing only one will confuse you.
NULL — The Relational Model's Acknowledged Flaw
NULL is the most controversial concept in the relational model. Codd introduced it as a marker for "value unknown or inapplicable," but he recognised immediately that a single NULL value is insufficient — there is a semantic difference between "I don't know this person's salary" (unknown) and "this person has no salary" (not applicable) and "this person hasn't been assigned a salary yet" (pending). All three are represented identically as NULL in SQL, making it impossible for the database to distinguish them.
Codd proposed a four-valued logic (true, false, unknown, inapplicable) to address this, but no major DBMS implemented it. SQL instead uses three-valued logic (true, false, unknown) where any comparison with NULL produces "unknown" — which has specific, non-obvious consequences that every SQL developer must understand.
-- Three-valued logic: TRUE, FALSE, UNKNOWN
-- Any comparison with NULL produces UNKNOWN (not TRUE or FALSE)
-- TRAP 1: NULL = NULL is UNKNOWN (not TRUE!)
SELECT * FROM employees WHERE salary = NULL; -- returns ZERO rows!
SELECT * FROM employees WHERE salary IS NULL; -- correct: returns rows with null salary
-- TRAP 2: NULL != NULL is also UNKNOWN (not TRUE!)
SELECT * FROM employees WHERE salary != NULL; -- returns ZERO rows!
SELECT * FROM employees WHERE salary IS NOT NULL; -- correct
-- TRAP 3: Arithmetic with NULL propagates NULL
SELECT salary + 5000 FROM employees;
-- If salary is NULL → result is NULL (not salary + 5000!)
-- Fix: COALESCE(salary, 0) + 5000
-- TRAP 4: COUNT(*) vs COUNT(column) difference
CREATE TABLE test (id INT, value INT);
INSERT INTO test VALUES (1, 100), (2, NULL), (3, 200), (4, NULL);
SELECT COUNT(*) FROM test; -- Returns: 4 (counts all rows)
SELECT COUNT(value) FROM test; -- Returns: 2 (counts non-NULL values only)
SELECT AVG(value) FROM test; -- Returns: 150 (ignores NULLs — (100+200)/2, not /4!)
SELECT SUM(value) FROM test; -- Returns: 300 (ignores NULLs correctly)
-- TRAP 5: NOT IN with NULL subquery — the silent killer
-- "Find employees NOT in the New York office"
SELECT name FROM employees
WHERE employee_id NOT IN (
SELECT employee_id FROM ny_office_employees
);
-- If ny_office_employees has ANY row with employee_id = NULL,
-- this query returns ZERO ROWS — even though there are clearly non-NY employees!
-- Why: NOT IN is equivalent to "!= val1 AND != val2 AND != NULL"
-- "!= NULL" is UNKNOWN, and TRUE AND UNKNOWN = UNKNOWN (not TRUE)
-- So every row's condition becomes UNKNOWN → filtered out
-- FIX: Use NOT EXISTS instead
SELECT name FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM ny_office_employees n
WHERE n.employee_id = e.employee_id
);
-- NOT EXISTS handles NULL correctly
-- TRAP 6: NULL in CHECK constraints
-- CHECK (age > 0) passes for NULL! (UNKNOWN is treated as passing CHECK)
-- To also reject NULL: CHECK (age IS NOT NULL AND age > 0)
-- Or: use NOT NULL constraint separately
-- HANDLING NULL PROPERLY:
SELECT
name,
COALESCE(phone, 'No phone on record') AS phone_display,
NULLIF(commission, 0) AS actual_commission, -- treat 0 same as NULL
CASE WHEN salary IS NULL THEN 'Unknown'
WHEN salary < 50000 THEN 'Below average'
ELSE 'Above average' END AS salary_band
FROM employees;Properties Every Relation Must Satisfy — And Why Each Property Matters
A relation is not just any table with rows and columns. A properly defined relation must satisfy a specific set of properties that emerge from the mathematical definition of a set. These properties are not optional recommendations — they are constraints that every properly implemented DBMS enforces. Violating them produces what the relational model considers "not a valid relation."
Relational Schema Notation — How to Read and Write It
Academic papers, textbooks, and exam questions express relational schemas in a compact formal notation. Understanding this notation is essential for reading database literature, answering GATE questions, and communicating schema designs to other engineers without drawing diagrams.
// BASIC NOTATION: RelationName(attribute1, attribute2, ..., attributeN)
// Conventions:
// - PRIMARY KEY attributes are UNDERLINED (shown here as ALL_CAPS)
// - FOREIGN KEY attributes are shown with an asterisk or arrow notation
// - NULL/NOT NULL constraints sometimes shown explicitly
// EXAMPLE 1: Simple relation
CUSTOMERS(CUSTOMER_ID, name, email, city, age)
// ↑ underlined = primary key
// EXAMPLE 2: Relation with foreign key
ORDERS(ORDER_ID, customer_id*, order_date, total, status)
// ↑ asterisk = foreign key to CUSTOMERS.customer_id
// EXAMPLE 3: Composite primary key
ENROLLMENTS(STUDENT_ID*, COURSE_ID*, enrolled_date, grade)
// ↑────────────────────↑ both underlined together = composite PK
// Both are also FKs (to STUDENTS and COURSES respectively)
// EXAMPLE 4: Complete university database schema in notation
STUDENTS(STUDENT_ID, name, email, major, gpa, enrollment_year)
PROFESSORS(PROFESSOR_ID, name, department, email, office_room)
COURSES(COURSE_ID, title, credits, department, PROFESSOR_ID*)
ENROLLMENTS(STUDENT_ID*, COURSE_ID*, semester, grade, attendance_pct)
// ↑ composite PK — (student_id, course_id) together are unique
// Reading FK references:
// COURSES.PROFESSOR_ID → PROFESSORS.PROFESSOR_ID
// ENROLLMENTS.STUDENT_ID → STUDENTS.STUDENT_ID
// ENROLLMENTS.COURSE_ID → COURSES.COURSE_ID
// EXTENDED NOTATION with explicit constraints:
EMPLOYEES(EMPLOYEE_ID: INT NOT NULL,
name: VARCHAR(100) NOT NULL,
dept_id: INT NOT NULL REFERENCES DEPARTMENTS,
salary: DECIMAL(10,2) CHECK salary > 0,
hire_date: DATE DEFAULT CURRENT_DATE,
manager_id: INT REFERENCES EMPLOYEES)
// ↑ self-referential FK (recursive relationship)
// ALTERNATIVE NOTATION using arrows:
ORDERS.customer_id → CUSTOMERS.customer_id
EMPLOYEES.dept_id → DEPARTMENTS.dept_id
EMPLOYEES.manager_id → EMPLOYEES.employee_id // self-referentialSchema vs Instance — A Complete Contrast
-- THE SCHEMA (intension) — defined once, stable, structural:
CREATE TABLE products (
product_id VARCHAR(20) PRIMARY KEY,
product_name VARCHAR(200) NOT NULL,
category VARCHAR(100) NOT NULL,
price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
stock_qty INT NOT NULL DEFAULT 0,
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- This CREATE TABLE statement defines:
-- - The relation name (products)
-- - Its degree (7 attributes)
-- - Each attribute's name and domain
-- - The primary key (product_id)
-- - Integrity constraints (price >= 0, stock_qty has default 0)
-- The schema exists the moment you run CREATE TABLE,
-- even BEFORE any data is inserted. An empty table has a schema.
-- THE INSTANCE (extension) — at 2024-03-15 14:32:00:
-- product_id | product_name | category | price | stock_qty | is_active
-- P001 | Chicken Biryani | Main Course| 280.00 | 50 | true
-- P002 | Masala Dosa | Breakfast | 120.00 | 30 | true
-- P003 | Butter Chicken | Main Course| 340.00 | 0 | false
-- P004 | Veg Thali | Meal Combo | 180.00 | 25 | true
-- Cardinality: 4 rows. Degree: 7 columns (unchanged from schema).
-- THE INSTANCE at 2024-03-15 15:00:00 (after INSERT and UPDATE):
-- P001 | Chicken Biryani | Main Course| 280.00 | 48 | true (2 sold)
-- P002 | Masala Dosa | Breakfast | 125.00 | 30 | true (price updated)
-- P003 | Butter Chicken | Main Course| 340.00 | 0 | false (unchanged)
-- P004 | Veg Thali | Meal Combo | 180.00 | 25 | true (unchanged)
-- P005 | Paneer Tikka | Starter | 220.00 | 15 | true (NEW — inserted)
-- Cardinality: now 5 rows. Schema: UNCHANGED. Degree: still 7.
-- KEY INSIGHT:
-- Schema changes require ALTER TABLE (planned, controlled, relatively rare)
-- Instance changes happen with every INSERT/UPDATE/DELETE (continuous, constant)
-- Application code is written against the SCHEMA — it stays correct as the instance changesKeys — The Complete Guide to Every Type, Every Nuance
Keys are the most interview-heavy topic in the entire relational model. Nearly every DBMS exam question, placement interview, and system design discussion involves key concepts. The topic is deceptively simple on the surface — "a key uniquely identifies a row" — but the full picture involves seven distinct key types, each with specific properties, each with specific implementation consequences, and several with subtle distinctions that interviewers deliberately probe.
We will examine every key type in complete depth — formal definition, informal intuition, a worked example using a concrete table, implementation details, and the interview traps specific to each type.
| emp_id | phone | name | dept_id | pan_number | salary | |
|---|---|---|---|---|---|---|
| E001 | rahul@co.in | 98765-43210 | Rahul Sharma | D01 | ABCDE1234F | 85000 |
| E002 | priya@co.in | 87654-32109 | Priya Reddy | D02 | FGHIJ5678K | 92000 |
| E003 | arjun@co.in | 76543-21098 | Arjun Nair | D01 | KLMNO9012L | 78000 |
| E004 | kavya@co.in | 65432-10987 | Kavya Krishnan | D03 | PQRST3456M | 95000 |
This EMPLOYEES relation has 4 tuples and 7 attributes. We will use it to illustrate every key type. The attributes are: emp_id (auto-assigned), email (work email), phone (mobile), name, dept_id (department), pan_number (Indian tax ID — unique per person), salary.
The Key Hierarchy — How All Key Types Relate to Each Other
The seven key types are not independent concepts — they form a precise hierarchy. Understanding the hierarchy makes every key concept self-reinforcing: once you know where each type sits, you know its definition without memorising it separately.
-- EMPLOYEES table demonstrating all key types:
CREATE TABLE employees (
-- PRIMARY KEY (also a SURROGATE KEY):
emp_id SERIAL PRIMARY KEY,
-- Surrogate: system-generated, no real-world meaning
-- Primary: the chosen official identifier
-- ALTERNATE KEYS (unchosen candidate keys):
email VARCHAR(150) NOT NULL UNIQUE, -- alternate key 1
phone VARCHAR(20) NOT NULL UNIQUE, -- alternate key 2
pan_number CHAR(10) NOT NULL UNIQUE, -- alternate key 3
-- Regular attributes (not keys by themselves):
name VARCHAR(100) NOT NULL,
salary DECIMAL(10,2) CHECK (salary > 0),
-- FOREIGN KEY (referential integrity to departments):
dept_id INT NOT NULL,
CONSTRAINT fk_dept FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
ON DELETE RESTRICT ON UPDATE CASCADE,
-- Self-referential FOREIGN KEY (manager hierarchy):
manager_id INT,
CONSTRAINT fk_manager FOREIGN KEY (manager_id) REFERENCES employees(emp_id)
ON DELETE SET NULL ON UPDATE CASCADE
);
-- SUPER KEYS of employees (examples):
-- {emp_id}, {email}, {phone}, {pan_number} ← single-attribute super keys
-- {emp_id, email}, {emp_id, name}, {email, phone} ← multi-attribute super keys
-- {emp_id, email, phone, name, dept_id, pan_number, salary, manager_id} ← trivial super key
-- CANDIDATE KEYS of employees:
-- {emp_id}, {email}, {phone}, {pan_number} ← all minimal, all unique
-- PRIMARY KEY: {emp_id} (chosen — minimal, stable, integer)
-- ALTERNATE KEYS: {email}, {phone}, {pan_number} (UNIQUE constraints)
-- FOREIGN KEYS: dept_id → departments.dept_id, manager_id → employees.emp_id
-- COMPOSITE KEY example (from order_items table):
-- PRIMARY KEY (order_id, line_number) — composite because neither alone is uniqueIntegrity Constraints — The Database's Defence Against Bad Data
Integrity constraints are rules defined in the schema that the DBMS enforces automatically on every INSERT, UPDATE, and DELETE operation. They are the database's built-in defence against incorrect, inconsistent, or invalid data. Any operation that would violate a constraint is rejected — the DBMS returns an error and the database remains in its previous valid state.
Integrity constraints are not optional enhancements — they are fundamental to the relational model's value proposition. An application can try to bypass constraints in code (check for duplicate email before inserting, verify FK exists before adding a child row), but application-level checks have a critical weakness: concurrent transactions can bypass them. Two concurrent transactions can both check "is this email taken?" and both receive "no" — then both insert the same email. DBMS-level constraints enforce uniqueness atomically — this race condition is physically impossible at the database level.
The most fundamental constraint. Every value stored in an attribute must belong to the domain of that attribute — the defined set of valid values. Implemented through data type declarations and CHECK constraints in SQL.
-- Data type is the first domain constraint:
age INT -- values must be integers (no strings, no decimals)
salary DECIMAL(10,2) -- values must be valid decimal numbers
hire_date DATE -- values must be valid calendar dates
is_active BOOLEAN -- values must be true or false
-- CHECK extends domain constraints with semantic rules:
CREATE TABLE employees (
age INT NOT NULL CHECK (age >= 18 AND age <= 80),
-- Domain: integers between 18 and 80 (working age)
salary DECIMAL(10,2) NOT NULL CHECK (salary > 0),
-- Domain: positive decimal numbers only
blood_group CHAR(3) CHECK (blood_group IN ('A+','A-','B+','B-','AB+','AB-','O+','O-')),
-- Domain: exactly 8 specific string values (or NULL)
email VARCHAR(150) CHECK (email LIKE '%@%.%'),
-- Domain: strings containing @ and . (basic email format validation)
-- Note: proper email validation requires a regex CHECK in PostgreSQL
status VARCHAR(20) NOT NULL DEFAULT 'active'
CHECK (status IN ('active', 'inactive', 'suspended', 'terminated')),
-- Domain: exactly 4 allowed string values
hire_date DATE NOT NULL,
resign_date DATE,
-- Cross-column domain constraint:
CHECK (resign_date IS NULL OR resign_date > hire_date)
-- Resign date must be after hire date (if it exists)
);
-- VIOLATION EXAMPLES:
INSERT INTO employees (age, salary) VALUES (15, 50000);
-- ERROR: new row violates check constraint "employees_age_check"
-- Age 15 is outside the domain [18, 80]
INSERT INTO employees (age, status) VALUES (25, 'fired');
-- ERROR: new row violates check constraint "employees_status_check"
-- 'fired' is not in the allowed status domainEvery relation must have a key — a set of attributes that uniquely identifies each tuple. The key constraint requires that no two tuples can have identical values for the key attribute(s). This is enforced by PRIMARY KEY and UNIQUE constraints in SQL.
-- PRIMARY KEY enforces uniqueness + NOT NULL:
CREATE TABLE products (
product_id VARCHAR(20) PRIMARY KEY, -- unique + not null automatically
name VARCHAR(200) NOT NULL
);
-- UNIQUE enforces uniqueness but allows NULL (multiple NULLs allowed!):
CREATE TABLE employees (
emp_id SERIAL PRIMARY KEY,
email VARCHAR(150) UNIQUE NOT NULL, -- unique AND not null (alternate key)
pan_number CHAR(10) UNIQUE, -- unique but nullable (might not have PAN yet)
-- Note: multiple rows can have pan_number = NULL
-- because NULL ≠ NULL in SQL's three-valued logic
-- A UNIQUE constraint allows multiple NULLs
name VARCHAR(100) NOT NULL
);
-- IMPORTANT: UNIQUE with multiple NULLs behaviour
INSERT INTO employees (email, pan_number, name) VALUES ('a@co.in', NULL, 'Rahul');
INSERT INTO employees (email, pan_number, name) VALUES ('b@co.in', NULL, 'Priya');
-- Both inserts succeed! Two rows with NULL pan_number is allowed by UNIQUE.
-- NULL ≠ NULL → not considered duplicate by the uniqueness check.
-- COMPOSITE UNIQUE constraint:
CREATE TABLE schedules (
employee_id INT NOT NULL,
shift_date DATE NOT NULL,
shift_slot VARCHAR(20) NOT NULL, -- 'morning', 'afternoon', 'night'
UNIQUE (employee_id, shift_date, shift_slot)
-- One employee cannot have two assignments in same shift slot on same day
);The entity integrity rule states that primary key attributes must never be NULL. The NOT NULL constraint generalises this — any attribute declared NOT NULL must always have a value. Every INSERT must provide a value for NOT NULL columns (unless a DEFAULT is defined); every UPDATE that would set them to NULL is rejected.
-- Primary key is implicitly NOT NULL:
product_id VARCHAR(20) PRIMARY KEY -- NOT NULL is automatic
-- Explicit NOT NULL for required business data:
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT NOT NULL, -- every order must have a customer
order_date DATE NOT NULL DEFAULT CURRENT_DATE, -- must have a date
status VARCHAR(20) NOT NULL DEFAULT 'pending', -- must have a status
delivery_addr TEXT NOT NULL, -- must have a delivery address
-- Optional fields (NULL allowed):
promo_code VARCHAR(20), -- order might not have a promo code
notes TEXT, -- customer notes are optional
delivered_at TIMESTAMP -- NULL until delivery occurs
);
-- NOT NULL violation:
INSERT INTO orders (customer_id, delivery_addr) VALUES (NULL, '123 Main St');
-- ERROR: null value in column "customer_id" violates not-null constraint
-- Choosing NOT NULL vs nullable:
-- NOT NULL: data that must exist for the record to make business sense
-- "what does an order without a customer even mean? nothing."
-- Nullable: data that is genuinely optional or unknown at insertion time
-- "promo code is optional, delivery time is unknown until it happens"For every non-NULL foreign key value in the referencing relation, there must exist a matching primary key value in the referenced relation. The database enforces this on INSERT (cannot add a child row with a FK that doesn't exist), on DELETE (cannot delete a parent row that has referencing child rows, unless CASCADE/SET NULL applies), and on UPDATE (cannot change a PK value that has referencing FK values, unless CASCADE applies).
-- Complete referential integrity scenario:
CREATE TABLE categories (
category_id SERIAL PRIMARY KEY,
name VARCHAR(100) UNIQUE NOT NULL
);
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
category_id INT NOT NULL,
price DECIMAL(10,2) NOT NULL,
FOREIGN KEY (category_id) REFERENCES categories(category_id)
ON DELETE RESTRICT -- safe default: cannot orphan products
ON UPDATE CASCADE -- if category_id changes, update all products
);
CREATE TABLE order_items (
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
ON DELETE RESTRICT, -- cannot delete a product that has been ordered
FOREIGN KEY (order_id) REFERENCES orders(order_id)
ON DELETE CASCADE -- deleting an order deletes all its items
);
-- TESTING all referential integrity scenarios:
-- SCENARIO 1: INSERT with non-existent FK
INSERT INTO products (name, category_id, price)
VALUES ('Chicken Biryani', 999, 280);
-- ERROR: violates FK constraint — category_id 999 doesn't exist
-- SCENARIO 2: DELETE parent with existing children
DELETE FROM categories WHERE category_id = 1;
-- ERROR: violates FK constraint — products reference category_id = 1
-- SCENARIO 3: Correct insertion order
INSERT INTO categories (name) VALUES ('Main Course'); -- id = 1
INSERT INTO products (name, category_id, price) VALUES ('Chicken Biryani', 1, 280); -- OK
INSERT INTO products (name, category_id, price) VALUES ('Butter Chicken', 1, 340); -- OK
-- SCENARIO 4: Correct deletion order (without CASCADE)
DELETE FROM products WHERE category_id = 1; -- delete children first
DELETE FROM categories WHERE category_id = 1; -- now parent can be deleted
-- SCENARIO 5: With CASCADE — deletion propagates automatically
DELETE FROM orders WHERE order_id = 'ORD-001';
-- Automatically deletes all order_items WHERE order_id = 'ORD-001'
-- Because: FOREIGN KEY (order_id) ON DELETE CASCADEThe Most Common Interview Traps — Every Key Question Demystified
The relational model's key concepts are some of the most frequently asked topics in technical interviews — precisely because they seem simple on the surface but have subtle nuances that reveal genuine understanding. Here are every significant trap, with the precise answers that distinguish a prepared candidate from one who has only surface knowledge.
The Schema Review — A Senior Engineer Catches Five Key Mistakes
Key and constraint decisions made at schema design time are often irreversible in production — changing a primary key type after 100 million rows exist requires a full table rewrite. The following is a realistic code review conversation showing the kinds of key-related mistakes that junior engineers commonly make and what the correct approach is.
-- Junior engineer's initial schema:
CREATE TABLE payment_methods (
card_number VARCHAR(16) PRIMARY KEY, -- MISTAKE 1
cardholder_name VARCHAR(200) NOT NULL,
expiry_month INT, -- MISTAKE 2
expiry_year INT, -- MISTAKE 2
cvv VARCHAR(3), -- MISTAKE 3
user_id INT, -- MISTAKE 4
billing_address TEXT
);
CREATE TABLE transactions (
txn_id VARCHAR(50) PRIMARY KEY, -- MISTAKE 5
card_number VARCHAR(16) REFERENCES payment_methods,
amount DECIMAL(10,2),
txn_date TIMESTAMP DEFAULT NOW(),
status VARCHAR(20)
);Card numbers change. When a customer gets a new card (expiry, compromise, replacement), their card_number changes. Because this is the PK, every reference to it (in transactions table) must CASCADE UPDATE — meaning millions of transaction records need rewriting. Also: exposing card_number as a PK means it appears in URLs, logs, and error messages — PCI DSS violation risk.
expiry_month INT and expiry_year INT with no constraints allows storing month=13, year=1800, or a technically valid expiry date that is in the past. Domain constraints are missing entirely.
CVV must NEVER be stored. PCI DSS (Payment Card Industry Data Security Standard) explicitly prohibits storing CVV after transaction authorisation. Storing it in plaintext in a VARCHAR column is a severe compliance violation that can result in loss of payment processing capability and massive fines.
A payment method with no associated user is meaningless in this context. Additionally, there's no FOREIGN KEY constraint — user_id can reference a non-existent user, creating orphaned payment methods that belong to nobody.
Application-generated VARCHAR PKs (like order IDs built from timestamps or UUIDs formatted as strings) are brittle — the format can change, they can collide if generation logic has bugs, and VARCHAR comparisons are slower than integer comparisons. For a high-throughput transactions table, PK choice has significant performance implications.
-- Corrected schema:
CREATE TABLE payment_methods (
payment_method_id BIGSERIAL PRIMARY KEY, -- surrogate PK (stable, fast)
user_id INT NOT NULL, -- FK with NOT NULL
card_number CHAR(16) NOT NULL UNIQUE, -- alternate key
-- Note: in production, card_number should be TOKENIZED, not stored in plaintext
-- Use a payment vault (Stripe, Razorpay) that stores the actual number
-- and returns a token. Store the token here, not the card number.
cardholder_name VARCHAR(200) NOT NULL,
expiry_month SMALLINT NOT NULL CHECK (expiry_month BETWEEN 1 AND 12),
expiry_year SMALLINT NOT NULL CHECK (expiry_year >= EXTRACT(YEAR FROM CURRENT_DATE)),
-- CVV: REMOVED — never store CVV per PCI DSS requirement
card_type VARCHAR(20) NOT NULL CHECK (card_type IN ('visa','mastercard','rupay','amex')),
is_default BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE RESTRICT,
-- RESTRICT: don't allow deleting users who have payment methods
-- Business rule: deactivate users, don't delete them, to preserve payment history
CHECK (
expiry_year > EXTRACT(YEAR FROM CURRENT_DATE)
OR (expiry_year = EXTRACT(YEAR FROM CURRENT_DATE)
AND expiry_month >= EXTRACT(MONTH FROM CURRENT_DATE))
)
-- Cross-column check: expiry must be in the future
);
CREATE TABLE transactions (
txn_id BIGSERIAL PRIMARY KEY, -- surrogate PK (fast, sequential)
txn_reference VARCHAR(30) NOT NULL UNIQUE, -- human-readable reference (alternate key)
-- e.g., 'TXN-20240315-00001' generated by application, stored as separate column
payment_method_id BIGINT NOT NULL, -- FK to surrogate PK (not card_number)
amount DECIMAL(12,2) NOT NULL CHECK (amount > 0),
currency CHAR(3) NOT NULL DEFAULT 'INR',
txn_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending','authorised','captured','failed','refunded','voided')),
failure_reason TEXT, -- NULL unless status = 'failed'
FOREIGN KEY (payment_method_id) REFERENCES payment_methods(payment_method_id)
ON DELETE RESTRICT -- never delete a payment method that has transactions
);🎯 Key Takeaways
- ✓The relational model is built on set theory — a relation is a set of tuples. Set properties directly explain SQL behaviour: no inherent row order (ORDER BY is display-only), no duplicate rows by default, attributes identified by name not position.
- ✓Eight core terms: Relation=table, Tuple=row, Attribute=column, Domain=allowed values, Degree=column count (static), Cardinality=row count (dynamic), Schema=structure definition, Instance=actual current data.
- ✓NULL means unknown or inapplicable — not zero, not empty string. NULL in any arithmetic produces NULL. NULL = NULL produces UNKNOWN (not TRUE). Use IS NULL, not = NULL. NOT IN with a subquery that can return NULL silently returns no rows — use NOT EXISTS instead.
- ✓Five relational properties: no duplicate tuples (set property), tuples are unordered (set property), attributes are unordered within a tuple, all attribute values are atomic (1NF), each attribute has a unique name within the relation.
- ✓Key hierarchy: Super Key (any unique-identifying set) → Candidate Key (minimal super key) → Primary Key (chosen candidate key, NOT NULL enforced) / Alternate Key (unchosen candidate keys, UNIQUE constraint). Foreign Key is orthogonal — it references another relation's PK or alternate key.
- ✓Primary key must be: unique, not null, stable, minimal, and simple. Among candidate keys, prefer system-generated integers (surrogate keys) over natural keys for stability and performance.
- ✓Foreign key enforces referential integrity. ON DELETE options: RESTRICT (default, safest), CASCADE (propagates deletion), SET NULL (removes the reference), SET DEFAULT (substitutes a default). Choose based on the business rule for the relationship.
- ✓All four constraint types: Domain (CHECK + data type), Key (PRIMARY KEY + UNIQUE), NOT NULL (entity integrity), Referential (FOREIGN KEY). Constraints enforced at the DBMS level are race-condition-proof — application-level checks are not.
- ✓Surrogate keys (SERIAL/BIGSERIAL/UUID) are preferred in modern systems: they never change, are always simple, have no business meaning that could change, and perform better as integer comparisons in JOINs.
- ✓UNIQUE and PRIMARY KEY differ in one critical way: UNIQUE allows multiple NULL values (because NULL ≠ NULL in three-valued logic). PRIMARY KEY never allows NULL. This distinction is tested in nearly every DBMS interview.
Discussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.