Functional Dependencies
The mathematical language of normalization — how attributes determine each other, how to derive all possible dependencies from a given set, and how to design schemas that enforce exactly the constraints the data requires.
What Functional Dependencies Are — And Why They Are the Engine Behind Normalization
In Module 05 we learned normalization: partial dependencies violate 2NF, transitive dependencies violate 3NF, and non-superkey determinants violate BCNF. But we used an informal version of these concepts — we said things like "student_name depends on student_id alone." This informality was sufficient for the introduction. Now we make it precise.
Functional dependencies (FDs) are the mathematical tool that makes "dependency" precise. They are constraints on a relation schema that specify which sets of attributes determine which other attributes. They are the formal language in which normalization theory is expressed. Without FDs, normalization is just intuition. With FDs, it becomes a mathematical procedure with provably correct outcomes.
Understanding functional dependencies deeply serves three critical purposes in practice and in examinations:
Given a set of FDs, compute which attribute sets can serve as candidate keys — without guessing or relying on intuition.
Given a schema and its FDs, determine exactly which normal forms are satisfied and which are violated — with mathematical certainty, not approximation.
When normalising, determine exactly how to split tables to preserve all dependencies and guarantee lossless joins — not by trial and error, but by algorithm.
The Formal Definition — Stated Precisely
Let R be a relation schema and let X and Y be subsets of the attributes of R (X ⊆ attrs(R) and Y ⊆ attrs(R)). A functional dependency X → Y (read: "X functionally determines Y" or "Y is functionally dependent on X") holds on R if and only if:
For every valid instance r of R: for any two tuples t₁ and t₂ in r, if t₁[X] = t₂[X] (the tuples have the same values for all attributes in X), then t₁[Y] = t₂[Y] (the tuples must also have the same values for all attributes in Y).
In plain language: knowing the value of X is sufficient to determine the value of Y. Whenever two rows agree on X, they must agree on Y. X cannot map to two different Y values in any valid state of the relation.
The word "every valid instance" is crucial. An FD is not a statement about one specific snapshot of the data — it is a constraint on ALL possible states the database can ever be in. It is a schema-level property, not a data-level observation. This distinction matters enormously: you cannot determine FDs by looking at a single snapshot of data — you can only verify that a proposed FD is not violated by a snapshot, or discover a violation.
FDs in the Real World — Examples That Build Intuition
Before formalising FD theory, let us build strong intuition with concrete real-world examples. For each FD, understand WHY it holds — what real-world fact makes it true.
Trivial vs Non-Trivial Functional Dependencies
Not all functional dependencies carry useful information. Some FDs are mathematically guaranteed to hold regardless of any data or schema properties — they are "trivially" true. Understanding this distinction is critical because normalization theory and Armstrong's Axioms distinguish carefully between trivial and non-trivial dependencies.
An FD X → Y is trivial if Y ⊆ X — if Y is a subset of X. In other words, the right-hand side is already contained within the left-hand side.
These FDs are always true for every relation, by definition of functional dependency — if two tuples agree on all attributes in X, they trivially agree on any subset of X. They add zero information about the data model.
An FD X → Y is non-trivial if Y ⊄ X — if at least one attribute in Y is NOT in X. Non-trivial FDs carry real information about the domain — they tell us something meaningful about how the data is related.
An FD is completely non-trivial if X ∩ Y = ∅ — the left and right sides share no attributes at all.
Why the Trivial/Non-Trivial Distinction Matters for Normal Forms
BCNF states: "for every non-trivial functional dependency X → Y, X must be a superkey." The word "non-trivial" is load-bearing. Without it, the definition would be absurd — trivial FDs are universally true and cannot be eliminated. 3NF similarly only considers non-trivial FDs.
Armstrong's Axioms, which we examine next, produce both trivial and non-trivial FDs. In practice, when we compute attribute closures or find candidate keys, we work with non-trivial FDs exclusively — trivial FDs are set-theoretically guaranteed and carry no design information.
Armstrong's Axioms — The Complete Inference System for FDs
In 1974, William W. Armstrong published "Dependency Structures of Data Base Relationships" — a paper that gave the relational model community something it critically needed: a complete, sound inference system for functional dependencies.
Complete means: using these axioms alone, you can derive every functional dependency that logically follows from a given set of FDs. No valid FD will escape you. Sound means: every FD derived using these axioms is genuinely valid — you will never derive a false FD.
There are exactly three axioms. All other inference rules (and there are infinitely many FDs that can be derived) are consequences of these three.
The Three Derived Rules — Consequences of the Axioms
Armstrong's three axioms are complete — but working with just three rules makes derivations tedious. Three additional rules are commonly used in practice. They are NOT independent axioms — they can all be proved using only Reflexivity, Augmentation, and Transitivity. But knowing them dramatically speeds up closure computations and derivations.
If X determines Y separately and Z separately, then X determines their union YZ together. You can always combine separately determined attributes onto one right-hand side.
Proof using axioms: Given X → Y and X → Z. By Augmentation of X → Y with Z: XZ → YZ. By Augmentation of X → Z with X: X → XZ. By Transitivity of X → XZ and XZ → YZ: X → YZ. □
// Application:
// employee_id → dept_id (given)
// employee_id → salary (given)
// By Union: employee_id → {dept_id, salary}
// This is useful when computing closures:
// Instead of tracking dept_id and salary as separate derived FDs,
// Union lets us combine them immediately.The reverse of Union. If X determines a combined set YZ, you can split the right-hand side into its individual components. X determines each part separately.
Proof: Given X → YZ. By Reflexivity: YZ → Y (since Y ⊆ YZ). By Transitivity of X → YZ and YZ → Y: X → Y. Similarly, YZ → Z by Reflexivity, giving X → Z by Transitivity. □
// Application:
// dept_id → {dept_name, dept_location, dept_budget} (given)
// By Decomposition:
// dept_id → dept_name (individual FD)
// dept_id → dept_location (individual FD)
// dept_id → dept_budget (individual FD)
// This is useful when testing whether a specific attribute is determined:
// "Is dept_location determined by dept_id?"
// Given dept_id → {dept_name, dept_location, dept_budget},
// Decomposition immediately gives us dept_id → dept_location.A generalisation of Transitivity. If X determines Y, and the combination of W and Y determines Z, then the combination of W and X determines Z (because WX can produce Y from X, then WY → Z gives us Z).
Proof: Given X → Y and WY → Z. By Augmentation of X → Y with W: WX → WY. By Transitivity of WX → WY and WY → Z: WX → Z. □
// Application:
// student_id → advisor_id (given: each student has one advisor)
// {advisor_id, course_id} → grade (given: advisor grades students in courses)
// By Pseudotransitivity:
// {student_id, course_id} → grade (replace advisor_id with student_id on LHS)
// This is the most practically useful derived rule for situations
// where you need to "substitute" one determining attribute for another.The Closure of a Set of Functional Dependencies (F⁺)
Given a set of functional dependencies F on a relation schema R, the closure of F (written F⁺) is the complete set of all functional dependencies that can be derived from F using Armstrong's Axioms. F⁺ includes every FD that logically follows from the given FDs — including the trivial FDs from Reflexivity and all the transitive chains from Transitivity.
F⁺ is typically much larger than F. For a relation with n attributes, F⁺ can contain exponentially many FDs. This is why we never compute F⁺ explicitly in practice — instead we compute the closure of specific attribute sets (the attribute closure algorithm), which is far more efficient and answers the specific questions we care about: "does this FD hold?" and "is this set of attributes a superkey?"
// Given F = {A → B, B → C, A → D} on R(A, B, C, D)
// F⁺ contains ALL of:
// Trivial (from Reflexivity):
// A → A, B → B, C → C, D → D
// {A,B} → A, {A,B} → B, {A,B} → {A,B}, etc. (all subsets of all attribute sets)
// Non-trivial (from Transitivity + Augmentation):
// A → B (given)
// B → C (given)
// A → D (given)
// A → C (Transitivity: A→B, B→C)
// A → BC (Union: A→B, A→C)
// A → BCD (Union: A→BC, A→D)
// A → ABCD (Union of A→A with A→BCD)
// AB → BC (Augmentation of A→BC with B? No... let's trace properly)
// ... (hundreds of FDs)
// TOTAL F⁺ for this small 4-attribute relation: dozens of FDs
// For a 10-attribute relation: thousands of FDs
// For a 20-attribute relation: millions of FDs
// INSTEAD: Ask "What does A determine?" → Compute A⁺ (attribute closure of A)
// This answers the question in O(n²) time — much more practical.Testing FD Membership — Does a Given FD Hold in F⁺?
The most common question in normalization work is: "Given F, does a specific FD X → Y hold?" The answer: compute X⁺ (the attribute closure of X with respect to F). If Y ⊆ X⁺, then X → Y holds in F⁺. If Y ⊄ X⁺, then X → Y does not hold.
This is why the attribute closure algorithm (Part 5) is the central algorithm of functional dependency theory — it answers membership questions for F⁺ efficiently without ever computing F⁺ explicitly.
The Attribute Closure Algorithm — The Most Important Algorithm in FD Theory
The attribute closure of a set of attributes X with respect to a set of FDs F (written X⁺ or X⁺_F) is the largest set of attributes that is functionally determined by X under F.
Formally: X⁺ = {A | X → A is in F⁺}. It is the set of all attributes that X can determine — directly or through chains of FDs. If X⁺ contains every attribute in the relation, then X is a superkey.
The Algorithm — Step by Step
Set closure = X (start with just the attributes you already know — you can determine yourself from yourself by Reflexivity)
Scan every FD in F. If the left-hand side (LHS) of an FD is contained within the current closure, add the right-hand side (RHS) to the closure.
When no FD can add any new attributes to the closure in a complete scan, the algorithm terminates. The current closure is X⁺.
Time complexity: O(n²) where n is the number of FDs. In the worst case, each iteration of the outer loop adds at least one attribute to the closure, and there are at most |R| attributes, so the algorithm runs at most |R| × |F| iterations. This is polynomial — the algorithm is efficient even for large schemas.
Three Complete Worked Examples — Different Difficulty Levels
// COMPUTE {AB}⁺ under F = {AB→C, C→D, D→E, E→B}
// Step 1 — Initialise:
closure = {A, B}
// Step 2 — Iteration 1: Scan all FDs
// FD: AB → C ← Is {A,B} ⊆ closure = {A,B}? YES
// Add C: closure = {A, B, C}
// FD: C → D ← Is {C} ⊆ closure = {A,B,C}? YES
// Add D: closure = {A, B, C, D}
// FD: D → E ← Is {D} ⊆ closure = {A,B,C,D}? YES
// Add E: closure = {A, B, C, D, E}
// FD: E → B ← Is {E} ⊆ closure = {A,B,C,D,E}? YES
// Add B (already in closure — no change)
// End of Iteration 1: closure = {A, B, C, D, E}
// Did anything change? YES — added C, D, E.
// Step 2 — Iteration 2: Scan all FDs again
// FD: AB → C ← C already in closure
// FD: C → D ← D already in closure
// FD: D → E ← E already in closure
// FD: E → B ← B already in closure
// No new attributes added in this iteration → TERMINATE
// RESULT: {AB}⁺ = {A, B, C, D, E} = all attributes of R
// CONCLUSION:
// {AB}⁺ = all attributes → AB is a SUPERKEY
// Can we remove A? {B}⁺: start {B}, E→B not helpful, no FD has {B} alone as LHS
// {B}⁺ = {B} ≠ all → cannot remove A
// Can we remove B? {A}⁺: start {A}, no FD has just {A} on LHS
// {A}⁺ = {A} ≠ all → cannot remove B
// AB is MINIMAL → AB is a CANDIDATE KEY ✓// STRATEGY: First identify which attributes CAN be in a candidate key.
// An attribute that appears ONLY on the right-hand side of FDs (never on LHS)
// CANNOT be a candidate key alone (nothing derives it independently).
// An attribute that appears ONLY on the left-hand side (never on RHS)
// MUST be in every candidate key (no FD produces it, so it cannot be derived).
// Classify attributes:
// A: appears on LHS (A→B, A→C) — can derive things
// appears on RHS (B→A) — can be derived
// B: appears on LHS (B→A, B→C) — can derive things
// appears on RHS (A→B) — can be derived
// C: appears on LHS (C→D) — can derive things
// appears on RHS (B→C, A→C) — can be derived
// D: appears ONLY on RHS (C→D) — NEVER on LHS
// D is only derivable, never a determiner
// D CANNOT be in any candidate key (removing it still gives a superkey)
// D MUST be derived — no candidate key contains D alone OR is reducible by removing D
// Since D never appears on LHS, D cannot help derive anything.
// Every candidate key must be able to derive D → must derive C (C→D) → must derive A or B.
// TEST {A}:
// A⁺: start {A}
// A → B: add B → {A,B}
// A → C: add C → {A,B,C}
// B → A: A already there
// B → C: C already there
// C → D: add D → {A,B,C,D}
// A⁺ = {A,B,C,D} = all attributes → A is a superkey
// Can we reduce? A alone → A⁺ = all → already single attribute
// A is a CANDIDATE KEY ✓
// TEST {B}:
// B⁺: start {B}
// B → A: add A → {A,B}
// B → C: add C → {A,B,C}
// A → B: B already there
// A → C: C already there
// C → D: add D → {A,B,C,D}
// B⁺ = {A,B,C,D} = all attributes → B is a superkey
// B alone → already single attribute
// B is a CANDIDATE KEY ✓
// TEST {C}:
// C⁺: start {C}
// C → D: add D → {C,D}
// C⁺ = {C,D} ≠ all attributes → C is NOT a candidate key
// (Cannot derive A or B from C alone)
// TEST {D}:
// D⁺: start {D}
// No FD has D on LHS → D⁺ = {D} ≠ all → NOT a candidate key
// TEST {A,B}: A⁺ = all → AB is a superkey. But A alone is a superkey → {A,B} is NOT minimal
// TEST {A,C}: A⁺ = all → AC superkey. A alone is superkey → NOT minimal
// TEST {B,C}: B⁺ = all → BC superkey. B alone is superkey → NOT minimal
// (Any multi-attribute set containing A or B is a non-minimal superkey)
// FINAL ANSWER: Candidate keys = {A} and {B}
// Both A and B independently determine all attributes of R.
// A↔B (they determine each other) — they are equivalent identifiers.// STEP 1: Classify attributes
// Appears ONLY on LHS (never on RHS): must be in every candidate key
// F appears on LHS (CF→B) but NEVER on RHS → F must be in EVERY candidate key
// A appears on LHS (AB→C), also on RHS (BC→AD) → not required in every key
// B appears on LHS (AB→C, BC→AD), also on RHS (CF→B) → not required in every key
// C appears on LHS (BC→AD, CF→B), also on RHS (AB→C) → not required in every key
// D appears on LHS (D→E), also on RHS (BC→AD) → not required in every key
// E appears ONLY on RHS (D→E) → CANNOT be in any candidate key (cannot derive others)
// CONCLUSION from classification:
// F must be in EVERY candidate key (never derived)
// E cannot be in any candidate key (never derives anything)
// So every candidate key contains F and does NOT contain E alone
// STEP 2: Test {A, F}
// Closure of {A,F}:
// Start: {A, F}
// FD AB→C: need {A,B} — B not in closure yet. Skip.
// FD BC→AD: need {B,C} — not in closure. Skip.
// FD D→E: need {D} — not in closure. Skip.
// FD CF→B: need {C,F} — C not in closure. Skip.
// No additions in iteration 1.
// {A,F}⁺ = {A,F} ≠ all → NOT a candidate key
// STEP 3: Test {A, B, F}
// Closure of {A,B,F}:
// Start: {A, B, F}
// FD AB→C: {A,B} ⊆ {A,B,F} → add C → {A,B,C,F}
// FD BC→AD: {B,C} ⊆ {A,B,C,F} → add A,D → {A,B,C,D,F}
// FD D→E: {D} ⊆ {A,B,C,D,F} → add E → {A,B,C,D,E,F}
// FD CF→B: {C,F} ⊆ {A,B,C,D,E,F} → add B (already there)
// End iteration 1: {A,B,C,D,E,F} = all attributes ✓
// {ABF}⁺ = all → ABF is a superkey
// Is it minimal? Can we remove any attribute?
// Remove A? {BF}⁺: start {B,F}, CF→B needs C, AB→C needs A...
// {B,F}: no FD fires → {BF}⁺ = {B,F} ≠ all → Cannot remove A ✓
// Remove B? {AF}⁺ = {A,F} ≠ all (computed above) → Cannot remove B ✓
// Remove F? {AB}⁺: AB→C → {A,B,C}, BC→AD → {A,B,C,D}, D→E → {A,B,C,D,E}
// {AB}⁺ = {A,B,C,D,E} — missing F! ≠ all → Cannot remove F ✓
// ABF is MINIMAL → ABF is a CANDIDATE KEY ✓
// STEP 4: Test {B, C, F}
// Closure of {B,C,F}:
// Start: {B,C,F}
// FD AB→C: needs A — not present. Skip.
// FD BC→AD: {B,C} ⊆ {B,C,F} → add A,D → {A,B,C,D,F}
// FD D→E: {D} ⊆ {A,B,C,D,F} → add E → {A,B,C,D,E,F}
// FD CF→B: already have B
// {BCF}⁺ = {A,B,C,D,E,F} = all ✓
// Is it minimal?
// Remove B? {CF}⁺: CF→B → add B → {B,C,F}, BC→AD → add A,D → {A,B,C,D,F}, D→E → {A,B,C,D,E,F}
// {CF}⁺ = all → B is redundant in BCF! NOT minimal
// So BCF is NOT a candidate key (CF alone is a superkey)
// STEP 5: Test {C, F}
// {CF}⁺ (computed above) = {A,B,C,D,E,F} = all ✓
// Is it minimal?
// Remove C? {F}⁺: no FD has just F on LHS → {F}⁺ = {F} ≠ all → cannot remove C ✓
// Remove F? {C}⁺: C→? no FD has just C on LHS → {C}⁺ = {C} ≠ all → cannot remove F ✓
// CF is MINIMAL → CF is a CANDIDATE KEY ✓
// FINAL ANSWER: Candidate keys = {A,B,F} and {C,F}
// (Check any other combinations systematically to confirm no others)
// Verify: A appears in ABF, F appears in both — consistent with our classificationThe Attribute Closure Algorithm — Uses Beyond Candidate Key Finding
The attribute closure algorithm answers every important question about FDs efficiently. Here are its primary applications:
Does X → Y hold in F⁺? Compute X⁺. If Y ⊆ X⁺ → yes. Else → no.
Compute X⁺. If X⁺ = all attributes of R → X is a superkey.
Test all subsets systematically (start with single attributes, then pairs, etc.). A subset is a candidate key if it is a superkey AND no proper subset is also a superkey.
Decompose R into R1(X,Y) and R2(X,Z). Check if X → Y or X → Z holds (i.e., if Y ⊆ X⁺ or Z ⊆ X⁺). If either holds → lossless decomposition.
An FD X → Y is redundant in F if Y ⊆ X⁺ computed from F − {X→Y}. If removing it from F does not change any closure, it is redundant.
In FD XA → Y, is A extraneous? Compute X⁺ from F. If Y ⊆ X⁺ → A is extraneous in XA → Y (X alone already determines Y).
Canonical Cover (Minimal Cover) — Reducing FDs to Their Essence
In practice, the set of FDs given for a schema is often redundant — it contains FDs that can be derived from others, and individual FDs may have extraneous attributes on their left-hand sides. A canonical cover (also called a minimal cover, written F_c) is an equivalent set of FDs that has no redundancy: no extraneous attributes anywhere, no redundant FDs, and every FD has a single attribute on its right-hand side.
Canonical covers are important for two reasons. First, they give you the smallest possible set of FDs that is logically equivalent to the original — useful for understanding the true constraints of a schema without noise. Second, they are used in the 3NF synthesis algorithm to produce a minimal set of tables in a 3NF decomposition.
What Makes an FD Set Canonical (Minimal)
For every FD X → A in F_c, no attribute in X is extraneous — removing any attribute from X changes the closure. Every LHS attribute genuinely contributes to the dependency.
Each FD in F_c has exactly ONE attribute on its right-hand side. X → AB is not canonical — it must be split into X → A and X → B using Decomposition.
No FD in F_c can be derived from the remaining FDs in F_c. Removing any FD changes the closure of some attribute set — every FD genuinely contributes.
The Canonical Cover Algorithm — Step by Step
// INPUT: A set of functional dependencies F
// OUTPUT: A canonical cover F_c equivalent to F
// PHASE 1: Convert all FDs to have single-attribute RHS (Decomposition Rule)
// For each FD X → {A, B, C, ...}, replace with X→A, X→B, X→C, ...
Example: F = {A → BC, B → C, A → B, AB → C}
After Phase 1:
F = {A→B, A→C, B→C, A→B, AB→C}
// Note: A→B appears twice — keep one copy
// PHASE 2: Remove extraneous attributes from LHS of each FD
// For each FD XA → B in F, check if A is extraneous:
// Compute X⁺ using F (not F − {XA→B}, use the full F)
// If B ∈ X⁺ → A is extraneous → replace XA→B with X→B
// Check AB → C (from our F):
// Is A extraneous in AB → C?
// Compute {B}⁺ using all of F = {A→B, A→C, B→C, AB→C}:
// Start {B}
// B→C: add C → {B,C}
// {B}⁺ = {B,C} → C ∈ {B}⁺ → A IS extraneous in AB→C
// Replace AB→C with B→C (already in F — this one is now truly redundant)
// After Phase 2: F = {A→B, A→C, B→C}
// PHASE 3: Remove redundant FDs
// For each FD X → A in F, check if it is redundant:
// Compute X⁺ using F − {X→A}
// If A ∈ X⁺ → X→A is redundant → remove it
// Check A→C: Is it redundant?
// Compute {A}⁺ using F − {A→C} = {A→B, B→C}:
// Start {A}
// A→B: add B → {A,B}
// B→C: add C → {A,B,C}
// {A}⁺ = {A,B,C} → C ∈ {A}⁺ → A→C IS redundant → REMOVE
// Check A→B: Is it redundant?
// Compute {A}⁺ using F − {A→B} = {B→C}:
// Start {A}
// B→C: need B — not in {A}. No FD fires.
// {A}⁺ = {A} → B ∉ {A}⁺ → A→B is NOT redundant → KEEP
// Check B→C: Is it redundant?
// Compute {B}⁺ using F − {B→C} = {A→B}:
// Start {B}
// A→B: need A — not in {B}. No FD fires.
// {B}⁺ = {B} → C ∉ {B}⁺ → B→C is NOT redundant → KEEP
// CANONICAL COVER: F_c = {A→B, B→C}
// Verify equivalence: {A}⁺ under F_c = {A,B,C} same as under original F? YES ✓
// Verify: {B}⁺ under F_c = {B,C} same as under F? YES ✓
// F_c is equivalent to F, has 2 FDs instead of 4, and all are non-redundant.Another Complete Example — Starting from Scratch
// Relation R(A, B, C, D) with:
// F = {A→BCD, AB→C, A→BC, B→D}
// Find the canonical cover.
// PHASE 1: Single-attribute RHS
// A→BCD becomes: A→B, A→C, A→D
// AB→C stays: AB→C
// A→BC becomes: A→B, A→C (already separated above)
// B→D stays: B→D
// After deduplication:
F = {A→B, A→C, A→D, AB→C, B→D}
// PHASE 2: Remove extraneous LHS attributes
// Check AB→C: Is A extraneous?
// {B}⁺ from F = {A→B, A→C, A→D, AB→C, B→D}:
// Start {B}
// B→D: add D → {B,D}
// No other FD fires with just {B,D}.
// {B}⁺ = {B,D} → C ∉ {B}⁺ → A is NOT extraneous in AB→C
// Is B extraneous in AB→C?
// {A}⁺ from F:
// Start {A}
// A→B: add B → {A,B}
// A→C: add C → {A,B,C}
// A→D: add D → {A,B,C,D}
// {A}⁺ = {A,B,C,D} → C ∈ {A}⁺ → B IS extraneous in AB→C
// Replace AB→C with A→C (already in F)
// After Phase 2: F = {A→B, A→C, A→D, B→D}
// (AB→C reduced to A→C, which was already present — just keep one copy)
// PHASE 3: Remove redundant FDs
// Check A→B: {A}⁺ from F − {A→B} = {A→C, A→D, B→D}:
// Start {A}, A→C: {A,C}, A→D: {A,C,D}. B ∉ → NOT redundant. KEEP.
// Check A→C: {A}⁺ from F − {A→C} = {A→B, A→D, B→D}:
// Start {A}, A→B: {A,B}, A→D: {A,B,D}, B→D: D already there.
// {A}⁺ = {A,B,D} → C ∉ → NOT redundant. KEEP.
// Check A→D: {A}⁺ from F − {A→D} = {A→B, A→C, B→D}:
// Start {A}, A→B: {A,B}, A→C: {A,B,C}, B→D: {A,B,C,D}
// {A}⁺ = {A,B,C,D} → D ∈ {A}⁺ → A→D IS redundant (A→B→D). REMOVE.
// Check B→D: {B}⁺ from F − {B→D} = {A→B, A→C}:
// Start {B}. No FD with just B on LHS. {B}⁺ = {B} → D ∉ → NOT redundant. KEEP.
// CANONICAL COVER: F_c = {A→B, A→C, B→D}
// Much cleaner than the original 5 FDs (which reduced to 4 after dedup, now 3).
// Verification: A⁺ under F_c: {A}→B→D, A→C → {A,B,C,D} = same as under original F ✓Equivalence of FD Sets — When Two Sets Represent the Same Constraints
Two sets of functional dependencies F and G are equivalent (written F ≡ G) if they have the same closure: F⁺ = G⁺. Equivalence means they impose exactly the same constraints on any relation — no more, no less.
Equivalence is important because: (a) the canonical cover of F is equivalent to F but simpler, (b) different decompositions may have different sets of FDs but still be equivalent to the original, and (c) testing equivalence is a common GATE question type.
// F and G are equivalent if and only if F ⊆ G⁺ AND G ⊆ F⁺
// (Every FD in F can be derived from G, and every FD in G can be derived from F)
// PROCEDURE:
// For each FD X → Y in F:
// Compute X⁺ using G
// Check if Y ⊆ X⁺
// If NO → G cannot derive this FD → F ≢ G
// For each FD X → Y in G:
// Compute X⁺ using F
// Check if Y ⊆ X⁺
// If NO → F cannot derive this FD → F ≢ G
// EXAMPLE:
// F = {A→B, B→C, A→C}
// G = {A→B, B→C}
// Does F ⊆ G⁺?
// A→B: {A}⁺ under G: A→B → {A,B}. B ∈ {A}⁺ → A→B derivable from G ✓
// B→C: {B}⁺ under G: B→C → {B,C}. C ∈ {B}⁺ → B→C derivable from G ✓
// A→C: {A}⁺ under G: A→B→C → {A,B,C}. C ∈ {A}⁺ → A→C derivable from G ✓
// F ⊆ G⁺ ✓
// Does G ⊆ F⁺?
// A→B: in F directly → derivable ✓
// B→C: in F directly → derivable ✓
// G ⊆ F⁺ ✓
// CONCLUSION: F ≡ G — they are equivalent.
// G is a canonical cover of F (A→C is redundant in F).The Precise Connection Between FDs and Every Normal Form
Now that functional dependencies are fully understood, we can state every normal form condition precisely using FD language. This is how normalization is implemented algorithmically — not by intuition, but by mechanical FD testing.
No FD condition — 1NF is a structural property about attribute atomicity. Every attribute must have an atomic domain (no set-valued attributes). A relation is automatically in 1NF if its schema has a primary key and all domains are atomic.
Inspect the schema domains. No algorithmic FD test required.
multi-valued or composite attribute values in a single cell.
A relation R with primary key PK is in 2NF if and only if for every non-trivial FD X → A where A is a non-prime attribute: X is not a proper subset of PK. In other words: no partial dependency X → A exists where X ⊂ PK (proper subset).
For each attribute A not in any candidate key, compute whether any proper subset of the primary key determines A. If yes → 2NF violation.
non-prime attribute determined by only part of a composite primary key.
A relation R is in 3NF if and only if for every non-trivial FD X → A in F⁺, at least one of: (1) X is a superkey of R, OR (2) A is a prime attribute (member of some candidate key of R).
Find all candidate keys (attribute closures). For each non-trivial FD X → A: check if X is a superkey OR A is prime. If neither → 3NF violation.
non-prime attribute determined by a non-superkey that is not itself a prime attribute.
A relation R is in BCNF if and only if for every non-trivial FD X → Y in F⁺, X is a superkey of R. Period. No exceptions for prime attributes.
Find all candidate keys. For each non-trivial FD X → Y: check if X is a superkey. If not → BCNF violation, regardless of whether Y is prime.
any non-trivial FD where the LHS is not a superkey.
A relation R is in 4NF if and only if for every non-trivial multi-valued dependency X ↠ Y in R, X is a superkey. FDs are a special case of MVDs (X → Y implies X ↠ Y), so every BCNF relation is automatically in 4NF with respect to functional dependencies. The additional constraint is about MVDs that are not implied by FDs.
Identify independent multi-valued attributes of the same key. Check if any two independent sets of multi-valued facts are being stored in the same table.
two independent multi-valued facts about the same key coexisting in one table.
The 3NF Synthesis Algorithm — Using FDs to Design Schemas
The 3NF synthesis algorithm uses FDs to produce a guaranteed 3NF decomposition that is both lossless and dependency-preserving. It is the algorithmic realisation of what we did manually in Module 05.
// INPUT: Relation R with attributes U and FD set F
// OUTPUT: A 3NF decomposition that is lossless and dependency-preserving
// STEP 1: Find the canonical cover F_c of F
// (Remove redundant FDs and extraneous attributes as described in Part 06)
// STEP 2: Create one relation schema for each FD in F_c
// For each FD X → A in F_c, create a relation R_i(X ∪ A)
// (The LHS attributes plus the single RHS attribute form one table)
// STEP 3: If no relation in the decomposition contains a candidate key of R,
// add a relation containing the attributes of one candidate key.
// (This ensures the decomposition is lossless)
// STEP 4: Remove any relation R_i that is a subset of another relation R_j
// (Eliminate redundant tables)
// WORKED EXAMPLE:
// R(student_id, student_name, dept_id, dept_name, course_id, course_name, grade)
// FDs: F = {student_id → student_name, student_id → dept_id, dept_id → dept_name,
// course_id → course_name, (student_id, course_id) → grade}
// STEP 1: Find canonical cover
// Check for extraneous attributes and redundant FDs:
// student_id → student_name: not redundant ✓
// student_id → dept_id: not redundant ✓
// dept_id → dept_name: not redundant ✓
// course_id → course_name: not redundant ✓
// (student_id, course_id) → grade: not redundant (neither alone determines grade) ✓
// F_c = F (already canonical in this example)
// STEP 2: Create one relation per FD in F_c
// R1(student_id, student_name) ← from student_id → student_name
// R2(student_id, dept_id) ← from student_id → dept_id
// R3(dept_id, dept_name) ← from dept_id → dept_name
// R4(course_id, course_name) ← from course_id → course_name
// R5(student_id, course_id, grade) ← from (student_id, course_id) → grade
// STEP 3: Does any relation contain a candidate key of R?
// Candidate key of R: (student_id, course_id) — appears in R5 ✓
// No additional relation needed.
// STEP 4: Remove subsets
// R2(student_id, dept_id) — is this a subset of any other relation? No.
// All relations survive.
// FINAL 3NF SCHEMA:
// R1(student_id PK, student_name)
// R2(student_id PK, dept_id FK→R?) ← Actually merge R1 and R2 since same PK
// Better: STUDENTS(student_id PK, student_name, dept_id FK)
// DEPARTMENTS(dept_id PK, dept_name)
// COURSES(course_id PK, course_name)
// ENROLLMENTS(student_id PK/FK, course_id PK/FK, grade)
// The 3NF synthesis algorithm formally produces this decomposition.
// In practice, good database designers apply the same logic intuitively.
// For exams: know the formal algorithm to apply it mechanically when needed.FD Misconceptions That Cost Marks and Cause Production Bugs
FDs in Production — Finding and Fixing a Schema Designed Without FD Analysis
Most production schemas were never formally analysed for functional dependencies — they were designed by intuition. The result is almost always hidden redundancies and anomalies that only surface after the system is in production with real data volumes. Here is a realistic scenario showing how FD analysis diagnoses and fixes a production problem.
The sales team reports that the same product shows different prices in different reports. Product "Chicken Biryani" shows ₹280 in the order history report and ₹300 in the product catalogue report. An engineer investigates.
-- The original schema (intuition-designed, never FD-analysed):
CREATE TABLE order_details (
order_id INT NOT NULL,
product_id INT NOT NULL,
product_name VARCHAR(200), -- stored here for "convenience"
product_category VARCHAR(100), -- stored here for "convenience"
current_price DECIMAL(10,2), -- PROBLEM: whose current? when?
ordered_qty INT NOT NULL,
unit_price DECIMAL(10,2), -- price at time of order
customer_id INT NOT NULL,
customer_name VARCHAR(100), -- stored here for "convenience"
customer_city VARCHAR(100), -- stored here for "convenience"
order_date DATE NOT NULL,
PRIMARY KEY (order_id, product_id)
);FD Analysis — diagnosing the problem formally:
// Identify all FDs that should hold on order_details:
// PK = (order_id, product_id)
// product_id → product_name, product_category, current_price
// (product attributes depend only on product_id — PARTIAL DEPENDENCY on PK)
// → 2NF VIOLATION → update anomaly root cause!
// customer_id → customer_name, customer_city
// (customer attributes depend only on customer_id)
// But customer_id itself depends on order_id → TRANSITIVE DEPENDENCY
// → 3NF VIOLATION
// order_id → customer_id, order_date
// (order attributes depend on order_id — PARTIAL DEPENDENCY on PK)
// → 2NF VIOLATION
// (order_id, product_id) → ordered_qty, unit_price
// These correctly depend on the full PK
// THE INCIDENT ROOT CAUSE:
// current_price is product_id → current_price (partial dependency).
// When a product's price changes, engineers run:
// UPDATE products SET price = 300 WHERE product_id = 101;
// They update the products table (not order_details).
// But order_details ALSO stores current_price (copied at order insertion).
// The two values diverge → two different prices in two different queries
// depending on which table they source from.-- After FD analysis: extract each dependency set to its own table
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
product_name VARCHAR(200) NOT NULL,
category VARCHAR(100),
current_price DECIMAL(10,2) NOT NULL -- single source of truth for CURRENT price
);
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
customer_name VARCHAR(100) NOT NULL,
customer_city VARCHAR(100)
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
order_date DATE NOT NULL DEFAULT CURRENT_DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CREATE TABLE order_items (
order_id INT NOT NULL,
product_id INT NOT NULL,
ordered_qty INT NOT NULL CHECK (ordered_qty > 0),
unit_price DECIMAL(10,2) NOT NULL, -- SNAPSHOT: price at time of order (historical)
-- unit_price is INTENTIONAL denormalisation for historical accuracy
-- It is NOT synced with products.current_price — that is correct by design
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE,
FOREIGN KEY (product_id) REFERENCES products(product_id) ON DELETE RESTRICT
);
-- NOW: "current price" always comes from products.current_price (single source)
-- "historical price" always comes from order_items.unit_price (snapshot)
-- These are genuinely different data — one current fact, one historical fact
-- They SHOULD be different values when prices change — by design
-- QUERIES:
-- Product catalogue price (always current):
SELECT product_name, current_price FROM products WHERE product_id = 101;
-- Order history price (always historical snapshot):
SELECT oi.unit_price, o.order_date
FROM order_items oi JOIN orders o ON oi.order_id = o.order_id
WHERE oi.product_id = 101;
-- No more inconsistency — two different CORRECT answers for two different questions.FD Interview and GATE Questions — With Complete Solutions
All attributes form a cycle: A→B→C→D→A. Each single attribute can derive all others through the cycle. Test each:
{A}⁺: A→B→C→D→A = {A,B,C,D} = all → A is candidate key
{B}⁺: B→C→D→A→B = {A,B,C,D} = all → B is candidate key
{C}⁺: C→D→A→B→C = {A,B,C,D} = all → C is candidate key
{D}⁺: D→A→B→C→D = {A,B,C,D} = all → D is candidate key
All 4 single attributes are candidate keys. R has 4 candidate keys.
Every attribute is a prime attribute → no non-prime attributes → R is in BCNF.Classify: A appears only on LHS (AB→C) → must be in every candidate key.
B appears only on LHS (AB→C) → must be in every candidate key.
D,E appear on both sides (D→E, E→D) — they determine each other but not C.
C appears only on RHS → cannot alone determine everything.
Test {A,B}: {AB}⁺: AB→C → {A,B,C}. D→E, E→D with {A,B,C}: no D or E yet.
{AB}⁺ = {A,B,C} ≠ all → not a candidate key.
Test {A,B,D}: {ABD}⁺: AB→C → {A,B,C,D}, D→E → {A,B,C,D,E} = all ✓
Minimal? Remove A: {BD}⁺ = {B,D,E} (D→E) ≠ all → cannot remove A.
Remove B: {AD}⁺ = {A,D,E} (D→E) ≠ all → cannot remove B.
Remove D: {AB}⁺ = {A,B,C} ≠ all → cannot remove D. → ABD is a CANDIDATE KEY ✓
Test {A,B,E}: {ABE}⁺: AB→C → {A,B,C,E}, E→D → {A,B,C,D,E} = all ✓
Minimal? Same analysis as ABD → ABE is a CANDIDATE KEY ✓
Any others? Any set containing {AB} plus D or E gives a candidate key.
{ABD} and {ABE} are the only two candidate keys.To test if CD→AB holds in F⁺, compute {CD}⁺ under F and check if {A,B} ⊆ {CD}⁺.
{CD}⁺: Start {C,D}
A→BC: need A — not in closure. Skip.
CD→E: {C,D} ⊆ {C,D} → add E → {C,D,E}
B→D: need B — not in closure. Skip.
E→A: {E} ⊆ {C,D,E} → add A → {A,C,D,E}
Iteration 2:
A→BC: {A} ⊆ {A,C,D,E} → add B,C → {A,B,C,D,E}
All other FDs: nothing new to add.
{CD}⁺ = {A,B,C,D,E} → A ∈ {CD}⁺ and B ∈ {CD}⁺ → YES, CD→AB is derivable from F ✓Every attribute forms a cycle with all others. As in the first question pattern:
{P}⁺ = {P,Q,R,S} = all → P is candidate key
{Q}⁺ = {Q,R,S,P} = all → Q is candidate key
{R}⁺ = {R,S,P,Q} = all → R is candidate key
{S}⁺ = {S,P,Q,R} = all → S is candidate key
All 4 attributes are candidate keys → ALL attributes are prime attributes → NO non-prime attributes exist.
Normal form analysis:
1NF: assume atomic values → satisfied ✓
2NF: no non-prime attributes → no partial dependencies possible → satisfied ✓
3NF: no non-prime attributes → no transitive dependencies possible → satisfied ✓
BCNF: for every non-trivial FD X→Y, is X a superkey?
P→Q: P is a candidate key (superkey) ✓
Q→R: Q is a candidate key ✓
R→S: R is a candidate key ✓
S→P: S is a candidate key ✓
All FDs have superkeys on LHS → BCNF satisfied ✓
4NF: no independent MVDs (all MVDs are implied by FDs) → 4NF satisfied ✓
5NF: No join dependencies beyond what keys imply → 5NF satisfied ✓
ANSWER: R is in 5NF (and all lower normal forms).PHASE 1: Single-attribute RHS — already done (all FDs have one attribute on RHS).
F = {A→B, B→C, A→C, A→D}
PHASE 2: Remove extraneous LHS attributes — no FD has composite LHS → skip.
PHASE 3: Remove redundant FDs:
Check A→C: {A}⁺ from F − {A→C} = {A→B, B→C, A→D}:
A→B → {A,B}, B→C → {A,B,C}, A→D → {A,B,C,D}
{A}⁺ = {A,B,C,D} → C ∈ {A}⁺ → A→C IS redundant → REMOVE.
F now = {A→B, B→C, A→D}
Check A→B: {A}⁺ from {B→C, A→D}:
A→D → {A,D}. B not reachable. {A}⁺ = {A,D} → B ∉ → NOT redundant. KEEP.
Check B→C: {B}⁺ from {A→B, A→D}:
Need A to use A→B or A→D. Can't get A from B. {B}⁺ = {B} → C ∉ → NOT redundant. KEEP.
Check A→D: {A}⁺ from {A→B, B→C}:
A→B → {A,B}, B→C → {A,B,C}. {A}⁺ = {A,B,C} → D ∉ → NOT redundant. KEEP.
CANONICAL COVER: F_c = {A→B, B→C, A→D}
Verify: {A}⁺ under F_c = {A}→B→C, A→D = {A,B,C,D} = same as under original F ✓🎯 Key Takeaways
- ✓A functional dependency X → Y is a constraint on ALL valid instances of a relation: whenever two tuples agree on X, they must agree on Y. FDs are schema-level constraints derived from domain semantics — never from inspecting current data.
- ✓Trivial FD: Y ⊆ X — always holds by set theory, carries no useful design information. Non-trivial FD: Y ⊄ X — carries meaningful constraint information. Completely non-trivial: X ∩ Y = ∅. All normal form definitions are stated in terms of non-trivial FDs.
- ✓Armstrong's 3 Axioms are sound and complete: Reflexivity (Y ⊆ X → X→Y), Augmentation (X→Y → XZ→YZ), Transitivity (X→Y and Y→Z → X→Z). All valid FDs can be derived using only these three rules. Three derived rules (Union, Decomposition, Pseudotransitivity) speed up practical derivation.
- ✓The attribute closure X⁺ is the set of all attributes determined by X under F. Algorithm: start with X, repeatedly apply FDs to grow the set until stable. X⁺ = all attributes → X is a superkey. X⁺ = all and X is minimal → X is a candidate key. Time complexity: O(n²).
- ✓Finding ALL candidate keys: classify attributes (LHS-only attributes must be in every CK, RHS-only attributes cannot be in any CK). Test subsets starting from smallest, compute closures, verify minimality. The attribute closure algorithm is the only reliable method.
- ✓F⁺ (FD set closure) = all FDs derivable from F. Computing F⁺ explicitly is exponential. Instead, test FD membership by computing attribute closures: X → Y ∈ F⁺ iff Y ⊆ X⁺.
- ✓Canonical cover F_c: equivalent to F, with single-attribute RHS, no extraneous LHS attributes, no redundant FDs. Not unique — different orderings can produce different but equivalent canonical covers. Used in 3NF synthesis algorithm.
- ✓Two FD sets F and G are equivalent iff F ⊆ G⁺ and G ⊆ F⁺ — every FD in each set is derivable from the other. Test using attribute closures: for each FD X→Y in F, verify Y ⊆ {X}⁺ under G, and vice versa.
- ✓3NF test using FDs: for every non-trivial FD X → A, either X is a superkey OR A is prime. BCNF test: for every non-trivial FD X → Y, X must be a superkey — no exceptions. 3NF synthesis algorithm uses canonical cover to produce guaranteed 3NF decomposition that is lossless and dependency-preserving.
- ✓Attribute classification shortcut for candidate key finding: attributes appearing ONLY on LHS of any FD must appear in every candidate key. Attributes appearing ONLY on RHS can never be in any candidate key. This dramatically reduces the search space.
Discussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.