Slowly Changing Dimensions Explained — SCD Type 1, 2, and 3
Slowly Changing Dimensions (SCDs) appear in almost every data warehousing interview. They describe how to handle changes to dimension data over time — and getting this decision wrong can corrupt your entire historical analysis.
What is a dimension and why does it change slowly?
In a data warehouse, facts are measurements (a sale happened, an event occurred) and dimensions are the context (who, what, where). Customer name, product category, store location — these change infrequently but they do change.
When a customer moves cities, or a product changes category, you have a choice: overwrite the old value, keep both, or track the change date. That choice is your SCD type.
SCD Type 1 — Overwrite
Simply overwrite the old value with the new one. No history is kept.
Use when: history does not matter. Fixing a typo in a customer name. Updating a product weight. Cases where the old value was wrong.
Implementation in Spark or SQL: MERGE statement that updates the target row when the source key matches.
Weakness: historical reports change retroactively. A sales report from last year will show the customer's current city, not the city they lived in when the sale happened.
SCD Type 2 — Add a new row
When a value changes, keep the old row and add a new row with the new value. Use effective_date, expiry_date, and is_current columns to track which row is active.
This is the most common SCD type in production data warehouses. Every historical fact points to the dimension row that was active at the time.
Implementation: MERGE with matched and not-matched conditions. When a change is detected on a matched key, expire the current row (set expiry_date, is_current = false) and insert the new row.
Use when: historical accuracy matters. Customer location for regional sales analysis. Employee department for headcount history.
SCD Type 3 — Add a column
Add a new column to store the previous value: current_city and previous_city.
Use when: you only need to compare the current value to the immediately previous one — not full history.
Weakness: only tracks one previous value. If a customer moves three times, you lose the first two addresses. Rarely used in modern pipelines where storage is cheap and Type 2 adds full history.