====== Slowly Changing Dimensions (SCD) ====== **Slowly Changing Dimensions (SCD)** is a data warehousing technique that addresses the challenge of tracking changes to dimension attributes over time. In data warehouse environments, dimension tables contain relatively static reference data—such as customer information, product details, or geographic locations—that occasionally requires updates. SCD provides a systematic framework for managing these changes while preserving historical context needed for temporal analysis and audit trails. ===== Overview and Purpose ===== Dimension tables form the descriptive backbone of analytical data warehouses, complementing fact tables that record transactional events. Unlike fact tables, which typically grow through new record insertion, dimension tables require mechanisms to accommodate updates to existing attributes. SCD techniques address this challenge by defining how to handle dimension record modifications—whether to overwrite previous values, maintain historical versions, or create hybrid approaches. The fundamental purpose of SCD is to enable **time-based analysis** of how entities evolve. Organizations require the ability to answer historical questions such as: "What was this customer's address when this transaction occurred?" or "How have product prices changed over the past fiscal year?" Without SCD methodologies, this historical context becomes irretrievable once dimension records are updated (([[https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/|Kimball Group - Data Warehouse Design Resources]])). ===== SCD Type 1: Current State Only ===== SCD Type 1 represents the simplest approach to handling dimension changes. When a dimension attribute is updated, the existing row is overwritten in place with the new value, eliminating the previous state entirely. This approach requires minimal storage overhead and simplifies query logic, as no temporal logic is necessary to retrieve current values. Type 1 implementations must handle out-of-order updates, deduplication, deletes, and idempotent reprocessing to maintain a current view of business data (([[https://www.databricks.com/blog/stop-hand-coding-change-data-capture-pipelines|Databricks - Stop Hand-Coding Change Data Capture Pipelines (2026]])). Type 1 is appropriate for scenarios where historical accuracy is not critical or where dimension changes represent corrections rather than meaningful business events. Examples include: * Correcting misspelled customer names or addresses * Updating product descriptions to reflect current specifications * Revising employee job titles when they remain in the same position * Fixing data entry errors in reference tables The implementation is straightforward: a simple UPDATE statement replaces the old attribute value with the new one. However, this simplicity comes at the cost of losing the ability to perform historical analysis or reconstruct past states of the dimension (([[https://www.ora.com/learn/data-warehousing-and-business-intelligence/slowly-changing-dimensions|Oracle Learning - Slowly Changing Dimensions]])). ===== SCD Type 2: Complete History with Validity Windows ===== SCD Type 2 implements a more sophisticated approach by maintaining **complete historical records** of all dimension versions. Rather than updating existing rows, Type 2 creates new rows for each dimension change, with validity windows indicating the time periods during which each version was active. Type 2 implementations typically include the following key columns: * **Surrogate Key**: Unique identifier for each version of the dimension member * **Business Key**: Natural identifier (e.g., customer ID) that remains constant across versions * **Effective Date**: Timestamp when the new version became active * **End Date**: Timestamp when this version was superseded (NULL for current version) * **Current Flag**: Binary indicator (Y/N or 1/0) marking the currently active version * **Change Reason**: Optional documentation of why the change occurred When a dimension attribute changes, the previous record's end date is populated with the change timestamp, and a new record is inserted with an effective date matching that same timestamp. This approach preserves complete lineage, enabling organizations to reconstruct historical dimension states for any point in time. Type 2 is essential for entities where attribute changes represent significant business events requiring historical tracking: * Customer address changes (affecting shipping history and regional analysis) * Product price adjustments (enabling margin analysis by historical pricing) * Employee department transfers (supporting organizational reporting) * Account status transitions (maintaining compliance and audit records) The trade-off involves increased storage requirements and more complex query logic requiring joins on effective/end dates (([[https://www.snowflake.com/en/data-cloud/solutions/data-warehouse/dimension-tables/|Snowflake - Dimension Table Design]])), (([[https://www.redshift.aws.amazon.com/docs/|AWS Redshift Documentation]])). ===== SCD Type 3 and Beyond ===== SCD Type 3 offers a middle ground by maintaining a limited history, typically storing only the current and previous values of changed attributes. This approach is less commonly used but can be appropriate for tracking recent changes without the storage overhead of Type 2. Advanced implementations often combine multiple SCD types within a single data warehouse. Large enterprises frequently apply Type 1 to non-critical attributes, Type 2 to business-critical dimensions, and Type 3 selectively for attributes requiring recent-history tracking. The choice depends on business requirements, query patterns, and acceptable storage trade-offs. ===== Implementation Patterns ===== Modern data warehouse platforms support SCD implementation through multiple patterns. **Merge operations** in tools like Apache Spark and Databricks enable efficient dimension updates by combining insert and update logic. CDC (Change Data Capture) pipelines automatically detect source system changes and propagate them to dimension tables using appropriate SCD logic, eliminating manual change detection and enabling real-time or near-real-time dimension synchronization. ===== See Also ===== * [[scd_type_1_vs_scd_type_2|SCD Type 1 vs SCD Type 2]] * [[scd_type_2|SCD Type 2]] * [[rapid_application_development|Rapid Application Development]] ===== References =====