Saturday, March 1, 2014

Slowly Changing Dimensions demystified



Slowly Changing Dimensions demystified


The attributes of dimension for, time being, can assumed to be constant and not changing but in reality the attributes do change hence while designing the dimensional model one must consider the possibility to incorporate such changes into the model without causing data conflicts.

Type 1: Overwrite
In this SCD, the new value overwrites the previous one. This is the simplest way of changing the attribute of previously recorded data but one loses the history of change. In such a case a retrospective analyses becomes very difficult.
Original Row
Product_ID
Product_Name
Supplier
1234
ABC
Q1 Inc.

Updated row
Product_ID
Product_Name
Supplier
1234
ABC
Q2 Inc.


Type 2: Adding new row
This method preserves the history of change. There can be two ways to implement such a structure
1.       By adding another column that defines the version:  The version  value defines the current state of that row for example Version 0 can be attributed to an expired row and Version 1 can be attributed to current row.
Product_ID
Product_Name
Supplier
Version
1234
ABC
Q1 Inc.
0
1235
ABC
Q2 Inc.
1

2.       By Adding Effective and expired date column: The effective and expires dates define the window during which the row would serve effective.
Product_ID
Product_Name
Supplier
Start Date
End Date
Current State
1234
ABC
Q1 Inc.
1-Jan-14
15-Feb-14
Expired
1235
ABC
Q2 Inc.
16-Feb-14
31-Dec-99
Current


Type 3: Add new attribute.
This method adds new column. In Type 1 and 2 we see that the complete structure of data remains same where as in Type 3 only one attribute is added that reflects the change. The advantage of such a method is that the fact table associated with this dimension needs no modification as in practical the table structure has remained the same it’s just a new attribute being added.

Product_ID
Product_Name
Supplier
Old_Supplier
1234
ABC
Q2 Inc.
Q1 Inc.

Type 4: Add Mini Dimension
 There can be a situation where some attributes are static while other are changing frequently in such a case adding a new row for every change (Type1 / Type2) would make data very bulky. To tackle such a situation a separate table (mini dimension) can be created   which contains all the attributes those changes with higher frequency. In such a situation, the original dimension and mini dimension would point to same fact table.


In the figure above, Vendor dimension contains attribute that are separated from Product dimension as there were frequent changes in them.

Type 5 (Hybrid)
This is the combination of Type 1 and Type 4. During type 4 we already prepared a separate mini dimension of attributes that are changing frequently. Type 1 defines that data in that mini dimension will be overwritten and history will not be stored.  The outside schema of Type 4 and 5 remains the same the difference would be that vendor dimension is fast changing mini dimension and thus the changes get overwritten. 


Type 6 (Hybrid)
It is the combination of Type 1,2 and 3. The new row is added for any new change and simultaneously the old row is edited, for example.

Original row in the dimension table
Product_ID
Product_Name
Supplier
Start Date
End Date
Current State
1234
ABC
Q1 Inc.
1-Jan-14
31-Dec-99
Current

New row added
Product_ID
Product_Name
Supplier
Start Date
End Date
Current State
1234
ABC
Q1 Inc.
1-Jan-14
15-Feb-14
Expired
1235
ABC
Q2 Inc.
16-Feb-14
31-Dec-99
Current

No comments:

Post a Comment