Slowly
Changing Dimensions demystified
The attributes of dimension for, time being, can assumed to
be constant and not changing but in reality the attributes do change hence
while designing the dimensional model one must consider the possibility to
incorporate such changes into the model without causing data conflicts.
Type 1: Overwrite
In this SCD, the new value overwrites the previous one. This
is the simplest way of changing the attribute of previously recorded data but
one loses the history of change. In such a case a retrospective analyses
becomes very difficult.
Original Row
Product_ID
|
Product_Name
|
Supplier
|
1234
|
ABC
|
Q1 Inc.
|
Updated row
Product_ID
|
Product_Name
|
Supplier
|
1234
|
ABC
|
Q2 Inc.
|
Type 2: Adding new
row
This method preserves the history of change. There can be
two ways to implement such a structure
1.
By adding
another column that defines the version: The version
value defines the current state of that row for example Version 0 can be
attributed to an expired row and Version 1 can be attributed to current row.
Product_ID
|
Product_Name
|
Supplier
|
Version
|
1234
|
ABC
|
Q1 Inc.
|
0
|
1235
|
ABC
|
Q2 Inc.
|
1
|
2.
By Adding
Effective and expired date column: The effective and expires dates define
the window during which the row would serve effective.
Product_ID
|
Product_Name
|
Supplier
|
Start Date
|
End Date
|
Current State
|
1234
|
ABC
|
Q1 Inc.
|
1-Jan-14
|
15-Feb-14
|
Expired
|
1235
|
ABC
|
Q2 Inc.
|
16-Feb-14
|
31-Dec-99
|
Current
|
Type 3: Add new attribute.
This
method adds new column. In Type 1 and 2 we see that the complete structure of
data remains same where as in Type 3 only one attribute is added that reflects
the change. The advantage of such a method is that the fact table associated
with this dimension needs no modification as in practical the table structure
has remained the same it’s just a new attribute being added.
Product_ID
|
Product_Name
|
Supplier
|
Old_Supplier
|
1234
|
ABC
|
Q2 Inc.
|
Q1 Inc.
|
Type 4: Add Mini Dimension
There can be a situation where some attributes
are static while other are changing frequently in such a case adding a new row
for every change (Type1 / Type2) would make data very bulky. To tackle such a
situation a separate table (mini dimension) can be created which contains all the attributes those
changes with higher frequency. In such a situation, the original dimension and
mini dimension would point to same fact table.
In
the figure above, Vendor dimension contains attribute that are separated from
Product dimension as there were frequent changes in them.
Type 5 (Hybrid)
This
is the combination of Type 1 and Type 4. During type 4 we already prepared a
separate mini dimension of attributes that are changing frequently. Type 1
defines that data in that mini dimension will be overwritten and history will
not be stored. The outside schema of
Type 4 and 5 remains the same the difference would be that vendor dimension is
fast changing mini dimension and thus the changes get overwritten.
Type 6 (Hybrid)
It
is the combination of Type 1,2 and 3. The new row is added for any new change
and simultaneously the old row is edited, for example.
Original row in the dimension table
Product_ID
|
Product_Name
|
Supplier
|
Start Date
|
End Date
|
Current State
|
1234
|
ABC
|
Q1 Inc.
|
1-Jan-14
|
31-Dec-99
|
Current
|
New row added
Product_ID
|
Product_Name
|
Supplier
|
Start Date
|
End Date
|
Current State
|
1234
|
ABC
|
Q1 Inc.
|
1-Jan-14
|
15-Feb-14
|
Expired
|
1235
|
ABC
|
Q2 Inc.
|
16-Feb-14
|
31-Dec-99
|
Current
|
No comments:
Post a Comment