Friday, March 14, 2014

Understanding Bridge Dimension table

Understanding Bridge Dimension



In this blog I would try to explain what bridge dimensions are, where are they used and in what situation.
The star schema defines how the dimensions relate to the fact table. The cardinality is 1:N from dimension to fact table. The structure of the schema remains coherent till the point where 1:N cardinality is followed. When there exists an N:N relation between the fact and dimension then it causes the issue in relational integrity of the schema.
Imagine two situations
1.       Salesman selling an account: The fact table will have the account number and salesman ID.
2.       Relation between account and customer dimensions
Situation 1 has cardinality is 1:N i.e. one sales man can sell multiple account but one account number can have only one salesman associated with it. Whereas, situation 2 one account can have multiple customers and one customer can have multiple accounts and this type of cardinality is termed as N:N.
Now, N:N relationship creates a problem when such dimensions are directly connected to a star schema. Such as 


What is the problem with N:N
Just imagine the above schema to be correct, the fact table will then have rows indicating for an account. If that account has multiple customers then the same account number will be repeated times the customer count that account has. For example

 
In this table above we see Account 1 has three customers and customer 1 has two accounts and each is of different type. This creates an issue when running the aggregate query and finding out which customer has what account balance remaining.
One way to solve this discrepancy is to convert the scheme to 1:N or N:1 cardinality.
To implement such a schema bridge dimension is used.

Bridge dimension: This is the dimension which will be a bridge between customer and account dimension and would relate to each of the dimension with 1:N relation. 

 

Customer Dimension Attributes
·         Customer_ID
·         Customer_Name
·         Customer_Current_add
·         Customer_DOB……and more….


Account Dimension
·         Account_ID
·         Account_Type
·         Account_Status
·         Account_Current_Bal

Cust_Acct_Bridge_Dimension
·         Customer_ID
·         Account_ID
We notice here that Customer dimension and account dimension each is related to bridge dimension via cust_acct_bridge dimension with 1:N cardinality. And account table is attached to fact table via star schema in 1:N relation

References:

Saturday, March 1, 2014

Slowly Changing Dimensions demystified



Slowly Changing Dimensions demystified


The attributes of dimension for, time being, can assumed to be constant and not changing but in reality the attributes do change hence while designing the dimensional model one must consider the possibility to incorporate such changes into the model without causing data conflicts.

Type 1: Overwrite
In this SCD, the new value overwrites the previous one. This is the simplest way of changing the attribute of previously recorded data but one loses the history of change. In such a case a retrospective analyses becomes very difficult.
Original Row
Product_ID
Product_Name
Supplier
1234
ABC
Q1 Inc.

Updated row
Product_ID
Product_Name
Supplier
1234
ABC
Q2 Inc.


Type 2: Adding new row
This method preserves the history of change. There can be two ways to implement such a structure
1.       By adding another column that defines the version:  The version  value defines the current state of that row for example Version 0 can be attributed to an expired row and Version 1 can be attributed to current row.
Product_ID
Product_Name
Supplier
Version
1234
ABC
Q1 Inc.
0
1235
ABC
Q2 Inc.
1

2.       By Adding Effective and expired date column: The effective and expires dates define the window during which the row would serve effective.
Product_ID
Product_Name
Supplier
Start Date
End Date
Current State
1234
ABC
Q1 Inc.
1-Jan-14
15-Feb-14
Expired
1235
ABC
Q2 Inc.
16-Feb-14
31-Dec-99
Current


Type 3: Add new attribute.
This method adds new column. In Type 1 and 2 we see that the complete structure of data remains same where as in Type 3 only one attribute is added that reflects the change. The advantage of such a method is that the fact table associated with this dimension needs no modification as in practical the table structure has remained the same it’s just a new attribute being added.

Product_ID
Product_Name
Supplier
Old_Supplier
1234
ABC
Q2 Inc.
Q1 Inc.

Type 4: Add Mini Dimension
 There can be a situation where some attributes are static while other are changing frequently in such a case adding a new row for every change (Type1 / Type2) would make data very bulky. To tackle such a situation a separate table (mini dimension) can be created   which contains all the attributes those changes with higher frequency. In such a situation, the original dimension and mini dimension would point to same fact table.


In the figure above, Vendor dimension contains attribute that are separated from Product dimension as there were frequent changes in them.

Type 5 (Hybrid)
This is the combination of Type 1 and Type 4. During type 4 we already prepared a separate mini dimension of attributes that are changing frequently. Type 1 defines that data in that mini dimension will be overwritten and history will not be stored.  The outside schema of Type 4 and 5 remains the same the difference would be that vendor dimension is fast changing mini dimension and thus the changes get overwritten. 


Type 6 (Hybrid)
It is the combination of Type 1,2 and 3. The new row is added for any new change and simultaneously the old row is edited, for example.

Original row in the dimension table
Product_ID
Product_Name
Supplier
Start Date
End Date
Current State
1234
ABC
Q1 Inc.
1-Jan-14
31-Dec-99
Current

New row added
Product_ID
Product_Name
Supplier
Start Date
End Date
Current State
1234
ABC
Q1 Inc.
1-Jan-14
15-Feb-14
Expired
1235
ABC
Q2 Inc.
16-Feb-14
31-Dec-99
Current