DWI and BI: March 2014

Friday, March 14, 2014

Understanding Bridge Dimension table

Understanding Bridge Dimension

In this blog I would try to explain what bridge dimensions are, where are they used and in what situation.

The star schema defines how the dimensions relate to the fact table. The cardinality is 1:N from dimension to fact table. The structure of the schema remains coherent till the point where 1:N cardinality is followed. When there exists an N:N relation between the fact and dimension then it causes the issue in relational integrity of the schema.

Imagine two situations

1. Salesman selling an account: The fact table will have the account number and salesman ID.

2. Relation between account and customer dimensions

Situation 1 has cardinality is 1:N i.e. one sales man can sell multiple account but one account number can have only one salesman associated with it. Whereas, situation 2 one account can have multiple customers and one customer can have multiple accounts and this type of cardinality is termed as N:N.

Now, N:N relationship creates a problem when such dimensions are directly connected to a star schema. Such as

What is the problem with N:N

Just imagine the above schema to be correct, the fact table will then have rows indicating for an account. If that account has multiple customers then the same account number will be repeated times the customer count that account has. For example

In this table above we see Account 1 has three customers and customer 1 has two accounts and each is of different type. This creates an issue when running the aggregate query and finding out which customer has what account balance remaining.

One way to solve this discrepancy is to convert the scheme to 1:N or N:1 cardinality.

To implement such a schema bridge dimension is used.

Bridge dimension: This is the dimension which will be a bridge between customer and account dimension and would relate to each of the dimension with 1:N relation.

Customer Dimension Attributes

· Customer_ID

· Customer_Name

· Customer_Current_add

· Customer_DOB……and more….

Account Dimension

· Account_ID

· Account_Type

· Account_Status

· Account_Current_Bal

Cust_Acct_Bridge_Dimension

· Customer_ID

· Account_ID

We notice here that Customer dimension and account dimension each is related to bridge dimension via cust_acct_bridge dimension with 1:N cardinality. And account table is attached to fact table via star schema in 1:N relation

References:

1. Advanced Dimension Patterns & Case Studies. (n.d.). Kimball Group. Retrieved March 14, 2014, from http://www.kimballgroup.com/2011/06/28/design-tip-136-adding-a-mini-dimension-to-a-bridge-table

2. I-Y. Song, W.Rowen, C. Medsker, E. Ewen 6-1 An Analysis of Many-to-Many Relationships Between Fact and Dimension Tables in Dimensional Modeling . (n.d.). drexcel.edu. Retrieved March 14, 2014, from http://intranet.cis.drexel.edu:8080/faculty/song/publications/p_Song_M_N_DMDW_final.pdf

3. DATAWAREHOUSE CONCEPTS. (n.d.). Bridge Table in Datawarehouse. Retrieved March 14, 2014, from http://dwhlaureate.blogspot.com/2012/10/bridge-table-in-datawarehouse.html

4. How-to: Bridge Tables and Many to Many Relationships Demystified in OBIEE 11g. (n.d.). ~ Ask John OBIEE. Retrieved March 14, 2014, from http://www.askjohnobiee.com/2013/08/how-to-bridge-tables-and-many-to-many.html

5. Solving Many-to-Many Relationship in Dimensional Modeling. (n.d.). Home. Retrieved March 14, 2014, from https://www.packtpub.com/article/solving-many-to-many-relationship-dimensional-modeling

6. Solving Many-to-Many Relationship in Dimensional Modeling. (n.d.). Home. Retrieved March 14, 2014, from https://www.packtpub.com/article/solving-many-to-many-relationship-dimensional-modeling

7. Implementing Many-to-many Relationships in Data Warehousing. (n.d.). Pythian Data Experts Blog Implementing Manytomany Relationships in Data Warehousing Comments. Retrieved March 14, 2014, from http://www.pythian.com/blog/implementing-many-to-many-relationships-in-data-warehousing

Saturday, March 1, 2014

Slowly Changing Dimensions demystified

Slowly Changing Dimensions demystified

The attributes of dimension for, time being, can assumed to be constant and not changing but in reality the attributes do change hence while designing the dimensional model one must consider the possibility to incorporate such changes into the model without causing data conflicts.

Type 1: Overwrite

In this SCD, the new value overwrites the previous one. This is the simplest way of changing the attribute of previously recorded data but one loses the history of change. In such a case a retrospective analyses becomes very difficult.

Original Row

Product_ID	Product_Name	Supplier
1234	ABC	Q1 Inc.

Updated row

Product_ID	Product_Name	Supplier
1234	ABC	Q2 Inc.

Type 2: Adding new row

This method preserves the history of change. There can be two ways to implement such a structure

1. By adding another column that defines the version: The version value defines the current state of that row for example Version 0 can be attributed to an expired row and Version 1 can be attributed to current row.

Product_ID	Product_Name	Supplier	Version
1234	ABC	Q1 Inc.	0
1235	ABC	Q2 Inc.	1

2. By Adding Effective and expired date column: The effective and expires dates define the window during which the row would serve effective.

Product_ID	Product_Name	Supplier	Start Date	End Date	Current State
1234	ABC	Q1 Inc.	1-Jan-14	15-Feb-14	Expired
1235	ABC	Q2 Inc.	16-Feb-14	31-Dec-99	Current

Type 3: Add new attribute.

This method adds new column. In Type 1 and 2 we see that the complete structure of data remains same where as in Type 3 only one attribute is added that reflects the change. The advantage of such a method is that the fact table associated with this dimension needs no modification as in practical the table structure has remained the same it’s just a new attribute being added.

Product_ID	Product_Name	Supplier	Old_Supplier
1234	ABC	Q2 Inc.	Q1 Inc.

Type 4: Add Mini Dimension

There can be a situation where some attributes are static while other are changing frequently in such a case adding a new row for every change (Type1 / Type2) would make data very bulky. To tackle such a situation a separate table (mini dimension) can be created which contains all the attributes those changes with higher frequency. In such a situation, the original dimension and mini dimension would point to same fact table.

In the figure above, Vendor dimension contains attribute that are separated from Product dimension as there were frequent changes in them.

Type 5 (Hybrid)

This is the combination of Type 1 and Type 4. During type 4 we already prepared a separate mini dimension of attributes that are changing frequently. Type 1 defines that data in that mini dimension will be overwritten and history will not be stored. The outside schema of Type 4 and 5 remains the same the difference would be that vendor dimension is fast changing mini dimension and thus the changes get overwritten.

Type 6 (Hybrid)

It is the combination of Type 1,2 and 3. The new row is added for any new change and simultaneously the old row is edited, for example.

Original row in the dimension table

Product_ID	Product_Name	Supplier	Start Date	End Date	Current State
1234	ABC	Q1 Inc.	1-Jan-14	31-Dec-99	Current

New row added

Product_ID	Product_Name	Supplier	Start Date	End Date	Current State
1234	ABC	Q1 Inc.	1-Jan-14	15-Feb-14	Expired
1235	ABC	Q2 Inc.	16-Feb-14	31-Dec-99	Current