Dimensional Modeling Schema


Written by DWBIConcepts Team

Last Updated: 19 September 2014

Now that we know the basic approach to dimensional modeling from our earlier article, let us spend some time understanding the various schemas possible in dimensional modeling.

Requirement of different design schema

In dimensional modeling, we can create different schemas to suit our requirements. We need various schemas to accomplish several things, such as accommodating the hierarchies of a dimension or maintaining change histories of information. In this article we will discuss 3 different schemas, namely Star, Snowflake and Conformed, and we will also discuss how hierarchical information is modeled in these schemas. We will reserve the discussion on maintaining change histories for our next article.

Storing hierarchical information in dimension tables

From our previous article, we already know what a dimension is. Simply put, a dimension is something that qualifies a measure (number). For example, if I say, "McDonalds sells 5000", that won't make any sense. But if I say, "McDonalds sells 5000 burgers per month", then that would make perfect sense. Here, "burger" and "month" are members of dimensions, and they qualify the number 5000 in this sentence.

It is important to notice that "burger" and "month" are not dimensions themselves; they are just members of the dimensions "food" and "time" respectively. "Burger" is just one of many different "foods" that McDonalds sells, and "month" is just one of the different units by which time is measured. Typically a dimension will have several members, and those members will be stored in separate rows in the dimension table. So the "food" dimension table of McDonalds will have one row for burger, one row for fries, one row for drinks etc. Similarly, the "time" dimension may contain 12 different months as the members of that dimension.

Often we may find that there are hierarchical relations among the members of a dimension. That is, certain members of the dimension can be grouped under one group, whereas other members can be grouped into a separate group. Consider this: french fries and twister fries are both "fries" and hence can be grouped under the same group "fries". Similarly, chicken burger and fish burger can both be grouped as "burger".

[Figure: French Fries and Twister Fries]

This type of hierarchical relation can be stored in the model by following two different approaches. We can either store it in the same "food" dimension table (star schema approach), or we can create a separate dimension table in addition to the "food" dimension, just to store the type of the foods (snowflake schema approach).

STAR SCHEMA DESIGN

Star schema is the simplest kind of schema, where one fact table is present in the center of the schema, surrounded by multiple dimension tables.

In a star schema all the dimension tables are connected only with the fact table and no

dimension table is connected with any other dimension table.



Benefit of Star Schema Design

Star schema is probably the most popular schema in dimensional modeling because of its simplicity and flexibility. In a star schema design, any dimension attribute can be reached from the fact table by traversing a single join, which makes this type of schema ideal for information retrieval (faster query processing). Note that all the hierarchies (or levels) of the members of a dimension are stored in the single dimension table. That means, if you wish to group veggie burger and chicken burger in the "burger" category and french fries and twister fries in the "fries" category, you have to store that category information in the same dimension table.

Star schema provides a de-normalized design

Storing Hierarchy in star schema

As depicted above, we will store hierarchical information in a flattened pattern in the single dimension table in star schema. So our food dimension table will look like this:

Food

KEY  NAME            TYPE
1    Chicken Burger  Burger
2    Veggie Burger   Burger
3    French Fries    Fries
4    Twister Fries   Fries

SNOW-FLAKE SCHEMA DESIGN

Snowflake schema is just like star schema, but the difference is that here one or more dimension tables are connected with other dimension tables as well as with the central fact table. See the example of snowflake schema below.

Here we are storing the information in two dimension tables instead of one. We are storing the food type in one dimension (the "type" table shown below) and the food in another dimension. This is a snowflake design.

Type

KEY  TYPE_NAME
1    Burger
2    Fries

Food

KEY  TYPE_KEY  NAME
1    1         Chicken Burger
2    1         Veggie Burger
3    2         French Fries
4    2         Twister Fries

If you are familiar with the concept of data normalization, you can see that snowflaking actually increases the level of normalization in the data. This has an obvious disadvantage in terms of information retrieval, since we need to read more tables (and traverse more SQL joins) in order to get the same information. For example, if you wish to find all the foods, with their food types, sold from store 1, the SQL queries for the star and snowflake schemas will look like the ones below:



SQL Query For Star Schema

-- One join: the fact table to the food dimension
SELECT DISTINCT f.name, f.type
FROM sales_fact t
JOIN food f ON f.key = t.food_key
WHERE t.store_key = 1

SQL Query For SnowFlake Schema

-- Two joins: the fact table to the food dimension, then food to its type
SELECT DISTINCT f.name, tp.type_name
FROM sales_fact t
JOIN food f ON f.key = t.food_key
JOIN type tp ON tp.key = f.type_key
WHERE t.store_key = 1

As you can see in this example, compared to star schema, snowflake schema requires one more join (to connect one more table) to retrieve the same information. This is why snowflake schema generally performs worse for information retrieval.
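To see the two queries side by side, here is a minimal sketch that builds both designs in an in-memory SQLite database using Python's standard sqlite3 module. The table names star_food and snow_food, and the rows in sales_fact, are made up for this illustration (the article calls both tables "food" and gives no sales data).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Star schema: the food type is stored inside the food dimension
    CREATE TABLE star_food (key INTEGER PRIMARY KEY, name TEXT, type TEXT);
    INSERT INTO star_food VALUES
        (1, 'Chicken Burger', 'Burger'), (2, 'Veggie Burger', 'Burger'),
        (3, 'French Fries', 'Fries'), (4, 'Twister Fries', 'Fries');

    -- Snowflake schema: the food type lives in its own dimension table
    CREATE TABLE type (key INTEGER PRIMARY KEY, type_name TEXT);
    INSERT INTO type VALUES (1, 'Burger'), (2, 'Fries');
    CREATE TABLE snow_food (
        key INTEGER PRIMARY KEY,
        type_key INTEGER REFERENCES type(key),
        name TEXT);
    INSERT INTO snow_food VALUES
        (1, 1, 'Chicken Burger'), (2, 1, 'Veggie Burger'),
        (3, 2, 'French Fries'), (4, 2, 'Twister Fries');

    -- Hypothetical sales rows: (store_key, food_key, quantity)
    CREATE TABLE sales_fact (store_key INTEGER, food_key INTEGER, quantity INTEGER);
    INSERT INTO sales_fact VALUES (1, 1, 10), (1, 3, 5), (1, 1, 2), (2, 4, 7);
""")

# Star schema: one join between the fact table and the dimension
star_rows = con.execute("""
    SELECT DISTINCT f.name, f.type
    FROM star_food f, sales_fact t
    WHERE f.key = t.food_key
      AND t.store_key = 1
    ORDER BY f.name
""").fetchall()

# Snowflake schema: one extra join to reach the type name
snow_rows = con.execute("""
    SELECT DISTINCT f.name, tp.type_name
    FROM snow_food f, type tp, sales_fact t
    WHERE f.key = t.food_key
      AND f.type_key = tp.key
      AND t.store_key = 1
    ORDER BY f.name
""").fetchall()

print(star_rows)
print(snow_rows)
```

Both queries return the same rows for store 1; the snowflake version simply has to touch one more table to get them.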



Then why do we use snowflake schema? Let me give a quick and short answer. I won't explain it in detail right now, but will leave it to you to work through. Suppose we have another fact table with the granularity of store, food type and day. This fact table will use the key of the "type" dimension table instead of the key of the "food" dimension table. Unless you have the "type" dimension table in your schema, you won't have the "type" key. This is the reason we need to snowflake the "food" dimension to the "type" dimension.
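As a rough sketch of that situation, assume a second fact table that holds a daily sales target per store and food type. The table name type_target_fact, the column target_qty and all the rows are invented for this example. Because this fact table carries no food key, the "type" dimension is the only dimension it can join to.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE type (key INTEGER PRIMARY KEY, type_name TEXT);
    INSERT INTO type VALUES (1, 'Burger'), (2, 'Fries');

    -- Hypothetical fact table at the grain store / food type / day.
    -- It has no food_key, so it can only be joined to the "type" dimension.
    CREATE TABLE type_target_fact (
        store_key INTEGER,
        type_key INTEGER REFERENCES type(key),
        day TEXT,
        target_qty INTEGER);
    INSERT INTO type_target_fact VALUES
        (1, 1, '2011-01-01', 100),
        (1, 2, '2011-01-01', 60),
        (1, 1, '2011-01-02', 120);
""")

# Total target per food type for store 1
targets = con.execute("""
    SELECT tp.type_name, SUM(t.target_qty)
    FROM type_target_fact t
    JOIN type tp ON tp.key = t.type_key
    WHERE t.store_key = 1
    GROUP BY tp.type_name
    ORDER BY tp.type_name
""").fetchall()

print(targets)
```

With a pure star design, the type name would exist only as a text column inside "food", and there would be no type key for this fact table to reference.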

In our next article we will talk about preserving history in dimension tables (slowly or rapidly changing dimensions etc.).

History Preserving in Dimensional Modeling

Written by DWBIConcepts Team

Last Updated: 15 September 2014

In our earlier article we saw how to design a simple dimensional data model for a point-of-sale system (as an example we took the case of a McDonald's fast-food shop). In this article we will begin with the same model and see how we may enhance it to store historical changes in the attributes of a dimension table.

Nothing Lasts Forever

One of the important objectives of data modeling is to develop a model that can capture the states of the system with respect to time. You know, nothing lasts forever! Product prices change over time; people change their addresses, marital status, employers and even their names. If you are doing data modeling for a data warehouse, where we are particularly interested in historical analysis, it is crucial that we develop some method of capturing these changes in our data model. As an example, let's say we store the price of products in the "Food" dimension table that we created earlier, and we want to be able to capture the historical changes in "Food" price. In this article we will see what changes we need to make in our data model to be able to do this.

Note: The simple "Food" dimension we created earlier did not have any "Price" information.

But to illustrate the point of this article, we will add a "price" column to our "Food"

dimension table. So henceforth our "Food" dimension table will look like this:

Food

KEY  NAME            TYPE_KEY  PRICE
1    Chicken Burger  1         3.70
2    Veggie Burger   1         3.20
3    French Fries    2         2.00
4    Twister Fries   2         2.20

In case you have not read my previous article and are wondering what "TYPE_KEY" means, it is a foreign key coming from another table that contains the type of the food, e.g. Burger, Fries etc. Also notice that the above table only tells us the price of the food as of the current point in time. It does not tell us what the price was, let's say, 6 months ago. If the price of Veggie Burger changes from $3.20 to $3.25 tomorrow, the new price will be updated in the table and then we will have no way to know what the earlier price was. So our objective is to change the above table structure in such a way that we can store all the historical and future prices of the foods.

Types of Changing Dimensions

There are a few different ways to store the historical changes of values in a data model, and the particular way you adopt will depend on the type of changing dimension. For example, some dimensions can change quite rapidly, some dimensions do not change at all, but most dimensions change very slowly. That is why we can differentiate dimensions into the 3 types described below.

Unchanging Dimension

There are some dimensions that do not change at all. For example, let's say you have created a dimension table called "Gender". Below are the structure and data of this dimension table:

Gender

ID  VALUE
1   Male
2   Female

The "Value" column in the above dimension is an attribute that won't normally change. This is an unchanging dimension: "male" will always be called "male" and "female" will always be called "female". Of course, for some crazy reason, one may wish to change the texts "Male"/"Female" to something else, e.g. "man"/"woman". But that's really not a change we should be concerned about, as such changes do not alter the "meaning" of the attribute (the words man/male still mean the same thing). So if some changes need to be made, we can simply update the "Value" column in the dimension table. For all practical intents and purposes, this dimension remains an "Unchanging dimension".

Slowly Changing Dimension

Here comes the most popular dimension: the "slowly changing dimension". These are the dimensions where one or more attributes can change slowly with respect to time. Look at the "food" dimension from our earlier example. "Price" is one such variable attribute in this dimension. But the "price" of french fries or burgers does not change very often; maybe it changes once in a season. This is an example of a slowly changing dimension.

Let me give you one more example. Let's say you have created a dimension table on employees, and in the "employee" dimension you have a column called "Marital_Status". This can definitely change (from unmarried to married, for example) with respect to time. But again, like the previous example, this is a slowly changing attribute. It doesn't change very often.

Later in the article, we will see how to make necessary changes in our dimension table

design to store history for such slowly changing dimensions.

Rapidly Changing Dimensions

If you design a dimension table that has a rapidly changing attribute, then your dimension table will become a rapidly changing dimension.

For example, let's say you have a "Subscriber" dimension where you store the details of all the subscribers to a particular pre-paid mobile service plan. You have a "status" column in the "Subscriber" dimension table which can have several different values based on the current account balance of the subscriber. For example, if your balance is less than $0.10, the status becomes "No Outgoing call". If your balance is less than $5, the status becomes "Restricted Call Service". If your balance is less than $10, the status becomes "No Long Distance Call", and if the balance is greater than $10 then the status becomes "Full Service", etc. Every month, the status of any subscriber keeps changing multiple times based on his or her account balance, thereby making the "Subscriber" dimension a rapidly changing dimension.
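To get a feel for how quickly such an attribute flips, the status rules above can be sketched as a small Python function. The function name, the sample balances and the treatment of a balance of exactly $10 (which the rules above leave undefined, and which is treated as "Full Service" here) are assumptions made for this illustration.

```python
def subscriber_status(balance: float) -> str:
    """Map an account balance (in dollars) to the subscriber status."""
    if balance < 0.10:
        return "No Outgoing call"
    if balance < 5:
        return "Restricted Call Service"
    if balance < 10:
        return "No Long Distance Call"
    # Assumption: a balance of exactly $10 also counts as full service
    return "Full Service"


# Hypothetical balances of one subscriber over a few days
balances = [12.0, 8.5, 4.0, 0.05, 20.0]
statuses = [subscriber_status(b) for b in balances]

# Count how many times the status attribute changed
changes = sum(1 for old, new in zip(statuses, statuses[1:]) if old != new)

print(statuses)
print(changes)
```

Even this short run of five balances produces four status changes, each of which would be a change to the "Subscriber" dimension.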

One must remember that the way we design a rapidly changing dimension is often quite different from the way we design a slowly changing dimension. In the next article, however, we will only look into designing slowly changing dimensions.

Dimensional Modeling Approach for Various Slowly Changing Dimensions

Written by DWBIConcepts Team

Last Updated: 15 September 2014

In our earlier article we discussed the need for storing historical information in dimension tables. We also learnt about the various types of changing dimensions. In this article we will pick the "slowly changing dimension" only, and learn in detail about the various types of slowly changing dimensions and how to design them.

Slowly changing dimensions, referred to as SCD henceforth, can basically be modeled in 3 different ways, based on whether we want to store full history, partial history or no history. These different types are called Type 2, Type 3 and Type 1 respectively. Next we will learn about them in detail.

Also note, there are slight variations to the basic 3 SCD types that I show here. These

variations (sometimes labelled as type 4, 5, 6, 7 etc.) are mostly in terms of

implementation and use-cases. Don't worry about them now.

SCD Type 1

As mentioned above, we design a dimension as SCD Type 1 when we do not want to store the history. That is, whenever some values are modified in the attributes, we just want to overwrite the old values with the new values, and we do not care about storing the previous history.

We do not store any history in SCD Type 1

Please note that this is not the same as the "Unchanging Dimension" discussed in the previous article. In the case of an unchanging dimension, we assume that the values of the attributes of that dimension will not change at all. In the case of an SCD Type 1 dimension, on the other hand, we assume that the values of the attributes will change slowly; however, we are not interested in storing those changes. We are only interested in storing the current or latest value. So every time a value changes, we will overwrite the old value with the new one.

Handling SCD Type 1 Dimension in ETL Process

Technically, from an ETL design perspective (if you don't know what ETL is, you don't have to bother about this paragraph and can go to the next section), SCD Type 1 dimensions are loaded using a "Merge" operation, which is also known as "UPSERT", an abbreviation of "Update else Insert".

SCD Type 1 dimensions are loaded by Merge operations

In the "UPSERT" method, each row coming from the source is compared with the records present in the target dimension table, based on the natural key, to check whether the source record already exists in the target. If the row exists in the target, the target row is updated with the new values coming from the source system. However, if the row is not present in the target, the source row is inserted into the target table.

Standard (ANSI) SQL has a particular statement that helps you achieve the UPSERT operation. It is called the "MERGE" statement:

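MERGE INTO Target_Dimension_Table tgt
USING source_table src
ON (tgt.natural_key = src.natural_key)
-- The source row already exists in the target: overwrite the old values
WHEN MATCHED THEN
    UPDATE
    SET column1 = src.value1,
        column2 = src.value2, ...
-- The source row is new: insert it together with its natural key
WHEN NOT MATCHED THEN
    INSERT ( natural_key, column1, column2 ... )
    VALUES ( src.natural_key, src.value1, src.value2 ... )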

As is obvious from this example, you have to store the natural key of the data in the target dimension table in order to perform this comparison. Later, I will write a separate article on ETL architecture design, where I will talk about this in more detail. But from a modeling perspective, please note that as a data modeler you should add one extra column to your target dimension table as a placeholder to store the natural key of the data.
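The "Update else Insert" logic can also be sketched without MERGE, which not every database supports (SQLite, used here through Python's sqlite3 module, does not). This is only an illustration: the table food_dim, its columns and the natural key value "VB" are hypothetical names, not part of any real source system.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE food_dim (
        food_key    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        natural_key TEXT UNIQUE,                        -- identifier from the source system
        name        TEXT,
        price       REAL)
""")


def upsert(con, natural_key, name, price):
    """SCD Type 1 load: update the matching row, else insert a new one."""
    cur = con.execute(
        "UPDATE food_dim SET name = ?, price = ? WHERE natural_key = ?",
        (name, price, natural_key))
    if cur.rowcount == 0:  # no row matched the natural key
        con.execute(
            "INSERT INTO food_dim (natural_key, name, price) VALUES (?, ?, ?)",
            (natural_key, name, price))


upsert(con, "VB", "Veggie Burger", 3.20)  # first load: the row is inserted
upsert(con, "VB", "Veggie Burger", 3.25)  # price change: the row is overwritten

rows = con.execute(
    "SELECT food_key, natural_key, name, price FROM food_dim").fetchall()
print(rows)
```

After both loads the table still holds a single row for the veggie burger, carrying only the latest price. The old price of 3.20 is gone, which is exactly the Type 1 behaviour.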

SCD Type 2

Arguably, this is the most popular type of slowly changing dimensions. So we will try to

learn this as clearly as possible.



Let me take a step back here and remind you of our objective. As you may recall, in the previous articles we learnt how the values of the attributes (or columns) in a dimension table change with time. We are trying to store the histories of such changes for the purpose of analysis.

In Type 1, we were not storing any history. Now, however, we are going to learn how we may design a dimension table so that we can store the full history and extract the history of changes as and when we require it. We will take our "Food" dimension table as an example here, where "Price" is a variable factor.

Food

KEY  NAME            TYPE_KEY  PRICE
1    Chicken Burger  1         3.70
2    Veggie Burger   1         3.20
3    French Fries    2         2.00
4    Twister Fries   2         2.20

Design of SCD Type 2 Dimension

In order to design the above table as SCD Type 2, we will have to add 3 more columns to this table: "Date From", "Date To" and "Latest Flag". These columns are called type 2 metadata columns. See below:

Food

KEY  NAME            TYPE_KEY  PRICE  DATE_FROM  DATE_TO    LATEST_F
1    Chicken Burger  1         3.70   01-Jan-11  31-Dec-99  Y
2    Veggie Burger   1         3.20   01-Jan-11  31-Dec-99  Y
3    French Fries    2         2.00   01-Jan-11  31-Dec-99  Y
4    Twister Fries   2         2.20   01-Jan-11  31-Dec-99  Y

Notice how the values of these 3 new columns are populated. In the very beginning, when any new record is loaded into the table, we automatically default the value of "Date From" to the date of loading, "Date To" to some far-future date (e.g. 31st December 2099) and "Latest Flag" to "Y".

What is the meaning of these 3 metadata columns?

These 3 columns basically tell us whether a particular record in the table is the latest or not, and the time period during which the record was the latest (also known as the active period). For example, the data in the above table says that all 4 records are the latest (active), and that they are active from the day of loading (in this case 1st January 2011) until an indefinite future date (31st December 2099).

But how do these columns help us store the change history?

Let's assume today is 15 March 2011, and McDonalds has decided to increase the price of "Veggie Burger" from $3.20 to $3.25. If this happens, we will not simply update the price from $3.20 to $3.25. Instead, to store this new information (and also keep the old information), we will insert a new record in the "Food" dimension table, which will then look like below:

Food

KEY  NAME            TYPE_KEY  PRICE  DATE_FROM  DATE_TO    LATEST_F
1    Chicken Burger  1         3.70   01-Jan-11  31-Dec-99  Y
2    Veggie Burger   1         3.20   01-Jan-11  14-Mar-11  N
3    French Fries    2         2.00   01-Jan-11  31-Dec-99  Y
4    Twister Fries   2         2.20   01-Jan-11  31-Dec-99  Y
5    Veggie Burger   1         3.25   15-Mar-11  31-Dec-99  Y

Observe the change in the records with keys 2 and 5. Record 2, which was the original record for the veggie burger, has now been updated: its latest flag has become "N" and its "Date To" value has changed to "14-Mar-2011". This means that Record 2 is no longer the latest or active record (Latest Flag = "N"), and that it was active during the period from 1st Jan 2011 (Date From) to 14 Mar 2011 (Date To).

So, if Record 2 is not active, what is the latest record for "Veggie Burger" now? Record 5! Its latest flag is set to "Y" and it says that the record has been active since 15 March 2011.



This record will remain active far into the future (until 31 Dec 2099), or at least until a new record is inserted with Latest Flag "Y" and this record is updated with Latest Flag "N". So if next time, let's say on 20 Dec 2011, McDonalds decides to change the price of Veggie Burger back to $3.20 and increase the price of the chicken burger from $3.70 to $3.80, we will see 2 more new records in the table, as below:

Food

KEY  NAME            TYPE_KEY  PRICE  DATE_FROM  DATE_TO    LATEST_F
1    Chicken Burger  1         3.70   01-Jan-11  19-Dec-11  N
2    Veggie Burger   1         3.20   01-Jan-11  14-Mar-11  N
3    French Fries    2         2.00   01-Jan-11  31-Dec-99  Y
4    Twister Fries   2         2.20   01-Jan-11  31-Dec-99  Y
5    Veggie Burger   1         3.25   15-Mar-11  19-Dec-11  N
6    Chicken Burger  1         3.80   20-Dec-11  31-Dec-99  Y
7    Veggie Burger   1         3.20   20-Dec-11  31-Dec-99  Y

As you can see from the design above, it is now possible to go back to any date in history and find out what the value of the "Price" attribute of the "Food" dimension was at that point in time.
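Such a point-in-time lookup could look roughly like the sketch below, which loads the final state of the "Food" table into an in-memory SQLite database. The dates are rewritten in ISO format (an assumption of this example, so that plain text comparison orders them correctly), and the helper name price_on is made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE food (
        key INTEGER PRIMARY KEY, name TEXT, type_key INTEGER, price REAL,
        date_from TEXT, date_to TEXT, latest_f TEXT)
""")
con.executemany("INSERT INTO food VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "Chicken Burger", 1, 3.70, "2011-01-01", "2011-12-19", "N"),
    (2, "Veggie Burger",  1, 3.20, "2011-01-01", "2011-03-14", "N"),
    (3, "French Fries",   2, 2.00, "2011-01-01", "2099-12-31", "Y"),
    (4, "Twister Fries",  2, 2.20, "2011-01-01", "2099-12-31", "Y"),
    (5, "Veggie Burger",  1, 3.25, "2011-03-15", "2011-12-19", "N"),
    (6, "Chicken Burger", 1, 3.80, "2011-12-20", "2099-12-31", "Y"),
    (7, "Veggie Burger",  1, 3.20, "2011-12-20", "2099-12-31", "Y"),
])


def price_on(con, name, day):
    """Return the price of a food on the given day, or None if no record covers it."""
    row = con.execute(
        "SELECT price FROM food WHERE name = ? AND ? BETWEEN date_from AND date_to",
        (name, day)).fetchone()
    return row[0] if row else None


print(price_on(con, "Veggie Burger", "2011-02-01"))
print(price_on(con, "Veggie Burger", "2011-06-01"))
print(price_on(con, "Veggie Burger", "2011-12-25"))
```

The three calls fall into the active periods of records 2, 5 and 7 respectively, so each one picks up the price that was valid on that day.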

Surrogate key for SCD Type 2 dimension

Note from the above example that each time we generate a new row in the dimension table, we also assign a new key to the record. This is the key that flows down to the fact table in a typical star schema design. The values of this key, that is, numbers like 1, 2, 3, ..., 7, do not come from the source systems. Instead, they are just sequential running numbers which are generated automatically at the time of inserting these records. These numbers are unique, so as to uniquely identify each record in the table, and are called the "Surrogate Key" of the table.

Obviously, multiple surrogate keys may be related to the same item; however, each key will relate to one particular state of that item in time. In the above example, keys 2, 5 and 7 are all linked to "Veggie Burger", but they represent the state of the record in 3 different time spans. It's worth noting that there will be only one record with latest flag = "Y" among the multiple records of the same item.

Alternate Design of SCD Type 2: Addition of Version Number

A slight variation of the SCD Type 2 design is possible, where we store the version numbers of the records. The initial record will be called version 1, and as new records are generated, we will increment the version number by 1. In this design pattern, the record with the highest version for an item will always be the latest record for that item. If we utilize this design in our earlier example, the dimension table will look like this:

Food

KEY  NAME            TYPE_KEY  PRICE  DATE_FROM  DATE_TO    VERS
1    Chicken Burger  1         3.70   01-Jan-11  19-Dec-11  1
2    Veggie Burger   1         3.20   01-Jan-11  14-Mar-11  1
3    French Fries    2         2.00   01-Jan-11  31-Dec-99  1
4    Twister Fries   2         2.20   01-Jan-11  31-Dec-99  1
5    Veggie Burger   1         3.25   15-Mar-11  19-Dec-11  2
6    Chicken Burger  1         3.80   20-Dec-11  31-Dec-99  2
7    Veggie Burger   1         3.20   20-Dec-11  31-Dec-99  3

Of course, we can also keep the "Latest Flag" column in the above table if we wish.
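Without a latest flag, the latest record of each item has to be found through its highest version number. One way to do that is sketched below. The example groups records by NAME only because the sample table has no natural key column; a real dimension would group by the natural key instead. Only the columns needed for the query are loaded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE food (key INTEGER PRIMARY KEY, name TEXT, price REAL, vers INTEGER)")
con.executemany("INSERT INTO food VALUES (?, ?, ?, ?)", [
    (1, "Chicken Burger", 3.70, 1),
    (2, "Veggie Burger",  3.20, 1),
    (3, "French Fries",   2.00, 1),
    (4, "Twister Fries",  2.20, 1),
    (5, "Veggie Burger",  3.25, 2),
    (6, "Chicken Burger", 3.80, 2),
    (7, "Veggie Burger",  3.20, 3),
])

# For every item, keep only the record carrying that item's highest version
latest = con.execute("""
    SELECT f.key, f.name, f.price, f.vers
    FROM food f
    WHERE f.vers = (SELECT MAX(v.vers) FROM food v WHERE v.name = f.name)
    ORDER BY f.key
""").fetchall()

print(latest)
```

The query returns records 3, 4, 6 and 7, which are the same records that carried Latest Flag "Y" in the earlier design.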

Handling SCD Type 2 Dimension in ETL Process

Again, if you do not know what ETL is, you can safely skip this section. But if you have some ETL background, then I suppose you have already spotted the fact that, unlike SCD Type 1, Type 2 requires you to insert new records in the table as and when any attribute changes. In the case of SCD Type 1, we were only updating the existing record. Here, we will need to update the old record (e.g. changing the latest flag from "Y" to "N" and updating the "Date To") and also insert a new record.

Like before, we can use the "natural key" to first check whether the source record exists in the target or not. If not, we will simply insert the record into the target with a new surrogate key. But if it already exists in the target, we will have to check whether any attribute value has changed between source and target. If not, we can ignore the source record. But if so, we will have to update the latest flag of the existing record to "N" and insert a new record with a new surrogate key. As I mentioned before, I will write a separate article on the ETL handling later.
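The load logic described above can be sketched in a few lines of Python on top of SQLite. This is an illustration under assumptions, not a full ETL design: the function load_type2, the natural_key column and its value "VB" are invented here, and dates are kept as ISO text so that SQLite's date() function can compute the day before the load date.

```python
import sqlite3

FAR_FUTURE = "2099-12-31"  # the "far future" end date used for active records

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE food (
        key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        natural_key TEXT, name TEXT, price REAL,
        date_from TEXT, date_to TEXT, latest_f TEXT)
""")


def load_type2(con, natural_key, name, price, load_date):
    """Apply one source record to the SCD Type 2 dimension."""
    current = con.execute(
        "SELECT key, name, price FROM food WHERE natural_key = ? AND latest_f = 'Y'",
        (natural_key,)).fetchone()
    if current is not None:
        if (current[1], current[2]) == (name, price):
            return  # nothing changed: ignore the source record
        # Expire the old version on the day before the new one starts
        con.execute(
            "UPDATE food SET latest_f = 'N', date_to = date(?, '-1 day') WHERE key = ?",
            (load_date, current[0]))
    # New item, or changed item: insert a row with a new surrogate key
    con.execute(
        "INSERT INTO food (natural_key, name, price, date_from, date_to, latest_f) "
        "VALUES (?, ?, ?, ?, ?, 'Y')",
        (natural_key, name, price, load_date, FAR_FUTURE))


load_type2(con, "VB", "Veggie Burger", 3.20, "2011-01-01")  # new item: inserted
load_type2(con, "VB", "Veggie Burger", 3.20, "2011-02-01")  # unchanged: ignored
load_type2(con, "VB", "Veggie Burger", 3.25, "2011-03-15")  # changed: expire + insert

history = con.execute(
    "SELECT key, price, date_from, date_to, latest_f FROM food ORDER BY key").fetchall()
print(history)
```

After the three loads the table holds two rows for the veggie burger: the expired 3.20 record ending on 2011-03-14 and the active 3.25 record starting on 2011-03-15.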

Performance Considerations of SCD Type 2 Dimension

SCD Type 2, by design, tends to increase the volume of the dimension tables considerably. Think of this: let's say you have an "employee" dimension table which you have designed as SCD Type 2. The employee dimension has 20 different attributes, and 10 of them change at least once a year on average (e.g. employee grade, manager's name, department, salary, band, designation etc.). This means that if you have 1,000 employees in your company, at the end of just one year you are going to have roughly 10,000 records in this dimension table (i.e. assuming that on average 10 attributes change per year, resulting in 10 different rows per employee in the dimension table).

As you can see, this is not a very good thing performance-wise, as it can considerably slow down the loading of your fact table, since you will need to "look up" this dimension table during your fact loading. One may argue that even if we have 10,000 records, only 1,000 of them will have Latest_Flag = 'Y', and since we will only look up records with Latest_Flag = 'Y', the performance will not deteriorate. This is not entirely true. While the Latest_Flag = 'Y' filter may decrease the size of the lookup cache, the database will generally need to do a full table scan (FTS) to identify the latest records. Moreover, in many cases the ETL developer will not be able to make use of the Latest_Flag = 'Y' filter, because the transactional records do not always belong to the latest time period (e.g. late arriving fact records, or a fact table loaded at a later point in time, such as a month-end or week-end load). In those cases, a Latest_Flag = 'Y' filter will be functionally incorrect, as you should determine the correct key to return on the basis of the "Date From" and "Date To" columns. (If you do not understand what I am talking about in this paragraph, just ignore it for now. I am going to explain these things later in some other article.)
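The difference between the two lookups is easiest to see with a late arriving fact. In the sketch below, a veggie burger sale dated 1 June 2011 (a made-up date) is loaded after the December price change. Dates are stored as ISO text, which is an assumption of this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE food (
        key INTEGER PRIMARY KEY, name TEXT, price REAL,
        date_from TEXT, date_to TEXT, latest_f TEXT)
""")
con.executemany("INSERT INTO food VALUES (?, ?, ?, ?, ?, ?)", [
    (2, "Veggie Burger", 3.20, "2011-01-01", "2011-03-14", "N"),
    (5, "Veggie Burger", 3.25, "2011-03-15", "2011-12-19", "N"),
    (7, "Veggie Burger", 3.20, "2011-12-20", "2099-12-31", "Y"),
])

sale_date = "2011-06-01"  # hypothetical late arriving sale

# Lookup by latest flag: always returns the current version
key_by_flag = con.execute(
    "SELECT key FROM food WHERE name = ? AND latest_f = 'Y'",
    ("Veggie Burger",)).fetchone()[0]

# Lookup by active period: returns the version valid on the sale date
key_by_date = con.execute(
    "SELECT key FROM food WHERE name = ? AND ? BETWEEN date_from AND date_to",
    ("Veggie Burger", sale_date)).fetchone()[0]

print(key_by_flag, key_by_date)
```

The flag-based lookup returns key 7, which would attach the June sale to the December price. The date-based lookup returns key 5, the record that was actually active on the day of the sale.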

SCD Type 3

As I mentioned before, the Type 3 design is used to store partial history. Although it is theoretically possible to use the Type 3 design to store full history, that would not be practical. So, what is the Type 3 design? In the Type 2 design above, we have seen that whenever the values of the attributes change, we insert new rows into the table. In the case of Type 3, however, we add a new column to the table to store the history.

So let's say we have a table with 2 columns initially: "Key" and "Attribute".

KEY  ATTRIBUTE
1    A
2    B
3    C

If record 1 changes its attribute from A to D, we will add one extra column to the table to store this change.

KEY  ATTRIBUTE  ATTRIBUTE_OLD
1    D          A
2    B
3    C

If the record changes its attribute value again, we will again have to add a column to store the history of the changes.

KEY  ATTRIBUTE  ATTRIBUTE_OLD  ATTRIBUTE_OLD_1
1    E          D              A
2    B
3    C

Isn't then SCD Type 3 very cumbersome?



As you can see, storing the history by changing the structure of the table in this way is quite cumbersome, and after the attributes have changed a few times the table will become unnecessarily wide and difficult to manage. But that does not mean the SCD Type 3 design methodology is completely unusable. In fact, it is quite usable in one particular circumstance: where we just need to store partial history.

Let's think about a special circumstance where we only need to know the "current value" and "previous value" of an attribute. That is, even though the value of that attribute may change numerous times, at any time we are only concerned with its current and previous values. In such circumstances, we can design the table as Type 3 and keep only 2 columns, "current value" and "previous value", like below.

KEY  CURRENT_VALUE  PREVIOUS_VALUE
1    D              A
2    B
3    C

I can't think of a very good example of this scenario right away; however, I can give you one from one of my previous projects in the telecom domain, where a certain calculated field in a report depended on the latest and previous values of the customer status. That calculated attribute was called "Churn Indicator" (churn in the telecom business generally means giving up a telephone connection), and the rule to populate the churn indicator was (in a very, very simplified way) like below:

Churn Indicator

= "Voluntary Churn" (if customer's current status = 'Inactive' and previous status = 'Active')

= "Involuntary Churn" (if customer's current status = 'Inactive' and previous status = 'Suspended')

As you can guess, in order to find the correct value of the churn indicator, you do not need to know the complete history of changes in the customer's status. All you need to know is the current and previous status. In this kind of partial history scenario, the SCD Type 3 design is very useful.

Note that, compared to SCD Type 2, Type 3 does not increase the number of records in the table, thereby easing performance concerns.
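Here is a small sketch of the Type 3 pattern applied to the churn example. The customer table, its column names and the helper functions are assumptions made for illustration; the churn rule is the simplified one given above, and any other combination of statuses simply yields no indicator.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE customer (
        key INTEGER PRIMARY KEY,
        current_status TEXT,
        previous_status TEXT)
""")
con.executemany("INSERT INTO customer VALUES (?, ?, NULL)",
                [(1, "Active"), (2, "Active")])


def change_status(con, key, new_status):
    """SCD Type 3 update: shift the current value into the 'previous' column."""
    # The right-hand sides of SET are evaluated against the row's old values,
    # so previous_status receives the status that is being replaced.
    con.execute(
        "UPDATE customer SET previous_status = current_status, current_status = ? "
        "WHERE key = ?",
        (new_status, key))


def churn_indicator(current, previous):
    """Simplified churn rule from the article; None when neither case applies."""
    if current == "Inactive" and previous == "Active":
        return "Voluntary Churn"
    if current == "Inactive" and previous == "Suspended":
        return "Involuntary Churn"
    return None


change_status(con, 1, "Inactive")   # customer 1: Active -> Inactive
change_status(con, 2, "Suspended")  # customer 2: Active -> Suspended
change_status(con, 2, "Inactive")   # customer 2: Suspended -> Inactive

indicators = {
    key: churn_indicator(cur, prev)
    for key, cur, prev in con.execute(
        "SELECT key, current_status, previous_status FROM customer")
}
print(indicators)
```

Customer 2 changed status twice, yet the table still has one row per customer. The original "Active" status of customer 2 is lost, which is the price of keeping only partial history.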



Now that we have learnt about slowly changing dimensions, next we will discuss how to design a "Rapidly Changing Dimension" or RCD.

What are Slowly Changing Dimensions?

Slowly Changing Dimensions (SCD) are dimensions that change slowly over time, rather than changing on a regular, time-based schedule. In a data warehouse there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users to assign the proper dimension attribute value for a given date. Examples of such dimensions are customer, geography and employee.

There are many approaches to dealing with SCD. The most popular are:

Type 0 - The passive method

Type 1 - Overwriting the old value

Type 2 - Creating a new additional record

Type 3 - Adding a new column

Type 4 - Using historical table

Type 6 - Combine approaches of types 1,2,3 (1+2+3=6)

Type 0 - The passive method. In this method no special action is performed upon dimensional changes. Some dimension data can remain the same as when it was first inserted; other data may be overwritten.

Type 1 - Overwriting the old value. In this method no history of dimension changes is kept in the database. The old dimension value is simply overwritten by the new one. This type is easy to maintain and is often used for data whose changes are caused by processing corrections (e.g. removing special characters, correcting spelling errors).

Before the change:

Customer_ID Customer_Name Customer_Type

1 Cust_1 Corporate

After the change:

Customer_ID Customer_Name Customer_Type

1 Cust_1 Retail
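As a rough sketch, the Type 1 change above amounts to an in-place overwrite. The in-memory dictionary and the helper name `apply_type1` are assumptions made for illustration; in a real warehouse this would be an update statement against the dimension table.

```python
def apply_type1(dimension: dict, customer_id: int, attribute: str, new_value: str) -> None:
    """Overwrite the attribute in place; the old value is not kept anywhere."""
    dimension[customer_id][attribute] = new_value


# Dimension keyed by Customer_ID, holding the "before" row from the example.
customer_dim = {1: {"Customer_Name": "Cust_1", "Customer_Type": "Corporate"}}

apply_type1(customer_dim, 1, "Customer_Type", "Retail")
```

After the call the table still has a single row for Cust_1, and nothing records that the type was ever Corporate.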

Type 2 - Creating a new additional record. In this methodology all history of dimension changes is kept in the database. You capture an attribute change by adding a new row with a new surrogate key to the dimension table. Both the prior and the new rows contain the natural key (or other durable identifier) as an attribute. 'Effective date' and 'current indicator' columns are also used in this method. There can be only one record with the current indicator set to 'Y'. For the 'effective date' columns, i.e. start_date and end_date, the end_date of the current record is usually set to 9999-12-31. Introducing changes to the dimensional model in Type 2 can be a very expensive database operation, so it is not recommended for dimensions where a new attribute could be added in the future.

Before the change:

Customer_ID Customer_Name Customer_Type Start_Date End_Date Current_Flag

1 Cust_1 Corporate 22-07-2010 31-12-9999 Y

After the change:

Customer_ID Customer_Name Customer_Type Start_Date End_Date Current_Flag

1 Cust_1 Corporate 22-07-2010 17-05-2012 N

2 Cust_1 Retail 18-05-2012 31-12-9999 Y
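The before and after tables above can be reproduced with a minimal sketch. It assumes, as the example tables suggest, that Customer_ID plays the role of the surrogate key and Customer_Name the role of the natural key; the helper name `apply_type2` and the list-of-dictionaries layout are hypothetical.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # conventional end date of the current record


def apply_type2(rows, customer_name, new_type, change_date):
    """Close the current row for the customer and append a new current row."""
    for row in rows:
        if row["Customer_Name"] == customer_name and row["Current_Flag"] == "Y":
            row["End_Date"] = change_date - timedelta(days=1)
            row["Current_Flag"] = "N"
    # The new surrogate key is simply the next integer in this sketch.
    next_key = max((row["Customer_ID"] for row in rows), default=0) + 1
    rows.append({
        "Customer_ID": next_key,
        "Customer_Name": customer_name,
        "Customer_Type": new_type,
        "Start_Date": change_date,
        "End_Date": HIGH_DATE,
        "Current_Flag": "Y",
    })


customer_dim = [{
    "Customer_ID": 1, "Customer_Name": "Cust_1", "Customer_Type": "Corporate",
    "Start_Date": date(2010, 7, 22), "End_Date": HIGH_DATE, "Current_Flag": "Y",
}]

apply_type2(customer_dim, "Cust_1", "Retail", date(2012, 5, 18))
```

The old row keeps its surrogate key and is only end-dated, so fact rows that already reference key 1 continue to report the customer as Corporate for that period.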

Type 3 - Adding a new column. In this type usually only the current and previous values of the dimension are kept in the database. The new value is loaded into the 'current/new' column and the old one into the 'old/previous' column. Generally speaking, the history is limited to the number of columns created for storing historical data. This is the least commonly needed technique.

Before the change:

Customer_ID Customer_Name Current_Type Previous_Type

1 Cust_1 Corporate Corporate

After the change:

Customer_ID Customer_Name Current_Type Previous_Type

1 Cust_1 Retail Corporate
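A small sketch of the Type 3 update shown in the tables above: the current value is shifted into the previous column before the new value is written. The helper name `apply_type3` and the dictionary layout are assumptions for illustration.

```python
def apply_type3(row: dict, new_type: str) -> None:
    """Shift the current value to the 'previous' column, then store the new one."""
    row["Previous_Type"] = row["Current_Type"]
    row["Current_Type"] = new_type


# The "before" row from the example.
customer = {
    "Customer_ID": 1,
    "Customer_Name": "Cust_1",
    "Current_Type": "Corporate",
    "Previous_Type": "Corporate",
}

apply_type3(customer, "Retail")
```

A second change would overwrite Previous_Type again, which is why the history is limited to the number of columns provided.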

Type 4 - Using historical table. In this method a separate historical table is used to track all historical changes of a dimension's attributes, with one such table per dimension. The 'main' dimension table keeps only the current data, e.g. customer and customer_history tables.

Current table:

Customer_ID Customer_Name Customer_Type

1 Cust_1 Corporate

Historical table:

Customer_ID Customer_Name Customer_Type Start_Date End_Date

1 Cust_1 Retail 01-01-2010 21-07-2010

1 Cust_1 Other 22-07-2010 17-05-2012

1 Cust_1 Corporate 18-05-2012 31-12-9999
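The two tables above could be maintained along the following lines. This is a sketch under assumptions: the helper name `apply_type4` is hypothetical, the tables are plain Python structures, and the starting state (customer type 'Other' with an open history row) is inferred from the history table shown.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)


def apply_type4(current, history, customer_id, new_type, change_date):
    """Keep only the latest value in `current`; log every period in `history`."""
    for row in history:
        if row["Customer_ID"] == customer_id and row["End_Date"] == HIGH_DATE:
            row["End_Date"] = change_date - timedelta(days=1)
    history.append({
        "Customer_ID": customer_id,
        "Customer_Name": current[customer_id]["Customer_Name"],
        "Customer_Type": new_type,
        "Start_Date": change_date,
        "End_Date": HIGH_DATE,
    })
    current[customer_id]["Customer_Type"] = new_type


# State just before the last change in the example (assumed).
customer = {1: {"Customer_Name": "Cust_1", "Customer_Type": "Other"}}
customer_history = [
    {"Customer_ID": 1, "Customer_Name": "Cust_1", "Customer_Type": "Retail",
     "Start_Date": date(2010, 1, 1), "End_Date": date(2010, 7, 21)},
    {"Customer_ID": 1, "Customer_Name": "Cust_1", "Customer_Type": "Other",
     "Start_Date": date(2010, 7, 22), "End_Date": HIGH_DATE},
]

apply_type4(customer, customer_history, 1, "Corporate", date(2012, 5, 18))
```

Queries that need only the current value read the small `customer` table, while historical reports go to `customer_history`.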

Type 6 - Combine approaches of types 1, 2, 3 (1+2+3=6). In this type the dimension table has the following additional columns:

current_type - for keeping the current value of the attribute. All history records for a given item of the attribute have the same current value.

historical_type - for keeping the historical value of the attribute. All history records for a given item of the attribute could have different values.

start_date - for keeping the start date of the 'effective date' of the attribute's history.

end_date - for keeping the end date of the 'effective date' of the attribute's history.

current_flag - for keeping information about the most recent record.

In this method, to capture an attribute change we add a new record as in Type 2. The current_type information is overwritten with the new one as in Type 1. We store the history in the historical_type column as in Type 3.

Customer_ID Customer_Name Current_Type Historical_Type Start_Date End_Date Current_Flag

1 Cust_1 Corporate Retail 01-01-2010 21-07-2010 N

2 Cust_1 Corporate Other 22-07-2010 17-05-2012 N

3 Cust_1 Corporate Corporate 18-05-2012 31-12-9999 Y
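The table above can be reproduced by combining the three steps in one routine. As before, this is only a sketch: `apply_type6` is a hypothetical helper, Customer_ID is assumed to act as the surrogate key, and the starting rows are inferred from the example.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)


def apply_type6(rows, customer_name, new_type, change_date):
    """Add a new row (Type 2), keep per-row history (Type 3), overwrite current (Type 1)."""
    # Type 2 part: close the open row.
    for row in rows:
        if row["Customer_Name"] == customer_name and row["Current_Flag"] == "Y":
            row["End_Date"] = change_date - timedelta(days=1)
            row["Current_Flag"] = "N"
    # Type 2 part: new row; Historical_Type holds the value valid in this period.
    rows.append({
        "Customer_ID": max((r["Customer_ID"] for r in rows), default=0) + 1,
        "Customer_Name": customer_name,
        "Current_Type": new_type,
        "Historical_Type": new_type,
        "Start_Date": change_date,
        "End_Date": HIGH_DATE,
        "Current_Flag": "Y",
    })
    # Type 1 part: every row of this customer carries the newest value.
    for row in rows:
        if row["Customer_Name"] == customer_name:
            row["Current_Type"] = new_type


rows = [
    {"Customer_ID": 1, "Customer_Name": "Cust_1", "Current_Type": "Other",
     "Historical_Type": "Retail", "Start_Date": date(2010, 1, 1),
     "End_Date": date(2010, 7, 21), "Current_Flag": "N"},
    {"Customer_ID": 2, "Customer_Name": "Cust_1", "Current_Type": "Other",
     "Historical_Type": "Other", "Start_Date": date(2010, 7, 22),
     "End_Date": HIGH_DATE, "Current_Flag": "Y"},
]

apply_type6(rows, "Cust_1", "Corporate", date(2012, 5, 18))
```

Grouping facts by Current_Type reports all history under the latest value, while grouping by Historical_Type reports each fact under the value that was valid at the time.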

(C) 2008-2009 www.datawarehouse4u.info

All Rights Reserved

Junk Dimension

In data warehouse design, we frequently run into a situation where there are yes/no indicator fields in the source system. Through business analysis, we know it is necessary to keep such information in the fact table. However, if we keep all those indicator fields in the fact table, not only do we need to build many small dimension tables, but the amount of information stored in the fact table also increases tremendously, leading to possible performance and management issues.

A junk dimension is the way to solve this problem. In a junk dimension, we combine these indicator fields into a single dimension. This way, we'll only need to build a single dimension table, and the number of fields in the fact table, as well as the size of the fact table, can be decreased. The content of the junk dimension table is the combination of all possible values of the individual indicator fields.

Let's look at an example. Assuming that we have the following fact table:

In this example, TXN_CODE, COUPON_IND, and PREPAY_IND are all indicator fields. In this existing format, each one of them is a dimension. Using the junk dimension principle, we can combine them into a single junk dimension, resulting in the following fact table:

Note that now the number of dimensions in the fact table went from 7 to 5.

The content of the junk dimension table would look like the following:



In this case, we have 3 possible values for the TXN_CODE field, 2 possible values for the COUPON_IND field, and 2 possible values for the PREPAY_IND field. This results in a total of 3 x 2 x 2 = 12 rows for the junk dimension table.

By using a junk dimension to replace the 3 indicator fields, we have decreased the number of dimensions by 2 and also decreased the number of fields in the fact table by 2. This will result in a data warehousing environment that offers better performance and is easier to manage.
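The 3 x 2 x 2 = 12 row count can be reproduced by building the junk dimension as a Cartesian product of the indicator values. The actual code values are not given in the text, so the values below ("T1", "T2", "T3", "Y", "N") and the key column name JUNK_ID are placeholders assumed for this sketch.

```python
from itertools import product

# Placeholder indicator values; the real codes are not listed in the text.
TXN_CODES = ["T1", "T2", "T3"]
COUPON_INDS = ["Y", "N"]
PREPAY_INDS = ["Y", "N"]

# One row per combination, with a surrogate key starting at 1.
junk_dimension = [
    {"JUNK_ID": junk_id, "TXN_CODE": txn, "COUPON_IND": coupon, "PREPAY_IND": prepay}
    for junk_id, (txn, coupon, prepay) in enumerate(
        product(TXN_CODES, COUPON_INDS, PREPAY_INDS), start=1
    )
]

# Lookup used at load time: indicator combination -> single key stored in the fact row.
junk_key = {
    (row["TXN_CODE"], row["COUPON_IND"], row["PREPAY_IND"]): row["JUNK_ID"]
    for row in junk_dimension
}
```

During the fact load, the three indicator values of a source record are replaced by the one key found in `junk_key`, which is how three fact-table columns collapse into one.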

Conformed Dimension

A conformed dimension is a dimension that has exactly the same meaning and content when being referred from different fact tables. A conformed dimension can refer to multiple tables in multiple data marts within the same organization. For two dimension tables to be considered as conformed, they must either be identical or one must be a subset of another. There cannot be any other type of difference between the two tables. For example, two dimension tables that are exactly the same except for the primary key are not considered conformed dimensions.
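The "identical or subset" rule can be illustrated with a simple check. This sketch makes several assumptions: each dimension table is a list of row tuples that include the primary key, "subset" is read as a subset of rows only (a subset of columns is not handled), and the sample rows, including Cust_2, are invented for the example.

```python
def is_conformed(dim_a, dim_b) -> bool:
    """True when the two row sets are identical or one is a subset of the other."""
    rows_a, rows_b = set(dim_a), set(dim_b)
    return rows_a <= rows_b or rows_b <= rows_a


# Rows are (primary key, name, type); all values are hypothetical.
full_customer = [(1, "Cust_1", "Corporate"), (2, "Cust_2", "Retail")]
subset_customer = [(1, "Cust_1", "Corporate")]
rekeyed_customer = [(101, "Cust_1", "Corporate"), (102, "Cust_2", "Retail")]

same = is_conformed(full_customer, full_customer)        # identical tables
subset = is_conformed(full_customer, subset_customer)    # one is a subset
rekeyed = is_conformed(full_customer, rekeyed_customer)  # differs only by key
```

The last case mirrors the example in the text: the attribute values match, but the differing primary keys mean the two tables are not conformed.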

Why is a conformed dimension important? This goes back to the definition of data warehouse being "integrated." Integrated means that even if a particular entity had different meanings and different attributes in the source systems, there must be a single version of this entity once the data flows into the data warehouse.

The time dimension is a common conformed dimension in an organization. Usually the only rule to consider with the time dimension is whether there is a fiscal year in addition to the calendar year and the definition of a week. Fortunately, both are relatively easy to resolve. In the case of fiscal vs. calendar year, one may go with either fiscal or calendar, or an alternative is to have two separate conformed dimensions, one for fiscal year and one for calendar year. The definition of a week is also something that can be different in large organizations: Finance may use Saturday to Friday, while marketing may use Sunday to Saturday. In this case, we should decide on a definition and move on. The nice thing about the time dimension is once these rules are set, the values in the dimension table will never change. For example, October 16th will never become the 15th day in October.

Not all conformed dimensions are as easy to produce as the time dimension. An example is the customer dimension. In any organization with some history, there is a high likelihood that different customer databases exist in different parts of the organization. To achieve a conformed customer dimension means those data must be compared against each other, rules must be set, and data must be cleansed. In addition, when we are doing incremental data loads into the data warehouse, we'll need to apply the same rules to the new values to make sure we are only adding truly new customers to the customer dimension.

Building a conformed dimension is also part of the process in master data management, or MDM. In MDM, one must not only make sure the master data dimensions are conformed, but that conformity needs to be brought back to the source systems.