Further in Agility with Data Vault MARTINO ADRIANO

LS_BI

14/05/2013

[email protected] . www.trivadis.com . Info-Tel. 0800 87 482 347 . Datum 07.07.2013 . Page 2 / 12

Contents

1. Why agility?

2. Where is the data vault modeling located in the global BI Architecture

3. Understanding Data Vault Modeling

4. Why this split?

1) The core concept of the model is the hub!

2) Business Associations

3) Descriptive Data?

5. How can this split increase agility of an EDW structure?

4) Source system is changing

5) New descriptive information needed by the business

6) Need historization only on some information

6. Conclusion

7. Source and Links…

1. Why agility?

Agility has become one of the most important criteria for reporting information systems in recent years. Data analysis is increasingly important for steering a business in the right direction and for adapting to trends very quickly. To do so, a reporting information system needs to be in line with the present and able to reproduce the past; the analysis of both gives the ability to drive the future.

The concept of agility is linked to the present moment.

The source information systems evolve very fast to keep up with the development of the business. For this reason, reporting information systems have to adapt their structure to reflect those changes with minimum latency.

For humans, adaptability is called intelligence.

For information systems, it’s called agility.

In this article, I will try to explain how Data Vault can affect the agility of an information system.

2. Where is the data vault modeling located in the global BI Architecture

Here is the most common architecture, with the modeling approach used at each step.

Everything is based on the “operational systems (OPR)”, which contain the real-time data: all the transactions of the organization. Those systems are volatile and generally modeled in third normal form (3NF).

The concept of the Enterprise Data Warehouse (EDW) describes a way for information systems to consolidate and organize data across a complete organization. The Enterprise Data Warehouse is the place where global enterprise data is stored.

Data Vault is an alternative to dimensional modeling or 3NF in the Enterprise Data Warehouse (sometimes also called the “DWH Core”).

Data marts answer a specific business need for one or more entities of an organization. A data mart is not a global view of the enterprise but only a specific business view. This part of the architecture is not impacted by Data Vault modeling. In a Data Vault architecture, this layer is built on top of the Data Vault EDW and keeps its dimensional structure.

Schema1: the most common Data Warehouse architecture

The Data Vault modeling technique applies to the Enterprise Data Warehouse, not to the data marts. Data marts keep the dimensional modeling methodology.

3. Understanding Data Vault Modeling

I believe that Data Vault techniques are easiest to understand when starting from a dimensional design. The same exercise can be done starting from a third normal form architecture.

The traditional dimensional (Kimball) way to design the data warehouse is to split the business into 2 types of entities:

- Facts, containing the numbers used to measure the trends (amounts, number of clients, number of contracts)

- Dimensions, containing the descriptive information (client, product, country, …)

In a dimension, we can find 2 types of information:

- Business key of the entity (in blue)

- Descriptive data and the history of the changes across time (in yellow)

Schema2: a customer dimension with the columns Client_ID, Client_BK, Client_Name, Client_Address, History_Start_date and History_End_Date

In a fact table, we can find 3 types of information:

- Descriptive data (measures)

- Relations to the entities (in red)

- Business fact keys (degenerate dimension attributes, in blue)

The idea of Data Vault modeling is to split those 3 concepts:

- Business keys become HUBS

- Relations become LINKS

- Descriptive data becomes SATELLITES

To understand how Data Vault can affect agility, I need to dive deeper into those 3 types of entity.

Schema3: the content of a fact table, with the columns Client_FK, Product_FK, Time_FK, Amount and Sales_Order_Number
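The split of a dimension row and a fact row into hub, link and satellite records can be sketched in a few lines of Python. This is only an illustration: the column names come from Schema2 and Schema3, while the function names, the sample values and the choice of the three `_FK` columns as "relations" are assumptions made for the example.

```python
def split_dimension_row(row, business_key, surrogate_key):
    """Split one dimension row into a hub record and a satellite record."""
    hub = {business_key: row[business_key]}
    # Descriptive data and history dates go to the satellite, which keeps
    # the business key so that it can point back to its hub.
    satellite = {k: v for k, v in row.items() if k != surrogate_key}
    return hub, satellite


def split_fact_row(row, business_key, relation_columns):
    """Split one fact row into a hub, a link and a satellite record."""
    hub = {business_key: row[business_key]}
    # In a real Data Vault the link would reference the other hubs' keys;
    # here the fact table's foreign keys stand in for them.
    link = {business_key: row[business_key]}
    link.update({column: row[column] for column in relation_columns})
    satellite = {k: v for k, v in row.items() if k not in relation_columns}
    return hub, link, satellite


# Sample rows; all values are invented for illustration.
client_row = {
    "Client_ID": 1,
    "Client_BK": "CUS3457",
    "Client_Name": "Example Customer",
    "Client_Address": "1 Example Street",
    "History_Start_date": "2013-01-01",
    "History_End_Date": None,
}
sale_row = {
    "Client_FK": 1,
    "Product_FK": 7,
    "Time_FK": 20130514,
    "Amount": 250.0,
    "Sales_Order_Number": "SO-0001",
}

client_hub, client_sat = split_dimension_row(client_row, "Client_BK", "Client_ID")
sale_hub, sale_link, sale_sat = split_fact_row(
    sale_row, "Sales_Order_Number", ["Client_FK", "Product_FK", "Time_FK"]
)
```

The hub records end up holding only a business key, the link holds only references, and everything descriptive lands in a satellite.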

4. Why this split?

The idea of Data Vault is to physically separate data that does not change at the same frequency.

Business keys, links and descriptive data are split because they change neither at the same frequency nor for the same reasons.

1) The core concept of the model is the hub!

The business keys stored in hubs must not be technical surrogate keys coming from the source systems, but the most descriptive, enterprise-wide business “code” for one business entity. The physical structure of a hub doesn’t contain any link or any descriptive information, only the business key!

Hubs are the core of Data Vault modeling. Every core business concept (sale, product, customer, …) has its own hub. The existence of a hub is purely driven by the business! Those core business concepts are the pillars of an enterprise: if you change or remove all of them, you simply change or remove the enterprise’s reason to exist.

If sale, customer and product are core business concepts for an enterprise, those hubs receive new content (you will add new sales, new customers and new products) but are very rarely deleted or modified.

For example: if we talk about the product “AXYZ-1223”, all business entities will understand it and use it as the unique business key to identify this particular product. This product hub will always have a meaning for an enterprise that is selling … products, even if it adds product types and changes its sales strategy.
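A minimal sketch of such a hub, using Python's built-in `sqlite3` module, could look as follows. The table and column names (`hub_product`, `product_bk`) and the function names are hypothetical, and every product code other than “AXYZ-1223” is invented. The hub holds only the business key, and loading it only ever adds keys.

```python
import sqlite3


def create_product_hub(conn):
    # The hub holds nothing but the business key.
    conn.execute("CREATE TABLE hub_product (product_bk TEXT PRIMARY KEY)")


def load_product_hub(conn, business_keys):
    """Add the business keys that are not in the hub yet; never update or delete."""
    conn.executemany(
        "INSERT OR IGNORE INTO hub_product (product_bk) VALUES (?)",
        [(bk,) for bk in business_keys],
    )


def hub_keys(conn):
    """Return all business keys of the hub in sorted order."""
    rows = conn.execute("SELECT product_bk FROM hub_product ORDER BY product_bk")
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_product_hub(conn)
load_product_hub(conn, ["AXYZ-1223", "AXYZ-1224"])
# A later load repeats a known key and brings one new key.
load_product_hub(conn, ["AXYZ-1223", "AXYZ-1300"])
print(hub_keys(conn))
```

Because a repeated key is simply ignored, the same load can run again without changing what the hub already contains.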

2) Business Associations

Business associations are modeled in physical structures called links. Each link follows business reality and represents a natural association between core business concepts. A link allows many-to-many relations.

For example:

- The product hub contains only the product business keys

- The sale hub contains only the sale business keys

- The customer hub contains only the customer business keys

- The link tells us that there is a natural relationship between a sale, a product and a customer
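The three hubs and their link can be sketched with Python's built-in `sqlite3` module. The table and column names are hypothetical, the sample keys other than “AXYZ-1223” and “CUS3457” are invented, and keying the link directly on business keys is a simplification chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE hub_product  (product_bk  TEXT PRIMARY KEY);
    CREATE TABLE hub_sale     (sale_bk     TEXT PRIMARY KEY);
    CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);

    -- The link stores only references to the hubs it associates.
    CREATE TABLE link_sale (
        sale_bk     TEXT NOT NULL REFERENCES hub_sale (sale_bk),
        product_bk  TEXT NOT NULL REFERENCES hub_product (product_bk),
        customer_bk TEXT NOT NULL REFERENCES hub_customer (customer_bk),
        PRIMARY KEY (sale_bk, product_bk, customer_bk)
    );
""")

conn.executemany(
    "INSERT INTO hub_product VALUES (?)", [("AXYZ-1223",), ("AXYZ-1224",)]
)
conn.execute("INSERT INTO hub_sale VALUES ('SO-0001')")
conn.execute("INSERT INTO hub_customer VALUES ('CUS3457')")

# One sale with two products: the link allows many-to-many relations.
conn.executemany(
    "INSERT INTO link_sale VALUES (?, ?, ?)",
    [("SO-0001", "AXYZ-1223", "CUS3457"), ("SO-0001", "AXYZ-1224", "CUS3457")],
)

products_in_sale = [
    row[0]
    for row in conn.execute(
        "SELECT product_bk FROM link_sale WHERE sale_bk = ? ORDER BY product_bk",
        ("SO-0001",),
    )
]
```

The hubs stay free of any relationship column; the association lives entirely in `link_sale`.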

3) Descriptive Data?

Descriptive data is stored in satellites.

Each Satellite contains descriptive information about one and only one Hub.

A satellite is always linked to a hub (a business key) and has no meaning without this

particular hub.

The descriptive information is:

- The type of data that changes most often (added, removed, changed, …)

- The data for which we need to track changes on certain columns

Design of 3 hubs (Product, Sales and Customer) joined by a link in the database

For example:

This customer satellite contains the name, address, age and more of a certain customer that is represented by one business key in its hub.

So if we look at the figure above, the customer with the business key “CUS3457” has all its description in the customer satellite. In this example there is only one satellite, but there can be more than one. The attributes of each satellite are grouped either by frequency of change (for example, SCD1 attributes in one customer satellite and SCD2 attributes in another) or by functional meaning (customer address, customer description).
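As a minimal sketch, a customer hub with one satellite could be set up with Python's built-in `sqlite3` module as shown here. The table, column and function names are hypothetical and the descriptive values are invented; only the business key “CUS3457” comes from the example above. History tracking is left out of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);

    -- Descriptive data about exactly one hub, keyed by its business key.
    CREATE TABLE sat_customer (
        customer_bk TEXT PRIMARY KEY REFERENCES hub_customer (customer_bk),
        name        TEXT,
        address     TEXT,
        age         INTEGER
    );
""")
conn.execute("INSERT INTO hub_customer VALUES ('CUS3457')")
conn.execute(
    "INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
    ("CUS3457", "Example Customer", "1 Example Street", 42),
)


def describe_customer(conn, customer_bk):
    """Return the descriptive data of one customer, found through its hub."""
    return conn.execute(
        """
        SELECT s.name, s.address, s.age
        FROM hub_customer AS h
        JOIN sat_customer AS s ON s.customer_bk = h.customer_bk
        WHERE h.customer_bk = ?
        """,
        (customer_bk,),
    ).fetchone()


description = describe_customer(conn, "CUS3457")
```

A satellite row is only reachable through the business key of its hub, which reflects the idea that a satellite has no meaning on its own.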

5. How can this split increase agility of an EDW structure?

Data Vault makes it easier to adapt the model without impacting the historical model.

The best way to explain this is to give examples.

CONCRETE CASES

4) Source system is changing

If your business needs change and you have to add a new concept, “Shop”, to your EDW, you simply add a new hub to your model and create a new link, without impacting the existing model or the existing reporting system. You do not have to recreate your old structure and remap it onto the new one: the old structure stays in place and keeps its history in its original form. You do not have to retest your old structures, and there is no direct impact on your downstream processes, so you have time to adapt them.
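The “Shop” case can be sketched with Python's built-in `sqlite3` module. All table and column names are hypothetical, and the shape of the new link is an assumption. The sketch compares the table definitions before and after the change to show that the existing structures are left untouched.

```python
import sqlite3


def table_definitions(conn):
    """Return {table name: CREATE statement} for every table in the database."""
    rows = conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_product  (product_bk  TEXT PRIMARY KEY);
    CREATE TABLE hub_sale     (sale_bk     TEXT PRIMARY KEY);
    CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);
    CREATE TABLE link_sale (
        sale_bk TEXT, product_bk TEXT, customer_bk TEXT,
        PRIMARY KEY (sale_bk, product_bk, customer_bk)
    );
""")
before = table_definitions(conn)

# The new concept "Shop" arrives: one new hub and one new link, nothing else.
conn.executescript("""
    CREATE TABLE hub_shop (shop_bk TEXT PRIMARY KEY);
    CREATE TABLE link_sale_shop (
        sale_bk TEXT, product_bk TEXT, customer_bk TEXT, shop_bk TEXT,
        PRIMARY KEY (sale_bk, product_bk, customer_bk, shop_bk)
    );
""")
after = table_definitions(conn)

# Every table that existed before still has exactly the same definition.
unchanged = all(after[name] == sql for name, sql in before.items())
added = sorted(set(after) - set(before))
```

Since no existing table is altered, loads and reports written against the old link keep working while downstream processes are adapted to the new one.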

5) New descriptive information needed by the business

This request is very frequent: a business entity needs additional fields in its reporting. Data Vault lets you minimize the impact on your existing reporting, for example the reporting used by the other entities. Instead of altering the structure of the existing customer satellite, you add a new satellite with the new information on the same customer hub. It’s very agile and has low impact.

Schema: the Product, Sale and Customer hubs joined by the old link, with the new Shop hub attached through a new link

Schema: the Customer hub with its existing customer satellite and a second customer satellite
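Adding a second satellite on the same customer hub can be sketched with Python's built-in `sqlite3` module. The table and column names, the new `segment` attribute and the sample values are hypothetical. The sketch runs an existing report before and after the change to show that it is not affected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);
    CREATE TABLE sat_customer (
        customer_bk TEXT PRIMARY KEY REFERENCES hub_customer (customer_bk),
        name TEXT,
        address TEXT
    );
    INSERT INTO hub_customer VALUES ('CUS3457');
    INSERT INTO sat_customer VALUES ('CUS3457', 'Example Customer', '1 Example Street');
""")


def existing_report(conn):
    """A report written before the change; it only knows the first satellite."""
    return conn.execute(
        "SELECT customer_bk, name, address FROM sat_customer ORDER BY customer_bk"
    ).fetchall()


report_before = existing_report(conn)

# New request: the business wants a (hypothetical) "segment" attribute.
# It goes into a second satellite; the first satellite is not altered.
conn.executescript("""
    CREATE TABLE sat_customer_2 (
        customer_bk TEXT PRIMARY KEY REFERENCES hub_customer (customer_bk),
        segment TEXT
    );
    INSERT INTO sat_customer_2 VALUES ('CUS3457', 'Retail');
""")

report_after = existing_report(conn)


def extended_report(conn):
    """A new report that combines both satellites through the hub."""
    return conn.execute(
        """
        SELECT h.customer_bk, s1.name, s2.segment
        FROM hub_customer AS h
        LEFT JOIN sat_customer   AS s1 ON s1.customer_bk = h.customer_bk
        LEFT JOIN sat_customer_2 AS s2 ON s2.customer_bk = h.customer_bk
        ORDER BY h.customer_bk
        """
    ).fetchall()
```

Only reports that need the new attribute have to join the second satellite; all others can stay as they are.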

6) Need historization only on some information

In this case, you historize only the attributes that need it and store the others without history. If you need different types of historization in parallel, you multiply your satellites.
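One possible way to keep history on some attributes only is sketched here with Python's built-in `sqlite3` module. It assumes that the customer name needs no history while the address does; the table, column and function names are hypothetical, and closing a row with an end date is just one of several ways to historize a satellite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);

    -- Satellite without history: one row per customer, overwritten on change.
    CREATE TABLE sat_customer_current (
        customer_bk TEXT PRIMARY KEY,
        name TEXT
    );

    -- Satellite with history: one row per customer and validity period.
    CREATE TABLE sat_customer_history (
        customer_bk TEXT NOT NULL,
        address TEXT,
        history_start_date TEXT NOT NULL,
        history_end_date TEXT,
        PRIMARY KEY (customer_bk, history_start_date)
    );
    INSERT INTO hub_customer VALUES ('CUS3457');
""")


def set_name(conn, customer_bk, name):
    """No history: replace the current value."""
    conn.execute(
        "INSERT OR REPLACE INTO sat_customer_current VALUES (?, ?)",
        (customer_bk, name),
    )


def set_address(conn, customer_bk, address, change_date):
    """With history: close the open row, then add a new open row."""
    conn.execute(
        "UPDATE sat_customer_history SET history_end_date = ? "
        "WHERE customer_bk = ? AND history_end_date IS NULL",
        (change_date, customer_bk),
    )
    conn.execute(
        "INSERT INTO sat_customer_history VALUES (?, ?, ?, NULL)",
        (customer_bk, address, change_date),
    )


set_name(conn, "CUS3457", "Example Customer")
set_name(conn, "CUS3457", "Example Customer Ltd")
set_address(conn, "CUS3457", "1 Example Street", "2013-01-01")
set_address(conn, "CUS3457", "2 Example Avenue", "2013-05-14")

name_rows = conn.execute("SELECT COUNT(*) FROM sat_customer_current").fetchone()[0]
address_rows = conn.execute("SELECT COUNT(*) FROM sat_customer_history").fetchone()[0]
```

After two changes of each attribute, the satellite without history still holds a single row, while the satellite with history holds one row per address.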

6. Conclusion

I believe that Data Vault modeling is more agile than the traditional way of designing the Enterprise Data Warehouse.

It lets you modify your EDWH without touching the existing way of storing history, and it makes changes easier to integrate. The impact on the downstream processes is low because information can be added or changed without touching the existing model.

This method makes real sense in changing environments. When you have a stable infrastructure with few change requests, or a single source to integrate, it makes no sense to add this type of layer between your source and your data marts. However, if your business is constantly evolving and agility is a requirement for your information system, Data Vault can be a good choice for your EDWH design.

Before implementing this kind of architecture, I think that time has to be spent on defining standards and global naming conventions: the split is more agile, but it generates more objects (tables). When there is an evolution, real thought about how to implement it (considering impact factors, business needs and design) has to happen. Data Vault gives you a lot of possibilities, but I believe that in each case one way makes more sense than the others.

Schema (case 6): the Customer hub with one customer satellite with history and one customer satellite without history

7. Source and Links…

- Hans Hultgren: “Modeling the Agile Data Warehouse with Data Vault”

- http://hanshultgren.wordpress.com/

- www.trivadis.com