Data Warehouse Modeling Data...

Seminar data warehousing

Jeroen Klep

September 16th, 2009

Data Vault Introduction

Agenda

• Intro and historical background

• Positioning and concept

• DataVault example: ‘The DVD Store’

• Demo: Quipu data warehouse management

(known) Dutch DV users

• SNS Bank
• Belastingdienst
• ANWB
• Coram
• Hypothekers Associatie
• Friesland Bank
• ING
• Nutreco
• … and many more…

Intro by Dan Linstedt

Source: YouTube

Viewers

…also acting as a sort of indicator of interest in / application of DV

POSITIONING AND CONCEPT

Modeling for BI: Evolution

Common Data Warehouse modeling techniques

– 3NF: introduced by Codd & Date
– Star schemas & snowflakes: introduced by Kimball

New modeling technique introduced by Dan Linstedt: Data Vault!

Source: Dan Linstedt

Data Vault DW architecture

DW

Semantic integration

Dependent data marts

DW with ‘raw’ data

Quality – Cleansing – Business rules – Aggregation – KPI calculation

DATA VAULT

What is Data Vault?

Definitions:

“The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema.” (Dan Linstedt, www.danlinstedt.com)

“The Data Vault is a data integration architecture; a series of standards, and definitional elements or methods by which information is connected within an RDBMS data store in order to make sense of it.” (Dan Linstedt, www.danlinstedt.com)

Data Vault – Components

A Data Vault data model consists of three types of components:

1. Hub entities: List of unique business keys with their accompanying surrogate (meaningless) keys

2. Link entities: Physical representation of 3NF many-to-many relationships (relationship or transaction between business components / business keys)

3. Satellite entities: Contain hub or link descriptive information (attributes) and history
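
The three component types can be sketched as tables in an in-memory SQLite database. This is a minimal illustration rather than the presenter's actual schema: table and column names are borrowed from the DVD Store example in this deck, while the data types, constraints, the reading of the `_SID` columns as source business keys, and the sample values are all assumptions.

```python
import sqlite3

# Hypothetical DDL: names follow the DVD Store slides; types and constraints are assumed.
DDL = """
CREATE TABLE Hub_Customer (
    Customer_ID    INTEGER PRIMARY KEY,      -- surrogate (meaningless) key
    CUSTOMERID_SID INTEGER NOT NULL UNIQUE,  -- business key (assumed: source CUSTOMERID)
    Load_Dts       TEXT NOT NULL
);
CREATE TABLE Hub_Order (
    Order_ID    INTEGER PRIMARY KEY,
    ORDERID_SID INTEGER NOT NULL UNIQUE,
    Load_Dts    TEXT NOT NULL
);
-- Link: relationship between two hubs, stored as its own table
CREATE TABLE Lnk_Order_Customer (
    Lnk_Order_Customer_ID INTEGER PRIMARY KEY,
    Order_ID    INTEGER NOT NULL REFERENCES Hub_Order(Order_ID),
    Customer_ID INTEGER NOT NULL REFERENCES Hub_Customer(Customer_ID),
    Load_Dts    TEXT NOT NULL,
    UNIQUE (Order_ID, Customer_ID)
);
-- Satellite: descriptive attributes of a hub, one row per load timestamp
CREATE TABLE Sat_Customer (
    Customer_ID INTEGER NOT NULL REFERENCES Hub_Customer(Customer_ID),
    Load_Dts    TEXT NOT NULL,
    FIRSTNAME   TEXT,
    LASTNAME    TEXT,
    PRIMARY KEY (Customer_ID, Load_Dts)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
cur = conn.cursor()

# Invented sample data: one customer, one order, the link between them.
cur.execute("INSERT INTO Hub_Customer (CUSTOMERID_SID, Load_Dts) VALUES (1001, '2009-09-16')")
customer_key = cur.lastrowid
cur.execute("INSERT INTO Hub_Order (ORDERID_SID, Load_Dts) VALUES (5, '2009-09-16')")
order_key = cur.lastrowid
cur.execute(
    "INSERT INTO Lnk_Order_Customer (Order_ID, Customer_ID, Load_Dts) VALUES (?, ?, '2009-09-16')",
    (order_key, customer_key),
)
cur.execute(
    "INSERT INTO Sat_Customer (Customer_ID, Load_Dts, FIRSTNAME, LASTNAME) "
    "VALUES (?, '2009-09-16', 'Ann', 'Example')",
    (customer_key,),
)

# Walk from the link to both hubs and the customer satellite.
rows = cur.execute("""
    SELECT h.CUSTOMERID_SID, s.LASTNAME, o.ORDERID_SID
    FROM Lnk_Order_Customer AS l
    JOIN Hub_Customer AS h ON h.Customer_ID = l.Customer_ID
    JOIN Hub_Order    AS o ON o.Order_ID    = l.Order_ID
    JOIN Sat_Customer AS s ON s.Customer_ID = h.Customer_ID
""").fetchall()
print(rows)
```

Hubs carry only keys, links carry only references to hubs, and everything descriptive sits in satellites, so a new attribute or relationship can be added as a new table without altering existing ones.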

DATA VAULT EXAMPLE: DVD STORE

DVD Store - Description

• Simulation of an online eCommerce site
• Sales/order data of DVDs (products) to customers, including historic sales information and product inventory figures
• The following models will be shown:
1. OLTP (source model)
2. ERD (logical model)
3. Data Vault model
a. Hubs
b. Links
c. Satellites

DVD Store - ERD

DVD Store – Hubs

Hubs

Satellites

Links

DVD Store – Links

Hubs

Satellites

Links

DVD Store – Additional satellites

Hubs

Satellites

Links

DVD Store – All satellites

Hubs

Satellites

Links

DVD Store – Complete Data Vault

Hubs

Satellites

Links

DVD Store – Database model

• Hub_Category: Category_ID, Load_Dts, CATEGORY_SID
• Hub_Customer: Customer_ID, Load_Dts, CUSTOMERID_SID
• Hub_Order: Order_ID, Load_Dts, ORDERID_SID
• Hub_Product: Product_ID, Load_Dts, PROD_ID_SID
• Lnk_Cust_Hist: Lnk_Cust_Hist_ID, Customer_ID, Order_ID, Product_ID, Load_Dts
• Lnk_Cust_Hist_Sat: Lnk_Cust_Hist_ID, Load_Dts
• Lnk_Order_Customer: Lnk_Order_Customer_ID, Order_ID, Customer_ID, Load_Dts
• Lnk_Order_Customer_Sat: Lnk_Order_Customer_ID, Load_Dts
• Lnk_Orderlines: Lnk_Orderlines_ID, Order_ID, Product_ID, Load_Dts
• Lnk_Orderlines_Sat: Lnk_Orderlines_ID, QUANTITY, ORDERDATE, Load_Dts
• Lnk_Product_Category: Lnk_Product_Category_ID, Product_ID, Category_ID, Load_Dts
• Lnk_Product_Category_Sat: Lnk_Product_Category_ID, Load_Dts
• Lnk_Product_Common_Product: Lnk_Product_Common_Product_ID, Product_ID, Product_ID_Common, Load_Dts
• Lnk_Product_Common_Product_Sat: Lnk_Product_Common_Product_ID, Load_Dts
• Sat_Category: Category_ID, CATEGORYNAME, Load_Dts
• Sat_Customer: Customer_ID, FIRSTNAME, LASTNAME, ADDRESS1, ADDRESS2, CITY, STATE, ZIP, COUNTRY, REGION, EMAIL, PHONE, CREDITCARDTYPE, CREDITCARD, CREDITCARDEXPI..., USERNAME, PASSWORD
• Sat_Order: Order_ID, ORDERDATE, CUSTOMERID, NETAMOUNT, TAX, TOTALAMOUNT, Load_Dts
• Sat_Product: Product_ID, TITLE, ACTOR, PRICE, SPECIAL, Load_Dts
• Sat_Product_Inventory: Product_ID, QUAN_IN_STOCK, SALES, Load_Dts
• Sat_Product_Reorder: Product_ID, DATE_LOW, QUAN_LOW, DATE_REORDERED, QUAN_REORDERED, DATE_EXPECTED, Load_Dts
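
Every satellite in the model carries a Load_Dts column, which is what lets it keep history. One common way to use it is an insert-only load: a new row is appended only when the attributes change, and the current state is the row with the latest Load_Dts. The sketch below shows that idea for a reduced Sat_Product. The slides do not describe the load logic, so the change-detection rule, the column types, the helper names `load_sat_product` and `current_product`, and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced Sat_Product (TITLE and PRICE only); types and primary key are assumed.
conn.execute("""
    CREATE TABLE Sat_Product (
        Product_ID INTEGER NOT NULL,
        Load_Dts   TEXT NOT NULL,   -- ISO date strings sort chronologically
        TITLE      TEXT,
        PRICE      REAL,
        PRIMARY KEY (Product_ID, Load_Dts)
    )
""")


def current_product(conn, product_id):
    """Return (TITLE, PRICE) of the most recently loaded row, or None."""
    return conn.execute(
        "SELECT TITLE, PRICE FROM Sat_Product "
        "WHERE Product_ID = ? ORDER BY Load_Dts DESC LIMIT 1",
        (product_id,),
    ).fetchone()


def load_sat_product(conn, product_id, title, price, load_dts):
    """Append a satellite row only if the attributes differ from the current row.

    Existing rows are never updated or deleted, so history is preserved.
    Returns True when a row was inserted.
    """
    if current_product(conn, product_id) == (title, price):
        return False
    conn.execute(
        "INSERT INTO Sat_Product (Product_ID, Load_Dts, TITLE, PRICE) VALUES (?, ?, ?, ?)",
        (product_id, load_dts, title, price),
    )
    return True


# Three loads of the same product: initial, unchanged, price change.
results = [
    load_sat_product(conn, 1, "Example Title", 9.99, "2009-09-01"),
    load_sat_product(conn, 1, "Example Title", 9.99, "2009-09-02"),
    load_sat_product(conn, 1, "Example Title", 7.99, "2009-09-03"),
]
history = conn.execute(
    "SELECT Load_Dts, PRICE FROM Sat_Product WHERE Product_ID = 1 ORDER BY Load_Dts"
).fetchall()
print(results)
print(history)
print(current_product(conn, 1))
```

The unchanged second load adds nothing, so the satellite holds exactly one row per distinct version of the product.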

DVD Store - OLTP

• CATEGORIES: CATEGORY, CATEGORYNAME
• CUST_HIST: CUSTOMERID, ORDERID, PROD_ID
• CUSTOMERS: CUSTOMERID, FIRSTNAME, LASTNAME, ADDRESS1, ADDRESS2, CITY, STATE, ZIP, COUNTRY, REGION, EMAIL, PHONE, CREDITCARDTYPE, CREDITCARD, CREDITCARDEXPIRATION, USERNAME, PASSWORD, AGE, INCOME, GENDER
• INVENTORY: PROD_ID, QUAN_IN_STOCK, SALES
• ORDERLINES: ORDERLINEID, ORDERID, PROD_ID, QUANTITY, ORDERDATE
• PRODUCTS: PROD_ID, CATEGORY, TITLE, ACTOR, PRICE, SPECIAL, COMMON_PROD_ID
• REORDER: PROD_ID, DATE_LOW, QUAN_LOW, DATE_REORDERED, QUAN_REORDERED, DATE_EXPECTED
• ORDERS: ORDERID, ORDERDATE, CUSTOMERID, NETAMOUNT, TAX, TOTALAMOUNT
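
Comparing the OLTP PRODUCTS table with the Data Vault model suggests how one source row is taken apart: the key goes to Hub_Product, the descriptive columns go to Sat_Product, and the two foreign-key-like columns (CATEGORY and COMMON_PROD_ID) become rows in Lnk_Product_Category and Lnk_Product_Common_Product. The following sketch illustrates that split on a plain dictionary. The function name `split_product_row` and the sample values are hypothetical, and a real load would also translate the business keys into the hubs' surrogate keys, which is omitted here.

```python
# Columns of PRODUCTS that the slides place in Sat_Product.
SAT_PRODUCT_COLUMNS = ("TITLE", "ACTOR", "PRICE", "SPECIAL")


def split_product_row(row):
    """Split one OLTP PRODUCTS row into hub, satellite and link parts.

    Keys stay as source business keys; surrogate key lookup is left out.
    """
    hub = {"PROD_ID_SID": row["PROD_ID"]}
    sat = {column: row[column] for column in SAT_PRODUCT_COLUMNS}
    links = {
        # (product business key, related business key)
        "Lnk_Product_Category": (row["PROD_ID"], row["CATEGORY"]),
        "Lnk_Product_Common_Product": (row["PROD_ID"], row["COMMON_PROD_ID"]),
    }
    return hub, sat, links


# Invented sample row using the PRODUCTS column names from the slide.
sample_row = {
    "PROD_ID": 1,
    "CATEGORY": 14,
    "TITLE": "Example Title",
    "ACTOR": "Example Actor",
    "PRICE": 25.99,
    "SPECIAL": 0,
    "COMMON_PROD_ID": 2,
}

hub, sat, links = split_product_row(sample_row)
print(hub)
print(sat)
print(links)
```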

BI modeling methods

Main BI modeling methods:

• 3NF

• Star schema / Snowflake

• Data Vault

BI architecture pressures and DV

BI Architecture

• Flexibility
• Real-time
• Advanced analysis
• Low cost
• Performance
• Large data volumes
• Auditable
• Integration
• Agility

QUIPU DATA WAREHOUSE MANAGEMENT

What is Quipu?

• Management tool for
– Creating the DWH
  • Generate DV schemas
  • Generate ETL from source to DV target
  • Generate data marts + ETL
– And maintaining the DWH
  • Deploy into DTAP
  • Scheduling and monitoring
  • Enhance, split or add models

Our goals with Quipu

• Fast implementation of DV based EDWH

• Reduce risk of modeling errors

• Remove repetitive tasks

• Open source

Key features

• Open source license model

• DataVault oriented

• Rich repository

• Front-end and back-end separation

• Integration with mainstream ETL software

• Support for multiple source and target DBMS platforms

• DTAP and multi-user/developer support

Basic architecture

Enterprise DW architecture

Roadmap

Today Demo version, basic functionality

Q4-2009 Closed beta version with partners

Q1-2010 First public beta version

Q2-2010 Public release version 1.0

Q4-2010 Version 2.0

QUIPU DEMO

QUESTIONS ?

© 2009 QOSQO BV. All rights reserved. No part of this document may be reproduced without the written permission of QOSQO.