2009-09-16 / p.1
Seminar data warehousing
Jeroen Klep
September 16th, 2009
Data Vault Introduction
2009-09-16 / p.2
Agenda
• Intro and historical background
• Positioning and concept
• Data Vault example: ‘The DVD Store’
• Demo: Quipu data warehouse management
2009-09-16 / p.3
(known) Dutch DV users
• SNS Bank
• Belastingdienst
• ANWB
• Coram
• Hypothekers Associatie
• Friesland Bank
• ING
• Nutreco
• … and many more…
2009-09-16 / p.4
Intro by Dan Linstedt
Source: YouTube
2009-09-16 / p.5
Viewers
…also acting as a sort of indicator of interest in / application of DV
POSITIONING AND CONCEPT
2009-09-16 / p.7
Modeling for BI: Evolution
Common Data Warehouse modeling techniques
– 3NF: introduced by Codd & Date
– Star schemas & snowflakes: introduced by Kimball
New modeling technique introduced by Dan Linstedt: Data Vault!
Source: Dan Linstedt
2009-09-16 / p.8
Data Vault DW architecture
DW
Semantic integration
Dependent data marts
DW with ‘raw’ data
Quality – Cleansing – Business rules – Aggregation – KPI calculation
DATA VAULT
2009-09-16 / p.9
What is Data Vault?
Definitions:
“The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema.” (Dan Linstedt, www.danlinstedt.com)
“The Data Vault is a data integration architecture; a series of standards, and definitional elements or methods by which information is connected within an RDBMS data store in order to make sense of it.” (Dan Linstedt, www.danlinstedt.com)
2009-09-16 / p.10
Data Vault – Components
A Data Vault data model consists of three types of components:
1. Hub entities: List of unique business keys with their accompanying surrogate (meaningless) keys
2. Link entities: Physical representation of 3NF many-to-many relationships (relationship or transaction between business components / business keys)
3. Satellite entities: Contain hub or link descriptive information (attributes) and history
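The three component types can be sketched with Python's built-in sqlite3 module. This is only an illustration and not part of the original slides: the table and column names (customer_key, customer_bk and so on), the data types and the sample rows are all hypothetical.

```python
import sqlite3

# Illustrative schema only: every name and data type here is an assumption.
DDL = """
CREATE TABLE hub_customer (
    customer_key INTEGER PRIMARY KEY,     -- surrogate (meaningless) key
    customer_bk  TEXT NOT NULL UNIQUE,    -- business key
    load_dts     TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_key INTEGER PRIMARY KEY,
    order_bk  TEXT NOT NULL UNIQUE,
    load_dts  TEXT NOT NULL
);
CREATE TABLE lnk_order_customer (
    lnk_key      INTEGER PRIMARY KEY,
    order_key    INTEGER NOT NULL REFERENCES hub_order(order_key),
    customer_key INTEGER NOT NULL REFERENCES hub_customer(customer_key),
    load_dts     TEXT NOT NULL,
    UNIQUE (order_key, customer_key)      -- relationship between two hubs
);
CREATE TABLE sat_customer (
    customer_key INTEGER NOT NULL REFERENCES hub_customer(customer_key),
    load_dts     TEXT NOT NULL,
    city         TEXT,                    -- descriptive attribute
    PRIMARY KEY (customer_key, load_dts)  -- one row per change: history
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

conn.execute("INSERT INTO hub_customer VALUES (1, 'C-1001', '2009-09-01')")
conn.execute("INSERT INTO hub_order VALUES (1, 'O-5001', '2009-09-02')")
conn.execute("INSERT INTO lnk_order_customer VALUES (1, 1, 1, '2009-09-02')")
# Two satellite rows for the same customer: the city changed over time.
conn.execute("INSERT INTO sat_customer VALUES (1, '2009-09-01', 'Utrecht')")
conn.execute("INSERT INTO sat_customer VALUES (1, '2009-09-10', 'Amsterdam')")

# Latest descriptive attributes per customer business key.
current_city = conn.execute(
    """
    SELECT h.customer_bk, s.city
    FROM hub_customer h
    JOIN sat_customer s ON s.customer_key = h.customer_key
    WHERE s.load_dts = (SELECT MAX(load_dts) FROM sat_customer
                        WHERE customer_key = h.customer_key)
    """
).fetchall()
history_rows = conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0]
```

The hub holds only keys, the link holds only key pairs, and every descriptive change lands as a new satellite row, so the older city value stays queryable.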
DATA VAULT EXAMPLE: DVDSTORE
2009-09-16 / p.12
DVD Store - Description
• Simulation of an online eCommerce site
• Sales/order data of DVDs (products) to customers, including historic sales information and product inventory figures
• The following models will be shown:
  1. OLTP (source model)
  2. ERD diagram (logical model)
  3. Data Vault model
     a. Hubs
     b. Links
     c. Satellites
2009-09-16 / p.13
DVD Store - ERD
2009-09-16 / p.14
DVD Store – Hubs
Legend: Hubs, Satellites, Links
2009-09-16 / p.15
DVD Store – Links
Legend: Hubs, Satellites, Links
2009-09-16 / p.16
DVD Store – Additional satellites
Legend: Hubs, Satellites, Links
2009-09-16 / p.17
DVD Store – All satellites
Legend: Hubs, Satellites, Links
2009-09-16 / p.18
DVD Store – Complete Data Vault
Legend: Hubs, Satellites, Links
2009-09-16 / p.19
DVD Store – Database model
• Hub_Category: Category_ID, Load_Dts, CATEGORY_SID
• Hub_Customer: Customer_ID, Load_Dts, CUSTOMERID_SID
• Hub_Order: Order_ID, Load_Dts, ORDERID_SID
• Hub_Product: Product_ID, Load_Dts, PROD_ID_SID
• Lnk_Cust_Hist: Lnk_Cust_Hist_ID, Customer_ID, Order_ID, Product_ID, Load_Dts
• Lnk_Cust_Hist_Sat: Lnk_Cust_Hist_ID, Load_Dts
• Lnk_Order_Customer: Lnk_Order_Customer_ID, Order_ID, Customer_ID, Load_Dts
• Lnk_Order_Customer_Sat: Lnk_Order_Customer_ID, Load_Dts
• Lnk_Orderlines: Lnk_Orderlines_ID, Order_ID, Product_ID, Load_Dts
• Lnk_Orderlines_Sat: Lnk_Orderlines_ID, QUANTITY, ORDERDATE, Load_Dts
• Lnk_Product_Category: Lnk_Product_Category_ID, Product_ID, Category_ID, Load_Dts
• Lnk_Product_Category_Sat: Lnk_Product_Category_ID, Load_Dts
• Lnk_Product_Common_Product: Lnk_Product_Common_Product_ID, Product_ID, Product_ID_Common, Load_Dts
• Lnk_Product_Common_Product_Sat: Lnk_Product_Common_Product_ID, Load_Dts
• Sat_Category: Category_ID, CATEGORYNAME, Load_Dts
• Sat_Customer: Customer_ID, FIRSTNAME, LASTNAME, ADDRESS1, ADDRESS2, CITY, STATE, ZIP, COUNTRY, REGION, PHONE, CREDITCARDTYPE, CREDITCARD, CREDITCARDEXPI..., USERNAME, PASSWORD
• Sat_Order: Order_ID, ORDERDATE, CUSTOMERID, NETAMOUNT, TAX, TOTALAMOUNT, Load_Dts
• Sat_Product: Product_ID, TITLE, ACTOR, PRICE, SPECIAL, Load_Dts
• Sat_Product_Inventory: Product_ID, QUAN_IN_STOCK, SALES, Load_Dts
• Sat_Product_Reorder: Product_ID, DATE_LOW, QUAN_LOW, DATE_REORDERED, QUAN_REORDERED, DATE_EXPECTED, Load_Dts
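As a rough illustration of how a hub such as Hub_Product might be loaded, the sketch below inserts only business keys that are not yet present, so a repeated load leaves existing rows and their Load_Dts untouched. It assumes that Product_ID is the hub's surrogate key and that PROD_ID_SID carries the source key; the slides do not state which column plays which role. The data types, the function name load_hub_product and the sample values are hypothetical, and INSERT OR IGNORE is SQLite-specific syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: Product_ID is the surrogate key, PROD_ID_SID the source key.
conn.execute(
    """
    CREATE TABLE Hub_Product (
        Product_ID  INTEGER PRIMARY KEY AUTOINCREMENT,
        PROD_ID_SID INTEGER NOT NULL UNIQUE,
        Load_Dts    TEXT NOT NULL
    )
    """
)


def load_hub_product(conn, source_prod_ids, load_dts):
    """Insert source keys that are not in the hub yet; return the new row count."""
    before = conn.execute("SELECT COUNT(*) FROM Hub_Product").fetchone()[0]
    conn.executemany(
        # The UNIQUE constraint makes SQLite skip keys that already exist.
        "INSERT OR IGNORE INTO Hub_Product (PROD_ID_SID, Load_Dts) VALUES (?, ?)",
        [(prod_id, load_dts) for prod_id in source_prod_ids],
    )
    after = conn.execute("SELECT COUNT(*) FROM Hub_Product").fetchone()[0]
    return after - before


# The first load contains a duplicate key; the second overlaps with the first.
first_load = load_hub_product(conn, [1, 2, 2, 3], "2009-09-15")
second_load = load_hub_product(conn, [2, 3, 4], "2009-09-16")
```

Because existing keys are skipped rather than updated, each hub row keeps the Load_Dts of the load in which its key was first seen.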
2009-09-16 / p.21
DVD Store - OLTP
• CATEGORIES: CATEGORY, CATEGORYNAME
• CUST_HIST: CUSTOMERID, ORDERID, PROD_ID
• CUSTOMERS: CUSTOMERID, FIRSTNAME, LASTNAME, ADDRESS1, ADDRESS2, CITY, STATE, ZIP, COUNTRY, REGION, PHONE, CREDITCARDTYPE, CREDITCARD, CREDITCARDEXPIRATION, USERNAME, PASSWORD, AGE, INCOME, GENDER
• INVENTORY: PROD_ID, QUAN_IN_STOCK, SALES
• ORDERLINES: ORDERLINEID, ORDERID, PROD_ID, QUANTITY, ORDERDATE
• PRODUCTS: PROD_ID, CATEGORY, TITLE, ACTOR, PRICE, SPECIAL, COMMON_PROD_ID
• REORDER: PROD_ID, DATE_LOW, QUAN_LOW, DATE_REORDERED, QUAN_REORDERED, DATE_EXPECTED
• ORDERS: ORDERID, ORDERDATE, CUSTOMERID, NETAMOUNT, TAX, TOTALAMOUNT
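Comparing the OLTP PRODUCTS table with the Data Vault model suggests how one source row is spread over several targets: the key goes to Hub_Product, the CATEGORY and COMMON_PROD_ID references become links, and the remaining attributes go to Sat_Product. The sketch below illustrates that split in plain Python. It is a simplification: the function name, the record layout and the sample values are hypothetical, and the link records carry source keys directly where the real model would use looked-up hub keys.

```python
# Descriptive PRODUCTS columns that appear in Sat_Product on the slides.
SAT_PRODUCT_COLUMNS = ("TITLE", "ACTOR", "PRICE", "SPECIAL")


def split_product_row(row, load_dts):
    """Split one OLTP PRODUCTS row into Data Vault-shaped records."""
    prod_id = row["PROD_ID"]
    return {
        # Hub: only the key of the product.
        "Hub_Product": {"PROD_ID_SID": prod_id, "Load_Dts": load_dts},
        # Links: each foreign-key style column becomes a relationship record.
        "Lnk_Product_Category": {
            "product": prod_id,
            "category": row["CATEGORY"],
            "Load_Dts": load_dts,
        },
        "Lnk_Product_Common_Product": {
            "product": prod_id,
            "common_product": row["COMMON_PROD_ID"],
            "Load_Dts": load_dts,
        },
        # Satellite: the descriptive attributes plus the load timestamp.
        "Sat_Product": {
            **{column: row[column] for column in SAT_PRODUCT_COLUMNS},
            "Load_Dts": load_dts,
        },
    }


# Made-up sample row, for illustration only.
source_row = {
    "PROD_ID": 42,
    "CATEGORY": 7,
    "TITLE": "EXAMPLE TITLE",
    "ACTOR": "EXAMPLE ACTOR",
    "PRICE": 9.99,
    "SPECIAL": 0,
    "COMMON_PROD_ID": 17,
}
records = split_product_row(source_row, "2009-09-16")
```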
2009-09-16 / p.22
BI modeling methods
Main BI modeling methods:
• 3NF
• Star schema / Snowflake
• Data Vault
2009-09-16 / p.23
BI architecture pressures and DV
BI Architecture
• Flexibility
• Real-time
• Advanced analysis
• Low cost
• Performance
• Large data volumes
• Auditable
• Integration
• Agility
QUIPU DATA WAREHOUSE MANAGEMENT
2009-09-16 / p.25
What is Quipu?
• Management tool for
  – Creating the DWH
    • Generate DV schemas
    • Generate ETL from source to DV target
    • Generate data marts + ETL
  – And maintaining the DWH
    • Deploy into DTAP
    • Scheduling and monitoring
    • Enhance, split or add models
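To give a feel for what generating DV schemas from source metadata can involve, here is a minimal sketch that derives hub and satellite DDL for one source entity. It is not Quipu's implementation or API: the function generate_hub_and_sat_ddl, its parameters, the naming pattern (borrowed loosely from the DVD Store model) and the TEXT/INTEGER data types are all assumptions made for illustration.

```python
import sqlite3


def generate_hub_and_sat_ddl(entity, source_key, attributes):
    """Return CREATE TABLE statements for a hub and its satellite."""
    hub, sat, key = f"Hub_{entity}", f"Sat_{entity}", f"{entity}_ID"
    hub_ddl = (
        f"CREATE TABLE {hub} (\n"
        f"    {key} INTEGER PRIMARY KEY,\n"
        f"    {source_key}_SID TEXT NOT NULL UNIQUE,\n"
        f"    Load_Dts TEXT NOT NULL\n"
        f");"
    )
    # One nullable column per descriptive attribute of the source entity.
    attribute_lines = "".join(f"    {name} TEXT,\n" for name in attributes)
    sat_ddl = (
        f"CREATE TABLE {sat} (\n"
        f"    {key} INTEGER NOT NULL REFERENCES {hub}({key}),\n"
        f"{attribute_lines}"
        f"    Load_Dts TEXT NOT NULL,\n"
        f"    PRIMARY KEY ({key}, Load_Dts)\n"
        f");"
    )
    return [hub_ddl, sat_ddl]


# Generate DDL for the CATEGORIES source table and check that it is valid
# by executing it against an in-memory SQLite database.
statements = generate_hub_and_sat_ddl("Category", "CATEGORY", ["CATEGORYNAME"])
conn = sqlite3.connect(":memory:")
for statement in statements:
    conn.execute(statement)
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Driving the DDL from metadata rather than writing it by hand is what makes this kind of step repeatable, which is in line with the goal of removing repetitive tasks.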
2009-09-16 / p.26
Our goals with Quipu
• Fast implementation of DV based EDWH
• Reduce risk of modeling errors
• Remove repetitive tasks
• Open source
2009-09-16 / p.27
Key features
• Open source license model
• Data Vault oriented
• Rich repository
• Front-end and back-end separation
• Integration with mainstream ETL software
• Support for multiple source and target DBMS platforms
• DTAP and multi-user/developer support
2009-09-16 / p.28
Basic architecture
2009-09-16 / p.29
Enterprise DW architecture
2009-09-16 / p.30
Roadmap
• Today: Demo version, basic functionality
• Q4-2009: Closed beta version with partners
• Q1-2010: First public beta version
• Q2-2010: Public release version 1.0
• Q4-2010: Version 2.0
QUIPU DEMO
QUESTIONS ?
© 2009 QOSQO BV
All rights reserved. No part of this document may be reproduced without the written permission of QOSQO.