Why Data Vault?
Kent Graziano
Data Vault Master and Oracle ACE
TrueBridge Resources
OOW 2011, Session #28782
My Bio
• Kent Graziano
– Certified Data Vault Master
– Oracle ACE (BI/DW)
– Data Architecture and Data Warehouse Specialist
• 30 years in IT
• 20 years of Oracle-related work
• 15+ years of data warehousing experience
– Co-Author of
• The Business of Data Vault Modeling (2008)
• The Data Model Resource Book (1st Edition)
• Oracle Designer: A Template for Developing an Enterprise Standards Document
– Past-President of Oracle Development Tools User Group (ODTUG) and Rocky Mountain Oracle User Group
– Co-Chair BIDW SIG for ODTUG
Data Vault Definition
The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.
It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses.
Dan Linstedt: Defining the Data Vault
TDAN.com Article
(C) TeachDataVault.com
Where does a Data Vault Fit?
Where does a Data Vault Fit?
(C) Oracle Corp
Oracle’s Next Generation Data Warehouse Reference Architecture
Data Vault goes here
Why Bother With Something New?
Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.'
Why do we need it?
• We have seen issues in constructing (and managing) an enterprise data warehouse model using 3rd normal form, or Star Schema.
– 3NF – Complex PKs with cascading snapshot dates (time-driven PKs)
– Star – difficult to re-engineer fact tables for granularity changes
• These issues lead to breakdowns in flexibility, adaptability, and even scalability
(C) Kent Graziano
Data Vault Time Line
1960 1970 1980 1990 2000
E.F. Codd invented relational modeling
Chris Date and Hugh Darwen Maintained and Refined Modeling
1976 Dr Peter Chen Created E-R Diagramming
Early 70’s Bill Inmon Began Discussing Data Warehousing
Mid 60’s Dimension & Fact Modeling presented by General Mills and Dartmouth University
Mid 70’s AC Nielsen Popularized Dimension & Fact Terms
Mid – Late 80’s Dr Kimball Popularizes Star Schema
Mid 80’s Bill Inmon Popularizes Data Warehousing
Late 80’s – Barry Devlin and Dr Kimball Release “Business Data Warehouse”
1990 – Dan Linstedt Begins R&D on Data Vault Modeling
2000 – Dan Linstedt releases first 5 articles on Data Vault Modeling
Data Vault Modeling…
What Are the Issues?
This is NOT what you want happening to your project!
THE GAP!!
What Are the Foundational Keys?
Flexibility
Scalability
Productivity
Key: Flexibility (Agility)
Enabling rapid change on a massive scale without downstream impacts!
Key: Scalability
Providing no foreseeable barrier to increased size and scope
People, Process, & Architecture!
Key: Productivity
Enabling low complexity systems with high value output at a rapid pace
HOW DOES IT WORK?
Bringing the Data Vault to Your Project
Key: Flexibility (Agility)
• Goes beyond standard 3NF
– Hyper-normalized
– Hubs and Links hold only keys and metadata
– Satellites split by rate of change and/or source
• Enables Agile data modeling
– Easy to add to the model without having to change existing structures and load routines
– Relationships (links) can be dropped and created on demand
– No more reloading history because of a missed requirement
• Based on natural business keys
– Not system surrogate keys
– Allows for integrating data across functions and source systems more easily
– All data relationships are key driven
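The three table types named on this slide can be sketched as plain Python records. This is a minimal illustration only, not a schema from the talk: every table and column name below (HubCustomer, SatCustomerAddress, load_date, record_source, and so on) is a hypothetical example, and keying links and satellites directly on the business key is a simplification.

```python
from dataclasses import dataclass, fields
from datetime import datetime

# Hypothetical metadata columns carried by every table in this sketch.
METADATA = {"load_date", "record_source"}


@dataclass(frozen=True)
class HubCustomer:
    """Hub: one row per natural business key, plus metadata only."""
    customer_number: str      # natural business key (not a system surrogate)
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class HubProduct:
    product_code: str         # natural business key
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class LinkCustomerProduct:
    """Link: a relationship between hubs, holding keys and metadata only."""
    customer_number: str
    product_code: str
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class SatCustomerAddress:
    """Satellite for slowly changing descriptive data about a customer."""
    customer_number: str
    load_date: datetime
    record_source: str
    street: str
    city: str


@dataclass(frozen=True)
class SatCustomerBalance:
    """Satellite for fast-changing data, split out by rate of change."""
    customer_number: str
    load_date: datetime
    record_source: str
    balance_cents: int


def descriptive_fields(table, key_names):
    """Return the columns that are neither keys nor metadata."""
    return [f.name for f in fields(table)
            if f.name not in METADATA and f.name not in key_names]
```

In this sketch, descriptive_fields returns an empty list for the hubs and the link, and only the satellites carry descriptive columns. Adding a new satellite or link is a new class; none of the existing ones change.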
Key: Flexibility (Agility)
Adding new components to the EDW has NEAR ZERO impact on:
• Existing Loading Processes
• Existing Data Model
• Existing Reporting & BI Functions
• Existing Source Systems
• Existing Star Schemas and Data Marts
Split and Merge ON DEMAND!
2 weeks from now
6 months from now
Case In Point:
The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA!
Key: Scalability in Architecture
Scaling is easy; it's based on the following principles:
• Hub and spoke design
• MPP Shared-Nothing Architecture
• Scale Free Networks
• Can be partitioned vertically and horizontally to meet performance demands
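One common way to partition horizontally on a shared-nothing platform is to hash each row's business key to a node. The sketch below illustrates that general idea only; it is not taken from the talk, and the function names, the dict-based row shape, and the choice of MD5 are all assumptions.

```python
import hashlib


def node_for(business_key: str, node_count: int) -> int:
    """Map a business key to a node number in a stable, repeatable way."""
    digest = hashlib.md5(business_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def partition(rows, key_field, node_count):
    """Distribute rows (dicts) across nodes by hashing their business key."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row[key_field], node_count)].append(row)
    return nodes
```

Because hub and satellite rows in this sketch carry the same business key, rows for one key land on the same node, which is what would let a join between them stay local to that node.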
Perhaps You Wish To Split For Performance Reasons?
FROM THIS
TO THIS!
Case In Point:
The scalability result: a Data Vault model that scaled to 3 petabytes in size and is still growing today!
Key: Scalability in Team Size
You should be able to SCALE your TEAM as well!
With the Data Vault methodology, you can:
Scale your team when desired, at different points in the project!
Case In Point: (Dutch Tax Authority)
The scalability result: ETL developers were added for each new source system and reassigned once that system was completely loaded into the Data Vault.
Key: Productivity
Increasing productivity requires a reduction in complexity.
The Data Vault Model simplifies all of the following:
• ETL Loading Routines
• Real-Time Ingestion of Data
• Data Modeling for the EDW
• Enhancing and Adapting for Change to the Model
• Monitoring, Managing, and Optimizing Processes
Key: Productivity
• Standardized modeling rules
• Highly repeatable and learnable modeling technique
• Can standardize load routines
• Delta Driven process
• Re-startable, consistent loading patterns.
• Can standardize extract routines
• Rapid build of new or revised Data Marts
• Can be automated
• RapidACE (www.rapidace.com)
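A delta-driven, re-startable load can be sketched as follows. This is an illustration under assumptions, not the loading pattern from the talk: the satellite is modeled as an in-memory list of dicts, rows are assumed to be appended in load order, and comparing a hash of the attributes (a "hash diff") is one way to detect change that the slides do not mention. All function and field names are hypothetical.

```python
import hashlib
from datetime import datetime


def hash_diff(attributes: dict) -> str:
    """Fingerprint the descriptive attributes in a stable column order."""
    payload = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list, staged: list, key_field: str,
                   load_date: datetime, record_source: str) -> int:
    """Append only new or changed records; return how many were inserted."""
    # Latest fingerprint per business key (assumes rows are in load order).
    latest = {}
    for row in satellite:
        latest[row[key_field]] = row["hash_diff"]

    inserted = 0
    for record in staged:
        key = record[key_field]
        attrs = {k: v for k, v in record.items() if k != key_field}
        digest = hash_diff(attrs)
        if latest.get(key) == digest:
            continue  # unchanged since the last load: nothing to write
        satellite.append({key_field: key, "load_date": load_date,
                          "record_source": record_source,
                          "hash_diff": digest, **attrs})
        latest[key] = digest
        inserted += 1
    return inserted
```

Running the same staged batch twice inserts nothing the second time, so a failed run can simply be restarted. The same routine works for any satellite, which is the sense in which the load pattern can be standardized.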
Key: Productivity
• The Data Vault holds granular historical relationships.
• Holds all history for all time, allowing any source system feeds to be reconstructed on-demand
• Easy generation of Audit Trails for data lineage and compliance.
• Data Mining can discover new relationships between elements
• Patterns of change emerge from the historical pictures and linkages.
• The Data Vault can be accessed by power-users
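Because satellite rows are only ever added, the state of the data at any past moment can be rebuilt from them. The sketch below shows one way to do that, assuming each satellite row is a dict with a business key and a load_date; the function names and row layout are hypothetical.

```python
from datetime import datetime


def as_of(satellite_rows, key_field, point_in_time):
    """Return the latest row per business key loaded at or before a moment."""
    current = {}
    for row in sorted(satellite_rows, key=lambda r: r["load_date"]):
        if row["load_date"] <= point_in_time:
            current[row[key_field]] = row  # later loads overwrite earlier ones
    return current


def audit_trail(satellite_rows, key_field, key):
    """Return every recorded version of one business key, oldest first."""
    matching = (r for r in satellite_rows if r[key_field] == key)
    return sorted(matching, key=lambda r: r["load_date"])
```

Here as_of rebuilds a snapshot for any date, and audit_trail lists every change recorded for one key, which is the raw material for lineage and compliance reporting.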
Case in Point:
The productivity result: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports.
These individuals generated:
• 90% of the ETL code for moving the data set
• 100% of the Staging Data Model
• 75% of the finished EDW data Model
• 75% of the star schema data model
The Competing Bid?
The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (They bid a very complex system.)
Actual total cost? $30k and 2 weeks!
Other Benefits of a Data Vault
• Modeling it as a DV forces integration of the Business Keys upfront.
• Good for organizational alignment.
• An integrated data set with raw data extends its value beyond BI:
• Source for data quality projects
• Source for master data
• Source for data mining
• Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture).
• Upfront Hub integration simplifies the data integration routines required to load data marts.
• Helps divide the work a bit.
• It is much easier to implement security on these granular pieces.
• Granular, re-startable processes enable pin-point failure correction.
• It is designed and optimized for real-time loading in its core architecture (without any tweaks or mods).
Conclusion?
Changing the direction of the river takes less effort than stopping the flow of water
The Experts Say…
“The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon
“The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst
“The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney
More Notables…
“This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner
“[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” Scott Ambler
Who’s Using It?
Growing Adoption…
• The number of Data Vault users in the US surpassed 500 in 2010 and grows rapidly (http://danlinstedt.com/about/dv-customers/)
In Review…
• Data Vault provides you with the tools you need to succeed in your DW/BI projects
• Flexibility
• Enabling rapid change on a massive scale without downstream impacts!
• Scalability
• Providing no foreseeable barrier to increased size and scope
• Productivity
• Enabling low complexity systems with high value output at a rapid pace
Where To Learn More
The Technical Modeling Book: http://LearnDataVault.com
On YouTube: http://www.youtube.com/LearnDataVault
On Facebook: www.facebook.com/learndatavault
Dan’s Blog: www.danlinstedt.com
The Discussion Forums: http://LinkedIn.com – Data Vault Discussions
Worldwide User Group (Free): http://dvusergroup.com
The Business of Data Vault Modeling
by Dan Linstedt, Kent Graziano, Hans Hultgren
(available at www.lulu.com)
10/11/2011 (C) TeachDataVault.com
Contact Information
Kent Graziano
Want more Data Vault?
Session # 05923: Introduction to Data Vault Modeling
Thursday, 4:00 PM, Moscone South Rm 303