Why Data Vault?

Why Data Vault? Kent Graziano Data Vault Master and Oracle ACE TrueBridge Resources OOW 2011 Session #28782

description

Given at Oracle Open World 2011: Not to be confused with Oracle Database Vault (a commercial db security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It has been in use globally for over 10 years now but is not widely known. The purpose of this presentation is to provide an overview of the features of a Data Vault modeled EDW that distinguish it from the more traditional third normal form (3NF) or dimensional (i.e., star schema) modeling approaches used in most shops today. Topics will include dealing with evolving data requirements in an EDW (i.e., model agility), partitioning of data elements based on rate of change (and how that affects load speed and storage requirements), and where it fits in a typical Oracle EDW architecture. See more content like this by following my blog http://kentgraziano.com or follow me on twitter @kentgraziano.

Transcript of Why Data Vault?

Page 1: Why Data Vault?

Why Data Vault?

Kent Graziano

Data Vault Master and Oracle ACE

TrueBridge Resources

OOW 2011

Session #28782

Page 2: Why Data Vault?

My Bio

• Kent Graziano

– Certified Data Vault Master

– Oracle ACE (BI/DW)

– Data Architecture and Data Warehouse Specialist

• 30 years in IT

• 20 years of Oracle-related work

• 15+ years of data warehousing experience

– Co-Author of

• The Business of Data Vault Modeling (2008)

• The Data Model Resource Book (1st Edition)

• Oracle Designer: A Template for Developing an Enterprise Standards Document

– Past-President of Oracle Development Tools User Group (ODTUG) and Rocky Mountain Oracle User Group

– Co-Chair BIDW SIG for ODTUG

Page 3: Why Data Vault?

Data Vault Definition

The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.

It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses.

Dan Linstedt: Defining the Data Vault

TDAN.com Article

(C) TeachDataVault.com

Page 4: Why Data Vault?

Where does a Data Vault Fit?

(C) TeachDataVault.com

Page 5: Why Data Vault?

Where does a Data Vault Fit?

(C) Oracle Corp

Oracle’s Next Generation Data Warehouse Reference Architecture

Data Vault goes here

Page 6: Why Data Vault?

Why Bother With Something New?

Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.'

(C) TeachDataVault.com

Page 7: Why Data Vault?

Why do we need it?

• We have seen issues in constructing (and managing) an enterprise data warehouse model using 3rd normal form, or Star Schema.

– 3NF – Complex PKs with cascading snapshot dates (time-driven PKs)

– Star – difficult to re-engineer fact tables for granularity changes

• These issues lead to breakdowns in flexibility, adaptability, and even scalability

(C) Kent Graziano

Page 8: Why Data Vault?

Data Vault Time Line

1960 1970 1980 1990 2000

E.F. Codd invented relational modeling

Chris Date and Hugh Darwen Maintained and Refined Modeling

1976 Dr Peter Chen Created E-R Diagramming

Early 70’s Bill Inmon Began Discussing Data Warehousing

Mid 60’s Dimension & Fact Modeling presented by General Mills and Dartmouth University

Mid 70’s AC Nielsen Popularized Dimension & Fact Terms

Mid – Late 80’s Dr Kimball Popularizes Star Schema

Mid 80’s Bill Inmon Popularizes Data Warehousing

Late 80’s – Barry Devlin and Dr Kimball Release “Business Data Warehouse”

1990 – Dan Linstedt Begins R&D on Data Vault Modeling

2000 – Dan Linstedt releases first 5 articles on Data Vault Modeling

(C) TeachDataVault.com

Page 9: Why Data Vault?

Data Vault Modeling…

(C) TeachDataVault.com

Page 10: Why Data Vault?

What Are the Issues?

This is NOT what you want happening to your project!

THE GAP!!

(C) TeachDataVault.com

Page 11: Why Data Vault?

What Are the Foundational Keys?

Flexibility

Scalability

Productivity

(C) TeachDataVault.com

Page 12: Why Data Vault?

Key: Flexibility (Agility)

Enabling rapid change on a massive scale without downstream impacts!

(C) TeachDataVault.com

Page 13: Why Data Vault?

Key: Scalability

Providing no foreseeable barrier to increased size and scope

People, Process, & Architecture!

(C) TeachDataVault.com

Page 14: Why Data Vault?

Key: Productivity

Enabling low complexity systems with high value output at a rapid pace

(C) TeachDataVault.com

Page 15: Why Data Vault?

HOW DOES IT WORK?

Bringing the Data Vault to Your Project

(C) TeachDataVault.com

Page 16: Why Data Vault?

Key: Flexibility (Agility)

• Goes beyond standard 3NF

– Hyper normalized

– Hubs and Links hold only keys and metadata

– Satellites split by rate of change and/or source

• Enables Agile data modeling

– Easy to add to the model without having to change existing structures and load routines

– Relationships (links) can be dropped and created on-demand.

– No more reloading history because of a missed requirement

• Based on natural business keys

– Not system surrogate keys

– Allows for integrating data across functions and source systems more easily

– All data relationships are key driven.

(C) TeachDataVault.com
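To make the hub, link, and satellite split on this slide more concrete, here is a minimal sketch using an in-memory SQLite database. Every table and column name in it (hub_customer, sat_customer_profile, load_dts, record_source, and so on) is a hypothetical choice for illustration and does not come from the presentation. Keying the tables directly on the business key is a simplification that follows the "natural business keys" bullet; real implementations may differ.

```python
import sqlite3

# All names below are hypothetical; this is a sketch, not a modeling standard.
DDL = """
-- Hubs: only the business key plus load metadata.
CREATE TABLE hub_customer (
    customer_bk   TEXT PRIMARY KEY,
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_bk    TEXT PRIMARY KEY,
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: only the keys of the hubs it relates, plus load metadata.
CREATE TABLE link_customer_product (
    customer_bk   TEXT NOT NULL REFERENCES hub_customer(customer_bk),
    product_bk    TEXT NOT NULL REFERENCES hub_product(product_bk),
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_bk, product_bk)
);
-- Satellites: descriptive attributes, split by rate of change.
CREATE TABLE sat_customer_profile (   -- slowly changing attributes
    customer_bk   TEXT NOT NULL REFERENCES hub_customer(customer_bk),
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL,
    name          TEXT,
    birth_date    TEXT,
    PRIMARY KEY (customer_bk, load_dts)
);
CREATE TABLE sat_customer_contact (   -- faster changing attributes
    customer_bk   TEXT NOT NULL REFERENCES hub_customer(customer_bk),
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL,
    email         TEXT,
    phone         TEXT,
    PRIMARY KEY (customer_bk, load_dts)
);
"""


def build_vault():
    """Create the sketch schema in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def table_columns(conn, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

Calling table_columns(build_vault(), "hub_customer") lists only the key and metadata columns, which is the point of the hub: descriptive data lives in the satellites.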

Page 17: Why Data Vault?

Key: Flexibility (Agility)

Adding new components to the EDW has NEAR ZERO impact to:

• Existing Loading Processes

• Existing Data Model

• Existing Reporting & BI Functions

• Existing Source Systems

• Existing Star Schemas and Data Marts

(C) TeachDataVault.com
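One way to picture the "near zero impact" claim on this slide: a new requirement is met by adding a table rather than altering one. The sketch below, again with hypothetical table names and an in-memory SQLite database, snapshots the schema, adds a new satellite, and checks that every pre-existing table definition is unchanged. It illustrates only the data model part of the claim, not load routines or reports.

```python
import sqlite3


def schema_snapshot(conn):
    """Return {table_name: CREATE statement} for every table in the database."""
    rows = conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_bk TEXT PRIMARY KEY, load_dts TEXT, record_source TEXT
);
CREATE TABLE sat_customer_profile (
    customer_bk TEXT, load_dts TEXT, record_source TEXT, name TEXT,
    PRIMARY KEY (customer_bk, load_dts)
);
""")
before = schema_snapshot(conn)

# Hypothetical new requirement: loyalty data from a new source system.
# It arrives as a new satellite; no existing table is altered.
conn.execute("""
CREATE TABLE sat_customer_loyalty (
    customer_bk TEXT, load_dts TEXT, record_source TEXT, tier TEXT,
    PRIMARY KEY (customer_bk, load_dts)
)
""")
after = schema_snapshot(conn)

# Every table that existed before still has the identical definition.
unchanged = all(after[name] == sql for name, sql in before.items())
# The only difference is the newly added satellite.
added = sorted(set(after) - set(before))
```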

Page 18: Why Data Vault?

Split and Merge ON DEMAND!

2 weeks from now

6 months from now

(C) TeachDataVault.com

Page 19: Why Data Vault?

Case In Point:

The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA!

(C) TeachDataVault.com

Page 20: Why Data Vault?

Key: Scalability in Architecture

Scaling is easy; it is based on the following principles:

• Hub and spoke design

• MPP Shared-Nothing Architecture

• Scale Free Networks

• Can be partitioned vertically and horizontally to meet performance demands

(C) TeachDataVault.com
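The slide does not say how the partitioning is done, so the following is only a sketch under assumptions. It takes horizontal partitioning to mean routing whole rows to a partition chosen by a stable hash of the business key, and vertical partitioning to mean splitting a wide row into narrower rows that repeat the key (as with satellites). The function names, column names, and the use of SHA-256 are all illustrative choices.

```python
import hashlib


def partition_for(business_key: str, partitions: int) -> int:
    """Pick a partition from a stable hash of the business key."""
    digest = hashlib.sha256(business_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def split_horizontally(rows, key, partitions):
    """Horizontal split: each whole row is routed to exactly one partition."""
    buckets = [[] for _ in range(partitions)]
    for row in rows:
        buckets[partition_for(row[key], partitions)].append(row)
    return buckets


def split_vertically(row, key, column_groups):
    """Vertical split: one wide row becomes several narrow rows sharing the key."""
    return {
        name: {key: row[key], **{col: row[col] for col in columns}}
        for name, columns in column_groups.items()
    }


# Hypothetical example row and satellite names.
wide_row = {"customer_bk": "C-1001", "name": "Ada", "email": "ada@example.com"}
narrow = split_vertically(
    wide_row,
    "customer_bk",
    {"sat_customer_profile": ["name"], "sat_customer_contact": ["email"]},
)
```

Because the partition is derived only from the business key, each partition can be loaded without coordinating with the others, which is the property a shared-nothing design relies on.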

Page 21: Why Data Vault?

Perhaps You Wish To Split For Performance Reasons?

FROM THIS

TO THIS!

(C) TeachDataVault.com

Page 22: Why Data Vault?

Case In Point:

Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today!

(C) TeachDataVault.com

Page 23: Why Data Vault?

Key: Scalability in Team Size

You should be able to SCALE your TEAM as well!

With the Data Vault methodology, you can:

Scale your team when desired, at different points in the project!

(C) TeachDataVault.com

Page 24: Why Data Vault?

Case In Point: (Dutch Tax Authority)

Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault

(C) TeachDataVault.com

Page 25: Why Data Vault?

Key: Productivity

Increasing Productivity requires a reduction in complexity.

The Data Vault Model simplifies all of the following:

• ETL Loading Routines

• Real-Time Ingestion of Data

• Data Modeling for the EDW

• Enhancing and Adapting for Change to the Model

• Monitoring, managing, and optimizing processes

(C) TeachDataVault.com

Page 26: Why Data Vault?

Key: Productivity

• Standardized modeling rules

• Highly repeatable and learnable modeling technique

• Can standardize load routines

• Delta Driven process

• Re-startable, consistent loading patterns.

• Can standardize extract routines

• Rapid build of new or revised Data Marts

• Can be automated

– RapidACE (www.rapidace.com)

(C) Kent Graziano
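The "delta driven" and "re-startable" bullets on this slide can be sketched as a satellite load that appends a row only when the incoming attributes differ from the latest stored version. Comparing a hash of the attributes is one common way to detect the change, but it is an assumption here; the slides do not name a mechanism. The record layout, the function names, and the in-memory list standing in for a satellite table are all hypothetical.

```python
import hashlib


def hash_diff(attributes: dict) -> str:
    """Fingerprint a set of attributes, independent of key order."""
    payload = "|".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list, staged: list, load_dts: str) -> int:
    """Append new or changed records to the satellite; return how many were added.

    Assumes `satellite` is kept in load order, so the last row seen for a
    business key is its current version.
    """
    latest = {row["bk"]: row["hash_diff"] for row in satellite}
    inserted = 0
    for record in staged:
        digest = hash_diff(record["attributes"])
        if latest.get(record["bk"]) == digest:
            continue  # no change since the last load: skip (this is the delta check)
        satellite.append({
            "bk": record["bk"],
            "load_dts": load_dts,
            "hash_diff": digest,
            "attributes": dict(record["attributes"]),
        })
        latest[record["bk"]] = digest
        inserted += 1
    return inserted
```

Because unchanged records are skipped, running the same staged batch a second time (for example after a failed job is restarted) adds nothing, so the load can be repeated safely.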

Page 27: Why Data Vault?

Key: Productivity

• The Data Vault holds granular historical relationships.

• Holds all history for all time, allowing any source system feeds to be reconstructed on-demand

• Easy generation of Audit Trails for data lineage and compliance.

• Data Mining can discover new relationships between elements

• Patterns of change emerge from the historical pictures and linkages.

• The Data Vault can be accessed by power-users

(C) Kent Graziano
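Since a satellite keeps every version of a record with its load timestamp, the state of a record at any past moment can be looked up rather than reloaded. The sketch below shows that idea for a single business key. The row layout and the as_of function are hypothetical, and timestamps are assumed to be ISO 8601 strings so that plain string comparison orders them correctly.

```python
def as_of(satellite_rows, business_key, as_of_dts):
    """Return the attributes in effect for business_key at as_of_dts, or None.

    Assumes load_dts values are ISO 8601 strings, which sort chronologically.
    """
    candidates = [
        row for row in satellite_rows
        if row["bk"] == business_key and row["load_dts"] <= as_of_dts
    ]
    if not candidates:
        return None  # the key had not been loaded yet at that time
    return max(candidates, key=lambda row: row["load_dts"])["attributes"]


# Hypothetical history for one customer: three versions over time.
history = [
    {"bk": "C1", "load_dts": "2011-01-10", "attributes": {"city": "Denver"}},
    {"bk": "C1", "load_dts": "2011-06-01", "attributes": {"city": "Houston"}},
    {"bk": "C1", "load_dts": "2011-09-15", "attributes": {"city": "Austin"}},
]
```

For example, as_of(history, "C1", "2011-07-04") picks the version loaded on 2011-06-01, because it is the latest one on or before the requested date.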

Page 28: Why Data Vault?

Case in Point:

Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports.

These individuals generated:

• 90% of the ETL code for moving the data set

• 100% of the Staging Data Model

• 75% of the finished EDW data Model

• 75% of the star schema data model

(C) TeachDataVault.com

Page 29: Why Data Vault?

The Competing Bid?

The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (They bid a very complex system.)

Actual total cost? $30k and 2 weeks!

(C) TeachDataVault.com

Page 30: Why Data Vault?

Other Benefits of a Data Vault

• Modeling it as a DV forces integration of the Business Keys upfront.

• Good for organizational alignment.

• An integrated data set with raw data extends its value beyond BI:

– Source for data quality projects

– Source for master data

– Source for data mining

– Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture).

• Upfront Hub integration simplifies the data integration routines required to load data marts.

– Helps divide the work a bit.

• It is much easier to implement security on these granular pieces.

• Granular, re-startable processes enable pin-point failure correction.

• It is designed and optimized for real-time loading in its core architecture (without any tweaks or mods).

(C) Kent Graziano

Page 31: Why Data Vault?

Conclusion?

Changing the direction of the river takes less effort than stopping the flow of water

(C) TeachDataVault.com

Page 32: Why Data Vault?

The Experts Say…

“The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon

“The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst

“The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney

Page 33: Why Data Vault?

More Notables…

“This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner

“[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” Scott Ambler

Page 34: Why Data Vault?

Who’s Using It?

Page 35: Why Data Vault?

Growing Adoption…

• The number of Data Vault users in the US surpassed 500 in 2010 and grows rapidly (http://danlinstedt.com/about/dv-customers/)

(C) Kent Graziano

Page 36: Why Data Vault?

In Review…

• Data Vault provides you with the tools you need to succeed in your DW/BI projects

• Flexibility

• Enabling rapid change on a massive scale without downstream impacts!

• Scalability

• Providing no foreseeable barrier to increased size and scope

• Productivity

• Enabling low complexity systems with high value output at a rapid pace

(C) TeachDataVault.com

Page 37: Why Data Vault?

(C) TeachDataVault.com

Page 38: Why Data Vault?

Where To Learn More

The Technical Modeling Book: http://LearnDataVault.com

On YouTube: http://www.youtube.com/LearnDataVault

On Facebook: www.facebook.com/learndatavault

Dan’s Blog: www.danlinstedt.com

The Discussion Forums: http://LinkedIn.com – Data Vault Discussions

World wide User Group (Free): http://dvusergroup.com

The Business of Data Vault Modeling

by Dan Linstedt, Kent Graziano, Hans Hultgren

(available at www.lulu.com )


Page 39: Why Data Vault?

10/11/2011 (C) TeachDataVault.com

Page 40: Why Data Vault?

Contact Information

Kent Graziano

[email protected]

Want more Data Vault?

Session # 05923: Introduction to Data Vault Modeling

Thursday, 4:00 PM, Moscone South Rm 303