Data Vault Modeling & Methodology - 1105 Media: Home...

Data Vault Modeling & Methodology Technical Side and Introduction © Dan Linstedt, 2010, http://DanLinstedt.com


Page 1: Data Vault Modeling & Methodology - 1105 Media: Home ...download.101com.com/pub/tdwi/files/DV_Presentation_TDWI Boston_Data... · Staging Area EDW – Data Vault Stage Load STAGING

Data Vault Modeling & Methodology

Technical Side and Introduction © Dan Linstedt, 2010, http://DanLinstedt.com


Technical Definition

The Data Vault is a detail-oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise.

Architected specifically to meet the needs of today’s enterprise data warehouses

5/28/2010 2http://empoweredHoldings.com


What Does One Look Like?

[Diagram: three Hubs (Customer, Product, Order), each with its own Satellites (Sat), joined by a single Link. The Link has its own Satellite, which records a history of the interaction. Hubs and the Link are marked F(x) in the figure.]

Elements:
• Hub = List of Unique Business Keys
• Link = List of Relationships, Associations
• Satellites = Descriptive Data


Excel As A Source…

[Diagram: User Grouping Structures from a spreadsheet (Level A, Level B, Level C, Items) become a Flattened Structure in a Staging Table, and are then held as Raw Source Data in the DV: Hub Account, Link Acct To Group, Hub Grouping, Sat Group Type, and a Hierarchical Link of Groups.]

Do you have a power executive who is technically inclined, who runs the business off a rogue spreadsheet?


CORE ARCHITECTURE

Data Vault Basic Elements


Data Vault Core Architecture
• Hubs, Links, Satellites
• Hubs = Unique List of Business Keys
• Links = Unique List of Relationships across keys
• Satellites = Descriptive Data

• Satellites have one and only one parent table
• Satellites cannot be “Parents” to other tables
• Hubs cannot be child tables

• Last Seen Dates, Load Dates, Record Sources, and Surrogate keys are not part of the core architecture. They exist to help models and key migration.


Hub Entity

A Hub is a list of unique business keys

Hub Structure:
• Primary Key
• <Business Key>
• Load DTS
• Last Seen DTS
• Record Source

Example: Hub Product
• Product Sequence ID
• Product Number
• Product Load DTS
• Product Last Seen DTS
• Prod Record Source

Unique Index (Primary Index)

• A Hub’s business key is a unique index
• A Hub’s load date represents the FIRST TIME the EDW saw the data
• A Hub’s record source represents: first, the “Master” data source (on collisions); if not available, it holds the origination source of the actual key
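A minimal sketch of the Hub Product structure, assuming SQLite through Python's standard sqlite3 module. The snake_case column names, the sample key "P-100", and the "ERP" and "CRM" record sources are illustrative assumptions, not part of the original deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hub_product (
        product_seq_id        INTEGER PRIMARY KEY,   -- surrogate key
        product_number        TEXT NOT NULL UNIQUE,  -- business key, unique index
        product_load_dts      TEXT NOT NULL,         -- first time the EDW saw the key
        product_last_seen_dts TEXT NOT NULL,
        prod_record_source    TEXT NOT NULL
    )
""")

INSERT_SQL = (
    "INSERT INTO hub_product "
    "(product_number, product_load_dts, product_last_seen_dts, prod_record_source) "
    "VALUES (?, ?, ?, ?)"
)
conn.execute(INSERT_SQL, ("P-100", "2010-05-28", "2010-05-28", "ERP"))

# The same business key arriving again (here from a second, hypothetical source)
# is rejected by the unique index, so the first load date and source survive.
try:
    conn.execute(INSERT_SQL, ("P-100", "2010-06-01", "2010-06-01", "CRM"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

hub_rows = conn.execute(
    "SELECT product_number, product_load_dts, prod_record_source FROM hub_product"
).fetchall()
```

The unique index is what keeps the Hub a list of unique business keys: the second arrival never overwrites the original load date or record source.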


Link Entity

A Link is an intersection of two or more business keys. It can contain Hub keys and other Link keys.

Link Structure:
• Primary Key
• {Hub/Lnk Surrogate Keys 2..N}
• Load DTS
• Last Seen DTS
• Record Source

Example: Link Line-Item
• Link Line Item Sequence ID
• Hub Product Sequence ID
• Hub Order Sequence ID
• **Line Item Number
• Load DTS
• Last Seen DTS
• Record Source

Unique Index (Primary Index)

• A Link’s business key is a composite unique index
• A Link may or may not have a “**Item Numbering” attribute
• A Link’s load date represents the FIRST TIME the EDW saw the data
• A Link’s record source represents: first, the “Master” data source (on collisions); if not available, it holds the origination source of the actual key
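A sketch of the Link Line-Item structure under the same assumptions (SQLite via Python's sqlite3, illustrative snake_case names). Treating the optional line item number as part of the composite unique index is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE link_line_item (
        line_item_seq_id   INTEGER PRIMARY KEY,  -- surrogate key
        hub_product_seq_id INTEGER NOT NULL,     -- surrogate key of Hub Product
        hub_order_seq_id   INTEGER NOT NULL,     -- surrogate key of Hub Order
        line_item_number   INTEGER NOT NULL,     -- the optional "**" attribute
        load_dts           TEXT NOT NULL,
        last_seen_dts      TEXT NOT NULL,
        record_source      TEXT NOT NULL,
        UNIQUE (hub_product_seq_id, hub_order_seq_id, line_item_number)
    )
""")

staged = [(1, 10, 1), (1, 10, 2), (1, 10, 1)]  # the last row repeats the first
for product_id, order_id, line_no in staged:
    # INSERT OR IGNORE skips rows that would violate the composite unique index.
    conn.execute(
        "INSERT OR IGNORE INTO link_line_item "
        "(hub_product_seq_id, hub_order_seq_id, line_item_number, "
        " load_dts, last_seen_dts, record_source) "
        "VALUES (?, ?, ?, '2010-05-28', '2010-05-28', 'ORDERS')",
        (product_id, order_id, line_no),
    )

link_count = conn.execute("SELECT COUNT(*) FROM link_line_item").fetchone()[0]
```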


Satellite Entity

A Satellite is a time-dimensional table housing detailed information about the Hub’s or Link’s business keys

Satellite Structure:
• Primary Key
• Load DTS
• Extract DTS
• Detail Business Data
• {Update User}
• {Update DTS}
• Record Source
• **Load End Date

Example (Customer):
• Customer #
• Load DTS
• Extract DTS
• Customer Name
• Customer Addr1
• Customer Addr2
• {Update User}
• {Update DTS}
• Record Source
• **Load End Date

Unique Index (Primary Index)

• Satellites are defined by TYPE of data and RATE OF CHANGE
• Mathematically, this reduces redundancy and decreases storage requirements over time (compared to a star schema)
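A sketch of how a Satellite keeps history, again assuming SQLite via Python's sqlite3. The primary key of parent key plus load date, the column names, and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sat_customer (
        customer_seq_id INTEGER NOT NULL,  -- key of the parent Hub row
        load_dts        TEXT NOT NULL,
        load_end_dts    TEXT,              -- NULL marks the current row
        customer_name   TEXT,
        customer_addr1  TEXT,
        record_source   TEXT NOT NULL,
        PRIMARY KEY (customer_seq_id, load_dts)
    )
""")
conn.executemany(
    "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2010-01-01", "2010-03-01", "Acme", "1 Main St", "CRM"),
        (1, "2010-03-01", None, "Acme", "9 Oak Ave", "CRM"),
    ],
)

# Current view: the row that has not been end-dated.
current = conn.execute(
    "SELECT customer_addr1 FROM sat_customer "
    "WHERE customer_seq_id = 1 AND load_end_dts IS NULL"
).fetchone()

# Historical view: the row that was in effect on a given date.
as_of = conn.execute(
    "SELECT customer_addr1 FROM sat_customer "
    "WHERE customer_seq_id = 1 AND load_dts <= ? "
    "AND (load_end_dts IS NULL OR load_end_dts > ?)",
    ("2010-02-15", "2010-02-15"),
).fetchone()
```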


THINKING OF BREAKING RULES…

Rules and Standards GOVERN your deployment…


Some Rules For You
• NO Foreign Keys in the Satellites!
• NO Hub to Hub (Parent Child relationships)
• NO Enforcement of relationships in the data model…
• NO Date Time attributes in HUB or LINK Primary Keys…

• Why??
  – It breaks flexibility
  – It breaks auditability / accountability
  – It breaks Scalability
  – It breaks Performance
  – It introduces “Decisions” in the architecture, which breaks Patterns!

Up Next: Links and the Unit Of Work…


Business Key Definitions…


• “The contracts system is responsible for creating customer account numbers. The EDW will never see other systems creating customer account numbers.” (Requirement #101)

Sales is clearly creating customer numbers. How do we detect the issue and alert the business?

Point: Not all business keys are created EQUAL!
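One way the detection question could be approached is sketched below: because a Hub keeps a record source for every key, keys that only ever arrive from a non-master system can be listed and reported. The source names, account numbers, and the find_rogue_keys helper are hypothetical.

```python
MASTER_SOURCE = "CONTRACTS"  # system that Requirement #101 says creates the keys

# (business key, record source) pairs as they arrive in staging; sample data only.
staged_keys = [
    ("ACCT-001", "CONTRACTS"),
    ("ACCT-002", "CONTRACTS"),
    ("ACCT-002", "SALES"),   # also known to the master system: no issue
    ("ACCT-900", "SALES"),   # seen only from a non-master source
]

def find_rogue_keys(rows, master):
    """Return (key, source) pairs for keys the master system never supplied."""
    master_keys = {key for key, source in rows if source == master}
    return sorted(
        {(key, source) for key, source in rows
         if source != master and key not in master_keys}
    )

rogue = find_rogue_keys(staged_keys, MASTER_SOURCE)
```

The resulting list is something the business can be alerted with; the warehouse still loads every key.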


Link: Unit of Work

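[Diagram: Hub Product, Hub Category, and Hub Supplier are joined by one Link, Product by Supplier by Category, which is the Unit Of Work. Link Line Item also appears in the figure. Two further Links, Link Product by Category (Link Prod-Cat) and Link Product by Supplier (Link Prod-Supp), each carry a Sat Effectivity.]

These links (Product by Category, Product by Supplier) are optional, used for exploration only.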


What Happens When: We Break the Unit of Work

Source System UOW:

Product_ID  Category_ID  Supplier_ID
222         12           96
222         12           93
729         15           87
222         17           93

Model normalization splits the source into two Links.

Link Product by Category:

Product_ID  Category_ID
222         12
222         17
729         15

Link Product by Supplier:

Product_ID  Supplier_ID
222         96
222         93
729         87

Question: After normalizing, how can you reconstruct the source image EXACTLY as it stands?


What Happens When: Trying to Rebuild from Two Links

Link Product by Category:

Product_ID  Category_ID
222         12
222         17
729         15

Link Product by Supplier:

Product_ID  Supplier_ID
222         96
222         93
729         87

Result of re-joining the two Links on Product_ID:

Product_ID  Category_ID  Supplier_ID
222         12           96
222         12           93
222         17           96
222         17           93
729         15           87

Re-joining the data creates a record (222, 17, 96) that does not exist in the original source system. This is the same problem that BI engines will have when putting together Data Mart results.
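The re-join problem can be reproduced in a few lines. This sketch uses plain Python sets with the slide's own rows; the variable names are illustrative.

```python
# Source unit of work: (Product_ID, Category_ID, Supplier_ID)
source_uow = {(222, 12, 96), (222, 12, 93), (729, 15, 87), (222, 17, 93)}

# Normalizing splits the unit of work into two two-way Links.
link_prod_cat = {(p, c) for p, c, _ in source_uow}
link_prod_supp = {(p, s) for p, _, s in source_uow}

# Rebuilding the three-way relationship by joining on Product_ID.
rejoined = {
    (p, c, s)
    for p, c in link_prod_cat
    for p2, s in link_prod_supp
    if p == p2
}

# Rows the join invents that the source system never contained.
spurious = rejoined - source_uow
```

Here `rejoined` holds five rows against the source's four, and `spurious` is the single invented combination.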


Link: Unit of Work Kept Together

Source System (Source Table UOW):

Product_ID  Category_ID  Supplier_ID
222         12           96
222         12           93
729         15           87
222         17           93

Data Vault (Link: Product by Category by Supplier):

Product_ID  Category_ID  Supplier_ID
222         12           96
222         12           93
729         15           87
222         17           93

Commutative Property: Enable reproduction of the source exactly as it stands

UOW is properly represented by a single Link in the Data Vault


CURRENT LOADING PAIN

What keeps you up at night?


Problems with EDW Loads Today

Technical Issues:
• 2am Wakeup Calls – because “data” won’t fit the business rules
• “Emergency Fixes” to Production
• Speed, Speed, Speed (shrinking load window + more data)
• Can’t load real-time data (business rules in the way!!)
• Business won’t buy better, faster hardware!

Business Issues:
• Maintenance cycles take too long
• Maintenance costs continue to increase
• Fixes to “existing mappings” break working logic
• Complexity of existing systems becomes unsustainable to business
• IT isn’t using 80%+ of the hardware resources given to them (their jobs are running at 40% utilization when they are “full-bore”)


Solutions!

Technical Solutions
• All Parallel Job Streams, as much as possible
• 1 Target Per Map, Per Action reduces complexity
• Generate Data Flows based on patterns (then focus on the real work)
• Get some SLEEP at night!! (no more production modifications)

Business Solutions
• Decrease turn-around time
• Increase Performance
• Handle Real-Time Data!!
• Reduce Complexity = Reduce Costs, Reduce Time to Implement
• Get the power back for decision making, discovering and building your own marts


How?


BASIC LOADING CONCEPTS

Some standards to follow…


Loading: A Golden Rule

It’s all about Auditability…

100% of the Data Loaded to the EDW 100% of the time!


Load Date / End Date Geology

Batch Load

Real-Time Loading


Real Time Loading - DV Stock Trade

Incoming message:
ACCOUNT=123443576 TRADE="Buy" STOCK="DAN" SHARES=100.0 CURRENCY="USD" PRICE=115.52 DATE="Feb 20, 2002" Comment="Buy Order to Execute"

[Diagram, steps numbered 1 to 3: the account number 123443576 goes to the Acct Hub, the stock symbol "DAN" goes to the Stock Hub, and the remaining fields go to the Trade Link: TRADE="Buy" SHARES=100.0 CURRENCY="USD" PRICE=115.52 DATE="Feb 20, 2002" Comment="Buy Order to Execute"]

Transactional Link = Inserts Only, no Updates

[Chart: # of Inserts (10M, 25M, 50M, 75M) against Months in Production (1 to 8), annotated “First Data Set Loaded” and “New Systems Data Added”.]

• As critical mass of current business keys is reached, the insert rates decrease rapidly.
• New systems add new keys quickly and efficiently to an existing Hub.
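A sketch of how the stock trade message could be split into its Data Vault targets, assuming the key=value layout shown on the slide with straight quotes. The split_trade helper is hypothetical; shlex is part of the Python standard library.

```python
import shlex

MESSAGE = (
    'ACCOUNT=123443576 TRADE="Buy" STOCK="DAN" SHARES=100.0 '
    'CURRENCY="USD" PRICE=115.52 DATE="Feb 20, 2002" '
    'Comment="Buy Order to Execute"'
)

def split_trade(message):
    """Return (account key, stock key, remaining transactional fields)."""
    # shlex.split keeps quoted values such as "Feb 20, 2002" together.
    fields = dict(token.split("=", 1) for token in shlex.split(message))
    acct_key = fields.pop("ACCOUNT")   # business key for the Acct Hub
    stock_key = fields.pop("STOCK")    # business key for the Stock Hub
    return acct_key, stock_key, fields # the rest rides on the Trade Link

acct_key, stock_key, trade_fields = split_trade(MESSAGE)
```

Because the Trade Link is transactional, each message becomes a new insert; nothing already loaded is updated.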


Batch Load Date Time Stamp

[Diagram: a control value (labels CNTRL_DTE and LOAD_DTS) feeds each Stage Load. Each STAGING TABLE in the Staging Area carries Sequence_ID, …, Load_DTS, and Record_Source ahead of the EDW – Data Vault.]

Load Date is exactly the same for all rows
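A minimal sketch of batch stamping, assuming the load date is captured once per batch and copied onto every staged row. The stage_batch function and the sample rows are illustrative.

```python
from datetime import datetime, timezone

def stage_batch(rows, record_source, load_dts=None):
    """Stamp every row of one batch with the same load date and record source."""
    # Capture the timestamp once, not per row, so the whole batch shares it.
    load_dts = load_dts or datetime.now(timezone.utc)
    return [
        {"sequence_id": i, **row, "load_dts": load_dts, "record_source": record_source}
        for i, row in enumerate(rows, start=1)
    ]

staged = stage_batch(
    [{"product_number": "P-100"}, {"product_number": "P-200"}],
    record_source="ERP",
)
distinct_load_dates = {row["load_dts"] for row in staged}
```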


Parallel Load Architecture - Batch

[Diagram: Sources, Stage, Hubs, Hub Satellites, Links, Link Satellites, Dimensions, and Facts, grouped under Staging Loads, Data Vault Loads, and Data Mart Loads, with Major Synchronization Points marked.]

Processing:
• All loads are done in parallel
• Sets of processes “wait” for the previous set to complete
• Processes are run as soon as data is ready
• No other “waiting” time is required
• Load dependencies are greatly reduced
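The processing rules can be sketched with Python's concurrent.futures: loads inside a set run in parallel, and the next set starts only when the previous set has completed. The grouping of tables into sets and the table names are assumptions for illustration, and run_load is a stand-in for a real load job.

```python
from concurrent.futures import ThreadPoolExecutor

# Each inner list is one set of loads; sets are separated by synchronization points.
STAGES = [
    ["hub_customer", "hub_product", "hub_order"],
    ["link_line_item", "sat_customer", "sat_product"],
    ["sat_line_item"],
]

completed = []

def run_load(name):
    """Placeholder for a real load job; records that it finished."""
    completed.append(name)  # list.append is atomic in CPython
    return name

def run_stages(stages):
    with ThreadPoolExecutor() as pool:
        for stage in stages:
            # Consuming the map results blocks until every load in the set is done.
            list(pool.map(run_load, stage))

run_stages(STAGES)
```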


Mathematics of Batch Loading

It’s all about SPEED SPEED SPEED

[Diagram: 10 Million Incoming Rows arrive at an EDW of 1 Billion Rows and growing]
• 60% - 80% Inserts (Never Seen Before)
• 10% - 20% Updates, Matched By KEY
• 5% Deletes

• Inserts are the single fastest operation in the Database!
• Updates are the single slowest operation in the Database!

Q: Why push 80% of your Insert data through “the heaviest/slowest” transformation logic?


Simple Loading Patterns

Data flow with a lookup: Source → SQ → LKP Target → Filter If Exists → Target Insert

Data flow driven by a view: Source (Stage) → SQ → Target Insert → Target

Views over the source:
• Insert View: Select ALL that do not exist by PK in target
• Update View: Select ALL that exist by PK in target, ONLY those with DELTA

Rule: 1 Target Per Data Flow (map/graph) Per Action
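The Insert View and Update View can be sketched as SQL views, here run through Python's sqlite3 module. Table names, column names, and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stage  (pk TEXT PRIMARY KEY, amount INTEGER);
    CREATE TABLE target (pk TEXT PRIMARY KEY, amount INTEGER);
    INSERT INTO target VALUES ('A', 10), ('B', 20);
    INSERT INTO stage  VALUES ('A', 10), ('B', 25), ('C', 30);

    -- Insert View: ALL staged rows that do not exist by PK in the target.
    CREATE VIEW v_insert AS
        SELECT s.pk, s.amount FROM stage s
        WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.pk = s.pk);

    -- Update View: rows that exist by PK in the target, ONLY those with a delta.
    CREATE VIEW v_update AS
        SELECT s.pk, s.amount FROM stage s
        JOIN target t ON t.pk = s.pk
        WHERE s.amount <> t.amount;
""")

to_insert = conn.execute("SELECT pk, amount FROM v_insert").fetchall()
to_update = conn.execute("SELECT pk, amount FROM v_update").fetchall()
```

Each view then feeds its own data flow with one target and one action, so the insert path never passes through the update logic.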


Results of Pattern Tuning

FROM THIS…..
• 5M rows @ 600 RPS = 2.31 hrs
• OR: 5M rows @ 7k RPS = 11.9 mins
• No parallelism

TO THIS!
• Pass 1:
  • 5M @ 33k RPS = 2.52 mins
• Pass 2:
  • 5M @ 33k RPS = 2.52 mins
  • 5M @ 25k RPS = 3.33 mins
• Pass 3:
  • 5M @ 50k RPS = 1.66 mins
  • 5M @ 33k RPS = 2.52 mins
  • 5M @ 40k RPS = 2.08 mins
  • 5M @ 23k RPS = 3.61 mins
• Total Time:
  • 2.52 + 3.33 + 3.61 = 9.46 mins

This map must run at a minimum of 10k RPS to beat the parallel times: 5M @ 10k RPS = 8.33 mins
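The arithmetic behind these figures can be checked with a short sketch. It assumes the passes run one after another while the jobs inside a pass run in parallel, so each pass costs as much as its slowest job. The slide's per-job figures appear to be truncated, so the unrounded total comes out near 9.48 minutes.

```python
ROWS = 5_000_000

def minutes(rows, rows_per_second):
    """Elapsed minutes to load `rows` at a steady rate."""
    return rows / rows_per_second / 60

# Rows-per-second rates of the parallel jobs in each pass, from the slide.
passes = [
    [33_000],
    [33_000, 25_000],
    [50_000, 33_000, 40_000, 23_000],
]

# A pass finishes when its slowest job finishes.
pass_minutes = [max(minutes(ROWS, rps) for rps in jobs) for jobs in passes]
total_minutes = sum(pass_minutes)

# Rate a single non-parallel map would need just to match that total.
break_even_rps = ROWS / (total_minutes * 60)
```

The break-even rate lands a little under 9k RPS, consistent with the slide's point that a single map at 10k RPS (8.33 mins) would beat the three parallel passes.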


LOADING THE DATA VAULT

Patterns Take the Cake!


Loading Templates: Hubs

• Select a “Master” system, and a hierarchy of importance for sub-systems to annotate arrival location of data

• Purpose of the loading template: Find out if the business key exists in the hub, if not – insert it

• Use a distinct list (unique) of business keys coming from the staging area

[Flow: Staging Data → Distinct List BK Keys → Exists In Target? Yes: Drop Row From Feed. No: Insert Into Target (Gen Surrogate) → Hub.]
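The Hub loading template can be sketched in plain Python, with a dict standing in for the Hub table. The load_hub function, its parameters, and the sample keys are hypothetical.

```python
def load_hub(hub, staged_rows, key_column, load_dts, record_source):
    """Insert business keys not yet in the hub; return the keys inserted."""
    # Distinct list of business keys coming from the staging area.
    distinct_keys = sorted({row[key_column] for row in staged_rows})
    inserted = []
    for key in distinct_keys:
        if key in hub:               # exists in target: drop row from feed
            continue
        hub[key] = {
            "seq_id": len(hub) + 1,  # generated surrogate
            "load_dts": load_dts,
            "record_source": record_source,
        }
        inserted.append(key)
    return inserted

hub_product = {}
staged = [
    {"product_number": "P-100"},
    {"product_number": "P-200"},
    {"product_number": "P-100"},  # duplicate within the feed
]
first_run = load_hub(hub_product, staged, "product_number", "2010-05-28", "ERP")
second_run = load_hub(hub_product, staged, "product_number", "2010-05-29", "ERP")
```

Running the same feed twice inserts nothing the second time, which is what makes the pattern restartable.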


Loading Templates: Links

• Select a “Master” system, and a hierarchy of importance for sub-systems to annotate arrival location of data

• Purpose of the loading template: Find all relationships between business keys, then check whether each relationship is already recorded in the Link; if not, insert it

• Use a distinct list of related business keys

[Flow: Staging Data → Distinct List Busn Keys → Lookup EACH Hub’s Surrogate Keys → Exists In Target? Yes: Drop Row From Feed. No: Insert Into Target (gen surrogate) → Link.]
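A sketch of the Link loading template, with dicts standing in for the Hubs (business key to surrogate key) and for the Link. The load_link function and the sample data are hypothetical.

```python
def load_link(link, hub_product, hub_order, staged_rows):
    """Insert relationships not yet recorded in the link; return the new keys."""
    # Distinct list of related business keys from staging.
    distinct_pairs = sorted(
        {(row["product_number"], row["order_number"]) for row in staged_rows}
    )
    inserted = []
    for product_number, order_number in distinct_pairs:
        # Look up each Hub's surrogate key for the business keys.
        key = (hub_product[product_number], hub_order[order_number])
        if key in link:               # exists in target: drop row from feed
            continue
        link[key] = len(link) + 1     # generated surrogate for the Link row
        inserted.append(key)
    return inserted

hub_product = {"P-100": 1, "P-200": 2}
hub_order = {"O-1": 10}
staged = [
    {"product_number": "P-100", "order_number": "O-1"},
    {"product_number": "P-200", "order_number": "O-1"},
    {"product_number": "P-100", "order_number": "O-1"},  # duplicate relationship
]
link_line_item = {}
first_run = load_link(link_line_item, hub_product, hub_order, staged)
second_run = load_link(link_line_item, hub_product, hub_order, staged)
```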


Loading Templates: Satellites

• Select a “Master” system, and a hierarchy of importance for sub-systems to annotate arrival location of data

• Purpose of the loading template: Gather descriptive data, compare to most recent copy of information in satellite, and if there are any deltas – load, if not, don’t load

• Use a distinct list of descriptive fields from the source systems

[Flow: Staging Data → Distinct List Sat Rows → Lookup EACH Hub’s or Link’s Surrogate Keys → Find Latest Sat Row → All Columns Match? Yes: Drop Row From Feed. No: Insert Into Target → Satellite.]
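A sketch of the Satellite loading template: compare incoming descriptive data with the latest Satellite row and insert only on a delta. A dict of per-parent history lists stands in for the Satellite table; load_satellite and the sample attributes are hypothetical.

```python
def load_satellite(sat, parent_seq_id, attrs, load_dts):
    """Append a new satellite row only when the descriptive data changed."""
    history = sat.setdefault(parent_seq_id, [])
    # Find the latest row; if all columns match, drop the row from the feed.
    if history and history[-1]["attrs"] == attrs:
        return False
    history.append({"load_dts": load_dts, "attrs": dict(attrs)})
    return True

sat_customer = {}
results = [
    load_satellite(sat_customer, 1, {"name": "Acme", "addr1": "1 Main St"}, "2010-01-01"),
    load_satellite(sat_customer, 1, {"name": "Acme", "addr1": "1 Main St"}, "2010-02-01"),
    load_satellite(sat_customer, 1, {"name": "Acme", "addr1": "9 Oak Ave"}, "2010-03-01"),
]
```

The unchanged second arrival is skipped, so the Satellite grows only with real changes.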


GETTING STARTED… HOW TO

How to build your Data Vault…


Step 1: Establish Scope (Build Business Case Model)


Step 1: Define Business Keys


[Diagram: Hub Campaign, Hub Customer, Hub Invoice, Hub Products]


Step 2: Define Associations


[Diagram: Hub Campaign, Hub Customer, Hub Invoice, and Hub Products, now connected by three Links: Link Campaign by Invoice by Customer, Link Invoice Line Items, and Link Product on Campaign]


Step 3: Define Descriptive Data


[Diagram: Hub Campaign, Hub Customer, Hub Invoice, and Hub Products, with Link Campaign by Invoice by Customer, Link Invoice Line Items, and Link Product on Campaign, now with Satellites attached: Sat Effectiveness Ratings, Sat Effectiveness Dates, Sat Availability Dates, Sat Defect Reasons, Sat Stock Quantities, Sat Descriptions, Sat Dates and Amounts, Sat Amounts, Sat Quantities, Sat Address, Sat Contacts, Sat Details]


Step 4: Build Source Model (PK/FK)

(No Pictures, Sorry)
• Ensure the source model (DDL Only) has Primary and Foreign Keys defined
• Normalize the source model (if not normalized)
• Capture and integrate all source systems involved (if not already captured)
• Add Comments to the DDL (tables and fields)


Step 5: Build Cross-Reference


SOURCE TABLE      SOURCE COLUMN     GROUP  TARGET TABLE          TARGET COLUMN
AHLTAT_DIAGNOSIS  DOC_REF           1      SAT_AHLTAT_DIAGNOSIS  DOC_REF
                  DATAID            1      HUB_DIAGNOSIS         DIAGNOSIS_DATAID
                  FACILITYNCID      1      HUB_FACILITY          FAC_ID
                  DIAGNOSISNCID     1      SAT_AHLTAT_DIAGNOSIS  DIAGNOSISNCID
                  ENCOUNTERNUMBER   1      HUB_EVENT             EVNT_ID
                  CLINICIANNCID     1      HUB_CLINICIAN         CLINICIAN_NCID
                  UNIT_NUMBER       1      HUB_UNIT              UNIT_ID
                  MEDCINID          1      HUB_MEDCIN            MEDCIN_ID
                  CREATETIME        1      SAT_AHLTAT_DIAGNOSIS  CREATETIME
                  CREATEUSERNCID    1      SAT_AHLTAT_DIAGNOSIS  CREATEUSERNCID
                  MODIFYUSERNCID    1      SAT_AHLTAT_DIAGNOSIS  MODIFYUSERNCID
                  MODIFYTIME        1      SAT_AHLTAT_DIAGNOSIS  MODIFYTIME
                  PRIORITY          1      SAT_AHLTAT_DIAGNOSIS  PRIORITY
                  DIAGNOSESCOMMENT  1      SAT_AHLTAT_DIAGNOSIS  DIAGNOSESCOMMENT

The purpose of such an exercise is not to identify all the elements, but specifically to identify the target Hubs (i.e., the business keys), the target Links, and at LEAST a single Satellite for at least one source column…

The engine (SaaS) will automatically assign all other descriptive elements to the first Satellite identified.
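The default assignment rule can be illustrated with a short sketch. This is not the engine's actual logic, only one reading of the rule: any source column without an explicit mapping is assigned to the first Satellite named in the cross-reference. The complete_mapping helper is hypothetical, and the rows are a subset of the table above.

```python
# (source column, target table, target column): a subset of the cross-reference.
XREF = [
    ("DOC_REF", "SAT_AHLTAT_DIAGNOSIS", "DOC_REF"),
    ("DATAID", "HUB_DIAGNOSIS", "DIAGNOSIS_DATAID"),
    ("FACILITYNCID", "HUB_FACILITY", "FAC_ID"),
    ("ENCOUNTERNUMBER", "HUB_EVENT", "EVNT_ID"),
]
# All columns of the source table, including one left unmapped on purpose.
SOURCE_COLUMNS = ["DOC_REF", "DATAID", "FACILITYNCID", "ENCOUNTERNUMBER", "PRIORITY"]

def complete_mapping(xref, source_columns):
    """Assign every unmapped source column to the first Satellite identified."""
    mapped = {source for source, _, _ in xref}
    first_sat = next(table for _, table, _ in xref if table.startswith("SAT_"))
    extra = [(col, first_sat, col) for col in source_columns if col not in mapped]
    return xref + extra

full_mapping = complete_mapping(XREF, SOURCE_COLUMNS)
hub_tables = sorted({table for _, table, _ in full_mapping if table.startswith("HUB_")})
```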


Step 6: Generate Baseline ETL/ELT


[Diagram: Source DDL, Target DDL, and the Cross-Ref Mapping (XLS) feed a step that generates Code, Reports, and Documentation, producing Data Flows (Mappings / Graphs)]


CONCLUSIONS / SUMMARY

What did we learn?


Data Vault…

Modeling Is…
• Made up of Hubs, Links, and Satellites
• Easy to create and build
• Hardest thing is to “find/locate” and define the Business Keys
• Consistent, Scalable, Repeatable, Pattern Based
• RULES BASED / STANDARDS DRIVEN

Loading Is….
• Scalable, Fault-Tolerant, Parallelizable, Pattern Based
• Generatable
• Performance Based
• 100% Restartable
• Set Based
• Devoid of “Soft” Business Rules!!


Still - Lots To Learn…

We didn’t cover:
• Joins
• point-in-time tables
• building marts
• business logic components
• SQL extraction
• bridge tables
• what to do when…
• dealing with bad data
• architecting security, managing governance, handling metadata

Contact me for Workshops (training), and Mentoring…


Questions?

Dan Linstedt
President, Empowered Holdings, LLC
http://EmpoweredHoldings.com
http://DanLinstedt.com
Tel: +1 802-524-8566
E-Mail: [email protected]

SERVICES:
• Consulting
• Assessments
• Product Selection Scorecards
• Architecture / Design
• Mentoring and Workshops (training)
