Data vault what's Next: Part 2

Data Vault Modeling: What’s Next? Part 2. © Dan Linstedt 2009-2012. This was PART 2 of a presentation I gave at an Array Conference in the Netherlands in 2009.


This is Part 2 of a two-part presentation I gave in 2009. It covers unstructured data and operational Data Vault components in more depth. YES, even then I was commenting on how this market would evolve. If you want to use these slides, please let me know, and add "(C) Dan Linstedt, all rights reserved, http://LearnDataVault.com" in a VISIBLE fashion on your slides.

Transcript of Data vault what's Next: Part 2

Page 1: Data vault what's Next: Part 2

Data Vault Modeling

What’s Next? Part 2

© Dan Linstedt 2009-2012

This was PART 2 of a presentation I gave at an Array Conference in the Netherlands in 2009.

Page 2: Data vault what's Next: Part 2

A bit about me…

• Author, Inventor, Speaker – and part-time photographer…

• 25+ years in the IT industry

• Worked in DoD, US Gov’t, Fortune 50, and so on…

• Find out more about the Data Vault:
  o http://www.youtube.com/LearnDataVault
  o http://LearnDataVault.com

• Full profile on http://www.LinkedIn.com/dlinstedt

LearnDataVault.com

Page 3: Data vault what's Next: Part 2

Where Are We Today?

• IF you are using Data Vault…
  o Auto Generation of Staging Loads
  o Auto Generation of Data Vault Loads
  o Auto Generation of Data Vault Reconciliation Routines
  o Auto Generation of RAW Star Schemas
  o Rapid Build-Out of Star Schemas

• If you are lucky…
  o Auto Generation of the Data Vault Model
  o Auto Consolidation of Source System Data Models
  o Auto Generation of the Staging Data Model
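To illustrate what "Auto Generation of Data Vault Loads" can mean in practice, here is a minimal Python sketch that emits a Hub load statement from a small metadata record. It is a hypothetical example, not the author's generator: the HubSpec class, the table names, the audit column names (load_dts, record_source), and the assumption that the Hub's surrogate key is filled by an identity/sequence default are all illustrative. The metadata is assumed to be trusted, since values are inlined into the SQL text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubSpec:
    """Hypothetical metadata row describing one Hub and the staging table feeding it."""
    hub_table: str
    business_key: str
    stage_table: str
    record_source: str


def generate_hub_load(spec: HubSpec) -> str:
    """Build an INSERT ... SELECT that adds only business keys not already in the Hub.

    Assumes the Hub's surrogate key is populated by an identity/sequence default and
    that load_dts and record_source are the Hub's audit columns (hypothetical names).
    """
    bk = spec.business_key
    return (
        f"INSERT INTO {spec.hub_table} ({bk}, load_dts, record_source)\n"
        f"SELECT DISTINCT stg.{bk}, CURRENT_TIMESTAMP, '{spec.record_source}'\n"
        f"FROM {spec.stage_table} stg\n"
        f"WHERE stg.{bk} IS NOT NULL\n"
        f"  AND NOT EXISTS (\n"
        f"    SELECT 1 FROM {spec.hub_table} h WHERE h.{bk} = stg.{bk}\n"
        f"  );"
    )


if __name__ == "__main__":
    # One metadata row per Hub; the generator turns each into a load statement.
    specs = [
        HubSpec("hub_customer", "customer_num", "stg_crm_customer", "CRM"),
        HubSpec("hub_product", "product_code", "stg_erp_product", "ERP"),
    ]
    for s in specs:
        print(generate_hub_load(s), end="\n\n")
```

Because the load is driven entirely by metadata, adding a new Hub means adding a row to the spec list rather than hand-writing another statement; Link and Satellite loads could be generated the same way.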


Page 4: Data vault what's Next: Part 2

Where do all these pieces fit?

DW2.0 Framework!


Page 5: Data vault what's Next: Part 2


DW2.0 Framework

[Figure: DW2.0 Framework diagram. Labels recovered: METADATA; Interactive, Integrated, Near-Line, Archival; Tactical, Historical, Strategic, Extended; Enterprise Data Warehouse; Active Data Mining, Transformation, Active Cleansing, Cube Processing, Temporal Indexing, Semantic Management; Enterprise Service Bus / SOA / Web Services; Unstructured Data (Email, Plain Text, Word Docs, Images); data temperatures HOT, MEDIUM, TEMP, WARM, COLD; SSD! (Cloud RAM); Cloud Storage.]

Page 6: Data vault what's Next: Part 2

How do we get there?


Page 7: Data vault what's Next: Part 2

Virtual Marts: What are they?

They Are:

• RAM-based data marts, or SSD-drive-based data marts
• OLAP cubes (most of the time) built on the fly by new queries
• “hot” data that is continually accessed by the BI tool
• the result sets of the most frequently used queries
• built dynamically, accessed regularly, and destroyed after sitting “idle” for a specific time
• FAST
• only a subset of data from the EDW

NOTE: They have WRITE-BACK capabilities!

Page 8: Data vault what's Next: Part 2

Virtual Marts

REQUIREMENTS
• Cloud based RDBMS
  o with expandable RAM
  o Unlimited computing power
  o Maximum parallelism
  o Extreme scalability
• OR: Big Hardware with similar attributes

BENEFITS
• Highly Alterable Answer Sets
• Write Back to the BDV (Business Data Vault)
• Dynamic create/destroy capability
• No “copy” of the data except in RAM

Page 9: Data vault what's Next: Part 2

Virtual Marts: How do I build one?

• You can, if you have Solid-State Disk (RAM disk) in your database server

• You can, if you are using Cloud Technology

• Building one is the job of the 2010 RDBMS engine (today’s database engines do not provide these capabilities)

• However, to emulate one, you can build it as follows:
  o Monitor the queries most frequently executed
  o Build the Cubes / stars on a regular schedule (automated queries)
  o Tear the cubes down when queries no longer access the data

Remember: It will be YOUR job to maintain, monitor and manage these components until the database engines natively support HOT data.
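The three emulation steps above (monitor, build on a schedule, tear down when idle) can be sketched in Python. This is a minimal, hypothetical illustration rather than a product feature: the VirtualMartManager class, the build_threshold and idle_seconds knobs, and the run_query callback (standing in for a query against the EDW) are all assumptions. A Python dict stands in for the RAM-resident cubes.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class VirtualMartManager:
    # Assumed tuning knobs; real values depend on the workload.
    build_threshold: int = 3        # executions before a query counts as "hot"
    idle_seconds: float = 3600.0    # idle time before a mart is torn down
    counts: Counter = field(default_factory=Counter)
    marts: Dict[str, Tuple[Any, float]] = field(default_factory=dict)  # query -> (result, last access)

    def record(self, query: str, now: float) -> Any:
        """Step 1: monitor executed queries; return the in-RAM mart if one exists."""
        self.counts[query] += 1
        if query in self.marts:
            result, _ = self.marts[query]
            self.marts[query] = (result, now)  # refresh last-access time
            return result
        return None  # caller falls back to querying the EDW

    def scheduled_build(self, run_query: Callable[[str], Any], now: float) -> List[str]:
        """Step 2: on a schedule, materialize result sets for hot queries."""
        built = []
        for query, n in self.counts.items():
            if n >= self.build_threshold and query not in self.marts:
                self.marts[query] = (run_query(query), now)
                built.append(query)
        return built

    def tear_down_idle(self, now: float) -> List[str]:
        """Step 3: drop marts that no query has touched for idle_seconds."""
        idle = [q for q, (_, last) in self.marts.items() if now - last > self.idle_seconds]
        for q in idle:
            del self.marts[q]
        return idle
```

In use, record() would be called on every BI query, while scheduled_build() and tear_down_idle() would run from a scheduler; keeping the build step on a schedule, rather than inside the query path, matches the "regular schedule (automated queries)" step on the slide.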


Page 10: Data vault what's Next: Part 2

Virtual Marts Affect The BDV

Write-Back Capability:

• Write-backs from Virtual Marts affect business decisions
• New and changed business transactions will be fed back to operational systems
• Changes will be sent on the bus to notify other systems of business decisions

• User security and control will have to be in place to authorize WHO can change WHAT in which parts of the marts.

• Tracking of each change will become a required standard

Eventually the Virtual Marts will become a MIXED BI application with an operational front end!
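The write-back requirements above (authorize WHO can change WHAT, track every change, notify other systems) can be sketched as a small gateway in front of a mart. This is a hypothetical Python illustration: the WriteBackGateway and ChangeRecord names, the per-(user, mart) column permission map, and the outbound list standing in for messages placed on the Enterprise Service Bus are all assumptions, not part of the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ChangeRecord:
    """One tracked write-back: who changed what, from which value to which."""
    user: str
    mart: str
    key: str
    column: str
    old_value: object
    new_value: object
    changed_at: datetime


@dataclass
class WriteBackGateway:
    # Hypothetical ACL: (user, mart) -> columns that user may change in that mart.
    permissions: Dict[Tuple[str, str], Set[str]]
    audit_log: List[ChangeRecord] = field(default_factory=list)
    outbound: List[ChangeRecord] = field(default_factory=list)  # stand-in for the bus

    def write_back(self, user: str, mart: str, rows: Dict[str, Dict[str, object]],
                   key: str, column: str, value: object) -> ChangeRecord:
        # Authorize first: reject the change before touching any data.
        if column not in self.permissions.get((user, mart), set()):
            raise PermissionError(f"{user} may not change {mart}.{column}")
        old = rows[key].get(column)
        rows[key][column] = value
        rec = ChangeRecord(user, mart, key, column, old, value,
                           datetime.now(timezone.utc))
        self.audit_log.append(rec)  # every change is tracked
        self.outbound.append(rec)   # and queued to notify other systems
        return rec
```

Checking permissions before mutating the row means a rejected change leaves both the mart and the audit log untouched.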


Page 11: Data vault what's Next: Part 2

Unstructured Data: What is it?

• It is: Information that resides on your desktop, on your servers, and on the web; it is multi-lingual and conceptually based.

• Technically: Documents, E-Mails, Transcripts, Videos, Images, Sound Files.

• It makes up 80% of the data, yet it remains unused by EDW/BI operations around the world.

• It is 10x harder to deal with than structured data due to privacy concerns, ownership issues, and ethical concerns.

• Data Governance and Data Stewardship play a HUGE role in the success or failure of working with Unstructured Data Sets.


Page 12: Data vault what's Next: Part 2


Unstructured Data

REQUIREMENTS
• Pre-Processed data sets
• Pointers to data sets
• Use of & Loading of Ontologies
• Multi-Language processing

BENEFITS
• Highly Alterable Answer Sets
• Write Back to BDV
• Dynamic create/destroy capability
• No “copy” of the data except in RAM

Page 13: Data vault what's Next: Part 2

Unstructured Data Engines vs. Search Engines

Unstructured Data Engine:
• Indexes Documents
• Locates ALL potential matches
• Uses Data Mining / Neural Nets
• Correlates across multiple languages, multiple meanings of phrases
• Induction based reasoning
• Similarity Ratings based on Confidence and Strength
• Deep Analysis (focused on 1 question)
• Utilizes Ontologies

Search Engine:
• Indexes key terms
• Locates “most likely match”
• Uses Statistical Analysis
• Correlates based on “Term matching”
• Wide search, but not “deep analysis”

Page 14: Data vault what's Next: Part 2

U-Data & Data Vault

[Diagram: Unstructured Data (loaded to database); Structured RAW Data Vault; Ontology (loaded to database); Dynamic Links, built from analyzing queries and ontologies, used to load cubes.]

Page 15: Data vault what's Next: Part 2

U-Data & Ontologies

• Ontologies describe term relationships
• Ontologies house term hierarchies
• Ontologies can correlate terms across languages
• Ontologies can provide synonyms, homonyms, and antonyms
• Ontologies are the key piece of Metadata needed to connect unstructured mining results to structured data sets in source systems

• Ontologies define the manner in which natural language ties together concepts

Ontologies (or pieces of them) are required to successfully understand combinations of Unstructured Data and Structured Data.
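To make the role of the ontology as the bridge between unstructured mining results and structured data concrete, here is a small Python sketch. It models only a few of the properties listed above: multilingual labels with synonyms, and a term hierarchy. The Concept and Ontology classes, and the business_key field that links a concept to a structured (Hub) business key, are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Concept:
    concept_id: str
    labels: Dict[str, Set[str]]         # language code -> terms and synonyms
    broader: Optional[str] = None       # parent concept (term hierarchy)
    business_key: Optional[str] = None  # hypothetical link to a structured business key


class Ontology:
    def __init__(self, concepts: List[Concept]):
        self.concepts = {c.concept_id: c for c in concepts}
        # Reverse index: lower-cased term (any language) -> concept ids.
        self.term_index: Dict[str, Set[str]] = {}
        for c in concepts:
            for terms in c.labels.values():
                for t in terms:
                    self.term_index.setdefault(t.lower(), set()).add(c.concept_id)

    def ancestors(self, concept_id: str) -> List[str]:
        """Walk up the term hierarchy from a concept to the root."""
        out = []
        parent = self.concepts[concept_id].broader
        while parent is not None:
            out.append(parent)
            parent = self.concepts[parent].broader
        return out

    def link_terms(self, mined_terms: Iterable[str]) -> Dict[str, Set[str]]:
        """Map terms mined from unstructured text to structured business keys,
        walking up the hierarchy when a concept has no key of its own."""
        links: Dict[str, Set[str]] = {}
        for term in mined_terms:
            for cid in sorted(self.term_index.get(term.lower(), ())):
                for candidate in [cid] + self.ancestors(cid):
                    key = self.concepts[candidate].business_key
                    if key is not None:
                        links.setdefault(term, set()).add(key)
                        break
        return links
```

For example, with a "car" concept labelled {"car", "automobile"} in English and {"auto"} in Dutch, sitting under a "vehicle" concept that carries a business key, both "Automobile" and "auto" found in documents resolve to the same structured key, while unknown terms produce no link.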


Page 16: Data vault what's Next: Part 2

Ontologies and BI Applications

• Business Users will shift their BI applications to include managing data sets THROUGH ontology specifications

• Business Users will assign governance to ontologies and manage changes to ontologies as their metadata definitions

• Tomorrow’s BI tool set will provide visualizations of Ontologies cross-mapped to analytical data sets

Ontologies ARE the metadata of tomorrow


Page 17: Data vault what's Next: Part 2


Plateau: Operational Data Warehouse

REQUIREMENTS
• Web-Services feeds with real-time data
• Applications for metadata management on top of the EDV (Enterprise Data Vault)
• Applications for Ontology Management on top of the EDV
• Applications to edit/maintain Operational Data
• Virtual Data Marts
• In-DB Data Mining Engine Capabilities

BENEFITS
• Direct ties between the operational world and the Data Warehouse
• Rapid turnaround/impact analysis by business users

Page 18: Data vault what's Next: Part 2

Operational DV: How to Build One

• The Easy Way:
  o Start with standard Data Vault Modeling
  o Attach Web-Services for In-Flow/Out-Flow of Data (putting the DV on the ESB as a 24x7x365 operational component)
  o Use Business Workflow Engines to monitor, create, edit, change and build applications on top of the web-services and web messages components
  o Never allow direct access to the data in the Data Vault EXCEPT through web-services

• The Hard Way:
  o Start with Standard Data Vault Modeling
  o Attach Web Services for In-Flow/Out-Flow of Data
  o Build a common data access layer (CDAL) that houses transactions in RAM (manages locking of data sets)
  o Build applications on top of the CDAL
  o Put the whole thing on the CLOUD to allow dynamic data marts


Page 19: Data vault what's Next: Part 2

The Experts Say…

“The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon

“The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst

“The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney


Page 20: Data vault what's Next: Part 2

More Notables…

“This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner

“[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” Scott Ambler


Page 21: Data vault what's Next: Part 2

Where To Learn More

• The Technical Modeling Book: http://LearnDataVault.com

• The Discussion Forums & events: http://LinkedIn.com – Data Vault Discussions

• Contact me: http://DanLinstedt.com - web, [email protected] - email

• Worldwide User Group (Free): http://dvusergroup.com
