IDQ Summit 2014 - Ronald Damhof - It's all about the data

R.D.Damhof – October 2014 – IDQ Summit 2014 It’s all about the data A managerial perspective By Ronald Damhof

description

"It's all about the data, a managerial perspective" - these are the slides of the presentations I gave at Data Modeling Zone 2014 in Hamburg and at the International Data Quality Summit in Richmond (VA) 2014.

Transcript of Idq summit2014 ronald damhof - it's all about the data

Page 1: Idq summit2014   ronald damhof - it's all about the data

R.D.Damhof – October 2014 – IDQ Summit 2014

It’s all about the data!

A managerial perspective

By Ronald Damhof

Page 2: Idq summit2014   ronald damhof - it's all about the data

R.D.Damhof – Prudenza BV – Copyright – 22 May 2014

I am an opinionated kind of guy…

Page 3: Idq summit2014   ronald damhof - it's all about the data

Who am I - My Data Manifesto

The X commandments of data management

I. Context is leading
II. Data is the ultimate proprietary asset; it is to be managed and governed in line with morals & ethics, internal and external rules and legislation
III. Stop centering apps and process over data; data first, facts first
IV. It is all about the quality of our product: the data. Get clean, stay clean, get access
V. Thou shalt abstract and separate concerns rigorously

Page 4: Idq summit2014   ronald damhof - it's all about the data

Who am I - My Data Manifesto

The X commandments of data management

VI. a) Thou shalt make a fundamentalistic distinction between Fact and Context
    b) Thou shalt not forsake ‘Time’
VII. Data architecture is not the same as technology architecture
VIII. The science and practice of Information & Data Modeling need to be upheld, improved and taught
IX. Specify, Standardise, Automate & Productise
X. Thou cannot buy your way out of the data misery you are in

Page 5: Idq summit2014   ronald damhof - it's all about the data

Who am I - My Data Manifesto

The X commandments of data management

XI. There is a new saviour in town. Its name is Hadoop and it calls to us from its mountain:

‘We got a lake and thou shalt throw all your data in it. The water will be clean so you can drink it, the water will flow so it will irrigate your lands, grow your stock, feed your kids and of course bring you world peace…’

Nah, kidding ;-)

Page 6: Idq summit2014   ronald damhof - it's all about the data

R.D.Damhof – September 2014 – Data Modeling Zone

Page 7: Idq summit2014   ronald damhof - it's all about the data

Page 8: Idq summit2014   ronald damhof - it's all about the data

Page 9: Idq summit2014   ronald damhof - it's all about the data

Logistics & Manufacturing

Page 10: Idq summit2014   ronald damhof - it's all about the data

Page 11: Idq summit2014   ronald damhof - it's all about the data

The Data Push Pull Point

Push/Supply/Source driven
▪ Mass deployment
▪ Control > Agility
▪ Validation of “ingredients”
▪ Repeatable & predictable processes
▪ Standardized processes
▪ High level of automation
▪ Relatively high IT/Data expertise
All facts, fully temporal

Pull/Demand/Product driven
▪ Piece deployment
▪ Agility > Control
▪ Plausibility
▪ User-friendliness
▪ Relatively low IT expertise
▪ Domain expertise essential
Truth, Interpretation, Context

Business Rules Downstream
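One way to read the push/pull split on this slide is that facts are kept as delivered, together with their time, while interpretation is applied only downstream. The Python sketch below is a minimal illustration under that assumption. The `Fact` record, the `as_of` lookup, and the "high value" rule with its threshold are hypothetical names invented for the example; the slides do not prescribe any of them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One delivered fact, stored as-is (push side). Hypothetical shape."""
    customer: str
    revenue: int
    valid_from: date   # when the fact became true in the business
    recorded_on: date  # when the source delivered it


# Push side: append-only; nothing is overwritten or interpreted here.
FACTS = [
    Fact("acme", 900, date(2014, 1, 1), date(2014, 1, 5)),
    Fact("acme", 1500, date(2014, 6, 1), date(2014, 6, 3)),
]


def as_of(facts, customer, when):
    """Return the fact valid for `customer` on `when`, or None if there is none."""
    candidates = [
        f for f in facts
        if f.customer == customer and f.valid_from <= when
    ]
    return max(candidates, key=lambda f: f.valid_from, default=None)


def is_high_value(fact, threshold=1000):
    """Pull side: a business rule applied downstream (threshold is illustrative)."""
    return fact is not None and fact.revenue >= threshold
```

Because the rule lives downstream of the stored facts, changing the threshold changes the interpretation without touching what was recorded.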

Page 12: Idq summit2014   ronald damhof - it's all about the data

The Development Style

Systematic
▪ User and developer are separated
▪ Defensive governance; focus on control and compliance
▪ Strong focus on non-functionals: auditability, robustness, traceability, …
▪ Centralised and organisation-wide information domain
▪ Configured and controlled deployment environment (dev/tst/acc/prod)

Opportunistic
▪ User and developer are the same person or closely related
▪ Offensive governance; focus on adaptability & agility
▪ Decentralised personal/workgroup/department/theme information domain
▪ All deployment is done in production

Page 13: Idq summit2014   ronald damhof - it's all about the data

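A Data Deployment Quadrant

[Figure: a 2×2 quadrant with cells I and II in one row and III and IV in the other. One axis is the Data Push/Pull Point, running from Push/Supply/Source driven (Facts) to Pull/Demand/Product driven (Context). The other axis is the Development Style: Systematic versus Opportunistic. Labels inside the figure: “Research, Innovation & Design” and “Shadow IT, Incubation, Ad-hoc, Once off”.]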
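The quadrant can be read as a lookup over two choices: the development style and the side of the push/pull point. The Python sketch below encodes that reading. The mapping of numerals to cells is an assumption inferred from the order of the labels in this transcript (I and II on the systematic row, III and IV on the opportunistic row, push before pull); the `Flow`, `Style` and `quadrant` names are hypothetical.

```python
from enum import Enum


class Flow(Enum):
    PUSH = "push/supply/source driven"
    PULL = "pull/demand/product driven"


class Style(Enum):
    SYSTEMATIC = "systematic"
    OPPORTUNISTIC = "opportunistic"


# Assumed layout: I and II on the systematic row, III and IV on the
# opportunistic row; push on the left, pull on the right.
QUADRANTS = {
    (Style.SYSTEMATIC, Flow.PUSH): "I",
    (Style.SYSTEMATIC, Flow.PULL): "II",
    (Style.OPPORTUNISTIC, Flow.PUSH): "III",
    (Style.OPPORTUNISTIC, Flow.PULL): "IV",
}


def quadrant(style: Style, flow: Flow) -> str:
    """Look up the quadrant label for a development style and data flow."""
    return QUADRANTS[(style, flow)]
```

For example, `quadrant(Style.SYSTEMATIC, Flow.PUSH)` gives the cell that, under this assumed layout, holds the systematically managed facts.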

Page 14: Idq summit2014   ronald damhof - it's all about the data

7 Applications of the Quadrant

Page 15: Idq summit2014   ronald damhof - it's all about the data

(1) How we produce

Page 16: Idq summit2014   ronald damhof - it's all about the data

How we produce, process variants

Page 17: Idq summit2014   ronald damhof - it's all about the data

How we produce, automation

Rephrased - somewhat more nerdy:
• Model-driven, metadata-driven
• Declarative instead of imperative

Rephrased - somewhat more popular:
“In Data, the developer is the data modeller”
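To make "declarative instead of imperative" concrete, here is a minimal Python sketch in which the model is plain metadata and the code that builds the tables is generated from it. This is only an illustration of the idea, not the tooling the speaker has in mind; the `MODEL` contents and the `generate_ddl` function are hypothetical.

```python
# A declarative model: plain data describing *what* exists, not how to build it.
# Entity and column names are made up for illustration.
MODEL = {
    "customer": {
        "columns": {"customer_id": "INTEGER", "name": "TEXT"},
        "key": ["customer_id"],
    },
}


def generate_ddl(model):
    """Generate one CREATE TABLE statement per entity in the model."""
    statements = []
    for table, spec in model.items():
        cols = [f"{name} {sql_type}" for name, sql_type in spec["columns"].items()]
        cols.append(f"PRIMARY KEY ({', '.join(spec['key'])})")
        statements.append(f"CREATE TABLE {table} ({', '.join(cols)});")
    return statements


if __name__ == "__main__":
    for statement in generate_ddl(MODEL):
        print(statement)
```

In this style the person who edits `MODEL` is in effect the developer: adding an entity to the metadata is all it takes to get a new table definition.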

Page 18: Idq summit2014   ronald damhof - it's all about the data

How we produce, production lines

Production line: Data orientation

Data Products

Information Products

Access to data

Analytical tools

Processing Power

Production line: Forms orientation

E.g. XBRL

Page 19: Idq summit2014   ronald damhof - it's all about the data

(2) How we organize

Page 20: Idq summit2014   ronald damhof - it's all about the data

To centralize or to decentralize

Page 21: Idq summit2014   ronald damhof - it's all about the data

(3) How we govern

Page 22: Idq summit2014   ronald damhof - it's all about the data

How we govern, products

Page 23: Idq summit2014   ronald damhof - it's all about the data

How we govern, accountability

Never, never, never ‘ownership’

[Figure: the quadrant (I, II, III, IV) annotated with the labels “Deliverant is Accountable”, “Demandee is Accountable”, “Data scientist/Analyst/Researcher responsible” and “In- and outbound Data Delivery Agreements”.]

Page 24: Idq summit2014   ronald damhof - it's all about the data

(4) How do people excel

Page 25: Idq summit2014   ronald damhof - it's all about the data

(5) How to use technology

Page 26: Idq summit2014   ronald damhof - it's all about the data

(6) How about Technology

▪ Storage: (R)DBMS
▪ Processing: Automation Software
▪ Data Quality: Validation, Profiling
▪ Development: Data Modeling
▪ Accessibility: Data Virtualization

▪ Storage: Pattern based
▪ Processing: Automation/limited ETL
▪ Data Quality: DQ rules/dashboards
▪ User tooling: Reporting, dashboards, Data Visualization

▪ Storage: Analytical
▪ Processing: Preptools for Data Analyst
▪ User tooling: Advanced Analytics, Data Visualization

Page 27: Idq summit2014   ronald damhof - it's all about the data

(7) Business-, Information- or Data Modeling is key

The Logical Model drives the technical data architecture, design and implementation

[Figure: modeling layers “Conceptual” and “Logical”, with the labels “Ontology”, “Facts”, “Relational”, “e.g. Data Vault, Anchor Model” and “e.g. Dimensional, hierarchical, flat”.]

Page 28: Idq summit2014   ronald damhof - it's all about the data

Oh… data warehouse?

The classic distinction between ‘operational data environment’ and ‘informational data environment’ is fading.

Modern-day data warehouses have been split up, with the ‘fact’ part (Q1) moving into the operational side.

Although data warehouses have evolved, operational applications have not, at least not in terms of data architecture. They should, though…

Page 29: Idq summit2014   ronald damhof - it's all about the data

Email: [email protected]
LinkedIn: nl.linkedin.com/in/ronalddamhof/
Twitter: RonaldDamhof
Blog: prudenza.typepad.com
Website: www.prudenza.nl