The Changing Data Quality & Data Governance Landscape


Transcript of The Changing Data Quality & Data Governance Landscape

Page 1: The Changing Data Quality & Data Governance Landscape

Be Certain, Be Trillium Certain

The Changing Data Quality & Data Governance Landscape a survival guide for data governance & data quality professionals

Trillium Software webinar – Wednesday 12 December

Nigel Turner, VP Information Management Strategy

Page 2

The traditional DQ & Data Governance Landscape?

2 © Copyright 2012, Trillium Software, Inc. All rights reserved.

Page 3

The future DQ & Data Governance Landscape?

Page 4

The changing landscape: potential disruptive eruptions

BIG DATA

CLOUD COMPUTING

DATA VIRTUALIZATION

Page 5

Disruptive eruption 1 – Big Data

Page 6

Big Data – what is it?

• A set of new concepts, practices & technologies to manage & exploit digital data

• Can be defined as:

– “Data that exceeds the processing capability of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architecture” (Source: Edd Dumbill – O’Reilly Community)

• Its key premise is that all data has potential value if it can be collected, analysed and used to generate actionable insight

Page 7

The characteristics of Big Data – the 3Vs (volume, velocity, variety)

• Reflects exponential growth of data – predicted at 40–60% per annum

• Today 2.5 quintillion bytes of data are created every day

• 90% of all digital data was created in the last two years

• The data generated is more varied and complex than before:

– Text, audio, images, machine-generated data etc.

• Much of this data is semi-structured or unstructured

• Traditional IT techniques are ill-equipped to process & analyse it

• Data is often generated in real time

• Analysis and response need to be rapid, often also in real time

• Traditional BI / DW environments are becoming obsolescent – new approaches are needed
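Growth rates like these compound quickly. The sketch below shows what the predicted 40–60% range implies for data volumes over five years and for how often volumes double. It assumes a constant annual growth rate, which is a simplification because real growth varies from year to year. The function names are illustrative only.

```python
import math


def projected_volume(current: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth (annual_growth=0.5 means 50%)."""
    return current * (1 + annual_growth) ** years


def doubling_time(annual_growth: float) -> float:
    """Years for the volume to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Compare the low, middle and high ends of the predicted 40-60% range.
for rate in (0.40, 0.50, 0.60):
    print(f"{rate:.0%} growth: x{projected_volume(1.0, rate, 5):.2f} in 5 years, "
          f"doubling every {doubling_time(rate):.2f} years")
```

At 50% a year, which is the rate quoted at the end of the webinar, volumes grow roughly 7.6-fold in five years and double about every 1.7 years.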

Page 8

What’s different about Big Data?

• New technologies which enable distributed & highly scalable MPP (Massively Parallel Processing), e.g.:

– Apache Hadoop

– MapReduce

– NoSQL databases

• Strong emphasis on analytical approaches:

– Emergence of “data science”

– Predictive analytics

– Data mining

• The “democratisation” of data:

– Data made available to all (cf. Cloud Computing)

– Business-led rather than IT-led BI
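MapReduce splits a job into three parts. A map step emits key/value pairs, a shuffle groups the values by key, and a reduce step combines each group. The single-process Python sketch below illustrates only that programming model, using the classic word-count example. It is not the Hadoop API. A real framework would distribute each phase across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every intermediate value by its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(documents: List[str]) -> Dict[str, int]:
    """Run map over every document, shuffle the pairs, then reduce each group."""
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(map_reduce(["big data big value", "data quality matters"]))
# {'big': 2, 'data': 2, 'value': 1, 'quality': 1, 'matters': 1}
```

Because each map call sees only one document and each reduce call sees only one key, both phases can run in parallel. This property is what lets MPP platforms scale the same logic out.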

Page 9

Where does Big Data come from?

SOCIAL MEDIA & SOCIAL NETWORKS

MACHINE GENERATED

WIDELY KNOWN SOURCES

Page 10

Big Data – Foundations of Success

• Identifying the right data to address the business problem or opportunity

• The ability to integrate & match varied data from multiple data sources:

– structured, semi-structured and unstructured

• Building the right IT infrastructure to support Big Data applications

• Having the right capabilities & skills to exploit the data

Page 11

Big Data – Barriers & Pitfalls

• The sheer volume of data – what’s worth using?

• Data extraction challenges

• The difficulty of matching data from disparate sources / formats / media

• The time taken to integrate new data sources

• The risks of mismatching and incorrect identification of individuals

• Legal & regulatory pitfalls

• Security concerns – corporate & individual

• Lack of skills & expertise
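The mismatching risk is easy to reproduce. The minimal sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in for a real matching engine. It shows that name similarity alone can pair two different people, and that requiring a corroborating attribute reduces that risk. The record layout, the date-of-birth field and the 0.85 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, turn full stops into spaces and collapse runs of whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means the normalised names are identical."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def same_person(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Treat two records as one individual only if the names are similar AND
    a second attribute (here a hypothetical date of birth) agrees exactly."""
    return (name_similarity(rec_a["name"], rec_b["name"]) >= threshold
            and rec_a["dob"] == rec_b["dob"])


# Hypothetical records from two sources.
crm_record = {"name": "John Smith", "dob": "1970-03-01"}
social_record = {"name": "Jon Smith", "dob": "1970-03-01"}
other_record = {"name": "Joan Smith", "dob": "1985-11-20"}

# Name-only matching would link John and Joan (similarity 0.9 >= 0.85) ...
print(round(name_similarity("John Smith", "Joan Smith"), 2))
# ... but the corroborating attribute keeps them apart.
print(same_person(crm_record, social_record), same_person(crm_record, other_record))
```

Lowering the threshold catches more genuine variants such as "Jon", but it also links more distinct individuals. This trade-off is why matching rules need governance and not only tuning.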

Page 12

Big Data – the data integration challenge

[Diagram: external data sources (social media, sensors, CS data, email, mobiles) and internal data sources (CRM, billing, OPS, sales, PRODS) each feed separate analytics platforms (1 to n) to produce actionable insight & knowledge]

Page 13

Big Data – DQ as the key enabler

[Diagram: external data sources (social media, sensors, CS data, email, mobiles) and internal data sources (CRM, billing, OPS, sales, PRODS) pass through a data quality platform that profiles, parses, standardises, matches and enriches the data before it feeds analytics platforms 1 to n, producing actionable insight & knowledge]
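The five data quality steps named on this slide (profile, parse, standardise, match, enrich) can be sketched as a simple pipeline. The Python below is a toy illustration, not Trillium's product or API. The sample records, the postcode lookup used for enrichment and the rule that records with the same standardised name and postcode match are all hypothetical simplifications.

```python
from collections import Counter, defaultdict

# Hypothetical records from three sources (not data from the webinar).
RAW = [
    {"name": "smith, john", "postcode": "sw1a 1aa", "source": "CRM"},
    {"name": "John  Smith", "postcode": "SW1A1AA", "source": "Social"},
    {"name": "Mary Jones", "postcode": "", "source": "Billing"},
]

# Hypothetical reference data used in the ENRICH step.
REGION_BY_POSTCODE = {"SW1A 1AA": "London"}


def profile(records):
    """PROFILE: count blank values per field to gauge completeness."""
    blanks = Counter()
    for rec in records:
        for field, value in rec.items():
            if not value.strip():
                blanks[field] += 1
    return dict(blanks)


def parse(rec):
    """PARSE: split the free-text name into first and last name."""
    if "," in rec["name"]:                      # "last, first" order
        last, first = (p.strip() for p in rec["name"].split(",", 1))
    else:                                       # "first ... last" order
        parts = rec["name"].split()
        first, last = " ".join(parts[:-1]), parts[-1]
    return {**rec, "first": first, "last": last}


def standardise(rec):
    """STANDARDISE: consistent name case and UK-style postcode spacing."""
    code = rec["postcode"].replace(" ", "").upper()
    if code:
        code = f"{code[:-3]} {code[-3:]}"       # space before the last three characters
    return {**rec, "first": rec["first"].title(), "last": rec["last"].title(),
            "postcode": code}


def match(records):
    """MATCH: group sources whose standardised name and postcode agree."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["first"], rec["last"], rec["postcode"])].append(rec["source"])
    return dict(groups)


def enrich(rec):
    """ENRICH: add a region from reference data where the lookup succeeds."""
    return {**rec, "region": REGION_BY_POSTCODE.get(rec["postcode"], "unknown")}


clean = [enrich(standardise(parse(r))) for r in RAW]
print(profile(RAW))   # {'postcode': 1}
print(match(clean))
```

The CRM and social records collapse into one customer only after parsing and standardisation. An exact-match step run on the raw strings would have kept them apart.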

Page 14

Big Data – the DG & DQ impact

• Big Data will depend on data quality to reap its claimed benefits – the GIGO (garbage in, garbage out) truism

• The democratisation of data will expose poor DQ

• The need for Data Governance increases as data becomes more accessible

• Data skills will become more valued for ‘data science’

• Big Data will increase the 3Vs of data

• Control of data becomes more difficult – scope and variety of use increases

• Data standards & business rules become more complex

• Potential legal & regulatory minefield

Page 15

Disruptive eruption 2 – Cloud Computing

Page 16

Cloud Computing – Alternative Definitions

• “Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software, and information are provided to computers and other devices as a metered service over a network (typically the Internet).” (Wikipedia)

• “Marketing term for the technologies that provide computation, software, data access, and storage services that do not require end-user knowledge of the physical location or configuration of the system that delivers the services.” (Trillium Software)

Page 17

Cloud Computing – the Wikipedia view

Page 18

Cloud Computing – Key Elements

• Provision of services via the Internet / network

• Virtual, not physical, allocation of resources

• Multi-tenanted hosting

• Pay as you use, not outright purchase (cf. utilities)

• Cloud is a disruptive technology as it provides a clear alternative model to the outright purchase of hardware, platforms & applications

Page 19

Types of clouds & services

• Public / private / hybrid options:

– Public: via the internet

– Private: via an intranet

– Hybrid: a combination of public and private

• Cloud services:

– Infrastructure as a service (IaaS)

– Platform as a service (PaaS)

– Software as a service (SaaS)

– and others (XaaS, “anything as a service”)

Page 20

Cloud Computing: potential benefits (1)

• Speed to deploy new applications & services

• Greater standardisation

• Scalability & elasticity

• Lower initial implementation costs – a shift from CAPEX to OPEX

• Better cost control and lower internal IT costs (e.g. help desks)

Page 21

Cloud Computing: potential benefits (2)

• Benefits to SMEs that cannot afford to purchase outright

• Try-before-you-buy options – benefits both customers & suppliers

• Self-service and self-configuration of services

• Better and faster user adoption

• Potentially improved performance

• Automatic data backups

Page 22

Cloud Computing – barriers & risks

DATA SECURITY & PRIVACY CONCERNS

COMMERCIAL & OPERATIONAL FACTORS

APPLICATION & DATA INTEGRATION CHALLENGES

LEGAL & REGULATORY RESTRICTIONS

Page 23

Cloud – the role of DQ & DG

Preparing data for migration

• Scoping and scaling the data to be migrated

• Evaluating its suitability for integration with other data sources

• Undertaking source data rationalization & cleansing

Migrating to the cloud environment

• Profiling data in advance of data migration

• Enhancing data in preparation for migration

• Maintaining DQ during ETL processes

Managing data in the cloud

• Enforcing the business rules to be applied in the cloud environment

• Auditing data to ensure security, adherence and quality

• Supporting data governance activities
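Profiling data in advance of migration usually starts with simple column statistics. The sketch below computes completeness, distinct values and format patterns for one column, showing letters as `A` and digits as `9`. The phone-number sample and the pattern notation are assumptions chosen for illustration. They do not describe any particular tool.

```python
from collections import Counter


def pattern(value: str) -> str:
    """Mask letters as 'A' and digits as '9' so format variations stand out."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in value)


def profile_column(values: list) -> dict:
    """Basic pre-migration statistics for one column of string values."""
    non_blank = [v for v in values if v.strip()]
    return {
        "completeness": len(non_blank) / len(values) if values else 0.0,
        "distinct": len(set(non_blank)),
        "patterns": Counter(pattern(v) for v in non_blank).most_common(),
    }


# Hypothetical phone numbers held in a source system before migration.
phones = ["01234 567890", "01234567890", "", "+44 1234 567890"]
print(profile_column(phones))
```

Three different formats for one field is the kind of finding that is cheaper to standardise before the data moves to the cloud than after.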

Page 24

Cloud Computing – the DG / DQ impact

• DQ / DG will be key to Cloud migration success – before, during and after migration

• Internal and external data integration will become critical

• Cloud could improve DQ, as fewer devices will hold data

• Hosting and application companies may offer DQ as a service (DQaaS)

• Cloud will require an enhanced focus on data governance – within and outside the enterprise

• Organisations may lose physical control of data

• DQ SLAs will be needed with data hosts / suppliers

• Legal & regulatory compliance becomes a major challenge
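A DQ SLA with a data host or supplier needs measurable targets. The sketch below assumes that each business rule is a predicate over a record and that each SLA target is a minimum pass rate. The rules, thresholds and sample batch are hypothetical. Real SLAs would also cover timeliness, remedies and reporting.

```python
import re
from dataclasses import dataclass


@dataclass
class SlaTarget:
    rule: str       # name of the business rule being measured
    minimum: float  # lowest acceptable pass rate, between 0 and 1


# Hypothetical business rules; each returns True when a record passes.
RULES = {
    "email_present": lambda r: bool(r.get("email")),
    "email_format": lambda r: re.fullmatch(
        r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email", "")) is not None,
    "country_iso2": lambda r: len(r.get("country", "")) == 2 and r["country"].isupper(),
}


def pass_rates(records: list) -> dict:
    """Share of records passing each rule (0.0 for every rule on an empty batch)."""
    total = max(len(records), 1)
    return {name: sum(rule(r) for r in records) / total for name, rule in RULES.items()}


def sla_breaches(records: list, targets: list) -> list:
    """Names of rules whose pass rate falls below the agreed minimum."""
    rates = pass_rates(records)
    return [t.rule for t in targets if rates[t.rule] < t.minimum]


# Hypothetical batch received from a data supplier.
supplier_batch = [
    {"email": "a@example.com", "country": "GB"},
    {"email": "b@example", "country": "GB"},
    {"email": "", "country": "uk"},
    {"email": "d@example.org", "country": "FR"},
]
targets = [SlaTarget("email_present", 0.95), SlaTarget("email_format", 0.5),
           SlaTarget("country_iso2", 0.7)]
print(pass_rates(supplier_batch))
print(sla_breaches(supplier_batch, targets))  # ['email_present']
```

The same rule set can be run by the enterprise and by the host. Both parties then measure the SLA in the same way, even when the data is no longer under the enterprise's physical control.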

Page 25

Disruptive eruption 3 – Data Virtualization

Page 26

Data virtualization – a simple view

Page 27

Data Virtualization – a less simple view

Page 28

Data virtualization – the essentials

• Data is held in a variety of internal and external sources (e.g. DBMS, DW, Excel etc.)

• A middleware layer sits above the data sources and:

– creates a virtual view at run time, building temporary tables on a dedicated server

– processes, assembles and presents the data to the application layer / device

• Benefits claimed:

– Hides complexity from users

– Flexibility

– Speed, as data can be cached in memory
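These essentials (a middleware layer, a view assembled at run time, and in-memory caching) can be sketched in a few lines. In the hypothetical Python class below, each source is just a function that returns rows, standing in for a DBMS, data warehouse or spreadsheet connector. Rows are merged on a shared key when the view is first queried. After that they are served from memory until the cache is invalidated. The sketch illustrates the idea only and is not modelled on any specific data virtualization product.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Source = Callable[[], List[Row]]


class VirtualView:
    """Hypothetical middleware layer: assembles rows from several sources
    at query time, merges them on a shared key and caches the result."""

    def __init__(self, sources: Dict[str, Source], key: str) -> None:
        self.sources = sources
        self.key = key
        self._cache: Optional[List[Row]] = None
        self.source_reads = 0  # how many times underlying sources were read

    def query(self) -> List[Row]:
        if self._cache is None:          # first query: build the view
            self._cache = self._assemble()
        return self._cache               # later queries: served from memory

    def invalidate(self) -> None:
        """Drop the cached view so the next query re-reads the sources."""
        self._cache = None

    def _assemble(self) -> List[Row]:
        merged: Dict[object, Row] = {}
        for read in self.sources.values():
            self.source_reads += 1
            for row in read():
                merged.setdefault(row[self.key], {}).update(row)
        return list(merged.values())


def crm_source() -> List[Row]:
    return [{"id": "C1", "name": "John Smith"}]


def billing_source() -> List[Row]:
    return [{"id": "C1", "balance": 120.0}, {"id": "C2", "balance": 0.0}]


view = VirtualView({"crm": crm_source, "billing": billing_source}, key="id")
print(view.query())
view.query()
print(view.source_reads)  # 2
```

Customer C2 exists only in billing, so it appears in the view with billing's fields alone. Gaps like this in source systems surface at run time, not during a batch load.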

Page 29

Data virtualization – the DG / DQ impact

• Will put the focus on DQ & data standardisation as a key enabler of data virtualization (DV) interoperability

• To work, DV will require the deployment of both real-time and batch DQ capability

• Will require a Shared Business Vocabulary (SBV) to provide a shared data model and data standards across an organisation

• Need for better DQ in source systems to enable run-time integration

• Data is physically held in a wide variety of sources, which makes coherent data governance more difficult

• Data at source will be used for multiple applications, so common business rules are harder to agree

• Run-time integration requires real-time DQ – many organisations do not have this capability
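At its simplest, a shared business vocabulary is an agreed mapping from each source's field names to common business terms. The sketch below assumes that this mapping is held as a Python dictionary. The terms, source names and field names are hypothetical. It also reports fields that a source holds but the vocabulary does not yet cover. Governance would need to resolve those gaps before run-time integration can rely on them.

```python
# Hypothetical shared business vocabulary: business term -> field name in each source.
SBV = {
    "customer_id": {"crm": "cust_no", "billing": "account_ref"},
    "postcode": {"crm": "post_code", "billing": "zip"},
}


def to_shared_terms(source: str, row: dict) -> dict:
    """Rename one source row's fields to the shared business terms."""
    shared = {}
    for term, fields in SBV.items():
        field = fields.get(source)
        if field is not None and field in row:
            shared[term] = row[field]
    return shared


def unmapped_fields(source: str, row: dict) -> list:
    """Fields in the row that the vocabulary does not yet cover for this source."""
    mapped = {fields[source] for fields in SBV.values() if source in fields}
    return sorted(set(row) - mapped)


crm_row = {"cust_no": "C1", "post_code": "SW1A 1AA", "notes": "VIP"}
billing_row = {"account_ref": "C1", "zip": "SW1A 1AA"}
print(to_shared_terms("crm", crm_row) == to_shared_terms("billing", billing_row))  # True
print(unmapped_fields("crm", crm_row))  # ['notes']
```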

Page 30

The potential eruptions…

DATA VIRTUALIZATION

BIG DATA

CLOUD COMPUTING

Page 31

So what’s the impact of all this on DQ / DG practitioners?

New Data Quality & Data Governance challenges

What do we need to do?

Changing DQ and DG roles & skills

Page 32

New DQ & Data Governance challenges

THE TRADITIONAL LANDSCAPE

• Predominantly batch DQ

• Customer organisation focus

• Procedural

• Focus mainly within the enterprise

THE CHANGING LANDSCAPE

• Predominantly real-time DQ

• Supplier organisation focus

• Commercial

• Growing focus outside the enterprise

Page 33

Changing DQ and DG roles

• DQ and Data Governance roles will become more ‘beyond organisation’ facing – into hosting companies, data & application suppliers etc.

• Many data management and DQ specialists will work with or evolve into data scientists

• DQ and DG people will need to enhance their understanding of global legal and regulatory environments

• Commercial and negotiation skills will become more important

Page 34

What action should we take?

• Identify and get involved in any current or planned Big Data, Cloud or Data Virtualization initiatives within our organisations

• Ensure that the DQ and DG implications & imperatives of these initiatives are understood

• Participate in any due diligence of potential third-party vendors & providers

• Plan for the new DQ and DG challenges that these trends will pose

Page 35

The changing landscape

• Better DQ needs to be achieved in an environment where data will continue to grow by 50% per annum

• The claimed benefits of Big Data, Cloud & Data Virtualization cannot be achieved without renewed emphasis on data quality management & data governance

• Data governance becomes increasingly challenging & extends both within and beyond the enterprise

• DQ services will increasingly be offered as DQaaS by vendors and data hosts, and more DQ / DG roles may be outsourced

• As DQ practitioners we need to understand, educate and get involved with those in our organisations who are creating the new landscape

Page 36

A final thought…

“It’s not the will to win but the will to prepare to win that makes the difference”

Bear Bryant, US football coach (1913–1983)

Page 37

Questions

Contact: [email protected]

www.trilliumsoftware.com
