CDMP preparation workshop EDW2016

CDMP Preparation Workshop EDW April 2016 Presented by: Chris Bradley and Katherine O’Keefe

Transcript of CDMP preparation workshop EDW2016

Page 1: CDMP preparation workshop EDW2016

CDMP Preparation Workshop EDW April 2016

Presented by:

Chris Bradley and Katherine O’Keefe

Page 2: CDMP preparation workshop EDW2016

Christopher Bradley
President DAMA-UK
CDMP Fellow
CDMP Author & Examiner
DAMA Professional Achievement Award
DMBoK 2 co-author

Who We Are

35 years Global Data Management Experience

Author DMBoK education series

Independent Consultant Data Management Advisors

Information Strategist, Author, Trainer

[email protected]

Page 3: CDMP preparation workshop EDW2016

Christopher Bradley

Chris has 35 years of Information Management experience and is a leading independent Information Management strategy advisor.

In the Information Management field, Chris works with prominent organisations including HSBC, Celgene, GSK, Pfizer, Icon, Quintiles, Total, Barclays, ANZ, Shell, BP, Statoil, Riyad Bank and Aramco. He addresses challenges faced by large organisations in the areas of Data Governance, Master Data Management, Information Management Strategy, Data Quality, Metadata Management and Business Intelligence.

He is a Director of DAMA-I, the inaugural CDMP Fellow, an author and examiner for CDMP, a Fellow of the Chartered Institute of Management Consulting (now IC), a member of the MPO, and SME Director of the DM Board. He is also the recipient of the DAMA lifetime professional achievement award.

A recognised thought-leader in Information Management, Chris is the author of numerous papers and books, including sections of DMBoK 2.0, a columnist, a frequent contributor to industry publications and a member of several IM standards authorities.

He leads an experts channel on the influential BeyeNETWORK, is a sought after speaker at major international conferences, and is the co-author of “Data Modelling For The Business – A Handbook for aligning the business with IT using high-level data models”. He also blogs frequently on Information Management (and motorsport).

Page 4: CDMP preparation workshop EDW2016

@inforacer

uk.linkedin.com/in/christophermichaelbradley/

+44 7973 184475 (mobile) +44 1225 923000 (office)

infomanagementlifeandpetrol.blogspot.com

Christopher Bradley
Information Management Strategist

[email protected]

Training

Advisory

Consulting

Certification

Page 5: CDMP preparation workshop EDW2016

Katherine O’Keefe, PhD
• Project Lead Consultant, CDMP exams design
• Data Governance and Data Privacy Consultant and Trainer with Castlebridge Associates
• Lecturer on Irish Law Society certification course for Data Protection
• Tutor and Lecturer in English and Irish Literature and Drama

Who We Are

Ethics in Information Management

Storytelling for Change Management

Data Privacy and the EU General Data Protection Regulation

[email protected]

Castlebridge Associates: Changing How People Think About Information

Page 6: CDMP preparation workshop EDW2016

Katherine O’Keefe, PhD

Dr Katherine O'Keefe is a Data Governance and Data Privacy consultant and trainer with Castlebridge Associates, specializing in “translating Data Geek to People Speak”.

Katherine has worked with clients in a variety of sectors on consulting and training engagements since starting with Castlebridge Associates. In addition to her professional experience in Data Governance and Privacy, Katherine holds a Doctorate in Anglo-Irish Literature from University College Dublin with an interdisciplinary focus on Philosophy, and as well as being a Data Governance and Privacy consultant, is a world-leading expert on the Fairy Tales of Oscar Wilde.

She has ten years of experience teaching in diverse learning environments. As an experienced teacher of English as a foreign language, she understands the challenges of translating concepts across language and culture.

She is the author of “A Primer on Ethical Principles in an Information Governance Framework”, which sets out a structured, first principles based framework for ethical decision making in the processing of data.

Castlebridge Associates: Changing How People Think About Information

Page 7: CDMP preparation workshop EDW2016

@okeefekat

https://ie.linkedin.com/in/okeefekat

+353 86 3699863

www.castlebridge.ie

Katherine O’Keefe
Information Governance and Data Privacy Consultant and Trainer

[email protected]

Training

Advisory

Strategy

Consulting

Castlebridge Associates: Changing How People Think About Information

Page 8: CDMP preparation workshop EDW2016

CDMP Revamped 2015

Page 9: CDMP preparation workshop EDW2016

Comparing the Levels

Page 10: CDMP preparation workshop EDW2016

CDMP Exam Prices

Item | Member | Non-Member
Associate (DM Fundamentals) | $220 | $290*
Associate to Practitioner/Master Data Management (DM) Fundamentals Exam conversion** | $150 | $220*
Practitioner/Master DM Advanced Exam | $250 | $330*
Practitioner/Master Elective Exams (per exam) | $250 | N/A
Master Case Study Elective Exam*** | $280 | N/A
Exam re-take (Master & Practitioner levels only) | $230 | $300*

* Non-members receive 1 year’s Central Membership of DAMA-I with their first DM exam
** The Associate exam focuses on theory and concepts based on the DMBOK (V1 currently); Practitioner and Master focus on applying/implementing the theory and concepts. Marks gained at Associate level do not convert to similar marks at Practitioner level: Associate CDMPs must write DM Advanced to progress to the next level
*** Individuals aiming for Master must provide a case study related to one of their two elective topics as well as pass all 3 exams at 80% and above

An admin fee of $50 will be levied per exam for cancellations or date changes. Transfers of exams/membership from one individual to another are not permitted.

ALL EXAMS ARE TAKEN ONLINE ONLY. ONLINE OR CHAPTER-LED PROCTORING REQUIRES FULL PAYMENT UP FRONT. Please watch out at various international DAMA-I endorsed conferences for exam proctoring and preparation workshops.

Page 11: CDMP preparation workshop EDW2016

Data Management Fundamentals (Associate Level)

100 questions

90 minutes

60% to pass

Taking the Exams: Associate

1 Exam Data Management Fundamentals

Page 12: CDMP preparation workshop EDW2016

Data Management Fundamentals(Practitioner Level)

110 questions

90 minutes

70% to pass

Taking the Exams: Practitioner

Data Management Fundamentals (Practitioner Level)

+ 2 Advanced Elective exams

Elective Exams (each)

100 questions

90 minutes

70% to pass

Page 13: CDMP preparation workshop EDW2016

Data Management Fundamentals(Practitioner Level)

110 questions

90 minutes

80% to pass

Taking the Exams: Master

Data Management Fundamentals (Practitioner Level)

+ 2 Advanced Elective exams

Elective Exams (each)

100 questions

90 minutes

80% to pass

Page 14: CDMP preparation workshop EDW2016

Substitution Exams

Page 15: CDMP preparation workshop EDW2016

Adjacent Knowledge AreaCertificate Recognition

Page 16: CDMP preparation workshop EDW2016

Sunday, 4/17/2016
10:30 AM - 02:00 PM | CDMP Preparation
06:00 PM - 07:30 PM | CDMP Exam [Associate]

Tuesday, 4/19/2016
04:30 PM - 06:00 PM | CDMP Exam [Associate or Practitioner or Practitioner electives]
06:00 PM - 07:30 PM | CDMP Exam [Associate or Practitioner or Practitioner electives]

Wednesday, 4/20/2016
02:00 PM - 03:30 PM | CDMP Exam [Associate or Practitioner or Practitioner electives]

CDMP Testing at EDW

Page 17: CDMP preparation workshop EDW2016

DMBOK Wheel (Version 1)

Page 18: CDMP preparation workshop EDW2016

Bloom’s Taxonomy of Learning: Cognitive Domains

Recall, Restate, Define, Identify, List, Name

Classify, Compare, Summarize, Explain

Implement, Use, Carry out, Execute

Strategize, Design, Make, Plan, Produce

Reflect, Critique, Test, Judge, Monitor, Assess

Integrate, Organize, Compare, Deconstruct

Page 19: CDMP preparation workshop EDW2016

Metacognitive

Conceptual

Procedural

Factual

Dimensions of Knowledge

Page 20: CDMP preparation workshop EDW2016

Bloom’s Taxonomy Revised

Page 21: CDMP preparation workshop EDW2016

The Anatomy of a Multiple Choice Question Item

How many economists does it take to change a lightbulb?

A. They can't tell you unless you give them a lightbulb approximation to work on. [Key]
B. They're projecting three for next year, but that's a conservative estimate.
C. Nine. One to change the bulb, and eight to hold a seminar on how Nietzsche would have done it.
D. One, but they'll spend three hours checking it for alignment and leaks.
E. How many did it take this time last year?

Stem: the question itself. Alternatives: the answer choices. Key: the correct alternative. Distractors: the incorrect alternatives.

Page 22: CDMP preparation workshop EDW2016

Direct Answer (only correct choice) vs. Best Answer (most correct choice)

An example of a Direct Answer item:

The California State Capitol is located in which city?

A. Los Angeles

B. Monterey

C. Sacramento

D. San Jose

An example of a Best Answer Item:

Why does the planet Mercury have a year of 88 Earth days?

a) Mercury’s year is shorter than Earth’s

b) Mercury’s small size and elliptical orbit make it travel faster than Earth.

c) Mercury’s orbit is closer to the sun than is Earth’s.


Page 23: CDMP preparation workshop EDW2016

Exam Questions: evaluating the same information at different levels

Which of the following is characteristic of a good Data Steward? (According to DAMA-DMBOK version 1)

A. Quality A

B. Quality B

C. Quality C

D. Quality D

You need Data Stewards for your DG programme: which of these people would best fit the role?

a) Description of person A

b) Description of person B

c) Description of person C

d) Description of person D


Page 24: CDMP preparation workshop EDW2016

Practitioner Level Knowledge: Going Beyond the DMBOK

Page 25: CDMP preparation workshop EDW2016

Data Management Functions

Data Governance
› Strategy › Organisation & Roles › Policies & Standards › Issues › Valuation

Data Architecture Management
› Enterprise Data Modelling › Value Chain Analysis › Related Data Architecture

Data Development
› Analysis › Data Modelling › Database Design › Implementation

Database Operations Management
› Acquisition › Recovery › Tuning › Retention › Purging

Data Security Management
› Standards › Classifications › Administration › Authentication › Auditing

Reference & Master Data Management
› External Codes › Internal Codes › Customer Data › Product Data › Dimension Management

Data Warehouse & Business Intelligence Management
› Architecture › Implementation › Training & Support › Monitoring & Tuning

Document & Content Management
› Acquisition & Storage › Backup & Recovery › Content Management › Retrieval › Retention

Meta Data Management
› Architecture › Integration › Control › Delivery

Data Quality Management
› Specification › Analysis › Measurement › Improvement

Page 26: CDMP preparation workshop EDW2016

DMBoK Webinars to date

DMBoK Overview: 26th Feb 2015
Master & Ref Data: 30th March
Data Modelling: 2nd June
Data Quality: 18th August
DW & BI: 19th September
Data Risk & Security: October 20th
Metadata Management: November 17th
Data Lifecycle Management: December 11th
Data Governance: January 12th 2016
Data Operations: February 26th 2016
Document & Content Management: March 15th 2016
Data Integration & Interoperability (NEW FOR DMBoK 2): April 12th 2016

https://goo.gl/MdQlgn

Page 27: CDMP preparation workshop EDW2016

CDMP Certification & DMBoK Training: More to come

CDMP Preparation & Examinations: April 17-19, EDW 2016, San Diego, USA
Information Management Disciplines of the DMBoK: April 26-28, IRM Training, London, UK
CDMP Preparation & Examinations: May 16-18, IRM MDM/DG, London, UK
Data Quality Management: May 26-27, Rome, Italy
Information Management Disciplines of the DMBoK & CDMP Preparation & Exams: July 10-21, Dubai, UAE
CDMP Preparation & Examinations: November 7-9, IRM ED/BI, London, UK

Page 28: CDMP preparation workshop EDW2016
Page 29: CDMP preparation workshop EDW2016

Data Management Fundamentals

Page 30: CDMP preparation workshop EDW2016

DM Fundamentals Contents

1. Data Management Process

2. Data Governance Function

3. Data Architecture Management Function

4. Data Development Function

5. Data Operations Management Function

6. Data Security Management Function

7. Reference & Master Data Management Function

8. Data Warehousing and Business Intelligence Management Function

9. Document and Content Management Function

10. Meta-data Management Function

11. Data Quality Management Function

Page 31: CDMP preparation workshop EDW2016

Data Management Process

Page 32: CDMP preparation workshop EDW2016

ITIL

IT Infrastructure Library

Page 33: CDMP preparation workshop EDW2016

Information Lifecycle & SDLC

Information Lifecycle: PLAN → SPECIFY → ENABLE → CREATE & ACQUIRE → MAINTAIN & USE → ARCHIVE & RETRIEVE → PURGE

Systems Development Lifecycle (SDLC): PLAN → ANALYSE → DESIGN → BUILD → TEST → DEPLOY → MAINTAIN

(Source: DAMA)

Page 34: CDMP preparation workshop EDW2016

The Information Lifecycle (DAMA)

PLAN: › IM strategy › Governance › Define policies and procedures for quality, retention, security etc
SPECIFY: › Architecture › Conceptual, logical and physical modelling
ENABLE: › Install or provision servers, networks, storage, DBMSs › Access controls
CREATE & ACQUIRE: › Data created, acquired (external), extracted, imported, migrated, organised
MAINTAIN & USE: › Data validated, edited, cleansed, converted, reviewed, reported, analysed
ARCHIVE & RETRIEVE: › Data archived, retained and retrieved
PURGE: › Data deleted

(Source: DAMA)

Page 35: CDMP preparation workshop EDW2016

Data Management Functions

Data Governance
› Strategy › Organisation & Roles › Policies & Standards › Issues › Valuation

Data Architecture Management
› Enterprise Data Modelling › Value Chain Analysis › Related Data Architecture

Data Development
› Analysis › Data Modelling › Database Design › Implementation

Database Operations Management
› Acquisition › Recovery › Tuning › Retention › Purging

Data Security Management
› Standards › Classifications › Administration › Authentication › Auditing

Reference & Master Data Management
› External Codes › Internal Codes › Customer Data › Product Data › Dimension Management

Data Warehouse & Business Intelligence Management
› Architecture › Implementation › Training & Support › Monitoring & Tuning

Document & Content Management
› Acquisition & Storage › Backup & Recovery › Content Management › Retrieval › Retention

Meta Data Management
› Architecture › Integration › Control › Delivery

Data Quality Management
› Specification › Analysis › Measurement › Improvement

Page 36: CDMP preparation workshop EDW2016

Data Management Organisations

DATA GOVERNANCE COUNCIL
The primary and highest authority organisation for data governance. Includes senior managers serving as executive data stewards, the DM Leader and the CIO.

DATA STEWARDSHIP STEERING COMMITTEE
One or more cross-functional groups of coordinating data stewards responsible for support and oversight of a particular data management initiative.

DATA STEWARDSHIP TEAM
One or more business data stewards collaborating on an area of data management, typically within an assigned subject area, led by a Coordinating Data Steward.

DATA GOVERNANCE OFFICE
Exists in larger organisations to support the above teams.

Page 37: CDMP preparation workshop EDW2016

Data Stewards

EXECUTIVE DATA STEWARD
Senior managers who serve on a Data Governance Council.

COORDINATING DATA STEWARD
Leads and represents teams of business data stewards in discussions across teams and with executive data stewards. Coordinating data stewards are particularly important in large organizations.

BUSINESS DATA STEWARD
A knowledge worker and business leader recognized as a subject matter expert who is assigned accountability for the data specifications and data quality of specifically assigned business entities, subject areas or databases.

Page 38: CDMP preparation workshop EDW2016

Data Governance

[Screenshot: workflow from a DQ & MDM tool]

Page 39: CDMP preparation workshop EDW2016

What Is Data Governance?

The Design & Execution Of Standards & Policies Covering …
Design and operation of a management system to assure that data delivers value and is not a cost

Who can do what to the organisation’s data and how

Ensuring standards are set and met

A strategic & high level view across the whole organisation

To Ensure … Key principles/processes of effective Information Management are put into practice

Continual improvement through the evolution of an Information Management strategy

Data Governance Is NOT … A “one-off” tactical management exercise

The responsibility of the Technology and IT department alone

“The exercise of authority and control, planning, monitoring, and enforcement over the management of data assets.” (DAMA International)

Page 40: CDMP preparation workshop EDW2016

Why Is Data Governance Critical?

Higher volumes of data generated by organisations (raw data, devices, CRM, ECM, IOT)

Proliferation of data-centric systems

New product development

To make the management of information front and centre and part of the culture

Greater demand for reliable information: Gain deep insights through analytics

Trust in Information: “What do you mean by ….?”

Tighter regulatory compliance

Competitive advantage: Improved decision making

Business change is no longer optional – it’s inevitable: Agility AND ability to respond to change

• Big Data explosion (and hype)

Page 41: CDMP preparation workshop EDW2016

Drivers for Data Governance

1. Global operations are typically complex, disparate and often inefficient in their approaches to information management (IM).
2. Shared and/or critical information is siloed, and this siloed information impairs enterprise-level reporting, decision-making and performance optimization.
3. Aggregated information is required by certain business functions, but is not readily available.
4. Business and IT neither talk the same language, nor have a common understanding about information management, causing a considerable knowledge gap to exist with regards to critical data elements for the enterprise.
5. Information management budgets and program focuses are siloed, often inside individual projects with no enterprise scope.
6. Enterprise-wide information lacks semantic consistency (meaning & definition).
7. The information management needs of multiple “owners” across the enterprise must be rationalized.
8. Decentralized IT organizations operate independently within individual business units, adding complexity and challenge.
9. Business perceives IT as being insufficiently agile to meet ad hoc information needs.
10. If even discussed, Business and IT can’t agree who actually “owns” the data.
11. Data context is critical to consumers, but often lacking.
12. Operationalization of information management projects at the enterprise level is a difficult challenge.
13. Regulation & compliance make effective information management no longer optional.
14. Data quality must be operationalized across the entire organization to assure the usefulness of the information that business users consume.
15. Organisations need to become information-centric enterprises.
16. Successful transformation of an organization into an information-centric enterprise requires a designated champion from senior management to educate and guide the company in operationalizing strategic data plans.
17. Strategic thinking and decision-making is needed on the issue of whether data should be centralized or distributed.

Page 42: CDMP preparation workshop EDW2016

Exercise

1. List the top 5 drivers for Data Governance / Information Management for your company.

2. For each of the drivers above, describe the issues faced / evidence and the implications of these.

Page 43: CDMP preparation workshop EDW2016

Data Governance Activities

Page 44: CDMP preparation workshop EDW2016

Guiding principles

Data management is a shared responsibility

Data Stewards have responsibilities in all 10 management functions

Every data governance/data stewardship programme is unique

The best data stewards are found not made

Shared decision making is the hallmark of data governance

DG Councils/Data Stewards act in a legislative role, while the DMSO acts in an executive role

Data Governance occurs at enterprise and local levels

No substitute for visionary and active IT leadership

Centralised organisation for DM professionals is essential

Define a formal charter for the Data Governance Council

Data Strategy should be driven by the Business Strategy

Page 45: CDMP preparation workshop EDW2016

Ethical issues raised by IT

Who should have access to data? To whom does the data belong?

Who is responsible for maintaining accuracy and security?

Does the ability to capture data imply a responsibility to monitor its use?

Should data patterns be analyzed to prevent risks to employees / customers?

How much information is necessary and relevant for decision making?

Should certain data "follow" individuals or corporations throughout their lives?

Does IT lead to job elimination, job repetition, or job enhancement?

Page 46: CDMP preparation workshop EDW2016

What is the Learning objective / Area of knowledge?

(Data Governance)

Stem (construct a question):

Key (the correct answer)

Distractor 1

Distractor 2

Distractor 3

Preparing for an exam by creating questions

Page 47: CDMP preparation workshop EDW2016

Data Architecture Management

Page 48: CDMP preparation workshop EDW2016

Enterprise Architecture Types and Structures

Enterprise Architecture

Enterprise architecture (EA) is the process of translating business vision and strategy into effective enterprise change by creating, communicating and improving the key requirements, principles and models that describe the enterprise's future state and enable its evolution.

Segment Architecture

Segment architecture is a detailed, formal description of areas within an enterprise, used at the program or portfolio level to organize and align change activity.

Solutions Architecture

Solution architecture is an architecture domain that aims to address specific problems and requirements, usually through the design of specific information systems or applications.

Page 49: CDMP preparation workshop EDW2016

Enterprise Architecture Types and Structures

Level | Scope | Detail | Impact | Audience
Enterprise Architecture | Agency / Organization | Low | Strategic Outcomes | All Stakeholders
Segment Architecture | Line of Business | Medium | Business Outcomes | Business Owners
Solution Architecture | Function / Process | High | Operational Outcomes | Users and Developers

Page 50: CDMP preparation workshop EDW2016

Enterprise Architecture Frameworks

Examples include:

TOGAF – The Open Group Architecture Framework is probably the most widely adopted framework; it contains an Architecture Development Method (ADM), a content meta-model and defined artefacts within the business, application, data and technology domains.

Zachman – the first enterprise architecture framework; it defines artefacts in a 6 x 6 matrix, with interrogatives (What, How, Where etc.) as columns and stakeholder perspectives (Executive, Business, Architect etc.) as rows. It is an ontology, not a methodology, for enterprise architecture.

FEA - The U.S. federal enterprise architecture (FEA) is an initiative of the U.S. Office of Management and Budget that aims to comply with the Clinger-Cohen Act and provide a common methodology for IT acquisition in the US federal government.

An enterprise architecture framework defines how to organize the structure and views associated with an enterprise architecture.

Page 51: CDMP preparation workshop EDW2016

Enterprise Architecture Types and Structures

Business Architecture
The Business Architecture defines the business strategy, governance, organization, and key business processes.

Application Architecture
The Application Architecture defines the major kinds of application system necessary to process the data and support the business.

Data Architecture
The Data Architecture describes the structure of an organization's logical and physical data assets and data management resources.

Technology (Infrastructure) Architecture
The Technology Architecture describes the logical software and hardware capabilities that are required to support the deployment of business, data, and application services. This includes IT infrastructure, middleware, networks, communications, processing, standards, etc.

Enterprise Architecture Domains

Page 52: CDMP preparation workshop EDW2016

Enterprise Architecture Types and Structures

Enterprise Data Model

Depicts the relationships between critical data entities within the enterprise. This diagram is developed to address the concerns of business stakeholders.

Information Value Chain Matrix

A Value Chain diagram provides a high-level orientation view of an enterprise and how it interacts with the outside world.

Database Architecture

A data architecture describes the architecture of the data structures used by a business and/or its applications.

Data Integration Architecture

Data integration involves combining data residing in different sources and providing users with a unified view of these data e.g. ETL or Virtualisation.

Document Content Architecture

The Document Content Architecture, or DCA for short, was a document standard supported by IBM in the early 1980s.

Meta-data Architecture

A model that describes how and with what the architecture will be described in a structured way.

Data Architecture Terms

Page 53: CDMP preparation workshop EDW2016

Enterprise Architecture Types & Structures

TOGAF Inputs & Outputs

Page 55: CDMP preparation workshop EDW2016

Enterprise Architecture Types & Structures

Page 56: CDMP preparation workshop EDW2016

Enterprise Architecture Types and Structures

Federal Enterprise Architecture Framework

Page 57: CDMP preparation workshop EDW2016

Data Development

Definition: Designing, implementing, and maintaining solutions to meet the data needs of the enterprise.

Goals:

1. Identify and define data requirements.

2. Design data structures and other solutions to these requirements.

3. Implement and maintain solution components that meet these requirements.

4. Ensure solution conformance to data architecture and standards as appropriate.

5. Ensure the integrity, security, usability, and maintainability of structured data assets.

Inputs:

• Business Goals and Strategies

• Data Needs and Strategies

• Data Standards

• Data Architecture

• Process Architecture

• Application Architecture

• Technical Architecture

Primary Deliverables:

• Data Requirements and Business Rules

• Conceptual Data Models

• Logical Data Models and Specifications

• Physical Data Models and Specifications

• Meta-data (Business and Technical)

• Data Modeling and DB Design Standards

• Data Model and DB Design Reviews

• Version Controlled Data Models

• Test Data

• Development and Test Databases

• Information Products

• Data Access Services

• Data Integration Services

• Migrated and Converted Data

Suppliers:

• Data Stewards

• Subject Matter Experts

• IT Steering Committee

• Data Governance Council

• Data Architects and Analysts

• Software Developers

• Data Producers

• Information Consumers

Consumers:

• Data Producers

• Knowledge Workers

• Managers and Executives

• Customers

• Data Professionals

• Other IT Professionals

Tools:

• Data Modeling Tools

• Database Management Systems

• Software Development Tools

• Testing Tools

Activities:

1. Data Modeling, Analysis and Solution Design (D)
   1. Analyze Information Requirements
   2. Develop and Maintain Conceptual Data Models
   3. Develop and Maintain Logical Data Models
   4. Develop and Maintain Physical Data Models

2. Detailed Data Design (D)
   1. Design Physical Databases
   2. Design Information Products
   3. Design Data Access Services
   4. Design Data Integration Services

3. Data Model and Design Quality Management
   1. Develop Data Modeling and Design Standards (P)
   2. Review Data Model and Database Design Quality (C)
   3. Manage Data Model Versioning and Integration (C)

4. Data Implementation (D)
   1. Implement Development / Test Database Changes
   2. Create and Maintain Test Data
   3. Migrate and Convert Data
   4. Build and Test Information Products
   5. Build and Test Data Access Services
   6. Validate Information Requirements
   7. Prepare for Data Deployment

Participants:

• Data Stewards and SMEs

• Data Architects and Analysts

• Database Administrators

• Data Model Administrators

• Software Developers

• Project Managers

• DM Executives and Other IT Management

Activities key: (P) – Planning, (C) – Control, (D) – Development, (O) – Operational

Tools (continued):

• Data Profiling Tools

• Model Management Tools

• Configuration Management Tools

• Office Productivity Tools

Page 58: CDMP preparation workshop EDW2016

What Is A Data Model?

A model is a representation of something in our environment, making use of standard symbols to enable improved understanding of the concept.

A data model describes the specification, definition and rules for data in a business area.

A data model is a diagram (with additional supporting metadata) that uses text and symbols to represent data, to give the reader a better understanding of the data.

A data model describes the inherent logical structure of the data within a given domain and, by implication, the underlying structure of that domain itself.

Page 59: CDMP preparation workshop EDW2016

A Data Model Represents

A relationship called "is the placer of" operates on the entity classes CUSTOMER and ORDER and forms the following concrete assertion:

"Each CUSTOMER is the placer of zero, one or more ORDER(s)." Is this true… always?

Relationships should be named in both directions, thus in the other direction we have:

"Each ORDER must be placed by one and only one CUSTOMER." Is this true?

A data model represents entities, the relationships among those entities, attributes, and the (often implicit) relationships among those attributes. Relationships form a concrete business assertion.
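To make the cardinality concrete, here is a minimal Python sketch (not part of the workshop materials; the class and attribute names are illustrative): every Order must reference exactly one Customer, while a Customer may be the placer of zero, one or more Orders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    """One instance of the CUSTOMER entity class."""
    customer_id: str
    name: str
    orders: List["Order"] = field(default_factory=list)  # zero, one or more ORDERs


@dataclass
class Order:
    """Each ORDER must be placed by one and only one CUSTOMER."""
    order_id: str
    customer: Customer  # mandatory, single-valued relationship

    def __post_init__(self) -> None:
        # Maintain the relationship in both directions, as the model names it.
        self.customer.orders.append(self)


if __name__ == "__main__":
    acme = Customer("C001", "ACME Ltd")        # a customer with zero orders is valid
    Order("O001", acme)                        # ...and may become the placer of many
    Order("O002", acme)
    print([o.order_id for o in acme.orders])   # ['O001', 'O002']
```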

Page 60: CDMP preparation workshop EDW2016

What Is A Conceptual Data Model?

A description of a Business (or an area of the Business) in terms of the things it needs to know about.

The data things are “entities” and the “facts about things” are attributes & relationships.

It’s a representation of the “real world”, not a technical implementation of it.

It should be able to be understood by Business users.

Definition: A Student is any person who has been admitted to a course, has paid, and has enrolled in one or more modules within a course. Tutors and other staff members may also be Students.

Business Assertions
A Student enrolls for zero, one or more Modules
A Course can be taught through zero, one or more Modules
A Room can be the location of zero, one or more Modules
A Tutor can be the teacher of zero, one or more Modules

The Other Way?
A Module is enrolled in by zero or many Students
A Module is an offering within zero or one Course
A Module is located in zero or one Room
A Module is taught by zero or one Tutor

Really?

Page 61: CDMP preparation workshop EDW2016

A Data Model Represents

Classes of entities (kinds of things) about which a company wishes to know or hold information:

WHO: Person, Employee, Vendor, Customer, Department, Organisation, …
WHAT: Product, Service, Raw Material, Training Course, Flight, Room, …
WHEN: Time, Day, Date, Calendar, Reporting Period, Fiscal Period, …
WHERE: Geographic location, Delivery address, Storage Depot, Airport, …
WHY: Order, Complaint, Inquiry, Transaction, …
HOW: Invoice, Policy, Contract, Agreement, Document, Account, …

Page 62: CDMP preparation workshop EDW2016

What is an Entity?

Entity: A classification of the types of objects found in the real world (persons, places, things, concepts and events) of interest to the enterprise. (DAMA Dictionary of Data Management)

WHO? WHAT? WHEN? WHERE? WHY? HOW?

Page 63: CDMP preparation workshop EDW2016

Identifying Entities: A Rule of Thumb

Is it an Entity?
• What is ONE of those things?
• Does this imply an instance of a SINGLE thing, not a group or collection?
• How do I identify ONE of those things?
• What are the facts I want to hold against ONE of those things?
• Do I even WANT to hold facts about these things?
• Are there MULTIPLE instances of these things?
• PROCESSES will act upon it, so does the “thing” make sense in a well-formed process phrase, i.e. a verb-noun pair?

Page 64: CDMP preparation workshop EDW2016

Sample Entities

Product

Customer

Location

Order

Raw Material

Building

Region

Page 65: CDMP preparation workshop EDW2016

ExerciseIdentify Entities

Page 66: CDMP preparation workshop EDW2016

Exercise: Entities
Which of these might / might not be valid entities?

Student
Building
Maths Department
Course Catalogue
Attendance Sheet
Enrolment Form
Professor Plumb
Prerequisite List
Module
Organisation Chart
Student Directory
Module Description
Qualification
Certification Body
Graduation

Page 67: CDMP preparation workshop EDW2016

Exercise: Entities
Which of these might / might not be valid entities?

Student
Building
Maths Department
Course Catalogue
Attendance Sheet
Enrolment Form
Professor Plumb
Prerequisite List
Module
Organisation Chart
Student Directory
Module Description
Qualification
Certification Body
Graduation

Page 68: CDMP preparation workshop EDW2016

Data Model Levels

ENTERPRISE (the domain of an enterprise data concept)
  is described in more detail by
CONCEPTUAL (within a subject area/domain)
  is described in more detail by
LOGICAL
  is implemented in (and can be reverse engineered from)
PHYSICAL
  generates the schema of (and can be reverse engineered from)
PHYSICAL IT SYSTEM

Higher levels have a communication focus; lower levels have an implementation focus.

Page 69: CDMP preparation workshop EDW2016

We All Use Models

Page 70: CDMP preparation workshop EDW2016

1st Normal Form

1NF definition: Every non-key attribute in an entity must depend on its primary key.

A primary key must be:
› Unique: the primary key uniquely identifies each instance of the entity
› Mandatory: the primary key must be defined for every instance of the entity
› Unchanging: while not mandatory, it is desirable that the primary key does not change

To put a model into 1NF:
1. Identify the primary key
2. Remodel repeating values
3. Remodel multi-valued attributes
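As an illustration only (the student/phone example is invented, not taken from the DMBoK), a minimal Python sketch of steps 2 and 3: the repeating, multi-valued attribute is moved out into its own structure keyed by the parent's primary key.

```python
# Before 1NF: a multi-valued attribute (phone numbers) crammed into one record.
student_unnormalised = {
    "student_id": "S001",          # candidate primary key
    "name": "Ada Lovelace",
    "phones": ["0123", "0456"],    # repeating values violate 1NF
}

# After 1NF: the repeating values are remodelled into their own entity,
# keyed by the parent's primary key plus a distinguishing attribute.
student = {"student_id": "S001", "name": "Ada Lovelace"}
student_phones = [
    {"student_id": "S001", "phone_seq": 1, "phone": "0123"},
    {"student_id": "S001", "phone_seq": 2, "phone": "0456"},
]
```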

Page 71: CDMP preparation workshop EDW2016

2nd Normal Form

2NF definition: Each entity must have the fewest possible correct primary key attributes.

How do we do this?
Take each non-key attribute (i.e. not a primary, foreign or alternate key).
Test if it depends entirely on the primary key.
If it doesn’t, move it out to a new entity.

Page 72: CDMP preparation workshop EDW2016

3rd Normal Form

3NF definition: Each non-key element must be directly dependent upon the primary key and not upon any other non-key attributes.

How do we do this?
For each non-key attribute (i.e. not a primary, foreign or alternate key).
Test if it depends entirely on the primary key and nothing else.
If it doesn’t, move it out to a new entity.
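A small illustrative Python sketch (the attributes are invented for the example) showing the 2NF and 3NF decompositions of a flattened order-line record; each move follows the tests above.

```python
# A flattened ORDER_LINE record with normalisation problems.
order_line = {
    "order_id": "O001", "product_id": "P01",   # composite primary key
    "quantity": 3,                              # depends on the whole key  -> OK
    "product_name": "Widget",                   # depends on product_id only -> 2NF violation
    "customer_id": "C001",                      # depends on order_id only   -> 2NF violation
    "customer_city": "Bath",                    # depends on customer_id,
                                                # a non-key attribute        -> 3NF violation
}

# 2NF: move attributes that depend on only part of the key into their own entities.
product = {"product_id": "P01", "product_name": "Widget"}
order = {"order_id": "O001", "customer_id": "C001", "customer_city": "Bath"}
order_line_2nf = {"order_id": "O001", "product_id": "P01", "quantity": 3}

# 3NF: move attributes that depend on another non-key attribute out as well.
customer = {"customer_id": "C001", "customer_city": "Bath"}
order_3nf = {"order_id": "O001", "customer_id": "C001"}
```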

Page 73: CDMP preparation workshop EDW2016

PRISM: Database Design Principles

Performance and Ease of Use
Ensure quick and easy access to data.

Reusability
Multiple applications can use the data.

Integrity
The data should have valid business meaning and value.

Security
Data should only be available to authorised users.

Maintainability
Ensure the cost of maintenance does not exceed its value to the organisation.

Page 74: CDMP preparation workshop EDW2016

Physical database design best-practice

Use normalised design for relational databases supporting OLTP apps.

Use views, functions and stored procedures to create non-normalised, application-specific, object-friendly, conceptual (virtual) views of data.

Use standard naming conventions.

Enforce data security and integrity at the database level, not in the application.

Keep database processing on the database server as much as possible.

Grant permissions on database objects only to application groups or roles, not to individuals.

Do not permit any direct, ad-hoc updating of the database.

Page 75: CDMP preparation workshop EDW2016

Transforming from a logical to physical data model

Denormalisation
Selectively and justifiably violating normalisation rules to reduce retrieval time, potentially at the expense of additional space, insert/update time and reduced data quality.

Surrogate keys
Substitute keys not visible to the business.

Indexing
Create additional index files to optimise specific types of queries.

Partitioning
Break a table or file horizontally or vertically.

Views
Virtual tables used to simplify queries, control data access and rename columns.

Dimensionality
Creation of fact tables with associated dimension tables, structured as star schemas and snowflake schemas for BI.
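A minimal sketch of two of these transformations using Python's built-in sqlite3 module (table and column names are illustrative, and SQLite is only a convenient stand-in for a production DBMS): a surrogate key generated by the database, and a view that renames columns and hides a sensitive one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Surrogate key: a system-generated key, not visible to the business,
# used instead of the natural key (the customer's name).
conn.execute("""
    CREATE TABLE customer (
        customer_sk   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_name TEXT NOT NULL,
        credit_limit  NUMERIC
    )
""")
conn.execute("INSERT INTO customer (customer_name, credit_limit) VALUES ('ACME Ltd', 5000)")

# View: a virtual table used to simplify queries, rename columns and
# restrict access to sensitive columns (credit_limit is hidden here).
conn.execute("""
    CREATE VIEW customer_public AS
    SELECT customer_sk AS id, customer_name AS name FROM customer
""")
print(conn.execute("SELECT * FROM customer_public").fetchall())  # [(1, 'ACME Ltd')]
```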

Page 76: CDMP preparation workshop EDW2016

Database index architecture

Non-clustered
The data is present in arbitrary order, but the logical ordering is specified by the index. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the record.

Clustered
Clustering alters the data block into a distinct order to match the index, resulting in the row data being stored in order. The primary feature of a clustered index is the ordering of the physical data rows in accordance with the index blocks that point to them.

Cluster
Used when multiple tables (possibly across databases) are joined. The records of the tables sharing the value of a cluster key are stored together in the same or nearby data blocks. This may improve joins of these tables on the cluster key, since the matching records are stored together and less I/O is required to locate them. A cluster can be keyed with a B-tree index or a hash table.

Page 77: CDMP preparation workshop EDW2016

Types of indexes

Bitmap index
A bitmap index is a special kind of index that stores the bulk of its data as bit arrays. It works well for data such as gender (a small number of distinct values but many occurrences of those values).

Dense index
A file with keys and pointers for every record in the data file. Every key in this file is associated with a particular pointer to a record in the sorted data file.

Sparse index
A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file.

Reverse index
A reverse key index reverses the key value before entering it in the index. E.g., the value 24538 becomes 83542 in the index. Reversing the key value is particularly useful for indexing data such as sequence numbers, where new key values monotonically increase.
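The dense/sparse distinction can be sketched in a few lines of Python (purely illustrative, with invented keys and block size): a dense index keeps a pointer for every record, while a sparse index keeps one entry per block of the sorted file and scans inside the block.

```python
from bisect import bisect_right

# A sorted data file, split into fixed-size blocks.
records = [(k, f"row {k}") for k in range(0, 100, 5)]   # keys 0, 5, 10, ...
BLOCK_SIZE = 4
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one (key -> pointer) entry per record.
dense_index = {key: (b, r) for b, block in enumerate(blocks)
               for r, (key, _) in enumerate(block)}

# Sparse index: one entry (the first key) per block.
sparse_keys = [block[0][0] for block in blocks]

def sparse_lookup(key):
    """Find the block that could contain `key`, then scan that block."""
    b = bisect_right(sparse_keys, key) - 1
    for k, row in blocks[b]:
        if k == key:
            return row
    return None

print(dense_index[25])       # direct pointer: (block, offset within block)
print(sparse_lookup(25))     # 'row 25', found via a short block scan
```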

Page 78: CDMP preparation workshop EDW2016

Partitioning

Horizontal partitioningHorizontal partitioning is the partitioning of a table into a number of smaller tables on the basis of rows. For example, in an employee table, employees with a salary of less than £25, 000 will be partitioned into a different table.

Vertical partitioningVertical partitioning is dividing the table based on the different columns. For example, in a customer table, retrieving only the name and contact number of customers into a different table.
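A hedged Python sketch (the employee records are invented) of the same two ideas applied to in-memory records: horizontal partitioning splits the rows by a predicate on their values, while vertical partitioning splits the columns and carries the key into each partition.

```python
employees = [
    {"emp_id": 1, "name": "Ann",  "salary": 21000, "phone": "0111"},
    {"emp_id": 2, "name": "Bob",  "salary": 30000, "phone": "0222"},
    {"emp_id": 3, "name": "Cara", "salary": 45000, "phone": "0333"},
]

# Horizontal partitioning: split the rows by a predicate on their values.
low_band  = [e for e in employees if e["salary"] < 25000]
high_band = [e for e in employees if e["salary"] >= 25000]

# Vertical partitioning: split the columns, repeating the key in each partition.
contact_details = [{"emp_id": e["emp_id"], "name": e["name"], "phone": e["phone"]}
                   for e in employees]
payroll_details = [{"emp_id": e["emp_id"], "salary": e["salary"]} for e in employees]
```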

Page 79: CDMP preparation workshop EDW2016

Hierarchical Data Models

A hierarchical database model is a data model in which the data is organized into a tree-like structure.

The structure allows representing information using parent/child relationships: each parent can have many children, but each child has only one parent.
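As a rough illustration (not a real hierarchical DBMS, just a Python tree with invented segment names), each node carries exactly one parent and any number of children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A segment in a hierarchical model: one parent, many children."""
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child


# Department -> Employee -> Skill: a classic one-parent tree.
dept = Node("Sales Department")
emp = dept.add_child("Employee: Bob")
emp.add_child("Skill: Negotiation")
print([c.name for c in dept.children])   # ['Employee: Bob']
```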

Page 80: CDMP preparation workshop EDW2016

Network Data Models

The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.

Page 81: CDMP preparation workshop EDW2016

Prime, Class, Modifier, Qualifier WordsThe following word classification types are used by various data modelling tools and are defined below with examples.

Defining Word Classification Types

Prime Word:
The prime word identifies the object or element being defined. Typically, these objects represent a person, place, thing, or event about which an organization wishes to maintain information. Prime words may act as primary search identifiers when querying a database system and provide a basic list of keywords for developing a general-to-specific classification scheme based on business usages. CUSTOMER in Customer Address is an example of a prime word.

Modifier:
A modifier gives additional information about the class word or prime word. Modifiers may be adjectives or nouns. DELIVERY in Customer Delivery Address is an example of a modifier. Other modifier examples: ANNUAL, QUARTERLY, MOST, and LEAST.

Class Word:
A class word is the most important noun in a data element name. Class words identify the use or purpose of a data element. Class words designate the type of information maintained about the object (prime word) of the data element name. ADDRESS in Customer Address is an example of a class word.

Qualifier:
A qualifier is a special kind of modifier that is used with a class word to further describe a characteristic of the class word within a domain of values, or to specify a type of information that can be attached to an object. Examples: FEET, METERS, SECONDS, and WEEKS.

Page 82: CDMP preparation workshop EDW2016

ACID Test For Transaction Processing

Atomicity (of Atomicity, Consistency, Isolation, Durability)

Atomicity requires that database modifications must follow an "all or nothing" rule. Each transaction is said to be atomic. If one part of the transaction fails, the entire transaction fails and the database state is left unchanged.

To be compliant with the 'A', a system must guarantee atomicity in each and every situation, including power failures, errors and crashes. This guarantees that an 'incomplete transaction' cannot exist.

Page 83: CDMP preparation workshop EDW2016

ACID Test For Transaction Processing

Consistency

The consistency property ensures that any transaction the database performs will take it from one consistent state to another. Consistency states that only consistent data (valid according to all the rules defined) will be written to the database.

Quite simply, whatever rows are affected by the transaction will remain consistent with each and every rule that is applied to them (including, but not only: constraints, cascades, triggers). While this is extremely simple and clear, it is worth noting that this consistency requirement applies to everything changed by the transaction, without any limit at all (including triggers firing other triggers launching cascades that eventually fire other triggers, etc.).

Page 84: CDMP preparation workshop EDW2016

ACID Test For Transaction Processing

Isolation

The requirement that no transaction should be able to interfere with another transaction at all. In other words, it should not be possible for two transactions affecting the same rows to run concurrently, as the outcome would be unpredictable and the system thus made unreliable.

This property of ACID is often relaxed (i.e. partly respected) because of the huge speed decrease this type of concurrency management implies. In effect, the only strict way to respect the isolation property is to use a serial model where no two transactions can occur on the same data at the same time and where the result is predictable (i.e. transaction B will happen after transaction A in every single possible case). In reality, many alternatives are used due to speed concerns, but none of them guarantee the same reliability.

Page 85: CDMP preparation workshop EDW2016

ACID Test For Transaction Processing

Durability

Durability means that once a transaction has been committed, it will remain so. In other words, every committed transaction is protected against power loss, crashes and errors, cannot be lost by the system, and can thus be guaranteed to be completed.

In a relational database, for instance, once a group of SQL statements executes, the results need to be stored permanently. If the database crashes right after a group of SQL statements executes, it should be possible to restore the database state to the point after the last transaction committed.
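A small, hedged demonstration of atomicity and consistency rules using Python's built-in sqlite3 module (the account names and the CHECK rule are illustrative): the failing transfer is rolled back as a whole, while the successful one is committed and so becomes durable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        name    TEXT PRIMARY KEY,
        balance NUMERIC NOT NULL CHECK (balance >= 0)   -- a consistency rule
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()

def transfer(amount):
    """Move money between the two accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back the whole transaction on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'alice'", (amount,))
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'bob'", (amount,))
    except sqlite3.IntegrityError:
        print("Transfer rejected; no partial update was applied")

transfer(500)   # violates the CHECK rule -> atomic rollback, state unchanged
transfer(50)    # succeeds and is committed
print(conn.execute("SELECT * FROM account ORDER BY name").fetchall())
# [('alice', 50), ('bob', 70)]
```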

Page 86: CDMP preparation workshop EDW2016

BASE

These ACID qualities seem indispensable, and yet they are incompatible with availability and performance in very large systems.

For example, suppose you run an online book store and you proudly display how many of each book you have in your inventory. Every time someone is in the process of buying a book, you lock part of the database until they finish, so that all visitors around the world will see accurate inventory numbers. That works well if you run The Shop Around the Corner, but not if you run Amazon.com.

Page 87: CDMP preparation workshop EDW2016

BASE

Amazon might instead use cached data. Users would see not the inventory count at this second, but what it was, say, an hour ago when the last snapshot was taken.

Also, Amazon might violate the "I" in ACID by tolerating a small probability that simultaneous transactions could interfere with each other. For example, two customers might both believe that they just purchased the last copy of a certain book. The company might risk having to apologize to one of the two customers (and maybe compensate them with a gift card) rather than slowing down their site and irritating lots of other customers.

Page 88: CDMP preparation workshop EDW2016

BASE

The CAP theorem (from computer science) quantifies the inevitable trade-offs.

Eric Brewer's CAP theorem: if you want consistency, availability, and partition tolerance, you have to settle for two out of three. (For a distributed system, partition tolerance means the system will continue to work unless there is a total network failure. A few nodes can fail and the system keeps going.)

An alternative to ACID is BASE:
• BAsic Availability
• Soft-state
• Eventual consistency

Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state. (Accounting systems do this all the time. It's called "closing out the books.") It's OK to use stale data, and it's OK to give approximate answers.

ACID and BASE sit at opposite ends of this trade-off.
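A toy Python sketch of the book-store example above (the names and the one-hour TTL are invented, and a single process stands in for what would really be a distributed system): writes go to the authoritative store immediately, while reads are served from a snapshot that only eventually catches up.

```python
import time

# Authoritative store (strongly consistent) plus a periodically refreshed cache.
inventory = {"book-123": 1}                        # the real stock level
cache = {"snapshot": dict(inventory), "taken_at": time.time()}
CACHE_TTL_SECONDS = 3600

def displayed_stock(sku):
    """BASE-style read: serve possibly stale data rather than lock the store."""
    if time.time() - cache["taken_at"] > CACHE_TTL_SECONDS:
        cache["snapshot"] = dict(inventory)        # eventual consistency: refresh later
        cache["taken_at"] = time.time()
    return cache["snapshot"].get(sku, 0)

def purchase(sku):
    """Write goes straight to the authoritative store."""
    if inventory.get(sku, 0) > 0:
        inventory[sku] -= 1
        return True
    return False

purchase("book-123")
print(inventory["book-123"])         # 0: the store is up to date
print(displayed_stock("book-123"))   # 1: visitors may still see the stale count
```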

Page 89: CDMP preparation workshop EDW2016

Data Operations Management

Page 90: CDMP preparation workshop EDW2016

DBA Responsibilities

Ensuring the performance and reliability of the database, including performance tuning, monitoring and error reporting.

Implementing appropriate backup and recovery mechanisms to guarantee the recoverability of the data in any circumstance.

Implementing mechanisms for clustering and failover of the database, if continual data availability is a requirement.

Implementing mechanisms for archiving data.

Page 91: CDMP preparation workshop EDW2016

Factors affecting availability

Manageability
The ability to create and maintain an effective environment.

Recoverability
The ability to re-establish service after interruption, and correct errors caused by unforeseen events or component failures.

Reliability
The ability to deliver service at specified levels for a stated period.

Serviceability
The ability to determine the existence of problems, diagnose their cause and repair/solve the problems.

Page 92: CDMP preparation workshop EDW2016

Causes of poor database performance

Memory allocation (buffer/cache for data)

Locking and blocking

Failure to update database statistics

Poor SQL coding

Insufficient indexing

Application activity

Increase in the number, size or use of databases

Database volatility

Page 93: CDMP preparation workshop EDW2016

Data Technology Architecture

Data technologies to be included in the technology architecture include:

DBMS software

Relational database management utilities

Data modelling and management software

Business intelligence software for reporting and analysis

Extract-Transform-Load (ETL) and other data integration tools

Data quality analysis and data cleansing tools

Meta-data management software, including meta-data repositories

Page 94: CDMP preparation workshop EDW2016

Technology Architecture Components ("Bricks")

Current: Products currently supported and used.
Deployment Period: Products to be deployed for use in the next 1-2 years.
Strategic Period: Products expected to be available for use in the next 2+ years.
Retirement: Products the organisation has retired or intends to retire this year.
Preferred: Products preferred for use by most applications.
Containment: Products limited to use by certain applications.
Emerging: Products being researched and piloted for possible future deployment.

Page 95: CDMP preparation workshop EDW2016

Data Security Management

Page 96: CDMP preparation workshop EDW2016

Data Security Guiding Principles

Be a responsible trustee of data about all parties

Understand and comply with all pertinent regulations and guidelines

Use CRUD matrices to help map data access needs

Ensure Data Security Policy is reviewed and approved by the governance council

Identify detailed application security requirements on projects

Classify all enterprise data and information products for confidentiality

Set passwords following a set of password complexity guidelines

Create role groups

Formally request, track and approve all user and group authorisations

Centrally manage user identity data and group membership data

Use views to restrict access to sensitive columns or specific rows

Strictly limit and consider every use of shared or service user accounts

Monitor data access activity to understand trends

Page 97: CDMP preparation workshop EDW2016

Sources of Data Security Requirements

STAKEHOLDER CONCERNS
• Privacy and confidentiality of clients' information
• Trade secrets
• Business partner activity
• Mergers & acquisitions

GOVERNMENT REGULATIONS
• Regulations may restrict access to information
• Acts to ensure openness and accountability
• Provision of subject access rights
• And more …

LEGITIMATE BUSINESS CONCERNS
• Trade secrets
• Research & other IP
• Knowledge of customer needs
• Business partner relationships and impending deals

NECESSARY BUSINESS ACCESS NEEDS
• Data security must be appropriate
• Data security must not be so onerous that it prevents users from doing their jobs
• Goldilocks principle

Source: DMBoK

Page 98: CDMP preparation workshop EDW2016

A4

AUTHENTICATION Validate users are who they say they are

AUTHORISATION Identify the right individuals and grant them the right privileges to specific, appropriate views of data

ACCESS Enable individuals and their privileges in a timely manner

AUDIT Review security actions and user activity (to ensure compliance with regulations and conformance with policy and standards)

Page 99: CDMP preparation workshop EDW2016

A4

Page 100: CDMP preparation workshop EDW2016

CIA

CONFIDENTIALITY
Preventing the disclosure of information to unauthorised individuals or systems.

INTEGRITY
Preventing the undetectable modification of information.

AVAILABILITY
Ensuring that information is available where and when it is needed.

Page 101: CDMP preparation workshop EDW2016

4 issue types:

THREAT: An aspect (which might be environmental or man-made) that has the potential to compromise the confidentiality, integrity or availability of an information asset.

VULNERABILITY: A weakness that could be exploited to compromise the confidentiality, integrity or availability of an information asset.

RISK: The likelihood that a threat will exploit a vulnerability to compromise the confidentiality, integrity or availability of an information asset.

IMPACT: A loss of confidentiality, integrity or availability which may result in more significant losses to competitive advantage, revenue, life, property or reputation.

Source: DMBoK

Page 102: CDMP preparation workshop EDW2016

ExerciseA4 = ?

CIA = ?

4 Issue Types = ?

Page 103: CDMP preparation workshop EDW2016

Network Security

Network Security Threats:

Viruses, worms, and Trojan horses

Spyware and adware

Zero-day attacks, also called zero-hour attacks

Hacker attacks

Denial of service attacks

Data interception and theft

Identity theft

Network Security Components:

Anti-virus and anti-spyware

Firewall, to block unauthorized access to your network

Intrusion prevention systems (IPS), to identify fast-spreading threats, such as zero-day or zero-hour attacks

Virtual Private Networks (VPNs), to provide secure remote access

Page 104: CDMP preparation workshop EDW2016

Securing IT Infrastructure

Encryption
The process of transforming information using an algorithm (called a cipher) to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key.

Network Encryption
A network security process that applies crypto services at the network transfer layer, above the data link level but below the application level.

Email Encryption
The application of encryption to email messages so that their content can be read only by the intended recipients.

S/MIME: a form of encryption that is included in several email clients by default (such as Outlook Express and Mozilla Thunderbird) and relies on the use of a Certificate Authority to issue a secure email certificate.

PGP (the commercial version; OpenPGP is a free, open source equivalent) takes a decentralised approach to email encryption. It does not rely on trusting a Certificate Authority; rather, the users create encryption keys themselves.
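For illustration only, a few lines of Python using the third-party `cryptography` package's Fernet recipe (just one of many possible ciphers, not something prescribed by the DMBoK) to show the key-based transform described above.

```python
# pip install cryptography  (third-party package, not part of the standard library)
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # the "special knowledge": keep this secret
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"Quarterly results: confidential")
print(ciphertext)                    # unreadable to anyone without the key
print(cipher.decrypt(ciphertext))    # b'Quarterly results: confidential'
```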

Page 105: CDMP preparation workshop EDW2016

IT Security Threats

Privilege Escalation
Software programs often have bugs that can be exploited. These bugs can be used to gain access to certain resources with higher privileges, bypassing security controls.

Virus
A virus is a computer program that, like a medical virus, has the ability to replicate and infect other computers.

Trojan
Trojans masquerade as normal, safe applications, but their mission is to allow a hacker remote access to your computer. In turn, the infected computer can be used as part of a denial of service attack, and data theft can occur (e.g. via a keystroke logger).

Worm
A worm is a specific type of virus. Unlike a typical virus, its goal isn't to alter system files, but to replicate so many times that it consumes hard disk space or memory.

Spyware
Like Trojans, spyware can pilfer sensitive information, but it is often used as an advertising tool as well. The intent is to gather a user's information by monitoring Internet activity and transmitting that to an attacker.

Spam
Spam is unsolicited junk mail. It comes in the form of an advertisement and, in addition to being a time waster, has the ability to consume precious network bandwidth.

Page 106: CDMP preparation workshop EDW2016

IT Security Threats

Botnets
Botnets are created with a Trojan and reside on IRC networks. The bot can launch an IRC client and join a chat room in order to spam and launch denial of service attacks.

Logic bomb
Logic bombs are bits of code added to software that will set off a specific function. They are similar to viruses in that they can perform malicious actions like deleting files and corrupting data.

Adware
Similar to spyware, adware observes a user's Internet browsing habits, but the purpose is to be able to better target the display of web advertisements.

Rootkits
Rootkits are some of the most difficult threats to detect. They are activated when your system boots up, before anti-virus software is started. Rootkits allow the installation of files and accounts for the purposes of intercepting sensitive information.

Page 107: CDMP preparation workshop EDW2016

Reference & Master Data Management

Page 108: CDMP preparation workshop EDW2016

Reference and Master Data

Reference Data – Used to classify or categorise other data, for example the country codes in the table below.

Master Data – The authoritative, most accurate data available about key business entities, used to establish the context for transactional data. Master data values are considered ‘golden’.

Code Value    Description
US            United States of America
GB            United Kingdom

Page 109: CDMP preparation workshop EDW2016

What is Event / Transaction Data?

“Bob bought a Twix bar from Morrison's on Monday 3rd Jan at 4pm and paid using cash.”

Event data example:

WHO: Bob Smith | WHAT: Twix bar | WHERE: Morrison's, Bath | WHEN: 16:00 Monday 3rd January 2011 | HOW: Cash | QUANTITY: 1 | AMOUNT: £0.60

The same event as stored, using codes:

CUSTOMER CODE: BS005 | PRODUCT CODE: CONF101 | VENDOR CODE: WMBATH | DATE: 2011-01-03 16:00 | PAYMENT METHOD: CASH | QUANTITY: 1 | AMOUNT: £0.60

Terminology – FIELD (or attribute): a column in a database table. RECORD: a row in a database table.
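
A minimal Python sketch of the example above (the dictionaries and code values are illustrative): the transaction record carries only identifying codes and measurements, while master and reference data supply the surrounding context.

# Master data: describes the nouns (hypothetical records for illustration)
customers = {"BS005": {"first_name": "Bob", "last_name": "Smith"}}
products = {"CONF101": {"description": "Twix bar"}}
vendors = {"WMBATH": {"name": "Morrison's, Bath"}}

# Reference data: classifies or categorises other data
payment_methods = {"CASH": "Cash", "CARD": "Payment card"}

# Event / transaction data: identifies the nouns plus measurements of the action
transaction = {"customer": "BS005", "product": "CONF101", "vendor": "WMBATH",
               "when": "2011-01-03 16:00", "payment": "CASH",
               "quantity": 1, "amount_gbp": 0.60}

# Context comes from joining the event back to master and reference data
print(customers[transaction["customer"]]["first_name"],
      "bought a", products[transaction["product"]]["description"],
      "from", vendors[transaction["vendor"]]["name"],
      "paying by", payment_methods[transaction["payment"]])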

Page 110: CDMP preparation workshop EDW2016

About Event Data

AKA Transaction data

Describes an action (a verb), e.g. “buy”

May include measurements about the action:
› Quantity bought
› Amount paid

Includes information identifying the nouns that were involved in the event (the Who / What / Where / When / How and maybe even the Why):
› Bob Smith
› Twix bar
› Morrisons, Bath
› 16:00 Monday 3rd Jan 2011
› Cash

Does not include information describing the nouns:
› Bob is female, aged 25 and works for British Airways
› Monday 3rd Jan 2011 is a bank holiday
› The address of Morrisons Bath is: York Place, London Road, Bath, BA1 6AE
› That Twix is a special offer 200g jumbo bar

Page 111: CDMP preparation workshop EDW2016

What is…MASTER DATA?

› Defines and describes the nouns (things) of the business, e.g. Field, Well, Rig, Product, Store, Therapeutic Area, Adverse Event, etc.

› Data about the “things” that will participate in events.

› Provides contextual information about events / transactions.

› Stored in many systems

› Packaged Systems

› Line of Business Systems

› Spreadsheets

› SharePoint Lists

MASTER DATA MANAGEMENT (MDM)?

› The ongoing reconciliation and maintenance of master data.

› Control over master data values to enable consistent, shared, contextual use across systems, of the most accurate, timely, and relevant version of truth about essential business entities.

[DAMA, the Data Management Association]

MASTER DATA MANAGEMENT (MDM)?

› Comprises a set of processes and tools that consistently defines and manages the non-transactional data entities of an organisation.

[Wikipedia]

Page 112: CDMP preparation workshop EDW2016

Master Data – What’s the problem?

No organisation has just one system (unless they are tiny)

Details about the same noun are found in multiple systems, e.g. Customer, Product

Problems

Data may need to be rekeyed in each system

Systems may not be in synch (new records, updated records)

Duplicate data: are “ABC Ltd” and “ABC Limited” the same thing?

No single version of the truth

Reporting / Analysis: difficult to combine data from multiple systems

The same customers may be defined in:
• Finance systems
• Marketing systems
• Line of business systems

S O L U T I O N :

Master Data Management!

Page 113: CDMP preparation workshop EDW2016

Standard “Hub” architectures

1. REPOSITORY

2. REGISTRY

3. HYBRID

4. VIRTUALISED

*A key difference is the number of fields that are stored centrally

Page 114: CDMP preparation workshop EDW2016

Example: PERSON

Customer code: BS005
First name: Bob
Last name: Smith
Date of birth: 1985-12-25
Preferred delivery address line 1: Royal Crescent
Preferred delivery address post code: BA1 7LA
Credit rating: A
Occupation: Information Architect
Car: Audi R8

IDENTIFIERS
CORE FIELDS
ALL FIELDS

Page 115: CDMP preparation workshop EDW2016

Example: PERSON

Customer code: BS005
First name: Bob
Last name: Smith
Date of birth: 1985-12-25
Preferred delivery address line 1: Royal Crescent
Preferred delivery address post code: BA1 7LA
Credit rating: A
Occupation: Information Architect
Car: Audi R8

Fields stored centrally, by hub architecture (a minimal sketch follows):

ALL FIELDS – Repository
CORE FIELDS – Hybrid
IDENTIFIERS – Registry
NONE – Virtualised
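
As a minimal sketch of that difference (the assignment of PERSON fields to IDENTIFIERS and CORE FIELDS below is an assumption for illustration), the four hub styles can be compared by the set of fields each persists centrally:

# Hypothetical field groupings for the PERSON example
identifiers = ["customer_code"]
core_fields = identifiers + ["first_name", "last_name", "date_of_birth"]
all_fields = core_fields + ["delivery_address_line_1", "delivery_post_code",
                            "credit_rating", "occupation", "car"]

fields_stored_centrally = {
    "repository": all_fields,   # full golden record held in the hub
    "hybrid": core_fields,      # identifiers plus the most widely shared attributes
    "registry": identifiers,    # only keys / cross-references to source systems
    "virtualised": [],          # nothing persisted; assembled on demand
}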

Page 116: CDMP preparation workshop EDW2016

Master Data Examples

Party Master Data – Includes data about individuals, organizations and the roles they play in business relationships (e.g. customers, citizens, patients, vendors, suppliers, business partners, competitors, employees, students etc.).

Financial Master Data – Includes data about business units, cost centers, profit centers, general ledger accounts, budgets, projections and projects.

Product Master Data – Focuses on an organization's internal products or services. May include bills of materials, manuals, design documents, SOPs etc. (can be unstructured data).

Location Master Data – Includes data about business party addresses and geographic positioning coordinates, such as latitude, longitude and altitude.

Page 117: CDMP preparation workshop EDW2016

Master Data Match Rules

Rules around the matching, merging and linking of data from multiple systems about the same person, group, place or thing.

Three primary scenarios:

1. Duplicate identification match rules – Focus on a specific set of fields that uniquely identify an entity and identify merge opportunities without taking automatic action. Business data stewards can review these occurrences and decide to take action on a case-by-case basis (see the sketch after this list).

2. Match-merge rules – Match records and merge the data from these records into a single, unified, reconciled and comprehensive record. If the rules apply across data sources, create a single, unique and comprehensive record in each database.

3. Match-link rules – Identify and cross-reference records that appear to relate to a master record without updating the content of the cross-referenced record. Match-link rules are easier to implement and much easier to reverse.
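
A minimal Python sketch of a duplicate identification match rule (the normalisation rules and field names are illustrative assumptions, not a prescribed standard): candidate pairs are flagged for steward review rather than merged automatically.

import re

# Illustrative normalisation: lower-case, strip punctuation, expand common suffixes
SUFFIXES = {"ltd": "limited", "co": "company"}

def normalise_name(name: str) -> str:
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(SUFFIXES.get(t, t) for t in tokens)

def is_candidate_duplicate(rec_a: dict, rec_b: dict) -> bool:
    # Match on normalised organisation name and postcode; take no automatic action
    return (normalise_name(rec_a["name"]) == normalise_name(rec_b["name"])
            and rec_a["postcode"].replace(" ", "") == rec_b["postcode"].replace(" ", ""))

a = {"name": "ABC Ltd", "postcode": "BA1 7LA"}
b = {"name": "ABC Limited", "postcode": "BA17LA"}
print(is_candidate_duplicate(a, b))  # True -> queue the pair for data steward review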

Page 118: CDMP preparation workshop EDW2016

Guiding Principles

Shared reference and master data belong to the organisation, not to a particular application or department.

Reference and master data management is an on-going data quality improvement program; its goals cannot be achieved by one project alone.

Business data stewards are the authorities accountable for controlling reference data values. Business data stewards work with data professionals to improve the quality of reference and master data.

Golden data values represent the organisation’s best efforts at determining the most accurate, current and relevant data values for contextual use. New data may prove earlier assumptions to be false. Therefore apply matching rules with caution and ensure that any changes that are made are reversible.

Replicate master data values only from the database of record.

Request, communicate, and, in some cases, approve changes to reference data values before implementation.

Page 119: CDMP preparation workshop EDW2016

DW & BI Management

Page 120: CDMP preparation workshop EDW2016

Why Use A Data Warehouse?

Legacy Applications + Databases = Chaos

Production Control

MRP

Inventory Control

Parts Management

Logistics

Shipping

Raw Goods

Order Control

Purchasing

Marketing

Finance

Sales

Accounting

Management Reporting

Engineering

Actuarial

Human Resources

Continuity, Consolidation, Control, Compliance, Collaboration

Enterprise Data Warehouse = Order

Single version of the truth

Enterprise Data Warehouse

Every question = decision

Two purposes of a data warehouse: 1) save time building reports; 2) report and analyze in ways you could not do before

Page 121: CDMP preparation workshop EDW2016

Simplified Business Intelligence Stack

REPORTING & ANALYSIS TOOLS

DATA WAREHOUSE

DATA INTEGRATION LAYER

DATA SOURCE

DATA SOURCE

DATA SOURCE

DATA SOURCE

Operational systems, legacy databases, ERP/CRM, text files, spreadsheets…

E.g. Extract, Transform & Load (ETL) or Enterprise Information Integration (EII)

Dimensional data model (star schema) or Virtual Data Warehouse

Standard/ad-hoc reports, analytics, data mining, dashboards, scorecards…

Page 122: CDMP preparation workshop EDW2016

What is Data Warehousing? (DMBoK)

Data Warehousing is the term used to describe the processes that maintain the data contained within a data warehouse, namely:

Extract processes

Cleansing processes

Transformation processes

Load processes

Associated Control processes

The use of Meta-data
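
A minimal Python sketch of these steps over a hypothetical list of source rows (real warehouses would use dedicated ETL tooling and capture meta-data about each run):

# Hypothetical source extract: raw sales rows from an operational system
source_rows = [
    {"cust": " bs005 ", "amount": "0.60", "date": "2011-01-03"},
    {"cust": "BS005", "amount": "bad", "date": "2011-01-04"},
]

def cleanse(row):
    # Cleansing: trim and standardise codes, reject rows that fail basic checks
    try:
        return {"cust": row["cust"].strip().upper(),
                "amount": float(row["amount"]),
                "date": row["date"]}
    except ValueError:
        return None  # in practice this would be logged for data quality follow-up

def transform(row):
    # Transformation: derive warehouse-friendly fields
    row["year"] = int(row["date"][:4])
    return row

warehouse_table = []  # stand-in for the target data warehouse table

for raw in source_rows:                           # extract
    clean = cleanse(raw)                          # cleanse
    if clean is not None:
        warehouse_table.append(transform(clean))  # transform and load

print(warehouse_table)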

Page 123: CDMP preparation workshop EDW2016

What is a Data Warehouse? (2)

REPORTING & ANALYSIS TOOLS

DATA WAREHOUSE

DATA INTEGRATION LAYER

DATA SOURCE

DATA SOURCE

DATA SOURCE

DATA SOURCE

Integrated Decision Support Database, and…

…Related Software Programs:
• CDC – Change Data Capture
• ETL – Extract, Transform & Load
• DQ – Data Quality
• DV – Data Virtualisation

DAMA Definition

Page 124: CDMP preparation workshop EDW2016

What is Business Intelligence? (DMBoK)

Business Intelligence (BI) is a set of business capabilities.

BI can mean any of the following:

Query, analysis and reporting by knowledge workers

Query, analysis and reporting processes and procedures

A synonym for the business intelligence environment

The market segment for business intelligence tools

Strategic and operational analytics and reporting on corporate operational data to support business decisions, risk management and compliance

A synonym for Decision Support Systems (DSS)

Page 125: CDMP preparation workshop EDW2016

What is Business Intelligence (BI)?

REPORTING & ANALYSIS TOOLS

DATA WAREHOUSE

DATA INTEGRATION LAYER

DATA SOURCE

DATA SOURCE

DATA SOURCE

DATA SOURCE

BROAD DEFINITION:

› “Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making.” [Forrester Research]

NARROWER DEFINITION:

› Analysis, Query and Reporting

Page 126: CDMP preparation workshop EDW2016

What is Data Warehousing and Business Intelligence Management (DW-BIM)? (DMBoK)

Data Warehousing and Business Intelligence Management (DW-BIM) is the collection, integration and presentation of data to knowledge workers for the purpose of business analysis and decision-making.

DW-BIM is composed of activities supporting all phases of the decision support life-cycle that provides context, moves and transforms data from sources to a common target data store, and then provides knowledge workers various means of access, manipulation and reporting of the integrated target data.

Page 127: CDMP preparation workshop EDW2016

Objectives of DW-BIM include…

Integrated data from disparate sources

Historical and current

Ensuring credible, accurate, timely data is used in reports and BI applications

Ensuring high-performance data access for reports and BI applications

Making best use of the outputs of the Reference and Master Data Management, Data Governance, Data Quality and Meta-data disciplines

Page 128: CDMP preparation workshop EDW2016

A Dimensional Model

Dimension tables

Examples: Location, Product, Time, Promotion, Organisation etc.

Records in the dimension tables correspond to nouns.

The data in the dimension tables changes slowly – the number of new records created each day is typically low.

Fact tables

Contains measures (e.g. Sales Value GBP) and dimension columns

Records in the fact tables correspond to events, transactions, or measurements.

The number of new records created each day is typically high.

Page 129: CDMP preparation workshop EDW2016

Dimension tables

A dimension table is one of a set of companion tables to a fact table, forming a vertex of the “star”

Each dimension table represents a particular business entity – records represent nouns within the business

Products, Customers, Times, Locations etc.

Each dimension table contains a single field that serves as its primary key

Each dimension table also contains a number of fields providing details of the entity – each of these fields is known as an attribute (or dimension)

Page 130: CDMP preparation workshop EDW2016

Dimension tables and Hierarchies

Hierarchies for the dimensions are stored in the dimensional table itself.

E.g. Product dimension has the hierarchies from Manufacturer, Brand and Product Type to Product.

There is no need for the individual hierarchical lookup tables like Manufacturer lookup, Brand lookup, Product Type lookup to be shown in the model.

Page 131: CDMP preparation workshop EDW2016

Dimension tables (summary)

1. Records in dimension tables correspond to nouns – tables are “short” (10s to 1,000s of records)

2. Data changes slowly

3. Rich set of attributes – tables are “wide” (many columns)

4. Denormalised – no need to join to further lookup tables; lots of redundancy

Page 132: CDMP preparation workshop EDW2016

Fact tables

Facts are used to store numerical measurements captured in a ‘measurement event’ caused by a business process

A fact table is the primary table in each dimensional model, forming the centre of the “star”

Each fact table represents a many-to-many relationship

Each fact table contains two or more foreign keys to dimension tables

Each fact table has a compound primary key consisting of two or more foreign keys

A fact table may additionally contain fields that are used to record the value of a business measure, e.g. Sales Value in GBP – each of these fields is known as a measure (or fact)

The most useful measures are numeric and additive

‘Additive’ means that it is meaningful to sum the values over multiple records.

Cost and Revenue are examples of additive facts.

Page 133: CDMP preparation workshop EDW2016

Fact tables

Records in fact tables correspond to events, transactions, or measurements.

Data is added regularly
› Tables are “long” – often millions of records

Limited set of attributes
› Tables are “narrow” – minimal number of columns

Low redundancy
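
A minimal Python sketch of a star-schema query over toy dimension and fact tables (all table names, keys and values are hypothetical): each fact row is joined to its dimension row and an additive measure is summed.

# Dimension table: short and wide, describes nouns, keyed by a surrogate key
dim_product = {
    1: {"product_name": "Twix bar", "product_type": "Confectionery"},
    2: {"product_name": "Cola can", "product_type": "Soft drinks"},
}

# Fact table: long and narrow, one row per sales event, keys plus additive measures
fact_sales = [
    {"product_key": 1, "date_key": 20110103, "sales_value_gbp": 0.60, "quantity": 1},
    {"product_key": 2, "date_key": 20110103, "sales_value_gbp": 1.20, "quantity": 2},
    {"product_key": 1, "date_key": 20110104, "sales_value_gbp": 1.80, "quantity": 3},
]

# "Sales value by product type": join facts to the product dimension and sum
totals = {}
for row in fact_sales:
    product_type = dim_product[row["product_key"]]["product_type"]
    totals[product_type] = totals.get(product_type, 0) + row["sales_value_gbp"]

print(totals)  # sales value summed by product type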

Page 134: CDMP preparation workshop EDW2016

What are slowly changing dimensions?

Dimensions whose values change infrequently as a result of UPDATE operations in the source system

For example

› A product may be renamed

› A product may be reclassified (i.e. the “product type” may change)

› A supplier may change address

› A person might change their name

› Etc., etc.

In fact most dimensions will change slowly over time!

Page 135: CDMP preparation workshop EDW2016

Why do slowly changing dimensions present problems?

The Data Warehouse will need to be updated to reflect the changes made in the source system.

…so there’s some ETL work to be done.

If we just overwrite the details with the new details, we’ll effectively change the history stored in the Data Warehouse.

…when we re-run reports against historical data, they’ll no longer return the same results as before.

Page 136: CDMP preparation workshop EDW2016

How can we handle slowly changing dimensions?

There are standard techniques for handling slowly changing dimensions.

1. Type 1 (overwrite)

2. Type 2 (add new row)

3. Type 3 (add new attribute)

4. Type 4 (add history table)

5. Type 6 (hybrid)

6. Others – see the internet!

We may need to employ different techniques for different fields.

Page 137: CDMP preparation workshop EDW2016

Type 1 - Overwrite

Overwrite the dimension record with the new values, thereby losing history.

Used when correcting an error, for instance.

Page 138: CDMP preparation workshop EDW2016

Type 2 – Create new record

Create a new additional dimension record using a new value of the surrogate key (NOTE: a surrogate key is required!)

Used when a true change has occurred and it is appropriate to partition history.

Historic FACT records can continue to point to the “old” dimension record, while new FACT records will point to the “new” dimension record.

Page 139: CDMP preparation workshop EDW2016

Type 3 – Use an “old” field

Create an “old” field in the dimension record to store the immediate previous value of the attribute.

Used when the change is “soft” or tentative, or when we wish to track history based on the old value as well as the new (e.g. change of sales boundaries).

Supports analysis by either of the two versions.

Works best when there is only one soft change at a time.

Page 140: CDMP preparation workshop EDW2016

Slowly Changing Dimensions Summary

Three most common techniques:

1. Type 1 – Overwrite

2. Type 2 – Keep all old versions in separate records

3. Type 3 – Keep the latest old version in an “old” field

Different techniques may be used for different fields – see the sketch below.
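
A minimal Python sketch (hypothetical table and column names) contrasting Type 1 and Type 2 handling of a product reclassification in a dimension table:

from copy import deepcopy

# A product dimension keyed by a surrogate key; the natural key is product_code
dim_product = [
    {"surrogate_key": 1, "product_code": "CONF101", "product_type": "Biscuits",
     "current_flag": True},
]

def scd_type1_overwrite(dim, product_code, new_type):
    # Type 1: overwrite in place - history is lost (used e.g. to correct errors)
    for row in dim:
        if row["product_code"] == product_code:
            row["product_type"] = new_type

def scd_type2_new_row(dim, product_code, new_type):
    # Type 2: close off the current row and add a new row with a new surrogate key
    for row in dim:
        if row["product_code"] == product_code and row["current_flag"]:
            row["current_flag"] = False
            new_row = deepcopy(row)
            new_row["surrogate_key"] = max(r["surrogate_key"] for r in dim) + 1
            new_row["product_type"] = new_type
            new_row["current_flag"] = True
            dim.append(new_row)
            break

scd_type2_new_row(dim_product, "CONF101", "Confectionery")
print(dim_product)  # old row retained for historic facts, new row used by new facts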

Page 141: CDMP preparation workshop EDW2016

8. Document & Content Management

Definition: Planning, implementation, and control activities to store, protect, and access data found within electronic files and physical records (including text, graphics, images, audio, and video).

Goals:

1. To safeguard and ensure the availability of data assets stored in less structured formats.

2. To enable effective and efficient retrieval and use of data and information in unstructured formats.

3. To comply with legal obligations and customer expectations.

4. To ensure business continuity through retention, recovery, and conversion.

5. To control document storage operating costs.

Inputs:

• Text Documents

• Reports

• Spreadsheets

• Email

• Instant Messages

• Faxes

• Voicemail

• Images

• Video recordings

• Audio recordings

• Printed paper files

• Microfiche

• Graphics

Suppliers:

• Employees

• External parties

Participants:

• All Employees

• Data Stewards

• DM Professionals

• Records Management Staff

• Other IT Professionals

• Data Management Executive

• Other IT Managers

• Chief Information Officer

• Chief Knowledge Officer

Tools:

• Stored Documents

• Office Productivity Tools

• Image and Workflow Management Tools

• Records Management Tools

• XML Development Tools

• Collaboration Tools

• Internet

• Email Systems

Activities:

1. Document / Records Management

1. Plan for Managing Documents / Records (P)
2. Implement Document / Records Management Systems for Acquisition, Storage, Access, and Security Controls (O, C)
3. Backup and Recover Documents / Records (O)
4. Retain and Dispose of Documents / Records (O)
5. Audit Document / Records Management (C)

2. Content Management

1. Define and Maintain Enterprise Taxonomies (P)
2. Document / Index Information Content Meta-data (O)
3. Provide Content Access and Retrieval (O)
4. Govern for Quality Content (C)

Primary Deliverables:

• Managed records in many media formats

• E-discovery records

• Outgoing letters and emails

• Contracts and financial documents

• Policies and procedures

• Audit trails and logs

• Meeting minutes

• Formal reports

• Significant memoranda

Consumers:

• Business and IT users

• Government regulatory agencies

• Senior management

• External customers

Metrics:

• Return on investment

• Key Performance Indicators

• Balanced Scorecards

Activities: (P) – Planning (C) – Control (D) – Development (O) - Operational

Page 142: CDMP preparation workshop EDW2016

Terms

Document ManagementThe storage, inventory and control of electronic and paper documents.

Content ManagementThe organisation, categorisation, and structure of data / resources so that they can be stored, published and reused in multiple ways.

TaxonomyThe science or technique of classification.

OntologyA type of model that represents a set of concepts and their relationships within a domain.

Page 143: CDMP preparation workshop EDW2016

Main Activities

• Document / Record Management is the lifecycle management of the designated significant documents of the organization.

• Not all documents are significant as evidence of the organization’s business activities and regulatory compliance.

• Records management manages paper and microfiche / film records from their creation or receipt through processing, distribution, organization, and retrieval, to their ultimate disposition.

Document & Records

Management

• Content management is the organization, categorization, and structure of data / resources to be stored, published, and reused in multiple ways.

• Content includes data / information that exists in many forms and in multiple stages of completion within its lifecycle. Content may be found on electronic, paper or other media.

• The lifecycle of content can be active, with daily changes through controlled processes for creation, modification, and collaboration of content before dissemination.

Content Management

Page 144: CDMP preparation workshop EDW2016

Document/Record Management Lifecycle

Identification

Creation, approval and enforcement of policies

Classification of documents / records

Storage, retrieval and circulation

Preservation and disposal

Page 145: CDMP preparation workshop EDW2016

Taxonomies

Grouped into four types:

1. Flat Taxonomy – no relationship among the controlled set of categories (example: a list of countries).

2. Facet Taxonomy – for example meta-data, where each attribute (creator, title, keywords etc.) is a facet of a content object.

3. Hierarchical Taxonomy – for example geography, from continent down to address.

4. Network Taxonomy – for example a recommender engine (if you liked that, you may also like this…).
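
A minimal Python sketch contrasting three of these structures (the category values are illustrative only):

# Flat taxonomy: a controlled list with no relationships between categories
countries = ["France", "Germany", "Ireland", "United Kingdom"]

# Hierarchical taxonomy: each narrower term sits under exactly one broader term
geography = {
    "Europe": {
        "United Kingdom": {"England": ["Bath", "London"]},
        "Ireland": {"Leinster": ["Dublin"]},
    }
}

# Facet taxonomy: independent attributes (facets) describing one content object
document_facets = {"creator": "C. Bradley", "title": "DQ policy", "keywords": ["quality"]}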

Page 146: CDMP preparation workshop EDW2016

9. Meta-data Management

Definition: Planning, implementation, and control activities to enable easy access to high quality, integrated meta-data.

Goals:

1. Provide organizational understanding of terms, and usage

2. Integrate meta-data from diverse sources

3. Provide easy, integrated access to meta-data

4. Ensure meta-data quality and security

Inputs:

• Meta-data Requirements

• Meta-data Issues

• Data Architecture

• Business Meta-data

• Technical Meta-data

• Process Meta-data

• Operational Meta-data

• Data Stewardship Meta-data

Primary Deliverables:

• Meta-data Repositories

• Quality Meta-data

• Meta-data Models and Architecture

• Meta-data Management Operational Analysis

• Meta-data Analysis

• Data Lineage

• Change Impact Analysis

• Meta-data Control Procedures

Suppliers:

• Data Stewards

• Data Architects

• Data Modelers

• Database Administrators

• Other Data Professionals

• Data Brokers

• Government and Industry Regulators

Consumers:

• Data Stewards

• Data Professionals

• Other IT Professionals

• Knowledge Workers

• Managers and Executives

• Customers and Collaborators

• Business Users

Participants:

• Meta-data Specialist

• Data Integration Architects

• Data Stewards

• Data Architects and Modelers

• Database Administrators

• Other DM Professionals

• Other IT Professionals

• DM Executive

• Business Users

Tools:

• Meta-data Repositories

• Data Modeling Tools

• Database Management Systems

• Data Integration Tools

• Business Intelligence Tools

• System Management Tools

• Object Modeling Tools

• Process Modeling Tools

• Report Generating Tools

• Data Quality Tools

• Data Development and Administration Tools

• Reference and Master Data Management Tools

Activities:

1. Understand Meta-data Requirements (P)

2. Define the Meta-data Architecture (P)

3. Develop and Maintain Meta-data Standards (P)

4. Implement a Managed Meta-data Environment (D)

5. Create and Maintain Meta-data (O)

6. Integrate Meta-data (C)

7. Manage Meta-data Repositories (C)

8. Distribute and Deliver Meta-data (C)

9. Query, Report, and Analyze Meta-data (O)

Metrics:

• Meta Data Quality

• Master Data Service Data Compliance

• Meta-data Repository Contribution

• Meta-data Documentation Quality

• Steward Representation / Coverage

• Meta-data Usage / Reference

• Meta-data Management Maturity

• Meta-data Repository Availability

Activities: (P) – Planning (C) – Control (D) – Development (O) - Operational

Page 147: CDMP preparation workshop EDW2016

Where do you encounter metadata every day?

Page 148: CDMP preparation workshop EDW2016

MetaData (DATA vs. METADATA)

Page 149: CDMP preparation workshop EDW2016

MetaData

Page 150: CDMP preparation workshop EDW2016

Where else do you use metadata every day?

Page 151: CDMP preparation workshop EDW2016
Page 152: CDMP preparation workshop EDW2016

Exercise: Where do YOU encounter MetaData every day?

Page 153: CDMP preparation workshop EDW2016

Types of Meta-data

Business meta-data – Relates the business perspective to the meta-data user (e.g. business data definitions, regulatory or contractual constraints, data quality statements).

Technical and Operational meta-data – Targeted at IT and operations users’ needs (e.g. data archiving and retention rules, audit rules, recovery and backup rules).

Process meta-data – Describes other system elements (e.g. data stores involved, process name, roles and responsibilities).

Data Stewardship meta-data – Data about stewards and stewardship processes (e.g. Data Owners, Data Subject Areas, Data Users, Data Stewards).

Page 154: CDMP preparation workshop EDW2016

Meta-data Architecture

Centralised Meta-data Architecture – Consists of a single meta-data repository that contains copies of live meta-data from the various sources.

Distributed Meta-data Architecture – A single access point; the meta-data retrieval engine responds to user requests by retrieving data from source systems in real time. There is no persistent repository.

Hybrid Meta-data Architecture – A combined alternative. Meta-data still moves directly from the source systems into the repository; however, the repository design only accounts for user-added meta-data, the critical standardised items, and additions from manual sources.

Page 155: CDMP preparation workshop EDW2016

Industry Meta-data Standards

OMG: Common Warehouse Metamodel (CWM), Information Management Metamodel (IMM), MDC Open Information Model (OIM), XML, UML, SQL

World Wide Web Consortium (W3C): RDF (Resource Description Framework)

Dublin Core: Dublin Core Meta-data Initiative (DCMI)

Distributed Management Task Force (DMTF): Web-Based Enterprise Management (WBEM)

Meta-data standards for unstructured data

Page 156: CDMP preparation workshop EDW2016

Data Quality Management

Page 157: CDMP preparation workshop EDW2016

Data Quality Management Cycle

The Data Management Body of Knowledge identifies four key activities necessary for operationalising DQM:

Planning for the assessment of the current state and identification of key metrics for measuring data quality

Deploying processes for measuring and improving the quality of data

Monitoring and measuring the levels in relation to the defined business expectations

Acting to resolve any identified issues to improve data quality and better meet business expectations

DEMING CYCLE (continuous improvement)

Page 158: CDMP preparation workshop EDW2016

What is Data Quality Management?

› Poor Data Quality Management does not equate to poor data quality

› But when you don’t have good Data Quality Management…

» The current level of data quality will be unknown

» Maintaining a sufficient level of data quality will be a result of ‘winging it’ and the sheer persistence of talent

» The risk to the business will increase

› It is infinitely more sensible to ensure good data quality by having good management through a coherent set of policies, standards, processes and supporting technology

“Data errors can cost a company millions of dollars, alienate customers, suppliers and business partners, and make implementing new strategies difficult or even impossible.

The very existence of an organisation can be threatened by poor data”

Joe Peppard – European School of Management and Technology

“Ultimately, poor data quality is like dirt on the windshield. You may be able to drive for a long time with slowly degrading vision, but at some point you either have to stop and clear the windshield or risk everything”

Ken Orr, The Cutter Consortium

Page 159: CDMP preparation workshop EDW2016

Answer: It depends…

In February 2011, the UK government launched a crime-mapping website for England and Wales (www.police.uk).

Unfortunately, for a number of reasons, the postcode allocated to a specific police incident didn’t always correspond to the precise location of the crime.

The net result was that poor accuracy in the recording of geographical information led many quiet residential streets to be incorrectly identified as crime hotspots.

In the context of creating aggregated statistics to assess relative crime rates between counties, the data quality is perfectly acceptable – data fit for purpose.

However, if the same data is used by an insurance company, there is an issue for the homeowners who receive inflated home insurance premiums – data not fit for purpose.

Data quality can only be considered within the context of the intended use of the data. Data needs to be “fit for purpose”, and data quality needs to be assessed on that basis.

So How Good Does Data Quality Need To Be?

Page 160: CDMP preparation workshop EDW2016

Good data quality benefits

Adherence to corporate policies & regulatory requirements

Improved confidence in Data

Reduced “busy work” in data archaeology

Enriched Customer Satisfaction

Better decision making

Effective Marketing and Advertising

Cost efficiencies

Improved Operational Efficiency & streamlining

Poor data quality impacts

Ineffectual Advertising & Marketing

Reputational damage

Diminished Regulatory Compliance

Decrease in Customer Satisfaction

Uneconomical Business Processes

Compromised Health, Safety & Security

Erratic Business Intelligence

Amplified Corporate Risk

Impaired Business Agility

Benefit and Impact

Page 161: CDMP preparation workshop EDW2016

What can & can’t be achieved with DQ?

Can:
• Make order from chaos

• Drive business accountability for enterprise data

• Keep track of data assets: where they’re stored, who’s got access, and how often they are cleansed and checked.

• Ensure data quality processes are established

Can’t:
• Be solely responsible for managing data

• Perform miracles to create “data perfection”

• Magically fix all historic data quality issues

Page 162: CDMP preparation workshop EDW2016

Dimensions of Data Quality

The six DAMA UK data quality dimensions: Completeness, Uniqueness, Timeliness, Validity, Accuracy, Consistency.

› Completeness – The proportion of stored data against the potential of “100% complete”. Business rules define what “100% complete” represents.

› Uniqueness – No thing will be recorded more than once based upon how that thing is identified. The data item is measured against itself or its counterpart in another data set or database.

› Timeliness – The degree to which data represent reality from the required point in time, i.e. the time at which the real-world event being recorded occurred.

› Validity – Data are valid if they conform to the syntax (format, type, range) of their definition: database, metadata or documentation rules as to the allowable types (string, integer, floating point etc.), the format (length, number of digits etc.) and the range (minimum, maximum, or contained within a set of allowable values).

› Accuracy – The degree to which data correctly describes the “real world” object or event being described.

› Consistency – The absence of difference when comparing two or more representations of a thing against a definition.

Source: DAMA UK
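
A minimal Python sketch (the field, records and postcode pattern are illustrative assumptions) showing how completeness and validity for a single field might be measured against these definitions:

import re

records = [
    {"customer_code": "BS005", "postcode": "BA1 7LA"},
    {"customer_code": "BS006", "postcode": None},          # incomplete
    {"customer_code": "BS007", "postcode": "NOT-A-CODE"},  # invalid
]

# Validity: a simplified syntax rule for a UK-style postcode (illustrative pattern)
POSTCODE_PATTERN = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")

populated = [r for r in records if r["postcode"]]
valid = [r for r in populated if POSTCODE_PATTERN.match(r["postcode"])]

completeness = len(populated) / len(records)  # proportion of populated values
validity = len(valid) / len(populated)        # proportion conforming to the rule

print(f"Completeness: {completeness:.0%}, Validity: {validity:.0%}")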

Page 163: CDMP preparation workshop EDW2016

Data Profiling, Analysis & Assessment

1. Identify a data set for review

2. Catalogue the business uses of that data set

3. Subject the data set to empirical analysis using data profiling tools

4. List all potential anomalies

5. For each anomaly:

›Review with SME to determine if it represents a true data flaw

› Evaluate potential business impacts

6. Prioritise criticality of important anomalies in preparation for defining data metrics

Page 164: CDMP preparation workshop EDW2016

Typical Outputs of Data Quality Profiling

COLUMN PROFILING
• Record count, unique count, null count, blank count, pattern count
• Minimum, maximum, mean, mode, median, standard deviation, standard error
• Completeness (% of non-null records)
• Data type (defined v actual)
• Primary key candidates

FREQUENCY ANALYSIS
• Count/percentage of each distinct value
• Count/percentage of each distinct character pattern

PRIMARY/FOREIGN KEY ANALYSIS
• Candidate primary/foreign key relationships
• Referential integrity checks between tables

DUPLICATE ANALYSIS
• Identification of potential duplicate records (with variable sensitivity)

BUSINESS RULES CONFORMANCE
• Using a preliminary set of business rules

OUTLIER ANALYSIS
• Identification of possible out-of-range values or anomalous records
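
A minimal Python sketch of simple column profiling over a toy column of values (record, null, blank and unique counts plus a character-pattern frequency), using only the standard library:

from collections import Counter

column = ["BA1 7LA", "BA2 4QP", None, "", "BA1 7LA", "XYZ"]

record_count = len(column)
null_count = sum(1 for v in column if v is None)
blank_count = sum(1 for v in column if v == "")
values = [v for v in column if v not in (None, "")]
unique_count = len(set(values))
completeness = len(values) / record_count  # % of populated records

def char_pattern(value: str) -> str:
    # Map letters to 'A' and digits to '9'; keep any other characters as-is
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in value)

pattern_counts = Counter(char_pattern(v) for v in values)

print(record_count, null_count, blank_count, unique_count, round(completeness, 2))
print(pattern_counts)  # e.g. Counter({'AA9 9AA': 3, 'AAA': 1})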

Page 165: CDMP preparation workshop EDW2016

Data Quality Business Rules

Value domain membership

Definitional Conformance

Range conformance

Format compliance

Mapping conformance

Value presence and record completeness

Consistency rules

Accuracy verification

Uniqueness verification

Timeliness validation

Page 166: CDMP preparation workshop EDW2016

@inforacer

uk.linkedin.com/in/christophermichaelbradley/

+44 7973 184475 (mobile) +44 1225 923000 (office)

infomanagementlifeandpetrol.blogspot.com

Christopher BradleyI N F O R M A T I O N M A N A G E M E N T S T R A T E G I S T

[email protected]

T R A I N I N G

A D V I S O R Y

C O N S U L T I N G

C E R T I F I C A T I O N