Data Quality / Data Cleansing in BW

Lothar Schubert, BW RIG, 8/2001


SAP AG 2001

Agenda

About Data Quality

Data Cleansing

Data Validation

Data Repair


Why Data Cleansing / Validation?

• BW data are highly integrated.
• BW data are queried frequently.
• BW data are expected to be of high quality.
• BW requires high data accuracy for effective decision support.
• BW data often serve as a foundation for further processing.

Data Quality / Information Quality – what it means:
• Data / Information is relevant.
• Data / Information is timely.
• Data / Information is correct.


Sources for Dirty Data

• Data are incorrect in the source system
• Data consolidation causes issues
• Technical platforms are different (code pages, etc.)
• Administration issues (double loads, …)
• Custom logic
• Technology issues (SW, DB, O/S, HW, …)
• …


Data Contaminants - 1

Key       Description                  Country
012-3344  Cup Holder, green            US
012-3378  Cup Holder, red              US
012-4122  Lighter, black               US
012-5521  white cover                  US
012-7662  green Cup Holder             US

012-4011  Cup Holder, green            JP
012-4122  phone plug                   JP
012-6611  channel                      JP
013-1452  plastic cover, red           JP
013-1452  (pink version of above)      JP

          red wheel, type "014-2221"   CA
          blue wheel, type "012-3342"  CA
023-2211  white wheel                  CA

Contaminants shown:
• multiple keys
• inconsistent keys
• invalid characters
• surprises
• free form fields
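A few of these contaminants can be detected mechanically. The following is a minimal Python sketch, not a BW feature: the sample rows are modelled on the slide, and the key format NNN-NNNN, the function name and the issue labels are assumptions made for illustration.

```python
import re
from collections import Counter

# Sample rows modelled on the slide: (key, description, country).
rows = [
    ("012-3344", "Cup Holder, green", "US"),
    ("012-4122", "Lighter, black", "US"),
    ("012-4122", "phone plug", "JP"),
    ("013-1452", "plastic cover, red", "JP"),
    ("013-1452", "(pink version of above)", "JP"),
    ("", 'red wheel, type "014-2221"', "CA"),
]

# Assumed key format: three digits, a hyphen, four digits.
KEY_PATTERN = re.compile(r"\d{3}-\d{4}")


def find_contaminants(rows):
    """Return a list of (issue, key, description) tuples."""
    issues = []
    key_counts = Counter(key for key, _, _ in rows if key)
    for key, desc, _country in rows:
        if not key:
            issues.append(("missing key", key, desc))
        elif key_counts[key] > 1:
            issues.append(("key used more than once", key, desc))
        if KEY_PATTERN.search(desc):
            issues.append(("key hidden in free-form text", key, desc))
    return issues


issues = find_contaminants(rows)
for issue in issues:
    print(issue)
```

Checks like these only flag candidates. Whether two rows with the same key are a true conflict is still a business decision.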


Data Contaminants - 2

Customer      Date        Amount
XYZ.com Ltd.  10/10/2000  $ 67221
XYZ.com Ltd.  10/10/2000  $ 67221
XYZ.com Ltd.  10/10/2000  $ 67221
XYZ.com Ltd.  10/12/2000  $ 35332
XYZ.com Ltd.  10/14/2000  $ 31122
XYZ.com Ltd.  10/17/2000  $ 99999999
XYZ.com Ltd.  10/19/2000  $ 78882

XYZ.com Ltd.  10/10/99    $ 44332
XYZ.com Ltd.  10/12/99    $ 33222

ABC Co.       10/14/2000  $ 4333
LMN Ltd.      10/14/2000  $ 9000
XYZ.com Ltd.  10/14/2000  $ 31122
ZZZ Sl.       10/14/2000  $ 122211

Contaminants shown:
• data redundancy (marked twice on the slide)
• data anomalies
• data format
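The three contaminant types on this slide lend themselves to simple automated checks. Below is a minimal Python sketch using a subset of the slide's records; the function names, the outlier threshold (100 times the median) and the expected date format are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime
from statistics import median

# Subset of the slide's records: (customer, date, amount).
records = [
    ("XYZ.com Ltd.", "10/10/2000", 67221),
    ("XYZ.com Ltd.", "10/10/2000", 67221),
    ("XYZ.com Ltd.", "10/10/2000", 67221),
    ("XYZ.com Ltd.", "10/12/2000", 35332),
    ("XYZ.com Ltd.", "10/17/2000", 99999999),
    ("XYZ.com Ltd.", "10/10/99", 44332),
]


def duplicates(records):
    """Data redundancy: records that occur more than once."""
    return [rec for rec, n in Counter(records).items() if n > 1]


def anomalies(records, factor=100):
    """Data anomalies: amounts above factor * median (assumed threshold)."""
    mid = median(amount for _, _, amount in records)
    return [rec for rec in records if rec[2] > factor * mid]


def bad_date_format(records, fmt="%m/%d/%Y"):
    """Data format: dates that do not parse with a four-digit year."""
    bad = []
    for rec in records:
        try:
            datetime.strptime(rec[1], fmt)
        except ValueError:
            bad.append(rec)
    return bad


print(duplicates(records))
print(anomalies(records))
print(bad_date_format(records))
```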


Data Contaminants - 3

Data contamination during upload via exits:
• Application Exits
• Generic BW Exit RSAP0001
• Transfer / Update Routines
• Virtual Exits

Consider the following:
• Timeliness of data
• Check for versions
• Check for return codes
• Delta trigger capabilities
• Performance and general architecture


Where, When and by Whom to Implement Data Cleansing?

Where: In the source system? During data extraction? In the BW system?
• Data cleansing occurs at all levels.
• Avoid the tendency to attempt cleansing only within the BW extraction process.
• Often data cleansing is best performed at the legacy / source system level.

When: In the productive phase? In the test phase? In the blueprint phase?
• Data cleansing is one of the greatest risks in data movement efforts.
• Design belongs in the blueprint phase.
• Test data are often cleaner than real data.

Who: Is it a technical issue? Is it a project issue? Is it an organizational issue?
• Often data quality and inconsistency issues are systemic in the organization and must be addressed at a higher level in the organization to get resolved.


ROI of Data Quality

Data Quality as an Investment

You should ask…
• What is the risk of incomplete / incorrect data sets?
• What is the cost to fix data once contaminated?
• What are the corporate quality standards?

However, you should also ask…
• What is the reliability of the source data?
• Where is the point of diminishing returns?


BW Architecture

[Figure: BW architecture diagram. Recoverable labels: any source; Extraction / Open Staging; Persistent Staging Area; Transformation; Cleansing; Business Rules; master data; ods; ODS object; dwh; InfoCube; data marts; Integration; Granularity; Synchronous Access; portal / application; Asynchronous Distribution - Open Hub Services; any target; Business Information Warehouse.]




Referential Integrity

Assuring referential integrity can be a major challenge in DWH design…

Relax! BW does it for you.
• Automated checks.
• Central Metadata Dictionary.
• Integrated Architecture.


Master Data Validation


Check for Permitted Characters

Case A: characters not permitted
Case B: characters permitted

Permitted by standard:

!"%&'()*+,-/:;<=>?_0123456789

ABCDEFGHIJKLMNOPQRSTUVWXYZ
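The check itself amounts to a set membership test. Here is a minimal Python sketch, assuming the standard set is exactly the string printed on the slide; the function name and the `extra` parameter (standing in for additionally permitted characters, as in Case B) are hypothetical and do not correspond to a BW API.

```python
# Character set copied from the slide. A real system may permit further
# characters through configuration; that is modelled here by `extra`.
STANDARD_PERMITTED = set(
    "!\"%&'()*+,-/:;<=>?_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
)


def invalid_characters(value, extra=""):
    """Return the characters of `value` that are not permitted, in order."""
    allowed = STANDARD_PERMITTED | set(extra)
    return [ch for ch in value if ch not in allowed]


print(invalid_characters("CUP-HOLDER_01"))     # nothing to report
print(invalid_characters("Cup#1"))             # lower case letters and '#'
print(invalid_characters("CUP#1", extra="#"))  # '#' explicitly permitted
```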


Consistency Check for Characteristic Values

Checking for…
• use of character values in fields of data type NUMC
• correct consideration of the conversion routine ALPHA
• use of lower case letters
• use of special characters
• plausibility of date / time fields

Consider performance impacts!
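To make the individual checks concrete, the following Python sketch models a few of them outside of BW. All function names are hypothetical, and `alpha_input` is only a simplified assumption about how the ALPHA conversion treats input (purely numeric values are zero-padded to the field length, everything else is left unchanged).

```python
from datetime import datetime

DIGITS = "0123456789"


def is_valid_numc(value):
    """NUMC fields may contain digits only."""
    return value != "" and all(ch in DIGITS for ch in value)


def alpha_input(value, length):
    """Simplified model of the ALPHA input conversion (assumption)."""
    value = value.strip()
    if is_valid_numc(value):
        return value.zfill(length)  # right-justify, pad with leading zeros
    return value


def has_lowercase(value):
    """True if the value contains at least one lower case letter."""
    return any(ch.islower() for ch in value)


def is_plausible_date(value):
    """Check an eight-digit YYYYMMDD string against the calendar."""
    if len(value) != 8 or not is_valid_numc(value):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
        return True
    except ValueError:
        return False


print(alpha_input("123", 10))
print(is_plausible_date("20010230"))
```

Each check runs per record and per field, which is why the slide's warning about performance matters for large data packages.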


Data Integrity Checks on Packages

• APIs are available to read PSA contents
• Function RSAR_ODS_MAINTAIN, …
• Check for references between records
• Summary checks, …


Handling of Invalid Data Records

[Figure: data flow diagram. Recoverable labels: Scheduler; Extract; Staging Engine; PSA; OK; Error; Business Information Warehouse.]

Error handling:
1 - No update, no reporting
2 - Valid records update, no reporting
3 - Valid records update, reporting possible

Correction of invalid data:
• within the source system
• manually in the PSA
• by rule (see RS_ERRORLOG_EXAMPLE)

Tip: Consider automation via event chains.
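The three error handling options differ only in what happens to the valid records and whether the request becomes available for reporting. A minimal Python model of that decision follows; the enum, the function and its return shape are assumptions for illustration and do not represent BW's scheduler API.

```python
from enum import Enum


class ErrorHandling(Enum):
    NO_UPDATE_NO_REPORTING = 1
    VALID_UPDATE_NO_REPORTING = 2
    VALID_UPDATE_REPORTING = 3


def process_package(records, is_valid, mode):
    """Split a package into valid and invalid records and apply `mode`.

    Returns (updated_records, error_records, reporting_allowed).
    """
    valid = [r for r in records if is_valid(r)]
    errors = [r for r in records if not is_valid(r)]
    if not errors:
        return valid, [], True
    if mode is ErrorHandling.NO_UPDATE_NO_REPORTING:
        return [], errors, False
    if mode is ErrorHandling.VALID_UPDATE_NO_REPORTING:
        return valid, errors, False
    return valid, errors, True


package = [10, -5, 20]
result = process_package(
    package, lambda r: r >= 0, ErrorHandling.VALID_UPDATE_NO_REPORTING
)
print(result)
```

In every mode the error records are returned separately, so they can be corrected by one of the routes listed above and posted again.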


Local Master Data


Deletion Features during Update




Aggregate Check Tool

Report RRX_TRACE_CHECK_AGGREGATE

Check OSS Note 202469 for details.


Custom Check Points

Article  Cashier Number  Sales (POS Receipts)  Sales (Receipt)  Overall result
777921   1128            $ 0.00                $ 0.00           $ 0.00
777922   1128            $ 0.00                $ 0.00           $ 0.00
777923   1128            $ 0.00                $ 0.00           $ 0.00
Overall result           $ 0.00                $ 0.00           $ 0.00

• Identify check points in source system

• Write check point data to custom table

• Use generic extractor for load

• Populate check cube

• Perform Compress with 0 suppression

• Execute exception report
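The idea behind the check cube can be sketched in a few lines of Python: load the check point totals from the source, subtract them from the totals in BW, suppress the zeros, and report what is left. The function names are hypothetical, and the amounts (in cents) are invented for illustration; only the article and cashier numbers come from the slide.

```python
def check_cube(source_totals, bw_totals):
    """Difference (BW minus source) per check point key."""
    keys = set(source_totals) | set(bw_totals)
    return {k: bw_totals.get(k, 0) - source_totals.get(k, 0) for k in keys}


def exception_report(differences):
    """Zero suppression: keep only check points with a non-zero difference."""
    return {k: d for k, d in sorted(differences.items()) if d != 0}


# Keys are (article, cashier number); values are sales in cents.
source = {(777921, 1128): 12050, (777922, 1128): 9900, (777923, 1128): 4000}
bw = {(777921, 1128): 12050, (777922, 1128): 9400, (777923, 1128): 4000}

report = exception_report(check_cube(source, bw))
print(report)
```

When source and BW agree, every difference is zero and the report is empty, which corresponds to the all-zero result shown in the table above.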


Audit Dimensions / Data Modelling

Audit dimensions can identify:
• When were the data created?
• Which source did the data come from?
• Which tools were used for extraction?
• Which rules touched the data?
• …


Display of individual Requests

Request ID      Amount
12389           $ 80,000.00
#               $ 28,078,400.00
Overall result  $ 28,158,400.00

You can use the REQUEST ID to display and analyze individual requests.
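Conceptually, this view is a drill-down of the amount by request. A small Python sketch of that grouping follows; the split of request 12389 into two records is invented for illustration, and the "#" row is assumed to stand for data with no individual request assigned.

```python
from collections import defaultdict

# (request ID, amount in dollars); totals match the slide.
rows = [
    ("12389", 50000),
    ("12389", 30000),
    ("#", 28078400),
]


def totals_by_request(rows):
    """Sum the amount per request ID."""
    totals = defaultdict(int)
    for request_id, amount in rows:
        totals[request_id] += amount
    return dict(totals)


totals = totals_by_request(rows)
overall = sum(totals.values())
print(totals, overall)
```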


Check Programs (RSRV)

• InfoCubes: Fact, SID, MID, …
• Hierarchies
• InfoObjects
• DDIC definitions
• Characteristic values
• …


Data Quality Check Flags




Request Deletion Infocube / ODS

Consider limitations!


Selective Deletion


InfoCube Reconstruction


InfoCube Request Reversal Posting

Still works fine after compression / roll-up!
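A reversal posting can be pictured as loading the same records again with the sign of every key figure inverted, so that the totals net out to zero. The Python sketch below illustrates this under stated assumptions: the record layout, the function name and the sample values are hypothetical. Presumably this is also why compression does not matter here, since the offsetting records are simply added rather than the original request being located and removed.

```python
def reversal_records(records, request_id, key_figures=("amount",)):
    """Build records that cancel out the given request when added."""
    reversed_rows = []
    for rec in records:
        if rec["request_id"] != request_id:
            continue
        rev = dict(rec)
        for kf in key_figures:
            rev[kf] = -rec[kf]  # invert the sign of each key figure
        reversed_rows.append(rev)
    return reversed_rows


facts = [
    {"request_id": 12389, "customer": "XYZ.com Ltd.", "amount": 80000},
    {"request_id": 12390, "customer": "ABC Co.", "amount": 4333},
]

posted = facts + reversal_records(facts, 12389)
total = sum(rec["amount"] for rec in posted)
print(total)
```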
