Data Quality / Data Cleansing in BW
Lothar Schubert, BW RIG
8/2001
SAP AG 2001, Title of Presentation, Speaker Name 2
Agenda
About Data Quality
Data Cleansing
Data Validation
Data Repair
Why Data Cleansing / Validation?
• BW data are highly integrated.
• BW data are queried frequently.
• BW data are expected to be of high quality.
• BW requires high data accuracy for effective decision support.
• BW data often serve as foundation for further processing.

Data Quality / Information Quality – what it means:
• Data / Information is relevant.
• Data / Information is timely.
• Data / Information is correct.
Sources for Dirty Data
• Data are incorrect in source system
• Data consolidation causes issues
• Technical platforms are different (code pages, etc.)
• Administration issues (double loadings, …)
• Custom logic
• Technology issues (SW, DB, O/S, HW, …)
• …
Data Contaminants - 1
012-3344   Cup Holder, green            US
012-3378   Cup Holder, red              US
012-4122   Lighter, black               US
012-5521   white cover                  US
012-7662   green Cup Holder             US

012-4011   Cup Holder, green            JP
012-4122   phone plug                   JP
012-6611   channel                      JP
013-1452   plastic cover, red           JP
013-1452   (pink version of above)      JP

           red wheel, type "014-2221"   CA
           blue wheel, type "012-3342"  CA
023-2211   white wheel                  CA
multiple keys
inconsistent keys
invalid characters
surprises
free form fields
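
The key problems in the sample rows above can be detected mechanically. The following is a minimal sketch, not part of the original slides: the key format (three digits, a hyphen, four digits) is an assumption inferred from the sample rows, and the function name `find_key_contaminants` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed key format, inferred from the sample rows: 3 digits, hyphen, 4 digits.
KEY_PATTERN = re.compile(r"\d{3}-\d{4}")

def find_key_contaminants(records):
    """records: iterable of (key, description, country); key may be None."""
    findings = []
    seen = defaultdict(set)
    for key, description, country in records:
        if key is None or not KEY_PATTERN.fullmatch(key):
            findings.append(("missing or malformed key", key, description, country))
        else:
            seen[key].add(description)
        # A key hidden inside the free-form description field.
        if KEY_PATTERN.search(description):
            findings.append(("key embedded in free-form text", key, description, country))
    # The same key used for different items (possibly by different sources).
    for key, descriptions in sorted(seen.items()):
        if len(descriptions) > 1:
            findings.append(("key reused for different items", key, sorted(descriptions), None))
    return findings

sample = [
    ("012-4122", "Lighter, black", "US"),
    ("012-4122", "phone plug", "JP"),
    (None, 'red wheel, type "014-2221"', "CA"),
    ("023-2211", "white wheel", "CA"),
]
for finding in find_key_contaminants(sample):
    print(finding)
```

A check like this only flags candidates; deciding which of two conflicting keys is correct remains a business decision.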
Data Contaminants - 2
XYZ.com Ltd.   10/10/2000   $ 67221
XYZ.com Ltd.   10/10/2000   $ 67221
XYZ.com Ltd.   10/10/2000   $ 67221
XYZ.com Ltd.   10/12/2000   $ 35332
XYZ.com Ltd.   10/14/2000   $ 31122
XYZ.com Ltd.   10/17/2000   $ 99999999
XYZ.com Ltd.   10/19/2000   $ 78882

XYZ.com Ltd.   10/10/99     $ 44332
XYZ.com Ltd.   10/12/99     $ 33222

ABC Co.        10/14/2000   $ 4333
LMN Ltd.       10/14/2000   $ 9000
XYZ.com Ltd.   10/14/2000   $ 31122
ZZZ Sl.        10/14/2000   $ 122211
data redundancy
data anomalies
data format
data redundancy
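
The three contaminant types named above (redundancy, anomalies, format) lend themselves to simple automated checks. The sketch below is illustrative only: the function names, the median-based outlier rule, and the threshold `outlier_factor` are assumptions chosen for the example, not something the slides prescribe.

```python
from collections import Counter
from datetime import datetime
from statistics import median

def parse_date(text):
    """Return (date, note). Two-digit years are flagged because the century is a guess."""
    year = text.rsplit("/", 1)[-1]
    if len(year) == 2:
        return datetime.strptime(text, "%m/%d/%y").date(), "two-digit year"
    return datetime.strptime(text, "%m/%d/%Y").date(), None

def check_postings(rows, outlier_factor=100):
    """rows: list of (customer, date_text, amount). outlier_factor is an arbitrary threshold."""
    findings = []
    # Data redundancy: identical rows loaded more than once.
    for row, count in Counter(rows).items():
        if count > 1:
            findings.append(("duplicate", row, count))
    typical = median(amount for _, _, amount in rows)
    for row in rows:
        _, date_text, amount = row
        # Data format: inconsistent date notation.
        _, note = parse_date(date_text)
        if note:
            findings.append((note, row, None))
        # Data anomaly: amount far above the typical value.
        if amount > outlier_factor * typical:
            findings.append(("anomaly", row, None))
    return findings

rows = [
    ("XYZ.com Ltd.", "10/10/2000", 67221),
    ("XYZ.com Ltd.", "10/10/2000", 67221),
    ("XYZ.com Ltd.", "10/10/2000", 67221),
    ("XYZ.com Ltd.", "10/12/2000", 35332),
    ("XYZ.com Ltd.", "10/17/2000", 99999999),
    ("XYZ.com Ltd.", "10/10/99", 44332),
]
for finding in check_postings(rows):
    print(finding)
```

Whether three identical postings are a double load or three genuine transactions cannot be decided from the data alone, so the check reports candidates rather than deleting rows.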
Data Contaminants - 3
Data Contamination during upload via Exits
• Application Exits
• Generic BW Exit RSAP0001
• Transfer- / Update-Routines
• Virtual Exits

Consider the following:
• Timeliness of Data
• Check for Versions
• Check for Return Codes
• Delta Trigger Capabilities
• Performance and General Architecture
Where, When and by Whom to Implement Data Cleansing?

Where:
• In the Source System?
• During Data Extraction?
• In the BW System?

• Data cleansing occurs at all levels.
• Avoid the tendency to attempt cleansing only within the BW extraction process.
• Often data cleansing is best performed at the legacy / source system level.

When:
• In the productive phase?
• In the test phase?
• In the blueprint phase?

• Data cleansing is one of the greatest risks in data movement efforts.
• Design belongs in the blueprint phase.
• Test data are often cleaner than real data.

Who:
• Is it a technical issue?
• Is it a project issue?
• Is it an organizational issue?

Often data quality and inconsistency issues are systemic in the organization and must be addressed at a higher level in the organization to get resolved.
ROI of Data Quality
You should ask…
• What is the risk of incomplete / incorrect data sets?
• What is the cost to fix data, once contaminated?
• What are the corporate quality standards?

However, you should also ask…
• What is the reliability of source data?
• Where is the point of diminishing returns?
Data Quality as an Investment
BW Architecture
[Figure: architecture diagram of the Business Information Warehouse. Labels in the diagram: any source; Extraction / Open Staging; Persistent Staging Area; Transformation; Cleansing; Business Rules; master data; ods objects; InfoCubes; data marts; ods; dwh; Integration; Granularity; Synchronous Access; portal / application; Asynchronous Distribution - Open HUB Services; any target.]
Agenda
About Data Quality
Data Cleansing
Data Validation
Data Repair
Referential Integrity
Assuring Referential Integrity can be a major challenge in DWH design…

Relax! BW does it for you.
• Automated checks.
• Central Metadata Dictionary.
• Integrated Architecture.
Master Data Validation
Check for Permitted Characters
Case A: characters not permitted
Case B: characters permitted

Permitted by standard:
!"%&'()*+,-/:;<=>?_0123456789
ABCDEFGHIJKLMNOPQRSTUVWXYZ
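
A check against the permitted character set can be sketched as follows. This is an illustration in Python rather than the check BW itself performs: the character set is copied from the list above as it appears on the slide, and the `extra_permitted` parameter (standing in for "Case B", where additional characters have been permitted) is a hypothetical name.

```python
# Character set exactly as listed on the slide above.
STANDARD_PERMITTED = set("!\"%&'()*+,-/:;<=>?_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")

def invalid_characters(value, extra_permitted=""):
    """Return the sorted list of distinct characters in value that are not permitted."""
    allowed = STANDARD_PERMITTED | set(extra_permitted)
    return sorted({ch for ch in value if ch not in allowed})

print(invalid_characters("CUP-HOLDER_01"))                  # every character is in the standard set
print(invalid_characters("Müller#1"))                       # Case A: extra characters not permitted
print(invalid_characters("Müller#1", extra_permitted="ü#")) # Case B: 'ü' and '#' permitted
```

Note that lower-case letters are reported in both cases, because permitting extra characters does not make a characteristic lower-case capable.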
Consistency Check for Characteristic Values

Checking for…
• use of character values in the Data type NUMC fields
• correct consideration of the conversion routine ALPHA
• use of lower case letters
• use of special characters
• plausibility of date / time fields

Consider Performance Impacts!
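
To make the individual consistency checks concrete, here is a minimal Python sketch. It is an approximation under stated assumptions, not BW's implementation: `alpha_input` mimics the commonly documented ALPHA behaviour (purely numeric values are padded with leading zeros to the field length), dates are assumed to arrive as YYYYMMDD strings, and all function and parameter names are hypothetical.

```python
from datetime import datetime

def alpha_input(value, length):
    """ALPHA-style input conversion (sketch): purely numeric values are padded
    with leading zeros to the field length; other values are left as entered."""
    stripped = value.strip()
    if stripped.isascii() and stripped.isdigit():
        return stripped.zfill(length)
    return stripped

def check_value(value, *, numc=False, alpha_length=None, is_date=False):
    """Return a list of consistency problems found in a single characteristic value."""
    problems = []
    if numc and not (value.isascii() and value.isdigit()):
        problems.append("non-digit characters in NUMC field")
    if alpha_length is not None and value != alpha_input(value, alpha_length):
        problems.append("not in ALPHA internal format")
    if any(ch.islower() for ch in value):
        problems.append("lower-case letters")
    if is_date:
        plausible = len(value) == 8 and value.isascii() and value.isdigit()
        if plausible:
            try:
                datetime.strptime(value, "%Y%m%d")
            except ValueError:
                plausible = False
        if not plausible:
            problems.append("implausible date")
    return problems

print(check_value("12A4", numc=True))
print(check_value("4711", alpha_length=10))
print(check_value("20010230", is_date=True))
```

Each check runs per value and per record, which is why the performance impact mentioned above matters for large loads.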
Data Integrity Checks on Packages
• APIs are available to read PSA contents
• Function RSAR_ODS_MAINTAIN, ….
• Check for reference between records
• Summary checks, ….
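
The kind of package-level check meant here can be sketched independently of the BW APIs. In the Python example below a data package is simply a list of dictionaries; the field names (`doc`, `item`, `amount`), the header/item convention (item 0 is the header), and the control totals are all assumptions made for illustration. No SAP function is called.

```python
def check_package(records, expected_count, expected_total):
    """records: list of dicts with hypothetical fields 'doc', 'item', 'amount' (in cents).
    expected_count / expected_total: control totals, e.g. delivered by the source."""
    findings = []
    # Summary checks: record count and amount total against the control totals.
    if len(records) != expected_count:
        findings.append(f"record count {len(records)} != expected {expected_count}")
    total = sum(r["amount"] for r in records)
    if total != expected_total:
        findings.append(f"amount total {total} != expected {expected_total}")
    # Reference between records: every item row needs a header row (item 0) in the package.
    headers = {r["doc"] for r in records if r["item"] == 0}
    for r in records:
        if r["item"] != 0 and r["doc"] not in headers:
            findings.append(f"item {r['doc']}/{r['item']} has no header record")
    return findings

package = [
    {"doc": "A1", "item": 0, "amount": 0},
    {"doc": "A1", "item": 1, "amount": 500},
    {"doc": "B7", "item": 1, "amount": 250},
]
print(check_package(package, expected_count=3, expected_total=750))
```

Reference checks of this kind only see one package at a time, so a header that arrives in a different package would be reported as missing.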
Handling of Invalid Data Records

[Figure: data flow within the Business Information Warehouse. Labels in the diagram: Scheduler; Extract; PSA; Staging Engine; OK; Error.]

Error Handling:
1 - No Update, No Reporting
2 - Valid Records Update, No Reporting
3 - Valid Records Update, Reporting Possible

Correction of invalid data:
• within source System
• manually in PSA
• by Rule (see RS_ERRORLOG_EXAMPLE)

Tip: Consider Automation Via Event Chains.
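
The three error handling options differ only in what happens when a request contains invalid records. The following Python sketch models that decision; it is a simplified illustration, and the names `ErrorHandling`, `process_request`, and `is_valid` are hypothetical rather than BW objects.

```python
from enum import Enum

class ErrorHandling(Enum):
    NO_UPDATE_NO_REPORTING = 1
    VALID_UPDATE_NO_REPORTING = 2
    VALID_UPDATE_REPORTING = 3

def process_request(records, is_valid, mode):
    """Split a request into valid and invalid records and apply the chosen option.
    Invalid records are returned separately so they can be corrected and reloaded."""
    valid = [r for r in records if is_valid(r)]
    invalid = [r for r in records if not is_valid(r)]
    if not invalid:
        # A clean request is updated and reportable under every option.
        return {"updated": valid, "error_records": [], "reportable": True}
    if mode is ErrorHandling.NO_UPDATE_NO_REPORTING:
        return {"updated": [], "error_records": invalid, "reportable": False}
    return {
        "updated": valid,
        "error_records": invalid,
        "reportable": mode is ErrorHandling.VALID_UPDATE_REPORTING,
    }

records = [{"article": "777921"}, {"article": "77x921"}]
result = process_request(
    records,
    is_valid=lambda r: r["article"].isdigit(),
    mode=ErrorHandling.VALID_UPDATE_NO_REPORTING,
)
print(result)
```

Option 3 trades consistency for availability: users can report on the request while some of its records are still waiting for correction.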
Local Master Data
Deletion Features during Update
Agenda
About Data Quality
Data Cleansing
Data Validation
Data Repair
Aggregate Check Tool
Report RRX_TRACE_CHECK_AGGREGATE
Check OSS Note 202469 for details.
Custom Check Points
Key figures, Article, Cashier Number

Article          Cashier Number   Sales (POS Receipts)   Sales (Receipt)   Overall result
777921           1128             $ 0.00                 $ 0.00            $ 0.00
777922           1128             $ 0.00                 $ 0.00            $ 0.00
777923           1128             $ 0.00                 $ 0.00            $ 0.00
Overall result                    $ 0.00                 $ 0.00            $ 0.00
• Identify check points in source system
• Write check point data to custom table
• Use generic extractor for load
• Populate check cube
• Perform Compress with 0 suppression
• Execute exception report
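
The idea behind the check cube can be sketched in a few lines of Python. This is a conceptual illustration only: it assumes that check point values from the source and the corresponding BW values are combined with opposite signs so that matching rows net to zero, and the function name and data layout are hypothetical.

```python
from collections import defaultdict

def exception_report(source_checkpoints, bw_totals):
    """Both arguments: dict mapping (article, cashier_number) -> sales amount in cents.
    Returns only the combinations whose difference is not zero."""
    check_cube = defaultdict(int)
    for key, amount in source_checkpoints.items():
        check_cube[key] += amount
    for key, amount in bw_totals.items():
        check_cube[key] -= amount  # assumption: BW values enter with reversed sign
    # Zero suppression: rows that net to zero are consistent and drop out.
    return {key: diff for key, diff in check_cube.items() if diff != 0}

source = {("777921", "1128"): 1500, ("777922", "1128"): 2500}
loaded = {("777921", "1128"): 1500, ("777922", "1128"): 2400}
print(exception_report(source, loaded))
```

When source and BW agree, every row nets to zero, which corresponds to the all-zero result shown in the table above; anything left over is a candidate for the exception report.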
Audit Dimensions / Data Modelling
Audit Dimensions can identify:
• When were the data created?
• Which source did the data come from?
• Which tools were used for extraction?
• Which rules have touched the data?
• …
Display of individual Requests
Key figures: Amount
Characteristic: Request ID

Request ID       Amount
12389            $ 80,000.00
#                $ 28,078,400.00
Overall result   $ 28,158,400.00

You can use the REQUEST ID to display and analyze individual requests.
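
Breaking a key figure down by request ID is a plain group-and-sum. The Python sketch below reproduces the totals from the table above; the split of request 12389 into two rows is invented for illustration, and `None` stands for rows that carry no request ID, shown as "#" as in the table.

```python
from collections import defaultdict

def amounts_by_request(fact_rows):
    """fact_rows: iterable of (request_id, amount). Rows without a request ID
    (None here, an assumption for this sketch) are grouped under '#'."""
    totals = defaultdict(int)
    for request_id, amount in fact_rows:
        totals["#" if request_id is None else request_id] += amount
    return dict(totals)

fact_rows = [
    (12389, 50_000),       # illustrative split of the 80,000.00 shown above
    (12389, 30_000),
    (None, 28_078_400),
]
totals = amounts_by_request(fact_rows)
print(totals)
print(sum(totals.values()))
```

A suspicious request can then be examined in isolation before deciding whether to delete or correct it.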
Check Programs (RSRV)
• Infocubes: Fact, SID, MID, …
• Hierarchies
• Infoobjects
• DDIC Definitions
• Characteristic Values
• …
Data Quality Check Flags
Agenda
About Data Quality
Data Cleansing
Data Validation
Data Repair
Request Deletion Infocube / ODS
Consider Limitations!
Selective Deletion
InfoCube Reconstruction
InfoCube Request Reversal Posting
Still works after compression / roll-up!