Posted on 12-Jan-2016
“Solving Data Inconsistencies and Data Integration with a Data Quality Manager”
Presented by Maria del Pilar Angeles and Lachlan M. MacKinnon
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, EH14 4AS
{pilar,lachlan}@macs.hw.ac.uk
Doctoral Consortium
Solving Data Inconsistencies and Data Integration with a Data Quality Manager, Angeles Maria del Pilar, MacKinnon Lachlan M.
Agenda
• Introduction
• Proposal
• Data Quality Manager Components
  – Reference Model
  – Measurement Model
  – Assessment Model
  – Quality Metadata
• Information Integration Process
  – Classification of Data Sources
  – Selection of Best Data Sources
  – Query Planning
  – Data Fusion
  – Ranking of Query Results
• Questions
Introduction
Conflicts between data sources (Sheth92):
Structural conflicts
• Domain definition: naming, data representation, data scaling, data precision, default value, attribute integrity constraints
• Entity definition: database id, naming, union compatibility, schema isomorphism, missing data item
• Abstraction: generalization, aggregation
• Schematic discrepancy: data value attribute, attribute entity, data value entity
Data value inconsistency
• Known inconsistency, temporal inconsistency, acceptable inconsistency
Approached by: ontology, metadata, transformation rules, mapping.
Example: employee data held in three sources with different local schemas.

DS 1
Emp_no   Name              salary
123987   Alastair Freich   14000
456339   Fernando Lujan    NULL

DS 2
SSN      fullname          sal
123987   A. Freich         20000
789222   Fiona Shaning     15000

DS 3
employe  SFE               salary
123987   Al. Freich        NULL
393765   Lauren MacMillan  14500

Integrated result (global schema)
Employee_number  Full_name_employee  Salary
123987           Alastair F.         14000
123987           A. Freich           20000
123987           Al. Freich          NULL
456339           Fernando Lujan      NULL
393765           Lauren MacMillan    14500
789222           Fiona Shaning       15000

Employee 123987 appears three times, with three spellings of the name and two different salaries plus a NULL.
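The integration in the example can be sketched in a few lines: each local attribute is renamed to its global counterpart, every row is tagged with the source it came from, and rows sharing the same Employee_number are grouped so the inconsistent ones stand out. This is a minimal sketch, not the authors' implementation; the mapping dictionaries and the names `to_global` and `find_conflicts` are hypothetical.

```python
from collections import defaultdict

# Hypothetical local-to-global attribute mappings, one per data source,
# built from the column names of DS 1, DS 2 and DS 3 in the example.
MAPPINGS = {
    "DS1": {"Emp_no": "Employee_number", "Name": "Full_name_employee", "salary": "Salary"},
    "DS2": {"SSN": "Employee_number", "fullname": "Full_name_employee", "sal": "Salary"},
    "DS3": {"employe": "Employee_number", "SFE": "Full_name_employee", "salary": "Salary"},
}

# Local rows as listed in the example (None stands for NULL).
LOCAL_ROWS = {
    "DS1": [{"Emp_no": 123987, "Name": "Alastair Freich", "salary": 14000},
            {"Emp_no": 456339, "Name": "Fernando Lujan", "salary": None}],
    "DS2": [{"SSN": 123987, "fullname": "A. Freich", "sal": 20000},
            {"SSN": 789222, "fullname": "Fiona Shaning", "sal": 15000}],
    "DS3": [{"employe": 123987, "SFE": "Al. Freich", "salary": None},
            {"employe": 393765, "SFE": "Lauren MacMillan", "salary": 14500}],
}


def to_global(mappings, local_rows):
    """Rename every local attribute to its global name and tag each row with its source."""
    rows = []
    for source, records in local_rows.items():
        mapping = mappings[source]
        for record in records:
            row = {mapping[attr]: value for attr, value in record.items()}
            row["source"] = source  # provenance, useful later at fusion time
            rows.append(row)
    return rows


def find_conflicts(rows, key="Employee_number"):
    """Group rows by key; keys reported by more than one source may be inconsistent."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: g for k, g in groups.items() if len(g) > 1}


if __name__ == "__main__":
    global_rows = to_global(MAPPINGS, LOCAL_ROWS)
    for key, group in find_conflicts(global_rows).items():
        print(key, [(r["source"], r["Full_name_employee"], r["Salary"]) for r in group])
```

Run as a script, it prints `123987 [('DS1', 'Alastair Freich', 14000), ('DS2', 'A. Freich', 20000), ('DS3', 'Al. Freich', None)]`: only employee 123987 is reported by more than one source.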
Proposal
We propose a Data Quality Manager (DQM) that mediates between the information integration process, the user and the application in order to deal with semantic heterogeneity.
Figure: proposed architecture. Data Sources 1 to N, each with its Local Schema and Local Users, are accessed through Wrappers that expose Export Schemas 1 to N; the Mediator and the Data Quality Manager relate these to the Global Schema, which serves Applications and Global Users 1 to M.
DQM Components
• Reference Model: definition of quality criteria
• Measurement Model: definition of metrics
• Assessment Model: definition of assessment methods
• Quality Metadata (QMD): definition of quality metadata
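Completeness, accuracy and currency are assessed through surveys, queries and benchmarks, using these metrics:
• Completeness: $\frac{\#\text{incomplete}}{\#\text{total}}$
• Accuracy: $\frac{\#\text{errors}}{\#\text{total}}$
• Currency: $\text{age} + (\text{delivery time} - \text{input time})$
Based on the DQM components, classify the data sources and populate the QMD.
DQM: Data Quality Manager
QMD: Quality Metadata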
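The three metrics translate directly into small functions. The sketch below computes the ratios exactly as written (so lower values mean better quality for completeness and accuracy), assumes NULL is the only kind of incompleteness, assumes accuracy is judged against trusted reference values (for example from a survey or benchmark), and assumes all currency terms share one time unit. The function names and the `populate_qmd` record layout are hypothetical.

```python
def incompleteness_ratio(values):
    """#incomplete / #total, counting None (NULL) as an incomplete value."""
    if not values:
        raise ValueError("no values to assess")
    return sum(v is None for v in values) / len(values)


def error_ratio(values, reference):
    """#errors / #total, comparing each value with a trusted reference value
    (for example one obtained from a survey or a benchmark)."""
    if not values or len(values) != len(reference):
        raise ValueError("values and reference must be non-empty and aligned")
    return sum(v != r for v, r in zip(values, reference)) / len(values)


def currency(age, delivery_time, input_time):
    """Age + (delivery time - input time); all three in the same time unit."""
    return age + (delivery_time - input_time)


def populate_qmd(columns):
    """Hypothetical QMD population: record the incompleteness of one attribute per source."""
    return {source: {"incompleteness": incompleteness_ratio(vals)}
            for source, vals in columns.items()}


if __name__ == "__main__":
    # Salary columns of DS 1, DS 2 and DS 3 from the introduction example.
    salaries = {"DS1": [14000, None], "DS2": [20000, 15000], "DS3": [None, 14500]}
    print(populate_qmd(salaries))
```

For the example salaries this prints `{'DS1': {'incompleteness': 0.5}, 'DS2': {'incompleteness': 0.0}, 'DS3': {'incompleteness': 0.5}}`. A real QMD would store one entry per source and per criterion, and probably per attribute as well.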
Information Integration Process
Steps, each supported by the Data Quality Manager:
1. Selection of best data sources
2. Query planning
3. Fusion of data inconsistencies
4. Query integration
5. Ranking of query results
Selection of Best Data Sources
Figure: the user query is mapped between local and global schemas to identify the data sources involved in the query; the QMD and the quality user priorities are then used to select and rank the best data sources.
1. The quality user priorities are given by the user.
2. The ranking of the best data sources involved in the query is given before execution.
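One simple way to turn QMD scores and user priorities into a ranking before execution is a weighted sum (simple additive weighting). The slides do not name a ranking technique, so this is a swapped-in illustration. It assumes each QMD score is normalised to [0, 1] with higher meaning better and treats the user priorities as non-negative weights. The completeness values match 1 minus the salary incompleteness in the example; the accuracy and currency values, and the name `rank_sources`, are invented.

```python
# Hypothetical QMD content: per-source scores in [0, 1], higher is better.
# completeness = 1 - incompleteness of the Salary column in the example;
# the accuracy and currency values are invented for illustration only.
QMD = {
    "DS1": {"completeness": 0.5, "accuracy": 0.9, "currency": 0.7},
    "DS2": {"completeness": 1.0, "accuracy": 0.6, "currency": 0.9},
    "DS3": {"completeness": 0.5, "accuracy": 0.8, "currency": 0.4},
}


def rank_sources(qmd, priorities, involved):
    """Rank the data sources involved in a query by a weighted sum of QMD scores.

    priorities maps each quality criterion to a non-negative user weight;
    the weights are normalised so that they sum to 1.
    """
    total = sum(priorities.values())
    if total <= 0:
        raise ValueError("at least one priority must be positive")
    weights = {criterion: w / total for criterion, w in priorities.items()}

    def score(source):
        return sum(w * qmd[source][criterion] for criterion, w in weights.items())

    return sorted(((s, score(s)) for s in involved), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # A user who cares about accuracy twice as much as completeness or currency.
    user_priorities = {"accuracy": 2, "completeness": 1, "currency": 1}
    for source, s in rank_sources(QMD, user_priorities, ["DS1", "DS2", "DS3"]):
        print(f"{source}: {s:.3f}")
```

With these invented numbers DS 2 ranks first (0.775), then DS 1 (0.75) and DS 3 (0.625). Changing the priorities at query time changes the order without touching the QMD.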
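Query Planning
Figure: the user query is partitioned into subqueries (Query A, Query B, Query C, ...); candidate query plans (Plan 1 to Plan N) are ranked using the QMD and the quality user priorities, and the top-ranking query plan is selected.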
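A sketch of plan ranking under stated assumptions: a plan assigns one candidate source to each subquery, every combination is enumerated, and a plan is scored by the mean quality score of the sources it uses. The subquery names, candidate lists, scores and the mean-score rule are all hypothetical. The slides do not say how plans are scored.

```python
from itertools import product


def enumerate_plans(candidates):
    """Yield every plan: one candidate source chosen for each subquery (Cartesian product)."""
    subqueries = list(candidates)
    for choice in product(*(candidates[q] for q in subqueries)):
        yield dict(zip(subqueries, choice))


def plan_score(plan, source_scores):
    """Hypothetical plan quality: mean quality score of the sources the plan uses."""
    return sum(source_scores[s] for s in plan.values()) / len(plan)


def top_plan(candidates, source_scores):
    """Return the highest-scoring plan (the first one found if several tie)."""
    return max(enumerate_plans(candidates), key=lambda plan: plan_score(plan, source_scores))


if __name__ == "__main__":
    # Hypothetical partition of a user query into three subqueries and the
    # sources able to answer each one.
    candidates = {"QueryA": ["DS1", "DS2"], "QueryB": ["DS2", "DS3"], "QueryC": ["DS1", "DS3"]}
    scores = {"DS1": 0.75, "DS2": 0.775, "DS3": 0.625}
    print(top_plan(candidates, scores))
```

Here the top plan is `{'QueryA': 'DS2', 'QueryB': 'DS2', 'QueryC': 'DS1'}`. With a plain mean, choosing the best source per subquery would give the same answer. Enumeration only pays off when the plan score couples subqueries, for example by preferring fewer distinct sources. Because the number of plans grows exponentially, a real planner would prune.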
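Data Fusion
Figure: the selected query plan is executed and returns results X, Y and Z; data inconsistencies in this inconsistent query result are detected and resolved by data fusion, using the QMD and the quality user priorities, to produce a consistent query result.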
Because the DQM stores where each data item comes from, decisions such as which of several conflicting values to keep can be made at data fusion time.
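Provenance makes a simple fusion rule possible. For each attribute, visit the conflicting rows from the most to the least trusted source and keep the first non-NULL value. This rule and the name `fuse` are a hypothetical illustration rather than the method of the talk. The source scores below are invented so that the most trusted source, DS 3, has a NULL salary, which shows the fallback.

```python
def fuse(group, source_scores, attributes):
    """Fuse rows that describe the same entity into a single row.

    For each attribute, keep the first non-NULL value, visiting rows from the
    highest- to the lowest-quality source. Each row's 'source' tag is the
    provenance that the DQM keeps.
    """
    ranked = sorted(group, key=lambda row: source_scores[row["source"]], reverse=True)
    return {attr: next((row[attr] for row in ranked if row[attr] is not None), None)
            for attr in attributes}


if __name__ == "__main__":
    # The three inconsistent rows for employee 123987 from the introduction example.
    group = [
        {"Employee_number": 123987, "Full_name_employee": "Alastair Freich", "Salary": 14000, "source": "DS1"},
        {"Employee_number": 123987, "Full_name_employee": "A. Freich", "Salary": 20000, "source": "DS2"},
        {"Employee_number": 123987, "Full_name_employee": "Al. Freich", "Salary": None, "source": "DS3"},
    ]
    # Hypothetical source scores in which DS 3 is the most trusted source.
    scores = {"DS3": 0.9, "DS1": 0.8, "DS2": 0.1}
    print(fuse(group, scores, ["Employee_number", "Full_name_employee", "Salary"]))
```

This prints `{'Employee_number': 123987, 'Full_name_employee': 'Al. Freich', 'Salary': 14000}`. The name comes from DS 3, while the NULL salary is skipped in favour of the next most trusted source, DS 1.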
Ranking of Query Results
Figure: the fused results J, K and L are combined by query integration into a single query result, which is then ranked using the QMD and the quality user priorities to give a ranked, consistent query result.
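As one possible illustration of ranking an integrated result, each row can be scored by its own completeness (fraction of non-NULL attributes) multiplied by the quality score of the source it came from. The scoring rule, the source scores and the function names are hypothetical assumptions, not taken from the talk. The rows are the single-source employees from the introduction example.

```python
def row_completeness(row, attributes):
    """Fraction of the requested attributes that are not NULL in this row."""
    return sum(row[a] is not None for a in attributes) / len(attributes)


def rank_results(rows, source_scores, attributes):
    """Order result rows by a hypothetical quality score:
    row completeness multiplied by the quality score of the row's source."""
    def score(row):
        return row_completeness(row, attributes) * source_scores[row["source"]]
    return sorted(rows, key=score, reverse=True)


if __name__ == "__main__":
    attributes = ["Employee_number", "Full_name_employee", "Salary"]
    rows = [
        {"Employee_number": 456339, "Full_name_employee": "Fernando Lujan", "Salary": None, "source": "DS1"},
        {"Employee_number": 393765, "Full_name_employee": "Lauren MacMillan", "Salary": 14500, "source": "DS3"},
        {"Employee_number": 789222, "Full_name_employee": "Fiona Shaning", "Salary": 15000, "source": "DS2"},
    ]
    scores = {"DS1": 0.75, "DS2": 0.775, "DS3": 0.625}  # hypothetical source quality scores
    for row in rank_results(rows, scores, attributes):
        print(row["Employee_number"], row["Full_name_employee"], row["Salary"])
```

The resulting order is 789222, 393765, 456339. The Fernando Lujan row drops to last because its NULL salary lowers its completeness to 2/3.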
Conclusion
Using the Data Quality Manager we can:
• Approach data-value-level inconsistencies during the information integration process, using data quality properties.
• Let users demand different quality priorities at query time.
• Manage user quality priorities and data quality properties together to give the user a query result of the expected quality.
What we need to do now:
• Identify tools for measurement and assessment, and develop a QMD.
• Store the quality of the data sources involved in the heterogeneous system.
• Identify techniques for:
  – ranking of the data sources and plans involved in the query
  – inconsistency detection
  – data fusion using data-source-level and data-level properties
  – ranking of query results.
Questions?
Thanks!