LDQ 2014 DQ Methodology
A Methodology for Assessment of Linked Data Quality
Anisa Rula
Amrapali Zaveri
Outline
➢ Linked Data Quality
  ○ Current State
  ○ Limitations
➢ Quality Assessment Methodology
  ○ 3 phases, 6 steps
➢ Conclusion
  ○ Future Work
Linked Data Quality
● ca. 50 billion facts in the Linked Data Cloud
● But what about the quality?
● Data is only as good as its quality!
Linked Data Quality
➢ 30 approaches, 18 dimensions, 69 metrics*
➢ 12 tools
  ○ Automated
  ○ Semi-automated
➢ No generalized methodology
➢ Not taking into account the actual use case/user requirements
➢ Only assessment, no improvement
* http://www.semantic-web-journal.net/content/quality-assessment-linked-data-survey
Quality Assessment Methodology for Linked Data
➢ 3 phases
➢ 6 steps
Phase I: Requirement Analysis
Step I: Use Case Analysis
Description that best illustrates the intended usage of the dataset(s)
Two types of users
➢ Consumers
➢ Potential consumers
Phase II: Quality Assessment
Step II: Identification of quality issues
➢ Based on the use case
➢ Checklist-based approach
➢ Yes - 1, No - 0
➢ List of quality dimensions
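A minimal sketch of how the Step II checklist might be recorded, assuming that a "yes" answer marks a quality dimension as relevant to the use case. The dimension names and the helper `score_checklist` are hypothetical and only serve as illustration.

```python
# Hypothetical checklist: does the use case care about this quality dimension?
checklist = {
    "accuracy": True,
    "completeness": True,
    "interlinking": False,
    "syntactic validity": True,
}


def score_checklist(answers):
    """Map each yes/no answer to 1/0 (Yes - 1, No - 0)."""
    return {dimension: int(answer) for dimension, answer in answers.items()}


scores = score_checklist(checklist)
# Dimensions scored 1 form the list carried into the later steps.
relevant = [dimension for dimension, score in scores.items() if score == 1]
print(scores)
print(relevant)
```

The resulting list of dimensions is what the later assessment steps would work through.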
Phase II: Quality Assessment
Step III: Statistics and Low-level Analysis
➢ Generic statistics
➢ Example
  ○ Interlinking degree
  ○ Blank nodes
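The two example statistics could be computed roughly as follows. This is a sketch under stated assumptions: triples are plain string tuples, blank nodes carry the `_:` prefix, and "interlinking degree" is taken to mean the share of triples that link a local subject to an IRI outside the dataset's namespace. The slides do not define the metric, so this definition, the namespace `LOCAL_NS`, and the toy data are all hypothetical.

```python
LOCAL_NS = "http://example.org/"  # hypothetical namespace of the assessed dataset

# Toy triples as (subject, predicate, object) strings.
triples = [
    ("http://example.org/a", "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/A"),
    ("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
    ("_:b0", "http://example.org/name", '"Anon"'),
    ("http://example.org/b", "http://example.org/address", "_:b1"),
]


def is_blank(term):
    """Blank nodes are written with the '_:' prefix in this toy encoding."""
    return term.startswith("_:")


def is_iri(term):
    return term.startswith("http://") or term.startswith("https://")


def blank_node_count(data):
    """Number of distinct blank nodes in subject or object position."""
    return len({t for s, _, o in data for t in (s, o) if is_blank(t)})


def interlinking_degree(data, local_ns):
    """Share of triples linking a local subject to an IRI outside local_ns."""
    if not data:
        return 0.0
    external = sum(
        1 for s, _, o in data
        if s.startswith(local_ns) and is_iri(o) and not o.startswith(local_ns)
    )
    return external / len(data)


print(blank_node_count(triples))
print(interlinking_degree(triples, LOCAL_NS))
```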
Phase II: Quality Assessment
Step IV: Advanced Analysis
➢ High-level metrics
➢ Example
  ○ Accuracy
  ○ Completeness
➢ Requires (i) input and (ii) target dataset
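One way to read the two-dataset requirement is that the target dataset acts as a reference against which the input is compared. The sketch below assumes exactly that; the slides do not spell out the comparison, so the set-overlap definitions and the toy facts are illustrative assumptions rather than the authors' metrics.

```python
def completeness(input_facts, target_facts):
    """Fraction of target (reference) facts that also appear in the input."""
    if not target_facts:
        return 1.0
    return len(input_facts & target_facts) / len(target_facts)


def accuracy(input_facts, target_facts):
    """Fraction of input facts that are confirmed by the target dataset."""
    if not input_facts:
        return 1.0
    return len(input_facts & target_facts) / len(input_facts)


# Hypothetical facts as (subject, property, value) tuples.
input_facts = {
    ("a", "population", 100),
    ("b", "population", 200),
    ("c", "population", 999),  # disagrees with the target
}
target_facts = {
    ("a", "population", 100),
    ("b", "population", 200),
    ("c", "population", 300),
    ("d", "population", 400),  # missing from the input
}

print(completeness(input_facts, target_facts))
print(accuracy(input_facts, target_facts))
```

Exact set overlap is the simplest possible comparison; a real assessment would likely need value normalization or fuzzy matching before comparing facts.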
Data Quality Score
➢ Ratio
  ○ DQscore = 1 - (V/T)
    ■ V - total no. of instances that violate a DQ rule
    ■ T - total no. of relevant instances
    ■ for each property
  ○ DQweightedscore = (DQscore * w_i / W)
    ■ w_i - weight
    ■ W - sum of all weighted factors of the properties
    ■ for quality of overall properties
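The two scores can be sketched in a few lines. The per-property score follows the formula above directly. For the weighted score, the sketch assumes the terms DQscore * w_i / W are summed over all properties and that W is the sum of the weights w_i; the slide shows a single term only, so the summation is an assumption. The property names and counts are made up.

```python
def dq_score(violations, total):
    """DQscore = 1 - (V / T) for a single property."""
    if total <= 0:
        raise ValueError("T (relevant instances) must be positive")
    return 1 - violations / total


def dq_weighted_score(per_property):
    """Assumed aggregate: sum of DQscore_i * w_i / W over all properties.

    per_property maps a property name to a tuple (V, T, w_i).
    W is taken to be the sum of all weights (an assumption).
    """
    total_weight = sum(w for _, _, w in per_property.values())
    if total_weight <= 0:
        raise ValueError("W (sum of weights) must be positive")
    return sum(
        dq_score(v, t) * w / total_weight for v, t, w in per_property.values()
    )


# Hypothetical per-property counts: (violations V, relevant instances T, weight w_i)
stats = {
    "population": (5, 100, 2.0),
    "birthDate": (20, 80, 1.0),
    "label": (0, 50, 1.0),
}

for name, (v, t, _) in stats.items():
    print(name, dq_score(v, t))
print(dq_weighted_score(stats))
```

With equal weights this aggregate reduces to the plain average of the per-property scores.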
Phase III: Quality Improvement
Step V: Root Cause Analysis
➢ Analyze the cause of each quality issue
➢ Helps the user interpret the results
➢ Detect whether the problem occurs in the original dataset
➢ If the original dataset is unavailable, analyze the available dataset to determine the cause
Phase III: Quality Improvement
Step VI: Fixing Quality Problems
➢ Semi-automatic
  ○ Consistency
  ○ Completeness
  ○ Syntactic validity
➢ Crowdsourcing*
  ○ Semantic accuracy
  ○ Datatypes
  ○ Interlinks
* Acosta et al., Crowdsourcing Linked Data Quality Assessment. ISWC 2013.
Conclusion and Future Work
➢ Assessment methodology - 3 phases, 6 steps
➢ Focus on use case
➢ Improvement phase
Future Work
➢ Application to an actual use case
➢ Build a tool
Questions, Suggestions, Comments
Thank you
@AnisaRula @amrapaliz