On Data quality

AMSTERDAM ON DATA QUALITY Marco Fossati [email protected] 30TH JANUARY 2014

Transcript of On Data quality

Page 1: On Data quality

AMSTERDAM

ON DATA QUALITY

Marco Fossati [email protected]

30TH JANUARY 2014

Page 2: On Data quality

MAPPING PARADIGM

Problem

A. TEMPLATE-DEPENDENT: ERROR-PRONE, HETEROGENEOUS USAGE

B. FULLY MANUAL: COSTLY

Solution

MACHINE LEARNING-BASED METHODS: TYPE INFERENCE, CONFIDENCE SCORE, MAPPING ASSISTANT
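As a rough illustration of how type inference could be paired with a confidence score for a mapping assistant, here is a minimal sketch. It is not the method from the talk: a simple majority vote stands in for a trained classifier, and the function name `infer_type`, the `threshold` parameter and the example labels are all hypothetical.

```python
from collections import Counter


def infer_type(votes, threshold=0.6):
    """Pick the most frequent candidate type and score it.

    `votes` is a list of candidate type labels for one entity (hypothetical
    input, e.g. one label per evidence source). The confidence is the share
    of votes that agree with the winning label. The suggestion is marked as
    accepted only if the confidence reaches `threshold` (an assumed cut-off).
    """
    if not votes:
        return None, 0.0, False
    label, count = Counter(votes).most_common(1)[0]
    confidence = count / len(votes)
    return label, confidence, confidence >= threshold


if __name__ == "__main__":
    # Three of four votes agree, so the suggestion passes the 0.6 cut-off.
    print(infer_type(["Person", "Person", "Place", "Person"]))
```

A mapping assistant built along these lines would show only the accepted suggestions to the editor and leave low-confidence cases for manual review.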

Page 3: On Data quality

ONTOLOGY

Problem

COMMUNITY-BASED SHALLOW SEMANTICS LACK OF COVERAGE UNBALANCED • TOO GENERIC • REDUNDANT

Solution

A. CONSISTENCY CHECK: CLASS USAGE

B. DATA-DRIVEN SCHEMA: WIKIPEDIA CATEGORIES
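One way to read a class-usage consistency check is to compare the classes an ontology declares with the types actually asserted on instances. The sketch below assumes that reading; the function `class_usage_report` and the sample class names are hypothetical and not taken from the talk.

```python
from collections import Counter


def class_usage_report(ontology_classes, instance_types):
    """Compare declared ontology classes with the types used by instances.

    `ontology_classes` is a set of declared class names and `instance_types`
    is a list with one asserted type per instance (both hypothetical inputs).
    Returns usage counts, declared classes that are never used, and used
    types that the ontology does not declare.
    """
    usage = Counter(instance_types)
    unused = sorted(c for c in ontology_classes if usage[c] == 0)
    undeclared = sorted(t for t in usage if t not in ontology_classes)
    return {"usage": dict(usage), "unused": unused, "undeclared": undeclared}


if __name__ == "__main__":
    classes = {"Person", "Place", "Band"}
    types = ["Person", "Person", "Place", "Artist"]
    print(class_usage_report(classes, types))
```

Classes that are never used hint at redundancy or overly fine distinctions, while undeclared types point to gaps in coverage.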

Page 4: On Data quality

LINKING MULTIMEDIA DATA SOURCES

PHOTO AUDIO VIDEO

Page 5: On Data quality

PHOTO

FLICKR WRAPPER

MAINTENANCE? UPDATES?

Page 6: On Data quality

AUDIO

BANDCAMP GROOVESHARK RDIO SOUNDCLOUD

Page 7: On Data quality

VIDEO

IMDB ROTTEN TOMATOES VIMEO YOUTUBE

Page 8: On Data quality

THANKS FOR YOUR ATTENTION!

Marco Fossati [email protected]