Dbpedia leipzig2014 csarasua_open

Data Interlinking together with Crowd Workers
Cristina Sarasua
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
2nd DBpedia Community Meeting, Leipzig



Image: http://www.w3.org/DesignIssues/diagrams/lod/597992118v2_350x350_Back.jpg


Scenario for data interlinking

Music data integration

What for?

• A: Extending the description of resources → enabling richer queries

d1:song1 owl:sameAs dbpedia:song1
d1:song1 o:wasPlayedIn dbpedia:Leipzig
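The "richer queries" point can be illustrated with a toy triple store. This is a minimal sketch in plain Python, not the speaker's system: the triples are illustrative assumptions built from the resource names on the slides, and `same_as_closure` and `describe` are hypothetical helper names. Once d1:song1 is linked to its DBpedia counterpart, a lookup on d1:song1 can also return facts stated only on the DBpedia side.

```python
# Toy triple store: each triple is (subject, predicate, object).
# The data is an illustrative assumption, not taken from real data sets.
TRIPLES = {
    ("d1:song1", "ma:title", "UFO"),
    ("d1:song1", "owl:sameAs", "dbpedia:song1"),
    ("d1:song1", "o:wasPlayedIn", "dbpedia:Leipzig"),
    ("dbpedia:song1", "prop:artist", "dbpedia:Coldplay"),
}


def same_as_closure(resource, triples):
    """Return the resource plus everything reachable through owl:sameAs."""
    seen = {resource}
    frontier = [resource]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != "owl:sameAs":
                continue
            # owl:sameAs is symmetric, so follow it in both directions.
            for here, there in ((s, o), (o, s)):
                if here == current and there not in seen:
                    seen.add(there)
                    frontier.append(there)
    return seen


def describe(resource, triples):
    """Collect (predicate, object) pairs of the resource and its equivalents."""
    equivalents = same_as_closure(resource, triples)
    return {(p, o) for s, p, o in triples
            if s in equivalents and p != "owl:sameAs"}


if __name__ == "__main__":
    for predicate, obj in sorted(describe("d1:song1", TRIPLES)):
        print(predicate, obj)
```

Without the owl:sameAs triple, the same lookup would return only the two facts stated in D1.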


The Problem

D1

d1:song1 a ma:AudioTrack ;
    ma:title "UFO" ;
    ma:locator musicexample:s1896.mp3^^xsd:anyURI ;
    ma:hasKeyword d1:colplay ;

DBpedia

dbpedia:U.F.O._(song) a dbpedia-owl:Work ;
    a dbpedia-owl:Song ;
    dc:title "U.F.O." ;
    prop:artist dbpedia:Coldplay ;

dbpedia:UFO_(band) a dbpedia-owl:Band ;
    a dbpedia-owl:Song ;
    dc:title "U.F.O." ;

owl:sameAs ?

Automatic

• Goal: typed link to create (e.g. owl:sameAs)
• Information to analyse (i.e. attribute values)
• Decision criterion (e.g. Levenshtein < 2)
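The decision criterion named on this slide can be sketched as follows. This is a minimal illustration, assuming the rule compares only the two title strings; the function names and the default threshold parameter are hypothetical, and real interlinking tools combine several attributes and metrics.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def propose_same_as(title_a: str, title_b: str, threshold: int = 2) -> bool:
    """Propose an owl:sameAs link when the titles are closer than the threshold."""
    return levenshtein(title_a, title_b) < threshold


if __name__ == "__main__":
    print(levenshtein("UFO", "U.F.O."))      # 3
    print(propose_same_as("UFO", "U.F.O."))  # False
    print(propose_same_as("Soon", "Soon"))   # True
```

With the titles used in the slides, "UFO" and "U.F.O." are three edits apart, so a rule with threshold 2 would not propose the link, while two different works that are both titled "Soon" would pass it.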

Human to guide the process

D1

d1:song1 a ma:AudioTrack ;
    ma:title "UFO" ;
    ma:locator musicexample:s1896.mp3^^xsd:anyURI ;
    ma:hasKeyword d1:colplay ;

DBpedia

dbpedia:U.F.O._(song) a dbpedia-owl:Work ;
    a dbpedia-owl:Song ;
    dc:title "U.F.O." ;
    prop:artist dbpedia:Coldplay ;

dbpedia:UFO_(band) a dbpedia-owl:Band ;
    prop:name "U.F.O." ;

owl:sameAs ?

Human to correct

D1

d1:song1 a ma:AudioTrack ;
    ma:title "Soon" ;
    ma:locator musicexample:s1896.mp3^^xsd:anyURI ;

DBpedia

dbpedia:Transatlantic_KK a dbpedia-owl:Work ;
    a dbpedia-owl:Album ;
    dc:title "Soon" ;
    dbprop:artist dbpedia:Delorean_(band) ;

dbpedia:Soon_(Tanya_Tucker_song) a dbpedia-owl:Work ;
    a dbpedia-owl:MusicalWork ;
    dc:title "Soon" ;
    dbprop:artist dbpedia:Tanya_Tucker ;

owl:sameAs ?

Human to create new links

D1

d1:song1 a ma:AudioTrack ;
    ma:title "UFO" ;
    ma:locator musicexample:s1896.mp3^^xsd:anyURI ;
    ma:hasKeyword d1:colplay ;

DBpedia

dbpedia:Leipzig a dbpedia-owl:Place ;
    rdfs:label "Leipzig" ;

o:wasPlayedIn ?

Human

• Creative and proactive
• Listen / watch / search
• Process / associate / more complicated conclusions


The Approach

Crowd-powered data interlinking

• Building a system that
  – Combines algorithmic and human computation
  – Systematically involves humans via microtasks
  – Considers the aforementioned types of links
  – Schema- and instance-level links

Automatic interlinking
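One way to picture the combination of algorithmic and human computation is sketched below. It is an assumption for illustration only, not the architecture presented in the talk: the confidence thresholds, the function names and the "yes"/"no" answer format are all hypothetical. Links the algorithm is sure about are decided automatically, uncertain ones become microtasks, and several worker answers per microtask are aggregated by majority vote.

```python
from collections import Counter


def route_candidates(candidates, accept_at=0.9, reject_at=0.3):
    """Split (link, score) pairs into accepted, rejected and crowd buckets.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    accepted, rejected, for_crowd = [], [], []
    for link, score in candidates:
        if score >= accept_at:
            accepted.append(link)
        elif score <= reject_at:
            rejected.append(link)
        else:
            for_crowd.append(link)  # uncertain: ask crowd workers
    return accepted, rejected, for_crowd


def majority_vote(answers):
    """Return the most frequent answer, or None for a tie or no answers."""
    counts = Counter(answers).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


if __name__ == "__main__":
    scored = [("link-a", 0.95), ("link-b", 0.10), ("link-c", 0.60)]
    print(route_candidates(scored))
    print(majority_vote(["yes", "yes", "no"]))
```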

Overview

It worked! Quick, inexpensive.
See CrowdMAP [Sarasua et al., 2012]


A microtask


Challenge #1: It has to work with ANYONE

Challenge #2: We still want a data-independent solution
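To make the idea of a data-independent microtask concrete, the sketch below renders a validation question from any two resource descriptions without knowing their vocabulary in advance. The dictionary layout, the function name `build_microtask` and the answer options are assumptions made for this example; the microtask shown in the talk may look different.

```python
def build_microtask(source, target, link_type="owl:sameAs"):
    """Render a validation microtask from two generic resource descriptions.

    Each description is a dict of the form
    {"uri": str, "properties": {predicate: value}} (an assumed layout).
    """
    def render(resource):
        lines = [resource["uri"]]
        # Show every attribute-value pair, whatever vocabulary it comes from.
        for predicate, value in sorted(resource["properties"].items()):
            lines.append(f"  {predicate}: {value}")
        return "\n".join(lines)

    return {
        "question": f"Should these two resources be connected with {link_type}?",
        "source": render(source),
        "target": render(target),
        "options": ["Yes", "No", "I don't know"],
    }


if __name__ == "__main__":
    task = build_microtask(
        {"uri": "d1:song1", "properties": {"ma:title": "UFO"}},
        {"uri": "dbpedia:U.F.O._(song)", "properties": {"dc:title": "U.F.O."}},
    )
    print(task["question"])
    print(task["source"])
    print(task["target"])
```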

Ongoing work

How to improve?

Picture: Icon made by Freepik from http://www.flaticon.com

How to optimize the process?

Crowdsourcing approaches
• Additional incentives to make workers process more links, faster (e.g. display #links left)
• Let them explain to others: write the argument for the decision
• Show similar link: decide by comparison

Challenge #3: How to decide what is an analogous link here? (danger of bias?)

predicate / rdf:type / false positive or negative
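Challenge #3 is posed as an open question, so the following is only one naive possibility and not an answer given in the talk: treat an already decided link as analogous when it shares the candidate's predicate and the rdf:type of both ends, and show one positive and one negative example to limit the risk of biasing the worker. The field names and both function names are hypothetical.

```python
def find_analogous(candidate, solved_links):
    """Return solved links with the same predicate and the same rdf:types."""
    return [
        link for link in solved_links
        if link["predicate"] == candidate["predicate"]
        and link["source_type"] == candidate["source_type"]
        and link["target_type"] == candidate["target_type"]
    ]


def pick_balanced_examples(analogous):
    """Pick the first correct and the first incorrect analogous link, if any."""
    positive = next((link for link in analogous if link["is_correct"]), None)
    negative = next((link for link in analogous if not link["is_correct"]), None)
    return positive, negative


if __name__ == "__main__":
    solved = [
        {"id": "s1", "predicate": "owl:sameAs", "source_type": "ma:AudioTrack",
         "target_type": "dbpedia-owl:Song", "is_correct": True},
        {"id": "s2", "predicate": "owl:sameAs", "source_type": "ma:AudioTrack",
         "target_type": "dbpedia-owl:Song", "is_correct": False},
        {"id": "s3", "predicate": "owl:sameAs", "source_type": "ma:AudioTrack",
         "target_type": "dbpedia-owl:Band", "is_correct": False},
    ]
    candidate = {"predicate": "owl:sameAs", "source_type": "ma:AudioTrack",
                 "target_type": "dbpedia-owl:Song"}
    print(pick_balanced_examples(find_analogous(candidate, solved)))
```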

How to optimize the process?

Data-oriented approaches
• Test and instructing links: targeted selection
• Scheduled sequences of links to process: to make more sense

• Validate vs. identify microtasks

Challenge #4: How to build that programmatically?

data analysis / data + crowd / data + expert

Difficult case, rare
Easy case, common
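As a rough illustration of what building such a sequence programmatically could mean, the sketch below orders links from easy to difficult and interleaves test links at a fixed interval. Every part of it is an assumption: the talk does not say that easy links should come first, how difficulty is estimated (here it is simply a given number), or how often test links should appear. The function name and the field names are hypothetical.

```python
def schedule(links, test_links, every=3):
    """Order links by assumed difficulty and insert a test link periodically.

    links:      list of {"id": str, "difficulty": float}, lower means easier
    test_links: list of {"id": str} with known answers
    every:      insert one test link after this many regular links
    """
    ordered = sorted(links, key=lambda link: link["difficulty"])
    tests = iter(test_links)
    sequence = []
    for position, link in enumerate(ordered, start=1):
        sequence.append(link["id"])
        if position % every == 0:
            test = next(tests, None)  # stop inserting once tests run out
            if test is not None:
                sequence.append(test["id"])
    return sequence


if __name__ == "__main__":
    links = [
        {"id": "a", "difficulty": 0.9},
        {"id": "b", "difficulty": 0.1},
        {"id": "c", "difficulty": 0.5},
        {"id": "d", "difficulty": 0.2},
    ]
    print(schedule(links, [{"id": "t1"}], every=2))
```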


How to optimize the process? (II)

Challenge #5: How to predict how suitable a worker will be for processing a particular link?

Which features of links influence the prediction?

Previous cross-platform experience (CrowdWorkCV)
See also [Sarasua et al., 2013]

Ranking a list of suitable links based on training links
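The ranking idea can be sketched under strong simplifying assumptions: each link is described by a single feature (here the link predicate), and a worker's suitability for a link is estimated as that worker's accuracy on training links with the same feature. This is not the prediction method from the talk or from CrowdWorkCV; the function names, the single-feature model and the default score for unseen features are hypothetical.

```python
from collections import defaultdict


def accuracy_by_feature(training_results):
    """Compute one worker's accuracy per feature.

    training_results: iterable of (feature, was_correct) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # feature -> [correct, attempted]
    for feature, was_correct in training_results:
        totals[feature][1] += 1
        if was_correct:
            totals[feature][0] += 1
    return {feature: correct / attempted
            for feature, (correct, attempted) in totals.items()}


def rank_links(links, accuracy, default=0.5):
    """Sort links so those with the best estimated accuracy come first.

    Features never seen in training get the assumed default score.
    """
    return sorted(links,
                  key=lambda link: accuracy.get(link["feature"], default),
                  reverse=True)


if __name__ == "__main__":
    training = [("owl:sameAs", True), ("owl:sameAs", True),
                ("o:wasPlayedIn", False), ("o:wasPlayedIn", True)]
    links = [{"id": "l1", "feature": "o:wasPlayedIn"},
             {"id": "l2", "feature": "owl:sameAs"}]
    print(rank_links(links, accuracy_by_feature(training)))
```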

Challenge #6: How should we assess a priori if (and approximately to what extent) we need crowdsourcing for a particular pair of data sets?


Closing

Take-away messages

• Yes, microtask crowdsourcing allows you to involve humans for processing lots of data; it is cost-effective and fast
• Research shows it is a feasible complement to data interlinking algorithms
• BUT do not underestimate the management of microtasks

Coming soon… http://github.com/criscod

Open question: wouldn't crowd-powered data interlinking enrich this table? [Schmachtenberg et al., 2014]

Thank you for your attention!

Contact: Cristina Sarasua, Institute for Web Science and Technologies, Universität Koblenz-Landau, [email protected]

References

• Sarasua, C.: Crowdsourced Interlinking on the Web of Data. In: 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW), Doctoral Symposium. (2012)

• Sarasua, C., Simperl, E., Noy, N.F.: CrowdMAP: Crowdsourcing ontology alignment with microtasks. In: Proceedings of the 11th International Semantic Web Conference (ISWC). (2012)

• Sarasua, C., Thimm, M.: Microtask available, send us your CV! In: Proceedings of the International Workshop on Crowd Work and Human Computation (CrowdWork 2013). (2013)

• Schmachtenberg, M., Bizer, C., Paulheim, H.: Adoption of the Linked Data Best Practices in Different Topical Domains. In: 13th International Semantic Web Conference (ISWC 2014), RDB Track, Riva del Garda, Italy. (2014)