Microtask Crowdsourcing Applications for Linked Data
Architecture of Linked Data Applications
[Figure: three-tier architecture (Data Tier, Logic Tier, Presentation Tier). Data sources: SPARQL endpoints, Web data accessed via APIs, RDF/XML, Linked Data, relational data. Data Access Component: SPARQL wrapper, R2R transformation, LD wrapper, physical wrapper. Data Integration Component: vocabulary mapping, interlinking, cleansing, producing an integrated dataset. Republication Component.]
EUCLID – Microtask crowdsourcing applications for Linked Data
CH 2
Data Integration Component
• Consolidates the data retrieved from heterogeneous sources.
• This component may operate at the:
– Schema level: performs vocabulary mappings to translate data into a single unified schema. Mapping links correspond to RDFS properties (e.g., rdfs:subPropertyOf) or OWL property and class axioms (e.g., owl:equivalentProperty, owl:equivalentClass).
– Instance level: performs entity linking, e.g., entity resolution via owl:sameAs links.
CH 3
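The two levels can be illustrated with a minimal Python sketch. It assumes triples are plain string tuples, uses made-up ex1:/ex2: prefixes and a hypothetical property mapping, and merges owl:sameAs clusters with union-find; for brevity only subjects are rewritten to their cluster representative.

```python
# Minimal sketch: schema-level mapping plus instance-level owl:sameAs merging.
# The ex1:/ex2: data and PROPERTY_MAP are hypothetical illustrations.

TRIPLES = [
    ("ex1:Berlin", "ex1:population", "3500000"),
    ("ex2:Berlin_City", "ex2:inhabitants", "3500000"),
    ("ex1:Berlin", "owl:sameAs", "ex2:Berlin_City"),
]

# Schema level: assumed mapping, e.g. derived from an owl:equivalentProperty axiom.
PROPERTY_MAP = {"ex2:inhabitants": "ex1:population"}


def find(parent, x):
    """Return the representative of x's owl:sameAs cluster (union-find, path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def integrate(triples, property_map):
    """Translate properties into one schema and collapse owl:sameAs subjects."""
    parent = {}
    # Instance level: union entities that are connected by owl:sameAs.
    for s, p, o in triples:
        if p == "owl:sameAs":
            root_s, root_o = find(parent, s), find(parent, o)
            if root_s != root_o:
                parent[root_o] = root_s
    merged = set()
    for s, p, o in triples:
        if p == "owl:sameAs":
            continue
        unified_p = property_map.get(p, p)  # schema-level translation
        merged.add((find(parent, s), unified_p, o))  # subjects only, for brevity
    return merged


print(integrate(TRIPLES, PROPERTY_MAP))
```

Both source triples collapse into a single statement, {('ex1:Berlin', 'ex1:population', '3500000')}, once the property is mapped and the two Berlin resources are merged.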
Data Integration Component
The data integration component can be enhanced by including microtask crowdsourcing approaches:
• Cleansing or data quality assessment: assessment of DBpedia triples
• Vocabulary mapping: CrowdMAP
• Interlinking: ZenCrowd
Other Crowdsourcing-based Solutions for Linked Data Tasks
• Query understanding: CrowdQ
• Ontology population: OntoGame
• Linked Data curation: Urbanopoly
• …
DBPEDIA QUALITY ASSESSMENT
Assessing DBpedia Triples
1. Selecting LD quality issues generated by erroneous extraction mechanisms and that can be detected by the crowd
2. Selecting the appropriate crowdsourcing approaches
3. Designing and generating the interfaces to present the data to the crowd
[Figure: each triple {s p o .} from the dataset is presented to the crowd and labelled Correct, or Incorrect + quality issue.]
Selecting LD Quality Issues to Crowdsource
Three categories of quality problems occur pervasively in DBpedia [Zaveri2013] and can be crowdsourced:
• Incorrect object. Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3".
• Incorrect data type or language tag. Example: dbpedia:Torishima_Izu_Islands foaf:name "鳥島"@en.
• Incorrect link to "external Web pages". Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/>.
Selecting Appropriate Crowdsourcing Approaches
Find-Verify workflow (adapted from [Bernstein2010]):
• Find: contest among LD experts; difficult task; final prize. Tool: TripleCheckMate [Kontokostas2013]
• Verify: microtasks for workers; easy task; micropayments. Platform: MTurk
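The hand-off between the two stages can be sketched as follows. This is a hypothetical illustration, not the study's actual task design: triples flagged by experts in the Find stage are grouped by issue type and packed into Verify-stage HITs, each answered by several workers. The `per_hit` and `assignments` defaults are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    """A triple flagged in the Find stage together with the suspected quality issue."""
    subject: str
    predicate: str
    obj: str
    issue: str  # e.g. "incorrect object", "incorrect data type", "incorrect link"


def make_hits(flags, per_hit=3, assignments=5):
    """Turn Find-stage flags into Verify-stage microtasks (HITs).

    Flags are grouped by issue type, since each issue gets its own interface,
    and chunked into HITs of `per_hit` questions. Each HIT is answered by
    `assignments` workers so that the answers can be aggregated afterwards.
    Both defaults are illustrative, not values from the study.
    """
    if per_hit < 1 or assignments < 1:
        raise ValueError("per_hit and assignments must be positive")
    by_issue = defaultdict(list)
    for flag in flags:
        by_issue[flag.issue].append(flag)
    hits = []
    for issue, items in by_issue.items():
        for start in range(0, len(items), per_hit):
            hits.append({"issue": issue,
                         "questions": items[start:start + per_hit],
                         "assignments": assignments})
    return hits


flags = [Flag("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", '"3"', "incorrect object")]
flags += [Flag(f"ex:s{i}", "ex:p", "ex:o", "incorrect object") for i in range(3)]
flags.append(Flag("dbpedia:John-Two-Hawks", "dbpedia-owl:wikiPageExternalLink",
                  "<http://cedarlakedvd.com/>", "incorrect link"))
hits = make_hits(flags)
print([(h["issue"], len(h["questions"])) for h in hits])
```

The ex:s0…ex:s2 triples are placeholders; with the defaults the five flags become two "incorrect object" HITs (3 and 1 questions) and one "incorrect link" HIT.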
Presenting the Data to the Crowd
• Selection of foaf:name or rdfs:label to extract human-readable descriptions
• Real object values extracted automatically from Wikipedia infoboxes
• Link to the Wikipedia article via foaf:isPrimaryTopicOf
• Preview of external pages by embedding them in an HTML iframe
[Figure: MTurk microtask interfaces for the three issue types: incorrect object, incorrect data type, incorrect outlink.]
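A minimal sketch of how such a microtask could be rendered, assuming entity descriptions have already been fetched into a dictionary. The question wording, the `DATA` contents (including the "date of birth" label) and the iframe size are hypothetical; only the choice of foaf:name, rdfs:label and foaf:isPrimaryTopicOf follows the slide.

```python
from html import escape


def describe(term, data):
    """Human-readable name for a term: foaf:name, then rdfs:label, else the term itself."""
    props = data.get(term, {})
    return props.get("foaf:name") or props.get("rdfs:label") or term


def object_question(triple, data):
    """Question text for the 'incorrect object' microtask."""
    s, p, o = triple
    page = data.get(s, {}).get("foaf:isPrimaryTopicOf", "(no Wikipedia link)")
    return (f'Is "{describe(o, data)}" the correct value of '
            f'"{describe(p, data)}" for {describe(s, data)}? '
            f"Compare with Wikipedia: {page}")


def outlink_preview(url, width=600, height=400):
    """HTML iframe that previews an external page inside the microtask."""
    return f'<iframe src="{escape(url, quote=True)}" width="{width}" height="{height}"></iframe>'


# Hypothetical descriptions, as they might be fetched from DBpedia.
DATA = {
    "dbpedia:Dave_Dobbyn": {
        "foaf:name": "Dave Dobbyn",
        "foaf:isPrimaryTopicOf": "http://en.wikipedia.org/wiki/Dave_Dobbyn",
    },
    "dbprop:dateOfBirth": {"rdfs:label": "date of birth"},
}
print(object_question(("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", "3"), DATA))
print(outlink_preview("http://cedarlakedvd.com/"))
```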
Results
                           Object values   Data types   Interlinks
Linked Data experts        0.7151          0.8270       0.1525
MTurk (majority voting)    0.8977          0.4752       0.9412
• Both forms of crowdsourcing can be applied to detect certain LD quality issues
• The effort of LD experts should be reserved for tasks that demand domain-specific skills
• The MTurk crowd is exceptionally good at comparing data entries
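The MTurk figures rely on majority voting over the answers of several workers per triple. A minimal sketch, assuming each worker answers "correct" or "incorrect"; leaving ties undecided is a hypothetical policy, since the slides do not say how ties were handled.

```python
from collections import Counter


def majority_vote(answers):
    """Most frequent answer for one triple; None if empty or tied for first place."""
    if not answers:
        return None
    ranked = Counter(answers).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the triple undecided
    return ranked[0][0]


def aggregate(judgements):
    """Map each triple id to its majority answer."""
    return {triple_id: majority_vote(answers) for triple_id, answers in judgements.items()}


# Hypothetical worker answers for two triples.
judgements = {
    "t1": ["incorrect", "incorrect", "correct", "incorrect", "correct"],
    "t2": ["correct", "incorrect"],
}
print(aggregate(judgements))
```

Using an odd number of workers per triple, as for t1, avoids binary ties altogether.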
ZENCROWD
ZenCrowd: Entity Linking by the Crowd
• Combine both algorithmic and manual linking
• Automate manual linking via crowdsourcing
• Dynamically assess human workers with a probabilistic reasoning framework
[Figure: Crowd combined with Algorithms/Machines]
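One simple way to combine algorithmic and manual linking is to let an automatic linker decide the clear-cut cases and send only uncertain candidates to the crowd. The sketch below is a generic hybrid routing policy under that assumption, not ZenCrowd's actual decision logic; the thresholds and confidence scores are made up.

```python
def route(candidates, accept_at=0.9, reject_below=0.2):
    """Split (link, score) pairs into auto-accepted, auto-rejected and crowd-bound links.

    Scores come from an automatic linker; both thresholds are hypothetical.
    Only the uncertain middle band is sent to the crowd.
    """
    if not 0.0 <= reject_below <= accept_at <= 1.0:
        raise ValueError("expected 0 <= reject_below <= accept_at <= 1")
    accepted, rejected, to_crowd = [], [], []
    for link, score in candidates:
        if score >= accept_at:
            accepted.append(link)
        elif score < reject_below:
            rejected.append(link)
        else:
            to_crowd.append(link)
    return accepted, rejected, to_crowd


# Hypothetical automatic confidence scores.
candidates = [
    ("http://dbpedia.org/resource/Instagram", 0.95),
    ("http://dbpedia.org/resource/Facebook", 0.60),
    ("http://dbpedia.org/resource/Android", 0.10),
]
accepted, rejected, to_crowd = route(candidates)
print(to_crowd)
```

Narrowing the middle band reduces crowdsourcing cost but leaves more decisions to the automatic linker.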
[Figure: entity links http://dbpedia.org/resource/Facebook and http://dbpedia.org/resource/Instagram; Instagram is linked via owl:sameAs to fbase:Instagram; also shown: Android.]
HTML:
<p>Facebook is not waiting for its initial public offering to make its first big purchase.</p><p>In its largest acquisition to date, the social network has purchased Instagram, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</p>
RDFa enrichment:
<p><span about="http://dbpedia.org/resource/Facebook"><cite property="rdfs:label">Facebook</cite> is not waiting for its initial public offering to make its first big purchase.</span></p><p><span about="http://dbpedia.org/resource/Instagram">In its largest acquisition to date, the social network has purchased <cite property="rdfs:label">Instagram</cite>, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</span></p>
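The enrichment step above can be sketched as a small function that wraps a paragraph in an RDFa span for the linked entity and marks the entity's first mention with rdfs:label. This is a minimal illustration assuming one entity per paragraph and an exact-string mention; the function name is hypothetical.

```python
from html import escape


def rdfa_paragraph(text, uri, label):
    """Wrap a paragraph in an RDFa span about `uri` and mark the first mention of `label`."""
    safe_text = escape(text, quote=False)
    safe_label = escape(label, quote=False)
    if safe_label not in safe_text:
        raise ValueError(f"label {label!r} not found in paragraph")
    cited = f'<cite property="rdfs:label">{safe_label}</cite>'
    marked = safe_text.replace(safe_label, cited, 1)  # only the first mention
    return f'<p><span about="{escape(uri, quote=True)}">{marked}</span></p>'


print(rdfa_paragraph(
    "Facebook is not waiting for its initial public offering to make its first big purchase.",
    "http://dbpedia.org/resource/Facebook",
    "Facebook",
))
```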
ZenCrowd Architecture
Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on World Wide Web (WWW 2012).
Entity Factor Graphs
• Graph components
– Workers, links, clicks
– Prior probabilities
– Link factors
– Constraints
• Probabilistic inference
– Select all links with posterior probability > τ
[Figure: example factor graph with 2 workers, 6 clicks and 3 candidate links, showing link priors, worker priors, observed variables, link factors, SameAs constraints and dataset unicity constraints.]
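The selection rule can be illustrated with a deliberately simplified model. This is not ZenCrowd's factor-graph inference: it ignores the SameAs and unicity constraints and treats workers as conditionally independent given each link's truth value, which gives a closed-form posterior. The worker reliabilities, the 0.5 link prior and τ = 0.7 are hypothetical.

```python
def link_posterior(prior, clicks, reliability):
    """Posterior probability that a candidate link is correct.

    prior: prior probability of the link (e.g. from an automatic linker).
    clicks: dict worker -> True if the worker selected the link, else False.
    reliability: dict worker -> probability that the worker answers correctly.
    Simplification: workers are independent given the link's truth value.
    """
    p_true, p_false = prior, 1.0 - prior
    for worker, clicked in clicks.items():
        r = reliability[worker]
        p_true *= r if clicked else 1.0 - r
        p_false *= 1.0 - r if clicked else r
    total = p_true + p_false
    return p_true / total if total else prior


def select_links(posteriors, tau):
    """Keep the links whose posterior probability exceeds the threshold tau."""
    return [link for link, post in posteriors.items() if post > tau]


# 2 workers x 3 candidate links = 6 clicks, as in the figure (values hypothetical).
reliability = {"w1": 0.8, "w2": 0.8}
clicks = {
    "link_a": {"w1": True, "w2": True},
    "link_b": {"w1": True, "w2": False},
    "link_c": {"w1": False, "w2": False},
}
posteriors = {link: link_posterior(0.5, c, reliability) for link, c in clicks.items()}
print(select_links(posteriors, tau=0.7))
```

With these numbers link_a reaches about 0.94, link_b stays at 0.5 because the two equally reliable workers disagree, and link_c drops to about 0.06, so only link_a is selected.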
Lessons Learnt
• Crowdsourcing + probabilistic reasoning works!
• But
– Different worker communities perform differently
– Many low-quality workers
– Completion time may vary (based on reward)
• Need to find the right workers for your task (see WWW13 paper)
ZenCrowd Summary
• ZenCrowd: Probabilistic reasoning over automatic and crowdsourcing methods for entity linking
• Standard crowdsourcing improves 6% over automatic approaches
• 4% to 35% improvement over standard crowdsourcing
• 14% average improvement over automatic approaches
• Follow-up work (VLDBJ):
– Also used for instance matching across datasets
– 3-way blocking with the crowd
http://exascale.info/zencrowd/
CROWDQ – CROWD-POWERED QUERY UNDERSTANDING
Motivation
• Web Search Engines can answer simple factual queries directly on the result page
• Users with complex information needs are often unsatisfied
• Purely automatic techniques are not enough
• We want to solve it with Crowdsourcing!
CrowdQ
• CrowdQ is the first system that uses crowdsourcing to:
– Understand the intended meaning
– Build a structured query template
– Answer the query over Linked Open Data
Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ: Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013).
CrowdQ Architecture
• Off-line: query template generation with the help of the crowd
• On-line: query template matching using NLP and search over open data
Hybrid Human-Machine Pipeline
Q = birthdate of actors of forrest gump
• Query annotation: Noun, Noun, Named entity
• Verification: Is forrest gump this entity in the query?
• Entity relations: What is the relation between actors and forrest gump? (answer: starring)
• Schema element: starring mapped to dbpedia-owl:starring
• Verification: Is the relation between Indiana Jones – Harrison Ford and Back to the Future – Michael J. Fox of the same type as Forrest Gump – actors?
Structured query generation
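PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?y ?x
WHERE {
  ?y dbpedia-owl:birthDate ?x .
  ?z dbpedia-owl:starring ?y .
  ?z rdfs:label "Forrest Gump" .
}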
Results from BTC09:
[Figure: Q = birthdate of actors of forrest gump, with "forrest gump" annotated as MOVIE.]
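On-line template matching can be sketched as follows, assuming the off-line phase has produced one template in which the movie title is a MOVIE slot. The regular expression, the SPARQL text and the title-casing used to normalise the label are illustrative assumptions, not CrowdQ's implementation. Note that `str.title` is naive and would, for example, turn "back to the future" into "Back To The Future".

```python
import re

# Hypothetical template produced off-line: "birthdate of actors of <MOVIE>".
TEMPLATE_PATTERN = re.compile(r"^birthdate of actors of (?P<MOVIE>.+)$", re.IGNORECASE)

# SPARQL translation of the template; {movie} is filled in on-line.
SPARQL_TEMPLATE = """PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?y ?x
WHERE {{
  ?y dbpedia-owl:birthDate ?x .
  ?z dbpedia-owl:starring ?y .
  ?z rdfs:label "{movie}" .
}}"""


def sparql_literal(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def match_query(query):
    """Return a SPARQL query if the keyword query fits the template, else None."""
    m = TEMPLATE_PATTERN.match(query.strip())
    if m is None:
        return None
    movie = m.group("MOVIE").title()  # naive label normalisation
    return SPARQL_TEMPLATE.format(movie=sparql_literal(movie))


print(match_query("birthdate of actors of forrest gump"))
```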
CROWDMAP & OTHERS
CrowdMAP
• Experiments using MTurk, CrowdFlower and established benchmarks
• Enhancing the results of automatic techniques
• Fast, accurate, cost-effective [Sarasua, Simperl, Noy, ISWC2012]
Benchmark                 Precision   Recall
CartP 301-304             0.53        1.0
100R50P Edas-Iasted       0.8         0.42
100R50P Ekaw-Iasted       1.0         0.7
100R50P Cmt-Ekaw          1.0         0.75
100R50P ConfOf-Ekaw       0.93        0.65
Imp 301-304               0.73        1.0
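Precision and recall here compare the mappings retained after crowd validation against the benchmark's reference alignment. A minimal sketch, assuming correspondences are (class, class) pairs and ignoring relation types and confidence values; the cmt:/ekaw: class names below are made up for illustration.

```python
def precision_recall(found, reference):
    """Precision and recall of a set of correspondences against a reference alignment."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference alignment and crowd-validated mappings.
reference = {("cmt:Paper", "ekaw:Paper"),
             ("cmt:Author", "ekaw:Paper_Author"),
             ("cmt:Conference", "ekaw:Conference")}
found = {("cmt:Paper", "ekaw:Paper"), ("cmt:Conference", "ekaw:Conference")}
print(precision_recall(found, reference))
```

Here every validated mapping is in the reference (precision 1.0) but one reference correspondence is missed (recall 2/3).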
10.04.2023
Taste IT! Try IT!
• Restaurant review Android app developed in the Insemtives project
• Uses DBpedia concepts to generate structured reviews
• Uses mechanism design/gamification to configure incentives
• User study
– 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts
https://play.google.com/store/apps/details?id=insemtives.android&hl=en
[Chart: number of reviews, number of semantic annotations (type of cuisine) and number of semantic annotations (dishes) per venue type: cafe, fast food, pub, restaurant; y-axis from 0 to 2500.]
LODrefine
http://research.zemanta.com/crowds-to-the-rescue/
Ontology Population
Linked Data Curation
Problems and Challenges
• What is feasible, and how can tasks be optimally translated into microtasks?
– Examples: data quality assessment for technical and contextual features; subjective vs. objective tasks (also in modeling); open-ended questions
• What to show to users
– Natural language descriptions of Linked Data/SPARQL
– How much context
– What form of rendering
– How about links?
• How to combine with automatic tools
– Which results to validate
– Low precision (no fun for gamers...)
– Low recall (vs. all possible questions)
• How to embed it into an existing application
– Tasks are fine-grained and perceived as an additional burden on top of the actual functionality
• What to do with the resulting data?
– Integration into existing practices
– Vocabularies!
Web site: https://sites.google.com/site/microtasktutorial/
SLIDES and EXERCISES: https://github.com/maribelacosta/crowdsourcing-tutorial
Full-day tutorial, ISWC 2013, Sydney, Australia
For exercises, quiz and further material visit our website:
http://www.euclid-project.eu
Other channels: @euclid_project, euclidproject, euclidproject, eBook, Course