Crowdsourcing for research libraries


Description

Invited talk at LIBER2014

Transcript of "Crowdsourcing for research libraries"

Page 1: Crowdsourcing for research libraries

CROWDSOURCING CONTENT MANAGEMENT: CHALLENGES AND OPPORTUNITIES

ELENA SIMPERL
UNIVERSITY OF SOUTHAMPTON

LIBER2014, 03-Jul-14

Page 2: Crowdsourcing for research libraries

EXECUTIVE SUMMARY

Crowdsourcing helps with content management tasks. However:

• there is crowdsourcing and crowdsourcing: pick your faves and mix them
• human intelligence is a valuable resource: experiment design is key
• sustaining engagement is an art: crowdsourcing analytics may help
• computers are sometimes better than humans: the age of ‘social machines’


Page 3: Crowdsourcing for research libraries

CROWDSOURCING: PROBLEM SOLVING VIA OPEN CALLS

"Simply defined, crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call. This can take the form of peer-production (when the job is performed collaboratively), but is also often undertaken by sole individuals. The crucial prerequisite is the use of the open call format and the large network of potential laborers."

[Howe, 2006]


Page 4: Crowdsourcing for research libraries

THE MANY FACES OF CROWDSOURCING


Page 5: Crowdsourcing for research libraries

CROWDSOURCING AND RESEARCH LIBRARIES

CHALLENGES
• Understand what drives participation
• Design systems to reach critical mass and sustain engagement

OPPORTUNITIES
• Better ‘customer’ experience
• Enhanced information management
• Capitalize on crowdsourced scientific workflows


Page 6: Crowdsourcing for research libraries


IN THIS TALK: CROWDSOURCING AS ‘HUMAN COMPUTATION’

Outsourcing tasks that machines find difficult to solve to humans

See also Tutorial@ISWC2013


Page 7: Crowdsourcing for research libraries

IN THIS TALK: CROWDSOURCING DATA CITATION

‘The USEWOD experiment’

• Goal: collect information about the usage of Linked Data sets in research papers

• Explore different crowdsourcing methods

• Online tool to link publications to data sets (and their versions)

• 1st feasibility study with 10 researchers in May 2014


http://prov.usewod.org/

9650 publications
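To make the setup concrete, here is a minimal sketch of the kind of annotation record such a linking tool might collect for each publication. The class and field names are illustrative assumptions, not the actual USEWOD tool schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative record types; the real USEWOD tool may model this differently.
@dataclass
class DatasetReference:
    dataset: str                   # e.g. "DBpedia"
    version: Optional[str] = None  # e.g. "3.8"; None if the paper does not state it

@dataclass
class PublicationAnnotation:
    paper_id: str                  # identifier of the publication (e.g. a DOI)
    annotator: str                 # who supplied the annotation
    datasets: List[DatasetReference] = field(default_factory=list)

# Example: one researcher links a paper to a specific data set release.
annotation = PublicationAnnotation(
    paper_id="doi:10.0000/example",
    annotator="researcher-01",
    datasets=[DatasetReference("DBpedia", "3.8")],
)
print(annotation)
```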

Page 8: Crowdsourcing for research libraries


DIMENSIONS OF CROWDSOURCING

Page 9: Crowdsourcing for research libraries

DIMENSIONS OF CROWDSOURCING

WHAT IS OUTSOURCED

Tasks based on human skills not easily replicable by machines:

• Visual recognition
• Language understanding
• Knowledge acquisition
• Basic human communication
• ...

WHO IS THE CROWD

• Open call (crowd accessible through a platform)

• Call may target specific skills and expertise (qualification tests)

• Requester typically knows less about the ‘workers’ than in other ‘work’ environments


See also [Quinn & Bederson, 2012]

Page 10: Crowdsourcing for research libraries

DIMENSIONS OF CROWDSOURCING (2)

HOW IS THE TASK OUTSOURCED

• Explicit vs. implicit participation
• Tasks broken down into smaller units undertaken in parallel by different people
• Coordination required to handle cases with more complex workflows
• Partial or independent answers consolidated and aggregated into a complete solution (see the sketch below)


See also [Quinn & Bederson, 2012]
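As an illustration of that consolidation step, the sketch below aggregates redundant worker answers for each task unit by simple majority vote. The function and the example data are illustrative and not tied to any particular platform.

```python
from collections import Counter
from typing import Dict, List, Tuple

def aggregate_by_majority(answers: Dict[str, List[str]]) -> Dict[str, Tuple[str, float]]:
    """For each task unit, pick the most frequent answer and report its support.

    answers maps a unit id to the answers given by different workers.
    Returns unit id -> (winning answer, fraction of workers who agreed).
    """
    consolidated = {}
    for unit_id, unit_answers in answers.items():
        winner, votes = Counter(unit_answers).most_common(1)[0]
        consolidated[unit_id] = (winner, votes / len(unit_answers))
    return consolidated

# Example: three workers annotated each unit independently.
raw = {
    "paper-17": ["DBpedia", "DBpedia", "Freebase"],
    "paper-42": ["DBpedia", "DBpedia", "DBpedia"],
}
print(aggregate_by_majority(raw))
```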

Page 11: Crowdsourcing for research libraries

EXAMPLE: CITIZEN SCIENCE

WHAT IS OUTSOURCED
• Object recognition, labeling, categorization in media content

WHO IS THE CROWD
• Anyone

HOW IS THE TASK OUTSOURCED
• Highly parallelizable tasks
• Every item is handled by multiple annotators
• Every annotator provides an answer
• Consolidated answers solve scientific problems


Page 12: Crowdsourcing for research libraries

USEWOD EXPERIMENT: TASK AND CROWD

WHAT IS OUTSOURCED

Annotating research papers with data set information:

• Alternative representations of the domain

• What if the paper is not available?

• What if the domain is not known in advance or is infinite?

• Do we know the list of potential answers?

• Is there only one correct solution to each atomic task?

• How many people would solve the same task?

WHO IS THE CROWD

• People who know the papers or the data sets

• Experts in the (broader) field

• Casual gamers

• Librarians

• Anyone (knowledgeable of English, with a computer/cell phone…)

• Combinations thereof…


Page 13: Crowdsourcing for research libraries

USEWOD EXPERIMENT: TASK DESIGN

HOW IS THE TASK OUTSOURCED: ALTERNATIVE MODELS

• Use the data collected here to train an IE (information extraction) algorithm
• Use paid microtask workers to do a first screening, then an expert crowd to sort out challenging cases (see the sketch after this list)

• What if you have very long documents potentially mentioning different/unknown data sets?

• Competition via Twitter

• ‘Which version of DBpedia does this paper use?’

• One question a day, prizes

• Needs a gold standard to bootstrap, plus redundancy

• Involve the authors

• Use crowdsourcing to find the authors’ Twitter accounts, then launch a campaign on Twitter

• Write an email to the authors…

• Change the task

• Which papers use DBpedia 3.X?
• Competition to find all papers
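A minimal sketch of the two-stage model mentioned above: paid microtask workers do a cheap first pass, and only low-agreement items are escalated to the expert crowd. The threshold and names are illustrative assumptions, not parameters from the talk.

```python
from collections import Counter
from typing import Dict, List, Tuple

def first_pass(worker_answers: Dict[str, List[str]],
               agreement_threshold: float = 0.8) -> Tuple[Dict[str, str], List[str]]:
    """Split items into 'settled' (high worker agreement) and 'escalate' (send to experts)."""
    settled, escalate = {}, []
    for item, answers in worker_answers.items():
        winner, votes = Counter(answers).most_common(1)[0]
        if votes / len(answers) >= agreement_threshold:
            settled[item] = winner
        else:
            escalate.append(item)
    return settled, escalate

# Example: paper-03 is ambiguous and goes to the expert crowd.
answers = {
    "paper-01": ["DBpedia 3.8"] * 5,
    "paper-03": ["DBpedia 3.7", "DBpedia 3.8", "DBpedia 3.7", "none", "DBpedia 3.8"],
}
settled, escalate = first_pass(answers)
print(settled)   # {'paper-01': 'DBpedia 3.8'}
print(escalate)  # ['paper-03']
```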


Page 14: Crowdsourcing for research libraries

DIMENSIONS OF CROWDSOURCING (3)

HOW ARE THE RESULTS VALIDATED

• Solution space closed vs. open
• Performance measurements/ground truth
• Statistical techniques employed to predict accurate solutions
• May take into account confidence values of algorithmically generated solutions (see the sketch after this list)

HOW CAN THE PROCESS BE OPTIMIZED

• Incentives and motivators
• Assigning tasks to people based on their skills and performance (as opposed to random assignments)
• Symbiotic combinations of human- and machine-driven computation, including combinations of different forms of crowdsourcing


See also [Quinn & Bederson, 2012]
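One simple way to fold algorithmic confidence into validation is to treat the machine's output as an extra, weighted vote alongside the workers' answers. The weighting scheme below is an illustrative assumption, not a method presented in the talk.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def weighted_consensus(worker_votes: List[str],
                       machine_answer: str,
                       machine_confidence: float,
                       worker_weight: float = 1.0) -> Tuple[str, float]:
    """Combine worker votes with an algorithmic answer weighted by its confidence."""
    scores: Dict[str, float] = defaultdict(float)
    for answer in worker_votes:
        scores[answer] += worker_weight
    # The machine's answer counts proportionally to its confidence and the crowd size.
    scores[machine_answer] += machine_confidence * len(worker_votes)
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())

# Example: two of three workers disagree with a fairly confident extractor.
print(weighted_consensus(["DBpedia 3.7", "DBpedia 3.8", "DBpedia 3.7"], "DBpedia 3.8", 0.9))
```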

Page 15: Crowdsourcing for research libraries

USEWOD EXPERIMENT: VALIDATION

• Domain is fairly restricted
• Spam and obviously wrong answers can be detected easily
• When are two answers the same? Can there be more than one correct answer per question?
• Redundancy may not be the final answer
• Most people will be able to identify the data set, but sometimes the actual version is not trivial to reproduce
• Make an educated version guess based on time intervals and other features (see the sketch below)
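A minimal sketch of such an educated version guess: pick the latest data set release that predates the paper's publication. The release timeline below is a placeholder, not an authoritative DBpedia history.

```python
from datetime import date
from typing import Dict, Optional

# Placeholder release timeline; replace with the real one for the data set at hand.
RELEASES: Dict[str, date] = {
    "3.6": date(2011, 1, 1),
    "3.7": date(2011, 9, 1),
    "3.8": date(2012, 8, 1),
    "3.9": date(2013, 9, 1),
}

def guess_version(paper_date: date, releases: Dict[str, date] = RELEASES) -> Optional[str]:
    """Return the most recent release published before the paper, if any."""
    candidates = [(day, version) for version, day in releases.items() if day <= paper_date]
    return max(candidates)[1] if candidates else None

print(guess_version(date(2013, 5, 15)))  # '3.8' under the placeholder timeline
```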


Page 16: Crowdsourcing for research libraries

ALIGNING INCENTIVES IS ESSENTIAL

Successful volunteer crowdsourcing is difficult to predict or replicate

• Highly context-specific

• Not applicable to arbitrary tasks

Reward models are often easier to study and control (if performance can be reliably measured)

• Different models: pay-per-time, pay-per-unit, winner-takes-all

• Not always easy to abstract from social aspects (free-riding, social pressure)

• May undermine intrinsic motivation


Page 17: Crowdsourcing for research libraries

IT’S NOT ALWAYS JUST ABOUT MONEY


http://www.crowdsourcing.org/editorial/how-to-motivate-the-crowd-infographic/

http://www.oneskyapp.com/blog/tips-to-motivate-participants-of-crowdsourced-translation/

[Source: Kaufmann, Schulze, Veit, 2011]

[Source: Ipeirotis, 2008]

Page 18: Crowdsourcing for research libraries

CROWDSOURCING ANALYTICS


[Chart: percentage of active users by month since registration]

See also [Luczak-Rösch et al. 2014]
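The retention curve above can be computed from a platform's contribution logs. The sketch below derives the percentage of users active in each month since their registration; the input format and function name are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

def retention_by_month(registrations: Dict[str, date],
                       activity: List[Tuple[str, date]]) -> Dict[int, float]:
    """Percentage of registered users active in each month since their registration.

    registrations maps user id -> registration date;
    activity is a list of (user id, date of any contribution).
    """
    active: Dict[int, set] = defaultdict(set)
    for user, day in activity:
        reg = registrations.get(user)
        if reg is None or day < reg:
            continue
        months_since = (day.year - reg.year) * 12 + (day.month - reg.month)
        active[months_since].add(user)
    total = len(registrations)
    return {m: 100 * len(users) / total for m, users in sorted(active.items())}

# Toy example: two users, one drops out after the first month.
regs = {"u1": date(2014, 1, 5), "u2": date(2014, 1, 20)}
events = [("u1", date(2014, 1, 10)), ("u2", date(2014, 1, 25)), ("u1", date(2014, 3, 2))]
print(retention_by_month(regs, events))  # {0: 100.0, 2: 50.0}
```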

Page 19: Crowdsourcing for research libraries

USEWOD EXPERIMENT: OTHER INCENTIVE MODELS

• Who benefits from the results?
• Who owns the results?
• Twitter-based contest
  • ‘Which version of DBpedia does this paper use?’
  • One question a day, prizes
  • If a question is not answered correctly, increase the prize
  • If participation is low, re-focus the audience or change the incentive
• Altruism: for every ten papers annotated we send a student to ESWC…


[Source: Nature.com]

Page 20: Crowdsourcing for research libraries

DIFFERENT CROWDS FOR DIFFERENT TASKS

Find (contest):
• Linked Data experts
• Difficult task
• Final prize

Verify (microtasks):
• Workers
• Easy task
• Micropayments

TripleCheckMate [Kontokostas, 2013]

MTurk http://mturk.com

See also [Acosta et al., 2013]


Page 21: Crowdsourcing for research libraries


COMBINING HUMAN AND COMPUTATIONAL INTELLIGENCE
EXAMPLE: BIBLIOGRAPHIC DATA INTEGRATION


Source A (paper, conf):
  Data integration | VLDB-01
  Data mining      | SIGMOD-02

Source B (title, author, email, venue):
  OLAP         | Mike | mike@a | ICDE-02
  Social media | Jane | jane@b | PODS-05

Generate plausible matches:
• paper = title, paper = author, paper = email, paper = venue
• conf = title, conf = author, conf = email, conf = venue

Ask users to verify each candidate match, e.g.:
  ‘Does attribute paper match attribute author?’  Yes / No / Not sure

See also [McCann, Shen, Doan, 2008]
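A minimal sketch of the pattern above: enumerate candidate attribute pairs across the two schemas and turn each one into a yes/no verification question for the crowd. This illustrates the idea behind [McCann, Shen, Doan, 2008], not their actual system; names and phrasing are assumptions.

```python
from itertools import product
from typing import List, Tuple

def candidate_matches(schema_a: List[str], schema_b: List[str]) -> List[Tuple[str, str]]:
    """Enumerate all attribute pairs as plausible matches (a real system would prune these)."""
    return list(product(schema_a, schema_b))

def verification_question(pair: Tuple[str, str]) -> str:
    """Phrase a candidate match as a yes/no question for crowd workers."""
    a, b = pair
    return f"Does attribute '{a}' match attribute '{b}'? (Yes / No / Not sure)"

schema_a = ["paper", "conf"]
schema_b = ["title", "author", "email", "venue"]
for pair in candidate_matches(schema_a, schema_b):
    print(verification_question(pair))
```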

Page 22: Crowdsourcing for research libraries


SUMMARY AND FINAL REMARKS

[Source: Dave de Roure]

Page 23: Crowdsourcing for research libraries

SUMMARY

• There is crowdsourcing and crowdsourcing: pick your faves and mix them
• Human intelligence is a valuable resource: experiment design is key
• Sustaining engagement is an art: crowdsourcing analytics may help
• Computers are sometimes better than humans: the age of ‘social machines’


Page 24: Crowdsourcing for research libraries

THE AGE OF SOCIAL MACHINES


Page 25: Crowdsourcing for research libraries

[email protected]

@ESIMPERL

WWW.SOCIAM.ORG

WWW.PLANET-DATA.EU

THANKS TO MARIBEL ACOSTA, LAURA DRAGAN, MARKUS LUCZAK-RÖSCH, RAMINE TINATI, AND MANY OTHERS
