Transcript of: Quality Control Mechanisms for Crowdsourcing: Peer Review, Arbitration, & Expertise at FamilySearch Indexing (CSCW 2013)

Page 1

CSCW, SAN ANTONIO, TX, FEB 26, 2013

Derek Hansen, Patrick Schone, Douglas Corey, Matthew Reid, & Jake Gehring

QUALITY CONTROL MECHANISMS FOR CROWDSOURCING: PEER REVIEW, ARBITRATION, & EXPERTISE AT FAMILYSEARCH INDEXING

Page 2

FamilySearch.org

Page 3

FamilySearch Indexing (FSI)

Page 4

FamilySearch Indexing (FSI)

Page 5

FSI in the Broader Landscape

• Crowdsourcing Project: aggregates discrete tasks completed by volunteers who replace professionals (Howe, 2006; Doan et al., 2011)

• Human Computation System: humans use a computational system to work on a problem that may someday be solvable by computers (Quinn & Bederson, 2011)

• Lightweight Peer Production: largely anonymous contributors independently complete discrete, repetitive tasks provided by authorities (Haythornthwaite, 2009)

Page 6

Design Challenge: Improve efficiency without sacrificing quality

[Chart: amount of scanned documents over time]

Page 7

Quality Control Mechanisms

• 9 types of quality control for human computation systems (Quinn & Bederson, 2011)
• Redundancy (a minimal sketch follows this list)
• Multi-level review
• Find-Fix-Verify pattern (Bernstein et al., 2010)
• Weight proposed solutions by reputation of contributor (McCann et al., 2003)
• Peer or expert oversight (Cosley et al., 2005)
• Tournament selection approach (Sun et al., 2011)
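To make the redundancy mechanism concrete, here is a minimal sketch: several volunteers transcribe the same field independently and a majority vote decides, escalating to review when no value wins. The function name and threshold are illustrative assumptions, not part of FSI.

```python
from collections import Counter

def majority_vote(transcriptions, min_agreement=0.5):
    """Return the majority transcription, or None when no value wins
    more than `min_agreement` of the votes (i.e., escalate to review)."""
    if not transcriptions:
        return None
    value, count = Counter(transcriptions).most_common(1)[0]
    return value if count / len(transcriptions) > min_agreement else None

# Three volunteers transcribe the same surname field.
print(majority_vote(["Smith", "Smith", "Smyth"]))  # -> Smith
print(majority_vote(["Smith", "Smyth"]))           # -> None (no majority)
```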

Page 8

A-B-Arbitrate process (A-B-ARB)

[Diagram: A and B transcribe the same record independently; an arbitrator (ARB) resolves A-B disagreements]

Currently Used Mechanism
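A minimal sketch of the A-B-ARB flow shown above, with illustrative names rather than FSI's actual interfaces: A and B key a field independently, and only disagreements reach the arbitrator.

```python
def a_b_arbitrate(a_value, b_value, ask_arbitrator):
    """Accept the value when A and B agree; otherwise arbitrate."""
    if a_value == b_value:
        return a_value
    # A and B disagree: the arbitrator sees both values and decides.
    return ask_arbitrator(a_value, b_value)

# Example: the arbitrator sides with A for this field.
final = a_b_arbitrate("Smith", "Smyth", lambda a, b: a)
```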

Page 9

Peer review process (A-R-RARB)

[Diagram: A transcribes; R reviews A's work with the fields already filled in; an optional review arbitrator (RARB) checks R's changes]

Proposed Mechanism
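A matching sketch of the proposed A-R-RARB flow, again with illustrative names: the reviewer R sees A's value already filled in and either accepts or corrects it, and a review arbitrator (RARB), if configured, checks only the fields R changed.

```python
def a_r_rarb(a_value, review, review_arbitrate=None):
    """Peer review of A's entry, with optional arbitration of changes."""
    r_value = review(a_value)  # R accepts or corrects A's prefilled entry
    if r_value != a_value and review_arbitrate is not None:
        return review_arbitrate(a_value, r_value)  # optional RARB step
    return r_value

# Example: R corrects A's entry; no RARB configured.
final = a_r_rarb("Smyth", review=lambda v: "Smith")
```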

Page 10

Two Act Play

Act I: Experience

What is the role of experience on quality and efficiency?

Historical data analysis using full US and Canadian Census records from 1920 and earlier

Act II: Quality Control

Is peer review or arbitration better in terms of quality and efficiency?

Field experiment using 2,000 images from the 1930 US Census Data & corresponding truth set

Page 11

Act I: Experience

Quality is estimated based on A-B agreement (no truth set)

Efficiency calculated using keystroke-logging data with idle time and outliers removed
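A hedged sketch of the two Act I measures, under assumptions about the data shapes (not FSI's actual pipeline): quality as the share of fields where A and B agree, and efficiency as keying time with long idle gaps dropped; the 60-second cutoff is an assumed stand-in for idle-time removal.

```python
def ab_agreement(a_values, b_values):
    """Fraction of fields where A's and B's transcriptions match."""
    pairs = list(zip(a_values, b_values))
    return sum(a == b for a, b in pairs) / len(pairs)

def active_seconds(keystroke_times, idle_cutoff=60.0):
    """Sum inter-keystroke gaps, dropping gaps longer than
    `idle_cutoff` seconds (treated as idle time)."""
    gaps = [t2 - t1 for t1, t2 in zip(keystroke_times, keystroke_times[1:])]
    return sum(g for g in gaps if g <= idle_cutoff)
```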

Page 12

A-B agreement by field

Page 13

A-B agreement by language (1871 Canadian Census)

              English   French
Given Name     79.8%    62.7%
Surname        66.4%    48.8%

Page 14

A-B agreement by experience

Birth Place: All U.S. Censuses

[Heatmap: axes A (novice ↔ experienced) × B (novice ↔ experienced)]

Page 15

A-B agreement by experience

Given Name: All U.S. Censuses

[Heatmap: axes A (novice ↔ experienced) × B (novice ↔ experienced)]

Page 16

A-B agreement by experience

Surname: All U.S. Censuses

[Heatmap: axes A (novice ↔ experienced) × B (novice ↔ experienced)]

Page 17

A-B agreement by experience

Gender: All U.S. Censuses

[Heatmap: axes A (novice ↔ experienced) × B (novice ↔ experienced)]

Page 18

A-B agreement by experience

Birthplace: English-speaking Canadian Census

[Heatmap: axes A (novice ↔ experienced) × B (novice ↔ experienced)]

Page 19

Time & keystroke by experience

Page 20

Summary & Implications of Act I

Experienced workers are faster and more accurate, and these gains continue even at high experience levels

- Focus on retention

- Encourage both novices & experts to do more

- Develop interventions to speed up experience gains (e.g., send users common mistakes made by people at their experience level)

Page 21

Summary & Implications of Act I

Contextual knowledge (e.g., Canadian placenames) and specialized skills (e.g., French language fluency) are needed for some tasks

- Recruit people with existing knowledge & skills

- Provide contextual information when possible (e.g., Canadian placename prompts)

- Don’t remove context (e.g., captcha)

- Allow users to specialize?

Page 22

Act II: Quality Control

A-B-ARB data from original transcribers (Feb 2011)

A-R-RARB data includes original A data and newly collected R and RARB data from people new to this method (Jan-Feb of 2012)

Truth set data from a company, with an independent audit by FSI experts

Statistical Test: mixed-model logistic regression (accurate or not) with random effects, controlling for expertise
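A sketch of the kind of model named above, assuming a pandas DataFrame with one row per transcribed field and illustrative column names (`accurate`, `method`, `expertise`, `worker`); statsmodels' Bayesian mixed GLM is one way to fit a logistic regression with random effects in Python, not necessarily the authors' tool.

```python
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("field_outcomes.csv")  # hypothetical data file

# Fixed effects for QC method and worker expertise; a random
# intercept per worker absorbs worker-to-worker variation.
model = BinomialBayesMixedGLM.from_formula(
    "accurate ~ method + expertise",
    {"worker": "0 + C(worker)"},
    data=df,
)
result = model.fit_vb()  # variational Bayes fit
print(result.summary())
```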

Page 23

Limitations

• Experience levels of R and RARB workers were lower than expected, though we statistically controlled for this
• Original B data used in A-B-ARB for certain fields was transcribed in a non-standard manner, requiring adjustment

Page 24

No Need for RARB

• No gains in quality from extra arbitration of peer-reviewed data (A-R = A-R-RARB)
• RARB takes additional time, so the process is better off without it

Page 25

Quality Comparison

• Both methods were statistically better than A alone

• A-B-ARB had slightly lower error rates than A-R

• R “missed” more errors, but also introduced fewer errors

Page 26

Time Comparison

Page 27

Summary & Implications of Act II

Peer review shows considerable efficiency gains with nearly as good quality as arbitration

- Prime reviewers to find errors (e.g., prompt them with expected # of errors on a page)

- Highlight potential problems (e.g., let A flag tough fields)

- Route difficult pages to experts

- Consider an A-R1-R2 process when high quality is critical

Page 28

Summary & Implications of Act II

Reviewing reviewers isn't always worth the time

- At least in some contexts, Find-Fix may not need Verify

Quality of different fields varies dramatically

- Use different quality control mechanisms for harder or easier fields

Integrate human and algorithmic transcription

- Use algorithms on easy fields & integrate them into the review process so the algorithms can learn from human corrections (see the sketch below)
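One hedged reading of that recommendation, with the recognizer, threshold, and names as illustrative assumptions: accept high-confidence machine transcriptions, route the rest to human review, and keep the corrections as training examples.

```python
def transcribe_field(image, recognizer, human_review, threshold=0.95):
    """Route a field to a human only when the recognizer is unsure."""
    value, confidence = recognizer(image)
    if confidence >= threshold:
        return value, None                  # machine handles the easy field
    corrected = human_review(image, value)  # human reviews prefilled guess
    example = (image, corrected)            # training example for learning
    return corrected, example
```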

Page 29

Questions• Derek Hansen ([email protected])• Patrick Schone ([email protected])• Douglas Corey ([email protected])• Matthew Reid ([email protected])• Jake Gehring ([email protected])