2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academia Sinica
![Page 1: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/1.jpg)
Ed H. Chi
Area Manager and Principal Scientist Augmented Social Cognition Area Palo Alto Research Center
![Page 2: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/2.jpg)
Cognition: the ability to remember, think, and reason; the faculty of knowing.
Social Cognition: the ability of a group to remember, think, and reason; the construction of knowledge structures by a group.
– (not quite the same as in the branch of psychology that studies the cognitive processes involved in social interaction, though included)
Augmented Social Cognition: Supported by systems, the enhancement of the ability of a group to remember, think, and reason; the system-supported construction of knowledge structures by a group.
Citation: Chi, IEEE Computer, Sept 2008
2010-02-22 Ed H. Chi ASC Overview
![Page 3: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/3.jpg)
• Characterize activity on social systems with analytics
• Model interaction social and community dynamics and variables
• Prototype tools to increase benefits or reduce cost
• Evaluate prototypes via Living Laboratories with real users
Characterization, Models, Prototypes, Evaluations
![Page 4: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/4.jpg)
Characterization and Modeling: – Community Analytics and Wikipedia Dynamics
Prototyping: – Social Transparency thru WikiDashboard
Evaluation: – Evaluations using Amazon Mechanical Turk
![Page 5: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/5.jpg)
Characterization, Models, Prototypes, Evaluations
![Page 6: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/6.jpg)
Conflict/Coordination Effects in Wikipedia
![Page 7: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/7.jpg)
Mediator Pattern - Terri Schiavo
[Figure legend: Mediators; Sympathetic to parents; Sympathetic to husband; Anonymous (vandals/spammers)]
![Page 8: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/8.jpg)
Measure of controversy
• “Controversial” tag
• Use # revisions tagged controversial
![Page 9: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/9.jpg)
Page metrics
• Possible metrics for identifying conflict in articles

| Metric type | Page type |
| --- | --- |
| Revisions (#) | Article, talk, article/talk |
| Page length | Article, talk, article/talk |
| Unique editors | Article, talk, article/talk |
| Unique editors / revisions | Article, talk |
| Links from other articles | Article, talk |
| Links to other articles | Article, talk |
| Anonymous edits (#, %) | Article, talk |
| Administrator edits (#, %) | Article, talk |
| Minor edits (#, %) | Article, talk |
| Reverts (#, by unique editors) | Article |
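As a rough illustration of how a few of the page metrics listed above might be computed, the sketch below works over a toy revision history. The `Revision` record layout and the `page_metrics` helper are hypothetical names introduced here; the talk does not describe the actual tooling.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    # Hypothetical record layout; the talk does not specify one.
    editor: str
    is_anonymous: bool
    is_minor: bool
    is_revert: bool


def page_metrics(revisions):
    """Compute a few of the per-page metrics named on the slide."""
    n = len(revisions)
    editors = {r.editor for r in revisions}
    anon = sum(r.is_anonymous for r in revisions)
    return {
        "revisions": n,
        "unique_editors": len(editors),
        "unique_editors_per_revision": len(editors) / n if n else 0.0,
        "anonymous_edits": anon,
        "anonymous_edits_pct": 100.0 * anon / n if n else 0.0,
        "minor_edits": sum(r.is_minor for r in revisions),
        "reverts": sum(r.is_revert for r in revisions),
        "reverting_editors": len({r.editor for r in revisions if r.is_revert}),
    }


# Toy history, made up purely for illustration.
history = [
    Revision("alice", False, False, False),
    Revision("10.0.0.1", True, False, False),
    Revision("alice", False, False, True),
    Revision("bob", False, True, False),
]
metrics = page_metrics(history)
```

The same function could be run separately on an article's history and on its talk page's history to obtain the "article" and "talk" variants of each metric.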
![Page 10: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/10.jpg)
Performance: Cross-validation
• 5x cross-validation, R² = 0.897
![Page 11: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/11.jpg)
Determinants of conflict
Highly weighted features of conflict model:
• Revisions (talk)
• Minor edits (talk)
• Unique editors (talk)
• Revisions (article)
• Unique editors (article)
• Anonymous edits (talk)
• Anonymous edits (article)
![Page 12: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/12.jpg)
Number of Articles (Log Scale)
http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia’s_growth
![Page 13: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/13.jpg)
![Page 14: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/14.jpg)
Monthly Edits
![Page 15: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/15.jpg)
Monthly Edits
![Page 16: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/16.jpg)
Monthly Active Editors
![Page 17: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/17.jpg)
Characterization, Models, Prototypes, Evaluations
![Page 18: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/18.jpg)
![Page 19: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/19.jpg)
Edits beget edits – the more previous edits, the more new edits

$$N(t) = N_0 \cdot e^{rt}$$

$$\frac{dN}{dt} = r \cdot N$$

– $\frac{dN}{dt}$: growth rate of population
– $N$: current population
– $r$: growth rate of the population

Growth rate depends on current population N
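To make the link between the two exponential-growth formulas concrete, the sketch below compares the closed form N(t) = N0 · e^(rt) with a step-by-step numerical integration of dN/dt = r · N. The parameter values and function names are illustrative assumptions, not values fitted to Wikipedia data.

```python
import math


def exponential_growth(n0, r, t):
    """Closed-form solution N(t) = N0 * e^(r t) of dN/dt = r * N."""
    return n0 * math.exp(r * t)


def euler_growth(n0, r, t, steps=100_000):
    """Integrate dN/dt = r * N with small forward-Euler steps."""
    n = n0
    dt = t / steps
    for _ in range(steps):
        n += r * n * dt  # growth in each step is proportional to current N
    return n


# Illustrative values only (assumed, not taken from the talk).
n0, r, t = 100.0, 0.5, 4.0
closed = exponential_growth(n0, r, t)
numeric = euler_growth(n0, r, t)
relative_gap = abs(numeric - closed) / closed
```

With small enough steps the two agree closely, which is the sense in which "edits beget edits": each increment is proportional to the population already present.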
![Page 20: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/20.jpg)
Ecological population growth model
– r, growth rate of the population
– K, carrying capacity (due to resource limitation)

$$\frac{dN}{dt} = r \cdot N \cdot \left(1 - \frac{N}{K}\right)$$

[Figure: population vs. year, 2000 to 2010, y-axis 0 to 4,000,000, with carrying capacity K marked]
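The behaviour of the logistic model can be sketched numerically. The closed-form solution used below, N(t) = K / (1 + ((K − N0)/N0) · e^(−rt)), is the standard solution of the logistic equation; the parameter values are assumptions chosen for illustration and are not fitted to Wikipedia data.

```python
import math


def logistic(n0, r, k, t):
    """Closed-form solution of dN/dt = r * N * (1 - N / K)."""
    return k / (1.0 + ((k - n0) / n0) * math.exp(-r * t))


def logistic_rate(n, r, k):
    """Right-hand side of the logistic equation."""
    return r * n * (1.0 - n / k)


# Illustrative parameters (assumed).
n0, r, k = 1_000.0, 1.0, 4_000_000.0

early = logistic(n0, r, k, 1.0)    # N << K: close to exponential growth
late = logistic(n0, r, k, 40.0)    # N approaches the carrying capacity K
rate_at_capacity = logistic_rate(k, r, k)
```

While N is far below K the curve tracks N0 · e^(rt); as N nears K the factor (1 − N/K) shrinks toward zero and growth stalls.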
![Page 21: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/21.jpg)
http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia’s_growth
Follows a logistic growth curve
New Article
![Page 22: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/22.jpg)
Carrying Capacity as a function of time.
[Figure: population vs. year, 2000 to 2010, with a time-varying carrying capacity K(t)]
![Page 23: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/23.jpg)
Biological system
– Competition increases as the population hits the limits of the ecology
– Advantage goes to members of the population that have competitive dominance over others
Analogy
– Limited opportunities to make novel contributions
– Increased patterns of conflict and dominance
![Page 24: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/24.jpg)
![Page 25: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/25.jpg)
Highly skewed contribution pattern
– Top 3% users contribute 50%+ edits
– A lot of single-edit users
Five Editor Classes
– Monthly edit count
– No bot, vandalism included in the analysis
– 1000+: editors who made more than 1000 edits in that month
– 100-999
– 10-99
– 2-9
– 1
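The five editor classes amount to a simple binning of each editor's monthly edit count. A minimal sketch follows; `editor_class` is a hypothetical helper, the edit counts are made up, and placing exactly 1000 edits in the top class is an assumption (the slide labels the class "1000+" but describes it as "more than 1000 edits").

```python
from collections import Counter


def editor_class(monthly_edits):
    """Map an editor's monthly edit count to one of the five classes."""
    if monthly_edits < 1:
        raise ValueError("an editor needs at least one edit in the month")
    # Assumption: exactly 1000 edits falls in the "1000+" class.
    if monthly_edits >= 1000:
        return "1000+"
    if monthly_edits >= 100:
        return "100-999"
    if monthly_edits >= 10:
        return "10-99"
    if monthly_edits >= 2:
        return "2-9"
    return "1"


# Made-up monthly counts for illustration.
counts = {"alice": 1500, "bob": 42, "carol": 1, "dave": 7, "erin": 1}
by_class = Counter(editor_class(n) for n in counts.values())
```

Summing edits rather than editors per class would give the "Monthly Edits by Editor Class" view.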
![Page 26: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/26.jpg)
Monthly Edits by Editor Class (in thousands)
![Page 27: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/27.jpg)
![Page 28: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/28.jpg)
Monthly Ratio of Reverted Edits
![Page 29: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/29.jpg)
Two interpretations:
– Overall increased resistance from the Wikipedia community to changing content
– Disparity of treatment of edits
  » Occasional editors have been reverted at a higher rate
Example of increased patterns of conflict and dominance
Photo: http://www.flickr.com/photos/efan78/3619921561/
![Page 30: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/30.jpg)
![Page 31: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/31.jpg)
Bongwon Suh, Gregorio Convertino, Ed H. Chi, Peter Pirolli. WikiSym 2009
![Page 32: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/32.jpg)
Characterization, Models, Prototypes, Evaluations
![Page 33: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/33.jpg)
“Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you’re getting the best possible information.” – Steve Carell, The Office
![Page 34: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/34.jpg)
Content in Wikipedia can be added or changed by anyone
Because of this, WP has become one of the most important resources on the web
– Hundreds of thousands of contributors
– Over 2 million articles
– 5th most used website (Alexa.com)
Also because of this, it is viewed with skepticism by readers, press, researchers
![Page 35: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/35.jpg)
![Page 36: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/36.jpg)
Nothing
![Page 37: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/37.jpg)
“Wikipedia, just by its nature, is impossible to trust completely. I don't think this can necessarily be changed.”
![Page 38: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/38.jpg)
Risks with using Wikipedia
– Accuracy of content
– Motives of editors
– Expertise of editors
– Stability of article
– Coverage of topics
– Quality of cited information
Insufficient information to evaluate trustworthiness
![Page 39: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/39.jpg)
Transparency of social dynamics can reduce conflict and coordination issues
Attribution encourages contribution
– WikiDashboard: Social dashboard for wikis
– Prototype system: http://wikidashboard.parc.com
Visualization for every wiki page showing edit history timeline and top individual editors
Can drill down into activity history for specific editors and view edits to see changes side-by-side
Citation: Suh et al. CHI 2008 Proceedings
![Page 40: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/40.jpg)
![Page 41: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/41.jpg)
![Page 42: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/42.jpg)
Characterization, Models, Prototypes, Evaluations
![Page 43: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/43.jpg)
Surfacing information
• Numerous studies mining Wikipedia revision history to surface trust-relevant information
  – Adler & Alfaro, 2007; Dondio et al., 2006; Kittur et al., 2007; Viegas et al., 2004; Zeng et al., 2006
• But how much impact can this have on user perceptions in a system which is inherently mutable?
Suh, Chi, Kittur, & Pendleton, CHI2008
![Page 44: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/44.jpg)
Hypotheses
1. Visualization will impact perceptions of trust
2. Compared to baseline, visualization will impact trust both positively and negatively
3. Visualization should have most impact when there is high uncertainty about the article
   • Low quality
   • High controversy
![Page 45: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/45.jpg)
Design
• 3 x 2 x 2 design

| | Controversial | Uncontroversial |
| --- | --- | --- |
| High quality | Abortion; George Bush | Volcano; Shark |
| Low quality | Pro-life feminism; Scientology and celebrities | Disk defragmenter; Beeswax |

Visualization
• High stability
• Low stability
• Baseline (none)
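The 3 x 2 x 2 design crosses three visualization conditions with two quality levels and two controversy levels. A minimal sketch that enumerates the resulting cells (the variable names are illustrative, not from the study materials):

```python
from itertools import product

visualizations = ["high stability", "low stability", "baseline (none)"]
qualities = ["high quality", "low quality"]
controversies = ["controversial", "uncontroversial"]

# Every combination of the three factors is one experimental cell.
conditions = list(product(visualizations, qualities, controversies))

for visualization, quality, controversy in conditions:
    print(f"{visualization:16} | {quality:12} | {controversy}")
```

Each quality-by-controversy cell holds two articles, so the eight articles appear under each of the three visualization conditions.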
![Page 46: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/46.jpg)
Example: High trust visualization
![Page 47: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/47.jpg)
Example: Low trust visualization
![Page 48: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/48.jpg)
Summary info
• % from anonymous users
![Page 49: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/49.jpg)
Summary info
• % from anonymous users
• Last change by anonymous or established user
![Page 50: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/50.jpg)
Summary info
• % from anonymous users
• Last change by anonymous or established user
• Stability of words
![Page 51: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/51.jpg)
Graph
• Instability
![Page 52: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/52.jpg)
Graph
• Instability
• Revert activity
![Page 53: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/53.jpg)
Method
• Users recruited via Amazon’s Mechanical Turk
  – 253 participants
  – 673 ratings
  – 7 cents per rating
  – Kittur, Chi, & Suh, CHI 2008: Crowdsourcing user studies
• To ensure salience and valid answers, participants answered:
  – In what time period was this article the least stable?
  – How stable has this article been for the last month?
  – Who was the last editor?
  – How trustworthy do you consider the above editor?
![Page 54: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/54.jpg)
Results
Main effects of quality and controversy:
• high-quality articles > low-quality articles (F(1, 425) = 25.37, p < .001)
• uncontroversial articles > controversial articles (F(1, 425) = 4.69, p = .031)
![Page 55: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/55.jpg)
Results
Interaction effects of quality and controversy:
• high quality articles were rated equally trustworthy whether controversial or not, while
• low quality articles were rated lower when they were controversial than when they were uncontroversial.
![Page 56: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/56.jpg)
Results
1. Significant effect of visualization
   – High > low, p < .001
2. Viz has both positive and negative effects
   – High > baseline, p < .001
   – Low < baseline, p < .01
3. No interaction of visualization with either quality or controversy
   – Robust across conditions
![Page 57: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/57.jpg)
![Page 58: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/58.jpg)
![Page 59: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/59.jpg)
Methodology
Characterization, Models, Prototypes, Evaluations
![Page 60: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/60.jpg)
User studies
• Getting input from users is important in HCI
  – surveys
  – rapid prototyping
  – usability tests
  – cognitive walkthroughs
  – performance measures
  – quantitative ratings
![Page 61: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/61.jpg)
User studies
• Getting input from users is expensive
  – Time costs
  – Monetary costs
• Often have to trade off costs with sample size
![Page 62: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/62.jpg)
Online solutions
• Online user surveys
• Remote usability testing
• Online experiments
• But still have difficulties
  – Rely on practitioner for recruiting participants
  – Limited pool of participants
![Page 63: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/63.jpg)
Crowdsourcing
• Make tasks available for anyone online to complete
• Quickly access a large user pool, collect data, and compensate users
• Experiences at PARC:
  – CSL UbiComp group
  – ISL’s NLTT group
![Page 64: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/64.jpg)
Crowdsourcing
• Make tasks available for anyone online to complete
• Quickly access a large user pool, collect data, and compensate users
• Example: NASA Clickworkers
  – 100k+ volunteers identified Mars craters from space photographs
  – Aggregate results “virtually indistinguishable” from expert geologists
[Figure: crater markings, experts vs. crowds]
http://clickworkers.arc.nasa.gov
![Page 65: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/65.jpg)
Amazon’s Mechanical Turk
• Market for “human intelligence tasks”
• Typically short, objective tasks
  – Tag an image
  – Find a webpage
  – Evaluate relevance of search results
• Users complete tasks for a few pennies each
![Page 66: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/66.jpg)
Example task
![Page 67: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/67.jpg)
Using Mechanical Turk for user studies
| | Traditional user studies | Mechanical Turk |
| --- | --- | --- |
| Task complexity | Complex, long | Simple, short |
| Task subjectivity | Subjective, opinions | Objective, verifiable |
| User information | Targeted demographics, high interactivity | Unknown demographics, limited interactivity |

Can Mechanical Turk be usefully used for user studies?
![Page 68: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/68.jpg)
Task
• Assess quality of Wikipedia articles
• Started with ratings from expert Wikipedians
  – 14 articles (e.g., “Germany”, “Noam Chomsky”)
  – 7-point scale
• Can we get matching ratings with Mechanical Turk?
![Page 69: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/69.jpg)
Experiment 1
• Rate articles on 7-point scales:
  – Well written
  – Factually accurate
  – Overall quality
• Free-text input:
  – What improvements does the article need?
• Paid $0.05 each
![Page 70: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/70.jpg)
Experiment 1: Good news
• 58 users made 210 ratings (15 per article)
  – $10.50 total
• Fast results
  – 44% within a day, 100% within two days
  – Many completed within minutes
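The totals for Experiment 1 follow directly from the figures given on the task slides (14 articles, 15 ratings per article, $0.05 per rating). A small worked check, using `Decimal` to avoid floating-point rounding on currency; any platform fees are outside the scope of this sketch.

```python
from decimal import Decimal

articles = 14                      # articles rated by expert Wikipedians
ratings_per_article = 15           # ratings collected per article
price_per_rating = Decimal("0.05") # payment per rating, in dollars

total_ratings = articles * ratings_per_article
total_cost = total_ratings * price_per_rating

print(f"{total_ratings} ratings cost ${total_cost}")
```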
![Page 71: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/71.jpg)
Experiment 1: Bad news
• Correlation between turkers and Wikipedians only marginally significant (r=.50, p=.07)
• Worse, 59% potentially invalid responses
• Nearly 75% of these done by only 8 users

| | Experiment 1 |
| --- | --- |
| Invalid comments | 49% |
| <1 min responses | 31% |
![Page 72: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/72.jpg)
Not a good start
• Summary so far:
  – Only marginal correlation with experts
  – Heavy gaming of the system by a minority
• Possible responses:
  – Can make sure these gamers are not rewarded
  – Ban them from doing your HITs in the future
  – Create a reputation system [Dolores Labs]
• Can we change how we collect user input?
![Page 73: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/73.jpg)
Design changes
• Use verifiable questions to signal monitoring
  – “How many sections does the article have?”
  – “How many images does the article have?”
  – “How many references does the article have?”
![Page 74: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/74.jpg)
Design changes
• Use verifiable questions to signal monitoring
• Make malicious answers as high cost as good-faith answers
  – “Provide 4-6 keywords that would give someone a good summary of the contents of the article”
![Page 75: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/75.jpg)
Design changes
• Use verifiable questions to signal monitoring
• Make malicious answers as high cost as good-faith answers
• Make verifiable answers useful for completing task
  – Used tasks similar to how Wikipedians described evaluating quality (organization, presentation, references)
![Page 76: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/76.jpg)
Design changes
• Use verifiable questions to signal monitoring
• Make malicious answers as high cost as good-faith answers
• Make verifiable answers useful for completing task
• Put verifiable tasks before subjective responses
  – First do objective tasks and summarization
  – Only then evaluate subjective quality
  – Ecological validity?
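Beyond signalling that answers are monitored, verifiable questions also make it possible to screen responses after the fact. The sketch below shows one way that could look; the `Response` fields, the `is_plausible` helper, the tolerance, and the 60-second threshold are all hypothetical choices made for illustration and are not described in the talk.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical fields for one submitted task.
    worker: str
    seconds_on_task: float
    reported_sections: int
    reported_images: int
    keywords: list
    quality_rating: int  # subjective rating on the 7-point scale


def is_plausible(resp, true_sections, true_images, tolerance=1, min_seconds=60):
    """Keep a response only if its verifiable answers roughly check out."""
    return (
        resp.seconds_on_task >= min_seconds
        and abs(resp.reported_sections - true_sections) <= tolerance
        and abs(resp.reported_images - true_images) <= tolerance
        and 4 <= len(resp.keywords) <= 6  # the task asked for 4-6 keywords
    )


# Made-up responses for an article with 8 sections and 3 images.
responses = [
    Response("w1", 240, 8, 3, ["volcano", "magma", "eruption", "lava"], 6),
    Response("w2", 20, 1, 0, ["good"], 7),
]
kept = [r for r in responses if is_plausible(r, true_sections=8, true_images=3)]
```

Only the subjective ratings from `kept` would then be compared against the expert ratings.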
![Page 77: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/77.jpg)
Experiment 2: Results
• 124 users provided 277 ratings (~20 per article)
• Significant positive correlation with Wikipedians (r=.66, p=.01)
• Smaller proportion malicious responses
• Increased time on task

| | Experiment 1 | Experiment 2 |
| --- | --- | --- |
| Invalid comments | 49% | 3% |
| <1 min responses | 31% | 7% |
| Median time | 1:30 | 4:06 |
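The reported p-values can be sanity-checked from the correlations alone, under the assumption (not stated on the slide) that each correlation was computed over the 14 articles, i.e. with 12 degrees of freedom. The sketch uses the usual t test for a Pearson correlation and integrates the Student t density numerically, so it needs only the standard library.

```python
import math


def t_statistic(r, n):
    """t statistic for testing a Pearson correlation r over n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=20_000):
    """Two-tailed p-value via Simpson's rule on the density over [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3.0  # P(0 <= T <= |t|)
    return 1.0 - 2.0 * central


n_articles = 14  # assumption: one (turker mean, expert rating) pair per article
p_exp1 = two_tailed_p(t_statistic(0.50, n_articles), n_articles - 2)
p_exp2 = two_tailed_p(t_statistic(0.66, n_articles), n_articles - 2)
```

Under that assumption both values land close to the figures on the slides (p=.07 for Experiment 1 and p=.01 for Experiment 2).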
![Page 78: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/78.jpg)
Generalizing to other user studies
• Combine objective and subjective questions
  – Rapid prototyping: ask verifiable questions about content/design of prototype before subjective evaluation
  – User surveys: ask common-knowledge questions before asking for opinions
![Page 79: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/79.jpg)
Limitations of Mechanical Turk
• No control of users’ environment
  – Potential for different browsers, physical distractions
  – General problem with online experimentation
• Not designed for user studies
  – Difficult to do between-subjects design
  – Involves some programming
• Users
  – Uncertainty about user demographics, expertise
![Page 80: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/80.jpg)
Conclusion
1. Use verifiable questions to signal monitoring
2. Make malicious answers as high cost as good-faith answers
3. Make verifiable answers useful for completing task
4. Put verifiable tasks before subjective responses
• Mechanical Turk offers the practitioner a way to access a large user pool and quickly collect data at low cost
• Good results require careful task design
![Page 81: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/81.jpg)
Ed H. Chi (manager, PS), Peter Pirolli (RF), Lichan Hong, Bongwon Suh, Les Nelson, Rowan Nairn, Gregorio Convertino
Interns/Collaborators: Sanjay Kairam, Jilin Chen (UMinn), Michael Bernstein (MIT)
http://asc-parc.blogspot.com
![Page 82: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/82.jpg)
![Page 83: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/83.jpg)
r, growth rate
K, carrying capacity

$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$$

[Figure: population vs. year, 2000 to 2010, y-axis 0 to 4,000,000]

r dominates when N is small: $\left(1 - \frac{N}{K}\right) \approx 1$
K dominates when $N \to K$: $\left(1 - \frac{N}{K}\right) \approx 0$
![Page 84: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/84.jpg)
r-Strategist
– Growth or exploitation
– Less-crowded niches / produce many offspring
K-Strategist
– Conservation
– Strong competitors in crowded niches / invest more heavily in fewer offspring
Evolution cycle
– Resilience of an ecological system
– Gunderson & Holling 2001
![Page 85: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/85.jpg)
Exponential growth model
– Growth rate depends on the current N

$$\frac{dN}{dt} = r \cdot N$$

Ecological population growth model
– r, growth rate of the population
– K, carrying capacity (due to resource limitation)

$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$$
![Page 86: 2010-02-22 Wikipedia MTurk Research talk given in Taiwan's Academica Sinica](https://reader033.fdocuments.net/reader033/viewer/2022051209/5486ea0db4af9f730d8b5318/html5/thumbnails/86.jpg)
People-ware
– Growing resistance to changing content
– Coordination cost and bureaucracy
Knowledge-ware: Availability of easy topics to write about
Tool-ware: Quality of tools used by editors and admins
http://www.aerostich.com/
http://www.mikestreetmedia.co.uk/blog/wp-content/uploads/2009/01/knowledge.jpg
http://youropenbook.agitprop.co.uk/growing.php?p=2