UNDERSTANDING WIKIPEDIA Niki Kittur nkittur@cs.cmu.edu

Page 1:

UNDERSTANDING WIKIPEDIA

Niki Kittur nkittur@cs.cmu.edu

Page 2:

Slowing growth

• Since 2007, slowing growth

Why?

• Fewer new topics to write about
• Growing resistance to new contributions

[Figure: Proportion of reverted edits (by editor class)]

[Figure: Number of active editors per month]

Suh, Convertino, Chi, & Pirolli, 2009

Page 3:

Wisdom of crowds poll

What proportion of Wikipedia (in words) is made up of articles?

0-25% | 25-50% | 50-75% | 75-100%

Page 4:

Wisdom of crowds poll

Page 5:

Article

Page 6:

Discussion

Page 7:

Discussion

Page 8:

Edit history

Page 9:

Edit history

Page 10:

Policies + Procedures

Page 11:

How does it work?

• “Wisdom of crowds”: many independent judgments
  – “with enough eyeballs all bugs are shallow”
• More contributors ->
  – more information
  – fewer errors
  – less bias

Page 12:

Wilkinson & Huberman, 2007

• Examined featured articles vs. non-featured articles
  – Controlling for PageRank (i.e., popularity)
• Featured articles = more edits, more editors
• More work, more people => better outcomes

[Figure panels: Edits; Editors]

Page 13:

Difficulties with generalizing results

• Cross-sectional analysis
  – Reverse causation: articles which become featured may subsequently attract more people
• Coarse quality metrics
  – Fewer than 2000 out of >2,000,000 articles are featured
• What about coordination?

Page 14:

Coordination costs

• Increasing contributors incurs process losses (Boehm, 1981; Steiner, 1972)
• Diminishing returns with added people (Hill, 1982; Sheppard, 1993)
  – Super-linear increase in communication pairs
  – Linear increase in added work
• In the extreme, costs may exceed benefits to quality (Brooks, 1975)
• The more you can support coordination, the more benefits from adding people

“Adding manpower to a late software project makes it later” (Brooks, 1975)
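The contrast between super-linear communication and linear work can be made concrete with a small calculation. This is a minimal sketch, assuming every pair of contributors is a potential communication channel, so that n contributors give n(n-1)/2 pairs while added work grows in proportion to n. The function names are illustrative and do not come from the talk.

```python
def communication_pairs(n_contributors: int) -> int:
    """Number of distinct contributor pairs: n * (n - 1) / 2."""
    if n_contributors < 0:
        raise ValueError("n_contributors must be non-negative")
    return n_contributors * (n_contributors - 1) // 2


def pairs_per_contributor(n_contributors: int) -> float:
    """Potential communication channels per unit of (linearly growing) work capacity."""
    if n_contributors <= 0:
        raise ValueError("n_contributors must be positive")
    return communication_pairs(n_contributors) / n_contributors


if __name__ == "__main__":
    # The pairs-per-contributor ratio keeps rising as the group grows.
    for n in (2, 5, 10, 50, 100):
        print(n, communication_pairs(n), pairs_per_contributor(n))
```

Under this assumption, going from 10 to 100 contributors multiplies work capacity by 10 but the number of pairs by 110 (45 to 4950).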

Page 15:

Research question

To what degree are editors in Wikipedia working independently versus coordinating?

Page 16:

Research infrastructure

• Analyzed entire history of Wikipedia
  – Every edit to every article
• Large dataset (as of 2008)
  – 10+ million pages
  – 200+ million revisions
  – 2.5+ TB
• Used distributed processing
  – Hadoop distributed filesystem
  – Map/reduce to process data in parallel
  – Reduce time for analysis from weeks to hours
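The map/reduce pattern mentioned above can be illustrated in a few lines. The sketch below is a single-process imitation in plain Python, not Hadoop code and not the pipeline used in the study: it counts revisions per page by emitting (page, 1) pairs, grouping them by key, and summing each group. The record layout (page_title, editor_name) is a hypothetical simplification.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (page_title, editor_name). Not the real dump schema.
Revision = Tuple[str, str]


def map_revision(revision: Revision) -> Iterator[Tuple[str, int]]:
    """Map step: emit (page_title, 1) for each revision."""
    page_title, _editor = revision
    yield page_title, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts collected for one page."""
    return key, sum(values)


def revisions_per_page(revisions: Iterable[Revision]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over an in-memory list."""
    emitted = (pair for rev in revisions for pair in map_revision(rev))
    return dict(reduce_counts(k, v) for k, v in shuffle(emitted).items())
```

Because each map call looks at one revision and each reduce call looks at one key, both steps can be spread across many machines, which is what makes the weeks-to-hours speedup plausible for a dataset of this size.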

Page 17:

Types of work

Direct work: Editing articles

Indirect work: User talk, creating policy

Maintenance work: Reverts, vandalism

Page 18:

Less direct work

• Decrease in proportion of edits to article page

[Figure: edit proportion (y-axis, 0.5 to 1) by year, 2001 to 2006; annotated value: 70%]

Page 19:

More indirect work

• Increase in proportion of edits to user talk

[Figure: edit proportion (y-axis, 0 to 0.2) by year, 2001 to 2006; annotated value: 8%]

Page 20:

More indirect work

• Increase in proportion of edits to user talk

• Increase in proportion of edits to policy pages

[Figure: edit proportion (y-axis, 0 to 0.2) by year, 2001 to 2006; annotated value: 11%]

Page 21:

More maintenance work

• Increase in proportion of edits that are reverts

[Figure: edit proportion (y-axis, 0 to 0.2) by year, 2001 to 2006; annotated value: 7%]
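One common heuristic for counting reverts, offered here as an assumption rather than as the method behind the figure, is to treat a revision as a revert when its text is identical to an earlier state of the same page. The sketch below compares SHA-256 digests of revision text; the function name and input format are illustrative.

```python
import hashlib
from typing import List


def find_identity_reverts(revision_texts: List[str]) -> List[int]:
    """Return indices of revisions that restore an earlier state of the page.

    revision_texts holds one page's history in chronological order.
    """
    seen_digests = set()
    reverts = []
    previous_digest = None
    for index, text in enumerate(revision_texts):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Text identical to the immediately preceding revision is a null edit,
        # not a revert, so it is skipped.
        if digest in seen_digests and digest != previous_digest:
            reverts.append(index)
        seen_digests.add(digest)
        previous_digest = digest
    return reverts


def revert_proportion(revision_texts: List[str]) -> float:
    """Share of a page's revisions that are identity reverts."""
    if not revision_texts:
        return 0.0
    return len(find_identity_reverts(revision_texts)) / len(revision_texts)
```

This catches only exact restorations; partial or manual undoing of a change would need a different signal, such as edit-summary text.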

Page 22:

More wasted work

• Increase in proportion of edits that are reverts

• Increase in proportion of edits reverting vandalism

[Figure: edit proportion (y-axis, 0 to 0.03) by year, 2001 to 2005; annotated value: 1-2%]

Page 23:

Global level

• Coordination costs are growing
  – Less direct work (articles)
  + More indirect work (article talk, user, procedure)
  + More maintenance work (reverts, vandalism)

Kittur, Suh, Pendleton, & Chi, 2007
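The trend summarized above amounts to computing, for each year, what share of edits falls into each type of work. A minimal sketch follows, assuming each edit is tagged with a year and a page namespace. The namespace-to-work-type mapping is a guess based on the groupings named on the slide, and maintenance work is left out because reverts cut across namespaces.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Assumed mapping from page namespace to work type; the talk does not give the exact rule.
WORK_TYPE_BY_NAMESPACE = {
    "Article": "direct",
    "Talk": "indirect",
    "User": "indirect",
    "User talk": "indirect",
    "Wikipedia": "indirect",  # policy and procedure pages
}


def work_type_proportions(edits: Iterable[Tuple[int, str]]) -> Dict[int, Dict[str, float]]:
    """edits: (year, namespace) pairs.

    Returns {year: {work_type: share of that year's edits}}.
    Namespaces missing from the mapping are counted as "other".
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, namespace in edits:
        counts[year][WORK_TYPE_BY_NAMESPACE.get(namespace, "other")] += 1
    return {
        year: {kind: n / sum(by_type.values()) for kind, n in by_type.items()}
        for year, by_type in counts.items()
    }
```

Fed with the full edit history, a table like this would reproduce the kind of per-year curves shown in the preceding figures.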

Page 24:

Research question

How does coordination impact quality?

Page 25:

Coordination types

• Explicit coordination
  – Direct communication among editors planning and discussing article
• Implicit coordination
  – Division of labor and workgroup structure
  – Concentrating work in core group of editors

Leavitt, 1951; March & Simon, 1958; Malone, 1987; Rouse et al., 1992; Thompson, 1967

Page 26:

Explicit coordination: “Music of Italy”

planning

Page 27:

Explicit coordination: “Music of Italy”

coverage

Page 28:

Explicit coordination: “Music of Italy”

readability

Page 29:

Coordination types

• Explicit coordination
  – Direct communication among editors planning and discussing article
• Implicit coordination
  – Division of labor and workgroup structure
  – Concentrating work in core group of editors

Leavitt, 1951; March & Simon, 1958; Malone, 1987; Rouse et al., 1992; Thompson, 1967

Page 30:

Implicit coordination: “Music of Italy”

Page 31:

Implicit coordination: “Music of Italy”

TUF-KAT: Set scope and structure

Page 32:

Implicit coordination: “Music of Italy”

Filling in by many contributors

Page 33:

Implicit coordination: “Music of Italy”

Restructuring by Jeffmatt

Page 34:

Research question

• What factors lead to improved quality?
  – More contributors
  – Explicit coordination
    • Number of communication edits
  – Implicit coordination
    • Concentration among editors

Page 35:

Measuring concentration

• If an article has 100 edits and 10 editors, it could have:
  – 10 editors making 10 edits each

Page 36:

Measuring concentration

• If an article has 100 edits and 10 editors, it could have:
  – 10 editors making 10 edits each
  – 1 editor making 90 edits

Page 37:

Measuring concentration

• If an article has 100 edits and 10 editors, it could have:
  – 10 editors making 10 edits each
  – 1 editor making 90 edits

• Measure concentration with Gini coefficient

Page 38:

Measuring concentration

• If an article has 100 edits and 10 editors, it could have:
  – 10 editors making 10 edits each
  – 1 editor making 90 edits

• Measure concentration with Gini coefficient

Gini = 0

Page 39:

Measuring concentration

• If an article has 100 edits and 10 editors, it could have:
  – 10 editors making 10 edits each
  – 1 editor making 90 edits

• Measure concentration with Gini coefficient

Gini = 0 Gini ~ 1
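The Gini coefficient for an article can be computed from its per-editor edit counts. This is a minimal sketch, assuming the standard rank-based formula over sorted counts, since the talk does not say which estimator was used. The function name is illustrative, and the way the remaining 10 edits are split among the other 9 editors in the usage example is made up.

```python
from typing import Sequence


def gini(edit_counts: Sequence[float]) -> float:
    """Gini coefficient of per-editor edit counts.

    0 means every editor made the same number of edits; values near 1 mean
    the edits are concentrated in very few editors.
    """
    counts = sorted(edit_counts)
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n, ranks 1..n over sorted x
    weighted = sum(rank * x for rank, x in enumerate(counts, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    even = [10] * 10                               # 10 editors, 10 edits each
    concentrated = [90, 2, 1, 1, 1, 1, 1, 1, 1, 1]  # assumed split of the other 10 edits
    print(gini(even), gini(concentrated))
```

With this formula and only 10 editors, the value cannot exceed 0.9 (one editor making every edit), so "Gini ~ 1" is approached only as the number of editors grows.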

Page 40:

Measuring quality

• Wikipedia 1.0 quality assessment scale
  – Over 900,000 assessments
  – 6 classes of quality, from “Stub” up to “Featured”
  – Top 3 classes require increasingly rigorous peer review
• Validated community assessments with non-expert judges (r = .54***)

Page 41:

Analysis

Page 42:

Analysis

Page 43:

Analysis

Page 44:

Editors + coordination

1. Editors: no effect on quality
2. Communication: increase in quality
3. Concentration: increase in quality

Page 45:

Communication x Editors

• Communication does not scale to the crowd
  – Effective with few editors
  – Ineffective with many editors

Page 46:

Concentration x Editors

• Concentration enables effective harnessing of the crowd
  – High concentration: more editors increase quality
  – Low concentration: more editors reduce quality

Page 47:

Summary

• Wikipedia includes large degree of coordination

• Adding more editors does not improve quality
  – Coordination between editors is critical
• Type of coordination is important
  – Communication does not scale to large groups
  – Concentration does scale to large groups

Page 48:

TOOLS FOR SOCIAL COLLABORATION

Page 49:

Profits and perils of user-generated content

• Content in Wikipedia can be added or changed by anyone

• Because of this, it has become one of the most important information resources on the web
  – Top 10 most popular websites (Alexa.com)
  – Millions of contributors
• Also causes problems
  – Conflict between contributors
  – Unknown trustworthiness

Page 50:

Denning et al. (2005)

• Risks with using Wikipedia
  – Accuracy of content
  – Motives of editors
  – Expertise of editors
  – Stability of article
  – Coverage of topics
  – Quality of cited information

Insufficient information to evaluate trustworthiness

Page 51:

History flow

Page 52:

Details

Page 53:

Vandalism

Page 54:

Anonymous contribution

Microsoft: many anonymous contributors

Brazil: few anonymous contributors

Page 55:

Edit war

Page 56:

Conflict at the user level

• How can we identify conflict between users?

Kittur et al., 2007; Suh et al., 2007; Brandes & Lerner, 2008
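One possible signal of conflict between users is how often two users revert each other. The sketch below is an assumption for illustration and not a reproduction of the cited methods: it takes hypothetical (reverting_user, reverted_user) records and ranks pairs who have reverted each other in both directions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def mutual_revert_pairs(
    reverts: Iterable[Tuple[str, str]], min_each_way: int = 1
) -> List[Tuple[str, str, int]]:
    """reverts: (reverting_user, reverted_user) records.

    Returns (user_a, user_b, total_reverts) for pairs who reverted each other
    at least min_each_way times in both directions, most conflicted first.
    """
    directed = Counter((a, b) for a, b in reverts if a != b)
    pairs = []
    for (a, b), a_to_b in directed.items():
        b_to_a = directed.get((b, a), 0)
        # The a < b check reports each unordered pair only once.
        if a < b and a_to_b >= min_each_way and b_to_a >= min_each_way:
            pairs.append((a, b, a_to_b + b_to_a))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

Requiring reverts in both directions separates sustained disputes from one-sided cleanup, such as an established editor repeatedly undoing a vandal.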

Page 57:

Terri Schiavo

Mediators

Sympathetic to parents

Sympathetic to husband

Anonymous (vandals/spammers)

Page 58:

Dokdo/Takeshima opinion groups

Group A

Group B Group C

Group D

Page 59:

Ekstrand & Riedl, 2009

Page 60:

Ekstrand & Riedl (2009)

Page 61:

Ekstrand & Riedl (2009)

Page 62:

Trust

• Numerous studies surface trust-relevant information
  – Editors [Adler & Alfaro, 2007; Dondio et al., 2006; Zeng et al., 2006]
  – Stability [Suh et al., 2008]
  – Conflict [Kittur et al., 2007; Viegas et al., 2004]

• But how much impact can this have on user perceptions in a system which is inherently mutable?

Page 63:

What would make you trust Wikipedia more?

Nothing

Page 64:

What would make you trust Wikipedia more?

“Wikipedia, just by its nature, is impossible to trust completely. I don't think this can necessarily be changed.”

Page 65:

Hypotheses

1. Visualization will impact perceptions of trust

2. Compared to baseline, visualization will impact trust both positively and negatively

3. Visualization should have most impact when high uncertainty about article
  • Low quality
  • High controversy

Page 66:

Design

• 3 x 2 x 2 design

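Articles (quality x controversy):

• High quality, controversial: Abortion; George Bush
• High quality, uncontroversial: Volcano; Shark
• Low quality, controversial: Pro-life feminism; Scientology and celebrities
• Low quality, uncontroversial: Disk defragmenter; Beeswax

Visualization:

• High trust
• Low trust
• Baseline (none)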
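The 3 x 2 x 2 design crosses three visualization conditions with two levels of controversy and two levels of quality, which gives 12 cells. A minimal sketch that enumerates them; the factor labels are taken from the slide, while the variable names are illustrative.

```python
from itertools import product

# Factor levels as named on the slide.
VISUALIZATION = ("high trust", "low trust", "baseline")
CONTROVERSY = ("controversial", "uncontroversial")
QUALITY = ("high quality", "low quality")

# Every combination of one level from each factor.
conditions = list(product(VISUALIZATION, CONTROVERSY, QUALITY))

if __name__ == "__main__":
    for visualization, controversy, quality in conditions:
        print(f"{visualization:10s} | {controversy:15s} | {quality}")
    print(len(conditions), "cells")
```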

Page 67:

Method

• Users recruited via Amazon’s Mechanical Turk
  – 253 participants
  – 673 ratings
  – 7 cents per rating
  – Kittur, Chi, & Suh, CHI 2008: Crowdsourcing user studies

Page 68:

Example: High trust visualization

Page 69:

Example: Low trust visualization

Page 70:

Summary info: Editor

• % from anonymous users

Page 71:

Summary info: Editor

• % from anonymous users

• Last change by anonymous or established user
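Both editor summaries can be derived from the list of editor names in an article's history. This is a minimal sketch under two assumptions: that "%" refers to the share of edits, and that an editor whose name parses as an IP address is anonymous, which is how Wikipedia has identified unregistered editors. It does not attempt to judge whether a registered user is "established". The function names are illustrative.

```python
import ipaddress
from typing import Dict, List, Union


def is_anonymous(editor_name: str) -> bool:
    """Assumption: an editor whose name parses as an IP address is anonymous."""
    try:
        ipaddress.ip_address(editor_name)
        return True
    except ValueError:
        return False


def editor_summary(editors_in_order: List[str]) -> Dict[str, Union[float, bool]]:
    """editors_in_order: the editor name for each edit, oldest first."""
    if not editors_in_order:
        raise ValueError("need at least one edit")
    anonymous_flags = [is_anonymous(name) for name in editors_in_order]
    return {
        "percent_anonymous": 100.0 * sum(anonymous_flags) / len(anonymous_flags),
        "last_change_anonymous": anonymous_flags[-1],
    }
```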

Page 72:

Summary info: Stability

• Stability of words

Page 73:

Summary info: Stability

• Instability

Page 74:

Summary info: Conflict

• Instability
• Conflict

Page 75:

Results

1. Significant effect of visualization
  – High > low, p < .001
2. Both positive and negative effects
  – High > baseline, p < .001
  – Low < baseline, p < .01
3. No effect of article uncertainty
  – No interaction of visualization with either quality or controversy
  – Robust across conditions

Page 76:

Results

1. Significant effect of visualization
  – High > low, p < .001
2. Both positive and negative effects
  – High > baseline, p < .001
  – Low < baseline, p < .01
3. No effect of article uncertainty
  – No interaction of visualization with either quality or controversy
  – Robust across conditions

Page 77:

Results

1. Significant effect of visualization
  – High > low, p < .001
2. Both positive and negative effects
  – High > baseline, p < .001
  – Low < baseline, p < .01
3. No effect of article uncertainty
  – No interaction of visualization with either quality or controversy
  – Robust across conditions
