He Says, She Says: Conflict and Coordination in Wikipedia
Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed Chi
UCLA
Augmented Social Cognition Group
Palo Alto Research Center
What is Wikipedia?
“Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you’re getting the best possible information.”
– Steve Carell, The Office
Spreading conflict
Policy and procedure
“The degree of success that one meets in dealing with conflicts... often depends on the efficiency with which one can quote policy and precedent.”
– Wikipedia admin (survey data)
Collaborative work beneath the surface
• Visitors only look at article pages
• But much of Wikipedia is comprised of other pages
– Conflict resolution, coordination, policies and procedures
Characterizing coordination and conflict
Exponential growth
Costs of growth
• Increase in conflict and coordination costs
– Software development (Boehm, 1981; Brooks, 1975)
– MUDs/MOOs (Curtis, 1992; Dibbell, 1993)
– Mailing lists (Sproull & Kiesler, 1991)
• How has growth affected Wikipedia?
– Millions of new users and articles
Infrastructure
• Analyze entire history of Wikipedia
– Every edit to every article
• Large amount of data (as of June 2006)
– 4+ million pages
– 58+ million revisions
– 800+ GB
• Distributed processing
– Hadoop distributed filesystem
– Map/reduce to process data in parallel
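The slides name Hadoop and map/reduce but give no job details. As a minimal single-process sketch of the map/reduce pattern, the following counts edits per (year, page type). The record layout, the sample records, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical revision records: (page_title, page_type, year).
REVISIONS = [
    ("Dokdo", "article", 2005),
    ("Talk:Dokdo", "article talk", 2005),
    ("Dokdo", "article", 2006),
    ("User talk:Example", "user talk", 2006),
    ("Dokdo", "article", 2006),
]

def map_revision(revision):
    """Map step: emit one ((year, page_type), 1) pair per revision."""
    _title, page_type, year = revision
    yield (year, page_type), 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)

def run_job(revisions):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_revision(r) for r in revisions)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

edit_counts = run_job(REVISIONS)
```

In a real cluster the map and reduce steps would run in parallel over splits of the revision dump; only the per-key logic would stay the same.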
Types of work
• Direct work: immediately consumable (article pages)
• Indirect work: coordination, conflict (talk, user, procedure pages)
• Maintenance work: reverts, vandalism
Less direct work
• Decrease in proportion of edits to article page
[Figure: edit proportion by year, 2001-2006; annotated value: 70%]
More indirect work
• Increase in proportion of edits to user talk
[Figure: edit proportion by year, 2001-2006; annotated value: 8%]
• Increase in proportion of edits to procedure
[Figure: edit proportion by year, 2001-2006; annotated value: 11%]
More maintenance work
• Increase in proportion of edits that are reverts
[Figure: edit proportion by year, 2001-2006; annotated value: 7%]
More wasted work
• Increase in proportion of edits that are reverts
• Increase in proportion of edits reverting vandalism
[Figure: edit proportion by year, 2001-2005; annotated value: 1-2%]
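The slides do not say how reverts were identified. One simple rule, assumed here for illustration only, treats a revision as a revert when its text is identical to an earlier, non-adjacent revision of the same page. The function name and the sample history are hypothetical.

```python
import hashlib

def find_reverts(revision_texts):
    """Return indices of revisions whose text matches an earlier revision.

    Assumption: a revert restores a previous version exactly, so equal
    content hashes mark a revert. This is an illustrative rule, not
    necessarily the one used in the study.
    """
    seen = set()
    reverts = []
    previous = None
    for index, text in enumerate(revision_texts):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Ignore null edits that only repeat the immediately preceding text.
        if digest in seen and digest != previous:
            reverts.append(index)
        seen.add(digest)
        previous = digest
    return reverts

history = ["stub", "stub plus facts", "VANDALISM", "stub plus facts", "more facts"]
revert_indices = find_reverts(history)
revert_proportion = len(revert_indices) / len(history)
```

Dividing the number of reverts by the number of edits in a given year gives the kind of proportion plotted on these slides.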
Global level
• Conflict and coordination costs are growing
– Less direct work (articles)
+ More indirect work (article talk, user, procedure)
+ More maintenance work (reverts, vandalism)
[Figure: percentage of total edits by year, 2001-2006, broken down by Article, User, Article Talk, User Talk, Other, Maintenance]
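The yearly breakdown can be sketched as a small aggregation. The mapping from page type to work type follows the categories named on the slides, but the rule that any revert counts as maintenance, the function name, and the toy sample are assumptions.

```python
from collections import Counter, defaultdict

# Page type to work type, following the categories named on the slides.
WORK_TYPE = {
    "article": "direct",
    "article talk": "indirect",
    "user": "indirect",
    "user talk": "indirect",
    "procedure": "indirect",
}

def work_proportions(edits):
    """edits: iterable of (year, page_type, is_revert) tuples.

    Returns {year: {work_type: share of that year's edits}}.
    Assumption: a revert counts as maintenance whatever page it touches.
    """
    counts = defaultdict(Counter)
    for year, page_type, is_revert in edits:
        kind = "maintenance" if is_revert else WORK_TYPE.get(page_type, "other")
        counts[year][kind] += 1
    return {
        year: {kind: n / sum(c.values()) for kind, n in c.items()}
        for year, c in counts.items()
    }

# Toy sample, not real Wikipedia data.
sample = [
    (2001, "article", False), (2001, "article", False),
    (2006, "article", False), (2006, "article", False),
    (2006, "user talk", False), (2006, "article", True),
]
shares = work_proportions(sample)
```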
Characterizing coordination and conflict
Conflict at the article level
• What defines conflict in articles?
• Build a characterization model of article conflict
– Identify page features and metrics associated with conflict
– Automatically identify high-conflict articles
Page metrics
• Chose metrics for identifying conflict in articles
– Easily computable, scalable

Metric type                      Page type
Revisions (#)                    Article, talk, article/talk
Page length                      Article, talk, article/talk
Unique editors                   Article, talk, article/talk
Unique editors / revisions       Article, talk
Links from other articles        Article, talk
Links to other articles          Article, talk
Anonymous edits (#, %)           Article, talk
Administrator edits (#, %)       Article, talk
Minor edits (#, %)               Article, talk
Reverts (#, by unique editors)   Article
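As a rough sketch of how several of these metrics could be computed from one page's revision history: the `Revision` record, its fields, and the output key names are hypothetical, and link and administrator metrics are left out because they need data beyond the revision list.

```python
from dataclasses import dataclass

@dataclass
class Revision:
    editor: str
    is_anonymous: bool
    is_minor: bool
    is_revert: bool
    length: int  # page length in characters after this revision

def page_metrics(revisions):
    """Compute a subset of the listed metrics for one page's history."""
    n = len(revisions)
    editors = {r.editor for r in revisions}
    anonymous = sum(r.is_anonymous for r in revisions)
    minor = sum(r.is_minor for r in revisions)
    return {
        "revisions": n,
        "page_length": revisions[-1].length if revisions else 0,
        "unique_editors": len(editors),
        "unique_editors_per_revision": len(editors) / n if n else 0.0,
        "anonymous_edits": anonymous,
        "anonymous_edits_pct": 100.0 * anonymous / n if n else 0.0,
        "minor_edits": minor,
        "minor_edits_pct": 100.0 * minor / n if n else 0.0,
        "reverts": sum(r.is_revert for r in revisions),
        "reverting_editors": len({r.editor for r in revisions if r.is_revert}),
    }

# Toy history, not real data.
history = [
    Revision("alice", False, False, False, 1200),
    Revision("10.0.0.1", True, False, False, 1250),
    Revision("alice", False, False, True, 1200),
    Revision("bob", False, True, False, 1210),
]
metrics = page_metrics(history)
```

Each metric needs only one pass over the history, which fits the "easily computable, scalable" criterion.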
Defining conflict
• Operational definition for conflict
• Revisions tagged controversial
• Conflict revision count
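A conflict revision count of this kind can be sketched as a count of revisions that carry a controversy tag. The tag string `{{controversial}}`, the function name, and the sample history are assumptions; the slides only say that revisions were tagged controversial.

```python
def controversial_revision_count(revision_texts, tag="{{controversial}}"):
    """Count revisions whose wikitext carries the controversy tag.

    The tag string is an assumption for illustration; matching is
    case-insensitive.
    """
    needle = tag.lower()
    return sum(1 for text in revision_texts if needle in text.lower())

# Toy history, not real data.
talk_history = [
    "Initial discussion.",
    "{{Controversial}} Please discuss changes first.",
    "{{controversial}} Ongoing dispute about naming.",
    "Dispute resolved, tag removed.",
]
conflict_count = controversial_revision_count(talk_history)
```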
Machine learning
• Predict conflict from page metrics
– Training set of “controversial” pages
– Support vector machine regression predicting # controversial revisions (SMOreg; Smola & Scholkopf, 1998)
• Not just conflict/no conflict, but how much conflict
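For reference, the standard linear ε-insensitive support vector regression problem is shown below. Here x_i would be the vector of page metrics for article i and y_i its number of controversial revisions. The slides do not state the kernel or the values of C and ε, so this is the textbook form and not a description of the exact model that was trained.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad
  & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{subject to} \quad
  & y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \\
  & \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
  & \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

Errors smaller than ε cost nothing, and C trades off flatness of the fitted function against larger deviations.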
Performance: Cross-validation
• 5x cross-validation, R² = 0.897
[Figure: scatter plot of actual vs. predicted controversial revisions; both axes 0-10,000]
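The evaluation procedure can be sketched as follows: hold out each fold in turn, predict it from a model trained on the rest, then compute R² over all held-out predictions. A one-variable least-squares line stands in for the SVM regression here so the sketch stays dependency-free. The fold assignment, the function names, and the toy data are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b (a stand-in for the SVM regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot

def cross_validated_r2(xs, ys, folds=5):
    """Predict every point from a model trained on the other folds."""
    predictions = [0.0] * len(xs)
    for fold in range(folds):
        test_idx = [i for i in range(len(xs)) if i % folds == fold]
        train_idx = [i for i in range(len(xs)) if i % folds != fold]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = a * xs[i] + b
    return r_squared(ys, predictions)

# Toy data: talk-page revisions (x) vs. controversial revisions (y).
talk_revisions = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
controversial = [3 * x + 5 for x in talk_revisions]
score = cross_validated_r2(talk_revisions, controversial)
```

Because every prediction comes from a model that never saw that article, the resulting R² estimates performance on unseen pages.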
Determinants of conflict
Highly weighted metrics of conflict model:
1. ↑ Revisions (talk)
2. ↑ Minor edits (talk)
3. ↓ Unique editors (talk)
4. ↑ Revisions (article)
5. ↓ Unique editors (article)
6. ↑ Anonymous edits (talk)
7. ↓ Anonymous edits (article)
Identifying untagged articles
• Detect conflicts for unlabeled articles
– Majority of articles have never been conflict tagged
• Testing model generalization
– Applied model to untagged articles
– Sample rated by expert Wikipedians
• Significant positive correlation with predicted scores
– By rank correlation, p < 0.013 (Spearman’s rho)
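Spearman's rho is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. A minimal sketch follows; the predicted scores and expert ratings are made-up numbers, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Made-up example: model scores vs. expert conflict ratings for five articles.
predicted_scores = [120.0, 15.0, 300.0, 42.0, 8.0]
expert_ratings = [4, 2, 5, 2, 1]
rho = spearman_rho(predicted_scores, expert_ratings)
```

A rank correlation suits this check because expert ratings are ordinal and need not scale linearly with predicted revision counts.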
Characterizing coordination and conflict
Conflict at the user level
• How can we identify conflict between users?
• Reverts as a proxy for user conflict
• Revert patterns between users
• Force directed layout to cluster users
– Group similar viewpoints
– Find conflicts between groups
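One way such a layout could work is sketched below. The force model is an assumption, not taken from the slides: each pair of users who revert each other repels in proportion to the revert count, and every user is pulled toward the origin so the layout stays bounded. Users who are pushed away by the same opponents then end up close together. All names and the toy revert counts are hypothetical.

```python
import math

def layout_by_reverts(users, reverts, iterations=500, step=0.05):
    """Place users in 2D so that users who revert each other drift apart.

    users: list of user names; reverts: {(user_a, user_b): count}.
    Assumed force model (illustrative only): repulsion along each revert
    edge, plus a pull toward the origin on every user.
    """
    n = len(users)
    # Deterministic start: users evenly spaced on the unit circle.
    pos = {
        u: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, u in enumerate(users)
    }
    for _ in range(iterations):
        # Pull toward the center.
        force = {u: [-pos[u][0], -pos[u][1]] for u in users}
        # Repulsion between users linked by reverts.
        for (a, b), count in reverts.items():
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist_sq = dx * dx + dy * dy + 1e-9
            fx = count * dx / dist_sq
            fy = count * dy / dist_sq
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy
        for u in users:
            pos[u][0] += step * force[u][0]
            pos[u][1] += step * force[u][1]
    return pos

def distance(pos, a, b):
    """Euclidean distance between two placed users."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])

# Toy example: a1 and a2 each revert b1 and b2, but never each other.
users = ["a1", "a2", "b1", "b2"]
reverts = {("a1", "b1"): 1, ("a1", "b2"): 1, ("a2", "b1"): 1, ("a2", "b2"): 1}
positions = layout_by_reverts(users, reverts)
```

In this toy case a1 and a2 converge on one side and b1 and b2 on the other, which is the grouping of similar viewpoints the slide describes.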
Dokdo/Takeshima opinion groups
[Figure: user clusters labeled Group A, Group B, Group C, Group D]
Terri Schiavo
[Figure: user clusters labeled Mediators, Sympathetic to parents, Sympathetic to husband, Anonymous (vandals/spammers)]
Summary: Characterizing Wikipedia
• Coordination costs and conflict are increasing
• Global-level: Trend identification
– Decrease in direct article work
– Increase in indirect coordination work
– Increase in maintenance work
• Article-level: Prediction using machine learning
– Identify characteristics of article conflict
– Detect conflict-heavy articles needing extra attention
• User-level: User conflict visualization
– Make sense of user conflicts and identify shared viewpoints
Future Work
• Applied to many domains
– Corporate memory (Socialtext)
– Intelligence gathering (Intellipedia)
– Scholarly research (Scholarpedia)
– Collaborative problem solving (Lostpedia)
• Application: Social Dashboard
– Identify high conflict articles
– Surface editing patterns to readers
– Route attention to articles that need it most
Thank you!