CHI2007 talk on Conflicts in Wikipedia
Transcript of CHI2007 talk on Conflicts in Wikipedia
![Page 1: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/1.jpg)
He Says, She Says: Conflict and Coordination in Wikipedia
Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed Chi
UCLA
Augmented Social Cognition Group
Palo Alto Research Center
![Page 2: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/2.jpg)
What is Wikipedia?
“Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you’re getting the best possible information.”
– Steve Carell, The Office
![Page 3: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/3.jpg)
Spreading conflict
![Page 8: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/8.jpg)
Policy and procedure
“The degree of success that one meets in dealing with conflicts... often depends on the efficiency with which one can quote policy and precedent.”
– Wikipedia admin (survey data)
![Page 9: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/9.jpg)
Collaborative work beneath the surface
• Visitors only look at article pages
• But much of Wikipedia is comprised of other pages
– Conflict resolution, coordination, policies and procedures
![Page 10: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/10.jpg)
Characterizing coordination and conflict
![Page 11: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/11.jpg)
Characterizing coordination and conflict
![Page 12: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/12.jpg)
Exponential growth
![Page 13: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/13.jpg)
Costs of growth
• Increase in conflict and coordination costs
– Software development (Boehm, 1981; Brooks, 1975)
– MUDs/MOOs (Curtis, 1992; Dibbell, 1993)
– Mailing lists (Sproull & Kiesler, 1991)
• How has growth affected Wikipedia?
– Millions of new users and articles
![Page 14: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/14.jpg)
Infrastructure
• Analyze entire history of Wikipedia
– Every edit to every article
• Large amount of data (as of June 2006)
– 4+ million pages
– 58+ million revisions
– 800+ GB
• Distributed processing
– Hadoop distributed filesystem
– Map/reduce to process data in parallel
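The map/reduce step can be sketched in a few lines of plain Python. This is a single-machine illustration of the pattern, not the Hadoop job used for the study; the record fields (`timestamp`, `page_type`) and the sample revisions are hypothetical.

```python
from collections import defaultdict

# Hypothetical revision records; the real dump schema is not given in the talk.
REVISIONS = [
    {"timestamp": "2005-03-01T12:00:00Z", "page_type": "article"},
    {"timestamp": "2005-07-19T08:30:00Z", "page_type": "article"},
    {"timestamp": "2005-11-02T17:45:00Z", "page_type": "user_talk"},
    {"timestamp": "2006-01-15T09:10:00Z", "page_type": "article"},
    {"timestamp": "2006-02-20T21:05:00Z", "page_type": "procedure"},
]


def map_revision(rev):
    """Map step: emit one ((year, page_type), 1) pair per revision."""
    year = int(rev["timestamp"][:4])
    yield (year, rev["page_type"]), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)


def count_edits(revisions):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for rev in revisions for pair in map_revision(rev))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


EDIT_COUNTS = count_edits(REVISIONS)
```

Because each revision is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is what makes the pattern a fit for tens of millions of revisions.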
![Page 15: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/15.jpg)
Types of work
Direct work: immediately consumable (article pages)
Indirect work: coordination, conflict (talk, user, procedure pages)
Maintenance work: reverts, vandalism
![Page 16: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/16.jpg)
Less direct work
• Decrease in proportion of edits to article page
[Chart: proportion of edits to article pages by year, 2001 to 2006 (y-axis: edit proportion, 0.5 to 1); annotated value: 70%]
![Page 17: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/17.jpg)
More indirect work
• Increase in proportion of edits to user talk
[Chart: edit proportion by year, 2001 to 2006 (y-axis: 0 to 0.2); annotated value: 8%]
![Page 18: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/18.jpg)
More indirect work
• Increase in proportion of edits to user talk
• Increase in proportion of edits to procedure
[Chart: edit proportion by year, 2001 to 2006 (y-axis: 0 to 0.2); annotated value: 11%]
![Page 19: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/19.jpg)
More maintenance work
• Increase in proportion of edits that are reverts
[Chart: edit proportion by year, 2001 to 2006 (y-axis: 0 to 0.2); annotated value: 7%]
![Page 20: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/20.jpg)
More wasted work
• Increase in proportion of edits that are reverts
• Increase in proportion of edits reverting vandalism
[Chart: edit proportion by year, 2001 to 2005 (y-axis: 0 to 0.03); annotated value: 1-2%]
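The slides do not say how reverts were detected. One common approach, assumed here purely for illustration, is the identity revert: a revision whose text hashes to the same value as an earlier revision of the same page. The function name and sample history below are hypothetical.

```python
import hashlib


def find_identity_reverts(revision_texts):
    """Return indices of revisions that restore an earlier version's text.

    Assumption: a revert is a revision whose text is identical to some
    earlier revision of the same page, detected by comparing hashes.
    """
    seen = set()      # digests of all earlier revisions
    previous = None   # digest of the immediately preceding revision
    reverts = []
    for i, text in enumerate(revision_texts):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Same text as an older revision, but not a no-op repeat of the last one.
        if digest in seen and digest != previous:
            reverts.append(i)
        seen.add(digest)
        previous = digest
    return reverts


# Made-up page history: revision 3 restores the text of revision 1.
HISTORY = ["A", "A B", "VANDAL", "A B", "A B C", "A B C"]
REVERTS = find_identity_reverts(HISTORY)
```

Dividing the number of detected reverts by the total number of edits in a year gives the kind of edit proportion plotted on these slides.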
![Page 21: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/21.jpg)
Global level
• Conflict and coordination costs are growing
– Less direct work (articles)
+ More indirect work (article talk, user, procedure)
+ More maintenance work (reverts, vandalism)
[Chart: percentage of total edits by year, 2001 to 2006 (y-axis: 60% to 100%); series: Article, User, Article Talk, User Talk, Other, Maintenance]
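A breakdown like this one can be sketched as a per-year share of edits by kind of work. The mapping from page type to work type follows the slide; treating every revert as maintenance regardless of the page it touches is an assumption, and the sample edits are made up.

```python
from collections import Counter

# Page type to work type, following the slide's grouping.
WORK_TYPE = {
    "article": "direct",
    "article_talk": "indirect",
    "user": "indirect",
    "user_talk": "indirect",
    "procedure": "indirect",
}


def work_proportions(edits):
    """Return {year: {work_type: share of that year's edits}}.

    `edits` is an iterable of (year, page_type, is_revert) tuples.
    Assumption: reverts count as maintenance whatever page they touch.
    """
    totals = Counter()
    by_type = Counter()
    for year, page_type, is_revert in edits:
        kind = "maintenance" if is_revert else WORK_TYPE.get(page_type, "other")
        totals[year] += 1
        by_type[(year, kind)] += 1
    shares = {}
    for (year, kind), n in by_type.items():
        shares.setdefault(year, {})[kind] = n / totals[year]
    return shares


# Made-up sample, not data from the study.
SAMPLE = [
    (2001, "article", False), (2001, "article", False),
    (2001, "article", False), (2001, "article_talk", False),
    (2006, "article", False), (2006, "article", False),
    (2006, "user_talk", False), (2006, "article", True),
]
SHARES = work_proportions(SAMPLE)
```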
![Page 22: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/22.jpg)
Characterizing coordination and conflict
![Page 23: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/23.jpg)
Conflict at the article level
• What defines conflict in articles?
• Build a characterization model of article conflict
– Identify page features and metrics associated with conflict
– Automatically identify high-conflict articles
![Page 24: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/24.jpg)
Page metrics
• Chose metrics for identifying conflict in articles
– Easily computable, scalable

| Metric type | Page type |
| --- | --- |
| Revisions (#) | Article, talk, article/talk |
| Page length | Article, talk, article/talk |
| Unique editors | Article, talk, article/talk |
| Unique editors / revisions | Article, talk |
| Links from other articles | Article, talk |
| Links to other articles | Article, talk |
| Anonymous edits (#, %) | Article, talk |
| Administrator edits (#, %) | Article, talk |
| Minor edits (#, %) | Article, talk |
| Reverts (#, by unique editors) | Article |
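Most of these metrics are simple counts over a page's revision history, which is what makes them cheap to compute at scale. A minimal sketch for a few of them follows; the revision fields (`editor`, `is_anonymous`, `is_minor`) are hypothetical names, not the schema used in the study.

```python
def page_metrics(revisions):
    """Compute a few of the listed metrics for one page.

    `revisions` is a list of dicts with hypothetical keys:
    editor (str), is_anonymous (bool), is_minor (bool).
    """
    n = len(revisions)
    editors = {rev["editor"] for rev in revisions}
    anon = sum(1 for rev in revisions if rev["is_anonymous"])
    minor = sum(1 for rev in revisions if rev["is_minor"])
    return {
        "revisions": n,
        "unique_editors": len(editors),
        "unique_editors_per_revision": len(editors) / n if n else 0.0,
        "anonymous_edits": anon,
        "anonymous_edit_pct": 100.0 * anon / n if n else 0.0,
        "minor_edits": minor,
        "minor_edit_pct": 100.0 * minor / n if n else 0.0,
    }


# Made-up talk-page history with four revisions by three editors.
TALK_PAGE = [
    {"editor": "Alice", "is_anonymous": False, "is_minor": False},
    {"editor": "192.0.2.7", "is_anonymous": True, "is_minor": False},
    {"editor": "Alice", "is_anonymous": False, "is_minor": True},
    {"editor": "Bob", "is_anonymous": False, "is_minor": False},
]
METRICS = page_metrics(TALK_PAGE)
```

Running the same function on the article page and on its talk page yields the paired article and talk features listed in the table.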
![Page 25: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/25.jpg)
Defining conflict
• Operational definition for conflict
• Revisions tagged controversial
• Conflict revision count
![Page 26: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/26.jpg)
Machine learning
• Predict conflict from page metrics
– Training set of “controversial” pages
– Support vector machine regression predicting # controversial revisions (SMOreg; Smola & Scholkopf, 1998)
• Not just conflict/no conflict, but how much conflict
![Page 27: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/27.jpg)
Performance: Cross-validation
• 5x cross-validation, R² = 0.897
[Scatter plot: actual controversial revisions (y-axis, 0 to 10,000) against predicted controversial revisions (x-axis, 0 to 10,000)]
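The evaluation procedure, 5-fold cross-validation scored by R², can be sketched without any machine-learning library. Note the substitution: a one-feature least-squares line stands in for the SVM regression (SMOreg) used in the talk, and the data are made up, so only the cross-validation and scoring logic is illustrated.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for SVM regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_val_predict(xs, ys, k=5):
    """Predict every point from a model trained on the other k-1 folds."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            preds[i] = a + b * xs[i]
    return preds


def r_squared(actual, predicted):
    """1 minus residual sum of squares over total sum of squares."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Made-up, perfectly linear data: talk revisions vs. controversial revisions.
TALK_REVISIONS = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
CONTROVERSIAL = [2 * x + 5 for x in TALK_REVISIONS]
PREDICTED = cross_val_predict(TALK_REVISIONS, CONTROVERSIAL, k=5)
R2 = r_squared(CONTROVERSIAL, PREDICTED)
```

Each point is predicted by a model that never saw it during training, so the pooled R² measures how well the model generalizes rather than how well it fits.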
![Page 29: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/29.jpg)
Determinants of conflict
Highly weighted metrics of conflict model:
1. ↑ Revisions (talk)
2. ↑ Minor edits (talk)
3. ↓ Unique editors (talk)
4. ↑ Revisions (article)
5. ↓ Unique editors (article)
6. ↑ Anonymous edits (talk)
7. ↓ Anonymous edits (article)
![Page 30: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/30.jpg)
Identifying untagged articles
• Detect conflicts for unlabeled articles
– Majority of articles have never been conflict tagged
• Testing model generalization
– Applied model to untagged articles
– Sample rated by expert Wikipedians
• Significant positive correlation with predicted scores
– By rank correlation, p < 0.013 (Spearman’s rho)
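Spearman's rho compares the ordering of two lists rather than their raw values: it is Pearson's correlation computed on ranks. A minimal sketch follows; the scores and ratings are hypothetical, and the significance test behind the p-value is not shown.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical model scores and expert ratings for five untagged articles.
PREDICTED_SCORES = [120.0, 15.0, 300.0, 42.0, 80.0]
EXPERT_RATINGS = [4, 1, 5, 3, 2]
RHO = spearman_rho(PREDICTED_SCORES, EXPERT_RATINGS)
```

A rank-based measure suits this comparison because expert ratings and predicted revision counts are on different scales, and only their agreement in ordering matters.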
![Page 31: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/31.jpg)
Characterizing coordination and conflict
![Page 32: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/32.jpg)
Conflict at the user level
• How can we identify conflict between users?
• Reverts as a proxy for user conflict
• Revert patterns between users
• Force directed layout to cluster users
– Group similar viewpoints
– Find conflicts between groups
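The idea of reading opinion groups out of revert patterns can be sketched as follows. This is not the force-directed layout from the talk: as a much simpler stand-in, it builds the revert graph and splits users into two camps by alternating sides along revert edges, assuming exactly two camps and that reverts mostly cross camp lines. User names and events are hypothetical.

```python
from collections import Counter, deque


def revert_counts(reverts):
    """Count reverts between each unordered pair of distinct users."""
    counts = Counter()
    for reverter, reverted in reverts:
        if reverter != reverted:
            counts[frozenset((reverter, reverted))] += 1
    return counts


def opinion_groups(reverts):
    """Assign each user to side 0 or 1 so that revert edges cross sides.

    Simplification: breadth-first two-coloring of the revert graph,
    used here in place of a force-directed layout.
    """
    neighbours = {}
    for pair in revert_counts(reverts):
        a, b = tuple(pair)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    side = {}
    for start in sorted(neighbours):
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            user = queue.popleft()
            for other in sorted(neighbours[user]):
                if other not in side:
                    # Users who revert each other land on opposite sides.
                    side[other] = 1 - side[user]
                    queue.append(other)
    return side


# Hypothetical (reverter, reverted) events on one article.
REVERTS = [("ann", "raj"), ("raj", "ann"), ("bo", "raj"),
           ("sam", "ann"), ("sam", "bo")]
GROUPS = opinion_groups(REVERTS)
```

A force-directed layout generalizes this to more than two groups and to noisy data, since it places users in continuous space instead of forcing a binary split.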
![Page 33: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/33.jpg)
Dokdo/Takeshima opinion groups
Group A
Group B Group C
Group D
![Page 34: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/34.jpg)
Terri Schiavo
Mediators
Sympathetic to parents
Sympathetic to husband
Anonymous (vandals/spammers)
![Page 35: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/35.jpg)
Summary: Characterizing Wikipedia
• Coordination costs and conflict are increasing
• Global-level: Trend identification
– Decrease in direct article work
– Increase in indirect coordination work
– Increase in maintenance work
• Article-level: Prediction using Machine learning
– Identify characteristics of article conflict
– Detect conflict-heavy articles needing extra attention
• User-level: User Conflict Visualization
– Make sense of user conflicts and identify shared viewpoints
![Page 36: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/36.jpg)
Future Work
• Applied to many domains
– Corporate memory (Socialtext)
– Intelligence gathering (Intellipedia)
– Scholarly research (Scholarpedia)
– Collaborative problem solving (Lostpedia)
• Application: Social Dashboard
– Identify high conflict articles
– Surface editing patterns to readers
– Route attention to articles that need it most
![Page 37: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/37.jpg)
Future work
![Page 38: CHI2007 talk on Conflicts in Wikipedia](https://reader035.fdocuments.net/reader035/viewer/2022062419/557d6732d8b42a7c638b45d4/html5/thumbnails/38.jpg)
Thank you!