Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)
![Page 1: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/1.jpg)
Image from: http://www.flickr.com/photos/ourcommon/480538715/
Ed H. Chi, Principal Scientist and Area Manager
Peter Pirolli, Lichan Hong, Bongwon Suh, Les Nelson, Gregorio Convertino, Sharoda Paul
Interns: Sanjay Kairam, Jilin Chen, Brent Hecht, Michael Bernstein. Alumni: Raluca Budiu, Bryan Pendleton, Niki Kittur, Todd Mytkowicz, Terrell Russell, Brynn Evans, Bryan Chan, KMRC students
Augmented Social Cognition Area, Palo Alto Research Center
![Page 2: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/2.jpg)
2010-10-22 IBM NPUC 2010 2
To: [email protected]
From: Brad Barrish <brad@…removed.for.privacy….com>
Subject: Pancreatic cancer
Date: Thu, 1 Feb 2007 21:37:55 PST
Hey Ed. I'm a fellow del.icio.us user and noticed you bookmark a lot of pancreatic cancer stuff. I'm at home with my dad who was diagnosed a little over a year ago and is now at the tail end of things. I've learned a lot through his treatments and about what's out there. I dunno if it's something you or a family member has, but just wanted to drop you an email. Be well.
Brad
![Page 3: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/3.jpg)
Cognition: the ability to remember, think, and reason; the faculty of knowing.
Social Cognition: the ability of a group to remember, think, and reason; the construction of knowledge structures by a group. (Not quite the same as the branch of psychology that studies the cognitive processes involved in social interaction, though that is included.)
Augmented Social Cognition: supported by systems, the enhancement of the ability of a group to remember, think, and reason; the system-supported construction of knowledge structures by a group.
Citation: Chi, IEEE Computer, Sept 2008
![Page 4: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/4.jpg)
Kudos to Todd Mytkowicz and Rowan Nairn
![Page 5: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/5.jpg)
Figure (noisy-channel view of social tagging): Topics/Concepts, Users, Documents, Tags T1…Tn; Encoding, Decoding, Noise
![Page 6: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/6.jpg)
H(Tag) shows tag saturation; H(Doc | Tag) shows browsability
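For reference, these are the standard Shannon definitions behind the two measures, assuming (the slide does not say so) that the distributions over tags and documents are estimated from bookmark counts:

```latex
$$H(\mathrm{Tag}) = -\sum_{t} p(t)\,\log_2 p(t)$$
$$H(\mathrm{Doc} \mid \mathrm{Tag}) = -\sum_{t} p(t) \sum_{d} p(d \mid t)\,\log_2 p(d \mid t)$$
```

Read this way, a growing H(Doc | Tag) means that knowing a tag narrows down the set of candidate documents less, which is one way to interpret reduced browsability.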
![Page 7: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/7.jpg)
I(Doc; Tag): mutual information. Rise in average tags per bookmark
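A minimal sketch of how H(Doc), H(Doc | Tag), and I(Doc; Tag) could be estimated from raw bookmark data. It assumes each bookmark contributes one (document, tag) pair per tag it carries. The function names are hypothetical, and the published analysis may have estimated the distributions differently.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def doc_tag_information(pairs):
    """Return (H(Doc), H(Doc|Tag), I(Doc;Tag)) in bits from (doc, tag) pairs."""
    joint = Counter(pairs)                  # counts of each (doc, tag) pair
    docs = Counter(d for d, _ in pairs)     # marginal counts of documents
    tags = Counter(t for _, t in pairs)     # marginal counts of tags
    h_doc = entropy(docs.values())
    # Chain rule: H(Doc|Tag) = H(Doc, Tag) - H(Tag)
    h_doc_given_tag = entropy(joint.values()) - entropy(tags.values())
    # Mutual information: I(Doc;Tag) = H(Doc) - H(Doc|Tag)
    return h_doc, h_doc_given_tag, h_doc - h_doc_given_tag
```

For example, `doc_tag_information([("d1", "a"), ("d2", "b")])` returns `(1.0, 0.0, 1.0)`: each tag pins down exactly one document, so the tag carries all of H(Doc). If both documents shared one tag, the mutual information would drop to zero.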
![Page 8: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/8.jpg)
Semantic Similarity Graph (tags shown: Guide, Web, Howto, Tips, Help, Tools, Tip, Tricks, Tutorial, Tutorials, Reference)
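The slide does not say how the similarity graph is built. One common construction, sketched here purely as an assumption, represents each tag as a vector of the URLs it was applied to, scores tag pairs by cosine similarity, and keeps edges above a hypothetical `threshold`. Under this construction, tags such as "howto" and "tutorial" become linked when they label the same pages.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

def tag_vectors(bookmarks):
    """Map each tag to a Counter of the URLs it was applied to."""
    vectors = defaultdict(Counter)
    for url, tag in bookmarks:
        vectors[tag][url] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity_graph(bookmarks, threshold=0.5):
    """Return weighted edges {(tag1, tag2): similarity} at or above the threshold."""
    vectors = tag_vectors(bookmarks)
    edges = {}
    for t1, t2 in combinations(sorted(vectors), 2):
        sim = cosine(vectors[t1], vectors[t2])
        if sim >= threshold:
            edges[(t1, t2)] = sim
    return edges
```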
![Page 9: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/9.jpg)
Spreading activation in a bipartite graph (bigraph): computation over a very large data set
– 150 million+ bookmarks
Figure: bigraph of Tags and URLs, linked by P(URL|Tag) and P(Tag|URL)
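A minimal sketch of spreading activation over the tag/URL bigraph. Activation starts at the query tags, flows to URLs weighted by P(URL|Tag), then flows back to tags weighted by P(Tag|URL), for a fixed number of steps. The probabilities are simple relative frequencies, and the step count, function names, and lack of decay are assumptions. A production system over 150M+ bookmarks would distribute this computation rather than run it in memory.

```python
from collections import Counter, defaultdict

def conditional_tables(bookmarks):
    """Estimate P(URL|Tag) and P(Tag|URL) from (url, tag) bookmark pairs."""
    pair_counts = Counter(bookmarks)
    tag_totals = Counter(t for _, t in bookmarks)
    url_totals = Counter(u for u, _ in bookmarks)
    p_url_given_tag, p_tag_given_url = defaultdict(dict), defaultdict(dict)
    for (url, tag), n in pair_counts.items():
        p_url_given_tag[tag][url] = n / tag_totals[tag]
        p_tag_given_url[url][tag] = n / url_totals[url]
    return p_url_given_tag, p_tag_given_url

def spread(query_tags, p_url_given_tag, p_tag_given_url, steps=2):
    """Alternate tag -> URL -> tag propagation; return the final URL activations."""
    tag_act = {t: 1.0 / len(query_tags) for t in query_tags}
    url_act = {}
    for _ in range(steps):
        url_act = defaultdict(float)
        for tag, a in tag_act.items():
            for url, p in p_url_given_tag.get(tag, {}).items():
                url_act[url] += a * p
        tag_act = defaultdict(float)
        for url, a in url_act.items():
            for tag, p in p_tag_given_url.get(url, {}).items():
                tag_act[tag] += a * p
    return dict(url_act)
```

With two steps, a query for "python" also activates a URL tagged only "tutorial" whenever some other URL carries both tags. This reachability through shared tags is what makes the bigraph useful for exploratory search.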
![Page 10: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/10.jpg)
![Page 11: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/11.jpg)
Kudos to Bongwon Suh, Niki Kittur
![Page 12: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/12.jpg)
What drives contributions to Wikipedia?
Conflicts drive most of the contributions to Wikipedia. – How do we measure conflicts?
Conflicts cause coordination costs to go up. – Measuring coordination costs
![Page 13: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/13.jpg)
![Page 14: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/14.jpg)
Legend (editor groups): Mediators; Sympathetic to parents; Sympathetic to husband; Anonymous (vandals/spammers)
![Page 15: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/15.jpg)
Counting ‘Controversial’ labels: 5-fold cross-validation, R² = 0.897
Figure: predicted controversial revisions (x-axis) vs. actual controversial revisions (y-axis), both axes 0 to 10,000
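The slide reports a cross-validated R² but does not show the model or its features. The sketch below is only a stand-in that illustrates the evaluation procedure. It fits an ordinary least-squares line on one hypothetical page metric, assigns folds by index modulo `k` (an assumption), pools the out-of-fold predictions, and scores them with R².

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def cross_validated_r2(xs, ys, k=5):
    """Pool out-of-fold predictions from k folds and score them with R^2."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = a + b * xs[i]  # predicted only by a model that never saw point i
    my = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

Because every prediction comes from a model trained without that data point, an R² near 0.9 under this scheme indicates genuine predictive power rather than overfitting.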
![Page 16: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/16.jpg)
Number of Articles (Log Scale)
http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia’s_growth
![Page 17: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/17.jpg)
Monthly Edits
![Page 18: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/18.jpg)
Monthly Edits
![Page 19: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/19.jpg)
Monthly Active Editors (*in thousands)
![Page 20: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/20.jpg)
Monthly Active Editors (*in thousands)
![Page 21: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/21.jpg)
Preferential Attachment: edits beget edits – the more previous edits, the more new edits
Growth rate depends on the current population: N = current population, r = growth rate of the population
$$\frac{dN}{dt} = rN$$
$$N(t) = N_0 e^{rt}$$
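A small numerical check of the exponential model. If the growth rate is proportional to the current population, the closed form N(t) = N0·e^(rt) should agree with a step-by-step integration of dN/dt = rN. The parameter values are arbitrary illustrations, not values fitted to Wikipedia data.

```python
import math

def exponential_growth(n0, r, t):
    """Closed-form solution N(t) = N0 * e^(r t) of dN/dt = r N."""
    return n0 * math.exp(r * t)

def euler_integrate(n0, rate, t_end, dt=1e-3):
    """Integrate dN/dt = rate(N) from t = 0 to t_end with forward Euler steps."""
    n = n0
    for _ in range(round(t_end / dt)):
        n += rate(n) * dt
    return n

# Illustrative values only: N0 = 100, r = 0.5 per unit time, integrated to t = 2.
closed = exponential_growth(100, 0.5, 2.0)
numeric = euler_integrate(100, lambda n: 0.5 * n, 2.0)
```

The two values agree to within a fraction of a percent. The doubling time is ln(2)/r, and it stays constant no matter how large N gets, which is exactly what the carrying-capacity model on the next slide relaxes.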
![Page 22: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/22.jpg)
Ecological population growth model – Also depends on environmental conditions – K, carrying capacity (due to resource limitation)
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$$
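The logistic equation has the closed-form solution N(t) = K / (1 + ((K − N0)/N0)·e^(−rt)). The sketch below encodes it alongside the right-hand side of the equation. The parameter values in the usage note are illustrative assumptions, not fitted values.

```python
import math

def logistic_growth(n0, r, k, t):
    """Closed-form solution of dN/dt = r N (1 - N/K) with N(0) = n0."""
    return k / (1 + (k - n0) / n0 * math.exp(-r * t))

def logistic_rate(n, r, k):
    """Right-hand side of the logistic equation: r N (1 - N/K)."""
    return r * n * (1 - n / k)
```

With n0 = 10, r = 0.5, and K = 1000, the curve starts at 10 and levels off at 1000. The growth rate `logistic_rate` peaks at N = K/2 and falls to zero as N approaches K, which gives the S-shaped curve the next slide describes.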
![Page 23: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/23.jpg)
Follows a logistic growth curve
New Article
![Page 24: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/24.jpg)
Biological system
– Competition increases as the population hits the limits of the ecology
– Advantage goes to members of the population that have competitive dominance over others
Analogy
– Limited opportunities to make novel contributions
– Increased patterns of conflict and dominance
![Page 25: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/25.jpg)
Monthly Ratio of Reverted Edits
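The slide does not say how reverts were detected. One common approach, assumed here, is the identity revert: a revision whose content hash matches an earlier revision of the same page, other than the immediately preceding one, undoes every edit in between. This sketch uses that rule and attributes each reverted edit to the month in which it was made. The input format and function name are hypothetical.

```python
import hashlib
from collections import defaultdict

def monthly_reverted_ratio(revisions):
    """revisions: chronological list of (page, month, text).
    Returns {month: reverted_edits / total_edits}, assuming identity-revert detection."""
    by_page = defaultdict(list)
    for page, month, text in revisions:
        by_page[page].append((month, hashlib.md5(text.encode("utf-8")).hexdigest()))
    totals, reverted = defaultdict(int), defaultdict(int)
    for history in by_page.values():
        seen = {}  # content digest -> index of the latest revision with that content
        is_reverted = [False] * len(history)
        for j, (_, digest) in enumerate(history):
            i = seen.get(digest)
            if i is not None and i < j - 1:
                # Revision j restores revision i, so revisions i+1 .. j-1 were undone.
                for m in range(i + 1, j):
                    is_reverted[m] = True
            seen[digest] = j
        for (month, _), flag in zip(history, is_reverted):
            totals[month] += 1
            reverted[month] += flag
    return {m: reverted[m] / totals[m] for m in sorted(totals)}
```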
![Page 26: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/26.jpg)
![Page 27: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/27.jpg)
Kudos to Brent Hecht, Jilin Chen, Bongwon Suh, Lichan Hong
![Page 28: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/28.jpg)
![Page 29: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/29.jpg)
![Page 30: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/30.jpg)
n = 10,000 users with 5 or more tweets
All Users Who Manually Specified Location
![Page 31: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/31.jpg)
n = 3,311 users with 5 or more tweets
Users w/ No Useful Location Information Manually Entered
![Page 32: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/32.jpg)
Schrute Farms User ID 39111154
User ID 75135928
NONE YA BISNESS!!
User ID 57987417
in jail...smh
not tellin you User ID 130681147
![Page 33: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/33.jpg)
wherever justin wants me to be
User ID 71097545
User ID 77503970
Justin Biebers heart!
User ID 134222427
Jonasbieberland3
Bieber Island User ID 91705969
![Page 34: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/34.jpg)
n = 10,000 users with 5 or more tweets
All Twitter Users
![Page 35: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/35.jpg)
n = 2,965 users with 5 or more tweets
Users w/ Informative Location in the United States
![Page 36: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/36.jpg)
California User ID 125271323
User ID 92455577
Skinny Jeans City, IL
User ID 92455577
Bieberville, California
East Jesus Nowhere, Indiana
User ID 26526957
![Page 37: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/37.jpg)
All 1,698 Fake Locations Yahoo! Geocoder
Justin Biebers heart!
![Page 38: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/38.jpg)
All 1,698 Fake Locations Yahoo! Geocoder
Justin Biebers heart!
Lat = 36.328785 Lon = -91.700189
![Page 39: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/39.jpg)
Location of Justin Bieber’s Heart (Don’t Tell Your Teenage Daughters)
![Page 40: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/40.jpg)
![Page 41: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/41.jpg)
Country-scale
10-fold cross-validation, multinomial naive Bayes classifier
2.4x better than random
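A minimal multinomial naive Bayes classifier of the kind named on the slide. It assumes each user is represented by a bag of words from their tweets and uses add-one (Laplace) smoothing. The study's actual tokenization, features, and smoothing are not given, and the class name is hypothetical. Evaluation (10-fold cross-validation here, a 20% held-out set at state scale) would wrap `fit`/`predict` in a separate loop.

```python
import math
from collections import Counter, defaultdict

class MultinomialNaiveBayes:
    """Multinomial naive Bayes over bags of words with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, words, label):
        """Unnormalized log P(label) + sum of log P(word | label)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.totals[label] + len(self.vocab)
        for w in words:
            score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, words):
        return max(self.class_counts, key=lambda c: self.log_posterior(words, c))
```

For instance, after training on users whose tweets contain "yall" (labeled US) and "mate" (labeled AU), `predict(["yall"])` favors US because that word is more frequent in the US class.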
![Page 42: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/42.jpg)
State-scale
20% held-out test set, multinomial naive Bayes classifier
2.2x better than random
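The slides do not define "better than random." This sketch assumes it means accuracy divided by the accuracy of uniform random guessing (1 / number of classes). It also shows a seeded 20% hold-out split. Both the baseline choice and the function names are assumptions. A majority-class or frequency-weighted baseline would yield a different ratio.

```python
import random

def holdout_split(examples, test_fraction=0.2, seed=0):
    """Shuffle reproducibly and split into (train, test) lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_test = int(round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]

def improvement_over_random(predictions, truths, n_classes):
    """Accuracy divided by the accuracy of uniform random guessing (1 / n_classes)."""
    accuracy = sum(p == t for p, t in zip(predictions, truths)) / len(truths)
    return accuracy / (1.0 / n_classes)
```

With 50 states, uniform guessing is right 2% of the time, so a classifier at 4.4% accuracy would be 2.2x better than random under this definition.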
![Page 43: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/43.jpg)
Which tweet features are associated with retweeting? Retweet Model
– # Retweets ~ function(f1, f2, …, fn), where fi are simple features extracted from a tweet
74M tweets from the Twitter Streaming API – Characterization – 2–3% sample – Hadoop / HBase / MapReduce
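The dependent variable, # Retweets, is a natural MapReduce job. The mapper emits one count per retweet keyed by the original tweet, and the reducer sums the counts. This in-memory sketch simulates the map, shuffle, and reduce phases in plain Python. It is not the Hadoop API, and the `retweeted_status_id` field is a hypothetical flattened name for however the original tweet's ID is stored. Counts taken from a 2–3% sample also undercount the true totals.

```python
from itertools import groupby
from operator import itemgetter

def mapper(tweet):
    """Emit (original_tweet_id, 1) for every retweet in the sampled stream."""
    original = tweet.get("retweeted_status_id")
    if original is not None:
        yield original, 1

def reducer(key, values):
    """Sum the partial counts for one original tweet."""
    yield key, sum(values)

def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle (sort and group by key), and reduce in memory."""
    mapped = [kv for record in records for kv in mapper(record)]
    mapped.sort(key=itemgetter(0))
    out = {}
    for key, group in groupby(mapped, key=itemgetter(0)):
        for k, v in reducer(key, (v for _, v in group)):
            out[k] = v
    return out
```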
![Page 44: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/44.jpg)
Two Types of Features
Contextual features: # followees: 395; # followers: 1,400; # favorites: 1,657; # days: (since June 17, 2008); # past tweets: 21,000
Content features: URL, hashtag, mention
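A sketch of extracting both feature types from a tweet. The regular expressions and profile key names are assumptions, and the study's exact definitions may differ. URLs are stripped before matching hashtags and mentions, so a `#fragment` inside a link is not counted as a hashtag.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")   # '#' not preceded by a word character
MENTION_RE = re.compile(r"(?<!\w)@\w+")   # excludes e-mail addresses like a@b.com

def content_features(text):
    """Binary content features: does the tweet contain a URL, hashtag, or mention?"""
    without_urls = URL_RE.sub(" ", text)
    return {
        "has_url": bool(URL_RE.search(text)),
        "has_hashtag": bool(HASHTAG_RE.search(without_urls)),
        "has_mention": bool(MENTION_RE.search(without_urls)),
    }

# Hypothetical profile keys for the contextual features listed above.
CONTEXT_KEYS = ("followees", "followers", "favorites", "days", "past_tweets")

def contextual_features(profile):
    """Contextual features read from the author's profile counters (missing -> 0)."""
    return {key: profile.get(key, 0) for key in CONTEXT_KEYS}
```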
![Page 45: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/45.jpg)
Figure: Content Factor vs. Contextual Factor
![Page 46: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/46.jpg)
Information Streams => Information Overload
ASC Social Recommender Engine
![Page 47: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/47.jpg)
Figure: Recommendation Algorithm: Combining Sources and Models
– Sources: My Friends’ URLs; Popular URLs
– Social Ranking Model: My Friends’ Network and Tweeting Pattern
– Topic Relevance Model: My Tweets; My Friends’ Tweets
– Output: Recommendations
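The figure names a topic relevance model and a social ranking model but gives no formulas. This sketch shows one plausible way to combine them. Topic relevance is taken as the cosine similarity between the terms of tweets mentioning a candidate URL and the user's term profile. The social score is the number of friends who tweeted the URL, normalized by the maximum. The two scores are blended with a hypothetical weight `alpha`. All of these choices are assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_urls(candidates, profile_terms, alpha=0.5):
    """candidates: {url: (terms_from_tweets_mentioning_url, n_friends_who_tweeted_it)}.
    Score = alpha * topic relevance + (1 - alpha) * normalized social vote."""
    max_votes = max((votes for _, votes in candidates.values()), default=0) or 1
    scored = []
    for url, (terms, votes) in candidates.items():
        topic = cosine(Counter(terms), profile_terms)
        social = votes / max_votes
        scored.append((alpha * topic + (1 - alpha) * social, url))
    return [url for _, url in sorted(scored, reverse=True)]
```

Setting `alpha=1.0` ranks purely by topic and `alpha=0.0` purely by friends' votes. A blend lets a popular but off-topic URL still appear, though below URLs that are both relevant and socially endorsed.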
![Page 48: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/48.jpg)
Hadoop compute cluster – 50 nodes, depending on project requirements – ~40 TB storage capacity – Experience with HBase and Pig; interaction with Lucene and MySQL
Large-scale crawling and analytics experience with – Wikipedia (all edits up to 2009) – Delicious data set (200M bookmarks) – Twitter (70M+ tweets)
Experience with Large Scale Social Analytics – Example 1: Visual analytics in Wikipedia (wikidashboard.com) – Example 2: Search engines for social bookmarks (mrtaggy.com) – Example 3: Recommenders for Twitter news (zerozero88.com)
![Page 49: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/49.jpg)
![Page 50: Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)](https://reader033.fdocuments.net/reader033/viewer/2022052904/557d672ed8b42a7c638b45d1/html5/thumbnails/50.jpg)
Image from: http://www.flickr.com/photos/ourcommon/480538715/
Research Vision: Understand how social computing systems can enhance the ability of a group of people to remember, think, and reason.
Understand and support Collective Intelligence by modeling social group behaviors and testing prototype tools in Living Labs
http://asc-parc.blogspot.com http://www.edchi.net [email protected]