A glimpse on social influence and link prediction in OSNs
Keywords : link creation, link prediction, homophily, social influence, aNobii
Workshop on Data Driven Dynamical Networks
Speaker:
Luca Maria Aiello, PhD student
Università degli Studi di Torino
Computer Science [email protected]
Acknowledgments
People:
Giancarlo Ruffo, Rossano Schifanella (Università degli Studi di Torino)
Alain Barrat, Ciro Cattuto (ISI Foundation)
Filippo Menczer (School of Informatics and Computing, Indiana University)
Dynamics leading to link creation
Several theories from sociology
◦ Self-interest
◦ Mutual-interest
◦ Exchange
◦ Contagion (influence)
◦ Balance
◦ Homophily
◦ Proximity
28/09/2010 Les Houches 2010 - Luca Maria Aiello, Università degli Studi di Torino
Food networks
Collaboration networks
Social media
2nd part: exploit the observations on these phenomena to predict future links
• Dataset
• Topical overlap
• Homophily and influence
• Link prediction
• Conclusions
Outline
Social network for bookworms
Data-driven analysis on anobii.com
Social network
◦ Directed
◦ Friendship + neighborhood
Profile features
◦ Library and wishlist
◦ Groups
◦ Tags
4th snapshot   Friendship   Neighborhood   Union
Nodes          74,908       54,590         86,800
Links          268,655      429,482        697,910

6 snapshots, 15 days apart
Full giant connected component
Basic statistics
Broad distributions
Positive correlations between connectivity and activity
Assortativity
[Figure: ng(kout), nb(kout) and nw(kout) as functions of kout; log-log axes from 10^0 to 10^3]
Triadic closure
Reciprocation is strong (exchange)
Users tend to choose “friends of their friends” as new friends (balance)
Direct  Reciprocated  Bidirectional  Closure  Double closure
75%  20%  25%  30%  10%
Classification of new links at time t+1 between nodes already present at time t (t ∈ {1,…,5})
Profile similarity vs. social distance
Topical overlap
Statistical correlation because of assortative biases?
Null model to discern real overlap from purely statistical effects
◦ No topical overlap other than that caused by statistical mixing patterns
$\sigma_b(u,v) = \frac{\sum_b \delta_u^b \, \delta_v^b}{\sqrt{n_u \, n_v}}$
Does similarity between user profiles depend on the social distance?
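A minimal sketch of how such a profile similarity could be computed for set-valued features (libraries, groups). It assumes a cosine similarity over binary profiles, the measure that the feature list later in the talk names for libraries and groups. The book identifiers and the function name are hypothetical.

```python
import math


def cosine_similarity(items_u, items_v):
    """Cosine similarity between two binary profiles given as sets.

    For binary vectors the dot product is the number of shared items
    and each norm is the square root of the profile size.
    """
    if not items_u or not items_v:
        return 0.0  # an empty profile has no overlap with anything
    common = len(items_u & items_v)
    return common / math.sqrt(len(items_u) * len(items_v))


# Hypothetical libraries of two users (not taken from the aNobii data).
library_u = {"book_a", "book_b", "book_c", "book_d"}
library_v = {"book_c", "book_d", "book_e", "book_f"}

n_common = len(library_u & library_v)          # number of common books
sigma = cosine_similarity(library_u, library_v)  # normalized similarity
print(n_common, sigma)
```

Averaging this score over all pairs at a given distance on the social graph gives one similarity-versus-distance curve per profile feature.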
Geographical overlap
Null model test with random link rewire
Country-level overlap due to language barriers
City level overlap
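A minimal sketch of a null model based on random link rewiring. The slides do not specify the rewiring scheme. This sketch assumes a degree-preserving edge swap on a directed graph, so that every node keeps its in-degree and out-degree while its actual neighbors are randomized. The function name and the toy edge list are hypothetical.

```python
import random


def rewire(edges, n_swaps, seed=0):
    """Randomize a directed edge list by swapping targets of edge pairs.

    Two edges a->b and c->d become a->d and c->b, which preserves every
    node's in-degree and out-degree. Swaps that would create a self-loop
    or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against graphs with few valid swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b:
            continue  # would create a self-loop
        if (a, d) in present or (c, b) in present:
            continue  # would create a duplicate edge
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Hypothetical toy graph.
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3), (2, 4)]
rewired = rewire(edges, n_swaps=10, seed=42)
print(rewired)
```

Comparing the overlap measured on the real links with the overlap measured on the rewired links indicates how much of the observed overlap exceeds what the degree sequence alone would produce.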
Causality between similarity and link creation
Topical overlap is observed for all profile features
Three possible explanations:
1. Homophily (people connect with similar people)
2. Social influence (social connection conveys similarity)
3. Mixture of the two
Explore the causality relationship between profile similarity and social linking
What is the cause of topical overlap?
Similarity → link creation (homophily)
              ⟨ncb⟩   σb     ⟨ncg⟩   σg
duv = 2       9.5     0.02   1.12    0.05
u → v         12.9    0.04   1.10    0.08
u ↔ v         18.5    0.04   1.67    0.11
Closure       18.2    0.04   1.81    0.10
Dbl closure   23.4    0.05   1.20    0.12
Average similarity of pairs forming new links between t and t+1 (t=4), compared with average similarity of all the pairs at distance 2 at time t
Pairs that are going to get connected show a substantially higher similarity
Link creation → similarity (influence)
[Figure: evolution of the similarity between pairs linking together at different times; panels: Groups, Books]
Summary
Theories to explain link creation
◦ Self-interest
◦ Mutual-interest
◦ Exchange → Reciprocity in linking
◦ Contagion → Social influence
◦ Balance → Triangle closure
◦ Homophily → For all profile features
◦ Proximity → Geographical and on social graph
Can we exploit the observations on these phenomena to predict future links?
Link prediction
Snapshots at time t and t+1
Predict links created between t and t+1 given the whole information at time t
Supervised learning approach to combine profile and structural features
Pair Id   Library sim.   Common neighbors   Will be connected?
1         0.56           18                 1
2         0.11           5                  0
3         0.71           36                 1
Learning set example
Features
Profile
◦ Library (cosine)
◦ Groups (cosine)
◦ Groups (size): $s_{xy} = \sum_{g \in G(x) \cap G(y)} \frac{1}{|g|}$
◦ Gender {0,1}
◦ Town {0,1}
◦ Age (|age1 – age2|)
◦ Country {0,1}
◦ Vocabulary (cosine)
◦ Wishlists (cosine)
◦ Tagging behavior
Structural
◦ Common neighbors
◦ Distance on graph
◦ Preferential attachment: $s_{xy} = k(x)\,k(y)$
◦ Resource allocation: $s_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k(z)}$
◦ Local path: $S = A^2 + \epsilon A^3$, $\epsilon \in [0,1]$
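A minimal sketch of four of the structural scores listed on this slide. For simplicity it assumes an undirected graph stored as a dictionary of neighbor sets; the aNobii network is directed, so the actual features may have been computed differently. All function names and the toy graph are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


def common_neighbors(adj, x, y):
    """Number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    """Product of the two degrees, k(x) * k(y)."""
    return len(adj[x]) * len(adj[y])


def resource_allocation(adj, x, y):
    """Sum over common neighbors z of 1 / k(z)."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


def local_path(adj, x, y, eps=0.5):
    """Entry (x, y) of A^2 + eps * A^3 without building the matrices.

    (A^2)_xy counts length-2 paths; (A^3)_xy counts length-3 walks,
    obtained by summing (A^2)_zy over the neighbors z of x.
    """
    paths2 = len(adj[x] & adj[y])
    walks3 = sum(len(adj[z] & adj[y]) for z in adj[x])
    return paths2 + eps * walks3


# Hypothetical toy graph.
adj = build_adjacency([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)])
pair = (1, 4)
features = {
    "common_neighbors": common_neighbors(adj, *pair),
    "preferential_attachment": preferential_attachment(adj, *pair),
    "resource_allocation": resource_allocation(adj, *pair),
    "local_path": local_path(adj, *pair, eps=0.5),
}
print(features)
```

Each candidate pair then contributes one row of such scores, alongside its profile similarities, to the learning set.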
Link prediction: preliminary results
Rotation forest, 10-fold cross-validation, balanced sets

Features     Precision   Recall   F-measure   AUC
Structural   0.782       0.778    0.777       0.838
Topical      0.746       0.746    0.746       0.82
Complete     0.827       0.826    0.826       0.9

Rotation forest, 10-fold cross-validation, unbalanced sets (complete feature set)

K-ratio   Precision   Recall   F-measure   AUC
1:1       0.827       0.826    0.826       0.9
1:10      0.934       0.94     0.933       0.897
1:100     0.988       0.991    0.987       0.86
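A minimal sketch of how precision, recall and F-measure behave on unbalanced sets. The slides do not say how the scores were averaged. This sketch assumes class-weighted averaging over the "link" and "no link" classes, a common reporting convention. One hint in that direction, though not a proof, is that the Structural F-measure (0.777) is slightly below the harmonic mean of its precision and recall (about 0.780). The confusion counts below are hypothetical.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F-measure for a single class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def weighted_metrics(tp, fp, fn, tn):
    """Average of per-class metrics, weighted by class size."""
    pos = class_metrics(tp, fp, fn)
    # For the negative class the roles of the error types are swapped.
    neg = class_metrics(tn, fn, fp)
    n_pos, n_neg = tp + fn, tn + fp
    total = n_pos + n_neg
    return tuple((n_pos * p + n_neg * n) / total for p, n in zip(pos, neg))


# Hypothetical confusion counts at a 1:100 class ratio.
pos_only = class_metrics(tp=5, fp=5, fn=5)
weighted = weighted_metrics(tp=5, fp=5, fn=5, tn=995)
print(pos_only)
print(weighted)
```

Under this assumption the weighted scores are dominated by the majority class, so they can rise with the class ratio even when the positive class is predicted no better. AUC does not depend on the class proportions in this way, which makes it the more comparable column across K-ratios.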
Conclusions and future work
Theories on social network growth are verified
Causality between similarity and social connection
Effective link detection/prediction
◦ Topical information seems to be predictive as well as structural information
RFC:
◦ Link prediction sampling/evaluation procedure
◦ New challenges in prediction
Speaker: Luca Maria Aiello [email protected]
www.di.unito.it/~aiello
Thank you for your attention!
Workshop on Data Driven Dynamical Networks
Reference: L. M. Aiello, A. Barrat, C. Cattuto, G. Ruffo, R. Schifanella, "Link creation and profile alignment in the aNobii social network". In SocialCom'10: Proceedings of the 2nd IEEE International Conference on Social Computing, Minneapolis, MN, USA, August 2010.