
A glimpse on social influence and link prediction in OSNs

Keywords : link creation, link prediction, homophily, social influence, aNobii

Workshop on Data Driven Dynamical Networks

Speaker:

Luca Maria Aiello, PhD student
Università degli Studi di Torino
Computer Science Department
[email protected]

Acknowledgments

People:

Università degli Studi di Torino: Giancarlo Ruffo, Rossano Schifanella

ISI Foundation: Alain Barrat, Ciro Cattuto

School of Informatics and Computing, Indiana University: Filippo Menczer


Dynamics leading to link creation

Several theories from sociology
◦ Self-interest
◦ Mutual-interest
◦ Exchange
◦ Contagion (influence)
◦ Balance
◦ Homophily
◦ Proximity

28/09/2010 Les Houches 2010 - Luca Maria Aiello, Università degli Studi di Torino

Food networks, collaboration networks, social media

2nd part: exploit the observations on these phenomena to predict future links


Outline

• Dataset
• Topical overlap
• Homophily and influence
• Link prediction
• Conclusions


Social network for bookworms

Data-driven analysis on anobii.com


Social network
◦ Directed
◦ Friendship + neighborhood

Profile features
◦ Library and wishlist
◦ Groups
◦ Tags

4th snapshot   Friendship   Neighborhood   Union
Nodes          74,908       54,590         86,800
Links          268,655      429,482        697,910

6 snapshots, 15 days apart
Full giant connected component

Basic statistics


• Broad distributions
• Positive correlations between connectivity and activity
• Assortativity

[Figure: n_g(k_out), n_b(k_out) and n_w(k_out) versus k_out; both axes logarithmic, from 10^0 to 10^3]

Triadic closure

• Reciprocation is strong (exchange)
• Users tend to choose “friends of their friends” as new friends (balance)


Direct   Reciprocated   Bidirectional   Closure   Double closure
75%      20%            25%             30%       10%

Classification of new links at time t+1 between nodes already present at time t (t ∈ {1,…,5})
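
A minimal sketch of how such a classification could be computed, assuming each snapshot is a set of directed (source, target) pairs. The category definitions are assumptions read off the labels above, not the authors' exact ones: a new link u → v is "reciprocated" if v → u already existed at time t, "bidirectional" if v → u is also new, and "direct" otherwise; it closes a triangle if u and v shared a neighbor at time t. Double closure is left out because the slide does not define it. The function name `classify_new_links` is hypothetical.

```python
def classify_new_links(edges_t, edges_t1):
    """Classify links present at t+1 but not at t, between nodes present at t."""
    nodes_t = {n for e in edges_t for n in e}

    # Undirected neighborhoods at time t, used for the closure test.
    neigh = {n: set() for n in nodes_t}
    for u, v in edges_t:
        neigh[u].add(v)
        neigh[v].add(u)

    # Keep only new links whose endpoints both existed at time t.
    new_links = {(u, v) for (u, v) in edges_t1 - edges_t
                 if u in nodes_t and v in nodes_t}

    result = {}
    for u, v in new_links:
        if (v, u) in edges_t:
            kind = "reciprocated"      # answers an existing link v -> u
        elif (v, u) in new_links:
            kind = "bidirectional"     # both directions appear together
        else:
            kind = "direct"
        # Assumed closure test: a shared neighbor at time t.
        closes_triangle = bool((neigh[u] - {v}) & (neigh[v] - {u}))
        result[(u, v)] = (kind, closes_triangle)
    return result
```

The result maps each new link to its assumed category and a closure flag, so the shares per category can be obtained by counting values.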


Profile similarity vs. social distance

• Topical overlap
• Statistical correlation because of assortative biases?
• Null model to discern real overlap from purely statistical effects
  ◦ No topical overlap other than that caused by statistical mixing patterns

\[
\sigma_b(u, v) = \frac{\sum_b \delta_b^u \, \delta_b^v}{\sqrt{n_u \, n_v}}
\]

Does similarity between user profiles depend on the social distance?
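
A minimal sketch of how this question could be probed, assuming each profile feature (for example the library) is stored as a set of item identifiers and that social distance is measured on an undirected view of the graph. Both are simplifying assumptions, since the network itself is directed. The helper names are hypothetical.

```python
from collections import defaultdict, deque
from math import sqrt

def cosine_similarity(items_u, items_v):
    """Cosine similarity of two binary profiles given as sets."""
    if not items_u or not items_v:
        return 0.0
    return len(items_u & items_v) / sqrt(len(items_u) * len(items_v))

def distances_from(source, neighbors):
    """Breadth-first search distances on an undirected adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def similarity_by_distance(neighbors, profiles):
    """Average cosine similarity of user pairs, grouped by graph distance."""
    sums, counts = defaultdict(float), defaultdict(int)
    for u in neighbors:
        for v, d in distances_from(u, neighbors).items():
            if u < v:  # count each unordered pair once
                sums[d] += cosine_similarity(profiles[u], profiles[v])
                counts[d] += 1
    return {d: sums[d] / counts[d] for d in sums}
```

Running the same function on a rewired copy of the graph gives the null-model curve to compare against.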


Geographical overlap

Null model test with random link rewiring

Country-level overlap due to language barriers

City-level overlap
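
A minimal sketch of one possible rewiring null model. The slide does not specify the procedure, so the degree-preserving target swap used here is an assumption, and the function name `rewire` is hypothetical.

```python
import random

def rewire(edges, n_swaps, seed=0):
    """Return a rewired copy of a directed edge list.

    Each swap picks two links (a, b) and (c, d) and replaces them with
    (a, d) and (c, b), which keeps every node's in- and out-degree.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    done = 0
    attempts = 0
    # Cap the attempts so the loop ends even if few valid swaps exist.
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Skip swaps that would create self-loops or duplicate links.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list
```

Because degrees are preserved, any overlap that survives rewiring can be attributed to mixing by degree rather than to who is actually linked to whom.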


Causality between similarity and link creation

Topical overlap is observed for all profile features

Three possible explanations:
1. Homophily (people connect with similar people)
2. Social influence (social connection conveys similarity)
3. Mixture of the two

Explore the causality relationship between profile similarity and social linking


What is the cause of topical overlap?

Similarity → link creation (homophily)

Pairs         ⟨n_cb⟩   σ_b    ⟨n_cg⟩   σ_g
d_uv = 2      9.5      0.02   1.12     0.05
u → v         12.9     0.04   1.10     0.08
u ↔ v         18.5     0.04   1.67     0.11
Closure       18.2     0.04   1.81     0.10
Dbl closure   23.4     0.05   1.20     0.12

Average similarity of pairs forming new links between t and t+1 (t=4), compared with average similarity of all the pairs at distance 2 at time t

Pairs that are going to get connected show a substantially higher similarity

Link creation → similarity (influence)

Evolution of the similarity between pairs linking together at different times

[Figure panels: Groups, Books]

Summary

Theories to explain link creation
◦ Self-interest
◦ Mutual-interest
◦ Exchange: reciprocity in linking
◦ Contagion: social influence
◦ Balance: triangle closure
◦ Homophily: for all profile features
◦ Proximity: geographical and on social graph

Can we exploit the observations on these phenomena to predict future links?


Link prediction

• Snapshots at time t and t+1
• Predict links created between t and t+1 given the whole information at time t
• Supervised learning approach to combine profile and structural features

Pair Id   Library sim.   Common neighbors   Will be connected?
1         0.56           18                 1
2         0.11           5                  0
3         0.71           36                 1

Learning set example
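
A minimal sketch of how rows like the ones in this learning set could be assembled, assuming libraries and time-t neighborhoods are stored as sets keyed by user and that the links at t+1 are a set of (u, v) pairs. The function name and the data layout are hypothetical.

```python
from math import sqrt

def build_learning_set(pairs, libraries, neighbors_t, links_t1):
    """One row per candidate pair: (library sim., common neighbors, label)."""
    rows = []
    for u, v in pairs:
        lib_u, lib_v = libraries[u], libraries[v]
        # Cosine similarity of the two libraries seen as binary vectors.
        if lib_u and lib_v:
            library_sim = len(lib_u & lib_v) / sqrt(len(lib_u) * len(lib_v))
        else:
            library_sim = 0.0
        # Features use only information available at time t.
        common = len(neighbors_t[u] & neighbors_t[v])
        # The label looks one snapshot ahead.
        label = 1 if (u, v) in links_t1 else 0
        rows.append((library_sim, common, label))
    return rows
```

Keeping features strictly at time t and the label at t+1 avoids leaking future information into the classifier.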

Features

Profile
◦ Library (cosine)
◦ Groups (cosine)
◦ Groups (size): \( s_{xy} = \sum_{g \in G(x) \cap G(y)} \frac{1}{|g|} \)
◦ Gender {0,1}
◦ Town {0,1}
◦ Age (|age1 – age2|)
◦ Country {0,1}
◦ Vocabulary (cosine)
◦ Wishlists (cosine)
◦ Tagging behavior

Structural
◦ Common neighbors
◦ Distance on graph
◦ Preferential attachment: \( s_{xy} = k(x) \cdot k(y) \)
◦ Resource allocation: \( s_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k(z)} \)
◦ Local path: \( S = A^2 + \epsilon A^3, \; \epsilon \in [0,1] \)
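
A minimal sketch of four of the structural scores listed above (common neighbors, preferential attachment, resource allocation, local path), using their standard definitions on an undirected adjacency dict. Treating the graph as undirected, the default value of `epsilon` and the function name are assumptions.

```python
def structural_features(x, y, neighbors, epsilon=0.5):
    """Link-prediction scores for the pair (x, y) on an undirected graph."""
    gx, gy = neighbors[x], neighbors[y]
    common = gx & gy
    # Walks of length 3 from x to y: x - a - b - y, i.e. the entry (A^3)_xy.
    walks3 = sum(1 for a in gx for b in gy if b in neighbors[a])
    return {
        # Number of shared neighbors, equal to (A^2)_xy.
        "common_neighbors": len(common),
        # Product of the two degrees.
        "preferential_attachment": len(gx) * len(gy),
        # Each shared neighbor z contributes 1 / k(z).
        "resource_allocation": sum(1.0 / len(neighbors[z]) for z in common),
        # A^2 + epsilon * A^3, evaluated for the single pair (x, y).
        "local_path": len(common) + epsilon * walks3,
    }
```

Computing the local path score per pair avoids building the full matrix powers, which matters on a graph with tens of thousands of nodes.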


Link prediction: preliminary results

Rotation forest, 10-fold cross-validation, balanced sets

Features     Precision   Recall   F-measure   AUC
Structural   0.782       0.778    0.777       0.838
Topical      0.746       0.746    0.746       0.82
Complete     0.827       0.826    0.826       0.9

Rotation forest, 10-fold cross-validation, unbalanced sets (complete feature set)

K-ratio   Precision   Recall   F-measure   AUC
1:1       0.827       0.826    0.826       0.9
1:10      0.934       0.94     0.933       0.897
1:100     0.988       0.991    0.987       0.86
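
A minimal sketch of how an evaluation set with a given K-ratio could be drawn and scored. It assumes that "1:K" means K unlinked pairs for every pair that becomes linked, which is an interpretation of the table. The slide does not say whether its figures are per class or averaged over classes; the metrics below are for the positive class only. Function names are hypothetical.

```python
import random

def sample_with_ratio(positives, negatives, k, seed=0):
    """Keep all positive pairs and draw k negatives per positive."""
    rng = random.Random(seed)
    n_neg = min(len(negatives), k * len(positives))
    sampled = rng.sample(list(negatives), n_neg)
    return [(p, 1) for p in positives] + [(n, 0) for n in sampled]

def precision_recall_f(y_true, y_pred):
    """Precision, recall and F-measure for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure
```

With strongly unbalanced sets, scores averaged over both classes can rise simply because the negative class dominates, so per-class figures and AUC are worth reading side by side.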


Conclusions and future work

• Theories on social network growth are verified
• Causality between similarity and social connection
• Effective link detection/prediction
  ◦ Topical information seems to be predictive, as well as structural information
• RFC:
  ◦ Link prediction sampling/evaluation procedure
  ◦ New challenges in prediction

Speaker: Luca Maria Aiello
[email protected]

www.di.unito.it/~aiello

Thank you for your attention!

Workshop on Data Driven Dynamical Networks

Reference:
L. M. Aiello, A. Barrat, C. Cattuto, G. Ruffo, R. Schifanella, "Link creation and profile alignment in the aNobii social network", in SocialCom'10: Proceedings of the 2nd IEEE International Conference on Social Computing, Minneapolis, MN, USA, August 2010