Stanford Universitysnap.stanford.edu/class/cs322-2009/11-viral-annot.pdf · 2020. 1. 9. ·...
Transcript of Stanford Universitysnap.stanford.edu/class/cs322-2009/11-viral-annot.pdf · 2020. 1. 9. ·...
CS 322: (Social and Information) Network AnalysisJure LeskovecStanford University
Probabilistic models of network diffusion Probabilistic models of network diffusion How cascades spread in real life: Viral marketing Viral marketing BlogsG b hi Group membership
Last 20 minutes: mid term course evaluation Last 20 minutes: mid‐term course evaluation
10/27/2009 Jure Leskovec, Stanford CS322: Network Analysis 2
How do viruses/rumors propagate?How do viruses/rumors propagate? Will a flu‐like virus linger, or will it become extinct?
(Virus) birth rate β: probability than an infected(Virus) birth rate β: probability than an infected neighbor attacks
(Virus) death rate δ: probability that an infected node healsheals
HealthyN2
Prob. δ
NN1
2
Prob. β
Jure Leskovec, Stanford CS322: Network Analysis 3
Infected N310/27/2009
General scheme for epidemic models: General scheme for epidemic models:
S…susceptibleE…exposedI…infected
d
10/27/2009 Jure Leskovec, Stanford CS322: Network Analysis
R…recoveredZ…immune
4
Assuming perfect g pmixing, i.e., a network is a complete graph
odes
The model dynamics:
mbe
r of n
oNu
time
10/27/2009 Jure Leskovec, Stanford CS322: Network Analysis
Susceptible Infected Recovered
5
Susceptible Infective Susceptible (SIS) model Susceptible‐Infective‐Susceptible (SIS) model Cured nodes immediately become susceptible Virus “strength”: s = β / δ Virus strength : s = β / δ
Infected by neighbor with prob. β
Susceptible Infective
Jure Leskovec, Stanford CS322: Network Analysis 6
Cured internally with prob. δ
10/27/2009
f Assuming perfect mixing (complete graph): n
odes
graph):
ISIdS
Num
ber o
f n
ISIdI
dt
N
S sceptible Infected
ISIdt
time
10/27/2009 Jure Leskovec, Stanford CS322: Network Analysis
Susceptible Infected
7
Epidemic threshold of a graph is a value of t Epidemic threshold of a graph is a value of t, such that: If strength s β / δ < t epidemic can not happen If strength s = β / δ < t epidemic can not happen (it eventually dies out)
Given a graph compute its epidemic threshold
Jure Leskovec, Stanford CS322: Network Analysis 810/27/2009
What should t depend on? What should t depend on? avg. degree? and/or highest degree? and/or variance of degree? and/or variance of degree? and/or third moment of degree? and/or diameter? and/or diameter?
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 9
[Wang et al. 2003]
We have no epidemic if: We have no epidemic if:
(Virus) Death Epidemic threshold
β/δ < τ = 1/ λ1,A
( )rate
β 1,A
(Virus) Birth rate largest eigenvalue
► λ A alone captures the property of the graph!
of adj. matrix A
Jure Leskovec, Stanford CS322: Network Analysis
► λ1,A alone captures the property of the graph!
10/27/2009 10
[Wang et al. 2003]
500 Oregonβ 0 001
10,900 nodes and 31,180 edges
400
d N
odes
β = 0.001
β/δ > τ(above threshold)
3 , g
200
300
f Inf
ecte
d
100
200
umbe
r of
β/δ = τ(at the threshold)
00 250 500 750 1000
N
β/δ < τ
Jure Leskovec, Stanford CS322: Network Analysis
Timeδ: 0.05 0.06 0.07
β(below threshold)
10/27/2009 11
Does it matter how many people are Does it matter how many people are initially infected?
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 12
[Leskovec et al., SDM ’07]
Bloggers write posts and refer (link) to other Bloggers write posts and refer (link) to other posts and the information propagates
1310/27/2009 Jure Leskovec, Stanford CS322: Network Analysis
Posts
Blogs
Time
Information cascade
D t Bl
Time ordered
hyperlinks
Data – Blogs: We crawled 45,000 blogs for 1 year 10 million posts and 350,000 cascades
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 14
[Leskovec et al., TWEB ’07]
Senders and followers of recommendations receive discounts on products
10% credit 10% off
• Data – Incentivized Viral Marketing program• 16 million recommendations
illi l
Jure Leskovec, Stanford CS322: Network Analysis
• 4 million people• 500,000 products
10/27/2009 15
[Backstrom et al., KDD ’06]
Use social networks where people belong to Use social networks where people belong to explicitly defined groups
Each group defines a behavior that diffuses Each group defines a behavior that diffuses
Data – LiveJournal: On‐line blogging community with friendship links and user‐defined groupsg p Over a million users update content each month Over 250,000 groups to joinOver 250,000 groups to join
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 16
Prob of adoption depends on the number of Prob. of adoption depends on the number of friends who have adopted [Bass ‘69, Granovetter ’78] What is the shape?What is the shape? Distinction has consequences for models and algorithms
on on
of a
doptio
of a
doptio
k = number of friends adopting
Prob
. o
k = number of friends adopting
Prob
. o
k = number of friends adopting k = number of friends adopting
Diminishing returns? Critical mass?10/27/2009 Jure Leskovec, Stanford CS322: Network Analysis 17
[Leskovec et al., TWEB ’07]
DVD recommendations
asing
0.090.1
DVD recommendations(8.2 million observations)
of purcha
0 050.060.070.08
bability o
0 020.030.040.05
Prob
00.010.02
0 10 20 30 40
18
0 10 20 30 40# recommendations received
10/27/2009 Jure Leskovec, Stanford CS322: Network Analysis
[Backstrom et al., KDD ’06]
LiveJournal community membership LiveJournal community membership oining
rob. of jo
k ( b f f i d i th it )
Pr
Jure Leskovec, Stanford CS322: Network Analysis
k (number of friends in the community)
10/27/2009 19
[Kossinets‐Watts ‘06]
Sending email:Sending email: Email network of large university Prob. of a link as a function of # of common friends
ob. o
f em
Pro
Jure Leskovec, Stanford CS322: Network Analysis
k (number of common friends)
10/27/2009 20
For viral marketing: For viral marketing: We see that node v receiving the i‐threcommendation and then purchased the productp p
For communities: At time t we see the behavior of node v’s friends
Questions: When did v become aware of recommendations or f i d ’ b h i ?friends’ behavior? When did it translate into a decision by v to act? How long after this decision did v act? How long after this decision did v act?
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 21
Dependence on number of friends Dependence on number of friends Consider: connectedness of friends x and y have three friends in the group x and y have three friends in the group x’s friends are independent
’ f i d ll t d x y y’s friends are all connected Who is more likely to join?
x y
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 22
Competing sociological theories x y Competing sociological theories Information argument [Granovetter ‘73] S i l it l t [C l ’88]
x y
Social capital argument [Coleman ’88]
Information argument:Information argument: Unconnected friends give independent support
Social capital argument:Social capital argument: Safety/truest advantage in having friends who know each otherknow each other
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 23
[Backstrom et al. KDD ‘06]
LiveJournal: 1 million users, 250,000 groups
Social capital argument wins!p gProb. of joining increaseswith
adjacent members.
10/27/2009 Jure Leskovec, Stanford CS322: Network Analysis 24
Large anonymous online retailerLarge anonymous online retailer (June 2001 to May 2003)
15 646 121 d ti 15,646,121 recommendations 3,943,084 distinct customers 548 523 products recommended548,523 products recommended Products belonging to 4 product groups: books DVDs music VHS
2510/27/2009 Jure Leskovec, Stanford CS322: Network Analysis
t < t < < t t1 < t2 < … < tn
t3legend
bought but didn’ti di t
t1
receive a discount
bought andreceived a discount
t2received a recommendationbut didn’t buy
26
t4
10/27/2009 Jure Leskovec, Stanford CS322: Network Analysis
There are relatively few DVD titles, but DVDs account for ~ 50% of recommendations. Recommendations per personp p
DVD: 10 books and music: 2 VHS: 1
Recommendations per purchase Recommendations per purchase books: 69 DVDs: 108 music: 136 VHS: 203
Overall there are 3.69 recommendations per node on 3.85 different products.
Music recommendations reached about the same number of people as DVDs but used only p p y1/5 as many recommendations
Book recommendations reached by far the most people – 2.8 million. All networks have a very small number of unique edges. For books, videos and music the
number of unique edges is smaller than the number of nodes – the networks are highlynumber of unique edges is smaller than the number of nodes – the networks are highly disconnected
2710/27/2009 Jure Leskovec, Stanford CS322: Network Analysis
What role does the product category play? What role does the product category play?
products customers recommenda-tions edges
buy + getdiscount
buy + no discounttions discount discount
Book 103,161 2,863,977 5,741,611 2,097,809 65,344 17,769
DVD 19,829 805,285 8,180,393 962,341 17,232 58,189
Music 393,598 794,148 1,443,847 585,738 7,837 2,739
Video 26,131 239,583 280,270 160,683 909 467
F ll 542 719 3 943 084 15 646 121 3 153 676 91 322 79 164Full 542,719 3,943,084 15,646,121 3,153,676 91,322 79,164
peoplerecommendations
Jure Leskovec, Stanford CS322: Network Analysis
highlow
10/27/2009 28
Some products are easier to recommend than Some products are easier to recommend than othersd t t number of buy forward tproduct category number of buy
bitsforward
recommendations percent
Book 65,391 15,769 24.2
DVD 16,459 7,336 44.6
Music 7,843 1,824 23.3Music 7,843 1,824 23.3
Video 909 250 27.6
Total 90 602 25 179 27 8
Jure Leskovec, Stanford CS322: Network Analysis
Total 90,602 25,179 27.8
10/27/2009 29
Does sending more recommendations Does sending more recommendations influence more purchases?
5
6
7
ases
3
4
ber o
f Pur
cha
20 40 60 80 100 120 1400
1
2
Num
Jure Leskovec, Stanford CS322: Network Analysis
20 40 60 80 100 120 140Outgoing Recommendations
10/27/2009 30
What is the effectiveness of subsequent What is the effectiveness of subsequent recommendations?
0 07
0.06
0.07
ying
0.04
0.05
babi
lity
of b
u
0 02
0.03Pro
b
Jure Leskovec, Stanford CS322: Network Analysis
5 10 15 20 25 30 35 400.02
Exchanged recommendations
10/27/2009 31
consider successful recommendations in terms of av # senders of recommendations per book categoryav. # senders of recommendations per book category av. # of recommendations accepted
books overall have a 3% success rate (2% with discount, 1% without)
lower than average success rate (significant at p=0 01 level) lower than average success rate (significant at p=0.01 level) fiction romance (1.78), horror (1.81) teen (1.94), children’s books (2.06) i (2 30) i fi (2 34) t d th ill (2 40) comics (2.30), sci‐fi (2.34), mystery and thrillers (2.40)
nonfiction sports (2.26) home & garden (2.26) travel (2 39) travel (2.39)
higher than average success rate (statistically significant) professional & technical medicine (5.68) professional & technical (4 54) professional & technical (4.54) engineering (4.10), science (3.90), computers & internet (3.61) law (3.66), business & investing (3.62)
3210/27/2009 Jure Leskovec, Stanford CS322: Network Analysis
47 000 customers responsible for the 2 5 out of47,000 customers responsible for the 2.5 out of 16 million recommendations in the system
29% success rate per recommender of an anime DVD
Giant component covers 19% of the nodes
Overall, recommendations for DVDs are more likely to result in a purchase (7%), but the anime
i dcommunity stands out
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 33
Variable transformation Coefficient
const -0.940 ***# recommendations ln(r) 0 426 ***# recommendations ln(r) 0.426 # senders ln(ns) -0.782 ***# recipients ln(n ) -1 307 ***# recipients ln(nr) 1.307 product price ln(p) 0.128 ***# reviews ln(v) -0 011 ***# reviews ln(v) -0.011 avg. rating ln(t) -0.027 *
R2 0 74
10/27/2009 Jure Leskovec, Stanford CS322: Network Analysis 34
R2 0.74significance at the 0.01 (***), 0.05 (**) and 0.1 (*) levels
94% of users make first recommendation without having greceived one previously
Size of giant connected component increases from 1% to 2 % f h k (100 20 ) ll!2.5% of the network (100,420 users) – small!
Some sub‐communities are better connected24% f 18 000 f V 24% out of 18,000 users for westerns on DVD
26% of 25,000 for classics on DVD 19% of 47,000 for anime (Japanese animated film) on DVD
Others are just as disconnected 3% of 180,000 home and gardening 2‐7% for children’s and fitness DVDs
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 35
Products suited for Viral Marketing: small and tightly knit community few reviews, senders, and recipients but sending more recommendations helps
pricey products pricey products rating doesn’t play as much of a role
Observations for future diffusion models: purchase decision more complex than threshold or simple infection influence saturates as the number of contacts expands links user effectiveness if they are overused
Conditions for successful recommendations: professional and organizational contexts discounts on expensive items small tightly knit communities small, tightly knit communities
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 36
How big are cascades? How big are cascades? What are the building blocks of cascades?blocks of cascades?
973
938
Jure Leskovec, Stanford CS322: Network Analysis
Medical guide book DVD10/27/2009 37
Given a (social) network Given a (social) network A process by spreading over the network creates a graph (a tree)creates a graph (a tree)
Cascade (propagation graph)
Social network
Jure Leskovec, Stanford CS322: Network Analysis
Let’s count cascades
10/27/2009 38
is the most common cascade subgraph is the most common cascade subgraph It accounts for ~75% cascades in books, CD and VHS, only 12% of DVD cascades, y
is 6 (1.2 for DVD) times more frequent than
For DVDs is more frequent than Chains ( ) are more frequent than
i f t th lli i is more frequent than a collision ( ) (but collision has less edges)
Late split ( ) is more frequent than Late split ( ) is more frequent than
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 39
Stars (“no propagation”) Stars ( no propagation )
Bipartite cores (“common friends”)
Nodes having same friends Nodes having same friends
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 40
t d ff bookssteep drop‐off books
106= 1.8e6 x-4.98
104
ount
very few large cascades102Co
100 101 102100
x = Cascade size (number of nodes)
Jure Leskovec, Stanford CS322: Network Analysis
x = Cascade size (number of nodes)
10/27/2009 41
DVD cascades can grow large DVD cascades can grow large Possibly as a result of websites where people sign up to exchange recommendationssign up to exchange recommendations
shallow drop off – fat tail~ x-1.56
104
Coun
t
a number of large cascades102
Jure Leskovec, Stanford CS322: Network Analysis
100 101 102 103100
x = Cascade size (number of nodes)10/27/2009 42
The probability of observing a cascade on x p y gnodes follows: p(x) ~ x‐2
Coun
t
Jure Leskovec, Stanford CS322: Network Analysis
x = Cascade size (number of nodes)10/27/2009 43
Cascade sizes follow a heavy‐tailed distributiony Viral marketing: Books: steep drop‐off: power‐law exponent ‐5 DVDs: larger cascades: exponent ‐1.5s a ge cascades e po e 5
Blogs: Power‐law exponent ‐2
However, it is not a simple branching processo e e , s o a s p e b a c g p ocess A simple branching process (a on k‐ary tree): Every node infects each of k of its neighbors with prob. pgives exponential cascade size distributiongives exponential cascade size distribution
Questions: What role does the underlying social network play? C k t t d li ti d ti Can make a step towards more realistic cascade generation (propagation) model?
Jure Leskovec, Stanford CS322: Network Analysis10/27/2009 44
1) Randomly pick blog to infect add to cascade
2) Infect each in‐linked neighbor with probability infect, add to cascade.
B1 B2
11
neighbor with probability
B1B1
B1 B2
11
B4B3
2
1 3
1
B4B3
2
1 3
1
4
3) Add infected neighbors to cascade.
4) Set node infected in (i) to uninfected.
B1 B2
11
21
B1 B2
11
21
B1 B1
Jure Leskovec, Stanford CS322: Network Analysis
B4B31 3B4
B31 3B4 B4
10/27/2009 45
Generative model
Coun
t
Coun
tproduces realistic cascades
Cascade size Cascade node in‐degree
β=0.025
ount
nt
Co
Cou
Jure Leskovec, Stanford CS322: Network Analysis
Most frequent cascades Size of star cascade Size of chain cascade
10/27/2009 46
Blogs – information epidemics Blogs – information epidemics Which are the influential/infectious blogs?
Viral marketing Who are the trendsetters?Who are the trendsetters? Influential people?
Disease spreading Where to place monitoring stations to detect p gepidemics?
4710/27/2009 Jure Leskovec, Stanford CS322: Network Analysis