Large Graph Mining: Power Tools and a Practitioner’s...


Page 1: Large Graph Mining: Power Tools and a Practitioner’s guidectsourak/KDD09/FOILS/task6-virusProp.pdf · Shashank Pandit, CMU NetProbe: A Fast and Scalable System for Fraud Detection

KDD'09 Faloutsos, Miller, Tsourakakis P6-1

CMU SCS

Large Graph Mining:

Power Tools and a Practitioner’s guide

Task 6: Virus/Influence Propagation

Faloutsos, Miller, Tsourakakis

CMU


Outline

• Introduction – Motivation

• Task 1: Node importance

• Task 2: Community detection

• Task 3: Recommendations

• Task 4: Connection sub-graphs

• Task 5: Mining graphs over time

• Task 6: Virus/influence propagation

• Task 7: Spectral graph theory

• Task 8: Tera/peta graph mining: Hadoop

• Observations – patterns of real graphs

• Conclusions


Detailed outline

• Epidemic threshold

– Problem definition

– Analysis

– Experiments

• Fraud detection in e-bay


Virus propagation

• How do viruses/rumors propagate?

• Blog influence?

• Will a flu-like virus linger, or will it become extinct soon?


The model: SIS

• ‘Flu’ like: Susceptible-Infected-Susceptible

• Virus ‘strength’ s = β/δ

[Figure: SIS state diagram. Node N has neighbors N1, N2, N3; states Healthy and Infected; transitions labeled Prob. β (attack) and Prob. δ (recovery).]
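The SIS model on this slide can be sketched as a discrete-time simulation. This is a minimal illustration, not the authors' simulator: the function name `sis_step`, the adjacency-list layout, and the parameter values in the demo are assumptions made here. A node is infected at t+1 if it was attacked by an infected neighbor at t, or if it was already infected and did not heal.

```python
import random

def sis_step(adj, infected, beta, delta, rng):
    """One synchronous SIS time step (hypothetical helper).

    adj      : dict mapping node -> list of neighbours (undirected graph)
    infected : set of nodes infected at time t
    beta     : probability that an infected neighbour attacks successfully
    delta    : probability that an infected node heals
    Returns the set of nodes infected at time t+1.
    """
    nxt = set()
    for node in adj:
        # Each infected neighbour attacks independently with probability beta.
        attacked = any(nb in infected and rng.random() < beta for nb in adj[node])
        if node in infected:
            # A healed node is susceptible again, so it stays healthy
            # only if it is also not attacked in this step.
            healed = rng.random() < delta
            if attacked or not healed:
                nxt.add(node)
        elif attacked:
            nxt.add(node)
    return nxt

# Demo on the toy graph from the figure: N with neighbours N1, N2, N3.
adj = {"N": ["N1", "N2", "N3"], "N1": ["N"], "N2": ["N"], "N3": ["N"]}
rng = random.Random(0)          # fixed seed, for reproducibility only
state = {"N1"}                  # assumed initial infection
for t in range(5):
    state = sis_step(adj, state, beta=0.3, delta=0.2, rng=rng)
    print(t + 1, sorted(state))
```

Passing the random generator in explicitly keeps runs reproducible and makes the extreme cases (β = 0 or 1, δ = 0 or 1) easy to check by hand.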


Epidemic threshold τ

of a graph: the value of τ such that, if strength s = β/δ < τ, an epidemic cannot happen

Thus,

• given a graph

• compute its epidemic threshold


Epidemic threshold τ

What should τ depend on?

• avg. degree? and/or highest degree?

• and/or variance of degree?

• and/or third moment of degree?

• and/or diameter?


Epidemic threshold

• [Theorem 1] We have no epidemic (*), if

$\beta/\delta < \tau = 1/\lambda_{1,A}$

where β: attack prob.; δ: recovery prob.; τ: epidemic threshold; $\lambda_{1,A}$: largest eigenvalue of adj. matrix A

Proof: [Wang+03]

(*) under mild, conditional-independence assumptions
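Theorem 1 turns the threshold question into an eigenvalue computation. The sketch below is one way to do it in plain Python with power iteration. The function names are hypothetical, and the graph is assumed to be undirected, connected, and to have at least one edge. Power iteration is simply the technique chosen here; the slides do not say how the eigenvalue was computed.

```python
import math

def largest_eigenvalue(adj, iters=1000, tol=1e-12):
    """Largest adjacency eigenvalue via power iteration.

    Iterates with A + I instead of A so that the method also converges on
    bipartite graphs, whose spectrum contains both +lambda_1 and -lambda_1.
    """
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}
    lam = 0.0
    for _ in range(iters):
        # y = (A + I) x, then normalise to unit length
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {v: val / norm for v, val in y.items()}
        # Rayleigh quotient y^T A y estimates lambda_1 of A
        new_lam = sum(y[v] * sum(y[u] for u in adj[v]) for v in nodes)
        if abs(new_lam - lam) < tol:
            lam = new_lam
            break
        lam = new_lam
        x = y
    return lam

def epidemic_threshold(adj):
    """tau = 1 / lambda_1 of the adjacency matrix (Theorem 1)."""
    return 1.0 / largest_eigenvalue(adj)

def below_threshold(adj, beta, delta):
    """True if the strength s = beta/delta is below tau."""
    return beta / delta < epidemic_threshold(adj)

# Toy example: a triangle, whose largest eigenvalue is 2, so tau = 0.5.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(epidemic_threshold(triangle))
print(below_threshold(triangle, beta=0.1, delta=0.5))
```

For large sparse graphs a library eigensolver would normally replace the hand-written loop; the plain version is kept here only so the example has no dependencies.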


Beginning of proof

Healthy @ t+1:

- (healthy or healed)

- and not attacked @ t

Let: p(i, t) = Prob. node i is sick @ t

$1 - p(i, t+1) = \bigl(1 - p(i, t) + p(i, t)\,\delta\bigr) \cdot \prod_j \bigl(1 - \beta\, a_{ji}\, p(j, t)\bigr)$

Below threshold, if the above non-linear dynamical system is ‘stable’ at p = 0 (eigenvalue of the Jacobian < 1)
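A sketch of how the stability condition leads to the threshold, assuming small infection probabilities so that products of two p terms can be dropped. This is a linearization argument written out here for illustration, not the full proof in [Wang+03]. The vector notation $\mathbf{p}(t)$ and the matrix $S$ are names introduced for this sketch.

```latex
\begin{align*}
1 - p(i,t+1)
  &= \bigl(1 - (1-\delta)\,p(i,t)\bigr)\prod_j \bigl(1 - \beta\,a_{ji}\,p(j,t)\bigr) \\
  &\approx 1 - (1-\delta)\,p(i,t) - \beta \sum_j a_{ji}\,p(j,t) \\[4pt]
\Longrightarrow\quad
\mathbf{p}(t+1) &\approx S\,\mathbf{p}(t),
  \qquad S = (1-\delta)\,I + \beta A^{\top} \\[4pt]
\text{largest eigenvalue of } S &= (1-\delta) + \beta\,\lambda_{1,A} \\[4pt]
(1-\delta) + \beta\,\lambda_{1,A} < 1
  &\iff \beta/\delta < 1/\lambda_{1,A} = \tau
\end{align*}
```

For an undirected graph $A^{\top} = A$, so the condition involves only the largest eigenvalue of the adjacency matrix.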


Epidemic threshold for various networks

Formula includes older results as special cases:

• Homogeneous networks [Kephart+White]

– $\lambda_{1,A} = \langle k \rangle$; $\tau = 1/\langle k \rangle$ ($\langle k \rangle$: avg. degree)

• Star networks (d = degree of center)

– $\lambda_{1,A} = \sqrt{d}$; $\tau = 1/\sqrt{d}$

• Infinite power-law networks

– $\lambda_{1,A} = \infty$; $\tau = 0$ [Barabasi]
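The first two special cases can be checked by hand with an explicit eigenvector. The sketch below verifies A x = λ x directly for a star and for a ring; the ring is used here as a simple stand-in for a homogeneous network in which every node has degree 2. On a connected graph a strictly positive eigenvector belongs to the largest eigenvalue (Perron-Frobenius), so a successful check confirms λ for these toy graphs. The helper names and graph sizes are illustrative choices.

```python
import math

def is_eigenpair(adj, x, lam, tol=1e-9):
    """True if A x = lam * x holds at every node of the adjacency lists."""
    return all(abs(sum(x[u] for u in adj[v]) - lam * x[v]) < tol for v in adj)

def star(d):
    """Star graph: centre 0 connected to leaves 1..d."""
    adj = {0: list(range(1, d + 1))}
    for leaf in range(1, d + 1):
        adj[leaf] = [0]
    return adj

def ring(n):
    """Ring graph: every node has degree 2 (a homogeneous network)."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

d = 9
# Star: weight sqrt(d) on the centre and 1 on each leaf.
x_star = {v: (math.sqrt(d) if v == 0 else 1.0) for v in star(d)}
star_ok = is_eigenpair(star(d), x_star, math.sqrt(d))

# Ring: the all-ones vector, with eigenvalue equal to the common degree.
x_ring = {v: 1.0 for v in ring(10)}
ring_ok = is_eigenpair(ring(10), x_ring, 2.0)

print("star: lambda = sqrt(d) holds:", star_ok, " tau =", 1 / math.sqrt(d))
print("ring: lambda = <k> holds:", ring_ok, " tau =", 1 / 2.0)
```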


Epidemic threshold

• [Theorem 2] Below the epidemic threshold, the epidemic dies out exponentially
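The exponential die-out can be observed numerically by iterating the infection-probability recurrence from the proof, 1 - p(i, t+1) = (1 - p(i, t) + δ p(i, t)) · Π_j (1 - β a_ji p(j, t)). The sketch below does this on a star with 9 leaves, where λ = 3, and compares the Euclidean norm of p(t) against the geometric envelope (1 - δ + βλ)^t. The graph, the parameter values, and the choice of norm are assumptions made for this illustration, not values from the slides.

```python
import math

def nlds_step(adj, p, beta, delta):
    """One step of the recurrence for the per-node infection probabilities."""
    nxt = {}
    for i in adj:
        not_attacked = 1.0
        for j in adj[i]:
            not_attacked *= 1.0 - beta * p[j]
        # healthy at t+1 = (healthy or healed at t) and not attacked
        nxt[i] = 1.0 - (1.0 - p[i] + delta * p[i]) * not_attacked
    return nxt

def l2(p):
    return math.sqrt(sum(v * v for v in p.values()))

# Star with d = 9 leaves: largest adjacency eigenvalue is sqrt(9) = 3.
d = 9
adj = {0: list(range(1, d + 1))}
for leaf in range(1, d + 1):
    adj[leaf] = [0]

beta, delta = 0.05, 0.3            # s = beta/delta = 0.167 < tau = 1/3
lam1 = math.sqrt(d)
rho = 1.0 - delta + beta * lam1    # per-step decay factor of the envelope

p = {v: 1.0 for v in adj}          # worst case: everyone starts infected
norm0 = l2(p)
history = []                       # (t, ||p(t)||, envelope)
for t in range(1, 51):
    p = nlds_step(adj, p, beta, delta)
    history.append((t, l2(p), rho ** t * norm0))

for t, norm, bound in history[::10]:
    print(f"t={t:2d}  ||p||={norm:.3e}  envelope={bound:.3e}")
```

With these values ρ is below 1, so the envelope, and with it the infection probabilities, shrink by a constant factor per step.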


Detailed outline

• Epidemic threshold

– Problem definition

– Analysis

– Experiments

• Fraud detection in e-bay


Current prediction vs. previous

The formula’s predictions are more accurate

[Figure: two panels (Oregon, Star). x-axis: β/δ; y-axis: number of infected nodes. Marked predictions: “Our” and “PL-3”.]


Experiments (Oregon)

[Figure: number of infected nodes (0 to 500) vs. time (0 to 1000) on the Oregon graph, β = 0.001, for δ = 0.05, 0.06, 0.07. The three curves are labeled β/δ > τ (above threshold), β/δ = τ (at the threshold), and β/δ < τ (below threshold).]
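The arithmetic behind the three curves can be worked out from the legend. The sketch below assumes the three δ values pair with the three labels in order of strength (smallest δ gives the largest β/δ, hence "above threshold"). The slide does not state this pairing or the eigenvalue of the Oregon graph, so the implied value is an inference, not a reported number.

```python
beta = 0.001
deltas = (0.05, 0.06, 0.07)

# Strength s = beta/delta for each curve; larger delta means a weaker virus.
strengths = {delta: beta / delta for delta in deltas}
for delta, s in strengths.items():
    print(f"delta={delta:.2f}  s=beta/delta={s:.4f}")

# If delta = 0.06 is the 'at the threshold' curve, then s = tau = 1/lambda_1,
# which would imply lambda_1 = delta/beta for the Oregon graph.
implied_lambda1 = 0.06 / beta
print("implied largest eigenvalue:", round(implied_lambda1, 6))
```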


SIS simulation - # infected nodes vs time

[Figure: log-lin plot. x-axis: time (linear scale); y-axis: # infected (log scale). Curves: above, at, and below threshold. Annotation: exponential decay.]


SIS simulation - # infected nodes vs time

[Figure: log-log plot. x-axis: time (log scale); y-axis: # infected (log scale). Curves: above, at, and below threshold. Annotation: power-law decay (!).]
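The two plot types serve as a diagnostic: exponential decay is a straight line on log-lin axes, while power-law decay is a straight line on log-log axes. A minimal sketch of that test follows, comparing the goodness of a straight-line fit in both coordinate systems. The function names and the synthetic series are illustrative assumptions; no data from the slides is used.

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)

def decay_type(times, infected):
    """Classify a decaying series by which axes make it look more linear."""
    log_inf = [math.log(v) for v in infected]
    r2_loglin = r_squared(times, log_inf)                         # log y vs t
    r2_loglog = r_squared([math.log(t) for t in times], log_inf)  # log y vs log t
    return "exponential" if r2_loglin > r2_loglog else "power-law"

# Synthetic series, made up for the demo.
times = list(range(1, 201))
expo = [500 * math.exp(-0.05 * t) for t in times]
power = [500 * t ** -1.5 for t in times]

print(decay_type(times, expo))
print(decay_type(times, power))
```

On real simulation output the counts are noisy and can hit zero, so the series would need to be truncated or averaged over runs before taking logarithms.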


Detailed outline

• Epidemic threshold

• Fraud detection in e-bay


E-bay Fraud detection

w/ Polo Chau & Shashank Pandit, CMU

NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks, S. Pandit, D. H. Chau, S. Wang, and C. Faloutsos (WWW'07), pp. 201-210


E-bay Fraud detection

• lines: positive feedbacks

• would you buy from him/her?

• or him/her?


E-bay Fraud detection - NetProbe

Belief Propagation gives:


Conclusions

• $\lambda_{1,A}$: the largest eigenvalue of the adjacency matrix determines the survival of a flu-like virus

– It gives a measure of how well connected the graph is (~ # paths – see Task 7, later)

– May guide immunization policies

• [Belief Propagation: a powerful algo]
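To illustrate the immunization point above: removing a node can only lower the largest eigenvalue and therefore raise the threshold, and some nodes matter far more than others. The sketch below compares removing the hub and removing a rim node on a toy wheel graph. The graph, the helper names, and the idea of modeling immunization as node removal are assumptions made for this example, not a policy proposed in the slides.

```python
import math

def lambda1(adj, iters=500):
    """Largest adjacency eigenvalue by power iteration on A + I."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    # Rayleigh quotient x^T A x at the converged unit vector
    return sum(x[v] * sum(x[u] for u in adj[v]) for v in adj)

def immunize(adj, node):
    """Return a copy of the graph with `node` and its edges removed."""
    return {v: [u for u in nbrs if u != node]
            for v, nbrs in adj.items() if v != node}

# Toy wheel graph: hub 0 joined to every node of a 6-node ring (nodes 1..6).
n = 6
adj = {0: list(range(1, n + 1))}
for v in range(1, n + 1):
    adj[v] = [0, 1 + (v % n), 1 + ((v - 2) % n)]

for label, graph in [("no immunization", adj),
                     ("immunize rim node 1", immunize(adj, 1)),
                     ("immunize hub 0", immunize(adj, 0))]:
    lam = lambda1(graph)
    print(f"{label:22s} lambda_1={lam:.4f}  tau={1 / lam:.4f}")
```

Immunizing the hub leaves a plain ring and raises τ the most, which is the kind of comparison an eigenvalue-guided policy would make.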


References

• D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic Thresholds in Real Networks. ACM TISSEC, 10(4), 2008.

• A. Ganesh, L. Massoulie, and D. Towsley. The Effect of Network Topology on the Spread of Epidemics. INFOCOM, 2005.

• H. W. Hethcote. The Mathematics of Infectious Diseases. SIAM Review 42, 599–653, 2000.

• H. W. Hethcote and J. A. Yorke. Gonorrhea Transmission Dynamics and Control. Lecture Notes in Biomathematics, Vol. 56. Springer, 1984.

• Y. Wang, D. Chakrabarti, C. Wang, and C. Faloutsos. Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint. SRDS 2003, pages 25-34, Florence, Italy.