PageRank and Diffusion on Large Graphs · cseweb.ucsd.edu/~atsiatas/pr_diffusion_slides.pdf

PageRank and Diffusion on Large Graphs Alexander Tsiatas University of California, San Diego



PageRank and Diffusion on Large Graphs

Alexander Tsiatas

University of California, San Diego


Graphs

• A mathematical model for a set of objects with pairwise relationships

– Nodes: represent objects within a larger set

– Edges: characterize pairwise relationships between those objects

• Graphs are omnipresent in the real world, both natural and man-made

– Examples…


Technological networks

• The Internet

– Nodes are routers, computers, switches

– Edges are physical wires between them

• World Wide Web

– Nodes are individual webpages

– Edges represent hyperlinks between pages


Technological networks

Subgraph of the Internet - From the OPTE Project, 2005


Social networks

• Nodes represent people in a population

• Edges represent friendship, co-authorship, physical contact

Paul Erdős


Communication networks

• Physical networks

– Nodes represent telephones

– Edges represent physical telephone wires

• Interaction networks



Transportation networks


Transportation networks


Biological networks

• Protein interactions

• The brain as a neural network

• Transcription regulatory networks

– Relationships between proteins and genes

• Chemical transmission between bacteria

• Many, many more


Different graphs, same properties

• “Small-world phenomenon”

– Short paths between pairs of nodes

– Adjacent nodes share more neighbors

• Power-law degree distribution

$$\Pr[d_v = k] \propto k^{-\gamma} \quad \text{for a constant exponent } \gamma > 0$$


Importance of graphs

• Interesting graphs

• Interesting problems

• Very important to have rigorous analysis


One important barrier to analysis

• In the real world, these graphs are LARGE

– Internet: billions of webpages

– Facebook: over 350M active users

– Millions of road miles in the USA

• Many computations become intractable at this scale


Outline

• Diffusion

• Applications of diffusion

– Local graph partitioning

– Network epidemics

• Future directions


Problems characterized by diffusion

• Propagation on graphs

– Adoption of new products

– Infections and epidemics

– Information dissemination

• Ranking

– Finding the most important or relevant webpage, research paper, etc.

• Routing

– Internet, transportation, …


Random walks

• A model for diffusion

• If random walk is at node u, move to a neighbor v chosen uniformly at random

• W: random walk matrix

$$\Pr[v \mid u] = \begin{cases} 1/d_u & \text{if } u \sim v \\ 0 & \text{otherwise} \end{cases}$$
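A minimal sketch of the random walk matrix, assuming an undirected graph stored as an adjacency list with no isolated nodes. The names `random_walk_matrix`, `step`, and the three-node path graph are illustrative and not from the slides. Distributions are treated as row vectors, so one step of the walk is `dist * W`.

```python
from fractions import Fraction

def random_walk_matrix(adj):
    """Return W as nested dicts with W[u][v] = 1/d_u for each neighbor v of u."""
    W = {}
    for u, nbrs in adj.items():
        d_u = len(nbrs)  # degree of u; assumed to be positive
        W[u] = {v: Fraction(1, d_u) for v in nbrs}
    return W

def step(dist, W):
    """One step of the walk: the row vector dist multiplied by W."""
    nxt = {u: Fraction(0) for u in W}
    for u, mass in dist.items():
        for v, w_uv in W[u].items():
            nxt[v] += mass * w_uv
    return nxt

# Hypothetical example graph: the path a - b - c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
W = random_walk_matrix(adj)
after_one = step({"a": Fraction(1)}, W)  # walk started at a
after_two = step(after_one, W)
```

Exact fractions are used only to keep the example free of rounding; floats would work equally well on large graphs.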


Random walk

Dolphin social network – Lusseau et al. 2003


Random walk as a probability distribution


Random walk stationary distribution

Proportional to node degree: π(v) = d_v / vol(G), where vol(G) is the sum of all degrees
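The stationary distribution can be checked numerically. The sketch below iterates the walk on a small hypothetical graph (a triangle with one pendant node) and compares the result with d_v divided by the sum of all degrees. It assumes the graph is connected and not bipartite, since otherwise plain iteration need not converge; the function name and step count are illustrative.

```python
def stationary_by_iteration(adj, steps=200):
    """Iterate dist <- dist * W, starting from the uniform distribution."""
    nodes = list(adj)
    dist = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(steps):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            share = dist[u] / len(adj[u])  # mass sent along each edge of u
            for v in adj[u]:
                nxt[v] += share
        dist = nxt
    return dist

# Hypothetical graph: triangle a, b, c with a pendant node d attached to c.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
pi = stationary_by_iteration(adj)
total_degree = sum(len(nbrs) for nbrs in adj.values())
expected = {u: len(adj[u]) / total_degree for u in adj}
```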


PageRank

• Originally conceived by Brin and Page (1998) for ranking web pages

• Models an Internet user:

– With probability α, jump to a random web page

– With probability (1 – α), click a hyperlink

– PageRank is the stationary distribution

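$$\Pr[v \mid u] = \begin{cases} \dfrac{\alpha}{n} + \dfrac{1-\alpha}{d_u} & \text{if } u \sim v \\[4pt] \dfrac{\alpha}{n} & \text{otherwise} \end{cases}$$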


PageRank

• Model is applicable to any graph, not just the Web

• Captures importance and relationships between nodes

• Solution to equation:

$$p_\alpha = \alpha \frac{\mathbf{1}}{n} + (1-\alpha)\, p_\alpha W$$

– 1 parameter: α

– W is the random walk matrix

– p_α is the PageRank vector
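A sketch of computing the PageRank vector by fixed-point iteration, assuming the equation reads p = α·(1/n)·1 + (1 − α)·pW with p a row vector. The function name, tolerance, default α, and the star graph are assumptions made for illustration; the graph must have no isolated nodes.

```python
def pagerank(adj, alpha=0.15, tol=1e-12, max_iter=10_000):
    """Iterate p <- alpha * (1/n) * 1 + (1 - alpha) * p W until the L1 change is below tol."""
    nodes = list(adj)
    n = len(nodes)
    p = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        nxt = {u: alpha / n for u in nodes}         # random-jump term
        for u in nodes:
            share = (1 - alpha) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share                      # hyperlink-click term
        if sum(abs(nxt[u] - p[u]) for u in nodes) < tol:
            return nxt
        p = nxt
    return p

# Hypothetical example: a star with center c and three leaves.
adj = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
p = pagerank(adj, alpha=0.15)
```

Each iteration shrinks the error by a factor of (1 − α), so larger α converges faster but keeps the vector closer to uniform.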


Personalized PageRank

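• Random jumps: instead of jumping to a node uniformly at random, choose a node according to a prescribed distribution s:

$$\Pr[v \mid u] = \begin{cases} \alpha s_v + \dfrac{1-\alpha}{d_u} & \text{if } u \sim v \\[4pt] \alpha s_v & \text{otherwise} \end{cases}$$

• Personalized PageRank is the stationary distribution, the solution to

$$p = \alpha s + (1-\alpha)\, p W$$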


(Personalized) PageRank properties

• Geometrically weighted sum of random walks

$$\mathrm{pr}_\alpha(s) = \alpha \sum_{t=0}^{\infty} (1-\alpha)^t \, s W^t$$

• Linear in s
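Both properties can be checked numerically. The sketch below evaluates a truncated version of the series α Σ_t (1 − α)^t s W^t, then verifies that it satisfies p = αs + (1 − α)pW and that it is linear in s. The truncation length, function names, and the example graph (two triangles joined by one edge) are assumptions for illustration.

```python
def walk_step(vec, adj):
    """Return the row vector vec * W for the random walk matrix W of adj."""
    out = {u: 0.0 for u in adj}
    for u, mass in vec.items():
        for v in adj[u]:
            out[v] += mass / len(adj[u])
    return out

def ppr_series(s, adj, alpha, terms=500):
    """Truncated series alpha * sum_{t < terms} (1 - alpha)^t * s W^t."""
    p = {u: 0.0 for u in adj}
    cur = dict(s)          # holds s W^t
    weight = alpha         # holds alpha * (1 - alpha)^t
    for _ in range(terms):
        for u, mass in cur.items():
            p[u] += weight * mass
        cur = walk_step(cur, adj)
        weight *= 1 - alpha
    return p

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
alpha = 0.1
p0 = ppr_series({0: 1.0}, adj, alpha)
p5 = ppr_series({5: 1.0}, adj, alpha)
p_mix = ppr_series({0: 0.5, 5: 0.5}, adj, alpha)
p0_W = walk_step(p0, adj)
```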


Personalized PageRank

Starting distribution s


Personalized PageRank

α = 0.1


Personalized PageRank

α = 0.01


Personalized PageRank

α = 0.5


With a different s

α = 0.1


Computing PageRank vectors

• Solve matrix equation

– Intractable for large graphs

• Iterate randomized model until convergence

– Fast convergence, but still intractable for large graphs


Approximating PageRank

• Andersen, Chung, Lang (2006)

• ε-approximate PageRank vector

– PageRank vector for (s – r)

– 0 ≤ r(v) ≤ ε d_v for all v in G


ApproximatePR(s,α,ε)

• Computes p and r such that p is an

ε-approximate PageRank vector

– Starts with p = 0 and r = s

– Iteratively pushes PageRank from r to p until r is small enough

– Maintains p = pr_α(s – r)


ApproximatePR(s,α,ε)

• Computes p and r such that p is an

ε-approximate PageRank vector

– Uses only local computations

– Running time: O(1/(εα)), independent of n

– vol(Supp p): at most 2/((1 – α)ε)

• vol(S): the sum of the degrees of the nodes in S, a measure of the size of S
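A minimal sketch of a push procedure in the spirit of ApproximatePR. It assumes the non-lazy PageRank equation p = αs + (1 − α)pW; the published algorithm of Andersen, Chung, and Lang may differ in details such as using a lazy walk, so the exact push rule here is an assumption. Each push at u moves α·r(u) into p(u) and spreads the remaining (1 − α)·r(u) evenly over the neighbors of u, which preserves p = pr_α(s − r). Only nodes whose residual reaches ε·d_u are ever touched.

```python
from collections import deque

def approximate_pr(adj, s, alpha, eps):
    """Return (p, r) with r[v] < eps * d_v for all v, using only local pushes."""
    p = {}
    r = dict(s)
    queue = deque(u for u in r if r[u] >= eps * len(adj[u]))
    while queue:
        u = queue.popleft()
        d_u = len(adj[u])
        mass = r.get(u, 0.0)
        if mass < eps * d_u:
            continue                      # defensive check; nothing to push
        p[u] = p.get(u, 0.0) + alpha * mass
        r[u] = 0.0
        share = (1 - alpha) * mass / d_u
        for v in adj[u]:
            before = r.get(v, 0.0)
            r[v] = before + share
            # Enqueue v only when its residual crosses the threshold.
            if before < eps * len(adj[v]) <= r[v]:
                queue.append(v)
    return p, r

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
eps = 1e-4
p, r = approximate_pr(adj, {0: 1.0}, alpha=0.1, eps=eps)
```

Every push moves at least α·ε·d_u of probability mass into p, and the total mass is 1, which is where a bound of order 1/(εα) on the work comes from.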


Outline

• Diffusion

• Applications of diffusion

– Local graph partitioning

– Network epidemics

• Future directions


Local graph partitioning

• Small communities within a larger graph structure

– Online communities, social cliques, …

• If v is located within a small community, how can we find it?

• Goal: Design an algorithm to find the community containing v

– Only using local computations


The Cheeger ratio

• Metric for graph cuts

$$h_S = \frac{e(S, \bar{S})}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$$

• Cheeger constant: minimum Cheeger ratio for a subset S
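A short sketch of the Cheeger ratio for an undirected graph in adjacency-list form, taking e(S, S̄) as the number of edges leaving S and vol as the sum of degrees. The function name and the example graph are illustrative; S is assumed to be a nonempty proper subset.

```python
def cheeger_ratio(adj, S):
    """h_S = e(S, complement of S) / min(vol(S), vol(complement of S))."""
    S = set(S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    return cut / min(vol_S, vol_rest)

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
h_triangle = cheeger_ratio(adj, {0, 1, 2})   # one cut edge, volume 7
h_single = cheeger_ratio(adj, {0})           # both edges of node 0 are cut
```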


The Cheeger ratio

Example cuts with h_S = 1 and h_S = 0.0645


Relationship between diffusion and graph partitioning

• Suppose S has a low Cheeger ratio

• Start a diffusion process in S

– Unlikely to leave S

• Must be careful about this, though…


Where to start diffusion

Starting a random walk near the boundary of S will make it more likely to leave S.


Where to start diffusion

• Can’t start anywhere in S

– But there are many nodes in S which work

– Need to find a core of nodes in S


Where to start random walks

• Lovász, Simonovits (1990, 1993)

– If S has Cheeger ratio h_S, then there is a set S_t with volume at least half of vol(S)

– Start a random walk in S_t, for t’ ≤ t steps

– The probability that the random walk is outside of S is at most t·h_S


Where to start random walks

Random walk started at green nodes for 5 iterations

Random walk probability on red nodes is less than 5·h_S


Which initial distribution for personalized PageRank

• Andersen, Chung, Lang (2006)

– If S has Cheeger ratio h_S, then there is a set S_α with volume at least half of vol(S)

– Calculate personalized PageRank with s contained in S_α

– The personalized PageRank outside of S is at most h_S / α


Finding small cuts near v: algorithmic ideas

• Simulate diffusion processes

• Examine the results

– Are random walk probabilities or personalized PageRank vectors concentrated among a small set of nodes?


Sweep of a vector

• Suppose p is a vector with components corresponding to nodes in a graph

– Personalized PageRank, random walk probabilities

• Normalize p by dividing by the degree of each node

• Sort p in descending order

• For each k, take the top k nodes to form a set S_k

• Take the S_k with minimal Cheeger ratio
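The sweep steps above can be sketched as follows. This version keeps a running cut size and volume, so each candidate set costs only the degree of the node just added. The function name, the skipping of zero entries, and the example vector are assumptions for illustration.

```python
def sweep(adj, p):
    """Return (best_set, best_ratio) over prefixes of nodes sorted by p[u] / d_u."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted((u for u in p if p[u] > 0),
                   key=lambda u: p[u] / len(adj[u]), reverse=True)
    in_set, vol, cut = set(), 0, 0
    best_set, best_ratio = None, float("inf")
    for u in order:
        # Edges from u into the current set stop being cut; its other edges become cut.
        inside = sum(1 for v in adj[u] if v in in_set)
        cut += len(adj[u]) - 2 * inside
        vol += len(adj[u])
        in_set.add(u)
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break                      # the prefix now covers the whole graph
        ratio = cut / denom
        if ratio < best_ratio:
            best_set, best_ratio = set(in_set), ratio
    return best_set, best_ratio

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3,
# with a made-up vector p concentrated on the first triangle.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p = {0: 0.30, 1: 0.30, 2: 0.30, 3: 0.06, 4: 0.02, 5: 0.02}
best_set, best_ratio = sweep(adj, p)
```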


Sweep of a random walk probability vector

• Spielman, Teng (2004)

• If a random walk probability is significantly larger than the stationary distribution, then a sweep over the vector finds a set with small Cheeger ratio.

• More effective if the random walk mixes slowly


Finding a small cut with random walks

• Spielman, Teng (2004)

• Algorithm Nibble

– Simulate a random walk for t0 steps, starting at v

• (t0 depends on size of G, desired Cheeger ratio h)

– Perform a sweep of the random walk probabilities

• Resulting set S (if one exists)

– Cheeger ratio smaller than a target h


Sweep of a personalized PageRank vector

• Andersen, Chung, Lang (2006)

• If personalized PageRank on a set S is significantly larger than the stationary distribution, then S has low Cheeger ratio

• Can find an S via a sweep on the PageRank vector

• More effective if PageRank mixes slowly

$O(\log(\mathrm{vol}(S)))$


Finding a small cut with PageRank

• Andersen, Chung, Lang (2006)

• Algorithm PageRank-Nibble

– Compute an (approximate) personalized PageRank vector p, where the starting distribution is on v

– Perform a sweep of p

• Resulting set S (if one exists)

– Cheeger ratio is smaller than target h

– vol(S) is small
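A compact end-to-end sketch in the style of PageRank-Nibble: a push-based approximate personalized PageRank from v, followed by a sweep. The slides do not say how α and ε are derived from the target h, so here they are plain parameters; the push rule assumes the non-lazy equation p = αs + (1 − α)pW. All function names are hypothetical.

```python
from collections import deque

def approximate_pr(adj, v, alpha, eps):
    """Push-based approximate personalized PageRank with all starting mass on v."""
    p, r = {}, {v: 1.0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        mass, d_u = r.get(u, 0.0), len(adj[u])
        if mass < eps * d_u:
            continue
        p[u] = p.get(u, 0.0) + alpha * mass
        r[u] = 0.0
        for w in adj[u]:
            before = r.get(w, 0.0)
            r[w] = before + (1 - alpha) * mass / d_u
            if before < eps * len(adj[w]) <= r[w]:
                queue.append(w)
    return p

def sweep(adj, p):
    """Best prefix (by Cheeger ratio) of nodes sorted by p[u] / d_u."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    in_set, vol, cut = set(), 0, 0
    best_set, best_ratio = None, float("inf")
    for u in order:
        cut += len(adj[u]) - 2 * sum(1 for w in adj[u] if w in in_set)
        vol += len(adj[u])
        in_set.add(u)
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break
        if cut / denom < best_ratio:
            best_set, best_ratio = set(in_set), cut / denom
    return best_set, best_ratio

def pagerank_nibble(adj, v, h, alpha, eps):
    """Return a set with Cheeger ratio below h found near v, or None."""
    best_set, best_ratio = sweep(adj, approximate_pr(adj, v, alpha, eps))
    return best_set if best_set is not None and best_ratio < h else None

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
found = pagerank_nibble(adj, 0, h=0.2, alpha=0.1, eps=1e-3)
not_found = pagerank_nibble(adj, 0, h=0.1, alpha=0.1, eps=1e-3)
```

On this toy graph the best cut separates the two triangles with ratio 1/7, so the call with target 0.2 succeeds and the call with target 0.1 reports that no set exists.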


Running time

• Nibble: $O(|S| \log^4 m / h^5)$

• PageRank-Nibble: $O(|S| \log^2 m / h^2)$

• Only local computations

– Truncated random walks, approximate PageRank


Graph bipartitioning


What makes a good bipartition?

• Many edges within each subset

• Not many edges between subsets

• Balanced

• Cheeger ratio is a good metric


Putting small cuts together

• Nibble and PageRank-Nibble find small cuts by diffusion from a vertex v in the core of a set S

• Suppose that S has small Cheeger ratio

• Can put them together to form a larger cut

– Algorithm Partition (Spielman, Teng 2004)


Putting small cuts together

• Result: a bipartition

– Small Cheeger ratio

– Volume not too large or too small


A refinement

• Andersen and Chung (2008)

• Algorithm Local Partition

– Calculate an approximate personalized PageRank vector p

– Initialize a set S based on the normalized PageRank

– Repeatedly add to S by looking for sharp drops in PageRank, until S is large enough.


A refinement

• Andersen and Chung (2008)

• Algorithm Local Partition

– Resulting S has small Cheeger ratio

– Resulting S is not too large or too small

• Target size x

– If the initial vertex v is within the core of a set C with small Cheeger ratio, then S has a large intersection with C.

• No need to combine smaller cuts


Running time of bipartitioning

• Partition: O(m log^6 m / h^5)

• PageRank-Partition: O(m log^4 m / h^2)

• Local Partition: O(m log^2 m / h^2)


Outline

• Diffusion

• Applications of diffusion

– Local graph partitioning

– Network epidemics

• Future directions


Network epidemics

• Disease in human and animal populations

– H1N1 flu, STDs, SARS, etc.

• Viruses and worms on technological and social networks

– MySpace worms, e-mail attachment viruses, …

• Clear connection to diffusion on graphs


Model for network epidemics

• Contact process (1927)

– Continuous-time Markov process on G

– Each node in G is either healthy or infected

– Infected nodes are cured at rates given by a vector c

– Healthy nodes become infected by neighbors at rate β

• “SIS” model vs. “SIR” model

• Traditionally, c = 1
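A minimal event-driven simulation of an SIS-style contact process, offered as a sketch under stated assumptions: an infected node v is cured at rate c_v and becomes healthy (and susceptible) again, and each edge from an infected node to a healthy neighbor transmits at rate β. The function name, the `t_max` cutoff, and the fixed default seed are illustrative choices, not part of the slides.

```python
import random

def simulate_contact_process(adj, infected, beta, cure, t_max, rng=None):
    """Run until the infection dies out or time t_max; return (time, infected set)."""
    if rng is None:
        rng = random.Random(0)          # fixed seed for reproducibility
    infected = set(infected)
    t = 0.0
    while infected:
        cure_rate = sum(cure[v] for v in infected)
        # Edges along which the infection can currently spread.
        edges = [(u, v) for u in infected for v in adj[u] if v not in infected]
        total = cure_rate + beta * len(edges)
        if total == 0:
            return t_max, infected      # nothing can change any more
        t += rng.expovariate(total)     # waiting time to the next event
        if t > t_max:
            return t_max, infected
        if rng.random() * total < cure_rate:
            nodes = list(infected)
            v = rng.choices(nodes, weights=[cure[x] for x in nodes])[0]
            infected.remove(v)          # cure event
        else:
            _, v = rng.choice(edges)
            infected.add(v)             # infection event
    return t, infected

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
ones = {u: 1.0 for u in adj}
zeros = {u: 0.0 for u in adj}
t_cured, left_cured = simulate_contact_process(adj, {0}, 0.0, ones, t_max=1e9)
t_spread, left_spread = simulate_contact_process(adj, {0}, 1.0, zeros, t_max=1e9)
```

The two calls are the extreme cases: with β = 0 the single infected node is simply cured, and with c = 0 the infection spreads to the whole connected graph and never clears.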


Thresholds

• For the contact process on many graphs G, there is an infection threshold βc

– Existence and properties depend on graph structure

– If β < βc, then any infection will die out quickly

– If β > βc, then any infection will persist indefinitely

• First discovered on empirical data, later proven rigorously


Thresholds

• Shown to exist for many classes of graphs (Newman 2002, Ganesh et al. 2005)

– Star graphs

– Complete graphs

– Erdős-Rényi random graphs

– General graphs

• Depending on Cheeger ratio and eigenvalues


Thresholds on power-law graphs

• Threshold tends to zero very quickly for scale-free graphs

– Pastor-Satorras, Vespignani 2001-2; May, Lloyd 2001

– This is for c = 1

• What about a non-constant c?

– Interpret c as the amount of antidote to be given to each node in G


Idea: contact tracing

• Give extra antidote to neighbors of infected nodes

– c now depends on t

• Shown to be ineffective in many cases

– Dezső, Barabási 2002

– Tsimring, Huerta 2003

– Kiss et al. 2005


Contact tracing

• Contact tracing does poorly on star graphs

– Borgs et al. 2008

• Star graphs embedded in scale-free networks


Contact tracing on star graphs

• Star graphs represent the ‘worst case’

– If center is infected, leaves easily infected

– If several leaves infected, center easily infected

• In order for contact tracing to be effective, the total amount of antidote in c must be super-linear in the size of the graph

• Otherwise, threshold goes to zero quickly with the size of the graph


Diffusion-based inoculation scheme

• Borgs et al. 2008

• Give each node antidote equal to its degree

– Random walk stationary distribution

• Any infection dies out in logarithmic time

• Total amount of antidote: vol(G)

– Within a constant factor of the best possible for expander graphs

– c no longer depends on t


Another diffusion-based inoculation scheme

• Previous scheme based on random walks. What about PageRank?

– (Miller, Hyman 2007) Empirical study shows that distributing antidote according to PageRank is effective

– No rigorous analysis


Mathematical relationship between PageRank and network epidemics

• Suppose an infection starts in S, a subset of H, and each node in H receives antidote according to its degree.

• Probability that the infection never leaves H is lower-bounded by s/β times the personalized PageRank on H

• If H has a low Cheeger ratio, then this probability is small.


PageRank-based inoculation scheme

• To combat an infection starting from a set S:

– Find a set H that contains S in its core, with small Cheeger ratio

• This can be done with a local partitioning algorithm

– Give nodes in H antidote according to their degrees

• Leads to a probabilistic guarantee that the infection will die out in logarithmic time

• vol(H) antidote vs. vol(G)


Outline

• Diffusion

• Applications of diffusion

– Local graph partitioning

– Network epidemics

• Future directions


Local clustering

• Directed graphs (some work done)

• Weighted graphs

• Better running time

• Simpler algorithms and analysis


Network epidemics

• Characterize cases when contact tracing is effective

• Smarter ways to find small sets H to inoculate


Game theoretic problems

• Game theory has a lot of problems relating to diffusion on graphs

– Adoption of new products

– Consensus game


New and improved tools

• Better approximations for PageRank

• Heat kernel PageRank and other variants

– Approximation and computation

– Better algorithms?


Conclusions

• Random walks and PageRank lead to tractable algorithms for local graph partitioning

• Random walks and PageRank lead to effective means to combat network infection


Questions?