Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random networks

Asymptotic behaviour of ranking algorithms in directed random networks Nelly Litvak University of Twente, The Netherlands joint work with Mariana Olvera-Cravioto and Ningyuan Chen Workshop on Extremal Graph Theory Moscow, 06-06-2014


Asymptotic behaviour of ranking algorithms in directed random networks

Nelly Litvak

University of Twente, The Netherlands

joint work with Mariana Olvera-Cravioto and Ningyuan Chen

Workshop on Extremal Graph Theory, Moscow, 06-06-2014


Power law of PageRank

Pandurangan, Raghavan, Upfal, 2002.

[ Nelly Litvak, SOR group ] 2/25


Power laws in complex networks

• Power laws: Internet, WWW, social networks, biological networks, etc.
• Degree of a node = number of (in-/out-) links
• [fraction of nodes with degree at least k] = $p_k$
• Power law: $p_k \approx \text{const} \cdot k^{-\alpha}$, $\alpha > 0$
• A power law is the model for high variability: some nodes (hubs) have extremely many connections
• $\log p_k = \log(\text{const}) - \alpha \log k$
• Straight line on the log-log scale
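The straight-line relation on the log-log scale can be checked numerically. The following is a minimal sketch, not taken from the talk: it draws synthetic degrees whose tail is exactly k^(-alpha) and fits the slope of log p_k against log k by least squares. The function names and the synthetic sampler are illustrative assumptions.

```python
import math
import random

def empirical_tail(degrees, k):
    """Fraction of nodes with degree at least k (the p_k of the slide)."""
    return sum(1 for d in degrees if d >= k) / len(degrees)

def loglog_slope(degrees, ks):
    """Least-squares slope of log p_k against log k; this estimates -alpha."""
    pts = [(math.log(k), math.log(empirical_tail(degrees, k))) for k in ks]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx

def sample_power_law_degrees(n, alpha, seed=0):
    """Synthetic degrees (an assumption for illustration only):
    floor of a Pareto(alpha) variable, so P(degree >= k) = k^(-alpha)."""
    rng = random.Random(seed)
    return [int((1.0 - rng.random()) ** (-1.0 / alpha)) for _ in range(n)]

degrees = sample_power_law_degrees(200_000, alpha=2.0)
slope = loglog_slope(degrees, [1, 2, 4, 8, 16])
print("estimated alpha:", -slope)
```

Fitting only a few well-populated values of k keeps the estimate stable; far in the tail the empirical p_k is based on very few nodes and becomes noisy.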


Regular variation

• $X$ is a regularly varying random variable with index $\alpha$ if

$$P(X > x) = L(x)\,x^{-\alpha}, \quad x > 0$$

• $L(x)$ is slowly varying: for every $t > 0$, $L(tx)/L(x) \to 1$ as $x \to \infty$
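As a small illustration of the slow-variation condition (an added example, not part of the slides), the ratio L(tx)/L(x) can be tabulated for the logarithm, which is slowly varying, and for a square root, which is not. The helper name `scaling_ratio` is hypothetical.

```python
import math

def scaling_ratio(L, t, x):
    """Return L(t*x) / L(x), the ratio in the slow-variation definition."""
    return L(t * x) / L(x)

xs = [10.0 ** p for p in (2, 4, 8, 16)]

# log is slowly varying: the ratio drifts towards 1 as x grows.
log_ratios = [scaling_ratio(math.log, 10.0, x) for x in xs]

# A pure power x^0.5 is not slowly varying: the ratio stays at t^0.5.
power_ratios = [scaling_ratio(math.sqrt, 10.0, x) for x in xs]

for x, a, b in zip(xs, log_ratios, power_ratios):
    print(f"x={x:.0e}  log ratio={a:.4f}  sqrt ratio={b:.4f}")
```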


Google PageRank

• S. Brin, L. Page, The anatomy of a large-scale hypertextual Web search engine (1998)
• The PageRank $R_i$ of page $i = 1, \dots, n$ is defined as the stationary distribution of a random walk with jumps:

$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$$

• $d_j$ = number of out-links of page $j$
• $c \in (0, 1)$, originally 0.85: damping factor; the walk follows a link with probability $c$ and makes a random jump with probability $1 - c$
• $b_i$ = probability to jump to page $i$; originally $b_i = 1/n$
• Personalized PageRank: $b_i \ne 1/n$
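The defining equation can be solved by simple fixed-point iteration. The sketch below is an illustration under stated assumptions, not the method used in the talk: the graph is given as a list of out-link lists, every page in the example has at least one out-link, and the function name `pagerank` and its parameters are chosen here for convenience.

```python
def pagerank(out_links, c=0.85, b=None, tol=1e-12, max_iter=1000):
    """Iterate R_i = sum_{j -> i} (c / d_j) R_j + (1 - c) b_i to a fixed point.

    out_links[j] lists the pages that page j links to.
    Assumption: pages without out-links simply pass on no mass (no correction).
    """
    n = len(out_links)
    if b is None:
        b = [1.0 / n] * n          # uniform jump vector, b_i = 1/n
    r = list(b)
    for _ in range(max_iter):
        new = [(1.0 - c) * b[i] for i in range(n)]
        for j, targets in enumerate(out_links):
            if targets:
                share = c * r[j] / len(targets)   # (c / d_j) * R_j
                for i in targets:
                    new[i] += share
        if sum(abs(x - y) for x, y in zip(new, r)) < tol:
            return new
        r = new
    return r

# Toy graph (hypothetical): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
ranks = pagerank([[1, 2], [2], [0]])
print(ranks)
```

Passing a non-uniform `b` gives the personalized variant with the same iteration.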


Examples of applications

$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$$

• Topic-sensitive search (Haveliwala, 2002)
• Spam detection (Gyongyi et al., 2004)
• Finding related entities (Chakrabarti, 2007)
• Link prediction (Liben-Nowell and Kleinberg, 2003; Voevodski, Teng, Xia, 2009)
• Finding local cuts (Andersen, Chung, Lang, 2006)
• Graph clustering (Tsiatas, Chung, 2010)
• Person name disambiguation (Smirnova, Avrachenkov, Trousse, 2010)
• Finding most influential people in Wikipedia (Shepelyansky et al., 2010, 2013)


Stochastic model for PageRank

• Rescale: $R_i \to n R_i$, $b_i \to n b_i$

$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$$

• Stochastic equation:

$$R \stackrel{d}{=} c \sum_{j=1}^{N} \frac{1}{D_j} R_j + c\,p_0 + (1 - c)\,B$$

• $N$: in-degree of the randomly chosen page
• $D$: out-degree of a page that links to the randomly chosen page
• $p_0$: fraction of pages with out-degree zero
• $R_j$ is distributed as $R$; $N$, $D$, $R_j$ are independent; $N$ and $B$ can be dependent
• We can denote $Q = c\,p_0 + (1 - c)\,B$, $C_j = c/D_j$
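One way to get a feel for the stochastic equation is to sample it on a random tree truncated at a finite depth. The sketch below is purely illustrative: the in-degree law (uniform on {0,...,4}), the size-biased out-degree law on {1,2,3}, the choice B = 1, p0 = 0, and the starting value 1 at the leaves are all assumptions made here so that E[N]·E[1/D] = 1 and the mean of R stays at 1; none of them comes from the talk.

```python
import random

def sample_R(depth, rng, c=0.85, p0=0.0):
    """One sample of R = c * sum_{j=1}^{N} R_j / D_j + c*p0 + (1-c)*B,
    with the recursion truncated after `depth` generations."""
    B = 1.0                          # assumption: uniform jumps after rescaling
    Q = c * p0 + (1.0 - c) * B
    if depth == 0:
        return 1.0                   # assumption: leaves start at the mean value 1
    N = rng.randrange(5)             # toy in-degree: uniform on {0,...,4}, mean 2
    total = Q
    for _ in range(N):
        # toy out-degree of a linking page: size-biased uniform{1,2,3}, E[1/D] = 1/2
        D = rng.choices([1, 2, 3], weights=[1, 2, 3])[0]
        total += (c / D) * sample_R(depth - 1, rng, c, p0)
    return total

rng = random.Random(1)
samples = [sample_R(5, rng) for _ in range(5000)]
mean_R = sum(samples) / len(samples)
print("sample mean of R:", mean_R)
```

With these toy choices each R_j is drawn independently of N and D, as the model requires, and the sample mean should stay close to 1 at every truncation depth.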


Results for stochastic recursion

$$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$$

Theorem (Volkovich & L, 2010)

If $P(B > x) = o(P(N > x))$, then the following are equivalent:

• $P(N > x) \sim x^{-\alpha_N} L_N(x)$ as $x \to \infty$,
• $P(R > x) \sim c_N\, x^{-\alpha_N} L_N(x)$ as $x \to \infty$,

where $c_N = (E(c/D))^{\alpha_N}\,[1 - E(N)\,E(C^{\alpha_N})]^{-1}$.


Power Law behaviour of PageRank

• Data for Web, Wikipedia and Preferential Attachment graph


Results for stochastic recursion

$$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$$

• A series of papers (Olvera-Cravioto & Jelenkovic 2010, 2012; Olvera-Cravioto 2012) analyzed the recursion in detail using sample-path large deviations and implicit renewal theory.
• The tail behaviour of $R$ is obtained under the most general assumptions on the $C_j$'s.
• $R$ can be heavy-tailed even when $N$ is light-tailed.


Recursion on a graph

• So far we have, in fact, considered the recursion on a tree
• Will similar results hold on a particular graph structure?
• Some graphs are tree-like (Thorny Branching Process, TBP)


Directed configuration model

• Directed graph on $n$ nodes $V = \{v_1, \dots, v_n\}$.
• In-degree and out-degree:
  • $m_i$ = in-degree of node $v_i$ = number of edges pointing to $v_i$.
  • $d_i$ = out-degree of node $v_i$ = number of edges pointing from $v_i$.
• $(\mathbf{m}, \mathbf{d}) = (\{m_i\}, \{d_i\})$ is called a bi-degree-sequence.
• Target distributions:

In-degree: $F = (f_k : k = 0, 1, 2, \dots)$, and

Out-degree: $G = (g_k : k = 0, 1, 2, \dots)$.


Assumptions on the target distributions

• Suppose further that for some $\alpha, \beta > 2$,

$$F(x) = \sum_{k > x} f_k \le x^{-\alpha} L_F(x)$$

and

$$G(x) = \sum_{k > x} g_k \le x^{-\beta} L_G(x),$$

for all $x > 0$, where $L_F(\cdot)$ and $L_G(\cdot)$ are slowly varying.

• Assume both $F$ and $G$ have finite variance.


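The bi-degree sequence (Chen & Olvera-Cravioto, 2012)

1. Fix $0 < \delta_0 < 1 - \theta$, where $\theta = \max\{\alpha^{-1}, \beta^{-1}, 1/2\}$.
2. Sample $\{\gamma_1, \dots, \gamma_n\}$ i.i.d. from $F$; let $\Gamma_n = \sum_{i=1}^{n} \gamma_i$.
3. Sample $\{\xi_1, \dots, \xi_n\}$ i.i.d. from $G$; let $\Xi_n = \sum_{i=1}^{n} \xi_i$.
4. Let $\Delta_n = \Gamma_n - \Xi_n$. If $|\Delta_n| \le n^{\theta + \delta_0}$ go to step 5; otherwise go to step 2.
5. Choose randomly $|\Delta_n|$ nodes $S = \{i_1, i_2, \dots, i_{|\Delta_n|}\}$ without replacement and let

$$N_i = \gamma_i + \tau_i, \quad D_i = \xi_i + \chi_i, \quad i = 1, 2, \dots, n,$$

where

$$\chi_i = \begin{cases} 1 & \text{if } \Delta_n > 0 \text{ and } i \in S, \\ 0 & \text{otherwise,} \end{cases} \qquad \text{and} \qquad \tau_i = \begin{cases} 1 & \text{if } \Delta_n < 0 \text{ and } i \in S, \\ 0 & \text{otherwise.} \end{cases}$$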
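The five steps above can be sketched in a few lines. This is a hedged illustration, not the authors' code: the function name `bi_degree_sequence`, the sampler callables, and the `max_tries` guard are assumptions introduced here, and the caller is expected to supply theta and delta0 satisfying step 1.

```python
import random

def bi_degree_sequence(n, sample_in, sample_out, theta, delta0, rng,
                       max_tries=10_000):
    """Return (N, D) with sum(N) == sum(D), following steps 2 to 5.

    sample_in(rng) draws from F, sample_out(rng) draws from G.
    Requires 0 < delta0 < 1 - theta, so the threshold is below n.
    """
    threshold = n ** (theta + delta0)
    for _ in range(max_tries):
        gamma = [sample_in(rng) for _ in range(n)]    # step 2
        xi = [sample_out(rng) for _ in range(n)]      # step 3
        delta = sum(gamma) - sum(xi)                  # step 4
        if abs(delta) <= threshold:
            break
    else:
        raise RuntimeError("no acceptable sample within max_tries")
    # step 5: pick |delta| distinct nodes and add one stub to each
    S = set(rng.sample(range(n), abs(delta)))
    N = [g + (1 if delta < 0 and i in S else 0) for i, g in enumerate(gamma)]
    D = [x + (1 if delta > 0 and i in S else 0) for i, x in enumerate(xi)]
    return N, D

rng = random.Random(42)
toy_law = lambda r: r.randrange(4)      # toy degree law, illustrative only
N, D = bi_degree_sequence(1000, toy_law, toy_law, theta=0.5, delta0=0.2, rng=rng)
print(sum(N), sum(D))
```

Because only |Δ_n| of the n degrees are changed, and each by at most one, the corrected sequences stay close to i.i.d. samples from F and G while the stub counts match exactly.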


Constructing the graph

• Using the bi-degree-sequence $(\mathbf{N}, \mathbf{D})$ for the in- and out-degrees:
  • assign to each node $v_i$ a number $m_i$ of inbound stubs and a number $d_i$ of outbound stubs;
  • pair outbound stubs to inbound stubs to form directed edges by matching to each inbound stub an outbound stub chosen uniformly at random from the set of unpaired outbound stubs;
  • proceed in the same way for all remaining unpaired inbound stubs, i.e., choose uniformly from the set of unpaired outbound stubs and draw the corresponding directed edge.
• The result is a multigraph (e.g., with self-loops and multiple edges in the same direction) on nodes $\{v_1, \dots, v_n\}$.
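The pairing procedure can be sketched as follows. This is a minimal illustration with a hypothetical function name `pair_stubs`; it shuffles the outbound stubs once and then hands them out one by one, which has the same distribution as drawing uniformly from the unpaired outbound stubs for each inbound stub.

```python
import random

def pair_stubs(in_deg, out_deg, rng):
    """Return a list of directed edges (source, target) of the multigraph.

    in_deg[v] is the number of inbound stubs of node v, out_deg[v] the number
    of outbound stubs; the two lists must have equal sums.
    """
    if sum(in_deg) != sum(out_deg):
        raise ValueError("in- and out-degree sums must match")
    # one list entry per outbound stub, labelled by its node
    out_stubs = [v for v, d in enumerate(out_deg) for _ in range(d)]
    rng.shuffle(out_stubs)
    edges = []
    for v, m in enumerate(in_deg):
        for _ in range(m):
            # the next stub of a uniformly shuffled list is a uniform pick
            # among the outbound stubs that are still unpaired
            edges.append((out_stubs.pop(), v))
    return edges

edges = pair_stubs([2, 0, 1], [1, 1, 1], random.Random(0))
print(edges)   # may contain self-loops or repeated edges
```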


PageRank in directed configuration model

• $C_i = \zeta_i / D_i$, where $\{\zeta_i\}$ is a sequence of i.i.d. random variables independent of $(\mathbf{N}, \mathbf{D})$ ($\zeta_i = c$ in the classical case)
• $M = M(n) \in \mathbb{R}^{n \times n}$ is related to the adjacency matrix of the graph:

$$M_{i,j} = \begin{cases} s_{ij} C_i, & \text{if there are } s_{ij} \text{ edges from } i \text{ to } j, \\ 0, & \text{otherwise.} \end{cases}$$

• $Q \in \mathbb{R}^n$ is a personalization vector
• We are interested in one coordinate, $R_1$, of the vector $R \in \mathbb{R}^n$ defined by

$$R = RM + Q$$


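Matrix iterations

$$\begin{aligned}
R^{(n,0)} &= B, \\
R^{(n,1)} &= R^{(n,0)} M + Q = BM + Q, \\
R^{(n,2)} &= R^{(n,1)} M + Q = BM^2 + QM + Q, \\
R^{(n,3)} &= R^{(n,2)} M + Q = BM^3 + QM^2 + QM + Q, \\
&\;\;\vdots \\
R^{(n,k)} &= \sum_{i=0}^{k-1} Q M^i + B M^k, \quad k \ge 1.
\end{aligned}$$

We are interested in analyzing $P(R^{(n,\infty)}_1 > x)$, $x \to \infty$.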
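The unrolled form of the iteration can be checked numerically on a small example. The sketch below is an added illustration: it uses plain Python lists as row vectors, a hypothetical 3-node graph (0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0) with M[i][j] = c / D_i per edge and c = 0.85, and compares k steps of R -> RM + Q with the closed-form sum.

```python
def vec_mat(v, M):
    """Row vector times matrix: (vM)_j = sum_i v_i M[i][j]."""
    n = len(v)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]

def iterate(B, M, Q, k):
    """Apply R -> R M + Q k times, starting from R = B."""
    R = list(B)
    for _ in range(k):
        R = [x + q for x, q in zip(vec_mat(R, M), Q)]
    return R

def closed_form(B, M, Q, k):
    """Evaluate sum_{i=0}^{k-1} Q M^i + B M^k directly."""
    total = [0.0] * len(Q)
    term = list(Q)                       # holds Q M^i
    for _ in range(k):
        total = [t + x for t, x in zip(total, term)]
        term = vec_mat(term, M)
    BMk = list(B)
    for _ in range(k):
        BMk = vec_mat(BMk, M)
    return [t + x for t, x in zip(total, BMk)]

c = 0.85
M = [[0.0, c / 2, c / 2],    # node 0 has out-degree 2
     [0.0, 0.0, c],          # node 1 has out-degree 1
     [c, 0.0, 0.0]]          # node 2 has out-degree 1
B = [1.0, 1.0, 1.0]
Q = [1.0 - c] * 3
for k in (1, 2, 3, 10):
    print(k, iterate(B, M, Q, k), closed_form(B, M, Q, k))
```

Since every row of this M sums to c < 1, the term B M^k shrinks geometrically, which is why the starting vector stops mattering for large k.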


Idea of the analysis

• $\hat{R}^{(n,k)}_1$: PageRank on a perfect branching tree
• $R$: solution of the equation

$$R \stackrel{d}{=} \sum_{j=1}^{\gamma} C_j R_j + Q$$

• We will try to prove the following: for any fixed $t \in \mathbb{R}$, and a randomly chosen node $v$,

$$P(R^{(n,\infty)}_1 \le t) \approx P(R^{(n,k)}_1 \le t) \approx P(\hat{R}^{(n,k)}_1 \le t) \approx P(R \le t)$$

for large enough $n$, $k$.


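Idea of the analysis

If we prove that for some $k = k(n) \to \infty$ and any $\varepsilon > 0$,

$$\text{(Matrix iterations)} \quad P\left(\left|R^{(n,\infty)}_1 - R^{(n,k)}_1\right| > \varepsilon\right) \to 0, \tag{1}$$

$$\text{(Coupling with branching tree)} \quad P\left(\left|R^{(n,k)}_1 - \hat{R}^{(n,k)}_1\right| > \varepsilon\right) \to 0, \tag{2}$$

$$\text{(Limiting solution)} \quad P\left(\left|\hat{R}^{(n,k)}_1 - R\right| > \varepsilon\right) \to 0, \tag{3}$$

as $n \to \infty$, then it will follow, by Slutsky's lemma, that

$$R^{(n,\infty)}_1 \Rightarrow R^{(\infty)}$$

as $n \to \infty$, where $\Rightarrow$ denotes convergence in distribution.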


Coupling with branching tree

• We start with a random node (node 1) and explore its neighbours, labeling the stubs that we have already seen
• $\tau$: the number of generations of the WBP completed before the coupling breaks


Coupling with branching tree

Lemma

Let $\tau$ be the number of generations of the TBP that we are able to complete before we draw the first stub that has already been observed before. Then, for any $0 < \varepsilon < 1/2$ and $a = (1/2 - \varepsilon)/\log m$, where $m = E[N]$,

$$P(\tau \le a \log n) = O\left(n^{-\varepsilon/2}\right)$$

as $n \to \infty$.


Combining with matrix iteration

• $P\left(\left|R^{(n,\infty)}_1 - R^{(n,k)}_1\right| > c^k K_n\right) = o(1)$
• We need $c^k n = o(1)$ for some $k < \tau$
• Combining this with Lemma 2, we get the main result
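A short calculation, offered as a plausibility check rather than as the authors' proof, suggests how these two ingredients lead to the restriction on $c$ in the main result. It reads the requirement above as $c^k n = o(1)$ and assumes that $k$ is taken of order $a \log n$, with $a = (1/2 - \varepsilon)/\log m$ and $m = E[N] > 1$ as in the coupling lemma, so that $k < \tau$ holds with high probability.

```latex
\begin{align*}
c^{k} n &= \exp\bigl(\log n - k \log(1/c)\bigr)
         = n^{\,1 - a \log(1/c)}
         \qquad \text{for } k = a \log n, \\
c^{k} n \to 0
  &\iff a \log(1/c) > 1
   \iff \log(1/c) > \frac{\log m}{1/2 - \varepsilon}
   \iff c < m^{-1/(1/2 - \varepsilon)}, \\
\varepsilon \downarrow 0
  &\;\Longrightarrow\; \text{any } c < m^{-2} = \frac{1}{(E[N])^{2}}
   \text{ is admissible for some } \varepsilon \in (0, 1/2).
\end{align*}
```

Under these assumptions the bound $0 < c < 1/(E[N])^2$ is exactly what the depth $a \log n$ of the coupling allows, which is consistent with the remark that better bounds on $\tau$ would relax the condition on $c$.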


Main result

• Let $n$ be the number of nodes in the random graph, and let $N$ and $D$ be r.v.s having the in-degree and effective out-degree distributions, resp.
• Let $R(n)$ be the rank vector computed on the graph with $n$ nodes.
• Theorem (Chen, L, Olvera-Cravioto, 2014): Suppose $0 < c < 1/(E[N])^2$; then

$$R_1(n) \Rightarrow R, \quad n \to \infty,$$

where $R$ is the solution to the fixed point equation

$$R \stackrel{d}{=} q + c \sum_{i=1}^{N} \frac{R_i}{D_i}.$$


Work in progress

• Relaxing conditions on $c$: better bounds for $\tau$ and the matrix iterations
• So far, finite variance assumption
• The result probably will not hold for all $c \in (0, 1)$.
• The PageRank must converge for all $c < 1$. Will we obtain the same power law but with a different factor?


Thank you!
