Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random networks

Asymptotic behaviour of ranking algorithms in directed random networks

Nelly Litvak

University of Twente, The Netherlands

joint work with Mariana Olvera-Cravioto and Ningyuan Chen

Workshop on Extremal Graph Theory, Moscow, 06-06-2014

Power law of PageRank

Pandurangan, Raghavan, Upfal, 2002.

[ Nelly Litvak, SOR group ]

Power laws in complex networks

I Power laws: Internet, WWW, social networks, biological networks, etc.

I Degree of a node = # (in-/out-) links

I [fraction of nodes with degree at least k] = $p_k$

I Power law: $p_k \approx \mathrm{const} \cdot k^{-\alpha}$, $\alpha > 0$.

I Power law is the model for high variability: some nodes (hubs) have extremely many connections

I $\log p_k = \log(\mathrm{const}) - \alpha \log k$

I Straight line on the log-log scale
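The straight-line relation on the log-log scale can be checked numerically. The sketch below is only an illustration under stated assumptions: the degrees are synthetic (integer parts of Pareto samples, so that $p_k = k^{-\alpha}$ for integer $k$), and the helper names `tail_fraction` and `loglog_slope` are hypothetical, not from the talk.

```python
import math
import random

def tail_fraction(degrees, k):
    """Fraction of nodes whose degree is at least k (the p_k above)."""
    return sum(1 for d in degrees if d >= k) / len(degrees)

def loglog_slope(degrees, ks):
    """Least-squares slope of log p_k against log k; it estimates -alpha."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(tail_fraction(degrees, k)) for k in ks]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Assumption: synthetic degrees with P(degree >= k) = k^(-alpha) for integer k.
rng = random.Random(0)
alpha = 2.0
degrees = [int(rng.paretovariate(alpha)) for _ in range(100_000)]
slope = loglog_slope(degrees, ks=[1, 2, 4, 8, 16])
print(f"estimated slope: {slope:.2f}, theoretical slope: {-alpha}")
```

On real network data the fit is usually restricted to the tail, since the power law is only expected to hold for large k.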


Regular variation

I X is a regularly varying random variable with index α:

$$P(X > x) = L(x)\, x^{-\alpha}, \quad x > 0$$

I L(x) is slowly varying: for every t > 0, L(tx)/L(x) → 1 as x → ∞


Google PageRank

I S. Brin, L. Page, The anatomy of a large-scale hypertextual Web search engine (1998)

I PageRank $R_i$ of page $i = 1, \dots, n$ is defined as a stationary distribution of a random walk with jumps:

$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$$

I $d_j$ = # out-links of page $j$

I $c \in (0, 1)$, originally 0.85, is the probability of following a link; a random jump occurs with probability $1 - c$

I $b_i$ is the probability to jump to page $i$; originally, $b_i = 1/n$

I personalized PageRank: $b_i \neq 1/n$
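A minimal sketch of solving the PageRank equation by fixed-point iteration, assuming every page has at least one out-link (no dangling nodes). The function name `pagerank`, the adjacency-list input and the three-page example graph are illustrative choices, not part of the talk.

```python
def pagerank(out_links, c=0.85, b=None, tol=1e-12, max_iter=1000):
    """Iterate R_i = sum_{j -> i} (c / d_j) R_j + (1 - c) b_i to a fixed point.

    out_links[j] lists the pages that page j links to.
    Assumption: every page has at least one out-link.
    """
    n = len(out_links)
    if b is None:
        b = [1.0 / n] * n          # uniform jump distribution b_i = 1/n
    ranks = list(b)
    for _ in range(max_iter):
        new_ranks = [(1.0 - c) * b_i for b_i in b]
        for j, targets in enumerate(out_links):
            share = c * ranks[j] / len(targets)   # c / d_j times R_j
            for i in targets:
                new_ranks[i] += share
        if max(abs(x - y) for x, y in zip(new_ranks, ranks)) < tol:
            return new_ranks
        ranks = new_ranks
    return ranks

# Hypothetical three-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
example_ranks = pagerank([[1, 2], [2], [0]])
print(example_ranks)
```

Passing a non-uniform `b` gives the personalized variant.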


Examples of applications

$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$$

I Topic-sensitive search (Haveliwala, 2002);

I Spam detection (Gyongyi et al., 2004);

I Finding related entities (Chakrabarti, 2007);

I Link prediction (Liben-Nowell and Kleinberg, 2003; Voevodski, Teng, Xia, 2009);

I Finding local cuts (Andersen, Chung, Lang, 2006);

I Graph clustering (Tsiatas, Chung, 2010);

I Person name disambiguation (Smirnova, Avrachenkov, Trousse, 2010);

I Finding most influential people in Wikipedia (Shepelyansky et al, 2010, 2013)


Stochastic model for PageRank

I Rescale: $R_i \to n R_i$, $b_i \to n b_i$

$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$$

I Stochastic equation:

$$R \stackrel{d}{=} c \sum_{j=1}^{N} \frac{1}{D_j} R_j + c\, p_0 + (1 - c) B$$

I $N$: in-degree of the randomly chosen page

I $D$: out-degree of a page that links to the randomly chosen page

I $p_0$: fraction of pages with out-degree zero

I $R_j$ is distributed as $R$; $N$, $D$, $R_j$ are independent; $N$ and $B$ can be dependent

I We can denote $Q = c\, p_0 + (1 - c) B$, $C_j = c / D_j$.
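The stochastic equation can be explored by simulation. The sketch below uses a resampling pool (sometimes called population dynamics), which is a technique chosen here for illustration and not one the talk describes. The distributions of N and D, the choice B = 1 and p0 = 0, and all function names are assumptions; they are picked so that E[N] E[1/D] = 1 and therefore E[R] = 1.

```python
import random

def simulate_recursion(sample_n, sample_c, sample_q,
                       pool_size=5000, iterations=25, rng=None):
    """Approximate the law of R in R =d sum_{j=1}^N C_j R_j + Q.

    A pool of samples is refreshed repeatedly: each new sample draws N,
    then N independent weights C_j and N values R_j from the previous pool.
    """
    rng = rng or random.Random()
    pool = [1.0] * pool_size                 # start from R^(0) = 1
    for _ in range(iterations):
        pool = [
            sum(sample_c(rng) * rng.choice(pool)
                for _ in range(sample_n(rng))) + sample_q(rng)
            for _ in range(pool_size)
        ]
    return pool

c = 0.85
# Assumed toy distributions: N uniform on {0,...,4}, so E[N] = 2;
# D = 1 w.p. 1/3 and D = 4 w.p. 2/3, so E[1/D] = 1/2.
sample_n = lambda rng: rng.randint(0, 4)
sample_c = lambda rng: c / (1 if rng.random() < 1 / 3 else 4)
sample_q = lambda rng: 1.0 - c               # Q = c*p0 + (1-c)*B with p0 = 0, B = 1

pool = simulate_recursion(sample_n, sample_c, sample_q, rng=random.Random(1))
pool_mean = sum(pool) / len(pool)
print(f"sample mean of R: {pool_mean:.3f}")
```

Taking expectations in the equation gives E[R] = E[Q] / (1 - E[N] E[C]) whenever E[N] E[C] < 1, which equals 1 for these toy distributions and is a convenient sanity check on the simulation.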


Results for stochastic recursion

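$$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$$

Theorem (Volkovich & Litvak, 2010)

If $P(B > x) = o(P(N > x))$, then the following are equivalent:

I $P(N > x) \sim x^{-\alpha_N} L_N(x)$ as $x \to \infty$,

I $P(R > x) \sim c_N\, x^{-\alpha_N} L_N(x)$ as $x \to \infty$,

where $c_N = \left(E(c/D)\right)^{\alpha_N} \left[1 - E(N)\, E\left(C^{\alpha_N}\right)\right]^{-1}$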


Power Law behaviour of PageRank

I Data for Web, Wikipedia and Preferential Attachment graph


Results for stochastic recursion

$$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$$

I A series of papers (Olvera-Cravioto & Jelenkovic 2010, 2012; Olvera-Cravioto 2012) analyzed the recursion in detail using sample path large deviations and implicit renewal theory.

I Tail behaviour of $R$ is obtained under the most general assumptions on the $C_j$'s

I R can be heavy-tailed even when N is light-tailed.


Recursion on a graph

I So far we have, in fact, considered the recursion on a tree

I Will similar results hold on a particular graph structure?

I Some graphs are tree-like (Thorny Branching Process, TBP)


Directed configuration model

I Directed graph on n nodes $V = \{v_1, \dots, v_n\}$.

I In-degree and out-degree:

  I $m_i$ = in-degree of node $v_i$ = number of edges pointing to $v_i$.

  I $d_i$ = out-degree of node $v_i$ = number of edges pointing from $v_i$.

I $(\mathbf{m}, \mathbf{d}) = (\{m_i\}, \{d_i\})$ is called a bi-degree-sequence.

I Target distributions:

In-degree: $F = (f_k : k = 0, 1, 2, \dots)$, and

Out-degree: $G = (g_k : k = 0, 1, 2, \dots)$.


Assumptions on the target distributions

I Suppose further that for some $\alpha, \beta > 2$,

$$\overline{F}(x) = \sum_{k > x} f_k \le x^{-\alpha} L_F(x)$$

and

$$\overline{G}(x) = \sum_{k > x} g_k \le x^{-\beta} L_G(x),$$

for all $x > 0$, where $L_F(\cdot)$ and $L_G(\cdot)$ are slowly varying.

I Assume both F and G have finite variance.


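The bi-degree sequence (Chen & Olvera-Cravioto, 2012)

1. Fix $0 < \delta_0 < 1 - \theta$, $\theta = \max\{\alpha^{-1}, \beta^{-1}, 1/2\}$.

2. Sample $\{\gamma_1, \dots, \gamma_n\}$ i.i.d. from $F$; let $\Gamma_n = \sum_{i=1}^{n} \gamma_i$.

3. Sample $\{\xi_1, \dots, \xi_n\}$ i.i.d. from $G$; let $\Xi_n = \sum_{i=1}^{n} \xi_i$.

4. Let $\Delta_n = \Gamma_n - \Xi_n$. If $|\Delta_n| \le n^{\theta + \delta_0}$ go to step 5; otherwise go to step 2.

5. Choose randomly $|\Delta_n|$ nodes $S = \{i_1, i_2, \dots, i_{|\Delta_n|}\}$ without replacement and let

$$N_i = \gamma_i + \tau_i, \quad D_i = \xi_i + \chi_i, \quad i = 1, 2, \dots, n,$$

where

$$\chi_i = \begin{cases} 1 & \text{if } \Delta_n > 0 \text{ and } i \in S, \\ 0 & \text{otherwise,} \end{cases}
\qquad \text{and} \qquad
\tau_i = \begin{cases} 1 & \text{if } \Delta_n < 0 \text{ and } i \in S, \\ 0 & \text{otherwise.} \end{cases}$$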
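The five steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name, its parameters and the choice of target distributions (integer parts of Pareto samples, the same law for F and G so that the means agree) are all hypothetical.

```python
import random

def sample_bidegree_sequence(n, alpha, beta, delta0, sample_in, sample_out, rng):
    """Return in-degrees N and out-degrees D with sum(N) == sum(D)."""
    theta = max(1.0 / alpha, 1.0 / beta, 0.5)
    if not 0 < delta0 < 1 - theta:                # step 1
        raise ValueError("need 0 < delta0 < 1 - theta")
    threshold = n ** (theta + delta0)
    while True:
        gamma = [sample_in(rng) for _ in range(n)]    # step 2
        xi = [sample_out(rng) for _ in range(n)]      # step 3
        delta = sum(gamma) - sum(xi)                  # step 4
        if abs(delta) <= threshold:
            break                                     # otherwise resample
    # Step 5: pick |delta| distinct nodes and add one stub to each,
    # on the side (in or out) that currently has the smaller total.
    chosen = set(rng.sample(range(n), abs(delta)))
    in_deg = [g + (1 if delta < 0 and i in chosen else 0)
              for i, g in enumerate(gamma)]
    out_deg = [x + (1 if delta > 0 and i in chosen else 0)
               for i, x in enumerate(xi)]
    return in_deg, out_deg

# Assumed target laws: floor of Pareto(2.5), whose tail is bounded by x^(-2.5).
rng = random.Random(42)
heavy = lambda r: int(r.paretovariate(2.5))
in_deg, out_deg = sample_bidegree_sequence(
    n=1000, alpha=2.5, beta=2.5, delta0=0.25,
    sample_in=heavy, sample_out=heavy, rng=rng)
print(sum(in_deg), sum(out_deg))
```

The rejection in step 4 keeps the correction small, so at most $n^{\theta + \delta_0}$ of the n degrees are changed, each by one.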


Constructing the graph

I Using the bi-degree-sequence (N, D) for the in- and out-degrees:

  I assign to each node $v_i$ a number $m_i$ of inbound stubs and a number $d_i$ of outbound stubs;

  I pair outbound stubs to inbound stubs to form directed edges by matching to each inbound stub an outbound stub chosen uniformly at random from the set of unpaired outbound stubs;

  I proceed in the same way for all remaining unpaired inbound stubs, i.e., choose uniformly from the set of unpaired outbound stubs and draw the corresponding directed edge.

I The result is a multigraph (e.g., with self-loops and multiple edges in the same direction) on nodes $\{v_1, \dots, v_n\}$.
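A minimal sketch of the stub-pairing step, assuming the in- and out-degree totals already agree. Shuffling the outbound stubs once and pairing them in order with the inbound stubs gives the same distribution as drawing an unpaired outbound stub uniformly at random for each inbound stub. The function name and the small degree sequence are hypothetical.

```python
import random
from collections import Counter

def pair_stubs(in_degrees, out_degrees, rng):
    """Return a list of directed edges (source, target) of the multigraph."""
    if sum(in_degrees) != sum(out_degrees):
        raise ValueError("total in-degree must equal total out-degree")
    # One list entry per stub, labelled by the node that owns it.
    in_stubs = [v for v, m in enumerate(in_degrees) for _ in range(m)]
    out_stubs = [v for v, d in enumerate(out_degrees) for _ in range(d)]
    rng.shuffle(out_stubs)                   # uniform random matching
    return list(zip(out_stubs, in_stubs))    # self-loops and repeats allowed

# Hypothetical bi-degree sequence on four nodes.
example_in = [2, 1, 0, 3]
example_out = [1, 2, 2, 1]
edges = pair_stubs(example_in, example_out, random.Random(7))
print(edges)
```

Self-loops and multiple edges are kept on purpose, since the construction on the slide produces a multigraph.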


PageRank in directed configuration model

I $C_i = \zeta_i / D_i$, where $\{\zeta_i\}$ is a sequence of i.i.d. random variables independent of (N, D) ($\zeta_i = c$ in the classical case)

I $M = M(n) \in \mathbb{R}^{n \times n}$ is related to the adjacency matrix of the graph:

$$M_{i,j} = \begin{cases} s_{ij} C_i, & \text{if there are } s_{ij} \text{ edges from } i \text{ to } j, \\ 0, & \text{otherwise.} \end{cases}$$

I $Q \in \mathbb{R}^n$ is a personalization vector

I We are interested in one coordinate, $R_1$, of the vector $R \in \mathbb{R}^n$ defined by

$$R = RM + Q$$


Matrix iterations

$$R^{(n,0)} = B,$$

$$R^{(n,1)} = R^{(n,0)} M + Q = BM + Q,$$

$$R^{(n,2)} = R^{(n,1)} M + Q = BM^2 + QM + Q,$$

$$R^{(n,3)} = R^{(n,2)} M + Q = BM^3 + QM^2 + QM + Q,$$

...

$$R^{(n,k)} = \sum_{i=0}^{k-1} Q M^i + B M^k, \quad k \ge 1.$$

We are interested in analyzing $P(R^{(n,\infty)}_1 > x)$, $x \to \infty$.
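The iteration and its closed form can be compared numerically. The sketch below is a plain-Python illustration on a hypothetical three-node multigraph with $C_i = c / D_i$; the helper names, the edge list, and the choices of B and Q are assumptions made for the example.

```python
def vec_mat(v, matrix):
    """Row vector times matrix: (vM)_j = sum_i v_i M_ij."""
    n = len(v)
    return [sum(v[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def iterate(start, matrix, q, k):
    """Apply R <- R M + Q k times, starting from R = B."""
    r = list(start)
    for _ in range(k):
        r = [x + y for x, y in zip(vec_mat(r, matrix), q)]
    return r

def closed_form(start, matrix, q, k):
    """Evaluate sum_{i=0}^{k-1} Q M^i + B M^k directly."""
    total = [0.0] * len(q)
    term = list(q)                      # Q M^0
    b_power = list(start)               # B M^0
    for _ in range(k):
        total = [t + x for t, x in zip(total, term)]
        term = vec_mat(term, matrix)
        b_power = vec_mat(b_power, matrix)
    return [t + x for t, x in zip(total, b_power)]

# Hypothetical multigraph: edges 0->1, 0->2, 1->2 and two parallel edges 2->0.
c = 0.85
edge_list = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 0)]
n = 3
out_degree = [sum(1 for s, _ in edge_list if s == i) for i in range(n)]
M = [[0.0] * n for _ in range(n)]
for s, t in edge_list:
    M[s][t] += c / out_degree[s]        # M_ij = s_ij * C_i
B = [1.0] * n
Q = [1.0 - c] * n

print(iterate(B, M, Q, 5))
print(closed_form(B, M, Q, 5))
```

Because every row of this M sums to c < 1, the term $B M^k$ decays geometrically, and a long run of `iterate` approaches the solution of R = RM + Q.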


Idea of the analysis

I $\hat{R}^{(n,k)}_1$ – PageRank on a perfect branching tree

I $R$ – solution of the equation

$$R \stackrel{d}{=} \sum_{j=1}^{\gamma} C_j R_j + Q$$

I We will try to prove the following: for any fixed $t \in \mathbb{R}$, and a randomly chosen node $v$,

$$P(R^{(n,\infty)}_1 \le t) \approx P(R^{(n,k)}_1 \le t) \approx P(\hat{R}^{(n,k)}_1 \le t) \approx P(R \le t)$$

for large enough $n$, $k$.


Idea of the analysis

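If we prove that for some $k = k(n) \to \infty$ and any $\varepsilon > 0$,

(Matrix iterations) $$P\left(\left|R^{(n,\infty)}_1 - R^{(n,k)}_1\right| > \varepsilon\right) \to 0, \quad (1)$$

(Coupling with branching tree) $$P\left(\left|R^{(n,k)}_1 - \hat{R}^{(n,k)}_1\right| > \varepsilon\right) \to 0, \quad (2)$$

(Limiting solution) $$P\left(\left|\hat{R}^{(n,k)}_1 - R\right| > \varepsilon\right) \to 0, \quad (3)$$

as $n \to \infty$, then it will follow, by Slutsky's lemma, that

$$R^{(n,\infty)}_1 \Rightarrow R^{(\infty)}$$

as $n \to \infty$, where $\Rightarrow$ denotes convergence in distribution and $R^{(\infty)}$ is the limit $R$ in (3).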


Coupling with branching tree

I We start with a random node (node 1) and explore its neighbours, labeling the stubs that we have already seen

I τ – the number of generations of WBP completed before coupling breaks


Coupling with branching tree

Lemma

Let τ be the number of generations of the TBP that we are able to complete before we draw the first stub that has already been observed before. Then, for any $0 < \varepsilon < 1/2$, and $a = (1/2 - \varepsilon) / \log m$, where $m = E[N]$,

$$P(\tau \le a \log n) = O\left(n^{-\varepsilon/2}\right)$$

as $n \to \infty$.
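To get a feel for the scale in the lemma, the depth $a \log n$ can be evaluated for concrete numbers. The values of n, m and ε below are arbitrary illustrations, and the function name is hypothetical. The identity $m^{a \log n} = n^{1/2 - \varepsilon}$ follows directly from the definition of $a$.

```python
import math

def coupling_depth(n, m, eps):
    """Number of generations a*log(n) with a = (1/2 - eps) / log(m)."""
    if not (0 < eps < 0.5 and m > 1 and n > 1):
        raise ValueError("need 0 < eps < 1/2, m > 1 and n > 1")
    a = (0.5 - eps) / math.log(m)
    return a * math.log(n)

# Arbitrary illustrative values: a million nodes, mean in-degree 3.
n, m, eps = 10**6, 3.0, 0.1
depth = coupling_depth(n, m, eps)
typical_generation_size = m ** depth      # equals n**(1/2 - eps)
print(f"generations: {depth:.2f}, m^depth: {typical_generation_size:.1f}")
```

For these values the depth is about 5 generations and $m^{a \log n}$ is about 250, far below n. The depth therefore grows only logarithmically in n.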


Combining with matrix iteration

I $P\left(\left|R^{(n,\infty)}_1 - R^{(n,k)}_1\right| > c^k K n\right) = o(1)$

I We need $c^k n = o(1)$ for some $k < \tau$

I Combining this with Lemma 2, we get the main result
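One way to read the requirement $c^k n = o(1)$, assuming $k$ is taken of order $a \log n$ with $a = (1/2 - \varepsilon)/\log m$ and $m = E[N] > 1$ as in the lemma, is the following short calculation. It is a heuristic sketch and not the proof from the paper.

```latex
\begin{align*}
k = a \log n,\quad a = \frac{1/2 - \varepsilon}{\log m}
  &\;\Longrightarrow\; c^{k} n = n^{\,1 + a \log c}, \\
c^{k} n = o(1)
  &\iff 1 + a \log c < 0
   \iff \log\frac{1}{c} > \frac{\log m}{1/2 - \varepsilon}
   \iff c < m^{-2/(1 - 2\varepsilon)}.
\end{align*}
```

Letting $\varepsilon \downarrow 0$, the bound $m^{-2/(1 - 2\varepsilon)}$ increases to $m^{-2}$, so every $c < 1/(E[N])^2$ satisfies it for some small enough $\varepsilon$.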


Main result

I Let n be the number of nodes in the random graph, and let N and D be r.v.s having the in-degree and effective out-degree distributions, resp.

I Let $R^{(n)}$ be the rank vector computed on the graph with n nodes.

I Theorem (Chen, Litvak, Olvera-Cravioto, 2014): Suppose $0 < c < 1/(E[N])^2$, then

$$R_1^{(n)} \Rightarrow R, \quad n \to \infty,$$

where $R$ is the solution to the fixed point equation

$$R \stackrel{d}{=} q + c \sum_{i=1}^{N} \frac{R_i}{D_i}.$$


Work in progress

I Relaxing conditions on c: better bounds for τ and the matrix iterations

I So far, finite variance assumption

I The result probably will not hold for all c ∈ (0, 1).

I The PageRank must converge for all c < 1. Will we obtain the same power law but with a different factor?


Thank you!

