Localized methods for diffusions in large graphs


Description

I describe a few ongoing research projects on diffusions in large graphs and how we can design matrix computations to evaluate them efficiently.

Transcript of Localized methods for diffusions in large graphs

Page 1: Localized methods for diffusions in large graphs

Localized methods for diffusions in large graphs

David F. Gleich, Purdue University

Joint work with Kyle Kloster @ Purdue & Michael Mahoney @ Berkeley, supported by NSF CAREER CCF-1149756

Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit and www.cs.purdue.edu/homes/dgleich/codes/l1pagerank

David Gleich · Purdue 1 MMDS 2014

Page 2: Localized methods for diffusions in large graphs

Image from rockysprings, deviantart, CC share-alike

Everything in the world can be explained by a matrix, and we see how deep the rabbit hole goes. The talk ends, you believe whatever you want to.

Page 3: Localized methods for diffusions in large graphs


Page 4: Localized methods for diffusions in large graphs

Graph diffusions

David Gleich · Purdue 4

f = \sum_{k=0}^{\infty} \alpha_k P^k s

[Slide background: a spread from the author's XRDS (Spring 2013) feature article. Figure 1: in a standard scientific computing problem, we find the steady-state heat distribution of a plate with a heat source in the middle by solving a linear system; in a social diffusion problem, solving a different linear system determines who is likely to enjoy a movie. Figure 2: meshes from PDE problems reside in low-dimensional physical spaces that limit the size of a set's boundary given its volume (size of set » size of boundary); no such limits exist in social networks, where the two are usually about the same size (size of set ≈ size of boundary), the defining property of an expander network.]

A – adjacency matrix; D – degree matrix; P – column-stochastic operator; s – the "seed" (a sparse vector); f – the diffusion result; \alpha_k – the path weights

P = A D^{-1}, \qquad (Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j

Graph diffusions help:
1. Attribute prediction
2. Community detection
3. "Ranking"
4. Finding small-conductance sets
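To make the definition concrete, here is a minimal sketch (my addition, not from the talk) that evaluates a truncated diffusion for a small dense adjacency matrix; the function name, the single-seed choice, and the truncation at len(alphas) terms are illustrative assumptions.

import numpy as np

def diffusion(A, seed, alphas):
    # P = A D^{-1} is column stochastic; d holds the degrees (column sums)
    d = A.sum(axis=0)
    P = A / d
    s = np.zeros(A.shape[0])
    s[seed] = 1.0              # a sparse, single-entry "seed" vector
    f = np.zeros_like(s)
    v = s.copy()               # v holds P^k s as k advances
    for a_k in alphas:         # f accumulates sum_k alpha_k P^k s
        f += a_k * v
        v = P @ v
    return f

For example, alphas = [(1 - beta) * beta**k for k in range(50)] gives a truncated PageRank diffusion.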

MMDS 2014

Page 5: Localized methods for diffusions in large graphs

Graph diffusions

David Gleich · Purdue 5


Heat kernel:  h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp\{tP\} s

PageRank:  x = (1 - \beta) \sum_{k=0}^{\infty} \beta^k P^k s,  equivalently  (I - \beta P) x = (1 - \beta) s

P = A D^{-1}, \qquad (Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j

MMDS 2014

Page 6: Localized methods for diffusions in large graphs

Graph diffusions

David Gleich · Purdue 6

Heat kernel:  h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp\{tP\} s

PageRank:  x = (1 - \beta) \sum_{k=0}^{\infty} \beta^k P^k s,  equivalently  (I - \beta P) x = (1 - \beta) s

[Plot: weight vs. path length for the heat kernel (t = 1, 5, 15) and PageRank (α = 0.85, 0.99); the heat-kernel weights decay much faster than the geometric PageRank weights.]
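The behavior in this plot is easy to reproduce; a small sketch (my addition, assuming the series weights written above) prints the weight each diffusion assigns to paths of length 50.

import numpy as np
from scipy.special import factorial

k = np.arange(100.0)                            # path lengths 0..99, as floats
for t in (1.0, 5.0, 15.0):
    hk = np.exp(-t) * t**k / factorial(k)       # heat kernel: e^{-t} t^k / k!
    print(f"heat kernel t={t}: weight at length 50 is {hk[50]:.2e}")
for beta in (0.85, 0.99):
    pr = (1 - beta) * beta**k                   # PageRank: (1-beta) beta^k
    print(f"PageRank beta={beta}: weight at length 50 is {pr[50]:.2e}")

The heat-kernel weights decay factorially, so the diffusion is dominated by short paths; the PageRank weights decay only geometrically.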

MMDS 2014

Page 7: Localized methods for diffusions in large graphs

Uniformly localized solutions in livejournal

[Left plot: plot(x) of the entry magnitudes of the diffusion vector; nnz = 4,815,948. Right plots: 1-norm error vs. number of largest non-zeros retained, falling from 10^0 down to 10^{-14}.]

x = \exp(P) e_c

David Gleich · Purdue 7

nnz(x) = 4,815,948
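The error curves above come from a simple post-processing step; here is a sketch (mine, assuming x is the already-computed diffusion vector) of the relative 1-norm error when only the k largest entries are retained.

import numpy as np

def retained_error(x, ks):
    mags = np.sort(np.abs(x))[::-1]         # entry magnitudes, largest first
    total = mags.sum()
    tail = total - np.cumsum(mags)          # 1-norm mass outside the top-k entries
    return [tail[k - 1] / total for k in ks]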

Gleich & Kloster, arXiv:1310.3423

MMDS 2014

Page 8: Localized methods for diffusions in large graphs

Our mission: find the solution with work roughly proportional to the localization, not the matrix.

David Gleich · Purdue 8 MMDS 2014

Page 9: Localized methods for diffusions in large graphs

Two types of localization

David Gleich · Purdue 9

Uniform (Strong):  \|x - x^*\|_1 \le \varepsilon        Entry-wise (Weak):  \|D^{-1}(x - x^*)\|_\infty \le \varepsilon

x \approx x^*: localized vectors are not sparse, but they can be approximated by sparse vectors.

Uniform: good global approximation using only a local region; "hard" to prove; "needs" a graph property.

Entry-wise: good approximation for cuts and communities; "easy" to prove; "fast" algorithms.
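In code, the two notions differ only by a degree scaling; a minimal sketch, with x the approximation, xstar the exact diffusion, and d the degree vector (all assumed given):

import numpy as np

def localization_errors(x, xstar, d):
    uniform = np.abs(x - xstar).sum()           # ||x - x*||_1, the strong notion
    entrywise = np.max(np.abs(x - xstar) / d)   # ||D^{-1}(x - x*)||_inf, the weak notion
    return uniform, entrywise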

MMDS 2014

Page 10: Localized methods for diffusions in large graphs

We have four results

1.  A new interpretation of the PageRank diffusion in relation to a mincut problem.

2.  A new understanding of the scalable, localized PageRank “push” method

3.  A new algorithm for the heat kernel diffusion in a degree weighted norm.

4.  Algorithms for diffusions as functions of matrices (K. Kloster’s poster on Thurs.)

David Gleich · Purdue 10

Undirected graphs only; entry-wise localization.

Directed, uniform localization.

MMDS 2014

Page 11: Localized methods for diffusions in large graphs

Our algorithms for uniform localization: www.cs.purdue.edu/homes/dgleich/codes/nexpokit

[Two plots: 1-norm error vs. number of non-zeros (10^0 to 10^6) for the gexpm, gexpmq, and expmimv methods; the error decays from 10^0 down to about 10^{-8}.]

David Gleich · Purdue 11

MMDS 2014

work = O\left( \log(\tfrac{1}{\varepsilon}) \, (\tfrac{1}{\varepsilon})^{3/2} \, d^2 (\log d)^2 \right), \qquad nnz = O\left( \log(\tfrac{1}{\varepsilon}) \, (\tfrac{1}{\varepsilon})^{3/2} \, d \, (\log d) \right)

Page 12: Localized methods for diffusions in large graphs

PageRank, mincuts, and the push method via Algorithmic Anti-Differentiation

Gleich & Mahoney, ICML 2014

David Gleich · Purdue 12 MMDS 2014

Page 13: Localized methods for diffusions in large graphs

The PageRank problem & the Laplacian on undirected graphs

Combinatorial Laplacian: L = D - A

David Gleich · Purdue 13

The PageRank random surfer: 1. With probability \beta, follow a random-walk step. 2. With probability (1 - \beta), jump randomly according to the distribution s. Goal: find the stationary distribution x.

x = (1 - \beta) \sum_{k=0}^{\infty} \beta^k P^k s

1. (I - \beta A D^{-1}) x = (1 - \beta) s;
2. [\alpha D + L] z = \alpha s, where \beta = 1/(1 + \alpha) and x = Dz.
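A quick numeric check of this equivalence (a sketch I'm adding; it assumes a small symmetric adjacency matrix A and stochastic seed vector s as dense numpy arrays):

import numpy as np

def pagerank_both_ways(A, s, beta):
    d = A.sum(axis=0)
    D = np.diag(d)
    L = D - A                                        # combinatorial Laplacian
    n = len(d)
    # Formulation 1: (I - beta A D^{-1}) x = (1 - beta) s
    x1 = np.linalg.solve(np.eye(n) - beta * (A / d), (1 - beta) * s)
    # Formulation 2: [alpha D + L] z = alpha s, beta = 1/(1 + alpha), x = D z
    alpha = (1 - beta) / beta
    z = np.linalg.solve(alpha * D + L, alpha * s)
    assert np.allclose(x1, D @ z)                    # the two solutions agree
    return x1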

MMDS 2014

Page 14: Localized methods for diffusions in large graphs

minimize \|Bx\|_{C,1} = \sum_{ij \in E} C_{ij} |x_i - x_j|
subject to x_s = 1, x_t = 0, x \ge 0.

The s-t min-cut problem. Here B is the unweighted incidence matrix and C is the diagonal capacity matrix.

David Gleich · Purdue 14


In the unweighted case, solve via max-flow. In the weighted case, solve via network simplex or an industrial LP solver.
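As a concrete instance, a sketch with networkx (my addition; the toy edges and capacities are arbitrary) that computes the s-t min-cut by max-flow:

import networkx as nx

G = nx.DiGraph()
# capacities play the role of the weights C_ij
G.add_edge("s", "a", capacity=3.0)
G.add_edge("a", "b", capacity=1.0)
G.add_edge("b", "t", capacity=3.0)
cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
print(cut_value, source_side)   # 1.0 {'a', 's'} -- the a-b edge is the bottleneck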

MMDS 2014

Page 15: Localized methods for diffusions in large graphs

The localized cut graph. Related to a construction used in "FlowImprove," Andersen & Lang (2007), and Orecchia & Zhu (2014).

A_S = \begin{bmatrix} 0 & \alpha d_S^T & 0 \\ \alpha d_S & A & \alpha d_{\bar{S}} \\ 0 & \alpha d_{\bar{S}}^T & 0 \end{bmatrix}

Connect s to vertices in S with weight \alpha \cdot \text{degree}; connect t to vertices in \bar{S} with weight \alpha \cdot \text{degree}.

David Gleich · Purdue 15

MMDS 2014

Page 16: Localized methods for diffusions in large graphs

The localized cut graph. Connect s to vertices in S with weight \alpha \cdot \text{degree}; connect t to vertices in \bar{S} with weight \alpha \cdot \text{degree}.

B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar{S}} & e \end{bmatrix}

Solve the s-t min-cut:

minimize \|B_S x\|_{C(\alpha),1}
subject to x_s = 1, x_t = 0, x \ge 0.

David Gleich · Purdue 16

MMDS 2014

Page 17: Localized methods for diffusions in large graphs

The localized cut graph. Connect s to vertices in S with weight \alpha \cdot \text{degree}; connect t to vertices in \bar{S} with weight \alpha \cdot \text{degree}.

B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar{S}} & e \end{bmatrix}

Solve the "electrical flow" s-t min-cut:

minimize \|B_S x\|_{C(\alpha),2}
subject to x_s = 1, x_t = 0.

David Gleich · Purdue 17

MMDS 2014

Page 18: Localized methods for diffusions in large graphs

s-t min-cut → PageRank. Proof: square and expand the objective into a Laplacian, then apply the constraints.

David Gleich · Purdue 18

MMDS 2014

The PageRank vector z that solves (\alpha D + L) z = \alpha s with s = d_S / \text{vol}(S) is a renormalized solution of the electrical cut computation:

minimize \|B_S x\|_{C(\alpha),2}
subject to x_s = 1, x_t = 0.

Specifically, if x is the solution, then x = \begin{bmatrix} 1 \\ \text{vol}(S)\, z \\ 0 \end{bmatrix}.
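The theorem can be sanity-checked numerically. A sketch under the assumptions that A is a small symmetric adjacency matrix and S an index set; the interior Laplacian of the localized cut graph is (1 + \alpha)D - A because every vertex gains \alpha \cdot degree from its edge to s or to t.

import numpy as np

def check_pagerank_cut(A, S, alpha):
    n = A.shape[0]
    d = A.sum(axis=0).astype(float)
    in_S = np.zeros(n)
    in_S[S] = 1.0
    vol_S = d @ in_S
    # electrical cut: harmonic solve on the interior with x_s = 1, x_t = 0
    L_interior = np.diag((1 + alpha) * d) - A
    x = np.linalg.solve(L_interior, alpha * d * in_S)
    # PageRank system (alpha D + L) z = alpha s with s = d_S / vol(S)
    L = np.diag(d) - A
    z = np.linalg.solve(alpha * np.diag(d) + L, alpha * d * in_S / vol_S)
    return np.allclose(x, vol_S * z)   # x = vol(S) z, as the theorem states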

Page 19: Localized methods for diffusions in large graphs

PageRank → s-t min-cut. That equivalence works if s is degree-weighted. What if s is the uniform vector?

A(s) = \begin{bmatrix} 0 & \alpha s^T & 0 \\ \alpha s & A & \alpha(d - s) \\ 0 & \alpha(d - s)^T & 0 \end{bmatrix}.

David Gleich · Purdue 19

MMDS 2014

Page 20: Localized methods for diffusions in large graphs

Insight 1: PageRank implicitly approximates the solution of these s-t mincut problems.

David Gleich · Purdue 20

MMDS 2014

Page 21: Localized methods for diffusions in large graphs

The Push Algorithm for PageRank. Proposed (in closest form) in Andersen, Chung & Lang (also by McSherry, Jeh & Widom) for personalized PageRank.

Strongly related to Gauss-Seidel on Ax = b (see my talk at Simons). Derived to show improved runtime for balanced solvers.

The Push Method (parameters \tau, \rho):

1. x^{(1)} = 0,  r^{(1)} = (1 - \beta) e_i,  k = 1
2. While any r_j > \tau d_j  (d_j is the degree of node j):
3.   x^{(k+1)} = x^{(k)} + (r_j - \tau d_j \rho) e_j
4.   r_i^{(k+1)} = \begin{cases} \tau d_j \rho & i = j \\ r_i^{(k)} + \beta (r_j - \tau d_j \rho)/d_j & i \sim j \\ r_i^{(k)} & \text{otherwise} \end{cases}
5.   k \leftarrow k + 1

David Gleich · Purdue 21
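A compact implementation of these steps (my sketch, not the authors' released code) for an undirected graph stored as a dictionary-of-sets, with a work queue standing in for "while any r_j > \tau d_j":

from collections import deque

def push_pagerank(G, i, beta, tau, rho=1.0):
    x, r = {}, {i: 1.0 - beta}          # step 1: x = 0, r = (1 - beta) e_i
    Q = deque([i])
    while Q:                            # step 2: while some r_j > tau d_j
        j = Q.popleft()
        dj = len(G[j])
        if r.get(j, 0.0) <= tau * dj:
            continue                    # this entry fell back below the threshold
        delta = r[j] - tau * dj * rho
        x[j] = x.get(j, 0.0) + delta    # step 3: move mass to the solution
        r[j] = tau * dj * rho           # step 4, the i = j case
        for u in G[j]:                  # step 4, the neighbors i ~ j
            r[u] = r.get(u, 0.0) + beta * delta / dj
            if r[u] > tau * len(G[u]):
                Q.append(u)
    return x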

MMDS 2014

Page 22: Localized methods for diffusions in large graphs

Why do we care about push?
1. Used for empirical studies of "communities"
2. Local Cheeger inequality
3. Used for "fast PageRank approximation"
4. It produces weakly localized approximations to PageRank!

[Figure: Newman's netscience graph, 379 vertices, 1828 non-zeros; s has a single one at the marked node, and the approximation is "zero" on most of the nodes.]

22

\|D^{-1}(x - x^*)\|_\infty \le \varepsilon  with work bounded by  \frac{1}{(1 - \beta)\varepsilon}  edges.

Page 23: Localized methods for diffusions in large graphs

The push method revisited. Let x be the output from the push method with 0 < \beta < 1, v = d_S/\text{vol}(S), \rho = 1, and \tau > 0. Set \alpha = (1 - \beta)/\beta and \kappa = \tau \, \text{vol}(S)/\beta, and let z_G solve

minimize \tfrac{1}{2} \|B_S z\|_{C(\alpha),2}^2 + \kappa \|Dz\|_1
subject to z_s = 1, z_t = 0, z \ge 0,

where z = [1; z_G; 0]. Then x = D z_G / \text{vol}(S).

Proof: write out the KKT conditions and show that the push method solves them. Slackness was "tricky."

The \kappa \|Dz\|_1 term is regularization for sparsity; the division by \text{vol}(S) reflects the need for normalization.

David Gleich · Purdue 23

MMDS 2014

Page 24: Localized methods for diffusions in large graphs

Insight 2: The PageRank push method implicitly solves a 1-norm-regularized, 2-norm cut approximation.

David Gleich · Purdue 24

MMDS 2014

Page 25: Localized methods for diffusions in large graphs

Insight 2’ We get 3-digits of accuracy on P and 16-digits of accuracy on P’.

David Gleich · Purdue 25

MMDS 2014

Page 26: Localized methods for diffusions in large graphs

David Gleich · Purdue 26

Anti-differentiating Approximation Algorithms

16 nonzeros · 15 nonzeros · 284 nonzeros · 24 nonzeros

Figure 2. Examples of the different cut vectors on a portion of the netscience graph. In the left subfigure, we show the set S highlighted with its vertices enlarged. In the other subfigures, we show the solution vectors from the various cut problems (from left to right, Probs. (2), (4), and (6), solved with min-cut, PageRank, and ACL) for this set S. Each vector determines the color and size of a vertex, where high values are large and dark. White vertices with outlines are numerically non-zero (which is why most of the vertices in the fourth figure are outlined, in contrast to the third figure). The true min-cut set is large in all vectors, but the implicitly regularized problem achieves this with many fewer non-zeros than the vanilla PageRank problem.

References

Andersen, Reid and Lang, Kevin. An algorithm for improving graph partitions. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 651–660, 2008.

Andersen, Reid, Chung, Fan, and Lang, Kevin. Local graph partitioning using PageRank vectors. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp. 475–486, 2006.

Arasu, Arvind, Novak, Jasmine, Tomkins, Andrew, and Tomlin, John. PageRank computation and the structure of the web: Experiments and algorithms. In Proceedings of the 11th International Conference on the World Wide Web, 2002.

Chin, Hui Han, Madry, Aleksander, Miller, Gary L., and Peng, Richard. Runtime guarantees for regression problems. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, pp. 269–282, 2013.

Dhillon, Inderjit S., Guan, Yuqiang, and Kulis, Brian. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Trans. Pattern Anal. Mach. Intell., 29(11):1944–1957, 2007.

Goldberg, Andrew V. and Tarjan, Robert E. A new approach to the maximum-flow problem. Journal of the ACM, 35(4):921–940, 1988.

Koutra, Danai, Ke, Tai-You, Kang, U., Chau, Duen Horng, Pao, Hsing-Kuo Kenneth, and Faloutsos, Christos. Unifying guilt-by-association approaches: Theorems and fast algorithms. In Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 245–260, 2011.

Kulis, Brian and Jordan, Michael I. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning, 2012.

Langville, Amy N. and Meyer, Carl D. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, 2006.

Leskovec, Jure, Lang, Kevin J., Dasgupta, Anirban, and Mahoney, Michael W. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29–123, 2009.

Mahoney, Michael W. Approximate computation and implicit regularization for very large-scale data analysis. In Proceedings of the 31st Symposium on Principles of Database Systems, pp. 143–154, 2012.

Mahoney, Michael W., Orecchia, Lorenzo, and Vishnoi, Nisheeth K. A local spectral method for graphs: With applications to improving graph partitions and exploring data graphs locally. Journal of Machine Learning Research, 13:2339–2365, 2012.

Newman, M. E. J. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E, 74:036104, 2006.

Orecchia, Lorenzo and Mahoney, Michael W. Implementing regularization implicitly via approximate eigenvector computation. In Proceedings of the 28th International Conference on Machine Learning, pp. 121–128, 2011.

Orecchia, Lorenzo and Zhu, Zeyuan Allen. Flow-based algorithms for local graph clustering. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms, pp. 1267–1286, 2014.

Page, Lawrence, Brin, Sergey, Motwani, Rajeev, and Winograd, Terry. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford University, November 1999.

Saunders, Michael A. Solution of sparse rectangular systems using LSQR and CRAIG. BIT Numerical Mathematics, 35(4):588–604, 1995.

Tibshirani, Robert. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1994.

Zhou, Dengyong, Bousquet, Olivier, Lal, Thomas Navin, Weston, Jason, and Scholkopf, Bernhard. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16, pp. 321–328, 2004.


Push's sparsity helps it identify the "right" graph feature with fewer non-zeros. (Panels above: the set S, the mincut solution, the push solution, and the PageRank solution.)

MMDS 2014

Page 27: Localized methods for diffusions in large graphs

The push method revisited. Let x be the output from the push method with 0 < \beta < 1, v = d_S/\text{vol}(S), \rho = 1, and \tau > 0. Set \alpha = (1 - \beta)/\beta and \kappa = \tau \, \text{vol}(S)/\beta, and let z_G solve

minimize \tfrac{1}{2} \|B_S z\|_{C(\alpha),2}^2 + \kappa \|Dz\|_1
subject to z_s = 1, z_t = 0, z \ge 0,

where z = [1; z_G; 0]. Then x = D z_G / \text{vol}(S).

Regularization for sparsity in solution and residual

David Gleich · Purdue 27

The push method is scalable because it gives us sparse solutions AND sparse residuals r.

MMDS 2014

Page 28: Localized methods for diffusions in large graphs

This is a case of Algorithmic Anti-differentiation!

28

MMDS 2014 David Gleich · Purdue

Page 29: Localized methods for diffusions in large graphs

The real world: given "find-communities," you hack around until something works, then write a paper presenting "three steps of the power method on P finds communities."

Algorithmic Anti-differentiation: given heuristic H, is there a problem P' such that H is an algorithm for P'?

Understand why H works: derive a characterization of heuristic H, guess and check until you find something H solves, and show that heuristic H solves P'.

MMDS 2014 David Gleich · Purdue 29

e.g., Mahoney & Orecchia; Dhillon et al. (Graclus); Saunders

Page 30: Localized methods for diffusions in large graphs

Without these insights, we’d draw the wrong conclusion.

David Gleich · Purdue 30

Gleich & Mahoney, Submitted

Our s-t mincut framework extends to many diffusions used in semi-supervised learning.

MMDS 2014

Page 31: Localized methods for diffusions in large graphs

Without these insights, we’d draw the wrong conclusion.

David Gleich · Purdue 31

Gleich & Mahoney, Submitted

Our s-t mincut framework extends to many diffusions used in semi-supervised learning.

[Plot: error rate vs. average training samples per class for the diffusions K2, RK2, K3, and RK3 with an off-the-shelf SSL procedure.]

MMDS 2014

Page 32: Localized methods for diffusions in large graphs

Without these insights, we’d draw the wrong conclusion.

David Gleich · Purdue 32

Gleich & Mahoney, Submitted

Our s-t mincut framework extends to many diffusions used in semi-supervised learning.

[Plots: error rate vs. average training samples per class for K2, RK2, K3, and RK3; left, the off-the-shelf SSL procedure; right, rank-rounded SSL.]

MMDS 2014

Page 33: Localized methods for diffusions in large graphs

Recap so far:
1. Used the relationship between PageRank and mincut to get a new understanding of the implicit properties of the push method.
2. Showed that this insight helps improve semi-supervised learning.
(Next) A new algorithm for the heat kernel diffusion in a degree-weighted norm.

David Gleich · Purdue 33

MMDS 2014

Page 34: Localized methods for diffusions in large graphs

Graph diffusions

David Gleich · Purdue 34

Heat kernel:  h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp\{tP\} s

PageRank:  x = (1 - \beta) \sum_{k=0}^{\infty} \beta^k P^k s,  equivalently  (I - \beta P) x = (1 - \beta) s

[Plot: weight vs. path length for the heat kernel (t = 1, 5, 15) and PageRank (α = 0.85, 0.99).]

Many "empirically useful" properties of PageRank also hold for the heat kernel diffusion; e.g., Chung (2007) showed a local Cheeger inequality. But there was no "local" algorithm until a randomized method by Simpson & Chung (2013).

MMDS 2014

Page 35: Localized methods for diffusions in large graphs

We can turn the heat kernel into a linear system by direct expansion.

David Gleich · Purdue 35

x = \exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N

\begin{bmatrix} I & & & & \\ -P/1 & I & & & \\ & -P/2 & \ddots & & \\ & & \ddots & I & \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix}, \qquad x_N = \sum_{i=0}^{N} v_i

(I \otimes I_N - S_N \otimes P)\, v = e_1 \otimes e_c

Lemma: we approximate x_N well if we approximate v well.

Kloster & Gleich, WAW2013
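For small problems the Kronecker form can be built and solved directly; a sketch I'm adding with scipy (the function name and the dense right-hand side are my own choices):

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def taylor_block_solve(P, c, N):
    # build (I - S_N kron P) v = e_1 kron e_c, where S_N has 1/(k+1)
    # on its subdiagonal, then return x_N = sum_k v_k
    n = P.shape[0]
    S = sp.diags([1.0 / (k + 1) for k in range(N)], -1)
    M = sp.eye((N + 1) * n) - sp.kron(S, sp.csr_matrix(P))
    rhs = np.zeros((N + 1) * n)
    rhs[c] = 1.0                         # e_1 (block 0) tensor e_c
    v = spsolve(M.tocsr(), rhs)
    return v.reshape(N + 1, n).sum(axis=0)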

MMDS 2014

Page 36: Localized methods for diffusions in large graphs

There is a fast deterministic adaptation of the push method

David Gleich · Purdue 36

Kloster & Gleich, KDD2014

These polynomials \psi_k(t) are closely related to the \varphi functions central to exponential integrators in ODEs [37]. Note that \psi_0 = T_N.

To guarantee that the total error satisfies the criterion (3), it is enough to show that the error at each Taylor term satisfies an \infty-norm inequality analogous to (3). This is discussed in more detail in Section A.

4.3 Deriving a linear system

To define the basic step of the hk-relax algorithm and to show how the \psi_k influence the total error, we rearrange the Taylor polynomial computation into a linear system.

Denote by v_k the k-th term of the vector sum T_N(tP)s:

T_N(tP)s = s + \frac{t}{1} Ps + \cdots + \frac{t^N}{N!} P^N s   (7)
         = v_0 + v_1 + \cdots + v_N.   (8)

Note that v_{k+1} = \frac{t^{k+1}}{(k+1)!} P^{k+1} s = \frac{t}{k+1} P v_k. This identity implies that the terms v_k exactly satisfy the linear system

\begin{bmatrix} I & & & & \\ -\frac{t}{1}P & I & & & \\ & -\frac{t}{2}P & \ddots & & \\ & & \ddots & I & \\ & & & -\frac{t}{N}P & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} s \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix}.   (9)

Let v = [v_0; v_1; \cdots; v_N]. An approximate solution \hat{v} to (9) would have block components \hat{v}_k such that \sum_{k=0}^{N} \hat{v}_k \approx T_N(tP)s, our desired approximation to e^t h. In practice, we update only a single length-n solution vector, adding all updates to that vector, instead of maintaining the N + 1 different block vectors v_k as part of v; furthermore, the block matrix and right-hand side are never formed explicitly.

With this block system in place, we can describe the algorithm's steps.

4.4 The hk-relax algorithm

Given a random walk transition matrix P, scalar t > 0, and seed vector s as inputs, we solve the linear system from (9) as follows. Denote the initial solution vector by y and the initial nN \times 1 residual by r^{(0)} = e_1 \otimes s. Denote by r(i, j) the entry of r corresponding to node i in residual block j. The idea is to iteratively remove all entries from r that satisfy

r(i, j) \ge \frac{e^t \varepsilon d_i}{2 N \psi_j(t)}.   (10)

To organize this process, we begin by placing the nonzero entries of r^{(0)} in a queue, Q(r), and place updated entries of r into Q(r) only if they satisfy (10).

Then hk-relax proceeds as follows.
1. At each step, pop the top entry of Q(r), call it r(i, j), and subtract that entry in r, making r(i, j) = 0.
2. Add r(i, j) to y_i.
3. Add r(i, j) \frac{t}{j+1} P e_i to residual block r_{j+1}.
4. For each entry of r_{j+1} that was updated, add that entry to the back of Q(r) if it satisfies (10).

Once all entries of r that satisfy (10) have been removed, the resulting solution vector y will satisfy (3), which we prove in Section A, along with a bound on the work required to achieve this. We present working code for our method in Figure 2 that shows how to optimize this computation using sparse data structures. These make it highly efficient in practice.

# G is the graph as a dictionary-of-sets,
# seed is an array of seeds,
# t, eps, N, psis are precomputed
import collections
import math

x = {}  # store x and r as dictionaries
r = {}  # initialize residual
Q = collections.deque()  # initialize queue
for s in seed:
    r[(s, 0)] = 1. / len(seed)
    Q.append((s, 0))
while len(Q) > 0:
    (v, j) = Q.popleft()  # v has r[(v,j)] above the threshold
    rvj = r[(v, j)]
    # perform the hk-relax step
    if v not in x: x[v] = 0.
    x[v] += rvj
    r[(v, j)] = 0.
    mass = (t * rvj / (float(j) + 1.)) / len(G[v])
    for u in G[v]:  # for neighbors of v
        nxt = (u, j + 1)  # in the next block
        if j + 1 == N:  # last step, add the mass directly to the solution
            if u not in x: x[u] = 0.
            x[u] += mass
            continue
        if nxt not in r: r[nxt] = 0.
        thresh = math.exp(t) * eps * len(G[u])
        thresh = thresh / (N * psis[j + 1]) / 2.
        if r[nxt] < thresh and r[nxt] + mass >= thresh:
            Q.append(nxt)  # add u to the queue when it crosses the threshold
        r[nxt] = r[nxt] + mass

Figure 2: Pseudo-code for our algorithm as working Python code. The graph is stored as a dictionary of sets so that the G[v] statement returns the set of neighbors associated with vertex v. The solution is the vector x indexed by vertices, and the residual vector is indexed by tuples (v, j) that are pairs of vertices and steps j. A fully working demo may be downloaded from github: https://gist.github.com/dgleich/7d904a10dfd9ddfaf49a.
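Running the figure's code requires the precomputed inputs. One hypothetical setup (my sketch; the recurrence psi_N = 1, psi_k = 1 + t psi_{k+1}/(k+1) is consistent with psi_0 = T_N(t), but treat it as an assumption rather than the paper's exact code):

import math

# toy graph as a dictionary-of-sets, plus the precomputed inputs
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
seed = [0]
t, eps = 5.0, 1e-4
N = 15                                  # e.g., from the Taylor bound in Sec. 4.5
psis = [0.0] * (N + 1)
psis[N] = 1.0
for k in range(N - 1, -1, -1):          # psi_k = 1 + t/(k+1) * psi_{k+1}
    psis[k] = 1.0 + t * psis[k + 1] / (k + 1)
# sanity check: psi_0 equals the degree-N Taylor polynomial of e^t
assert abs(psis[0] - sum(t**m / math.factorial(m) for m in range(N + 1))) < 1e-8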

4.5 Choosing N

The last detail of the algorithm we need to discuss is how to pick N. In (4) we want to guarantee the accuracy of D^{-1}\exp\{tP\}s - D^{-1}T_N(tP)s. By using D^{-1}\exp\{tP\} = \exp\{tP^T\}D^{-1} and D^{-1}T_N(tP) = T_N(tP^T)D^{-1}, we can get a new upper bound on \|D^{-1}\exp\{tP\}s - D^{-1}T_N(tP)s\|_\infty by noting

\|\exp\{tP^T\}D^{-1}s - T_N(tP^T)D^{-1}s\|_\infty \le \|\exp\{tP^T\} - T_N(tP^T)\|_\infty \, \|D^{-1}s\|_\infty.

Since s is stochastic, we have \|D^{-1}s\|_\infty \le \|D^{-1}s\|_1 \le 1. From [34] we know that the norm \|\exp\{tP^T\} - T_N(tP^T)\|_\infty is bounded by

\frac{\|tP^T\|_\infty^{N+1}}{(N+1)!} \, \frac{N+2}{N+2-t} \le \frac{t^{N+1}}{(N+1)!} \, \frac{N+2}{N+2-t}.   (11)

So to guarantee (4), it is enough to choose N such that \frac{t^{N+1}}{(N+1)!} \frac{N+2}{N+2-t} < \varepsilon/2. Such an N can be determined efficiently simply by iteratively computing terms of the Taylor polynomial for e^t until the error is less than the desired error for hk-relax. In practice, this required a choice of N no greater than 2t \log(\frac{1}{\varepsilon}), which we think can be made rigorous.
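That choice is cheap to compute; a minimal sketch (my addition) that finds the smallest N satisfying the remainder bound (11):

import math

def choose_N(t, eps):
    # smallest N with t^(N+1)/(N+1)! * (N+2)/(N+2-t) < eps/2
    N = 1
    while True:
        if N + 2 > t:
            bound = t**(N + 1) / math.factorial(N + 1) * (N + 2) / (N + 2 - t)
            if bound < eps / 2:
                return N
        N += 1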

Let h = e^{-t} \exp\{tP\} s and let x be the output of hk-push(\varepsilon). Then \|D^{-1}(x - h)\|_\infty \le \varepsilon after looking at at most \frac{2 N e^t}{\varepsilon} edges. We believe that the bound N \le 2t \log(1/\varepsilon) suffices.

MMDS 2014

Page 37: Localized methods for diffusions in large graphs

PageRank vs. Heat Kernel

David Gleich · Purdue 37

Table 1: Datasets

Graph                  |V|          |E|
pgp-cc                 10,680       24,316
ca-AstroPh-cc          17,903       196,972
marvel-comics-cc       19,365       96,616
as-22july06            22,963       48,436
rand-ff-25000-0.4      25,000       56,071
cond-mat-2003-cc       27,519       116,181
email-Enron-cc         33,696       180,811
cond-mat-2005-fix-cc   36,458       171,735
soc-sign-epinions-cc   119,130      704,267
itdk0304-cc            190,914      607,610
dblp-cc                226,413      716,460
flickr-bidir-cc        513,969      3,190,452
ljournal-2008          5,363,260    50,030,085
twitter-2010           41,652,230   1,202,513,046
friendster             65,608,366   1,806,067,135
com-amazon             334,863      925,872
com-dblp               317,080      1,049,866
com-youtube            1,134,890    2,987,624
com-lj                 3,997,962    34,681,189
com-orkut              3,072,441    117,185,083

Table 2: The result of evaluating the heat kernel (hk) vs. PageRank (pr) on finding real-world communities. The heat kernel finds smaller, more accurate sets with slightly worse conductance.

data         F1-measure      conductance     set size
             hk      pr      hk      pr      hk      pr
amazon       0.325   0.140   0.141   0.048   193     15293
dblp         0.257   0.115   0.267   0.173   44      16026
youtube      0.177   0.136   0.337   0.321   1010    6079
lj           0.131   0.107   0.474   0.459   283     738
orkut        0.055   0.044   0.714   0.687   537     1989
friendster   0.078   0.090   0.785   0.802   229     333

…scores, but using much smaller sets with substantially better F1 measures. This suggests that hk-relax better captures the properties of real-world communities than the PageRank diffusion, in the sense that the tighter sets produced by the heat kernel are better focused around real-world communities than are the larger sets produced by the PageRank diffusion.

7. CONCLUSIONS

These results suggest that the hk-relax algorithm is a viable companion to the celebrated PageRank push algorithm and may even be a worthy competitor for tasks that require accurate communities of large graphs. Furthermore, we suspect that the hk-relax method will be useful for the myriad other uses of PageRank-style diffusions such as link prediction [29] or even logic programming [51].

In the future, we plan to explore this method on directed networks as well as better methods for selecting the parameters of the diffusion. It is also possible that our new ideas may translate to faster methods for non-conservative diffusions such as the Katz [25] and modularity methods [43], and we plan to explore these diffusions as well.

Figure 5: (Top) Runtimes of hk-relax vs. ppr-push against log10(|V| + |E|), shown with 25%, 50%, and 75% percentile trendlines from a select set of experiments. (Bottom) Conductances of hk-relax vs. ppr-push, shown in the same way.

Acknowledgements: This work was supported by NSF CAREER Award CCF-1149756.



On large graphs, our heat kernel takes slightly longer than a localized PageRank, but it produces sets with smaller (better) conductance scores. Our Python code on clueweb12 (72B edges) via libbvg:
•  99 seconds to load
•  1 second to compute

MMDS 2014

Page 38: Localized methods for diffusions in large graphs

References and ongoing work

Gleich and Kloster – Relaxation methods for the matrix exponential, Submitted
Kloster and Gleich – Heat kernel based community detection, KDD 2014
Gleich and Mahoney – Algorithmic Anti-differentiation, ICML 2014
Gleich and Mahoney – Regularized diffusions, Submitted

www.cs.purdue.edu/homes/dgleich/codes/nexpokit
www.cs.purdue.edu/homes/dgleich/codes/l1pagerank

•  Improved localization bounds for functions of matrices
•  Asynchronous and parallel "push"-style methods

David Gleich · Purdue 38

Supported by NSF CAREER CCF-1149756 www.cs.purdue.edu/homes/dgleich