Localized methods for diffusions in large graphs


Description

I describe a few ongoing research projects on diffusions in large graphs and how we can design matrix computations to evaluate them efficiently.

Transcript of Localized methods for diffusions in large graphs

Page 1: Localized methods for diffusions in large graphs

Localized methods for diffusions in large graphs

David F. Gleich, Purdue University

Joint work with Kyle Kloster @ Purdue & Michael Mahoney @ Berkeley, supported by NSF CAREER CCF-1149756

Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit and www.cs.purdue.edu/homes/dgleich/codes/l1pagerank

David Gleich · Purdue 1 MMDS 2014

Page 2: Localized methods for diffusions in large graphs

Image from rockysprings, deviantart, CC share-alike

Everything in the world can be explained by a matrix, and we see how deep the rabbit hole goes. The talk ends, you believe whatever you want to.

Page 3: Localized methods for diffusions in large graphs


Page 4: Localized methods for diffusions in large graphs

Graph diffusions

David Gleich · Purdue 4

f = \sum_{k=0}^{\infty} \alpha_k P^k s

[Slide background: a spread from the author's XRDS (Spring 2013) feature article. Figure 1: in a standard scientific computing problem, we find the steady-state heat distribution of a plate with a heat source in the middle by solving a linear system; in a social diffusion problem, solving a different linear system determines who is likely to enjoy a movie. Figure 2: meshes from PDE problems reside in low-dimensional physical spaces that limit the size of a set's boundary given its volume (size of set » size of boundary); no such limits exist in social networks, where the two are usually about the same size (size of set ≈ size of boundary), the defining property of an expander network.]

A – adjacency matrix; D – degree matrix; P – column-stochastic operator; s – the "seed" (a sparse vector); f – the diffusion result; \alpha_k – the path weights

P = A D^{-1}, \qquad (Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j

Graph diffusions help:
1. Attribute prediction
2. Community detection
3. "Ranking"
4. Finding small-conductance sets
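To make the definition concrete, here is a minimal sketch (my addition, not from the talk) that evaluates a truncated diffusion for a small dense adjacency matrix; the function name, the single-seed choice, and the truncation at len(alphas) terms are illustrative assumptions.

import numpy as np

def diffusion(A, seed, alphas):
    # P = A D^{-1} is column stochastic; d holds the degrees (column sums)
    d = A.sum(axis=0)
    P = A / d
    s = np.zeros(A.shape[0])
    s[seed] = 1.0              # a sparse, single-entry "seed" vector
    f = np.zeros_like(s)
    v = s.copy()               # v holds P^k s as k advances
    for a_k in alphas:         # f accumulates sum_k alpha_k P^k s
        f += a_k * v
        v = P @ v
    return f

For example, alphas = [(1 - beta) * beta**k for k in range(50)] gives a truncated PageRank diffusion.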

MMDS 2014

Page 5: Localized methods for diffusions in large graphs

Graph diffusions

David Gleich · Purdue 5


Heat kernel:  h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp\{tP\} s

PageRank:  x = (1 - \beta) \sum_{k=0}^{\infty} \beta^k P^k s,  equivalently  (I - \beta P) x = (1 - \beta) s

P = A D^{-1}, \qquad (Px)_i = \sum_{j \to i} \frac{1}{d_j} x_j

MMDS 2014

Page 6: Localized methods for diffusions in large graphs

Graph diffusions

David Gleich · Purdue 6

Heat kernel:  h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp\{tP\} s

PageRank:  x = (1 - \beta) \sum_{k=0}^{\infty} \beta^k P^k s,  equivalently  (I - \beta P) x = (1 - \beta) s

[Plot: weight vs. path length for the heat kernel (t = 1, 5, 15) and PageRank (α = 0.85, 0.99); the heat-kernel weights decay much faster than the geometric PageRank weights.]
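The behavior in this plot is easy to reproduce; a small sketch (my addition, assuming the series weights written above) prints the weight each diffusion assigns to paths of length 50.

import numpy as np
from scipy.special import factorial

k = np.arange(100.0)                            # path lengths 0..99, as floats
for t in (1.0, 5.0, 15.0):
    hk = np.exp(-t) * t**k / factorial(k)       # heat kernel: e^{-t} t^k / k!
    print(f"heat kernel t={t}: weight at length 50 is {hk[50]:.2e}")
for beta in (0.85, 0.99):
    pr = (1 - beta) * beta**k                   # PageRank: (1-beta) beta^k
    print(f"PageRank beta={beta}: weight at length 50 is {pr[50]:.2e}")

The heat-kernel weights decay factorially, so the diffusion is dominated by short paths; the PageRank weights decay only geometrically.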

MMDS 2014

Page 7: Localized methods for diffusions in large graphs

Uniformly localized solutions in livejournal

[Left plot: plot(x) of the entry magnitudes of the diffusion vector; nnz = 4,815,948. Right plots: 1-norm error vs. number of largest non-zeros retained, falling from 10^0 down to 10^{-14}.]

x = \exp(P) e_c

David Gleich · Purdue 7

nnz(x) = 4,815,948
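The error curves above come from a simple post-processing step; here is a sketch (mine, assuming x is the already-computed diffusion vector) of the relative 1-norm error when only the k largest entries are retained.

import numpy as np

def retained_error(x, ks):
    mags = np.sort(np.abs(x))[::-1]         # entry magnitudes, largest first
    total = mags.sum()
    tail = total - np.cumsum(mags)          # 1-norm mass outside the top-k entries
    return [tail[k - 1] / total for k in ks]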

Gleich & Kloster, arXiv:1310.3423

MMDS 2014

Page 8: Localized methods for diffusions in large graphs

Our mission: find the solution with work roughly proportional to the localization, not the matrix.

David Gleich · Purdue 8 MMDS 2014

Page 9: Localized methods for diffusions in large graphs

Two types of localization

David Gleich · Purdue 9

Uniform (Strong):  \|x - x^*\|_1 \le \varepsilon        Entry-wise (Weak):  \|D^{-1}(x - x^*)\|_\infty \le \varepsilon

x \approx x^*: localized vectors are not sparse, but they can be approximated by sparse vectors.

Uniform: good global approximation using only a local region; "hard" to prove; "needs" a graph property.

Entry-wise: good approximation for cuts and communities; "easy" to prove; "fast" algorithms.
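In code, the two notions differ only by a degree scaling; a minimal sketch, with x the approximation, xstar the exact diffusion, and d the degree vector (all assumed given):

import numpy as np

def localization_errors(x, xstar, d):
    uniform = np.abs(x - xstar).sum()           # ||x - x*||_1, the strong notion
    entrywise = np.max(np.abs(x - xstar) / d)   # ||D^{-1}(x - x*)||_inf, the weak notion
    return uniform, entrywise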

MMDS 2014

Page 10: Localized methods for diffusions in large graphs

We have four results

1.  A new interpretation of the PageRank diffusion in relation to a mincut problem.

2.  A new understanding of the scalable, localized PageRank “push” method

3.  A new algorithm for the heat kernel diffusion in a degree weighted norm.

4.  Algorithms for diffusions as functions of matrices (K. Kloster’s poster on Thurs.)

David Gleich · Purdue 10

Undirected graphs only; entry-wise localization.

Directed, uniform localization.

MMDS 2014

Page 11: Localized methods for diffusions in large graphs

Our algorithms for uniform localization: www.cs.purdue.edu/homes/dgleich/codes/nexpokit

[Two plots: 1-norm error vs. number of non-zeros (10^0 to 10^6) for the gexpm, gexpmq, and expmimv methods; the error decays from 10^0 down to about 10^{-8}.]

David Gleich · Purdue 11

MMDS 2014

work = O\left( \log(\tfrac{1}{\varepsilon}) \, (\tfrac{1}{\varepsilon})^{3/2} \, d^2 (\log d)^2 \right), \qquad nnz = O\left( \log(\tfrac{1}{\varepsilon}) \, (\tfrac{1}{\varepsilon})^{3/2} \, d \, (\log d) \right)

Page 12: Localized methods for diffusions in large graphs

PageRank, mincuts, and the push method via Algorithmic Anti-Differentiation

Gleich & Mahoney, ICML 2014

David Gleich · Purdue 12 MMDS 2014

Page 13: Localized methods for diffusions in large graphs

The PageRank problem & the Laplacian on undirected graphs

Combinatorial Laplacian: L = D - A

David Gleich · Purdue 13

The PageRank random surfer: 1. With probability \beta, follow a random-walk step. 2. With probability (1 - \beta), jump randomly according to the distribution s. Goal: find the stationary distribution x.

x = (1 - \beta) \sum_{k=0}^{\infty} \beta^k P^k s

1. (I - \beta A D^{-1}) x = (1 - \beta) s;
2. [\alpha D + L] z = \alpha s, where \beta = 1/(1 + \alpha) and x = Dz.
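A quick numeric check of this equivalence (a sketch I'm adding; it assumes a small symmetric adjacency matrix A and stochastic seed vector s as dense numpy arrays):

import numpy as np

def pagerank_both_ways(A, s, beta):
    d = A.sum(axis=0)
    D = np.diag(d)
    L = D - A                                        # combinatorial Laplacian
    n = len(d)
    # Formulation 1: (I - beta A D^{-1}) x = (1 - beta) s
    x1 = np.linalg.solve(np.eye(n) - beta * (A / d), (1 - beta) * s)
    # Formulation 2: [alpha D + L] z = alpha s, beta = 1/(1 + alpha), x = D z
    alpha = (1 - beta) / beta
    z = np.linalg.solve(alpha * D + L, alpha * s)
    assert np.allclose(x1, D @ z)                    # the two solutions agree
    return x1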

MMDS 2014

Page 14: Localized methods for diffusions in large graphs

minimize \|Bx\|_{C,1} = \sum_{ij \in E} C_{ij} |x_i - x_j|
subject to x_s = 1, x_t = 0, x \ge 0.

The s-t min-cut problem. Here B is the unweighted incidence matrix and C is the diagonal capacity matrix.

David Gleich · Purdue 14


In the unweighted case, solve via max-flow. In the weighted case, solve via network simplex or an industrial LP solver.
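As a concrete instance, a sketch with networkx (my addition; the toy edges and capacities are arbitrary) that computes the s-t min-cut by max-flow:

import networkx as nx

G = nx.DiGraph()
# capacities play the role of the weights C_ij
G.add_edge("s", "a", capacity=3.0)
G.add_edge("a", "b", capacity=1.0)
G.add_edge("b", "t", capacity=3.0)
cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
print(cut_value, source_side)   # 1.0 {'a', 's'} -- the a-b edge is the bottleneck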

MMDS 2014

Page 15: Localized methods for diffusions in large graphs

The localized cut graph. Related to a construction used in "FlowImprove," Andersen & Lang (2007), and Orecchia & Zhu (2014).

A_S = \begin{bmatrix} 0 & \alpha d_S^T & 0 \\ \alpha d_S & A & \alpha d_{\bar{S}} \\ 0 & \alpha d_{\bar{S}}^T & 0 \end{bmatrix}

Connect s to vertices in S with weight \alpha \cdot \text{degree}; connect t to vertices in \bar{S} with weight \alpha \cdot \text{degree}.

David Gleich · Purdue 15

MMDS 2014

Page 16: Localized methods for diffusions in large graphs

The localized cut graph. Connect s to vertices in S with weight \alpha \cdot \text{degree}; connect t to vertices in \bar{S} with weight \alpha \cdot \text{degree}.

B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar{S}} & e \end{bmatrix}

Solve the s-t min-cut:

minimize \|B_S x\|_{C(\alpha),1}
subject to x_s = 1, x_t = 0, x \ge 0.

David Gleich · Purdue 16

MMDS 2014

Page 17: Localized methods for diffusions in large graphs

The localized cut graph. Connect s to vertices in S with weight \alpha \cdot \text{degree}; connect t to vertices in \bar{S} with weight \alpha \cdot \text{degree}.

B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar{S}} & e \end{bmatrix}

Solve the "electrical flow" s-t min-cut:

minimize \|B_S x\|_{C(\alpha),2}
subject to x_s = 1, x_t = 0.

David Gleich · Purdue 17

MMDS 2014

Page 18: Localized methods for diffusions in large graphs

s-t min-cut → PageRank. Proof: square and expand the objective into a Laplacian, then apply the constraints.

David Gleich · Purdue 18

MMDS 2014

The PageRank vector z that solves (\alpha D + L) z = \alpha s with s = d_S / \text{vol}(S) is a renormalized solution of the electrical cut computation:

minimize \|B_S x\|_{C(\alpha),2}
subject to x_s = 1, x_t = 0.

Specifically, if x is the solution, then x = \begin{bmatrix} 1 \\ \text{vol}(S)\, z \\ 0 \end{bmatrix}.
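The theorem can be sanity-checked numerically. A sketch under the assumptions that A is a small symmetric adjacency matrix and S an index set; the interior Laplacian of the localized cut graph is (1 + \alpha)D - A because every vertex gains \alpha \cdot degree from its edge to s or to t.

import numpy as np

def check_pagerank_cut(A, S, alpha):
    n = A.shape[0]
    d = A.sum(axis=0).astype(float)
    in_S = np.zeros(n)
    in_S[S] = 1.0
    vol_S = d @ in_S
    # electrical cut: harmonic solve on the interior with x_s = 1, x_t = 0
    L_interior = np.diag((1 + alpha) * d) - A
    x = np.linalg.solve(L_interior, alpha * d * in_S)
    # PageRank system (alpha D + L) z = alpha s with s = d_S / vol(S)
    L = np.diag(d) - A
    z = np.linalg.solve(alpha * np.diag(d) + L, alpha * d * in_S / vol_S)
    return np.allclose(x, vol_S * z)   # x = vol(S) z, as the theorem states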

Page 19: Localized methods for diffusions in large graphs

PageRank → s-t min-cut. That equivalence works if s is degree-weighted. What if s is the uniform vector?

A(s) = \begin{bmatrix} 0 & \alpha s^T & 0 \\ \alpha s & A & \alpha(d - s) \\ 0 & \alpha(d - s)^T & 0 \end{bmatrix}.

David Gleich · Purdue 19

MMDS 2014

Page 20: Localized methods for diffusions in large graphs

Insight 1: PageRank implicitly approximates the solution of these s-t mincut problems.

David Gleich · Purdue 20

MMDS 2014

Page 21: Localized methods for diffusions in large graphs

The Push Algorithm for PageRank. Proposed (in closest form) in Andersen, Chung & Lang (also by McSherry, Jeh & Widom) for personalized PageRank.

Strongly related to Gauss-Seidel on Ax = b (see my talk at Simons). Derived to show improved runtime for balanced solvers.

The Push Method (parameters \tau, \rho):

1. x^{(1)} = 0,  r^{(1)} = (1 - \beta) e_i,  k = 1
2. While any r_j > \tau d_j  (d_j is the degree of node j):
3.   x^{(k+1)} = x^{(k)} + (r_j - \tau d_j \rho) e_j
4.   r_i^{(k+1)} = \begin{cases} \tau d_j \rho & i = j \\ r_i^{(k)} + \beta (r_j - \tau d_j \rho)/d_j & i \sim j \\ r_i^{(k)} & \text{otherwise} \end{cases}
5.   k \leftarrow k + 1

David Gleich · Purdue 21
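A compact implementation of these steps (my sketch, not the authors' released code) for an undirected graph stored as a dictionary-of-sets, with a work queue standing in for "while any r_j > \tau d_j":

from collections import deque

def push_pagerank(G, i, beta, tau, rho=1.0):
    x, r = {}, {i: 1.0 - beta}          # step 1: x = 0, r = (1 - beta) e_i
    Q = deque([i])
    while Q:                            # step 2: while some r_j > tau d_j
        j = Q.popleft()
        dj = len(G[j])
        if r.get(j, 0.0) <= tau * dj:
            continue                    # this entry fell back below the threshold
        delta = r[j] - tau * dj * rho
        x[j] = x.get(j, 0.0) + delta    # step 3: move mass to the solution
        r[j] = tau * dj * rho           # step 4, the i = j case
        for u in G[j]:                  # step 4, the neighbors i ~ j
            r[u] = r.get(u, 0.0) + beta * delta / dj
            if r[u] > tau * len(G[u]):
                Q.append(u)
    return x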

MMDS 2014

Page 22: Localized methods for diffusions in large graphs

Why do we care about push?
1. Used for empirical studies of "communities"
2. Local Cheeger inequality
3. Used for "fast PageRank approximation"
4. It produces weakly localized approximations to PageRank!

[Figure: Newman's netscience graph, 379 vertices, 1828 non-zeros; s has a single one at the marked node, and the approximation is "zero" on most of the nodes.]

22

\|D^{-1}(x - x^*)\|_\infty \le \varepsilon  with work bounded by  \frac{1}{(1 - \beta)\varepsilon}  edges.

Page 23: Localized methods for diffusions in large graphs

The push method revisited. Let x be the output from the push method with 0 < \beta < 1, v = d_S/\text{vol}(S), \rho = 1, and \tau > 0. Set \alpha = (1 - \beta)/\beta and \kappa = \tau \, \text{vol}(S)/\beta, and let z_G solve

minimize \tfrac{1}{2} \|B_S z\|_{C(\alpha),2}^2 + \kappa \|Dz\|_1
subject to z_s = 1, z_t = 0, z \ge 0,

where z = [1; z_G; 0]. Then x = D z_G / \text{vol}(S).

Proof: write out the KKT conditions and show that the push method solves them. Slackness was "tricky."

The \kappa \|Dz\|_1 term is regularization for sparsity; the division by \text{vol}(S) reflects the need for normalization.

David Gleich · Purdue 23

MMDS 2014

Page 24: Localized methods for diffusions in large graphs

Insight 2: The PageRank push method implicitly solves a 1-norm-regularized, 2-norm cut approximation.

David Gleich · Purdue 24

MMDS 2014

Page 25: Localized methods for diffusions in large graphs

Insight 2’ We get 3-digits of accuracy on P and 16-digits of accuracy on P’.

David Gleich · Purdue 25

MMDS 2014

Page 26: Localized methods for diffusions in large graphs

David Gleich · Purdue 26

Anti-differentiating Approximation Algorithms

16 nonzeros · 15 nonzeros · 284 nonzeros · 24 nonzeros

Figure 2. Examples of the different cut vectors on a portion of the netscience graph. In the left subfigure, we show the set S highlighted with its vertices enlarged. In the other subfigures, we show the solution vectors from the various cut problems (from left to right, Probs. (2), (4), and (6), solved with min-cut, PageRank, and ACL) for this set S. Each vector determines the color and size of a vertex, where high values are large and dark. White vertices with outlines are numerically non-zero (which is why most of the vertices in the fourth figure are outlined, in contrast to the third figure). The true min-cut set is large in all vectors, but the implicitly regularized problem achieves this with many fewer non-zeros than the vanilla PageRank problem.

References

Andersen, Reid and Lang, Kevin. An algorithm for improving graph partitions. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 651–660, 2008.

Andersen, Reid, Chung, Fan, and Lang, Kevin. Local graph partitioning using PageRank vectors. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp. 475–486, 2006.

Arasu, Arvind, Novak, Jasmine, Tomkins, Andrew, and Tomlin, John. PageRank computation and the structure of the web: Experiments and algorithms. In Proceedings of the 11th International Conference on the World Wide Web, 2002.

Chin, Hui Han, Madry, Aleksander, Miller, Gary L., and Peng, Richard. Runtime guarantees for regression problems. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, pp. 269–282, 2013.

Dhillon, Inderjit S., Guan, Yuqiang, and Kulis, Brian. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Trans. Pattern Anal. Mach. Intell., 29(11):1944–1957, 2007.

Goldberg, Andrew V. and Tarjan, Robert E. A new approach to the maximum-flow problem. Journal of the ACM, 35(4):921–940, 1988.

Koutra, Danai, Ke, Tai-You, Kang, U., Chau, Duen Horng, Pao, Hsing-Kuo Kenneth, and Faloutsos, Christos. Unifying guilt-by-association approaches: Theorems and fast algorithms. In Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 245–260, 2011.

Kulis, Brian and Jordan, Michael I. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning, 2012.

Langville, Amy N. and Meyer, Carl D. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, 2006.

Leskovec, Jure, Lang, Kevin J., Dasgupta, Anirban, and Mahoney, Michael W. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29–123, 2009.

Mahoney, Michael W. Approximate computation and implicit regularization for very large-scale data analysis. In Proceedings of the 31st Symposium on Principles of Database Systems, pp. 143–154, 2012.

Mahoney, Michael W., Orecchia, Lorenzo, and Vishnoi, Nisheeth K. A local spectral method for graphs: With applications to improving graph partitions and exploring data graphs locally. Journal of Machine Learning Research, 13:2339–2365, 2012.

Newman, M. E. J. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E, 74:036104, 2006.

Orecchia, Lorenzo and Mahoney, Michael W. Implementing regularization implicitly via approximate eigenvector computation. In Proceedings of the 28th International Conference on Machine Learning, pp. 121–128, 2011.

Orecchia, Lorenzo and Zhu, Zeyuan Allen. Flow-based algorithms for local graph clustering. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms, pp. 1267–1286, 2014.

Page, Lawrence, Brin, Sergey, Motwani, Rajeev, and Winograd, Terry. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford University, November 1999.

Saunders, Michael A. Solution of sparse rectangular systems using LSQR and CRAIG. BIT Numerical Mathematics, 35(4):588–604, 1995.

Tibshirani, Robert. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1994.

Zhou, Dengyong, Bousquet, Olivier, Lal, Thomas Navin, Weston, Jason, and Scholkopf, Bernhard. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16, pp. 321–328, 2004.


Push's sparsity helps it identify the "right" graph feature with fewer non-zeros. (Panels above: the set S, the mincut solution, the push solution, and the PageRank solution.)

MMDS 2014

Page 27: Localized methods for diffusions in large graphs

The push method revisited. Let x be the output from the push method with 0 < \beta < 1, v = d_S/\text{vol}(S), \rho = 1, and \tau > 0. Set \alpha = (1 - \beta)/\beta and \kappa = \tau \, \text{vol}(S)/\beta, and let z_G solve

minimize \tfrac{1}{2} \|B_S z\|_{C(\alpha),2}^2 + \kappa \|Dz\|_1
subject to z_s = 1, z_t = 0, z \ge 0,

where z = [1; z_G; 0]. Then x = D z_G / \text{vol}(S).

Regularization for sparsity in solution and residual

David Gleich · Purdue 27

The push method is scalable because it gives us sparse solutions AND sparse residuals r.

MMDS 2014

Page 28: Localized methods for diffusions in large graphs

This is a case of Algorithmic Anti-differentiation!

28

MMDS 2014 David Gleich · Purdue

Page 29: Localized methods for diffusions in large graphs

The real world: given "find-communities," you hack around until something works, then write a paper presenting "three steps of the power method on P finds communities."

Algorithmic Anti-differentiation: given heuristic H, is there a problem P' such that H is an algorithm for P'?

Understand why H works: derive a characterization of heuristic H, guess and check until you find something H solves, and show that heuristic H solves P'.

MMDS 2014 David Gleich · Purdue 29

e.g., Mahoney & Orecchia; Dhillon et al. (Graclus); Saunders

Page 30: Localized methods for diffusions in large graphs

Without these insights, we’d draw the wrong conclusion.

David Gleich · Purdue 30

Gleich & Mahoney, Submitted

Our s-t mincut framework extends to many diffusions used in semi-supervised learning.

MMDS 2014

Page 31: Localized methods for diffusions in large graphs

Without these insights, we’d draw the wrong conclusion.

David Gleich · Purdue 31

Gleich & Mahoney, Submitted

Our s-t mincut framework extends to many diffusions used in semi-supervised learning.

[Plot: error rate vs. average training samples per class for the diffusions K2, RK2, K3, and RK3 with an off-the-shelf SSL procedure.]

MMDS 2014

Page 32: Localized methods for diffusions in large graphs

Without these insights, we’d draw the wrong conclusion.

David Gleich · Purdue 32

Gleich & Mahoney, Submitted

Our s-t mincut framework extends to many diffusions used in semi-supervised learning.

[Plots: error rate vs. average training samples per class for K2, RK2, K3, and RK3; left, the off-the-shelf SSL procedure; right, rank-rounded SSL.]

MMDS 2014

Page 33: Localized methods for diffusions in large graphs

Recap so far:
1. Used the relationship between PageRank and mincut to get a new understanding of the implicit properties of the push method.
2. Showed that this insight helps improve semi-supervised learning.
(Next) A new algorithm for the heat kernel diffusion in a degree-weighted norm.

David Gleich · Purdue 33

MMDS 2014

Page 34: Localized methods for diffusions in large graphs

Graph diffusions

David Gleich · Purdue 34

Heat kernel:  h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp\{tP\} s

PageRank:  x = (1 - \beta) \sum_{k=0}^{\infty} \beta^k P^k s,  equivalently  (I - \beta P) x = (1 - \beta) s

[Plot: weight vs. path length for the heat kernel (t = 1, 5, 15) and PageRank (α = 0.85, 0.99).]

Many "empirically useful" properties of PageRank also hold for the heat kernel diffusion; e.g., Chung (2007) showed a local Cheeger inequality. But there was no "local" algorithm until a randomized method by Simpson & Chung (2013).

MMDS 2014

Page 35: Localized methods for diffusions in large graphs

We can turn the heat kernel into a linear system by direct expansion.

David Gleich · Purdue 35

x = \exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N

\begin{bmatrix} I & & & & \\ -P/1 & I & & & \\ & -P/2 & \ddots & & \\ & & \ddots & I & \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix}, \qquad x_N = \sum_{i=0}^{N} v_i

(I \otimes I_N - S_N \otimes P)\, v = e_1 \otimes e_c

Lemma: we approximate x_N well if we approximate v well.

Kloster & Gleich, WAW2013
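For small problems the Kronecker form can be built and solved directly; a sketch I'm adding with scipy (the function name and the dense right-hand side are my own choices):

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def taylor_block_solve(P, c, N):
    # build (I - S_N kron P) v = e_1 kron e_c, where S_N has 1/(k+1)
    # on its subdiagonal, then return x_N = sum_k v_k
    n = P.shape[0]
    S = sp.diags([1.0 / (k + 1) for k in range(N)], -1)
    M = sp.eye((N + 1) * n) - sp.kron(S, sp.csr_matrix(P))
    rhs = np.zeros((N + 1) * n)
    rhs[c] = 1.0                         # e_1 (block 0) tensor e_c
    v = spsolve(M.tocsr(), rhs)
    return v.reshape(N + 1, n).sum(axis=0)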

MMDS 2014

Page 36: Localized methods for diffusions in large graphs

There is a fast deterministic adaptation of the push method

David Gleich · Purdue 36

Kloster & Gleich, KDD2014

These polynomials \psi_k(t) are closely related to the \varphi functions central to exponential integrators in ODEs [37]. Note that \psi_0 = T_N.

To guarantee that the total error satisfies the criterion (3), it is enough to show that the error at each Taylor term satisfies an \infty-norm inequality analogous to (3). This is discussed in more detail in Section A.

4.3 Deriving a linear system

To define the basic step of the hk-relax algorithm and to show how the \psi_k influence the total error, we rearrange the Taylor polynomial computation into a linear system.

Denote by v_k the k-th term of the vector sum T_N(tP)s:

T_N(tP)s = s + \frac{t}{1} Ps + \cdots + \frac{t^N}{N!} P^N s   (7)
         = v_0 + v_1 + \cdots + v_N.   (8)

Note that v_{k+1} = \frac{t^{k+1}}{(k+1)!} P^{k+1} s = \frac{t}{k+1} P v_k. This identity implies that the terms v_k exactly satisfy the linear system

\begin{bmatrix} I & & & & \\ -\frac{t}{1}P & I & & & \\ & -\frac{t}{2}P & \ddots & & \\ & & \ddots & I & \\ & & & -\frac{t}{N}P & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} s \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix}.   (9)

Let v = [v_0; v_1; \cdots; v_N]. An approximate solution \hat{v} to (9) would have block components \hat{v}_k such that \sum_{k=0}^{N} \hat{v}_k \approx T_N(tP)s, our desired approximation to e^t h. In practice, we update only a single length-n solution vector, adding all updates to that vector, instead of maintaining the N + 1 different block vectors v_k as part of v; furthermore, the block matrix and right-hand side are never formed explicitly.

With this block system in place, we can describe the algorithm's steps.

4.4 The hk-relax algorithm

Given a random walk transition matrix P, scalar t > 0, and seed vector s as inputs, we solve the linear system from (9) as follows. Denote the initial solution vector by y and the initial nN \times 1 residual by r^{(0)} = e_1 \otimes s. Denote by r(i, j) the entry of r corresponding to node i in residual block j. The idea is to iteratively remove all entries from r that satisfy

r(i, j) \ge \frac{e^t \varepsilon d_i}{2 N \psi_j(t)}.   (10)

To organize this process, we begin by placing the nonzero entries of r^{(0)} in a queue, Q(r), and place updated entries of r into Q(r) only if they satisfy (10).

Then hk-relax proceeds as follows.
1. At each step, pop the top entry of Q(r), call it r(i, j), and subtract that entry in r, making r(i, j) = 0.
2. Add r(i, j) to y_i.
3. Add r(i, j) \frac{t}{j+1} P e_i to residual block r_{j+1}.
4. For each entry of r_{j+1} that was updated, add that entry to the back of Q(r) if it satisfies (10).

Once all entries of r that satisfy (10) have been removed, the resulting solution vector y will satisfy (3), which we prove in Section A, along with a bound on the work required to achieve this. We present working code for our method in Figure 2 that shows how to optimize this computation using sparse data structures. These make it highly efficient in practice.

# G is the graph as a dictionary-of-sets,
# seed is an array of seeds,
# t, eps, N, psis are precomputed
import collections
import math

x = {}  # store x and r as dictionaries
r = {}  # initialize residual
Q = collections.deque()  # initialize queue
for s in seed:
    r[(s, 0)] = 1. / len(seed)
    Q.append((s, 0))
while len(Q) > 0:
    (v, j) = Q.popleft()  # v has r[(v,j)] above the threshold
    rvj = r[(v, j)]
    # perform the hk-relax step
    if v not in x: x[v] = 0.
    x[v] += rvj
    r[(v, j)] = 0.
    mass = (t * rvj / (float(j) + 1.)) / len(G[v])
    for u in G[v]:  # for neighbors of v
        nxt = (u, j + 1)  # in the next block
        if j + 1 == N:  # last step, add the mass directly to the solution
            if u not in x: x[u] = 0.
            x[u] += mass
            continue
        if nxt not in r: r[nxt] = 0.
        thresh = math.exp(t) * eps * len(G[u])
        thresh = thresh / (N * psis[j + 1]) / 2.
        if r[nxt] < thresh and r[nxt] + mass >= thresh:
            Q.append(nxt)  # add u to the queue when it crosses the threshold
        r[nxt] = r[nxt] + mass

Figure 2: Pseudo-code for our algorithm as working Python code. The graph is stored as a dictionary of sets so that the G[v] statement returns the set of neighbors associated with vertex v. The solution is the vector x indexed by vertices, and the residual vector is indexed by tuples (v, j) that are pairs of vertices and steps j. A fully working demo may be downloaded from github: https://gist.github.com/dgleich/7d904a10dfd9ddfaf49a.
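Running the figure's code requires the precomputed inputs. One hypothetical setup (my sketch; the recurrence psi_N = 1, psi_k = 1 + t psi_{k+1}/(k+1) is consistent with psi_0 = T_N(t), but treat it as an assumption rather than the paper's exact code):

import math

# toy graph as a dictionary-of-sets, plus the precomputed inputs
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
seed = [0]
t, eps = 5.0, 1e-4
N = 15                                  # e.g., from the Taylor bound in Sec. 4.5
psis = [0.0] * (N + 1)
psis[N] = 1.0
for k in range(N - 1, -1, -1):          # psi_k = 1 + t/(k+1) * psi_{k+1}
    psis[k] = 1.0 + t * psis[k + 1] / (k + 1)
# sanity check: psi_0 equals the degree-N Taylor polynomial of e^t
assert abs(psis[0] - sum(t**m / math.factorial(m) for m in range(N + 1))) < 1e-8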

4.5 Choosing N

The last detail of the algorithm we need to discuss is how to pick N. In (4) we want to guarantee the accuracy of D^{-1}\exp\{tP\}s - D^{-1}T_N(tP)s. By using D^{-1}\exp\{tP\} = \exp\{tP^T\}D^{-1} and D^{-1}T_N(tP) = T_N(tP^T)D^{-1}, we can get a new upper bound on \|D^{-1}\exp\{tP\}s - D^{-1}T_N(tP)s\|_\infty by noting

\|\exp\{tP^T\}D^{-1}s - T_N(tP^T)D^{-1}s\|_\infty \le \|\exp\{tP^T\} - T_N(tP^T)\|_\infty \, \|D^{-1}s\|_\infty.

Since s is stochastic, we have \|D^{-1}s\|_\infty \le \|D^{-1}s\|_1 \le 1. From [34] we know that the norm \|\exp\{tP^T\} - T_N(tP^T)\|_\infty is bounded by

\frac{\|tP^T\|_\infty^{N+1}}{(N+1)!} \, \frac{N+2}{N+2-t} \le \frac{t^{N+1}}{(N+1)!} \, \frac{N+2}{N+2-t}.   (11)

So to guarantee (4), it is enough to choose N such that \frac{t^{N+1}}{(N+1)!} \frac{N+2}{N+2-t} < \varepsilon/2. Such an N can be determined efficiently simply by iteratively computing terms of the Taylor polynomial for e^t until the error is less than the desired error for hk-relax. In practice, this required a choice of N no greater than 2t \log(\frac{1}{\varepsilon}), which we think can be made rigorous.
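That choice is cheap to compute; a minimal sketch (my addition) that finds the smallest N satisfying the remainder bound (11):

import math

def choose_N(t, eps):
    # smallest N with t^(N+1)/(N+1)! * (N+2)/(N+2-t) < eps/2
    N = 1
    while True:
        if N + 2 > t:
            bound = t**(N + 1) / math.factorial(N + 1) * (N + 2) / (N + 2 - t)
            if bound < eps / 2:
                return N
        N += 1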

Let h = e^{-t} \exp\{tP\} s and let x be the output of hk-push(\varepsilon). Then \|D^{-1}(x - h)\|_\infty \le \varepsilon after looking at at most \frac{2 N e^t}{\varepsilon} edges. We believe that the bound N \le 2t \log(1/\varepsilon) suffices.

MMDS 2014

Page 37: Localized methods for diffusions in large graphs

PageRank vs. Heat Kernel

David Gleich · Purdue 37

Table 1: Datasets

Graph                  |V|          |E|
pgp-cc                 10,680       24,316
ca-AstroPh-cc          17,903       196,972
marvel-comics-cc       19,365       96,616
as-22july06            22,963       48,436
rand-ff-25000-0.4      25,000       56,071
cond-mat-2003-cc       27,519       116,181
email-Enron-cc         33,696       180,811
cond-mat-2005-fix-cc   36,458       171,735
soc-sign-epinions-cc   119,130      704,267
itdk0304-cc            190,914      607,610
dblp-cc                226,413      716,460
flickr-bidir-cc        513,969      3,190,452
ljournal-2008          5,363,260    50,030,085
twitter-2010           41,652,230   1,202,513,046
friendster             65,608,366   1,806,067,135
com-amazon             334,863      925,872
com-dblp               317,080      1,049,866
com-youtube            1,134,890    2,987,624
com-lj                 3,997,962    34,681,189
com-orkut              3,072,441    117,185,083

Table 2: The result of evaluating the heat kernel (hk) vs. PageRank (pr) on finding real-world communities. The heat kernel finds smaller, more accurate sets with slightly worse conductance.

data         F1-measure      conductance     set size
             hk      pr      hk      pr      hk      pr
amazon       0.325   0.140   0.141   0.048   193     15293
dblp         0.257   0.115   0.267   0.173   44      16026
youtube      0.177   0.136   0.337   0.321   1010    6079
lj           0.131   0.107   0.474   0.459   283     738
orkut        0.055   0.044   0.714   0.687   537     1989
friendster   0.078   0.090   0.785   0.802   229     333

…scores, but using much smaller sets with substantially better F1 measures. This suggests that hk-relax better captures the properties of real-world communities than the PageRank diffusion, in the sense that the tighter sets produced by the heat kernel are better focused around real-world communities than are the larger sets produced by the PageRank diffusion.

7. CONCLUSIONS

These results suggest that the hk-relax algorithm is a viable companion to the celebrated PageRank push algorithm and may even be a worthy competitor for tasks that require accurate communities of large graphs. Furthermore, we suspect that the hk-relax method will be useful for the myriad other uses of PageRank-style diffusions such as link prediction [29] or even logic programming [51].

In the future, we plan to explore this method on directed networks as well as better methods for selecting the parameters of the diffusion. It is also possible that our new ideas may translate to faster methods for non-conservative diffusions such as the Katz [25] and modularity methods [43], and we plan to explore these diffusions as well.

Figure 5: (Top) Runtimes of hk-relax vs. ppr-push against log10(|V| + |E|), shown with 25%, 50%, and 75% percentile trendlines from a select set of experiments. (Bottom) Conductances of hk-relax vs. ppr-push, shown in the same way.

Acknowledgements: This work was supported by NSF CAREER Award CCF-1149756.



On large graphs, our heat kernel takes slightly longer than a localized PageRank, but it produces sets with smaller (better) conductance scores. Our Python code on clueweb12 (72B edges) via libbvg:
•  99 seconds to load
•  1 second to compute

MMDS 2014

Page 38: Localized methods for diffusions in large graphs

References and ongoing work

Gleich and Kloster – Relaxation methods for the matrix exponential, Submitted
Kloster and Gleich – Heat kernel based community detection, KDD 2014
Gleich and Mahoney – Algorithmic Anti-differentiation, ICML 2014
Gleich and Mahoney – Regularized diffusions, Submitted

www.cs.purdue.edu/homes/dgleich/codes/nexpokit
www.cs.purdue.edu/homes/dgleich/codes/l1pagerank

•  Improved localization bounds for functions of matrices
•  Asynchronous and parallel "push"-style methods

David Gleich · Purdue 38

Supported by NSF CAREER CCF-1149756 www.cs.purdue.edu/homes/dgleich