Fast pair-wise and node-wise algorithms for commute times and Katz scores
-
Upload
david-gleich -
Category
Education
-
view
1.426 -
download
0
description
Transcript of Fast pair-wise and node-wise algorithms for commute times and Katz scores
![Page 1: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/1.jpg)
FAST MATRIX COMPUTATIONS FOR
PAIRWISE AND COLUMN-WISE KATZ
SCORES AND COMMUTE TIMES
David F. Gleich
Purdue University
Emory University
Math/CS Seminar
January 20, 2012
With Pooya Esfandiar, Francesco Bonchi, Chen Greif,
Laks V. S. Lakshmanan, and Byung-Won On
David F. Gleich (Purdue) Emory Math/CS Seminar 1 / 47
![Page 2: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/2.jpg)
MAIN RESULTS
A – adjacency matrix
L – Laplacian matrix
Katz score :
Commute time:
For Commute
Compute one fast
Estimate
David F. Gleich (Purdue) Emory Math/CS Seminar
For Katz Compute one fast Compute top fast
2 of 47
![Page 3: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/3.jpg)
OUTLINE
Why study these measures?
Katz Rank and Commute Time
How else do people compute them?
Quadrature rules for pairwise scores
Sparse linear systems solves for top-k
As many results as we have time for…
David F. Gleich (Purdue) Emory Math/CS Seminar 3 of 47
![Page 4: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/4.jpg)
WHY? LINK PREDICTION
David F. Gleich (Purdue) Emory Math/CS Seminar
Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficient
Neighborhood based
Path based
4 of 47
![Page 5: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/5.jpg)
NOTE
All graphs are undirected
All graphs are connected
David F. Gleich (Purdue) Emory Math/CS Seminar 5 of 47
![Page 6: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/6.jpg)
LEO KATZ
David F. Gleich (Purdue) Emory Math/CS Seminar 6 of 47
![Page 7: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/7.jpg)
NOT QUITE, WIKIPEDIA
: adjacency, : random walk
PageRank
Katz
These are equivalent if has constant degree
David F. Gleich (Purdue) Emory Math/CS Seminar 7 of 47
![Page 8: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/8.jpg)
WHAT KATZ ACTUALLY SAID
Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43
“we assume that each link independently has the
same probability of being effective” …
“we conceive a constant , depending
on the group and the context of the particular
investigation, which has the force of a probability
of effectiveness of a single link. A k-step chain
then, has probability of being effective.”
“We wish to find the column sums of the matrix”
David F. Gleich (Purdue) Emory Math/CS Seminar 8 of 47
![Page 9: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/9.jpg)
A MODERN TAKE
The Katz score (node-based) is
The Katz score (edge-based) is
David F. Gleich (Purdue) Emory Math/CS Seminar 9 of 47
![Page 10: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/10.jpg)
RETURNING TO THE MATRIX
Carl Neumann
David F. Gleich (Purdue) Emory Math/CS Seminar 10 of 47
![Page 11: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/11.jpg)
Carl Neumann
I’ve heard the Neumann series called the “von Neumann”
series more than I’d like! In fact, the von Neumann kernel
of a graph should be named the “Neumann” kernel!
David F. Gleich (Purdue) Emory Math/CS Seminar
Wikipedia page
11 / 47
![Page 12: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/12.jpg)
PROPERTIES OF KATZ’S MATRIX
is symmetric
exists when
is spd when
Note that 1/max-degree suffices
David F. Gleich (Purdue) Emory Math/CS Seminar 12 of 47
![Page 13: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/13.jpg)
COMMUTE TIME
David F. Gleich (Purdue) Emory Math/CS Seminar
Picture taken from Google images, seems to be
Bay Bridge Traffic by Jim M. Goldstein.
13 / 47
![Page 14: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/14.jpg)
COMMUTE TIME
Consider a uniform random walk on a graph
David F. Gleich (Purdue) Emory Math/CS Seminar
Also called the hitting
time from node i to j, or
the first transition time
14 of 47
![Page 15: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/15.jpg)
SKIPPING DETAILS
: graph Laplacian
is the only null-vector
David F. Gleich (Purdue) Emory Math/CS Seminar 15 of 47
![Page 16: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/16.jpg)
WHAT DO OTHER PEOPLE DO?
1) Just work with the linear algebra formulations
2) For Katz, Truncate the Neumann series as a few (3-5) terms, or MMQ (Benzi & Boito)
3) Use low-rank approximations from EVD(A) or EVD(L)
4) For commute, use Johnson-Lindenstrauss inspired random sampling
5) Approximately decompose into smaller problems
David F. Gleich (Purdue) Emory Math/CS Seminar
Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009,
Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007, Wang et al. ICDM2007 16 of 47
![Page 17: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/17.jpg)
THE PROBLEM
All of these techniques are
preprocessing based because
most people’s goal is to compute
all the scores.
We want to avoid
preprocessing the graph.
David F. Gleich (Purdue) Emory Math/CS Seminar
There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverse
17 of 47
![Page 18: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/18.jpg)
WHY NO PREPROCESSING?
The graph is constantly changing
as I rate new movies.
David F. Gleich (Purdue) Emory Math/CS Seminar 18 of 47
![Page 19: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/19.jpg)
WHY NO PREPROCESSING?
David F. Gleich (Purdue) Emory Math/CS Seminar
Top-k predicted “links”
are movies to watch!
Pairwise scores give
user similarity
19 of 47
![Page 20: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/20.jpg)
PAIR-WISE ALGORITHMS
David F. Gleich (Purdue) Emory Math/CS Seminar 20 / 47
![Page 21: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/21.jpg)
PAIRWISE ALGORITHMS
Katz
Commute
David F. Gleich (Purdue) Emory Math/CS Seminar
Golub and Meurant
to the rescue!
21 of 47
![Page 22: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/22.jpg)
MMQ - THE BIG IDEA
Quadratic form
Weighted sum
Stieltjes integral
Quadrature approximation
Matrix equation David F. Gleich (Purdue) Emory Math/CS Seminar
Think
A is s.p.d. use EVD
“A tautology”
Lanczos
22 of 47
![Page 23: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/23.jpg)
LANCZOS
, k-steps of the Lanczos method produce
and
David F. Gleich (Purdue) Emory Math/CS Seminar
=
23 of 47
![Page 24: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/24.jpg)
PRACTICAL LANCZOS
Only need to store the last 2 vectors in
Updating requires O(matvec) work
is not orthogonal
David F. Gleich (Purdue) Emory Math/CS Seminar 24 of 47
![Page 25: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/25.jpg)
MMQ PROCEDURE
Goal
Given
1. Run k-steps of Lanczos on starting with
2. Compute , with an additional eigenvalue at ,
set 3. Compute , with an additional eigenvalue at , set
4. Output as lower and upper bounds on
David F. Gleich (Purdue) Emory Math/CS Seminar
Correspond to a Gauss-Radau rule, with
u as a prescribed node
Correspond to a Gauss-Radau rule, with
l as a prescribed node
25 of 47
![Page 26: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/26.jpg)
PRACTICAL MMQ
Increase k to become more accurate
Bad eigenvalue bounds yield worse results
and are easy to compute
not required, we can iteratively
update it’s LU factorization
David F. Gleich (Purdue) Emory Math/CS Seminar 26 of 47
![Page 27: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/27.jpg)
PRACTICAL MMQ
David F. Gleich (Purdue) Emory Math/CS Seminar 27 of 47
![Page 28: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/28.jpg)
ONE LAST STEP FOR KATZ
Katz
David F. Gleich (Purdue) Emory Math/CS Seminar 28 of 47
![Page 29: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/29.jpg)
COLUMN-WISE
ALGORITHMS
David F. Gleich (Purdue) Emory Math/CS Seminar 29 / 47
![Page 30: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/30.jpg)
COLUMN-WISE COMMUTE-TIME
requires entire diagonal of
David F. Gleich (Purdue) Emory Math/CS Seminar
Paige and Saunders, 1975
Each vector is computed by the a Lanczos based CG algorithm.
30 of 47
=
![Page 31: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/31.jpg)
COLUMN-WISE COMMUTE TIME
David F. Gleich (Purdue) Emory Math/CS Seminar
following Hein et al. 2010 is a MUCH better and faster approximation.
31
![Page 32: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/32.jpg)
KATZ SCORES ARE LOCALIZED
David F. Gleich (Purdue) Emory Math/CS Seminar
Up to 50 neighbors is 99.65% of the total mass
32 of 47
![Page 33: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/33.jpg)
PARTICIPATION RATIOS
David F. Gleich (Purdue) Emory Math/CS Seminar
Participation
Ratios
“effective non-zeros” in a vector
33 of 47
![Page 34: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/34.jpg)
TOP-K ALGORITHM FOR KATZ
Approximate
where is sparse
Keep sparse too
Ideally, don’t “touch” all of
David F. Gleich (Purdue) Emory Math/CS Seminar 34 of 47
![Page 35: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/35.jpg)
INSPIRATION - PAGERANK
Approximate
where is sparse
Keep sparse too? YES!
Ideally, don’t “touch” all of ? YES!
David F. Gleich (Purdue) Emory Math/CS Seminar
McSherry WWW2005, Berkhin 2007, Anderson et al. FOCS2008 – Thanks to Reid Anderson for telling me McSherry did this too.
35 of 47
![Page 36: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/36.jpg)
THE ALGORITHM - MCSHERRY
For
Start with the Richardson iteration
Rewrite
Richardson converges if
David F. Gleich (Purdue) Emory Math/CS Seminar 36 of 47
![Page 37: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/37.jpg)
THE ALGORITHM
Note is sparse.
If , then is sparse.
Idea
only add one component of to
David F. Gleich (Purdue) Emory Math/CS Seminar 37 of 47
![Page 38: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/38.jpg)
THE ALGORITHM
For
Init:
How to pick ?
David F. Gleich (Purdue) Emory Math/CS Seminar 38 of 47
![Page 39: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/39.jpg)
THE ALGORITHM FOR KATZ
For
Init:
Pick as max David F. Gleich (Purdue) Emory Math/CS Seminar
Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for more
39 of 47
![Page 40: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/40.jpg)
CONVERGENCE?
If 1/max-degree then is sub-stochastic and the PageRank based proof applies because the matrix is diagonally dominant
For , then for symmetric , this algorithm is the Gauss-Southwell procedure and it still converges.
David F. Gleich (Purdue) Emory Math/CS Seminar
40 of 47
![Page 41: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/41.jpg)
NONSYMMETRIC ALGORITHM?
Was this algorithm known in the non-symmetric case?
YES! It’s just Gauss-Seidel/Gauss-Southwell, but with a column vs. row orientation.
Called dynamic residual to AMGers
David F. Gleich (Purdue) Emory Math/CS Seminar 41 of 47
![Page 42: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/42.jpg)
RESULTS – DATA, PARAMETERS
All unweighted, connected graphs
Easy
Hard
David F. Gleich (Purdue) Emory Math/CS Seminar 42 of 47
![Page 43: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/43.jpg)
KATZ BOUND CONVERGENCE
David F. Gleich (Purdue) Emory Math/CS Seminar 43 of 47
![Page 44: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/44.jpg)
COMMUTE BOUND CONVERG.
David F. Gleich (Purdue) Emory Math/CS Seminar 44 of 47
![Page 45: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/45.jpg)
KATZ SET CONVERGENCE
David F. Gleich (Purdue) Emory Math/CS Seminar
For arXiv graph.
45 of 47
![Page 46: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/46.jpg)
TIMING
David F. Gleich (Purdue) Emory Math/CS Seminar 46 of 47
![Page 47: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/47.jpg)
CONCLUSIONS
These algorithms are faster than many alternatives.
For pairwise commute, stopping criteria are simpler with bounds
For top-k problems, we often need less than 1 matvec for good enough results
David F. Gleich (Purdue) Emory Math/CS Seminar 47 of 47
![Page 48: Fast pair-wise and node-wise algorithms for commute times and Katz scores](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b7218b4a7959177f8b47a3/html5/thumbnails/48.jpg)
By AngryDogDesign on DeviantArt
Paper at WAW2010 and J. Internet Mathematics
Slides online
Code is online already
www.cs.purdue.edu/
homes/dgleich/codes/fast-katz-2011
David F. Gleich (Purdue) Emory Math/CS Seminar 48