Edge-Weighted Personalized PageRank: Breaking a Decade...
Transcript of Edge-Weighted Personalized PageRank: Breaking a Decade...
Edge-Weighted Personalized PageRank:Breaking a Decade-Old Performance Barrier
W. Xie D. Bindel A. Demers J. Gehrke
12 Aug 2015
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 1 / 1
PageRank Model
Unweighted Node weighted Edge weighted
Random surfer model: x (t+1) = ↵Px (t) + (1� ↵)v where P = AD�1
Stationary distribution: Mx = b where M = (I � ↵P), b = (1� ↵)v
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 2 / 1
Edge Weight vs Node Weight Personalization
vi = vi (w)
w 2 Rd
�ij = �ij(w)
Introduce personalization parameters w 2 Rd in two ways:Node weights: M x(w) = b(w)Edge weights: M(w) x(w) = b
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 3 / 1
Edge Weight vs Node Weight Personalization
Node weight personalization is well-studied
Topic-sensitive PageRank: fast methods based on linearity
Localized PageRank: fast methods based on sparsity
Some work on edge weight personalization
ObjectRank/ScaleRank: personalize weights for di↵erent edge types
But lots of work incorporates edge weights without personalization
Our goal: General, fast methods for edge weight personalization
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 4 / 1
Model Reduction
=
Expensive full model(Mx = b)
⇡ U
Reduced basis =
Reduced model(M̃y = b̃)
Approximation ansatz
Model reduction procedure from physical simulation world:
O✏ine: Construct reduced basis U 2 Rn⇥k
O✏ine: Choose � k equations to pick approximation x̂ = Uy
Online: Solve for y(w) given w and reconstruct x̂
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 5 / 1
Reduced Basis Construction: SVD (aka POD/PCA/KL)
⇡ U
⌃ V T
Snapshot matrix
x1 x2 . . . xr
w2wr
w1 Sample points
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 6 / 1
Approximation Ansatz
Want r = MUy � b ⇡ 0. Consider two approximation conditions:
Method Ansatz Properties
Bubnov-Galerkin UT r = 0 Good accuracy empiricallyFast for P(w) linear
DEIM min krIk Fast even for nonlinear P(w)Complex cost/accuracy tradeo↵
Similar error analysis framework for both (see paper):
Consistency + Stability = Accuracy
Consistency: Does the subspace contain good approximants?
Stability: Is the approximation subproblem far from singular?
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 7 / 1
Bubnov-Galerkin Method
UT
M U
y
� b = 0.
Linear case: wi = probability of transition with edge type i
M(w) = I � ↵
X
i
wiP(i)
!, M̃(w) = I � ↵
X
i
wi P̃(i)
!
where we can precompute P̃(i) = UTP(i)U
Nonlinear: Cost to form M̃(w) comparable to cost of PageRank!
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 8 / 1
Discrete Empirical Interpolation Method (DEIM)
M U
y
� b
I
= 0.Equations in I
Ansatz: Minimize krIk for chosen indices I
Only need a few rows of M (and associated rows of U)
Di↵erence from physics applications: high-degree nodes!
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 9 / 1
Interpolation Costs
Consider subgraph relevant to one interpolation equation:
i 2 I
Incoming neighbors of i
. . .1/3 1/50
Really care about weights of edges incident on INeed more edges to normalize (unless A(w) is linear)
High in/out degree are expensive but informative
Key question: how to choose I to balance cost vs accuracy?
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 10 / 1
Interpolation Accuracy
Key: keep MI,: far from singular.
If |I| = k , this is a subset selection over rows of MU.
Have standard techniques (e.g. pivoted QR)
Want to pick I once, so look at rows of
Z =⇥M(w1)U M(w2)U . . .
⇤
for sample parameters w (i).
Helps to explicitly enforceP
i x̂i = 1
Several heuristics for cost/accuracy tradeo↵ (see paper)
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 11 / 1
Online Costs
If ` = # PR components needed, online costs are:
Form M̃ O(dk2) for B-GMore complex for DEIM
Factor M̃ O(k3)Solve for y O(k2)Form Uy O(k`)
Online costs do not depend on graph size!(unless you want the whole PR vector)
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 12 / 1
Example Networks
DBLP (citation network)3.5M nodes / 18.5M edgesSeven edge types =)seven parametersP(w) linearCompetition: ScaleRank
Weibo (micro-blogging)1.9M nodes / 50.7M edgesWeight edges by topicalsimilarity of postsNumber of parameters =number of topics (5, 10, 20)
(Studied global and local PageRank – see paper for latter.)
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 13 / 1
Singular Value Decay
10-1
100
101
102
103
104
105
106
0 50 100 150 200
Val
ue
ith
Largest Singular Value
DBLP-LWeibo-S5
Weibo-S10Weibo-S20
r = 1000 samples, k = 100
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 14 / 1
DBLP Accuracy
10-5
10-4
10-3
10-2
10-1
100
Galerkin
DEIM
-100
DEIM
-120
DEIM
-200
ScaleR
ank
Kendall@100
Normalized L1
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 15 / 1
DBLP Running Times (All Nodes)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Galerkin
DEIM
-100
DEIM
-120
DEIM
-200
ScaleRank
Runnin
g t
ime
(s)
CoefficientsConstruction
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 16 / 1
Weibo Accuracy
10-4
10-3
10-2
10-1
5 10 20
# Parameters
Kendall@100
Normalized L1
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 17 / 1
Weibo Running Times (All Nodes)
0
0.1
0.2
0.3
0.4
0.5
5 10 20
Ru
nn
ing
tim
e (s
)
# Parameters
CoefficientsConstruction
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 18 / 1
Application: Learning to Rank
Goal: Given T = {(iq, jq)}|T |q=1, find w that mostly ranks iq over j1.
(c.f. Backstrom and Leskovec, WSDM 2011)
Standard: Gradient descent on full problem
One PR computation for objective
One PR computation for each gradient component
Costs d + 1 PR computations per step
With model reduction
Rephrase objective in reduced coordinate space
Use factorization to solve PR for objective
Re-use same factorization for gradient
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 19 / 1
DBLP Learning Task
100
150
200
250
300
350
400
0 2 4 6 8 10 12 14 16 18 20
Obje
ctiv
e F
unct
ion V
alue
Iteration
StandardGalerkin
DEIM-200
(8 papers for training + 7 params)
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 20 / 1
The Punchline
Test case: DBLP, 3.5M nodes, 18.5M edges, 7 params
Cost per Iteration:
Method Standard Bubnov-Galerkin DEIM-200Time(sec) 159.3 0.002 0.033
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 21 / 1
Roads Not Taken
In the paper (but not the talk)
Selecting interpolation equations for DEIM
Localized PageRank experiments (Weibo and DBLP)
Comparison to BCA for localized PageRank
Quasi-optimality framework for error analysis
Room for future work! Analysis, applications, systems, ...
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 22 / 1
Questions?
Edge-Weighted Personalized PageRank:Breaking a Decade-Old Performance Barrier
Wenlei Xie, David Bindel, Johannes Gehrke, and Al Demers
KDD 2015, paper 117
Sponsors:
NSF (IIS-0911036 and IIS-1012593)
iAd Project from the National Research Council of Norway
W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 23 / 1