Combinatorial Hodge Theory and a Geometric Approach to Ranking
Yuan Yao
2008 SIAM Annual Meeting
San Diego, July 7, 2008
with Lek-Heng Lim et al.
Outline
1 Ranking on networks (graphs)
• Netflix example
• Skew-symmetric matrices and network flows
2 Combinatorial Hodge Theory
• Discrete Differential Geometry
• Hodge Decomposition Theorem
• Rank aggregation as a linear projection
3 Why pairwise works for Netflix: a spectral embedding view
4 Conclusions
Ranking on Networks (Graphs)
“Multi-criteria” rank/decision systems
• Amazon or Netflix’s recommendation system (user-product)
• Interest ranking in social networks (person-interest)
• Financial analyst recommendation system (analyst-stock)
• Voting (voter-candidate)
“Peer-review” systems
• Publication citation systems (paper-paper)
• Google’s webpage ranking (web-web)
• eBay’s reputation system (customer-customer)
continued...
Ranking data are
incomplete: partial lists or pairwise comparisons (e.g. ∼ 1% of entries observed in Netflix)
unbalanced: votes are unevenly distributed across alternatives (e.g. power law)
cardinal: scores or stochastic choices
Implicitly or explicitly, ranking data may be viewed as living on a simple graph G = (V, E)
V : set of alternatives (products, interests,...) to be ranked
E : pairs of alternatives compared
Netflix example
Example: Netflix
Example (Netflix Customer-Product Rating)
480189-by-17770 customer-product 5-star rating matrix X
X is incomplete, with 98.82% of its entries missing
However,
the pairwise comparison graph G = (V, E) is very dense!
only 0.22% of the edges are missing, so G is almost complete
rank aggregation is possible without estimating missing values!
unbalanced: the number of raters on each edge e ∈ E varies.
Caveat: we are not trying to solve the Netflix prize problem
Netflix example continued
The first-order statistic, the mean score for each product, is often inadequate because:
most customers rate only a very small portion of the products
different products might have different raters, whence mean scores involve noise due to arbitrary individual rating scales
it cannot characterize the inconsistency (ubiquitous in rank aggregation: Arrow’s impossibility theorem)
How about higher-order statistics?
Skew-symmetric matrices and network flows
From 1st Order to 2nd Order: Pairwise Rankings
Linear Model: the average score difference between products i and j over all customers who have rated both of them,
\[ w_{ij} = \frac{\sum_k (X_{kj} - X_{ki})}{\#\{k : X_{ki}, X_{kj} \text{ exist}\}}, \]
where the sum runs over the customers k who rated both i and j, is translation invariant.
Log-linear Model: when all the scores are positive, the logarithmic average score ratio,
\[ w_{ij} = \frac{\sum_k (\log X_{kj} - \log X_{ki})}{\#\{k : X_{ki}, X_{kj} \text{ exist}\}}, \]
is invariant up to a multiplicative constant.
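As a hedged illustration of the linear model above, the following NumPy sketch computes the matrix of mean score differences from a small ratings array. The function name `pairwise_linear`, the use of `np.nan` to mark missing ratings, and the toy data are assumptions made for this example; they are not taken from the talk.

```python
import numpy as np

def pairwise_linear(X):
    """Mean score differences w[i, j] over customers who rated both i and j.

    X: (customers x products) float array, np.nan marks a missing rating
    (an assumed encoding). Returns (w, D), where D[i, j] counts the common
    raters and w[i, j] is NaN when there are none.
    """
    rated = ~np.isnan(X)
    R = rated.astype(float)
    X0 = np.where(rated, X, 0.0)      # zero-fill so missing scores drop out of sums
    D = R.T @ R                       # D[i, j] = #{k : X_ki, X_kj exist}
    S = R.T @ X0                      # S[i, j] = sum of X_kj over raters of both i and j
    with np.errstate(divide="ignore", invalid="ignore"):
        w = np.where(D > 0, (S - S.T) / D, np.nan)
    return w, D

# Toy data: 3 customers, 3 products (hypothetical values).
X = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 1.0, 2.0]])
w, D = pairwise_linear(X)

# Shifting each customer's scale by a constant leaves w unchanged.
shift = np.array([[1.0], [-2.0], [3.0]])
w_shifted, _ = pairwise_linear(X + shift)
```

Under the same assumptions, the log-linear model would be obtained by calling `pairwise_linear(np.log(X))` when all observed scores are positive. The matrix `D` returned here is the count of common raters per pair.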
More Invariants
Linear Probability Model: the probability that product j is preferred to i in excess of a purely random choice,
\[ w_{ij} = \Pr\{k : X_{kj} > X_{ki}\} - \frac{1}{2}. \]
This is invariant up to a monotone transformation.
Bradley-Terry Model: logarithmic odds ratio (logit),
\[ w_{ij} = \log \frac{\Pr\{k : X_{kj} > X_{ki}\}}{\Pr\{k : X_{kj} < X_{ki}\}}. \]
This is invariant up to a monotone transformation.
Skew-Symmetric Matrices of Pairwise Rankings
All such models induce (sparse) skew-symmetric |V|-by-|V| matrices (or 2-alternating tensors),
\[ w_{ij} = \begin{cases} -w_{ji}, & \{i, j\} \in E \\ \text{NA}, & \text{otherwise} \end{cases} \]
where G = (V, E) is a pairwise comparison graph.
Note: Such a skew-symmetric matrix induces a pairwise ranking network flow on graph G.
Pairwise Ranking Flow for IMDB Top 20 Movies
Figure: Pairwise ranking flow on a complete graph
Rank Aggregation Problem
Hardness:
Arrow-Sen’s impossibility theorems in social choice theory
Kemeny-Snell optimal ordering is NP-hard to compute
Spectral analysis on Sn is impractical for large n since |Sn| = n!
Our approach:
Problem (Rank Aggregation)
Does there exist a global ranking function, v : V → R, such that
wij = vj − vi =: δ0(v)(i , j)?
Equivalently, does there exist a scalar field v : V → R whose gradient field gives the flow w?
Answer: Not Always!
From multivariate calculus, there are non-integrable vector fields (cf. the movie A Beautiful Mind). A combinatorial version:
[Figure panels: (a) a triangle on nodes A, B, C with unit edge weights; (b) a graph on nodes A, B, C, D, E, F with edge weights 1 and 2]
Figure: No global ranking v gives wij = vj − vi: (a) cyclic ranking, note wAB + wBC + wCA ≠ 0; (b) contains a 4-node cyclic flow A → C → D → E → A, note that on the 3-clique {A, B, C} (also {A, E, F}), wAB + wBC + wCA = 0
Triangular Transitivity: null triangular-trace
Fact
For a skew-symmetric matrix W = (wij) associated with graph G,
∃v : wij = vj − vi ⇒ wij + wjk + wki = 0, ∀ 3-clique {i, j, k}
Note:
Transitivity subspace: null triangular-trace or curl-free
{W : wij + wjk + wki = 0, ∀ 3-clique {i, j, k}}
In the examples on the previous slide, (a) lies outside this subspace; (b) lies inside it but is not a gradient flow.
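The triangular-trace condition is easy to check numerically. The sketch below is only an illustration: the helper name `triangle_curl`, the scores `v`, and the unit cyclic flow (chosen in the spirit of example (a), with an assumed orientation) are hypothetical.

```python
import numpy as np

def triangle_curl(W, i, j, k):
    """Triangular trace w_ij + w_jk + w_ki of a skew-symmetric matrix W."""
    return W[i, j] + W[j, k] + W[k, i]

# A gradient flow: w_ij = v_j - v_i for hypothetical scores v.
v = np.array([0.0, 1.0, 3.0])
W_grad = v[None, :] - v[:, None]

# A unit cyclic flow 0 -> 1 -> 2 -> 0.
W_cyc = np.array([[0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, -1.0, 0.0]])

curl_grad = triangle_curl(W_grad, 0, 1, 2)   # vanishes for any gradient flow
curl_cyc = triangle_curl(W_cyc, 0, 1, 2)     # 1 + 1 + 1, so no global ranking exists
```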
Hodge decomposition for skew-symmetric matrices
A skew-symmetric matrix W associated with G has a unique orthogonal decomposition
W = W1 + W2 + W3
where
W1 satisfies (’integrable’): W1(i, j) = vj − vi for some v : V → R;
W2 satisfies
• (’curl-free’) W2(i, j) + W2(j, k) + W2(k, i) = 0 for every 3-clique {i, j, k}
• (’divergence-free’) $\sum_{j : \{i, j\} \in E} W_2(i, j) = 0$ for every vertex i ∈ V;
W3 ⊥ W1 and W3 ⊥ W2.
Hodge decomposition for network flows
A network flow (e.g. pairwise rankings) on graph G has an orthogonal decomposition into
gradient flow + locally acyclic (harmonic) + locally cyclic
where the first two components lie in the transitivity subspace and
the gradient flow is integrable, giving a global ranking
example (b) is locally acyclic, but cyclic on a large scale (harmonic)
example (a) is locally cyclic
Combinatorial Hodge Theory
Discrete Differential Geometry
Clique Complex and Discrete Differential Forms
We extend graph G to a simplicial complex $\chi_G$ by attaching triangles
0-simplices $\chi_G^0$: vertices V
1-simplices $\chi_G^1$: edges E such that a comparison (i.e. pairwise ranking) between i and j exists
2-simplices $\chi_G^2$: triangles {i, j, k} such that every edge exists
Note: it suffices here to construct $\chi_G$ up to dimension 2!
global ranking v : V → R, 0-forms, vectors
pairwise ranking w(i, j) = −w(j, i) (∀(i, j) ∈ E), 1-forms, skew-symmetric matrices, network flows
Space of k-Forms (k-cochains) and Metrics
k-forms:
\[ C^k(\chi_G, \mathbb{R}) = \{u : \chi_G^{k+1} \to \mathbb{R},\ u_{\sigma(i_0), \dots, \sigma(i_k)} = \operatorname{sign}(\sigma)\, u_{i_0, \dots, i_k}\} \]
for $(i_0, \dots, i_k) \in \chi_G^{k+1}$, where $\sigma \in S_{k+1}$ is a permutation on (0, ..., k). These are also (k + 1)-alternating tensors.
One may equip $C^k(\chi_G, \mathbb{R})$ with inner products.
In particular, the following inner product on 1-forms addresses the imbalance issue in pairwise ranking:
\[ \langle w, \omega \rangle_D = \sum_{(i,j) \in E} D_{ij}\, w_{ij}\, \omega_{ij}, \quad w, \omega \in C^1(\chi_G, \mathbb{R}) \]
where Dij = |{customers who rate both i and j}|.
Discrete Exterior Derivatives (Coboundary Maps)
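k-coboundary maps $\delta_k : C^k(\chi_G, \mathbb{R}) \to C^{k+1}(\chi_G, \mathbb{R})$ are defined as the alternating difference operator
\[ (\delta_k u)(i_0, \dots, i_{k+1}) = \sum_{j=0}^{k+1} (-1)^{j}\, u(i_0, \dots, i_{j-1}, i_{j+1}, \dots, i_{k+1}) \]
The sign convention $(-1)^j$ makes the two special cases below hold exactly; the opposite convention $(-1)^{j+1}$ only flips the overall sign.
δk plays the role of differentiation
In particular,
• (δ0 v)(i, j) = vj − vi =: (grad v)(i, j)
• (δ1 w)(i, j, k) = wij + wjk + wki =: (curl w)(i, j, k) (triangular-trace of the skew-symmetric matrix (wij))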
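The coboundary maps δ0 and δ1 can be written as plain matrices. The sketch below is one possible construction, not the authors' code: the function name `coboundary_matrices`, the convention of orienting each edge from its smaller to its larger vertex index, and the example graph are all assumptions.

```python
import itertools
import numpy as np

def coboundary_matrices(n, edges):
    """Matrices of delta_0 and delta_1 for the clique complex of a graph.

    n: number of vertices 0..n-1; edges: iterable of vertex pairs.
    Each edge is oriented from its smaller to its larger endpoint.
    Returns (d0, d1, edges, triangles).
    """
    edges = sorted({(min(i, j), max(i, j)) for i, j in edges})
    index = {e: a for a, e in enumerate(edges)}
    d0 = np.zeros((len(edges), n))
    for a, (i, j) in enumerate(edges):
        d0[a, i], d0[a, j] = -1.0, 1.0            # (d0 v)(i, j) = v_j - v_i
    triangles = [t for t in itertools.combinations(range(n), 3)
                 if all(e in index for e in itertools.combinations(t, 2))]
    d1 = np.zeros((len(triangles), len(edges)))
    for b, (i, j, k) in enumerate(triangles):
        d1[b, index[(i, j)]] = 1.0                # + w_ij
        d1[b, index[(j, k)]] = 1.0                # + w_jk
        d1[b, index[(i, k)]] = -1.0               # + w_ki = - w_ik
    return d0, d1, edges, triangles

# Hypothetical graph: complete graph on 4 vertices with edge {0, 3} removed.
d0, d1, edges, triangles = coboundary_matrices(
    4, [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)])
closed = np.allclose(d1 @ d0, 0.0)   # checks that curl of a gradient vanishes
```

With this encoding a 1-form is a vector with one entry per oriented edge, so `d0 @ v` is the gradient of a score vector `v` and `d1 @ w` lists the curl of `w` on every triangle.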
Curl
Definition
For each triangle {i , j , k}, the curl (triangular trace)
wij + wjk + wki
measures the total flow-sum along the loop i → j → k → i .
(δ1 w)(i, j, k) = 0 implies the flow is path-independent on the triangle; this condition defines the triangular transitivity subspace.
Two directions of cochain maps:
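Forward
\[ C^0 \xrightarrow{\delta_0} C^1 \xrightarrow{\delta_1} C^2, \]
in other words,
\[ \text{Global} \xrightarrow{\mathrm{grad}} \text{Pairwise} \xrightarrow{\mathrm{curl}} \text{Triplewise} \]
Backward
\[ \text{Global} \xleftarrow{\delta_0^* = \mathrm{grad}^*} \text{Pairwise} \xleftarrow{\delta_1^* = \mathrm{curl}^*} \text{3-alternating } C^2 \]
• grad∗ = δ∗0 is the negative divergence
• curl∗ = δ∗1 is the boundary operator, which gives triangular (locally) cyclic pairwise rankings along triangles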
Divergence
Definition
For each alternative i ∈ V, the divergence
\[ (\operatorname{div} w)(i) := -(\delta_0^T w)(i) := \sum_j w_{ij} \]
measures the inflow-outflow sum at i.
$(\delta_0^T w)(i) = 0$ implies alternative i is preference-neutral in all pairwise comparisons.
a divergence-free flow, $\delta_0^T w = 0$, is cyclic
With metric D, the conjugate operator gives the weighted flow-sum, up to the same sign as above,
\[ (\delta_0^* w)(i) = (\delta_0^T D w)(i) = -\sum_j D_{ij} w_{ij} \]
A Fundamental Property: Closed Map Property
’Boundary of boundary is empty’: δk+1 ◦ δk = 0
in particular
• gradient flow is curl-free: curl ◦ grad = δ1 ◦ δ0 = 0
• circular flow is divergence-free: div ◦ curl∗ = −δ∗0 ◦ δ∗1 = 0
This leads to the powerful combinatorial Laplacians.
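For k = 0 the closed map property can be checked by hand. A short worked derivation, using only the formulas for grad and curl given earlier, for any triangle {i, j, k} and any v : V → R:

```latex
\begin{aligned}
(\operatorname{curl} \circ \operatorname{grad}\, v)(i, j, k)
  &= (\operatorname{grad} v)(i, j) + (\operatorname{grad} v)(j, k) + (\operatorname{grad} v)(k, i) \\
  &= (v_j - v_i) + (v_k - v_j) + (v_i - v_k) \\
  &= 0 .
\end{aligned}
```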
Combinatorial Laplacians
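Define the k-dimensional combinatorial Laplacian, $\Delta_k : C^k \to C^k$, by
\[ \Delta_k = \delta_{k-1} \delta_{k-1}^* + \delta_k^* \delta_k, \quad k > 0 \]
k = 0: $\Delta_0 = \delta_0^T \delta_0$ is the well-known graph Laplacian
k = 1: $\Delta_1 = \mathrm{curl}^* \circ \mathrm{curl} - \mathrm{grad} \circ \mathrm{div}$
Important Properties:
• ∆k is positive semi-definite
• $\ker(\Delta_k) = \ker(\delta_{k-1}^*) \cap \ker(\delta_k)$: divergence-free and curl-free, called harmonics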
Hodge Decomposition Theorem
Theorem
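The space of pairwise rankings, $C^1(\chi_G, \mathbb{R})$, admits an orthogonal decomposition into three subspaces
\[ C^1(\chi_G, \mathbb{R}) = \operatorname{im}(\delta_0) \oplus H_1 \oplus \operatorname{im}(\delta_1^*) \]
where $H_1 = \ker(\delta_1) \cap \ker(\delta_0^*) = \ker(\Delta_1)$.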
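The decomposition in the theorem can be computed with two least-squares projections. The following is a minimal sketch under stated assumptions rather than a reference implementation: it uses the unweighted inner product (Dij = 1), so the adjoints reduce to transposes, and the function name `hodge_decompose`, the edge orientation, and the example graph and flow are hypothetical.

```python
import itertools
import numpy as np

def hodge_decompose(n, edges, w):
    """Split an edge flow into gradient, harmonic and curl components.

    edges: list of (i, j) with i < j; w[a] is the flow on edges[a] in the
    direction i -> j. Assumes the unweighted inner product (D_ij = 1).
    """
    index = {e: a for a, e in enumerate(edges)}
    d0 = np.zeros((len(edges), n))
    for a, (i, j) in enumerate(edges):
        d0[a, i], d0[a, j] = -1.0, 1.0
    tris = [t for t in itertools.combinations(range(n), 3)
            if all(e in index for e in itertools.combinations(t, 2))]
    d1 = np.zeros((len(tris), len(edges)))
    for b, (i, j, k) in enumerate(tris):
        d1[b, index[(i, j)]] = d1[b, index[(j, k)]] = 1.0
        d1[b, index[(i, k)]] = -1.0
    w = np.asarray(w, dtype=float)
    # Projection onto im(delta_0): least-squares fit of a potential.
    grad = d0 @ np.linalg.lstsq(d0, w, rcond=None)[0]
    # Projection onto im(delta_1^T); zero when the complex has no triangles.
    if tris:
        curl = d1.T @ np.linalg.lstsq(d1.T, w, rcond=None)[0]
    else:
        curl = np.zeros_like(w)
    harm = w - grad - curl      # remainder lies in ker(delta_1) ∩ ker(delta_0^T)
    return grad, harm, curl

# Hypothetical graph: a 4-cycle 0-1-2-3 (a hole) sharing edge {0, 3} with
# the triangle {0, 3, 4}; the flow values are made up.
edges = [(0, 1), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4)]
w = np.array([1.0, -1.0, 2.0, 1.0, 1.0, 0.5])
grad, harm, curl = hodge_decompose(5, edges, w)
```

The three returned vectors sum to `w` and are mutually orthogonal because δ1 ◦ δ0 = 0 makes the two projected subspaces orthogonal. A weighted version would replace the plain least-squares fits with fits weighted by Dij.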
Hodge Decomposition Illustration
[Diagram: C^1 = im δ0 ⊕ H1 ⊕ im δ∗1, where im δ0 is Global (globally acyclic), H1 is Harmonic (locally acyclic), and im δ∗1 is Locally cyclic; ker δ1 = im δ0 ⊕ H1 is LOCALLY CONSISTENT (curl-free) and ker δ∗0 = H1 ⊕ im δ∗1 is CYCLIC (divergence-free)]
Figure: Hodge decomposition for pairwise rankings
Harmonic Rankings: Locally but NOT Globally Consistent
[Figure panel (a): flow on nodes A, B, C, D, E, F with edge weights 1 and 2]
Figure: (a) a harmonic ranking; (b) from truncated Netflix network
Rank aggregation as a linear projection
Corollary
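Every pairwise ranking admits a unique orthogonal decomposition,
\[ w = \operatorname{proj}_{\operatorname{im} \delta_0} w + \operatorname{proj}_{\ker(\delta_0^*)} w \]
i.e. pairwise = grad(global) + cyclic
In particular, the first projection grad(global) gives a global ranking
\[ x^* = (\delta_0^* \delta_0)^{\dagger} \delta_0^* w = -(\Delta_0)^{\dagger} \operatorname{div} w \]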
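The projection formula for x∗ translates directly into a few lines of NumPy. This is a hedged sketch: the function name `hodge_rank`, the representation of D as one weight per edge, and the two tiny examples are assumptions made for illustration.

```python
import numpy as np

def hodge_rank(n, edges, w, weights=None):
    """Global ranking x* = (d0* d0)^+ d0* w with d0* = d0^T D.

    edges: list of (i, j); w[a] estimates x_j - x_i on edges[a];
    weights[a] plays the role of D_ij (defaults to 1, an assumption).
    Returns (x, residual) with residual = w - grad(x), the cyclic part.
    """
    w = np.asarray(w, dtype=float)
    d = np.ones(len(edges)) if weights is None else np.asarray(weights, dtype=float)
    d0 = np.zeros((len(edges), n))
    for a, (i, j) in enumerate(edges):
        d0[a, i], d0[a, j] = -1.0, 1.0          # (d0 x)(i, j) = x_j - x_i
    d0_star = d0.T * d                          # d0^T D, with D = diag(d)
    x = np.linalg.pinv(d0_star @ d0) @ (d0_star @ w)
    return x, w - d0 @ x

edges = [(0, 1), (0, 2), (1, 2)]
# Consistent comparisons generated by hypothetical scores (0, 1, 3).
x_ok, r_ok = hodge_rank(3, edges, [1.0, 3.0, 2.0])
# Purely cyclic comparisons 0 -> 1 -> 2 -> 0: no ranking explains them.
x_cyc, r_cyc = hodge_rank(3, edges, [1.0, -1.0, 1.0])
```

The pseudoinverse picks the solution whose scores sum to zero on a connected graph, so `x_ok` recovers the generating scores up to an additive constant. In the cyclic example the fitted scores are all zero and the whole flow ends up in the residual.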
When Harmonic Ranking Vanishes
Corollary
If the clique complex $\chi_G$ has trivial 1-homology (no holes of boundary length ≥ 4), then triangular transitivity (curl w = 0) implies the existence of a global ranking, i.e. w ∈ im δ0.
On simply connected domains, curl-free vector fields are integrable.
When $\chi_G$ has trivial 1-homology, local consistency implies global consistency
A particular case is when G is a complete graph: the projection onto the transitivity subspace gives a unique global ranking, called the Borda Count (1782) in social choice theory
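To make the Borda count connection concrete, here is a short worked computation under two assumptions that the slide does not state: the unweighted inner product ($D_{ij} = 1$) and the normalization $\sum_i x^*_i = 0$. On the complete graph with n = |V| vertices, $\Delta_0 = nI - \mathbf{1}\mathbf{1}^T$ acts as multiplication by n on mean-zero vectors, and div w is mean-zero by skew-symmetry, so

```latex
x^*_i = -\big(\Delta_0^{\dagger} \operatorname{div} w\big)(i)
      = -\frac{1}{n} \sum_{j} w_{ij}
      = \frac{1}{n} \sum_{j} w_{ji} .
```

For instance, if $w_{ij} = v_j - v_i$ then $x^*_i = v_i - \frac{1}{n} \sum_j v_j$, the centered scores: each alternative is scored by its average margin over all the others.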
Example: Erdős-Rényi Random Graph
Theorem (Kahle ’06)
For an Erdős-Rényi random graph G(n, p) with n vertices and each edge appearing independently with probability p, its clique complex $\chi_G$ almost always has zero 1-homology, except when
\[ \frac{1}{n^2} \ll p \ll \frac{1}{n}. \]
Note that the full Netflix movie-movie comparison graph is almost complete (0.22% missing edges); is it an Erdős-Rényi random graph?
Which Pairwise Ranking Model is More Consistent?
Curl distribution measures the intrinsic local inconsistency in a pairwise ranking:
[Plot: “Curl Distribution of Pairwise Rankings”, with curves for Pairwise Score Difference, Pairwise Probability Difference, and Probability Log Ratio; horizontal axis ticks 0 to 1000, vertical axis ticks 0 to 3 × 10^5]
Figure: Curl distribution of three pairwise rankings, based on the 500 most popular movies. The pairwise score difference (the Linear model), in red, has the thinnest tail.
Comparisons of Netflix Global Rankings
                        Mean Score   Score Difference   Probability Difference   Logarithmic Odds Ratio
Mean Score              1.0000       0.9758             0.9731                   0.9746
Score Difference                     1.0000             0.9976                   0.9977
Probability Difference                                  1.0000                   0.9992
Logarithmic Odds Ratio                                                           1.0000
Cyclic Residue          -            6.03%              7.16%                    7.15%
Table: Kendall’s rank correlation coefficients between different global rankings for Netflix. Note that the pairwise score difference (the Linear model) has the smallest relative residue.
Why pairwise works for Netflix: a spectral embedding view
Why Pairwise Ranking Works for Netflix?
Pairwise rankings are good approximations of gradient flows on movie-movie networks
In fact, Netflix data at large scale behave like a 1-dimensional curve in a high-dimensional space
To visualize this, we may use a spectral embedding approach
Spectral Embedding
Map every movie to a point in S^4 by
\[ \text{movie } m \mapsto \left(\sqrt{p_1(m)}, \dots, \sqrt{p_5(m)}\right) \]
where $p_k(m)$ is the probability that movie m is rated as star k ∈ {1, ..., 5}. Obtain a movie-by-star matrix Y.
Do SVD on Y, which is equivalent to an eigenvalue decomposition of the linear kernel
\[ K(s, t) = \langle s, t \rangle^d, \quad d = 1, \quad s, t \in S^4 \]
K(s, t) is non-negative, whence the first eigenvector captures the centricity (density) of the data and the second captures a tangent field of the manifold.
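The embedding step can be sketched as follows. This is an illustration on synthetic data, not the Netflix computation: the function name `star_embedding`, the input format (a table of star counts per movie), and the random counts are assumptions.

```python
import numpy as np

def star_embedding(counts):
    """Embed movies on the sphere S^4 and take the SVD of the result.

    counts: (movies x 5) array, counts[m, k-1] = number of k-star ratings
    of movie m (an assumed input format). Returns (Y, U, s) where
    Y[m] = sqrt of the star distribution of m and Y = U diag(s) Vt.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum(axis=1, keepdims=True)   # p_k(m)
    Y = np.sqrt(p)                                   # each row has unit norm
    U, s, _ = np.linalg.svd(Y, full_matrices=False)
    return Y, U, s

# Synthetic star counts for 30 hypothetical movies (not Netflix data).
rng = np.random.default_rng(0)
counts = rng.integers(1, 100, size=(30, 5))
Y, U, s = star_embedding(counts)

# The linear kernel matrix K = Y Y^T has the squared singular values of Y
# as its nonzero eigenvalues.
K = Y @ Y.T
top_eigs = np.sort(np.linalg.eigvalsh(K))[::-1][:5]
mean_score = (Y ** 2) @ np.arange(1, 6)              # mean star rating per movie
```

On real data one would compare the second column of `U` against `mean_score`. Since the synthetic counts here are random, no monotone relationship should be expected from this toy run.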
Figure: The second singular vector is monotonic in the mean score, indicating that the intrinsic parameter of the horseshoe curve is driven by the mean score
Conclusions
Ranking as 1-dimensional scaling of data
Pairwise ranking as approximate gradient fields or flows on graphs
Hodge Theory provides an orthogonal decomposition for pairwise ranking flows
This decomposition helps characterize the local (triangular) vs. global consistency of pairwise rankings, and gives a natural rank aggregation scheme
Geometry and topology play important roles, but everything is just linear algebra!