Combinatorial Hodge Theory and a Geometric Approach to...

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Combinatorial Hodge Theory and a GeometricApproach to Ranking

Yuan Yao

2008 SIAM Annual Meeting

San Diego, July 7, 2008

with Lek-Heng Lim et al.


Outline

1 Ranking on networks (graphs)Netflix exampleSkew-symmetric matrices and network flows

2 Combinatorial Hodge TheoryDiscrete Differential GeometryHodge Decomposition TheoremRank aggregation as a linear projection

3 Why pairwise works for netflix: a spectral embedding view

4 Conclusions


Ranking on networks (graphs)

Ranking on Networks (Graphs)

“Multi-criteria” rank/decision systems• Amazon or Netflix’s recommendation system (user-product)• Interest ranking in social networks (person-interest)• Financial analyst recommendation system (analyst-stock)• Voting (voter-candidate)

“Peer-review” systems• publication citation systems (paper-paper)• Google’s webpage ranking (web-web)• eBay’s reputation system (customer-customer)



continued...

Ranking data are

incomplete: partial list or pairwise (e.g. ∼ 1% in netflix)

unbalanced: varied distributed votes (e.g. power law)

cardinal: scores or stochastic choices

Implicitly or explicitly, ranking data may be viewed to live on asimple graph G = (V ,E )

V : set of alternatives (products, interests,...) to be ranked

E : pairs of alternatives compared



Netflix example

Example: Netflix

Example (Netflix Customer-Product Rating)

480189-by-17770 customer-product 5-star rating matrix X

X is incomplete, with 98.82% of missing values

However,

pairwise comparison graph G = (V ,E ) is very dense!

only 0.22% edges are missed, almost complete

rank aggregation without estimating missing values!

unbalanced: number of raters on e ∈ E varies.

Caveat: we are not trying to solve the Netflix prize problem



Netflix example

Netflix example continued

The first-order statistics, mean score for each product, is ofteninadequate because of the following:

most customers just rate a very small portion of the products

different products might have different raters, whence meanscores involve noise due to arbitrary individual rating scales

not able to characterize the inconsistency (ubiquitous for rankaggregation: Arrow”s impossibility theorem)

How about high order statistics?



Skew-symmetric matrices and network flows

From 1st Order to 2nd Order: Pairwise Rankings

Linear Model: average score difference between product i andj over all customers who have rated both of them,

wij =

∑

k(Xkj − Xki )

#{k : Xki ,Xkj exist} ,

is translation invariant.

Log-linear Model: when all the scores are positive, thelogarithmic average score ratio,

wij =

∑

k(log Xkj − log Xki )

#{k : Xki ,Xkj exist} ,

is invariant up to a multiplicative constant.




More Invariants

Linear Probability Model: the probability that product j ispreferred to i in excess of a purely random choice,

wij = Pr{k : Xkj > Xki} −1

2.

This is invariant up to a monotone transformation.

Bradley-Terry Model: logarithmic odd ratio (logit)

wij = logPr{k : Xkj > Xki}Pr{k : Xkj < Xki}

.

This is invariant up to a monotone transformation.




Skew-Symmetric Matrices of Pairwise Rankings

All such models induce (sparse) skew-symmetric matrices of|V |-by-|V | (or 2-alternating tensor),

wij =

{

−wji , {i , j} ∈ ENA, otherwise

where G = (V ,E ) is a pairwise comparison graph.Note: Such a skew-symmetric matrix induces a pairwise rankingnetwork flow on graph G .




Pairwise Ranking Flow for IMDB Top 20 Movies

Figure: Pairwise ranking flow on a complete graph




Rank Aggregation Problem

Hardness:

Arrow-Sen’s impossibility theorems in social choice theory

Kemeny-Snell optimal ordering is NP-hard to compute

Spectral analysis on Sn is impractical for large n since|Sn| = n!

Our approach:

Problem (Rank Aggregation)

Does there exist a global ranking function, v : V → R, such that

wij = vj − vi =: δ0(v)(i , j)?

Equivalently, does there exist a scalar field v : V → R whosegradient field gives the flow w?




Answer: Not Always!

From multivariate calculus, there are non-integrable vector fields(cf. movie A Beautiful Mind). A combinatorial version:

1

A

B

C

1

1

B

1 2

1

1

1

1

1

2

C

D

EF

A

Figure: No global ranking v gives wij = vj − vi : (a) cyclic ranking, notewAB + wBC + wCA 6= 0; (b) contains a 4-node cyclic flowA→ C → D → E → A, note on 3-clique {A, B, C} (also {A, E , F}),wAB + wBC + wCA = 0




Triangular Transitivity: null triangular-trace

Fact

For a skew-symmetric matrix W = (wij) associated with graph G,

∃v : wij = vj − vi⇒wij + wjk + wki = 0,∀ 3-clique {i , j , k}

Note:

Transitivity subspace: null triangular-trace or curl-free

{W : wij + wjk + wki = 0,∀ 3-clique {i , j , k}}

Example in the last slide, (a) lies outside; (b) lies in thissubspace, but not a gradient flow.




Hodge decomposition for skew-symmetric matrices

A skew-symmetric matrix W associated with G has an uniqueorthogonal decomposition

W = W1 + W2 + W3

where

W1 satisfies (’integrable’): W1(i , j) = vj − vi for somev : V → R;

W2 satisfies that• (’curl-free’) W2(i , j) + W2(j , k) + W3(k, i) = 0 for all3-clique {i , j , k}• (’divergence-free’)

∑

j W3(i , j) = 0 for all edge {i , j} ∈ E ;

W3 ⊥W1 and W3 ⊥W2.




Hodge decomposition for network flows

A network flows (e.g. pairwise rankings) on graph G has anorthogonal decomposition into

gradient flow + locally acyclic (harmonic) + locally cyclic

where the first two components lie in the transitivity subspace and

gradient flow is integrable to give a global ranking

example (b) is locally acyclic, but cyclic on large scale(harmonic)

example (a) is locally cyclic


Combinatorial Hodge Theory

Discrete Differential Geometry

Clique Complex and Discrete Differential Forms

We extend graph G to a simplicial complex χG by attachingtriangles

0-simplices χ0G : vertices V

1-simplices χ1G : edges E such that comparison (i.e. pairwise

ranking) between i and j exists

2-simplices χ2G : triangles {i , j , k} such that every edge exists

Note: it suffices here to construct χG up to dimension 2!

global ranking v : V → R, 0-forms, vectors

pairwise ranking w(i , j) = −w(j , i) (∀(i , j) ∈ E ), 1-forms,skew-symmetric matrices, network flows




Space of k-Forms (k-cochains) and Metrics

k-forms:

C k(χG , R) = {u : χk+1G → R, uσ(i0),...,σ(ik ) = sign(σ)ui0,...,ik}

for (i0, . . . , ik) ∈ χk+1G , where σ ∈ Sk+1 is a permutation on

(0, . . . , k). Also k + 1-alternating tensors.

One may associate C k(χG , R) with inner-products.

In particular, the following inner-product on 1-forms is forunbalance issue in pairwise ranking

〈wij , ωij〉D =∑

(i ,j)∈E

Dijwijωij , w , ω ∈ C 1(χG , R)

where Dij = |{customers who rate both i and j}|.




Discrete Exterior Derivatives (Coboundary Maps)

k-coboundary maps δk : C k(χG , R)→ C k+1(χG , R) aredefined as the alternating difference operator

(δku)(i0, . . . , ik+1) =k+1∑

j=0

(−1)j+1u(i0, . . . , ij−1, ij+1, . . . , ik+1)

δk plays the role of differentiation

In particular,• (δ0v)(i , j) = vj − vi =: (gradv)(i , j)• (δ1w)(i , j , k) = (±)wij + wjk + wki =: (curlw)(i , j , k)(triangular-trace of skew-symmetric matrix (wij))




Curl

Definition

For each triangle {i , j , k}, the curl (triangular trace)

wij + wjk + wki

measures the total flow-sum along the loop i → j → k → i .

(δ1w)(i , j , k) = 0 implies the flow is path-independent, whichdefines the triangular transitivity subspace.




Two directions of cochain maps:

ForwardC 0 δ0−→ C 1 δ1−→ C 2,

in other words,

Globalgrad−−→ Pairwise

curl−−→ Triplewise

Backward

Globalδ∗0 =grad∗←−−−−−− Pairwise

δ∗1 =curl∗←−−−−− 3-alternating C 2

• grad∗ = δ∗0 is the negative divergence• curl∗ = δ∗1 , is the boundary operator, gives triangular(locally) cyclic pairwise rankings along triangles




Divergence

Definition

For each alternative i ∈ V , the divergence

(div w)(i) := −(δT0 w)(i) :=

∑

wi∗

measures the inflow-outflow sum at i .

(δT0 w)(i) = 0 implies alternative i is preference-neutral in all

pairwise comparisons.

divergence-free flow δT0 w = 0 is cyclic

With metric D, conjugate operator gives weighted flow-sum

(δ∗0w)(i) =∑

j

wijDij = (δT0 Dw)(i)




A Fundamental Property: Closed Map Property

’Boundary of boundary is empty’: δk+1 ◦ δk = 0

in particular• gradient flow is curl-free: curl ◦ grad = δ2 ◦ δ1 = 0• circular flow is divergence-free: div ◦ curl∗ = δ∗1 ◦ δ∗2 = 0

This leads to the powerful combinatorial Laplacians.




Combinatorial Laplacians

Define the k-dimensional combinatorial Laplacian,∆k : C k → C k by

∆k = δk−1δ∗k−1 + δ∗kδk , k > 0

k = 0, ∆0 = δT0 δ0 is the well-known graph Laplacian

k = 1,∆1 = curl ◦ curl∗− grad ◦ div

Important Properties:• ∆k positive semi-definite• ker(∆k) = ker(δT

k−1) ∩ ker(δk): divergence-free andcurl-free, called harmonics



Hodge Decomposition Theorem


Theorem

The space of pairwise rankings, C 1(χG , R), admits an orthogonaldecomposition into three

C 1(χG , R) = im(δ0)⊕ H1 ⊕ im(δ∗1)

whereH1 = ker(δ1) ∩ ker(δ∗0) = ker(∆1).




Hodge Decomposition Illustration

ker δ1

⊕ ⊕

CYCLIC(divergence-free)

LOCALLY CONSISTENT(curl-free)

im δ0

Harmonic

H1

Locally cyclic

im δ∗1

(locally acyclic)

Global

(globally acyclic)

ker δ∗0

Figure: Hodge decomposition for pairwise rankings




Harmonic Rankings: Locally but NOT Globally Consistent

B

1 2

1

1

1

1

1

2

C

D

EF

A

Figure: (a) a harmonic ranking; (b) from truncated Netflix network



Rank aggregation as a linear projection


Corollary

Every pairwise ranking admits a unique orthogonal decomposition,

w = projim δ0w + projker(δ∗0 ) w

i.e.pairwise = grad(global) + cyclic

Particularly the first projection grad(global) gives a global ranking

x∗ = (δ∗0δ0)†δ∗0w = −(∆0)

† div w




When Harmonic Ranking Vanishes

Corollary

If the clique complex χG has trivial 1-homology (no holes ofboundary length ≥ 4), then triangular transitivity (curl w = 0)implies the existence of global ranking im δ0.

On simple-connected domains, curl-free vector fields areintegrable.

With trivial 1-homology χG , local consistency implies theglobal consistency

A particular case is when G is a complete graph, theprojection on transitivity subspace gives a unique globalranking, called Borda Count (1782) in social choice theory




Example: Erdos-Renyi Random Graph

Theorem (Kahle ’06)

For an Erdos-Renyi random graph G (n, p) with n vertices and eachedge independently emerging with probability p, its clique complexχG almost always has zero 1-homology, except that

1

n2≪ p ≪ 1

n.

Note that full Netflix movie-movie comparison graph is almostcomplete (0.22% missing edges), is that a Erdos-Renyirandom graph?




Which Pairwise Ranking Model is More Consistent?

Curl distribution measures the intrinsic local inconsistency in apairwise ranking:

0 100 200 300 400 500 600 700 800 900 10000

0.5

1

1.5

2

2.5

3x 10

5 Curl Distribution of Pairwise Rankings

Pairwise Score Difference

Pairwise Probability Difference

Probability Log Ratio

Figure: Curl distribution of three pairwise rankings, based on mostpopular 500 movies. The pairwise score difference (the Linear model) inred have the thinnest tail.




Comparisons of Netflix Global Rankings

Mean Score Score Difference Probability Difference Logarithmic Odd Ratio

Mean Score 1.0000 0.9758 0.9731 0.9746Score Difference 1.0000 0.9976 0.9977

Probability Difference 1.0000 0.9992Logarithmic Odd Ratio 1.0000

Cyclic Residue - 6.03% 7.16% 7.15%

Table: Kendall’s rank correlation coefficients between different globalrankings for Netflix. Note that the pairwise score difference (the Linearmodel) has the smallest relative residue.


Why pairwise works for netflix: a spectral embedding view

Why Pairwise Ranking Works for Netflix?

Pairwise rankings are good approximations of gradient flowson movie-movie networks

In fact, netflix data in the large scale behave like a1-dimensional curve in high dimensional space

To visualize this, we may use a spectral embedding approach



Spectral Embedding

Map every movie to a point in S4 by

movie m → (√

p1(m), . . . ,√

p5(m))

where pk(m) is the probability that movie m is rated as stark ∈ {1, . . . , 5}. Obtain a movie-by-star matrix Y .

Do SVD on Y , which is equivalent to do eigenvaluedecomposition on the linear kernel

K (s, t) = 〈s, t〉d , d = 1, s, t ∈ S4

K (s, t) is non-negative, whence the first eigenvector capturesthe centricity (density) of data and the second captures atangent field of the manifold.



SVD Embedding

Figure: The second singular vector is monotonic to the mean score,indicating the intrinsic parameter of the horseshoe curve is driven by themean score


Conclusions

Conclusions

Ranking as 1-dimensional scaling of data

Pairwise ranking as approximate gradient fields or flows ongraphs

Hodge Theory provides an orthogonal decomposition forpairwise ranking flows

This decomposition helps characterize the local (triangular)vs. global consistency of pairwise rankings, and gives anatural rank aggregation scheme

Geometry and topology play important roles, but everything isjust linear algebra!


Acknowledgements

Gunnar Carlsson

Vin de Silva

Persi Diaconis

Leo Guibas

Fei Han

Susan Holmes

Qixing Huang

Xiaoye Jiang

Ming Ma

Jason Morton

Art Owen

Michael Saunders

Harlan Sexton

Steve Smale

Shmuel Weinberger

Ya Xu

Combinatorial Hodge Theory and a Geometric Approach to...

Documents

Transcript of Combinatorial Hodge Theory and a Geometric Approach to...