Transcript of "Ranking from Pairwise Comparisons" (swoh.web.engr.illinois.edu/slide_ismp2012.pdf)

  • Ranking from Pairwise Comparisons

    Sewoong Oh

    Department of ISE, University of Illinois at Urbana-Champaign

    Joint work with Sahand Negahban (MIT) and Devavrat Shah (MIT)

    1 / 19

  • Rank Aggregation

    Data → Algorithm → Decision

    [Figure: pairwise comparison data drawn as a directed graph over items 1–7;
    the algorithm turns the data into scores (1.0, 0.9, 0.6, 0.5, 0.4, 0.2, 0.1)
    and the scores into the induced ranking.]

    2 / 19


  • Example

    Data → Algorithm → Decision

    [Figure: kittenwar.com — visitors vote on pairs of photos; each photo is
    shown with its empirical win fraction, e.g. .78, .77, .76 for the most
    popular photos and .20, .21, .21 for the least popular.]

    kittenwar.com

    3 / 19

  • Example

    Data → Algorithm → Decision

    www.allourideas.org

    4 / 19


  • Comparisons data

    Group decisions, recommendation and advertisement, sports and games

    Universal (ratings can be converted into comparisons)

    Consistent and reliable

    Natural (e.g. Sports and games, MSR’s TrueSkill)

    Aggregation is challenging

    5 / 19

  • NP-hard approach: Kemeny optimal algorithm

    [Figure: comparison graph over items 1–7, with a_12 and a_21 labeling the
    two directions of an edge.]

    Find the most consistent ranking:

    σ̂ = arg min_σ ∑_{i≠j} a_ij · I(σ(j) > σ(i))

    NP-hard combinatorial optimization

    Optimal for a specific model:

    [Figure: items ordered strong (1) to weak (7); when a pair such as (4, 5)
    is compared, the stronger item wins w.p. 1 − p and loses w.p. p.]

    6 / 19
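The Kemeny objective above can be checked on toy data by brute force over all permutations, which is feasible only for small n — exactly the point of the NP-hardness remark. A minimal sketch in Python, assuming (since the slides never state it) the convention implied by the later score a_ji / (a_ij + a_ji): a[i][j] counts the comparisons in which j beat i, and σ(i) is a rank position with 0 strongest.

```python
from itertools import permutations

def kemeny_ranking(a):
    """Brute-force Kemeny optimal ranking (exponential in n).
    a[i][j] = number of times j beat i (assumed convention);
    sigma[i] = rank position of item i, 0 = strongest."""
    n = len(a)
    best_cost, best_sigma = None, None
    for order in permutations(range(n)):      # order[r] = item placed at rank r
        sigma = [0] * n
        for r, item in enumerate(order):
            sigma[item] = r
        # penalize every comparison in which j beat i but the ranking
        # places j strictly below i
        cost = sum(a[i][j] for i in range(n) for j in range(n)
                   if i != j and sigma[j] > sigma[i])
        if best_cost is None or cost < best_cost:
            best_cost, best_sigma = cost, sigma
    return best_sigma, best_cost
```

On fully consistent data (item 0 beats 1 and 2, item 1 beats 2, each twice) the optimal ranking incurs cost 0.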

  • Traditional approach: ℓ1 Ranking

    [Figure: comparison graph over items 1–7, with a_12 and a_21 labeling the
    two directions of an edge.]

    Compute score:

    s(i) = (1/|∂i|) · ∑_{j∈∂i} a_ji / (a_ij + a_ji)

    n: number of items

    m: number of samples

    Compute in O(m) time

    Works when everyone plays everyone else: m = Ω(n²) [Ammar, Shah ’12]

    [Figure: items placed on a strong-to-weak axis with scores running from 1
    down to 0.]

    All wins are equally weighted

    7 / 19

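The ℓ1 score is just each item's average win fraction over its neighbors ∂i in the comparison graph, computable in one pass over the data. A minimal sketch, under the same assumed convention that a[i][j] counts wins of j over i:

```python
def l1_scores(a):
    """Average win fraction of each item over its neighbors in the
    comparison graph.  a[i][j] = number of times j beat i (assumed
    convention), so a_ji / (a_ij + a_ji) is the fraction i won against j."""
    n = len(a)
    scores = []
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and a[i][j] + a[j][i] > 0]
        if not nbrs:                          # item never compared
            scores.append(0.0)
            continue
        scores.append(sum(a[j][i] / (a[i][j] + a[j][i]) for j in nbrs) / len(nbrs))
    return scores
```

With two items where item 0 won both comparisons, item 0 scores 1.0 and item 1 scores 0.0 — every win counts the same regardless of the opponent's strength, which is the weakness the slide flags.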

  • Prior work

    Data → Algorithm → Decision

    [Figure: the same pipeline as before — comparison graph over items 1–7,
    scores, and the resulting ranking.]

    ℓ1 ranking, ℓp ranking, Kemeny optimal, MNL, Mixed MNL, etc.

    8 / 19

  • Challenges

    High-dimensional regime: # of samples ∝ n log n

    Low computational complexity: # of operations ∝ # of samples

    Model independent

    Makes sense

    Under the BTL model, Rank Centrality achieves optimal performance

    9 / 19


  • Rank Centrality

    [Figure: comparison graph over items 1–7, with a_12 and a_21 labeling the
    two directions of an edge.]

    Define a random walk on G:

    P_ij = (1/d_max) · a_ij / (a_ij + a_ji)   for j ≠ i

    P_ii = 1 − (1/d_max) · ∑_{j≠i} a_ij / (a_ij + a_ji)

    Scores are its (unique) stationary distribution:

    sᵀ P = sᵀ

    The random walk spends more time on ‘stronger’ nodes, and a node earns a
    higher score for beating a ‘stronger’ node:

    s(i) = (1 − (1/d_max) ∑_{j≠i} a_ij / (a_ij + a_ji)) · s(i) + ∑_{j≠i} P_ji s(j)

         = (1/Z_i) · ∑_{j≠i} (a_ji / (a_ij + a_ji)) · s(j)

    10 / 19

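The stationary distribution sᵀP = sᵀ can be computed by power iteration, each step costing one pass over the data. A sketch (not the authors' reference implementation), again assuming a[i][j] counts wins of j over i, so the walk moves from a node toward the nodes that beat it:

```python
import numpy as np

def rank_centrality(a, iters=10000, tol=1e-12):
    """Rank Centrality scores.  a[i][j] = number of times j beat i
    (assumed convention, matching the score a_ji / (a_ij + a_ji))."""
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    compared = (a + a.T) > 0                      # edges of comparison graph G
    d_max = compared.sum(axis=1).max()            # maximum degree in G
    with np.errstate(divide="ignore", invalid="ignore"):
        frac = np.where(compared, a / (a + a.T), 0.0)
    P = frac / d_max                              # off-diagonal P_ij
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))      # P_ii absorbs the remainder
    s = np.full(n, 1.0 / n)                       # power iteration: s^T P = s^T
    for _ in range(iters):
        s_new = s @ P
        if np.abs(s_new - s).max() < tol:
            break
        s = s_new
    return s_new / s_new.sum()
```

On data where item 0 mostly beats items 1 and 2, and item 1 mostly beats item 2, the stationary distribution orders the items 0 > 1 > 2, with the exact values reflecting the strength of each item's opponents.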

  • Experiment: Polling public opinions

    Washington Post - allourideas

    11 / 19

  • Experiment: Polling

    Ground truth: what the algorithm produces with complete data

    Error = (1/n) ∑_i (σ_i − σ̂_i)

    [Plot: error (0–20) vs. sampling rate (0–1) for L1 ranking and Rank
    Centrality.]

    12 / 19

  • Model

    Algorithm is model independent; for experiments and analysis, we need a
    model to generate data.

    Comparisons model: Bradley-Terry-Luce (BTL) model — there is a true ranking
    {w_i}; when a pair is compared, the noise is modeled by

    P(i < j) = w_j / (w_i + w_j)

    Sampling model: sample each pair with probability d/n; k comparisons for
    each pair

    [Figure: comparison graphs over items 1–7 illustrating the sampled pairs
    (average degree d) and the k comparison outcomes per sampled pair.]

    13 / 19

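The BTL-plus-sampling model above is easy to simulate: each pair enters the comparison graph with probability d/n, and each sampled pair generates k Bernoulli outcomes. A sketch (the function name and output convention a[i][j] = wins of j over i are ours):

```python
import random

def sample_btl(w, d_over_n, k, rng=None):
    """Generate comparison data under the BTL + sampling model.
    w[i]: true weight of item i.  Each pair is observed with probability
    d/n; an observed pair is compared k times, with
    P(i loses to j) = w[j] / (w[i] + w[j]).
    Returns a with a[i][j] = number of times j beat i."""
    rng = rng or random.Random(0)
    n = len(w)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() >= d_over_n:
                continue                      # pair (i, j) not sampled
            for _ in range(k):
                if rng.random() < w[j] / (w[i] + w[j]):
                    a[i][j] += 1              # j beat i
                else:
                    a[j][i] += 1              # i beat j
    return a
```

With w = (4, 1), d/n = 1 and k = 200, item 0 wins each comparison with probability 4/5, so its win count dominates.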

  • Experiment: BTL model

    [Figure: comparison graph over items 1–7 with sampling parameters d and k.]

    Error = (1/‖w‖) · ∑_{i>j} (w_i − w_j)² · I((σ̂_i − σ̂_j)(w_i − w_j) > 0)

    [Plots: error vs. k and error vs. d/n, on logarithmic axes, for Ratio
    Matrix, L1 ranking, Rank Centrality, and the ML estimate.]

    14 / 19
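The error metric above weights each mis-ordered pair by the squared weight gap, so confusing two near-equal items costs little. A sketch, assuming σ̂_i is a rank position (0 = strongest), so that a positive product (σ̂_i − σ̂_j)(w_i − w_j) flags a pair the estimate orders against the true weights:

```python
def pairwise_error(w, sigma_hat):
    """Weighted pairwise disagreement between true weights w and an
    estimated ranking sigma_hat (sigma_hat[i] = rank of item i,
    0 = strongest), following the slide's error definition."""
    norm = sum(x * x for x in w) ** 0.5       # ||w||
    err = 0.0
    for i in range(len(w)):
        for j in range(i):
            # with rank positions, w_i > w_j should give sigma_i < sigma_j;
            # a positive product therefore flags a mis-ordered pair
            if (sigma_hat[i] - sigma_hat[j]) * (w[i] - w[j]) > 0:
                err += (w[i] - w[j]) ** 2
    return err / norm
```

A perfect ranking scores 0; swapping the two weakest of three items with weights (3, 2, 1) costs (2 − 1)² / ‖w‖ = 1/√14.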

  • Performance guarantee

    For d = Ω(log n)

    BTL model {w_i} with w_min = Θ(w_max)

    Theorem (Negahban, Oh, Shah ’12)

    Rank Centrality achieves

    ‖w − s‖ / ‖w‖ ≤ C √(log n / (k d))

    Information-theoretic lower bound:

    inf_s sup_{w∈W} ‖w − s‖ / ‖w‖ ≥ C′ √(1 / (k d))

    # of samples = O(n log n) suffices to achieve arbitrarily small error

    15 / 19


  • Performance guarantee for general graphs

    Oftentimes we do not control how data is collected

    Let G denote the (undirected) graph of given data

    Compute the spectral gap of G (cf. mixing time of the natural random walk):

    ξ ≡ 1 − λ₂(G)/λ₁(G)

    Theorem (Negahban, Oh, Shah ’12)

    Rank Centrality achieves

    ‖w − s‖ / ‖w‖ ≤ C · (d_max / (ξ d_min)) · √(log n / (k d_max))

    # of samples = O(n log n) suffices to achieve arbitrarily small error

    16 / 19

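The extraction makes the exact gap expression ambiguous; reading it as ξ = 1 − λ₂(G)/λ₁(G) over the two largest adjacency eigenvalues (an assumption on our part), it is a one-line computation. The complete graph gives a sanity check, since its adjacency eigenvalues are n − 1 and −1:

```python
import numpy as np

def spectral_gap(adj):
    # xi = 1 - lambda_2(G) / lambda_1(G), with lambda_1 >= lambda_2 the two
    # largest adjacency eigenvalues (assumed reading of the slide's formula)
    vals = np.sort(np.linalg.eigvalsh(np.asarray(adj, dtype=float)))[::-1]
    return 1.0 - vals[1] / vals[0]

n = 5
K = np.ones((n, n)) - np.eye(n)               # complete graph K_5
assert abs(spectral_gap(K) - n / (n - 1)) < 1e-9
```

Sparser or more bottlenecked comparison graphs shrink ξ, which inflates the bound's d_max / (ξ d_min) factor.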

  • Proof technique

    1. Spectral analysis of reversible Markov chains

    A Markov chain (P, π) is reversible iff π(i) P_ij = π(j) P_ji

    P_ij = (1/d_max) · a_ij / (a_ij + a_ji) is not reversible, but its
    expectation is: π̃(i) P̃_ij = π̃(j) P̃_ji for

    P̃_ij = (1/d_max) · w_j / (w_i + w_j),   π̃(i) ∝ w_i

    17 / 19
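The detailed-balance claim for the idealized chain is easy to verify numerically: with π̃ ∝ w, the flow π̃(i)·w_j/(w_i + w_j) is symmetric in i and j. A small check (the helper name is ours):

```python
import numpy as np

def expected_chain(w, d_max):
    """Idealized transition matrix P~ with
    P~_ij = (1/d_max) * w_j / (w_i + w_j) off the diagonal."""
    w = np.asarray(w, dtype=float)
    P = (w[None, :] / (w[:, None] + w[None, :])) / d_max
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))      # diagonal absorbs the rest
    return P

w = np.array([3.0, 2.0, 1.0, 0.5])
P = expected_chain(w, d_max=len(w) - 1)
pi = w / w.sum()                                  # pi~(i) proportional to w_i
F = pi[:, None] * P                               # flows pi~(i) * P~_ij
assert np.allclose(F, F.T)                        # detailed balance
assert np.allclose(pi @ P, pi)                    # hence pi~ is stationary
```

Reversibility is what licenses the spectral step in the perturbation argument on the next slide.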

  • Proof technique

    1. Spectral analysis of reversible Markov chains: for any Markov chain
    (P, π) and any reversible MC (P̃, π̃),

    ‖π − π̃‖ = ‖Pᵀπ − P̃ᵀπ̃‖

             ≤ ‖Pᵀπ − Pᵀπ̃‖ + ‖Pᵀπ̃ − P̃ᵀπ̃‖

             ≤ ‖P̃ᵀ(π − π̃)‖ + ‖(P − P̃)ᵀ(π − π̃)‖ + ‖P − P̃‖₂ ‖π̃‖

             ≤ (λ₂(P̃) + ‖P − P̃‖₂) ‖π − π̃‖ + ‖P − P̃‖₂ ‖π̃‖

    (reversibility is used to bound the ‖P̃ᵀ(π − π̃)‖ term by λ₂(P̃)‖π − π̃‖),
    and rearranging gives

    ‖π − π̃‖ / ‖π̃‖ ≤ ‖P − P̃‖₂ / (1 − λ₂(P̃) − ‖P − P̃‖₂)

    2. Bound 1 − λ₂(P̃) by comparison theorems

    3. Bound ‖P − P̃‖₂ by a concentration of measure inequality for matrices

    18 / 19


  • Conclusion

    Rank aggregation from comparisons

    High-dimensional regime: # of samples nkd ∼ n log n

    Low complexity: O(nkd log n)

    Model independent

    Optimal under BTL up to log n

    Other Markov chain approaches exist, e.g. “Rank Aggregation Revisited” by
    C. Dwork, R. Kumar, M. Naor, and D. Sivakumar

    19 / 19