Transcript of Ranking from Pairwise Comparisons (swoh.web.engr.illinois.edu/slide_ismp2012.pdf)
-
Ranking from Pairwise Comparisons
Sewoong Oh
Department of ISEUniversity of Illinois at Urbana-Champaign
Joint work with Sahand Negahban(MIT) and Devavrat Shah(MIT)
1 / 19
-
Rank Aggregation
Data → Algorithm → Decision

[Figure: a comparison graph on items 1-7 is fed to the algorithm, which outputs scores and the induced ranking:]

Score  Ranking
1.0    1
0.9    2
0.6    3
0.1    7
0.5    4
0.4    5
0.2    6

2 / 19
-
Example

[Figure: kittenwar.com screenshots with estimated win fractions .78, .77, .76 for the winning kittens and .20, .21, .21 for the losing kittens]

kittenwar.com
3 / 19
-
Example

[Figure: the allourideas.org pairwise-voting interface]

www.allourideas.org
4 / 19
-
Comparison data
Group decisions, recommendation and advertisement, sports and games
Universal (ratings can be converted into comparisons)
Consistent and reliable
Natural (e.g. sports and games, MSR's TrueSkill)
Aggregation is challenging
5 / 19
-
NP-hard approach: Kemeny optimal algorithm

[Figure: comparison graph on items 1-7 with edge weights a_{12}, a_{21}, ...]

Find the most consistent ranking:

$$\hat\sigma = \arg\min_{\sigma} \sum_{i \neq j} a_{ij}\, \mathbb{I}\big(\sigma(j) > \sigma(i)\big)$$

NP-hard combinatorial optimization
Optimal for a specific model

[Figure: items ordered strong to weak (1, 2, ..., 7); each comparison respects the true order w.p. 1 − p and is flipped w.p. p]

6 / 19
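The Kemeny objective can be made concrete with a brute-force sketch (not from the slides). It assumes the convention `a[i][j]` = number of comparisons in which item `i` beat item `j`, and penalizes every observed win that the output ranking reverses; the exhaustive search over permutations is what makes the exact problem intractable for large n.

```python
from itertools import permutations

def kemeny_rank(a):
    """Brute-force Kemeny aggregation: a[i][j] = times item i beat item j.

    Returns (sigma, cost) where sigma[i] is the rank position of item i
    (0 = top) and cost is the number of pairwise outcomes the ranking
    disagrees with. Exponential in n -- only feasible for tiny item sets.
    """
    n = len(a)
    best, best_cost = None, float("inf")
    for order in permutations(range(n)):          # order[0] = top item
        sigma = {item: pos for pos, item in enumerate(order)}
        # penalty: every win of i over j that sigma places i below j
        cost = sum(a[i][j] for i in range(n) for j in range(n)
                   if i != j and sigma[j] < sigma[i])
        if cost < best_cost:
            best, best_cost = sigma, cost
    return [best[i] for i in range(n)], best_cost
```

On a 3-item example where item 0 beats both others, the minimizer recovers the order 0, 1, 2 while paying only for the upset comparisons.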
-
Traditional approach: ℓ1 Ranking

[Figure: comparison graph on items 1-7 with edge weights a_{12}, a_{21}, ...]

Compute score:

$$s(i) = \frac{1}{|\partial i|} \sum_{j \in \partial i} \frac{a_{ji}}{a_{ij} + a_{ji}}$$

n: number of items
m: number of samples
Compute in O(m) time
Works when everyone plays everyone else: m = Ω(n²) [Ammar, Shah '12]

[Figure: items ordered strong to weak with scores 1, .9, .8, .7, .6, .4, .3, .2, .1, 0]

All wins are equally weighted
7 / 19
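The ℓ1 score is just each item's win fraction averaged over the opponents it actually met. A minimal sketch (not from the slides), assuming the convention `A[i, j]` = number of times item i beat item j:

```python
import numpy as np

def l1_scores(A):
    """Win-fraction ('l1') ranking scores.

    Convention assumed: A[i, j] = number of times item i beat item j.
    s(i) averages i's win fraction over the opponents it faced, so every
    win counts equally regardless of the opponent's strength -- the
    weakness the slide points out.
    """
    n = A.shape[0]
    s = np.zeros(n)
    for i in range(n):
        opponents = [j for j in range(n) if j != i and A[i, j] + A[j, i] > 0]
        if opponents:
            s[i] = np.mean([A[i, j] / (A[i, j] + A[j, i]) for j in opponents])
    return s
```

Each score lies in [0, 1] and the whole computation is a single pass over the m observed comparisons, matching the O(m) claim.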
-
Prior work

Data → Algorithm → Decision

[Figure: as on slide 2, a comparison graph turned into scores and a ranking]

ℓ1 ranking, ℓp ranking, Kemeny optimal, MNL, Mixed MNL, etc.

8 / 19
-
Challenges

High-dimensional regime: # of samples ∝ n log n
Low computational complexity: # of operations ∝ # of samples
Model independent
Makes sense

Under the BTL model, Rank Centrality achieves optimal performance

9 / 19
-
Rank Centrality

[Figure: comparison graph on items 1-7 with edge weights a_{12}, a_{21}, ...]

Define a random walk on G:

$$P_{ij} = \frac{1}{d_{\max}} \frac{a_{ij}}{a_{ij} + a_{ji}}, \qquad P_{ii} = 1 - \frac{1}{d_{\max}} \sum_{j \neq i} \frac{a_{ij}}{a_{ij} + a_{ji}}$$

Score = the (unique) stationary distribution: $s^T P = s^T$

The random walk spends more time on 'stronger' nodes; an item gets a higher score for beating a 'stronger' node:

$$s(i) = \Big(1 - \frac{1}{d_{\max}} \sum_{j \neq i} \frac{a_{ij}}{a_{ij} + a_{ji}}\Big) s(i) + \sum_{j \neq i} P_{ji}\, s(j) = \frac{1}{Z_i} \sum_{j \neq i} \frac{a_{ji}}{a_{ij} + a_{ji}}\, s(j)$$

10 / 19
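The construction above can be sketched directly: build the lazy random walk that moves toward winners and extract its stationary distribution by power iteration. A minimal sketch (not the authors' code), assuming `A[i, j]` = number of times item i beat item j:

```python
import numpy as np

def rank_centrality(A, iters=10000, tol=1e-10):
    """Rank Centrality scores via power iteration.

    Convention assumed: A[i, j] = number of times item i beat item j.
    The walk moves from i to j with probability proportional to the
    fraction of their comparisons that j won, so it drifts toward
    stronger items; the stationary distribution is the score vector s.
    """
    n = A.shape[0]
    d_max = max(1, int((A + A.T > 0).sum(axis=1).max()))  # max degree in G
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            tot = A[i, j] + A[j, i]
            if i != j and tot > 0:
                P[i, j] = A[j, i] / tot / d_max   # step toward the winner
        P[i, i] = 1.0 - P[i].sum()                # lazy self-loop
    s = np.full(n, 1.0 / n)
    for _ in range(iters):
        s_new = s @ P                             # fixed point: s^T P = s^T
        if np.abs(s_new - s).sum() < tol:
            return s_new
        s = s_new
    return s
```

Each power-iteration step touches only the observed edges, which is where the near-linear complexity in the number of samples comes from.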
-
Experiment: Polling public opinions
Washington Post - allourideas
11 / 19
-
Experiment: Polling
Ground truth: what the algorithm produces with complete data

$$\text{Error} = \frac{1}{n} \sum_i (\sigma_i - \hat\sigma_i)^2$$

[Plot: error (0-20) vs. sampling rate (0-1), comparing L1 ranking and Rank Centrality]

12 / 19
-
Model
The algorithm is model independent
For experiments and analysis, we need a model to generate data

Comparisons model
- Bradley-Terry-Luce (BTL) model: there is a true ranking, given by weights {w_i}; when a pair is compared, the noise is modelled by

$$P(i < j) = \frac{w_j}{w_i + w_j}$$

Sampling model
- Sample each pair with probability d/n
- k comparisons for each sampled pair

[Figure: comparison graph on items 1-7; each pair appears with probability d/n and is compared k times]

13 / 19
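The two-stage sampling model is easy to simulate. A minimal sketch (not from the slides): each unordered pair enters the comparison graph with probability d/n, and every sampled pair is compared k times under the BTL win probabilities.

```python
import numpy as np

def sample_btl(w, d_over_n, k, rng=None):
    """Generate pairwise-comparison data from the BTL sampling model.

    w: true positive weights; a single comparison of (i, j) is won by i
    with probability w[i] / (w[i] + w[j]), i.e. P(i < j) = w[j]/(w[i]+w[j]).
    Each unordered pair is included with probability d_over_n, and every
    included pair is compared k times. Returns A with A[i, j] = wins of i.
    """
    rng = np.random.default_rng(rng)
    n = len(w)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < d_over_n:           # pair enters the graph
                p_i = w[i] / (w[i] + w[j])        # chance i wins one game
                wins_i = rng.binomial(k, p_i)
                A[i, j] = wins_i
                A[j, i] = k - wins_i
    return A
```

The output matrix plugs directly into any of the scoring rules above, which is how the BTL experiment on the next slide is set up.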
-
Experiment: BTL model

[Figure: comparison graph; each pair sampled with probability d/n and compared k times]

$$\text{Error} = \frac{1}{\|w\|^2} \sum_{i>j} (w_i - w_j)^2\, \mathbb{I}\big((\hat\sigma_i - \hat\sigma_j)(w_i - w_j) > 0\big)$$

[Plots: error vs. k and error vs. d/n for Ratio Matrix, L1 ranking, Rank Centrality, and the ML estimate]

14 / 19
-
Performance guarantee

For d = Ω(log n)
BTL model {w_i} with w_min = Θ(w_max)

Theorem (Negahban, O., Shah '12)
Rank Centrality achieves

$$\frac{\|w - s\|}{\|w\|} \leq C \sqrt{\frac{\log n}{k\, d}}$$

Information-theoretic lower bound:

$$\inf_s \sup_{w \in W} \frac{\|w - s\|}{\|w\|} \geq C' \sqrt{\frac{1}{k\, d}}$$

# of samples = O(n log n) suffices to achieve arbitrarily small error

15 / 19
-
Performance guarantee for general graphs

Oftentimes we do not control how data is collected
Let G denote the (undirected) graph of given data
Compute the spectral gap of G (cf. mixing time of the natural random walk):

$$\xi \equiv 1 - \frac{\lambda_2(G)}{\lambda_1(G)}$$

Theorem (Negahban, O., Shah '12)
Rank Centrality achieves

$$\frac{\|w - s\|}{\|w\|} \leq C\, \frac{d_{\max}}{\xi\, d_{\min}} \sqrt{\frac{\log n}{k\, d_{\max}}}$$

# of samples = O(n log n) suffices to achieve arbitrarily small error

16 / 19
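The spectral gap of the comparison graph can be computed directly. A minimal sketch (not from the slides), using the natural random walk $D^{-1}A$ on a connected undirected graph, whose largest eigenvalue is $\lambda_1 = 1$:

```python
import numpy as np

def spectral_gap(adj):
    """Spectral gap 1 - lambda_2 of the natural random walk on a graph.

    adj: symmetric 0/1 adjacency matrix; every node is assumed to have at
    least one edge. The walk matrix D^{-1} adj has lambda_1 = 1, and the
    gap 1 - lambda_2 controls mixing: a larger gap gives a tighter
    Rank Centrality error bound on that comparison graph.
    """
    deg = adj.sum(axis=1)
    # D^{-1/2} adj D^{-1/2} is symmetric and shares the walk's spectrum
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    M = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    eig = np.sort(np.linalg.eigvalsh(M))[::-1]    # descending eigenvalues
    return 1.0 - eig[1]
```

For the complete graph K_n the walk matrix is A/(n−1) with second eigenvalue −1/(n−1), so dense comparison graphs have a large gap, while poorly connected designs shrink ξ and loosen the bound.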
-
Proof technique

1. Spectral analysis of reversible Markov chains
A Markov chain (P, π) is reversible iff π(i) P_{ij} = π(j) P_{ji}.

$$P_{ij} = \frac{1}{d_{\max}} \frac{a_{ij}}{a_{ij} + a_{ji}}$$

is not reversible, but its expectation is: $\tilde\pi(i) \tilde P_{ij} = \tilde\pi(j) \tilde P_{ji}$ with

$$\tilde P_{ij} = \frac{1}{d_{\max}} \frac{w_j}{w_i + w_j}, \qquad \tilde\pi(i) \propto w_i$$

17 / 19
-
Proof technique

1. Spectral analysis of reversible Markov chains
For any Markov chain (P, π) and any reversible MC (P̃, π̃):

$$\|\pi - \tilde\pi\| = \|P^T \pi - \tilde P^T \tilde\pi\| \leq \|P^T \pi - P^T \tilde\pi\| + \|P^T \tilde\pi - \tilde P^T \tilde\pi\|$$

$$\leq \underbrace{\|\tilde P^T (\pi - \tilde\pi)\|}_{\text{Reversible}} + \|(P - \tilde P)^T (\pi - \tilde\pi)\| + \|P - \tilde P\|_2\, \|\tilde\pi\|$$

$$\leq \big(\lambda_2(\tilde P) + \|P - \tilde P\|_2\big)\, \|\pi - \tilde\pi\| + \|P - \tilde P\|_2\, \|\tilde\pi\|$$

Rearranging,

$$\frac{\|\pi - \tilde\pi\|}{\|\tilde\pi\|} \leq \frac{\|P - \tilde P\|_2}{1 - \lambda_2(\tilde P) - \|P - \tilde P\|_2}$$

2. Bound 1 − λ₂(P̃) by comparison theorems
3. Bound ‖P − P̃‖₂ by a concentration-of-measure inequality for matrices

18 / 19
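Step 1 can be checked numerically. A minimal sketch with hypothetical BTL weights w (not from the slides): the idealized chain P̃ satisfies detailed balance, so its stationary distribution is proportional to w, which is exactly why recovering the stationary distribution recovers the true scores.

```python
import numpy as np

def stationary(P):
    """Left eigenvector of P for its top eigenvalue, normalized to sum 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

w = np.array([3.0, 2.0, 1.0])                 # hypothetical BTL weights
n, d_max = len(w), 2
Pt = np.zeros((n, n))                         # idealized chain P~
for i in range(n):
    for j in range(n):
        if i != j:
            Pt[i, j] = w[j] / (w[i] + w[j]) / d_max
    Pt[i, i] = 1.0 - Pt[i].sum()              # lazy self-loop

pit = stationary(Pt)

# Detailed balance pi~(i) P~_ij = pi~(j) P~_ji: the matrix diag(pi~) P~
# must be symmetric, which forces pi~ to be proportional to w.
balance = np.diag(pit) @ Pt
reversible = np.allclose(balance, balance.T)
```

The perturbation bound then controls how far the empirical chain's stationary distribution can drift from this ideal π̃ when the transition matrices are close in spectral norm.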
-
Conclusion
Rank aggregation from comparisons
High-dimensional regime: nkd ∼ n log n
Low complexity: O(nkd log n)
Model independent
Optimal under BTL up to log n
Other Markov chain approaches, e.g. "Rank Aggregation Revisited", C. Dwork, R. Kumar, M. Naor, and D. Sivakumar
19 / 19