Transcript of "Ranking from Pairwise Comparisons" (swoh.web.engr.illinois.edu/slide_ismp2012.pdf)

  • Ranking from Pairwise Comparisons

    Sewoong Oh

    Department of ISE, University of Illinois at Urbana-Champaign

    Joint work with Sahand Negahban (MIT) and Devavrat Shah (MIT)

    1 / 19

  • Rank Aggregation

    Data → Algorithm → Decision

    [Figure: pairwise comparison data drawn as a directed graph over items 1–7;
    the algorithm turns the data into scores (1.0, 0.9, 0.6, 0.5, 0.4, 0.2, 0.1)
    and the scores into the induced ranking.]

    2 / 19


  • Example

    Data → Algorithm → Decision

    [Figure: kittenwar.com — visitors vote on pairs of photos; each photo is
    shown with its empirical win fraction, e.g. .78, .77, .76 for the most
    popular photos and .20, .21, .21 for the least popular.]

    kittenwar.com

    3 / 19

  • Example

    Data → Algorithm → Decision

    www.allourideas.org

    4 / 19


  • Comparisons data

    Group decisions, recommendation and advertisement, sports and games

    Universal (ratings can be converted into comparisons)

    Consistent and reliable

    Natural (e.g. Sports and games, MSR’s TrueSkill)

    Aggregation is challenging

    5 / 19

  • NP-hard approach: Kemeny optimal algorithm

    [Figure: comparison graph over items 1–7, with a_12 and a_21 labeling the
    two directions of an edge.]

    Find the most consistent ranking:

    σ̂ = arg min_σ ∑_{i≠j} a_ij · I(σ(j) > σ(i))

    NP-hard combinatorial optimization

    Optimal for a specific model:

    [Figure: items ordered strong (1) to weak (7); when a pair such as (4, 5)
    is compared, the stronger item wins w.p. 1 − p and loses w.p. p.]

    6 / 19
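The Kemeny objective above can be checked on toy data by brute force over all permutations, which is feasible only for small n — exactly the point of the NP-hardness remark. A minimal sketch in Python, assuming (since the slides never state it) the convention implied by the later score a_ji / (a_ij + a_ji): a[i][j] counts the comparisons in which j beat i, and σ(i) is a rank position with 0 strongest.

```python
from itertools import permutations

def kemeny_ranking(a):
    """Brute-force Kemeny optimal ranking (exponential in n).
    a[i][j] = number of times j beat i (assumed convention);
    sigma[i] = rank position of item i, 0 = strongest."""
    n = len(a)
    best_cost, best_sigma = None, None
    for order in permutations(range(n)):      # order[r] = item placed at rank r
        sigma = [0] * n
        for r, item in enumerate(order):
            sigma[item] = r
        # penalize every comparison in which j beat i but the ranking
        # places j strictly below i
        cost = sum(a[i][j] for i in range(n) for j in range(n)
                   if i != j and sigma[j] > sigma[i])
        if best_cost is None or cost < best_cost:
            best_cost, best_sigma = cost, sigma
    return best_sigma, best_cost
```

On fully consistent data (item 0 beats 1 and 2, item 1 beats 2, each twice) the optimal ranking incurs cost 0.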

  • Traditional approach: ℓ1 Ranking

    [Figure: comparison graph over items 1–7, with a_12 and a_21 labeling the
    two directions of an edge.]

    Compute score:

    s(i) = (1/|∂i|) · ∑_{j∈∂i} a_ji / (a_ij + a_ji)

    n: number of items

    m: number of samples

    Compute in O(m) time

    Works when everyone plays everyone else: m = Ω(n²) [Ammar, Shah ’12]

    [Figure: items placed on a strong-to-weak axis with scores running from 1
    down to 0.]

    All wins are equally weighted

    7 / 19

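The ℓ1 score is just each item's average win fraction over its neighbors ∂i in the comparison graph, computable in one pass over the data. A minimal sketch, under the same assumed convention that a[i][j] counts wins of j over i:

```python
def l1_scores(a):
    """Average win fraction of each item over its neighbors in the
    comparison graph.  a[i][j] = number of times j beat i (assumed
    convention), so a_ji / (a_ij + a_ji) is the fraction i won against j."""
    n = len(a)
    scores = []
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and a[i][j] + a[j][i] > 0]
        if not nbrs:                          # item never compared
            scores.append(0.0)
            continue
        scores.append(sum(a[j][i] / (a[i][j] + a[j][i]) for j in nbrs) / len(nbrs))
    return scores
```

With two items where item 0 won both comparisons, item 0 scores 1.0 and item 1 scores 0.0 — every win counts the same regardless of the opponent's strength, which is the weakness the slide flags.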

  • Prior work

    Data → Algorithm → Decision

    [Figure: the same pipeline as before — comparison graph over items 1–7,
    scores, and the resulting ranking.]

    ℓ1 ranking, ℓp ranking, Kemeny optimal, MNL, Mixed MNL, etc.

    8 / 19

  • Challenges

    High-dimensional regime: # of samples ∝ n log n

    Low computational complexity: # of operations ∝ # of samples

    Model independent

    Makes sense

    Under the BTL model, Rank Centrality achieves optimal performance

    9 / 19


  • Rank Centrality

    [Figure: comparison graph over items 1–7, with a_12 and a_21 labeling the
    two directions of an edge.]

    Define a random walk on G:

    P_ij = (1/d_max) · a_ij / (a_ij + a_ji)   for j ≠ i

    P_ii = 1 − (1/d_max) · ∑_{j≠i} a_ij / (a_ij + a_ji)

    Scores are its (unique) stationary distribution:

    sᵀ P = sᵀ

    The random walk spends more time on ‘stronger’ nodes, and a node earns a
    higher score for beating a ‘stronger’ node:

    s(i) = (1 − (1/d_max) ∑_{j≠i} a_ij / (a_ij + a_ji)) · s(i) + ∑_{j≠i} P_ji s(j)

         = (1/Z_i) · ∑_{j≠i} (a_ji / (a_ij + a_ji)) · s(j)

    10 / 19

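The stationary distribution sᵀP = sᵀ can be computed by power iteration, each step costing one pass over the data. A sketch (not the authors' reference implementation), again assuming a[i][j] counts wins of j over i, so the walk moves from a node toward the nodes that beat it:

```python
import numpy as np

def rank_centrality(a, iters=10000, tol=1e-12):
    """Rank Centrality scores.  a[i][j] = number of times j beat i
    (assumed convention, matching the score a_ji / (a_ij + a_ji))."""
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    compared = (a + a.T) > 0                      # edges of comparison graph G
    d_max = compared.sum(axis=1).max()            # maximum degree in G
    with np.errstate(divide="ignore", invalid="ignore"):
        frac = np.where(compared, a / (a + a.T), 0.0)
    P = frac / d_max                              # off-diagonal P_ij
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))      # P_ii absorbs the remainder
    s = np.full(n, 1.0 / n)                       # power iteration: s^T P = s^T
    for _ in range(iters):
        s_new = s @ P
        if np.abs(s_new - s).max() < tol:
            break
        s = s_new
    return s_new / s_new.sum()
```

On data where item 0 mostly beats items 1 and 2, and item 1 mostly beats item 2, the stationary distribution orders the items 0 > 1 > 2, with the exact values reflecting the strength of each item's opponents.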

  • Experiment: Polling public opinions

    Washington Post - allourideas

    11 / 19

  • Experiment: Polling

    Ground truth: what the algorithm produces with complete data

    Error = (1/n) ∑_i (σ_i − σ̂_i)

    [Plot: error (0–20) vs. sampling rate (0–1) for L1 ranking and Rank
    Centrality.]

    12 / 19

  • Model

    Algorithm is model independent; for experiments and analysis, we need a
    model to generate data.

    Comparisons model: Bradley-Terry-Luce (BTL) model — there is a true ranking
    {w_i}; when a pair is compared, the noise is modeled by

    P(i < j) = w_j / (w_i + w_j)

    Sampling model: sample each pair with probability d/n; k comparisons for
    each pair

    [Figure: comparison graphs over items 1–7 illustrating the sampled pairs
    (average degree d) and the k comparison outcomes per sampled pair.]

    13 / 19

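The BTL-plus-sampling model above is easy to simulate: each pair enters the comparison graph with probability d/n, and each sampled pair generates k Bernoulli outcomes. A sketch (the function name and output convention a[i][j] = wins of j over i are ours):

```python
import random

def sample_btl(w, d_over_n, k, rng=None):
    """Generate comparison data under the BTL + sampling model.
    w[i]: true weight of item i.  Each pair is observed with probability
    d/n; an observed pair is compared k times, with
    P(i loses to j) = w[j] / (w[i] + w[j]).
    Returns a with a[i][j] = number of times j beat i."""
    rng = rng or random.Random(0)
    n = len(w)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() >= d_over_n:
                continue                      # pair (i, j) not sampled
            for _ in range(k):
                if rng.random() < w[j] / (w[i] + w[j]):
                    a[i][j] += 1              # j beat i
                else:
                    a[j][i] += 1              # i beat j
    return a
```

With w = (4, 1), d/n = 1 and k = 200, item 0 wins each comparison with probability 4/5, so its win count dominates.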

  • Experiment: BTL model

    [Figure: comparison graph over items 1–7 with sampling parameters d and k.]

    Error = (1/‖w‖) · ∑_{i>j} (w_i − w_j)² · I((σ̂_i − σ̂_j)(w_i − w_j) > 0)

    [Plots: error vs. k and error vs. d/n, on logarithmic axes, for Ratio
    Matrix, L1 ranking, Rank Centrality, and the ML estimate.]

    14 / 19
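The error metric above weights each mis-ordered pair by the squared weight gap, so confusing two near-equal items costs little. A sketch, assuming σ̂_i is a rank position (0 = strongest), so that a positive product (σ̂_i − σ̂_j)(w_i − w_j) flags a pair the estimate orders against the true weights:

```python
def pairwise_error(w, sigma_hat):
    """Weighted pairwise disagreement between true weights w and an
    estimated ranking sigma_hat (sigma_hat[i] = rank of item i,
    0 = strongest), following the slide's error definition."""
    norm = sum(x * x for x in w) ** 0.5       # ||w||
    err = 0.0
    for i in range(len(w)):
        for j in range(i):
            # with rank positions, w_i > w_j should give sigma_i < sigma_j;
            # a positive product therefore flags a mis-ordered pair
            if (sigma_hat[i] - sigma_hat[j]) * (w[i] - w[j]) > 0:
                err += (w[i] - w[j]) ** 2
    return err / norm
```

A perfect ranking scores 0; swapping the two weakest of three items with weights (3, 2, 1) costs (2 − 1)² / ‖w‖ = 1/√14.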

  • Performance guarantee

    For d = Ω(log n)

    BTL model {w_i} with w_min = Θ(w_max)

    Theorem (Negahban, Oh, Shah ’12)

    Rank Centrality achieves

    ‖w − s‖ / ‖w‖ ≤ C √(log n / (k d))

    Information-theoretic lower bound:

    inf_s sup_{w∈W} ‖w − s‖ / ‖w‖ ≥ C′ √(1 / (k d))

    # of samples = O(n log n) suffices to achieve arbitrarily small error

    15 / 19


  • Performance guarantee for general graphs

    Oftentimes we do not control how data is collected

    Let G denote the (undirected) graph of given data

    Compute the spectral gap of G (cf. mixing time of the natural random walk):

    ξ ≡ 1 − λ₂(G)/λ₁(G)

    Theorem (Negahban, Oh, Shah ’12)

    Rank Centrality achieves

    ‖w − s‖ / ‖w‖ ≤ C · (d_max / (ξ d_min)) · √(log n / (k d_max))

    # of samples = O(n log n) suffices to achieve arbitrarily small error

    16 / 19

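The extraction makes the exact gap expression ambiguous; reading it as ξ = 1 − λ₂(G)/λ₁(G) over the two largest adjacency eigenvalues (an assumption on our part), it is a one-line computation. The complete graph gives a sanity check, since its adjacency eigenvalues are n − 1 and −1:

```python
import numpy as np

def spectral_gap(adj):
    # xi = 1 - lambda_2(G) / lambda_1(G), with lambda_1 >= lambda_2 the two
    # largest adjacency eigenvalues (assumed reading of the slide's formula)
    vals = np.sort(np.linalg.eigvalsh(np.asarray(adj, dtype=float)))[::-1]
    return 1.0 - vals[1] / vals[0]

n = 5
K = np.ones((n, n)) - np.eye(n)               # complete graph K_5
assert abs(spectral_gap(K) - n / (n - 1)) < 1e-9
```

Sparser or more bottlenecked comparison graphs shrink ξ, which inflates the bound's d_max / (ξ d_min) factor.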

  • Proof technique

    1. Spectral analysis of reversible Markov chains

    A Markov chain (P, π) is reversible iff π(i) P_ij = π(j) P_ji

    P_ij = (1/d_max) · a_ij / (a_ij + a_ji) is not reversible, but its
    expectation is: π̃(i) P̃_ij = π̃(j) P̃_ji for

    P̃_ij = (1/d_max) · w_j / (w_i + w_j),   π̃(i) ∝ w_i

    17 / 19
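The detailed-balance claim for the idealized chain is easy to verify numerically: with π̃ ∝ w, the flow π̃(i)·w_j/(w_i + w_j) is symmetric in i and j. A small check (the helper name is ours):

```python
import numpy as np

def expected_chain(w, d_max):
    """Idealized transition matrix P~ with
    P~_ij = (1/d_max) * w_j / (w_i + w_j) off the diagonal."""
    w = np.asarray(w, dtype=float)
    P = (w[None, :] / (w[:, None] + w[None, :])) / d_max
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))      # diagonal absorbs the rest
    return P

w = np.array([3.0, 2.0, 1.0, 0.5])
P = expected_chain(w, d_max=len(w) - 1)
pi = w / w.sum()                                  # pi~(i) proportional to w_i
F = pi[:, None] * P                               # flows pi~(i) * P~_ij
assert np.allclose(F, F.T)                        # detailed balance
assert np.allclose(pi @ P, pi)                    # hence pi~ is stationary
```

Reversibility is what licenses the spectral step in the perturbation argument on the next slide.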

  • Proof technique

    1. Spectral analysis of reversible Markov chains: for any Markov chain
    (P, π) and any reversible MC (P̃, π̃),

    ‖π − π̃‖ = ‖Pᵀπ − P̃ᵀπ̃‖

             ≤ ‖Pᵀπ − Pᵀπ̃‖ + ‖Pᵀπ̃ − P̃ᵀπ̃‖

             ≤ ‖P̃ᵀ(π − π̃)‖ + ‖(P − P̃)ᵀ(π − π̃)‖ + ‖P − P̃‖₂ ‖π̃‖

             ≤ (λ₂(P̃) + ‖P − P̃‖₂) ‖π − π̃‖ + ‖P − P̃‖₂ ‖π̃‖

    (reversibility is used to bound the ‖P̃ᵀ(π − π̃)‖ term by λ₂(P̃)‖π − π̃‖),
    and rearranging gives

    ‖π − π̃‖ / ‖π̃‖ ≤ ‖P − P̃‖₂ / (1 − λ₂(P̃) − ‖P − P̃‖₂)

    2. Bound 1 − λ₂(P̃) by comparison theorems

    3. Bound ‖P − P̃‖₂ by a concentration of measure inequality for matrices

    18 / 19


  • Conclusion

    Rank aggregation from comparisons

    High-dimensional regime: # of samples nkd ∼ n log n

    Low complexity: O(nkd log n)

    Model independent

    Optimal under BTL up to log n

    Other Markov chain approaches exist, e.g. “Rank Aggregation Revisited” by
    C. Dwork, R. Kumar, M. Naor, and D. Sivakumar

    19 / 19