The Wisdom of Crowds in the Aggregation of Rankings
Mark Steyvers
Department of Cognitive Sciences
University of California, Irvine
Joint work with: Michael Lee, Brent Miller, Pernille Hemmer
Rank aggregation problem
Goal: combine many different rank orderings of the same set of items to obtain a “better” ordering
Example applications:
• Combining voters' rankings: social choice theory
• Information retrieval and meta-search*
* e.g., Lebanon & Mao (2008); Klementiev, Roth et al. (2008, 2009); Dwork et al. (2001)
Example ranking problem in our research
What is the correct chronological order?
[Figure: five presidents shown in two different orders along a time axis]
Ulysses S. Grant
James Garfield
Rutherford B. Hayes
Abraham Lincoln
Andrew Johnson

James Garfield
Ulysses S. Grant
Rutherford B. Hayes
Andrew Johnson
Abraham Lincoln
Aggregating ranking data
Individual rankings: D A B C | A B D C | B A D C | A C B D | A D B C
→ Aggregation Algorithm → group answer: A B C D
Does the group answer match the ground truth (A B C D)?
Generative Approach
Latent truth: ? ? ? ?
→ Generative Model → observed rankings: D A B C | A B D C | B A D C | A C B D | A D B C
Wisdom of crowds phenomenon
Aggregating over individuals often leads to an estimate that is among the best individual estimates (or sometimes better)
Galton's ox (1907): the median of individual weight estimates came close to the true answer
Approach
• No communication between individuals
• There is always a true answer (ground truth); the ground truth is used only in evaluation
• Unsupervised weighting of individuals*: exploit the relationship between expertise and consensus; experts tend to be closer to the truth and therefore reach more similar judgments
• Incorporate prior knowledge about the latent truth: discount a priori bad rankings
* Klementiev, Roth et al. (2008, 2009); Dani, Madani, Pennock et al. (2006); Bayesian truth serum (Prelec et al., 2004); Cultural Consensus Theory (Batchelder and Romney, 1986)
Overview of talk
• General knowledge tasks: reconstructing the order of US presidents (Thurstonian models)
• Sports prediction: forecasting NBA and NCAA outcomes (Thurstonian models)
• Episodic memory: reconstructing the order of personally experienced events (Mallows model)
Experiment: 26 individuals order all 44 US presidents
George Washington John Adams Thomas Jefferson James Madison
James Monroe John Quincy Adams Andrew Jackson Martin Van Buren
William Henry Harrison John Tyler James Knox Polk Zachary Taylor
Millard Fillmore Franklin Pierce James Buchanan Abraham Lincoln
Andrew Johnson Ulysses S. Grant Rutherford B. Hayes James Garfield
Chester Arthur Grover Cleveland 1 Benjamin Harrison Grover Cleveland 2
William McKinley Theodore Roosevelt William Howard Taft Woodrow Wilson
Warren Harding Calvin Coolidge Herbert Hoover Franklin D. Roosevelt
Harry S. Truman Dwight Eisenhower John F. Kennedy Lyndon B. Johnson
Richard Nixon Gerald Ford James Carter Ronald Reagan
George H.W. Bush William Clinton George W. Bush Barack Obama
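Measuring performance
Kendall's tau distance: the number of adjacent pairwise swaps needed to turn one ordering into the other
Ordering by individual: A B E C D
True order: A B C D E
A B E C D → A B C E D (1 swap) → A B C D E (1 + 1 = 2 swaps)
Kendall's tau distance = 2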
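The swap count can be checked with a short Python sketch. It relies on the standard fact that the minimum number of adjacent swaps equals the number of item pairs that appear in opposite relative order; the function name `kendall_tau_distance` is chosen here for illustration and is not from the talk.

```python
from itertools import combinations

def kendall_tau_distance(ordering, truth):
    """Count item pairs whose relative order differs between the two orderings."""
    position = {item: idx for idx, item in enumerate(ordering)}
    discordant = 0
    for a, b in combinations(truth, 2):  # a comes before b in the true order
        if position[a] > position[b]:    # but after b in the individual's ordering
            discordant += 1
    return discordant

# The example from the slide: individual ordering vs. true order.
print(kendall_tau_distance("ABECD", "ABCDE"))  # 2
```

The pair-counting form is quadratic in the number of items, which is ample for lists of 10 to 44 items.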
Empirical Results
[Figure: performance of each individual; x-axis: individuals (ordered from best to worst); reference level: random guessing]
Unsupervised models for ranking data
Many models were developed for preference rankings and voting situations, where there is no known ground truth
• Classic models: Thurstone (1927); Mallows (1957); Fligner and Verducci (1986); Diaconis (1989)
• Voting methods: e.g., Borda count (1770)
We will focus on Thurstonian and Mallows models, implemented as graphical models with MCMC inference
Thurstonian Model
Example items:
A. George Washington
B. James Madison
C. Andrew Jackson
• Each item has a true coordinate on some dimension
• ... but there is noise because of encoding errors
• Each person's mental encoding is based on a single sample from each distribution
• The observed ordering is based on the ordering of the samples (e.g., one set of samples gives A < C < B, another gives A < B < C)
• Important assumption: across individuals, the variance can vary but not the means
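The sampling process can be sketched in Python as follows. This is a minimal illustration that assumes Gaussian noise. The item coordinates and the two noise levels are hypothetical values chosen for the example, not estimates from the experiment.

```python
import random

def sample_ordering(means, sigma, rng):
    """Draw one noisy mental sample per item and return the items ordered by their samples."""
    samples = {item: rng.gauss(mu, sigma) for item, mu in means.items()}
    return sorted(samples, key=samples.get)

# Hypothetical true coordinates for the three example items.
means = {"A": 1.0, "B": 4.0, "C": 7.0}
rng = random.Random(0)

# Same means for everyone; only the noise level differs between individuals.
for label, sigma in [("low noise", 0.5), ("high noise", 5.0)]:
    orderings = ["".join(sample_ordering(means, sigma, rng)) for _ in range(5)]
    print(label, orderings)
```

With a small noise level the sampled orderings mostly agree with the true order. With a large noise level they scatter, which is the sense in which the noise parameter plays the role of (inverse) expertise.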
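Graphical Model of Extended Thurstonian Model
Plate over j individuals
• Latent truth: $\mu$
• Expertise of individual: $\sigma_j \sim \mathrm{Gamma}(\lambda_0, 1/\lambda_0)$
• Mental samples: $x_{ij} \mid \mu_i, \sigma_j \sim \mathcal{N}(\mu_i, \sigma_j)$
• Observed ordering: $y_j = \mathrm{rank}(x_j)$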
Inferred Distributions for 44 US Presidents
George Washington (1); John Adams (2)
Thomas Jefferson (3); James Madison (4); James Monroe (6)
John Quincy Adams (5); Andrew Jackson (7)
Martin Van Buren (8); William Henry Harrison (21)
John Tyler (10); James Knox Polk (18)
Zachary Taylor (16); Millard Fillmore (11); Franklin Pierce (19)
James Buchanan (13); Abraham Lincoln (9)
Andrew Johnson (12); Ulysses S. Grant (17)
Rutherford B. Hayes (20); James Garfield (22); Chester Arthur (15)
Grover Cleveland 1 (23); Benjamin Harrison (14)
Grover Cleveland 2 (25); William McKinley (24)
Theodore Roosevelt (29); William Howard Taft (27)
Woodrow Wilson (30); Warren Harding (26); Calvin Coolidge (28); Herbert Hoover (31)
Franklin D. Roosevelt (32); Harry S. Truman (33)
Dwight Eisenhower (34); John F. Kennedy (37)
Lyndon B. Johnson (36); Richard Nixon (39)
Gerald Ford (35); James Carter (38)
Ronald Reagan (40); George H.W. Bush (41)
William Clinton (42); George W. Bush (43)
Barack Obama (44)
error bars = median and minimum sigma
Calibration of individuals
[Figure: scatter plot, one point per individual; x-axis: inferred noise level for each individual; y-axis: distance to ground truth; R = 0.941]
Wisdom of crowds effect
[Figure: performance of the models compared with the individuals; x-axis: individuals. Legend: Thurstonian Model, Perturbation, Individuals]
Heuristic Models
Many heuristic methods from voting theory, e.g., the Borda count method
Suppose we have 10 items:
• assign a count of 10 to the first item, 9 to the second item, etc.
• add the counts over individuals
• order the items by the Borda count
i.e., rank by average rank across people
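The Borda count steps can be sketched in Python. The three rankings below are hypothetical, and breaking ties alphabetically is an assumption added for determinism; the slide does not specify a tie rule.

```python
from collections import defaultdict

def borda_order(rankings):
    """Aggregate rankings: first place gets n counts, second n-1, ..., last gets 1."""
    n_items = len(rankings[0])
    counts = defaultdict(int)
    for ranking in rankings:
        for position, item in enumerate(ranking):
            counts[item] += n_items - position
    # Highest total count first; ties broken alphabetically (an assumption).
    return sorted(counts, key=lambda item: (-counts[item], item))

# Hypothetical rankings of four items by three individuals.
rankings = ["ABCD", "BACD", "ACBD"]
print(borda_order(rankings))  # ['A', 'B', 'C', 'D']
```

Because every individual contributes the same set of counts, the method treats all individuals as equally reliable, in contrast to models that infer expertise.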
Model Comparison
[Figure: performance of the models compared with the individuals; x-axis: individuals. Legend: Thurstonian Model, Perturbation, Borda count, Individuals]
Other ordering tasks
Ten Amendments
Freedom of speech & religion (1)
Right to bear arms (2)
No quartering of soldiers (4)
No unreasonable searches (3)
Due process (5)
Trial by Jury (6)
Civil Trial by Jury (7)
No cruel punishment (8)
Right to non-specified rights (10)
Power for the States & People (9)

Ten Commandments
Worship any other God (1)
Make a graven image (7)
Take the Lord's name in vain (2)
Break the Sabbath (3)
Dishonor your parents (4)
Murder (6)
Commit adultery (8)
Steal (5)
Bear false witness (9)
Covet (10)
Overview of talk
• General knowledge tasks: reconstructing the order of US presidents
• Sports prediction: forecasting NBA and NCAA outcomes
• Episodic memory: reconstructing the order of personally experienced events
• New directions
Human forecasting experiment
• Forecast end-of-season rankings for 15 NBA teams (Eastern conference, Western conference)
• Participants were college undergraduates: a heterogeneous population regarding basketball expertise; 172 individuals for the Eastern conference, 156 individuals for the Western conference
• Experiment conducted in February 2010, when teams had played a bit over half of the games in the regular season
Model predictions for Eastern conference
Borda: 1. Boston, 2. Cleveland, 3. Orlando, 4. Miami, 5. Detroit, 6. Chicago, 7. Philadelphia, 8. Atlanta, 9. New York, 10. New Jersey, 11. Indiana, 12. Washington, 13. Toronto, 14. Charlotte, 15. Milwaukee
Thurstonian Model: Cleveland, Boston, Orlando, Miami, Atlanta, Chicago, Detroit, Charlotte, Toronto, Philadelphia, Washington, Indiana, New York, Milwaukee, New Jersey
Actual outcome: 1. Cleveland, 2. Orlando, 3. Atlanta, 4. Boston, 5. Miami, 6. Milwaukee, 7. Charlotte, 8. Chicago, 9. Toronto, 10. Indiana, 11. New York, 12. Detroit, 13. Philadelphia, 14. Washington, 15. New Jersey
[Figure: two panels, East and West; x-axis: individuals (0 to 160). Legend: Thurstonian model with expertise prior, Thurstonian model, Borda count, Individuals. Annotations: East 73%, 93%; West 87%, 94%]
Calibration Results
[Figure: two scatter plots, one per conference. East: R = 0.818; West: R = 0.762]
Heuristics: who will win more games?
Chicago Bulls vs. Charlotte Bobcats
Chicago Bulls: won 6 championships; team in existence for 44 years
Charlotte Bobcats: won 0 championships; team in existence for 6 years
Related to work on “fast and frugal heuristics” by Gigerenzer et al.
Heuristic ranking by #championships won
#championships: 1. Boston, 2. Chicago, 3. Philadelphia, 4. Detroit, 5. Indiana, 6. New York, 7. New Jersey, 8. Atlanta, 9. Washington, 10. Milwaukee, 11. Miami, 12. Orlando, 13. Cleveland, 14. Toronto, 15. Charlotte
Actual outcome: 1. Cleveland, 2. Orlando, 3. Atlanta, 4. Boston, 5. Miami, 6. Milwaukee, 7. Charlotte, 8. Chicago, 9. Toronto, 10. Indiana, 11. New York, 12. Detroit, 13. Philadelphia, 14. Washington, 15. New Jersey
Informative Priors on Expertise
Individuals who closely follow heuristic orderings are probably not experts
Set hyperparameters of variance prior based on distance to heuristic ordering
[Figure: prior for an individual who closely follows the heuristic ordering]
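Graphical Model
Plate over j individuals
• $\mu$
• $\sigma_j \sim \mathrm{Gamma}(\lambda_j)$, with an individual-specific hyperparameter $\lambda_j$
• $x_{ij} \mid \mu_i, \sigma_j \sim \mathcal{N}(\mu_i, \sigma_j)$
• $y_j = \mathrm{rank}(x_j)$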
[Figure: two panels, East and West; x-axis: individuals (0 to 160). Legend: Thurstonian model with expertise prior, Thurstonian model, Borda count, Individuals. Annotations: East 96%, 73%, 93%; West 96%, 87%, 94%]
Forecasting NCAA tournament (March Madness)
64 US college basketball teams are placed in a set of four seeded brackets, and play an elimination tournament.
[Figure: Midwest bracket]
Data
• Predictions from 16,718 Yahoo users
• Each individual predicts the winner of all games
• We use the predictions for the first four rounds (60 games total)
• Two scoring systems:
  1. Number of correct predictions
  2. Points: 1 point per correct winner in the 1st round, 2 points in the 2nd, 4 points in the 3rd, 8 points in the 4th
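The two scoring systems can be sketched in Python. The data layout (a dict keyed by round and game index) and the team names are hypothetical choices for illustration; only the point values per round come from the slide.

```python
# Points per correct winner, by round (from the slide).
ROUND_POINTS = {1: 1, 2: 2, 3: 4, 4: 8}

def score_bracket(predictions, outcomes):
    """Return (number of correct predictions, points) for one individual.

    Both arguments map (round_number, game_index) -> winning team.
    """
    n_correct = 0
    points = 0
    for (round_no, game_idx), winner in outcomes.items():
        if predictions.get((round_no, game_idx)) == winner:
            n_correct += 1
            points += ROUND_POINTS[round_no]
    return n_correct, points

# Hypothetical mini-example: two first-round games and one second-round game.
outcomes = {(1, 0): "Team A", (1, 1): "Team D", (2, 0): "Team A"}
predictions = {(1, 0): "Team A", (1, 1): "Team C", (2, 0): "Team A"}
print(score_bracket(predictions, outcomes))  # (2, 3)
```

The points system rewards correct predictions in later rounds more heavily, so two individuals with the same number of correct predictions can receive different point totals.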
Data and Results of Heuristic Strategies
[Figure: two panels; x-axis: individuals (0 to 18,000); y-axes: #correct predictions and points. Annotations in slide order: Obama 47%, majority rule 71%, prior seeding 66%, prior seeding 61%, majority rule 73%, Obama 83%]
Thurstonian Model
Example teams: Team A, Team B, Team C
• Each team has a mean on a single “strength” dimension
• Each person has a single variance
• The probability that a person will choose team A over team B is the probability that their sampled strength for team A is above their sampled strength for team B
• Examples: if the sample for B is above the sample for A, B wins over A; if the sample for C is above the sample for B, C wins over B
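The choice probability has a closed form under one additional assumption: that the two strength samples are independent Gaussians sharing the person's standard deviation sigma. Their difference is then Gaussian with standard deviation sigma times the square root of 2. The sketch below uses hypothetical strength values; the function name is illustrative.

```python
import math

def prob_choose(mu_a, mu_b, sigma):
    """P(sample for A > sample for B) for independent N(mu_a, sigma) and N(mu_b, sigma)."""
    z = (mu_a - mu_b) / (sigma * math.sqrt(2.0))
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical strengths: team A is slightly stronger than team B.
for sigma in (0.5, 1.0, 4.0):
    print(sigma, round(prob_choose(1.0, 0.0, sigma), 3))
```

A person with a small sigma picks the stronger team almost deterministically, while a person with a large sigma is close to a coin flip.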
Modeling Results
[Figure: two panels; x-axis: individuals (0 to 18,000); y-axes: #correct predictions and points. Annotations in slide order: majority rule 71%, prior seeding 66%, prior seeding 61%, majority rule 73%, Thurstonian model 83%, Thurstonian model with informative priors 90%, Thurstonian model 78%, Thurstonian model with informative priors 81%]
Overview of talk
• General knowledge tasks: reconstructing the order of US presidents
• Sports prediction: forecasting NBA and NCAA outcomes
• Episodic memory: reconstructing the order of personally experienced events
Recollecting Order from Episodic Memory
Study this sequence of images
How good is your memory? Place the images in the correct sequence (by reading order)
[Figure: ten images labeled A to J, arranged in rows A B C D / E F G H / I J]
Problem
What if we have only a small number of individuals?
How can we guard against individuals with poor memory?
Idea: “smooth” the inferred group ordering with a prior
Approach
Empirically measure the prior orderings over events
Experiment: a separate group of individuals orders the images without seeing original video
Use this data to construct a prior on the group ordering
Mallows Model (memory data)
• Latent truth: $\omega$
• Expertise for person j: $\theta_j$
• Observed ranking for person j: $y_j$
$p(y_j \mid \omega, \theta_j) \propto e^{-\theta_j \, d(y_j, \omega)}$, where $d$ is the Kendall tau distance
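For a handful of items the Mallows distribution can be computed exactly by enumerating all permutations. The sketch below assumes the standard Mallows form, with probability proportional to exp(-theta * d) and d the Kendall tau distance. The value of theta and the four-item truth are hypothetical, and the talk itself uses MCMC inference rather than enumeration.

```python
import math
from itertools import combinations, permutations

def kendall_tau(y, omega):
    """Number of item pairs ordered differently in y and omega."""
    pos = {item: idx for idx, item in enumerate(y)}
    return sum(1 for a, b in combinations(omega, 2) if pos[a] > pos[b])

def mallows_pmf(omega, theta):
    """Exact Mallows probabilities over all orderings of the items in omega."""
    weights = {perm: math.exp(-theta * kendall_tau(perm, omega))
               for perm in permutations(omega)}
    normalizer = sum(weights.values())
    return {perm: w / normalizer for perm, w in weights.items()}

# Hypothetical latent truth over four items and a moderate expertise level.
pmf = mallows_pmf(("A", "B", "C", "D"), theta=1.0)
most_likely = max(pmf, key=pmf.get)
print(most_likely, round(pmf[most_likely], 3))
```

Larger theta concentrates probability on orderings close to the latent truth; theta = 0 gives a uniform distribution over orderings, i.e., a person with no information.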
Mallows Model with an informative prior on the latent truth
• Memory data: latent truth $\omega$; expertise for person j $\theta_j$; observed ranking for person j $y_j$
• Prior knowledge data: prior on orderings $\omega_o$, with $\theta_{oj}$ and $y_{oj}$ for person j
• Additional node: $\theta^*$
Results when picking K worst “witnesses”
[Figure: two panels, Type I and Type II; x-axis: number of “witnesses” (K = 1, 2, 5, 10, 28); y-axis: mean. Legend: Chance, Model 1, Model 2. Labels: uniform prior, informative prior]
Summary
• Combine ordering / ranking data: going beyond numerical estimates or multiple choice questions
• Incorporate individual differences: assume some individuals might be “experts”; going beyond models that treat every vote equally
• Incorporate prior knowledge: downweight individuals with “wrong” prior knowledge; correct judgments towards natural prior orderings
Influence of communication
Many researchers argue that the best aggregation is achieved with complete independence between individuals
But does sharing of information always lead to worse aggregates?
Iterated Learning Experiment: each individual refines the previous ordering
Abraham Lincoln
Andrew Johnson
James Garfield
Ulysses S. Grant
R. B. Hayes
Andrew Johnson
Abraham Lincoln
individual 1
Related to work by Griffiths and colleagues on iterated learning
Abraham Lincoln
James Garfield
Ulysses S. Grant
R. B. Hayes
Andrew Johnson
individual 2
Andrew Johnson
James Garfield
R. B. Hayes
Andrew Johnson
Abraham Lincoln
individual 3
Influence of information sharing
Comparing independent judgments and an iterated learning task
[Figure: Borda count averaged across problems and chains; x-axis: number of individuals (0 to 70). Lines: iterated, independent]
Predicting problem difficulty
[Figure: scatter plot of 17 numbered problems; x-axis: std (dispersion of expertise); y-axis: distance of inferred truth to actual truth; R = -0.752. Labeled examples: ordering states geographically, city size rankings]
Effect of Group Size
[Figure: x-axis: group size (0 to 80). Lines: T=0, T=2, T=12]
Notes
The Bradley-Terry model is another model for paired comparisons
To do
• Look up hyperparameter parametrization for Matlab and other languages
• What are natural priors for the standard deviation? Inverse gamma?
• Look up Babington model in Marden (1997)
• Look up un. of new mexico lady
• Look up recent research by Pennock and Klementiev
• Look up Hal Stern
Average results across 6 problems
[Figure: y-axis: mean; x-axis: individuals. Legend: Thurstonian Model, Perturbation Model, Borda count, Individuals]
Find the shortest route between cities
[Figure: problem B30-21, a map of 30 numbered cities]
[Figure: four tours for problem B30-21: Individual 5, Individual 83, Individual 60, Optimal]
Dataset: Vickers, Bovet, Lee, & Hughes (2003)
• 83 participants
• 7 problems of 30 cities
TSP Aggregation Problem
• Data consists of city order only
• No access to city locations
Heuristic Approach
Idea: find tours with edges on which many individuals agree
• Calculate the agreement matrix A
  A is an n × n matrix, where n is the number of cities
  $a_{ij}$ indicates the number of participants that connect cities i and j
  use a non-linear transform function f() to emphasize high-agreement edges
• Find the tour that maximizes $\sum_{(i,j) \in \text{tour}} f(a_{ij})$
(this itself is a non-Euclidean TSP problem)
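The agreement-matrix heuristic can be sketched in Python for a toy problem. Several choices here are assumptions for illustration: tours are treated as closed (the last city connects back to the first), f is taken to be squaring, and the maximizing tour is found by brute force, which is only feasible for a handful of cities. The talk does not say which f or which search method was used.

```python
from itertools import permutations

def agreement_matrix(tours, n):
    """a[i][j] = number of participants whose tour connects cities i and j."""
    a = [[0] * n for _ in range(n)]
    for tour in tours:
        for k in range(len(tour)):
            i, j = tour[k], tour[(k + 1) % len(tour)]  # closed tour (assumption)
            a[i][j] += 1
            a[j][i] += 1
    return a

def tour_score(tour, a, f):
    """Sum of transformed agreement over the edges of a closed tour."""
    return sum(f(a[tour[k]][tour[(k + 1) % len(tour)]]) for k in range(len(tour)))

def best_tour(a, f):
    """Brute-force search over tours starting at city 0 (toy sizes only)."""
    n = len(a)
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    return max(candidates, key=lambda t: tour_score(t, a, f))

# Hypothetical data: three participants, four cities.
tours = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 2, 1, 3]]
a = agreement_matrix(tours, 4)
f = lambda x: x ** 2  # hypothetical non-linear transform
print(best_tour(a, f), tour_score(best_tour(a, f), a, f))  # (0, 1, 2, 3) 26
```

Only the city orders enter the computation, matching the constraint that the aggregation has no access to city locations.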
[Figure: problem B30-21 with 30 numbered cities; line thickness = agreement; blue = aggregate tour]
Results averaged across 7 problems
[Figure: y-axis: percent over optimal (0 to 18); includes the aggregate]
Average results over 17 Problems
[Figure: y-axis: mean; x-axis: individuals (1 to 80). Legend: Thurstonian Model, Perturbation Model, Borda count, Individuals]
Strong wisdom of crowds effect across problems
Results when randomly selecting individuals
[Figure: two panels, Type I and Type II; x-axis: group size (K = 1, 2, 5, 10, 28); y-axis: mean. Legend: Chance, Model 1, Model 2. Labels: uniform prior, informative prior]
Experiment 2
• 78 participants
• 17 problems, each with 10 items
• Problem types: chronological events, physical measures, and purely ordinal problems (e.g., Ten Amendments, Ten Commandments)
Ordering states west-east
Oregon (1)
Utah (2)
Nebraska (3)
Iowa (4)
Alabama (6)
Ohio (5)
Virginia (7)
Delaware (8)
Connecticut (9)
Maine (10)
[Figure: scatter plot; R = 0.961]