Get Another Label? Using Multiple, Noisy Labelers
Joint work with Victor Sheng and Foster Provost
Panos Ipeirotis
Stern School of Business, New York University
2
Motivation
Many tasks rely on high-quality labels for objects:
– relevance judgments
– duplicate database records
– image recognition
– song categorization
– videos
Labeling can be relatively inexpensive, using Mechanical Turk, ESP game …
ESP Game (by Luis von Ahn)
3
Mechanical Turk Example
“Are these two documents about the same topic?”
4
Mechanical Turk Example
5
6
Motivation
Labels can be used in training predictive models:
– Duplicate detection systems
– Image recognition
– Web search
But: labels obtained from the above sources are noisy, and this directly affects the quality of the learned models
– How can we know the quality of the annotators?
– How can we know the correct answer?
– How can we best use the noisy annotators?
7
Quality and Classification Performance
[Figure: classification accuracy (40 to 100) vs. number of training examples (1 to 300, Mushroom dataset), one learning curve per labeling quality Q = 0.5, 0.6, 0.8, 1.0.]
As labeling quality increases, classification quality increases.
8
How to Improve Labeling Quality
Find better labelers
– Often expensive, or beyond our control
Use multiple, noisy labelers: repeated-labeling
– Our focus
9
Multiple labelers and resulting label quality
Multiple labelers and classification quality
Selective label acquisition
Our Focus: Labeling using Multiple Noisy Labelers
10
Majority Voting and Label Quality
Ask multiple labelers, keep the majority label as the “true” label
Quality is the probability of the majority label being correct
P is the probability of an individual labeler being correct
[Figure: integrated quality (0.2 to 1.0) vs. number of labelers (1 to 13), one curve per individual labeler accuracy P = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.]
So…
(Sometimes) the quality from multiple noisy labelers is better than the quality of the best labeler in the set
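To make the curve above concrete, here is a minimal Python sketch (mine, not from the talk) that computes this integrated quality: the probability that the majority vote of n independent labelers, each correct with probability p, matches the true label.

```python
from math import comb

def majority_quality(p: float, n: int) -> float:
    """Probability that the majority vote of n independent labelers,
    each correct with probability p, equals the true (binary) label.
    Uses an odd n so ties cannot occur."""
    assert n % 2 == 1, "use an odd number of labelers to avoid ties"
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for p in (0.4, 0.5, 0.6, 0.8, 1.0):
    print(p, [round(majority_quality(p, n), 3) for n in (1, 3, 5, 11, 13)])
```

For p > 0.5 the integrated quality climbs toward 1 as labelers are added, for p = 0.5 it stays at 0.5, and for p < 0.5 it actually degrades, which is the shape of the curves in the plot.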
11
Multiple noisy labelers improve quality
So, should we always get multiple labels?
12
Tradeoffs for Classification
Get more labels → improve label quality → improve classification
Get more examples → improve classification
[Figure: classification accuracy vs. number of examples (Mushroom), one learning curve per labeling quality Q = 0.5, 0.6, 0.8, 1.0 (same plot as before).]
13
Basic Labeling Strategies
Get as many data points as possible, one label each
Repeatedly label everything, the same number of times
14
Repeat-Labeling vs. Single Labeling
P = 0.6 labeling quality, K = 5 labels per example
[Figure: learning curves for repeated labeling vs. single labeling.]
With high noise, repeated labeling is better than single labeling
15
Repeat-Labeling vs. Single Labeling
P = 0.8 labeling quality, K = 5 labels per example
[Figure: learning curves for repeated labeling vs. single labeling.]
With low noise, getting more (single-labeled) examples is better
Estimating Labeler Quality
(Dawid, Skene 1979): “Multiple diagnoses”
– Assume equal qualities
– Estimate “true” labels for the examples
– Estimate qualities of the labelers given the “true” labels
– Repeat until convergence
16
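A minimal sketch of this iteration for binary labels (my own simplification: a single accuracy per labeler rather than a full confusion matrix, and every labeler labels every example; the names are hypothetical, not from Dawid & Skene):

```python
import numpy as np

def estimate_labeler_quality(labels, n_iter=50):
    """EM-style sketch of the Dawid & Skene (1979) idea for binary labels.
    labels: array of shape (n_examples, n_labelers) with values in {0, 1}.
    Returns (posterior Pr{true label = 1} per example, estimated accuracy per labeler)."""
    labels = np.asarray(labels, dtype=float)
    # Start by assuming all labelers are equally good: soft "true" labels = vote fractions.
    true_prob = labels.mean(axis=1)
    for _ in range(n_iter):
        # Estimate each labeler's quality against the current soft "true" labels.
        agree = true_prob[:, None] * labels + (1 - true_prob[:, None]) * (1 - labels)
        accuracy = np.clip(agree.mean(axis=0), 1e-6, 1 - 1e-6)
        prior = np.clip(true_prob.mean(), 1e-6, 1 - 1e-6)
        # Re-estimate the "true" labels, weighting each labeler by its estimated accuracy.
        ll1 = np.log(prior) + (labels * np.log(accuracy) + (1 - labels) * np.log(1 - accuracy)).sum(axis=1)
        ll0 = np.log(1 - prior) + ((1 - labels) * np.log(accuracy) + labels * np.log(1 - accuracy)).sum(axis=1)
        true_prob = 1.0 / (1.0 + np.exp(ll0 - ll1))
    return true_prob, accuracy
```

Starting from the equal-quality assumption (a plain vote), each round re-weights labelers by their estimated accuracy and then re-estimates the “true” labels, until the two estimates stop changing.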
17
Selective Repeated-Labeling
We have seen:
– With noise and enough (noisy) examples, getting multiple labels is better than single-labeling
Can we do better?
Select data points by their uncertainty score when allocating the repeated-labeling resources, e.g. {+,-,+,+,-,+,+} vs. {+,+,+,+}
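The selective loop itself is straightforward. A minimal sketch, assuming a scoring function uncertainty_score (for example the LU, MU, or LMU scores introduced next) and a get_label call that returns one more noisy label; both names are placeholders of mine:

```python
def selective_repeated_labeling(examples, labels, uncertainty_score, get_label, budget):
    """labels[i] is the list of noisy labels collected so far for examples[i].
    Spend `budget` additional labels, one at a time, always on the example
    whose current label set is most uncertain."""
    for _ in range(budget):
        scores = [uncertainty_score(label_set) for label_set in labels]
        i = max(range(len(examples)), key=lambda j: scores[j])  # most uncertain example
        labels[i].append(get_label(examples[i]))                 # ask for one more noisy label
    return labels
```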
18
Natural Candidate: Entropy
Entropy is a natural measure of label uncertainty:
E({+,+,+,+,+,+}) = 0
E({+,-,+,-,+,-}) = 1
Strategy: Get more labels for high-entropy examples
$E(S) = -\frac{|S_{+}|}{|S|}\log_2\frac{|S_{+}|}{|S|} - \frac{|S_{-}|}{|S|}\log_2\frac{|S_{-}|}{|S|}$, where $S_{+}$ and $S_{-}$ are the positive and negative labels in $S$.
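For concreteness, a small sketch of this entropy score (mine, not from the slides); it also exposes the scale-invariance problem discussed two slides later, since (3+, 2-) and (600+, 400-) receive the same score:

```python
from math import log2

def label_entropy(pos: int, neg: int) -> float:
    """Entropy E(S) of an observed label multiset with pos positives and neg negatives."""
    total = pos + neg
    entropy = 0.0
    for count in (pos, neg):
        if count:
            share = count / total
            entropy -= share * log2(share)
    return entropy

print(label_entropy(6, 0))      # 0.0  -- unanimous labels, no uncertainty
print(label_entropy(3, 3))      # 1.0  -- evenly split labels
print(label_entropy(3, 2))      # ~0.971
print(label_entropy(600, 400))  # ~0.971 -- same score despite 200x the evidence
```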
19
What Not to Do: Use Entropy
[Figure: labeling quality vs. number of labels, 0 to 2000 (waveform dataset, p = 0.6), comparing the entropy-based strategy (ENTROPY) against uniform round-robin labeling (UNF).]
The entropy strategy improves quality at first, but hurts in the long run
Why not Entropy
In the presence of noise, entropy will be high even with many labels
Entropy is scale invariant – (3+ , 2-) has same entropy as (600+ , 400-)
20
21
Estimating Label Uncertainty (LU)
Observe +’s and –’s and compute Pr{+|obs} and Pr{-|obs}
Label uncertainty = tail of beta distribution
[Figure: Beta probability density function on [0.0, 1.0] with 0.5 marked; S_LU is the tail of the distribution on the minority side of 0.5.]
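A small sketch of this score (my reading of the slide: with a uniform prior, pos positives and neg negatives give a Beta(pos+1, neg+1) posterior over the probability of “+”, and S_LU is the posterior mass on the losing side of 0.5, i.e., the chance the current majority label is wrong):

```python
from scipy.stats import beta

def label_uncertainty(pos: int, neg: int) -> float:
    """S_LU sketch: Beta(pos+1, neg+1) posterior over Pr{label = +};
    the score is the tail mass on the minority side of 0.5."""
    tail_below = beta(pos + 1, neg + 1).cdf(0.5)
    return min(tail_below, 1.0 - tail_below)

print(label_uncertainty(3, 2))    # high uncertainty: little evidence
print(label_uncertainty(7, 3))    # lower
print(label_uncertainty(14, 6))   # much lower, even though entropy barely changed
```

Unlike entropy, the score shrinks as evidence accumulates, which is exactly the contrast the next three slides illustrate.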
Label Uncertainty
p = 0.7, 5 labelers: observed (3+, 2-), entropy ≈ 0.97
22
Label Uncertainty
p = 0.7, 10 labelers: observed (7+, 3-), entropy ≈ 0.88
23
Label Uncertainty
p = 0.7, 20 labelers: observed (14+, 6-), entropy ≈ 0.88
24
Comparison
25
[Figure: labeling quality vs. number of labels, 0 to 2000 (waveform, p = 0.6); curves for UNF, MU, LU, and LMU, with the Label Uncertainty and Uniform (round robin) curves called out.]
26
Model Uncertainty (MU)
However, labelers are not our only source of labels
A classifier can also give us labels!
Model uncertainty: get more labels for ambiguous/difficult examples
Intuitively: make sure that difficult cases are correct
[Figure: a cluster of “+” examples and a cluster of “-” examples, with “?” marks on the ambiguous examples between them.]
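A minimal sketch of a model-uncertainty score, assuming it is read off a single classifier's predicted class probability (the talk's procedure may average over several models learned from the current training data; sklearn's DecisionTreeClassifier here is just a stand-in learner):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def model_uncertainty(X_train, y_train, X_pool):
    """S_MU sketch: train on the current (integrated) labels and score each pool
    example by how close its predicted probability of '+' is to 0.5.
    Returns scores in [0, 0.5]; 0.5 means maximally ambiguous, 0 means certain."""
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    p_pos = model.predict_proba(X_pool)[:, 1]
    return 0.5 - np.abs(p_pos - 0.5)
```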
27
Label + Model Uncertainty
Label and model uncertainty (LMU): avoid examples where either strategy is certain
$S_{LMU} = \sqrt{S_{LU} \cdot S_{MU}}$
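Combining the two, a one-line sketch (assuming the geometric-mean combination; low uncertainty from either score keeps the combined score low, which matches “avoid examples where either strategy is certain”):

```python
import numpy as np

def label_and_model_uncertainty(s_lu, s_mu):
    """S_LMU sketch: geometric mean of the label- and model-uncertainty scores,
    so an example only ranks high when both strategies find it uncertain."""
    return np.sqrt(np.asarray(s_lu) * np.asarray(s_mu))
```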
Comparison
28
[Figure: labeling quality vs. number of labels, 0 to 2000 (waveform, p = 0.6); curves for UNF, MU, LU, and LMU, with the Label Uncertainty, Label + Model Uncertainty, and Uniform (round robin) curves called out.]
Model Uncertainty alone also improves quality
29
Classification Improvement
[Figure: classification accuracy (60 to 85) vs. number of labels, 0 to 2000 (spambase, p = 0.6), for UNF, MU, LU, and LMU.]
30
Conclusions
Gathering multiple labels from noisy users is a useful strategy
Under high noise, almost always better than single-labeling
Selectively labeling using label and model uncertainty is more effective
31
More Work to Do
Estimating the labeling quality of each labeler
Increased compensation vs. labeler quality
Example-conditional quality issues (some examples more difficult than others)
Multiple “real” labels
Hybrid labeling strategies using “learning-curve gradient”
Other Projects
SQoUT project: Structured Querying over Unstructured Text
http://sqout.stern.nyu.edu
Faceted Interfaces
EconoMining project: The Economic Value of User Generated Content
http://economining.stern.nyu.edu
32
33
SQoUT: Structured Querying over Unstructured Text
Information extraction applications extract structured relations from unstructured text
May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire , is finding itself hard pressed to cope with the crisis…
Date        Disease Name      Location
Jan. 1995   Malaria           Ethiopia
July 1995   Mad Cow Disease   U.K.
Feb. 1995   Pneumonia         U.S.
May 1995    Ebola             Zaire
[Diagram: an Information Extraction System (e.g., NYU’s Proteus) produces the “Disease Outbreaks in The New York Times” table above from the article text.]
34
SQoUT: The Questions
[Diagram: 1. retrieve documents from a database/web/archive → 2. process the documents with the extraction system(s) → 3. extract the output tuples.]
Questions:
1. How do we retrieve the documents?
2. How do we configure the extraction systems?
3. What is the execution time?
4. What is the output quality?
SIGMOD’06, TODS’07, + in progress
EconoMining Project
Show me the Money!
Applications (in increasing order of difficulty)
Buyer feedback and seller pricing power in online marketplaces (ACL 2007)
Product reviews and product sales (KDD 2007)
Importance of reviewers based on economic impact (ICEC 2007)
Hotel ranking based on “bang for the buck” (WebDB 2008)
Political news (MSM, blogs), prediction markets, and news importance
Basic Idea
Opinion mining is an important application of information extraction
Opinions of users are reflected in some economic variable (price, sales)
A natural method for extracting sentiment strength and polarity
Naturally captures the pragmatic meaning within the given context
Captures misspellings as well
[Table: Some Indicative Dollar Values, with Positive and Negative columns; e.g., “good packaging”: -$0.56. Positive? Negative?]
Thanks!
Q & A?