On Sample Selection Bias and Its Efficient Correction via
Model Averaging and Unlabeled Examples
Wei Fan
Ian Davidson
A Toy Example
Two classes: red and green.
red: f2 > f1; green: f2 <= f1
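The toy rule above can be sketched directly; the generator and the point count here are our own illustration, not from the talk:

```python
import random

def label(f1, f2):
    """Toy-example rule from the slide: red if f2 > f1, green otherwise."""
    return "red" if f2 > f1 else "green"

random.seed(0)
# Draw points uniformly from the unit square and label them by the rule.
data = [(f1, f2, label(f1, f2))
        for f1, f2 in ((random.random(), random.random()) for _ in range(1000))]

# Every green point lies on or below the diagonal f2 = f1.
greens = [d for d in data if d[2] == "green"]
```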
Unbiased and Biased Samples
Not-so-biased sampling vs. biased sampling
Effect on Learning
Unbiased 97.1% vs. biased 92.1%; unbiased 96.9% vs. biased 95.9%; unbiased 96.405% vs. biased 92.7%.
Some techniques are more sensitive to bias than others.
One important question: how do we reduce the effect of sample selection bias?
Ubiquitous
Loan approval, drug screening, weather forecasting, ad campaigns, fraud detection, user profiling, biomedical informatics, intrusion detection, insurance, etc.
1. Normally, banks only have data on their own customers.
2. "Late payment, default" models are computed using their own data.
3. New customers may not completely follow the same distribution.
4. Is the New Century "sub-prime mortgage" bankruptcy due to bad modeling?
Bias as Distribution
Think of "sampling an example (x,y) into the training data" as an event denoted by a random variable s:
s=1: example (x,y) is sampled into the training data
s=0: example (x,y) is not sampled
Think of bias as a conditional probability of "s=1" dependent on x and y:
P(s=1|x,y): the probability for (x,y) to be sampled into the training data, conditional on the example's feature vector x and class label y.
Categorization
From Zadrozny '04:
No sample selection bias: P(s=1|x,y) = P(s=1)
Feature bias: P(s=1|x,y) = P(s=1|x)
Class bias: P(s=1|x,y) = P(s=1|y)
Complete bias: no further reduction
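The first three categories can be simulated by varying what the selection probability depends on; the specific probabilities (0.5, 0.9, 0.1) and the data generator below are our own illustration:

```python
import random

def sample(data, p_select):
    """Keep each (x, y) with probability P(s=1|x,y) given by p_select(x, y)."""
    return [(x, y) for (x, y) in data if random.random() < p_select(x, y)]

random.seed(1)
data = [((random.random(),), random.choice(["+", "-"])) for _ in range(10000)]

# No bias: P(s=1|x,y) = P(s=1), a constant.
no_bias = sample(data, lambda x, y: 0.5)
# Feature bias: P(s=1|x,y) = P(s=1|x), depends on x only.
feature_bias = sample(data, lambda x, y: 0.9 if x[0] < 0.5 else 0.1)
# Class bias: P(s=1|x,y) = P(s=1|y), depends on y only.
class_bias = sample(data, lambda x, y: 0.9 if y == "+" else 0.1)
```

The class-biased sample heavily over-represents the "+" class, while the feature-biased one over-represents small feature values.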
Bias for a Training Set
How is P(s=1|x,y) computed? Practically, for a given training set D:
P(s=1|x,y) = 1 if (x,y) is sampled into D
P(s=1|x,y) = 0 otherwise
Alternatively, consider all datasets of the same size that could be sampled "exhaustively" from the universe of examples.
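The practical indicator is binary; a minimal sketch, with a made-up two-example D:

```python
def p_selected(example, D):
    """Practical indicator for a concrete training set D:
    P(s=1|x,y) = 1 if (x,y) was sampled into D, else 0."""
    return 1 if example in D else 0

# A tiny hypothetical training set of ((f1, f2), label) pairs.
D = [((0.2, 0.7), "red"), ((0.9, 0.1), "green")]
```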
Are Realistic Datasets Biased?
Most datasets are biased: it is unlikely that each and every feature vector is sampled. For most problems there is at least feature bias:
P(s=1|x,y) = P(s=1|x)
Effect on Learning
Learning algorithms estimate the "true conditional probability":
True probability P(y|x), such as P(fraud|x)
Estimated probability P(y|x,M), where M is the model built
Conditional probability in the biased data: P(y|x,s=1)
Key issue: is P(y|x,s=1) = P(y|x), at least for the sampled examples?
Appropriate Assumptions
There are more "good" training examples under "feature bias" than under either "class bias" or "complete bias".
"good": P(y|x,s=1) = P(y|x)
Beware: it is incorrect to conclude that P(y|x,s=1) = P(y|x) except under some restricted situations that rarely happen.
For class bias and complete bias, it is hard to derive anything.
It is hard to make more detailed claims without knowing more about both the sampling process and the true function.
Categorizing a dataset into the exact bias type is difficult: you don't know what you don't know.
This is not that bad, since the key issue is the number of examples with a "bad" conditional probability: small vs. large.
"Small" Solutions
Posterior weighting: class probability integration over model space.
Average the estimated class probabilities, weighted by posterior.
Removes model uncertainty by averaging.
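A minimal sketch of posterior-weighted averaging of class probabilities; the two base models and the uniform weights are hypothetical, not from the paper:

```python
def model_average(models, weights, x):
    """Average the estimated class probabilities P(y|x, M_k) of several
    models, weighted by each model's (approximate) posterior weight."""
    total = sum(weights)
    classes = models[0](x).keys()
    return {c: sum(w * m(x)[c] for m, w in zip(models, weights)) / total
            for c in classes}

# Two hypothetical base models returning P(y|x) for some fixed x:
m1 = lambda x: {"+": 0.8, "-": 0.2}
m2 = lambda x: {"+": 0.4, "-": 0.6}

avg = model_average([m1, m2], weights=[1.0, 1.0], x=None)  # uniform posterior
```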
We prove that the expected error of model averaging is less than that of any of the single models it combines.
What this says: compute many models in different ways; don't hang on one tree.
"Large" Solutions
When too many base models' estimates are off track, the power of model averaging is limited.
In this case, we need to smartly use "unlabeled examples" that are unbiased.
Reasonable assumption: unlabeled examples are usually plentiful and easier to get.
How to Use Them
Estimate the "joint probability" P(x,y) instead of just the conditional probability, i.e., P(x,y) = P(y|x)P(x).
This makes no difference with one model, but it does with multiple models.
Examples of How This Works
P1(+|x) = 0.8 and P2(+|x) = 0.4
P1(-|x) = 0.2 and P2(-|x) = 0.6
With model averaging:
P(+|x) = (0.8 + 0.4) / 2 = 0.6
P(-|x) = (0.2 + 0.6) / 2 = 0.4
The prediction will be +.
But suppose there are two P(x) models giving probabilities 0.05 and 0.4 for this x. Then:
P(+,x) = 0.05 * 0.8 + 0.4 * 0.4 = 0.2
P(-,x) = 0.05 * 0.2 + 0.4 * 0.6 = 0.25
Recall that with model averaging P(+|x) = 0.6 and P(-|x) = 0.4, so the prediction is +.
But now the prediction will be - instead of +.
Key idea: unlabeled examples can be used as "weights" to re-weight the models.
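The worked example above, reproduced with the slide's numbers (the variable names are ours):

```python
# Two conditional models and two hypothetical P(x) estimates for the same x.
p1 = {"+": 0.8, "-": 0.2}   # P1(y|x)
p2 = {"+": 0.4, "-": 0.6}   # P2(y|x)
px = [0.05, 0.4]            # per-model P(x) weights

# Plain averaging of the conditionals:
avg = {y: (p1[y] + p2[y]) / 2 for y in p1}

# Joint-probability averaging re-weights each model by its P(x):
joint = {y: px[0] * p1[y] + px[1] * p2[y] for y in p1}

pred_avg = max(avg, key=avg.get)        # "+"
pred_joint = max(joint, key=joint.get)  # "-"
```

The re-weighting flips the prediction: P(+,x) = 0.2 vs. P(-,x) = 0.25.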
Improve P(y|x)
Use a semi-supervised discriminant learning procedure (Vittaut et al., 2002). Basic procedure:
1. Use the learned models to predict the unlabeled examples.
2. Combine a random sample of the "predicted" unlabeled examples with the labeled training data.
3. Re-train the model.
4. Repeat until the predictions on the unlabeled examples remain stable.
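The loop above can be sketched as follows; the function names, the toy 1-nearest-neighbor trainer, and all parameters are our own stand-ins, not the procedure of Vittaut et al.:

```python
import random

def train_1nn(data):
    """Toy stand-in trainer: 1-nearest-neighbor on a single feature."""
    def predict(x):
        return min(data, key=lambda xy: abs(xy[0] - x))[1]
    return predict

def self_train(train_fn, labeled, unlabeled, rounds=10, frac=0.5, seed=0):
    """Predict the unlabeled examples, fold a random sample of the
    predictions back into the training data, retrain, and stop once
    the predictions on the unlabeled examples are stable."""
    rng = random.Random(seed)
    model = train_fn(labeled)
    prev = None
    for _ in range(rounds):
        preds = [(x, model(x)) for x in unlabeled]
        if preds == prev:            # predictions unchanged: stop
            break
        prev = preds
        extra = rng.sample(preds, int(frac * len(preds)))
        model = train_fn(labeled + extra)
    return model

model = self_train(train_1nn, [(0.0, "a"), (1.0, "b")], [0.1, 0.2, 0.9])
```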
Experiments
Feature bias generation: sort the data according to feature values and "chop off" the top.
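A minimal sketch of that sort-and-chop generation; the keep fraction and the toy data are our own choices:

```python
def feature_bias(data, feature_idx, keep_frac):
    """Simulate feature bias: sort examples by one feature's value
    and "chop off" the top portion, keeping the rest."""
    ranked = sorted(data, key=lambda xy: xy[0][feature_idx])
    return ranked[: int(keep_frac * len(ranked))]

data = [((v,), "+") for v in [5, 1, 9, 3, 7, 2, 8, 4]]
# Keep the half with the smallest feature values.
biased = feature_bias(data, feature_idx=0, keep_frac=0.5)
```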
Class Bias Generation
Randomly generate a prior class probability distribution P(y): just the numbers, such as P(+) = 0.1 and P(-) = 0.9.
Sample without replacement from per-class "class bins".
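A sketch of the class-bin sampling, using the slide's example prior P(+) = 0.1, P(-) = 0.9 (the fixed seed and dataset are ours):

```python
import random

def class_bias(data, prior, n, seed=0):
    """Simulate class bias: put examples into per-class "bins", then
    sample without replacement from each bin to match a chosen prior P(y)."""
    rng = random.Random(seed)
    bins = {}
    for x, y in data:
        bins.setdefault(y, []).append((x, y))
    biased = []
    for y, p in prior.items():
        biased += rng.sample(bins[y], int(p * n))
    return biased

data = [((i,), "+") for i in range(50)] + [((i,), "-") for i in range(50)]
biased = class_bias(data, prior={"+": 0.1, "-": 0.9}, n=40)
# 4 positives and 36 negatives
```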
Complete Bias Generation
Recall: the probability of sampling an example (x,y) depends on both x and y.
Easiest simulation: sample (x,y) without replacement from the training data.
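That easiest simulation is a one-liner; the toy data and seed here are our own:

```python
import random

def complete_bias(data, n, seed=0):
    """Easiest simulation of complete bias per the slide: draw n examples
    (x, y) without replacement from the training data."""
    return random.Random(seed).sample(data, n)

data = [((i,), "+" if i % 2 else "-") for i in range(100)]
biased = complete_bias(data, n=30)
```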
Feature Bias Results
[Figure: accuracy (0-100%) under feature bias for methods Single, CA, sCA, JA, sJA, JAx, sJAx on datasets Adult, SJ, SS, Pendig, ArtiChar, CAR4.]
Datasets
Adult: 2 classes; SJ: 3 classes; SS: 3 classes; Pendig: 10 classes; ArtiChar: 10 classes; Query: 4 classes; Donation: 2 classes, cost-sensitive; Credit Card: 2 classes, cost-sensitive.
Winners and Losers
A single model *never wins*.
Under feature bias, the winners are:
model averaging *with or without* conditional probabilities improved using unlabeled examples;
joint probability averaging with *uncorrelated* P(y|x) and P(x) models (details in the paper).
Under class bias, the winner is: joint probability averaging with *correlated* P(y|x) and *improved* P(x) models.
Under complete bias: model averaging with improved P(y|x).
Summary
According to our definition, sample selection bias is ubiquitous.
Categorizing sample selection bias into 4 types is useful for analysis, but hard to use in practice.
In practice, the key question is the relative number of examples with inaccurate P(y|x):
Small: use model averaging of the conditional probabilities of several models.
Medium: use model averaging of improved conditional probabilities.
Large: use joint probability averaging of uncorrelated conditional probability and feature probability models.
When the number is small, we prove in the paper that the expected error of model averaging is less than that of any of the single models it combines.
What this says: compute models in different ways; don't hang yourself on one tree.