When Efficient Model Averaging Out-Performs Bagging and Boosting

Transcript of "When Efficient Model Averaging Out-Performs Bagging and Boosting"
When Efficient Model Averaging Out-Performs Bagging and Boosting
Ian Davidson, SUNY Albany
Wei Fan, IBM T.J.Watson
Ensemble Techniques
• Techniques such as boosting and bagging are methods of combining models.
• Used extensively in ML and DM; they seem to work well in a wide variety of situations.
• But model averaging is the “correct” Bayesian method of using multiple models.
• Does model averaging have a place in ML and DM?
What is Model Averaging?
• Averaging of class probabilities, weighted by each model's posterior probability.
• Integration over the model space removes model uncertainty by averaging.
• Prohibitive for large model spaces such as decision trees.
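The posterior-weighted integration described above is the standard Bayesian model averaging rule; a sketch in generic notation (the symbols are standard, not copied from the slide's own equation):

```latex
P(y \mid x, D) \;=\; \sum_{M \in \mathcal{M}} P(y \mid x, M)\, P(M \mid D)
```

The sum runs over every model $M$ in the model space $\mathcal{M}$, which is why exact averaging is prohibitive for spaces as large as all decision trees.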
Efficient Model Averaging: PBMA and Random DT
• PBMA (Davidson 04): parametric bootstrap model averaging
  – Use a parametric model to generate multiple bootstraps computed from a single training set.
• Random Decision Tree (Fan et al 03)
  – Construct each tree's structure randomly:
    • A categorical feature is used at most once in a decision path.
    • Random thresholds for continuous features.
  – Leaf node statistics are estimated from the data.
  – Average the class probabilities of multiple trees.
Our Empirical Study
• Idea: when there is model uncertainty, model averaging should perform well.
• Four specific but common situations where factoring in model uncertainty is beneficial:
  – Class label noise
  – Many-class problems
  – Sample selection bias
  – Small data sets
Class Label Noise
• Randomly flip 10% of labels
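The 10% label-noise condition can be simulated with a short helper. This is a sketch of one common flipping scheme (`flip_labels` is my name, not the paper's): choose a random 10% of examples and reassign each to a different class.

```python
import random

def flip_labels(y, classes, frac=0.10, seed=0):
    """Simulate class label noise: pick `frac` of the examples at
    random and reassign each to a different class, chosen uniformly."""
    rng = random.Random(seed)
    noisy = list(y)
    for i in rng.sample(range(len(noisy)), int(frac * len(noisy))):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy

y = ["red"] * 50 + ["green"] * 50
noisy = flip_labels(y, ("red", "green"))
changed = sum(a != b for a, b in zip(y, noisy))  # exactly 10 labels flipped
```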
Data Set with Many Classes
Biased Training Sets
• See ICDM 2005 for a formal analysis.
• See KDD 2006 for estimating accuracy under bias.
• See ICDM 2006 for a case study.
Universe of Examples
Two classes: red and green.
red: f2 > f1; green: f2 <= f1
Unbiased and Biased Samples
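The universe and a biased sample can be reproduced in a few lines. The labeling rule is taken from the previous slide; the bias mechanism below (inclusion probability depending only on the features, P(s=1 | x)) is a generic illustration, since the transcript does not give the paper's exact scheme.

```python
import random

def make_universe(n, seed=1):
    """Points drawn uniformly from [0, 1]^2, labeled by the rule on
    the previous slide: red if f2 > f1, else green."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    return [((f1, f2), "red" if f2 > f1 else "green") for f1, f2 in pts]

def biased_sample(universe, keep_prob, seed=2):
    """Sample-selection bias: each example is kept with a probability
    that depends only on its features, not its label."""
    rng = random.Random(seed)
    return [(x, y) for x, y in universe if rng.random() < keep_prob(x)]

universe = make_universe(2000)
# Hypothetical bias: examples with small f1 are heavily over-represented.
biased = biased_sample(universe, lambda x: 0.9 if x[0] < 0.5 else 0.1)
```

Because the selection depends on f1, the biased sample's feature distribution differs from the universe's even though the labeling rule is unchanged, which is the setting the following accuracy comparisons probe.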
Single Decision Tree
Unbiased 97.1% Biased 92.1%
Random Decision Tree
Unbiased 96.9% Biased 95.9%
Bagging
Unbiased 97.82% Biased 93.52%
PBMA
Unbiased 99.08% Biased 94.55%
Boosting
Unbiased 96.405% Biased 92.7%
Scope of This Paper
• Identifies conditions where model averaging should outperform bagging and boosting.
• Empirically verifies these claims.
• Other questions:
  – Why do bagging and boosting perform badly in these conditions?