Copyright © SAS Institute Inc. All rights reserved.
Ensemble Models and Partitioning Algorithms in SAS® Enterprise Miner™
Goals
• Introduce ensemble models
• Increase awareness of capabilities in SAS® Enterprise Miner™ supporting ensemble modeling
• Share resources for learning more
SAS Enterprise Miner: Ensemble Models and Partitioning Algorithms
• Ensemble Models
• Decision Trees
• Perturb and Combine
• Bagging
• Boosting
• Gradient Boosting
• Random Forests (SAS 9.4)
• Ensemble Forests
• Stacked Ensembles
“Wisdom of the Crowd”
Experiment: How Many Jelly Beans Are in the Jar?
Ensemble Models
Ensemble Modeling
Introduction
Two or more predictive models combined to create a potentially more accurate model
Works better when model predictions are uncorrelated
“Wisdom of the crowd” – Aristotle (‘Politics’)
The collective wisdom of many is likely more accurate than that of any one individual
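The deck has no code, but the averaging intuition is easy to demonstrate. A minimal Python sketch (all numbers invented; the "models" are just a true value plus independent noise) showing that averaging uncorrelated predictions lowers the expected error:

```python
import random

random.seed(0)
truth = 10.0

# Three hypothetical "models": each predicts the truth plus its own
# independent (uncorrelated) noise.
def model_prediction():
    return truth + random.gauss(0, 2.0)

trials = 10_000
single_err = 0.0
ensemble_err = 0.0
for _ in range(trials):
    preds = [model_prediction() for _ in range(3)]
    single_err += abs(preds[0] - truth)              # one model alone
    ensemble_err += abs(sum(preds) / 3 - truth)      # average of three

# Averaging uncorrelated predictions shrinks the expected error.
print(single_err / trials > ensemble_err / trials)  # True
```

When the models' errors are correlated, the averaging benefit shrinks, which is why the slide stresses uncorrelated predictions.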
Ensemble Modeling: Applications
http://www.nhc.noaa.gov/
Ensemble Modeling: Approaches to Build Models
Different algorithms
• Example: Decision Tree + SVM + Neural Network
One algorithm, different configurations
• Example: Various configurations of Neural Networks
One algorithm, different data samples
• Example: Random Forest, Gradient Boosting
Combine Models
Build Predictive Models
Ensemble Modeling: An Ensemble Model Is a Combination of Multiple Models.
Ensemble Modeling: Approaches to Combine Models
• Averaging or Voting
• Stacking/Blending
• Cluster-based selection
Ensemble Modeling: Approaches to Combine Models
• Averaging or Voting
• Stacking/Blending
• Cluster-based selection
Averaging example: combine a Decision Tree, an SVM, and a Neural Network by averaging their predictions, P = (P1 + P2 + P3) / 3.
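A tiny illustrative Python sketch of the averaging and voting combinations above (the posterior values are made up; this is not SAS Enterprise Miner code):

```python
# Hypothetical posterior probabilities (class 1) from three classifiers.
p_tree, p_svm, p_nnet = 0.8, 0.6, 0.7

# Averaging: the ensemble posterior is the mean of the individual posteriors.
p_avg = (p_tree + p_svm + p_nnet) / 3

# Voting: each model votes for the class its posterior favors.
votes = sum(p >= 0.5 for p in (p_tree, p_svm, p_nnet))
vote_class = 1 if votes >= 2 else 0

print(round(p_avg, 2), vote_class)  # 0.7 1
```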
Ensemble Models
The result of combining models can sometimes lead to a more accurate model.*
* It is important to note that the ensemble model can be more accurate than the individual models only if the individual models disagree with one another.
Trade-Off
Decision Trees
Decision Trees: What Is It?
• Linear separation of data using "if-then-else" logic
• Separation is performed via an exhaustive search of splitting points for each variable.
• Many architectural variations build on this basic approach.
• Users might refer to them as
• CHAID Trees
• CART Trees
• C4.5 Trees
• C5.0 Trees
• Each of these is simply a variation on the tree architecture.
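As an illustrative sketch (plain Python with toy data, not SAS Enterprise Miner), the exhaustive split search for one interval input can look like this:

```python
def gini(labels):
    # Gini impurity for binary labels (0/1).
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(x, y):
    # Exhaustive search: try the midpoint between every pair of adjacent
    # sorted values and keep the split with the lowest weighted impurity.
    pairs = sorted(zip(x, y))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for val, lab in pairs if val < cut]
        right = [lab for val, lab in pairs if val >= cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score

x = [1, 2, 3, 10, 11, 12]   # one interval input (invented)
y = [0, 0, 0, 1, 1, 1]      # binary target
print(best_split(x, y))     # (6.5, 0.0): a perfect "if x < 6.5" rule
```

A full tree applies this search to every input variable at every node and splits on the winner, which is what makes training both exhaustive and fast in practice.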
Decision Tree
Easy to Visualize
Decision Trees: Decision Rules
Node = 9
if Saving Balance < 2500 or MISSING
AND Money Market Balance >= 7000
then
Tree Node Identifier = 9
Number of Observations = 917.56099466
Predicted: INS=1 = 0.05
Predicted: INS=0 = 0.95
Analytics Life Cycle: Decision Trees Can Help in Various Stages.
Formulate Problem
Data Preparation
Data Exploration
Transform & Select
Develop Models
Validate Models
Deploy Model
Evaluate & Monitor Model
Decision Trees: Uses
• Data exploration
• Generating business rules
• Segmentation
• Missing value imputation
• Variable transformation and variable selection
• Predictive models
• Comparison Model
• Test decision trees versus regression to determine whether there are two (or more) different populations in the data and possibly two models need to be built.
Decision Trees: Multivariate Step Function
Decision Trees: Advantages
• Fast training time
• Can handle outliers and missing values
• Simple to interpret
• Simple to deploy models into production
• Wide range of uses (models, fix missing values, variable selection, and so on)
• Consistently gives the same accuracy when data changes.
Decision Trees: Disadvantages
• Coarse segmenting (everybody in the same leaf gets same prediction).
• Small change in data can result in a completely different looking tree.
• Highly linear. It is difficult to discover non-linear transformations and multi-factor interactions.
Ensemble Trees
Decision Trees: Instability
Disadvantage? Or "Feature to Exploit"?
Small change in data can result in a completely different looking tree.
Ensemble Trees
Perturb and Combine (P&C) methods generate multiple models by manipulating the distribution of the data or altering the construction method and then averaging the results.
Bagging
Bagging
Bagging (bootstrap aggregation) is the original P&C method (Breiman).
A bootstrap sample is a random sample of size n drawn from the training data with replacement.
• Some observations will be left out of the sample.
• Some observations will be represented more than once.
A tree is built on each sample.
Vote or average the posterior probabilities.
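A minimal Python sketch of the bagging steps above (a mean predictor stands in for the tree, and the data are invented):

```python
import random

random.seed(1)
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # invented training values

# A bootstrap sample: n draws from the training data WITH replacement,
# so some observations repeat and others are left out entirely.
def bootstrap(rows):
    return [random.choice(rows) for _ in rows]

# "Fit" one model per sample (a mean predictor stands in for a tree),
# then average the models' predictions.
models = [sum(s) / len(s) for s in (bootstrap(data) for _ in range(25))]
bagged_prediction = sum(models) / len(models)
print(round(bagged_prediction, 2))
```

Breiman's "back end that does the aggregation" (quoted later in the deck) is exactly the final averaging line.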
Bagging
Start and End Groups: SAS Enterprise Miner
Bagging: SAS Enterprise Miner
Ensemble Trees
“Bagging goes a long way towards making a silk purse out of a sow’s ear, especially if the sow’s ear is twitchy. It is a relatively easy way to improve an existing method, since all that needs adding is a loop in front that selects the bootstrap sample and sends it to the procedure and a back end that does the aggregation. What one loses, with the trees, is a simple and interpretable structure. What one gains is increased accuracy.”
Leo Breiman (1996)
Boosting
Boosting: Reweighted Sampling
Arcing (adaptive resampling and combining) methods sequentially perturb the training data based on the results of previous models.
Cases that are incorrectly classified are given more weight in subsequent models.
Boosting
Boosting
For the ith case, the arc-x4 weights are calculated as

p_i = (1 + m_i^4) / Σ_j (1 + m_j^4)

where 0 <= m_i <= k is the number of times that the ith case is misclassified in the preceding steps.
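A quick Python sketch of the arc-x4 weight formula (the misclassification counts are invented):

```python
# arc-x4: a case misclassified m times in the preceding steps gets weight
# proportional to 1 + m**4, normalized so the weights sum to 1.
def arc_x4_weights(misclass_counts):
    raw = [1 + m ** 4 for m in misclass_counts]
    total = sum(raw)
    return [r / total for r in raw]

# Four cases; the last has been misclassified twice so far.
w = arc_x4_weights([0, 0, 1, 2])
print([round(v, 3) for v in w])  # [0.048, 0.048, 0.095, 0.81]
```

The fourth power makes repeatedly missed cases dominate the resampling quickly, which is the point of adaptive resampling.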
Boosting: SAS Enterprise Miner
Comparison: Single, Bagged, and Boosted Trees
Gradient Boosting
Gradient Boosting: What Is It?
• A combination of several “decision trees.”
• Gradient boosting consists of a forest of small decision trees (“shrubs”, “stumps”).
• Each shrub is a poor predictor of the target by itself, but each subsequent shrub tries to fit the remaining error.
• Eventually converges to a good solution.
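A toy Python sketch of the fit-the-residuals idea (squared-error loss; the "shrub" here is just the residual mean, a stand-in for a small tree):

```python
# Toy targets; the "shrub" is the residual mean (a stand-in for a tiny
# tree fit to the residuals).
y = [3.0, 5.0, 8.0, 12.0]
shrinkage = 0.5
pred = [0.0] * len(y)

for _ in range(20):
    residuals = [t - p for t, p in zip(y, pred)]
    shrub = sum(residuals) / len(residuals)       # fit the weak learner
    pred = [p + shrinkage * shrub for p in pred]  # add its shrunken output

# With a mean-only shrub, every prediction converges to the target mean;
# a real tree would also capture per-leaf structure.
print([round(p, 2) for p in pred])  # [7.0, 7.0, 7.0, 7.0]
```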
Gradient Boosting: Uses
Any type of predictive model (often used in fraud and customer behavior analytics); widely used in search engine ranking and the general field of learning to rank
• Variable selection
• Comparison model
- Test gradient boosting versus regression to determine whether there are two different populations in the data and possibly two models need to be built.
Gradient Boosting
Example: the fitted model shown at iterations 0, 1, 10, 25, 50, 75, 100, 200, and 300
Gradient Boosting: Advantages
• Fast training time
• Can handle outliers and missing values
• Can handle complex functions
• Consistently gives the same accuracy when data changes.
Gradient Boosting: Disadvantages
• Can cause over-fitting
• Difficult to discover (visualize) non-linear transformations and multi-factor interactions
• Slightly more difficult to deploy the model into production
• “Black box” not easy to interpret
• Might not be legal to use in some industries (for example, consumer auto or credit)
Gradient Boosting: SAS Enterprise Miner
N Iterations: how many iterations occur
Shrinkage: how influential each iteration is
Maximum Depth: complexity of each individual model
Random Forests
Random Forest: What Is It?
• A combination of several “decision trees.”
• A random forest consists of a forest of fully trained decision trees (each with a random variation).
• The random forest averages the output of all the decision trees in the “forest.”
Random Forest: Uses
• Any type of predictive model; usually used in applications where a decision tree or gradient boosting model would be used
• Often used with big data
• Variable selection
• Comparison model
- Test random forests versus regression to determine whether there are two different populations in the data and possibly two models need to be built.
Random Forest: Algorithm
Select a number of trees in the random forest.
For each tree in the forest, use the following split algorithm:
• Select a random sample of data.
• Select a random subset of variables.
• Determine the best split from the sample of data and the sample of variables.
• Keep selecting random data and random subsets of variables until the maximum number of trees is trained.
When all the trees are built, the prediction is the average of all trees.
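The algorithm above, sketched in toy Python (one-split stumps stand in for full trees; the data, the mean-based cut, and the empty-leaf fallback are illustrative choices, not SAS Enterprise Miner behavior):

```python
import math
import random

random.seed(2)

# Toy rows: two inputs, binary target (all invented).
rows = [((1, 8), 0), ((2, 9), 0), ((3, 7), 0),
        ((9, 1), 1), ((8, 2), 1), ((7, 3), 1)]
n_inputs = 2
mtry = max(1, int(math.sqrt(n_inputs)))  # default: sqrt of the input count

def train_stump(sample):
    # Split search restricted to a random subset of mtry variables,
    # with a crude cut at the sample mean of the chosen variable.
    var = random.sample(range(n_inputs), mtry)[0]
    cut = sum(x[var] for x, _ in sample) / len(sample)
    left = [y for x, y in sample if x[var] < cut] or [0]   # empty-leaf fallback
    right = [y for x, y in sample if x[var] >= cut] or [0]
    l, r = sum(left) / len(left), sum(right) / len(right)
    return lambda x, v=var, c=cut, l=l, r=r: l if x[v] < c else r

# One tree per bootstrap sample of the rows.
forest = [train_stump([random.choice(rows) for _ in rows]) for _ in range(30)]

def forest_predict(x):
    return sum(tree(x) for tree in forest) / len(forest)  # average all trees

print(forest_predict((9, 1)) > 0.5, forest_predict((1, 8)) < 0.5)
```

Both the row sampling and the per-split variable sampling perturb the trees, which is why a forest varies more than plain bagging.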
Random Forest: Advantages
Fast training time for big data sets with lots of variables
• Can also determine a variable’s importance for predicting a target
Can handle outliers and missing values
Can handle complex functions
Consistently gives the same accuracy when data changes.
• Perturbs training data more than the bagging algorithm, producing more variation in the models.
• Ensembles of a more diverse set of trees often leads to improved predictive accuracy.
Random Forest: Disadvantages
• Difficult to discover (visualize) non-linear transformations and multi-factor interactions
• More difficult to deploy the model into production
• “Black box” not easy to interpret
• Might not be legal to use in some industries (for example, consumer auto or credit)
Random Forests: SAS Enterprise Miner
SAS Enterprise Miner 13.1, 13.2, 14.1, or 14.2 on SAS 9.4
Random Forests: SAS Enterprise Miner
Maximum Number of Trees: how many trees will be in the forest.
Sampling Strategy: specifies number of observations used in each sample and sampling method.
Number vars to consider in split search: how many input variables to consider when splitting each node.
(The default value is the square root of the number of inputs.)
Ensemble Forests
Ensemble Forests
What Is It?
• A combination of “decision trees”
• A collection of two or more decision trees where output is averaged
• Different from a random forest in that the trees are built one at a time by the analyst
• Much smaller than a random forest
• Slower to develop trees one at a time
Ensemble Forests
Uses
Any type of predictive model where decision trees are used
Comparison model
• Test ensemble forests versus regression to determine whether there are two (or more) different populations in the data and possibly two (or more) models need to be built.
Ensemble Forests
Algorithm
• Select a number of trees in the ensemble forest.
• Build two or more decision trees using different parameters so that trees are different from one another.
• When all the trees are built, the prediction is the average of all trees.
Ensemble Forests
Ensemble Forests
Advantages
• Fast training time
• Can handle outliers and missing values
• Can handle complex functions
• Consistently gives the same accuracy when data changes
Ensemble Forests
Disadvantages
• Difficult to discover (visualize) non-linear transformations and multi-factor interactions
• “Black box” not easy to interpret
• Might not be legal to use in some industries (for example, consumer auto or credit)
Ensemble: Modeling Algorithms
Creates new models by combining the posterior probabilities (for class targets) or the predicted values (for interval targets) from multiple predecessor models.
Three methods:
• Average
• Maximum
• Voting
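The three combination methods, sketched in Python for a single case with a binary target (the posterior values are invented; this illustrates the arithmetic, not the Ensemble node's implementation):

```python
# One case, binary target; invented posterior probabilities from three models.
posteriors = [0.9, 0.4, 0.6]

average = sum(posteriors) / len(posteriors)  # Average
maximum = max(posteriors)                    # Maximum
# Voting: the share of models whose posterior favors class 1.
voting = sum(p >= 0.5 for p in posteriors) / len(posteriors)

print(round(average, 2), maximum, round(voting, 2))  # 0.63 0.9 0.67
```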
Stacked Ensembles
Stacked Ensembles: What Is It?
• A variation on the Ensemble node
• Generate as many different models as you like
• Use the prediction values from those models as inputs to a new model
Stacked Ensembles: Algorithm
• Generate many different models on a training data set, each with predicted values or predicted probabilities for the target.
• Generate a metadata set that includes the predicted values or predicted probabilities from each model.
• Can also include the original input variables
• Run another modeling algorithm using the new metadata set (stacked ensemble) to predict the target.
• Stacked ensembles can be as complicated or simple as you want.
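A toy Python sketch of stacking (the level-1 "model" here is a crude accuracy-weighted average, a stand-in for a real second-stage algorithm; in practice the level-1 model should be trained on held-out predictions to avoid leaking the training data):

```python
# Metadata set: each row holds the level-0 models' predicted probabilities
# (all values invented) plus the actual target.
train_meta = [
    ((0.9, 0.8, 0.7), 1),
    ((0.2, 0.3, 0.1), 0),
    ((0.6, 0.7, 0.8), 1),
    ((0.4, 0.2, 0.3), 0),
]

# Level-1 "model": weight each level-0 model by its solo accuracy on the
# metadata set (a crude stand-in for fitting a regression or tree).
def accuracy_of_column(j):
    return sum((row[j] >= 0.5) == y for row, y in train_meta) / len(train_meta)

weights = [accuracy_of_column(j) for j in range(3)]
total = sum(weights)

def stacked_predict(p_tree, p_svm, p_nnet):
    probs = (p_tree, p_svm, p_nnet)
    return sum(w * p for w, p in zip(weights, probs)) / total

print(round(stacked_predict(0.8, 0.7, 0.9), 2))  # 0.8
```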
Stacked Ensembles
Stacked Ensembles: Metadata (Change Predictions to Inputs)
Stacked Ensembles: Decision Tree Output
Stacked Ensembles: Model Comparison Output
Stacked Ensembles: Score Code
Includes score code for all four original models and the final decision tree.
Stacked Ensembles
Advantages
• Enables us to use many different modeling algorithms
• Can handle outliers and missing values
• Can handle complex functions
• Consistently gives the same accuracy when data changes.
Stacked Ensembles
Disadvantages
• Difficult to discover (visualize) non-linear transformations and multi-factor interactions
• “Black box” not easy to interpret
• Might not be legal to use in some industries (for example, consumer auto or credit)
Review
SAS Enterprise Miner: Ensemble Models and Partitioning Algorithms
• Ensemble Models
• Decision Trees
• Perturb and Combine
• Bagging
• Boosting
• Gradient Boosting
• Random Forests (SAS 9.4)
• Ensemble Forests
• Stacked Ensembles
Resources
Learning More: SAS Resources
SAS Global Forum Papers:
• "Leveraging Ensemble Models in SAS® Enterprise Miner™" by Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller
• "Ensemble Modeling: Recent Advances and Application" by Wendy Czika, Miguel Maldonado, and Ye Liu
• "Stacked Ensemble Models for Improved Prediction Accuracy" by Funda Güneş, Russ Wolfinger, and Pei-Yi Ta
Blog: Why do stacked ensemble models win data science competitions?
Learning More
Decision Trees for Analytics Using SAS® Enterprise Miner™
By: Barry de Ville and Padraic Neville
ISBN: 978-1-61290-315-6
Copyright Date: July 2013
SAS Bookstore: https://www.sas.com/store/prodBK_63319_en.html
Available on Amazon
Learning More
Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions
By: Giovanni Seni and John Elder
ISBN-10: 1608452840
Publisher: Morgan and Claypool Publishers (February 24, 2010)
Available on Amazon
Learning More: Academic References
Bauer, E., and R. Kohavi. 1999. "An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants." Machine Learning 36:105-169.
Breiman, L. 1996. "Bagging Predictors." Machine Learning 24:123-140.
Breiman, L. 1998. "Arcing Classifiers (with discussion)." Annals of Statistics 26:801-849.
Breiman, L. 2001. "Random Forests." Machine Learning 45:5-32.
de Ville, Barry. 2006. Decision Trees for Business Intelligence and Data Mining: Using SAS Enterprise Miner. Cary, NC: SAS Institute Inc.
Friedman, Jerome H. 2001. "Greedy Function Approximation: A Gradient Boosting Machine." The Annals of Statistics 29:1189-1232.
Friedman, Jerome H. 2002. "Stochastic Gradient Boosting." Computational Statistics & Data Analysis 38:367-378.
SAS Online Community: communities.sas.com/data-mining
Questions?
Thank you for your time and attention!
Connect with me:
LinkedIn: https://www.linkedin.com/in/melodierush
Twitter: @Melodie_Rush