Ensemble Models and Partitioning Algorithms in SAS® Enterprise … · 2018. 2. 2. · decision...

Copyright © SAS Institute Inc. All rights reserved. Ensemble Models and Partitioning Algorithms in SAS ® Enterprise Miner™

Ensemble Models and Partitioning Algorithms in SAS® Enterprise Miner™

Goals

• Introduce ensemble models

• Increase awareness of capabilities in SAS® Enterprise Miner™ supporting ensemble modeling

• Share resources for learning more

SAS Enterprise Miner

Ensemble Models and Partitioning Algorithms

• Ensemble Models

• Decision Trees

• Perturb and Combine

• Bagging

• Boosting

• Gradient Boosting

• Random Forests (SAS 9.4)

• Ensemble Forests

• Stacked Ensembles

“Wisdom of the Crowd”

Experiment

How Many Jelly Beans Are in the Jar?

Ensemble Models

Ensemble Modeling

Introduction

Two or more predictive models combined to create a potentially more accurate model

Works better when model predictions are uncorrelated

“Wisdom of the crowd” – Aristotle (‘Politics’)

Collective wisdom of many is likely more accurate than any one
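
A small simulation can illustrate why uncorrelated predictions matter. This is only a sketch and is not from the slides: the true value, the noise level, and the names `crowd_error` and `shared_weight` are illustrative assumptions. Each "model" guesses a true value with random error, and `shared_weight` controls how much of that error is common to all models.

```python
import random
import statistics

def crowd_error(n_models, shared_weight, trials=2000, truth=1000.0,
                noise_sd=200.0, seed=0):
    """Mean absolute error of the averaged guess over many trials.

    shared_weight in [0, 1] sets how much of each guess's error is common
    to all models (0 = independent errors, 1 = identical errors).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        shared = rng.gauss(0.0, noise_sd)      # error component every model shares
        guesses = []
        for _ in range(n_models):
            own = rng.gauss(0.0, noise_sd)     # error component unique to one model
            noise = shared_weight * shared + (1.0 - shared_weight) * own
            guesses.append(truth + noise)
        errors.append(abs(statistics.fmean(guesses) - truth))
    return statistics.fmean(errors)

single = crowd_error(1, 0.0)
independent = crowd_error(25, 0.0)
correlated = crowd_error(25, 0.9)
print(f"one model:                    {single:.1f}")
print(f"25 independent models:        {independent:.1f}")
print(f"25 highly correlated models:  {correlated:.1f}")
```

Under these assumptions, averaging independent guesses shrinks the error substantially, while averaging guesses that share most of their error gains very little.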

Ensemble Modeling

Applications

http://www.nhc.noaa.gov/

Ensemble Modeling

Approaches to Build Models

Different algorithms

• Example: Decision Tree + SVM + Neural Network

One algorithm, different configurations

• Example: Various configurations of Neural Networks

One algorithm, different data samples

• Example: Random Forest, Gradient Boosting

[Diagram: Build Predictive Models, then Combine Models]

Ensemble Modeling

An Ensemble Model Is a Combination of Multiple Models.

Ensemble Modeling

Approaches to Combine Models

• Averaging or Voting

• Stacking/Blending

• Cluster-based selection

Averaging example: the Decision Tree, SVM, and Neural Network produce predictions P1, P2, and P3, which are combined as (P1+P2+P3)/3.

Ensemble Models

The result of combining models can sometimes lead to a more accurate model.*

* The ensemble model can be more accurate than the individual models only if the individual models disagree with one another.

Trade-Off

Decision Trees

Decision Trees

What Is It?

• Linear separation of data using “if then else” logic

• Separation is performed via an exhaustive search of splitting points for each variable.

• Many different architectural variations based on the above architecture

• Users might refer to them as

• CHAID Trees

• CART Trees

• C4.5 Trees

• C5.0 Trees.

• Each of the above is simply a variation on the tree architecture.
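
The exhaustive split search described above can be sketched as follows. This is a minimal illustration, not the algorithm of any particular tree variant: the Gini impurity criterion, the function names, and the toy data are all assumptions made for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Try every observed value of every variable as a threshold.

    Returns (variable index, threshold, weighted impurity) for the best
    'row[var] < threshold' split, or None if no split separates the data.
    """
    best = None
    n = len(rows)
    for var in range(len(rows[0])):
        for threshold in sorted({row[var] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[var] < threshold]
            right = [y for row, y in zip(rows, labels) if row[var] >= threshold]
            if not left or not right:
                continue  # a split must put cases on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (var, threshold, score)
    return best

# Hypothetical data: columns are (savings balance, money market balance).
rows = [(500, 8000), (1200, 9000), (2000, 7500),
        (3000, 1000), (4000, 500), (2600, 200)]
labels = [0, 0, 0, 1, 1, 1]
print(best_split(rows, labels))
```

A full tree would apply `best_split` recursively to each side until a stopping rule is met; the variants listed above differ mainly in the split criterion and in how they stop or prune.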

Decision Tree

Easy to Visualize

Decision Trees

Decision Rules

Node = 9

if Saving Balance < 2500 or MISSING

AND Money Market Balance >= 7000

then

Tree Node Identifier = 9

Number of Observations = 917.56099466

Predicted: INS=1 = 0.05

Predicted: INS=0 = 0.95
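
Because a leaf is just an "if then else" rule, it can be deployed as ordinary code. The sketch below transcribes the node 9 rule by hand; the function name and the use of `None` for a missing value are assumptions for illustration, and this is not the score code that SAS Enterprise Miner generates.

```python
def node_9_rule(saving_balance, money_market_balance):
    """Return node 9's predicted probabilities, or None if the record
    does not fall in node 9.

    None stands in for a missing savings balance, which the rule sends
    down the same branch as balances below 2500.
    """
    savings_ok = saving_balance is None or saving_balance < 2500
    # Assumption: a missing money market balance does not satisfy '>= 7000'.
    mm_ok = money_market_balance is not None and money_market_balance >= 7000
    if savings_ok and mm_ok:
        return {"INS=1": 0.05, "INS=0": 0.95}
    return None

print(node_9_rule(1800, 9000))   # falls in node 9
print(node_9_rule(None, 7000))   # missing savings balance also falls in node 9
print(node_9_rule(3000, 9000))   # savings balance too high for node 9
```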

Analytics Life Cycle

Decision Trees Can Help in Various Stages.

Formulate Problem

Data Preparation

Data Exploration

Transform & Select

Develop Models

Validate Models

Deploy Model

Evaluate & Monitor Model

Decision Trees

Uses

• Data exploration

• Generating business rules

• Segmentation

• Missing value imputation

• Variable transformation and variable selection

• Predictive models

• Comparison Model

• Test decision trees versus regression to determine whether there are two (or more) different populations in the data and possibly two models need to be built.

Decision Trees

Multivariate Step Function

Decision Trees

Advantages

• Fast training time

• Can handle outliers and missing values

• Simple to interpret

• Simple to deploy models into production

• Wide range of uses (models, fix missing values, variable selection, and so on)

• Consistently gives the same accuracy when data changes.

Decision Trees

Disadvantages

• Coarse segmenting (everybody in the same leaf gets same prediction).

• Small change in data can result in a completely different looking tree.

• Highly linear. It is difficult to discover non-linear transformations and multi-factor interactions.

Ensemble Trees

Decision Trees: Instability

Disadvantage? or “Feature to Exploit”?

Small change in data can result in a completely different looking tree.

Ensemble Trees

Perturb and Combine (P&C) methods generate multiple models by manipulating the distribution of the data or altering the construction method and then averaging the results.

Bagging

Bagging

Bagging (bootstrap aggregation) is the original P&C method (Breiman).

A bootstrap sample is a random sample of size n drawn from the training data with replacement.

• Some observations will be left out of the sample.

• Some observations will be represented more than once.

A tree is built on each sample.

Vote or average the posterior probabilities.
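
The bagging steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than the Enterprise Miner implementation: each "tree" is reduced to a one-split stump on a single input, and the function names and toy data are hypothetical.

```python
import random

def bootstrap_sample(data, rng):
    """Random sample of size n drawn from data with replacement."""
    return [rng.choice(data) for _ in data]

def fit_stump(sample):
    """One-split 'tree' on (x, y) pairs with a 0/1 target.

    Picks the threshold with the fewest misclassifications and returns a
    function that gives the posterior P(y=1) for a new x.
    """
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x < threshold]
        right = [y for x, y in sample if x >= threshold]
        if not left or not right:
            continue
        errors = (min(sum(left), len(left) - sum(left))
                  + min(sum(right), len(right) - sum(right)))
        if best is None or errors < best[0]:
            best = (errors, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    if best is None:  # every x in the sample is identical
        rate = sum(y for _, y in sample) / len(sample)
        return lambda x: rate
    _, threshold, p_left, p_right = best
    return lambda x: p_left if x < threshold else p_right

def bagged_posterior(data, x, n_trees=50, seed=1):
    """Average P(y=1 | x) over stumps built on bootstrap samples."""
    rng = random.Random(seed)
    trees = [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_trees)]
    return sum(tree(x) for tree in trees) / n_trees

# Hypothetical data: y is 1 when x >= 10, with two noisy cases.
data = [(x, 1 if x >= 10 else 0) for x in range(20)]
data[3], data[15] = (3, 1), (15, 0)
print(bagged_posterior(data, 18), bagged_posterior(data, 2))
```

Each bootstrap sample leaves out some cases and repeats others, so the stumps differ slightly; averaging their posteriors smooths out that variation.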

Start and End Groups

SAS Enterprise Miner

Bagging

SAS Enterprise Miner

Ensemble Trees

“Bagging goes a long way towards making a silk purse out of a sow’s ear, especially if the sow’s ear is twitchy. It is a relatively easy way to improve an existing method, since all that needs adding is a loop in front that selects the bootstrap sample and sends it to the procedure and a back end that does the aggregation. What one loses, with the trees, is a simple and interpretable structure. What one gains is increased accuracy.”

Leo Breiman (1996)

Boosting

Boosting

Reweighted Sampling

Arcing (adaptive resampling and combining) methods sequentially perturb the training data based on the results of previous models.

Cases that are incorrectly classified are given more weight in subsequent models.

Boosting

For the ith case, the arc-x4 weights are calculated as

$$p_i = \frac{1 + m_i^4}{\sum_{j} \left(1 + m_j^4\right)}$$

where 0 <= m_i <= k is the number of times that the ith case is misclassified in the preceding k steps.
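
As a quick numeric check, the arc-x4 weights (each proportional to one plus the fourth power of the case's misclassification count, normalized to sum to 1) can be computed directly. The function name and the example counts are hypothetical.

```python
def arc_x4_weights(miscounts):
    """Resampling probabilities p_i = (1 + m_i**4) / sum_j (1 + m_j**4)."""
    raw = [1 + m ** 4 for m in miscounts]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical misclassification counts m_i for five cases after three steps.
weights = arc_x4_weights([0, 0, 1, 2, 3])
print([round(w, 3) for w in weights])
```

The fourth power makes the weights grow quickly: a case misclassified in all three steps dominates the next sample, while cases that were always classified correctly are rarely drawn.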

Boosting

SAS Enterprise Miner

Comparison

Single, Bagged, and Boosted Trees

Gradient Boosting

Gradient Boosting

What Is It?

• A combination of several “decision trees.”

• Gradient boosting consists of a forest of small decision trees (“shrubs”, “stumps”).

• Each shrub is poor at predicting target, but each subsequent shrub tries to fit the remaining error.

• Eventually converges to good solution.
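
The idea of each shrub fitting the remaining error can be sketched for a numeric target with squared-error loss. This is a simplified illustration, not the Enterprise Miner algorithm: the one-input stumps, the `shrinkage` step size, and the sine-curve data are assumptions, and the input must contain at least two distinct values.

```python
import math

def fit_stump(xs, residuals):
    """Best single-split regression stump under squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:   # skip the minimum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - mean_l) ** 2 for r in left)
               + sum((r - mean_r) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_l, mean_r)
    _, threshold, mean_l, mean_r = best
    return lambda x: mean_l if x < threshold else mean_r

def gradient_boost(xs, ys, n_iterations=100, shrinkage=0.1):
    """Return (predict function, training MSE after each iteration)."""
    base = sum(ys) / len(ys)                 # iteration 0: predict the mean
    stumps = []
    preds = [base] * len(xs)
    history = [sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)]
    for _ in range(n_iterations):
        residuals = [y - p for y, p in zip(ys, preds)]   # the remaining error
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + shrinkage * stump(x) for p, x in zip(preds, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def predict(x):
        return base + shrinkage * sum(s(x) for s in stumps)

    return predict, history

xs = [i / 10 for i in range(30)]
ys = [math.sin(x) for x in xs]
predict, history = gradient_boost(xs, ys)
print(f"training MSE: start {history[0]:.4f}, end {history[-1]:.6f}")
```

No single stump fits the curve well, but the training error falls as more stumps are added, which mirrors the slide's point that the sequence eventually converges to a good fit.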

Gradient Boosting

Uses

Any type of predictive models (often used with Fraud and Customer Behavior analytics). Widely used in search engine ranking and the general field of learning to rank.

• Variable selection

• Comparison model

- Test gradient boosting versus regression to determine whether there are two different populations in the data and possibly two models need to be built.

Gradient Boosting

Example: fitted model after Iterations = 0, 1, 10, 25, 50, 75, 100, 200, and 300 (one plot per iteration count; figures not reproduced)

Gradient Boosting

Advantages

• Fast training time

• Can handle outliers and missing values

• Can handle complex functions

• Consistently gives the same accuracy when data changes.

Gradient Boosting

Disadvantages

• Can cause over-fitting

• Difficult to discover (visualize) non-linear transformations and multi-factor interactions

• Slightly more difficult to deploy the model into production

• “Black box” not easy to interpret

• Might not be legal to use in some industries (for example, consumer auto or credit)

Gradient Boosting

SAS Enterprise Miner

N Iterations: how many iterations occur

Shrinkage: how influential each iteration is

Maximum Depth: complexity of each individual model
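
One way to see how these three properties interact is the usual textbook form of the boosting update. The slides do not state this formula and the exact SAS implementation may differ, so the symbols below are assumptions: M stands for N Iterations, ν for Shrinkage, and each h_m is a tree no deeper than Maximum Depth, fit to the error left by the previous model.

$$F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad m = 1, \dots, M$$

With a smaller ν each tree contributes less, so more iterations are typically needed to reach the same fit.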

Random Forests

Random Forest

What Is It?

• A combination of several “decision trees.”

• A random forest consists of a forest of fully trained decision trees (each with a random variation).

• The random forest averages the output of all the decision trees in the “forest.”

Random Forest

Uses

• Any type of predictive models. Usually used in applications where a decision tree or gradient boosting tree would be used.

• Often used with big data

• Variable selection

• Comparison model

- Test random forests versus regression to determine whether there are two different populations in the data and possibly two models need to be built.

Random Forest

Algorithm

Select a number of trees in the random forest.

For each tree in the forest, use the following split algorithm:

• Select a random sample of data.

• Select a random subset of variables.

• Determine the best split from the sample of data and the sample of variables.

• Keep selecting random data and random subsets of variables until the maximum number of trees is trained.

When all the trees are built, the prediction is the average of all trees.
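
The algorithm above can be sketched as follows. To stay short, this illustration grows each tree to a single split only, whereas a real random forest grows much deeper trees. Sampling with replacement, the square-root rule for the variable subset, and all names and data are assumptions for the example rather than the Enterprise Miner implementation.

```python
import math
import random

def fit_stump(rows, ys, variables):
    """Best squared-error split, searching only the given variables."""
    overall = sum(ys) / len(ys)
    best = None
    for var in variables:
        for threshold in sorted({row[var] for row in rows})[1:]:
            left = [y for row, y in zip(rows, ys) if row[var] < threshold]
            right = [y for row, y in zip(rows, ys) if row[var] >= threshold]
            mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((y - mean_l) ** 2 for y in left)
                   + sum((y - mean_r) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, var, threshold, mean_l, mean_r)
    if best is None:                       # no usable split in this sample
        return lambda row: overall
    _, var, threshold, mean_l, mean_r = best
    return lambda row: mean_l if row[var] < threshold else mean_r

def random_forest(rows, ys, n_trees=100, seed=0):
    """Average of stumps, each built on random data and random variables."""
    rng = random.Random(seed)
    n_vars = len(rows[0])
    n_try = max(1, round(math.sqrt(n_vars)))   # variables considered per split
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]    # random sample of data
        variables = rng.sample(range(n_vars), n_try)      # random subset of variables
        trees.append(fit_stump([rows[i] for i in idx],
                               [ys[i] for i in idx], variables))
    return lambda row: sum(tree(row) for tree in trees) / len(trees)

# Hypothetical data: only the first of four inputs drives the 0/1 target.
data_rng = random.Random(42)
rows = [(i % 10, data_rng.random(), data_rng.random(), data_rng.random())
        for i in range(60)]
ys = [1 if row[0] >= 5 else 0 for row in rows]
forest = random_forest(rows, ys)
print(forest((9, 0.5, 0.5, 0.5)), forest((0, 0.5, 0.5, 0.5)))
```

Trees that never see the informative input contribute little signal, which is why the forest relies on many trees: averaging lets the informative splits dominate.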

Random Forest

Advantages

Fast training time for big data sets with lots of variables

• Can also determine a variable’s importance for predicting a target

Can handle outliers and missing values

Can handle complex functions

Consistently gives the same accuracy when data changes.

• Perturbs training data more than the bagging algorithm, producing more variation in the models.

• Ensembles of a more diverse set of trees often lead to improved predictive accuracy.

Random Forest

Disadvantages

• Difficult to discover (visualize) non-linear transformations and multi-factor interactions

• More difficult to deploy the model into production

• “Black box” not easy to interpret

• Might not be legal to use in some industries (for example, consumer auto or credit)

Random Forests

SAS Enterprise Miner

SAS Enterprise Miner 13.1, 13.2, 14.1 or 14.2 on SAS 9.4

Random Forests

SAS Enterprise Miner

Maximum Number of Trees: how many trees will be in the forest.

Sampling Strategy: specifies number of observations used in each sample and sampling method.

Number vars to consider in split search: how many input variables to consider when splitting each node.

(The default value is the square root of the number of inputs.)

Ensemble Forests

Ensemble Forests

What Is It?

• A combination of “decision trees”

• A collection of two or more decision trees where output is averaged

• Different from random forest in that trees are built one at a time by analyst

• Much smaller than a random forest

• Slower to develop trees one at a time

Ensemble Forests

Uses

Any type of predictive models where decision trees are used

Comparison model

• Test ensemble forests versus regression to determine whether there are two (or more) different populations in the data and possibly two (or more) models need to be built.

Ensemble Forests

Algorithm

• Select a number of trees in the ensemble forest.

• Build two or more decision trees using different parameters so that trees are different from one another.

• When all the trees are built, the prediction is the average of all trees.

Ensemble Forests

Advantages

• Fast training time

• Can handle outliers and missing values

• Can handle complex functions

• Consistently gives the same accuracy when data changes

Ensemble Forests

Disadvantages

• Difficult to discover (visualize) non-linear transformations and multi-factor interactions

• “Black box” not easy to interpret

• Might not be legal to use in some industries (for example, consumer auto or credit)

Ensemble Modeling Algorithms

The Ensemble node creates new models by combining the posterior probabilities (for class targets) or the predicted values (for interval targets) from multiple predecessor models.

Three methods:

• Average

• Maximum

• Voting
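
A minimal Python sketch of the three combination methods for a class target follows. It is a simplified reading of the slide, not the Ensemble node's actual code: the function name combine_posteriors and the example probabilities are hypothetical, and voting is assumed here to mean that each model votes for its most likely class and the combined posterior is the share of votes.

```python
def combine_posteriors(posteriors, method="average"):
    """Combine per-model posterior probabilities.

    posteriors: list of dicts, one per predecessor model, mapping
    class label -> posterior probability.
    """
    classes = list(posteriors[0])
    if method == "average":
        return {c: sum(p[c] for p in posteriors) / len(posteriors) for c in classes}
    if method == "maximum":
        return {c: max(p[c] for p in posteriors) for c in classes}
    if method == "voting":
        # Assumption: each model casts one vote for its most likely class.
        votes = [max(p, key=p.get) for p in posteriors]
        return {c: votes.count(c) / len(votes) for c in classes}
    raise ValueError(f"unknown method: {method}")

# Hypothetical posteriors from three predecessor models for one observation.
model_posteriors = [
    {"yes": 0.90, "no": 0.10},
    {"yes": 0.40, "no": 0.60},
    {"yes": 0.45, "no": 0.55},
]

averaged = combine_posteriors(model_posteriors, "average")
maximum = combine_posteriors(model_posteriors, "maximum")
voted = combine_posteriors(model_posteriors, "voting")
print(averaged, maximum, voted)
```

In this example one very confident model pulls the average toward "yes", while two of the three models vote "no", so the choice of method can change the final classification.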

Stacked Ensembles

Stacked Ensembles

What Is It?

• A variation on the Ensemble node

• Generate as many different models as you like

• Use the predicted values from those models as inputs to a new model

Stacked Ensembles

Algorithm

• Generate many different models on a training data set, each with predicted values or predicted probabilities for the target.

• Generate a metadata set that includes the predicted values or predicted probabilities from each model.

• The metadata set can also include the original input variables.

• Run another modeling algorithm using the new metadata set (stacked ensemble) to predict the target.

• Stacked ensembles can be as complicated or simple as you want.
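
The algorithm above can be sketched in plain Python. Everything named here is an assumption made for illustration: the two base models (a least-squares line and a one-split step function), the toy data, and the second-stage model, which is a least-squares blend weight rather than the decision tree used in the Enterprise Miner example.

```python
from statistics import mean, median

def fit_line_model(xs, ys):
    """Base model 1: simple least-squares line y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return lambda x: a + b * x

def fit_step_model(xs, ys):
    """Base model 2: one split at the median input, mean target on each side."""
    split = median(xs)
    low = mean(y for x, y in zip(xs, ys) if x <= split)
    high = mean(y for x, y in zip(xs, ys) if x > split)
    return lambda x: low if x <= split else high

def fit_blend(stacked_rows, ys):
    """Second-stage model: least-squares weight w for w * p1 + (1 - w) * p2."""
    num = sum((y - p2) * (p1 - p2) for (p1, p2), y in zip(stacked_rows, ys))
    den = sum((p1 - p2) ** 2 for p1, p2 in stacked_rows)
    return num / den

# Toy training data (assumed for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 9.0, 9.0]

# Step 1: fit several different models on the training data.
base_models = [fit_line_model(xs, ys), fit_step_model(xs, ys)]

# Step 2: build the stacked data set from each model's predicted values.
stacked_rows = [tuple(model(x) for model in base_models) for x in xs]

# Step 3: fit another model that uses those predictions as its inputs.
w = fit_blend(stacked_rows, ys)

def predict_stacked(x):
    p1, p2 = (model(x) for model in base_models)
    return w * p1 + (1 - w) * p2

def sse(predict):
    """Sum of squared errors on the training data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))

print(sse(base_models[0]), sse(base_models[1]), sse(predict_stacked))
```

Because the blend weight is chosen by least squares, the stacked model's training error is never worse than either base model's. In practice the second-stage model is often fitted on predictions for held-out data so that it does not simply reward base models that overfit.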

Stacked Ensembles

Metadata: Change Predictions to Inputs

Stacked Ensembles

Decision Tree Output

Stacked Ensembles

Model Comparison Output

Stacked Ensembles

Score Code

The score code includes the code for all four original models and the final decision tree.

Stacked Ensembles

Advantages

• Enables us to use many different modeling algorithms

• Can handle outliers and missing values

• Can handle complex functions

• Consistently gives the same accuracy when data changes

Stacked Ensembles

Disadvantages

• Difficult to discover (visualize) non-linear transformations and multi-factor interactions

• “Black box”: not easy to interpret

• Might not be legal to use in some industries (for example, consumer auto or credit)

Review

SAS Enterprise Miner

Ensemble Models and Partitioning Algorithms

• Ensemble Models

• Decision Trees

• Perturb and Combine

• Bagging

• Boosting

• Gradient Boosting

• Random Forests (SAS 9.4)

• Ensemble Forests

• Stacked Ensembles

Resources

Learning More

SAS Resources

SAS Global Forum Papers:

• “Leveraging Ensemble Models in SAS® Enterprise Miner™,” by Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller

• “Ensemble Modeling: Recent Advances and Application,” by Wendy Czika, Miguel Maldonado, and Ye Liu

• “Stacked Ensemble Models for Improved Prediction Accuracy,” by Funda Güneş, Russ Wolfinger, and Pei-Yi Tan

Blog: Why do stacked ensemble models win data science competitions?

Learning More

Decision Trees for Analytics Using SAS® Enterprise Miner™

By: Barry de Ville and Padraic Neville

ISBN: 978-1-61290-315-6

Copyright Date: July 2013

SAS Bookstore: https://www.sas.com/store/prodBK_63319_en.html

Table of Contents [PDF]

Free Chapter [PDF]

Example Code and Data

Available on Amazon

Learning More

Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions

By: Giovanni Seni and John Elder

ISBN-10: 1608452840

Publisher: Morgan and Claypool Publishers (February 24, 2010)

Available on Amazon

Learning More

Academic References

Bauer, E. and Kohavi, R. 1999. “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting and Variants.” Machine Learning 36: 105–169.

Breiman, L. 1996. “Bagging Predictors.” Machine Learning 24: 123–140.

Breiman, L. 1998. “Arcing Classifiers (with discussion).” Annals of Statistics 26: 801–849.

Breiman, L. 2001. “Random Forests.” Machine Learning 45: 5–32.

de Ville, Barry. 2006. Decision Trees for Business Intelligence and Data Mining: Using SAS Enterprise Miner. Cary, NC.

Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics 29: 1189–1232.

Friedman, Jerome H. 2002. “Stochastic Gradient Boosting.” Computational Statistics & Data Analysis 38: 367–378.

SAS Online Community

Communities.sas.com/data-mining

Questions?

Thank you for your time and attention!

Connect with me:

LinkedIn: https://www.linkedin.com/in/melodierush

Twitter: @Melodie_Rush