Transcript of Comparative Review of Classification Trees — nymetro.chapter.informs.org/prac_cor_pubs/05-13...
Comparative Review of Classification Trees
by Leonardo Auslender,
leoldv12 ‘at’ gmail ‘dot’ com
Independent Statistical Research Consultant 2013
— 2 —
Contents

1) Trees/CART: varieties, algorithm
2) Model Deployment: scoring
3) Examples
4) Concluding Remarks: brains, the future

Review of Trees: Auslender, L. (1998): Alacart, Poor Man's Classification Trees, NorthEast SAS Users Group (NESUG) Conference.
— 3 —
1) Varieties of Tree Methods

A Field Guide to Trees:
• CART (S+, R)
• AID, THAID, CHAID
• ID3, C4.5, C5.0
— 4 —
CART
Classification and
Regression Trees
Source: Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J.: Classification and Regression Trees, Wadsworth International Group, Belmont, CA, 1984.
— 5 —
Aim: separate two classes using X1 and X2 by producing more homogeneous rectangular regions.
— 6 —
CART: underlying classification algorithm, using misclassification.

| Y | X1 | X2 | X3 | X4 |
|---|----|----|----|----|
| 0 | 1  | 10 | 21 | 1  |
| 1 | 1  | 30 | 8  | 1  |
| 0 | 2  | 0  | 8  | 0  |
| 0 | 3  | 10 | 8  | 0  |

Misscl (Y | X1 <= 1) = .5 and Misscl (Y | X1 > 1) = 0. Repeat for every value of X1 and for every other X variable, then select the optimal variable and split point (CART actually uses Gini).
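The split search just described can be sketched in a few lines of Python (used here instead of the deck's native SAS purely for brevity); the rows mirror the toy table above. On this toy data the exhaustive search finds that X2 <= 10 separates the two classes perfectly:

```python
# Exhaustive split search on the slide's toy data, scored by misclassification.
# For each candidate cut value of each X, the node error is the share of
# observations not in that node's majority class, weighted by node size.

rows = [  # (Y, X1, X2, X3, X4) from the slide
    (0, 1, 10, 21, 1),
    (1, 1, 30, 8, 1),
    (0, 2, 0, 8, 0),
    (0, 3, 10, 8, 0),
]

def misclass(ys):
    """Misclassification rate of a node: 1 - frequency of majority class."""
    if not ys:
        return 0.0
    majority = max(ys.count(0), ys.count(1))
    return 1 - majority / len(ys)

def best_split(rows):
    best = None
    for j in range(1, 5):                      # columns X1..X4
        for cut in sorted({r[j] for r in rows}):
            left = [r[0] for r in rows if r[j] <= cut]
            right = [r[0] for r in rows if r[j] > cut]
            if not left or not right:
                continue
            # size-weighted error of the candidate split
            err = (len(left) * misclass(left)
                   + len(right) * misclass(right)) / len(rows)
            if best is None or err < best[0]:
                best = (err, j, cut)
    return best

err, var, cut = best_split(rows)
print(err, var, cut)   # best split: X2 <= 10, zero misclassification
```

In CART proper the same loop runs with Gini impurity in place of the misclassification rate.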
— 7 —
Basic CART Algorithm: binary dependent variable or target (0, 1)

[Figure: the range of a continuous variable Xi, with the original percentages of '0's and '1's of the dependent variable Y; a splitting point cuts the range into regions whose class mixes (70%, 50%, 20%) differ from the original.]
— 8 —
Divide and Conquer: recursive partitioning

[Tree: root node n = 5,000 with 10% HELOC. Splitting on Debits < 19 yields child nodes of n = 1,650 (21% HELOC) and n = 3,350 (5% HELOC).]
— 9 —
Ideal SAS code to find splits:

proc summary data = .... nway;
  class (all independent vars);
  var depvar; /* this is 'target', 0/1 */
  output out = ..... sum = ;
run;

For large data sets (large N, large NVAR), hardware and software constraints prevent completion.
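The same tabulation idea can be sketched in Python rather than SAS (a toy stand-in with a single predictor; the real proc summary crosses all independent variables at once):

```python
# The proc summary idea in Python: accumulate target sums and counts per cell
# of the candidate predictor; event rates per cell are then sums / counts.
from collections import defaultdict

rows = [  # (target, predictor value) -- toy stand-in data
    (0, "a"), (1, "a"), (1, "b"), (0, "b"), (1, "b"),
]

cells = defaultdict(lambda: [0, 0])   # value -> [sum of target, count]
for y, x in rows:
    cells[x][0] += y
    cells[x][1] += 1

rates = {x: s / n for x, (s, n) in cells.items()}
print(rates)   # event rate per level of the predictor
```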
— 10 —
Fitted Decision Tree: interpretation and structure

[Figure: a fitted tree. The root splits on VAR A at 19 (< 19 vs >= 19); subsequent splits are on VAR B (0–52 vs > 52) and VAR C ((0,1) vs > 1). Leaf event rates shown: 5%, 21%, 25%, 45%.]
— 11 —
The Cultivation of Trees
• Split Search
– Which splits are to be considered?
• Splitting Criterion
– Which split is best?
• Stopping Rule
– When should the splitting stop?
• Pruning Rule
– Should some branches be lopped-off?
— 12 —
Possible Splits to Consider: most common is binary, because...

[Chart: number of possible splits (y-axis up to 500,000) vs number of input levels (2–20); the nominal-input curve grows explosively, far faster than the ordinal-input curve.]

If an input has 1,000 levels, there are 999 possible binary splits, 999 * 998 / 2 three-way splits, etc.
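The counting rule behind these numbers can be checked directly. The two helper functions below are illustrative: ordinal splits choose cut points among the gaps between ordered levels, and nominal binary splits count two-way set partitions:

```python
# Counting candidate splits by input type and number of levels.
from math import comb

def ordinal_splits(levels, branches):
    """Ways to cut an ordered input with `levels` levels into `branches`
    contiguous groups: choose branches-1 of the levels-1 gaps."""
    return comb(levels - 1, branches - 1)

def nominal_binary_splits(levels):
    """Binary splits of an unordered input: 2**(levels-1) - 1 nonempty
    two-way partitions of the level set."""
    return 2 ** (levels - 1) - 1

print(ordinal_splits(1000, 2))     # 999, as on the slide
print(ordinal_splits(1000, 3))     # 999 * 998 // 2 three-way splits
print(nominal_binary_splits(20))   # ~500,000, matching the chart's scale
```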
— 13 —
Splitting Criterion: Gini, twoing, misclassification, entropy...

A) Minimize the Gini impurity criterion (favors node homogeneity).
B) Maximize the twoing criterion (favors class separation).

Empirical results: for binary dependent variables, Gini and twoing are equivalent. For trinomial targets, Gini provides more accurate trees; beyond three categories, twoing performs better.
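For reference, the most common impurity measures are easy to state; here is a minimal Python sketch over a node's class proportions (the twoing criterion, which scores left/right class-share separation, is omitted for brevity):

```python
# Gini and entropy impurity for a node's class proportions; both are zero for
# a pure node and maximal for a 50/50 mix in the binary case.
from math import log2

def gini(props):
    return 1 - sum(p * p for p in props)

def entropy(props):
    return -sum(p * log2(p) for p in props if p > 0)

print(gini([0.5, 0.5]), entropy([0.5, 0.5]))   # worst case: 0.5 and 1.0
print(gini([1.0, 0.0]), entropy([1.0, 0.0]))   # pure node: 0.0 and 0.0
```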
— 14 —
The Right-Sized Tree
Stunting
Pruning
Benefits of Trees
• Interpretability
– Tree structured presentation
• Mixed Measurement Scales
– Nominal, ordinal, interval
– Regression trees
• Robustness
• Missing Values
— 19 —
…Benefits
• Automatically
– Detects interactions (AID) in a hierarchical conditioning search, not à la regression analysis.
– Selects input variables

[Figure: the fitted probability is a multivariate step function of the inputs.]
— 20 —
Drawbacks of Trees

• Unstable: small perturbations in data can lead to big changes in trees.
• Linear structures are approximated in very rough form.
• Applications may require that rule descriptions for different categories not share the same attributes.
• It is a conditional structure, and interpretation many times misunderstands the conditioning effect.
— 21 —
Drawbacks of Trees (cont.)

• Tends to over-fit => overly optimistic accuracy.
• Large trees are very difficult to interpret.
• Tree size is conditioned by data set size.
• No valid inferential procedures at present (if it matters).
• Greedy search algorithm.
— 22 —
Note on Missing Values

1) Missingness NOT in Y (see Wang and Sheng, 2007, JMLR, for a semi-supervised method for missing Y).
2) Different methods of imputation:
   1) C4.5 probabilistic split: observations with missing values are attached to child nodes with weights equal to the proportion of non-missing values.
   2) Complete case: eliminate all observations with missing values, and train.
   3) Grand mode/mean: imputed if categorical/continuous.
   4) Separate class: appropriate for categorical inputs. For continuous ones, create an extreme large value and thus separate missings from non-missings.
   5) Complete variable case: delete all variables with missing values.
   6) Surrogate (CART default): use surrogate variable(s) whenever the splitting variable is missing. At testing or scoring, if the variable is missing, use the surrogate(s).
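Options 3) and 4) are simple enough to sketch in a few lines; a toy Python illustration (the sentinel value 1e9 is an arbitrary stand-in for "extreme large value"):

```python
# Two of the simpler imputation options from the list, on a toy column:
# grand mean for a continuous input, and an extreme sentinel value that parks
# all missings together in their own split region.
values = [3.0, None, 5.0, None, 4.0]

observed = [v for v in values if v is not None]
grand_mean = sum(observed) / len(observed)              # option 3
mean_imputed = [v if v is not None else grand_mean for v in values]

SENTINEL = 1e9   # far beyond the observed range (option 4)
sentinel_imputed = [v if v is not None else SENTINEL for v in values]

print(mean_imputed)
print(sentinel_imputed)
```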
Tree Derivative: Random Forests. (Breiman, 1999)
— 23 —
Random Forests proceed in the following steps; notice that there is no need to create separate training, validation, and test data sets:

1. Take a random sample of N observations with replacement ("bagging") from the data set. On average, this selects about 2/3 of the rows; the remaining 1/3 are called "out-of-bag (OOB)" observations. A new random selection is performed for each tree constructed.

2. Using the observations selected in step 1, construct a decision tree to its maximum size, without pruning. As the tree is built, allow only a random subset of the total set of predictor variables to be considered as possible splitters for each node. For example, if there are ten predictors, choose five of them randomly as candidate splitters. Perform a new random selection for each split. Some predictors (possibly the best one) will not be considered for a given split, but a predictor excluded from one split may be used for another split in the same tree.
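The "about 2/3 in, 1/3 out-of-bag" figure in step 1 follows from sampling with replacement: the chance that a given row is never drawn in N tries is (1 - 1/N)^N, which tends to e^-1 ≈ 0.368. A quick stand-alone simulation:

```python
# Step 1's bootstrap sample in miniature: draw N row indices with replacement
# and check that roughly 1/3 of rows land out of bag.
import random

random.seed(0)
N = 100_000
in_bag = {random.randrange(N) for _ in range(N)}   # distinct rows drawn
oob_fraction = 1 - len(in_bag) / N
print(round(oob_fraction, 3))   # close to e**-1 ≈ 0.368
```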
— 24 —
No Overfitting or Pruning

The "over-fitting" problem appears in large, single-tree models where the model fits noise in the data, which causes poor generalization power; this is the basis for pruning those models. In nearly all cases, decision tree forests do not have a problem with over-fitting, and there is no need to prune the trees in the forest. Generally, the more trees in the forest, the better the fit.

Internal Measure of Test Set (Generalization) Error

About 1/3 of the observations are excluded from each tree in the forest; these are called "out-of-bag (OOB)". Each tree has a different set of out-of-bag observations, so each OOB set constitutes an independent test sample.

To measure the generalization error of a decision tree forest, the OOB set for each tree is run through that tree and the prediction error rate is computed. The error rates for the trees in the forest are then averaged to obtain the overall generalization error rate for the decision tree forest model.

There are several advantages to this method of computing the generalization error:
(1) All observations are used to construct the model, and none have to be held back as a separate test set.
(2) The testing is fast because only one forest has to be constructed (as compared to V-fold cross-validation, where additional trees have to be constructed).
— 25 —
2) Model Deployment
Scoring: the workhorse of database marketing.
— 26 —
Scoring Recipe

• Model
  – Formula
• Data Modifications
  – Derived inputs
  – Variable transformations
  – Missing value imputation
• Scoring Code

[Diagram: the original computation algorithm feeds the scoring code, which produces the scored data.]
— 27 —
Scoring Recipe: example of scoring output generated by Alacart.

/* PROGRAM ALGOR8.PGM WITH 8 FINAL NODES */
/* METHOD MISSCL ALACART TEST */
RETAIN ROOT 1;
IF ROOT & CURRDUE <= 105.38 & PASTDUE <= 90.36 & CURRDUE <= 12
THEN DO;
  NODE = '4_1 ';
  PRED = 0 ;
  /* % NODE IMPURITY = 0.0399 ; */
  /* BRANCH # = 1 ; */
  /* NODE FREQ = 81 ; */
END;
ELSE IF ROOT & CURRDUE <= 105.38 & PASTDUE <= 90.36 & CURRDUE > 12
THEN DO;
  NODE = '4_2 ';
  PRED = 1 ;
  /* % NODE IMPURITY = 0.4478 ; */
  /* BRANCH # = 2 ; */
  /* NODE FREQ = 212 ; */
END;
ELSE IF ROOT & CURRDUE <= 105.38 & PASTDUE > 90.36
THEN DO;
  NODE = '3_2 ';
  PRED = 0 ;
[... remaining nodes truncated in the transcript]
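The Alacart output above is a chain of mutually exclusive IF/ELSE IF rules. As a sketch, the same excerpt rendered as a Python function (thresholds copied from the slide; the remaining nodes of the 8-node tree are omitted, exactly as in the transcript):

```python
# The SAS scoring rules as a plain function: route a case down the tree and
# return the leaf's node id and predicted class.
def score(currdue, pastdue):
    if currdue <= 105.38 and pastdue <= 90.36 and currdue <= 12:
        return "4_1", 0      # node id, predicted class
    if currdue <= 105.38 and pastdue <= 90.36 and currdue > 12:
        return "4_2", 1
    if currdue <= 105.38 and pastdue > 90.36:
        return "3_2", 0
    return None, None        # nodes not shown in the excerpt

print(score(10, 50))    # lands in node 4_1
print(score(50, 50))    # lands in node 4_2
```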
— 28 —
Scorability

[Figure: training data plotted on X1 vs X2 (both ranging 0–1), the fitted tree's regions, and a new case to be scored.]

Scoring Code Classifier: If x1 < .47 & x2 < .18, or x1 > .47 & x2 > .29, then red.
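As a sketch, the slide's scoring rule as an executable function (the non-red class is labeled "blue" here only for concreteness; the slide states just the red rule):

```python
# The slide's scoring-code classifier; the rule groups as
# (x1 < .47 and x2 < .18) or (x1 > .47 and x2 > .29).
def classify(x1, x2):
    if (x1 < 0.47 and x2 < 0.18) or (x1 > 0.47 and x2 > 0.29):
        return "red"
    return "blue"   # stand-in label for the other class

print(classify(0.30, 0.10))   # red
print(classify(0.60, 0.50))   # red
print(classify(0.30, 0.50))   # blue
```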
1st Data set: Titanic

Titanic survival data, available on the web: 1,313 observations, but due to missing "age" values, 756 complete observations out of the 1,313 total. Below, the variables available for analysis (the "*" variables are transformations to "help" the logistic):

| Variable Name | Variable Label | Variable Length |
|---|---|---|
| AGE | Yrs of Age | 8 |
| AGESEX | Age * Sex | 8 |
| AGESQ | Age * Age | 8 |
| PASSCLASS1 | First Class | 8 |
| PASSCLASS2 | Second Class | 8 |
| PASSCLASS3 | Third Class | 8 |
| SEX | Sex Female = 1 | 8 |
| SURVIVED | Survived = 1 | 8 |
Original Data

Counts and % of all 1,313 observations, by age availability, gender, and passenger class:

| Age | Gender | Class | Did Not Survive (Count / % Obs) | Survived (Count / % Obs) | Total Count |
|---|---|---|---|---|---|
| Present | female | 1st | 5 / 0.38 | 96 / 7.31 | 101 |
| | | 2nd | 10 / 0.76 | 75 / 5.71 | 85 |
| | | 3rd | 56 / 4.27 | 46 / 3.50 | 102 |
| | male | 1st | 82 / 6.25 | 43 / 3.27 | 125 |
| | | 2nd | 106 / 8.07 | 21 / 1.60 | 127 |
| | | 3rd | 184 / 14.01 | 32 / 2.44 | 216 |
| | All | | 443 / 33.74 | 313 / 23.84 | 756 |
| Missing | female | 1st | 4 / 0.30 | 38 / 2.89 | 42 |
| | | 2nd | 3 / 0.23 | 19 / 1.45 | 22 |
| | | 3rd | 76 / 5.79 | 34 / 2.59 | 110 |
| | male | 1st | 38 / 2.89 | 16 / 1.22 | 54 |
| | | 2nd | 42 / 3.20 | 4 / 0.30 | 46 |
| | | 3rd | 257 / 19.57 | 26 / 1.98 | 283 |
| | All | | 420 / 31.99 | 137 / 10.43 | 557 |
| All | | | 863 / 65.73 | 450 / 34.27 | 1313 |
— 33 —

Complete Data (w/o missing Age)

Data description: SURVIVED. Cells show Count / % Total Obs / % of Gender / Mean Age.

| Gender | Class | Did Not Survive | Survived | Total (Count / %) |
|---|---|---|---|---|
| female | 1st | 5 / 0.66 / 1.28 / 35.20 | 96 / 13.70 / 39.60 / 37.91 | 101 / 13.36 |
| female | 2nd | 10 / 1.32 / 3.28 / 31.40 | 75 / 9.92 / 21.92 / 26.85 | 85 / 11.24 |
| female | 3rd | 56 / 7.41 / 9.27 / 23.82 | 46 / 6.08 / 11.37 / 23.72 | 102 / 13.49 |
| female | Total | 71 / 9.39 / 13.82 / 24.90 | 217 / 28.70 / 73.89 / 30.87 | 288 / 38.10 |
| male | 1st | 82 / 10.85 / 26.66 / 44.84 | 43 / 5.69 / 16.03 / 34.25 | 125 / 16.53 |
| male | 2nd | 106 / 14.02 / 24.36 / 31.70 | 21 / 3.78 / 3.39 / 14.84 | 127 / 16.80 |
| male | 3rd | 184 / 24.34 / 36.16 / 27.10 | 32 / 4.23 / 7.69 / 23.09 | 216 / 28.57 |
| male | Total | 372 / 49.21 / 87.18 / 33.32 | 96 / 13.70 / 27.11 / 25.95 | 468 / 61.90 |
| Total | | 443 / 58.60 / 100.00 / 31.13 | 313 / 41.40 / 100.00 / 29.36 | 756 / 100.00 |

[The slide highlights Pr (Fem | Surv) and Pr (Surv & Fem) in the table.]
![Page 34: Comparative Review of Classification Treesnymetro.chapter.informs.org/prac_cor_pubs/05-13... · 1) Trees/CART: varieties, algorithm 2) Model Deployment: scoring 3) Examples. 4) Concluding](https://reader034.fdocuments.net/reader034/viewer/2022050423/5f9248efb7967568e735d8b8/html5/thumbnails/34.jpg)
— 34 —
Logistic vs. Trees: Titanic, no missing values.

Consistency information: 756 obs ('informs.titanic_no_missing'); 7 variables, 3 continuous and 4 class.

Trees used 3 variables, Forest 5, while Stepwise used 5 plus the intercept.

| Var Sel Type | Variable | Estimate | Pr > Chi-Square | Importance | # Rules |
|---|---|---|---|---|---|
| Forest | AGE | | | 0.546 | 493 |
| Forest | PASSCLASS1 | | | 0.400 | 66 |
| Forest | PASSCLASS2 | | | 0.248 | 58 |
| Forest | PASSCLASS3 | | | 0.465 | 59 |
| Forest | SEX | | | 1.000 | 97 |
| Stepwise | AGE | -0.039177938 | 0.000 | | |
| Stepwise | PASSCLASS1 | 1.2919799232 | 0.000 | | |
| Stepwise | PASSCLASS3 | -1.229467857 | 0.000 | | |
| Stepwise | SEX | 2.631357225 | 0.000 | | |
| Stepwise | Intercept | -0.163634963 | 0.550 | | |
| Trees | AGE | | | 0.408 | 1 |
| Trees | PASSCLASS3 | | | 0.556 | 2 |
| Trees | SEX | | | 1.000 | 1 |
![Page 35: Comparative Review of Classification Treesnymetro.chapter.informs.org/prac_cor_pubs/05-13... · 1) Trees/CART: varieties, algorithm 2) Model Deployment: scoring 3) Examples. 4) Concluding](https://reader034.fdocuments.net/reader034/viewer/2022050423/5f9248efb7967568e735d8b8/html5/thumbnails/35.jpg)
— 35 —
Training: rates ('-' ==> misclassification & mis-precision)

| Survived | Model | Class Rate (pred 0) | Prec Rate (pred 0) | Class Rate (pred 1) | Prec Rate (pred 1) | Overall Class Rate |
|---|---|---|---|---|---|---|
| 0 | FOREST | 96.39 | 78.35 | -3.61 | -7.58 | 96.39 |
| 0 | LOGISTIC_STEPWISE | 83.97 | 80.35 | -16.03 | -24.23 | 83.97 |
| 0 | TREES | 96.61 | 77.12 | -3.39 | -7.46 | 96.61 |
| 1 | FOREST | -37.70 | -21.65 | 62.30 | 92.42 | 62.30 |
| 1 | LOGISTIC_STEPWISE | -29.07 | -19.65 | 70.93 | 75.77 | 70.93 |
| 1 | TREES | -40.58 | -22.88 | 59.42 | 92.54 | 59.42 |

Overall (precision 0 / precision 1 / overall classification rate): FOREST 78.35 / 92.42 / 82.28; LOGISTIC_STEPWISE 80.35 / 75.77 / 78.57; TREES 77.12 / 92.54 / 81.22.

Trees have the highest classification rate for non-survivors (96.61%) and an excellent precision rate for survivors (92.54%). Forest comes in a close second.
![Page 36: Comparative Review of Classification Treesnymetro.chapter.informs.org/prac_cor_pubs/05-13... · 1) Trees/CART: varieties, algorithm 2) Model Deployment: scoring 3) Examples. 4) Concluding](https://reader034.fdocuments.net/reader034/viewer/2022050423/5f9248efb7967568e735d8b8/html5/thumbnails/36.jpg)
— 36 —
Gains Table

| Percentile | Model | Event Rate | Cum Event Rate | % Events Captured | Cum % Events Captured | Lift | Cum Lift | Brier Score * 100 |
|---|---|---|---|---|---|---|---|---|
| 5 | FOREST | 100.000 | 100.000 | 12.141 | 12.141 | 2.415 | 2.415 | 0.511 |
| 5 | LOGISTIC_STEPWISE | 94.737 | 94.737 | 11.502 | 11.502 | 2.288 | 2.288 | 5.111 |
| 5 | TREES | 95.119 | 95.119 | 11.548 | 11.548 | 2.297 | 2.297 | 0.000 |
| 10 | FOREST | 97.368 | 98.684 | 11.821 | 23.962 | 2.352 | 2.384 | 0.936 |
| 10 | LOGISTIC_STEPWISE | 97.368 | 96.053 | 11.821 | 23.323 | 2.352 | 2.320 | 1.023 |
| 10 | TREES | 91.935 | 93.527 | 11.161 | 22.709 | 2.221 | 2.259 | 0.000 |
| 15 | FOREST | 90.789 | 96.053 | 11.022 | 34.984 | 2.193 | 2.320 | 5.166 |
| 15 | LOGISTIC_STEPWISE | 89.474 | 93.860 | 10.863 | 34.185 | 2.161 | 2.267 | 2.837 |
| 15 | TREES | 91.935 | 92.997 | 11.161 | 33.871 | 2.221 | 2.246 | 0.000 |
| 20 | FOREST | 93.421 | 95.395 | 11.342 | 46.326 | 2.256 | 2.304 | 3.859 |
| 20 | LOGISTIC_STEPWISE | 86.842 | 92.105 | 10.543 | 44.728 | 2.098 | 2.225 | 3.951 |
| 20 | TREES | 91.935 | 92.731 | 11.161 | 45.032 | 2.221 | 2.240 | 0.000 |
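A quick sanity check on how the gains-table columns relate: lift is the percentile's event rate divided by the overall event rate, here the 313/756 survival rate of the modeled data:

```python
# Lift = percentile event rate / overall event rate. A 100% event rate in the
# top 5 percentile against the Titanic base rate reproduces the table's 2.415.
base_rate = 313 / 756                 # overall survival rate (~41.4%)
lift_top5 = 100.0 / (100 * base_rate)
print(round(lift_top5, 3))
```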
![Page 37: Comparative Review of Classification Treesnymetro.chapter.informs.org/prac_cor_pubs/05-13... · 1) Trees/CART: varieties, algorithm 2) Model Deployment: scoring 3) Examples. 4) Concluding](https://reader034.fdocuments.net/reader034/viewer/2022050423/5f9248efb7967568e735d8b8/html5/thumbnails/37.jpg)
— 37 —
Comparing the results.
1) Trees required fewer variables than logistic and are thus easier to interpret. Forests by definition use all the variables.
2) Trees obtained slightly larger lift measures than logistic, but who can beat forests?
3) Trees and Forest determine the most important variable, female sex, immediately at the top of the tree, while with logistic it is not as clear.
![Page 38: Comparative Review of Classification Treesnymetro.chapter.informs.org/prac_cor_pubs/05-13... · 1) Trees/CART: varieties, algorithm 2) Model Deployment: scoring 3) Examples. 4) Concluding](https://reader034.fdocuments.net/reader034/viewer/2022050423/5f9248efb7967568e735d8b8/html5/thumbnails/38.jpg)
— 38 —
2nd Data set: Surendra Financial Data

No information is available about meaning or measurement. All variables are named R1 – R84, plus one binary dependent variable, "Newgroup". There are no missing values; the missing values were somehow imputed, but the method was not reported.
![Page 39: Comparative Review of Classification Treesnymetro.chapter.informs.org/prac_cor_pubs/05-13... · 1) Trees/CART: varieties, algorithm 2) Model Deployment: scoring 3) Examples. 4) Concluding](https://reader034.fdocuments.net/reader034/viewer/2022050423/5f9248efb7967568e735d8b8/html5/thumbnails/39.jpg)
— 39 —
Data Mining Example: just fit a model.

Consistency information: 45,175 obs ('surendra.newsurendra'); 84 variables, all continuous, none categorical.

Financial information with target = "newgroup" and variable names R1 – R84, without any information as to what anything means. Forest was omitted from the exercise.
![Page 40: Comparative Review of Classification Treesnymetro.chapter.informs.org/prac_cor_pubs/05-13... · 1) Trees/CART: varieties, algorithm 2) Model Deployment: scoring 3) Examples. 4) Concluding](https://reader034.fdocuments.net/reader034/viewer/2022050423/5f9248efb7967568e735d8b8/html5/thumbnails/40.jpg)
— 40 —
The larger 'p' (the number of variables) is, the more difficult the fitted tree is to interpret.
![Page 41: Comparative Review of Classification Treesnymetro.chapter.informs.org/prac_cor_pubs/05-13... · 1) Trees/CART: varieties, algorithm 2) Model Deployment: scoring 3) Examples. 4) Concluding](https://reader034.fdocuments.net/reader034/viewer/2022050423/5f9248efb7967568e735d8b8/html5/thumbnails/41.jpg)
— 41 —
Gains Table

| Percentile | Model | Event Rate | Cum Event Rate | % Events Captured | Cum % Events Captured | Lift | Cum Lift | Brier Score * 100 |
|---|---|---|---|---|---|---|---|---|
| 5 | LOGISTIC_STEPWISE | 92.873 | 92.873 | 37.391 | 37.391 | 7.477 | 7.477 | 5.043 |
| 5 | TREES | 98.639 | 98.639 | 39.712 | 39.712 | 7.942 | 7.942 | 0.000 |
| 10 | LOGISTIC_STEPWISE | 49.270 | 71.071 | 19.836 | 57.227 | 3.967 | 5.722 | 25.486 |
| 10 | TREES | 50.476 | 74.558 | 20.322 | 60.034 | 4.064 | 6.003 | 0.000 |
| 15 | LOGISTIC_STEPWISE | 30.235 | 57.459 | 12.173 | 69.399 | 2.434 | 4.626 | 20.962 |
| 15 | TREES | 12.772 | 53.962 | 5.142 | 65.176 | 1.028 | 4.345 | 0.000 |
| 20 | LOGISTIC_STEPWISE | 19.088 | 47.869 | 7.681 | 77.081 | 1.537 | 3.854 | 15.217 |

Logistic selected 49 variables, Trees 12; 10 of the 12 were also used by Logistic.
![Page 42: Comparative Review of Classification Treesnymetro.chapter.informs.org/prac_cor_pubs/05-13... · 1) Trees/CART: varieties, algorithm 2) Model Deployment: scoring 3) Examples. 4) Concluding](https://reader034.fdocuments.net/reader034/viewer/2022050423/5f9248efb7967568e735d8b8/html5/thumbnails/42.jpg)
— 42 —
Comparing the results.
1) Trees selected 12 and logistic 49 variables.
2) The initial split on R73 produced almost perfectly pure nodes. R73 was also selected by Stepwise, but Stepwise doesn’t stop fast enough.
3) Model performance, as evaluated by lift, favors Trees.
![Page 43: Comparative Review of Classification Treesnymetro.chapter.informs.org/prac_cor_pubs/05-13... · 1) Trees/CART: varieties, algorithm 2) Model Deployment: scoring 3) Examples. 4) Concluding](https://reader034.fdocuments.net/reader034/viewer/2022050423/5f9248efb7967568e735d8b8/html5/thumbnails/43.jpg)
— 43 —
Very quick: Trees vs. Gradient Boosting.
Task: classify into 5 segments. Tools: Trees and Gradient Boosting (different versions). Compare by classification, precision, and F1 rates.
Model descriptions

| Study Number | _MODEL_ |
|---|---|
| 1 | tree_equal_prob_CV_10 |
| 2 | tree_origl_prob_CV_10 |
| 3 | tree_origl_CV_10_5_split |
| 4 | tree_origl_CV_10_2_split_dec |
| 5 | Boost simple |
| 6 | Boost equal Probs |
| 7 | Custs_Boost dec matrix |
| 8 | Boost_equal_2nd_stage |
| 9 | Boost_orig_2nd_stage |

Models 1 through 4 are TREES; models 5 through 9 are BOOSTING.
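Since the comparison is by classification, precision, and F1 rates: F1 is the harmonic mean of precision and recall. A minimal sketch (the confusion counts below are illustrative stand-ins, not the study's):

```python
# F1 from a confusion count: tp = true positives, fp = false positives,
# fn = false negatives.
def f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1(80, 20, 20))   # precision = recall = 0.8, so F1 is also 0.8
```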
Quick summary conclusions for multi-classification.

1) All boosting methods are good, and just one of the trees competes with them in one case.
2) The performance of the boosting methods was similar, so not much model-specification search is required.
3) Boosting methods are very difficult to interpret.
4. Concluding Remarks
— 49 —
Different algorithms
1) Non-greedy algorithms and two- or three-step-ahead search.
2) Hybrid models, which combine regression and tree methods (not very popular after the 2000s).
3) Boosting or majority-voting methods, which generate a sequence of trees and classifications; the outcome is decided democratically.
4) Binned trees, in which split searches are conducted after discretizing all variables, thus allowing searches for possibly non-linear effects.
...
— 50 —
Avoid over-fitting / over-torturing the data, because...

Instead, in Sherlock Holmes' words: "I never guess. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts." (A Scandal in Bohemia)

We should not act as Mark Twain said: "Get your facts first, and then you can distort them as much as you please."
— 51 —
Let us not be in haste …
Method comparison by way of two examples does not imply general method superiority. There are many examples in the literature in which logistic regression performed better.
— 52 —
The End