“BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles
![Page 1: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/1.jpg)
BOF Trees Visualization, Zagreb, June 12, 2004
“BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles
Vesna Luzar-Stiffler, Ph.D., University Computing Centre and CAIR Research Centre, Zagreb, Croatia
Charles Stiffler, Ph.D., CAIR Research Centre, Zagreb
[email protected], [email protected]
![Page 2: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/2.jpg)
Outline
- Introduction/Background
  - Trees
  - Ensemble Trees
  - Visualization Tools
- Simulation Results
- Web Survey Results
- Conclusions/Recommendations
![Page 3: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/3.jpg)
Introduction / Background
Classification / Decision Trees
- Data mining (statistical learning) method for classification
- Invented twice:
  - Statistical community: Breiman, Friedman et al. (1984)
  - Machine Learning community: Quinlan (1986)
- Many positive features: interpretability, ability to handle data of mixed type and missing values, robustness to outliers, etc.
- Disadvantages:
  - Unstable vis-à-vis seemingly minor data perturbations
  - Low predictive power
![Page 4: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/4.jpg)
Introduction / Background
Possible improvements: Ensembles
- Bagging, i.e., bootstrapping trees (Breiman, 1996)
- Boosting, e.g., AdaBoost (Freund & Schapire, 1997)
- Random Forests (Breiman, 2001)
- Stacking, randomized trees, etc.
Advantage: improved prediction
Disadvantage: loss of interpretability (“black box”)
![Page 5: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/5.jpg)
Classification Tree
Let $\hat{f}(x)$ be the classification tree prediction at input $x$ obtained from the full “training” data $Z = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$.
![Page 6: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/6.jpg)
Bagging Classification Tree
Let $\hat{f}^{*b}(x)$ be the classification tree prediction at input $x$ obtained from the bootstrap sample $Z^{*b}$, $b = 1, 2, \ldots, B$.
Bagging estimate:
$$\hat{f}_{bag}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$$
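The bagging estimate can be illustrated with a minimal sketch. It assumes a single predictor and uses one-split trees (stumps) as a stand-in for full classification trees. The helper names `fit_stump`, `predict_stump` and `bagged_predict` are hypothetical and are not taken from the slides.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split tree on one predictor; returns (threshold, left_label, right_label)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Majority class on each side (ties go to class 1); an empty side defaults to 0.
        l_lab = int(2 * sum(left) >= len(left)) if left else 0
        r_lab = int(2 * sum(right) >= len(right)) if right else 0
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    return best[1:]

def predict_stump(stump, x):
    t, l_lab, r_lab = stump
    return l_lab if x <= t else r_lab

def bagged_predict(xs, ys, x_new, B=50, seed=0):
    """Average the predictions of B stumps, each fitted on a bootstrap sample Z*b."""
    rng = random.Random(seed)
    n = len(xs)
    total = 0
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n cases with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        total += predict_stump(stump, x_new)
    return total / B  # share of bootstrap trees that predict class 1

# Toy data (assumed for illustration): class 1 whenever x >= 5.
xs = list(range(10))
ys = [0] * 5 + [1] * 5
p_low = bagged_predict(xs, ys, x_new=0)
p_high = bagged_predict(xs, ys, x_new=9)
```

With 0/1 class labels the average is the fraction of bootstrap trees voting for class 1, so thresholding it at 0.5 gives a majority-vote classifier.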
![Page 7: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/7.jpg)
Visualization tools
Graphs based on predictor “importances”
- (B×p) matrix F (p = # of predictors)
- For bagged trees, we take the average:
$$\hat{I}_k^2 = \frac{1}{B} \sum_{b=1}^{B} \hat{I}_k^2(T_b)$$
- Diagram 1: importance mean bar chart
- Diagram 2 (“BOF Clusters”): the cluster means chart (NEW)
- Diagram 3 (“BOF MDPREF”): the multidimensional preference bi-plot (NEW)
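As a small sketch of the averaging step behind Diagram 1, the mean importance of each predictor is the column mean of the B×p matrix F. The function names and the numbers in `F` below are made up for illustration and are not results from the talk.

```python
def mean_importance(F):
    """Column means of F, where F[b][k] is the squared importance of predictor k in tree b."""
    B = len(F)
    p = len(F[0])
    return [sum(row[k] for row in F) / B for k in range(p)]

def ranked(names, means):
    """Predictor names sorted from most to least important."""
    return [name for _, name in sorted(zip(means, names), reverse=True)]

# Hypothetical importances for B=4 bootstrap trees and p=3 predictors.
F = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [1.0, 0.0, 0.5],
    [1.0, 0.5, 0.0],
]
means = mean_importance(F)
order = ranked(["x1", "x2", "x3"], means)
```

The rows of F are the per-tree importance profiles, which is the input the cluster and MDPREF diagrams work from; only the mean bar chart collapses them to one number per predictor.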
![Page 8: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/8.jpg)
Visualization tools
Graphs based on proximity: (n×n) matrix P (n = # of cases)
- Diagram 4 (“Proximity Clusters”): the cluster means chart (Breiman, 2002)
- Diagram 5 (“Proximity MDS”): the multidimensional scaling plot of “similar” cases (Breiman, 2002)
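The slides do not define the proximity matrix. The sketch below assumes the usual random-forest definition, in which the proximity of two cases is the fraction of trees where both land in the same terminal node. The input layout `leaves[b][i]` and the function names are assumptions made for this example.

```python
def proximity_matrix(leaves):
    """leaves[b][i] is the terminal-node id of case i in tree b; returns the n x n matrix P."""
    B = len(leaves)
    n = len(leaves[0])
    counts = [[0] * n for _ in range(n)]
    for tree in leaves:
        for i in range(n):
            for j in range(n):
                if tree[i] == tree[j]:
                    counts[i][j] += 1  # cases i and j share a terminal node in this tree
    return [[c / B for c in row] for row in counts]

def dissimilarity(P):
    """1 - P, a common input for multidimensional scaling (an assumption, not from the slides)."""
    return [[1.0 - v for v in row] for row in P]

# Hypothetical terminal-node assignments for B=2 trees and n=3 cases.
leaves = [
    [0, 0, 1],
    [0, 1, 1],
]
P = proximity_matrix(leaves)
D = dissimilarity(P)
```

P is symmetric with ones on the diagonal, so D has zeros on the diagonal and can be passed to any MDS or clustering routine that accepts a dissimilarity matrix.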
![Page 9: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/9.jpg)
Simulation experiments
S1: Generate a sample of size n=30, two classes, and p=5 variables (x1-x5), with a standard normal distribution and pair-wise correlation 0.95. The responses are generated according to Pr(Y=1|x1≤0.5) = 0.2, Pr(Y=1|x1>0.5) = 0.8.
S2: Generate a sample of size n=30, two classes, and p=5 variables (x1-x5), with a standard normal distribution and pair-wise correlation 0.95 between x1 and x2, and 0 among other predictors. The responses are generated according to Pr(Y=1|x1≤0.5) = 0.2, Pr(Y=1|x1>0.5) = 0.8.
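One possible way to generate data matching S1 and S2 is sketched below. It assumes the correlation is induced through a shared normal factor, which is one of several valid constructions and is not stated on the slides. The function names `simulate` and `corr` are hypothetical.

```python
import math
import random

def simulate(n=30, p=5, rho=0.95, scenario="S1", seed=0):
    """Return (X, y) for scenario S1 (all pairs correlated) or S2 (only x1 and x2 correlated)."""
    if scenario not in ("S1", "S2"):
        raise ValueError("scenario must be 'S1' or 'S2'")
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # shared factor: gives unit variance and correlation rho
        k = p if scenario == "S1" else 2  # number of mutually correlated predictors
        row = [a * z + b * rng.gauss(0.0, 1.0) for _ in range(k)]
        row += [rng.gauss(0.0, 1.0) for _ in range(p - k)]  # independent predictors (S2 only)
        prob = 0.8 if row[0] > 0.5 else 0.2  # Pr(Y=1 | x1)
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y

def corr(u, v):
    """Sample Pearson correlation, used here only to sanity-check the generator."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((s - mu) * (t - mv) for s, t in zip(u, v))
    su = math.sqrt(sum((s - mu) ** 2 for s in u))
    sv = math.sqrt(sum((t - mv) ** 2 for t in v))
    return cov / (su * sv)

X_s1, y_s1 = simulate(scenario="S1")
X_s2, y_s2 = simulate(scenario="S2")
```

With n=30 the sample correlations vary noticeably from run to run, so checking the generator on a much larger n is more informative than inspecting a single small sample.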
![Page 10: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/10.jpg)
Diagram 1, Mean importance
S1 S2
![Page 11: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/11.jpg)
Diagram 2, “BOF Clusters”
S1 S2
![Page 12: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/12.jpg)
Diagram 3, “BOF MDPREF”
S1 S2
![Page 13: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/13.jpg)
Diagram 4, “Proximity Clusters”
S1 S2
![Page 14: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/14.jpg)
Web Survey data
- ICT infrastructure/usage in Croatian primary and secondary schools
- 25,000+ teachers (cases)
- 200+ variables
- Response: “classroom use of a computer by educators” (yes/no)
- Partition: 50% training, 25% validation, 25% test
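The 50/25/25 partition can be sketched as a random split of case indices. The slides do not say how the split was drawn, so the shuffling, the seed and the round figure of 25,000 cases are assumptions made for this example.

```python
import random

def partition(n, fractions=(0.5, 0.25, 0.25), seed=0):
    """Randomly split case indices 0..n-1 into training, validation and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(fractions[0] * n)
    n_valid = int(fractions[1] * n)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]  # remainder, so every case is used exactly once
    return train, valid, test

train, valid, test = partition(25000)
```

Giving the remainder to the test set keeps the three parts disjoint and exhaustive even when n is not divisible by four.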
![Page 15: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/15.jpg)
Initial tree (before bagging)
![Page 16: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/16.jpg)
Diagram 1, “Mean importance”
![Page 17: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/17.jpg)
Diagram 2, “BOF Clusters”
![Page 18: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/18.jpg)
Diagram 3, “BOF MDPREF”
![Page 19: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/19.jpg)
Bootstrap tree 11
![Page 20: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/20.jpg)
Bootstrap tree 22
![Page 21: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/21.jpg)
Bootstrap tree 12
![Page 22: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/22.jpg)
Clustering trees
![Page 23: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/23.jpg)
Diagram 5, “Proximity MDS”
![Page 24: “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles](https://reader036.fdocuments.net/reader036/viewer/2022062410/568157f2550346895dc56fe3/html5/thumbnails/24.jpg)
Conclusions / Recommendations
- There are software packages for trees
- There are some software packages for tree ensembles
- There are some visualization tools (old and new)
- The problem is that they are not “interfaced” (integrated)