Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Sergej PotapovMartin Theus
Simon Urbanek
TWIX
Trees WIth eXtra splits
2
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Motivation
• Where the classical CART algoritm fails– Greedy algorithms never go for a (locally) second best solution,
which would result in a better overall (global) solution.
Example: XOR-data
-3 -2 -1 0 1 2 3 4 5 6 7 8
0
>=3.532669
1
< 3.532669
2
1.437037
>=5.679445
y
< 3.151441
1
>=3.151441
2
1.290780
< 3.248369
x
>=2.703216
1
< 2.703216
2
1.806452
>=3.248369
x
1.532075
< 5.679445
y
1.500000
x
-3 -2 -1 0 1 2 3 4 5 6 7 8
-3
-2
-1
0
1
2
3
4
5
6
7
8
9
10
3
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Trees: More Problems
• Non-orthogonal splitting directions …
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
< -0.05426408
1
>=0.09766749
1
< 0.09766749
2
1
>=-0.05426408
V1
1
>=-0.5117182
V2
>=-1.571775
1
< -1.758858
1
>=-1.758858
2
1
< -1.571775
V2
1
< -0.8793875
V1
>=-0.8793875
2
1
< -0.5117182
V2
1
< 0.2145846
V1
< 1.029791
1
>=1.029791
2
2
>=0.8353015
V2
< 0.8353015
2
2
>=0.2145846
V1
1
V2
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
… can not be handled by single trees, no matter how we split
4
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Bagging Revisited
Bagging = Boostrap Aggegation, tries to simulate an infinite sam-
ple by bootstrapping, i.e. sampling from the original sample with
replacement.
Repeat N times:
1. Generate a bootstrap sample Di of size n.
2. Fit model f̂Di.
Depending on the problem the N results are aggregated:
• Classification: g(x) = argmaxc∈C
N∑
i=1
I(fDi(x) = c)
• Regression: g(x) =1
N
N∑
i=1
fDi(x)
5
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Ensembles
• General IdeaUse many “different” classifier and combine them to get more accurate results.
• Bagging: Instability of trees yields different models
• Random Forests: Restrict input space randomly to get wider range of models
• Boosting: Iterate to down-weight “bad” points
Question:Why use randomly generated (sub-optimal) models?
6
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Tree Mechanics
• CART is a recursive partitioning algorithm
• Each node is split according to the maximum gain in the loss function
• Mountain plots shows the loss function for a variable for all possible split points
Loss Functions Mountain Plot
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.1
0.2
0.3
0.4
0.5
p
Entropy
Gini
Missclassification
6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400
0
100
200
300
6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400
1
3
5
7
9
7
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Idea behind TWIX
• Since the greedy CART algorithm not necessarily finds the “optimal” tree, try second best splits.
• Use these forests for bagging
• Expect better results for both single trees and aggregations
• How to find “good” candidates for second best splits?
• Number of inner nodes grows exponentially with the number of levels in the tree
⇒ so does the number of alternative trees
Problems
8
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Second Best Splits: South African Heart Data
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesityDeviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
9
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Second Best Splits: Global vs. Local
• When searching for a “best” split point, we can either look for– all top n greatest deviance gains, or– only look for local maxima
• ExampleTop 6 splits
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesityDeviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
10
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Second Best Splits: Forcing Variables
• Often a single variable dominates the potential deviance gain, and shadows all other variables⇒ Many probably good split points are lost.
• Solution:Force a minimum number of split points for each variable.
• Example: top 6 vs. top 3
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobaccoDeviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
56
11
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Second Best Splits: Grid Search
• In some situations good split points might not even associated with some (local) maximum in deviance gain.
(Remember the XOR Example)
• Grid searches are most exhaustive, but also most expensive.
100 120 140 160 180 200
05
10
15
20
sbp
Deviance
0 5 10 15 20 25 30
05
10
15
20
tobacco
Deviance
2 4 6 8 10 12 14
05
10
15
20
ldl
Deviance
10 15 20 25 30 35 40
05
10
15
20
adiposity
Deviance
20 30 40 50 60 70
05
10
15
20
typea
Deviance
20 25 30 35 40 45
05
10
15
20
obesity
Deviance
0 20 40 60 80 100 120
05
10
15
20
alcohol
Deviance
20 30 40 50 60
05
10
15
20
age
Deviance
12
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Implementation: The Grid
• If we allow sj splits per node on level j of the tree, we get a max-imum of
S =k∏
i=1
s2i−1
i
trees for a tree with no more than k levels. Example:
s = (7, 4, 2) ⇒ S = 720
· 421
· 222
= 7 · 16 · 16 = 1792
⇒Work on a grid of computers
PVMController
13
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Implementation: The R-Package
TWIX {TWIX} R Documentation
Top Classification Trees
Description
Top Classification Trees
Usage
TWIX(formula, data = NULL, test.data = NULL, subset = NULL,
method = "deviance", topn.method = "complete", cluster = NULL,
minsplit = 20, minbucket = round(minsplit/3), Devmin = 0.01,
topN = 1, level = 6, st = 1, cl.level = 2, tol = 0.01, ...)
Arguments
formula formula of the form y ~ x1 + x2 + ..., where y must be a factor and x1,x2,... arenumeric.
data an optional data frame containing the variables in the model(training data).
test.data a data frame containing new data.
subset an optional vector specifying a subset of observations to be used.
method Which split points will be used? This can be "deviance" (default), "grid" or "local".If the method is set to: local the program uses the local maxima of the splitfunction(entropy), deviance all values of the entropy, grid grid points.
topn.method one of "complete"(default) or "single". A specification of the consideration of the splitpoints. If set to "complete" it uses split points from all variables, else it uses split pointsper variable.
cluster name of the cluster, if parallel computing will be used.
minsplit the minimum number of observations that must exist in a node.
minbucket the minimum number of observations in any terminal <leaf> node.
Devmin the minimum improvement on entropy by splitting.
topN integer vector. How many splits will be selected and at which level? If length 1, the samesize of splits will be selected at each level. If length > 1, for example topN=c(3,2), 3splits will be chosen at first level, 2 splits at second level and for all next levels 1 split.
level maximum depth of the trees. If level set to 1, trees consist of root node.
st step parameter for method "grid".
cl.level parameter for parallel computing.
tol parameter, which will be used, if topn.method is set to "single".
... further arguments to be passed to or from methods.
Value
a list with the following components :
call the call generating the object.
trees a list of all constructed trees, which include ID, Dev ... for each tree.
greedy.tree greedy tree
14
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
“Driving the Beast”
• The most important tuning parameters are– method
Which split points will be used? This can be "deviance" (default), "grid" or "local". If the method is set to: local the program uses the local maxima of the split function(entropy), deviance all values of the entropy, grid grid points.
– topn.methodone of "complete"(default) or "single". A specification of the consideration of the split points. If set to "complete" it uses split points from all variables, else it uses split points per variable.
– topNinteger vector. How many splits will be selected and at which level? If length 1, the same size of splits will be selected at each level. If length > 1, for example topN=c(3,2), 3 splits will be chosen at first level, 2 splits at second level and for all next levels 1 split.
– levelmaximum depth of the trees. If level set to 1, trees consist of root node.
– Stopping Rules:■ minsplit
the minimum number of observations that must exist in a node.■ minbucket
the minimum number of observations in any terminal <leaf> node.■ Devmin
the minimum improvement on entropy by splitting.
15
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
South African Heart Desease Data cont.
• To get a “fair”, i.e. generalizable and not too overfitted classifier, we usually split the data into 3 chunks:
– TrainingAll models are trained using the training data
– ValidationThe “best” model is selected using the validation data(The chosen model is then estimated with training+validation)
– TestThe performance is then assessed with the test data
Training Validation Test
What about trees?Model Structure = Model parameters
16
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
The Dataset
• 10 Variables, 462 Observations
• Target: Coronary Heart Disease (chd), 34,63% = 160 cases
• Inputs:continuous– sbp systolic blood pressure– tobacco cumulative tobacco (kg)– ldl low density lipoprotein cholesterol– adiposity– typea type-A behavior– obesity– alcohol current alcohol consumption– age age at onset
discrete– famhist family history of heart disease (Present, Absent)omi
tted
101 218101 218101 218
0.0
1.0
17
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
The Dataset: Univariate
sbp
18
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
The Dataset: Univariate
101 218
0.0
1.0
0 31.2
0.0
1.0
0.98 15.33
0.0
1.0
6.7 42.5
0.0
1.0
13 78
0.0
1.0
14.7 46.6
0.0
1.0
0 147.2
0.0
1.0
15 64
0.0
1.0
sbp tobacco ldl adiposity
typea obesity alcohol age
19
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
The Dataset:Bivariate
sbp
0 5 10 15 20 25 30 10 20 30 40 15 25 35 45 20 30 40 50 60
100
140
180
220
05
1015202530
tobacco
ldl
24
68
10
14
10
20
30
40
adiposity
typea
20
40
60
80
15
25
35
45
obesity
alcohol
050
100
150
100 140 180 220
20
30
40
50
60
2 4 6 8 10 14 20 40 60 80 0 50 100 150
age
20
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
The Dataset: Multivariate
sbp
tobacco
ldl
adiposity
typea
obesity
alcohol
age
sbp
tobacco
ldl
adiposity
typea
obesity
alcohol
age
21
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
The Competitors: On 100 random samples
• Logistic Regression
glm(response~., data=dataTrain, family="binomial")
• Traditional CART
rpart(response ~ ., data=dataTrain, parms=list(split='information'))
• Bagging
bagging(response~., data=dataTrain, nbagg=100)
• SVM
svm(response~.,data=dataTrain)
!! None of the methods has been further tuned !!
22
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Competitors Results: Error Rates
• Trad. Trees
• Bagging
• Support Vektor
Machines
• Log. Regression ! !
!
! !
log..reg
svm.te
bagging
rpart.te
0.25 0.30 0.35 0.40 0.45
23
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
TWIX: Diagnostics
• For a given “Multitree” we can compare deviance and classification rate on training and test/validation data.
Example:
!!!
!! !
! !!!
!!
!!
!
!
!
!
!!
!
!!
!
!!
!
!
!
!! !
!!
!!
!
!!
!!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!!
!
!
!!!
!
!
!
!
!!
!!
!
!
!!
!! !
!!
!
!
!! !
!
!!!
!!!
!!
!
!
!
!
!
!
!
!!
!!
!!!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
! !!
!
!!!
!
!!!
!!
!
!
!
!!
!!
!!
!!
!!
!!!
!!!
!
!!!!!
!
!
!!
!
!!!!
!!
!!
!!
!!
!
!
!
!!
!! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!!!
!!
!
!!
!
!!!!
!
!
!
!!!
!!
!!
!
!!
!
!
!!
!!!
!
!
!
!!
!!
!
!!
!
!
!
!
!
!
!
!
!
!!!
!!
!
!
!
!!
!
!
!
!!
!!
!
!
!!!
!
!!
!
!
!!!!!
!!!!
!
!!
!
! !!! !
!!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!!
! !!!
! !
!!
!!
!
!!
!
!
!!
! !
!
!!
!
!
!
!!!!
!!
!
!!
!!!!
!!
!
!
!
!!
!
!!
!
!
!!!!
!
!
!!!!
!!
!!
!!
!!!!!!
!
!!
!!
!
!
!
!
!!!!
!
!!
!
!
!
!!
!
!!!
!
!
!
!
!
!!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!!!!
!
!!
!!
! !!
!!
!
!
!
!!
!!
!
!!
!
!
!
!
!
!
!!
!
!!
!
!
!!
!!!!
!!
!!
!
!
!!!
!
!
!!
!
!
!
!
!
!!!
!
!!
!
!! !
!!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!!
!
!!
!!
!
!!
!
!!
!
!
!
!!
!
!
!!!!!!!!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!!
!
!
!!
!!
!!
!
!
!
!
!!
!!
!
!
!!
!
!
!!
!
!
!
!
!
!
!!!
!!
!!!
!
!!
!
!
!!
!!!!
!
! !
!
!
!
!
!
!
!!!
!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!!!
!!!!
!!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!!!
!
!
!
!
!!
!!
!!
!
!
!
!
!
!
!
!!
!!
!!!
!!
!!
!!
!
!
!!!!
!!
!!
!
!
!!
!!
!
!
!!
!
!!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!! !
!!
!!
!!!!!
!!
!!
!!
!
!
!
!
!
!
!!
!
!!!!!!
!
!
!
!
!!
!!
!
!
!
!!
!
!!!!
!
!
!!!
!!
!
!!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!!
!
!!
!
! !
!
!
!
!
!
!
!
!
!
50 100 150 200
10
15
20
25
30
35
40
Training Deviance
!
Deviances of trees (n= 870 )
50 100 150 200
050
100
150
!
!
! !
! !
! ! !
! !!
! ! !
! !! ! !! !!
! ! ! ! ! !
! ! !!! ! !! !! !! !
!! !! !!! ! ! ! !!!!
!! !!!! !!!! !! ! !
! ! !!!!! ! !!!!!!!!!!!!!! ! ! !
!!!!!!!!!!!! !!! !
!!! !!!!!!!!!!!!!!!! !! !!!!!!!!!!
!! !! ! ! !!!!!!!!!!!!!!!!
!! !! !!!!!!!!!!!!!!!! !
!!!!!!!!!!!!!! !!! !
!! !!!!!!!!!!!!! !
! !!!!!!!!!!!!!!!!!!!!! !!
!!!!!!!!!! !!!! !!! !!!
!!! !!!!!!!!!!!!!!!! !
! !! ! ! !! !!!!! !
! ! !! !!!!!!! !!!!!!! !!! ! ! !! ! !! ! !
!! !! ! !! ! !
! ! !
! ! ! !
!! !
0.65 0.70 0.75 0.80 0.85
0.6
50
.70
0.7
50
.80
0.8
5
CCR of 870 trees
CCR of training data
CC
R o
f te
tst
da
ta
!
+rpart Tree
24
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
TWIX: Tree Selection
• The CCR (Correct Classification Rate) of the top TWIX trees are far better than those of greedy trees and most other classification methods
Quest:How to find the “best” trees from the validation data?
• Currently:
Sort trees according to– training deviance– test deviance– training CCR– test CCR
and pick the best!
25
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
TWIX: Results
• Local Maxima across all variables (50 samples)
– 1,1
– 2,1
– 4,2
– 6,3
– 8,4
– 10,5 !! ! !
!! !!! !
!!! !! ! !
!!! !
X10.5
X8.4
X6.3
X4.2
X2.1
X1.1
0.25 0.30 0.35 0.40 0.45
26
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
TWIX: Results
• Local Maxima, within all variables (50 samples)
– 1,1
– 2,1
– 4,2
– 6,3
– 8,4
– 10,5 !
!
!
! !!
!
X10.5
X8.4
X6.3
X4.2
X2.1
X1.1
0.25 0.30 0.35 0.40 0.45
27
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
TWIX: Results
• 100 runs of a (10,5,2) tree
!
!!
! !! !
! !!!
! !
!
! !
!
rpart.tr
TWIX.Tr
log..reg
TWIX.Bag
bagging
0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45
rpart (training)
svm (training)
TWIX (training)
TWIX (validation)
logistic (test)
svm (test)
TWIX bag (test)
TWIX (test)
Bagging (test)
rpart (test)
28
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Bagged TWIX
• What bagging should look like:
#trees
erro
r ra
te
29
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Bagged TWIX: Example
• Bagging TWIX trees does often not improve the CCR.
0 50 100 150 200 250 300
0.30
0.35
0.40
Index
ps
error of single trees
0 10 20 30 40 50
0.30
0.35
0.40
Index
bagseq
error of bagged trees
30
Martin Theus, Sergej Potapov – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse, Universität AugsburgSimon Urbanek – AT&T Research Labs, Florham Park, NJ
TWIX – Trees WIth eXtra splits
Conclusion
• For the South-African-Heart Data– Single TWIX-trees out-perform traditional tree and usually bagged trees
– Bagged TWIX beats bagged trees and reaches top performance
– TWIX gives good single alternative tree models
• Still much room for performance improvement– Stopping rules and pruning– Better tree selection– Improved selection of second best splits
– More tests on more datasets– Better understanding of the tree-families
• Computational effort is high, but parallel computing speeds up dramatically
• Complex methods are hard to implement and hard to test
Top Related