Introduction to Machine Learning Fall 2013 Decision Trees Koby Crammer Department of EE Technion...
Introduction to Machine Learning, Fall 2013
Decision Trees
Koby Crammer, Department of EE
Technion
Most figures courtesy of Ben Taskar z”l
Course outline
Supervised
Unsupervised
supervised
Parameter Estimation
Decision Tree
Regression
Bayesian Reasoning
Classification
Boosting
Nearest Neighbor
Theory
Regularization
Linear
Mainly Generative Models
Mainly Discriminative Models
Material
Section 9.5.2 Section 9.2
Outline
• Example and inference (8.1)
• Tree learning (8.2)
• Impurity (8.3)
• Issues (8.4)
• Regression (8.5)
Usage
• http://research.microsoft.com/pubs/145347/CVPR%202011%20-%20Final%20Video.mp4
• http://www.slate.com/articles/news_and_politics/politics/2010/08/can_rangel_hold_on.html
Example and inference (8.1)
example
Example Regression (HTF, 2001)
Building decision trees (8.2)
• Input to algorithm: a training sample $\{(x_k, y_k)\}_{k=1}^{n}$ with $x_k \in \mathbb{R}^d$
• Output: tree
• Q: can we fit a tree to any sample?
• Goals:
  – accuracy
  – size (simplicity, generalization)
Approach
• Top-down
  – Start from the root
• Greedy / myopic search
  – One node at a time
• Main question:
  – Given a tree, how to grow it
  – In other words, choose a feature and a splitting criterion
example
Intuition
Feature a (branches A1, A2): {8,12} is split into {8,0} and {0,12}
Feature b (branches B1, B2): {8,12} is split into {0,0} and {8,12}
Intuition II
Feature c (branches C1, C2): {8,12} is split into {4,6} and {4,6}
Feature d (branches D1, D2): {8,12} is split into {2,3} and {6,9}
Feature e (branches E1, E2, E3): {8,12} is split into {2,3}, {3,5} and {3,4}
mpg   cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
Bad   8          350           150         4699    14.5          74         America
Bad   8          400           170         4746    12            71         America
Bad   8          400           175         4385    12            72         America
Bad   6          250           72          3158    19.5          75         America
Bad   8          304           150         3892    12.5          72         America
Bad   8          350           145         4440    14            75         America
Bad   6          250           105         3897    18.5          75         America
Bad   6          163           133         3410    15.8          78         Asia
Bad   8          260           110         4060    19            77         America
Bad   8          305           130         3840    15.4          79         America
Bad   6          250           110         3520    16.4          77         America
Bad   6          258           95          3193    17.8          76         America
Bad   4          121           112         2933    14.5          72         Asia
Bad   6          225           105         3613    16.5          74         America
Bad   4          121           112         2868    15.5          73         Asia
Bad   6          225           95          3264    16            75         America
Bad   6          200           85          2990    18.2          79         America
OK    4          121           98          2945    14.5          75         Asia
OK    6          232           90          3085    17.6          76         America
OK    4          120           97          2506    14.5          72         Europe
OK    4          151           85          2855    17.6          78         America
OK    4          116           75          2158    15.5          73         Asia
OK    4          119           97          2545    17            75         Europe
OK    6          146           120         2930    13.8          81         Europe
OK    4          116           81          2220    16.9          76         Asia
OK    4          156           92          2620    14.4          81         America
OK    4          140           88          2870    18.1          80         America
OK    4          97            60          1834    19            71         Asia
OK    4          134           95          2560    14.2          78         Europe
OK    4          97            75          2171    16            75         Europe
OK    4          97            78          1940    14.5          77         Asia
OK    4          98            83          2219    16.5          74         Asia
Good  4          79            70          2074    19.5          71         Asia
Good  4          91            68          1970    17.6          82         Europe
Good  4          89            71          1925    14            79         Asia
Good  4          83            61          2003    19            74         Europe
Good  4          112           88          2395    18            82         America
Good  4          81            60          1760    16.1          81         Europe
Good  4          135           84          2370    13            82         America
Good  4          105           63          2125    14.7          82         America
Bad   4          135           84          2370    13            82         America
Bad   4          105           63          2125    14.7          82         America
Stage 1
Stage 2
Impurity (8.3)
• Given a set (training set or subset of it): $S = \{(x_k, y_k)\}_{k=1}^{N}$, with $y_k \in \{c_1, \dots, c_K\}$
• Denote the empirical distribution of labels: $\hat{p}_j = \frac{1}{N} \sum_{k=1}^{N} I\{y_k = c_j\}$, $\hat{p} = (\hat{p}_1, \dots, \hat{p}_K)$
• Goal: measure the impurity of the distribution
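The empirical label distribution can be computed directly from the label counts. The following is a minimal sketch; the function name `empirical_distribution` and the label symbols "+" and "-" are illustrative choices, and the 8/12 counts are borrowed from the {8,12} set of the Intuition slides.

```python
from collections import Counter

def empirical_distribution(labels):
    """Return {class c_j: p_hat_j}, the fraction of labels equal to c_j."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return {c: counts[c] / n for c in counts}

# A set with 8 labels of one class and 12 of the other (illustrative symbols).
labels = ["+"] * 8 + ["-"] * 12
p_hat = empirical_distribution(labels)
print(p_hat)
```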
Impurity functions
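• Bayes-optimal error: $Q(\hat{p}) = 1 - \max_{j \in \{1, \dots, K\}} \hat{p}_j$
• Gini index: $Q(\hat{p}) = \sum_j \hat{p}_j (1 - \hat{p}_j)$
• Entropy: $Q(\hat{p}) = H(\hat{p}) = -\sum_j \hat{p}_j \log_2 \hat{p}_j = \sum_j \hat{p}_j \log_2 \frac{1}{\hat{p}_j}$
• Properties:
  – For a point distribution: $Q(\hat{p}) = 0$
  – For the uniform distribution: $Q(\hat{p})$ is maximal
• Notation: $Q(S) = Q(\hat{p}_S)$, where $\hat{p}_S$ is the empirical label distribution of the set $S$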
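As a sketch of the three impurity functions, the snippet below takes a distribution as a list of probabilities and evaluates each function on a point distribution and on the uniform distribution over two classes. The function names are illustrative, and the convention $0 \log 0 = 0$ is assumed for the entropy.

```python
import math

def misclassification(p):
    """Bayes-optimal error: 1 - max_j p_j."""
    return 1.0 - max(p)

def gini(p):
    """Gini index: sum_j p_j (1 - p_j)."""
    return sum(pj * (1.0 - pj) for pj in p)

def entropy(p):
    """Entropy in bits; terms with p_j = 0 are skipped (0 log 0 = 0)."""
    return -sum(pj * math.log2(pj) for pj in p if pj > 0)

point = [1.0, 0.0]     # all mass on one class
uniform = [0.5, 0.5]   # two equally likely classes

for name, q in [("misclassification", misclassification),
                ("gini", gini), ("entropy", entropy)]:
    print(name, q(point), q(uniform))
```

All three functions vanish on the point distribution and reach their maximum on the uniform one, matching the two properties listed above.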
illustration
(Figure: impurity Q(p) as a function of p1 for two classes, both axes from 0 to 1. Curves: misclassification, Gini, entropy, 0.5*entropy.)
Information of a split
• Pick a node, with a set S of size N
• Compute the impurity of the set, Q(S)
• Pick a criterion A
• Split the set S into M subsets $\{S_m : m = 1, 2, \dots, M\}$
• The average impurity of these sets is $Q(S \mid A) = \sum_{m=1}^{M} \frac{|S_m|}{N} Q(S_m)$
• Reduction of impurity (or increase of purity): $\Delta Q(S \mid A) = Q(S) - Q(S \mid A)$
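To make the reduction concrete, the sketch below evaluates $\Delta Q(S \mid A)$ with the entropy impurity on the class counts from the Intuition slides (features a to e, each splitting the {8,12} set). The helper names are illustrative, and empty subsets such as {0,0} are assumed to contribute nothing to the average.

```python
import math

def entropy_from_counts(counts):
    """Entropy (bits) of the empirical distribution given per-class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def impurity_reduction(parent, children, impurity=entropy_from_counts):
    """Delta Q(S|A) = Q(S) - sum_m |S_m|/N * Q(S_m); empty subsets are skipped."""
    n = sum(parent)
    average = sum(sum(child) / n * impurity(child)
                  for child in children if sum(child) > 0)
    return impurity(parent) - average

parent = [8, 12]
splits = {
    "a": [[8, 0], [0, 12]],
    "b": [[0, 0], [8, 12]],
    "c": [[4, 6], [4, 6]],
    "d": [[2, 3], [6, 9]],
    "e": [[2, 3], [3, 5], [3, 4]],
}
reductions = {name: impurity_reduction(parent, children)
              for name, children in splits.items()}
for name, value in reductions.items():
    print(name, round(value, 4))
```

Feature a separates the classes completely, so its reduction equals Q(S). Features b, c and d leave the class proportions unchanged in every subset and give no reduction, while feature e gives only a very small one.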
Algorithm
• Pick the test A which maximizes $\Delta Q(S \mid A)$
• Q: how many values to consider?
• Lemma: $0 \le \Delta Q(S \mid A) \le Q(S)$
• (see code below)
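The lemma can be checked numerically. The sketch below draws random splits (random per-class counts for each subset) and verifies $0 \le \Delta Q(S \mid A) \le Q(S)$ for the three impurity functions. It is a sanity check under assumed helper names, not a proof; the lower bound relies on these impurity functions being concave in $\hat{p}$.

```python
import math
import random

def misclassification(p):
    return 1.0 - max(p)

def gini(p):
    return sum(q * (1.0 - q) for q in p)

def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)

def to_distribution(counts):
    n = sum(counts)
    return [c / n for c in counts]

def reduction(children, impurity):
    """Return (Delta Q(S|A), Q(S)) for non-empty subsets given as count lists."""
    num_classes = len(children[0])
    parent = [sum(child[j] for child in children) for j in range(num_classes)]
    n = sum(parent)
    average = sum(sum(child) / n * impurity(to_distribution(child))
                  for child in children)
    q_parent = impurity(to_distribution(parent))
    return q_parent - average, q_parent

def lemma_holds(trials=1000, seed=0, tol=1e-12):
    rng = random.Random(seed)
    for _ in range(trials):
        num_subsets, num_classes = rng.randint(2, 4), rng.randint(2, 5)
        children = [[rng.randint(0, 10) for _ in range(num_classes)]
                    for _ in range(num_subsets)]
        children = [child for child in children if sum(child) > 0]
        if not children:
            continue
        for impurity in (misclassification, gini, entropy):
            delta, q_parent = reduction(children, impurity)
            if not (-tol <= delta <= q_parent + tol):
                return False
    return True

print(lemma_holds())
```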
Algorithm
• Initialize: single leaf (what label?)
• Iterate:
  – Go over all leaves
  – Go over all features (d)
  – Go over all splitting values (N)
  – Pick the (leaf, feature, splitting value) that reduces impurity the most
  – Replace the leaf with:
    • a new node
    • two new leaves (their labels?)
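The loop above can be sketched in a few lines. This is a simplified recursive variant, not the exact procedure of the slide: each leaf is split independently instead of picking the single best (leaf, feature, value) triple per iteration, tests have the form x[f] > t with t a midpoint between consecutive observed values, leaves take the majority label, and growth stops when no split reduces the entropy. All names are illustrative; the seven rows are (cylinders, weight) pairs taken from the mpg table.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y):
    """Best test x[f] > t as (reduction, f, t), or None if no feature varies."""
    n, parent, best = len(y), entropy(y), None
    for f in range(len(X[0])):                      # go over all features
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):      # go over all splitting values
            t = (lo + hi) / 2
            no = [y[i] for i in range(n) if X[i][f] <= t]
            yes = [y[i] for i in range(n) if X[i][f] > t]
            average = len(no) / n * entropy(no) + len(yes) / n * entropy(yes)
            if best is None or parent - average > best[0]:
                best = (parent - average, f, t)
    return best

def grow(X, y, min_reduction=1e-12):
    """Return a leaf (majority label) or a node (f, t, no_subtree, yes_subtree)."""
    split = best_split(X, y)
    if split is None or split[0] <= min_reduction:
        return Counter(y).most_common(1)[0][0]
    _, f, t = split
    no = [i for i in range(len(y)) if X[i][f] <= t]
    yes = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            grow([X[i] for i in no], [y[i] for i in no], min_reduction),
            grow([X[i] for i in yes], [y[i] for i in yes], min_reduction))

def predict(tree, x):
    """Follow the tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, no_subtree, yes_subtree = tree
        tree = yes_subtree if x[f] > t else no_subtree
    return tree

# (cylinders, weight) for seven rows of the mpg table.
X = [[8, 4699], [6, 3158], [4, 2933], [4, 2945], [4, 2506], [4, 2074], [4, 1970]]
y = ["Bad", "Bad", "Bad", "OK", "OK", "Good", "Good"]
tree = grow(X, y)
print(tree)
print([predict(tree, x) for x in X])
```

Because growth stops only when no split reduces the impurity, the tree fits this small sample exactly; in practice one would add early stopping or pruning to control its size.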
Issues (8.4)
• Number of splits
• Missing features
• Preventing over-fitting
  – Early stopping
  – Pruning
• Optimality vs greediness (Rivest et al, 76)
Example: xor
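• Function: $y = \mathrm{sign}(x_1 x_2)$
• Tree with single node?
• Tree with two nodes

label  input
 1     (1,1)
 1     (-1,-1)
-1     (-1,1)
-1     (1,-1)

(Figure: tree with root test X1>0 and a test X2>0 on each branch. X1>0 yes, X2>0 yes: +1; X1>0 yes, X2>0 no: -1; X1>0 no, X2>0 yes: -1; X1>0 no, X2>0 no: +1.)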
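The xor example shows why greedy growth can fail: on these four points, a single test on either coordinate leaves both subsets perfectly mixed, so the entropy reduction is zero, while testing both coordinates in sequence classifies every point. A small sketch, with illustrative function names:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# (input, label) pairs for y = sign(x1 * x2).
points = [((1, 1), 1), ((-1, -1), 1), ((-1, 1), -1), ((1, -1), -1)]

def reduction_of_single_test(feature):
    """Entropy reduction of the test x[feature] > 0 on the four xor points."""
    labels = [label for _, label in points]
    yes = [label for x, label in points if x[feature] > 0]
    no = [label for x, label in points if x[feature] <= 0]
    n = len(points)
    average = len(yes) / n * entropy(yes) + len(no) / n * entropy(no)
    return entropy(labels) - average

def two_level_tree(x):
    """Root test X1>0, followed by a test X2>0 on each branch."""
    if x[0] > 0:
        return 1 if x[1] > 0 else -1
    return -1 if x[1] > 0 else 1

print(reduction_of_single_test(0), reduction_of_single_test(1))
print([two_level_tree(x) == label for x, label in points])
```

A learner that stops as soon as no single split reduces the impurity would therefore return a single leaf here, even though a small tree with zero training error exists.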
Regression (8.5)
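• Training set: $S = \{(x_i, y_i)\}_{i=1}^{N}$, with $y_i \in \mathbb{R}$
• Value of leaf
  – Replace the single (majority) label with the mean of the outputs: $\bar{y}(S) = \frac{1}{N} \sum_{i} y_i$
• Impurity of a leaf
  – Replace the discrete functions above with the variance: $Q(S) = \frac{1}{N} \sum_{i} (y_i - \bar{y}(S))^2$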
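For regression, the same split-selection machinery applies once the leaf value and impurity are swapped. The sketch below computes the mean and variance of a leaf and the variance reduction of one candidate split; the function names and the six output values are made up for illustration.

```python
def leaf_value(ys):
    """Mean of the outputs in the leaf."""
    return sum(ys) / len(ys)

def variance(ys):
    """Impurity of a leaf: average squared deviation from the leaf value."""
    mean = leaf_value(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def variance_reduction(subsets):
    """Q(S) minus the size-weighted average of Q(S_m) over the subsets."""
    all_ys = [y for subset in subsets for y in subset]
    n = len(all_ys)
    average = sum(len(subset) / n * variance(subset) for subset in subsets)
    return variance(all_ys) - average

# Hypothetical outputs falling on the two sides of a candidate split.
left = [1.0, 1.2, 0.8]
right = [3.0, 3.2, 2.8]
print(leaf_value(left), leaf_value(right))
print(variance(left + right), variance_reduction([left, right]))
```

Here the split separates low outputs from high ones, so almost all of the parent variance is removed.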