Introduction to Machine Learning Fall 2013 Decision Trees Koby Crammer Department of EE Technion Most figures courtesy of Ben Taskar z”l

Page 1:

Introduction to Machine Learning

Fall 2013

Decision Trees

Koby Crammer

Department of EE

Technion

Most figures courtesy of Ben Taskar z”l

Page 2:

Course outline

Supervised

Unsupervised

Page 3:

supervised

Parameter Estimation

Decision Tree

Regression

Bayesian Reasoning

Classification

Boosting

Nearest Neighbor

Theory

Regularization

Linear

Mainly Generative Models

Mainly Discriminative Models

Page 4:

Material

Section 9.5.2 Section 9.2

Page 5:

Outline

• Example and inference (8.1)
• Tree learning (8.2)
• Impurity (8.3)
• Issues (8.4)
• Regression (8.5)

Page 7:

Example and inference (8.1)

Page 8:

example

Page 9:

Example Regression (HTF, 2001)

Page 10:

Building decision trees (8.2)

• Input to algorithm: $\{(x_k, d_k)\}_{k=1}^{n}$
• Output: tree

• Q: can we fit a tree to any sample?

• Goals:
  – accuracy
  – size (simplicity, generalization)

Page 11:

Approach

• Top-down
  – Start from the root

• Greedy / myopic search
  – One node at a time

• Main question:
  – Given a tree, how to grow it
  – In other words, choose a feature and a criterion

Page 12:

example

Page 13:

Intuition

Feature a (values A1, A2): {8,12} splits into {8,0} and {0,12}

Feature b (values B1, B2): {8,12} splits into {0,0} and {8,12}

Page 14:

Intuition II

Feature c (values C1, C2): {8,12} splits into {4,6} and {4,6}

Feature d (values D1, D2): {8,12} splits into {2,3} and {6,9}

Feature e (values E1, E2, E3): {8,12} splits into {2,3}, {3,5} and {3,4}

Page 15:

mpg   cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
Bad   8          350           150         4699    14.5          74         America
Bad   8          400           170         4746    12            71         America
Bad   8          400           175         4385    12            72         America
Bad   6          250           72          3158    19.5          75         America
Bad   8          304           150         3892    12.5          72         America
Bad   8          350           145         4440    14            75         America
Bad   6          250           105         3897    18.5          75         America
Bad   6          163           133         3410    15.8          78         Asia
Bad   8          260           110         4060    19            77         America
Bad   8          305           130         3840    15.4          79         America
Bad   6          250           110         3520    16.4          77         America
Bad   6          258           95          3193    17.8          76         America
Bad   4          121           112         2933    14.5          72         Asia
Bad   6          225           105         3613    16.5          74         America
Bad   4          121           112         2868    15.5          73         Asia
Bad   6          225           95          3264    16            75         America
Bad   6          200           85          2990    18.2          79         America
OK    4          121           98          2945    14.5          75         Asia
OK    6          232           90          3085    17.6          76         America
OK    4          120           97          2506    14.5          72         Europe
OK    4          151           85          2855    17.6          78         America
OK    4          116           75          2158    15.5          73         Asia
OK    4          119           97          2545    17            75         Europe
OK    6          146           120         2930    13.8          81         Europe
OK    4          116           81          2220    16.9          76         Asia
OK    4          156           92          2620    14.4          81         America
OK    4          140           88          2870    18.1          80         America
OK    4          97            60          1834    19            71         Asia
OK    4          134           95          2560    14.2          78         Europe
OK    4          97            75          2171    16            75         Europe
OK    4          97            78          1940    14.5          77         Asia
OK    4          98            83          2219    16.5          74         Asia
Good  4          79            70          2074    19.5          71         Asia
Good  4          91            68          1970    17.6          82         Europe
Good  4          89            71          1925    14            79         Asia
Good  4          83            61          2003    19            74         Europe
Good  4          112           88          2395    18            82         America
Good  4          81            60          1760    16.1          81         Europe
Good  4          135           84          2370    13            82         America
Good  4          105           63          2125    14.7          82         America
Bad   4          135           84          2370    13            82         America
Bad   4          105           63          2125    14.7          82         America

Page 16:

Stage 1

Page 18:

Stage 2

Page 26:

Impurity (8.3)

• Given a set (training set or subset of it): $S = \{(x_k, y_k)\}_{k=1}^{N}$, with $y_k \in \{c_1, \dots, c_K\}$

• Denote empirical distribution of labels: $\hat{p}_j = \frac{1}{N} \sum_{k=1}^{N} I\{y_k = c_j\}$, $\hat{p} = (\hat{p}_1, \dots, \hat{p}_K)$

• Goal: measure the impurity of the distribution

Page 27:

Impurity functions

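• Bayes-optimal error: $Q(\hat{p}) = 1 - \max_{j \in \{1, \dots, K\}} \hat{p}_j$
• Gini index: $Q(\hat{p}) = \sum_j \hat{p}_j (1 - \hat{p}_j)$
• Entropy: $Q(\hat{p}) = H(\hat{p}) = -\sum_j \hat{p}_j \log_2 \hat{p}_j = \sum_j \hat{p}_j \log_2 \frac{1}{\hat{p}_j}$

• Properties:
  – For point-distribution: $Q(\hat{p}) = 0$
  – For uniform distribution: $Q(\hat{p})$ is maximal

• Notation: $Q(S) = Q(\hat{p}(S))$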
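
The three impurity functions and their two properties can be checked numerically. The following is a minimal sketch, not code from the lecture; the function names and the toy label lists are assumptions made for illustration.

```python
import math
from collections import Counter

def empirical_distribution(labels):
    """Empirical label distribution p_hat, one probability per observed class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def misclassification(p):
    """Bayes-optimal error: 1 - max_j p_j."""
    return 1.0 - max(p)

def gini(p):
    """Gini index: sum_j p_j * (1 - p_j)."""
    return sum(pj * (1.0 - pj) for pj in p)

def entropy(p):
    """Entropy in bits: sum_j p_j * log2(1 / p_j), skipping zero-probability classes."""
    return sum(pj * math.log2(1.0 / pj) for pj in p if pj > 0)

# Hypothetical label sets: a point distribution and a uniform one over 3 classes.
point = empirical_distribution(["Bad"] * 5)
uniform = empirical_distribution(["Bad", "OK", "Good"])

for name, q in [("misclassification", misclassification), ("gini", gini), ("entropy", entropy)]:
    print(f"{name}: point = {q(point):.4f}, uniform = {q(uniform):.4f}")
```

All three functions return 0 on the point distribution and take their largest value on the uniform one, as the properties above state.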

Page 28:

illustration

[Figure: impurity Q(p) as a function of p1 for two classes, both axes from 0 to 1. Curves: misclassification, Gini, entropy, 0.5*entropy]

Page 29:

Information of a split

• Pick a node, with a set S of size N
• Compute the impurity of the set Q(S)
• Pick a criterion A
• Split the set S into M subsets $\{S_m : m = 1, 2, \dots, M\}$
• The average impurity of these sets is $Q(S|A) = \sum_{m=1}^{M} \frac{|S_m|}{N} Q(S_m)$

• Reduction of impurity (or increase of purity): $\Delta Q(S|A) = Q(S) - Q(S|A)$
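
As a worked example, the reduction of impurity can be computed for the five splits of the {8,12} set shown on the Intuition slides. This is a sketch under assumptions: entropy is chosen as the impurity, each subset is given as a pair of per-class counts, and the function names are illustrative.

```python
import math

def entropy_from_counts(counts):
    """Entropy (bits) of the empirical distribution given per-class counts."""
    n = sum(counts)
    return sum((c / n) * math.log2(n / c) for c in counts if c > 0)

def impurity_reduction(parent, children, impurity=entropy_from_counts):
    """Q(S) - sum_m |S_m|/N * Q(S_m); empty subsets contribute nothing."""
    n = sum(parent)
    average = sum(sum(child) / n * impurity(child)
                  for child in children if sum(child) > 0)
    return impurity(parent) - average

# Class counts of the subsets produced by features a to e on the Intuition slides.
splits = {
    "a": [(8, 0), (0, 12)],
    "b": [(0, 0), (8, 12)],
    "c": [(4, 6), (4, 6)],
    "d": [(2, 3), (6, 9)],
    "e": [(2, 3), (3, 5), (3, 4)],
}
gains = {name: impurity_reduction((8, 12), children)
         for name, children in splits.items()}
for name, gain in gains.items():
    print(f"feature {name}: reduction = {gain:.4f}")
```

Feature a separates the classes completely and recovers the full entropy of the parent. Features b, c and d leave the class proportions unchanged and give no reduction, while feature e gives only a very small one.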

Page 30:

Algorithm

• Pick the test A which maximizes $\Delta Q(S|A)$

• Q: how many values to consider?

• Lemma: $0 \le \Delta Q(S|A) \le Q(S)$

• (see code below)
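
One common answer to the question of how many values to consider, assumed here rather than taken from the slides, is that for a numeric feature only the midpoints between consecutive distinct sorted values matter, so at most N - 1 thresholds. The sketch below picks the best threshold under the Gini index; the data is a hypothetical toy sample and all names are illustrative.

```python
from collections import Counter

def gini_of(labels):
    """Gini index of the empirical distribution of a list of labels."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values (at most N - 1)."""
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]

def best_threshold(values, labels, impurity=gini_of):
    """Return (threshold, reduction) maximizing the reduction for 'value <= threshold'."""
    n = len(labels)
    parent = impurity(labels)
    best = (None, 0.0)
    for t in candidate_thresholds(values):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        average = len(left) / n * impurity(left) + len(right) / n * impurity(right)
        if parent - average > best[1]:
            best = (t, parent - average)
    return best

# Hypothetical toy sample: one numeric feature and a three-valued label.
cylinders = [8, 8, 6, 6, 4, 4, 4, 4]
mpg = ["Bad", "Bad", "Bad", "OK", "OK", "OK", "Good", "Good"]

threshold, reduction = best_threshold(cylinders, mpg)
print(candidate_thresholds(cylinders), threshold, reduction)
```

On this sample the reduction stays between 0 and Q(S), in line with the lemma.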

Page 31:

Algorithm

• Initialize: single leaf (what label?)
• Iterate:
  – Go over all leaves
  – Go over all features d
  – Go over all splitting values N
  – Pick the (leaf, feature, splitting value) that reduces impurity the most
  – Replace the leaf with:
    • new node
    • two new leaves (their label?)
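
The loop above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the lecture's code: the impurity is the Gini index, tests have the form "feature <= threshold", leaves are compared by their reduction weighted by leaf size, each leaf is labeled by majority vote, and `max_splits` is a hypothetical stopping parameter. The sample uses eight (cylinders, weight) rows from the mpg table.

```python
from collections import Counter

def gini(labels):
    """Gini index of the empirical label distribution."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def best_split(X, y, idx):
    """Best (reduction, feature, threshold, left, right) for one leaf, or None."""
    parent, best = gini([y[i] for i in idx]), None
    for f in range(len(X[0])):                      # go over all features
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):      # go over all splitting values
            t = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            average = (len(left) * gini([y[i] for i in left])
                       + len(right) * gini([y[i] for i in right])) / len(idx)
            if best is None or parent - average > best[0]:
                best = (parent - average, f, t, left, right)
    return best

def grow_tree(X, y, max_splits=10):
    root = {"idx": list(range(len(y)))}             # initialize: a single leaf
    leaves = [root]
    for _ in range(max_splits):
        scored = []
        for leaf in leaves:                         # go over all leaves
            split = best_split(X, y, leaf["idx"])
            if split is not None and split[0] > 0:
                # assumption: weight the reduction by the number of samples in the leaf
                scored.append((split[0] * len(leaf["idx"]), leaf, split))
        if not scored:
            break
        _, leaf, (_, f, t, left, right) = max(scored, key=lambda s: s[0])
        # replace the leaf with a new node and two new leaves
        leaf.update(feature=f, threshold=t, left={"idx": left}, right={"idx": right})
        leaves = [l for l in leaves if l is not leaf] + [leaf["left"], leaf["right"]]
    for leaf in leaves:                             # label each leaf by majority vote
        leaf["label"] = Counter(y[i] for i in leaf["idx"]).most_common(1)[0][0]
    return root

def predict(node, x):
    """Follow the tests from the root down to a leaf and return its label."""
    while "label" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]

# (cylinders, weight) and mpg label for eight rows of the table.
X = [[8, 4699], [8, 4746], [6, 3158], [4, 2945], [6, 3085], [4, 2506], [4, 2074], [4, 1970]]
y = ["Bad", "Bad", "Bad", "OK", "OK", "OK", "Good", "Good"]
tree = grow_tree(X, y)
print([predict(tree, x) for x in X])
```

On these rows the first split chosen is on weight, and two splits are enough to fit the sample exactly.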

Page 32:

Issues (8.4)

• Number of splits
• Missing features
• Prevent over-fitting
  – Early stopping
  – Pruning

• Optimality vs greediness (Rivest et al, 76)

Page 33:

Example: xor

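• Function: $y = \mathrm{sign}(x_1 x_2)$
• Tree with single node?
• Tree with two nodes

label   input
 1      (1,1)
 1      (-1,-1)
-1      (-1,1)
-1      (1,-1)

[Figure: decision tree. The root tests X1>0 and each of its two branches tests X2>0. Leaves: X1>0 yes, X2>0 yes: +1; X1>0 yes, X2>0 no: -1; X1>0 no, X2>0 yes: -1; X1>0 no, X2>0 no: +1]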
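
The xor example shows why greedy growth can struggle: no single test at the root reduces the impurity, yet a tree with a second level of tests fits the data exactly. A minimal sketch, assuming the Gini index as the impurity and illustrative function names:

```python
from collections import Counter

def gini(labels):
    """Gini index of the empirical label distribution."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

points = [(1, 1), (-1, -1), (-1, 1), (1, -1)]
labels = [1 if x1 * x2 > 0 else -1 for x1, x2 in points]   # y = sign(x1 * x2)

def reduction(feature):
    """Impurity reduction of the single root test x[feature] > 0."""
    yes = [y for x, y in zip(points, labels) if x[feature] > 0]
    no = [y for x, y in zip(points, labels) if x[feature] <= 0]
    n = len(labels)
    return gini(labels) - (len(yes) / n * gini(yes) + len(no) / n * gini(no))

def two_level_tree(x):
    """Root tests x1 > 0, then each branch tests x2 > 0."""
    if x[0] > 0:
        return 1 if x[1] > 0 else -1
    return -1 if x[1] > 0 else 1

print([reduction(f) for f in (0, 1)])
print(all(two_level_tree(x) == y for x, y in zip(points, labels)))
```

Both root tests leave one point of each label on each side, so the greedy criterion sees no benefit in either of them.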

Page 34:

Regression (8.5)

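• Training set: $S = \{(x_i, y_i)\}_{i=1}^{N}$, with $y_i \in \mathbb{R}$

• Value of leaf
  – Replace the single (majority) label with the average of the outputs: $\bar{y}(S) = \frac{1}{N} \sum_{i} y_i$

• Impurity of a leaf
  – Replace the discrete functions above with the variance: $Q(S) = \frac{1}{N} \sum_{i} (y_i - \bar{y}(S))^2$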
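
For regression, the same split search applies with the variance as the impurity and the mean as the leaf value. The sketch below handles a single numeric feature; the toy data, the midpoint thresholds and the function names are assumptions made for illustration.

```python
def leaf_value(ys):
    """Value of a regression leaf: the average of its outputs."""
    return sum(ys) / len(ys)

def variance(ys):
    """Impurity of a regression leaf: mean squared deviation from the leaf value."""
    mean = leaf_value(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def best_regression_split(xs, ys):
    """Return (threshold, reduction) maximizing the variance reduction for 'x <= threshold'."""
    n = len(ys)
    parent = variance(ys)
    distinct = sorted(set(xs))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        average = len(left) / n * variance(left) + len(right) / n * variance(right)
        if parent - average > best[1]:
            best = (t, parent - average)
    return best

# Hypothetical piecewise-constant data: two flat regions.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

threshold, reduction = best_regression_split(xs, ys)
print(threshold, reduction, leaf_value(ys[:3]), leaf_value(ys[3:]))
```

The best threshold falls between the two flat regions, where both children have zero variance and the reduction equals the variance of the whole set.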