
Machine Learning

Learning Decision Trees

Some slides from Tom Mitchell, Dan Roth and others

This lecture: Learning Decision Trees

1. Representation: What are decision trees?

2. Algorithm: Learning decision trees – The ID3 algorithm: A greedy heuristic

3. Some extensions

History of Decision Tree Research

• Full-search decision tree methods to model human concept learning: Hunt et al., 1960s, in psychology

• Quinlan developed the ID3 (Iterative Dichotomiser 3) algorithm, with the information gain heuristic, to learn expert systems from examples (late 70s)

• Breiman, Friedman and colleagues in statistics developed CART (Classification And Regression Trees)

• A variety of improvements in the 80s: coping with noise, continuous attributes, missing data, non-axis-parallel splits, etc.

• Quinlan's updated algorithms, C4.5 (1993) and C5, are more commonly used

• Boosting (or bagging) over decision trees is a very good general-purpose algorithm

Will I play tennis today?

• Features
  – Outlook: {Sun, Overcast, Rain}
  – Temperature: {Hot, Mild, Cool}
  – Humidity: {High, Normal, Low}
  – Wind: {Strong, Weak}

• Labels
  – Binary classification task: Y = {+, -}

Will I play tennis today?

Outlook:     S(unny), O(vercast), R(ainy)
Temperature: H(ot), M(edium), C(ool)
Humidity:    H(igh), N(ormal), L(ow)
Wind:        S(trong), W(eak)

     O  T  H  W  Play?
 1   S  H  H  W   -
 2   S  H  H  S   -
 3   O  H  H  W   +
 4   R  M  H  W   +
 5   R  C  N  W   +
 6   R  C  N  S   -
 7   O  C  N  S   +
 8   S  M  H  W   -
 9   S  C  N  W   +
10   R  M  N  W   +
11   S  M  N  S   +
12   O  M  H  S   +
13   O  H  N  W   +
14   R  M  H  S   -

Basic Decision Tree Learning Algorithm

• Data is processed in batch (i.e., all the data is available)
• Recursively build a decision tree top down:
  1. Decide what attribute goes at the top
  2. Decide what to do for each value the root attribute takes

Example tree for the tennis data:

Outlook?
  Sunny    → Humidity? (High: No, Normal: Yes)
  Overcast → Yes
  Rain     → Wind? (Strong: No, Weak: Yes)

Basic Decision Tree Algorithm: ID3

Input: S, the set of examples; Attributes, the set of measured attributes

ID3(S, Attributes):

1. If all examples have the same label:
   Return a single-node tree with that label

2. Otherwise:
   1. Create a Root node for the tree
   2. A = the attribute in Attributes that best classifies S
      (this is how we decide what attribute goes at the top)
   3. For each possible value v that A can take
      (this is what we do for each value the root attribute takes):
      1. Add a new tree branch corresponding to A = v
      2. Let Sv be the subset of examples in S with A = v
      3. If Sv is empty:
         Add a leaf node with the most common value of Label in S
         (why? for generalization at test time)
         Else: below this branch add the subtree ID3(Sv, Attributes - {A}),
         a recursive call to the ID3 algorithm with all the remaining attributes
   4. Return the Root node
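The pseudocode above translates almost directly into code. Below is a minimal Python sketch; the representation choices (examples as dicts, the tree as nested dicts, a pluggable choose_attribute heuristic, and a values map listing every value each attribute can take) are mine, not the slides'.

```python
from collections import Counter

def most_common_label(labels):
    return Counter(labels).most_common(1)[0][0]

def id3(examples, labels, attributes, values, choose_attribute):
    """examples: list of dicts; attributes: set of attribute names still
    available; values: dict mapping each attribute to all values it can take;
    choose_attribute: heuristic picking the attribute that best classifies S."""
    # 1. If all examples have the same label, return a single-node tree.
    if len(set(labels)) == 1:
        return labels[0]
    # If no attributes remain, return a majority-label leaf.
    if not attributes:
        return most_common_label(labels)
    # 2. Pick the attribute that best classifies the current examples.
    a = choose_attribute(examples, labels, attributes)
    root = {a: {}}
    for v in values[a]:  # every possible value of A, not just those seen in S
        sv = [(x, y) for x, y in zip(examples, labels) if x[a] == v]
        if not sv:
            # Empty Sv: majority label of S, for generalization at test time.
            root[a][v] = most_common_label(labels)
        else:
            root[a][v] = id3([x for x, _ in sv], [y for _, y in sv],
                             attributes - {a}, values, choose_attribute)
    return root
```

A choose_attribute based on information gain is sketched after the Information Gain slides below.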

Picking the Root Attribute

• Goal: Have the resulting decision tree be as small as possible (Occam's Razor)
  – But finding the minimal decision tree consistent with the data is NP-hard
• The recursive algorithm is a greedy heuristic search for a simple tree, and it cannot guarantee optimality
• The main decision in the algorithm is the selection of the next attribute to split on

Picking the Root Attribute

Consider data with two Boolean attributes (A, B):

<(A=0, B=0), ->:  50 examples
<(A=0, B=1), ->:  50 examples
<(A=1, B=0), ->:   0 examples
<(A=1, B=1), +>: 100 examples

What should be the first attribute we select?

• Splitting on A: we get purely labeled nodes (A=0 → -, A=1 → +).
• Splitting on B: we don't get purely labeled nodes (B=0 → -, but B=1 still mixes + and -, so we must split on A below it).

What if we instead have <(A=1, B=0), ->: 3 examples?

Picking the Root Attribute

Consider data with two Boolean attributes (A, B):

<(A=0, B=0), ->:  50 examples
<(A=0, B=1), ->:  50 examples
<(A=1, B=0), ->:   3 examples
<(A=1, B=1), +>: 100 examples

Which attribute should we choose? The trees look structurally similar:

• Splitting on B first: B=0 → - (53 examples); B=1 → split on A: A=0 → - (50), A=1 → + (100)
• Splitting on A first: A=0 → - (100 examples); A=1 → split on B: B=0 → - (3), B=1 → + (100)

Advantage: A. But… we need a way to quantify things.

Picking the Root Attribute

Goal: Have the resulting decision tree be as small as possible (Occam's Razor)

• The main decision in the algorithm is the selection of the next attribute for splitting the data
• We want attributes that split the examples into sets that are relatively pure in one label
  – This way we are closer to a leaf node
• The most popular heuristic is information gain, which originated with the ID3 system of Quinlan

Reminder: Entropy

The entropy (impurity, disorder) of a set of examples S with respect to binary classification is

$Entropy(S) = H(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-$

• The proportion of positive examples is $p_+$
• The proportion of negative examples is $p_-$

• If all examples belong to the same category, then entropy = 0
• If $p_+ = p_- = \frac{1}{2}$, then entropy = 1

In general, for a discrete probability distribution with K possible values, with probabilities $\{p_1, p_2, \ldots, p_K\}$, the entropy is given by

$H(p_1, p_2, \ldots, p_K) = -\sum_{k=1}^{K} p_k \log_2 p_k$

Entropy can be viewed as the number of bits required, on average, to encode information. If the probability of + is 0.5, a single bit is required for each example; if it is 0.8, we can use less than 1 bit.

The uniform distribution has the highest entropy.

High entropy: high level of uncertainty
Low entropy: low uncertainty

[Figure: bar charts of label distributions, from uniform (entropy 1) down to all-one-label (entropy 0)]
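As a quick sanity check of these claims, here is a minimal entropy function (a sketch using only the standard library; the function name is mine):

```python
from math import log2

def entropy(probs):
    """H(p1, ..., pK) = -sum_k pk * log2(pk), taking 0 * log2(0) as 0."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0   (uniform: highest uncertainty)
print(entropy([1.0]))       # 0.0   (all examples in one category)
print(entropy([0.8, 0.2]))  # ~0.72 (less than 1 bit on average)
```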

Picking the Root Attribute

Intuition: Choose the attribute that reduces the label entropy the most.

Information Gain

The information gain of an attribute A is the expected reduction in entropy caused by partitioning on this attribute:

$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$

where $S_v$ is the subset of examples in which the value of attribute A is set to value v.

The entropy of partitioning the data is calculated by weighting the entropy of each partition by its size relative to the original set.

– Partitions of low entropy (imbalanced splits) lead to high gain

Go back to check which of the A, B splits is better (see the sketch below).
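A sketch of this computation, applied to the earlier A/B example to settle which split is better. The helper names are my own; Counter and log2 come from the standard library:

```python
from collections import Counter
from math import log2

def entropy_of_labels(labels):
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_v |Sv|/|S| * Entropy(Sv)."""
    total = len(examples)
    gain = entropy_of_labels(labels)
    for v in set(x[attribute] for x in examples):
        sv = [y for x, y in zip(examples, labels) if x[attribute] == v]
        gain -= len(sv) / total * entropy_of_labels(sv)
    return gain

# The A/B data: 50 <(0,0),->, 50 <(0,1),->, 3 <(1,0),->, 100 <(1,1),+>
data   = ([{"A": 0, "B": 0}] * 50 + [{"A": 0, "B": 1}] * 50
        + [{"A": 1, "B": 0}] * 3  + [{"A": 1, "B": 1}] * 100)
labels = ["-"] * 103 + ["+"] * 100

print(information_gain(data, labels, "A"))  # ~0.90: splitting on A wins
print(information_gain(data, labels, "B"))  # ~0.32
```

Information gain quantifies the earlier intuition: A's split leaves one large, perfectly pure partition, so it gains far more than B.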

Will I play tennis today?

Recall the 14-example table above. The current entropy of the labels, with p = 9/14 positive and n = 5/14 negative, is

$H(Play?) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) \approx 0.94$

Information Gain: Outlook

Outlook = Sunny: 5 of 14 examples
  p = 2/5, n = 3/5, H_Sunny = 0.971
Outlook = Overcast: 4 of 14 examples
  p = 4/4, n = 0, H_Overcast = 0
Outlook = Rainy: 5 of 14 examples
  p = 3/5, n = 2/5, H_Rainy = 0.971

Expected entropy: (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 = 0.694

Information gain: 0.940 - 0.694 = 0.246

Information Gain: Humidity

Humidity = High: 7 of 14 examples
  p = 3/7, n = 4/7, H_High = 0.985
Humidity = Normal: 7 of 14 examples
  p = 6/7, n = 1/7, H_Normal = 0.592

Expected entropy: (7/14) × 0.985 + (7/14) × 0.592 = 0.7885

Information gain: 0.940 - 0.7885 = 0.1515

Which feature to split on?

Information gain:
  Outlook:     0.246
  Humidity:    0.151
  Wind:        0.048
  Temperature: 0.029

→ Split on Outlook (the computation is sketched below)
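These numbers can be reproduced with the entropy_of_labels and information_gain sketches above, on the tennis table transcribed as data (again, the representation is my own choice):

```python
rows = [  # (Outlook, Temperature, Humidity, Wind, Play?)
    ("S","H","H","W","-"), ("S","H","H","S","-"), ("O","H","H","W","+"),
    ("R","M","H","W","+"), ("R","C","N","W","+"), ("R","C","N","S","-"),
    ("O","C","N","S","+"), ("S","M","H","W","-"), ("S","C","N","W","+"),
    ("R","M","N","W","+"), ("S","M","N","S","+"), ("O","M","H","S","+"),
    ("O","H","N","W","+"), ("R","M","H","S","-"),
]
examples = [dict(zip("OTHW", r[:4])) for r in rows]
labels   = [r[4] for r in rows]

print(entropy_of_labels(labels))  # ~0.940
for key, name in [("O", "Outlook"), ("H", "Humidity"),
                  ("W", "Wind"), ("T", "Temperature")]:
    print(name, information_gain(examples, labels, key))
# Outlook ~0.247, Humidity ~0.152, Wind ~0.048, Temperature ~0.029
# (the slides' 0.246 and 0.151 come from rounding intermediate entropies)
```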

An Illustrative Example

Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029

Splitting on Outlook gives:

Outlook
  Sunny:    examples 1, 2, 8, 9, 11  (2+, 3-) → ?
  Overcast: examples 3, 7, 12, 13    (4+, 0-) → Yes
  Rain:     examples 4, 5, 6, 10, 14 (3+, 2-) → ?

Continue until:
• Every attribute is included in the path, or
• All examples in the leaf have the same label

An Illustrative Example

For the Sunny branch:

Gain(S_sunny, Humidity) = 0.97 - (3/5)·0 - (2/5)·0 = 0.97
Gain(S_sunny, Temperature) = 0.97 - 0 - (2/5)·1 = 0.57
Gain(S_sunny, Wind) = 0.97 - (2/5)·1 - (3/5)·0.92 = 0.02

Day  Outlook  Temperature  Humidity  Wind    Play Tennis?
 1   Sunny    Hot          High      Weak    No
 2   Sunny    Hot          High      Strong  No
 8   Sunny    Mild         High      Weak    No
 9   Sunny    Cool         Normal    Weak    Yes
11   Sunny    Mild         Normal    Strong  Yes
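The same check on the Sunny subset (days 1, 2, 8, 9, 11), reusing examples, labels, and information_gain from the sketches above:

```python
sunny_x = [x for x in examples if x["O"] == "S"]
sunny_y = [y for x, y in zip(examples, labels) if x["O"] == "S"]
for key, name in [("H", "Humidity"), ("T", "Temperature"), ("W", "Wind")]:
    print(name, information_gain(sunny_x, sunny_y, key))
# Humidity ~0.97, Temperature ~0.57, Wind ~0.02
# -> split the Sunny branch on Humidity
```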

An Illustrative Example

The final decision tree:

Outlook
  Sunny (1, 2, 8, 9, 11; 2+, 3-):   Humidity?
    High:   No
    Normal: Yes
  Overcast (3, 7, 12, 13; 4+, 0-):  Yes
  Rain (4, 5, 6, 10, 14; 3+, 2-):   Wind?
    Strong: No
    Weak:   Yes
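As a sketch, here is the finished tree in the nested-dict representation used in the ID3 sketch earlier, with a small prediction routine (names and representation are mine, not the slides'):

```python
tree = {"O": {"S": {"H": {"H": "-", "N": "+"}},  # Sunny -> test Humidity
              "O": "+",                          # Overcast -> Yes
              "R": {"W": {"S": "-", "W": "+"}}}} # Rain -> test Wind

def predict(tree, example):
    # Walk down the tree until we reach a leaf label.
    while isinstance(tree, dict):
        attribute = next(iter(tree))  # the attribute tested at this node
        tree = tree[attribute][example[attribute]]
    return tree

print(predict(tree, {"O": "S", "T": "H", "H": "N", "W": "W"}))  # '+'
```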

Hypothesis Space in Decision Tree Induction

• Searches over decision trees, which can represent all possible discrete functions (this has pros and cons)
• Goal: to find the best decision tree
• Finding a minimal decision tree consistent with a set of data is NP-hard
• ID3 performs a greedy heuristic search: hill climbing without backtracking
• Makes statistically based decisions using all the data

Summary: Learning Decision Trees

1. Representation: What are decision trees?
   – A hierarchical data structure that represents data

2. Algorithm: Learning decision trees – The ID3 algorithm: A greedy heuristic
   • If all the examples have the same label, create a leaf with that label
   • Otherwise, find the "most informative" attribute and split the data for the different values of that attribute
   • Recurse on the splits