Machine Learning
Learning Decision Trees
Some slides from Tom Mitchell, Dan Roth and others
This lecture: Learning Decision Trees
1. Representation: What are decision trees?
2. Algorithm: Learning decision trees – the ID3 algorithm, a greedy heuristic
3. Some extensions
History of Decision Tree Research
• Full-search decision tree methods to model human concept learning: Hunt et al., 1960s, in psychology
• Quinlan developed the ID3 (Iterative Dichotomiser 3) algorithm, with the information gain heuristic, to learn expert systems from examples (late 70s)
• Breiman, Friedman and colleagues in statistics developed CART (Classification And Regression Trees)
• A variety of improvements in the 80s: coping with noise, continuous attributes, missing data, non-axis-parallel splits, etc.
• Quinlan's updated algorithms, C4.5 (1993) and C5, are more commonly used
• Boosting (or bagging) over decision trees is a very good general-purpose algorithm
Will I play tennis today?
• Features
  – Outlook: {Sunny, Overcast, Rainy}
  – Temperature: {Hot, Mild, Cool}
  – Humidity: {High, Normal, Low}
  – Wind: {Strong, Weak}
• Labels
  – Binary classification task: Y = {+, -}
Will I play tennis today?
Outlook: Sunny, Overcast, Rainy
Temperature: Hot, Mild, Cool
Humidity: High, Normal, Low
Wind: Strong, Weak

   O  T  H  W  Play?
1  S  H  H  W  -
2  S  H  H  S  -
3  O  H  H  W  +
4  R  M  H  W  +
5  R  C  N  W  +
6  R  C  N  S  -
7  O  C  N  S  +
8  S  M  H  W  -
9  S  C  N  W  +
10 R  M  N  W  +
11 S  M  N  S  +
12 O  M  H  S  +
13 O  H  N  W  +
14 R  M  H  S  -
Basic Decision Tree Learning Algorithm
• Data is processed in batch (i.e., all the data is available)
• Recursively build a decision tree top-down.
The decision tree for the data:

Outlook?
├─ Sunny → Humidity?
│   ├─ High → No
│   └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind?
    ├─ Strong → No
    └─ Weak → Yes
1. Decide what attribute goes at the top
2. Decide what to do for each value the root attribute takes
Basic Decision Tree Algorithm: ID3

Input: S, the set of examples; Attributes, the set of measured attributes

ID3(S, Attributes):
1. If all examples have the same label: return a single-node tree with that label
2. Otherwise:
   1. Create a Root node for the tree
   2. A = the attribute in Attributes that best classifies S
      (this decides what attribute goes at the top)
   3. For each possible value v that A can take:
      (this decides what to do for each value the root attribute takes)
      1. Add a new tree branch corresponding to A = v
      2. Let S_v be the subset of examples in S with A = v
      3. If S_v is empty:
         add a leaf node with the most common value of Label in S
         (why? for generalization at test time)
         Else: below this branch add the subtree ID3(S_v, Attributes - {A})
         (a recursive call to the ID3 algorithm with all the remaining attributes)
   4. Return the Root node
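To make the recursion concrete, here is a minimal Python sketch of ID3. It is not the lecture's code: the dict-based example representation, the `domains` argument, and the helpers `entropy` and `information_gain` (which implement the gain heuristic defined later in this lecture) are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # H = -sum_k p_k log2 p_k over the label proportions in `labels`.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    # Expected reduction in label entropy from partitioning on `attribute`.
    n = len(examples)
    before = entropy([ex[target] for ex in examples])
    expected = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == v]
        expected += len(subset) / n * entropy(subset)
    return before - expected

def id3(examples, attributes, domains, target):
    # A minimal illustrative sketch of ID3, not the lecture's code.
    labels = [ex[target] for ex in examples]
    # 1. All examples share a label: return a single-node tree with it.
    if len(set(labels)) == 1:
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    # No attributes left to split on: fall back to the most common label.
    if not attributes:
        return majority
    # 2.2 Pick the attribute that best classifies S (highest information gain).
    A = max(attributes, key=lambda a: information_gain(examples, a, target))
    root = {A: {}}
    # 2.3 One branch per value A can take (its whole domain, not just values in S).
    for v in domains[A]:
        S_v = [ex for ex in examples if ex[A] == v]
        if not S_v:
            # Empty subset: leaf with the most common label in S,
            # so unseen attribute values still get a prediction at test time.
            root[A][v] = majority
        else:
            root[A][v] = id3(S_v, attributes - {A}, domains, target)
    return root
```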
Picking the Root Attribute
• Goal: have the resulting decision tree be as small as possible (Occam's Razor)
  – But finding the minimal decision tree consistent with the data is NP-hard
• The recursive algorithm is a greedy heuristic search for a simple tree, but it cannot guarantee optimality
• The main decision in the algorithm is the selection of the next attribute to split on
Picking the Root Attribute
Consider data with two Boolean attributes (A, B):

<(A=0, B=0), ->: 50 examples
<(A=0, B=1), ->: 50 examples
<(A=1, B=0), ->: 0 examples
<(A=1, B=1), +>: 100 examples

What should be the first attribute we select?

Splitting on A: we get purely labeled nodes.
A?
├─ 0 → - (100 examples)
└─ 1 → + (100 examples)

Splitting on B: we don't get purely labeled nodes.
B?
├─ 0 → - (50 examples)
└─ 1 → A?
    ├─ 0 → - (50 examples)
    └─ 1 → + (100 examples)

What if we instead have <(A=1, B=0), ->: 3 examples?

<(A=0, B=0), ->: 50 examples
<(A=0, B=1), ->: 50 examples
<(A=1, B=0), ->: 3 examples
<(A=1, B=1), +>: 100 examples

Which attribute should we choose? The trees look structurally similar!

Splitting on A first:
A?
├─ 0 → - (100 examples)
└─ 1 → B?
    ├─ 0 → - (3 examples)
    └─ 1 → + (100 examples)

Splitting on B first:
B?
├─ 0 → - (53 examples)
└─ 1 → A?
    ├─ 0 → - (50 examples)
    └─ 1 → + (100 examples)

Advantage A. But… we need a way to quantify things.
Picking the Root Attribute
Goal: have the resulting decision tree be as small as possible (Occam's Razor)
• The main decision in the algorithm is the selection of the next attribute for splitting the data
• We want attributes that split the examples into sets that are relatively pure in one label
  – This way we are closer to a leaf node.
• The most popular heuristic is information gain, which originated with the ID3 system of Quinlan
Reminder: Entropy
The entropy (impurity, disorder) of a set of examples S with respect to binary classification is

$\text{Entropy}(S) = H(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-$

• The proportion of positive examples is $p_+$
• The proportion of negative examples is $p_-$

In general, for a discrete probability distribution with K possible values, with probabilities $\{p_1, p_2, \cdots, p_K\}$, the entropy is given by

$H(p_1, p_2, \cdots, p_K) = -\sum_{k=1}^{K} p_k \log_2 p_k$

• If all examples belong to the same category, then entropy = 0
• If $p_+ = p_- = \frac{1}{2}$, then entropy = 1

Entropy can be viewed as the number of bits required, on average, to encode information. If the probability of + is 0.5, a single bit is required for each example; if it is 0.8, we can use less than 1 bit.

The uniform distribution has the highest entropy.
[Figure: three example label distributions over {-, +}; the uniform one has the highest entropy]

High entropy: a high level of uncertainty.
Low entropy: low uncertainty.
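As a quick check of these properties, a sketch in the same assumed Python setting as the ID3 code above (`binary_entropy` is an illustrative helper, not lecture code):

```python
from math import log2

def binary_entropy(p_pos):
    # H(S) = -p+ log2 p+ - p- log2 p-, with the convention 0 log 0 = 0.
    return -sum(p * log2(p) for p in (p_pos, 1.0 - p_pos) if p > 0)

print(binary_entropy(1.0))     # 0.0    -> all examples in one category
print(binary_entropy(0.5))     # 1.0    -> uniform: highest entropy, 1 bit each
print(binary_entropy(0.8))     # ~0.722 -> on average, less than 1 bit suffices
print(binary_entropy(9 / 14))  # ~0.940 -> the tennis data, used below
```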
Intuition: choose the attribute that reduces the label entropy the most.
Information Gain
The information gain of an attribute A is the expected reduction in entropy caused by partitioning on this attribute:

$\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|}\, \text{Entropy}(S_v)$

where $S_v$ is the subset of examples in which the value of attribute A is set to v.

The entropy of the partitioned data is calculated by weighting the entropy of each partition by its size relative to the original set.
– Partitions of low entropy (imbalanced splits) lead to high gain.

Go back to check which of the A, B splits is better.
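Checking which of the A, B splits is better with the `information_gain` helper from the ID3 sketch above (the dict-based representation is the same assumption as before):

```python
# The second version of the A/B data: 3 negative examples with (A=1, B=0).
data = ( [{"A": 0, "B": 0, "y": "-"}] * 50
       + [{"A": 0, "B": 1, "y": "-"}] * 50
       + [{"A": 1, "B": 0, "y": "-"}] * 3
       + [{"A": 1, "B": 1, "y": "+"}] * 100 )

print(information_gain(data, "A", "y"))  # ~0.90
print(information_gain(data, "B", "y"))  # ~0.32 -> splitting on A wins
```

The quantification matches the earlier intuition: A is the better root, even though the two trees look structurally similar.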
Will I play tennis today?
Outlook: S(unny), O(vercast), R(ainy)
Temperature: H(ot), M(ild), C(ool)
Humidity: H(igh), N(ormal), L(ow)
Wind: S(trong), W(eak)
Will I play tennis today?
Current entropy: p = 9/14, n = 5/14

$H(\text{Play?}) = -(9/14) \log_2(9/14) - (5/14) \log_2(5/14) \approx 0.94$
Information Gain: Outlook
Outlook = Sunny: 5 of 14 examples; p = 2/5, n = 3/5, H_S = 0.971
Outlook = Overcast: 4 of 14 examples; p = 4/4, n = 0, H_O = 0
Outlook = Rainy: 5 of 14 examples; p = 3/5, n = 2/5, H_R = 0.971
Expected entropy: (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 = 0.694
Information gain: 0.940 – 0.694 = 0.246
Information Gain: Humidity
Humidity = High: 7 of 14 examples; p = 3/7, n = 4/7, H_H = 0.985
Humidity = Normal: 7 of 14 examples; p = 6/7, n = 1/7, H_N = 0.592
Expected entropy: (7/14) × 0.985 + (7/14) × 0.592 = 0.7885
Information gain: 0.940 – 0.7885 = 0.1515
Which feature to split on?
Information gain:
Outlook: 0.246
Humidity: 0.151
Wind: 0.048
Temperature: 0.029
→ Split on Outlook
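These numbers can be reproduced mechanically. A sketch, reusing `information_gain` from earlier and encoding the data table above (the compact string encoding is an illustration, not lecture code):

```python
# Each row of the tennis table as a 5-character string: O, T, H, W, Play?
rows = ["SHHW-", "SHHS-", "OHHW+", "RMHW+", "RCNW+", "RCNS-", "OCNS+",
        "SMHW-", "SCNW+", "RMNW+", "SMNS+", "OMHS+", "OHNW+", "RMHS-"]
tennis = [dict(zip(["O", "T", "H", "W", "Play?"], r)) for r in rows]

for a in ["O", "H", "W", "T"]:
    print(a, round(information_gain(tennis, a, "Play?"), 3))
# O 0.247, H 0.152, W 0.048, T 0.029 (matching the slide's values up to
# rounding) -> split on Outlook
```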
An Illustrative Example
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
Gain(S, Outlook) = 0.246
Outlook goes at the root.
An Illustrative Example

Outlook
├─ Sunny: examples 1, 2, 8, 9, 11 (2+, 3-) → ?
├─ Overcast: examples 3, 7, 12, 13 (4+, 0-) → Yes
└─ Rainy: examples 4, 5, 6, 10, 14 (3+, 2-) → ?

Continue until:
• every attribute is included in the path, or
• all examples in the leaf have the same label
An Illustrative Example
For the Sunny branch (examples 1, 2, 8, 9, 11):

Day  Outlook  Temperature  Humidity  Wind    PlayTennis
1    Sunny    Hot          High      Weak    No
2    Sunny    Hot          High      Strong  No
8    Sunny    Mild         High      Weak    No
9    Sunny    Cool         Normal    Weak    Yes
11   Sunny    Mild         Normal    Strong  Yes

Gain(S_sunny, Humidity) = .97 - (3/5)·0 - (2/5)·0 = .97
Gain(S_sunny, Temp) = .97 - 0 - (2/5)·1 = .57
Gain(S_sunny, Wind) = .97 - (2/5)·1 - (3/5)·.92 = .02

So the Sunny node splits on Humidity.
An Illustrative Example
The final tree:

Outlook
├─ Sunny (1, 2, 8, 9, 11; 2+, 3-) → Humidity
│   ├─ High → No
│   └─ Normal → Yes
├─ Overcast (3, 7, 12, 13; 4+, 0-) → Yes
└─ Rainy (4, 5, 6, 10, 14; 3+, 2-) → Wind
    ├─ Strong → No
    └─ Weak → Yes
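Putting the pieces together: running the `id3` sketch from earlier on the `tennis` list built above should reproduce this tree (the `domains` dict listing each attribute's possible values is an assumption of the sketch):

```python
domains = {"O": ["S", "O", "R"], "T": ["H", "M", "C"],
           "H": ["H", "N", "L"], "W": ["S", "W"]}
tree = id3(tennis, {"O", "T", "H", "W"}, domains, "Play?")
print(tree)
# {'O': {'S': {'H': {'H': '-', 'N': '+', 'L': '-'}}, 'O': '+',
#        'R': {'W': {'S': '-', 'W': '+'}}}}
# Outlook at the root, Humidity under Sunny, Wind under Rainy: the tree above.
# (Humidity = Low never occurs among the Sunny examples, so that branch falls
#  back to the most common label of S_sunny, '-'.)
```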
Hypothesis Space in Decision Tree Induction
• Search over decision trees, which can represent all possible discrete functions (this has pros and cons)
• Goal: to find the best decision tree
• Finding a minimal decision tree consistent with a set of data is NP-hard.
• ID3 performs a greedy heuristic search: hill climbing without backtracking
• Makes statistically based decisions using all the data
Summary: Learning Decision Trees
1. Representation: What are decision trees?
   – A hierarchical data structure that represents data
2. Algorithm: Learning decision trees – the ID3 algorithm, a greedy heuristic
   • If all the examples have the same label, create a leaf with that label
   • Otherwise, find the "most informative" attribute and split the data for the different values of that attribute
   • Recurse on the splits