Artificial Intelligence 7. Decision trees. Japan Advanced Institute of Science and Technology (JAIST), Yoshimasa Tsuruoka.
Artificial Intelligence
7. Decision trees
Japan Advanced Institute of Science and Technology (JAIST)
Yoshimasa Tsuruoka
Outline
• What is a decision tree?
• How to build a decision tree
• Entropy
• Information gain
• Overfitting
• Generalization performance
• Pruning

• Lecture slides
– http://www.jaist.ac.jp/~tsuruoka/lectures/
Decision trees
Chapter 3 of Mitchell, T., Machine Learning (1997)

• Decision trees
– Represent disjunctions of conjunctions
– Successfully applied to a broad range of tasks
• Diagnosing medical cases
• Assessing the credit risk of loan applications

• Nice characteristics
– Understandable to humans
– Robust to noise
A decision tree
• Concept: PlayTennis

• Outlook = Sunny → test Humidity
– Humidity = High → No
– Humidity = Normal → Yes
• Outlook = Overcast → Yes
• Outlook = Rain → test Wind
– Wind = Strong → No
– Wind = Weak → Yes
Classification by a decision tree
• Instance: <Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>
• Following the tree above: Outlook = Sunny → Humidity = High → No
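Walking the tree for this instance can be sketched in a few lines of Python. The nested-dict encoding of the tree and the helper name `classify` are our own illustrative choices, not from the slides:

```python
# The PlayTennis decision tree, encoded as nested dicts (an illustrative
# assumption): internal nodes test an attribute, leaves are class labels.
TREE = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind",
                 "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}

def classify(tree, instance):
    """Follow the branch matching the instance's attribute value until a leaf."""
    while isinstance(tree, dict):
        tree = tree["branches"][instance[tree["attribute"]]]
    return tree

instance = {"Outlook": "Sunny", "Temperature": "Hot",
            "Humidity": "High", "Wind": "Strong"}
print(classify(TREE, instance))  # -> No
```

Note that Temperature is never consulted: the path Sunny → High humidity already determines the answer.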
Disjunction of conjunctions
• The tree corresponds to:

(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
Problems suited to decision trees
• Instances are represented by attribute-value pairs
• The target function has discrete output values
• Disjunctive descriptions may be required
• The training data may contain errors
• The training data may contain missing attribute values
Training data

| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| D1  | Sunny    | Hot  | High   | Weak   | No  |
| D2  | Sunny    | Hot  | High   | Strong | No  |
| D3  | Overcast | Hot  | High   | Weak   | Yes |
| D4  | Rain     | Mild | High   | Weak   | Yes |
| D5  | Rain     | Cool | Normal | Weak   | Yes |
| D6  | Rain     | Cool | Normal | Strong | No  |
| D7  | Overcast | Cool | Normal | Strong | Yes |
| D8  | Sunny    | Mild | High   | Weak   | No  |
| D9  | Sunny    | Cool | Normal | Weak   | Yes |
| D10 | Rain     | Mild | Normal | Weak   | Yes |
| D11 | Sunny    | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High   | Strong | Yes |
| D13 | Overcast | Hot  | Normal | Weak   | Yes |
| D14 | Rain     | Mild | High   | Strong | No  |
Which attribute should be tested at each node?
• We want to build a small decision tree
• Information gain
– How well a given attribute separates the training examples according to their target classification
– Reduction in entropy
• Entropy
– (Im)purity of an arbitrary collection of examples
Entropy
• If there are only two classes:

$$\mathrm{Entropy}(S) = -p_{\oplus} \log_2 p_{\oplus} - p_{\ominus} \log_2 p_{\ominus}$$

For example, with 9 positive and 5 negative examples:

$$\mathrm{Entropy}([9+,5-]) = -\tfrac{9}{14} \log_2 \tfrac{9}{14} - \tfrac{5}{14} \log_2 \tfrac{5}{14} = 0.940$$

• In general, with $c$ classes:

$$\mathrm{Entropy}(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$$
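The entropy definition can be checked numerically. The short Python sketch below (the helper name `entropy` is our own) computes entropy in bits from a list of class counts:

```python
import math

def entropy(counts):
    """Entropy in bits of a collection with the given class counts."""
    total = sum(counts)
    # Skip empty classes: the p*log(p) term tends to 0 as p -> 0.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# The slide's example: 9 positive and 5 negative examples.
print(f"{entropy([9, 5]):.3f}")  # -> 0.940
```

A perfectly balanced collection such as [7+,7-] gives entropy 1.0, and a pure collection gives 0.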
Information Gain
• The expected reduction in entropy achieved by splitting the training examples on attribute $A$:

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v)$$
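The formula translates directly into code. In this sketch (function names are our own), a split is given as the class counts of each subset $S_v$:

```python
import math

def entropy(counts):
    """Entropy in bits of a collection with the given class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, partition):
    """Gain(S, A): parent entropy minus the size-weighted entropy of each subset."""
    n = sum(parent)
    return entropy(parent) - sum(
        sum(part) / n * entropy(part) for part in partition)

# Splitting S = [9+,5-] on Wind: Weak = [6+,2-], Strong = [3+,3-].
print(f"{information_gain([9, 5], [[6, 2], [3, 3]]):.3f}")  # -> 0.048
```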
Example
• $\mathrm{Values}(\mathrm{Wind}) = \{\mathrm{Weak}, \mathrm{Strong}\}$
• $S = [9+,5-]$, $S_{\mathrm{Weak}} = [6+,2-]$, $S_{\mathrm{Strong}} = [3+,3-]$

$$\begin{aligned}
\mathrm{Gain}(S, \mathrm{Wind}) &= \mathrm{Entropy}(S) - \sum_{v \in \{\mathrm{Weak},\,\mathrm{Strong}\}} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v) \\
&= \mathrm{Entropy}(S) - \tfrac{8}{14} \, \mathrm{Entropy}(S_{\mathrm{Weak}}) - \tfrac{6}{14} \, \mathrm{Entropy}(S_{\mathrm{Strong}}) \\
&= 0.940 - \tfrac{8}{14} \times 0.811 - \tfrac{6}{14} \times 1.00 = 0.048
\end{aligned}$$
Computing Information Gain
• Humidity: $S = [9+,5-]$, $E(S) = 0.940$
– High: $S_{\mathrm{High}} = [3+,4-]$, $E = 0.985$
– Normal: $S_{\mathrm{Normal}} = [6+,1-]$, $E = 0.592$

$$\mathrm{Gain}(S, \mathrm{Humidity}) = 0.940 - \tfrac{7}{14} \times 0.985 - \tfrac{7}{14} \times 0.592 = 0.151$$

• Wind: $S = [9+,5-]$, $E(S) = 0.940$
– Weak: $S_{\mathrm{Weak}} = [6+,2-]$, $E = 0.811$
– Strong: $S_{\mathrm{Strong}} = [3+,3-]$, $E = 1.00$

$$\mathrm{Gain}(S, \mathrm{Wind}) = 0.940 - \tfrac{8}{14} \times 0.811 - \tfrac{6}{14} \times 1.00 = 0.048$$
Which attribute is the best classifier?
• Information gain:
– $\mathrm{Gain}(S, \mathrm{Outlook}) = 0.246$
– $\mathrm{Gain}(S, \mathrm{Humidity}) = 0.151$
– $\mathrm{Gain}(S, \mathrm{Wind}) = 0.048$
– $\mathrm{Gain}(S, \mathrm{Temperature}) = 0.029$
• Outlook gives the largest gain, so it is tested at the root
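These four values can be reproduced from the training table. The Python below is an illustrative sketch (function names are ours); note that it rounds the third decimal, while the slides appear to truncate it (0.2468 prints as 0.247 rather than 0.246):

```python
import math
from collections import Counter, defaultdict

# The PlayTennis training data, label last, copied from the table above.
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(data, attr_index):
    """Information gain of splitting the data on the given attribute column."""
    labels = [row[-1] for row in data]
    groups = defaultdict(list)
    for row in data:
        groups[row[attr_index]].append(row[-1])
    return entropy(labels) - sum(
        len(g) / len(labels) * entropy(g) for g in groups.values())

for i, name in enumerate(ATTRIBUTES):
    print(f"Gain(S, {name}) = {gain(DATA, i):.3f}")
```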
Splitting training data with Outlook
• $S$ = {D1, D2, …, D14}, [9+,5-]
• Outlook = Sunny → {D1, D2, D8, D9, D11}, [2+,3-] → ?
• Outlook = Overcast → {D3, D7, D12, D13}, [4+,0-] → Yes
• Outlook = Rain → {D4, D5, D6, D10, D14}, [3+,2-] → ?
Overfitting
• Growing each branch of the tree deeply enough to perfectly classify the training examples is not a good strategy
– The resulting tree may overfit the training data
• Overfitting
– The tree can explain the training data very well but performs poorly on new data
Alleviating the overfitting problem
• Several approaches
– Stop growing the tree earlier
– Post-prune the tree
• How can we evaluate the classification performance of the tree on new data?
– Separate the available data into two sets of examples: a training set and a validation (development) set
Validation (development) set
• Use a portion of the original training data to estimate the generalization performance
– Original split: original training set / test set
– New split: training set / validation set / test set
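The hold-out scheme above can be sketched as a shuffled split of the original training set. `holdout_split` is a hypothetical helper, not something from the lecture:

```python
import random

def holdout_split(examples, n_validation, seed=0):
    """Shuffle the examples and hold out n_validation of them as a validation set.

    Returns (training set, validation set); the test set stays untouched.
    """
    examples = list(examples)
    random.Random(seed).shuffle(examples)  # fixed seed for reproducibility
    return examples[n_validation:], examples[:n_validation]

# E.g. hold out 5 of the 14 PlayTennis examples for validation.
train, valid = holdout_split(range(14), n_validation=5)
print(len(train), len(valid))  # -> 9 5
```

The validation set can then be used, for example, to decide how far to post-prune the tree: keep pruning while accuracy on the validation set does not decrease.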