Machine Learning
Lecture # 4: Multilayer Perceptron & Decision Trees
Source: biomisa.org/uploads/2014/06/Lect-4.pdf
Artificial Neural Networks (ANN)
• Neural computing requires a number of neurons to be connected together into a neural network.
• A neural network consists of:
  – layers
  – links between layers
• The links are weighted.
• There are three kinds of layers:
  1. input layer
  2. hidden layer
  3. output layer
From Human Neurones to Artificial Neurones
A simple neuron
• At each neuron, every input has an associated weight which modifies the strength of each input.
• The neuron simply adds together all the inputs and calculates an output to be passed on.
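The behaviour described above can be sketched in a couple of lines (a minimal sketch; the weights here are made up for illustration, and a simple step stands in for the activation function discussed next):

```python
def neuron(inputs, weights):
    # Weighted sum of inputs, then a simple step activation
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s > 0 else 0

print(neuron([1.0, 0.5], [0.6, -0.4]))   # weighted sum 0.4 -> fires 1
```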
Activation function
MultiLayer Perceptron (MLP)
Motivation
• Perceptrons are limited because they can only solve problems that are linearly separable
• We would like to build more complicated learning machines to model our data
• One way to do this is to build multiple layers of perceptrons
Brief History
• 1985 Ackley, Hinton and Sejnowski propose the Boltzmann machine
– This was a multi-layer step perceptron
– More powerful than perceptron
– Successful application NETtalk
• 1986 Rumelhart, Hinton and Williams introduce the Multi-Layer Perceptron (MLP) trained with backpropagation
– Dominant neural net architecture for 10 years
Multi-layer networks
• So far we discussed networks with one layer.
• But these networks can be extended to combine several layers, increasing the set of functions that can be represented using a NN
MLP
Multilayer Neural Network
Sigmoid Response Functions
MLP
Simple example: AND
  x1 x2 | AND
  0  0  |  0
  0  1  |  0
  1  0  |  0
  1  1  |  1
(unit weights from the figure: bias -30, w1 = 20, w2 = 20)

Example: OR function
  x1 x2 | OR
  0  0  |  0
  0  1  |  1
  1  0  |  1
  1  1  |  1
(unit weights from the figure: bias -10, w1 = 20, w2 = 20)

Negation:
  x | NOT x
  0 |  1
  1 |  0
(unit weight from the figure: w = -20, paired with a positive bias)

Putting it together (XNOR):
  x1 x2 | XNOR
  0  0  |  1
  0  1  |  0
  1  0  |  0
  1  1  |  1
Hidden unit 1 computes x1 AND x2 (weights -30, 20, 20); hidden unit 2 computes (NOT x1) AND (NOT x2) (weights 10, -20, -20); the output unit ORs them (weights -10, 20, 20).
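These gate constructions can be checked numerically with sigmoid units; the weight vectors are the ones shown above (bias first), and the code is an illustrative sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def unit(weights, xs):
    # weights[0] is the bias; with weights this large, the sigmoid
    # output saturates near 0 or 1
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], xs)))

AND      = (-30, 20, 20)
OR       = (-10, 20, 20)
NOR_BOTH = (10, -20, -20)   # (NOT x1) AND (NOT x2)

def xnor(x1, x2):
    h1 = unit(AND, (x1, x2))
    h2 = unit(NOR_BOTH, (x1, x2))
    return unit(OR, (h1, h2))

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xnor(x1, x2)))   # 1, 0, 0, 1
```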
Example of multilayer Neural Network
• Suppose the input values are 10, 30, 20
• The weighted sum coming into H1:
  SH1 = (0.2 * 10) + (-0.1 * 30) + (0.4 * 20) = 2 - 3 + 8 = 7
• The σ function is applied to SH1:
  σ(SH1) = 1/(1 + e^-7) = 1/(1 + 0.000912) = 0.999
• Similarly, the weighted sum coming into H2:
  SH2 = (0.7 * 10) + (-1.2 * 30) + (1.2 * 20) = 7 - 36 + 24 = -5
• σ applied to SH2:
  σ(SH2) = 1/(1 + e^5) = 1/(1 + 148.4) = 0.0067
• Now the weighted sum into output unit O1:
  SO1 = (1.1 * 0.999) + (0.1 * 0.0067) = 1.0996
• The weighted sum into output unit O2:
  SO2 = (3.1 * 0.999) + (1.17 * 0.0067) = 3.1047
• The output of sigmoid unit O1:
  σ(SO1) = 1/(1 + e^-1.0996) = 1/(1 + 0.333) = 0.750
• The output from the network for O2:
  σ(SO2) = 1/(1 + e^-3.1047) = 1/(1 + 0.045) = 0.957
• The input triple (10,30,20) would be categorised with O2, because this has the larger output.
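This forward pass is small enough to check in a few lines of Python; the weights are read off the example network, and the rounding is only for display:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Weights read off the example network (no bias terms in this example)
w_hidden = {"H1": (0.2, -0.1, 0.4), "H2": (0.7, -1.2, 1.2)}
w_out    = {"O1": (1.1, 0.1),       "O2": (3.1, 1.17)}

inputs = (10, 30, 20)

# Hidden-layer outputs, then output-layer outputs
h = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in w_hidden.values()]
o = [sigmoid(sum(w * x for w, x in zip(ws, h))) for ws in w_out.values()]

print([round(v, 4) for v in h])   # [0.9991, 0.0067]
print([round(v, 3) for v in o])   # [0.75, 0.957]
```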
Training Parametric Model
Minimizing Error
Least Squares Gradient
Single Layer Perceptron
Single layer Perceptrons
Different Response Functions
Learning a Logistic Perceptron
Back Propagation
Back Propagation
A Worked Example:
• We propagated the values (10,30,20) through the network
• Suppose now that the target categorisation for the example was the one associated with O1 (using a learning rate of η = 0.1)
• The target output for O1 was 1, and the target output for O2 was 0
• t1(E) = 1; t2(E) = 0; o1(E) = 0.750; o2(E) = 0.957
• Error values for the output units O1 and O2:
  – δO1 = o1(E)(1 - o1(E))(t1(E) - o1(E)) = 0.750(1 - 0.750)(1 - 0.750) = 0.0469
  – δO2 = o2(E)(1 - o2(E))(t2(E) - o2(E)) = 0.957(1 - 0.957)(0 - 0.957) = -0.0394
Input units        Hidden units                  Output units
Unit  Output       Unit  Weighted Sum  Output    Unit  Weighted Sum  Output
I1    10           H1     7            0.999     O1    1.0996        0.750
I2    30           H2    -5            0.0067    O2    3.1047        0.957
I3    20
A Worked Example:
• To propagate this information backwards to the hidden nodes H1 and H2:
  – Multiply the error term for O1 by the weight from H1 to O1, then add this to the product of the error term for O2 and the weight between H1 and O2: (1.1 * 0.0469) + (3.1 * -0.0394) = -0.0706
  – δH1 = -0.0706 * (0.999 * (1 - 0.999)) = -0.0000705
  – Similarly for H2: (0.1 * 0.0469) + (1.17 * -0.0394) = -0.0414
  – δH2 = -0.0414 * (0.0067 * (1 - 0.0067)) = -0.000276
Input unit  Hidden unit  η    δH          xi  Δ = η*δH*xi  Old weight  New weight
I1          H1           0.1  -0.0000705  10  -0.0000705    0.2         0.1999295
I1          H2           0.1  -0.000276   10  -0.000276     0.7         0.699724
I2          H1           0.1  -0.0000705  30  -0.0002115   -0.1        -0.1002115
I2          H2           0.1  -0.000276   30  -0.000827    -1.2        -1.200827
I3          H1           0.1  -0.0000705  20  -0.000141     0.4         0.399859
I3          H2           0.1  -0.000276   20  -0.000551     1.2         1.199449

Hidden unit  Output unit  η    δO       hi(E)   Δ = η*δO*hi(E)  Old weight  New weight
H1           O1           0.1   0.0469  0.999    0.00469         1.1         1.10469
H1           O2           0.1  -0.0394  0.999   -0.00394         3.1         3.09606
H2           O1           0.1   0.0469  0.0067   0.0000314       0.1         0.1000314
H2           O2           0.1  -0.0394  0.0067  -0.0000264       1.17        1.16997
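The output-layer error terms and one of the weight updates can be verified with a short sketch (values taken from the worked example; the exact arithmetic differs from the rounded table figures only in the last decimal places):

```python
eta = 0.1

# Values from the worked example
o = {"O1": 0.750, "O2": 0.957}   # network outputs
t = {"O1": 1.0,   "O2": 0.0}     # target outputs
h = {"H1": 0.999, "H2": 0.0067}  # hidden-unit outputs

# Output-unit error terms: delta = o(1 - o)(t - o)
delta_o = {k: o[k] * (1 - o[k]) * (t[k] - o[k]) for k in o}
print(delta_o)   # close to the table's 0.0469 and -0.0394

# Weight update for the H1 -> O1 link: w += eta * delta * h
w_H1_O1 = 1.1
w_H1_O1 += eta * delta_o["O1"] * h["H1"]
print(w_H1_O1)   # close to the table's 1.10469
```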
When to Learn
Online Learning
Batch Learning
Early Stopping
Self Study Examples
XOR Example
Linear separation
Can AND, OR and NOT be represented?
• Is it possible to represent every boolean function by simply combining these?
• Every boolean function can be composed using AND, OR and NOT (or even only NAND).
Linear separation
• How can we learn the XOR function?
Linear separation
X1 X2 XOR
0 0 0
1 0 1
0 1 1
1 1 0
Linear separation
X1 X2 XOR
0 0 0
1 0 1
0 1 1
1 1 0
It is impossible to find values of the weights Wi with which a single-layer perceptron learns XOR.
Linear separation
X1 X2 X1*X2 | XOR
0  0  0     |  0
1  0  0     |  1
0  1  0     |  1
1  1  1     |  0
With the extra feature X1*X2 the classes become linearly separable, so we can learn W1, W2 and W3.
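One way to see this: with the product feature added, the classic perceptron learning rule finds suitable weights (a sketch; the particular rule, initialisation, and epoch count here are not from the slides):

```python
# XOR is not linearly separable in (x1, x2), but adding the product
# feature x1*x2 makes it separable, so a perceptron rule can learn it.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w = [0.0, 0.0, 0.0, 0.0]          # bias, W1, W2, W3 (for x1*x2)
for _ in range(25):               # perceptron learning rule
    for (x1, x2), target in data:
        phi = (1, x1, x2, x1 * x2)
        out = 1 if sum(wi * xi for wi, xi in zip(w, phi)) > 0 else 0
        for i in range(4):
            w[i] += (target - out) * phi[i]

preds = [1 if sum(wi * xi for wi, xi in zip(w, (1, x1, x2, x1 * x2))) > 0 else 0
         for (x1, x2), _ in data]
print(preds)   # matches the XOR targets [0, 1, 1, 0]
```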
Example: Backpropagation learning the XOR function
• Training samples (bipolar):
  in_1 in_2  d
  P0  -1  -1  -1
  P1  -1   1   1
  P2   1  -1   1
  P3   1   1  -1
• Network: 2-2-1 with thresholds (bias input fixed at 1)
• Initial weights W(0):
  w1(1,0) = (-0.5, 0.5, -0.5)
  w2(1,0) = (-0.5, -0.5, 0.5)
  w(2,1)  = (-1, 1, 1)
• Learning rate = 0.2
• Node function (hyperbolic tangent form):
  g(x) = (1 - e^-x)/(1 + e^-x) = 2s(x) - 1, with s(x) = 1/(1 + e^-x)
  g'(x) = 0.5(1 - g(x))(1 + g(x)); s'(x) = s(x)(1 - s(x))
  g(x) → 1 as x → ∞ and g(x) → -1 as x → -∞
(Figure: the 2-2-1 network, with weight vectors W(1,0) into hidden units x1(1), x2(1) and W(2,1) into the output o; unit 0 is the bias.)
Forward computing, presenting P0 = (1, -1, -1) with d = -1:
  net1(1) = w1(1,0) · p0 = (-0.5, 0.5, -0.5) · (1, -1, -1) = -0.5
  x1(1) = g(net1(1)) = 2/(1 + e^0.5) - 1 = -0.24492
  net2(1) = w2(1,0) · p0 = (-0.5, -0.5, 0.5) · (1, -1, -1) = -0.5
  x2(1) = g(net2(1)) = -0.24492
  net = w(2,1) · (1, -0.24492, -0.24492) = (-1, 1, 1) · (1, -0.24492, -0.24492) = -1.48984
  o = g(net) = -0.63211

Error back-propagation (here the derivative factor applied is (1 - g)(1 + g)):
  l = d - o = -1 - (-0.63211) = -0.36789
  δo = l * (1 - o)(1 + o) = -0.3679 * (1.6321)(0.3679) = -0.2209
  δ1 = δo * w1(2,1) * (1 - x1(1))(1 + x1(1)) = -0.2209 * 1 * (1 + 0.24492)(1 - 0.24492) = -0.20765
  δ2 = δo * w2(2,1) * (1 - x2(1))(1 + x2(1)) = -0.2209 * 1 * (1 + 0.24492)(1 - 0.24492) = -0.20765
Weight update:
  Δw(2,1) = 0.2 * (-0.2209) * (1, -0.2449, -0.2449) = (-0.0442, 0.0108, 0.0108)
  w(2,1) = (-1, 1, 1) + (-0.0442, 0.0108, 0.0108) = (-1.0442, 1.0108, 1.0108)
  Δw1(1,0) = 0.2 * (-0.2077) * p0 = 0.2 * (-0.2077) * (1, -1, -1) = (-0.0415, 0.0415, 0.0415)
  w1(1,0) = (-0.5, 0.5, -0.5) + (-0.0415, 0.0415, 0.0415) = (-0.5415, 0.5415, -0.4585)
  Δw2(1,0) = 0.2 * (-0.2077) * p0 = (-0.0415, 0.0415, 0.0415)
  w2(1,0) = (-0.5, -0.5, 0.5) + (-0.0415, 0.0415, 0.0415) = (-0.5415, -0.4585, 0.5415)
Error l^2 for P0 reduced from 0.135345 to 0.102823.
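The whole P0 step can be reproduced in a few lines (a sketch following the slide's arithmetic, which uses (1 - g)(1 + g) as the derivative factor):

```python
import math

def g(x):
    # tanh-shaped node function from the slide: 2/(1+e^-x) - 1
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

eta = 0.2
w1 = [-0.5, 0.5, -0.5]        # hidden unit 1 (bias, in_1, in_2)
w2 = [-0.5, -0.5, 0.5]        # hidden unit 2
wo = [-1.0, 1.0, 1.0]         # output unit (bias, x1, x2)

p = (1, -1, -1)               # P0 with bias input 1
d = -1

# Forward computing
x1 = g(sum(w * v for w, v in zip(w1, p)))
x2 = g(sum(w * v for w, v in zip(w2, p)))
o  = g(wo[0] + wo[1] * x1 + wo[2] * x2)

# Error back-propagation
l = d - o
delta_o = l * (1 - o) * (1 + o)
delta_1 = delta_o * wo[1] * (1 - x1) * (1 + x1)
delta_2 = delta_o * wo[2] * (1 - x2) * (1 + x2)

# Weight update
wo = [w + eta * delta_o * v for w, v in zip(wo, (1, x1, x2))]
w1 = [w + eta * delta_1 * v for w, v in zip(w1, p)]
w2 = [w + eta * delta_2 * v for w, v in zip(w2, p)]

print(round(o, 4), round(delta_o, 4))   # -0.6321 -0.2209
print([round(w, 4) for w in wo])        # [-1.0442, 1.0108, 1.0108]
```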
(Figure: MSE reduction curve, plotted every 10 epochs.)
Output: every 10 epochs
epoch   1      10     20     40     90     140    190    d
P0     -0.63  -0.05  -0.38  -0.77  -0.89  -0.92  -0.93   -1
P1     -0.63  -0.08   0.23   0.68   0.85   0.89   0.90    1
P2     -0.62  -0.16   0.15   0.68   0.85   0.89   0.90    1
P3     -0.38   0.03  -0.37  -0.77  -0.89  -0.92  -0.93   -1
MSE     1.44   1.12   0.52   0.074  0.019  0.010  0.007
        w1(1,0)                      w2(1,0)                      w(2,1)
init    (-0.5, 0.5, -0.5)            (-0.5, -0.5, 0.5)            (-1, 1, 1)
After epoch 1:
p0      (-0.5415, 0.5415, -0.4585)   (-0.5415, -0.4585, 0.5415)   (-1.0442, 1.0108, 1.0108)
p1      (-0.5732, 0.5732, -0.4266)   (-0.5732, -0.4268, 0.5732)   (-1.0787, 1.0213, 1.0213)
p2      (-0.3858, 0.7607, -0.6142)   (-0.4617, -0.3152, 0.4617)   (-0.8867, 1.0616, 0.8952)
p3      (-0.4591, 0.6874, -0.6875)   (-0.5228, -0.3763, 0.4005)   (-0.9567, 1.0699, 0.9061)
After epoch:
13      (-1.4018, 1.4177, -1.6290)   (-1.5219, -1.8368, 1.6367)   (0.6917, 1.1440, 1.1693)
40      (-2.2827, 2.5563, -2.5987)   (-2.3627, -2.6817, 2.6417)   (1.9870, 2.4841, 2.4580)
90      (-2.6416, 2.9562, -2.9679)   (-2.7002, -3.0275, 3.0159)   (2.7061, 3.1776, 3.1667)
190     (-2.8594, 3.1874, -3.1921)   (-2.9080, -3.2403, 3.2356)   (3.1995, 3.6531, 3.6468)
Decision Trees
Decision Tree Classifier
Ross Quinlan
(Figure: scatter plot of insects by Abdomen Length (x-axis, 1-10) and Antenna Length (y-axis, 1-10), separated by the two tests below.)

Abdomen Length > 7.1?
  yes → Katydid
  no  → Antenna Length > 6.0?
          yes → Katydid
          no  → Grasshopper
What is a Decision Tree?
• An inductive learning task
  – Use particular facts to make more generalized conclusions
• A predictive model based on a branching series of Boolean tests
  – These smaller Boolean tests are less complex than a one-stage classifier
• Let's look at a sample decision tree…
Predicting Commute Time
(Figure: decision tree. Leave At: 8 AM → Long; 9 AM → Accident? (No → Medium, Yes → Long); 10 AM → Stall? (No → Short, Yes → Long).)

If we leave at 10 AM and there are no cars stalled on the road, what will our commute time be?
Inductive Learning
• In this decision tree, we made a series of Boolean decisions and followed the corresponding branch:
  – Did we leave at 10 AM?
– Did a car stall on the road?
– Is there an accident on the road?
• By answering each of these yes/no questions, we then came to a conclusion on how long our commute might take
Decision Trees as Rules
• We have represented this tree graphically
• We could also have represented it as a set of rules; however, these may be much harder to read…
Decision Tree as a Rule Set
if hour == 8am
    commute time = long
else if hour == 9am
    if accident == yes
        commute time = long
    else
        commute time = medium
else if hour == 10am
    if stall == yes
        commute time = long
    else
        commute time = short

• Notice that not all attributes have to be used in each path of the decision.
• As we will see, all attributes may not even appear in the tree.
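The rule set is directly executable; here is a sketch in Python (the function name and the argument encoding are illustrative):

```python
def commute_time(hour, accident, stall):
    # The rule set above, transcribed directly into code
    if hour == "8am":
        return "long"
    elif hour == "9am":
        return "long" if accident else "medium"
    elif hour == "10am":
        return "long" if stall else "short"

print(commute_time("10am", accident=False, stall=False))   # short
```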
Weather Example
Objective
• From a set of observations, the objective is to predict whether we will be able to play tennis, based on the past examples (inductive principle)
  – For this, we will automatically build a decision tree
  – The decision concerns the Play Tennis attribute. It splits the dataset into two classes: play = Yes and play = No
Attribute-values
• 14 instances described by 4 categorical (nominal) attributes.
• Each attribute is associated with a set of values.
• One attribute is selected as the class, for which we make a decision.
Decision Tree
• An internal node is a test on an attribute
• A branch represents an outcome of the test, e.g. outlook = sunny
• A leaf node represents a class label or class label distribution
• At each node, one attribute is chosen to split training examples into distinct classes as much as possible
• A new case is classified by following a matching path to a leaf node
Building a Decision Tree
Building a Decision Tree
• One approach is to generate all possible trees and find the best, but this is too expensive in general!
• There must be a better way:
  – explore top-down or bottom-up
  – to form a decision tree
• The main problem:
  – during construction, at each step, choose a good attribute on which a test is to be performed
Building a Decision Tree
Top-down Tree Construction
  Initially, all the training examples are at the root.
  Then, the examples are recursively partitioned by choosing one attribute at a time.
Bottom-up Tree Pruning
  Remove subtrees or branches, in a bottom-up manner, to improve the estimated accuracy on new cases.
When Should Building Stop?
• There are several possible stopping criteria:
  – All samples for a given node belong to the same class
  – If there are no remaining attributes for further partitioning, majority voting is employed
  – There are no samples left
  – Or there is nothing to gain in splitting
Decision Tree Algorithms
• The basic idea behind any decision tree algorithm is as follows:
– Choose the best attribute(s) to split the remaining instances and make that attribute a decision node
– Repeat this process recursively for each child
– Stop when:
  • All the instances have the same target attribute value
  • There are no more attributes
  • There are no more instances
Sample Experience Table
Example  Hour   Weather  Accident  Stall  Commute (Target)
D1 8 AM Sunny No No Long
D2 8 AM Cloudy No Yes Long
D3 10 AM Sunny No No Short
D4 9 AM Rainy Yes No Long
D5 9 AM Sunny Yes Yes Long
D6 10 AM Sunny No No Short
D7 10 AM Cloudy No No Short
D8 9 AM Rainy No No Medium
D9 9 AM Sunny Yes No Long
D10 10 AM Cloudy Yes Yes Long
D11 10 AM Rainy No No Short
D12 8 AM Cloudy Yes No Long
D13 9 AM Sunny No No Medium
![Page 68: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/68.jpg)
Predicting Commute Time

[Decision tree: root splits on Leave At — 8 AM → Long; 9 AM → Accident? (No → Medium, Yes → Long); 10 AM → Stall? (No → Short, Yes → Long)]

If we leave at 10 AM and there are no cars stalled on the road, what will our commute time be?
![Page 69: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/69.jpg)
Choosing Attributes
• The previous experience decision table showed 4 attributes: hour, weather, accident and stall
• But the decision tree only showed 3 attributes: hour, accident and stall
• Why is that?
![Page 70: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/70.jpg)
Choosing Attributes
• Methods for selecting attributes (which will be described later) show that weather is not a discriminating attribute
![Page 71: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/71.jpg)
Choosing Attributes
• The basic structure of creating a decision tree is the same for most decision tree algorithms
• The difference lies in how we select the attributes for the tree
• We will focus on the ID3 algorithm developed by Ross Quinlan in 1975
![Page 72: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/72.jpg)
Identifying the Best Attributes
• Refer back to our original decision tree
[Decision tree repeated: Leave At — 8 AM → Long; 9 AM → Accident? (No → Medium, Yes → Long); 10 AM → Stall? (No → Short, Yes → Long)]

How did we know to split on leave at, and then on stall and accident, and not weather?
![Page 73: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/73.jpg)
Which is the splitting (best) attribute?
![Page 74: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/74.jpg)
ID3 Heuristic
• To determine the best attribute, we look at the ID3 heuristic
• ID3 splits attributes based on their entropy.
• Entropy is a measure of the disorder (uncertainty) in a collection of examples
![Page 75: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/75.jpg)
Entropy
• Entropy is minimized when all values of the target attribute are the same.
– If we know that commute time will always be short, then entropy = 0
• Entropy is maximized when there is an equal chance of all values for the target attribute (i.e. the result is random)
– If commute time = short in 3 instances, medium in 3 instances and long in 3 instances, entropy is maximized
![Page 76: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/76.jpg)
Entropy
• Calculation of entropy
– Entropy(S) = − ∑ (i = 1 to l) (|Si| / |S|) · log2(|Si| / |S|)
• S = set of examples
• Si = subset of S with value vi under the target attribute
• l = number of distinct values of the target attribute
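The formula translates directly into code. A minimal sketch (the function name is my own):

```python
import math
from collections import Counter

def entropy(values):
    """Entropy(S) = -sum over i of (|Si|/|S|) * log2(|Si|/|S|)."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

entropy(["short"] * 5)                    # 0: all target values the same
entropy(["short", "medium", "long"] * 3)  # log2(3): maximally mixed
```

The two calls illustrate the extremes from the previous slide: zero entropy when the outcome is certain, maximum entropy when all outcomes are equally likely.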
![Page 77: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/77.jpg)
The Entropy Function Relative to Boolean Classification

[Figure: entropy plotted against the proportion of positive examples — 0 at proportions 0.0 and 1.0, peaking at 1.0 when the proportion is 0.5. Example taken from Tom Mitchell's Machine Learning.]
![Page 78: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/78.jpg)
ID3
• ID3 splits on the attribute with the lowest expected entropy after the split
• We calculate the entropy for an attribute as the weighted sum of the entropies of the subsets it induces:
– Entropy_A(S) = ∑ (i = 1 to k) (|Si| / |S|) · Entropy(Si)
• We can also measure information gain (the reduction in entropy achieved by the split) as follows:
– Gain(S, A) = Entropy(S) − ∑ (i = 1 to k) (|Si| / |S|) · Entropy(Si)
![Page 79: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/79.jpg)
ID3
• Given our commute time sample set, we can calculate the entropy of each attribute at the root node
Attribute Expected Entropy Information Gain
Hour 0.6511 0.768449
Weather 1.28884 0.130719
Accident 0.92307 0.496479
Stall 1.17071 0.248842
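These figures can be reproduced directly from the experience table. The sketch below (variable and function names are my own) recomputes the root entropy and the expected entropy and gain for each attribute:

```python
import math
from collections import Counter

# (hour, weather, accident, stall, commute) — rows D1..D13 from the table
rows = [
    ("8AM", "Sunny", "No", "No", "Long"),   ("8AM", "Cloudy", "No", "Yes", "Long"),
    ("10AM", "Sunny", "No", "No", "Short"), ("9AM", "Rainy", "Yes", "No", "Long"),
    ("9AM", "Sunny", "Yes", "Yes", "Long"), ("10AM", "Sunny", "No", "No", "Short"),
    ("10AM", "Cloudy", "No", "No", "Short"),("9AM", "Rainy", "No", "No", "Medium"),
    ("9AM", "Sunny", "Yes", "No", "Long"),  ("10AM", "Cloudy", "Yes", "Yes", "Long"),
    ("10AM", "Rainy", "No", "No", "Short"), ("8AM", "Cloudy", "Yes", "No", "Long"),
    ("9AM", "Sunny", "No", "No", "Medium"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def expected_entropy(col):
    # weighted sum of target entropies over the subsets induced by column `col`
    n = len(rows)
    return sum(len(sub) / n * entropy([r[-1] for r in sub])
               for v in {r[col] for r in rows}
               for sub in [[r for r in rows if r[col] == v]])

base = entropy([r[-1] for r in rows])  # ≈ 1.4196 (7 Long, 4 Short, 2 Medium)
for name, col in [("Hour", 0), ("Weather", 1), ("Accident", 2), ("Stall", 3)]:
    print(name, round(expected_entropy(col), 4), round(base - expected_entropy(col), 4))
```

The printed values agree with the table above to four decimal places; Hour has the lowest expected entropy (highest gain), which is why it was chosen for the root.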
![Page 80: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/80.jpg)
Entropy: weather example
![Page 81: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/81.jpg)
Which is the splitting (best) attribute?
![Page 82: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/82.jpg)
Which is the splitting (best) attribute?
• At each node, available attributes are evaluated on the basis of separating the classes of the training examples
• A purity or impurity measure is used for this purpose
• Information Gain: increases with the average purity of the subsets that an attribute produces
• Splitting Strategy: choose the attribute that results in the greatest information gain
• Typical goodness functions: information gain (ID3), information gain ratio (C4.5), Gini index (CART)
![Page 83: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/83.jpg)
Which is the splitting (best) attribute?
![Page 84: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/84.jpg)
Which is the splitting (best) attribute?
• Entropy is a measure of the disorder prevailing in a collection of objects. If all objects belong to the same class, there is no disorder.
• Quinlan proposed to select the attribute that minimizes the disorder of the resulting partition.
![Page 85: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/85.jpg)
The attribute “outlook”
• “outlook” = “sunny”
• “outlook” = “overcast”
• “outlook” = “rainy”
• Expected information for attribute
![Page 86: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/86.jpg)
Information Gain
• Difference between the information before the split and the information after the split
• The information before the split, Info(D), is the entropy of the class distribution in D
• The information after the split using attribute A is computed as the weighted sum of the entropies on each split, given n splits:
– Info_A(D) = ∑ (j = 1 to n) (|Dj| / |D|) · Info(Dj)
– Gain(A) = Info(D) − Info_A(D)
![Page 87: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/87.jpg)
Information Gain
• Difference between the information before the split and the information after the split
• Information gain for the attributes from the weather data:
– gain(“outlook”) = 0.247
– gain(“temperature”) = 0.029
– gain(“humidity”) = 0.152
– gain(“windy”) = 0.048
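These gains can be checked numerically. The sketch below assumes the classic 14-example "play tennis" weather dataset (the full table is not reproduced in this transcript, but the computed gains match the four figures above):

```python
import math
from collections import Counter

# Classic 14-example weather data: (outlook, temperature, humidity, windy, play)
data = [
    ("sunny", "hot", "high", False, "no"),     ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"), ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"), ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"), ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),  ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),("rainy", "mild", "high", True, "no"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(col):
    # Gain(A) = Info(D) - Info_A(D)
    n = len(data)
    after = sum(len(sub) / n * entropy([r[-1] for r in sub])
                for v in {r[col] for r in data}
                for sub in [[r for r in data if r[col] == v]])
    return entropy([r[-1] for r in data]) - after

for name, col in [("outlook", 0), ("temperature", 1), ("humidity", 2), ("windy", 3)]:
    print(name, round(gain(col), 3))
# outlook 0.247, temperature 0.029, humidity 0.152, windy 0.048
```

Outlook has the highest gain, so it becomes the root split of the tree.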
![Page 88: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/88.jpg)
![Page 89: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/89.jpg)
The Final Decision Tree
• Not all the leaves need to be pure
• Splitting stops when the data cannot be split any further
![Page 90: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/90.jpg)
Rule extraction from Tree
if
Then PlayTennis = yes
![Page 91: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/91.jpg)
ID3 in Gaming
• Black & White, developed by Lionhead Studios, and released in 2001 used ID3
• Used to predict a player’s reaction to a certain creature’s action
• In this model, a greater feedback value means the creature should attack
![Page 92: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/92.jpg)
ID3 in Black & White
| Example | Allegiance | Defense | Tribe  | Feedback |
|---------|------------|---------|--------|----------|
| D1      | Friendly   | Weak    | Celtic | -1.0     |
| D2      | Enemy      | Weak    | Celtic | 0.4      |
| D3      | Friendly   | Strong  | Norse  | -1.0     |
| D4      | Enemy      | Strong  | Norse  | -0.2     |
| D5      | Friendly   | Weak    | Greek  | -1.0     |
| D6      | Enemy      | Medium  | Greek  | 0.2      |
| D7      | Enemy      | Strong  | Greek  | -0.4     |
| D8      | Enemy      | Medium  | Aztec  | 0.0      |
| D9      | Friendly   | Weak    | Aztec  | -1.0     |

(Attributes: Allegiance, Defense, Tribe. Target: Feedback.)
![Page 93: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/93.jpg)
ID3 in Black & White
[Decision tree: root splits on Allegiance — Friendly → -1.0; Enemy → Defense? (Weak → 0.4, Medium → 0.1, Strong → -0.3)]
Note that this decision tree does not even use the tribe attribute
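The lecture does not show how the leaf values were obtained. Since the feedback target is numeric rather than categorical, one plausible reconstruction (an assumption on my part, giving the tree a regression-tree flavor) is that each leaf holds the mean feedback of the training examples that reach it:

```python
from collections import defaultdict

# (allegiance, defense, feedback) — rows D1..D9; Tribe is unused by the tree
examples = [
    ("Friendly", "Weak", -1.0), ("Enemy", "Weak", 0.4),
    ("Friendly", "Strong", -1.0), ("Enemy", "Strong", -0.2),
    ("Friendly", "Weak", -1.0), ("Enemy", "Medium", 0.2),
    ("Enemy", "Strong", -0.4), ("Enemy", "Medium", 0.0),
    ("Friendly", "Weak", -1.0),
]

# Group examples by the path they take through the tree and average feedback
leaves = defaultdict(list)
for allegiance, defense, feedback in examples:
    key = "Friendly" if allegiance == "Friendly" else ("Enemy", defense)
    leaves[key].append(feedback)

means = {k: round(sum(v) / len(v), 1) for k, v in leaves.items()}
print(means)  # Friendly -1.0; Enemy/Weak 0.4; Enemy/Medium 0.1; Enemy/Strong -0.3
```

The averages reproduce every leaf value in the tree above, which supports this reading.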
![Page 94: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/94.jpg)
ID3 in Black & White
• Now suppose we don't want the entire decision tree, but just the 2 highest feedback values
• We can create a Boolean expression, such as:
((Allegiance = Enemy) ^ (Defense = Weak)) v ((Allegiance = Enemy) ^ (Defense = Medium))
![Page 95: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/95.jpg)
Deciding when a tree is complete
• Continue splitting nodes until some goodness-of-split criterion
fails to be met.
– when the quality of a particular split falls below the threshold, the
tree is not grown further along that branch.
– when all branches from the root reach terminal nodes then tree is
complete.
![Page 96: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/96.jpg)
Deciding when a tree is complete
• Grow the tree too large and then prune the nodes off.
– after tree construction stops, create a sequence of subtrees from the original tree
• choose one subtree for each possible number of leaves (the subtree chosen with p leaves has the best assessment value of all candidate subtrees with p leaves)
– once the sequence of subtrees is established, select which subtree to use according to some criterion.
• best assessment value
![Page 97: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/97.jpg)
Evaluation methodology
• Standard methodology:
1. Collect a large set of examples (all with correct classifications)
2. Randomly divide collection into two disjoint sets: training and test
3. Apply learning algorithm to training set giving hypothesis H
4. Measure performance of H w.r.t. test set
Important: keep the training and test sets disjoint!
• To study the efficiency and robustness of an algorithm, repeat steps 2-4 for different training sets and sizes of training sets
• If you improve your algorithm, start again with step 1 to avoid evolving the algorithm to work well on just this collection
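Steps 2-4 of the methodology can be sketched as a small harness (function names and signature are illustrative, not from the lecture):

```python
import random

def evaluate(examples, learn, test_fraction=0.3, seed=0):
    """Steps 2-4: random disjoint train/test split, fit, measure accuracy.

    examples: list of (input, label) pairs
    learn: function mapping a training list to a hypothesis H (a callable)
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    train, test = shuffled[:cut], shuffled[cut:]   # disjoint by construction
    hypothesis = learn(train)                      # step 3
    correct = sum(hypothesis(x) == y for x, y in test)
    return correct / len(test)                     # step 4
```

Calling `evaluate` repeatedly with different seeds and `test_fraction` values implements the "repeat steps 2-4" advice; the hypothesis never sees the held-out test set.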
![Page 98: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/98.jpg)
Another Version of the Weather Dataset
![Page 99: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/99.jpg)
Decision Tree for the New Dataset
• Entropy for splitting using “ID Code” is zero, since each leaf node is “pure”
• Information Gain is thus maximal for ID code
![Page 100: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/100.jpg)
Highly-Branching attributes
• Attributes with a large number of values are usually problematic
E.g. id, primary keys, or almost primary key attributes
• Subsets are likely to be pure if there is a large number of values
• Information Gain is biased towards choosing attributes with a large number of values
• This may result in overfitting (selection of an attribute that is non-optimal for prediction)
![Page 101: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/101.jpg)
Solution: Information Gain Ratio
• Modification of the Information Gain that reduces the bias toward highly-branching attributes
• Information Gain Ratio should be
– Large when data is evenly spread
– Small when all data belong to one branch
• Information Gain Ratio takes number and size of branches into account when choosing an attribute
• It corrects the information gain by taking the intrinsic information of a split into account
![Page 102: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/102.jpg)
Information Gain Ratio andIntrinsic information
• Intrinsic information: computes the entropy of the distribution of instances into branches
– IntrinsicInfo(A) = − ∑ (i = 1 to n) (|Si| / |S|) · log2(|Si| / |S|)
• Information Gain Ratio normalizes Information Gain by the intrinsic information:
– GainRatio(A) = Gain(A) / IntrinsicInfo(A)
Computing the Information Gain Ratio
• The intrinsic information for ID code (14 branches, one instance each) is − ∑ (i = 1 to 14) (1/14) · log2(1/14) = log2(14) ≈ 3.807
• The importance of an attribute decreases as its intrinsic information gets larger
• The information gain ratio of “ID code” is therefore 0.940 / 3.807 ≈ 0.247
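A quick numerical check of the ID-code case, assuming the 14-instance weather data with a 9 yes / 5 no class split:

```python
import math

n = 14  # 14 training instances, each with a unique ID code

# Splitting on ID code gives 14 pure one-instance branches, so the
# information after the split is 0 and the gain equals the full
# class entropy of the data (9 yes / 5 no):
info_d = -(9 / n) * math.log2(9 / n) - (5 / n) * math.log2(5 / n)  # ≈ 0.940
gain_id = info_d - 0.0

# Intrinsic information of the split: entropy of the branch sizes
intrinsic = -sum((1 / n) * math.log2(1 / n) for _ in range(n))  # = log2(14)
gain_ratio = gain_id / intrinsic
print(round(gain_ratio, 3))  # ≈ 0.247
```

ID code still has the maximal raw gain, but dividing by the large intrinsic information (≈ 3.807) pulls its gain ratio down to roughly the level of outlook, which is exactly the correction C4.5 intends.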
![Page 104: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/104.jpg)
Information Gain Ratio for Weather Data
![Page 105: Lecture # 4 Multilayer Percceptron & Decision Treesbiomisa.org/uploads/2014/06/Lect-4.pdfBack Propagation. Back Propagation. A Worked Example: • Propagated the values (10,30,20)](https://reader030.fdocuments.net/reader030/viewer/2022040923/5e9d5c83ba7d0346625ca4d9/html5/thumbnails/105.jpg)
Acknowledgements
Introduction to Machine Learning, Alpaydin
Statistical Pattern Recognition: A Review – A.K. Jain et al., PAMI (22) 2000
Pattern Recognition and Analysis Course – A.K. Jain, MSU
“Pattern Classification” by Duda et al., John Wiley & Sons.
http://www.doc.ic.ac.uk/~sgc/teaching/pre2012/v231/lecture13.html
Some material adopted from Dr. Adam Prugel-Bennett, Dr. Andrew Ng and Dr. Amanullah's slides
Material in these slides has been taken from the following resources.