Other Network Models
Deterministic weight updates
• Until now, weight updates have been deterministic.
• State = current weight values and unit activations.
• But a probability distribution can be used to determine whether or not a unit should change to the newly calculated state.
• So, for example, in a discrete Hopfield network, even if a unit is selected for update, it might not be updated.
Simulated Annealing
Figure. Finding a global minimum using simulated annealing: points tried at medium temperature and points tried at low temperature.
S.A.
• A deterministic algorithm like backpropagation that uses gradient descent often gets caught in local minima.
• Once caught, the network can no longer move along the error surface to a more optimal solution.
• Metropolis algorithm: select at random a part of the system to change. The change is always accepted if the global system energy falls, but if there is an increase in energy then the change is accepted only with probability p.
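The acceptance rule just described can be sketched as follows (a minimal illustration; the helper name `metropolis_accept` is not from the slides):

```python
import math
import random

def metropolis_accept(delta_e, temperature):
    """Metropolis rule: always accept a change that lowers the global
    energy; accept an increase with probability exp(-delta_e / T)."""
    if delta_e < 0:
        return True
    return random.random() < math.exp(-delta_e / temperature)
```

At high temperature nearly every uphill change is accepted; as T falls, uphill moves become rare and the search behaves more like gradient descent.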
S.A.
p = exp(-ΔE / T)

where ΔE is the change in energy and T is the temperature.
Example algorithm for function minimization (Geman and Hwang, 1986)
1. Select at random an initial vector x and an initial value of T.
2. Create a copy of x called xnew and randomly select a component of xnew to change. Flip the bit of the selected component.
3. Calculate the change in energy.
4. If the change in energy is less than 0, then x = xnew. Otherwise select a random number between 0 and 1 using a uniform probability density function; if the random number is less than p = exp(-ΔE / T), then x = xnew.
Continued
5. If there have been a specified number (M) of changes in x for which the value of f has dropped or there have been N changes in x since the last change in temperature, then set T = αT.
6. If the minimum value of f has not decreased more than some specified constant in the last L iterations then stop, otherwise go back and repeat from step 2.
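The steps above can be sketched as a small program. This is an illustrative sketch under assumed defaults for T, α, and the cooling schedule; for simplicity it cools every `n_per_temp` iterations and runs for a fixed budget rather than testing the last-L-iterations stopping condition of step 6:

```python
import math
import random

def anneal(f, n_bits, t0=10.0, alpha=0.9, n_per_temp=50, max_iters=5000):
    """Minimize f over binary vectors by simulated annealing.

    Follows the slides' steps: flip a random bit (step 2), compute the
    change in energy (step 3), accept downhill moves always and uphill
    moves with probability exp(-delta/T) (step 4), and cool the
    temperature with T <- alpha*T (step 5)."""
    x = [random.randint(0, 1) for _ in range(n_bits)]     # step 1
    best = f(x)
    t = t0
    for it in range(max_iters):
        x_new = x[:]
        x_new[random.randrange(n_bits)] ^= 1              # step 2: flip one bit
        delta = f(x_new) - f(x)                           # step 3
        if delta < 0 or random.random() < math.exp(-delta / t):
            x = x_new                                     # step 4
        if (it + 1) % n_per_temp == 0:
            t = alpha * t                                 # step 5
        best = min(best, f(x))
    return x, best
```

With f(x) = sum(x) (the count of 1-bits), the search settles on the all-zero vector once the temperature becomes low.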
Boltzmann machine
• A Boltzmann machine is a neural network that uses the idea of simulated annealing for updating the network’s state.
• It’s a Hopfield network that uses a stochastic process for updating the state of a network unit.
• Assume +1 and -1 activation values.
E = -(1/2) Σ_i Σ_j s_i s_j w_ij

ΔE_j = Σ_i s_i w_ij

where E is the energy of the network and ΔE_j is the energy gap for unit j.
Probability function for state change
p = 1 / (1 + exp(-ΔE / T))
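One common way to realise this stochastic update for a single ±1 unit is sketched below; the sign conventions follow the energy-gap formula above, and the function name is illustrative:

```python
import math
import random

def update_unit(s, w, j, temperature):
    """Stochastic update of unit j in a Boltzmann machine with +/-1 states.

    delta_e is the energy gap (sum_i s_i * w_ij); the unit takes state +1
    with probability p = 1 / (1 + exp(-delta_e / T))."""
    delta_e = sum(s[i] * w[i][j] for i in range(len(s)) if i != j)
    p = 1.0 / (1.0 + math.exp(-delta_e / temperature))
    s[j] = 1 if random.random() < p else -1
```

At low temperature the sigmoid becomes steep and the update approaches the deterministic Hopfield rule; at high temperature the unit's state is close to a coin flip.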
Weight update
Δw_ij = η (p⁺_ij - p⁻_ij)

p⁺_ij = correlation between units i and j during the clamped phase
p⁻_ij = correlation between units i and j during the free-running phase
An example Boltzmann machine (can be used for autoassociation)

Figure. Input layer and output layer.
Probabilistic Neural Networks
• In a PNN, a pattern is classified based on its proximity to neighbouring patterns.
• The manner in which neighbouring patterns are distributed is important.
• A simple metric for deciding the class of a new sample is to calculate the centroid of each class.
• The PNN is based on Bayes’ technique of classification: decide the most likely class that a sample is drawn from. The decision requires estimating a probability density function for each class.
• The estimate is constructed from training data.
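The simple centroid metric mentioned above can be sketched as follows (illustrative names, one-dimensional samples):

```python
def centroid(points):
    """Mean of a list of one-dimensional samples."""
    return sum(points) / len(points)

def classify_by_centroid(sample, classes):
    """Assign `sample` to the class whose centroid is nearest.

    `classes` maps a class name to its list of training samples."""
    return min(classes, key=lambda name: abs(sample - centroid(classes[name])))
```

This ignores how the neighbouring patterns are distributed, which is exactly the weakness the density-based PNN addresses.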
Class estimation methods
Gaussian dist.
Gaussian dist.
Gaussian function for two variables
PDF (Probability density function)
The estimated PDF is the summation of the individual Gaussians centered at each sample point. Here σ = 0.1
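The estimate described here — a sum of unnormalised Gaussians, one centred at each sample — can be sketched as:

```python
import math

def pdf_estimate(x, samples, sigma=0.1):
    """Unnormalised Parzen estimate: a Gaussian kernel centred at each
    training sample, summed at the query point x."""
    return sum(math.exp(-(x - xi) ** 2 / sigma ** 2) for xi in samples)
```

Evaluating this at a query point for each class gives the per-class densities compared in the worked example later.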
The same estimate as in the previous figure but with σ = 0.3. When the width is too large, there is a danger that classes will become blurred (a high chance of misclassifying).
The same estimate as in the previous figure but with σ = 0.05. When the width becomes too small, there is a danger of poor generalization: the fit around the training samples becomes too close.
PNN
• The class with a highly dense population in the region of an unknown sample will be preferred over other classes.
• The probability density function (PDF) needs to be estimated.
• The estimate can be found using Parzen’s PDF estimator which uses a weight function that is centered at a training point. The weight function is called a potential function or kernel.
• A commonly used function is a Gaussian function.
PNN
• The Gaussian functions are then summed to give the PDF.
• The form of the Gaussian function is as follows:

g(x) = Σ_{i=1}^{n} exp( -(x - x_i)² / σ² )

(The square here cancels with the square root in the normalization formula.)
Example
• There are two classes of single-variable data in the following figure. A sample positioned at 0.2 is from an unknown class. Using a PDF with a Gaussian kernel, estimate the class that the sample is from.
Figure. The unknown sample (positioned at 0.2) to be classified using a PDF.
SOLUTION
• The value σ = 0.1 is used. The results of the density estimation are shown in the table on the following slide.
• Although the unknown sample is closest to a point in class A, the calculation favors class B. The reason B is preferred is the high density of points around 0.35.
The calculation of the density estimation
| Class | Training point | Squared distance from unknown | PDF term | Class sum |
|---|---|---|---|---|
| A | -0.2 | (-0.2 - 0.2)² = 0.16 | exp(-0.16 / 0.01) ≈ 0 | |
| A | -0.5 | 0.49 | 0 | |
| A | -0.6 | 0.64 | 0 | |
| A | -0.7 | 0.81 | 0 | |
| A | -0.8 | 1.00 | 0 | |
| A | 0.1 | 0.01 | 0.3679 | 0.3679 |
| B | 0.35 | 0.0225 | exp(-0.0225 / 0.01) = 0.1054 | |
| B | 0.36 | 0.0256 | 0.0773 | |
| B | 0.38 | 0.0324 | 0.0392 | |
| B | 0.365 | 0.0272 | 0.0657 | |
| B | 0.355 | 0.0240 | 0.0905 | |
| B | 0.4 | 0.04 | 0.0183 | |
| B | 0.5 | 0.09 | 0.0001 | |
| B | 0.6 | 0.16 | 0 | |
| B | 0.7 | 0.25 | 0 | 0.3965 |
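As a quick check of the class sums in the table, the two densities can be recomputed with the unnormalised Gaussian sum (σ = 0.1):

```python
import math

def density(x, samples, sigma=0.1):
    # Unnormalised sum of Gaussian kernels centred at the training points.
    return sum(math.exp(-(x - xi) ** 2 / sigma ** 2) for xi in samples)

class_a = [-0.2, -0.5, -0.6, -0.7, -0.8, 0.1]
class_b = [0.35, 0.36, 0.38, 0.365, 0.355, 0.4, 0.5, 0.6, 0.7]

f_a = density(0.2, class_a)   # ~0.3679, dominated by the single point at 0.1
f_b = density(0.2, class_b)   # ~0.3965, many small contributions near 0.35
```

The class B total exceeds the class A total, confirming the classification in the solution.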
The neural network architecture for a PNN
• The input and pattern layers are fully connected.
• The weights feeding into a pattern unit are set to the elements of the corresponding pattern vector.
• The activation of a pattern unit is
o_j = exp( -Σ_i (w_ij - x_i)² / (2σ²) )

where x is an unknown input pattern.
PNN
If the input vectors are all of unit length, then the following form of the activation function can be used:

o_j = exp( (Σ_i x_i w_ij - 1) / σ² )
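For unit-length vectors, ||x - w||² = ||x||² + ||w||² - 2 x·w = 2 - 2 x·w, which is why the simplified form is equivalent to the general Gaussian activation. A small check (illustrative function names):

```python
import math

def act_general(x, w, sigma):
    """General pattern-unit activation: exp(-||w - x||^2 / (2 sigma^2))."""
    d2 = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return math.exp(-d2 / (2 * sigma ** 2))

def act_unit_length(x, w, sigma):
    """Simplified form for unit-length vectors: exp((x.w - 1) / sigma^2)."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return math.exp((dot - 1) / sigma ** 2)
```

The two functions agree whenever both x and w have unit length.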
Number of input units = number of features
Number of pattern units = number of training samples
Number of summation units = number of classes
The weights from the pattern units to the summation units are fixed at 1.
An Example PNN Architecture
Figure. An example PNN architecture: input layer, pattern layer, summation layer, and output layer, with class outputs f_A(x) and f_B(x).
Example
• The following figure shows a set of training points from three classes and an unknown sample. Normalize the inputs to unit length and, using a PNN, find the class to which the unknown sample is assigned.
The unknown sample to be classified using a PNN
Figure. Training points from classes A, B, and C plotted together with the unknown sample.
Solution
Figure. The vectors shown in the previous figure, normalized to unit length.
Training data normalized to unit length
| Class | x1 (unnormalized) | x2 (unnormalized) | x1 (normalized) | x2 (normalized) |
|---|---|---|---|---|
| A | 3 | 5 | 0.5145 | 0.8575 |
| A | 4 | 4 | 0.7071 | 0.7071 |
| A | 3 | 4 | 0.6 | 0.8 |
| A | 5 | 6 | 0.6402 | 0.7682 |
| A | 4 | 6 | 0.5547 | 0.8321 |
| A | 4 | 5 | 0.6247 | 0.7809 |
| B | 7 | 2 | 0.9615 | 0.2747 |
| B | 7 | 3 | 0.9191 | 0.3939 |
| B | 8 | 2 | 0.9701 | 0.2425 |
| B | 8 | 3 | 0.9363 | 0.3511 |
| B | 9 | 4 | 0.9138 | 0.4061 |
| C | 1 | -1 | 0.7071 | -0.7071 |
| C | 1 | -2 | 0.4472 | -0.8944 |
| C | 2 | -2 | 0.7071 | -0.7071 |
| C | 3 | -2 | 0.8321 | -0.5547 |
| C | 3 | -3 | 0.7071 | -0.7071 |
Unknown sample
| | x1 | x2 |
|---|---|---|
| Unnormalized | 5.8 | 4.4 |
| Normalized | 0.7967 | 0.6044 |
Computation of the PNN for classifying the unknown sample
| Class | w1 | w2 | Pattern-unit activation | Summation-unit activation |
|---|---|---|---|---|
| A | 0.5145 | 0.8575 | 0.0008 | |
| A | 0.7071 | 0.7071 | 0.395 | |
| A | 0.6 | 0.8 | 0.0213 | |
| A | 0.6402 | 0.7682 | 0.0768 | |
| A | 0.5547 | 0.8321 | 0.004 | |
| A | 0.6247 | 0.7809 | 0.048 | 0.5459 |
| B | 0.9615 | 0.2747 | 0.0011 | |
| B | 0.9191 | 0.3939 | 0.0516 | |
| B | 0.9701 | 0.2425 | 0.0003 | |
| B | 0.9363 | 0.3511 | 0.0153 | |
| B | 0.9138 | 0.4061 | 0.0706 | 0.1389 |
| C | 0.7071 | -0.7071 | 0 | |
| C | 0.4472 | -0.8944 | 0 | |
| C | 0.7071 | -0.7071 | 0 | |
| C | 0.8321 | -0.5547 | 0 | |
| C | 0.7071 | -0.7071 | 0 | 0 |
Calculations of activations
>> exp(((0.6247*0.7967)+(0.7809*0.6044)-1)/0.01)
ans =
0.0482
>> exp(((0.9138*0.7967)+(0.4061*0.6044)-1)/0.01)
ans =
0.0704
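The full set of summation-unit activations can be reproduced with the unit-length activation formula (σ = 0.1; a sketch with illustrative names):

```python
import math

def pattern_activation(x, w, sigma=0.1):
    # Unit-length form: exp((x.w - 1) / sigma^2).
    dot = sum(xi * wi for xi, wi in zip(x, w))
    return math.exp((dot - 1) / sigma ** 2)

x = (0.7967, 0.6044)  # the normalized unknown sample

class_a = [(0.5145, 0.8575), (0.7071, 0.7071), (0.6, 0.8),
           (0.6402, 0.7682), (0.5547, 0.8321), (0.6247, 0.7809)]
class_b = [(0.9615, 0.2747), (0.9191, 0.3939), (0.9701, 0.2425),
           (0.9363, 0.3511), (0.9138, 0.4061)]
class_c = [(0.7071, -0.7071), (0.4472, -0.8944), (0.7071, -0.7071),
           (0.8321, -0.5547), (0.7071, -0.7071)]

f_a = sum(pattern_activation(x, w) for w in class_a)   # ~0.546
f_b = sum(pattern_activation(x, w) for w in class_b)   # ~0.139
f_c = sum(pattern_activation(x, w) for w in class_c)   # ~0
```

Class A has the largest summation-unit activation, so the unknown sample is assigned to class A, in agreement with the table.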