Internet Engineering
Jacek Mazurkiewicz, PhD
Softcomputing
Part 3: Recurrent Artificial Neural Networks
Self-Organising Artificial Neural Networks
Recurrent Artificial Neural Networks
Feedback signals between neurons
Dynamic relations
A change in a single neuron is transmitted to the whole net
The stable state is reached after a sequence of temporary states
A stable state is available only if strict assumptions are imposed on the weights
Recurrent artificial neural networks are equipped with symmetric inter-neuron connections
Associative Memory
computer „memory” – as close as possible to human memory:
associative memory – to store „patterns”
auto-associative: Smoth – Smith (a corrupted input recalls the stored pattern itself)
hetero-associative: Smith – Smith's face (Smith's features)
learning procedure – to imprint the set of patterns
retrieving phase – outputs the stored pattern closest to the actual input signal
Hopfield Network (1)
Hamming distance for a binary input:

$$d_H = \sum_{i=1}^{n} \left[ x_i (1 - y_i) + (1 - x_i)\, y_i \right]$$
Hamming distance equals zero if: y = x
Hamming distance is the number of bits in which x and y differ
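As a quick check, the distance formula above can be coded directly for 0/1 vectors (a minimal sketch in NumPy; the function name is mine, not from the slides):

```python
import numpy as np

def hamming_distance(x, y):
    """d_H = sum_i [x_i(1 - y_i) + (1 - x_i) y_i] for binary 0/1 vectors."""
    x, y = np.asarray(x), np.asarray(y)
    return int(np.sum(x * (1 - y) + (1 - x) * y))

# counts the positions where the two bit vectors differ (here: positions 1 and 3)
d = hamming_distance([1, 0, 1, 1], [1, 1, 1, 0])
```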
[Figure: Hopfield network – inputs x_1 … x_N, one neuron per component, feedback signals v_1 … v_N, weights w_ij, outputs y_1 … y_N]
Retrieving Phase (1)
each neuron performs the following two steps:
– computes the coproduct:

$$u_p(k+1) = \sum_{j=1}^{N} w_{pj}\, v_j(k) - \theta_p$$

– updates the state:

$$v_p(k+1) = \begin{cases} +1 & \text{for } u_p(k+1) > 0 \\ v_p(k) & \text{for } u_p(k+1) = 0 \\ -1 & \text{for } u_p(k+1) < 0 \end{cases}$$
where:
w_pj – weight related to the feedback signal
v_j(k) – feedback signal
θ_p – bias
initial condition:

$$v_p(0) = x_p$$

Retrieving Phase (2)
the process is repeated until convergence, which occurs when none of the elements changes state during any iteration:

$$v_p(k+1) = v_p(k) = y_p$$
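The two-step retrieval loop can be sketched as follows: a minimal synchronous variant in NumPy (the function name and the `max_iter` safeguard are my additions, not part of the lecture):

```python
import numpy as np

def hopfield_retrieve(W, theta, x, max_iter=100):
    """Synchronous retrieval sketch: v(0) = x, iterate until no state changes."""
    v = np.asarray(x, dtype=float).copy()
    for _ in range(max_iter):
        u = W @ v - theta              # coproduct for every neuron at once
        v_new = np.where(u > 0, 1.0, np.where(u < 0, -1.0, v))
        if np.array_equal(v_new, v):   # convergence: no element changed state
            return v_new
        v = v_new
    return v

# usage: imprint one pattern with a Hebbian-style outer product, then recall
pattern = np.array([1., -1., 1., -1.])
W_demo = np.outer(pattern, pattern) - np.eye(4)
recalled = hopfield_retrieve(W_demo, np.zeros(4), np.array([1., -1., 1., 1.]))
```

The noisy input (last bit flipped) falls back to the stored pattern, which is exactly the attractor behaviour described above.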
a converged state of the Hopfield net means that the net has already reached one of its attractors
attractor – a point of a local minimum of the energy function (Lyapunov function):

$$E(x) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}\, x_i\, x_j + \sum_{i=1}^{N} \theta_i\, x_i$$

or, in matrix form:

$$E(x) = -\frac{1}{2}\, x^T W x + \theta^T x$$
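The energy function in matrix form is a one-liner; this small sketch (my naming) shows that a stored pattern sits lower on the energy surface than a corrupted version of it:

```python
import numpy as np

def hopfield_energy(W, theta, x):
    """E(x) = -1/2 x^T W x + theta^T x (Lyapunov function of the net)."""
    x = np.asarray(x, dtype=float)
    return -0.5 * x @ W @ x + theta @ x

# stored pattern vs. the same pattern with one flipped bit
p = np.array([1., -1., 1., -1.])
W = np.outer(p, p) - np.eye(4)
e_stored = hopfield_energy(W, np.zeros(4), p)
e_noisy = hopfield_energy(W, np.zeros(4), np.array([1., -1., 1., 1.]))
```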
Hebbian Learning
training patterns are presented one by one at fixed time intervals
convergence condition:

$$w_{pp} = 0, \qquad w_{pj} = w_{jp}, \qquad \theta_p = 0$$

during each interval the input data is communicated to the neuron's neighbours N times

$$w_{ij} = \begin{cases} \dfrac{1}{N} \sum_{m=1}^{M} x_i^{(m)} x_j^{(m)} & \text{for } i \neq j \\ 0 & \text{for } i = j \end{cases}$$
algorithm: easy, fast, low memory capacity:

$$M_{max} = 0.138\, N$$
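The Hebbian imprinting rule above reduces to one outer-product sum; a minimal sketch (function name is mine), with patterns stored as rows of ±1:

```python
import numpy as np

def hebbian_weights(patterns):
    """w_ij = (1/N) sum_m x_i^(m) x_j^(m) for i != j, zero diagonal."""
    X = np.asarray(patterns, dtype=float)   # shape (M, N), entries +/-1
    M, N = X.shape
    W = X.T @ X / N
    np.fill_diagonal(W, 0.0)                # w_pp = 0 (convergence condition)
    return W

patterns = np.array([[1., -1., 1., -1.]])
W = hebbian_weights(patterns)
```

The result is symmetric with a zero diagonal, so the convergence condition holds by construction, and the stored pattern reproduces its own sign pattern.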
correct weight values mean:
– the input signal generates itself as the output
– the converged state is reached at once:
one of possible solutions is:
Pseudoinverse Learning

$$W X = X$$

$$W = X \left( X^T X \right)^{-1} X^T$$
algorithm: sophisticated, high memory capacity:

$$M_{max} = N$$
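The pseudoinverse solution can be sketched directly (my naming; patterns assumed linearly independent so that X^T X is invertible):

```python
import numpy as np

def pseudoinverse_weights(patterns):
    """W = X (X^T X)^{-1} X^T, with the stored patterns as columns of X."""
    X = np.asarray(patterns, dtype=float).T   # columns are stored patterns
    return X @ np.linalg.inv(X.T @ X) @ X.T

patterns = np.array([[1., 1., -1., -1.],
                     [1., -1., 1., -1.]])
W = pseudoinverse_weights(patterns)
```

By construction W X = X, so every stored pattern generates itself as output at once.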
Delta-Rule Learning
weights are tuned step by step using all learning signals, presented in a sequence:
$$W \leftarrow W + \frac{\eta}{N} \left( x^{(i)} - W x^{(i)} \right) x^{(i)T}$$

$$\eta \in [0.7,\ 0.9] \text{ – learning rate}$$
algorithm is quite similar to gradient methods used for Multilayer Perceptron learning
algorithm: sophisticated, high memory capacity:

$$M_{max} = N$$
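The iterative delta-rule update can be sketched as below (function name, epoch count, and the mid-range η = 0.8 are my choices within the slide's [0.7, 0.9] interval):

```python
import numpy as np

def delta_rule_weights(patterns, eta=0.8, epochs=200):
    """W <- W + (eta/N)(x - W x) x^T, cycling through the stored patterns."""
    X = np.asarray(patterns, dtype=float)
    _, N = X.shape
    W = np.zeros((N, N))
    for _ in range(epochs):
        for x in X:
            W += (eta / N) * np.outer(x - W @ x, x)   # shrink the residual x - Wx
    return W

patterns = np.array([[1., 1., -1., -1.],
                     [1., -1., 1., -1.]])
W = delta_rule_weights(patterns)
```

After enough passes the residual vanishes and W x ≈ x for every stored pattern, matching the gradient-style tuning described above.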
Retrieving Phase – Problems
Input signals heavily corrupted by noise can lead to a false answer
– net output is far from learned/stored patterns
The energy function value for symmetric states is identical: E(+1,+1,−1) = E(−1,−1,+1) – both solutions offer the same "acceptance factor"
Learning algorithms can produce additional local minima– as linear combination of learning patterns
Additional minima are not tied to any learning pattern – especially important if the number of learning patterns is significant
Example of Answers
10 digits, 7x7 pixels
Hebbian learning:– 1 correct answer
Pseudoinverse & Delta-rule learning:– 7 correct answers– 9 answers with 1 wrong pixel– 4 answers with 2 wrong pixels
Hamming Network (1)
Hamming Network (2)
Hamming net – a maximum likelihood classifier for binary inputs corrupted by noise
Lower Sub Net calculates N minus the Hamming distance to M exemplar patterns
Upper Sub Net selects that node with the maximum output
All nodes use threshold logic nonlinearities– the outputs of these nonlinearities never saturate
Thresholds and weights in the Maxnet are fixed
All thresholds are set to zero, weights from each node to itself are 1
Weights between nodes are inhibitory
Hamming Network (3)
weights and offsets of the Lower Sub Net:

$$w_{ji} = \frac{x_i^{(j)}}{2}, \qquad \theta_j = \frac{N}{2} \qquad \text{for } 0 \le i \le N-1 \text{ and } 0 \le j \le M-1$$

weights in the Maxnet are fixed as:

$$w_{lk} = \begin{cases} 1 & \text{for } l = k \\ -\varepsilon & \text{for } l \neq k \end{cases} \qquad \varepsilon < \frac{1}{M}, \quad 0 \le l, k \le M-1$$

all thresholds in the Maxnet are kept zero
Hamming Network (4)
outputs of the Lower Sub Net are obtained as:

$$\mu_j = \sum_{i=0}^{N-1} w_{ji}\, x_i - \theta_j, \qquad y_j(0) = f_t(\mu_j) \qquad \text{for } 0 \le j \le M-1$$

the Maxnet does the maximisation by evaluating:

$$y_j(t+1) = f_t\!\left( y_j(t) - \varepsilon \sum_{k \neq j} y_k(t) \right) \qquad \text{for } 0 \le j, k \le M-1$$
this process is repeated until convergence
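The whole two-stage network can be sketched in a few lines (my naming; f_t is simplified to a rectifying threshold logic, and ε = 1/(M+1) is one admissible choice below 1/M):

```python
import numpy as np

def hamming_net(exemplars, x, eps=None, max_iter=100):
    """Lower sub net scores N - d_H for each exemplar; Maxnet picks the winner."""
    X = np.asarray(exemplars, dtype=float)        # (M, N), entries +/-1
    M, N = X.shape
    eps = eps if eps is not None else 1.0 / (M + 1)   # inhibition, eps < 1/M
    # lower sub net: w_ji = x_i^(j)/2, offset N/2 -> matching score N - d_H
    y = np.maximum(X @ np.asarray(x, dtype=float) / 2 + N / 2, 0.0)
    # Maxnet: y_j <- f_t(y_j - eps * sum_{k != j} y_k) until only the winner survives
    for _ in range(max_iter):
        y_new = np.maximum(y - eps * (y.sum() - y), 0.0)
        if np.array_equal(y_new, y):
            break
        y = y_new
    return int(np.argmax(y))

exemplars = np.array([[1., 1., 1., -1.],
                      [-1., -1., 1., 1.]])
winner = hamming_net(exemplars, np.array([1., 1., -1., -1.]))
```

The noisy input is closest (in Hamming distance) to exemplar 0, so the Maxnet settles on node 0.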
Introduction
learning without a teacher – data overload
unsupervised learning:– similarity– PCA algorithms– classification– archetype finding– feature maps
Pavlov Experiment
FOOD (UCS) → SALIVATION (UCR)
BELL (CS) → SALIVATION (CR)
FOOD + BELL (UCS + CS) → SALIVATION (CR)
CS – conditioned stimulus, CR – conditioned reflex
UCS – unconditioned stimulus, UCR – unconditioned reflex
Fields of Use
similarity
– single-output net
– how close is the input signal to the "mean learned pattern"
PCA– multi-output net, each output = single principal component– principal components responsible for similarity– actual output vector – correlation level
classification– binary multi-output with 1 of n code – class of closest data
stored patterns finding– associative memory
coding– data compression
Hebbian Rule (1949)
if neuron A is activated in a cyclic way by neuron B
– neuron A becomes more and more sensitive to activation from neuron B
f(a) is any function– linear for example
[Figure: single neuron – inputs x_1 … x_m, weights w_i1 … w_im, activation u_i, output y_i = f(a_i)]

$$w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij}(k)$$

$$\Delta w_{ij} = \eta\, x_j(k)\, y_i(k)$$
General Hebbian Rule

$$\Delta w_{ij} = F(x_j, y_i), \qquad y_i(k) = \sum_{j=0}^{m} w_{ij}\, x_j(k)$$

Problem:
– unlimited weight growth
Solution:
– set limitations (Linsker)
– Oja's rule
Limitations:

$$\Delta w_{ij} = \eta\, x_j(k)\, y_i(k); \qquad x_j \ge 0, \quad y_i \ge 0, \quad w_i \in [w_i^-, w_i^+]$$

Oja's rule:
– Hebbian rule + normalisation
– additional requirements

$$\Delta w_{ij}(k) = \eta\, y_i(k) \left[ x_j(k) - y_i(k)\, w_{ij}(k) \right]$$
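Oja's rule for a single neuron can be sketched as below (my naming and hyperparameters; data generated so that the first principal direction is the first axis). The decay term −y²w keeps the weight norm near 1, which is the built-in normalisation the slide refers to:

```python
import numpy as np

def oja_first_component(X, eta=0.005, epochs=50, seed=0):
    """Oja's rule: dw_j = eta * y * (x_j - y * w_j); w tends to the
    first principal direction with ||w|| -> 1."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x
            w += eta * y * (x - y * w)   # Hebb term minus Oja's decay term
    return w

# toy data: large variance along axis 0, small variance along axis 1
rng = np.random.default_rng(1)
data = np.column_stack([2.0 * rng.normal(size=300), 0.1 * rng.normal(size=300)])
w = oja_first_component(data)
```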
Principal Component Analysis - PCA
Statistical lossy compression in telecommunication – Karhunen-Loève approach
Linear conversion into output space with reduced dimensions– preserves the most important features of stochastic process x
First component estimation
– weights vector – using Oja's rule:
Other principal components
– by Sanger's rule:

$$y = W x, \qquad x \in R^N, \quad W \in R^{K \times N}, \quad y \in R^K, \quad K < N$$

$$y_1(k) = W_1^T x(k) = \sum_{j=0}^{N} W_{1j}\, x_j(k)$$

$$y_i(k) = \sum_{j=0}^{N} W_{ij}\, x_j(k)$$
Neural Networks for PCA
Oja's rule (1989):

$$\Delta w_{ij} = \eta\, y_i \left( x_j - \sum_{l=1}^{k} y_l\, w_{lj} \right), \qquad i = 1, \ldots, k; \quad j = 1, \ldots, n$$

Sanger's rule (1989):

$$\Delta w_{ij} = \eta\, y_i \left( x_j - \sum_{l=1}^{i} y_l\, w_{lj} \right), \qquad i = 1, \ldots, k; \quad j = 1, \ldots, n$$
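Sanger's rule differs from Oja's only in the upper limit of the inner sum (l ≤ i instead of l ≤ k), which is what forces the rows to converge to successive principal components. A minimal sketch (my naming and hyperparameters; the lower-triangular mask implements the sum over l ≤ i):

```python
import numpy as np

def sanger_pca(X, k, eta=0.01, epochs=100, seed=0):
    """Sanger's rule (GHA): dW_ij = eta * y_i * (x_j - sum_{l<=i} y_l W_lj)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(k, X.shape[1]))
    for _ in range(epochs):
        for x in X:
            y = W @ x
            # tril keeps only l <= i terms of the feedback sum for row i
            W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

# toy data: principal axes are axis 0 (large variance) and axis 1 (small variance)
rng = np.random.default_rng(0)
data = np.column_stack([2.0 * rng.normal(size=300), 0.5 * rng.normal(size=300)])
W = sanger_pca(data, k=2)
```

Row 0 aligns with the dominant axis and row 1 with the second one, up to sign.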
Rubner & Tavan Network – 1989 (1)
Single-layer
One-way connections
Weights:– input layer – calculation layer according to the Hebbian rule
Internal connections within calculation layer– according to the anti-Hebb rule
$$\Delta w_{ij} = \eta\, x_j\, y_i$$

$$\Delta v_{ij} = -\eta\, y_i\, y_j$$
Rubner & Tavan Network – 1989 (2)
[Figure: Rubner & Tavan network – inputs x_1 … x_5, outputs y_1 … y_4, feed-forward weights w_11 … w_45, lateral anti-Hebbian connections v_21, v_31, v_32, v_41, v_42, v_43]
Picture Compression for PCA
A large amount of input data is substituted by a smaller amount, combined in the vector y and the weights W_i
Level of compression – number of PCA components– main factor of the restored picture quality
More principal components– better quality– lower compression level
Picture restored based on:– 2 principal components– compression level: 28
Self-Organising Artificial Neural Networks
Inter-neuron interactions
Goal: input signals mapped into output signals
Similar input data are grouped
Groups are separated
Kohonen neural network – leader!
T. Kohonen from Finland!
Competitive Learning
WTA – Winner Takes All, WTM – Winner Takes Most
[Figure: competitive layer – input vector X, weight matrix W, output vector Y]
WTA (1)
Single layer of working neurons
The same input signals xj are loaded to all competitive neurons
Starting weight values are random
Each neuron calculates the product:
The winner is … the neuron with a maximum output!
Neuron the winner – final output equals to 1
Other neurons set output values to 0
$$u_i = \sum_j w_{ij}\, x_j$$
WTA (2)
The first presentation of the learning vectors is the basis for pointing out the winner neuron
Weights are modified by the Grossberg rule
If the learning vectors are similar, the same neuron keeps winning, and the winner's weights converge to the mean values of the input signals
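One WTA step (winner selection plus a Grossberg-style pull of the winner's weights toward the input) can be sketched as follows (function name and learning rate are mine):

```python
import numpy as np

def wta_step(W, x, eta=0.1):
    """One WTA step: winner i = argmax_i w_i . x; winner moves toward x."""
    u = W @ x                      # u_i = sum_j w_ij x_j
    i = int(np.argmax(u))          # the winner takes all: output 1, others 0
    W[i] += eta * (x - W[i])       # Grossberg-style update toward the input
    return i

W = np.array([[1.0, 0.0],
              [0.0, 1.0]])
winner = wta_step(W, np.array([0.9, 0.1]))
```

Repeating this for similar inputs drags the winning row toward their mean, as stated above.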
WTM (1)
Winner selection like in WTA
Winner’s output is maximum
Winner activates the neighbourhood neurons
Distance from the winner drives the level of activation
Level of activation is a part of weight tuning algorithm
All weights are modified during learning algorithm
Neurons Neighbourhood (1)
Neurons as nodes of regular network
Central neuron – in the middle of the region
Neighbourhood neurons in the closest columns and rows
simple neighbourhood sophisticated neighbourhood
Neurons Neighbourhood (2)
2-D neighbourhood
1-D neighbourhood
Neighbourhood function h(r)
a function of the distance between each neuron and the winner
defines the necessary parameters for weight tuning
$$h(r) = \frac{1}{r} \qquad \text{or} \qquad h(r) = e^{-r^2}$$

r – distance between the winner and the neurons in the neighbourhood
Grossberg Rule
neighbourhood around the winning neuron,
size of neighbourhood decreases with iteration,
modulation of learning rate by frequency sensitivity.
Neighbourhood function = Mexican Hat:

$$h(i, j, i_w, j_w) = \begin{cases} 1 & \text{for } r = 0 \\ \dfrac{\sin(ar)}{ar} & \text{for } r \in \left( 0, \dfrac{2\pi}{a} \right] \\ 0 & \text{for other values of } r \end{cases}$$

a – neighbourhood parameter, r – distance from the winner neuron to each single neuron
The Grossberg rule:
$$w_{lij}(k+1) = w_{lij}(k) + \eta(k)\, h(i, j, i_w, j_w)\, \left[ x_l - w_{lij}(k) \right]$$

k – iteration index, η(k) – learning rate function, x_l – component of the input learning vector,
w_lij – weight associated with the proper connection, h – neighbourhood function,
(i_w, j_w) – indexes related to the winner neuron, (i, j) – indexes related to a single neuron
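One step of this rule on a 2-D grid can be sketched as below. The sketch is mine: for simplicity it uses the Gaussian neighbourhood h(r) = e^{-r²/(2σ²)} from the neighbourhood-function slide rather than the Mexican hat, and picks the winner by smallest distance to the input:

```python
import numpy as np

def som_step(W, x, eta, sigma):
    """One Kohonen/Grossberg step on a 2-D grid of neurons.
    W has shape (rows, cols, n_inputs); h(r) = exp(-r^2 / (2 sigma^2))."""
    rows, cols, _ = W.shape
    # winner (i_w, j_w): neuron whose weight vector is closest to x
    d = np.linalg.norm(W - x, axis=2)
    iw, jw = np.unravel_index(np.argmin(d), d.shape)
    # neighbourhood function of the grid distance from the winner
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    r2 = (ii - iw) ** 2 + (jj - jw) ** 2
    h = np.exp(-r2 / (2.0 * sigma ** 2))
    # w(k+1) = w(k) + eta * h * (x - w(k)) for every neuron at once
    W += eta * h[:, :, None] * (x - W)
    return int(iw), int(jw)

grid = np.zeros((2, 2, 2))
grid[0, 0] = [1.0, 1.0]
winner = som_step(grid, np.array([1.0, 1.0]), eta=0.5, sigma=1.0)
```

The winner's weights stay put (x − w = 0 there) while its neighbours are pulled toward the input in proportion to h, which is exactly the WTM behaviour described above.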