Ch 10: Widrow-Hoff Learning (LMS Algorithm)
In this chapter we apply the principles of performance learning to a single-layer linear neural network. Widrow-Hoff learning is an approximate steepest descent algorithm in which the performance index is mean square error.
Bernard Widrow began working in neural networks in the late 1950s, at about the same time that Frank Rosenblatt developed the perceptron learning rule. Widrow and Hoff introduced the ADALINE (ADAptive LInear NEuron) network. Its learning rule is called the LMS (Least Mean Square) algorithm.

ADALINE is similar to the perceptron, except that its transfer function is linear instead of hard-limiting.
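To make that difference concrete, here is a minimal sketch in Python/NumPy (the weight, bias, and input values are illustrative assumptions, not from the text) contrasting the linear transfer function with a hard limiter:

```python
import numpy as np

def purelin(n):
    """ADALINE transfer function: linear (identity)."""
    return n

def hardlims(n):
    """Symmetric hard limit, as used by the perceptron."""
    return np.where(n >= 0, 1.0, -1.0)

W = np.array([[1.0, -0.5]])   # 1 neuron, 2 inputs (example values)
b = np.array([0.2])
p = np.array([0.3, 0.8])

n = W @ p + b                 # net input
print(purelin(n))             # ADALINE output: the net input itself
print(hardlims(n))            # perceptron output: +/-1
```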
Widrow, B., and Hoff, M. E., Jr., 1960, "Adaptive switching circuits," in 1960 IRE WESCON Convention Record, Part 4, New York: IRE, pp. 96-104.

Widrow, B., and Lehr, M. A., 1990, "30 years of adaptive neural networks: Perceptron, madaline, and backpropagation," Proc. IEEE, 78:1415-1441.

Widrow, B., and Stearns, S. D., 1985, Adaptive Signal Processing, Englewood Cliffs, NJ: Prentice-Hall.
Both ADALINE and the perceptron have the same limitation: they can only solve linearly separable problems.

The LMS algorithm minimizes mean square error, and therefore tries to move the decision boundaries as far from the training patterns as possible.

The LMS algorithm has found many more practical uses than the perceptron learning rule (for example, most long distance phone lines use ADALINE networks for echo cancellation).
ADALINE Network
$$\mathbf{a} = \mathrm{purelin}(\mathbf{W}\mathbf{p} + \mathbf{b}) = \mathbf{W}\mathbf{p} + \mathbf{b}$$

$$a_i = \mathrm{purelin}(n_i) = \mathrm{purelin}({}_i\mathbf{w}^T\mathbf{p} + b_i) = {}_i\mathbf{w}^T\mathbf{p} + b_i$$

$${}_i\mathbf{w} = \begin{bmatrix} w_{i,1} \\ w_{i,2} \\ \vdots \\ w_{i,R} \end{bmatrix}$$

where ${}_i\mathbf{w}$ is made up of the elements of the $i$th row of $\mathbf{W}$.
Two-Input ADALINE
$$a = \mathrm{purelin}(n) = \mathrm{purelin}({}_1\mathbf{w}^T\mathbf{p} + b) = {}_1\mathbf{w}^T\mathbf{p} + b$$

$$a = {}_1\mathbf{w}^T\mathbf{p} + b = w_{1,1}\,p_1 + w_{1,2}\,p_2 + b$$

The ADALINE, like the perceptron, has a decision boundary, which is determined by the input vectors for which the net input $n$ is zero.
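To make the boundary concrete, here is a small sketch (Python/NumPy; the weight and bias values are assumed for illustration) that traces the line $n = 0$:

```python
import numpy as np

# Illustrative two-input ADALINE parameters (not from the text)
w = np.array([1.0, 1.0])   # 1w = [w_{1,1}, w_{1,2}]
b = -1.0

# Decision boundary: w_{1,1} p1 + w_{1,2} p2 + b = 0
# Solve for p2 as a function of p1:
p1 = np.linspace(-2, 2, 5)
p2 = -(w[0] * p1 + b) / w[1]
for x, y in zip(p1, p2):
    n = w @ np.array([x, y]) + b
    print(f"p = ({x:+.1f}, {y:+.1f})  net input n = {n:.2f}")  # all ~0
```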
Mean Square Error
Training set: $\{\mathbf{p}_1, t_1\}, \{\mathbf{p}_2, t_2\}, \ldots, \{\mathbf{p}_Q, t_Q\}$, where $\mathbf{p}_q$ is an input and $t_q$ is the corresponding target.

The LMS algorithm is an example of supervised training.

Notation:

$$\mathbf{x} = \begin{bmatrix} {}_1\mathbf{w} \\ b \end{bmatrix}, \qquad \mathbf{z} = \begin{bmatrix} \mathbf{p} \\ 1 \end{bmatrix}, \qquad a = {}_1\mathbf{w}^T\mathbf{p} + b = \mathbf{x}^T\mathbf{z}$$

Mean square error:

$$F(\mathbf{x}) = E[e^2] = E[(t - a)^2] = E[(t - \mathbf{x}^T\mathbf{z})^2]$$

The expectation is taken over all sets of input/target pairs.
Error Analysis
$$F(\mathbf{x}) = E[e^2] = E[(t - a)^2] = E[(t - \mathbf{x}^T\mathbf{z})^2]$$

Expanding the square:

$$F(\mathbf{x}) = E[t^2 - 2t\,\mathbf{x}^T\mathbf{z} + \mathbf{x}^T\mathbf{z}\mathbf{z}^T\mathbf{x}] = E[t^2] - 2\mathbf{x}^T E[t\mathbf{z}] + \mathbf{x}^T E[\mathbf{z}\mathbf{z}^T]\,\mathbf{x}$$

This can be written in the following convenient form:

$$F(\mathbf{x}) = c - 2\mathbf{x}^T\mathbf{h} + \mathbf{x}^T\mathbf{R}\mathbf{x}$$

where

$$c = E[t^2], \qquad \mathbf{h} = E[t\mathbf{z}], \qquad \mathbf{R} = E[\mathbf{z}\mathbf{z}^T]$$
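In practice these statistics can be estimated by averaging over the training set. A minimal sketch (Python/NumPy; the toy data are assumed for illustration):

```python
import numpy as np

# Toy training set: Q input/target pairs (illustrative values)
P = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]])   # rows are p_q
t = np.array([1.0, 0.0, -1.0])

Z = np.hstack([P, np.ones((len(P), 1))])   # z_q = [p_q; 1]

c = np.mean(t**2)                           # c = E[t^2]
h = (Z * t[:, None]).mean(axis=0)           # h = E[t z]
R = (Z.T @ Z) / len(Z)                      # R = E[z z^T]

x = np.zeros(Z.shape[1])                    # any weight/bias vector
F = c - 2 * x @ h + x @ R @ x               # mean square error at x
print(F, np.mean((t - Z @ x)**2))           # the two agree
```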
The vector $\mathbf{h}$ gives the cross-correlation between the input vector and its associated target. $\mathbf{R}$ is the input correlation matrix. The diagonal elements of this matrix are equal to the mean square values of the elements of the input vectors.

In the standard notation for a quadratic function,

$$F(\mathbf{x}) = c + \mathbf{d}^T\mathbf{x} + \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x}$$

with

$$\mathbf{d} = -2\mathbf{h}, \qquad \mathbf{A} = 2\mathbf{R}$$
Stationary Point
Hessian matrix: $\mathbf{A} = 2\mathbf{R}$

The correlation matrix $\mathbf{R}$ must be at least positive semidefinite. In fact, it can be shown that all correlation matrices are either positive definite or positive semidefinite. If there are any zero eigenvalues, the performance index will either have a weak minimum or else no stationary point (depending on $\mathbf{d} = -2\mathbf{h}$); see Ch 8.

The gradient is

$$\nabla F(\mathbf{x}) = \nabla\left(c + \mathbf{d}^T\mathbf{x} + \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x}\right) = \mathbf{d} + \mathbf{A}\mathbf{x} = -2\mathbf{h} + 2\mathbf{R}\mathbf{x}$$

Stationary point:

$$-2\mathbf{h} + 2\mathbf{R}\mathbf{x} = \mathbf{0}$$
If $\mathbf{R}$ (the correlation matrix) is positive definite:

$$\mathbf{x}^* = \mathbf{R}^{-1}\mathbf{h}$$

If we could calculate the statistical quantities $\mathbf{h}$ and $\mathbf{R}$, we could find the minimum point directly from the equation above. But it is usually neither desirable nor convenient to calculate $\mathbf{h}$ and $\mathbf{R}$, which motivates an approximate approach.
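For reference, when $\mathbf{h}$ and $\mathbf{R}$ are available (or estimated as above), the minimum is one linear solve away. A sketch (Python/NumPy; the statistics are assumed example values):

```python
import numpy as np

# Illustrative statistics (assumed values, not from the text)
R = np.array([[1.0, 0.25],
              [0.25, 1.0]])   # input correlation matrix E[z z^T]
h = np.array([0.5, -0.3])     # cross-correlation E[t z]

x_star = np.linalg.solve(R, h)   # x* = R^{-1} h
print(x_star)

# At x*, the gradient -2h + 2Rx vanishes:
print(-2 * h + 2 * R @ x_star)   # ~[0, 0]
```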
Approximate Steepest Descent
Approximate mean square error (one sample):

$$\hat{F}(\mathbf{x}) = (t(k) - a(k))^2 = e^2(k)$$

Approximate (stochastic) gradient:

$$\hat{\nabla} F(\mathbf{x}) = \nabla e^2(k)$$

The expectation of the squared error has been replaced by the squared error at iteration $k$:

$$[\nabla e^2(k)]_j = \frac{\partial e^2(k)}{\partial w_{1,j}} = 2e(k)\,\frac{\partial e(k)}{\partial w_{1,j}}, \qquad j = 1, 2, \ldots, R$$

$$[\nabla e^2(k)]_{R+1} = \frac{\partial e^2(k)}{\partial b} = 2e(k)\,\frac{\partial e(k)}{\partial b}$$
Approximate Gradient Calculation
$$\frac{\partial e(k)}{\partial w_{1,j}} = \frac{\partial\,[t(k) - a(k)]}{\partial w_{1,j}} = \frac{\partial}{\partial w_{1,j}}\left[t(k) - \left({}_1\mathbf{w}^T\mathbf{p}(k) + b\right)\right]$$

$$\frac{\partial e(k)}{\partial w_{1,j}} = \frac{\partial}{\partial w_{1,j}}\left[t(k) - \left(\sum_{i=1}^{R} w_{1,i}\,p_i(k) + b\right)\right]$$

where $p_i(k)$ is the $i$th element of the input vector at the $k$th iteration. Therefore

$$\frac{\partial e(k)}{\partial w_{1,j}} = -p_j(k), \qquad \frac{\partial e(k)}{\partial b} = -1$$

Now we can see the beauty of approximating the mean square error by the single error at iteration $k$, as in $\hat{F}(\mathbf{x}) = (t(k) - a(k))^2 = e^2(k)$: the gradient reduces to

$$\hat{\nabla} F(\mathbf{x}) = \nabla e^2(k) = -2e(k)\,\mathbf{z}(k)$$

This approximation to $\nabla F(\mathbf{x})$ can now be used in the steepest descent algorithm:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha\,\nabla F(\mathbf{x})\Big|_{\mathbf{x} = \mathbf{x}_k}$$
If we substitute $\hat{\nabla} F(\mathbf{x})$ for $\nabla F(\mathbf{x})$:

$$\mathbf{x}_{k+1} = \mathbf{x}_k + 2\alpha\,e(k)\,\mathbf{z}(k)$$

or, written out for the weights and the bias,

$${}_1\mathbf{w}(k+1) = {}_1\mathbf{w}(k) + 2\alpha\,e(k)\,\mathbf{p}(k)$$

$$b(k+1) = b(k) + 2\alpha\,e(k)$$

These last two equations make up the LMS algorithm, also called the Delta Rule or the Widrow-Hoff learning algorithm.
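A minimal single-neuron LMS training loop might look like the following sketch (Python/NumPy; the data and learning rate are illustrative assumptions):

```python
import numpy as np

# Illustrative training data (not from the text)
P = np.array([[1.0, -1.0], [1.0, 1.0]])   # rows are input vectors p_q
t = np.array([-1.0, 1.0])                  # targets
alpha = 0.2                                # learning rate

w = np.zeros(2)   # 1w, initialized arbitrarily
b = 0.0

for epoch in range(20):
    for p, tq in zip(P, t):
        a = w @ p + b              # ADALINE output (purelin)
        e = tq - a                 # error e(k) = t(k) - a(k)
        w = w + 2 * alpha * e * p  # 1w(k+1) = 1w(k) + 2*alpha*e(k)*p(k)
        b = b + 2 * alpha * e      # b(k+1)  = b(k)  + 2*alpha*e(k)
print(w, b)
```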
Multiple-Neuron Case
$${}_i\mathbf{w}(k+1) = {}_i\mathbf{w}(k) + 2\alpha\,e_i(k)\,\mathbf{p}(k)$$

$$b_i(k+1) = b_i(k) + 2\alpha\,e_i(k)$$

Matrix form:

$$\mathbf{W}(k+1) = \mathbf{W}(k) + 2\alpha\,\mathbf{e}(k)\,\mathbf{p}^T(k)$$

$$\mathbf{b}(k+1) = \mathbf{b}(k) + 2\alpha\,\mathbf{e}(k)$$
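The matrix form is a one-line outer-product update per presentation. A sketch (Python/NumPy; the shapes and values are illustrative):

```python
import numpy as np

S, R, alpha = 3, 2, 0.1        # neurons, inputs, learning rate (examples)
W = np.zeros((S, R))
b = np.zeros(S)

p = np.array([0.5, -1.0])      # one input vector
t = np.array([1.0, 0.0, -1.0]) # one target vector

e = t - (W @ p + b)                  # error vector e(k)
W = W + 2 * alpha * np.outer(e, p)   # W(k+1) = W(k) + 2*alpha*e(k)*p(k)^T
b = b + 2 * alpha * e                # b(k+1) = b(k) + 2*alpha*e(k)
```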
Analysis of Convergence
Note that $\mathbf{x}_k$ is a function only of $\mathbf{z}(k-1), \mathbf{z}(k-2), \ldots, \mathbf{z}(0)$. If the successive input vectors are statistically independent, then $\mathbf{x}_k$ is independent of $\mathbf{z}(k)$.

We will show that for stationary input processes meeting this condition, the expected value of the weight vector will converge to

$$\mathbf{x}^* = \mathbf{R}^{-1}\mathbf{h}$$

This is the minimum mean square error $\{E[e_k^2]\}$ solution, as we saw before.

Recall the LMS algorithm:

$$\mathbf{x}_{k+1} = \mathbf{x}_k + 2\alpha\,e(k)\,\mathbf{z}(k)$$

Taking the expectation of both sides:

$$E[\mathbf{x}_{k+1}] = E[\mathbf{x}_k] + 2\alpha\,E[e(k)\,\mathbf{z}(k)]$$

Substituting the error with $e(k) = t(k) - \mathbf{x}_k^T\mathbf{z}(k)$:

$$E[\mathbf{x}_{k+1}] = E[\mathbf{x}_k] + 2\alpha\left\{E[t(k)\,\mathbf{z}(k)] - E[(\mathbf{x}_k^T\mathbf{z}(k))\,\mathbf{z}(k)]\right\}$$

Rearranging the second term (since $(\mathbf{x}_k^T\mathbf{z}(k))\,\mathbf{z}(k) = \mathbf{z}(k)\mathbf{z}^T(k)\,\mathbf{x}_k$):

$$E[\mathbf{x}_{k+1}] = E[\mathbf{x}_k] + 2\alpha\left\{E[t(k)\,\mathbf{z}(k)] - E[\mathbf{z}(k)\mathbf{z}^T(k)\,\mathbf{x}_k]\right\}$$
Since $\mathbf{x}_k$ is independent of $\mathbf{z}(k)$:

$$E[\mathbf{x}_{k+1}] = E[\mathbf{x}_k] + 2\alpha\left(\mathbf{h} - \mathbf{R}\,E[\mathbf{x}_k]\right) = [\mathbf{I} - 2\alpha\mathbf{R}]\,E[\mathbf{x}_k] + 2\alpha\mathbf{h}$$

For stability, the eigenvalues of the matrix $[\mathbf{I} - 2\alpha\mathbf{R}]$ must fall inside the unit circle.
Conditions for Stability
$$\mathrm{eig}(\mathbf{I} - 2\alpha\mathbf{R}) = 1 - 2\alpha\lambda_i$$

(where $\lambda_i$ is an eigenvalue of $\mathbf{R}$). Therefore the stability condition is

$$|1 - 2\alpha\lambda_i| < 1$$

Since $\lambda_i > 0$ and $\alpha > 0$, we always have $1 - 2\alpha\lambda_i < 1$. The condition therefore simplifies to

$$1 - 2\alpha\lambda_i > -1 \quad\Longrightarrow\quad \alpha < \frac{1}{\lambda_i} \quad \text{for all } i$$

$$0 < \alpha < \frac{1}{\lambda_{max}}$$

Note: we have the same condition as for the SD (steepest descent) algorithm. In SD we use the Hessian matrix $\mathbf{A}$; here we use the input correlation matrix $\mathbf{R}$ (recall that $\mathbf{A} = 2\mathbf{R}$).
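A quick numerical check of this bound (Python/NumPy; $\mathbf{R}$ is an assumed example matrix): the eigenvalues of $[\mathbf{I} - 2\alpha\mathbf{R}]$ stay inside the unit circle exactly when $\alpha < 1/\lambda_{max}$.

```python
import numpy as np

R = np.array([[1.0, 0.3],
              [0.3, 0.5]])          # example input correlation matrix
lam = np.linalg.eigvalsh(R)
alpha_max = 1.0 / lam.max()
print(alpha_max)

for alpha in (0.5 * alpha_max, 1.1 * alpha_max):
    M = np.eye(2) - 2 * alpha * R   # expected-weight iteration matrix
    print(alpha, np.abs(np.linalg.eigvals(M)).max())  # <1 iff stable
```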
Steady State Response
$$E[\mathbf{x}_{k+1}] = [\mathbf{I} - 2\alpha\mathbf{R}]\,E[\mathbf{x}_k] + 2\alpha\mathbf{h}$$

If the system is stable, a steady state solution must exist, and it satisfies

$$E[\mathbf{x}_{ss}] = [\mathbf{I} - 2\alpha\mathbf{R}]\,E[\mathbf{x}_{ss}] + 2\alpha\mathbf{h}$$

The solution to this equation is

$$E[\mathbf{x}_{ss}] = \mathbf{R}^{-1}\mathbf{h} = \mathbf{x}^*$$

This is also the strong minimum of the performance index. Thus the LMS solution, obtained by applying one input at a time, is the same as the minimum mean square error solution $\mathbf{x}^* = \mathbf{R}^{-1}\mathbf{h}$.
Example

$$\mathbf{p}_1 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix},\; t_1 = -1 \;(\text{banana}) \qquad \mathbf{p}_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\; t_2 = 1 \;(\text{apple})$$

If the inputs are generated randomly with equal probability, the input correlation matrix is

$$\mathbf{R} = E[\mathbf{p}\mathbf{p}^T] = \frac{1}{2}\mathbf{p}_1\mathbf{p}_1^T + \frac{1}{2}\mathbf{p}_2\mathbf{p}_2^T = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}$$

$$\lambda_1 = 1.0, \quad \lambda_2 = 0.0, \quad \lambda_3 = 2.0 \qquad\Longrightarrow\qquad \alpha < \frac{1}{\lambda_{max}} = \frac{1}{2.0} = 0.5$$

We take $\alpha = 0.2$. (Note: in practice it is difficult to calculate $\mathbf{R}$ and $\lambda_{max}$; $\alpha$ is usually chosen by trial and error.)
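The eigenvalues can be checked numerically with the two patterns above (Python/NumPy):

```python
import numpy as np

p1 = np.array([1.0, -1.0, -1.0])   # banana, t1 = -1
p2 = np.array([1.0,  1.0, -1.0])   # apple,  t2 = +1

R = 0.5 * np.outer(p1, p1) + 0.5 * np.outer(p2, p2)
lam = np.linalg.eigvalsh(R)
print(lam)                  # [0.0, 1.0, 2.0]
print(1.0 / lam.max())      # upper bound on alpha: 0.5
```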
Iteration One (banana)

$$a(0) = \mathbf{W}(0)\mathbf{p}(0) = \mathbf{W}(0)\mathbf{p}_1 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} = 0$$

($\mathbf{W}(0)$ is selected arbitrarily.)

$$e(0) = t(0) - a(0) = t_1 - a(0) = -1 - 0 = -1$$

$$\mathbf{W}(1) = \mathbf{W}(0) + 2\alpha\,e(0)\,\mathbf{p}^T(0) = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} + 2(0.2)(-1)\begin{bmatrix} 1 & -1 & -1 \end{bmatrix} = \begin{bmatrix} -0.4 & 0.4 & 0.4 \end{bmatrix}$$
Iteration Two (apple)

$$a(1) = \mathbf{W}(1)\mathbf{p}(1) = \mathbf{W}(1)\mathbf{p}_2 = \begin{bmatrix} -0.4 & 0.4 & 0.4 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = -0.4$$

$$e(1) = t(1) - a(1) = t_2 - a(1) = 1 - (-0.4) = 1.4$$

$$\mathbf{W}(2) = \begin{bmatrix} -0.4 & 0.4 & 0.4 \end{bmatrix} + 2(0.2)(1.4)\begin{bmatrix} 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 0.16 & 0.96 & -0.16 \end{bmatrix}$$
Iteration Three (banana)

$$a(2) = \mathbf{W}(2)\mathbf{p}(2) = \mathbf{W}(2)\mathbf{p}_1 = \begin{bmatrix} 0.16 & 0.96 & -0.16 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} = -0.64$$

$$e(2) = t(2) - a(2) = t_1 - a(2) = -1 - (-0.64) = -0.36$$

$$\mathbf{W}(3) = \mathbf{W}(2) + 2\alpha\,e(2)\,\mathbf{p}^T(2) = \begin{bmatrix} 0.0160 & 1.1040 & -0.0160 \end{bmatrix}$$

If we continue this procedure, the algorithm converges to $\mathbf{W}(\infty) = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$, which gives zero error on both patterns.
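The three iterations above can be reproduced in a few lines (Python/NumPy; note this example uses no bias):

```python
import numpy as np

patterns = [(np.array([1.0, -1.0, -1.0]), -1.0),   # banana
            (np.array([1.0,  1.0, -1.0]),  1.0)]   # apple
alpha = 0.2
W = np.zeros(3)

for k in range(3):                 # banana, apple, banana
    p, t = patterns[k % 2]
    e = t - W @ p                  # e(k) = t(k) - a(k)
    W = W + 2 * alpha * e * p      # LMS update (no bias in this example)
    print(k + 1, W)
# W(1) = [-0.4  0.4  0.4], W(2) = [0.16  0.96 -0.16],
# W(3) = [0.016  1.104 -0.016]
```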
Some general comments on the learning process:

Computationally, the learning process goes through all training examples (an epoch) a number of times, until a stopping criterion is reached.

The convergence process can be monitored with a plot of the mean-squared error function $F(\mathbf{W}(k))$.
The popular stopping criteria are:

1. The mean-squared error is sufficiently small: $F(\mathbf{W}(k)) < \varepsilon$.
2. The rate of change of the mean-squared error is sufficiently small.
Adaptive Filtering
ADALINE is one of the most widely used NNs in practical
applications. One of the major application areas has been
Adaptive Filtering.
Tapped Delay Line Adaptive Filter
$$a(k) = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b) = \sum_{i=1}^{R} w_{1,i}\,y(k - i + 1) + b$$

You will recognize this network as a finite impulse response (FIR) filter.
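A sketch of the tapped-delay-line computation (Python/NumPy; the input signal and filter weights are assumed for illustration):

```python
import numpy as np

def adaline_filter(y, w, b):
    """Output a(k) = sum_i w[i] * y(k - i) + b for each time step k."""
    R = len(w)
    a = np.zeros(len(y))
    for k in range(len(y)):
        # Tapped delay line: current and R-1 past samples (zeros before t=0)
        taps = [y[k - i] if k - i >= 0 else 0.0 for i in range(R)]
        a[k] = w @ np.array(taps) + b
    return a

y = np.sin(2 * np.pi * np.arange(10) / 5)   # example input signal
w = np.array([0.5, 0.3, -0.2])              # example filter weights
print(adaline_filter(y, w, 0.0))
```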
Example: Noise Cancellation
Noise Cancellation Adaptive Filter
A two-input filter can attenuate and phase-shift the noise in the desired way.
Correlation Matrix

To analyze this system we need to find the input correlation matrix $\mathbf{R}$ and the input/target cross-correlation vector $\mathbf{h}$:

$$\mathbf{R} = E[\mathbf{z}\mathbf{z}^T], \qquad \mathbf{h} = E[t\mathbf{z}]$$

In this case the input and target are

$$\mathbf{z}(k) = \begin{bmatrix} v(k) \\ v(k-1) \end{bmatrix}, \qquad t(k) = s(k) + m(k)$$

(here $s(k)$ is the EEG signal we want to recover, $v(k)$ is the noise source, and $m(k)$ is the filtered noise that contaminates the signal), so

$$\mathbf{R} = \begin{bmatrix} E[v^2(k)] & E[v(k)v(k-1)] \\ E[v(k-1)v(k)] & E[v^2(k-1)] \end{bmatrix}, \qquad \mathbf{h} = \begin{bmatrix} E[(s(k)+m(k))\,v(k)] \\ E[(s(k)+m(k))\,v(k-1)] \end{bmatrix}$$

Evaluating the entries of $\mathbf{R}$ over one period of the sinusoidal noise gives

$$\mathbf{R} = \begin{bmatrix} 0.72 & -0.36 \\ -0.36 & 0.72 \end{bmatrix}$$
Stationary Point

To find $\mathbf{h}$, start with its first element:

$$E[(s(k) + m(k))\,v(k)] = E[s(k)v(k)] + E[m(k)v(k)]$$

The first term on the right is zero, because $s(k)$ and $v(k)$ are independent and zero mean. The second term is

$$E[m(k)v(k)] = \frac{1}{3}\sum_{k=1}^{3}\left(1.2\sin\left(\frac{2\pi k}{3} - \frac{3\pi}{4}\right)\right)\left(1.2\sin\frac{2\pi k}{3}\right) = -0.51$$

Now we find the second element of $\mathbf{h}$:

$$E[(s(k) + m(k))\,v(k-1)] = E[s(k)v(k-1)] + E[m(k)v(k-1)]$$

As before, the first term is zero, and

$$E[m(k)v(k-1)] = \frac{1}{3}\sum_{k=1}^{3}\left(1.2\sin\left(\frac{2\pi k}{3} - \frac{3\pi}{4}\right)\right)\left(1.2\sin\frac{2\pi (k-1)}{3}\right) = 0.70$$

Therefore

$$\mathbf{h} = \begin{bmatrix} -0.51 \\ 0.70 \end{bmatrix}$$

and the stationary point is

$$\mathbf{x}^* = \mathbf{R}^{-1}\mathbf{h} = \begin{bmatrix} 0.72 & -0.36 \\ -0.36 & 0.72 \end{bmatrix}^{-1}\begin{bmatrix} -0.51 \\ 0.70 \end{bmatrix} = \begin{bmatrix} -0.30 \\ 0.82 \end{bmatrix}$$

Now, what kind of error will we have at the minimum solution?
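These sums are easy to verify numerically. A sketch (Python/NumPy); the noise phase of $-3\pi/4$ is inferred here, since it reproduces the stated values of $\mathbf{h}$ and $\mathbf{x}^*$:

```python
import numpy as np

k = np.arange(1, 4)                       # one period: k = 1, 2, 3
v  = 1.2 * np.sin(2 * np.pi * k / 3)              # noise source v(k)
vd = 1.2 * np.sin(2 * np.pi * (k - 1) / 3)        # delayed noise v(k-1)
m  = 1.2 * np.sin(2 * np.pi * k / 3 - 3 * np.pi / 4)  # filtered noise m(k)

h = np.array([np.mean(m * v), np.mean(m * vd)])   # [-0.51, 0.70]
R = np.array([[np.mean(v * v),  np.mean(v * vd)],
              [np.mean(vd * v), np.mean(vd * vd)]])  # [[0.72,-0.36],[-0.36,0.72]]

x_star = np.linalg.solve(R, h)
print(h, x_star)                                  # x* = [-0.30, 0.82]
```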
Performance Index
$$F(\mathbf{x}) = c - 2\mathbf{x}^T\mathbf{h} + \mathbf{x}^T\mathbf{R}\mathbf{x}$$

We have just found $\mathbf{x}^*$, $\mathbf{R}$, and $\mathbf{h}$. To find $c$ we have

$$c = E[t^2(k)] = E[(s(k) + m(k))^2] = E[s^2(k)] + 2E[s(k)m(k)] + E[m^2(k)]$$

For the first term, with the signal uniformly distributed on $[-0.2, 0.2]$:

$$E[s^2(k)] = \frac{1}{0.4}\int_{-0.2}^{0.2} s^2\,ds = \frac{1}{3(0.4)}\,s^3\Big|_{-0.2}^{0.2} = 0.0133$$

The middle term is zero, because $s(k)$ and $m(k)$ are independent and zero mean. The last term is

$$E[m^2(k)] = \frac{1}{3}\sum_{k=1}^{3}\left(1.2\sin\left(\frac{2\pi k}{3} - \frac{3\pi}{4}\right)\right)^2 = 0.72$$

Therefore $c = 0.0133 + 0.72 = 0.7333$, and the minimum error is

$$F(\mathbf{x}^*) = 0.7333 - 2(0.72) + 0.72 = 0.0133$$

The minimum mean square error is the same as the mean square value of the EEG signal. This is what we expected, since the error of this adaptive noise canceller is in fact the reconstructed EEG signal.
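The value of $c$ and the minimum error can be checked the same way; the following self-contained sketch (Python/NumPy, with the same inferred $-3\pi/4$ phase) recomputes the pieces and evaluates $F(\mathbf{x}^*)$:

```python
import numpy as np

k = np.arange(1, 4)
v  = 1.2 * np.sin(2 * np.pi * k / 3)
vd = 1.2 * np.sin(2 * np.pi * (k - 1) / 3)
m  = 1.2 * np.sin(2 * np.pi * k / 3 - 3 * np.pi / 4)

R = np.array([[np.mean(v * v),  np.mean(v * vd)],
              [np.mean(vd * v), np.mean(vd * vd)]])
h = np.array([np.mean(m * v), np.mean(m * vd)])
x_star = np.linalg.solve(R, h)

E_s2 = (1 / 0.4) * (0.2**3 - (-0.2)**3) / 3   # E[s^2] for uniform [-0.2, 0.2]
c = E_s2 + np.mean(m**2)                      # c = E[s^2] + E[m^2]
F_min = c - 2 * x_star @ h + x_star @ R @ x_star
print(round(c, 4), round(F_min, 4))           # 0.7333, 0.0133
```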
LMS Response for α = 0.1

[Figure: LMS trajectory in the (w1,1, w1,2) weight plane for α = 0.1, overlaid on the contours of the performance index.]

The LMS trajectory looks like a noisy version of steepest descent. Note that the contours in this figure reflect the fact that the eigenvalues and eigenvectors of the Hessian matrix $\mathbf{A} = 2\mathbf{R}$ are

$$\lambda_1 = 2.16, \;\; \mathbf{z}_1 = \begin{bmatrix} 0.7071 \\ -0.7071 \end{bmatrix}, \qquad \lambda_2 = 0.72, \;\; \mathbf{z}_2 = \begin{bmatrix} 0.7071 \\ 0.7071 \end{bmatrix}$$

If the learning rate is decreased, the LMS trajectory is smoother, but the learning proceeds more slowly.

Note that for stability $\alpha_{max} = 2/2.16 = 0.926$ (stated in terms of the eigenvalues of $\mathbf{A} = 2\mathbf{R}$; this is the same bound as $1/\lambda_{max}(\mathbf{R})$).
Note that the error does not go to zero, because the LMS algorithm is approximate steepest descent; it uses an estimate of the gradient, not the true gradient. (Demo: nnd10eeg)
Echo Cancellation
HW
Ch 4: E 2, 4, 6, 7
Ch 5: 5, 7, 9
Ch 6: 4, 5, 8, 10
Ch 7: 1, 5, 6, 7
Ch 8: 2, 4, 5
Ch 9: 2, 5, 6
Ch 10: 3, 6, 7