Chapter 5: Least Mean-Square Adaptive Filtering
ELE 774 - Adaptive Signal Processing
Steepest Descent
The update rule for SD is
  w(n+1) = w(n) + μ [p − R w(n)],   n = 0, 1, 2, ...
where
  p = E[u(n) d*(n)],   R = E[u(n) u^H(n)]
or, equivalently, in terms of the gradient of the MSE cost J(n),
  w(n+1) = w(n) + (μ/2) [−∇J(n)],   ∇J(n) = −2p + 2R w(n)
SD is a deterministic algorithm, in the sense that p and R are assumed to be exactly known. In practice we can only estimate these quantities.
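For concreteness, here is a minimal NumPy sketch of the SD recursion; the example R, p, and step size are assumptions for illustration, not values from the course:

```python
import numpy as np

# Steepest descent: w(n+1) = w(n) + mu * (p - R w(n))  (real-valued case)
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])          # assumed input correlation matrix
p = np.array([0.7, 0.3])            # assumed cross-correlation vector
mu = 0.1                            # step size; needs mu < 2/lambda_max for stability
w = np.zeros(2)                     # initial condition w(0) = 0
for n in range(200):
    w = w + mu * (p - R @ w)        # deterministic update: exact p and R
print(w, np.linalg.solve(R, p))     # w converges to the Wiener solution R^{-1} p
```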
Basic Idea
The simplest estimates of the expectations are obtained by removing the expectation operators and replacing them with the instantaneous values, i.e.
  R̂(n) = u(n) u^H(n),   p̂(n) = u(n) d*(n)
Then, the gradient estimate becomes
  ∇̂J(n) = −2 u(n) d*(n) + 2 u(n) u^H(n) ŵ(n)
Eventually, the new update rule is
  ŵ(n+1) = ŵ(n) + μ u(n) [d*(n) − u^H(n) ŵ(n)]
No expectations, only instantaneous samples!
However, the term in the brackets is the (conjugated) error, i.e.
  e(n) = d(n) − ŵ^H(n) u(n)
then
  ŵ(n+1) = ŵ(n) + μ u(n) e*(n)
Note that ∇̂J(n) is the gradient of the instantaneous squared error |e(n)|² instead of the MSE J(n) = E[|e(n)|²] as in SD.
Filter weights are updated using instantaneous values
Update equation for the method of Steepest Descent:
  w(n+1) = w(n) + μ [p − R w(n)]
Update equation for Least Mean-Square:
  ŵ(n+1) = ŵ(n) + μ u(n) [d*(n) − u^H(n) ŵ(n)]
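As an illustration, a minimal NumPy sketch of the complex LMS recursion above; the system-identification framing, filter length, and step size are assumptions for the example:

```python
import numpy as np

def lms(u, d, M, mu):
    """Complex LMS: e(n) = d(n) - w^H(n) u(n);  w(n+1) = w(n) + mu u(n) e*(n)."""
    w = np.zeros(M, dtype=complex)           # initial condition w(0) = 0
    e = np.zeros(len(u), dtype=complex)
    for n in range(M - 1, len(u)):
        un = u[n - M + 1 : n + 1][::-1]      # tap-input vector u(n), newest sample first
        e[n] = d[n] - np.vdot(w, un)         # np.vdot conjugates w, giving w^H(n) u(n)
        w = w + mu * un * np.conj(e[n])      # stochastic-gradient update
    return w, e
```

Each iteration costs one inner product and one scaled vector addition, in line with the O(M) complexity noted below.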
LMS Algorithm
Since the expectations are omitted, the estimates have high variance; therefore, the recursive computation of each tap weight in the LMS algorithm suffers from gradient noise.
In contrast to SD, which is a deterministic algorithm, LMS is a member of the family of stochastic gradient descent algorithms.
LMS has a higher MSE (J(∞)) than SD, which attains J_min (the Wiener solution), as n→∞:
  J(n) → J(∞) as n → ∞, with J(∞) > J_min
The difference is called the excess mean-square error J_ex(∞).
The ratio J_ex(∞)/J_min is called the misadjustment.
Hopefully, J(∞) is a finite value; then LMS is said to be stable in the mean-square sense.
LMS performs a random motion around the Wiener solution; in the mean, however, the weight estimate is unbiased.
LMS involves a feedback connection. Although LMS might seem very difficult to analyse due to the randomness, the feedback acts as a low-pass filter (it performs averaging), so that the randomness can be filtered out.
The time constant of this averaging is inversely proportional to μ. In fact, if μ is chosen small enough, the adaptive process progresses slowly and the effects of the gradient noise on the tap weights are largely filtered out.
The computational complexity of LMS is very low, which makes it very attractive: only 2M+1 complex multiplications and 2M complex additions per iteration.
Canonical Model
The LMS algorithm for complex signals (i.e., with complex coefficients) can be represented in terms of four separate LMS algorithms for real signals, with cross-coupling between them.
Write the input, desired signal, tap weights, output, and error in complex notation:
  u(n) = u_I(n) + j u_Q(n),   d(n) = d_I(n) + j d_Q(n)
  ŵ(n) = ŵ_I(n) + j ŵ_Q(n),   y(n) = y_I(n) + j y_Q(n),   e(n) = e_I(n) + j e_Q(n)
Then the relations between these expressions are
  y_I(n) = ŵ_I^T(n) u_I(n) + ŵ_Q^T(n) u_Q(n)
  y_Q(n) = ŵ_I^T(n) u_Q(n) − ŵ_Q^T(n) u_I(n)
  e_I(n) = d_I(n) − y_I(n),   e_Q(n) = d_Q(n) − y_Q(n)
  ŵ_I(n+1) = ŵ_I(n) + μ [u_I(n) e_I(n) + u_Q(n) e_Q(n)]
  ŵ_Q(n+1) = ŵ_Q(n) + μ [u_Q(n) e_I(n) − u_I(n) e_Q(n)]
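A quick NumPy sketch checking one step of this canonical (all-real) model against the complex update; the random test data are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
M, mu = 4, 0.05
u = rng.normal(size=M) + 1j * rng.normal(size=M)   # tap-input vector u(n)
d = rng.normal() + 1j * rng.normal()               # desired response d(n)
w = rng.normal(size=M) + 1j * rng.normal(size=M)   # current weights w(n)

# Complex LMS step: e = d - w^H u,  w <- w + mu * u * conj(e)
e = d - np.vdot(w, u)
w_complex = w + mu * u * np.conj(e)

# Canonical model: split everything into in-phase (I) and quadrature (Q) parts
uI, uQ, dI, dQ, wI, wQ = u.real, u.imag, d.real, d.imag, w.real, w.imag
yI = wI @ uI + wQ @ uQ
yQ = wI @ uQ - wQ @ uI
eI, eQ = dI - yI, dQ - yQ
wI_new = wI + mu * (uI * eI + uQ * eQ)
wQ_new = wQ + mu * (uQ * eI - uI * eQ)

assert np.allclose(w_complex, wI_new + 1j * wQ_new)  # the two forms agree
```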
Analysis of the LMS Algorithm
Although the filter is a linear combiner, the algorithm is highly non-linear and violates superposition and homogeneity.
Assume the initial condition ŵ(0) = 0; then, telescoping the update,
  ŵ(n+1) = μ Σ_{i=0}^{n} u(i) e*(i)
so the weight vector depends on the entire history of the input and the error.
The analysis will continue using the weight-error vector
  ε(n) = ŵ(n) − w_o
and its autocorrelation
  K(n) = E[ε(n) ε^H(n)]
(Here we write expectation; strictly, however, it is the ensemble average.)
We have
  ŵ(n+1) = ŵ(n) + μ u(n) [d*(n) − u^H(n) ŵ(n)]
Let e_o(n) = d(n) − w_o^H u(n) be the estimation error of the optimum Wiener filter, so that e(n) = e_o(n) − ε^H(n) u(n). Then the update eqn. can be written as
  ε(n+1) = [I − μ u(n) u^H(n)] ε(n) + μ u(n) e_o*(n)
We analyse convergence in an average sense: the algorithm is run many times, and we study the ensemble-average behaviour.
Using the small-step-size assumption, it can be shown that
  K(n+1) ≈ K(n) − μ [R K(n) + K(n) R] + μ² J_min R
(again, the expectation involved is really an ensemble average).
Small Step Size Analysis
Assumption I: The step size μ is small, so that the LMS filter acts like a low-pass filter with a very low cutoff frequency.
Assumption II: The desired response is described by a linear multiple regression model that is matched exactly by the optimum Wiener filter,
  d(n) = w_o^H u(n) + e_o(n)
where e_o(n) is the irreducible estimation error, which is uncorrelated with the input: E[u(n) e_o*(n)] = 0.
Assumption III: The input and the desired response are jointly Gaussian.
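A small sketch of generating synthetic data according to Assumption II; the particular w_o, noise variance J_min, and signal lengths are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, Jmin = 5000, 4, 1e-3
w_o = rng.normal(size=M) + 1j * rng.normal(size=M)      # unknown regression vector
u = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
d = np.zeros(N, dtype=complex)
for n in range(M, N):
    un = u[n - M + 1 : n + 1][::-1]                     # tap-input vector u(n)
    e_o = np.sqrt(Jmin / 2) * (rng.normal() + 1j * rng.normal())
    d[n] = np.vdot(w_o, un) + e_o                       # d(n) = w_o^H u(n) + e_o(n)
```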
Applying the similarity transformation resulting from the eigendecomposition R = Q Λ Q^H, i.e.
  v(n) = Q^H ε(n)
to the recursion for ε(n), we have
  v(n+1) = (I − μ Λ) v(n) + φ(n)
where
  φ(n) = μ Q^H u(n) e_o*(n),   E[φ(n) φ^H(n)] ≈ μ² J_min Λ
We do not have this stochastic-force term in Wiener filtering! The components of v(n) are uncorrelated. (HW: Prove these relations.)
Since the components of v(n) are uncorrelated, each evolves independently:
  v_k(n+1) = (1 − μ λ_k) v_k(n) + φ_k(n),   k = 1, ..., M
a first-order difference equation driven by the stochastic force φ_k(n) (cf. Brownian motion in thermodynamics).
Solution: iterating from n = 0,
  v_k(n) = (1 − μ λ_k)^n v_k(0)                        ← natural component of v_k(n)
           + Σ_{i=0}^{n−1} (1 − μ λ_k)^{n−1−i} φ_k(i)  ← forced component of v_k(n)
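A toy simulation of one mode of this difference equation; λ_k, μ, and the force variance μ²J_minλ_k are assumed values:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, lam, Jmin, N = 0.05, 1.0, 1e-2, 2000
v = np.zeros(N)
v[0] = 1.0                                             # initial deviation v_k(0)
for n in range(N - 1):
    phi = np.sqrt(mu**2 * Jmin * lam) * rng.normal()   # stochastic force phi_k(n)
    v[n + 1] = (1 - mu * lam) * v[n] + phi             # first-order difference equation
# The natural component decays as (1 - mu*lam)^n; the forced component leaves
# a residual fluctuation whose variance should roughly match mu*Jmin/(2 - mu*lam).
print(np.var(v[-500:]), mu * Jmin / (2 - mu * lam))
```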
Learning Curves
Two kinds of learning curves:
  Mean-square error (MSE) learning curve:  J(n) = E[|e(n)|²]
  Mean-square deviation (MSD) learning curve:  D(n) = E[‖ε(n)‖²]
Ensemble averaging: the results of many (→∞) independent realisations are averaged.
What is the relation between MSE and MSD? For small μ,
  J(n) ≈ J_min + Σ_k λ_k E[|v_k(n)|²],   D(n) = Σ_k E[|v_k(n)|²]
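A sketch of how the ensemble-averaged MSE and MSD learning curves might be estimated by simulation; the setup (real-valued white input, assumed w_o, μ, and J_min) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, runs, mu, Jmin = 4, 1500, 200, 0.02, 1e-2
w_o = rng.normal(size=M)                     # fixed "unknown" system (real case)
J = np.zeros(N); D = np.zeros(N)             # MSE and MSD accumulators
for _ in range(runs):                        # ensemble of independent realisations
    u = rng.normal(size=N + M)
    w = np.zeros(M)
    for n in range(N):
        un = u[n : n + M][::-1]              # tap-input vector
        d = w_o @ un + np.sqrt(Jmin) * rng.normal()   # d(n) = w_o^T u(n) + e_o(n)
        e = d - w @ un
        J[n] += e**2                         # instantaneous squared error
        D[n] += np.sum((w - w_o)**2)         # squared weight-error norm
        w = w + mu * un * e                  # real LMS update
J /= runs; D /= runs                         # ensemble averages J(n), D(n)
# For white input R = I, tr(R) = M, so J(inf) should be near Jmin*(1 + mu*M/2)
print(J[-1], Jmin * (1 + mu * M / 2))
```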
Under the small-step-size assumptions above, the excess MSE is
  J_ex(n) = J(n) − J_min = Σ_k λ_k E[|v_k(n)|²]
Using the solution of the difference equation for v_k(n),
  J_ex(∞) = J_min Σ_k μ λ_k / (2 − μ λ_k) ≈ (μ/2) J_min Σ_k λ_k   for small μ
LMS performs worse than SD; there is always an excess MSE.
The mean-square deviation D(n) is lower- and upper-bounded by scaled versions of the excess MSE,
  J_ex(n)/λ_max ≤ D(n) ≤ J_ex(n)/λ_min
or, roughly, D(n) ≈ J_ex(n)/λ_av. The two curves have a similar response, both decaying as n grows.
Convergence
For small μ, each mode decays as (1 − μ λ_k)^{2n}, so convergence requires |1 − μ λ_k| < 1 for all k. Hence, for convergence,
  0 < μ < 2/λ_max
The ensemble-average learning curve of an LMS filter does not exhibit oscillations; rather, it decays exponentially to the constant value
  J(∞) = J_min + J_ex(∞)
Misadjustment
Misadjustment is defined as
  ℳ = J_ex(∞) / J_min
For small μ, from the previous result,
  J_ex(∞) ≈ (μ/2) J_min Σ_k λ_k
or equivalently
  ℳ ≈ (μ/2) Σ_k λ_k = (μ/2) tr(R)
but
  tr(R) = M r(0) = M E[|u(n)|²]   (total tap-input power)
then
  ℳ ≈ (μ/2) M E[|u(n)|²]
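A small numeric sketch of these misadjustment formulas; the correlation matrix R and step size are assumed values:

```python
import numpy as np

R = np.array([[1.0, 0.4],
              [0.4, 1.0]])                     # assumed input correlation matrix
mu = 0.05
lam = np.linalg.eigvalsh(R)
mis_exact = np.sum(mu * lam / (2 - mu * lam))  # sum_k mu*lam_k / (2 - mu*lam_k)
mis_small = mu / 2 * np.trace(R)               # small-step approximation (mu/2) tr R
print(mis_exact, mis_small)                    # nearly equal for small mu
```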
Average Time Constant
From SD we know that the k-th time constant of the MSE is τ_k,mse ≈ 1/(2 μ λ_k); on average,
  τ_mse,av ≈ 1 / (2 μ λ_av),   λ_av = (1/M) Σ_k λ_k = tr(R)/M
but
  ℳ ≈ (μ/2) tr(R) = (μ/2) M λ_av
then
  ℳ ≈ M / (4 τ_mse,av)
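Continuing the numeric sketch, the trade-off between misadjustment and the average time constant; all values are assumptions:

```python
import numpy as np

R = np.array([[1.0, 0.4],
              [0.4, 1.0]])
M_len = R.shape[0]                        # filter length M
lam_av = np.trace(R) / M_len              # average eigenvalue
for mu in (0.1, 0.01, 0.001):
    tau = 1 / (2 * mu * lam_av)           # tau_mse,av ~ 1 / (2 mu lambda_av)
    mis = M_len / (4 * tau)               # misadjustment ~ M / (4 tau_mse,av)
    print(mu, tau, mis)                   # smaller mu: slower but lower misadjustment
```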
Observations
The misadjustment ℳ is
  directly proportional to the filter length M, for a fixed τ_mse,av;
  inversely proportional to the time constant τ_mse,av: slower convergence results in lower misadjustment;
  directly proportional to the step size μ: a smaller step size results in lower misadjustment.
The time constant τ_mse,av is inversely proportional to the step size μ: a smaller step size results in slower convergence.
A large μ requires higher-order terms in μ to be included in the analysis: this is difficult, the small-step analysis is no longer valid, and the learning curve becomes noisier.
LMS vs. SD
The main goal is to minimise the mean-square error (MSE). The optimum solution is found by the Wiener-Hopf equations, which require the auto/cross-correlations and achieve the minimum value of the MSE, J_min.
LMS and SD are iterative algorithms designed to find w_o.
SD has direct access to the auto/cross-correlations (exact measurements): it can approach the Wiener solution w_o and go down to J_min.
LMS uses instantaneous estimates instead (noisy measurements): it fluctuates around w_o in a Brownian-motion manner, and its MSE only comes down to J(∞) > J_min.
Learning curves: SD has a well-defined curve composed of decaying exponentials; for LMS, the curve is composed of noisy decaying exponentials.
Statistical Wave Theory
As the filter length increases (M→∞), the propagation of electromagnetic disturbances along a transmission line towards infinity becomes analogous to the propagation of signals along an infinitely long LMS filter.
For a finite-length LMS filter (transmission line), corrections have to be made at the edges to handle reflections; as the length increases, the reflection region shrinks relative to the total filter.
This imposes a limit on the step size to avoid instability as M→∞:
  μ < 2 / (M S_max)
where S_max is the maximum value of the power spectral density S(ω) of the tap inputs u(n). If the upper bound is exceeded, instability is observed.
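A rough sketch of evaluating this bound numerically; the periodogram is a crude PSD estimate, and all values here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 4096, 64
u = rng.normal(size=N)              # tap-input process u(n) (assumed white here)
S = np.abs(np.fft.fft(u))**2 / N    # crude periodogram estimate of S(omega)
S_max = S.max()                     # maximum component of the PSD
mu_max = 2 / (M * S_max)            # long-filter step-size bound
print(S_max, mu_max)
```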
H∞ Optimality of LMS
A single realisation of LMS is not optimum in the MSE sense; the ensemble average is. The derivation given earlier is heuristic (it replaces the auto/cross-correlations with their instantaneous estimates).
In what sense, then, is LMS optimum? It can be shown that LMS minimises the maximum energy gain of the filter, under a constraint on the step size.
Minimising the maximum of something is a minimax problem: LMS optimises an H∞ criterion.
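One way to state the criterion, roughly following Hassibi, Sayed, and Kailath (1996) rather than the slides themselves (the exact error definitions here are assumptions taken from that line of work):

```latex
% LMS minimises the worst-case energy gain from the disturbances
% (initial weight error and model noise e_o) to the a priori errors xi(n):
\min_{\text{causal algorithms}}\;
\sup_{w_o,\, e_o}\;
\frac{\sum_{n} \lvert \xi(n) \rvert^{2}}
     {\mu^{-1}\,\lVert w_o - \hat{w}(0) \rVert^{2} + \sum_{n} \lvert e_o(n) \rvert^{2}},
\qquad
\xi(n) = \bigl(w_o - \hat{w}(n)\bigr)^{H} u(n)
```

It is reported there that LMS achieves the optimal value of this worst-case gain, equal to one, under a suitable step-size condition.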
Provided that the step size parameter satisfies the appropriate limits, then
  no matter how different the initial weight vector is from the unknown parameter vector w_o of the multiple regression model, and
  irrespective of the value of the additive disturbance e_o(n),
the error energy produced at the output of the LMS filter will never exceed a certain level.
Limits on the Step Size