EIE6207: Estimation Theory - PolyU

Page 1: EIE6207: Estimation Theory - PolyU

EIE6207: Estimation Theory

Man-Wai MAK

Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University

[email protected]://www.eie.polyu.edu.hk/∼mwmak

References:

Steven M. Kay, Fundamentals of Statistical Signal Processing, Prentice Hall, 1993.

http://www.cs.tut.fi/~hehu/SSP/

November 29, 2020

Page 2: EIE6207: Estimation Theory - PolyU

Overview

1 Motivations

2 Introduction

3 Minimum Variance Unbiased Estimation

4 Cramer-Rao Lower Bound

Page 3: EIE6207: Estimation Theory - PolyU

Motivations

Estimation in signal processing is fundamental to many applications:

Radar
Sonar
Speech and audio processing
Image and video processing
Biomedicine
Communications
Control engineering
Seismology

All of these applications require estimating a set of parameters, e.g., the position of an aircraft through radar, the position of a submarine through sonar, or the phonemes of a speech waveform.

All of these applications share a common characteristic: the parameters are estimated from noisy signals, so their values can only be estimates.

Page 4: EIE6207: Estimation Theory - PolyU

A Naive Example

Assume that you have a coin but do not know its true probability p of getting a head when you flip it.

You do an experiment: flip the coin N times and get N_h heads.

In statistics, we say that you have an estimator p̂ = N_h / N ≈ p.

The approximation becomes very close when N is large.

An estimator is a way of estimating an unknown parameter (p in our example) of a statistical model (the coin) given some data (the number of heads in N trials).
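
As a quick illustration (a minimal NumPy sketch; the true p below is an arbitrary choice, not from the slides), we can simulate the coin and watch p̂ = N_h/N approach p as N grows:

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3                        # assumed true head probability (illustration only)

for N in [10, 100, 10_000, 1_000_000]:
    flips = rng.random(N) < p_true  # N Bernoulli(p_true) trials
    N_h = flips.sum()               # number of heads
    p_hat = N_h / N                 # the estimator p_hat = N_h / N
    print(f"N = {N:>9d}   p_hat = {p_hat:.4f}")
```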

Page 5: EIE6207: Estimation Theory - PolyU

Mean Estimator

In all of these applications, we estimate the values of parameters based on some waveforms.

Specifically, given N discrete-time samples {x[0], x[1], . . . , x[N − 1]} that depend on an unknown parameter θ, we define an estimator θ̂ to estimate θ:

θ̂ = g(x[0], x[1], . . . , x[N − 1]),

where g is some function.

For example, for the mean estimator, g(·) is given by

g(x[0], x[1], . . . , x[N − 1]) = (1/N) ∑_{i=0}^{N−1} x[i]

Page 6: EIE6207: Estimation Theory - PolyU

Estimator of a Linear Model

The stock price in the figure below seems to increase on average.

To verify this hypothesis, we use the model:

x[n] = A + Bn + w[n],   n = 0, 1, . . . , N − 1,

where w[n] is white Gaussian noise with PDF N(0, σ²).

Figure: y-axis: stock price x[n]; x-axis: time point n

Page 7: EIE6207: Estimation Theory - PolyU

Estimator of a Linear Model

Since the noise samples w[n] are independent across n (white Gaussian noise), the likelihood factorizes:

p(x; θ) = ∏_{n=0}^{N−1} p(x[n]; θ)
        = ∏_{n=0}^{N−1} (1/√(2πσ²)) exp{ −(1/(2σ²)) (x[n] − A − Bn)² }
        = (1/(2πσ²)^{N/2}) exp{ −(1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A − Bn)² }

where x = [x[0] . . . x[N − 1]]^T and θ = [A B]^T.

Maximizing p(x; θ) with respect to θ, we obtain closed-form solutions for A and B.

For the diagram shown on the previous page, B > 0, meaning the stock price is increasing.
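
For white Gaussian noise, maximizing p(x; θ) is equivalent to least squares, so a sketch of the estimator might look like the following (NumPy assumed; the true A, B, and σ below are arbitrary values chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, A_true, B_true, sigma = 200, 10.0, 0.05, 0.5    # assumed values for illustration

n = np.arange(N)
x = A_true + B_true * n + sigma * rng.standard_normal(N)   # x[n] = A + B n + w[n]

# For WGN, maximizing p(x; theta) is the same as minimizing
# sum_n (x[n] - A - B n)^2, i.e., ordinary least squares.
H = np.column_stack([np.ones(N), n])               # design matrix [1  n]
A_hat, B_hat = np.linalg.lstsq(H, x, rcond=None)[0]
print(f"A_hat = {A_hat:.3f}, B_hat = {B_hat:.3f}")  # B_hat > 0 suggests an upward trend
```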

Page 8: EIE6207: Estimation Theory - PolyU

Bayesian Estimator

So far, we have assumed that θ is deterministic but unknown. This is called classical estimation.

But we could also bring in prior knowledge about θ, e.g., the prior range or distribution of θ = [A B]^T. Then, we have a Bayesian estimator.

The data and parameters are modeled by a joint PDF:

p(x,θ) = p(x|θ)p(θ),

where p(θ) is the prior PDF, which summarizes our knowledge about θ before any data are observed.

p(x|θ) is the conditional PDF, summarizing our knowledge about x when we know θ.
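
As a sketch of the Bayesian idea (the Gaussian prior N(μ0, σ0²) on the DC level A and all numerical values below are illustrative assumptions, not part of the slides), a conjugate Gaussian prior combined with the WGN likelihood gives a Gaussian posterior with closed-form mean and variance:

```python
import numpy as np

rng = np.random.default_rng(2)
A_true, sigma, N = 5.0, 1.0, 50            # assumed DC level and noise std
mu0, sigma0 = 0.0, 10.0                    # assumed Gaussian prior on A: N(mu0, sigma0^2)

x = A_true + sigma * rng.standard_normal(N)

# Conjugate update: p(A | x) is Gaussian with
#   1/var_post = N/sigma^2 + 1/sigma0^2
#   mean_post  = var_post * (N * x_bar / sigma^2 + mu0 / sigma0^2)
var_post = 1.0 / (N / sigma**2 + 1.0 / sigma0**2)
mean_post = var_post * (N * x.mean() / sigma**2 + mu0 / sigma0**2)
print(f"posterior mean = {mean_post:.3f}, posterior std = {np.sqrt(var_post):.3f}")
```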

Page 9: EIE6207: Estimation Theory - PolyU

Minimum Variance Unbiased Estimation

Page 10: EIE6207: Estimation Theory - PolyU

Minimum Variance Unbiased Estimation

An estimator is unbiased if, on average, it yields the true value of the unknown parameter θ.

Mathematically, an estimator is unbiased if

E(θ̂) = θ,   for a < θ < b.

Example: Unbiased estimator for the DC level in white Gaussian noise (WGN):

Consider the observations

x[n] = A + w[n],   n = 0, 1, . . . , N − 1    (1)

where −∞ < A < ∞ is the parameter to be estimated and w[n] is WGN. A reasonable estimator for the average value of x[n] is

Â = (1/N) ∑_{n=0}^{N−1} x[n]

Page 11: EIE6207: Estimation Theory - PolyU

Minimum Variance Unbiased Estimation

Using the linearity property of expectation:

E(Â) = (1/N) ∑_{n=0}^{N−1} E(x[n]) = A.

The sample mean is an unbiased estimator of the DC level in WGN.

If multiple estimators of the same parameter θ are available, i.e., {θ̂_1, θ̂_2, . . . , θ̂_K}, a better estimator can be obtained by averaging these estimators:

θ̂ = (1/K) ∑_{k=1}^{K} θ̂_k.

If all estimators are unbiased, we have

E(θ̂) = θ

If all estimators are uncorrelated and have the same variance, then

var(θ̂) = (1/K²) ∑_{k=1}^{K} var(θ̂_k) = var(θ̂_1)/K   ⟹   the variance decreases
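
A short Monte Carlo sketch (illustrative values assumed) confirming that averaging K independent unbiased estimators leaves the mean unchanged but divides the variance by K:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, var_single, K, trials = 2.0, 4.0, 10, 100_000    # assumed values

# Each row holds K independent unbiased estimates of theta with equal variance.
est = theta + np.sqrt(var_single) * rng.standard_normal((trials, K))
avg = est.mean(axis=1)                                   # averaged estimator

print("mean of averaged estimator:", avg.mean())         # approx theta (unbiased)
print("var of single estimator:   ", est[:, 0].var())    # approx var_single
print("var of averaged estimator: ", avg.var())          # approx var_single / K
```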

Page 12: EIE6207: Estimation Theory - PolyU

Minimum Variance Unbiased Estimation

However, a biased estimator is characterized by a systematic error, e.g., E(θ̂_k) = θ + b(θ). Then,

E(θ̂) = (1/K) ∑_{k=1}^{K} E(θ̂_k) = θ + b(θ)

This means that averaging more biased estimators does not reduce the bias.

Page 13: EIE6207: Estimation Theory - PolyU

Minimum Variance Criterion

Mean square error (MSE):

mse(θ̂) = E{(θ̂ − θ)²}

The MSE is problematic if the estimator is biased (see Tutorial Q1):

mse(θ̂) = E{[(θ̂ − E(θ̂)) + (E(θ̂) − θ)]²}
        = var(θ̂) + [E(θ̂) − θ]²
        = var(θ̂) + b²(θ)    (2)

This means that the MSE comprises errors due to both the variance of the estimator and its bias.

Therefore, we should find an unbiased estimator (with b = 0) that produces the minimum variance.
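
A quick numerical check of the decomposition in Eq. 2 (the shrunk sample mean used as the biased estimator, and all values, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
A, sigma, N, trials = 3.0, 1.0, 20, 200_000   # assumed values
alpha = 0.8                                   # shrinkage factor -> biased estimator

x = A + sigma * rng.standard_normal((trials, N))
A_hat = alpha * x.mean(axis=1)                # biased estimator: alpha * sample mean

mse = np.mean((A_hat - A) ** 2)
var = A_hat.var()
bias_sq = (A_hat.mean() - A) ** 2
print(f"MSE = {mse:.5f},  var + bias^2 = {var + bias_sq:.5f}")   # the two should agree
```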

Page 14: EIE6207: Estimation Theory - PolyU

Example of Biased Estimators

Let the estimator of A in Eq. 1 be

Â = (α/N) ∑_{n=0}^{N−1} x[n],

where α is an arbitrary constant.

We would like to find the value of α that minimizes

mse(Â) = E{(Â − A)²}
        = var(Â) + [E(Â) − A]²
        = α²σ²/N + (α − 1)²A²    (see Tut Q2)

Setting ∂mse(Â)/∂α = 0, we obtain

α* = A² / (A² + σ²/N)

The optimal value α* depends on the unknown parameter A. Therefore, the estimator is not realizable.
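
A small sketch of this example (illustrative values assumed): sweeping α over a grid reproduces the analytic minimizer α*, which indeed depends on the unknown A:

```python
import numpy as np

A, sigma, N = 3.0, 1.0, 20             # assumed true values (unknown in practice)
alpha = np.linspace(0.0, 1.5, 1501)

# mse(A_hat) = alpha^2 sigma^2 / N + (alpha - 1)^2 A^2
mse = alpha**2 * sigma**2 / N + (alpha - 1) ** 2 * A**2

alpha_best = alpha[np.argmin(mse)]
alpha_star = A**2 / (A**2 + sigma**2 / N)   # analytic minimizer
print(f"numerical minimizer ~ {alpha_best:.3f}, analytic alpha* = {alpha_star:.3f}")
# alpha* depends on the unknown A, so this "optimal" estimator cannot be built.
```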

Page 15: EIE6207: Estimation Theory - PolyU

Example of Biased Estimators

Given a dataset, its sample variance is a biased estimator of the true population variance (see Tut Q3).
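
A minimal check (illustrative values assumed): NumPy's var with ddof=0 is the 1/N sample variance, whose expectation is (N−1)/N times the true variance, whereas ddof=1 gives the unbiased version:

```python
import numpy as np

rng = np.random.default_rng(5)
true_var, N, trials = 4.0, 10, 200_000      # assumed values

x = np.sqrt(true_var) * rng.standard_normal((trials, N))

biased = x.var(axis=1, ddof=0)       # (1/N)     * sum (x - x_bar)^2
unbiased = x.var(axis=1, ddof=1)     # 1/(N - 1) * sum (x - x_bar)^2

print("E[biased sample variance]   ~", biased.mean())    # approx (N-1)/N * true_var
print("E[unbiased sample variance] ~", unbiased.mean())  # approx true_var
```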

Page 16: EIE6207: Estimation Theory - PolyU

Unbiased Estimators

We could focus on estimators that have zero bias, so that the bias contributes nothing to the MSE.

Without the bias term, the MSE in Eq. 2 does not involve the unknown parameter.

By focusing on estimators with zero bias, we may hope to arrive at a design criterion that yields realizable estimators.

Definition: An estimator θ̂ is called unbiased if its bias is zero for all values of the unknown parameter.

Recap: For an estimator to be unbiased, we require that, on average, the estimator yields the true value of the unknown parameter.

Page 17: EIE6207: Estimation Theory - PolyU

Unbiased Estimators

Some unbiased estimators are more useful than others. For example, consider two estimators for Eq. 1:

Â_1 = x[0]   and   Â_2 = (1/N) ∑_{n=0}^{N−1} x[n].

Although E(Â_1) = E(Â_2) = A, Â_2 is better because

var(Â_1) = σ² > var(Â_2) = σ²/N.    (3)
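
A quick Monte Carlo comparison of the two estimators (illustrative values assumed):

```python
import numpy as np

rng = np.random.default_rng(6)
A, sigma, N, trials = 1.0, 2.0, 25, 100_000    # assumed values

x = A + sigma * rng.standard_normal((trials, N))
A1 = x[:, 0]            # estimator A_hat_1 = x[0]
A2 = x.mean(axis=1)     # estimator A_hat_2 = sample mean

print("both are unbiased:", A1.mean(), A2.mean())             # both approx A
print("var(A1) ~", A1.var(), "  (theory: sigma^2   =", sigma**2, ")")
print("var(A2) ~", A2.var(), "  (theory: sigma^2/N =", sigma**2 / N, ")")
```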

Page 18: EIE6207: Estimation Theory - PolyU

How Good is Our Estimator

Figure: dart-throwing results of four throwers, (a)–(d).

If you hit the bullseye on average, you are an unbiased dart-thrower, e.g., (a) and (c).

If your throws are often off-center, you are a biased dart-thrower, e.g., (b) and (d).

A good unbiased dart-thrower hits the bullseye on average with small variation around the bullseye, e.g., (a).

A bad unbiased dart-thrower has large variation, although he/she can hit the bullseye on average, e.g., (c).

A really bad biased dart-thrower has large variation around a point offset from the bullseye, e.g., (d).

Page 19: EIE6207: Estimation Theory - PolyU

Minimum Variance Unbiased Estimators

If the bias is zero, the MSE is just the variance.

This gives rise to the minimum variance unbiased estimator (MVUE) for θ.

Definition: An estimator θ̂ is the MVUE if it is (1) unbiased and (2) has the smallest variance among all unbiased estimators for all values of the unknown parameter θ, i.e., ∀θ, we have

var(θ̂_MVUE) ≤ var(θ̂)

It is important to note that a uniformly minimum variance unbiased estimator may not always exist, and even if it does, we may not be able to find it.

This definition and the above discussion also apply to vector parameters θ.

Page 20: EIE6207: Estimation Theory - PolyU

Cramer-Rao Lower Bound

Page 21: EIE6207: Estimation Theory - PolyU

Cramer-Rao Lower Bound

Eq. 2 suggests that the variance of an unbiased estimator is the MSE, i.e., if b(θ) = 0, we have

mse(θ̂) = var(θ̂)

But how do we know whether the unbiased estimator θ̂ is the one with the smallest MSE?

Also, how do we compute its variance?

A lower bound on this variance is given by the Cramer-Rao lower bound (CRLB).

The CRLB theorem provides a lower bound on the MSE of any unbiased estimator.

The CRLB theorem also provides a method for finding the best estimator.

Page 22: EIE6207: Estimation Theory - PolyU

Cramer-Rao Lower Bound

We will use the example in Eq. 1 to explain the idea of the CRLB.

For simplicity, suppose that we use only a single observation x[0] to estimate A in Eq. 1, x[n] = A + w[n].

The PDF of x[0] is

p(x[0]; A) = (1/√(2πσ²)) exp{ −(1/(2σ²)) (x[0] − A)² }

Once we have observed x[0], say x[0] = 3, some values of A are more likely than others.

Page 23: EIE6207: Estimation Theory - PolyU

Cramer-Rao Lower Bound

Viewed as a function of A (with the observation fixed at x[0] = 3), the PDF has the same form as the PDF of x[0]:

p(x[0] = 3; A) = (1/√(2πσ²)) exp{ −(1/(2σ²)) (3 − A)² }

Figure: PDF plotted versus A (x-axis: value of A) for the case x[0] = 3, with σ² = 1/3 (upper panel) and σ² = 1 (lower panel).

Observation: with the sharper PDF in the upper panel (σ² = 1/3), we will probably obtain a more accurate estimate of A.

Page 24: EIE6207: Estimation Theory - PolyU

Cramer-Rao Lower Bound

If the PDF is viewed as a function of the unknown parameter (with x fixed), it is called the likelihood function.

The “sharpness” of the likelihood function determines how accurately we can estimate the parameter.

If a single sample is observed as x[0] = A + w[0], then we can expect a better estimate if σ² is small.

How do we quantify the sharpness? Is there any measure that is common to all possible estimators for a specific estimation problem?

The second derivative of the likelihood function (or, more conveniently, the log-likelihood) provides a measure of the sharpness of the function.
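
A small sketch (values taken from the x[0] = 3 example above; the finite-difference step is an arbitrary choice) that measures the curvature of the log-likelihood numerically for the two noise variances used above:

```python
import numpy as np

x0 = 3.0                                   # single observation, as in the slides

def log_lik(A, sigma2):
    """Log-likelihood of A given x[0] = x0 under N(A, sigma2)."""
    return -0.5 * np.log(2 * np.pi * sigma2) - (x0 - A) ** 2 / (2 * sigma2)

# Numerical second derivative at the peak (A = x0) via central differences.
h = 1e-4
for sigma2 in (1.0 / 3.0, 1.0):
    d2 = (log_lik(x0 + h, sigma2) - 2 * log_lik(x0, sigma2)
          + log_lik(x0 - h, sigma2)) / h**2
    print(f"sigma^2 = {sigma2:.3f}:  d2 logp/dA2 ~ {d2:.3f}  (theory: {-1/sigma2:.3f})")
# The sharper likelihood (smaller sigma^2) has the more negative curvature.
```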

Page 25: EIE6207: Estimation Theory - PolyU

Cramer-Rao Lower Bound

Recall that the PDF of x[0] is

p(x[0]; A) = (1/√(2πσ²)) exp{ −(1/(2σ²)) (x[0] − A)² }

The log-likelihood function is

log p(x[0]; A) = −log √(2πσ²) − (1/(2σ²)) (x[0] − A)²

The first derivative w.r.t. A is the score function:

∇_A log p(x[0]; A) = ∂ log p(x[0]; A)/∂A = (1/σ²) (x[0] − A)

The second derivative is independent of x[0]:

∂² log p(x[0]; A)/∂A² = −1/σ²

Page 26: EIE6207: Estimation Theory - PolyU

Cramer-Rao Lower Bound

Because σ² is the smallest possible variance for this particular case, we have an alternative way of finding the minimum variance:

minimum variance = σ² = 1 / ( −∂² log p(x[0]; A)/∂A² )

For the general case:

If the function depends on the data x, take the expectation over all x. If the function depends on the parameter θ, evaluate the derivative at the true value of θ.

We have the general rule:

minimum variance of any unbiased estimator = 1 / ( −E_{x∼p(x;θ)}[ ∂² log p(x; θ)/∂θ² ] )

Page 27: EIE6207: Estimation Theory - PolyU

CRLB Theorem

If the PDF p(x; θ) satisfies the “regularity” condition (see Tut Q5):

E_{x∼p(x;θ)}{ ∂ log p(x; θ)/∂θ } = 0,   ∀θ,

then the variance of any unbiased estimator θ̂ must satisfy

var(θ̂) ≥ 1 / ( −E_{x∼p(x;θ)}[ ∂² log p(x; θ)/∂θ² ] ),

where the derivative is evaluated at the true value of θ.

An unbiased estimator attains the bound for all θ if and only if

∂ log p(x; θ)/∂θ = I(θ)(g(x) − θ)    (4)

for some functions g and I. The MVU estimator is θ̂ = g(x), and the minimum variance is 1/I(θ).

Page 28: EIE6207: Estimation Theory - PolyU

CRLB Example 1: Estimation of DC Level in WGN

Example: DC level in Gaussian noise:

x[n] = A + w[n],   n = 0, 1, . . . , N − 1

What is the minimum variance of any unbiased estimator using N samples?

The likelihood function is the product of N densities:

p(x; A) = ∏_{n=0}^{N−1} (1/√(2πσ²)) exp{ −(1/(2σ²)) (x[n] − A)² }
        = (1/(2πσ²)^{N/2}) exp{ −(1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A)² }

Page 29: EIE6207: Estimation Theory - PolyU

CRLB Example 1: Estimation of DC Level in WGN

The log-likelihood function is

log p(x; A) = −log[(2πσ²)^{N/2}] − (1/(2σ²)) ∑_{n=0}^{N−1} (x[n] − A)²

The first derivative is

∂ log p(x; A)/∂A = (1/σ²) ∑_{n=0}^{N−1} (x[n] − A) = (N/σ²)(x̄ − A),

where x̄ is the sample mean.

The 2nd derivative is

∂² log p(x; A)/∂A² = −N/σ²

Therefore, the minimum variance of any unbiased estimator is

var(Â) ≥ σ²/N    (5)
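
A short Monte Carlo sketch (illustrative values assumed) checking that the sample mean attains this bound, and that the Fisher information estimated from the squared score matches N/σ²:

```python
import numpy as np

rng = np.random.default_rng(7)
A, sigma, N, trials = 2.0, 1.5, 50, 200_000       # assumed values

x = A + sigma * rng.standard_normal((trials, N))
A_hat = x.mean(axis=1)                            # sample-mean estimator

# Monte Carlo estimate of the Fisher information via the score function:
# score = d log p(x; A)/dA = N (x_bar - A) / sigma^2, and I(A) = E[score^2].
score = N * (A_hat - A) / sigma**2
I_hat = np.mean(score**2)

print("var(A_hat)       ~", A_hat.var())          # should approach sigma^2 / N
print("CRLB  sigma^2/N  =", sigma**2 / N)
print("Fisher info I(A): MC ~", I_hat, "  theory:", N / sigma**2)
```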

Page 30: EIE6207: Estimation Theory - PolyU

CRLB Example: Estimation of DC Level in WGN

In Eq. 3, the sample mean Â_2 attains the minimum variance in Eq. 5. Therefore, the sample mean is the MVUE for this problem.

Using the CRLB theorem in Eq. 4, we have

∂ log p(x; A)/∂A = (N/σ²)(x̄ − A) = I(A)(g(x) − A),

where I(A) = N/σ² and g(x) = x̄.

So, g(x) is the MVUE and 1/I(A) is its variance.

Page 31: EIE6207: Estimation Theory - PolyU

CRLB Summary

The Cramer-Rao inequality provides a lower bound on the estimation error variance.

The minimum attainable variance is often larger than the CRLB.

We need to know the PDF to evaluate the CRLB. If the data are multivariate Gaussian or i.i.d. with a known distribution, we can evaluate it.

Often we don’t know this information and cannot evaluate this bound.

It is not guaranteed that an MVUE exists or is realizable.
