
    16. Mean Square Estimation

Given some information that is related to an unknown quantity of interest, the problem is to obtain a good estimate for the unknown in terms of the observed data.

Suppose $X_1, X_2, \ldots, X_n$ represent a sequence of random variables about whom one set of observations are available, and $Y$ represents an unknown random variable. The problem is to obtain a good estimate for $Y$ in terms of the observations $X_1, X_2, \ldots, X_n$. Let

$$\hat{Y} = \varphi(X_1, X_2, \ldots, X_n) = \varphi(X) \tag{16-1}$$

represent such an estimate for $Y$.

Note that $\varphi(\cdot)$ can be a linear or a nonlinear function of the observations $X_1, X_2, \ldots, X_n$. Clearly

$$\varepsilon(X) = Y - \hat{Y} = Y - \varphi(X) \tag{16-2}$$

represents the error in the above estimate, and $|\varepsilon|^2$ the square of


the error. Since $\varepsilon$ is a random variable, $E\{|\varepsilon|^2\}$ represents the mean square error. One strategy to obtain a good estimator would be to minimize the mean square error by varying over all possible forms of $\varphi(\cdot)$, and this procedure gives rise to the Minimization of the Mean Square Error (MMSE) criterion for estimation. Thus under the MMSE criterion, the estimator $\varphi(\cdot)$ is chosen such that the mean square error $E\{|\varepsilon|^2\}$ is at its minimum.

Next we show that the conditional mean of $Y$ given $X$ is the best estimator in the above sense.

Theorem 1: Under the MMSE criterion, the best estimator for the unknown $Y$ in terms of $X_1, X_2, \ldots, X_n$ is given by the conditional mean of $Y$ given $X$. Thus

$$\hat{Y} = \varphi(X) = E\{Y \mid X\}. \tag{16-3}$$

Proof: Let $\hat{Y} = \varphi(X)$ represent an estimate of $Y$ in terms of $X = (X_1, X_2, \ldots, X_n)$. Then the error is $\varepsilon = Y - \hat{Y}$, and the mean square error is given by

$$E\{|\varepsilon|^2\} = E\{|Y - \hat{Y}|^2\} = E\{|Y - \varphi(X)|^2\}. \tag{16-4}$$


Since

$$E_z\{z\} = E_X\big[E_z\{z \mid X\}\big], \tag{16-5}$$

we can rewrite (16-4) as

$$E\{|\varepsilon|^2\} = E_X\big[E_Y\{|Y - \varphi(X)|^2 \mid X\}\big],$$

where the inner expectation is with respect to $Y$, and the outer one is with respect to $X$. Thus

$$E\{|\varepsilon|^2\} = E\big[E\{|Y - \varphi(X)|^2 \mid X\}\big] = \int E\{|Y - \varphi(X)|^2 \mid X = x\}\, f_X(x)\, dx. \tag{16-6}$$

To obtain the best estimator $\varphi$, we need to minimize $E\{|\varepsilon|^2\}$ in (16-6) with respect to $\varphi$. In (16-6), since $f_X(x) \geq 0$ and $E\{|Y - \varphi(X)|^2 \mid X\} \geq 0$, and the variable $\varphi$ appears only in the integrand term, minimization of the mean square error in (16-6) with respect to $\varphi$ is equivalent to minimization of $E\{|Y - \varphi(X)|^2 \mid X\}$ with respect to $\varphi$.


Since $X$ is fixed at some value, $\varphi(X)$ is no longer random, and hence minimization of $E\{|Y - \varphi(X)|^2 \mid X\}$ is equivalent to

$$\frac{\partial}{\partial \varphi}\, E\{|Y - \varphi(X)|^2 \mid X\} = 0. \tag{16-7}$$

This gives

$$E\{(Y - \varphi(X)) \mid X\} = 0, \tag{16-8}$$

or

$$E\{Y \mid X\} - E\{\varphi(X) \mid X\} = 0.$$

But

$$E\{\varphi(X) \mid X\} = \varphi(X), \tag{16-9}$$

since when $X = x$ is fixed, $\varphi(X)$ is the fixed number $\varphi(x)$. Using (16-9)


in (16-8) we get the desired estimator to be

$$\hat{Y} = \varphi(X) = E\{Y \mid X\} = E\{Y \mid X_1, X_2, \ldots, X_n\}. \tag{16-10}$$

Thus the conditional mean of $Y$ given $X_1, X_2, \ldots, X_n$ represents the best estimator for $Y$ that minimizes the mean square error.

The minimum value of the mean square error is given by

$$\sigma_{\min}^2 = E\{|Y - E\{Y \mid X\}|^2\} = E\big[E\{|Y - E\{Y \mid X\}|^2 \mid X\}\big] = E\{\operatorname{var}(Y \mid X)\} \geq 0. \tag{16-11}$$

As an example, suppose $Y = X^3$ is the unknown. Then the best MMSE estimator is given by

$$\hat{Y} = E\{Y \mid X\} = E\{X^3 \mid X\} = X^3. \tag{16-12}$$

Clearly if $Y = X^3$, then $\hat{Y} = X^3$ indeed is the best estimator for $Y$


in terms of $X$. Thus the best estimator can be nonlinear.

Next, we will consider a less trivial example.

Example: Let

$$f_{XY}(x, y) = \begin{cases} kxy, & 0 < x < y < 1, \\ 0, & \text{otherwise}, \end{cases}$$

where $k > 0$ is a suitable normalization constant. To determine the best estimate for $Y$ in terms of $X$, we need the conditional density $f_{Y|X}(y \mid x)$. Thus

$$f_X(x) = \int_x^1 f_{XY}(x, y)\, dy = \int_x^1 kxy\, dy = kx\,\frac{y^2}{2}\Big|_x^1 = \frac{kx(1 - x^2)}{2}, \quad 0 < x < 1,$$

and hence

$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{kxy}{kx(1 - x^2)/2} = \frac{2y}{1 - x^2}, \quad 0 < x < y < 1. \tag{16-13}$$

Hence the best MMSE estimator is given by


$$\hat{Y} = \varphi(X) = E\{Y \mid X\} = \int_x^1 y\, f_{Y|X}(y \mid x)\, dy = \int_x^1 \frac{2y^2}{1 - x^2}\, dy = \frac{2}{1 - x^2}\,\frac{y^3}{3}\Big|_x^1 = \frac{2}{3}\,\frac{1 - x^3}{1 - x^2} = \frac{2}{3}\,\frac{x^2 + x + 1}{x + 1}, \quad 0 < x < 1. \tag{16-14}$$

Once again the best estimator is nonlinear. In general the best estimator $E\{Y \mid X\}$ is difficult to evaluate, and hence next we will examine the special subclass of best linear estimators.
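As a quick numerical sanity check of (16-14) (a sketch added here, not part of the original notes; it assumes NumPy is available), one can sample $(X, Y)$ from $f_{XY}(x, y) = kxy$ on $0 < x < y < 1$, where the normalization works out to $k = 8$, and compare the mean square error of the conditional-mean estimator with that of a straight-line fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rejection-sample (X, Y) from f(x, y) = 8xy on 0 < x < y < 1:
# propose uniformly on the unit square, accept with probability x*y when x < y.
xp = rng.uniform(0, 1, 400_000)
yp = rng.uniform(0, 1, 400_000)
keep = (xp < yp) & (rng.uniform(0, 1, 400_000) < xp * yp)
x, y = xp[keep], yp[keep]

# Conditional-mean estimator from (16-14) versus an ordinary straight-line fit.
phi = (2.0 / 3.0) * (x**2 + x + 1) / (x + 1)     # E{Y | X = x}
a, b = np.polyfit(x, y, 1)                        # least-squares line y ~ a*x + b
print("MSE, conditional mean:", np.mean((y - phi) ** 2))
print("MSE, straight line   :", np.mean((y - (a * x + b)) ** 2))
```

The conditional mean should never do worse than the line, in line with Theorem 1.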

Best Linear Estimator

In this case the estimator $\hat{Y}$ is a linear function of the observations $X_1, X_2, \ldots, X_n$. Thus

$$\hat{Y}_l = \sum_{i=1}^{n} a_i X_i = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n, \tag{16-15}$$

where $a_1, a_2, \ldots, a_n$ are unknown quantities to be determined. The mean square error is given by $(\varepsilon = Y - \hat{Y}_l)$


$$E\{|\varepsilon|^2\} = E\{|Y - \hat{Y}_l|^2\} = E\Big\{\Big|Y - \sum_{i=1}^{n} a_i X_i\Big|^2\Big\}, \tag{16-16}$$

and under the MMSE criterion $a_1, a_2, \ldots, a_n$ should be chosen so that the mean square error $E\{|\varepsilon|^2\}$ is at its minimum possible value. Let $\sigma_n^2$ represent that minimum possible value. Then

$$\sigma_n^2 = \min_{a_1, a_2, \ldots, a_n} E\Big\{\Big|Y - \sum_{i=1}^{n} a_i X_i\Big|^2\Big\}. \tag{16-17}$$

To minimize (16-16), we can equate

$$\frac{\partial}{\partial a_k}\, E\{|\varepsilon|^2\} = 0, \quad k = 1, 2, \ldots, n. \tag{16-18}$$

This gives

$$\frac{\partial E\{|\varepsilon|^2\}}{\partial a_k} = E\left\{\frac{\partial |\varepsilon|^2}{\partial a_k}\right\} = E\left\{2\varepsilon\,\frac{\partial \varepsilon^*}{\partial a_k}\right\} = 0. \tag{16-19}$$

But


$$\frac{\partial \varepsilon^*}{\partial a_k} = \frac{\partial}{\partial a_k}\Big(Y - \sum_{i=1}^{n} a_i X_i\Big)^* = -X_k^*.$$

Substituting this into (16-18)-(16-19), we get

$$\frac{\partial E\{|\varepsilon|^2\}}{\partial a_k} = -2\,E\{\varepsilon X_k^*\} = 0, \tag{16-20}$$

or the best linear estimator must satisfy

$$E\{\varepsilon X_k^*\} = 0, \quad k = 1, 2, \ldots, n. \tag{16-21}$$

Notice that in (16-21), $\varepsilon = Y - \sum_{i=1}^{n} a_i X_i$ represents the estimation error and $X_k$, $1 \leq k \leq n$, represents the data. Thus from (16-21), the error is orthogonal to the data $X_k$, $1 \leq k \leq n$, for the best linear estimator. This is the orthogonality principle.

In other words, in the linear estimator (16-15), the unknown constants $a_1, a_2, \ldots, a_n$ must be selected such that the error


$$\varepsilon = Y - \sum_{i=1}^{n} a_i X_i$$

is orthogonal to every data $X_1, X_2, \ldots, X_n$ for the best linear estimator that minimizes the mean square error.

Interestingly, a general form of the orthogonality principle holds good in the case of nonlinear estimators also.

Nonlinear Orthogonality Rule: Let $h(X)$ represent any functional form of the data $X_1, X_2, \ldots, X_n$, and let $E\{Y \mid X\}$ be the best estimator for $Y$ given $X$. With $e = Y - E\{Y \mid X\}$, we shall show that

$$E\{e\, h(X)\} = 0, \tag{16-22}$$

implying that

$$e = Y - E\{Y \mid X\} \;\perp\; h(X).$$

This follows since

$$\begin{aligned}
E\{e\, h(X)\} &= E\big\{(Y - E[Y \mid X])\, h(X)\big\} \\
&= E\{Y\, h(X)\} - E\big\{E[Y \mid X]\, h(X)\big\} \\
&= E\{Y\, h(X)\} - E\big\{E[Y\, h(X) \mid X]\big\} \\
&= E\{Y\, h(X)\} - E\{Y\, h(X)\} = 0.
\end{aligned}$$


    Thus in the nonlinear version of the orthogonality rule the error is

    orthogonal to any functional form of the data.
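As a small added illustration (not part of the original notes; it assumes NumPy), (16-22) can be checked numerically with a toy model whose conditional mean is known exactly, for instance $Y = X^2 + W$ with $W$ zero mean and independent of $X$, so that $E\{Y \mid X\} = X^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Toy model with a known conditional mean: Y = X**2 + W, W independent of X.
x = rng.uniform(0, 1, n)
y = x**2 + rng.normal(0.0, 0.3, n)
e = y - x**2                       # error of the best (nonlinear) estimator E{Y|X}

for h in (np.sin, np.exp, lambda t: t**3 - t):
    print(np.mean(e * h(x)))       # each sample mean is ~0, as (16-22) asserts
```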

The orthogonality principle in (16-20) can be used to obtain the unknowns $a_1, a_2, \ldots, a_n$ in the linear case.

For example, suppose $n = 2$, and we need to estimate $Y$ in terms of $X_1$ and $X_2$ linearly. Thus

$$\hat{Y}_l = a_1 X_1 + a_2 X_2.$$

From (16-20), the orthogonality rule gives

$$E\{\varepsilon X_1^*\} = E\{(Y - a_1 X_1 - a_2 X_2)\, X_1^*\} = 0,$$
$$E\{\varepsilon X_2^*\} = E\{(Y - a_1 X_1 - a_2 X_2)\, X_2^*\} = 0.$$

Thus

$$a_1 E\{|X_1|^2\} + a_2 E\{X_2 X_1^*\} = E\{Y X_1^*\},$$
$$a_1 E\{X_1 X_2^*\} + a_2 E\{|X_2|^2\} = E\{Y X_2^*\},$$

and solving this pair of equations gives the optimum $a_1$ and $a_2$.
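A brief numerical sketch of this two-variable case (an addition, assuming NumPy and real-valued data so the conjugates drop out): estimate the second-order moments from samples and solve the resulting $2 \times 2$ system.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Synthetic real data: Y depends linearly on X1, X2 plus independent noise.
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.stack([x1, x2])
R = X @ X.T / n                    # sample moment matrix E{X_i X_k}
r = X @ y / n                      # sample cross moments E{Y X_k}

a = np.linalg.solve(R, r)          # optimum (a1, a2) from the two moment equations
eps = y - a @ X
print(a)                           # close to (2, -1)
print(X @ eps / n)                 # E{eps X_k} ~ 0: the error is orthogonal to the data
```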


More generally, the minimum mean square error of the best linear estimator is

$$\sigma_n^2 = E\{|Y - \hat{Y}_l|^2\} = E\Big\{\Big(Y - \sum_{i=1}^{n} a_i X_i\Big)\, Y^*\Big\} = E\{|Y|^2\} - \sum_{i=1}^{n} a_i\, E\{X_i Y^*\}, \tag{16-25}$$

where $a_1, a_2, \ldots, a_n$ are the optimum values from (16-21).

Since the linear estimate in (16-15) is only a special case of the general estimator $\varphi(X)$ in (16-1), the best linear estimator that satisfies (16-20) cannot be superior to the best nonlinear estimator $E\{Y \mid X\}$. Often the best linear estimator will be inferior to the best estimator in (16-3).

This raises the following question. Are there situations in which the best estimator in (16-3) also turns out to be linear? In those situations it is enough to use (16-21) and obtain the best linear estimators, since they also represent the best global estimators. Such is the case if $Y$ and $X_1, X_2, \ldots, X_n$ are distributed as jointly Gaussian. We summarize this in the next theorem and prove that result.

Theorem 2: If $X_1, X_2, \ldots, X_n$ and $Y$ are jointly Gaussian zero-mean random variables, then the best estimate for $Y$ in terms of $X_1, X_2, \ldots, X_n$ is always linear.


Proof: Let

$$\hat{Y} = \varphi(X_1, X_2, \ldots, X_n) = E\{Y \mid X\} \tag{16-26}$$

represent the best (possibly nonlinear) estimate of $Y$, and

$$\hat{Y}_l = \sum_{i=1}^{n} a_i X_i \tag{16-27}$$

the best linear estimate of $Y$. Then from (16-21), the error

$$\varepsilon = Y - \hat{Y}_l = Y - \sum_{i=1}^{n} a_i X_i \tag{16-28}$$

is orthogonal to the data $X_k$, $1 \leq k \leq n$. Thus

$$E\{\varepsilon X_k^*\} = 0, \quad 1 \leq k \leq n. \tag{16-29}$$

Also from (16-28),

$$E\{\varepsilon\} = E\{Y\} - \sum_{i=1}^{n} a_i\, E\{X_i\} = 0. \tag{16-30}$$


Using (16-29)-(16-30), we get

$$E\{\varepsilon X_k^*\} = E\{\varepsilon\}\, E\{X_k^*\} = 0, \quad 1 \leq k \leq n. \tag{16-31}$$

From (16-31), we obtain that $\varepsilon$ and $X_k$ are zero-mean uncorrelated random variables for $1 \leq k \leq n$. But $\varepsilon$ itself represents a Gaussian random variable, since from (16-28) it represents a linear combination of a set of jointly Gaussian random variables. Thus $\varepsilon$ and $X$ are jointly Gaussian and uncorrelated random variables. As a result, $\varepsilon$ and $X$ are independent random variables. Thus from their independence

$$E\{\varepsilon \mid X\} = E\{\varepsilon\}. \tag{16-32}$$

But from (16-30), $E\{\varepsilon\} = 0$, and hence from (16-32)

$$E\{\varepsilon \mid X\} = 0. \tag{16-33}$$

Substituting (16-28) into (16-33), we get

$$E\{\varepsilon \mid X\} = E\Big\{Y - \sum_{i=1}^{n} a_i X_i \,\Big|\, X\Big\} = 0,$$


or

$$E\{Y \mid X\} = E\Big\{\sum_{i=1}^{n} a_i X_i \,\Big|\, X\Big\} = \sum_{i=1}^{n} a_i X_i = \hat{Y}_l. \tag{16-34}$$

From (16-26), $E\{Y \mid X\}$ represents the best possible estimator, and from (16-27), $\hat{Y}_l = \sum_{i=1}^{n} a_i X_i$ represents the best linear estimator. Thus the best linear estimator is also the best possible overall estimator in the Gaussian case.
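A short numerical sketch of Theorem 2 (an addition, assuming NumPy): for a zero-mean jointly Gaussian pair, a nonparametric (binned) estimate of $E\{Y \mid X = x\}$ falls on the straight line given by the linear MMSE coefficient, illustrating that the best estimator is indeed linear here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Zero-mean jointly Gaussian (X, Y) with a chosen covariance matrix.
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

a = cov[0, 1] / cov[0, 0]          # linear MMSE coefficient E{XY}/E{X^2}

# Binned estimate of E{Y | X} compared with the line a*x.
bins = np.linspace(-2, 2, 9)
idx = np.digitize(x, bins)
for k in range(1, len(bins)):
    xc = 0.5 * (bins[k - 1] + bins[k])
    sel = idx == k
    print(f"x ~ {xc:+.2f}: binned mean {y[sel].mean():+.3f}, linear {a * xc:+.3f}")
```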

Next we turn our attention to prediction problems using linear estimators.

Linear Prediction

Suppose $X_1, X_2, \ldots, X_n$ are known and $X_{n+1}$ is unknown. Thus $Y = X_{n+1}$, and this represents a one-step prediction problem. If the unknown is $X_{n+k}$, then it represents a $k$-step ahead prediction problem. Returning to the one-step predictor, let $\hat{X}_{n+1}$ represent the best linear predictor. Then


$$\hat{X}_{n+1} = -\sum_{i=1}^{n} a_i X_i, \tag{16-35}$$

where the error

$$\varepsilon_n = X_{n+1} - \hat{X}_{n+1} = X_{n+1} + \sum_{i=1}^{n} a_i X_i = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n + X_{n+1} = \sum_{i=1}^{n+1} a_i X_i, \quad a_{n+1} = 1, \tag{16-36}$$

is orthogonal to the data, i.e.,

$$E\{\varepsilon_n X_k^*\} = 0, \quad 1 \leq k \leq n. \tag{16-37}$$

Using (16-36) in (16-37), we get

$$E\{\varepsilon_n X_k^*\} = \sum_{i=1}^{n+1} a_i\, E\{X_i X_k^*\} = 0, \quad 1 \leq k \leq n. \tag{16-38}$$

Suppose $X_i$ represents a sample of a wide-sense stationary


stochastic process $X(t)$, so that

$$E\{X_i X_k^*\} = R(i - k) = r_{i-k} = r_{k-i}^*. \tag{16-39}$$

Thus (16-38) becomes

$$E\{\varepsilon_n X_k^*\} = \sum_{i=1}^{n+1} a_i\, r_{i-k} = 0, \quad a_{n+1} = 1, \quad 1 \leq k \leq n. \tag{16-40}$$

Expanding (16-40) for $k = 1, 2, \ldots, n$, we get the following set of linear equations:

$$\begin{aligned}
a_1 r_0 + a_2 r_1 + a_3 r_2 + \cdots + a_n r_{n-1} + r_n &= 0, \qquad k = 1, \\
a_1 r_1^* + a_2 r_0 + a_3 r_1 + \cdots + a_n r_{n-2} + r_{n-1} &= 0, \qquad k = 2, \\
&\;\;\vdots \\
a_1 r_{n-1}^* + a_2 r_{n-2}^* + a_3 r_{n-3}^* + \cdots + a_n r_0 + r_1 &= 0, \qquad k = n.
\end{aligned} \tag{16-41}$$

Similarly using (16-25), the minimum mean square error is given by


$$\sigma_n^2 = E\{|\varepsilon_n|^2\} = E\{\varepsilon_n X_{n+1}^*\} = E\Big\{\Big(\sum_{i=1}^{n+1} a_i X_i\Big) X_{n+1}^*\Big\} = a_1 r_n^* + a_2 r_{n-1}^* + a_3 r_{n-2}^* + \cdots + a_n r_1^* + r_0. \tag{16-42}$$

The $n$ equations in (16-41) together with (16-42) can be represented as

$$\begin{pmatrix}
r_0 & r_1 & r_2 & \cdots & r_n \\
r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\
r_2^* & r_1^* & r_0 & \cdots & r_{n-2} \\
\vdots & \vdots & \vdots & & \vdots \\
r_n^* & r_{n-1}^* & r_{n-2}^* & \cdots & r_0
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix}. \tag{16-43}$$

Let


$$T_n = \begin{pmatrix}
r_0 & r_1 & \cdots & r_n \\
r_1^* & r_0 & \cdots & r_{n-1} \\
\vdots & \vdots & & \vdots \\
r_n^* & r_{n-1}^* & \cdots & r_0
\end{pmatrix}. \tag{16-44}$$

Notice that $T_n$ is Hermitian Toeplitz and positive definite. Using (16-44), the unknowns in (16-43) can be represented as

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix}
= \sigma_n^2\, T_n^{-1}
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}
= \sigma_n^2 \times \big(\text{last column of } T_n^{-1}\big). \tag{16-45}$$

Let


$$T_n^{-1} = \begin{pmatrix}
T_{11} & T_{12} & \cdots & T_{1,n+1} \\
T_{21} & T_{22} & \cdots & T_{2,n+1} \\
\vdots & \vdots & & \vdots \\
T_{n+1,1} & T_{n+1,2} & \cdots & T_{n+1,n+1}
\end{pmatrix}. \tag{16-46}$$

Then from (16-45),

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix}
= \sigma_n^2
\begin{pmatrix} T_{1,n+1} \\ T_{2,n+1} \\ \vdots \\ T_{n,n+1} \\ T_{n+1,n+1} \end{pmatrix}. \tag{16-47}$$

Thus

$$\sigma_n^2\, T_{n+1,n+1} = 1, \qquad \text{or} \qquad \sigma_n^2 = \frac{1}{T_{n+1,n+1}} > 0, \tag{16-48}$$


and

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
= \frac{1}{T_{n+1,n+1}}
\begin{pmatrix} T_{1,n+1} \\ T_{2,n+1} \\ \vdots \\ T_{n,n+1} \end{pmatrix}. \tag{16-49}$$

Eq. (16-49) represents the best linear predictor coefficients, and they can be evaluated from the last column of $T_n^{-1}$ in (16-45). Using these, the best one-step ahead predictor in (16-35) takes the form

$$\hat{X}_{n+1} = -\frac{1}{T_{n+1,n+1}} \sum_{i=1}^{n} T_{i,n+1}\, X_i, \tag{16-50}$$

and from (16-48), the minimum mean square error is given by the reciprocal of the $(n+1, n+1)$ entry of $T_n^{-1}$.
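The following sketch (an addition; it assumes NumPy and SciPy) assembles $T_n$ from a given autocorrelation sequence, extracts the predictor coefficients from the last column of $T_n^{-1}$ as in (16-48)-(16-50), and verifies the system (16-43). The autocorrelation used is that of an AR(1) process, for which the answer is known in closed form.

```python
import numpy as np
from scipy.linalg import toeplitz

# Autocorrelation of the AR(1) process x_t = rho*x_{t-1} + w_t, Var(w_t) = 1:
# r_k = rho**|k| / (1 - rho**2)
rho, n = 0.8, 4
r = rho ** np.arange(n + 1) / (1 - rho**2)       # r_0, ..., r_n

Tn = toeplitz(r)                                  # Hermitian (here real symmetric) Toeplitz T_n
Tinv = np.linalg.inv(Tn)

sigma2 = 1.0 / Tinv[-1, -1]                       # (16-48)
a = Tinv[:-1, -1] / Tinv[-1, -1]                  # (16-49): a_1, ..., a_n

print(Tn @ np.concatenate([a, [1.0]]))            # (16-43): ~ [0, ..., 0, sigma2]
print(sigma2, a)                                  # sigma2 ~ 1, a ~ [0, 0, 0, -rho]
```

For this AR(1) model the one-step predictor reduces to $\hat{X}_{n+1} = \rho X_n$ (so $a_n = -\rho$ under the sign convention of (16-35)) with $\sigma_n^2 = 1$, which the printed output reproduces.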


From (16-36), the one-step linear prediction error is

$$\varepsilon_n = X_{n+1} + a_n X_n + a_{n-1} X_{n-1} + \cdots + a_1 X_1, \tag{16-51}$$

which we can represent formally as a prediction-error filter acting on the data. Thus, let

$$A_n(z) = 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}; \tag{16-52}$$

then we also have the representation

$$\varepsilon_n = A_n(z)\, X_{n+1}.$$

The filter

$$H(z) = \frac{1}{A_n(z)} = \frac{1}{1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}} \tag{16-53}$$

represents an AR(n) filter, and this shows that linear prediction leads to an auto regressive (AR) model.


The polynomial $A_n(z)$ in (16-52)-(16-53) can be simplified using (16-43)-(16-44). To see this, we rewrite $A_n(z)$ as

$$A_n(z) = 1 + a_n z^{-1} + \cdots + a_1 z^{-n}
= \big[\,z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1\,\big]
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix}
= \sigma_n^2\, \big[\,z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1\,\big]\, T_n^{-1}
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \tag{16-54}$$

To simplify (16-54), we can make use of the following matrix identity:

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}
= \begin{pmatrix} I & 0 \\ C A^{-1} & I \end{pmatrix}
\begin{pmatrix} A & B \\ 0 & D - C A^{-1} B \end{pmatrix}. \tag{16-55}$$


Taking determinants, we get

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A|\,\big|D - C A^{-1} B\big|. \tag{16-56}$$

In particular, if $D = 0$ is a scalar (so that $B$ is a column vector and $C$ a row vector), we get

$$\begin{vmatrix} A & B \\ C & 0 \end{vmatrix} = -|A|\; C A^{-1} B. \tag{16-57}$$

Using (16-57) in (16-54), with

$$A = T_n, \qquad B = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}, \qquad C = \big[\,z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1\,\big],$$


we get

$$A_n(z) = \sigma_n^2\, C\, T_n^{-1} B
= -\frac{\sigma_n^2}{|T_n|}
\begin{vmatrix} T_n & B \\ C & 0 \end{vmatrix}
= -\frac{\sigma_n^2}{|T_n|}
\begin{vmatrix}
r_0 & r_1 & \cdots & r_n & 0 \\
r_1^* & r_0 & \cdots & r_{n-1} & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
r_n^* & r_{n-1}^* & \cdots & r_0 & 1 \\
z^{-n} & z^{-(n-1)} & \cdots & z^{-1} & 0
\end{vmatrix}. \tag{16-58}$$

Referring back to (16-43), using Cramer's rule to solve for the last unknown $a_{n+1}\,(=1)$, we get

$$a_{n+1} = \frac{\sigma_n^2\,|T_{n-1}|}{|T_n|} = 1,$$


or

$$\sigma_n^2 = \frac{|T_n|}{|T_{n-1}|} > 0. \tag{16-59}$$

Thus the polynomial in (16-58) reduces to

$$A_n(z) = \frac{1}{|T_{n-1}|}
\begin{vmatrix}
r_0 & r_1 & \cdots & r_{n-1} & r_n \\
r_1^* & r_0 & \cdots & r_{n-2} & r_{n-1} \\
\vdots & \vdots & & \vdots & \vdots \\
r_{n-1}^* & r_{n-2}^* & \cdots & r_0 & r_1 \\
z^{-n} & z^{-(n-1)} & \cdots & z^{-1} & 1
\end{vmatrix}
= 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}. \tag{16-60}$$

The filter $H(z)$ in (16-53) can thus be alternatively represented, using $A_n(z)$ in (16-60), as

$$H(z) = \frac{1}{A_n(z)} \sim AR(n),$$

and in fact it represents a stable


AR filter of order $n$, whose input error signal $\varepsilon_n$ is white noise of constant spectral height equal to $|T_n|/|T_{n-1}|$ and whose output is $X_{n+1}$.

It can be shown that $A_n(z)$ has all its zeros in $|z| < 1$ provided $|T_n| > 0$, thus establishing stability.
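A brief sketch of the stability claim (an addition, assuming NumPy/SciPy, and reusing the AR(1) autocorrelation from the earlier sketch): the zeros of $A_n(z)$, i.e., the roots of $z^n + a_n z^{n-1} + \cdots + a_1$, all lie strictly inside the unit circle.

```python
import numpy as np
from scipy.linalg import toeplitz

rho, n = 0.8, 4
r = rho ** np.arange(n + 1) / (1 - rho**2)     # AR(1) autocorrelation, as before
Tinv = np.linalg.inv(toeplitz(r))
a = Tinv[:-1, -1] / Tinv[-1, -1]               # a_1, ..., a_n from (16-49)

# A_n(z) = 1 + a_n z^-1 + ... + a_1 z^-n  <=>  roots of z^n + a_n z^{n-1} + ... + a_1
poly = np.concatenate([[1.0], a[::-1]])        # coefficients in decreasing powers of z
zeros = np.roots(poly)
print(zeros, np.max(np.abs(zeros)) < 1.0)      # True: all zeros satisfy |z| < 1
```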

Linear Prediction Error

From (16-59), the mean square error using $n$ samples is given by

$$\sigma_n^2 = \frac{|T_n|}{|T_{n-1}|} > 0. \tag{16-61}$$

Suppose one more sample from the past is available to evaluate $X_{n+1}$ (i.e., $X_n, X_{n-1}, \ldots, X_1, X_0$ are available). Proceeding as above, the new coefficients and the mean square error $\sigma_{n+1}^2$ can be determined. From (16-59)-(16-61),

$$\sigma_{n+1}^2 = \frac{|T_{n+1}|}{|T_n|}. \tag{16-62}$$


Using another matrix identity it is easy to show that

$$|T_{n+1}|\,|T_{n-1}| = |T_n|^2\,\big(1 - |s_{n+1}|^2\big). \tag{16-63}$$

Since $|T_k| > 0$, we must have $1 - |s_{n+1}|^2 > 0$, or $|s_{n+1}| < 1$, for every $n$. From (16-63), we have

$$\frac{|T_{n+1}|}{|T_n|} = \frac{|T_n|}{|T_{n-1}|}\big(1 - |s_{n+1}|^2\big),$$

or

$$\sigma_{n+1}^2 = \sigma_n^2\,\big(1 - |s_{n+1}|^2\big) \leq \sigma_n^2, \tag{16-64}$$

since $1 - |s_{n+1}|^2 \leq 1$. Thus the mean square error decreases as more and more samples from the past are used in the linear predictor.
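A numerical sketch of this monotone behaviour (an addition, assuming NumPy/SciPy): compute $\sigma_n^2 = |T_n|/|T_{n-1}|$ for increasing $n$ from a fixed, valid autocorrelation sequence and observe the nonincreasing values.

```python
import numpy as np
from scipy.linalg import toeplitz

# Autocorrelation of an AR(1) process observed in additive white noise
# (the sum of two valid autocorrelations is again a valid one).
def r(k):
    return 0.9 ** abs(k) / (1 - 0.81) + (0.5 if k == 0 else 0.0)

for n in range(1, 8):
    rs = np.array([r(k) for k in range(n + 1)])
    sigma2 = np.linalg.det(toeplitz(rs)) / np.linalg.det(toeplitz(rs[:-1]))   # (16-61)
    print(n, sigma2)    # values decrease (never increase) as n grows, per (16-64)
```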


In general, from (16-64), the mean square errors for the one-step predictor form a monotonic nonincreasing sequence

$$\sigma_n^2 \geq \sigma_{n+1}^2 \geq \cdots \geq \sigma_{n+k}^2 \geq \cdots \geq \sigma_\infty^2, \tag{16-65}$$

whose limiting value is $\sigma_\infty^2 \geq 0$.

Clearly, $\sigma_\infty^2 \geq 0$ corresponds to the irreducible error in linear prediction using the entire past samples, and it is related to the power spectrum of the underlying process $X(nT)$ through the relation

$$\sigma_\infty^2 = \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \ln S_{XX}(\omega)\, d\omega\right\} \geq 0, \tag{16-66}$$

where $S_{XX}(\omega) \geq 0$ represents the power spectrum of $X(nT)$.

For any finite power process, we have

$$\int_{-\pi}^{\pi} S_{XX}(\omega)\, d\omega < \infty,$$

and since $\ln S_{XX}(\omega) \leq S_{XX}(\omega)$ (because $S_{XX}(\omega) \geq 0$), we get

$$\int_{-\pi}^{\pi} \ln S_{XX}(\omega)\, d\omega \leq \int_{-\pi}^{\pi} S_{XX}(\omega)\, d\omega < \infty. \tag{16-67}$$
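A numerical sketch of (16-66) (an addition, assuming NumPy): for the AR(1) process $X_t = \rho X_{t-1} + W_t$ with unit-variance white noise $W_t$, the power spectrum is $S_{XX}(\omega) = 1/\lvert 1 - \rho e^{-j\omega}\rvert^2$, and the right-hand side of (16-66) should recover the innovation variance 1.

```python
import numpy as np

rho = 0.8
w = np.linspace(-np.pi, np.pi, 200_001)

# Power spectrum of the AR(1) model x_t = rho*x_{t-1} + w_t, Var(w_t) = 1.
S = 1.0 / np.abs(1.0 - rho * np.exp(-1j * w)) ** 2

sigma_inf2 = np.exp(np.trapz(np.log(S), w) / (2 * np.pi))   # (16-66)
print(sigma_inf2)   # ~ 1.0, the one-step irreducible prediction error of this process
```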


Moreover, if the power spectrum is strictly positive at every frequency, i.e.,

$$S_{XX}(\omega) > 0 \quad \text{in } -\pi < \omega \leq \pi, \tag{16-68}$$

then from (16-66)

$$\int_{-\pi}^{\pi} \ln S_{XX}(\omega)\, d\omega > -\infty, \tag{16-69}$$

and hence

$$\sigma_\infty^2 = \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \ln S_{XX}(\omega)\, d\omega\right\} > 0. \tag{16-70}$$

Thus, for processes that satisfy the strict positivity condition in (16-68) almost everywhere in the interval $(-\pi, \pi)$, the final minimum mean square error is strictly positive (see (16-70)); that is, such processes are not completely predictable even using their entire set of past samples, or they are inherently stochastic,


since the next output contains information that is not contained in the past samples. Such processes are known as regular stochastic processes, and their power spectrum $S_{XX}(\omega)$ is strictly positive.

[Figure: power spectrum of a regular stochastic process, with $S_{XX}(\omega) > 0$ at every frequency.]

Conversely, if a process has a power spectrum such that $S_{XX}(\omega) = 0$ on some interval $\omega_1 < \omega < \omega_2$, then from (16-70), $\sigma_\infty^2 = 0$.


Such processes are completely predictable from their past data samples. In particular,

$$X(nT) = \sum_{k} a_k \cos(\omega_k nT + \phi_k) \tag{16-71}$$

is completely predictable from its past samples, since $S_{XX}(\omega)$ consists of line spectra.

[Figure: line spectrum $S_{XX}(\omega)$ made up of impulses at the frequencies $\pm\omega_k$.]

$X(nT)$ in (16-71) is, in that sense, a deterministic stochastic process.
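As a final added sketch (assuming NumPy), a single term of (16-71) obeys the exact recursion $X_{n+1} = 2\cos(\omega_0 T)\,X_n - X_{n-1}$, so a two-tap linear predictor reproduces it with zero error, consistent with $\sigma_\infty^2 = 0$ for a line spectrum:

```python
import numpy as np

w0, T, phi = 0.7, 1.0, 0.3
n = np.arange(200)
x = np.cos(w0 * n * T + phi)               # one sinusoidal component of (16-71)

# Exact two-tap predictor for a sinusoid: X_{n+1} = 2*cos(w0*T)*X_n - X_{n-1}
xhat = 2 * np.cos(w0 * T) * x[1:-1] - x[:-2]
print(np.max(np.abs(x[2:] - xhat)))        # ~ 0: completely predictable from its past
```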