The projection theorem in Hilbert spaces and some of its applications

Wolfgang H. Schmidt

Fachhochschule für Technik und Wirtschaft Berlin

Summary. The projection theorem in Hilbert spaces with real valued scalar products is shown to imply several well known results. Special attention is paid to statistical problems.

AMS subject classification:

Key words: Hilbert spaces, projection, Fourier approximation, Least-Squares, best linear unbiased estimator, best unbiased estimator, Bayesian estimator, Cramér-Rao inequality

1 Introduction

The purpose of this paper is to show that several well known results in different fields can be deduced from the projection theorem in Hilbert spaces. This is demonstrated e.g. for the computation of Fourier approximations of periodic functions, for Least-Squares approximations in $\mathbb{R}^n$ and in $\mathbb{R}^{n \times m}$, and for the determination of best unbiased estimators. The Gauss-Markov theorem and the Lehmann-Scheffé theorem as well as a characterisation of the Maximum-Likelihood estimator by the attainment of the Rao-Cramér inequality are shown to be consequences of the projection theorem. These examples are intended to highlight the projection theorem. Of course, the interested reader may have in mind further applications.

2 The projection theorem

Definition. Let $F$ be a linear space endowed with a real valued scalar product $\langle f, g \rangle$ for $f, g \in F$ and the norm $\|f\| = \sqrt{\langle f, f \rangle}$, and let $G$ be a subset of $F$. A $g^* \in G$ is called a projection of $f \in F$ onto $G$ iff $\|f - g^*\| \le \|f - g\|$ for all $g \in G$.

Theorem. Let $F$ be a Hilbert space with the real valued scalar product $\langle \cdot, \cdot \rangle$, the norm $\|\cdot\|$, and $G \subset F$. Then it holds:

a) The condition
$$\langle f - g^*,\ g^* - g \rangle = 0 \quad \text{for all } g \in G \qquad (1)$$
is sufficient for $g^*$ to be the unique projection of $f$ onto $G$.

b) If, moreover, $G$ is a linear subspace of $F$, the condition (1) is necessary for $g^*$ to be a projection of $f \in F$ onto $G$.

Proof. Let the condition (1) be fulfilled. It then holds
$$\|f - g\|^2 = \|(f - g^*) + (g^* - g)\|^2 = \|f - g^*\|^2 + \|g^* - g\|^2 \ge \|f - g^*\|^2$$


for all $g \in G$, thus $g^*$ being a projection of $f$ onto $G$. We show that $g^*$ is the unique projection. To that aim we assume that $g^{**}$ is another projection. Then we infer with (1)
$$\begin{aligned}
\|g^{**} - g^*\|^2 &= \|(g^{**} - f) + (f - g^*)\|^2 \\
&= \|g^{**} - f\|^2 + \|f - g^*\|^2 + 2\,\langle g^{**} - f,\ f - g^* \rangle \\
&= 2\,\|f - g^*\|^2 + 2\,\langle g^{**} - f,\ f - g^* \rangle \\
&= 2\,\|f - g^*\|^2 + 2\,\langle (g^{**} - g^*) + (g^* - f),\ f - g^* \rangle \\
&= 2\,\langle g^{**} - g^*,\ f - g^* \rangle = 0,
\end{aligned}$$
where the third equality uses $\|g^{**} - f\| = \|f - g^*\|$ (both are projections) and the last one follows from (1) with $g = g^{**}$,

which implies $g^{**} = g^*$.

To prove the statement b) we assume $g^*$ to be a projection of $f$ onto the linear subspace $G$ and let (1) be violated. Then there must be a $g \in G$ with
$$\langle f - g^*,\ g^* - g \rangle \ne 0,$$
which implies $\|g^* - g\| \ne 0$. Now
$$\tilde g = g^* + \lambda (g - g^*) \quad \text{with} \quad \lambda = \frac{\langle f - g^*,\ g - g^* \rangle}{\|g - g^*\|^2}$$
belongs to $G$ since $G$ is a linear subspace. Finally,

$$\begin{aligned}
\|f - \tilde g\|^2 &= \|f - g^*\|^2 + \lambda^2 \|g - g^*\|^2 - 2\lambda\, \langle f - g^*,\ g - g^* \rangle \\
&= \|f - g^*\|^2 - \lambda^2 \|g - g^*\|^2 \\
&< \|f - g^*\|^2
\end{aligned}$$

demonstrates that $g^*$ cannot be a projection of $f$ onto $G$ if (1) is violated.

Remark 1. The assumption that $G$ is a linear subspace can be weakened: it suffices that $G$ contains, with any two elements $h$ and $k$, also $h + \lambda(k - h)$ for all $\lambda \in \mathbb{R}$. Notice that this weaker assumption is fulfilled if $G$ is the set of all unbiased estimators for an unknown parameter.

Remark 2. The condition (1) is implied by the somewhat stronger condition
$$\langle f - g^*,\ g \rangle = 0 \quad \text{for all } g \in G. \qquad (2)$$

Remark 3. The condition (1) ensures $g^*$ to be a projection of $f$ onto $G$ also if $\langle \cdot, \cdot \rangle$ is a semi-scalar product only. Under this weaker assumption there may exist other projections besides $g^*$.
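Before turning to the applications, here is a minimal numerical sketch of the theorem (Python with NumPy; the matrix $A$ and the vector $f$ are arbitrary test data, and the projection is computed via the normal equations that will be derived in Section 3.2):

```python
import numpy as np

rng = np.random.default_rng(0)

# F = R^5 with the usual scalar product; G = range of a 5x2 matrix A.
A = rng.normal(size=(5, 2))
f = rng.normal(size=5)

# Projection of f onto G via the normal equations (Section 3.2).
beta = np.linalg.solve(A.T @ A, A.T @ f)
g_star = A @ beta

# Condition (2): <f - g*, g> = 0 for every g = A c in G.
for _ in range(5):
    g = A @ rng.normal(size=2)
    assert abs((f - g_star) @ g) < 1e-9

# Minimality: no randomly drawn element of G comes closer to f than g*.
for _ in range(1000):
    g = A @ rng.normal(size=2)
    assert np.linalg.norm(f - g_star) <= np.linalg.norm(f - g) + 1e-12
```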

3 Applications of the projection theorem

3.1 Approximation of square integrable functions

First we discuss the problem of approximating a square integrable function $f \in L^2(a, b)$ within a linear subspace of $L^2(a, b)$. Let
$$F = L^2(a, b) = \left\{ f : (a, b) \to \mathbb{R},\ \int_a^b f^2(x)\, dx < \infty \right\}$$
be the linear space of all square integrable functions defined over a finite interval $(a, b) \subset \mathbb{R}$.


For given linearly independent basic functions $g_1, \ldots, g_m \in F$,
$$G = \left\{ g = \sum_{i=1}^m \beta_i g_i ;\ \beta_1, \ldots, \beta_m \in \mathbb{R} \right\}$$
is an $m$-dimensional linear subspace of $F$. Obviously, $F$ is endowed with the semi-scalar product
$$\langle f, g \rangle = \int_a^b f(x)\, g(x)\, dx$$
and the semi-norm
$$\|f\| = \sqrt{\int_a^b f^2(x)\, dx}.$$
Thus minimising $\|f - g\|$ over $G$ means minimising $\int_a^b \left( f(x) - g(x) \right)^2 dx$ over $G$.

Let $g^* = \sum_{i=1}^m \beta_i^* g_i$ be a best quadratic approximation for $f$. The problem is the determination of the coefficients $\beta_1^*, \ldots, \beta_m^*$. Now the condition (2),
$$\langle f - g^*,\ g \rangle = 0 \quad \text{for all } g \in G,$$
is equivalent to
$$\langle f, g_j \rangle = \sum_{i=1}^m \beta_i^* \langle g_i, g_j \rangle, \quad j = 1, \ldots, m, \qquad (3)$$
which is the well known normal equations system used to compute the coefficients $\beta_1^*, \ldots, \beta_m^*$. From (3) we easily get the unique Fourier coefficients of the Fourier approximation of a periodic function with period $2\pi$. For that purpose we choose $a = -\pi$, $b = \pi$,
$$g_1(x) \equiv 1, \quad g_2(x) = \cos x,\ g_3(x) = \cos 2x,\ \ldots,\ g_{n+1}(x) = \cos nx,$$
$$g_{n+2}(x) = \sin x,\ g_{n+3}(x) = \sin 2x,\ \ldots,\ g_{2n+1}(x) = \sin nx,$$
thus $m = 2n + 1$ for an integer $n$. Then in view of
$$\langle g_1, g_1 \rangle = 2\pi, \qquad \langle g_i, g_i \rangle = \pi, \quad i = 2, \ldots, m,$$
and
$$\langle g_i, g_j \rangle = 0, \quad i \ne j,\ i = 1, \ldots, m,\ j = 1, \ldots, m,$$

the normal equations system (3) leads to the Fourier approximation
$$g^*(x) = \frac{a_0^*}{2} + \sum_{k=1}^n a_k^* \cos kx + \sum_{k=1}^n b_k^* \sin kx$$
with the Fourier coefficients
$$a_k^* = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx\, dx, \quad k = 0, \ldots, n,$$
and
$$b_k^* = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx\, dx, \quad k = 1, \ldots, n.$$
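As a numerical illustration, the following sketch (the test function $f(x) = x$ on $(-\pi, \pi)$ and the quadrature grid are arbitrary choices) evaluates these integrals and recovers the known sawtooth coefficients $a_k^* = 0$ and $b_k^* = 2(-1)^{k+1}/k$:

```python
import numpy as np

def integral(y, x):
    # simple trapezoidal quadrature
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

x = np.linspace(-np.pi, np.pi, 20001)
f = x                      # f(x) = x on (-pi, pi)
n = 5

a = [integral(f * np.cos(k * x), x) / np.pi for k in range(n + 1)]
b = [integral(f * np.sin(k * x), x) / np.pi for k in range(1, n + 1)]

print(np.round(a, 6))      # all a_k* vanish since f is odd
print(np.round(b, 6))      # ~ [2, -1, 0.666667, -0.5, 0.4]
```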


Especially, for the best polynomial approximation of $f$ by $g^*(x) = \sum_{j=0}^n \beta_j^* x^j$ the condition (3) reads as
$$\int_a^b x^i f(x)\, dx = \sum_{j=0}^n \beta_j^* \int_a^b x^{i+j}\, dx, \quad i = 0, \ldots, n.$$
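For instance (a small sketch; the choices $f(x) = e^x$, $[a, b] = [0, 1]$ and $n = 2$ are assumptions made for illustration), the Gram matrix has the entries $\int_0^1 x^{i+j}\, dx = \frac{1}{i+j+1}$ of a Hilbert matrix, and the right-hand sides can be written down exactly:

```python
import numpy as np

n = 2   # quadratic approximation of f(x) = exp(x) on [0, 1]

# Gram matrix <x^i, x^j> = 1/(i + j + 1), a Hilbert matrix.
A = np.array([[1.0 / (i + j + 1) for j in range(n + 1)] for i in range(n + 1)])

# Right-hand sides <x^i, f>: exact values of the integrals of x^i e^x over [0, 1].
rhs = np.array([np.e - 1.0, 1.0, np.e - 2.0])

beta = np.linalg.solve(A, rhs)
print(beta)   # coefficients beta_0*, beta_1*, beta_2* of the best quadratic
```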

3.2 Least-Squares approximations

Let us look for the nearest, with respect to the quadratic distance, solution of an inhomogeneous linear equation system $G\beta = f$ which may have no exact solutions. Here $f \in \mathbb{R}^n$ is an $n$-vector with real components and $G$ is an $n \times m$ matrix with real entries. The problem is to find a $\beta^* \in \mathbb{R}^m$ such that
$$\|f - G\beta^*\| = \min_{\beta \in \mathbb{R}^m} \|f - G\beta\|,$$
where
$$\|f\|^2 = \sum_{i=1}^n f_i^2$$
denotes the Euclidean norm of $f$. $G\beta^*$ then is the Least-Squares approximation of $f$ in the range of $G$. To apply the projection theorem we choose $F = \mathbb{R}^n$, $G = \{ G\beta : \beta \in \mathbb{R}^m \} \subset \mathbb{R}^n$ and
$$\langle f, g \rangle = \sum_{i=1}^n f_i g_i,$$
the usual scalar product in $\mathbb{R}^n$. Here the condition (2), $\langle f - g^*, g \rangle = 0$ for all $g \in G$, is equivalent to the normal equations system for $\beta^*$,
$$G^T G \beta^* = G^T f, \qquad (4)$$
where $G^T$ denotes the transpose of $G$. All solutions of (4) are obtained in the form
$$\beta^* = \left( G^T G \right)^- G^T f,$$
where $A^-$ denotes any generalised inverse of the matrix $A$. By the projection theorem
$$f^* = G \beta^* = G \left( G^T G \right)^- G^T f$$
is the unique projection of $f$ onto $G$.
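A short numerical sketch of (4) (random test data; $G$ is taken of full column rank, so $(G^T G)^-$ may simply be the ordinary inverse):

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.normal(size=(6, 3))    # full column rank almost surely
f = rng.normal(size=6)

# Normal equations (4): G^T G beta* = G^T f.
beta = np.linalg.solve(G.T @ G, G.T @ f)

# The Moore-Penrose inverse yields one particular solution of (4).
assert np.allclose(beta, np.linalg.pinv(G) @ f)

# G (G^T G)^(-1) G^T is symmetric and idempotent, i.e. a projection matrix,
# and it maps f to the projection f* = G beta*.
P = G @ np.linalg.inv(G.T @ G) @ G.T
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
assert np.allclose(P @ f, G @ beta)
```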

Remark 4. The matrix $G (G^T G)^- G^T$ is known to be the unique projection matrix onto the linear space generated by the columns of $G$, i.e. the range of $G$. It also holds the equality $f^* = G\beta^* = G G^+ f$.

The empirical Fourier analysis fits into this frame too. Assume that a $2\pi$-periodic function $h(t)$ can be observed at equidistant points
$$t_i = \frac{2\pi i}{N}, \quad i = 0, 1, \ldots, N - 1, \quad N = 2n,$$
$n$ being an integer. For instance it may happen in technical applications that the analytic expression for $h$ is not available but $h$ may be observed at the points $t_i$ only.


With $f = \left( h(t_0), \ldots, h(t_{N-1}) \right)^T$,
$$f(t, \beta) = \frac{a_0}{2} + \sum_{k=1}^{n} a_k \cos kt + \sum_{k=1}^{n-1} b_k \sin kt$$
and $\beta = \left( a_0, a_1, \ldots, a_n, b_1, \ldots, b_{n-1} \right)^T$, the problem is to find a vector $\beta^*$ with
$$\left\| f - f(t, \beta^*) \right\|^2 = \min_{\beta \in \mathbb{R}^N} \left\| f - f(t, \beta) \right\|^2,$$
where $f(t, \beta) = \left( f(t_0, \beta), \ldots, f(t_{N-1}, \beta) \right)^T$. With
$$G = \begin{pmatrix}
\tfrac{1}{2} & \cos t_0 & \ldots & \cos n t_0 & \sin t_0 & \ldots & \sin (n-1) t_0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
\tfrac{1}{2} & \cos t_{N-1} & \ldots & \cos n t_{N-1} & \sin t_{N-1} & \ldots & \sin (n-1) t_{N-1}
\end{pmatrix}$$
it holds
$$G^T G = n I_N, \qquad (5)$$
where $I_N$ is the $N \times N$ identity matrix. The condition (5) is implied by the property that the functions $1, \cos t, \ldots, \cos nt, \sin t, \ldots, \sin (n-1) t$ form an orthogonal system at the points $t_0, \ldots, t_{N-1}$.

Utilising (5) we get that

$$f(t, \beta^*) = \frac{a_0^*}{2} + \sum_{k=1}^{n} a_k^* \cos kt + \sum_{k=1}^{n-1} b_k^* \sin kt$$
with
$$a_k^* = \frac{1}{n} \sum_{i=0}^{N-1} h(t_i) \cos k t_i, \quad k = 0, \ldots, n, \qquad (6)$$
and
$$b_k^* = \frac{1}{n} \sum_{i=0}^{N-1} h(t_i) \sin k t_i, \quad k = 1, \ldots, n - 1, \qquad (7)$$
is the best approximation for $f$. The equations (6) and (7) define the well known empirical Fourier coefficients.
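A sketch of (6) and (7) in Python, assuming the sampling convention $t_i = 2\pi i / N$, $N = 2n$, as reconstructed above; the test signal is an arbitrary trigonometric polynomial, so the coefficients can be read off exactly:

```python
import numpy as np

def empirical_fourier(h_values, n):
    # empirical Fourier coefficients (6), (7) from N = 2n equidistant samples
    N = 2 * n
    t = 2 * np.pi * np.arange(N) / N
    a = np.array([h_values @ np.cos(k * t) / n for k in range(n + 1)])
    b = np.array([h_values @ np.sin(k * t) / n for k in range(1, n)])
    return a, b

n = 8
t = 2 * np.pi * np.arange(2 * n) / (2 * n)
h = 1.0 + 3.0 * np.cos(2 * t) - 2.0 * np.sin(3 * t)

a, b = empirical_fourier(h, n)
print(np.round(a, 10))   # a_0* ~ 2 (the model carries a_0/2), a_2* ~ 3, rest ~ 0
print(np.round(b, 10))   # b_3* ~ -2, rest ~ 0
```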

Next we discuss the problem of approximating a matrix within a linear subspace of matrices. Let $F = \mathbb{R}^{n \times p}$ be the set of all $n \times p$ matrices with real elements. The problem is to approximate a given $F \in \mathbb{R}^{n \times p}$, using the Least-Squares approach, by a matrix $G B^*$ in the linear subspace
$$G = \left\{ G B : B \in \mathbb{R}^{k \times p} \right\}.$$
Here $G$ is a fixed $n \times k$ matrix. As a scalar product in $\mathbb{R}^{n \times p}$ we choose
$$\langle F, H \rangle = \operatorname{tr} F^T H.$$
Now we are looking for a matrix $B^* \in \mathbb{R}^{k \times p}$ with
$$\left\| F - G B^* \right\|^2 = \min_{B \in \mathbb{R}^{k \times p}} \left\| F - G B \right\|^2,$$
a task usually occurring in the theory of multivariate linear models.


Again the condition (2),
$$\langle F - G B^*,\ G B \rangle = \operatorname{tr}\left[ \left( F - G B^* \right)^T G B \right] = 0 \quad \text{for all } B \in \mathbb{R}^{k \times p},$$
is equivalent to the normal equations system
$$G^T G B^* = G^T F. \qquad (8)$$
The condition (4) is a special case of the condition (8) for $p = 1$.

Remark 5. Usually, the result (8) is derived utilising the theory of Kronecker matrices.

Remark 6. Every other scalar product in $\mathbb{R}^{n \times p}$ with a norm equivalent to $\|F\|^2 = \operatorname{tr} F^T F$ leads to the system (8) too. As an example one might choose
$$\langle F, H \rangle = \operatorname{tr} F^T H + \lambda_{\max}\left( F^T H + H^T F \right),$$
where $\lambda_{\max}$ denotes the largest eigenvalue of a given symmetric matrix.
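A brief numerical sketch of (8) (random test matrices), which also confirms that (4) is the column-by-column special case:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, p = 7, 3, 4
G = rng.normal(size=(n, k))
F = rng.normal(size=(n, p))

# Normal equations (8): G^T G B* = G^T F, all p columns at once.
B = np.linalg.solve(G.T @ G, G.T @ F)

# Column j of B* solves the vector problem (4) for column j of F.
for j in range(p):
    assert np.allclose(B[:, j], np.linalg.solve(G.T @ G, G.T @ F[:, j]))
```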

3.3 Approximation of a random variable by a constant

Let $[\Omega, \mathcal{A}, P]$ be a probability space and let $X$ be a random variable mapping
$$[\Omega, \mathcal{A}, P] \to [\mathsf{X}, \mathcal{A}_\mathsf{X}, P_X]$$
with $\mathsf{X} \subset \mathbb{R}$. By $F$ we denote the set of all such random variables with finite second moments and positive variance, i.e.
$$E X^2 = \int_\mathsf{X} x^2\, dP_X < \infty \quad \text{and} \quad \sigma^2 = E\left( X - E X \right)^2 > 0, \quad \text{with } E X = \int_\mathsf{X} x\, dP_X.$$
As a semi-scalar product serves
$$\langle X, Y \rangle = E X Y = \int_\Omega X(\omega)\, Y(\omega)\, dP.$$
The problem is to find a constant $a^* \in \mathbb{R}$ with
$$E \left( X - a^* \right)^2 = \min_{a \in \mathbb{R}} E \left( X - a \right)^2.$$
It is well known that $a^* = E X$ is the solution. Again this can be deduced from the projection theorem regarding
$$\langle X - a^*,\ a^* - a \rangle = \left( E X - a^* \right) \left( a^* - a \right) = 0 \quad \text{for all } a \in \mathbb{R}$$

if $a^* = E X$. Using this result one also easily proves that the posterior mean is a Bayes estimator. For that we assume that the random sample $X$ with values in $\mathsf{X}$ possesses a probability distribution within a family $\mathcal{P} = \{ P_\vartheta : \vartheta \in \Theta \}$, $\vartheta$ being a real parameter. Let $\mathcal{P}$ be dominated by a $\sigma$-finite measure $\mu$ and let $f(x/\vartheta) = \frac{dP_\vartheta}{d\mu}(x)$ be a version of the Radon-Nikodym densities. $f(x/\vartheta)$ is assumed to be the conditional density of $X$ given the parameter value $\vartheta$, and $\vartheta$ is considered at random with a given prior density $\pi(\vartheta)$. The Bayesian risk of an estimator $\hat\vartheta = \hat\vartheta(X)$ for $\vartheta$ is


$$r(\hat\vartheta, \pi) = \int_\Theta \left[ \int_\mathsf{X} \left( \hat\vartheta(x) - \vartheta \right)^2 f(x/\vartheta)\, d\mu \right] \pi(\vartheta)\, d\vartheta$$
and $\hat\vartheta^* = \hat\vartheta^*(X)$ is a Bayes estimator if
$$r(\hat\vartheta^*, \pi) = \min_{\hat\vartheta} r(\hat\vartheta, \pi).$$
Because of
$$r(\hat\vartheta, \pi) = \int_\mathsf{X} g(x) \left[ \int_\Theta \left( \hat\vartheta(x) - \vartheta \right)^2 h(\vartheta/x)\, d\vartheta \right] d\mu$$
with
$$g(x) = \int_\Theta f(x/\vartheta)\, \pi(\vartheta)\, d\vartheta \quad \text{and} \quad h(\vartheta/x) = \frac{f(x/\vartheta)\, \pi(\vartheta)}{\int_\Theta f(x/\vartheta)\, \pi(\vartheta)\, d\vartheta},$$
$\hat\vartheta^*$ is a Bayes estimator iff $\hat\vartheta^*$ minimises
$$\int_\Theta \left( \hat\vartheta(x) - \vartheta \right)^2 h(\vartheta/x)\, d\vartheta.$$
Thus
$$\hat\vartheta^*(X) = \int_\Theta \vartheta\, h(\vartheta/X)\, d\vartheta$$
is a Bayes estimator.
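A small grid-based sketch of this result (the Bernoulli sample and the uniform prior $\pi(\vartheta) \equiv 1$ on $\Theta = (0, 1)$ are illustrative assumptions; with six successes in eight trials the posterior is Beta(7, 3) with mean 0.7):

```python
import numpy as np

def integral(y, t):
    # simple trapezoidal quadrature
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t))

theta = np.linspace(1e-4, 1 - 1e-4, 4001)
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # observed Bernoulli sample

likelihood = theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())
posterior = likelihood / integral(likelihood, theta)    # h(theta / x)

post_mean = integral(theta * posterior, theta)

# The posterior mean minimises the posterior expected squared loss.
grid = np.linspace(0, 1, 401)
risk = [integral((t - theta) ** 2 * posterior, theta) for t in grid]
print(post_mean, grid[np.argmin(risk)])   # both ~ 0.7
```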

3.4 Best linear unbiased estimator, the Gauss-Markov theorem

Let $F$ be the set of random variables as described in section 3.3. Further, let $X_1, \ldots, X_n$ be a sample of size $n$ to the distribution $P_X$, i.e. $X_1, \ldots, X_n$ are independent and identically distributed random variables with joint distribution $P_{X_1, \ldots, X_n}$. By $F$ we denote the set of all random variables being measurable functions of the random vector $X = (X_1, \ldots, X_n)$ with finite moments of second order. Again in $F$ the semi-scalar product $\langle X, Y \rangle = E X Y$ and the semi-norm $\|X\| = \sqrt{E X^2}$ are defined. We are looking for a best linear unbiased estimator for
$$\mu = \int_\mathsf{X} x\, dP_X,$$
that is, a $g^* \in G$ is to be determined with
$$E \left( g^* - \mu \right)^2 = \min_{g \in G} E \left( g - \mu \right)^2.$$
Here
$$G = \left\{ g(X) = \sum_{i=1}^n c_i X_i :\ \sum_{i=1}^n c_i = 1 \right\}$$
is the set of all linear unbiased estimators for $\mu$. It is well known that the arithmetic mean
$$g^* = \bar X = \frac{1}{n} \sum_{i=1}^n X_i$$
is the solution.


Again this follows by the projection theorem. With $g^* = \sum_{i=1}^n c_i^* X_i$ and $g = \sum_{i=1}^n c_i X_i$ we have
$$\langle \mu - g^*,\ g^* - g \rangle = 0 \quad \text{for all } g \in G \text{ and all } \mu \in \mathbb{R}$$
iff
$$\sigma^2 \sum_{i=1}^n c_i^* \left( c_i^* - c_i \right) = 0.$$
Thus condition (1) is fulfilled for $c_i^* = \frac{1}{n}$, $i = 1, \ldots, n$.
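A two-line numerical check of this condition (the alternative weights $c$ with $\sum c_i = 1$ are an arbitrary test choice):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
c_star = np.full(n, 1.0 / n)

c = rng.normal(size=n)
c = c / c.sum()               # any other unbiased weight vector: sum c_i = 1

# The verified condition: sum_i c_i* (c_i* - c_i) = 0 for c_i* = 1/n.
print(np.dot(c_star, c_star - c))             # ~ 0 up to rounding

# Var(sum c_i X_i) = sigma^2 sum c_i^2 is indeed smallest for c*.
print(np.sum(c_star ** 2), np.sum(c ** 2))    # 1/n <= sum c_i^2
```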

Next we shall discuss the linear model
$$f = G\beta + \varepsilon.$$
Here $f$ is the $n$-vector of random observations, $G \in \mathbb{R}^{n \times m}$ is the known design matrix, $\beta \in \mathbb{R}^m$ is the unknown vector of regression coefficients and $\varepsilon$ is an unobservable random error vector with the properties
$$E\varepsilon = 0 \in \mathbb{R}^n \quad \text{and} \quad E\, \varepsilon \varepsilon^T = \sigma^2 \Lambda$$
with unknown variance $0 < \sigma^2 < \infty$; $\Lambda \in \mathbb{R}^{n \times n}$ is a known positive definite matrix. In particular, $\Lambda$ could be the identity $I_n$. The Gauss-Markov Theorem claims that the estimator
$$\hat\gamma^* = C \left( G^T \Lambda^{-1} G \right)^+ G^T \Lambda^{-1} f$$
for $\gamma = C\beta$ has smallest covariance matrix within the class of all linear unbiased estimators for $\gamma = C\beta$, whenever $\gamma = C\beta$ with $C \in \mathbb{R}^{l \times m}$ is an estimable parameter. As usual, $A^+$ denotes the Moore-Penrose inverse of a given matrix $A$.

In the set $F$ of all random $l$-vectors with existing covariance matrices we introduce the semi-scalar product $\langle f, h \rangle = \operatorname{tr} E\, f h^T$. Let $G$ be the set of all linear unbiased estimators for $\gamma = C\beta$,
$$G = \left\{ L f : L G = C,\ f \in F \right\}.$$
The problem is to find an $L^* \in \mathbb{R}^{l \times n}$ such that $\hat\gamma^* = L^* f$ minimises
$$\|L f - \gamma\|^2 = E \left( L f - \gamma \right)^T \left( L f - \gamma \right) = \operatorname{tr} E \left( L f - \gamma \right) \left( L f - \gamma \right)^T$$
over all $L \in \mathbb{R}^{l \times n}$ with $L G = C$. Obviously, $E \left( L f - \gamma \right) \left( L f - \gamma \right)^T$ is the covariance matrix of the linear unbiased estimator $\hat\gamma = L f$ provided $L G = C$. With these notations the condition (1) reads as follows:
$$\langle \gamma - \hat\gamma^*,\ \hat\gamma^* - \hat\gamma \rangle = 0 \quad \text{for all } \hat\gamma \in G \text{ and all } f \in F.$$
Because of
$$\hat\gamma^* - \gamma = L^* (G\beta + \varepsilon) - C\beta = L^* \varepsilon$$
and
$$\hat\gamma^* - \hat\gamma = (L^* - L) f = (L^* - L)\varepsilon \quad \text{(since } L G = L^* G = C\text{)}$$


we have
$$\langle \gamma - \hat\gamma^*,\ \hat\gamma^* - \hat\gamma \rangle = -\langle L^* \varepsilon,\ (L^* - L)\varepsilon \rangle = -\operatorname{tr}\left[ E\, L^* \varepsilon \varepsilon^T (L^* - L)^T \right] = -\sigma^2 \operatorname{tr}\left[ L^* \Lambda (L^* - L)^T \right] = 0$$
for $L^* = C \left( G^T \Lambda^{-1} G \right)^+ G^T \Lambda^{-1}$, since
$$L^* \Lambda (L^* - L)^T = C \left( G^T \Lambda^{-1} G \right)^+ G^T \Lambda^{-1} \Lambda \Lambda^{-1} G \left( G^T \Lambda^{-1} G \right)^+ C^T - C \left( G^T \Lambda^{-1} G \right)^+ G^T \Lambda^{-1} \Lambda L^T = 0,$$
using $L G = L^* G = C$ and the Moore-Penrose property $A^+ A A^+ = A^+$.

Thus the generalised Aitken-estimator
$$\hat\gamma^* = C \left( G^T \Lambda^{-1} G \right)^+ G^T \Lambda^{-1} f$$
minimises the trace of the covariance matrix within the class of all linear unbiased estimators of the estimable parameter $\gamma = C\beta$. Of course, $\hat\gamma^*$ also has minimal covariance matrix, not only minimal trace, since $k^T \hat\gamma^*$ as an estimator for $k^T \gamma$ has smallest variance for every fixed $k \in \mathbb{R}^l$, which can be proved by the same reasoning. For details see C.R. Rao (1973).
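A simulation sketch of this statement (all model ingredients $G$, $\beta$, $\Lambda$ are arbitrary test data, $\sigma^2 = 1$, and $C = I_m$ so that $\gamma = \beta$):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 50, 3
G = rng.normal(size=(n, m))
beta = np.array([1.0, -2.0, 0.5])
Lam = np.diag(rng.uniform(0.5, 5.0, size=n))   # known positive definite Lambda
Lam_inv = np.linalg.inv(Lam)
sd = np.sqrt(np.diag(Lam))                     # Lambda is diagonal here

gls, ols = [], []
for _ in range(2000):
    f = G @ beta + sd * rng.normal(size=n)     # E eps eps^T = Lambda
    gls.append(np.linalg.solve(G.T @ Lam_inv @ G, G.T @ Lam_inv @ f))
    ols.append(np.linalg.solve(G.T @ G, G.T @ f))

# Empirical covariance traces: the generalised Aitken estimator should win.
print(np.trace(np.cov(np.array(gls).T)), np.trace(np.cov(np.array(ols).T)))
```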

3.5 The Lehmann-Scheffé Theorem

Let $X$ be a random vector having a distribution in $\mathcal{P} = \{ P_\vartheta : \vartheta \in \Theta \}$ and let $\mathcal{P}$ possess a sufficient and complete statistic $T(X)$. The Lehmann-Scheffé Theorem asserts that for every estimable one-dimensional parameter $\gamma = \gamma(\vartheta)$ there is an estimator of the form $\hat\gamma^* = h(T(X))$ which has smallest variance within the class of all unbiased estimators for $\gamma$, see e.g. H. Witting (1985).

Indeed, let $\hat\gamma = \hat\gamma(X)$ be an arbitrary but fixed unbiased estimator of $\gamma$; then
$$\hat\gamma^* = E^\bullet\left\{ \hat\gamma(X) / T(X) \right\},$$
a version of the conditional expectation independent of $\vartheta$ outside of a $\mathcal{P}$-zero set, has the desired property. Actually, for $\hat\gamma^*$ with the semi-scalar product $\langle \hat\gamma, \tilde\gamma \rangle = E_\vartheta\, \hat\gamma \tilde\gamma$ and any other unbiased $\hat\gamma$ we obtain

$$\begin{aligned}
\langle \gamma - \hat\gamma^*,\ \hat\gamma^* - \hat\gamma \rangle &= E_\vartheta \left( \gamma - \hat\gamma^* \right) \left( \hat\gamma^* - \hat\gamma \right) \\
&= E_\vartheta \left\{ \left( \gamma - \hat\gamma^* \right) E^\bullet\left[ \left( \hat\gamma^* - \hat\gamma \right) / T(X) \right] \right\} \\
&= E_\vartheta \left\{ \left( \gamma - \hat\gamma^* \right) \left( \hat\gamma^* - E^\bullet\left[ \hat\gamma / T(X) \right] \right) \right\} \\
&= E_\vartheta \left( \gamma - \hat\gamma^* \right) k(T(X)) \qquad (9)
\end{aligned}$$
with $k(T(X)) = \hat\gamma^* - E^\bullet\left\{ \hat\gamma / T(X) \right\}$.

Now


$$E_\vartheta\, k(T(X)) = E_\vartheta \hat\gamma^* - E_\vartheta \hat\gamma = \gamma(\vartheta) - \gamma(\vartheta) = 0$$
implies
$$k(T(X)) = 0 \quad \mathcal{P}\text{-almost everywhere} \qquad (10)$$
by the completeness of $T(X)$. (9) and (10) together yield
$$\langle \gamma - \hat\gamma^*,\ \hat\gamma^* - \hat\gamma \rangle = 0 \quad \text{for all } \hat\gamma \in G \text{ and all } \vartheta \in \Theta.$$

Remark 7. The extension to $l$-dimensional estimable parameters $\gamma = \gamma(\vartheta) \in \mathbb{R}^l$ is obvious. Thus every unbiased estimator for $\gamma(\vartheta)$ depending through $X$ upon a sufficient and complete statistic $T(X)$ only possesses smallest covariance matrix within the class of all unbiased estimators for $\gamma(\vartheta)$.

Example 1. In the normal linear model
$$f \sim N\!\left( G\beta,\ \sigma^2 \Lambda \right),$$
$$T(X) = \begin{pmatrix} G^T f \\ \|f\|^2 \end{pmatrix}$$
is a complete and sufficient statistic for $\vartheta = \begin{pmatrix} \beta \\ \sigma^2 \end{pmatrix}$. Therefore, the generalised Aitken-estimator
$$\hat\gamma^* = C \left( G^T \Lambda^{-1} G \right)^+ G^T \Lambda^{-1} f$$
is a best unbiased estimator for $\gamma(\vartheta) = C\beta$, which is assumed to be an estimable parameter. Similarly,
$$\hat\sigma^2 = \frac{1}{n - r(G)}\, f^T \left( \Lambda^{-1} - \Lambda^{-1} G \left( G^T \Lambda^{-1} G \right)^+ G^T \Lambda^{-1} \right) f,$$
with $r(G)$ the rank of $G$, is a best unbiased estimator for $\gamma(\vartheta) = \sigma^2$.

Example 2. Quality control. Let $X_1, \ldots, X_n$ be a sample to the binomial distribution $B(1, \vartheta)$, $0 < \vartheta < 1$. Then, with $X = (X_1, \ldots, X_n)$,
$$T = T(X) = \sum_{i=1}^n X_i$$
is a complete and sufficient statistic for $\vartheta \in (0, 1)$ and therefore
$$\hat\vartheta^* = \bar X = \frac{1}{n} \sum_{i=1}^n X_i$$
is a best unbiased estimator for $\vartheta$.
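A simulation sketch of the improvement in this example ($X_1$ alone is also unbiased for $\vartheta$; conditioning on the complete sufficient statistic $T$ gives $E\{X_1 / T\} = T/n = \bar X$; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 0.3, 10, 20000

samples = rng.binomial(1, theta, size=(reps, n))

# X_1 is unbiased for theta; conditioning on T = sum X_i yields the mean.
crude = samples[:, 0]
rao_blackwell = samples.mean(axis=1)

print(crude.mean(), rao_blackwell.mean())   # both ~ theta = 0.3
print(crude.var(), rao_blackwell.var())     # ~ 0.21 versus ~ 0.021
```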

3.6 Attainment of the Rao-Cramér inequality

Attainment of the Rao-Cramér inequality can take place for Maximum-Likelihood estimators only. To make this statement precise let us first formulate some regularity conditions needed. Let $X$ be a random vector with values in $\mathsf{X} \subset \mathbb{R}^n$ having a distribution in a family $\mathcal{P} = \{ P_\vartheta : \vartheta \in \Theta \}$ which is dominated by a $\sigma$-finite measure $\mu$. For a version of the Radon-Nikodym densities
$$\frac{dP_\vartheta}{d\mu}(x) = f(x, \vartheta)$$
it holds $f(x, \vartheta) > 0$ for all $x \in \mathsf{X}$ and for all $\vartheta \in \Theta$. Let $f(x, \vartheta)$ be differentiable with respect to $\vartheta$ for all $x \in \mathsf{X}$ and let $\Theta$ be an open interval of the real line. Further,
$$\int_\mathsf{X} \frac{d}{d\vartheta} f(x, \vartheta)\, d\mu = 0,$$
$$\int_\mathsf{X} \hat\vartheta(x)\, \frac{d}{d\vartheta} f(x, \vartheta)\, d\mu = 1 \quad \text{for every unbiased } \hat\vartheta(X),$$
and
$$I(\vartheta) = \int_\mathsf{X} \left( \frac{d}{d\vartheta} \ln f(x, \vartheta) \right)^2 f(x, \vartheta)\, d\mu < \infty, \quad I(\vartheta) > 0, \quad \text{for all } \vartheta \in \Theta$$
should be fulfilled. Under these assumptions the Rao-Cramér inequality
$$\operatorname{Var}_\vartheta \hat\vartheta(X) \ge I^{-1}(\vartheta) \quad \text{for all } \vartheta \in \Theta$$
holds true, see e.g. P. Hoel, S. Port, C. Stone (1971). Further let us assume that a unique Maximum-Likelihood estimator $\vartheta_{ML}$ for $\vartheta$ exists. We shall show that the equality
$$\operatorname{Var}_\vartheta \hat\vartheta^*(X) = I^{-1}(\vartheta) \quad \text{for } \vartheta \in \Theta$$
for an unbiased estimator $\hat\vartheta^*(X)$ implies the existence of a $\mu$-zero set $N$ such that $\hat\vartheta^*(x) = \vartheta_{ML}(x)$ outside of $N$.

To this aim let $F$ be the set of all random variables being functions of $X$ with finite second moments and let $G$ be the subset of all unbiased estimators for $\vartheta$ in $F$. With the semi-scalar product
$$\langle \hat\vartheta, \tilde\vartheta \rangle = \int_\mathsf{X} \hat\vartheta(x)\, \tilde\vartheta(x)\, f(x, \vartheta)\, d\mu$$
it holds
$$\operatorname{Var}_\vartheta \hat\vartheta = \|\hat\vartheta - \vartheta\|^2.$$

< − − >=∗ ∗ϑ ϑ ϑ ϑ$ , $ $ 0 for all and all $ϑ ∈G ϑ ∈Θ . (11) Because of

ϑ ϑϑ

ϑ µ+ =−z I t dd

f x d1 1b gd i b gX

,

with

t dd

f x=ϑ

ϑln ,b g (11) holds especially for . $ϑ ϑ ϑ= + −I t1b gTherefore, we get

$$\langle \vartheta - \hat\vartheta^*,\ \hat\vartheta^* - \hat\vartheta \rangle = \langle \vartheta - \hat\vartheta^*,\ \hat\vartheta^* - \vartheta - I^{-1}(\vartheta)\, t \rangle = 0,$$
or
$$-I^{-1}(\vartheta) = I^{-1}(\vartheta)\, \langle \vartheta - \hat\vartheta^*,\ t \rangle,$$
or
$$1 = \langle \hat\vartheta^* - \vartheta,\ t \rangle,$$
and because of
$$\|t\| \cdot \|\hat\vartheta^* - \vartheta\| = \sqrt{I(\vartheta)} \cdot \sqrt{I^{-1}(\vartheta)} = 1,$$
the equality
$$\langle \hat\vartheta^* - \vartheta,\ t \rangle = \|t\| \cdot \|\hat\vartheta^* - \vartheta\|$$
holds true. In view of the Cauchy-Schwarz inequality there is a $\mu$-zero set $N$ such that
$$t = \frac{d}{d\vartheta} \ln f(x, \vartheta) = a(\vartheta) \left( \hat\vartheta^*(x) - \vartheta \right) \quad \text{for all } x \in \mathsf{X} \setminus N$$
for some constant $a(\vartheta)$. Since
$$I(\vartheta) = \|t\|^2 = a^2(\vartheta) \int_\mathsf{X} \left( \hat\vartheta^*(x) - \vartheta \right)^2 f(x, \vartheta)\, d\mu$$
is positive, $a(\vartheta)$ cannot be zero for $\vartheta \in \Theta$. By assumption $\vartheta_{ML}$ is a Maximum-Likelihood estimator for $\vartheta$ and therefore the likelihood equation
$$a(\vartheta_{ML}) \left( \hat\vartheta^*(x) - \vartheta_{ML} \right) = 0 \quad \text{for } x \in \mathsf{X} \setminus N$$
holds true, which implies $\hat\vartheta^*(x) = \vartheta_{ML}(x)$ outside of $N$.
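A closing numerical sketch of the attainment (a Bernoulli sample; the values $\vartheta = 0.4$ and $n = 25$ are arbitrary): the Maximum-Likelihood estimator $\bar X$ is unbiased with variance $\vartheta(1 - \vartheta)/n$, which is exactly $I^{-1}(\vartheta)$ for the whole sample:

```python
import numpy as np

rng = np.random.default_rng(6)
theta, n, reps = 0.4, 25, 200_000

# The MLE of a Bernoulli parameter is the sample mean.
x_bar = rng.binomial(1, theta, size=(reps, n)).mean(axis=1)

fisher = n / (theta * (1 - theta))   # Fisher information of the whole sample

print(x_bar.var())    # ~ 0.0096
print(1 / fisher)     # = 0.0096 = theta (1 - theta) / n
```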

References

[1] P.G. Hoel, S.C. Port, C.J. Stone (1971), Introduction to Statistical Theory, Houghton Mifflin Company, Boston.

[2] C.R. Rao (1973), Linear Statistical Inference and Its Applications, Wiley, New York.

[3] H. Witting (1985), Mathematische Statistik, Teubner Verlag.