MLDM Chapter 3: Parameter Estimation


Transcript of MLDM Chapter 3: Parameter Estimation

MIMA Group

Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University


Contents

Introduction
Maximum-Likelihood Estimation
Bayesian Estimation


Bayes' Theorem

To compute the posterior probability $P(\omega_i|\mathbf{x})$, we need to know the class-conditional density $p(\mathbf{x}|\omega_i)$ and the prior $P(\omega_i)$:

$$P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)\,P(\omega_i)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x}|\omega_j)\,P(\omega_j)$$

How can we get these values?
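As a minimal sketch (not from the slides), here is how Bayes' rule combines given class-conditional densities and priors into posteriors; the two-class Gaussian setup and all numbers are illustrative assumptions only.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Univariate normal density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Hypothetical two-class setup: known class-conditional densities p(x|w_i) and priors P(w_i).
priors = np.array([0.6, 0.4])
params = [(0.0, 1.0), (2.0, 1.0)]          # (mu_i, sigma_i) for each class

def posterior(x):
    """Bayes rule: P(w_i|x) = p(x|w_i) P(w_i) / p(x)."""
    likelihoods = np.array([gauss_pdf(x, mu, s) for mu, s in params])
    joint = likelihoods * priors
    return joint / joint.sum()             # divide by the evidence p(x) = sum_j p(x|w_j) P(w_j)

print(posterior(0.5))                      # posterior probabilities of the two classes at x = 0.5
```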


Samples

$$\mathcal{D} = \{D_1, D_2, \dots, D_c\}$$

The samples in $D_j$ are drawn independently according to the probability law $p(\mathbf{x}|\omega_j)$; that is, the examples in $D_j$ are i.i.d. (independent and identically distributed) random variables.

It is easy to compute the prior probability:

$$P(\omega_i) = \frac{|D_i|}{\sum_{j=1}^{c} |D_j|}$$
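A short sketch (not from the slides) of estimating the priors as relative class frequencies, assuming the training labels are available as an integer array; the labels below are purely illustrative.

```python
import numpy as np

labels = np.array([0, 0, 1, 2, 1, 0, 2, 2, 2])   # hypothetical class labels for D
counts = np.bincount(labels)                     # |D_1|, |D_2|, |D_3|
priors = counts / counts.sum()                   # P(w_i) = |D_i| / sum_j |D_j|
print(priors)                                    # [0.333..., 0.222..., 0.444...]
```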


Samples: the class-conditional pdf

Case I: $p(\mathbf{x}|\omega_j)$ has a known parametric form, e.g. $p(\mathbf{x}|\omega_j) \sim N(\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)$; in this case $\boldsymbol{\theta}_j$ contains $d + d(d+1)/2$ free parameters ($d$ from the mean, $d(d+1)/2$ from the symmetric covariance matrix).

Case II: $p(\mathbf{x}|\omega_j)$ does not have a parametric form (next chapter).

In general, for $\mathbf{x} \in \mathbb{R}^d$ the unknown parameters are collected into a vector $\boldsymbol{\theta}_j = (\theta_1, \theta_2, \dots, \theta_m)^T$.


Goal

$\mathcal{D} = \{D_1, D_2, \dots, D_c\}$; each class-conditional density is written in terms of its parameter vector, $p(\mathbf{x}|\omega_j) \equiv p(\mathbf{x}|\boldsymbol{\theta}_j)$ with $\boldsymbol{\theta}_j = (\theta_1, \theta_2, \dots, \theta_m)^T$.

Goal: use $D_j$ to estimate the unknown parameter vector $\boldsymbol{\theta}_j$, yielding the estimates $\hat{\boldsymbol{\theta}}_1, \hat{\boldsymbol{\theta}}_2, \dots, \hat{\boldsymbol{\theta}}_c$.


Estimation Under Parametric Form

Maximum-Likelihood Estimation: view the parameters as quantities whose values are fixed but unknown; estimate them by maximizing the likelihood (probability) of observing the actual training examples.

Bayesian Estimation: view the parameters as random variables having some known prior distribution; observing the actual training examples transforms the parameters' prior distribution into a posterior distribution (via Bayes' rule).


Maximum-Likelihood Estimation

Because each class is considered individually, the class subscript used before will be dropped.

The problem now becomes: given a sample set $\mathcal{D}$ whose elements are drawn independently from a population with a known parametric form, say $p(\mathbf{x}|\boldsymbol{\theta})$, choose the estimate $\hat{\boldsymbol{\theta}}$ that makes $\mathcal{D}$ most likely to occur.


Maximum-Likelihood Estimation (Cont.)

Criterion of ML: let $\mathcal{D} = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$. By the independence assumption,

$$p(\mathcal{D}|\boldsymbol{\theta}) = p(\mathbf{x}_1|\boldsymbol{\theta})\,p(\mathbf{x}_2|\boldsymbol{\theta})\cdots p(\mathbf{x}_n|\boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k|\boldsymbol{\theta})$$

The likelihood function:

$$L(\boldsymbol{\theta}|\mathcal{D}) \equiv p(\mathcal{D}|\boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k|\boldsymbol{\theta})$$

The maximum-likelihood estimate:

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} L(\boldsymbol{\theta}|\mathcal{D})$$


Maximum-Likelihood Estimation (Cont.)

Often, we instead maximize the log-likelihood function

$$l(\boldsymbol{\theta}|\mathcal{D}) \equiv \ln L(\boldsymbol{\theta}|\mathcal{D}) = \sum_{k=1}^{n} \ln p(\mathbf{x}_k|\boldsymbol{\theta})$$

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} L(\boldsymbol{\theta}|\mathcal{D}) = \arg\max_{\boldsymbol{\theta}} l(\boldsymbol{\theta}|\mathcal{D})$$

Why? Because $\ln$ is strictly increasing, maximizing $l$ yields the same $\hat{\boldsymbol{\theta}}$ as maximizing $L$, and the sum is easier to differentiate than the product.


Maximum-Likelihood Estimation (Cont.)

Find the extreme values using differential calculus.

Gradient operator: let $f(\boldsymbol{\theta})$ be a continuous function of $\boldsymbol{\theta} = (\theta_1, \theta_2, \dots, \theta_n)^T$, and define

$$\nabla_{\boldsymbol{\theta}} = \left(\frac{\partial}{\partial\theta_1}, \frac{\partial}{\partial\theta_2}, \dots, \frac{\partial}{\partial\theta_n}\right)^T$$

Find the extreme values by solving

$$\nabla_{\boldsymbol{\theta}} f = \mathbf{0}$$
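A minimal symbolic sketch (not from the slides), assuming SymPy is available, of finding an extremum by solving $\nabla_{\boldsymbol{\theta}} f = \mathbf{0}$ for a toy quadratic function:

```python
import sympy as sp

t1, t2 = sp.symbols('theta1 theta2')
f = (t1 - 3)**2 + 2*(t2 + 1)**2          # toy continuous function f(theta)

grad = [sp.diff(f, t1), sp.diff(f, t2)]  # gradient operator applied to f
print(sp.solve(grad, [t1, t2]))          # {theta1: 3, theta2: -1}
```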


The Gaussian Case I

Case I: $\boldsymbol{\mu}$ unknown, $\boldsymbol{\Sigma}$ known.

$$p(\mathbf{x}|\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]$$

$$L(\boldsymbol{\mu}|\mathcal{D}) = p(\mathcal{D}|\boldsymbol{\mu}) = \prod_{k=1}^{n} p(\mathbf{x}_k|\boldsymbol{\mu}) = \frac{1}{(2\pi)^{nd/2}|\boldsymbol{\Sigma}|^{n/2}}\exp\!\left[-\frac{1}{2}\sum_{k=1}^{n}(\mathbf{x}_k-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}_k-\boldsymbol{\mu})\right]$$

$$l(\boldsymbol{\mu}|\mathcal{D}) = \ln L(\boldsymbol{\mu}|\mathcal{D}) = -\frac{nd}{2}\ln(2\pi) - \frac{n}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\sum_{k=1}^{n}(\mathbf{x}_k-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}_k-\boldsymbol{\mu})$$


The Gaussian Case I

Setting the gradient of the log-likelihood with respect to $\boldsymbol{\mu}$ to zero,

$$\nabla_{\boldsymbol{\mu}}\, l(\boldsymbol{\mu}|\mathcal{D}) = \sum_{k=1}^{n}\boldsymbol{\Sigma}^{-1}(\mathbf{x}_k-\boldsymbol{\mu}) = \mathbf{0}$$

gives

$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_k$$

Intuitive result: the maximum-likelihood estimate of the unknown $\boldsymbol{\mu}$ is just the arithmetic average of the training samples, i.e., the sample mean.
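A small numerical sketch (not from the slides) checking this result with NumPy: the closed-form sample mean makes the gradient of the log-likelihood vanish. The synthetic data and the assumed known covariance are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])                  # assumed known covariance
X = rng.multivariate_normal([1.0, -2.0], Sigma, size=500)   # training set D

mu_hat = X.mean(axis=0)                                     # ML estimate: the sample mean

# Gradient of the log-likelihood at mu_hat: Sigma^{-1} sum_k (x_k - mu); should be ~0.
grad = np.linalg.inv(Sigma) @ (X - mu_hat).sum(axis=0)
print(mu_hat, grad)                                         # grad is numerically zero
```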


The Gaussian Case II

Case II: both $\mu$ and $\sigma^2$ are unknown. Consider the univariate case.

$$p(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad \boldsymbol{\theta} = (\theta_1, \theta_2)^T = (\mu, \sigma^2)^T$$

$$L(\boldsymbol{\theta}|\mathcal{D}) = p(\mathcal{D}|\boldsymbol{\theta}) = \prod_{k=1}^{n} p(x_k|\boldsymbol{\theta}) = \frac{1}{(2\pi)^{n/2}\sigma^{n}}\exp\!\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n}(x_k-\mu)^2\right]$$

$$l(\boldsymbol{\theta}|\mathcal{D}) = \ln L(\boldsymbol{\theta}|\mathcal{D}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{k=1}^{n}(x_k-\mu)^2$$


The Gaussian Case II

Setting $\nabla_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta}|\mathcal{D}) = \mathbf{0}$ gives the two conditions

$$\frac{\partial l}{\partial\mu} = \frac{1}{\sigma^2}\sum_{k=1}^{n}(x_k-\mu) = 0, \qquad \frac{\partial l}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{k=1}^{n}(x_k-\mu)^2 = 0$$

whose solutions are

$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n}x_k \quad\text{(unbiased)}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k-\hat{\mu})^2 \quad\text{(biased)}$$

In the multivariate case, $\hat{\boldsymbol{\mu}}$ is the arithmetic average of $n$ vectors and $\hat{\boldsymbol{\Sigma}}$ is the arithmetic average of the $n$ matrices $(\mathbf{x}_k-\hat{\boldsymbol{\mu}})(\mathbf{x}_k-\hat{\boldsymbol{\mu}})^T$.

Unbiased estimator: $E[\hat{\boldsymbol{\theta}}] = \boldsymbol{\theta}$. Consistent estimator: $\lim_{n\to\infty} E[\hat{\boldsymbol{\theta}}] = \boldsymbol{\theta}$.
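As a hedged sketch (not from the slides), assuming NumPy and SciPy are available: numerically maximizing the log-likelihood recovers the closed-form $\hat\mu$ and the biased $\hat\sigma^2$ above; the data are synthetic and purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=200)       # illustrative univariate sample D

def neg_log_likelihood(theta):
    mu, log_var = theta                            # parameterize sigma^2 = exp(log_var) > 0
    var = np.exp(log_var)
    return 0.5 * len(x) * np.log(2 * np.pi * var) + ((x - mu) ** 2).sum() / (2 * var)

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_num, var_num = res.x[0], np.exp(res.x[1])

mu_hat = x.mean()                                  # closed-form ML estimate
var_hat = ((x - mu_hat) ** 2).mean()               # biased ML estimate (divides by n)
print(mu_num, mu_hat)                              # numerical optimum matches closed form
print(var_num, var_hat, x.var(ddof=1))             # ddof=1 gives the unbiased version
```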


MLE for Normal Population

Sample mean (unbiased):

$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_k, \qquad E[\hat{\boldsymbol{\mu}}] = \boldsymbol{\mu}$$

ML covariance estimate (biased):

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{k=1}^{n}(\mathbf{x}_k-\hat{\boldsymbol{\mu}})(\mathbf{x}_k-\hat{\boldsymbol{\mu}})^T, \qquad E[\hat{\boldsymbol{\Sigma}}] = \frac{n-1}{n}\boldsymbol{\Sigma} \neq \boldsymbol{\Sigma}$$

Sample covariance matrix (unbiased):

$$\mathbf{C} = \frac{1}{n-1}\sum_{k=1}^{n}(\mathbf{x}_k-\hat{\boldsymbol{\mu}})(\mathbf{x}_k-\hat{\boldsymbol{\mu}})^T, \qquad E[\mathbf{C}] = \boldsymbol{\Sigma}$$
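A short illustrative check (not from the slides) of the biased vs. unbiased covariance estimates with NumPy; `np.cov` with `bias=True` divides by n, while the default divides by n-1. The true covariance is an assumed value for the simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma_true = np.array([[1.0, 0.5], [0.5, 2.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma_true, size=1000)

mu_hat = X.mean(axis=0)
diff = X - mu_hat
Sigma_hat = diff.T @ diff / len(X)       # ML estimate: divides by n (biased)
C = diff.T @ diff / (len(X) - 1)         # sample covariance: divides by n-1 (unbiased)

# np.cov computes the same quantities:
assert np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True))
assert np.allclose(C, np.cov(X, rowvar=False))
print(Sigma_hat, C, sep="\n")
```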


Bayesian Estimation: Settings

The parametric form of the likelihood function for each category is known.

However, $\boldsymbol{\theta}_j$ is considered to be a random variable instead of a fixed (but unknown) value.

In this case, we can no longer make a single ML estimate $\hat{\boldsymbol{\theta}}$ and then infer $P(\omega_i|\mathbf{x})$ from $P(\omega_i)$ and $p(\mathbf{x}|\omega_i)$.

How can we proceed? Fully exploit the training examples!


Posterior Probabilities from Samples

Taking all training data $\mathcal{D}$ into account, Bayes' rule becomes

$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,\mathcal{D})\,P(\omega_i|\mathcal{D})}{\sum_{j=1}^{c} p(\mathbf{x}|\omega_j,\mathcal{D})\,P(\omega_j|\mathcal{D})}$$

Assumptions:

The priors are known or computable in advance: $P(\omega_i|\mathcal{D}) = P(\omega_i)$.

Each class can be considered independently: $p(\mathbf{x}|\omega_i,\mathcal{D}) = p(\mathbf{x}|\omega_i,D_i)$, since the samples in $D_j$ ($j \neq i$) give no information about $\boldsymbol{\theta}_i$.

Hence

$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,D_i)\,P(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x}|\omega_j,D_j)\,P(\omega_j)}$$


Problem Formulation

$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,D_i)\,P(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x}|\omega_j,D_j)\,P(\omega_j)}$$

The key problem is to determine $p(\mathbf{x}|\omega_i,D_i)$. Treating each class independently (and dropping the class subscript), the problem becomes estimating $p(\mathbf{x}|\mathcal{D})$.

This is the central problem of Bayesian learning.


Class-Conditional Density Estimation

Assume $p(\mathbf{x})$ is unknown, but we know it has a fixed parametric form with parameter vector $\boldsymbol{\theta}$; i.e., the form of $p(\mathbf{x}|\boldsymbol{\theta})$ is assumed known, and $\boldsymbol{\theta}$ is a random variable.

$$p(\mathbf{x}|\mathcal{D}) = \int p(\mathbf{x},\boldsymbol{\theta}|\mathcal{D})\,d\boldsymbol{\theta} = \int p(\mathbf{x}|\boldsymbol{\theta},\mathcal{D})\,p(\boldsymbol{\theta}|\mathcal{D})\,d\boldsymbol{\theta} = \int p(\mathbf{x}|\boldsymbol{\theta})\,p(\boldsymbol{\theta}|\mathcal{D})\,d\boldsymbol{\theta}$$

Here $\mathbf{x}$ is independent of $\mathcal{D}$ given $\boldsymbol{\theta}$, so $p(\mathbf{x}|\boldsymbol{\theta},\mathcal{D}) = p(\mathbf{x}|\boldsymbol{\theta})$; the posterior density $p(\boldsymbol{\theta}|\mathcal{D})$ is what we want to estimate.
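A hedged Monte Carlo sketch (not from the slides) of this integral: if we can draw samples $\theta^{(s)}$ from the posterior $p(\theta|\mathcal{D})$, the predictive density can be approximated by averaging $p(x|\theta^{(s)})$. The Gaussian choices and numbers below are illustrative assumptions only.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

rng = np.random.default_rng(3)

# Assumed posterior over the single parameter theta = mu (illustrative values):
mu_n, sigma_n = 1.2, 0.1
theta_samples = rng.normal(mu_n, sigma_n, size=10_000)   # theta^(s) ~ p(theta|D)

sigma = 1.0                                              # known noise std in p(x|theta)
x = 0.5
# p(x|D) = integral of p(x|theta) p(theta|D) dtheta  ~  (1/S) sum_s p(x|theta^(s))
p_x_given_D = gauss_pdf(x, theta_samples, sigma).mean()
print(p_x_given_D)
```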


Bayesian Estimation: General Procedure

Phase I: determine the parameter posterior $p(\boldsymbol{\theta}|\mathcal{D})$.


Phase II: form the predictive (class-conditional) density

$$p(\mathbf{x}|\mathcal{D}) = \int p(\mathbf{x}|\boldsymbol{\theta})\,p(\boldsymbol{\theta}|\mathcal{D})\,d\boldsymbol{\theta}$$

Phase III: classify with Bayes' rule,

$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,D_i)\,P(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x}|\omega_j,D_j)\,P(\omega_j)}$$


The Gaussian Case

The univariate Gaussian with unknown $\mu$ (variance $\sigma^2$ known).

Phase I: $p(\mu|\mathcal{D}) \propto p(\mathcal{D}|\mu)\,p(\mu)$, with

$$p(x|\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \qquad p(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\!\left[-\frac{1}{2}\left(\frac{\mu-\mu_0}{\sigma_0}\right)^2\right]$$

(Other forms of prior pdf could be assumed as well.)

$$p(\boldsymbol{\theta}|\mathcal{D}) \propto \prod_{k=1}^{n} p(\mathbf{x}_k|\boldsymbol{\theta})\,p(\boldsymbol{\theta})$$


The Gaussian Case

Substituting the Gaussian likelihood and the Gaussian prior,

$$p(\mu|\mathcal{D}) \propto \prod_{k=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{1}{2}\left(\frac{x_k-\mu}{\sigma}\right)^2\right]\cdot\frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\!\left[-\frac{1}{2}\left(\frac{\mu-\mu_0}{\sigma_0}\right)^2\right]$$

$$\propto \exp\!\left[-\frac{1}{2}\left(\sum_{k=1}^{n}\left(\frac{\mu-x_k}{\sigma}\right)^2 + \left(\frac{\mu-\mu_0}{\sigma_0}\right)^2\right)\right] \propto \exp\!\left[-\frac{1}{2}\left(\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)\mu^2 - 2\left(\frac{1}{\sigma^2}\sum_{k=1}^{n}x_k + \frac{\mu_0}{\sigma_0^2}\right)\mu\right)\right]$$


The Gaussian Case

$p(\mu|\mathcal{D})$ is an exponential of a quadratic function of $\mu$; thus $p(\mu|\mathcal{D})$ is again a normal density. Write it as $p(\mu|\mathcal{D}) \sim N(\mu_n, \sigma_n^2)$:

$$p(\mu|\mathcal{D}) = \frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\!\left[-\frac{1}{2}\left(\frac{\mu-\mu_n}{\sigma_n}\right)^2\right] = \frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\!\left[-\frac{1}{2}\left(\frac{1}{\sigma_n^2}\mu^2 - \frac{2\mu_n}{\sigma_n^2}\mu + \frac{\mu_n^2}{\sigma_n^2}\right)\right]$$

and compare this with the quadratic form obtained above.


The Gaussian Case

Equating the coefficients of the two forms, we obtain

$$\mu_n = \frac{n\sigma_0^2}{n\sigma_0^2+\sigma^2}\,\hat{\mu}_n + \frac{\sigma^2}{n\sigma_0^2+\sigma^2}\,\mu_0, \qquad \sigma_n^2 = \frac{\sigma_0^2\,\sigma^2}{n\sigma_0^2+\sigma^2}, \qquad \hat{\mu}_n = \frac{1}{n}\sum_{k=1}^{n}x_k$$
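A small sketch (not from the slides) computing the posterior parameters $\mu_n$ and $\sigma_n^2$ from data and an assumed prior $N(\mu_0, \sigma_0^2)$; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.5                                      # known likelihood std
mu_0, sigma_0 = 0.0, 2.0                         # assumed prior N(mu_0, sigma_0^2)
x = rng.normal(loc=3.0, scale=sigma, size=50)    # observed sample D

n = len(x)
mu_hat_n = x.mean()
mu_n = (n * sigma_0**2 * mu_hat_n + sigma**2 * mu_0) / (n * sigma_0**2 + sigma**2)
var_n = (sigma_0**2 * sigma**2) / (n * sigma_0**2 + sigma**2)
print(mu_n, var_n)   # posterior mean shrinks the sample mean toward the prior mean
```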


The Gaussian Case

Phase II: compute

$$p(x|\mathcal{D}) = \int p(x|\mu)\,p(\mu|\mathcal{D})\,d\mu$$

with $p(x|\mu) \sim N(\mu, \sigma^2)$ and $p(\mu|\mathcal{D}) \sim N(\mu_n, \sigma_n^2)$. How would $p(x|\mathcal{D})$ look in this case?


The Gaussian Case

$$p(x|\mathcal{D}) = \int p(x|\mu)\,p(\mu|\mathcal{D})\,d\mu = \int \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]\frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\!\left[-\frac{1}{2}\left(\frac{\mu-\mu_n}{\sigma_n}\right)^2\right]d\mu$$

$$= \frac{1}{2\pi\sigma\sigma_n}\exp\!\left[-\frac{1}{2}\,\frac{(x-\mu_n)^2}{\sigma^2+\sigma_n^2}\right]\int \exp\!\left[-\frac{1}{2}\,\frac{\sigma^2+\sigma_n^2}{\sigma^2\sigma_n^2}\left(\mu-\frac{\sigma_n^2 x+\sigma^2\mu_n}{\sigma^2+\sigma_n^2}\right)^2\right]d\mu$$

$p(x|\mathcal{D})$ is an exponential of a quadratic function of $x$; thus it is also a normal pdf. Which one?



Carrying out the (Gaussian) integral gives

$$p(x|\mathcal{D}) \sim N(\mu_n,\ \sigma^2+\sigma_n^2)$$
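A hedged numerical check (not from the slides) of this Phase II result: the closed-form predictive density $N(\mu_n, \sigma^2+\sigma_n^2)$ agrees with a Monte Carlo approximation of the integral. The posterior parameters and noise level are illustrative assumptions.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Illustrative posterior parameters from Phase I (assumed values):
mu_n, var_n = 2.8, 0.04
sigma = 1.5                                     # known likelihood std
x = 3.2

# Closed form: p(x|D) = N(x; mu_n, sigma^2 + sigma_n^2)
closed_form = gauss_pdf(x, mu_n, np.sqrt(sigma**2 + var_n))

# Monte Carlo check of p(x|D) = integral of p(x|mu) p(mu|D) dmu
rng = np.random.default_rng(5)
mu_samples = rng.normal(mu_n, np.sqrt(var_n), size=200_000)
monte_carlo = gauss_pdf(x, mu_samples, sigma).mean()

print(closed_form, monte_carlo)                 # the two values should agree closely
```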


The Gaussian Case

Phase III: plug the estimated class-conditional densities into Bayes' rule,

$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,D_i)\,P(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x}|\omega_j,D_j)\,P(\omega_j)}$$
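A short end-to-end sketch (not from the slides) of Phase III for two classes: each class gets its own Bayesian predictive density $N(\mu_n, \sigma^2+\sigma_n^2)$ estimated from its own $D_i$, and Bayes' rule combines these with the priors. Data, priors, and prior hyperparameters are illustrative assumptions.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def predictive_params(x_i, sigma, mu_0, sigma_0):
    """Phase I + II for one class: return (mu_n, sigma^2 + sigma_n^2)."""
    n, mu_hat = len(x_i), x_i.mean()
    mu_n = (n * sigma_0**2 * mu_hat + sigma**2 * mu_0) / (n * sigma_0**2 + sigma**2)
    var_n = sigma_0**2 * sigma**2 / (n * sigma_0**2 + sigma**2)
    return mu_n, sigma**2 + var_n

rng = np.random.default_rng(6)
sigma = 1.0                                                     # known, shared noise std
D = [rng.normal(-1.0, sigma, 40), rng.normal(2.0, sigma, 60)]   # D_1, D_2
priors = np.array([0.4, 0.6])

params = [predictive_params(x_i, sigma, mu_0=0.0, sigma_0=3.0) for x_i in D]

def classify(x):
    """Phase III: posterior class probabilities P(w_i|x, D)."""
    p = np.array([gauss_pdf(x, m, np.sqrt(v)) for m, v in params]) * priors
    return p / p.sum()

print(classify(0.2))
```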


Summary: Key Issue

Estimate the prior and the class-conditional pdf from the training set.

Basic assumption on the training examples: i.i.d.

Two strategies for the key issue:

Parametric form for the class-conditional pdf: maximum-likelihood estimation or Bayesian estimation.

No parametric form for the class-conditional pdf: next chapter.


Summary

Maximum-likelihood estimation. Settings: parameters are fixed but unknown values. Objective function: the log-likelihood. Solve by setting the gradient of the objective to zero. Gaussian cases I and II.

Bayesian estimation. Settings: parameters are random variables. General procedure: Phases I, II, III. Gaussian case. Project 3.2.

MIMA Group

Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University

Any Questions?