Transcript of lecture slides (2-15-05, 16 pages): Estimating parameters in a statistical model

Estimating parameters in a statistical model

• Likelihood and Maximum likelihood estimation

• Bayesian point estimates

• Maximum a posteriori point estimation

• Empirical Bayes estimation


Random Sample

• Random sample: a set of observations generated independently by the statistical model.

• For example, n replicated measurements of the differences in expression levels for a gene under two different treatments: $x_1, x_2, \ldots, x_n \sim \text{iid } N(\mu, \sigma^2)$.

• Given the parameters, the statistical model defines the probability of observing any particular combination of the values in this sample.

• Since the observations are independent, the probability distribution function describing the probability of observing a particular combination is the product of the individual probability distribution functions.
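As a minimal sketch of this product rule (the parameter values and the sample below are hypothetical, chosen only for illustration), the joint density of an iid normal sample is the product of the individual normal densities:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameter values and a small simulated sample
mu, sigma = 0.5, 1.0
rng = np.random.default_rng(0)
x = rng.normal(mu, sigma, size=5)   # x1, ..., xn ~ iid N(mu, sigma^2)

# Joint pdf of the sample = product of the individual normal pdfs
joint_pdf = np.prod(norm.pdf(x, loc=mu, scale=sigma))
print(joint_pdf)
```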


Probability distribution function vs probability

• In the case of a discrete random variable, which has a countable number of potential values (assume finitely many for now), the probability distribution function is equal to the probability of each value (outcome).

• In the case of a continuous random variable, which can yield any number in a particular interval, the probability density function is different from the probability.

• The probability of any particular number for a continuous random variable is equal to zero.

• The probability density function defines the probability of the number falling into any particular sub-interval as the area under the curve defined by the probability density function.
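A short numeric illustration (assuming a hypothetical standard normal variable): the probability of landing in a sub-interval is the area under the density, which can be obtained from the cdf, while the density value at a point is not itself a probability:

```python
from scipy.stats import norm

a, b = -1.0, 2.0                        # hypothetical sub-interval
p_interval = norm.cdf(b) - norm.cdf(a)  # area under the N(0,1) density over [a, b]
print(p_interval)                       # ~0.8186

# The density at a point is not a probability; P(X = x) = 0 for continuous X
print(norm.pdf(0.0))                    # ~0.3989, a density value (can exceed 1 for small sigma)
```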


Probability distribution function vs probability

• Example: the assumption of our Normal model is that the outcome can be pretty much any real number. This is obviously a wrong assumption, but it turns out that this model is a good approximation of reality.

• We could "discretize" this random variable: define the r.v. y = {1 if |x| > c, and 0 otherwise} for some constant c.

• This random variable can assume 2 different values, and its probability distribution function is defined by p(y=1).

• Although the probability density function of a continuous random variable does not give probabilities, it satisfies the key properties of a probability.
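As a sketch of this discretization (assuming a hypothetical N(0,1) model and cutoff c), p(y=1) = P(|x| > c) follows from the normal cdf and can be checked by simulation:

```python
import numpy as np
from scipy.stats import norm

c = 1.5                                   # hypothetical cutoff
p_y1 = norm.cdf(-c) + (1 - norm.cdf(c))   # P(|x| > c) under N(0, 1)

# Check by simulation: y = 1 if |x| > c else 0
x = np.random.default_rng(1).normal(size=100_000)
y = (np.abs(x) > c).astype(int)
print(p_y1, y.mean())                     # the two numbers should be close
```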


Back to basics – Probability, Conditional Probability and Independence

• Discrete pdf p(y):

1) $p(y=i) \ge 0$
2) $p(y=i) \le 1$
3) $\sum_i p(y=i) = 1$

• Continuous pdf f(x):

1) $f(x) \ge 0$
2) $\int_a^b f(x)\,dx \le 1$
3) $\int_{-\infty}^{\infty} f(x)\,dx = 1$

[Figure: density curve over the log-ratio (LR) axis; the shaded area between a and b represents $\int_a^b f(x)\,dx$.]

• For $y_1, \ldots, y_n$ iid with pdf p(y):

4) $p(y_1 \mid y_2) = \dfrac{p(y_1, y_2)}{p(y_2)}$; under independence, $p(y_1 \mid y_2) = p(y_1)$
5) $p(y_1, \ldots, y_n) = p(y_1) \cdots p(y_n)$

• For $x_1, \ldots, x_n$ iid with pdf f(x):

4) $f(x_1 \mid x_2) = \dfrac{f(x_1, x_2)}{f(x_2)}$; under independence, $f(x_1 \mid x_2) = f(x_1)$
5) $f(x_1, \ldots, x_n) = f(x_1) \cdots f(x_n)$

• From now on, we will talk in terms of just a pdf and things will hold for both discrete and continuous random variables
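A quick numeric check of property 3) in both columns (using a hypothetical Binomial pmf and a hypothetical normal density, purely for illustration):

```python
import numpy as np
from scipy.stats import binom, norm
from scipy.integrate import quad

# Discrete: the pmf values over all outcomes sum to 1
n, p = 10, 0.3
print(sum(binom.pmf(i, n, p) for i in range(n + 1)))   # 1.0

# Continuous: the density integrates to 1 over the whole real line
total, _ = quad(lambda x: norm.pdf(x, loc=0.5, scale=1.0), -np.inf, np.inf)
print(total)                                           # ~1.0
```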


Expectation, Expected value and Variance

• Discrete pdf p(y). Expectation of any function g of the random variable y (the average value of the function after a large number of experiments):

$E[g(y)] = \sum_i g(i)\, p(y=i)$

• Continuous pdf f(x). Expectation of any function g of the random variable x:

$E[g(x)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx$

• Expected value - the average y (or x) after a very large number of experiments:

$E[y] = \sum_i i\, p(y=i), \qquad E[x] = \int_{-\infty}^{\infty} x\, f(x)\, dx$

• Variance - the expected value of $(y - E[y])^2$ (or of $(x - E[x])^2$):

$E[(y - E[y])^2] = \sum_i (i - E[y])^2\, p(y=i), \qquad E[(x - E[x])^2] = \int_{-\infty}^{\infty} (x - E[x])^2\, f(x)\, dx$
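A small simulation sketch of the "average after many experiments" reading (assuming a hypothetical N(2,1) variable and g(x) = x²):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(2.0, 1.0, size=1_000_000)  # many experiments from N(2, 1)

# E[g(x)] for g(x) = x^2: for N(mu, sigma^2) this equals mu^2 + sigma^2 = 5
print(np.mean(x**2))

# Expected value and variance as long-run averages
print(np.mean(x))                  # ~2.0 = E[x]
print(np.mean((x - x.mean())**2))  # ~1.0 = E[(x - E[x])^2]
```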


Expected Value and Variance of a Normal Random Variable

• Normal pdf:

$f_N(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

• Expected value - the average x after a very large number of experiments:

$E[x] = \int_{-\infty}^{\infty} x\, f_N(x \mid \mu, \sigma^2)\, dx = \mu$

• Variance - the expected value of $(x - E[x])^2$:

$E[(x-\mu)^2] = \int_{-\infty}^{\infty} (x-\mu)^2\, f_N(x \mid \mu, \sigma^2)\, dx = \sigma^2$
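A numeric check of both integrals (with hypothetical values μ = 1, σ = 2):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 1.0, 2.0  # hypothetical parameter values

ev, _  = quad(lambda x: x * norm.pdf(x, mu, sigma), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mu)**2 * norm.pdf(x, mu, sigma), -np.inf, np.inf)
print(ev, var)        # ~1.0 and ~4.0, i.e. mu and sigma^2
```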


Maximum Likelihood

• $x_1, x_2, \ldots, x_n \sim \text{iid } N(\mu, \sigma^2)$, with

$f_N(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

• Joint pdf for the whole random sample:

$f(x_1, \ldots, x_n \mid \mu, \sigma^2) = f_N(x_1 \mid \mu, \sigma^2) \cdots f_N(x_n \mid \mu, \sigma^2)$

• The likelihood function is basically the joint pdf for the fixed sample, viewed as a function of the parameters:

$l(\mu, \sigma^2 \mid x_1, \ldots, x_n) = f_N(x_1 \mid \mu, \sigma^2) \cdots f_N(x_n \mid \mu, \sigma^2)$

• Maximum likelihood estimates of the model parameters μ and σ² are the numbers that maximize the likelihood function:

$\hat{\mu} = \dfrac{\sum_i x_i}{n}, \qquad \hat{\sigma}^2 = \dfrac{\sum_i (x_i - \hat{\mu})^2}{n}$
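A sketch comparing the closed-form MLEs with a direct numerical maximization of the log-likelihood (hypothetical data; the optimizer works on the negative log-likelihood):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

x = np.random.default_rng(3).normal(0.8, 1.5, size=50)  # hypothetical sample

# Closed-form maximum likelihood estimates
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat)**2)   # note: divides by n, not n-1

# The same estimates by numerically maximizing the log-likelihood
def negloglik(theta):
    mu, log_sigma = theta               # parametrize by log(sigma) to keep sigma > 0
    return -np.sum(norm.logpdf(x, mu, np.exp(log_sigma)))

res = minimize(negloglik, x0=[0.0, 0.0])
print(mu_hat, sigma2_hat)
print(res.x[0], np.exp(res.x[1])**2)    # should agree closely with the closed form
```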


Bayesian Inference

• Assumes parameters are random variables - the key difference

• Inference is based on the posterior distribution of the parameter given the data

• Prior distribution: defines prior knowledge or ignorance about the parameter

• Posterior distribution: prior belief modified by the data

Prior: $p(\mu)$

Likelihood: $l(x_1, \ldots, x_n \mid \mu)$

Posterior: $f(\mu \mid x_1, \ldots, x_n) = \dfrac{l(x_1, \ldots, x_n \mid \mu)\, p(\mu)}{p(x_1, \ldots, x_n)}$, where $D = (x_1, \ldots, x_n)$ denotes the data and the denominator is its marginal pdf.

p


Bayesian Inference

[Figure: prior distribution of μ, data model given μ, and posterior distribution of μ given the data (Bayes theorem), plotted over the LogRatio axis; the shaded posterior area represents P(μ > 0 | data).]

Prior: $\mu \mid \mu_0, \tau^2 \sim N(\mu_0, \tau^2)$

Likelihood: $x \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$

Posterior: $\mu \mid x_1, \ldots, x_n, \sigma^2, \mu_0, \tau^2 \sim N\!\left( \dfrac{\frac{n}{\sigma^2}\,\bar{x} + \frac{1}{\tau^2}\,\mu_0}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \dfrac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} \right)$
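A numeric sketch of this normal-normal update (the prior, known variance, and data values below are hypothetical):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical prior, known data variance, and data
mu0, tau2 = 0.0, 4.0          # prior: mu ~ N(mu0, tau2)
sigma2 = 1.0                  # known data variance
x = np.array([1.2, 0.7, 1.9, 1.1, 0.4])
n, xbar = len(x), x.mean()

# Posterior: precision-weighted combination of the data mean and the prior mean
prec = n / sigma2 + 1 / tau2
post_mean = (n / sigma2 * xbar + 1 / tau2 * mu0) / prec
post_var = 1 / prec

# P(mu > 0 | data) under the normal posterior
p_positive = 1 - norm.cdf(0, loc=post_mean, scale=np.sqrt(post_var))
print(post_mean, post_var, p_positive)
```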


Bayesian Estimation

• The Bayesian point estimate is the expected value of the parameter under its posterior distribution given the data:

Posterior: $\mu \mid x_1, \ldots, x_n, \sigma^2, \mu_0, \tau^2 \sim N\!\left( \dfrac{\frac{n}{\sigma^2}\,\bar{x} + \frac{1}{\tau^2}\,\mu_0}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \dfrac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} \right)$

$E[\mu \mid x_1, \ldots, x_n, \sigma^2, \mu_0, \tau^2] = \dfrac{\frac{n}{\sigma^2}\,\bar{x} + \frac{1}{\tau^2}\,\mu_0}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}$

• In some cases, the expectation of the posterior distribution can be difficult to assess - it is often easier to find the value of the parameter that maximizes the posterior distribution given the data, the Maximum a Posteriori (MAP) estimate.

• Since the denominator of the posterior distribution in Bayes theorem is constant in the parameter, this is equivalent to maximizing the product of the likelihood and the prior pdf.

Posterior: $f(\mu \mid x_1, \ldots, x_n) = \dfrac{l(x_1, \ldots, x_n \mid \mu)\, p(\mu)}{p(x_1, \ldots, x_n)}$
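A sketch of MAP estimation by maximizing log-likelihood + log-prior directly (hypothetical data; for this conjugate normal model the MAP coincides with the posterior mean, which makes the check easy):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

mu0, tau2, sigma2 = 0.0, 4.0, 1.0        # hypothetical prior and known variance
x = np.array([1.2, 0.7, 1.9, 1.1, 0.4])
n, xbar = len(x), x.mean()

# MAP: maximize log-likelihood + log-prior (the denominator is constant in mu)
def neg_log_posterior(mu):
    loglik = np.sum(norm.logpdf(x, mu, np.sqrt(sigma2)))
    logprior = norm.logpdf(mu, mu0, np.sqrt(tau2))
    return -(loglik + logprior)

map_est = minimize_scalar(neg_log_posterior).x

# Closed-form posterior mean for comparison (equals the MAP for a normal posterior)
post_mean = (n/sigma2*xbar + mu0/tau2) / (n/sigma2 + 1/tau2)
print(map_est, post_mean)
```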


Alternative prior for the normal model

Prior: $p(\mu) = 1$

Posterior: $f(\mu \mid x_1, \ldots, x_n) = \dfrac{l(x_1, \ldots, x_n \mid \mu)\, p(\mu)}{P(x_1, \ldots, x_n)} = \text{const} \times l(x_1, \ldots, x_n \mid \mu)$

• Degenerate uniform prior for μ, assuming that any prior value is equally likely - this is clearly unrealistic; we know more than that.

• The MAP estimate for μ is identical to the maximum likelihood estimate.

• Bayesian point estimation and maximum likelihood are very closely related.


Hierarchical Bayesian Models and Empirical Bayes Inference

• If we are not happy with pre-specifying μ and τ², we can estimate them based on the "marginal" distribution of the data given μ and τ², and plug them back into the formula for the Bayesian estimate - the result is the Empirical Bayes estimate.

• $x_i \sim \text{ind } N(\mu_i, \sigma^2)$, $i = 1, \ldots, n$; assume that the variance $\sigma^2$ is known.

• Need to estimate $\mu_i$, $i = 1, \ldots, n$.

• The simplest estimate is $\hat{\mu}_i = x_i$.

• Assuming that $\mu_i \sim \text{iid } N(\mu, \tau^2)$, $i = 1, \ldots, n$:

Posterior: $\mu_i \mid x_i, \sigma^2, \mu, \tau^2 \sim N\!\left( \dfrac{\frac{1}{\sigma^2}\, x_i + \frac{1}{\tau^2}\,\mu}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}},\ \dfrac{1}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}} \right)$

$E[\mu_i \mid x_i, \sigma^2, \mu, \tau^2] = \dfrac{\frac{1}{\sigma^2}\, x_i + \frac{1}{\tau^2}\,\mu}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}}$


Hierarchical Bayesian Models and Empirical Bayes Inference

• If $x_i \sim \text{ind } N(\mu_i, \sigma^2)$ and $\mu_i \sim \text{iid } N(\mu, \tau^2)$, $i = 1, \ldots, n$, the "marginal" distribution of each $x_i$, with the $\mu_i$'s "factored out", is $N(\mu, \sigma^2 + \tau^2)$, $i = 1, \ldots, n$.

• Now we can estimate $\hat{\mu}$ and $\hat{\tau}^2$ using, say, maximum likelihood, and plug them back into the formula for the Bayesian estimates of the $\mu_i$'s.

• Empirical Bayes estimate of $\mu_i$:

$\hat{\mu}_i^{EB} = \dfrac{\frac{1}{\sigma^2}\, x_i + \frac{1}{\hat{\tau}^2}\,\hat{\mu}}{\frac{1}{\sigma^2} + \frac{1}{\hat{\tau}^2}}$
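A sketch of the full empirical Bayes recipe (hypothetical data; the marginal estimates of μ and τ² here come from the $N(\mu, \sigma^2 + \tau^2)$ marginal in closed form):

```python
import numpy as np

sigma2 = 1.0                                              # known observation variance
x = np.random.default_rng(4).normal(0.0, 2.0, size=200)   # hypothetical gene-level data

# Marginal model: x_i ~ iid N(mu, sigma^2 + tau^2), so maximum likelihood gives:
mu_hat = x.mean()
tau2_hat = max(np.mean((x - mu_hat)**2) - sigma2, 1e-8)   # floored to avoid division by zero

# Plug-in empirical Bayes estimates shrink each x_i toward mu_hat
w = (1/sigma2) / (1/sigma2 + 1/tau2_hat)
mu_i_eb = w * x + (1 - w) * mu_hat
print(mu_hat, tau2_hat)
print(x[:3], mu_i_eb[:3])   # shrunken versions of the first few observations
```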


Hierarchical Bayesian Models and Empirical Bayes Inference

• The estimates for the individual means are "shrunk" towards the mean of all means.

• It turns out such estimates are better overall than estimates based on the individual observations (the "Stein effect").

• Individual observations in our model can be replaced with groups of observations: $x_{1i}, x_{2i}, \ldots, x_{ni} \sim \text{ind } N(\mu_i, \sigma^2)$.

• Limma does a similar thing, only with variances. Data for each gene $i$ are assumed to be distributed as $x_{1i}, x_{2i}, \ldots, x_{ni} \sim \text{iid } N(\mu_i, \sigma_i^2)$, and the means are estimated in the usual way, while an additional hierarchy is placed on the variances describing how variances are expected to vary across genes:

Prior: $\dfrac{1}{\sigma^2} \,\Big|\, d_0, s_0^2 \sim \dfrac{1}{d_0 s_0^2}\, \chi^2_{d_0}$ (+ some minor assumptions)

$\tilde{s}_i^2 = E[\sigma_i^2 \mid \hat{s}_i^2, d_0, s_0^2] = \dfrac{d_0 s_0^2 + (n-1)\, \hat{s}_i^2}{d_0 + n - 1}$
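A minimal sketch of this variance-moderation formula (the values of $d_0$, $s_0^2$, and the per-gene sample variances are hypothetical; this mimics the form of limma's moderated variance without calling the limma package):

```python
import numpy as np

d0, s02 = 4.0, 0.05                     # hypothetical prior df and prior variance
n = 5                                   # replicates per gene
s2_hat = np.array([0.01, 0.04, 0.25])   # hypothetical per-gene sample variances

# Posterior (moderated) variances: weighted average of prior and sample variance
s2_tilde = (d0 * s02 + (n - 1) * s2_hat) / (d0 + n - 1)
print(s2_tilde)   # each s2_hat is pulled toward s02
```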


Hierarchical Bayesian Models and Empirical Bayes Inference

• Testing the hypothesis $\mu_i = 0$ by calculating the modified t-statistic:

$\tilde{t}_i = \dfrac{\hat{\mu}_i}{\tilde{s}_i / \sqrt{n}} \sim t_{d_0 + n - 1}$
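Continuing the sketch from the previous slide, the moderated t-statistic uses the moderated standard deviation and gains the prior degrees of freedom in its reference distribution (all numbers below are hypothetical):

```python
import numpy as np
from scipy.stats import t

d0, s02, n = 4.0, 0.05, 5
mu_hat = np.array([0.8, -0.1, 0.5])        # hypothetical per-gene mean estimates
s2_tilde = np.array([0.03, 0.045, 0.15])   # moderated variances from the previous sketch

# Moderated t-statistic: ordinary t, but with the moderated standard deviation
t_tilde = mu_hat / np.sqrt(s2_tilde / n)

# Two-sided p-values against t with d0 + n - 1 degrees of freedom
pvals = 2 * t.sf(np.abs(t_tilde), df=d0 + n - 1)
print(t_tilde, pvals)
```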