Transcript of "2-15-05: Estimating parameters in a statistical model. Likelihood and Maximum likelihood estimation"
Estimating parameters in a statistical model
• Likelihood and Maximum likelihood estimation
• Bayesian point estimates
• Maximum a posteriori point estimation
• Empirical Bayes estimation
Random Sample
• Random sample: a set of observations generated independently by the statistical model.
• For example, n replicated measurements of the differences in expression levels for a gene under two different treatments: $x_1, x_2, \dots, x_n \sim$ iid $N(\mu, \sigma^2)$
• Given the parameters, the statistical model defines the probability of observing any particular combination of the values in this sample.
• Since the observations are independent, the probability distribution function describing the probability of observing a particular combination is the product of the individual probability distribution functions.
Probability distribution function vs probability
• In the case of a discrete random variable, which has a countable number of potential values (we can assume finitely many for now), the probability distribution function is equal to the probability of each value (outcome).
• In the case of a continuous random variable, which can yield any number in a particular interval, the probability density function is different from the probability.
• The probability of any particular number for a continuous random variable is equal to zero.
• The probability density function defines the probability of the number falling into any particular sub-interval as the area under the curve defined by the probability density function.
• Example: the assumption of our Normal model is that the outcome can be essentially any real number. This is obviously a wrong assumption, but it turns out that this model is a good approximation of reality.
• We could "discretize" this random variable: define the r.v. $y = \{1 \text{ if } |x| > c, \ 0 \text{ otherwise}\}$ for some constant $c$.
• This random variable can assume 2 different values, and its probability distribution function is defined by $p(y = 1)$.
• Although the probability density function of a continuous random variable does not give probabilities, it satisfies the key properties of a probability.
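The discretization above can be made concrete in a few lines of Python; a minimal sketch, where the standard normal model and the cutoff $c = 1.96$ are illustrative choices, not from the slides:

```python
import math

# Discretize a continuous r.v.: y = 1 if |x| > c, 0 otherwise, for x ~ N(0, 1).
# The discrete pdf of y is fully specified by p(y = 1) = P(|x| > c).
def p_y_equals_1(c, mu=0.0, sigma=1.0):
    # Normal CDF written with math.erf
    cdf = lambda t: 0.5 * (1 + math.erf((t - mu) / (sigma * math.sqrt(2))))
    return (1 - cdf(c)) + cdf(-c)

print(round(p_y_equals_1(1.96), 3))  # 0.05
```

With $c = 1.96$ this recovers the familiar two-sided 5% tail probability of the standard normal.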
Back to basics – Probability, Conditional Probability and Independence
• Discrete pdf $p(y)$:
1) $0 \le p(y = i)$
2) $p(y = i) \le 1$
3) $\sum_i p(y = i) = 1$
4) $p(y_1 \mid y_2) = \dfrac{p(y_1, y_2)}{p(y_2)}$ (conditional probability)
5) For $y_1, \dots, y_n$ iid from $p(y)$: $p(y_1, \dots, y_n) = p(y_1) \cdots p(y_n)$ (independence)

• Continuous pdf $f(x)$:
1) $f(x) \ge 0$
2) $P(a \le x \le b) = \int_a^b f(x)\, dx$
3) $\int_{-\infty}^{\infty} f(x)\, dx = 1$
4) $f(x_1 \mid x_2) = \dfrac{f(x_1, x_2)}{f(x_2)}$
5) For $x_1, \dots, x_n$ iid from $f(x)$: $f(x_1, \dots, x_n) = f(x_1) \cdots f(x_n)$

[Figure: density curves over the log-ratio (LR) scale, with the shaded area between $a$ and $b$ illustrating $P(a \le x \le b)$]
• From now on, we will talk in terms of just a pdf, and everything will hold for both discrete and continuous random variables
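These pdf properties can be checked numerically; a minimal sketch, where the Bernoulli(0.3) pmf and the standard normal are illustrative choices, not from the slides:

```python
import math

# Discrete pdf: probabilities lie in [0, 1] and sum to 1
p = 0.3
pmf = {0: 1 - p, 1: p}
assert all(0 <= v <= 1 for v in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# Continuous pdf: f(x) >= 0, the total area is 1, and P(a <= x <= b) is the
# area under f between a and b (approximated here with a Riemann sum)
def f(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

dx = 0.0001
xs = [-10 + dx * i for i in range(200001)]
total = sum(f(x) * dx for x in xs)
area = sum(f(x) * dx for x in xs if -1 <= x <= 1)
print(round(total, 4), round(area, 4))  # 1.0 and about 0.6827
```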
Expectation, Expected value and Variance
• Discrete pdf $p(y)$: the expectation of any function $g$ of the random variable $y$ (the average value of the function after a large number of experiments) is
$E[g(y)] = \sum_i g(i)\, p(y = i)$
• Continuous pdf $f(x)$: the expectation of any function $g$ of the random variable $x$ is
$E[g(x)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx$
• Expected value: the average $y$ (or $x$) after a very large number of experiments
$E[y] = \sum_i i\, p(y = i), \qquad E[x] = \int_{-\infty}^{\infty} x\, f(x)\, dx$
• Variance: the expected value of $(y - E[y])^2$, respectively $(x - E[x])^2$
$E[(y - E[y])^2] = \sum_i (i - E[y])^2\, p(y = i)$
$E[(x - E[x])^2] = \int_{-\infty}^{\infty} (x - E[x])^2\, f(x)\, dx$
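The expectation and variance definitions can be evaluated directly; a minimal sketch using a fair six-sided die for the discrete case and $N(2, 9)$ for the continuous case (both illustrative choices, not from the slides):

```python
import math

# Discrete: fair six-sided die, E[y] = sum_i i*p(y=i), Var = E[(y - E[y])^2]
pmf = {i: 1 / 6 for i in range(1, 7)}
ey = sum(i * p for i, p in pmf.items())
vary = sum((i - ey) ** 2 * p for i, p in pmf.items())
print(round(ey, 4), round(vary, 4))  # 3.5 2.9167

# Continuous: N(mu=2, sigma^2=9), expectations approximated by a Riemann sum
def f(x, mu=2.0, sigma=3.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

dx = 0.001
xs = [-13 + dx * i for i in range(30001)]          # grid spanning mu +/- 5 sigma
ex = sum(x * f(x) * dx for x in xs)                # should recover mu = 2
varx = sum((x - ex) ** 2 * f(x) * dx for x in xs)  # should recover sigma^2 = 9
print(round(ex, 3), round(varx, 3))  # about 2.0 and 9.0
```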
Expected Value and Variance of a Normal Random Variable
• Normal pdf:
$f_N(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• Expected value: the average $x$ after a very large number of experiments
$E[x] = \int_{-\infty}^{\infty} x\, f_N(x \mid \mu, \sigma^2)\, dx = \mu$
• Variance: the expected value of $(x - E[x])^2$
$E[(x - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2\, f_N(x \mid \mu, \sigma^2)\, dx = \sigma^2$
Maximum Likelihood
• $x_1, x_2, \dots, x_n \sim$ iid $N(\mu, \sigma^2)$, with
$f_N(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• Joint pdf for the whole random sample:
$f(x_1, x_2, \dots, x_n \mid \mu, \sigma^2) = f(x_1 \mid \mu, \sigma^2) \cdots f(x_n \mid \mu, \sigma^2)$
• Maximum likelihood estimates of the model parameters $\mu$ and $\sigma^2$ are the numbers that maximize the joint pdf for the fixed sample, which is called the likelihood function:
$l(\mu, \sigma^2 \mid x_1, \dots, x_n) = f(x_1 \mid \mu, \sigma^2) \cdots f(x_n \mid \mu, \sigma^2)$
$\hat\mu = \dfrac{\sum_i x_i}{n}, \qquad \hat\sigma^2 = \dfrac{\sum_i (x_i - \hat\mu)^2}{n}$
• The likelihood function is basically the pdf for the fixed sample, viewed as a function of the parameters
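The closed-form MLEs (sample mean, divide-by-n sample variance) are easy to check by simulation; a minimal sketch, where the true values ($\mu = 1.5$, $\sigma = 2$) and the sample size are illustrative choices:

```python
import math
import random

# Simulate x1..xn ~ iid N(1.5, 4) and compute the closed-form MLEs
random.seed(0)
n = 10000
xs = [random.gauss(1.5, 2.0) for _ in range(n)]

mu_hat = sum(xs) / n                                   # sample mean
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n    # divide-by-n variance

def loglik(mu, s2):
    # Log of the likelihood function for the fixed sample
    return sum(-0.5 * math.log(2 * math.pi * s2) - (x - mu) ** 2 / (2 * s2)
               for x in xs)

# The closed-form estimates beat nearby parameter values, as a maximum should
assert loglik(mu_hat, sigma2_hat) >= loglik(mu_hat + 0.1, sigma2_hat)
assert loglik(mu_hat, sigma2_hat) >= loglik(mu_hat, sigma2_hat * 1.1)
print(round(mu_hat, 2), round(sigma2_hat, 2))  # close to 1.5 and 4.0
```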
Bayesian Inference
• Key difference: assumes the parameters themselves are random variables
• Inference is based on the posterior distribution of the parameter given the data
• Prior distribution: defines prior knowledge or ignorance about the parameter
• Posterior distribution: prior belief modified by the data

Prior: $p(\mu)$

Likelihood: $l(\mu \mid x_1, \dots, x_n)$

Posterior: $f(\mu \mid x_1, \dots, x_n) = \dfrac{l(x_1, \dots, x_n \mid \mu)\, p(\mu)}{f(x_1, \dots, x_n)}$, where the data are $D = (x_1, \dots, x_n)$
[Figure: three panels over the log-ratio scale showing the prior distribution of $\mu$, the data model given $\mu$, and the posterior distribution of $\mu$ given the data (Bayes theorem), with the shaded area $P(\mu > 0 \mid \text{data})$]

Prior: $\mu \mid \mu_0, \tau^2 \sim N(\mu_0, \tau^2)$

Likelihood: $x \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$

Posterior: $\mu \mid x_1, \dots, x_n, \sigma^2, \mu_0, \tau^2 \sim N\!\left( \dfrac{\frac{n}{\sigma^2}\,\bar{x} + \frac{1}{\tau^2}\,\mu_0}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \dfrac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} \right)$
Bayesian Estimation
• Bayesian point-estimate is the expected value of the parameter under its posterior distribution given data
Posterior: $\mu \mid x_1, \dots, x_n, \sigma^2, \mu_0, \tau^2 \sim N\!\left( \dfrac{\frac{n}{\sigma^2}\,\bar{x} + \frac{1}{\tau^2}\,\mu_0}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \dfrac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} \right)$

$E[\mu \mid x_1, \dots, x_n, \sigma^2, \mu_0, \tau^2] = \dfrac{\frac{n}{\sigma^2}\,\bar{x} + \frac{1}{\tau^2}\,\mu_0}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}$
• In some cases the expectation of the posterior distribution can be difficult to assess; it is often easier to find the value of the parameter that maximizes the posterior distribution given the data, called the Maximum a Posteriori (MAP) estimate
• Since the denominator of the posterior distribution in the Bayes theorem is constant in the parameter, this is equivalent to maximizing the product of the likelihood and the prior pdf
Posterior: $f(\mu \mid x_1, \dots, x_n) = \dfrac{l(x_1, \dots, x_n \mid \mu)\, p(\mu)}{f(x_1, \dots, x_n)}$, with data $D = (x_1, \dots, x_n)$
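The Bayesian point estimate for the normal model with known variance is a precision-weighted average of the sample mean and the prior mean; a minimal sketch, where the prior hyperparameter names (mu0, tau2) and the simulated data are my choices:

```python
import random

# Posterior mean for x1..xn ~ N(mu, sigma^2) with prior mu ~ N(mu0, tau^2):
# a precision-weighted average of the sample mean and the prior mean
def posterior_mean(xs, sigma2, mu0, tau2):
    n = len(xs)
    xbar = sum(xs) / n
    w_data, w_prior = n / sigma2, 1 / tau2   # precisions of data and prior
    return (w_data * xbar + w_prior * mu0) / (w_data + w_prior)

random.seed(1)
xs = [random.gauss(2.0, 1.0) for _ in range(5)]
xbar = sum(xs) / len(xs)
est = posterior_mean(xs, sigma2=1.0, mu0=0.0, tau2=1.0)
# with n = 5 and equal variances the weights are 5 and 1, so est = (5/6) * xbar
print(round(xbar, 3), round(est, 3))
```

The estimate is pulled from the sample mean toward the prior mean, with the pull shrinking as n grows.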
Alternative prior for the normal model
Prior: $p(\mu) \propto 1$

Posterior: $f(\mu \mid x_1, \dots, x_n) = \dfrac{l(x_1, \dots, x_n \mid \mu)\, p(\mu)}{P(x_1, \dots, x_n)} = \text{const} \times l(x_1, \dots, x_n \mid \mu)$
• Degenerate uniform prior for $\mu$, assuming that any prior value is equally likely; this is clearly unrealistic, since we usually know more than that
• The MAP estimate for $\mu$ is then identical to the maximum likelihood estimate
• Bayesian point estimation and maximum likelihood are very closely related
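The claim that a flat prior makes the MAP estimate coincide with the MLE can be checked numerically; a minimal sketch for the normal model with known variance, where the simulated data and grid search are my choices:

```python
import random

# With p(mu) proportional to 1 the posterior is proportional to the
# likelihood, so the MAP estimate coincides with the MLE (the sample mean
# for a normal model with known variance); a grid search makes this visible.
random.seed(3)
xs = [random.gauss(1.0, 1.0) for _ in range(50)]
xbar = sum(xs) / len(xs)

def loglik(mu):
    # log-likelihood up to an additive constant (sigma^2 = 1)
    return sum(-(x - mu) ** 2 / 2 for x in xs)

grid = [i / 1000 for i in range(-2000, 4001)]   # mu values from -2 to 4
map_est = max(grid, key=loglik)
print(round(map_est, 3), round(xbar, 3))  # agree to grid precision
```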
Hierarchical Bayesian Models and Empirical Bayes Inference
• If we are not happy with pre-specifying $\mu$ and $\tau^2$, we can estimate them based on the "marginal" distribution of the data given $\mu$ and $\tau^2$ and plug them back into the formula for the Bayesian estimate; the result is the Empirical Bayes estimate
• $x_i \sim$ ind $N(\mu_i, \sigma^2)$, $i = 1, \dots, n$; assume that the variance is known
• We need to estimate $\mu_i$, $i = 1, \dots, n$
• The simplest estimate is $\hat\mu_i = x_i$
• Assume that $\mu_i \sim$ iid $N(\mu, \tau^2)$, $i = 1, \dots, n$
Posterior: $\mu_i \mid x_i, \sigma^2, \mu, \tau^2 \sim N\!\left( \dfrac{\frac{1}{\sigma^2}\,x_i + \frac{1}{\tau^2}\,\mu}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}},\ \dfrac{1}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}} \right)$

$E[\mu_i \mid x_i, \sigma^2, \mu, \tau^2] = \dfrac{\frac{1}{\sigma^2}\,x_i + \frac{1}{\tau^2}\,\mu}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}}$
• The simplest estimate remains $\hat\mu_i = x_i$
• If $x_i \sim$ ind $N(\mu_i, \sigma^2)$ and $\mu_i \sim$ iid $N(\mu, \tau^2)$, $i = 1, \dots, n$, the "marginal" distribution of each $x_i$, with the $\mu_i$'s "factored out", is $N(\mu, \sigma^2 + \tau^2)$, $i = 1, \dots, n$
• Now we can estimate $\hat\mu$ and $\hat\tau^2$ using, say, maximum likelihood and plug them back into the formula for the Bayesian estimates of the $\mu_i$'s

Empirical Bayes estimate of $\mu_i$: $\dfrac{\frac{1}{\sigma^2}\,x_i + \frac{1}{\hat\tau^2}\,\hat\mu}{\frac{1}{\sigma^2} + \frac{1}{\hat\tau^2}}$
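The whole Empirical Bayes recipe (estimate $\mu$ and $\tau^2$ from the marginal $N(\mu, \sigma^2 + \tau^2)$, then shrink each observation toward the grand mean) fits in a short simulation; a sketch with illustrative true values, also showing the improvement over the raw estimates discussed below:

```python
import random

# Empirical Bayes for x_i ~ ind N(mu_i, sigma^2) with mu_i ~ iid N(mu, tau^2)
# and sigma^2 known. True values (mu = 3, tau^2 = 4, sigma^2 = 1) are
# illustrative choices for the simulation.
random.seed(2)
n, sigma2 = 500, 1.0
mus = [random.gauss(3.0, 2.0) for _ in range(n)]       # true means mu_i
xs = [random.gauss(m, sigma2 ** 0.5) for m in mus]     # one observation each

mu_hat = sum(xs) / n                                   # marginal MLE of mu
marg_var = sum((x - mu_hat) ** 2 for x in xs) / n      # estimates sigma^2 + tau^2
tau2_hat = max(marg_var - sigma2, 1e-8)                # marginal MLE of tau^2

def eb_estimate(x):
    # Plug mu_hat and tau2_hat into the Bayes formula: shrink x toward mu_hat
    w_data, w_prior = 1 / sigma2, 1 / tau2_hat
    return (w_data * x + w_prior * mu_hat) / (w_data + w_prior)

# Total squared error: shrunken estimates vs the raw observations
err_raw = sum((x - m) ** 2 for x, m in zip(xs, mus))
err_eb = sum((eb_estimate(x) - m) ** 2 for x, m in zip(xs, mus))
print(round(err_raw, 1), round(err_eb, 1))  # shrinkage should reduce the error
```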
• The estimates for the individual means are "shrunk" towards the mean of all means
• It turns out that such estimates are better overall than estimates based on the individual observations (the "Stein effect")
• The individual observations in our model can be replaced with groups of observations: $x_{1i}, x_{2i}, \dots, x_{ki} \sim$ ind $N(\mu_i, \sigma^2)$
• Limma does a similar thing, only with variances
• Data for each gene $i$ are assumed to be distributed as $x_{1i}, x_{2i}, \dots, x_{ki} \sim$ iid $N(\mu_i, \sigma_i^2)$; the means are estimated in the usual way, while an additional hierarchy is placed on the variances, describing how variances are expected to vary across genes:
Prior: $\dfrac{1}{\sigma^2} \,\Big|\, d_0, s_0^2 \sim \dfrac{1}{d_0 s_0^2}\,\chi^2_{d_0}$ (plus some minor assumptions)

Posterior: $\tilde{s}_i^2 = E[\sigma_i^2 \mid \hat{s}_i^2, d_0, s_0^2] = \dfrac{d_0 s_0^2 + (n-1)\,\hat{s}_i^2}{d_0 + n - 1}$
• Testing the hypothesis $\mu_i = 0$ by calculating the modified t-statistic
$\tilde{t}_i = \dfrac{\hat\mu_i}{\tilde{s}_i / \sqrt{n}} \sim t_{n-1+d_0}$
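A sketch of the modified t computation, assuming the posterior-variance formula above; the gene's numbers and the hyperparameters $d_0 = 4$, $s_0^2 = 0.25$ are illustrative choices, not from the slides:

```python
import math

# Moderated t in the limma spirit: replace the gene's variance estimate s_i^2
# with the posterior value (d0*s0^2 + (n-1)*s_i^2) / (d0 + n - 1) before
# forming the t-statistic; d0 and s0^2 are the prior hyperparameters.
def moderated_t(mu_hat, s2_i, n, d0, s02):
    s2_tilde = (d0 * s02 + (n - 1) * s2_i) / (d0 + n - 1)
    return mu_hat / math.sqrt(s2_tilde / n)

# Hypothetical gene: mean log-ratio 1.2 over n = 4 replicates with a tiny
# sample variance 0.01; the prior pulls the variance up and keeps the
# statistic from exploding
t_ordinary = 1.2 / math.sqrt(0.01 / 4)
t_mod = moderated_t(1.2, 0.01, n=4, d0=4.0, s02=0.25)
print(round(t_ordinary, 1), round(t_mod, 2))  # 24.0 6.26
```

This is exactly why moderated statistics help with small microarray samples: genes with accidentally tiny sample variances no longer dominate the ranking.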