Post on 01-Oct-2020
SDS 321: Introduction to Probability andStatistics
Lecture 24: Maximum Likelihood Estimation
Purnamrita SarkarDepartment of Statistics and Data Science
The University of Texas at Austin
www.cs.cmu.edu/∼psarkar/teaching
1
Roadmap
I Frequentist Statistics introduction
I Biased, Unbiased and Asymptotically unbiased estimators
I M.L.E and how to find it.
2
Frequentist Statistics
I The parameter(s) θ is fixed and unknown
I Data is generated through the likelihood function p(X ; θ) (ifdiscrete) or f (X ; θ) (if continuous).
I Now we will be dealing with multiple candidate models, one for eachvalue of θ
I We will use Eθ[h(X )] to define the expectation of the randomvariable h(X ) as a function of parameter θ
3
Problems we will look at
I Parameter estimation: We want to estimate unknown parametersfrom data.
I Maximum Likelihood estimation (section 9.1): Select theparameter that makes the observed data most likely.
I i.e. maximize the probability of obtaining the data at hand.
I Hypothesis testing: An unknown parameter takes a finite numberof values. One wants to find the best hypothesis based on the data.
I Significance testing: Given a hypothesis, figure out the rejectionregion and reject the hypothesis if the observation falls within thisregion.
4
Classical parameter estimation
We are given observations X = (X1, . . . ,Xn). An estimator is a randomvariable of the form Θ = g(X ) (sometime also denoted by Θn).
I Since the distribution of X depends on θ, so does the distribution Θn
I The mean and variance of Θn can be defined as Eθ[Θn] andvarθ(Θn).
I For simplicity we will also use E [.] and var[.] and drop the θ from thenotation
I The estimation error denoted by Θn = Θn − θ.
I Bias of an estimator is given by bθ(Θn) = Eθ[Θn]− θI An Unbiased estimator is one for which E [Θn] = θ
I An asymptotically unbiased estimator is one for whichlim
n→∞E [Θn] = θ
5
Maximum Likelihood Estimation
I Find the θ that maximizes the joint likelihood p(X1, . . . ,Xn; θ) (ofthe joint pdf for continuous random variables).
I We have P(Xi ; θ) for random variable Xi
I Often Xi are independent and so P(X1, . . . ,Xn; θ) =∏i
P(Xi ; θ)
I We want to calculate the Maximum Likelihood Estimate θ
I First calculate P(X1, . . . ,Xn; θ) =∏i
P(Xi ; θ)
I Now calculate the logarithm. logP(X1, . . . ,Xn; θ) =∑i
logP(Xi ; θ)
I Now take a derivative and set it to zero.d
dθ
∑i
logP(Xi ; θ) = 0 and
solve for θ
6
Likelihood vs. Log likelihood
(A) Likelihood (B) Log-likelihood[Takeaway]
I Loglikelihood is a monotonic function of the likelihood.
I So the maximum is achieved at the same point, albeit, in most caseswith a lot less computation.
7
Estimating the parameter of the Exponential
n customers arrive at a mall at times Yi . We take Y0 = 0. The interarrival times are Xi = Yi − Yi−1 are often modeled as i.i.d Exponential(λ)
r.v’s. We want to the MLE of λ.
I First write fXi(xi ;λ) = λe−λxi
I Now write the joint likelihood fX (x ;λ) =∏i
λe−λxi = λn∏i
e−λxi
I Now take logarithm of the joint likelihood.log fX (x ;λ) = n log λ− λ
∑i
xi .
I Differentiate and set to zero to get the MLE.
n1
λ−∑i
xi = 0→ λ =1∑i xi/n
.
8
Maximum Likelihood EstimationLets start with an example.
I You have a n i.i.d. bernoulli random variables Xi ∼ Bernoulli(p).I Find the Maximum Likelihood Estimate of p
I First write the likelihood of r.v i conveniently as P(Xi ; p) = pXi (1− p)1−XiI Now note that the r.v’s are all independent and so
P(X1, . . . ,Xn; p) =n∏
i=1
pXi (1− p)1−Xi
I Its often more convenient to maximizes the logarithm of a product form.
I
logP(X1, . . . ,Xn; p) =∑i
logP(Xi ; p)
=∑i
log(pXi (1− p)1−Xi
)=∑i
Xi log p +∑i
(1− Xi ) log(1− p)
I p = arg maxp
logP(X1, . . . ,Xn; p)
I ∑i Xip−
n −∑
i Xi1− p
= 0→ p =
∑i Xin
9
Maximum Likelihood EstimationLets start with an example.
I You have a n i.i.d. bernoulli random variables Xi ∼ Bernoulli(p).I Find the Maximum Likelihood Estimate of pI First write the likelihood of r.v i conveniently as P(Xi ; p) = pXi (1− p)1−Xi
I Now note that the r.v’s are all independent and so
P(X1, . . . ,Xn; p) =n∏
i=1
pXi (1− p)1−Xi
I Its often more convenient to maximizes the logarithm of a product form.
I
logP(X1, . . . ,Xn; p) =∑i
logP(Xi ; p)
=∑i
log(pXi (1− p)1−Xi
)=∑i
Xi log p +∑i
(1− Xi ) log(1− p)
I p = arg maxp
logP(X1, . . . ,Xn; p)
I ∑i Xip−
n −∑
i Xi1− p
= 0→ p =
∑i Xin
9
Maximum Likelihood EstimationLets start with an example.
I You have a n i.i.d. bernoulli random variables Xi ∼ Bernoulli(p).I Find the Maximum Likelihood Estimate of pI First write the likelihood of r.v i conveniently as P(Xi ; p) = pXi (1− p)1−XiI Now note that the r.v’s are all independent and so
P(X1, . . . ,Xn; p) =n∏
i=1
pXi (1− p)1−Xi
I Its often more convenient to maximizes the logarithm of a product form.
I
logP(X1, . . . ,Xn; p) =∑i
logP(Xi ; p)
=∑i
log(pXi (1− p)1−Xi
)=∑i
Xi log p +∑i
(1− Xi ) log(1− p)
I p = arg maxp
logP(X1, . . . ,Xn; p)
I ∑i Xip−
n −∑
i Xi1− p
= 0→ p =
∑i Xin
9
Maximum Likelihood EstimationLets start with an example.
I You have a n i.i.d. bernoulli random variables Xi ∼ Bernoulli(p).I Find the Maximum Likelihood Estimate of pI First write the likelihood of r.v i conveniently as P(Xi ; p) = pXi (1− p)1−XiI Now note that the r.v’s are all independent and so
P(X1, . . . ,Xn; p) =n∏
i=1
pXi (1− p)1−Xi
I Its often more convenient to maximizes the logarithm of a product form.
I
logP(X1, . . . ,Xn; p) =∑i
logP(Xi ; p)
=∑i
log(pXi (1− p)1−Xi
)=∑i
Xi log p +∑i
(1− Xi ) log(1− p)
I p = arg maxp
logP(X1, . . . ,Xn; p)
I ∑i Xip−
n −∑
i Xi1− p
= 0→ p =
∑i Xin
9
Maximum Likelihood EstimationLets start with an example.
I You have a n i.i.d. bernoulli random variables Xi ∼ Bernoulli(p).I Find the Maximum Likelihood Estimate of pI First write the likelihood of r.v i conveniently as P(Xi ; p) = pXi (1− p)1−XiI Now note that the r.v’s are all independent and so
P(X1, . . . ,Xn; p) =n∏
i=1
pXi (1− p)1−Xi
I Its often more convenient to maximizes the logarithm of a product form.
I
logP(X1, . . . ,Xn; p) =∑i
logP(Xi ; p)
=∑i
log(pXi (1− p)1−Xi
)=∑i
Xi log p +∑i
(1− Xi ) log(1− p)
I p = arg maxp
logP(X1, . . . ,Xn; p)
I ∑i Xip−
n −∑
i Xi1− p
= 0→ p =
∑i Xin
9
Maximum Likelihood EstimationLets start with an example.
I You have a n i.i.d. bernoulli random variables Xi ∼ Bernoulli(p).I Find the Maximum Likelihood Estimate of pI First write the likelihood of r.v i conveniently as P(Xi ; p) = pXi (1− p)1−XiI Now note that the r.v’s are all independent and so
P(X1, . . . ,Xn; p) =n∏
i=1
pXi (1− p)1−Xi
I Its often more convenient to maximizes the logarithm of a product form.
I
logP(X1, . . . ,Xn; p) =∑i
logP(Xi ; p)
=∑i
log(pXi (1− p)1−Xi
)=∑i
Xi log p +∑i
(1− Xi ) log(1− p)
I p = arg maxp
logP(X1, . . . ,Xn; p)
I ∑i Xip−
n −∑
i Xi1− p
= 0→ p =
∑i Xin
9
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I know σ2 butnot µ. Whats the MLE of µ?
I Notation: X = (X1, . . . ,Xn) and x = (x1, . . . , xn).
I First write fX (x ;µ, σ) =1√
2πσ2e− (xi−µ)
2
2σ2
I Now write the joint PDF fX (x ;µ, σ) =∏i
fXi(xi ;µ, σ)
I Now take the logarithm of the joint likelihood.
log fX (x ;µ, σ) = −n log(√
2πσ2)−∑i
(xi − µ)2
2σ2
I Derive w.r.t µ and set to zero to solve for µ. −∑
i (xi − µ)
σ2= 0
I Derive w.r.t µ and set to zero to solve for µ.∑i
xi − nµ = 0→ µ =
∑i xin
I The MLE is just the sample mean, which is often denoted by X
10
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I know σ2 butnot µ. Whats the MLE of µ?
I Notation: X = (X1, . . . ,Xn) and x = (x1, . . . , xn).
I First write fX (x ;µ, σ) =1√
2πσ2e− (xi−µ)
2
2σ2
I Now write the joint PDF fX (x ;µ, σ) =∏i
fXi(xi ;µ, σ)
I Now take the logarithm of the joint likelihood.
log fX (x ;µ, σ) = −n log(√
2πσ2)−∑i
(xi − µ)2
2σ2
I Derive w.r.t µ and set to zero to solve for µ. −∑
i (xi − µ)
σ2= 0
I Derive w.r.t µ and set to zero to solve for µ.∑i
xi − nµ = 0→ µ =
∑i xin
I The MLE is just the sample mean, which is often denoted by X
10
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I know σ2 butnot µ. Whats the MLE of µ?
I Notation: X = (X1, . . . ,Xn) and x = (x1, . . . , xn).
I First write fX (x ;µ, σ) =1√
2πσ2e− (xi−µ)
2
2σ2
I Now write the joint PDF fX (x ;µ, σ) =∏i
fXi(xi ;µ, σ)
I Now take the logarithm of the joint likelihood.
log fX (x ;µ, σ) = −n log(√
2πσ2)−∑i
(xi − µ)2
2σ2
I Derive w.r.t µ and set to zero to solve for µ. −∑
i (xi − µ)
σ2= 0
I Derive w.r.t µ and set to zero to solve for µ.∑i
xi − nµ = 0→ µ =
∑i xin
I The MLE is just the sample mean, which is often denoted by X
10
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I know σ2 butnot µ. Whats the MLE of µ?
I Notation: X = (X1, . . . ,Xn) and x = (x1, . . . , xn).
I First write fX (x ;µ, σ) =1√
2πσ2e− (xi−µ)
2
2σ2
I Now write the joint PDF fX (x ;µ, σ) =∏i
fXi(xi ;µ, σ)
I Now take the logarithm of the joint likelihood.
log fX (x ;µ, σ) = −n log(√
2πσ2)−∑i
(xi − µ)2
2σ2
I Derive w.r.t µ and set to zero to solve for µ. −∑
i (xi − µ)
σ2= 0
I Derive w.r.t µ and set to zero to solve for µ.∑i
xi − nµ = 0→ µ =
∑i xin
I The MLE is just the sample mean, which is often denoted by X
10
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I know σ2 butnot µ. Whats the MLE of µ?
I Notation: X = (X1, . . . ,Xn) and x = (x1, . . . , xn).
I First write fX (x ;µ, σ) =1√
2πσ2e− (xi−µ)
2
2σ2
I Now write the joint PDF fX (x ;µ, σ) =∏i
fXi(xi ;µ, σ)
I Now take the logarithm of the joint likelihood.
log fX (x ;µ, σ) = −n log(√
2πσ2)−∑i
(xi − µ)2
2σ2
I Derive w.r.t µ and set to zero to solve for µ. −∑
i (xi − µ)
σ2= 0
I Derive w.r.t µ and set to zero to solve for µ.∑i
xi − nµ = 0→ µ =
∑i xin
I The MLE is just the sample mean, which is often denoted by X
10
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I do not knowσ2 or µ. Whats the MLE of µ and σ2?
I We start with taking the logarithm of the joint likelihood.
log fX (x ;µ, σ) = −n log(√
2πσ2)−∑i
(xi − µ)2
2σ2∗
I First derive w.r.t µ and set to zero to solve for µ. −∑
i (xi − µ)
σ2= 0
I Next derive w.r.t σ and set to zero to solve for σ.
− n
σ+∑i
2(xi − nµ)2
2σ3= 0
∗Thanks to Aneesh Adhikari-Desai for correcting this!11
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I do not knowσ2 or µ. Whats the MLE of µ and σ2?
I We start with taking the logarithm of the joint likelihood.
log fX (x ;µ, σ) = −n log(√
2πσ2)−∑i
(xi − µ)2
2σ2∗
I First derive w.r.t µ and set to zero to solve for µ. −∑
i (xi − µ)
σ2= 0
I Next derive w.r.t σ and set to zero to solve for σ.
− n
σ+∑i
2(xi − nµ)2
2σ3= 0
∗Thanks to Aneesh Adhikari-Desai for correcting this!11
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I do not knowσ2 or µ. Whats the MLE of µ and σ2?
I We start with taking the logarithm of the joint likelihood.
log fX (x ;µ, σ) = −n log(√
2πσ2)−∑i
(xi − µ)2
2σ2∗
I First derive w.r.t µ and set to zero to solve for µ. −∑
i (xi − µ)
σ2= 0
I Next derive w.r.t σ and set to zero to solve for σ.
− n
σ+∑i
2(xi − nµ)2
2σ3= 0
∗Thanks to Aneesh Adhikari-Desai for correcting this!11
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I do not knowσ2 or µ. Whats the MLE of µ and σ2?
I We start with taking the logarithm of the joint likelihood.
log fX (x ;µ, σ) = −n log(√
2πσ2)−∑i
(xi − µ)2
2σ2∗
I First derive w.r.t µ and set to zero to solve for µ. −∑
i (xi − µ)
σ2= 0
I Next derive w.r.t σ and set to zero to solve for σ.
− n
σ+∑i
2(xi − nµ)2
2σ3= 0
∗Thanks to Aneesh Adhikari-Desai for correcting this!11
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I do not knowσ2 or µ. Whats the MLE of µ and σ2?
I Now you have two equations and two unknowns!†∑i (xi − µ)
σ2= 0
− n
σ+∑i
(xi − µ)2
σ3= 0
I Solving we see:
µ =∑i
xi/n = x σ2 =
∑i (xi − x)2
n
†Thanks to Aneesh Adhikari-Desai for correcting this!12
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I do not knowσ2 or µ. Whats the MLE of µ and σ2?
I Now you have two equations and two unknowns!†∑i (xi − µ)
σ2= 0
− n
σ+∑i
(xi − µ)2
σ3= 0
I Solving we see:
µ =∑i
xi/n = x σ2 =
∑i (xi − x)2
n
†Thanks to Aneesh Adhikari-Desai for correcting this!12
MLE estimate of mean of Gaussian r.v.’s
I have n iid random variables from a N(µ, σ2) distribution. I do not knowσ2 or µ. Whats the MLE of µ and σ2?
I Now you have two equations and two unknowns!†∑i (xi − µ)
σ2= 0
− n
σ+∑i
(xi − µ)2
σ3= 0
I Solving we see:
µ =∑i
xi/n = x σ2 =
∑i (xi − x)2
n
†Thanks to Aneesh Adhikari-Desai for correcting this!12
Estimating the lower limit of the Uniform
I have n independent Uniform([a, 1]) random variables. Whats the MLEof a? Although most time taking the log helps, sometimes its easier towork with the joint likelihood directly.
I First write f (Xi ; a) =1{a ≤ Xi ≤ 1}
(1− a)
I Now write f (X1,X2, . . .Xn; a) =1
(1− a)n
n∏i=1
1{a ≤ Xi ≤ 1} =
1{a ≤ min(X1,X2, . . . ) ≤ max(X1,X2, . . . ) ≤ 1}(1− a)n
‡
I How do I maximize this? Well, a has to be less thanmin(X1, . . . ,Xn), otherwise the likelihood will be zero.
I For all a ≤ min(X1, . . . ,Xn), what maximizes 1/(1− a)n?
I For all a ≤ min(X1, . . . ,Xn), what is the largest a? (larger a smaller1/(1− a)n.
I a = min(X1, . . . ,Xn)!
‡Thanks Miles Hutson for correcting this!13
Estimating the lower limit of the Uniform
I have n independent Uniform([a, 1]) random variables. Whats the MLEof a? Although most time taking the log helps, sometimes its easier towork with the joint likelihood directly.
I First write f (Xi ; a) =1{a ≤ Xi ≤ 1}
(1− a)
I Now write f (X1,X2, . . .Xn; a) =1
(1− a)n
n∏i=1
1{a ≤ Xi ≤ 1} =
1{a ≤ min(X1,X2, . . . ) ≤ max(X1,X2, . . . ) ≤ 1}(1− a)n
‡
I How do I maximize this? Well, a has to be less thanmin(X1, . . . ,Xn), otherwise the likelihood will be zero.
I For all a ≤ min(X1, . . . ,Xn), what maximizes 1/(1− a)n?
I For all a ≤ min(X1, . . . ,Xn), what is the largest a? (larger a smaller1/(1− a)n.
I a = min(X1, . . . ,Xn)!
‡Thanks Miles Hutson for correcting this!13
Estimating the lower limit of the Uniform
I have n independent Uniform([a, 1]) random variables. Whats the MLEof a? Although most time taking the log helps, sometimes its easier towork with the joint likelihood directly.
I First write f (Xi ; a) =1{a ≤ Xi ≤ 1}
(1− a)
I Now write f (X1,X2, . . .Xn; a) =1
(1− a)n
n∏i=1
1{a ≤ Xi ≤ 1} =
1{a ≤ min(X1,X2, . . . ) ≤ max(X1,X2, . . . ) ≤ 1}(1− a)n
‡
I How do I maximize this? Well, a has to be less thanmin(X1, . . . ,Xn), otherwise the likelihood will be zero.
I For all a ≤ min(X1, . . . ,Xn), what maximizes 1/(1− a)n?
I For all a ≤ min(X1, . . . ,Xn), what is the largest a? (larger a smaller1/(1− a)n.
I a = min(X1, . . . ,Xn)!
‡Thanks Miles Hutson for correcting this!13
Estimating the lower limit of the Uniform
I have n independent Uniform([a, 1]) random variables. Whats the MLEof a? Although most time taking the log helps, sometimes its easier towork with the joint likelihood directly.
I First write f (Xi ; a) =1{a ≤ Xi ≤ 1}
(1− a)
I Now write f (X1,X2, . . .Xn; a) =1
(1− a)n
n∏i=1
1{a ≤ Xi ≤ 1} =
1{a ≤ min(X1,X2, . . . ) ≤ max(X1,X2, . . . ) ≤ 1}(1− a)n
‡
I How do I maximize this? Well, a has to be less thanmin(X1, . . . ,Xn), otherwise the likelihood will be zero.
I For all a ≤ min(X1, . . . ,Xn), what maximizes 1/(1− a)n?
I For all a ≤ min(X1, . . . ,Xn), what is the largest a? (larger a smaller1/(1− a)n.
I a = min(X1, . . . ,Xn)!
‡Thanks Miles Hutson for correcting this!13
Estimating the lower limit of the Uniform
I have n independent Uniform([a, 1]) random variables. Whats the MLEof a? Although most time taking the log helps, sometimes its easier towork with the joint likelihood directly.
I First write f (Xi ; a) =1{a ≤ Xi ≤ 1}
(1− a)
I Now write f (X1,X2, . . .Xn; a) =1
(1− a)n
n∏i=1
1{a ≤ Xi ≤ 1} =
1{a ≤ min(X1,X2, . . . ) ≤ max(X1,X2, . . . ) ≤ 1}(1− a)n
‡
I How do I maximize this? Well, a has to be less thanmin(X1, . . . ,Xn), otherwise the likelihood will be zero.
I For all a ≤ min(X1, . . . ,Xn), what maximizes 1/(1− a)n?
I For all a ≤ min(X1, . . . ,Xn), what is the largest a? (larger a smaller1/(1− a)n.
I a = min(X1, . . . ,Xn)!
‡Thanks Miles Hutson for correcting this!13
Unbiased and asymptotically unbiased estimators
Recall that the MLE of σ2 obtained using n iid random variables from aN(µ, σ2) distribution are given by:
σ2 =
∑i (xi − x)2
nσ2 =
∑i
(xi − µ)2
n
This is also the sample variance. Is it unbiased?
I First note that∑i
(xi − x)2
n=∑i
x2i − 2xi x + x2
n
=∑i
x2in− 2x2 + x2
=∑i
x2in− x2
14
Unbiased and asymptotically unbiased estimators
Recall that the MLE of σ2 obtained using n iid random variables from aN(µ, σ2) distribution are given by:
σ2 =
∑i (xi − x)2
nσ2 =
∑i
(xi − µ)2
n
This is also the sample variance. Is it unbiased?
I First note that∑i
(xi − x)2
n=∑i
x2i − 2xi x + x2
n
=∑i
x2in− 2x2 + x2
=∑i
x2in− x2
14
Unbiased and asymptotically unbiased estimators
I E
∑i
(xi − x)2
n
= E [X21 ]− E [X2]
I E [X21 ] = σ2 + µ2 and E [X2] = var(X ) + E [X ]2 =
σ2
n+ µ2
I And so, E [σ2] = σ2 − σ2/n = σ2(1− 1/n)
I So the MLE of the variance is not unbiased. However it isasymptotically unbiased!
I Also you can have another estimator∑i
(xi − x)2/(n − 1), which is
not the MLE, but it is unbiased.
15
Unbiased and asymptotically unbiased estimators
I E
∑i
(xi − x)2
n
= E [X21 ]− E [X2]
I E [X21 ] = σ2 + µ2 and E [X2] = var(X ) + E [X ]2 =
σ2
n+ µ2
I And so, E [σ2] = σ2 − σ2/n = σ2(1− 1/n)
I So the MLE of the variance is not unbiased. However it isasymptotically unbiased!
I Also you can have another estimator∑i
(xi − x)2/(n − 1), which is
not the MLE, but it is unbiased.
15
Unbiased and asymptotically unbiased estimators
I E
∑i
(xi − x)2
n
= E [X21 ]− E [X2]
I E [X21 ] = σ2 + µ2 and E [X2] = var(X ) + E [X ]2 =
σ2
n+ µ2
I And so, E [σ2] = σ2 − σ2/n = σ2(1− 1/n)
I So the MLE of the variance is not unbiased. However it isasymptotically unbiased!
I Also you can have another estimator∑i
(xi − x)2/(n − 1), which is
not the MLE, but it is unbiased.
15
Unbiased and asymptotically unbiased estimators
I E
∑i
(xi − x)2
n
= E [X21 ]− E [X2]
I E [X21 ] = σ2 + µ2 and E [X2] = var(X ) + E [X ]2 =
σ2
n+ µ2
I And so, E [σ2] = σ2 − σ2/n = σ2(1− 1/n)
I So the MLE of the variance is not unbiased. However it isasymptotically unbiased!
I Also you can have another estimator∑i
(xi − x)2/(n − 1), which is
not the MLE, but it is unbiased.
15
Unbiased and asymptotically unbiased estimators
I E
∑i
(xi − x)2
n
= E [X21 ]− E [X2]
I E [X21 ] = σ2 + µ2 and E [X2] = var(X ) + E [X ]2 =
σ2
n+ µ2
I And so, E [σ2] = σ2 − σ2/n = σ2(1− 1/n)
I So the MLE of the variance is not unbiased. However it isasymptotically unbiased!
I Also you can have another estimator∑i
(xi − x)2/(n − 1), which is
not the MLE, but it is unbiased.
15