SDS 321: Introduction to Probability and Statistics ...psarkar/sds321/lecture24-new.pdf · I...

SDS 321: Introduction to Probability andStatistics

Lecture 24: Maximum Likelihood Estimation

Purnamrita SarkarDepartment of Statistics and Data Science

The University of Texas at Austin

www.cs.cmu.edu/∼psarkar/teaching

Roadmap

I Frequentist Statistics introduction

I Biased, Unbiased and Asymptotically unbiased estimators

I M.L.E and how to find it.

Frequentist Statistics

I The parameter(s) θ is fixed and unknown

I Data is generated through the likelihood function p(X ; θ) (ifdiscrete) or f (X ; θ) (if continuous).

I Now we will be dealing with multiple candidate models, one for eachvalue of θ

I We will use Eθ[h(X )] to define the expectation of the randomvariable h(X ) as a function of parameter θ

Problems we will look at

I Parameter estimation: We want to estimate unknown parametersfrom data.

I Maximum Likelihood estimation (section 9.1): Select theparameter that makes the observed data most likely.

I i.e. maximize the probability of obtaining the data at hand.

I Hypothesis testing: An unknown parameter takes a finite numberof values. One wants to find the best hypothesis based on the data.

I Significance testing: Given a hypothesis, figure out the rejectionregion and reject the hypothesis if the observation falls within thisregion.

Classical parameter estimation

We are given observations X = (X1, . . . ,Xn). An estimator is a randomvariable of the form Θ = g(X ) (sometime also denoted by Θn).

I Since the distribution of X depends on θ, so does the distribution Θn

I The mean and variance of Θn can be defined as Eθ[Θn] andvarθ(Θn).

I For simplicity we will also use E [.] and var[.] and drop the θ from thenotation

I The estimation error denoted by Θn = Θn − θ.

I Bias of an estimator is given by bθ(Θn) = Eθ[Θn]− θI An Unbiased estimator is one for which E [Θn] = θ

I An asymptotically unbiased estimator is one for whichlim

n→∞E [Θn] = θ

Maximum Likelihood Estimation

I Find the θ that maximizes the joint likelihood p(X1, . . . ,Xn; θ) (ofthe joint pdf for continuous random variables).

I We have P(Xi ; θ) for random variable Xi

I Often Xi are independent and so P(X1, . . . ,Xn; θ) =∏i

P(Xi ; θ)

I We want to calculate the Maximum Likelihood Estimate θ

I First calculate P(X1, . . . ,Xn; θ) =∏i

P(Xi ; θ)

I Now calculate the logarithm. logP(X1, . . . ,Xn; θ) =∑i

logP(Xi ; θ)

I Now take a derivative and set it to zero.d

logP(Xi ; θ) = 0 and

solve for θ

Likelihood vs. Log likelihood

(A) Likelihood (B) Log-likelihood[Takeaway]

I Loglikelihood is a monotonic function of the likelihood.

I So the maximum is achieved at the same point, albeit, in most caseswith a lot less computation.

Estimating the parameter of the Exponential

n customers arrive at a mall at times Yi . We take Y0 = 0. The interarrival times are Xi = Yi − Yi−1 are often modeled as i.i.d Exponential(λ)

r.v’s. We want to the MLE of λ.

I First write fXi(xi ;λ) = λe−λxi

I Now write the joint likelihood fX (x ;λ) =∏i

λe−λxi = λn∏i

e−λxi

I Now take logarithm of the joint likelihood.log fX (x ;λ) = n log λ− λ

I Differentiate and set to zero to get the MLE.

λ−∑i

xi = 0→ λ =1∑i xi/n

Maximum Likelihood EstimationLets start with an example.

I You have a n i.i.d. bernoulli random variables Xi ∼ Bernoulli(p).I Find the Maximum Likelihood Estimate of p

I First write the likelihood of r.v i conveniently as P(Xi ; p) = pXi (1− p)1−XiI Now note that the r.v’s are all independent and so

P(X1, . . . ,Xn; p) =n∏

pXi (1− p)1−Xi

I Its often more convenient to maximizes the logarithm of a product form.

logP(X1, . . . ,Xn; p) =∑i

logP(Xi ; p)

log(pXi (1− p)1−Xi

)=∑i

Xi log p +∑i

(1− Xi ) log(1− p)

I p = arg maxp

logP(X1, . . . ,Xn; p)

I ∑i Xip−

n −∑

i Xi1− p

= 0→ p =

∑i Xin

I You have a n i.i.d. bernoulli random variables Xi ∼ Bernoulli(p).I Find the Maximum Likelihood Estimate of pI First write the likelihood of r.v i conveniently as P(Xi ; p) = pXi (1− p)1−Xi

I Now note that the r.v’s are all independent and so

P(X1, . . . ,Xn; p) =n∏

pXi (1− p)1−Xi

logP(X1, . . . ,Xn; p) =∑i

logP(Xi ; p)

)=∑i

Xi log p +∑i

(1− Xi ) log(1− p)

I p = arg maxp

logP(X1, . . . ,Xn; p)

I ∑i Xip−

n −∑

i Xi1− p

= 0→ p =

∑i Xin

I You have a n i.i.d. bernoulli random variables Xi ∼ Bernoulli(p).I Find the Maximum Likelihood Estimate of pI First write the likelihood of r.v i conveniently as P(Xi ; p) = pXi (1− p)1−XiI Now note that the r.v’s are all independent and so

P(X1, . . . ,Xn; p) =n∏

pXi (1− p)1−Xi

logP(X1, . . . ,Xn; p) =∑i

logP(Xi ; p)

)=∑i

Xi log p +∑i

(1− Xi ) log(1− p)

I p = arg maxp

logP(X1, . . . ,Xn; p)

I ∑i Xip−

n −∑

i Xi1− p

= 0→ p =

∑i Xin

P(X1, . . . ,Xn; p) =n∏

pXi (1− p)1−Xi

logP(X1, . . . ,Xn; p) =∑i

logP(Xi ; p)

)=∑i

Xi log p +∑i

(1− Xi ) log(1− p)

I p = arg maxp

logP(X1, . . . ,Xn; p)

I ∑i Xip−

n −∑

i Xi1− p

= 0→ p =

∑i Xin

P(X1, . . . ,Xn; p) =n∏

pXi (1− p)1−Xi

logP(X1, . . . ,Xn; p) =∑i

logP(Xi ; p)

)=∑i

Xi log p +∑i

(1− Xi ) log(1− p)

I p = arg maxp

logP(X1, . . . ,Xn; p)

I ∑i Xip−

n −∑

i Xi1− p

= 0→ p =

∑i Xin

P(X1, . . . ,Xn; p) =n∏

pXi (1− p)1−Xi

logP(X1, . . . ,Xn; p) =∑i

logP(Xi ; p)

)=∑i

Xi log p +∑i

(1− Xi ) log(1− p)

I p = arg maxp

logP(X1, . . . ,Xn; p)

I ∑i Xip−

n −∑

i Xi1− p

= 0→ p =

∑i Xin

MLE estimate of mean of Gaussian r.v.’s

I have n iid random variables from a N(µ, σ2) distribution. I know σ2 butnot µ. Whats the MLE of µ?

I Notation: X = (X1, . . . ,Xn) and x = (x1, . . . , xn).

I First write fX (x ;µ, σ) =1√

2πσ2e− (xi−µ)

I Now write the joint PDF fX (x ;µ, σ) =∏i

fXi(xi ;µ, σ)

I Now take the logarithm of the joint likelihood.

log fX (x ;µ, σ) = −n log(√

2πσ2)−∑i

(xi − µ)2

I Derive w.r.t µ and set to zero to solve for µ. −∑

i (xi − µ)

σ2= 0

I Derive w.r.t µ and set to zero to solve for µ.∑i

xi − nµ = 0→ µ =

∑i xin

I The MLE is just the sample mean, which is often denoted by X

fXi(xi ;µ, σ)

2πσ2)−∑i

(xi − µ)2

i (xi − µ)

σ2= 0

xi − nµ = 0→ µ =

∑i xin

fXi(xi ;µ, σ)

2πσ2)−∑i

(xi − µ)2

i (xi − µ)

σ2= 0

xi − nµ = 0→ µ =

∑i xin

fXi(xi ;µ, σ)

2πσ2)−∑i

(xi − µ)2

i (xi − µ)

σ2= 0

xi − nµ = 0→ µ =

∑i xin

fXi(xi ;µ, σ)

2πσ2)−∑i

(xi − µ)2

i (xi − µ)

σ2= 0

xi − nµ = 0→ µ =

∑i xin

I have n iid random variables from a N(µ, σ2) distribution. I do not knowσ2 or µ. Whats the MLE of µ and σ2?

I We start with taking the logarithm of the joint likelihood.

2πσ2)−∑i

(xi − µ)2

2σ2∗

I First derive w.r.t µ and set to zero to solve for µ. −∑

i (xi − µ)

σ2= 0

I Next derive w.r.t σ and set to zero to solve for σ.

σ+∑i

2(xi − nµ)2

2σ3= 0

∗Thanks to Aneesh Adhikari-Desai for correcting this!11

2πσ2)−∑i

(xi − µ)2

2σ2∗

i (xi − µ)

σ2= 0

σ+∑i

2(xi − nµ)2

2σ3= 0

2πσ2)−∑i

(xi − µ)2

2σ2∗

i (xi − µ)

σ2= 0

σ+∑i

2(xi − nµ)2

2σ3= 0

2πσ2)−∑i

(xi − µ)2

2σ2∗

i (xi − µ)

σ2= 0

σ+∑i

2(xi − nµ)2

2σ3= 0

I Now you have two equations and two unknowns!†∑i (xi − µ)

σ2= 0

σ+∑i

(xi − µ)2

σ3= 0

I Solving we see:

µ =∑i

xi/n = x σ2 =

∑i (xi − x)2

†Thanks to Aneesh Adhikari-Desai for correcting this!12

σ2= 0

σ+∑i

(xi − µ)2

σ3= 0

I Solving we see:

µ =∑i

xi/n = x σ2 =

∑i (xi − x)2

σ2= 0

σ+∑i

(xi − µ)2

σ3= 0

I Solving we see:

µ =∑i

xi/n = x σ2 =

∑i (xi − x)2

Estimating the lower limit of the Uniform

I have n independent Uniform([a, 1]) random variables. Whats the MLEof a? Although most time taking the log helps, sometimes its easier towork with the joint likelihood directly.

I First write f (Xi ; a) =1{a ≤ Xi ≤ 1}

(1− a)

I Now write f (X1,X2, . . .Xn; a) =1

(1− a)n

n∏i=1

1{a ≤ Xi ≤ 1} =

1{a ≤ min(X1,X2, . . . ) ≤ max(X1,X2, . . . ) ≤ 1}(1− a)n

I How do I maximize this? Well, a has to be less thanmin(X1, . . . ,Xn), otherwise the likelihood will be zero.

I For all a ≤ min(X1, . . . ,Xn), what maximizes 1/(1− a)n?

I For all a ≤ min(X1, . . . ,Xn), what is the largest a? (larger a smaller1/(1− a)n.

I a = min(X1, . . . ,Xn)!

‡Thanks Miles Hutson for correcting this!13

(1− a)

(1− a)n

n∏i=1

1{a ≤ Xi ≤ 1} =

1{a ≤ min(X1,X2, . . . ) ≤ max(X1,X2, . . . ) ≤ 1}(1− a)n

I a = min(X1, . . . ,Xn)!

(1− a)

(1− a)n

n∏i=1

1{a ≤ Xi ≤ 1} =

1{a ≤ min(X1,X2, . . . ) ≤ max(X1,X2, . . . ) ≤ 1}(1− a)n

I a = min(X1, . . . ,Xn)!

(1− a)

(1− a)n

n∏i=1

1{a ≤ Xi ≤ 1} =

1{a ≤ min(X1,X2, . . . ) ≤ max(X1,X2, . . . ) ≤ 1}(1− a)n

I a = min(X1, . . . ,Xn)!

(1− a)

(1− a)n

n∏i=1

1{a ≤ Xi ≤ 1} =

1{a ≤ min(X1,X2, . . . ) ≤ max(X1,X2, . . . ) ≤ 1}(1− a)n

I a = min(X1, . . . ,Xn)!

Unbiased and asymptotically unbiased estimators

Recall that the MLE of σ2 obtained using n iid random variables from aN(µ, σ2) distribution are given by:

∑i (xi − x)2

nσ2 =

(xi − µ)2

This is also the sample variance. Is it unbiased?

I First note that∑i

(xi − x)2

n=∑i

x2i − 2xi x + x2

x2in− 2x2 + x2

x2in− x2

Recall that the MLE of σ2 obtained using n iid random variables from aN(µ, σ2) distribution are given by:

∑i (xi − x)2

nσ2 =

(xi − µ)2

This is also the sample variance. Is it unbiased?

I First note that∑i

(xi − x)2

n=∑i

x2i − 2xi x + x2

x2in− 2x2 + x2

x2in− x2

(xi − x)2

= E [X21 ]− E [X2]

I E [X21 ] = σ2 + µ2 and E [X2] = var(X ) + E [X ]2 =

n+ µ2

I And so, E [σ2] = σ2 − σ2/n = σ2(1− 1/n)

I So the MLE of the variance is not unbiased. However it isasymptotically unbiased!

I Also you can have another estimator∑i

(xi − x)2/(n − 1), which is

not the MLE, but it is unbiased.

(xi − x)2

= E [X21 ]− E [X2]

n+ µ2

I And so, E [σ2] = σ2 − σ2/n = σ2(1− 1/n)

(xi − x)2

= E [X21 ]− E [X2]

n+ µ2

I And so, E [σ2] = σ2 − σ2/n = σ2(1− 1/n)

(xi − x)2

= E [X21 ]− E [X2]

n+ µ2

I And so, E [σ2] = σ2 − σ2/n = σ2(1− 1/n)

(xi − x)2

= E [X21 ]− E [X2]

n+ µ2

I And so, E [σ2] = σ2 − σ2/n = σ2(1− 1/n)

SDS 321: Introduction to Probability and Statistics ...psarkar/sds321/lecture24-new.pdf · I...

Documents

Transcript of SDS 321: Introduction to Probability and Statistics ...psarkar/sds321/lecture24-new.pdf · I...

Lecture24 RConcrete Ret Walls

Lecture24 Abnormal 2014 - University of Arizonapsyc150allen.class.arizona.edu/.../files/Lecture24_Abnormal_2014.pdfA. Definition of abnormal behavior ... castrated, mutilated; ...

lecture24 - People | MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml13/slides/lecture24.pdf · David Sontag (NYU) Graphical Models Lecture 1, January 31, 2013 30 / 44. Bayesian networks

SDS 321: Introduction to Probability and Statisticspsarkar/sds321/lecture23-new.pdfSDS 321: Introduction to Probability and Statistics Lecture 23: Continuous random variables-Inequalities,

Lecture24 basiccircuits

Lecture24 MOS Transistors

(lecture24.ppt) - students.cs.ubc.ca

Hierarchical Linear Models - markirwin.netmarkirwin.net/stat220/Lecture/Lecture24.pdf · Fitting Hierarchical Linear Models Not surprisingly, exact distributional results for these

SDS 321: Introduction to Probability and Statistics ...psarkar/sds321/lecture1-ps.pdf · 1.Introduction to Probability, by Dimitri P. Bertsekas and John N. Tsitsiklis. 2.A First Course

ACC 2013 lecture24 handout - UMD

SDS 321: Introduction to Probability and Statistics ...psarkar/sds321/lecture3-ps.pdf · SDS 321: Introduction to Probability and Statistics Lecture 3: Conditional probability and

Lecture24 - physics.udel.edunowak/phys207/05s/Lecture24.pdf(a) case 1 (b) case 2 (c) same L L F F axis case 1 case 2 6 Lecture 23, Act 3 Solution zTorque = F x (distance of closest

Lecture24 Navier Stokes

SDS 321: Introduction to Probability and Statistics Lecture 7 ...psarkar/sds321/lecture7-ps.pdfSDS 321: Introduction to Probability and Statistics Lecture 7: Counting II Purnamrita

Lecture24. Fluid ﬂow and elasticity •Recap blood ﬂow •Solid and …courses.washington.edu/biomechs/lectures/lecture24.pdf · 2016-11-23 · stiffening of red blood cells (RBCs)

Red Sox - Yankees. Baseball can not get more exciting than ...teacher.pas.rochester.edu/phy121/LectureSlides/Lecture24/Lecture24.pdfFrank L. H. Wolfs Department of Physics and Astronomy,

CMPE306 Lecture24 Frequency Response 2 101201

Lecture 24 - inst.eecs.berkeley.educs164/fa13/lectures/lecture24.pdf · Lecture 24 PublicServiceAnnouncement: “Hackers@BerkeleyishostingBEARHACK thisSaturdayat11AMintheWozniakLounge.It’sa24hourhackathon-there’llbetonsofgoodfood

Lecture24 Ee474 Otac Filters

lecture24 MPI FirstPage - GitHub Pages