The E-M Algorithm
Biostatistics 615/815, Lecture 17
Center for Statistical Genetics
csg.sph.umich.edu/abecasis/class/2008/615.17.pdf


Last Lecture: The Simplex Method

General method for optimization
• Makes few assumptions about the function

Crawls towards the minimum

Some recommendations
• Multiple starting points
• Restart maximization at the proposed solution


Summary: The Simplex Method

[Figure: the original simplex, with its high and low points marked, and the basic moves at each step: reflection, contraction, multiple contraction, and expansion]


Improvements to amoeba()

Different scaling along each dimension
• If parameters have different impact on the likelihood

Track total function evaluations
• Avoid getting stuck if the function does not cooperate

Rotate the simplex
• If the current simplex is leading to slow improvement


optim() Function in R

optim(point, function, method)

• Point – starting point for minimization
• Function – accepts a point as its argument
• Method – can be:
  • "Nelder-Mead" for the simplex method (default)
  • "BFGS", "CG" and other options that use the gradient


Other Methods for Minimization in Multiple Dimensions

Typically, sophisticated methods will…

Use derivatives
• May be calculated numerically. How? (see the sketch below)

Select a direction for minimization, using:
• Weighted average of previous directions
• Current gradient
• Avoid right-angle turns
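
As an aside on the "How?" above, here is a minimal sketch (not part of the lecture) of estimating a gradient numerically with central differences. The objective typedef, the function name and the step size h are assumptions made for illustration.

typedef double (*objective)(double *theta, int k);

/* Estimate the gradient of f at theta by central differences:
   df/dtheta_i ~ ( f(theta_i + h) - f(theta_i - h) ) / (2h) */
void numerical_gradient(objective f, double *theta, int k,
                        double h, double *grad)
   {
   int i;

   for (i = 0; i < k; i++)
      {
      double saved = theta[i];
      double f_plus, f_minus;

      theta[i] = saved + h;
      f_plus = f(theta, k);

      theta[i] = saved - h;
      f_minus = f(theta, k);

      grad[i] = (f_plus - f_minus) / (2.0 * h);

      theta[i] = saved;   /* restore the point before moving on */
      }
   }

In practice the step size h has to be chosen relative to the scale of each parameter.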


One Parameter at a Time
Simple but inefficient approach

Consider
• Parameters θ = (θ1, θ2, …, θk)
• Function f(θ)

Maximize f with respect to each θi in turn
• Cycle through the parameters (a sketch follows below)
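
A rough illustration (not the course code) of this cycling strategy; a real implementation would use a proper one-dimensional optimizer (e.g. golden-section search) for each coordinate, and every name below is an assumption.

typedef double (*objective)(double *theta, int k);

/* Maximize f one coordinate at a time, using a crude step-halving search */
void maximize_one_at_a_time(objective f, double *theta, int k,
                            double step, int cycles)
   {
   int c, i;

   for (c = 0; c < cycles; c++)
      for (i = 0; i < k; i++)
         {
         double h = step;

         while (h > 1e-8)
            {
            double best = f(theta, k);

            theta[i] += h;                     /* try a step up */
            if (f(theta, k) > best) continue;  /* improved: keep it */

            theta[i] -= 2.0 * h;               /* otherwise try a step down */
            if (f(theta, k) > best) continue;

            theta[i] += h;                     /* no improvement: restore ... */
            h *= 0.5;                          /* ... and shrink the step */
            }
         }
   }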


The Inefficiency…

[Figure: contour plot over (θ1, θ2) showing the zig-zag path taken when optimizing one parameter at a time]


Steepest Descent

Consider
• Parameters θ = (θ1, θ2, …, θk)
• Function f(θ; x)

Score vector

$$S = \left(\frac{d \ln f}{d\theta_1}, \frac{d \ln f}{d\theta_2}, \ldots, \frac{d \ln f}{d\theta_k}\right)$$

• Find the maximum along θ + δS (a sketch of such a step follows below)
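
A hedged sketch (not from the lecture) of one such move: given the current score vector S, search along θ + δS for a good δ by step doubling, then update θ. A real implementation would bracket and refine δ more carefully; all names here are illustrative.

typedef double (*objective)(double *theta, int k);

/* Take one steepest-ascent step from theta along the score vector S */
double step_along_score(objective f, double *theta, double *S, int k,
                        double delta0)
   {
   int i;
   double best = f(theta, k);           /* value at delta = 0 */
   double delta = 0.0, step = delta0;

   while (1)
      {
      double value;
      double trial[k];                  /* C99 variable-length array */

      for (i = 0; i < k; i++)
         trial[i] = theta[i] + (delta + step) * S[i];

      value = f(trial, k);
      if (value <= best) break;         /* no further improvement along S */

      best = value;                     /* accept the larger delta ... */
      delta += step;
      step *= 2.0;                      /* ... and double the step */
      }

   for (i = 0; i < k; i++)              /* move to the best point found */
      theta[i] += delta * S[i];

   return best;
   }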


Still inefficient…

Consecutive steps are still perpendicular!


Other Strategies for Multidimensional Optimization

Most strategies will define a series of vectors or lines through parameter space

Estimate of the minimum is improved by adding an optimal multiple of each vector

Some "intuitive" choices might be:
• The function gradient
• Unit vectors along one dimension


The key is to avoid right-angle turns!

Most methods that use derivatives don't simply optimize the function along the current gradient or the unit vectors …


Today …

The E-M algorithm
• General algorithm for missing data problems

• Requires "specialization" to the problem at hand

• Frequently applied to mixture distributions


The E-M Algorithm

Original citation
• Dempster, Laird and Rubin (1977)
  J Royal Statistical Society (B) 39:1-38
• Cited in over 9,184 research articles

For comparison
• Nelder and Mead (1965)
  Computer Journal 7:308-313
• Cited in over 8,094 research articles


The Basic E-M Strategy

X = (Y, Z)
• Complete data X (e.g. what we'd like to have!)
• Observed data Y (e.g. individual observations)
• Missing data Z (e.g. class assignments)

The algorithm
• Use estimated parameters to infer Z
• Update estimated parameters using Y and Z
• Repeat until convergence


The E-M Algorithm

Consider a set of starting parameters

Use these to “estimate” the missing data

Use “complete” data to update parameters

Repeat as necessary


Setting for the E-M Algorithm...

Problem is simpler to solve for complete data
• Maximum likelihood estimates can be calculated using standard methods

Estimates of mixture parameters could be obtained in a straightforward manner if the origin of each observation is known…


Filling In Missing Data

The missing data is the group assignment for each observation

Complete data generated by assigning observations to groups
• Probabilistically
• We will use "fractional" assignments


The E-Step: Mixture of Normals

Estimate missing data
• Estimate assignment of observations to groups

How?
• Conditional on current parameter values

Basically, “classify” each observation


Classification Probabilities

$$\Pr(Z_i = j \mid x_i, \pi, \phi, \eta) = \frac{\pi_j \, f(x_i \mid \phi_j, \eta_j)}{\sum_l \pi_l \, f(x_i \mid \phi_l, \eta_l)}$$

Results from the application of Bayes' theorem

Implemented in the classprob() function
• classprob(int j, double x,
            int k, double *prob, double *mean, double *sd)
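
The prototype above is from the slides, but the body of classprob() is not shown. A minimal sketch of what it might look like, assuming an illustrative helper dnorm() for the normal density (both bodies are assumptions, not the course code):

#include <math.h>

/* Illustrative normal density; 0.3989... is 1/sqrt(2*pi) */
double dnorm(double x, double mean, double sd)
   {
   double z = (x - mean) / sd;
   return 0.3989422804014327 * exp(-0.5 * z * z) / sd;
   }

/* Sketch of classprob(): posterior probability that observation x
   belongs to component j, by Bayes' theorem */
double classprob(int j, double x,
                 int k, double *prob, double *mean, double *sd)
   {
   double numerator = prob[j] * dnorm(x, mean[j], sd[j]);
   double denominator = 0.0;
   int l;

   for (l = 0; l < k; l++)
      denominator += prob[l] * dnorm(x, mean[l], sd[l]);

   return numerator / denominator;
   }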


C Code: Updating Group Memberships

void update_class_prob(int n, double * data,
                       int k, double * prob, double * mean, double * sd,
                       double ** class_prob)
   {
   int i, j;

   for (i = 0; i < n; i++)
      for (j = 0; j < k; j++)
         class_prob[i][j] = classprob(j, data[i],
                                      k, prob, mean, sd);
   }


The M-Step
Update mixture parameters to maximize the likelihood of the data …

Appears tricky, but becomes simple when we assume cluster assignments are correct

We simply use the sample proportions, and weighted means and variances, to update parameters

This step is guaranteed never to decrease the likelihood (a sketch of why is given below)
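
This monotonicity is a general property of E-M, not just of mixtures. A standard sketch of why (not on the original slides), writing Q(θ | θ^(t)) for the expected complete-data log-likelihood:

$$\ln L(\theta; Y) = Q(\theta \mid \theta^{(t)}) - \mathrm{E}_{Z \mid Y, \theta^{(t)}}\!\left[\ln \Pr(Z \mid Y, \theta)\right],
\qquad Q(\theta \mid \theta^{(t)}) = \mathrm{E}_{Z \mid Y, \theta^{(t)}}\!\left[\ln L(\theta; Y, Z)\right]$$

$$\ln L(\theta^{(t+1)}; Y) - \ln L(\theta^{(t)}; Y)
= \underbrace{\left[Q(\theta^{(t+1)} \mid \theta^{(t)}) - Q(\theta^{(t)} \mid \theta^{(t)})\right]}_{\ge 0,\ \text{the M-step maximizes } Q}
+ \underbrace{\mathrm{E}_{Z \mid Y, \theta^{(t)}}\!\left[\ln \frac{\Pr(Z \mid Y, \theta^{(t)})}{\Pr(Z \mid Y, \theta^{(t+1)})}\right]}_{\ge 0,\ \text{a Kullback-Leibler divergence}}
\ \ge\ 0$$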


Updating Mixture Proportions

$$\hat{\pi}_j = \frac{\sum_i \Pr(Z_i = j \mid x_i, \pi, \phi, \eta)}{n}$$

"Count" the observations assigned to each group


C Code: Updating Mixture Proportions

void update_prob(int n, double * data,
                 int k, double * prob,
                 double ** class_prob)
   {
   int i, j;

   for (j = 0; j < k; j++)
      {
      prob[j] = 0.0;

      for (i = 0; i < n; i++)
         prob[j] += class_prob[i][j];

      prob[j] /= n;
      }
   }


Updating Component Means

$$\hat{\mu}_j = \frac{\sum_i x_i \Pr(Z_i = j \mid x_i, \pi, \phi, \eta)}{\sum_i \Pr(Z_i = j \mid x_i, \pi, \phi, \eta)} = \frac{\sum_i x_i \Pr(Z_i = j \mid x_i, \pi, \phi, \eta)}{n \hat{\pi}_j}$$

Calculate a weighted mean for each group
Weights are the probabilities of group membership


C Code: Update Component Means

void update_mean(int n, double * data,
                 int k, double * prob, double * mean,
                 double ** class_prob)
   {
   int i, j;

   for (j = 0; j < k; j++)
      {
      mean[j] = 0.0;

      for (i = 0; i < n; i++)
         mean[j] += data[i] * class_prob[i][j];

      mean[j] /= n * prob[j] + TINY;
      }
   }


Updating Component Variances

$$\hat{\sigma}_j^2 = \frac{\sum_i (x_i - \hat{\mu}_j)^2 \Pr(Z_i = j \mid x_i, \pi, \phi, \eta)}{n \hat{\pi}_j}$$

Calculate a weighted sum of squared differences
Weights are the probabilities of group membership


C Code: Update Component Std Deviations

void update_sd(int n, double * data,
               int k, double * prob, double * mean, double * sd,
               double ** class_prob)
   {
   int i, j;

   for (j = 0; j < k; j++)
      {
      sd[j] = 0.0;

      for (i = 0; i < n; i++)
         sd[j] += square(data[i] - mean[j]) * class_prob[i][j];

      sd[j] /= (n * prob[j] + TINY);
      sd[j] = sqrt(sd[j]);
      }
   }


C Code: Update Mixture

void update_parameters(int n, double * data,
                       int k, double * prob, double * mean, double * sd,
                       double ** class_prob)
   {
   // First, we update the mixture proportions
   update_prob(n, data, k, prob, class_prob);

   // Next, update the mean for each component
   update_mean(n, data, k, prob, mean, class_prob);

   // Finally, update the standard deviations
   update_sd(n, data, k, prob, mean, sd, class_prob);
   }


E-M Algorithm For Mixtures

1. “Guesstimate” starting parameters

2. Use Bayes' theorem to calculate group assignment probabilities

3. Update parameters using estimated assignments

4. Repeat steps 2 and 3 until the likelihood is stable


C Code: The E-M Algorithm

double em(int n, double * data,
          int k, double * prob, double * mean, double * sd,
          double eps)
   {
   double llk = 0, prev_llk = 0;
   double ** class_prob = alloc_matrix(n, k);

   start_em(n, data, k, prob, mean, sd);

   do {
      prev_llk = llk;
      update_class_prob(n, data, k, prob, mean, sd, class_prob);
      update_parameters(n, data, k, prob, mean, sd, class_prob);
      llk = mixLLK(n, data, k, prob, mean, sd);
      } while ( !check_tol(llk, prev_llk, eps) );

   return llk;
   }
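
em() relies on helpers that are not shown on the slides: alloc_matrix(), start_em() (next slide), mixLLK() and check_tol(). Their call signatures are fixed by em() above, but the bodies below are only a hedged sketch of what they might look like, not the course code; dnorm() is the illustrative normal density from the earlier classprob() sketch.

#include <math.h>

double dnorm(double x, double mean, double sd);   /* illustrative helper, sketched earlier */

/* Sketch: log-likelihood of the data under the current mixture,
   summing log( sum_j prob[j] * Normal(data[i]; mean[j], sd[j]) ) over i */
double mixLLK(int n, double * data,
              int k, double * prob, double * mean, double * sd)
   {
   double llk = 0.0;
   int i, j;

   for (i = 0; i < n; i++)
      {
      double density = 0.0;

      for (j = 0; j < k; j++)
         density += prob[j] * dnorm(data[i], mean[j], sd[j]);

      llk += log(density);
      }

   return llk;
   }

/* Sketch: declare convergence when successive log-likelihoods
   differ by less than eps */
int check_tol(double llk, double prev_llk, double eps)
   {
   return fabs(llk - prev_llk) < eps;
   }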


Picking Starting Parameters

Mixing proportions
Assumed equal

Means for each group
Pick one observation as the group mean

Variances for each group
Use the overall variance


C Code: Picking Starting Parameters

void start_em(int n, double * data,
              int k, double * prob, double * mean, double * sd)
   {
   int i, j;
   double mean1 = 0.0, sd1 = 0.0;

   for (i = 0; i < n; i++)
      mean1 += data[i];
   mean1 /= n;

   for (i = 0; i < n; i++)
      sd1 += square(data[i] - mean1);
   sd1 = sqrt(sd1 / n);

   for (j = 0; j < k; j++)
      {
      prob[j] = 1.0 / k;
      mean[j] = data[rand() % n];
      sd[j] = sd1;
      }
   }


Example Application
Old Faithful Eruptions (n = 272)

[Figure: histogram titled "Old Faithful Eruptions", Frequency versus Duration (mins), roughly 1.5 to 5.0]


Using Simplex Method
A Mixture of Two Normals

Fit 5 parameters
• Proportion in 1st component, 2 means, 2 variances

44/50 runs found the minimum

Required about 700 evaluations
• First component contributes 0.348 of the mixture
• Means are 2.018 and 4.273
• Variances are 0.055 and 0.191
• Maximum log-likelihood = -276.36


Using E-M Algorithm
A Mixture of Two Normals

Fit 5 parameters

50/50 runs found the maximum

Required about 25 evaluations
• First component contributes 0.348 of the mixture
• Means are 2.018 and 4.273
• Variances are 0.055 and 0.191
• Maximum log-likelihood = -276.36


Two Components

[Figure: two panels. Left: "Old Faithful Eruptions" histogram, Frequency versus Duration (mins), 1 to 6. Right: "Fitted Distribution" showing the fitted two-component mixture density versus Duration (mins)]


Simplex Method:
A Mixture of Three Normals

Fit 8 parameters
• 2 proportions, 3 means, 3 variances

Required about 1400 evaluations
• Found the best solution in 7/50 runs
• Other solutions effectively included only 2 components

The best solution …
• Components contributing 0.339, 0.512 and 0.149
• Component means are 2.002, 4.401 and 3.727
• Variances are 0.0455, 0.106 and 0.2959
• Maximum log-likelihood = -267.89


Three Components

[Figure: two panels. Left: "Old Faithful Eruptions" histogram, Frequency versus Duration (mins), 1 to 6. Right: "Fitted Distribution" showing the simplex-fitted three-component mixture density versus Duration (mins)]


E-M Algorithm:
A Mixture of Three Normals

Fit 8 parameters
• 2 proportions, 3 means, 3 variances

Required about 150 evaluations
• Found a log-likelihood of about -267.89 in 42/50 runs
• Found a log-likelihood of about -263.91 in 7/50 runs

The best solution …
• Components contributing 0.160, 0.195 and 0.644
• Component means are 1.856, 2.182 and 4.289
• Variances are 0.00766, 0.0709 and 0.172
• Maximum log-likelihood = -263.91


Three Components

[Figure: two panels. Left: "Old Faithful Eruptions" histogram, Frequency versus Duration (mins), 1 to 6. Right: "Fitted Density" showing the E-M fitted three-component mixture density versus Duration (mins)]


Convergence for E-M Algorithm

[Figure: log-likelihood versus iteration (0 to 200) for the E-M algorithm; one panel spans roughly -500 to -200, a second zooms in on roughly -270 to -266 near convergence]


Convergence for E-M Algorithm

[Figure: estimated mixture means (roughly 1 to 5) versus iteration (0 to 200), showing the component means converging]


E-M Algorithm:
A Mixture of Four Normals

Fit 11 parameters
• 3 proportions, 4 means, 4 variances

Required about 300 evaluations
• Found a log-likelihood of about -267.89 in 1/50 runs
• Found a log-likelihood of about -263.91 in 2/50 runs
• Found a log-likelihood of about -257.46 in 47/50 runs

"Appears" more reliable than with 3 components


Four Components

[Figure: two panels. Left: "Old Faithful Eruptions" histogram, Frequency versus Duration (mins), 1 to 6. Right: the fitted four-component mixture density versus Duration (mins)]


Today …

The E-M algorithm

Missing data formulation

Application to mixture distributions
• Consider multiple starting points


Further Reading

There is a nice discussion of the E-M algorithm, with application to mixtures at:

http://en.wikipedia.org/wiki/EM_algorithm