Maximum Likelihood Estimation for Allele Frequencies

Biostatistics 666

Previous Series of Lectures: Introduction to Coalescent Models

• Computationally efficient framework

• Alternative to forward simulations

• Amenable to analytical solutions

• Predictions about sequence variation
  • Number of polymorphisms
  • Frequency of polymorphisms
  • Distribution of polymorphisms across haplotypes

Next Series of Lectures

• Estimating allele and haplotype frequencies from genotype data
  • Maximum likelihood approach
  • Application of an E-M algorithm

• Challenges
  • Using information from related individuals
  • Allowing for non-codominant genotypes
  • Allowing for ambiguity in haplotype assignments

Maximum Likelihood

• A general framework for estimating model parameters
  • Find parameter values that maximize the probability of the observed data

• Learn about population characteristics
  • E.g. allele frequencies, population size

• Using a specific sample
  • E.g. a set of sequences, unrelated individuals, or even families

• Applicable to many different problems

Example: Allele Frequencies

• Consider…
  • A sample of n chromosomes
  • X of these are of type “a”
  • Parameter of interest is allele frequency…

$$L(p \mid n, X) = \binom{n}{X} p^{X} (1-p)^{n-X}$$

Evaluate for various parameters

p      1-p    L
0.0    1.0    0.000
0.2    0.8    0.088
0.4    0.6    0.251
0.6    0.4    0.111
0.8    0.2    0.006
1.0    0.0    0.000

For n = 10 and X = 4
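The table can be reproduced by evaluating the binomial likelihood directly. The following is a minimal sketch; the function name `binomial_likelihood` is ours and does not come from the lecture.

```python
from math import comb

def binomial_likelihood(p, n, x):
    """L(p | n, X): probability of x copies of allele "a" among n chromosomes."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, x = 10, 4
for p in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    # Columns: p, 1-p, likelihood
    print(f"{p:.1f}  {1 - p:.1f}  {binomial_likelihood(p, n, x):.3f}")
```

Evaluating on a finer grid of p values gives the likelihood curve rather than six points.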

Likelihood Plot

[Figure: likelihood (vertical axis, 0 to 0.4) plotted against allele frequency (horizontal axis, 0.0 to 1.0), for n = 10 and X = 4]

In this case

• The likelihood tells us the data is most probable if p = 0.4

• The likelihood curve allows us to evaluate alternatives…
  • Is p = 0.8 a possibility?
  • Is p = 0.2 a possibility?

Example: Estimating 4Nμ

• Consider S polymorphisms in a sample of n sequences…

$$L(\theta \mid n, S) = P_n(S \mid \theta)$$

• Where θ = 4Nμ and P_n is calculated using the Q_n and P_2 functions defined previously

Likelihood Plot

[Figure: likelihood plotted against 4Nμ, with n = 5 and S = 10; the MLE is labeled]

Maximum Likelihood Estimation

• Two basic steps…

a) Write down the likelihood function

$$L(\theta \mid x) \propto f(x \mid \theta)$$

b) Find the value of $\hat{\theta}$ that maximizes $L(\theta \mid x)$

• In principle, applicable to any problem where a likelihood function exists

MLEs

• Parameter values that maximize likelihood
  • where observations have maximum probability

• Finding MLEs is an optimization problem

• How do MLEs compare to other estimators?

Comparing Estimators

• How do MLEs rate in terms of …
  • Unbiasedness
  • Consistency
  • Efficiency

• For a review, see Garthwaite, Jolliffe, Jones (1995) Statistical Inference, Prentice Hall

Analytical Solutions

• Write out log-likelihood …

$$\ell(\theta \mid \text{data}) = \ln L(\theta \mid \text{data})$$

• Calculate derivative of log-likelihood

$$\frac{d\,\ell(\theta \mid \text{data})}{d\theta}$$

• Find zeros for derivative function

Information

• The second derivative is also extremely useful

$$I_{\theta} = -E\left[\frac{d^{2}\,\ell(\theta \mid \text{data})}{d\theta^{2}}\right]$$

• The speed at which log-likelihood decreases

• Provides an asymptotic variance for estimates

$$V_{\hat{\theta}} = \frac{1}{I_{\theta}}$$
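As a rough numerical illustration of how the second derivative yields a variance, the sketch below approximates it by central differences for the binomial allele-frequency example (n = 10, X = 4). It uses the observed second derivative at the MLE as a stand-in for the expectation; for this model the two agree at the MLE. The function names and the step size `h` are our own choices.

```python
from math import log

def log_lik(p, n, x):
    """Binomial log-likelihood, dropping the constant ln C(n, x)."""
    return x * log(p) + (n - x) * log(1 - p)

def observed_information(p, n, x, h=1e-4):
    """Minus the second derivative of log_lik at p, by central differences."""
    second = (log_lik(p + h, n, x) - 2 * log_lik(p, n, x) + log_lik(p - h, n, x)) / h**2
    return -second

n, x = 10, 4
p_hat = x / n
info = observed_information(p_hat, n, x)
# 1 / info should be close to p_hat * (1 - p_hat) / n
print(info, 1 / info, p_hat * (1 - p_hat) / n)
```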


Allele Frequency Estimation …

• When individual chromosomes are observed this is not so tricky…

• What about with genotypes?

• What about with parent-offspring pairs?

Coming up …

• We will walk through allele frequency estimation in three distinct settings:

• Samples of single chromosomes …

• Samples of unrelated individuals …

• Samples of parents and offspring …

I. Single Alleles Observed

• Consider…
  • A sample of n chromosomes
  • X of these are of type “a”
  • Parameter of interest is allele frequency…

$$L(p \mid n, X) = \binom{n}{X} p^{X} (1-p)^{n-X}$$

Some Notes

• The following two likelihoods are just as good:

$$L(p; X, n) = \binom{n}{X} p^{X} (1-p)^{n-X}$$

$$L(p; x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i} (1-p)^{1-x_i}$$

(here x_i is 1 if chromosome i carries allele “a” and 0 otherwise)

• For ML estimation, constant factors in likelihood don’t matter

Analytic Solution

• The log-likelihood

$$\ln L(p \mid n, X) = \ln \binom{n}{X} + X \ln p + (n - X) \ln(1 - p)$$

• The derivative

$$\frac{d \ln L(p \mid X)}{dp} = \frac{X}{p} - \frac{n - X}{1 - p}$$

• Find zero …

$$\hat{p} = \frac{X}{n}$$
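A quick numerical check of the analytic solution: the derivative of the binomial log-likelihood should be positive below X/n, zero at X/n, and negative above it. This is only a sketch; the name `score` and the values n = 10, X = 4 are chosen for illustration.

```python
def score(p, n, x):
    """d ln L / dp for the binomial likelihood."""
    return x / p - (n - x) / (1 - p)

n, x = 10, 4
p_hat = x / n
# The sign of the score changes at p_hat, where it vanishes.
print(score(0.3, n, x), score(p_hat, n, x), score(0.5, n, x))
```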

Samples of Individual Chromosomes

• The natural estimator (where we count the proportion of sequences of a particular type) and the MLE give identical solutions

• Maximum likelihood provides a justification for using the “natural” estimator

II. Genotypes Observed

• Use notation nij to denote the number of individuals with genotype i / j

• Sample of n individuals

Genotype Counts

Genotype           A1A1    A1A2    A2A2    Total
Observed Counts    n11     n12     n22     n = n11 + n12 + n22
Frequency          p11     p12     p22     1.0

Allele Frequencies by Counting…

• A natural estimate for allele frequencies is to calculate the proportion of chromosomes carrying each allele

Allele Counts

Allele             A1                 A2                 Total
Observed Counts    n1 = 2n11 + n12    n2 = 2n22 + n12    2n = n1 + n2
Frequency          p1 = n1/2n         p2 = n2/2n         1.0
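The counting estimate in the table translates directly into code. A minimal sketch follows; the function name and the genotype counts are made up for illustration.

```python
def allele_frequencies(n11, n12, n22):
    """Allele frequencies by counting: each individual carries two alleles."""
    n = n11 + n12 + n22
    n1 = 2 * n11 + n12   # copies of A1
    n2 = 2 * n22 + n12   # copies of A2
    return n1 / (2 * n), n2 / (2 * n)

p1, p2 = allele_frequencies(n11=30, n12=50, n22=20)  # made-up counts
print(p1, p2)
```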

MLE using genotype data…

• Consider a sample such as ...

Genotype Counts

Genotype           A1A1    A1A2    A2A2    Total
Observed Counts    n11     n12     n22     n = n11 + n12 + n22
Frequency          p11     p12     p22     1.0

• The likelihood as a function of allele frequencies is …

$$L(p; n) = \frac{n!}{n_{11}!\, n_{12}!\, n_{22}!} \left(p^{2}\right)^{n_{11}} (2pq)^{n_{12}} \left(q^{2}\right)^{n_{22}}$$

where p is the frequency of allele A1 and q = 1 − p.

Which gives…

• Log-likelihood and its derivative

$$\ln L = (2n_{11} + n_{12}) \ln p_1 + (2n_{22} + n_{12}) \ln(1 - p_1) + C$$

$$\frac{d \ln L}{dp_1} = \frac{2n_{11} + n_{12}}{p_1} - \frac{2n_{22} + n_{12}}{1 - p_1}$$

• Giving the MLE as …

$$\hat{p}_1 = \frac{2n_{11} + n_{12}}{2(n_{11} + n_{12} + n_{22})}$$
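To see that the closed-form MLE really maximizes the genotype likelihood, one can compare it with a brute-force search. The sketch below assumes Hardy-Weinberg proportions, as the likelihood does, and uses made-up genotype counts and a grid of 999 candidate values; all names are ours.

```python
from math import log

def log_lik(p1, n11, n12, n22):
    """Hardy-Weinberg log-likelihood, up to the additive constant C."""
    return (2 * n11 + n12) * log(p1) + (2 * n22 + n12) * log(1 - p1)

n11, n12, n22 = 30, 50, 20                      # made-up genotype counts
grid = [i / 1000 for i in range(1, 1000)]       # candidate values 0.001 .. 0.999
p_grid = max(grid, key=lambda p: log_lik(p, n11, n12, n22))
p_closed = (2 * n11 + n12) / (2 * (n11 + n12 + n22))
print(p_grid, p_closed)
```

A grid search is wasteful in practice; it is used here only because it makes no assumptions beyond the likelihood itself.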

Samples of Unrelated Individuals

• Again, the natural estimator (where we count the proportion of alleles of a particular type) and the MLE give identical solutions

• Maximum likelihood provides a justification for using the “natural” estimator

III. Parent-Offspring Pairs

                          Child
Parent    A1A1       A1A2        A2A2       Total
A1A1      a1         a2          0          a1+a2
A1A2      a3         a4          a5         a3+a4+a5
A2A2      0          a6          a7         a6+a7
Total     a1+a3      a2+a4+a6    a5+a7      N pairs

Probability for Each Observation

                          Child
Parent    A1A1       A1A2       A2A2       Total
A1A1      p1³        p1² p2     0          p1²
A1A2      p1² p2     p1 p2      p1 p2²     2 p1 p2
A2A2      0          p1 p2²     p2³        p2²
Total     p1²        2 p1 p2    p2²        1.0
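The cell probabilities can be checked by confirming that the rows and columns add up to the Hardy-Weinberg genotype frequencies and that the whole table sums to 1. This is a sketch under the random-mating assumption behind the table; the function name and the value p1 = 0.3 are illustrative.

```python
def pair_probabilities(p1):
    """Joint probabilities of (parent, child) genotypes under random mating.

    Rows are the parent genotype, columns the child genotype,
    both ordered A1A1, A1A2, A2A2.
    """
    p2 = 1 - p1
    return [
        [p1**3,      p1**2 * p2, 0.0],
        [p1**2 * p2, p1 * p2,    p1 * p2**2],
        [0.0,        p1 * p2**2, p2**3],
    ]

table = pair_probabilities(0.3)
row_sums = [sum(row) for row in table]          # parent genotype frequencies
col_sums = [sum(col) for col in zip(*table)]    # child genotype frequencies
print(row_sums, col_sums, sum(row_sums))
```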

Which gives…

$$\ln L = a_1 \ln p_1^{3} + (a_2 + a_3) \ln p_1^{2} p_2 + a_4 \ln p_1 p_2 + (a_5 + a_6) \ln p_1 p_2^{2} + a_7 \ln p_2^{3} + \text{constant}$$

$$\ln L = B \ln p_1 + C \ln(1 - p_1) + \text{constant}$$

$$B = 3a_1 + 2a_2 + 2a_3 + a_4 + a_5 + a_6$$

$$C = a_2 + a_3 + a_4 + 2a_5 + 2a_6 + 3a_7$$

$$\hat{p}_1 = \frac{B}{B + C}$$

where $p_2 = 1 - p_1$.
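The sketch below computes B, C and the resulting MLE from the seven pair counts, alongside the plain counting estimate (copies of A1 among all 4N alleles in the pairs). The counts are invented for illustration and the function names are ours.

```python
def pair_mle(a):
    """MLE of p1 from the seven pair counts a = (a1, ..., a7)."""
    a1, a2, a3, a4, a5, a6, a7 = a
    B = 3 * a1 + 2 * (a2 + a3) + a4 + a5 + a6
    C = a2 + a3 + a4 + 2 * (a5 + a6) + 3 * a7
    return B / (B + C)

def pair_counting(a):
    """Counting estimate: A1 copies among all 4N alleles in the N pairs."""
    a1, a2, a3, a4, a5, a6, a7 = a
    n_pairs = sum(a)
    copies = 4 * a1 + 3 * (a2 + a3) + 2 * a4 + (a5 + a6)
    return copies / (4 * n_pairs)

counts = (10, 15, 15, 30, 10, 10, 10)   # made-up counts a1..a7
print(pair_mle(counts), pair_counting(counts))
```

With these counts the two estimates differ, because counting treats the allele shared between parent and child as two independent observations.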

Samples of Parent-Offspring Pairs

• The natural estimator (where we count the proportion of alleles of a particular type) and the MLE no longer give identical solutions

• In this case, we expect the MLE to be more accurate

Comparing Sampling Strategies

• We can compare sampling strategies by calculating the information for each one

$$I_{\theta} = -E\left[\frac{d^{2}\,\ell(\theta \mid \text{data})}{d\theta^{2}}\right], \qquad V_{\hat{\theta}} = \frac{1}{I_{\theta}}$$

• Which one do you expect to be most informative?

How informative is each setting?

• Single chromosomes

$$\mathrm{Var}(p) = \frac{pq}{N_{\text{chromosomes}}}$$

• Unrelated individuals

$$\mathrm{Var}(p) = \frac{pq}{2N_{\text{individuals}}}$$

• Parent offspring pairs

$$\mathrm{Var}(p) = \frac{pq}{3N_{\text{pairs}} - a_4}$$
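For a feel of the magnitudes, the three variance formulas can be evaluated side by side for the same number of sampled units. The values of p, N and a4 below are illustrative only, and the function names are ours.

```python
def var_chromosomes(p, n):
    """N single chromosomes."""
    return p * (1 - p) / n

def var_individuals(p, n):
    """N unrelated individuals (2N chromosomes)."""
    return p * (1 - p) / (2 * n)

def var_pairs(p, n, a4):
    """N parent-offspring pairs, a4 of them heterozygous in both members."""
    return p * (1 - p) / (3 * n - a4)

p, n, a4 = 0.5, 100, 30   # illustrative values only
print(var_chromosomes(p, n), var_individuals(p, n), var_pairs(p, n, a4))
```

Per sampled unit, a pair is worth somewhat fewer than three independent chromosomes, since parent and child share one allele and doubly heterozygous pairs are ambiguous.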

Other Likelihoods

• Allele frequencies when individuals are…
  • Diagnosed for Mendelian disorder
  • Genotyped at two neighboring loci
  • Phenotyped for the ABO blood groups

• Many other interesting problems…

• … but some have no analytical solution

Today’s Summary

• Examples of Maximum Likelihood

• Allele Frequency Estimation
  • Allele counts
  • Genotype counts
  • Pairs of Individuals

Take home reading

• Excoffier and Slatkin (1995)
  • Mol Biol Evol 12:921-927
  • Introduces the E-M algorithm
  • Widely used for maximizing likelihoods in genetic problems

Properties of Estimators

For Review

Unbiasedness

• An estimator is unbiased if

$$E(\hat{\theta}) = \theta$$

$$\mathrm{bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$

• Multiple unbiased estimators may exist

• Other properties may be desirable

Consistency

• An estimator is consistent if

$$P\left(|\hat{\theta} - \theta| > \varepsilon\right) \to 0 \quad \text{as } n \to \infty$$

• for any ε > 0

• Estimate converges to true value in probability with increasing sample size

Mean Squared Error

• MSE is defined as

$$\mathrm{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^{2}\right] = \mathrm{var}(\hat{\theta}) + \mathrm{bias}(\hat{\theta})^{2}$$

• If MSE → 0 as n → ∞ then the estimator must be consistent
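For the allele-frequency estimator p̂ = X/n, the MSE can be computed exactly by summing over every possible value of X, which shows it shrinking toward zero as n grows. This is a sketch assuming X is binomial with true frequency p = 0.4; the function name is ours.

```python
from math import comb

def exact_mse(p, n):
    """MSE of p_hat = X/n, summing over every possible X ~ Binomial(n, p)."""
    total = 0.0
    for x in range(n + 1):
        prob = comb(n, x) * p**x * (1 - p)**(n - x)
        total += prob * (x / n - p) ** 2
    return total

for n in (10, 100, 500):
    # Since p_hat is unbiased, the MSE equals the variance p(1-p)/n.
    print(n, exact_mse(0.4, n), 0.4 * 0.6 / n)
```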

Efficiency

• The relative efficiency of two estimators is the ratio of their variances

$$\text{if } \frac{\mathrm{var}(\hat{\theta}_2)}{\mathrm{var}(\hat{\theta}_1)} > 1 \text{ then } \hat{\theta}_1 \text{ is more efficient}$$

• Comparison only meaningful for estimators with equal biases

Sufficiency

• Consider…
  • Observations X1, X2, … Xn
  • Statistic T(X1, X2, … Xn)

• T is a sufficient statistic if it includes all information about parameter θ in the sample
  • Distribution of Xi conditional on T is independent of θ
  • Posterior distribution of θ conditional on T is independent of Xi


Minimal Sufficient Statistic

• There can be many alternative sufficient statistics.

• A statistic is a minimal sufficient statistic if it can be expressed as a function of every other sufficient statistic.

Typical Properties of MLEs

• Bias
  • Can be biased or unbiased

• Consistency
  • Subject to regularity conditions, MLEs are consistent

• Efficiency
  • Typically, MLEs are asymptotically efficient estimators

• Sufficiency
  • Often, but not always

• Cox and Hinkley, 1974


Strategies for Likelihood Optimization

For Review


Generic Approaches

• Suitable for when analytical solutions are impractical

• Bracketing

• Simplex Method

• Newton-Raphson

Bracketing

• Find 3 points such that
  • a < b < c
  • L(b) > L(a) and L(b) > L(c)

• Search for maximum by
  • Select trial point in interval
  • Keep maximum and flanking points
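One common way to choose the trial points is golden-section search, sketched below for the binomial log-likelihood with n = 10 and X = 4. The lecture does not say which rule it uses, so the golden-ratio spacing, the tolerance and all names here are our assumptions.

```python
from math import log, sqrt

def golden_section_max(f, a, c, tol=1e-8):
    """Shrink the bracket [a, c] around a maximum of a unimodal f."""
    ratio = (sqrt(5) - 1) / 2          # about 0.618
    b1 = c - ratio * (c - a)           # two interior trial points, a < b1 < b2 < c
    b2 = a + ratio * (c - a)
    f1, f2 = f(b1), f(b2)
    while c - a > tol:
        if f1 < f2:                    # maximum lies in [b1, c]
            a, b1, f1 = b1, b2, f2
            b2 = a + ratio * (c - a)
            f2 = f(b2)
        else:                          # maximum lies in [a, b2]
            c, b2, f2 = b2, b1, f1
            b1 = c - ratio * (c - a)
            f1 = f(b1)
    return (a + c) / 2

n, x = 10, 4
log_lik = lambda p: x * log(p) + (n - x) * log(1 - p)
p_hat = golden_section_max(log_lik, 0.001, 0.999)
print(p_hat)
```

Each pass keeps the better interior point together with its two flanking points, so only one new likelihood evaluation is needed per step.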

Bracketing

[Figure: successive trial points of a bracketing search, numbered 1 to 6]

The Simplex Method

• Calculate likelihoods at simplex vertices
  • Geometric shape with k+1 corners
  • E.g. a triangle in k = 2 dimensions

• At each step, move the high vertex in the direction of lower points

The Simplex Method II

[Figure: simplex moves. Panels: original simplex (high and low vertices labeled), reflection, reflection and expansion, contraction, multiple contraction]

One parameter maximization

• Simple but inefficient approach

• Consider
  • Parameters θ = (θ1, θ2, …, θk)
  • Likelihood function L(θ; x)

• Maximize with respect to each θi in turn
  • Cycle through parameters

The Inefficiency…

[Figure: plot on axes θ1 and θ2 illustrating the inefficiency of cycling through one parameter at a time]

Steepest Descent

• Consider
  • Parameters θ = (θ1, θ2, …, θk)
  • Likelihood function L(θ; x)

• Score vector

$$S = \frac{d \ln(L)}{d\theta} = \left( \frac{d \ln(L)}{d\theta_1}, \ldots, \frac{d \ln(L)}{d\theta_k} \right)$$

• Find maximum along θ + δS


Still inefficient…

Consecutive steps are perpendicular!

Local Approximations to Log-Likelihood Function

In the neighborhood of $\boldsymbol{\theta}_i$

$$\ell(\boldsymbol{\theta}) \approx \ell(\boldsymbol{\theta}_i) + \mathbf{S}(\boldsymbol{\theta} - \boldsymbol{\theta}_i) - \frac{1}{2} (\boldsymbol{\theta} - \boldsymbol{\theta}_i)^{t}\, \mathbf{I}\, (\boldsymbol{\theta} - \boldsymbol{\theta}_i)$$

where

$\ell(\boldsymbol{\theta}) = \ln L(\boldsymbol{\theta})$ is the loglikelihood function

$\mathbf{S} = d\ell(\boldsymbol{\theta}_i)$ is the score vector

$\mathbf{I} = -d^{2}\ell(\boldsymbol{\theta}_i)$ is the observed information matrix

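Newton’s Method

Maximize the approximation

$$\ell(\boldsymbol{\theta}) \approx \ell(\boldsymbol{\theta}_i) + \mathbf{S}(\boldsymbol{\theta} - \boldsymbol{\theta}_i) - \frac{1}{2} (\boldsymbol{\theta} - \boldsymbol{\theta}_i)^{t}\, \mathbf{I}\, (\boldsymbol{\theta} - \boldsymbol{\theta}_i)$$

by setting its derivative to zero...

$$\mathbf{S} - \mathbf{I}(\boldsymbol{\theta} - \boldsymbol{\theta}_i) = \mathbf{0}$$

and get a new trial point

$$\boldsymbol{\theta}_{i+1} = \boldsymbol{\theta}_i + \mathbf{I}^{-1} \mathbf{S}$$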
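In one dimension the update reduces to adding score divided by information. The sketch below applies it to the binomial allele-frequency log-likelihood; the starting value 0.2, the fixed number of steps and the function name are arbitrary choices of ours, and a real implementation would add a convergence test and guard against steps leaving (0, 1).

```python
def newton_binomial(n, x, p=0.2, steps=20):
    """Newton's method for the binomial log-likelihood in one parameter."""
    for _ in range(steps):
        score = x / p - (n - x) / (1 - p)              # S(p)
        info = x / p**2 + (n - x) / (1 - p) ** 2       # I(p) = -d2 lnL / dp2
        p = p + score / info                           # p_{i+1} = p_i + S / I
    return p

print(newton_binomial(10, 4))
```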

Fisher Scoring

• Use expected information matrix instead of observed information:

$$-E\left[\frac{d^{2}\ell(\theta)}{d\theta^{2}}\right] \quad \text{instead of} \quad -\frac{d^{2}\ell(\theta \mid \text{data})}{d\theta^{2}}$$

Compared to Newton-Raphson:

Converges faster when estimates are poor.

Converges slower when close to MLE.
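For the binomial allele-frequency model the expected information is n / (p(1 − p)), and a single Fisher scoring update from any starting value in (0, 1) lands on X/n. The sketch below shows one such step; the function name and starting value are our own, and this one-step behavior is specific to this simple model.

```python
def fisher_scoring_step(p, n, x):
    """One Fisher scoring update for the binomial allele-frequency model."""
    score = x / p - (n - x) / (1 - p)
    expected_info = n / (p * (1 - p))      # E[-d2 lnL / dp2], using E[X] = n p
    return p + score / expected_info

print(fisher_scoring_step(0.2, 10, 4))
```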