Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

1/8

TRENDS in Ecology & EvolutionVol.16 No.1 January 2001

http://tree.trends.com 01695347/01/$ see fron t m atter 2001 Elsevier Science Ltd. A ll righ ts reserved . PII: S0169-5347(00)02025-5

30 Review

As an OPTIMALITYCRITE RION (see Glossary) in

phylogenetics, MAXIMUM LIKELIHOOD has been in usealmost as long as par simony, which remains t he

dominant meth od for phylogenetic ana lyses of

discrete (morph ological an d molecular ) chara cter

dat a. Unlike the other t wo ma jor optimality criteria ,

both par simony and likelihood operat e directly on

discrete character data r ather than on a m atr ix of

pairwise distan ces. This allows parsimony an d

likelihood met hods to be used to estimat e ancestra l

char acter states, in addition t o estimating th e tree

topology. F elsenst eins1 pru nin g algorit hm ma de it

feasible to apply th e likelihood criter ion to nucleic

acid sequence data . His freely distribut ed package of

compu ter pr ogram s for phylogeny inferen ce, calledPH YLIP (Ref. 2), int roduced th e likelihood criter ion

into molecular systema tics and m olecular evolut ion.

Felsens teins 1981 model1 was th e first in a series of

improvement s ma de to the pioneering model of Juk es

and Cant or3 an d was the first to allow nu cleotide

frequencies to differ (the J ukes a nd Cantor m odel3

assum es tha t th e four types of nucleotide are pr esent

in equa l proportions). Fur ther developments led to

models th at accommodate both

tra nsitiontr ansversion substitu tion bias4 and

unequal nucleotide frequencies5,6. This tren d

culminated in the general t ime-reversible (GTR)

model, which n ot only allows unequ al nu cleotide

frequen cies, but also allows each of th e six possible

tra nsitions between nucleotide sta tes (i.e. AC, AG,AT, CG, CT and GT) to occur at different r at es7.

A furt her improvement, used in conjunction with

the abovement ioned m odels, is th e ability to

accommodate variat ion in subst itution ra tes across

sites, using several meth ods810. One of th e most

comm only app lied meth ods for allowing rat es to

vary among sites is the discrete gam ma m ethod,

which a ssumes that relative rat es are distr ibutedaccording to a gam ma distr ibution with a mean of

1 and a var iance of 1/, where defines the sh ape ofthe distribution. Large values of tran slate to a t inyvarian ce in relative rat es (i.e. the ra tes at all sites

are almost equivalent), whereas sm all values (e.g.

0.01) result in an L-shaped distribution of relat ive

rat es (most rat es are low, but some rat es are h igh).

Yang11 reviewed in depth th is approach to

accommodating site-to-site substitu tion ra te

heterogeneity, and other r eviews of this and other

topics includ e Swofford et al.12, Huelsenbeck an d

Crandall13 an d Lewis14.

Some comput er program s tha t ar e used todeter mine ph ylogenies allow other options with

respect to modeling ra te het erogeneity, including:

(1) invar iable sites m odels, in wh ich a proportion of

sites is assum ed incapable of un dergoing

substitution15; (2) site-specific ra tes models, wh ich

allow different r at es for arbitr ar y subsets of sites

th at correspond to different genes, intr ons versus

exons or differe nt codon positions 15; (3) hidd en

Mark ov models which allow for corr elation in t he

rat es at adjacent sites2; and (4) discrete gam ma

models in wh ich can differ between subset s ofsites16. It is also possible to combine an invar iable

sites model with a discrete gam ma model, letting the

Review

Paul O. Lewis

Dept of Ecology and

Evolutionary Biology,

University of Connecticut,

75 N. Eaglevill e Road Uni t

3043, Storr s, CT 06269-3043, USA.

e-mail:

[email protected]

Phylogenetic systematics turns overa new leafPaul O. Lewis

Long restricted to the domain of molecular systematics and studies of

molecular evolution, likelihood methods are now being used in analyses of

discrete morphological data, specifically to estimate ancestral character states

and for tests of character correlation. Biologists are beginning to apply

likelihood models within a Bayesian statistical framework,which promises not

only to provide answers that evolutionary biologists desire,but also to make

practical the application of more realistic evolutionary models.

Bootstrapping: a statistical techniqu e, first applied to phylog enetics

by Felsenstein39, in which new data sets are created by samplin g

randomly (and with replacement) from the origi nal characters.

These new data sets (called bo otstrap d ata sets) are of th e same size

as the origi nal. A desired quantity i s computed for each bootstrap

data set and the resulting distributi on is used to estimate the

dispersion that would be expected if the same num ber of new

independent data sets had been collected. Bootstrappin g assumes

that the original characters were sampled independent ly.

Likelihood:a quantity that is proportional to the probability of the

data (or probabili ty density, if the data are continuous-valu ed), given

specific values for all parameters in the m odel.The likelihoodfunction provides a means to estimate the parameters of the model.

Parameter values that are associated with the global m aximum of

the likelihood function are termed maxim um li kelihood estimates

(MLEs).

Optimality criterion: a rule used to decide which of tw o trees is

best. Four opti mality criteria are currently widely used:

Maximum parsimony the tree requir ing th e fewest character state

changes is the better of th e two t rees.

Maximum likelihood the tree maximizing the likelihood under the

assumed evolutionary model is better.

Minimum evolution the tree having t he smallest sum of branch

lengths (estimated using ordi nary least squares) is better.

Least squares the tree showin g the best fit between

estimated pairw ise distances and the corresponding pairw isedistances obtained by summi ng paths through t he tree is

better.

Glossary


2/8


http://tree.trends.com

31Review

gamma shape parameter () apply only to th evaria ble fraction of sites 15.

This review focuses on new n ucleotide sequen ce-based m odels (codon a nd seconda ry str uctur e

models), new a pplications of likelihood models

(discrete morph ological char acters) an d a new

framework with in wh ich likelihood m odels can be

applied t o phylogenetics (Bayesian inference). First ,

models k nown as codon m odels an d second ar y

stru cture models allow evolutiona ry dependence

am ong nucleotide sites within t he sam e codon an d

between sites th at are opposite one a nother in a stem

region of an rRN A gene, res pectively. By contr ast ,

par simony and previous likelihood models assu me

th at all nucleotide sites ar e evolutionarily

independen t. Therefore, codon a nd seconda rystru cture models represent a significant step forwar d

by allowing recognized evolut ionar y correla tions

between sites to be directly incorpora ted int o a

phylogeny met hod. Second, likelihood is being applied

to discrete morph ological char acters, wh ich ha ve been

limited previously to the r ealm of parsimony an alysis.

Applicat ions t ha t u se likelihood m odels include th e

estima tion of an cestral chara cter stat es and test s for

chara cter correlat ion. Third, in addition to new

models, the very fra mework in which likelihood

models ar e used is cha ngin g. Likelihood models form

th e basis of Bayesian sta tistical methods, and it is

only natu ral th at Bayesian meth ods be applied tophylogenetic problems alrea dy being addressed u sing

likelihood models.

Likelihood models that allow nonindependence

In 1994, Muse and Gau t 17 and Goldman a nd Yang 18

indepen dent ly intr oduced likelihood models tha twere designed t o account for evolutionar y

dependen cy among sites with in codons. This is in

cont ra st t o par simony and pr evious likelihood

models, which were forced to assu me indep enden ce

am ong sites in a gene sequ ence, even if th e

independence a ssumption h ad been violated. For

instance, in genes th at encode proteins, the t hree

nu cleotide sites tha t form a single codon cann ot

evolve indepen dent ly of one a nother if there is

selection for a par ticular a mino acid at th e corre-

sponding site in the polypeptide.

Codon modelsCodon models represent an importa nt advance in

ter ms of the biological r ealism of substitu tion

models. Codon models t ake th e genet ic code explicitly

into accoun t wh en computin g the pr obability of a

change at a site across a bran ch. Thus, rath er tha n

four possible sta tes (A, C, G and T) there a re 61 (64

possible codons, less th e th ree st op codons), which

mean s tha t codon models require a great deal more

compu ta tional effort th an classic DNA an d RNA

subst itut ion models. In par ticular, likelihood

met hods consider each stat e in tur n for every interior

node of a t ree. Un like par simony, the score used for

compar ing trees does not depend on an y part icularcombina tion of (unobserved) an cestral st at es. The

overa ll likelihood is a su m over all sta te

Review

A small portion of the full (6161)INSTANTANEOUS RATEM ATRIX (see Glossary) fo r

the Muse and Gauta codon model isillustrated (Table I). The quanti ties

A,

C,

G

and T

are the equilibrium nucleotide

frequencies, only three of w hich represent

free parameters of the model, because the

fourth can be obtained by subtraction. The

remaining tw o parameters in the model, (synonym ous substitutions) and

(nonsynonymous substitutions), are

relative r ate parameters. The diagonal

elements are not shown, but can becalculated by subtraction. All changes

between the codons that involve more than

one nucleotide substitution have an

instantaneous rate of zero. The remaining

elements represent single nucleotide

substitutions that result in a change from

one codon to another. The rate of any one

of this type of change depends on the

nucleot ide frequencies (e.g. the rate is

lower for substitutions to a relatively rarebase) and the nature of the change

(synonymous versus nonsynonymous).

Reference

a Muse, S.V. and Gaut, B.S. (1994) A likelihood

approach for comparing synonymous a nd

nonsynonymous substitu tion rates, with

application to the chloroplast genome.Mol.Biol.

Evol. 11, 715724

Box Glossary

Instantaneous rate matrix:a square matrix ofsubstitut ion rate parameters that serves to define a

particular substitution m odel. The probability of

observing statej, given starting state i, changes wit h

time in a nonlinear fashion. The rate parameter

represents the slope of this curve evaluated at t= 0.

The elements on the diagonal are negative and equal

to the sum of the other rates in the same row. This

quantity is negative, because the probability of

observing state i, given that iwas the starting state,

decreases with t ime.

Box 1.The anatomy of a codon model

Table I. Part of Muse and Gauts 61 61 instantaneous rate matrixa

Codon before Codon after substitution (the to state)

substitution

(the from state) TTT TTC TTA TTG CTT CTC GGG

(Phe) (Phe) (Leu) (Leu) (Leu) (Leu) (Gly)

TTT (Phe) C

A

G

C

0 0

TTC (Phe) T

A

G

0 C

0

TTA (Leu) T

C

G

0 0 0

TTG (Leu) T

C

A

0 0 0

CTT (Leu) T

0 0 0 C

0

CTC (Leu) 0 T

0 0 T

0

GGG (Gly) 0 0 0 0 0 0


3/8



32 Review

combinations, in mu ch the same way tha t th e

probability of rolling a seven with two dice is a sum

over th e six possible ways in which the n um bers on

each die can add u p to seven.

Although it would be possible to allow a differen t

ra te for every possible tr an sition bet ween codons,

th e estimat ion of so ma ny par am eters (3660 in

total) would require a n u nreasonable amount of

dat a. To keep th e num ber of free (estima ted)para meters m ana geable, the t wo codon models th at

ha ve been pr oposed ma ke some concessions. F or

instance, neither model permits more tha n a single

change in an y given insta nt (e.g. the ra te of th e

chan ge AAAAGC is 0). However, t his is areasona ble restriction, becau se two changes ar e

allowed to occur in t wo consecut ive infinit esima l

periods of tim e.

The model by Muse an d Gaut 17 assigns rat es to all

other possible chan ges (i.e. the changes th at do not

require more th an two nucleotide point m uta tions) onthe basis of two param eters that represent the r ate of

synonymous an d nonsynonymous subst itutions,

Review

In Pagelsa,b likelihood ratio test, two m odels are

considered, of which one is a more general version of

the other. Normall y, the constrained model

represents the null hypothesis and is nested wi thin(i.e. is a special case of) the m ore general m odel.

A likelihood ratio test returns a significant result if

the data are better explained by the general model (for

which the maximum of the likelihood function isL1)

than by the constrained model that represents the null

hypothesis (for which the likelihood m aximum is L0). A

ratio L1/L

0much greater than 1 indicates signi ficance

and the likelihood ratio test statistic, defined asLR=2(ln L

0 ln L

1), is large and positi ve.LRis distributed as

a chi-squared random variable with degrees of freedom

equal to the difference in the number of free (estimated)

parameters between the two models if the null model is

perfectly nested with in the unconstrained model.

The null hypothesis model for a test of correlated

evolution is based on the follow ing instantaneous

rate matrix (Table I).

The same matr ix is used for each character, but andare estimated separately for each character.Thenumber of free parameters in the null model is thus

equal to 4. In practice, the TRANSITION MATRIX (see Glossary)

that corresponds to this rate matrix is computed for each

character (let the t ransition m atrix for character 1 be P

and let that for character 2 beR), which allow s

probabili ties for specific events to be computed. For

example, the joint probability that character 1 remains in

state 0 and character 2 changes from state 0 to 1 across a

specific segment of the tree, is obtained as follows:

Pr(00,01) =P(00)R(01) [1]The fact that the joint transition p robability is the

product of the separate transition probabili ties foreach character m akes explicit the assumption of

independence in this model.

The general model enables (but does not f orce)

the evolution of the tw o characters to be correlated

and can account for simul taneous changes in the

characters by using a single m atrix (Table II).

This model has eight free parameters. The

maximum -likelihood value under this modelapproaches that of the null m odel if the characters are

evolving independently o f one another. As in the

codon m odels, a rate of 0 is specified for events that

require two changes. The difference in the num ber of

free parameters is 84= 4, so the likelihood ratio teststatistic is nom inally distributed as a2 randomvariable with four degrees of freedom. LRmight not

follow a 2 distribu tion i n this case, because themodels are not strictly nested (owing to the four rates

wi th values of 0 in the general model, but see Ref. b).

Pagel provides (in his com puter program , DISCRETE)

the means for using parametric bootstrapping to

determine the significance level w ithout making thisdistributional assumption .

References

a Pagel, M. (1994) Detecting correlat ed evolution on phylogenies:

a genera l method for th e compar ative ana lysis of discrete

characters.Proc.R . S oc.London B Biol. Sci. 255, 3745

b Pagel, M. (1997) Inferring evolutionary processes from

phylogenies.Zoologica S cripta 26, 331348

Box Glossary

Transition matrix: a matrix that shows the conditional probability

of observing statejgiven starting state iafter some arbitrary am ount

of time (t) for each pair of states iandj.This matrix can be derived

from the instantaneous rate matrix, w hich describes fully the

process at any given instant. (In this context, transition should not

be confused with the same term that is used in conjunction with

transversions.)

Box 2. Likelihood ratio test for character correlation

Table I. Instantaneous rate matrix for the null

hypothesis assuming evolutionary independence

Changing from: Changing to:

0 1

0 1

Table II. Instantaneous rate matrix for the general

hypothesis allowing evolutionary correlation

After state Before state

0, 0 0, 1 1, 0 1, 1

0, 0 ab a b 00, 1 c cd 0 d1, 0 e 0 ef f1, 1 0 g h gh


4/8



33ReviewReview

Bayesian inference involves interplay

between the likelihood function and the

prio r and posterio r distributions. To

illustrate this concept, consider thefollowi ng simplistic problem. Suppose a

black marbl e has come from one of two

urns that each contain mi llions of marbles.

The challenge is to decide from w hich urn

the marble was removed. Suppose further

that it is known that 40% of the marbles in

urn A and 80% of the marbles in urn B are

black. Choosing a marble from urn B

would be clearly more likely to yield a

black marble. Nevertheless, it is instructive

to calculate the posterior probabili ty that

the marble came from urn B and to

compare this to the posterior probabilitythat it came from urn A.

Bayes Rule, which is used to obtain the

posterior probability from the likelihood and

prior probability, is based on the definition of

conditional probability. Accordingly, for two

events, A and B, the joint probability of A and

Bequals the conditional probability of one,

given the other (i.e. A given B or Bgiven

A) multiplied by the probability of the

condition (Eqn 1):

Pr(A,B)=Pr(A)Pr(BA)=Pr(B)Pr(AB) [1]

Focusing only on the equation on the right ,

divi ding bo th sides by Pr(A) yields Bayes

Rule (Eqn 2):

[2]

In the Bayesian framework, A

represents data and Ba hypothesis (or

parameter):

Pr(BA) is termed the posterior probabil ity

and is the probability of the hypothesis (or

parameter value), given the data.Pr(AB) is termed the likelihood and is

the probability of the data, given the

hypothesis (or parameter value).

Pr(B) is termed the prior probability and

is the unconditional probability of the

hypothesis (or parameter value). This must

be specified by the investigator w ithou t

reference to the data.

Pr(A) is the unconditional probability of

the data, which can be obtained, using the

law of total probability, by calculating the

sum of the product Pr(B)Pr(A/B) for all

possible values of B. This serves as a

norm alizing constant, wh ich ensures that

the sum of the posterior probabilit ies is 1.

In this example, there is only one

datum (the black marble). Therefore,

computing the likelihood (probability ofthe data, given the hypothesis) is

straightforward and is simply the

probability t hat a single marble is black,

given a particular urn hypothesis. The

likelihood is 0.4 for urn A and 0.8 for urn B.

If maximum likelihood were being used to

decide between the two urns, urn B would

be selected, because the li kelihood of

drawing a black marble is greater for urn B

than for urn A (0.8 versus 0.4). However, to

choose a prior distribution that reflects

complete prior ignorance of which urn w as

used when drawi ng the black marb le, wespecify that the prio r probabi lity of each

urn is 0.5. The norm alizing constant is

therefore (Eqn 3):

Pr(black marble chosen) =

(0.5)(0.4)+ (0.5)(0.8)=0.6 [3]

The posterior probability can now be

computed for each urn as follows (Eqns 4,5):

[4]

[5]

Thus, the probability that the black

marble came from urn B, given the datum,

is two thirds.The posterior distribution i s

often described as an updated version of

the prior distribution. In this case, the

posterior distribution (0.33, 0.67)

represents an updated version o f the prior

distribution (0.5, 0.5), the evidence used for

the update being the fact that the single

marble drawn was black. A m ajor benefit of

taking the Bayesian perspective lies in the

fact that it produces probabilities for

hypotheses of in terest, which is exactly

what investigators desire. Likelihoods are

useful, but not easily in terpreted, because

they represent the probability of the datagiven the hypothesis rather than the

probability of the hypothesis given the

data.

A criti cism of Bayesian approaches is

the subjectivity o f the prior. Note that in

the example above (with only a single

datum), it w ould be necessary to stipulate

a prior probability for urn A that was

greater than two thirds to rig the analysis

(that is, to ensure that the conclusion is in

favor of urn A). While the posterior

distribution always changes if the prior is

changed, the conclusions are usually notoverly sensitive to the prior and any effect

of the pri or decreases as the amount of

data increases. A counterargum ent to the

subjectivity criticism is that subjectivity

that is inherent in the pri or is explicit and

must be defensible.

In the example above, discrete

hypotheses were used. However,

continuous parameters are more often the

targets of Bayesian analyses. In such cases,

probabili ty density functions replace the

probabili ties of discrete hypotheses, but

Bayes rule is still applicable. Figure Iillustrates the prior and posterior density

curves for the parameter p (probability of

heads) in a simple coin-flipping experiment

in which six heads were observed for ten

flips.Two di fferent prior distributions were

used, namely a flat [Beta(1,1), Fig. Ia] and a

more inform ative [Beta(2,2), Fig. Ib] prior

distribution. As an example of how such

curves could be used, the posterior

probabili ty that pis between 0.45 and 0.55

could be obtained by integrating the

posterior density curve between 0.45 and

0.55.

Box 3. Prior and posterior probabilities

TRENDS in Ecology & Evolution

0.00 0.20 0.40 0.60 0.80 1.00

0.00

0.55

1.10

1.66

2.21

2.76

(a) (b)

0.00

0.59

1.19

1.78

2.38

2.97

Posterior

Prior

0.00 0.20 0.40 0.60 0.80 1.00

Posterior

Prior

Probabilitydensity

Probabilitydensity

p p

Fig. I. Prior and posterior density curves for p(probability of heads) in a coin-flipping experiment, illustrating a

flat [Beta(1,1)] (a) and an inform ative [Beta(2,2)] (b) prior distr ibuti on.

)Pr(

)Pr()Pr()Pr(

A

BABAB

=

31

6.0

)4.0)(5.0()chosenmarbleA/blackurn(Pr ==

32

6.0)8.0)(5.0()chosenmarbleA/blackurn(Pr ==


5/8

respectively (Box 1). By estima ting only five

param eters (the two rate para meters and thr ee

nucleotide frequen cy par am eters), the entire 61 61ma trix of tra nsition probabilities required to compu te

the likelihood can be specified. The Muse an d Ga ut

codon model is implemen ted in Mu se an d Kosakovskys

Hypoth esis Test ing Usin g Ph ylogenies (HYPHY)

compu ter pa ckage19. The commonly employed HKY85

model5 also requires five param eters, but th e two rat e

param eters in this case represent t he ra tes of

tran sitions an d tran sversions, rather t han

nonsynonymous and synonymous substitut ions.

The codon model proposed by Goldman an d Yang18

goes a step furt her in a llowing both

tra nsitiontran sversion bias an d synonymous-

nonsynonymous su bstitut ion bias. This m odel also

allows biochemical differen ces among amino acids to

play a role in determining the ra te at wh ich changes

am ong amino a cids occur. For exam ple, a chan ge from

one hydrophobic to an other hydrophobic residue can

occur at a higher r ate t han a chan ge from a hydrophobic

to a h ydrophilic residue. Goldman a nd Yangs model is

implemen ted in Yangs Ph ylogenetic Analysis By

Maximum Likelihood (PAML) compu ter package16.

Secondary structure models

Secondary structure models represent an a ttempt to

add biological rea lism to an alyses of genes with RNAproducts th at have a secondary stru cture. Tillier and

Collins20 and Muse21 independently introduced



34 ReviewReview

The principles behind M arkov Chain

Monte Carlo (M CMC) methods can be

illustrated w ith a simpl e analogy. A robot

is allow ed to w alk in a square field. The

robo t takes steps that vary in length and

random ly selects a direction fo r each

step. The robot never leaves the field,

because if a step is about to take itoutside the boundary, it is reflected back

into th e field. A representative path

followed by such a robot is illustrated in

Fig. Iac for 100 (Fig. Ia), 1000 (Fig. Ib)

and 10 000 steps (Fig. Ic). (The path l ines

have been removed from Fig. Ic to clarify

the distribu tion o f steps.) Note that, even

though the robot i s walking randoml y,

eventually every portion of the field is

visited if it is allowed to walk for long

enough.

Now suppose that two hills are present

(represented by bivariate normal densities

one uncorrelated and w ith a correlation

coefficient of 0.9) and the robot is

programm ed to use the following simpl e

rules (based on the local environm ent)

when taking steps:

If a proposed step will take the robotuphi ll, it autom atically takes the step.

If a proposed step will take the robot

downhil l, it divides the elevation at the

proposed location by i ts current

elevation and on ly takes the step if it

draws a random number (uniform on

the open in terval 0,1) that is smaller

than this quotient.

The proposal distribution is

symm etrical, wh ich means that the

probability of p roposing a step from

poin t A to B is the same as that of

proposing a step from point B to A.

By follow ing these rules, the pathfollowed by the robot w ill take it to points

in the field in proportion to their

elevation, with higher points being

visited more often than low er ones.

Letting the robot begin i ts walk near the

upper left-hand corner, its mov ements

are illustrated in Fig. Idf for 100 (Fig. Id),

1000 (Fig. Ie) and 10 000 steps (Fig. If).

(As in Fig. Ic, path lines are not show n in

Fig. If.)

The robots movem ents can be used to

estimate the volum e under any specified

portion of thi s landscape. In the exampleabove, if the fir st few steps (called the

burn-in stage) are disregarded then the

resulting distribution of points is a good

approximation of the landscape, which is a

mixtu re of two separate bivariate normal

densities.The longer the robot is allowed to

walk, the closer the approxim ation

becomes.

This analogy also illustrates one of the

pitfalls of approximating surfaces using

MCMC methods. Were the robot to take

only a few steps, it is clear that it would not

be likely to encounter the second hi ll.Therefore, care should be taken to ensure

that Markov chains are run for long enough

to provide a good approximation of the

surface over the entire parameter space.

One simple w ay to investigate this is to run

several chains, each starting from a

different (randomly selected) starting point.

If the resulting approximations are

substantially di fferent, this indicates that

none of the chains were run for long

enough.

Box 4. Markov Chain Monte Carlo methods

TRENDS in Ecology & Evolution

(d) (e) (f)

I (a) (b) (c)


6/8

models that can accoun t for th e evolutiona ry

dependence between nu cleotide sites tha t a re

opposite one anoth er in th e stem r egions of an RNA

gene product. Following subst itut ions in t hese sites,

selection favors compen sat ory substitu tions to

preser ve the sta bility of stem r egions an d hence alsothe secondar y structure t hat is needed for the

cat alyt ic act ivities of th ese RNA molecules. Muses21

model requires only a single additiona l free

par am eter t o accoun t for th e evolutiona ry correlat ion

between paired stem sites and reduces to a sta ndar d

nu cleotide substitu tion model (the HKY85 model) if

no such correla tion exists in the stem regions.

Applications of likelihood to discrete morphological

characters

The most common met hod for obtainin g estimat es of

an cestral chara cter sta tes for discrete morphological

chara cters involves par simony optimization,alth ough likelihood-based a ltern at ives have been

proposed2224. Cunn ingham 25 reviewed an cestral

chara cter sta te estim at ion. I will therefore focus on a

differen t a pplication of likelihood models in

inferences concerning morphological features,

na mely testin g for evolutiona ry correlat ion.

Pagel26 described a likelihood ra tio test (Box 2) th at

allows an investigator to determine whether two

discret e morphological tr aits a re corr elated t o a grea ter

extent t ha n can be explained simply by th e phylogeny.

Two traits a re evolut iona rily correlated if a cha nge in

sta te of one of the chara cters predisposes the other

character t o chan ge state soon after. For example, theevolution of feat her s on t he forelimbs of dinosaurs is

corr elated with th e evolution of wings. In th is case, th e

appea ra nce of feather s for reasons oth er th an flight 27

appa ren tly encour aged a chan ge from th e stat e wings

absen t to wings presen t, which might not h ave

otherwise occur red. P hylogenies impose correlations

on chara cters, even if th ose cha ra cters evolve

independently. Pa gels test addresses wheth er a n

evolutionar y correlation exists between t wo char acters

th at a cts to increase th e observed correlation above and

beyond t he level imposed by the ph ylogeny.

Suppose tha t t wo characters each change state just

once on a phylogeny and h appen to cha nge along thesam e segment of th e phylogeny. At first sight , this

coincidence might a ppear to be the resu lt of an

evolutionar y corr elation. However, evidence to supp ort

th is interpr eta tion would be weak, becau se the co-

evolutionar y event occur red only once. By contra st, th e

evidence of correlated evolution of the t wo cha ra cters

would be stronger if cha nges in t he t wo cha ra cters co-

occur on ma ny different par ts of the phylogeny. Pa gel

and others have emph asized th e importance of using

explicit phylogeny-based met hods for assessing t he

str ength of evidence for such correla tions, becau se th e

nu mber of degrees of freedom a vailable for the t est can

otherwise be grea tly exaggera ted26.In Pa gels test, the null hypothesis is tha t t he two

char acters of inter est ha ve evolved indepen dent ly.

The gener al (unconstra ined) model allows for some

correla tion in the evolution of th e two cha ra cters.

The nu ll model is a const ra ined version of th e genera l

model, because the indepen dence assum ption

const ra ins th e corr elation to equal zero. The

ma ximum of th e likelihood fun ction un der th eunconstr ained model will, thu s, approach tha t of the

constra ined model as the correlation a pproaches

zero. If th e tru e corr elation is not zero, th en th e

ma ximum of th e likelihood fun ction un der th e

genera l model will be significant ly larger t ha n th at

which can be att ained by th e null model, reflecting

the bett er fit of the general model. Pagel argues th at

examination of the estimated values of the

para meters of the general model enables a

deter mina tion of which chara cter is most likely to

have changed sta te first, precipitat ing a chan ge of

stat e in t he other character. However, such fine-

scaled interpret ations should be made with caution,because several paramet ers are estimated un der this

model using data from only two cha ra cters.

An inter esting exam ple of the app licat ion of

Pa gels meth od involves testin g wheth er switches to

mu tu alism in fungi, eith er in th e form of associat ion

with algae (i.e. lichen ization) or liverworts, a re

associat ed with chan ges in the ra te of molecular

evolution 28. Lutzoni an d Pa gel28 began by classifying

a grou p of closely rela ted Omphalina mushroom

species a s eith er fast (F) or slow (S) evolving a nd as

either m ut ua listic (M) or nonmu tu alistic (N).

Est imat es of th e ra te of evolution were ba sed on

nu clear ribosoma l DNA (nr DNA) from t he 25S an d5.8S genes and t he intern al tr anscribed spacer (ITS)

region. They then obtained likelihoods for both t he

nu ll model (assumin g independen t evolution of both

mut ualism an d th e rat e of molecular evolution) and

th e genera l model (in which th e rat e of evolution an d

mut ualism can be correlated). The three set s of data

fit th e correlation model better t han the

indepen dence model. However, the impr oved fit wa s

only significant for t he 25S da ta . Therefore, Lut zoni

and Pa gel28 concluded th at t her e is a corr elation

between the rat e of molecular evolution a t t he 25S

nr DNA locus and t he evolut ion of mu tu alism. These

aut hors also looked at individual rat e para meterstha t were estimat ed in the nonindependence model.

For the 25S data, the estima ted rat e for the

tra nsition M,SM,F was significant ly greater t hantha t for N,SN,F, indicating th at increases in th era te of molecular evolution a re m ore likely to occur in

lineages tha t ar e alread y mutu alistic. Also (again for

the 25S dat a), the ra te of the t ran sition N,SM,Swas significant ly greater tha n t hat for N,SN,F,indicating t ha t t he t enden cy for slowly evolving

lineages to evolve mut ualism is greater t han the

ten dency for nonmu tu alistic linea ges to evolve fast

ra tes of molecular evolution. This ability to tease

apar t t he correlations am ong char acters sets Pagelsmethod apa rt from other m ethods, which simply

assess wheth er a correlation exists.



35ReviewReview


7/8



36 Review

References

1 Felsenstein, J. (1981) Evolutionary trees from

DNA sequences: a ma ximum likelihood approach.

J. Mol. Evol. 17, 368376

2 Felsenstein, J . (1999) PHYLIP, Phylogeny

Inference Package (University of Washin gton,

Seat tle), Version 3.572

3 Ju kes, T.H. and Cantor, C.R. (1969) Evolution of

protein m olecules. In Mamm alian Protein

Metabolism (Munro, H.N., ed.), pp. 21132,Academ ic Press

4 Kimura, M. (1980) A simple method for

estimating evolutionary rat e of base

substitut ions t hrough comparative studies of

nucleotide sequences.J. Mol. Evol . 16, 111120

5 Hasegawa , M. et a l. (1985) Dating of th e hum an-

ape split ting by a m olecular clock of

mitochondria l DNA.J. Mol. Evol. 21, 160174

6 Kishino, H. and Hasegawa, M. (1989)

Evaluation of the ma ximum likelihood estimate

of the evolut ionary t ree topologies from DNA

sequence data , and the bra nching order in

hominoidea.J. Mol. Evol . 29, 1701797 Rodrguez, F. et a l. (1990) The gener al

stochast ic model of nucleotide subst itut ion.

J. Th eor. Biol . 142, 485501

8 Churchill, G.A. et al. (1992) Sample size for a

phylogenetic inference.Mol. Biol. Evol . 9, 753769

9 Yang, Z. (1994) Maximu m likelihood

phylogenetic estimat ion from DNA sequences

with variable rat es over sites: approximate

methods.J. Mol. Evol . 39, 306314

10 Yang, Z. (1993) Maximu m-likelihood estima tion

of phylogeny from DNA sequen ces when

substitut ion r ates differ over sites.Mol. Biol.

Evol. 10, 1396140111 Yang, Z. (1996) Among-site rat e variation an d

its impact on phylogenetic ana lyses. Trends

Ecol. Evol. 11, 367372

Acknowledgements

I would like to thank Kent

Holsinger, Louise A.

Lewis, Mark Pagel, Chris

Simo n and Zih eng Yang

for their careful reading

of earlier drafts of this

article. I also grateful ly

acknow ledge th e Alfr ed P.Sloan Foundation/

National Science

Foundation for funding

(grant 98-4-5 ME).

A Bayesian future for model-based phylogenetics?

Likelihood m ethods in phylogenetics t ake longer tha n

other met hods. For large data sets, it is not uncommon for

single heuristic searches to take days, if not month s, for a

compu ter t o complete. Even when complete, such

ana lyses only represent a point estimate of a ph ylogeny. Ameasu re of support for individual clades requires BOOT-

STRAPPING, which effectively tur ns a n a lready time

consum ing analysis into one that r equires a far grea ter

amount of time. This situat ion might eventua lly be

ameliora ted by taking a Bayesian approach. Larget and

Simon29 have summ arized the recent litera tu re on Baye-

sian approaches to making inferences about

phylogenies2934 (S. Li, PhD t hesis, Ohio State U niversity,

1996; B. Mau , PhD t hesis, University of Wisconsin, 1996)

and h ave presented novel algorithm s to estimat e

simultan eoulsy the t ree topology and provide a measure

of nodal support. These algorithm s have the potential t o

tr ansform model-based phylogenetic inference.Bayesian a pproaches to sta tistical problems involve

ma king inferences using th e posterior distr ibution (or

simply th e posterior) of hypotheses or pa ra met ers of

inter est (Box 3). This is feasible when problems are

simple enough for a n a na lytical form ula for the

poster ior to exist. However , for most rea l problems,

models involve several para meters a nd, th us, a

complicated joint poster ior for which t here is no simple

form ula. The poster ior for a n u nr ooted ph ylogenetic

tr ee, for examp le, involves at least th e topology (a

discrete par ameter) and 2N3 branch length

para meters, whereNis the n um ber of tip nodes.

Fortu na tely, a technique exists for approxima tingcomplicated posteriors, such as th ose th at a re

chara cteristic of problems th at involve phylogenies.

The technique is called Mark ov Chain Mont e Carlo

(MCMC) or th e Metr opolisHastin gs algorithm 35,36.

The idea un derlying MCMC is tha t a Ma rkov cha in

th at t akes t he form of a correla ted ra ndom walk

thr ough th e para meter spa ce can be conducted in such

a way th at a ny probability distribut ion (however

complicated) can be a pproximated by periodically

sam pling values (Box 4). The approximation can be

made a rbitrarily accurate by runn ing the Markov

chain for a sufficient nu mber of steps.

In ph ylogenetic applicat ions, each step in aMark ov cha in involves a ra ndom m odification of th e

tree topology, a bra nch length or a param eter in t he

substitu tion model (e.g. substitu tion rat e ra tio). If th e

posterior t ha t is compu ted for a proposed step is

larger than tha t of the current tree topology and

param eter values, the proposed step is taken.

Proposed steps that result in downh ill moves on th e

posterior sur face are not au tomat ic an d depend on the

ma gnitude of the decrease. The function th atdeterm ines th e probability of a downhill step in th is

case is based on the ra tio of th e new an d current

posteriors. Using th ese rules, the Ma rkov cha in

visits regions of tr ee spa ce (sensu lato, including th e

tree topology space an d dimen sions for other

par am eters, such as branch lengths) in proport ion to

their posterior.

For a pr actical examp le of how th is meth od could be

used to answer a question posed by a systemat ist,

consider t he qu estion, What is th e probability t ha t

group X is monophyletic? To answer th is quest ion, we

would run the Mar kov chain, sampling trees

periodically. Suppose tha t in a sa mple of 100 000 trees,group X appeared a s a m onophyletic group in 74 695

trees. The pr obability (given th e observed dat a) tha t

group X is monophyletic is appr oxima tely 0.74695,

because t he Mar kov chain visits t rees in proportion t o

their posterior pr obability. This value is a n atu ral

measu re of nodal support an d is easier to interpret

than existing measures th at are based on parsimony

(decay ind ex and bootst ra p values ) or likelihood

(bootstr ap valu es). Lar get an d Simon29 mak e the point

tha t only one r un of the Mar kov chain is needed to gain

such informa tion, compared t o the ma ny bootstr ap

run s needed in a m aximum likelihood analysis.

Another import an t app licat ion of MCMC in Bayesianphylogenetic inference involves estimat ing divergence

times in a relaxed molecular clock model, that is, a

model in which substitut ion r ates var y across the

phylogeny, but in which rat es in descendan t lineages are

correlated with the rat e in th eir common a ncestor37,38.

MCMC meth ods provide Bayesian credibility int ervals

for divergence dates with out ha ving to assu me a per fect

clock and also allow consider able flexibility (an d even

un certain ty) in how th e clock is calibrated.

The a bility of MCMC models to provide an swers t o

virtu ally any question t hat might be imagined by the

investigator is a t hrilling prospect. The a bility to

provide an swers mu ch more efficiently th an ispossible with curr ent likelihood met hods should

ma ke th e Bayesian ap proach to phylogenetics well-

wort h wat ching in th e next few year s.

Review


8/8

During the past decade, the explosion of molecular

techniques h as led to the a ccumu lation of a

considera ble amoun t of compara tive genetic

informa tion at t he population level. At the sam e time,

recent a dvances in population genetics theory,

especially coalescent theory, ha ve genera ted p owerful

tools for t he an alysis of intr aspecific data . These t wo

developmen ts h ave converted int rasp ecific

phylogenies int o useful tools for test ing a var iety of

evolutionary a nd populat ion genet ic hypotheses.Several phylogenetic m ethods, especially NETWORK

(see Glossary) approaches, ha ve been developed t o

ta ke advan ta ge of the u nique chara cteristics of

intr aspecific dat a. In th is art icle, we summ ar ize some

population genetics principles, explain why net works

ar e appropriat e represent at ions of intr aspecific

genetic variation, describe an d compa re a vailablemeth ods an d software for net work estimat ion, and

give exam ples of their application.

Gene genealogies

Given a sam ple ofGENES, the relationships among

th em can be tra ced back in time to a comm on

an cestral gene. The genealogical pat hways

interconnecting the current sam ple to th e common

an cestor const itut e a GENE TREE or gene gene alogy. A

gene tr ee is the p edigree of a set of genes a nd exists

independen tly of potent ial mut at ions. The only

portion of a gene tr ee tha t can genera lly be estimat ed

with genetic data is that portion m arked by the(potent ial) mu ta tional events th at d efine the different

ALLELES (Box 1). This lower r esolution tr ee is th e allele

Intraspecific gene evolution cannot always be represented by a bifurcating t ree.

Rather, population genealogies are often multifurcated,descendant genes

coexist with persistent ancestors and recombination events produce reticulate

relationships.Whereas traditional phylogenetic methods assume bifurcating

trees,several networking approaches have recently been developed toestimate intraspecific genealogies that take into account these population-

level phenomena.

Intraspecific gene genealogies:trees grafting into networksDavid Posadaand Keith A.Crandall


http://tree.trends.com 01695347/01/$ see fro nt m atter 2001 Elsevier Science Ltd. A ll righ ts reserved . PII: S0169-5347(00)02026-7

37ReviewReview

12 Swofford, D.L. et a l. (1996) Phylogenetic

inference. In Molecular Syst ematics (Hillis,

D.M. et al., eds), pp. 407514, Sinau er

Associates

13 Huelsenbeck, J.P. an d Cranda ll, K.A. (1997)

Phylogeny estimation and hypothesis testing

using maximum likelihood.Annu . Rev. Ecol.Syst . 28, 437466

14 Lewis, P.O. (1998) Maximum likelihood as an

alterna tive to parsimony for inferring

phylogeny using nucleotide sequen ce dat a. In

Molecular S ystematics of Plants II (Soltis, D.E .

et al., eds), pp. 132163, Kluwer

15 Swofford, D.L. (2000) PAUP*, Ph ylogenetic

Analysis Using Parsimony (Sinauer,

Sun derlan d, MA), Version 4.0b3,

16 Yang, Z. (1997) PAML: a program pa ckage for

phylogenetic analysis by maximu m likelihood.

CABIOS 13, 555556

17 Muse, S.V. an d Gaut , B.S. (1994) A likelihood

approach for comparing synonymous and

nonsynonymous substitut ion r ates, with

application to the chloroplast genome.Mol. Biol.

Evol. 11, 715724

18 Goldman, N. an d Yang, Z. (1994) A codon-based

model of nucleotide subst itut ion for protein -

coding DNA sequ ences.Mol. Biol. Evol . 11,

725736

19 Muse, S.V. and Kosakovsky Pond, S.L. (2000)

HYPHY , Hypothesis Testing Using Phylogenies ,

(North Carolina State University, Raleigh)

Version 1

20 Tillier, E.R.M. an d Collins, R.A. (1995)

Neighbor joining an d maximu m likelihood with

RNA sequences: addressing th e interdependence

of sites.Mol. Biol. Evol . 12, 715

21 Muse, S.V. (1995) Evolutionary an alyses of DNA

sequences subject t o constr aints on secondar y

structure. Genetics 139, 14291439

22 Schluter, D. et al. (1997) Likelihood of ancestor

stat es in adaptive radiat ion.Evolution 51,16991711

23 Mooers, A.. and Schluter, D. (1999)

Reconstructing ancestor states with m aximum

likelihood: support for one- and t wo-ra te models.

Syst. Biol. 48, 623633

24 Pagel, M. (1999) The maximum likelihood

approach to reconstr ucting ancestral chara cter

stat es of discrete characters on phylogenies. Syst.

Biol. 48, 612622

25 Cunningham, C.W. et al. (1998) Reconst ru cting

ancestral character sta tes: a critical reappraisal.

Trend s Ecol. Evol. 13, 361366

26 Pagel, M. (1994) Detecting corr elated evolution on

phylogenies: a general met hod for t he

compar ative an alysis of discrete chara cters. Proc.

R. Soc. London B Biol. Sci. 255, 3745

27 Qiang, J . et al. (1998) Two feather ed dinosau rs

from northeaster n China.Nature 393, 753761

28 Lutzoni, F. and P agel, M. (1997) Accelerated

evolut ion as a consequ ence of tran sitions to

mutualism. Proc. Natl. Acad. Sci. U. S. A . 94,

1142211427

29 Larget, B. and Simon, D.L. (1999) Markov chain

Monte Ca rlo algorithms for t he Bayesian a nalysis

of phylogenetic tr ees.Mol. Biol. Evol . 16, 750759

30 Mau, B. and Newton, M.A. (1997) Phylogenetic

inference for binar y data on dendrograms using

Markov chain Monte Car lo.J. Comput. Graph.

Stat . 6, 122131

31 Mau, B. et al. (1999) Bayesian phylogenetic

inference via Mar kov chain Monte Carlo methods.

Biometrics 55, 112

32 Newton, M. et al. (1999) Mark ov cha in Monte

Car lo for th e Bayesian an alysis of evolutionarytrees from a ligned m olecular sequences. In

S tatistics in Molecular Biology (Seillier-

Moseiwitch, F. et al., eds), Instit ute of

Mathema tical Statist ics

33 Ranna la, B. and Yang, Z. (1996) Probability

distr ibution of molecular evolutionar y trees: a

new met hod of phylogenetic inference.J. Mol.

Evol. 43, 304311

34 Yang, Z.H. and Rann ala, B. (1997) Bayesian

phylogenetic inference usin g DNA sequences: a

Markov Chain Monte Car lo method.Mol. Biol.

Evol. 14, 717724

35 Metropolis, N. et al . (1953) Equ at ion of sta te

calculations by fast computing ma chines.

J. Chem. Phys. 21, 10871092

36 Hast ings, W.K. (1970) Monte Carlo sampling

methods using Markov chains a nd th eir

applications.Biometrika 57, 97109

37 Huelsenbeck, J.P. etal. (2000) A compoun d

Poisson process for relaxing t he molecular clock.

Genetics 154, 18791892

38 Thorne, J .L. et al . (1998) Estimating th e rat e of

evolution of th e rat e of molecular evolution.

Mol. Biol. Evol . 15, 16471657

39 Felsenstein, J. (1985) Confidence int ervals on

phylogenies: an approach using t he bootstr ap.

Evolution 39, 783791

Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

Documents

Transcript of Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf