Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

download Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

of 8

Transcript of Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

  • 8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

    1/8

    TRENDS in Ecology & EvolutionVol.16 No.1 January 2001

    http://tree.trends.com 01695347/01/$ see fron t m atter 2001 Elsevier Science Ltd. A ll righ ts reserved . PII: S0169-5347(00)02025-5

    30 Review

    As an OPTIMALITYCRITE RION (see Glossary) in

    phylogenetics, MAXIMUM LIKELIHOOD has been in usealmost as long as par simony, which remains t he

    dominant meth od for phylogenetic ana lyses of

    discrete (morph ological an d molecular ) chara cter

    dat a. Unlike the other t wo ma jor optimality criteria ,

    both par simony and likelihood operat e directly on

    discrete character data r ather than on a m atr ix of

    pairwise distan ces. This allows parsimony an d

    likelihood met hods to be used to estimat e ancestra l

    char acter states, in addition t o estimating th e tree

    topology. F elsenst eins1 pru nin g algorit hm ma de it

    feasible to apply th e likelihood criter ion to nucleic

    acid sequence data . His freely distribut ed package of

    compu ter pr ogram s for phylogeny inferen ce, calledPH YLIP (Ref. 2), int roduced th e likelihood criter ion

    into molecular systema tics and m olecular evolut ion.

    Felsens teins 1981 model1 was th e first in a series of

    improvement s ma de to the pioneering model of Juk es

    and Cant or3 an d was the first to allow nu cleotide

    frequencies to differ (the J ukes a nd Cantor m odel3

    assum es tha t th e four types of nucleotide are pr esent

    in equa l proportions). Fur ther developments led to

    models th at accommodate both

    tra nsitiontr ansversion substitu tion bias4 and

    unequal nucleotide frequencies5,6. This tren d

    culminated in the general t ime-reversible (GTR)

    model, which n ot only allows unequ al nu cleotide

    frequen cies, but also allows each of th e six possible

    tra nsitions between nucleotide sta tes (i.e. AC, AG,AT, CG, CT and GT) to occur at different r at es7.

    A furt her improvement, used in conjunction with

    the abovement ioned m odels, is th e ability to

    accommodate variat ion in subst itution ra tes across

    sites, using several meth ods810. One of th e most

    comm only app lied meth ods for allowing rat es to

    vary among sites is the discrete gam ma m ethod,

    which a ssumes that relative rat es are distr ibutedaccording to a gam ma distr ibution with a mean of

    1 and a var iance of 1/, where defines the sh ape ofthe distribution. Large values of tran slate to a t inyvarian ce in relative rat es (i.e. the ra tes at all sites

    are almost equivalent), whereas sm all values (e.g.

    0.01) result in an L-shaped distribution of relat ive

    rat es (most rat es are low, but some rat es are h igh).

    Yang11 reviewed in depth th is approach to

    accommodating site-to-site substitu tion ra te

    heterogeneity, and other r eviews of this and other

    topics includ e Swofford et al.12, Huelsenbeck an d

    Crandall13 an d Lewis14.

    Some comput er program s tha t ar e used todeter mine ph ylogenies allow other options with

    respect to modeling ra te het erogeneity, including:

    (1) invar iable sites m odels, in wh ich a proportion of

    sites is assum ed incapable of un dergoing

    substitution15; (2) site-specific ra tes models, wh ich

    allow different r at es for arbitr ar y subsets of sites

    th at correspond to different genes, intr ons versus

    exons or differe nt codon positions 15; (3) hidd en

    Mark ov models which allow for corr elation in t he

    rat es at adjacent sites2; and (4) discrete gam ma

    models in wh ich can differ between subset s ofsites16. It is also possible to combine an invar iable

    sites model with a discrete gam ma model, letting the

    Review

    Paul O. Lewis

    Dept of Ecology and

    Evolutionary Biology,

    University of Connecticut,

    75 N. Eaglevill e Road Uni t

    3043, Storr s, CT 06269-3043, USA.

    e-mail:

    [email protected]

    Phylogenetic systematics turns overa new leafPaul O. Lewis

    Long restricted to the domain of molecular systematics and studies of

    molecular evolution, likelihood methods are now being used in analyses of

    discrete morphological data, specifically to estimate ancestral character states

    and for tests of character correlation. Biologists are beginning to apply

    likelihood models within a Bayesian statistical framework,which promises not

    only to provide answers that evolutionary biologists desire,but also to make

    practical the application of more realistic evolutionary models.

    Bootstrapping: a statistical techniqu e, first applied to phylog enetics

    by Felsenstein39, in which new data sets are created by samplin g

    randomly (and with replacement) from the origi nal characters.

    These new data sets (called bo otstrap d ata sets) are of th e same size

    as the origi nal. A desired quantity i s computed for each bootstrap

    data set and the resulting distributi on is used to estimate the

    dispersion that would be expected if the same num ber of new

    independent data sets had been collected. Bootstrappin g assumes

    that the original characters were sampled independent ly.

    Likelihood:a quantity that is proportional to the probability of the

    data (or probabili ty density, if the data are continuous-valu ed), given

    specific values for all parameters in the m odel.The likelihoodfunction provides a means to estimate the parameters of the model.

    Parameter values that are associated with the global m aximum of

    the likelihood function are termed maxim um li kelihood estimates

    (MLEs).

    Optimality criterion: a rule used to decide which of tw o trees is

    best. Four opti mality criteria are currently widely used:

    Maximum parsimony the tree requir ing th e fewest character state

    changes is the better of th e two t rees.

    Maximum likelihood the tree maximizing the likelihood under the

    assumed evolutionary model is better.

    Minimum evolution the tree having t he smallest sum of branch

    lengths (estimated using ordi nary least squares) is better.

    Least squares the tree showin g the best fit between

    estimated pairw ise distances and the corresponding pairw isedistances obtained by summi ng paths through t he tree is

    better.

    Glossary

  • 8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

    2/8

    TRENDS in Ecology & EvolutionVol.16 No.1 January 2001

    http://tree.trends.com

    31Review

    gamma shape parameter () apply only to th evaria ble fraction of sites 15.

    This review focuses on new n ucleotide sequen ce-based m odels (codon a nd seconda ry str uctur e

    models), new a pplications of likelihood models

    (discrete morph ological char acters) an d a new

    framework with in wh ich likelihood m odels can be

    applied t o phylogenetics (Bayesian inference). First ,

    models k nown as codon m odels an d second ar y

    stru cture models allow evolutiona ry dependence

    am ong nucleotide sites within t he sam e codon an d

    between sites th at are opposite one a nother in a stem

    region of an rRN A gene, res pectively. By contr ast ,

    par simony and previous likelihood models assu me

    th at all nucleotide sites ar e evolutionarily

    independen t. Therefore, codon a nd seconda rystru cture models represent a significant step forwar d

    by allowing recognized evolut ionar y correla tions

    between sites to be directly incorpora ted int o a

    phylogeny met hod. Second, likelihood is being applied

    to discrete morph ological char acters, wh ich ha ve been

    limited previously to the r ealm of parsimony an alysis.

    Applicat ions t ha t u se likelihood m odels include th e

    estima tion of an cestral chara cter stat es and test s for

    chara cter correlat ion. Third, in addition to new

    models, the very fra mework in which likelihood

    models ar e used is cha ngin g. Likelihood models form

    th e basis of Bayesian sta tistical methods, and it is

    only natu ral th at Bayesian meth ods be applied tophylogenetic problems alrea dy being addressed u sing

    likelihood models.

    Likelihood models that allow nonindependence

    In 1994, Muse and Gau t 17 and Goldman a nd Yang 18

    indepen dent ly intr oduced likelihood models tha twere designed t o account for evolutionar y

    dependen cy among sites with in codons. This is in

    cont ra st t o par simony and pr evious likelihood

    models, which were forced to assu me indep enden ce

    am ong sites in a gene sequ ence, even if th e

    independence a ssumption h ad been violated. For

    instance, in genes th at encode proteins, the t hree

    nu cleotide sites tha t form a single codon cann ot

    evolve indepen dent ly of one a nother if there is

    selection for a par ticular a mino acid at th e corre-

    sponding site in the polypeptide.

    Codon modelsCodon models represent an importa nt advance in

    ter ms of the biological r ealism of substitu tion

    models. Codon models t ake th e genet ic code explicitly

    into accoun t wh en computin g the pr obability of a

    change at a site across a bran ch. Thus, rath er tha n

    four possible sta tes (A, C, G and T) there a re 61 (64

    possible codons, less th e th ree st op codons), which

    mean s tha t codon models require a great deal more

    compu ta tional effort th an classic DNA an d RNA

    subst itut ion models. In par ticular, likelihood

    met hods consider each stat e in tur n for every interior

    node of a t ree. Un like par simony, the score used for

    compar ing trees does not depend on an y part icularcombina tion of (unobserved) an cestral st at es. The

    overa ll likelihood is a su m over all sta te

    Review

    A small portion of the full (6161)INSTANTANEOUS RATEM ATRIX (see Glossary) fo r

    the Muse and Gauta codon model isillustrated (Table I). The quanti ties

    A,

    C,

    G

    and T

    are the equilibrium nucleotide

    frequencies, only three of w hich represent

    free parameters of the model, because the

    fourth can be obtained by subtraction. The

    remaining tw o parameters in the model, (synonym ous substitutions) and

    (nonsynonymous substitutions), are

    relative r ate parameters. The diagonal

    elements are not shown, but can becalculated by subtraction. All changes

    between the codons that involve more than

    one nucleotide substitution have an

    instantaneous rate of zero. The remaining

    elements represent single nucleotide

    substitutions that result in a change from

    one codon to another. The rate of any one

    of this type of change depends on the

    nucleot ide frequencies (e.g. the rate is

    lower for substitutions to a relatively rarebase) and the nature of the change

    (synonymous versus nonsynonymous).

    Reference

    a Muse, S.V. and Gaut, B.S. (1994) A likelihood

    approach for comparing synonymous a nd

    nonsynonymous substitu tion rates, with

    application to the chloroplast genome.Mol.Biol.

    Evol. 11, 715724

    Box Glossary

    Instantaneous rate matrix:a square matrix ofsubstitut ion rate parameters that serves to define a

    particular substitution m odel. The probability of

    observing statej, given starting state i, changes wit h

    time in a nonlinear fashion. The rate parameter

    represents the slope of this curve evaluated at t= 0.

    The elements on the diagonal are negative and equal

    to the sum of the other rates in the same row. This

    quantity is negative, because the probability of

    observing state i, given that iwas the starting state,

    decreases with t ime.

    Box 1.The anatomy of a codon model

    Table I. Part of Muse and Gauts 61 61 instantaneous rate matrixa

    Codon before Codon after substitution (the to state)

    substitution

    (the from state) TTT TTC TTA TTG CTT CTC GGG

    (Phe) (Phe) (Leu) (Leu) (Leu) (Leu) (Gly)

    TTT (Phe) C

    A

    G

    C

    0 0

    TTC (Phe) T

    A

    G

    0 C

    0

    TTA (Leu) T

    C

    G

    0 0 0

    TTG (Leu) T

    C

    A

    0 0 0

    CTT (Leu) T

    0 0 0 C

    0

    CTC (Leu) 0 T

    0 0 T

    0

    GGG (Gly) 0 0 0 0 0 0

  • 8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

    3/8

    TRENDS in Ecology & EvolutionVol.16 No.1 January 2001

    http://tree.trends.com

    32 Review

    combinations, in mu ch the same way tha t th e

    probability of rolling a seven with two dice is a sum

    over th e six possible ways in which the n um bers on

    each die can add u p to seven.

    Although it would be possible to allow a differen t

    ra te for every possible tr an sition bet ween codons,

    th e estimat ion of so ma ny par am eters (3660 in

    total) would require a n u nreasonable amount of

    dat a. To keep th e num ber of free (estima ted)para meters m ana geable, the t wo codon models th at

    ha ve been pr oposed ma ke some concessions. F or

    instance, neither model permits more tha n a single

    change in an y given insta nt (e.g. the ra te of th e

    chan ge AAAAGC is 0). However, t his is areasona ble restriction, becau se two changes ar e

    allowed to occur in t wo consecut ive infinit esima l

    periods of tim e.

    The model by Muse an d Gaut 17 assigns rat es to all

    other possible chan ges (i.e. the changes th at do not

    require more th an two nucleotide point m uta tions) onthe basis of two param eters that represent the r ate of

    synonymous an d nonsynonymous subst itutions,

    Review

    In Pagelsa,b likelihood ratio test, two m odels are

    considered, of which one is a more general version of

    the other. Normall y, the constrained model

    represents the null hypothesis and is nested wi thin(i.e. is a special case of) the m ore general m odel.

    A likelihood ratio test returns a significant result if

    the data are better explained by the general model (for

    which the maximum of the likelihood function isL1)

    than by the constrained model that represents the null

    hypothesis (for which the likelihood m aximum is L0). A

    ratio L1/L

    0much greater than 1 indicates signi ficance

    and the likelihood ratio test statistic, defined asLR=2(ln L

    0 ln L

    1), is large and positi ve.LRis distributed as

    a chi-squared random variable with degrees of freedom

    equal to the difference in the number of free (estimated)

    parameters between the two models if the null model is

    perfectly nested with in the unconstrained model.

    The null hypothesis model for a test of correlated

    evolution is based on the follow ing instantaneous

    rate matrix (Table I).

    The same matr ix is used for each character, but andare estimated separately for each character.Thenumber of free parameters in the null model is thus

    equal to 4. In practice, the TRANSITION MATRIX (see Glossary)

    that corresponds to this rate matrix is computed for each

    character (let the t ransition m atrix for character 1 be P

    and let that for character 2 beR), which allow s

    probabili ties for specific events to be computed. For

    example, the joint probability that character 1 remains in

    state 0 and character 2 changes from state 0 to 1 across a

    specific segment of the tree, is obtained as follows:

    Pr(00,01) =P(00)R(01) [1]The fact that the joint transition p robability is the

    product of the separate transition probabili ties foreach character m akes explicit the assumption of

    independence in this model.

    The general model enables (but does not f orce)

    the evolution of the tw o characters to be correlated

    and can account for simul taneous changes in the

    characters by using a single m atrix (Table II).

    This model has eight free parameters. The

    maximum -likelihood value under this modelapproaches that of the null m odel if the characters are

    evolving independently o f one another. As in the

    codon m odels, a rate of 0 is specified for events that

    require two changes. The difference in the num ber of

    free parameters is 84= 4, so the likelihood ratio teststatistic is nom inally distributed as a2 randomvariable with four degrees of freedom. LRmight not

    follow a 2 distribu tion i n this case, because themodels are not strictly nested (owing to the four rates

    wi th values of 0 in the general model, but see Ref. b).

    Pagel provides (in his com puter program , DISCRETE)

    the means for using parametric bootstrapping to

    determine the significance level w ithout making thisdistributional assumption .

    References

    a Pagel, M. (1994) Detecting correlat ed evolution on phylogenies:

    a genera l method for th e compar ative ana lysis of discrete

    characters.Proc.R . S oc.London B Biol. Sci. 255, 3745

    b Pagel, M. (1997) Inferring evolutionary processes from

    phylogenies.Zoologica S cripta 26, 331348

    Box Glossary

    Transition matrix: a matrix that shows the conditional probability

    of observing statejgiven starting state iafter some arbitrary am ount

    of time (t) for each pair of states iandj.This matrix can be derived

    from the instantaneous rate matrix, w hich describes fully the

    process at any given instant. (In this context, transition should not

    be confused with the same term that is used in conjunction with

    transversions.)

    Box 2. Likelihood ratio test for character correlation

    Table I. Instantaneous rate matrix for the null

    hypothesis assuming evolutionary independence

    Changing from: Changing to:

    0 1

    0 1

    Table II. Instantaneous rate matrix for the general

    hypothesis allowing evolutionary correlation

    After state Before state

    0, 0 0, 1 1, 0 1, 1

    0, 0 ab a b 00, 1 c cd 0 d1, 0 e 0 ef f1, 1 0 g h gh

  • 8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

    4/8

    TRENDS in Ecology & EvolutionVol.16 No.1 January 2001

    http://tree.trends.com

    33ReviewReview

    Bayesian inference involves interplay

    between the likelihood function and the

    prio r and posterio r distributions. To

    illustrate this concept, consider thefollowi ng simplistic problem. Suppose a

    black marbl e has come from one of two

    urns that each contain mi llions of marbles.

    The challenge is to decide from w hich urn

    the marble was removed. Suppose further

    that it is known that 40% of the marbles in

    urn A and 80% of the marbles in urn B are

    black. Choosing a marble from urn B

    would be clearly more likely to yield a

    black marble. Nevertheless, it is instructive

    to calculate the posterior probabili ty that

    the marble came from urn B and to

    compare this to the posterior probabilitythat it came from urn A.

    Bayes Rule, which is used to obtain the

    posterior probability from the likelihood and

    prior probability, is based on the definition of

    conditional probability. Accordingly, for two

    events, A and B, the joint probability of A and

    Bequals the conditional probability of one,

    given the other (i.e. A given B or Bgiven

    A) multiplied by the probability of the

    condition (Eqn 1):

    Pr(A,B)=Pr(A)Pr(BA)=Pr(B)Pr(AB) [1]

    Focusing only on the equation on the right ,

    divi ding bo th sides by Pr(A) yields Bayes

    Rule (Eqn 2):

    [2]

    In the Bayesian framework, A

    represents data and Ba hypothesis (or

    parameter):

    Pr(BA) is termed the posterior probabil ity

    and is the probability of the hypothesis (or

    parameter value), given the data.Pr(AB) is termed the likelihood and is

    the probability of the data, given the

    hypothesis (or parameter value).

    Pr(B) is termed the prior probability and

    is the unconditional probability of the

    hypothesis (or parameter value). This must

    be specified by the investigator w ithou t

    reference to the data.

    Pr(A) is the unconditional probability of

    the data, which can be obtained, using the

    law of total probability, by calculating the

    sum of the product Pr(B)Pr(A/B) for all

    possible values of B. This serves as a

    norm alizing constant, wh ich ensures that

    the sum of the posterior probabilit ies is 1.

    In this example, there is only one

    datum (the black marble). Therefore,

    computing the likelihood (probability ofthe data, given the hypothesis) is

    straightforward and is simply the

    probability t hat a single marble is black,

    given a particular urn hypothesis. The

    likelihood is 0.4 for urn A and 0.8 for urn B.

    If maximum likelihood were being used to

    decide between the two urns, urn B would

    be selected, because the li kelihood of

    drawing a black marble is greater for urn B

    than for urn A (0.8 versus 0.4). However, to

    choose a prior distribution that reflects

    complete prior ignorance of which urn w as

    used when drawi ng the black marb le, wespecify that the prio r probabi lity of each

    urn is 0.5. The norm alizing constant is

    therefore (Eqn 3):

    Pr(black marble chosen) =

    (0.5)(0.4)+ (0.5)(0.8)=0.6 [3]

    The posterior probability can now be

    computed for each urn as follows (Eqns 4,5):

    [4]

    [5]

    Thus, the probability that the black

    marble came from urn B, given the datum,

    is two thirds.The posterior distribution i s

    often described as an updated version of

    the prior distribution. In this case, the

    posterior distribution (0.33, 0.67)

    represents an updated version o f the prior

    distribution (0.5, 0.5), the evidence used for

    the update being the fact that the single

    marble drawn was black. A m ajor benefit of

    taking the Bayesian perspective lies in the

    fact that it produces probabilities for

    hypotheses of in terest, which is exactly

    what investigators desire. Likelihoods are

    useful, but not easily in terpreted, because

    they represent the probability of the datagiven the hypothesis rather than the

    probability of the hypothesis given the

    data.

    A criti cism of Bayesian approaches is

    the subjectivity o f the prior. Note that in

    the example above (with only a single

    datum), it w ould be necessary to stipulate

    a prior probability for urn A that was

    greater than two thirds to rig the analysis

    (that is, to ensure that the conclusion is in

    favor of urn A). While the posterior

    distribution always changes if the prior is

    changed, the conclusions are usually notoverly sensitive to the prior and any effect

    of the pri or decreases as the amount of

    data increases. A counterargum ent to the

    subjectivity criticism is that subjectivity

    that is inherent in the pri or is explicit and

    must be defensible.

    In the example above, discrete

    hypotheses were used. However,

    continuous parameters are more often the

    targets of Bayesian analyses. In such cases,

    probabili ty density functions replace the

    probabili ties of discrete hypotheses, but

    Bayes rule is still applicable. Figure Iillustrates the prior and posterior density

    curves for the parameter p (probability of

    heads) in a simple coin-flipping experiment

    in which six heads were observed for ten

    flips.Two di fferent prior distributions were

    used, namely a flat [Beta(1,1), Fig. Ia] and a

    more inform ative [Beta(2,2), Fig. Ib] prior

    distribution. As an example of how such

    curves could be used, the posterior

    probabili ty that pis between 0.45 and 0.55

    could be obtained by integrating the

    posterior density curve between 0.45 and

    0.55.

    Box 3. Prior and posterior probabilities

    TRENDS in Ecology & Evolution

    0.00 0.20 0.40 0.60 0.80 1.00

    0.00

    0.55

    1.10

    1.66

    2.21

    2.76

    (a) (b)

    0.00

    0.59

    1.19

    1.78

    2.38

    2.97

    Posterior

    Prior

    0.00 0.20 0.40 0.60 0.80 1.00

    Posterior

    Prior

    Probabilitydensity

    Probabilitydensity

    p p

    Fig. I. Prior and posterior density curves for p(probability of heads) in a coin-flipping experiment, illustrating a

    flat [Beta(1,1)] (a) and an inform ative [Beta(2,2)] (b) prior distr ibuti on.

    )Pr(

    )Pr()Pr()Pr(

    A

    BABAB

    =

    31

    6.0

    )4.0)(5.0()chosenmarbleA/blackurn(Pr ==

    32

    6.0)8.0)(5.0()chosenmarbleA/blackurn(Pr ==

  • 8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

    5/8

    respectively (Box 1). By estima ting only five

    param eters (the two rate para meters and thr ee

    nucleotide frequen cy par am eters), the entire 61 61ma trix of tra nsition probabilities required to compu te

    the likelihood can be specified. The Muse an d Ga ut

    codon model is implemen ted in Mu se an d Kosakovskys

    Hypoth esis Test ing Usin g Ph ylogenies (HYPHY)

    compu ter pa ckage19. The commonly employed HKY85

    model5 also requires five param eters, but th e two rat e

    param eters in this case represent t he ra tes of

    tran sitions an d tran sversions, rather t han

    nonsynonymous and synonymous substitut ions.

    The codon model proposed by Goldman an d Yang18

    goes a step furt her in a llowing both

    tra nsitiontran sversion bias an d synonymous-

    nonsynonymous su bstitut ion bias. This m odel also

    allows biochemical differen ces among amino acids to

    play a role in determining the ra te at wh ich changes

    am ong amino a cids occur. For exam ple, a chan ge from

    one hydrophobic to an other hydrophobic residue can

    occur at a higher r ate t han a chan ge from a hydrophobic

    to a h ydrophilic residue. Goldman a nd Yangs model is

    implemen ted in Yangs Ph ylogenetic Analysis By

    Maximum Likelihood (PAML) compu ter package16.

    Secondary structure models

    Secondary structure models represent an a ttempt to

    add biological rea lism to an alyses of genes with RNAproducts th at have a secondary stru cture. Tillier and

    Collins20 and Muse21 independently introduced

    TRENDS in Ecology & EvolutionVol.16 No.1 January 2001

    http://tree.trends.com

    34 ReviewReview

    The principles behind M arkov Chain

    Monte Carlo (M CMC) methods can be

    illustrated w ith a simpl e analogy. A robot

    is allow ed to w alk in a square field. The

    robo t takes steps that vary in length and

    random ly selects a direction fo r each

    step. The robot never leaves the field,

    because if a step is about to take itoutside the boundary, it is reflected back

    into th e field. A representative path

    followed by such a robot is illustrated in

    Fig. Iac for 100 (Fig. Ia), 1000 (Fig. Ib)

    and 10 000 steps (Fig. Ic). (The path l ines

    have been removed from Fig. Ic to clarify

    the distribu tion o f steps.) Note that, even

    though the robot i s walking randoml y,

    eventually every portion of the field is

    visited if it is allowed to walk for long

    enough.

    Now suppose that two hills are present

    (represented by bivariate normal densities

    one uncorrelated and w ith a correlation

    coefficient of 0.9) and the robot is

    programm ed to use the following simpl e

    rules (based on the local environm ent)

    when taking steps:

    If a proposed step will take the robotuphi ll, it autom atically takes the step.

    If a proposed step will take the robot

    downhil l, it divides the elevation at the

    proposed location by i ts current

    elevation and on ly takes the step if it

    draws a random number (uniform on

    the open in terval 0,1) that is smaller

    than this quotient.

    The proposal distribution is

    symm etrical, wh ich means that the

    probability of p roposing a step from

    poin t A to B is the same as that of

    proposing a step from point B to A.

    By follow ing these rules, the pathfollowed by the robot w ill take it to points

    in the field in proportion to their

    elevation, with higher points being

    visited more often than low er ones.

    Letting the robot begin i ts walk near the

    upper left-hand corner, its mov ements

    are illustrated in Fig. Idf for 100 (Fig. Id),

    1000 (Fig. Ie) and 10 000 steps (Fig. If).

    (As in Fig. Ic, path lines are not show n in

    Fig. If.)

    The robots movem ents can be used to

    estimate the volum e under any specified

    portion of thi s landscape. In the exampleabove, if the fir st few steps (called the

    burn-in stage) are disregarded then the

    resulting distribution of points is a good

    approximation of the landscape, which is a

    mixtu re of two separate bivariate normal

    densities.The longer the robot is allowed to

    walk, the closer the approxim ation

    becomes.

    This analogy also illustrates one of the

    pitfalls of approximating surfaces using

    MCMC methods. Were the robot to take

    only a few steps, it is clear that it would not

    be likely to encounter the second hi ll.Therefore, care should be taken to ensure

    that Markov chains are run for long enough

    to provide a good approximation of the

    surface over the entire parameter space.

    One simple w ay to investigate this is to run

    several chains, each starting from a

    different (randomly selected) starting point.

    If the resulting approximations are

    substantially di fferent, this indicates that

    none of the chains were run for long

    enough.

    Box 4. Markov Chain Monte Carlo methods

    TRENDS in Ecology & Evolution

    (d) (e) (f)

    I (a) (b) (c)

  • 8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

    6/8

    models that can accoun t for th e evolutiona ry

    dependence between nu cleotide sites tha t a re

    opposite one anoth er in th e stem r egions of an RNA

    gene product. Following subst itut ions in t hese sites,

    selection favors compen sat ory substitu tions to

    preser ve the sta bility of stem r egions an d hence alsothe secondar y structure t hat is needed for the

    cat alyt ic act ivities of th ese RNA molecules. Muses21

    model requires only a single additiona l free

    par am eter t o accoun t for th e evolutiona ry correlat ion

    between paired stem sites and reduces to a sta ndar d

    nu cleotide substitu tion model (the HKY85 model) if

    no such correla tion exists in the stem regions.

    Applications of likelihood to discrete morphological

    characters

    The most common met hod for obtainin g estimat es of

    an cestral chara cter sta tes for discrete morphological

    chara cters involves par simony optimization,alth ough likelihood-based a ltern at ives have been

    proposed2224. Cunn ingham 25 reviewed an cestral

    chara cter sta te estim at ion. I will therefore focus on a

    differen t a pplication of likelihood models in

    inferences concerning morphological features,

    na mely testin g for evolutiona ry correlat ion.

    Pagel26 described a likelihood ra tio test (Box 2) th at

    allows an investigator to determine whether two

    discret e morphological tr aits a re corr elated t o a grea ter

    extent t ha n can be explained simply by th e phylogeny.

    Two traits a re evolut iona rily correlated if a cha nge in

    sta te of one of the chara cters predisposes the other

    character t o chan ge state soon after. For example, theevolution of feat her s on t he forelimbs of dinosaurs is

    corr elated with th e evolution of wings. In th is case, th e

    appea ra nce of feather s for reasons oth er th an flight 27

    appa ren tly encour aged a chan ge from th e stat e wings

    absen t to wings presen t, which might not h ave

    otherwise occur red. P hylogenies impose correlations

    on chara cters, even if th ose cha ra cters evolve

    independently. Pa gels test addresses wheth er a n

    evolutionar y correlation exists between t wo char acters

    th at a cts to increase th e observed correlation above and

    beyond t he level imposed by the ph ylogeny.

    Suppose tha t t wo characters each change state just

    once on a phylogeny and h appen to cha nge along thesam e segment of th e phylogeny. At first sight , this

    coincidence might a ppear to be the resu lt of an

    evolutionar y corr elation. However, evidence to supp ort

    th is interpr eta tion would be weak, becau se the co-

    evolutionar y event occur red only once. By contra st, th e

    evidence of correlated evolution of the t wo cha ra cters

    would be stronger if cha nges in t he t wo cha ra cters co-

    occur on ma ny different par ts of the phylogeny. Pa gel

    and others have emph asized th e importance of using

    explicit phylogeny-based met hods for assessing t he

    str ength of evidence for such correla tions, becau se th e

    nu mber of degrees of freedom a vailable for the t est can

    otherwise be grea tly exaggera ted26.In Pa gels test, the null hypothesis is tha t t he two

    char acters of inter est ha ve evolved indepen dent ly.

    The gener al (unconstra ined) model allows for some

    correla tion in the evolution of th e two cha ra cters.

    The nu ll model is a const ra ined version of th e genera l

    model, because the indepen dence assum ption

    const ra ins th e corr elation to equal zero. The

    ma ximum of th e likelihood fun ction un der th eunconstr ained model will, thu s, approach tha t of the

    constra ined model as the correlation a pproaches

    zero. If th e tru e corr elation is not zero, th en th e

    ma ximum of th e likelihood fun ction un der th e

    genera l model will be significant ly larger t ha n th at

    which can be att ained by th e null model, reflecting

    the bett er fit of the general model. Pagel argues th at

    examination of the estimated values of the

    para meters of the general model enables a

    deter mina tion of which chara cter is most likely to

    have changed sta te first, precipitat ing a chan ge of

    stat e in t he other character. However, such fine-

    scaled interpret ations should be made with caution,because several paramet ers are estimated un der this

    model using data from only two cha ra cters.

    An inter esting exam ple of the app licat ion of

    Pa gels meth od involves testin g wheth er switches to

    mu tu alism in fungi, eith er in th e form of associat ion

    with algae (i.e. lichen ization) or liverworts, a re

    associat ed with chan ges in the ra te of molecular

    evolution 28. Lutzoni an d Pa gel28 began by classifying

    a grou p of closely rela ted Omphalina mushroom

    species a s eith er fast (F) or slow (S) evolving a nd as

    either m ut ua listic (M) or nonmu tu alistic (N).

    Est imat es of th e ra te of evolution were ba sed on

    nu clear ribosoma l DNA (nr DNA) from t he 25S an d5.8S genes and t he intern al tr anscribed spacer (ITS)

    region. They then obtained likelihoods for both t he

    nu ll model (assumin g independen t evolution of both

    mut ualism an d th e rat e of molecular evolution) and

    th e genera l model (in which th e rat e of evolution an d

    mut ualism can be correlated). The three set s of data

    fit th e correlation model better t han the

    indepen dence model. However, the impr oved fit wa s

    only significant for t he 25S da ta . Therefore, Lut zoni

    and Pa gel28 concluded th at t her e is a corr elation

    between the rat e of molecular evolution a t t he 25S

    nr DNA locus and t he evolut ion of mu tu alism. These

    aut hors also looked at individual rat e para meterstha t were estimat ed in the nonindependence model.

    For the 25S data, the estima ted rat e for the

    tra nsition M,SM,F was significant ly greater t hantha t for N,SN,F, indicating th at increases in th era te of molecular evolution a re m ore likely to occur in

    lineages tha t ar e alread y mutu alistic. Also (again for

    the 25S dat a), the ra te of the t ran sition N,SM,Swas significant ly greater tha n t hat for N,SN,F,indicating t ha t t he t enden cy for slowly evolving

    lineages to evolve mut ualism is greater t han the

    ten dency for nonmu tu alistic linea ges to evolve fast

    ra tes of molecular evolution. This ability to tease

    apar t t he correlations am ong char acters sets Pagelsmethod apa rt from other m ethods, which simply

    assess wheth er a correlation exists.

    TRENDS in Ecology & EvolutionVol.16 No.1 January 2001

    http://tree.trends.com

    35ReviewReview

  • 8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

    7/8

    TRENDS in Ecology & EvolutionVol.16 No.1 January 2001

    http://tree.trends.com

    36 Review

    References

    1 Felsenstein, J. (1981) Evolutionary trees from

    DNA sequences: a ma ximum likelihood approach.

    J. Mol. Evol. 17, 368376

    2 Felsenstein, J . (1999) PHYLIP, Phylogeny

    Inference Package (University of Washin gton,

    Seat tle), Version 3.572

    3 Ju kes, T.H. and Cantor, C.R. (1969) Evolution of

    protein m olecules. In Mamm alian Protein

    Metabolism (Munro, H.N., ed.), pp. 21132,Academ ic Press

    4 Kimura, M. (1980) A simple method for

    estimating evolutionary rat e of base

    substitut ions t hrough comparative studies of

    nucleotide sequences.J. Mol. Evol . 16, 111120

    5 Hasegawa , M. et a l. (1985) Dating of th e hum an-

    ape split ting by a m olecular clock of

    mitochondria l DNA.J. Mol. Evol. 21, 160174

    6 Kishino, H. and Hasegawa, M. (1989)

    Evaluation of the ma ximum likelihood estimate

    of the evolut ionary t ree topologies from DNA

    sequence data , and the bra nching order in

    hominoidea.J. Mol. Evol . 29, 1701797 Rodrguez, F. et a l. (1990) The gener al

    stochast ic model of nucleotide subst itut ion.

    J. Th eor. Biol . 142, 485501

    8 Churchill, G.A. et al. (1992) Sample size for a

    phylogenetic inference.Mol. Biol. Evol . 9, 753769

    9 Yang, Z. (1994) Maximu m likelihood

    phylogenetic estimat ion from DNA sequences

    with variable rat es over sites: approximate

    methods.J. Mol. Evol . 39, 306314

    10 Yang, Z. (1993) Maximu m-likelihood estima tion

    of phylogeny from DNA sequen ces when

    substitut ion r ates differ over sites.Mol. Biol.

    Evol. 10, 1396140111 Yang, Z. (1996) Among-site rat e variation an d

    its impact on phylogenetic ana lyses. Trends

    Ecol. Evol. 11, 367372

    Acknowledgements

    I would like to thank Kent

    Holsinger, Louise A.

    Lewis, Mark Pagel, Chris

    Simo n and Zih eng Yang

    for their careful reading

    of earlier drafts of this

    article. I also grateful ly

    acknow ledge th e Alfr ed P.Sloan Foundation/

    National Science

    Foundation for funding

    (grant 98-4-5 ME).

    A Bayesian future for model-based phylogenetics?

    Likelihood m ethods in phylogenetics t ake longer tha n

    other met hods. For large data sets, it is not uncommon for

    single heuristic searches to take days, if not month s, for a

    compu ter t o complete. Even when complete, such

    ana lyses only represent a point estimate of a ph ylogeny. Ameasu re of support for individual clades requires BOOT-

    STRAPPING, which effectively tur ns a n a lready time

    consum ing analysis into one that r equires a far grea ter

    amount of time. This situat ion might eventua lly be

    ameliora ted by taking a Bayesian approach. Larget and

    Simon29 have summ arized the recent litera tu re on Baye-

    sian approaches to making inferences about

    phylogenies2934 (S. Li, PhD t hesis, Ohio State U niversity,

    1996; B. Mau , PhD t hesis, University of Wisconsin, 1996)

    and h ave presented novel algorithm s to estimat e

    simultan eoulsy the t ree topology and provide a measure

    of nodal support. These algorithm s have the potential t o

    tr ansform model-based phylogenetic inference.Bayesian a pproaches to sta tistical problems involve

    ma king inferences using th e posterior distr ibution (or

    simply th e posterior) of hypotheses or pa ra met ers of

    inter est (Box 3). This is feasible when problems are

    simple enough for a n a na lytical form ula for the

    poster ior to exist. However , for most rea l problems,

    models involve several para meters a nd, th us, a

    complicated joint poster ior for which t here is no simple

    form ula. The poster ior for a n u nr ooted ph ylogenetic

    tr ee, for examp le, involves at least th e topology (a

    discrete par ameter) and 2N3 branch length

    para meters, whereNis the n um ber of tip nodes.

    Fortu na tely, a technique exists for approxima tingcomplicated posteriors, such as th ose th at a re

    chara cteristic of problems th at involve phylogenies.

    The technique is called Mark ov Chain Mont e Carlo

    (MCMC) or th e Metr opolisHastin gs algorithm 35,36.

    The idea un derlying MCMC is tha t a Ma rkov cha in

    th at t akes t he form of a correla ted ra ndom walk

    thr ough th e para meter spa ce can be conducted in such

    a way th at a ny probability distribut ion (however

    complicated) can be a pproximated by periodically

    sam pling values (Box 4). The approximation can be

    made a rbitrarily accurate by runn ing the Markov

    chain for a sufficient nu mber of steps.

    In ph ylogenetic applicat ions, each step in aMark ov cha in involves a ra ndom m odification of th e

    tree topology, a bra nch length or a param eter in t he

    substitu tion model (e.g. substitu tion rat e ra tio). If th e

    posterior t ha t is compu ted for a proposed step is

    larger than tha t of the current tree topology and

    param eter values, the proposed step is taken.

    Proposed steps that result in downh ill moves on th e

    posterior sur face are not au tomat ic an d depend on the

    ma gnitude of the decrease. The function th atdeterm ines th e probability of a downhill step in th is

    case is based on the ra tio of th e new an d current

    posteriors. Using th ese rules, the Ma rkov cha in

    visits regions of tr ee spa ce (sensu lato, including th e

    tree topology space an d dimen sions for other

    par am eters, such as branch lengths) in proport ion to

    their posterior.

    For a pr actical examp le of how th is meth od could be

    used to answer a question posed by a systemat ist,

    consider t he qu estion, What is th e probability t ha t

    group X is monophyletic? To answer th is quest ion, we

    would run the Mar kov chain, sampling trees

    periodically. Suppose tha t in a sa mple of 100 000 trees,group X appeared a s a m onophyletic group in 74 695

    trees. The pr obability (given th e observed dat a) tha t

    group X is monophyletic is appr oxima tely 0.74695,

    because t he Mar kov chain visits t rees in proportion t o

    their posterior pr obability. This value is a n atu ral

    measu re of nodal support an d is easier to interpret

    than existing measures th at are based on parsimony

    (decay ind ex and bootst ra p values ) or likelihood

    (bootstr ap valu es). Lar get an d Simon29 mak e the point

    tha t only one r un of the Mar kov chain is needed to gain

    such informa tion, compared t o the ma ny bootstr ap

    run s needed in a m aximum likelihood analysis.

    Another import an t app licat ion of MCMC in Bayesianphylogenetic inference involves estimat ing divergence

    times in a relaxed molecular clock model, that is, a

    model in which substitut ion r ates var y across the

    phylogeny, but in which rat es in descendan t lineages are

    correlated with the rat e in th eir common a ncestor37,38.

    MCMC meth ods provide Bayesian credibility int ervals

    for divergence dates with out ha ving to assu me a per fect

    clock and also allow consider able flexibility (an d even

    un certain ty) in how th e clock is calibrated.

    The a bility of MCMC models to provide an swers t o

    virtu ally any question t hat might be imagined by the

    investigator is a t hrilling prospect. The a bility to

    provide an swers mu ch more efficiently th an ispossible with curr ent likelihood met hods should

    ma ke th e Bayesian ap proach to phylogenetics well-

    wort h wat ching in th e next few year s.

    Review

  • 8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf

    8/8

    During the past decade, the explosion of molecular

    techniques h as led to the a ccumu lation of a

    considera ble amoun t of compara tive genetic

    informa tion at t he population level. At the sam e time,

    recent a dvances in population genetics theory,

    especially coalescent theory, ha ve genera ted p owerful

    tools for t he an alysis of intr aspecific data . These t wo

    developmen ts h ave converted int rasp ecific

    phylogenies int o useful tools for test ing a var iety of

    evolutionary a nd populat ion genet ic hypotheses.Several phylogenetic m ethods, especially NETWORK

    (see Glossary) approaches, ha ve been developed t o

    ta ke advan ta ge of the u nique chara cteristics of

    intr aspecific dat a. In th is art icle, we summ ar ize some

    population genetics principles, explain why net works

    ar e appropriat e represent at ions of intr aspecific

    genetic variation, describe an d compa re a vailablemeth ods an d software for net work estimat ion, and

    give exam ples of their application.

    Gene genealogies

    Given a sam ple ofGENES, the relationships among

    th em can be tra ced back in time to a comm on

    an cestral gene. The genealogical pat hways

    interconnecting the current sam ple to th e common

    an cestor const itut e a GENE TREE or gene gene alogy. A

    gene tr ee is the p edigree of a set of genes a nd exists

    independen tly of potent ial mut at ions. The only

    portion of a gene tr ee tha t can genera lly be estimat ed

    with genetic data is that portion m arked by the(potent ial) mu ta tional events th at d efine the different

    ALLELES (Box 1). This lower r esolution tr ee is th e allele

    Intraspecific gene evolution cannot always be represented by a bifurcating t ree.

    Rather, population genealogies are often multifurcated,descendant genes

    coexist with persistent ancestors and recombination events produce reticulate

    relationships.Whereas traditional phylogenetic methods assume bifurcating

    trees,several networking approaches have recently been developed toestimate intraspecific genealogies that take into account these population-

    level phenomena.

    Intraspecific gene genealogies:trees grafting into networksDavid Posadaand Keith A.Crandall

    TRENDS in Ecology & EvolutionVol.16 No.1 January 2001

    http://tree.trends.com 01695347/01/$ see fro nt m atter 2001 Elsevier Science Ltd. A ll righ ts reserved . PII: S0169-5347(00)02026-7

    37ReviewReview

    12 Swofford, D.L. et a l. (1996) Phylogenetic

    inference. In Molecular Syst ematics (Hillis,

    D.M. et al., eds), pp. 407514, Sinau er

    Associates

    13 Huelsenbeck, J.P. an d Cranda ll, K.A. (1997)

    Phylogeny estimation and hypothesis testing

    using maximum likelihood.Annu . Rev. Ecol.Syst . 28, 437466

    14 Lewis, P.O. (1998) Maximum likelihood as an

    alterna tive to parsimony for inferring

    phylogeny using nucleotide sequen ce dat a. In

    Molecular S ystematics of Plants II (Soltis, D.E .

    et al., eds), pp. 132163, Kluwer

    15 Swofford, D.L. (2000) PAUP*, Ph ylogenetic

    Analysis Using Parsimony (Sinauer,

    Sun derlan d, MA), Version 4.0b3,

    16 Yang, Z. (1997) PAML: a program pa ckage for

    phylogenetic analysis by maximu m likelihood.

    CABIOS 13, 555556

    17 Muse, S.V. an d Gaut , B.S. (1994) A likelihood

    approach for comparing synonymous and

    nonsynonymous substitut ion r ates, with

    application to the chloroplast genome.Mol. Biol.

    Evol. 11, 715724

    18 Goldman, N. an d Yang, Z. (1994) A codon-based

    model of nucleotide subst itut ion for protein -

    coding DNA sequ ences.Mol. Biol. Evol . 11,

    725736

    19 Muse, S.V. and Kosakovsky Pond, S.L. (2000)

    HYPHY , Hypothesis Testing Using Phylogenies ,

    (North Carolina State University, Raleigh)

    Version 1

    20 Tillier, E.R.M. an d Collins, R.A. (1995)

    Neighbor joining an d maximu m likelihood with

    RNA sequences: addressing th e interdependence

    of sites.Mol. Biol. Evol . 12, 715

    21 Muse, S.V. (1995) Evolutionary an alyses of DNA

    sequences subject t o constr aints on secondar y

    structure. Genetics 139, 14291439

    22 Schluter, D. et al. (1997) Likelihood of ancestor

    stat es in adaptive radiat ion.Evolution 51,16991711

    23 Mooers, A.. and Schluter, D. (1999)

    Reconstructing ancestor states with m aximum

    likelihood: support for one- and t wo-ra te models.

    Syst. Biol. 48, 623633

    24 Pagel, M. (1999) The maximum likelihood

    approach to reconstr ucting ancestral chara cter

    stat es of discrete characters on phylogenies. Syst.

    Biol. 48, 612622

    25 Cunningham, C.W. et al. (1998) Reconst ru cting

    ancestral character sta tes: a critical reappraisal.

    Trend s Ecol. Evol. 13, 361366

    26 Pagel, M. (1994) Detecting corr elated evolution on

    phylogenies: a general met hod for t he

    compar ative an alysis of discrete chara cters. Proc.

    R. Soc. London B Biol. Sci. 255, 3745

    27 Qiang, J . et al. (1998) Two feather ed dinosau rs

    from northeaster n China.Nature 393, 753761

    28 Lutzoni, F. and P agel, M. (1997) Accelerated

    evolut ion as a consequ ence of tran sitions to

    mutualism. Proc. Natl. Acad. Sci. U. S. A . 94,

    1142211427

    29 Larget, B. and Simon, D.L. (1999) Markov chain

    Monte Ca rlo algorithms for t he Bayesian a nalysis

    of phylogenetic tr ees.Mol. Biol. Evol . 16, 750759

    30 Mau, B. and Newton, M.A. (1997) Phylogenetic

    inference for binar y data on dendrograms using

    Markov chain Monte Car lo.J. Comput. Graph.

    Stat . 6, 122131

    31 Mau, B. et al. (1999) Bayesian phylogenetic

    inference via Mar kov chain Monte Carlo methods.

    Biometrics 55, 112

    32 Newton, M. et al. (1999) Mark ov cha in Monte

    Car lo for th e Bayesian an alysis of evolutionarytrees from a ligned m olecular sequences. In

    S tatistics in Molecular Biology (Seillier-

    Moseiwitch, F. et al., eds), Instit ute of

    Mathema tical Statist ics

    33 Ranna la, B. and Yang, Z. (1996) Probability

    distr ibution of molecular evolutionar y trees: a

    new met hod of phylogenetic inference.J. Mol.

    Evol. 43, 304311

    34 Yang, Z.H. and Rann ala, B. (1997) Bayesian

    phylogenetic inference usin g DNA sequences: a

    Markov Chain Monte Car lo method.Mol. Biol.

    Evol. 14, 717724

    35 Metropolis, N. et al . (1953) Equ at ion of sta te

    calculations by fast computing ma chines.

    J. Chem. Phys. 21, 10871092

    36 Hast ings, W.K. (1970) Monte Carlo sampling

    methods using Markov chains a nd th eir

    applications.Biometrika 57, 97109

    37 Huelsenbeck, J.P. etal. (2000) A compoun d

    Poisson process for relaxing t he molecular clock.

    Genetics 154, 18791892

    38 Thorne, J .L. et al . (1998) Estimating th e rat e of

    evolution of th e rat e of molecular evolution.

    Mol. Biol. Evol . 15, 16471657

    39 Felsenstein, J. (1985) Confidence int ervals on

    phylogenies: an approach using t he bootstr ap.

    Evolution 39, 783791