Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf
-
Upload
joao-paulo-bresciani-neto -
Category
Documents
-
view
219 -
download
0
Transcript of Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf
-
8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf
1/8
TRENDS in Ecology & EvolutionVol.16 No.1 January 2001
http://tree.trends.com 01695347/01/$ see fron t m atter 2001 Elsevier Science Ltd. A ll righ ts reserved . PII: S0169-5347(00)02025-5
30 Review
As an OPTIMALITYCRITE RION (see Glossary) in
phylogenetics, MAXIMUM LIKELIHOOD has been in usealmost as long as par simony, which remains t he
dominant meth od for phylogenetic ana lyses of
discrete (morph ological an d molecular ) chara cter
dat a. Unlike the other t wo ma jor optimality criteria ,
both par simony and likelihood operat e directly on
discrete character data r ather than on a m atr ix of
pairwise distan ces. This allows parsimony an d
likelihood met hods to be used to estimat e ancestra l
char acter states, in addition t o estimating th e tree
topology. F elsenst eins1 pru nin g algorit hm ma de it
feasible to apply th e likelihood criter ion to nucleic
acid sequence data . His freely distribut ed package of
compu ter pr ogram s for phylogeny inferen ce, calledPH YLIP (Ref. 2), int roduced th e likelihood criter ion
into molecular systema tics and m olecular evolut ion.
Felsens teins 1981 model1 was th e first in a series of
improvement s ma de to the pioneering model of Juk es
and Cant or3 an d was the first to allow nu cleotide
frequencies to differ (the J ukes a nd Cantor m odel3
assum es tha t th e four types of nucleotide are pr esent
in equa l proportions). Fur ther developments led to
models th at accommodate both
tra nsitiontr ansversion substitu tion bias4 and
unequal nucleotide frequencies5,6. This tren d
culminated in the general t ime-reversible (GTR)
model, which n ot only allows unequ al nu cleotide
frequen cies, but also allows each of th e six possible
tra nsitions between nucleotide sta tes (i.e. AC, AG,AT, CG, CT and GT) to occur at different r at es7.
A furt her improvement, used in conjunction with
the abovement ioned m odels, is th e ability to
accommodate variat ion in subst itution ra tes across
sites, using several meth ods810. One of th e most
comm only app lied meth ods for allowing rat es to
vary among sites is the discrete gam ma m ethod,
which a ssumes that relative rat es are distr ibutedaccording to a gam ma distr ibution with a mean of
1 and a var iance of 1/, where defines the sh ape ofthe distribution. Large values of tran slate to a t inyvarian ce in relative rat es (i.e. the ra tes at all sites
are almost equivalent), whereas sm all values (e.g.
0.01) result in an L-shaped distribution of relat ive
rat es (most rat es are low, but some rat es are h igh).
Yang11 reviewed in depth th is approach to
accommodating site-to-site substitu tion ra te
heterogeneity, and other r eviews of this and other
topics includ e Swofford et al.12, Huelsenbeck an d
Crandall13 an d Lewis14.
Some comput er program s tha t ar e used todeter mine ph ylogenies allow other options with
respect to modeling ra te het erogeneity, including:
(1) invar iable sites m odels, in wh ich a proportion of
sites is assum ed incapable of un dergoing
substitution15; (2) site-specific ra tes models, wh ich
allow different r at es for arbitr ar y subsets of sites
th at correspond to different genes, intr ons versus
exons or differe nt codon positions 15; (3) hidd en
Mark ov models which allow for corr elation in t he
rat es at adjacent sites2; and (4) discrete gam ma
models in wh ich can differ between subset s ofsites16. It is also possible to combine an invar iable
sites model with a discrete gam ma model, letting the
Review
Paul O. Lewis
Dept of Ecology and
Evolutionary Biology,
University of Connecticut,
75 N. Eaglevill e Road Uni t
3043, Storr s, CT 06269-3043, USA.
e-mail:
Phylogenetic systematics turns overa new leafPaul O. Lewis
Long restricted to the domain of molecular systematics and studies of
molecular evolution, likelihood methods are now being used in analyses of
discrete morphological data, specifically to estimate ancestral character states
and for tests of character correlation. Biologists are beginning to apply
likelihood models within a Bayesian statistical framework,which promises not
only to provide answers that evolutionary biologists desire,but also to make
practical the application of more realistic evolutionary models.
Bootstrapping: a statistical techniqu e, first applied to phylog enetics
by Felsenstein39, in which new data sets are created by samplin g
randomly (and with replacement) from the origi nal characters.
These new data sets (called bo otstrap d ata sets) are of th e same size
as the origi nal. A desired quantity i s computed for each bootstrap
data set and the resulting distributi on is used to estimate the
dispersion that would be expected if the same num ber of new
independent data sets had been collected. Bootstrappin g assumes
that the original characters were sampled independent ly.
Likelihood:a quantity that is proportional to the probability of the
data (or probabili ty density, if the data are continuous-valu ed), given
specific values for all parameters in the m odel.The likelihoodfunction provides a means to estimate the parameters of the model.
Parameter values that are associated with the global m aximum of
the likelihood function are termed maxim um li kelihood estimates
(MLEs).
Optimality criterion: a rule used to decide which of tw o trees is
best. Four opti mality criteria are currently widely used:
Maximum parsimony the tree requir ing th e fewest character state
changes is the better of th e two t rees.
Maximum likelihood the tree maximizing the likelihood under the
assumed evolutionary model is better.
Minimum evolution the tree having t he smallest sum of branch
lengths (estimated using ordi nary least squares) is better.
Least squares the tree showin g the best fit between
estimated pairw ise distances and the corresponding pairw isedistances obtained by summi ng paths through t he tree is
better.
Glossary
-
8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf
2/8
TRENDS in Ecology & EvolutionVol.16 No.1 January 2001
http://tree.trends.com
31Review
gamma shape parameter () apply only to th evaria ble fraction of sites 15.
This review focuses on new n ucleotide sequen ce-based m odels (codon a nd seconda ry str uctur e
models), new a pplications of likelihood models
(discrete morph ological char acters) an d a new
framework with in wh ich likelihood m odels can be
applied t o phylogenetics (Bayesian inference). First ,
models k nown as codon m odels an d second ar y
stru cture models allow evolutiona ry dependence
am ong nucleotide sites within t he sam e codon an d
between sites th at are opposite one a nother in a stem
region of an rRN A gene, res pectively. By contr ast ,
par simony and previous likelihood models assu me
th at all nucleotide sites ar e evolutionarily
independen t. Therefore, codon a nd seconda rystru cture models represent a significant step forwar d
by allowing recognized evolut ionar y correla tions
between sites to be directly incorpora ted int o a
phylogeny met hod. Second, likelihood is being applied
to discrete morph ological char acters, wh ich ha ve been
limited previously to the r ealm of parsimony an alysis.
Applicat ions t ha t u se likelihood m odels include th e
estima tion of an cestral chara cter stat es and test s for
chara cter correlat ion. Third, in addition to new
models, the very fra mework in which likelihood
models ar e used is cha ngin g. Likelihood models form
th e basis of Bayesian sta tistical methods, and it is
only natu ral th at Bayesian meth ods be applied tophylogenetic problems alrea dy being addressed u sing
likelihood models.
Likelihood models that allow nonindependence
In 1994, Muse and Gau t 17 and Goldman a nd Yang 18
indepen dent ly intr oduced likelihood models tha twere designed t o account for evolutionar y
dependen cy among sites with in codons. This is in
cont ra st t o par simony and pr evious likelihood
models, which were forced to assu me indep enden ce
am ong sites in a gene sequ ence, even if th e
independence a ssumption h ad been violated. For
instance, in genes th at encode proteins, the t hree
nu cleotide sites tha t form a single codon cann ot
evolve indepen dent ly of one a nother if there is
selection for a par ticular a mino acid at th e corre-
sponding site in the polypeptide.
Codon modelsCodon models represent an importa nt advance in
ter ms of the biological r ealism of substitu tion
models. Codon models t ake th e genet ic code explicitly
into accoun t wh en computin g the pr obability of a
change at a site across a bran ch. Thus, rath er tha n
four possible sta tes (A, C, G and T) there a re 61 (64
possible codons, less th e th ree st op codons), which
mean s tha t codon models require a great deal more
compu ta tional effort th an classic DNA an d RNA
subst itut ion models. In par ticular, likelihood
met hods consider each stat e in tur n for every interior
node of a t ree. Un like par simony, the score used for
compar ing trees does not depend on an y part icularcombina tion of (unobserved) an cestral st at es. The
overa ll likelihood is a su m over all sta te
Review
A small portion of the full (6161)INSTANTANEOUS RATEM ATRIX (see Glossary) fo r
the Muse and Gauta codon model isillustrated (Table I). The quanti ties
A,
C,
G
and T
are the equilibrium nucleotide
frequencies, only three of w hich represent
free parameters of the model, because the
fourth can be obtained by subtraction. The
remaining tw o parameters in the model, (synonym ous substitutions) and
(nonsynonymous substitutions), are
relative r ate parameters. The diagonal
elements are not shown, but can becalculated by subtraction. All changes
between the codons that involve more than
one nucleotide substitution have an
instantaneous rate of zero. The remaining
elements represent single nucleotide
substitutions that result in a change from
one codon to another. The rate of any one
of this type of change depends on the
nucleot ide frequencies (e.g. the rate is
lower for substitutions to a relatively rarebase) and the nature of the change
(synonymous versus nonsynonymous).
Reference
a Muse, S.V. and Gaut, B.S. (1994) A likelihood
approach for comparing synonymous a nd
nonsynonymous substitu tion rates, with
application to the chloroplast genome.Mol.Biol.
Evol. 11, 715724
Box Glossary
Instantaneous rate matrix:a square matrix ofsubstitut ion rate parameters that serves to define a
particular substitution m odel. The probability of
observing statej, given starting state i, changes wit h
time in a nonlinear fashion. The rate parameter
represents the slope of this curve evaluated at t= 0.
The elements on the diagonal are negative and equal
to the sum of the other rates in the same row. This
quantity is negative, because the probability of
observing state i, given that iwas the starting state,
decreases with t ime.
Box 1.The anatomy of a codon model
Table I. Part of Muse and Gauts 61 61 instantaneous rate matrixa
Codon before Codon after substitution (the to state)
substitution
(the from state) TTT TTC TTA TTG CTT CTC GGG
(Phe) (Phe) (Leu) (Leu) (Leu) (Leu) (Gly)
TTT (Phe) C
A
G
C
0 0
TTC (Phe) T
A
G
0 C
0
TTA (Leu) T
C
G
0 0 0
TTG (Leu) T
C
A
0 0 0
CTT (Leu) T
0 0 0 C
0
CTC (Leu) 0 T
0 0 T
0
GGG (Gly) 0 0 0 0 0 0
-
8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf
3/8
TRENDS in Ecology & EvolutionVol.16 No.1 January 2001
http://tree.trends.com
32 Review
combinations, in mu ch the same way tha t th e
probability of rolling a seven with two dice is a sum
over th e six possible ways in which the n um bers on
each die can add u p to seven.
Although it would be possible to allow a differen t
ra te for every possible tr an sition bet ween codons,
th e estimat ion of so ma ny par am eters (3660 in
total) would require a n u nreasonable amount of
dat a. To keep th e num ber of free (estima ted)para meters m ana geable, the t wo codon models th at
ha ve been pr oposed ma ke some concessions. F or
instance, neither model permits more tha n a single
change in an y given insta nt (e.g. the ra te of th e
chan ge AAAAGC is 0). However, t his is areasona ble restriction, becau se two changes ar e
allowed to occur in t wo consecut ive infinit esima l
periods of tim e.
The model by Muse an d Gaut 17 assigns rat es to all
other possible chan ges (i.e. the changes th at do not
require more th an two nucleotide point m uta tions) onthe basis of two param eters that represent the r ate of
synonymous an d nonsynonymous subst itutions,
Review
In Pagelsa,b likelihood ratio test, two m odels are
considered, of which one is a more general version of
the other. Normall y, the constrained model
represents the null hypothesis and is nested wi thin(i.e. is a special case of) the m ore general m odel.
A likelihood ratio test returns a significant result if
the data are better explained by the general model (for
which the maximum of the likelihood function isL1)
than by the constrained model that represents the null
hypothesis (for which the likelihood m aximum is L0). A
ratio L1/L
0much greater than 1 indicates signi ficance
and the likelihood ratio test statistic, defined asLR=2(ln L
0 ln L
1), is large and positi ve.LRis distributed as
a chi-squared random variable with degrees of freedom
equal to the difference in the number of free (estimated)
parameters between the two models if the null model is
perfectly nested with in the unconstrained model.
The null hypothesis model for a test of correlated
evolution is based on the follow ing instantaneous
rate matrix (Table I).
The same matr ix is used for each character, but andare estimated separately for each character.Thenumber of free parameters in the null model is thus
equal to 4. In practice, the TRANSITION MATRIX (see Glossary)
that corresponds to this rate matrix is computed for each
character (let the t ransition m atrix for character 1 be P
and let that for character 2 beR), which allow s
probabili ties for specific events to be computed. For
example, the joint probability that character 1 remains in
state 0 and character 2 changes from state 0 to 1 across a
specific segment of the tree, is obtained as follows:
Pr(00,01) =P(00)R(01) [1]The fact that the joint transition p robability is the
product of the separate transition probabili ties foreach character m akes explicit the assumption of
independence in this model.
The general model enables (but does not f orce)
the evolution of the tw o characters to be correlated
and can account for simul taneous changes in the
characters by using a single m atrix (Table II).
This model has eight free parameters. The
maximum -likelihood value under this modelapproaches that of the null m odel if the characters are
evolving independently o f one another. As in the
codon m odels, a rate of 0 is specified for events that
require two changes. The difference in the num ber of
free parameters is 84= 4, so the likelihood ratio teststatistic is nom inally distributed as a2 randomvariable with four degrees of freedom. LRmight not
follow a 2 distribu tion i n this case, because themodels are not strictly nested (owing to the four rates
wi th values of 0 in the general model, but see Ref. b).
Pagel provides (in his com puter program , DISCRETE)
the means for using parametric bootstrapping to
determine the significance level w ithout making thisdistributional assumption .
References
a Pagel, M. (1994) Detecting correlat ed evolution on phylogenies:
a genera l method for th e compar ative ana lysis of discrete
characters.Proc.R . S oc.London B Biol. Sci. 255, 3745
b Pagel, M. (1997) Inferring evolutionary processes from
phylogenies.Zoologica S cripta 26, 331348
Box Glossary
Transition matrix: a matrix that shows the conditional probability
of observing statejgiven starting state iafter some arbitrary am ount
of time (t) for each pair of states iandj.This matrix can be derived
from the instantaneous rate matrix, w hich describes fully the
process at any given instant. (In this context, transition should not
be confused with the same term that is used in conjunction with
transversions.)
Box 2. Likelihood ratio test for character correlation
Table I. Instantaneous rate matrix for the null
hypothesis assuming evolutionary independence
Changing from: Changing to:
0 1
0 1
Table II. Instantaneous rate matrix for the general
hypothesis allowing evolutionary correlation
After state Before state
0, 0 0, 1 1, 0 1, 1
0, 0 ab a b 00, 1 c cd 0 d1, 0 e 0 ef f1, 1 0 g h gh
-
8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf
4/8
TRENDS in Ecology & EvolutionVol.16 No.1 January 2001
http://tree.trends.com
33ReviewReview
Bayesian inference involves interplay
between the likelihood function and the
prio r and posterio r distributions. To
illustrate this concept, consider thefollowi ng simplistic problem. Suppose a
black marbl e has come from one of two
urns that each contain mi llions of marbles.
The challenge is to decide from w hich urn
the marble was removed. Suppose further
that it is known that 40% of the marbles in
urn A and 80% of the marbles in urn B are
black. Choosing a marble from urn B
would be clearly more likely to yield a
black marble. Nevertheless, it is instructive
to calculate the posterior probabili ty that
the marble came from urn B and to
compare this to the posterior probabilitythat it came from urn A.
Bayes Rule, which is used to obtain the
posterior probability from the likelihood and
prior probability, is based on the definition of
conditional probability. Accordingly, for two
events, A and B, the joint probability of A and
Bequals the conditional probability of one,
given the other (i.e. A given B or Bgiven
A) multiplied by the probability of the
condition (Eqn 1):
Pr(A,B)=Pr(A)Pr(BA)=Pr(B)Pr(AB) [1]
Focusing only on the equation on the right ,
divi ding bo th sides by Pr(A) yields Bayes
Rule (Eqn 2):
[2]
In the Bayesian framework, A
represents data and Ba hypothesis (or
parameter):
Pr(BA) is termed the posterior probabil ity
and is the probability of the hypothesis (or
parameter value), given the data.Pr(AB) is termed the likelihood and is
the probability of the data, given the
hypothesis (or parameter value).
Pr(B) is termed the prior probability and
is the unconditional probability of the
hypothesis (or parameter value). This must
be specified by the investigator w ithou t
reference to the data.
Pr(A) is the unconditional probability of
the data, which can be obtained, using the
law of total probability, by calculating the
sum of the product Pr(B)Pr(A/B) for all
possible values of B. This serves as a
norm alizing constant, wh ich ensures that
the sum of the posterior probabilit ies is 1.
In this example, there is only one
datum (the black marble). Therefore,
computing the likelihood (probability ofthe data, given the hypothesis) is
straightforward and is simply the
probability t hat a single marble is black,
given a particular urn hypothesis. The
likelihood is 0.4 for urn A and 0.8 for urn B.
If maximum likelihood were being used to
decide between the two urns, urn B would
be selected, because the li kelihood of
drawing a black marble is greater for urn B
than for urn A (0.8 versus 0.4). However, to
choose a prior distribution that reflects
complete prior ignorance of which urn w as
used when drawi ng the black marb le, wespecify that the prio r probabi lity of each
urn is 0.5. The norm alizing constant is
therefore (Eqn 3):
Pr(black marble chosen) =
(0.5)(0.4)+ (0.5)(0.8)=0.6 [3]
The posterior probability can now be
computed for each urn as follows (Eqns 4,5):
[4]
[5]
Thus, the probability that the black
marble came from urn B, given the datum,
is two thirds.The posterior distribution i s
often described as an updated version of
the prior distribution. In this case, the
posterior distribution (0.33, 0.67)
represents an updated version o f the prior
distribution (0.5, 0.5), the evidence used for
the update being the fact that the single
marble drawn was black. A m ajor benefit of
taking the Bayesian perspective lies in the
fact that it produces probabilities for
hypotheses of in terest, which is exactly
what investigators desire. Likelihoods are
useful, but not easily in terpreted, because
they represent the probability of the datagiven the hypothesis rather than the
probability of the hypothesis given the
data.
A criti cism of Bayesian approaches is
the subjectivity o f the prior. Note that in
the example above (with only a single
datum), it w ould be necessary to stipulate
a prior probability for urn A that was
greater than two thirds to rig the analysis
(that is, to ensure that the conclusion is in
favor of urn A). While the posterior
distribution always changes if the prior is
changed, the conclusions are usually notoverly sensitive to the prior and any effect
of the pri or decreases as the amount of
data increases. A counterargum ent to the
subjectivity criticism is that subjectivity
that is inherent in the pri or is explicit and
must be defensible.
In the example above, discrete
hypotheses were used. However,
continuous parameters are more often the
targets of Bayesian analyses. In such cases,
probabili ty density functions replace the
probabili ties of discrete hypotheses, but
Bayes rule is still applicable. Figure Iillustrates the prior and posterior density
curves for the parameter p (probability of
heads) in a simple coin-flipping experiment
in which six heads were observed for ten
flips.Two di fferent prior distributions were
used, namely a flat [Beta(1,1), Fig. Ia] and a
more inform ative [Beta(2,2), Fig. Ib] prior
distribution. As an example of how such
curves could be used, the posterior
probabili ty that pis between 0.45 and 0.55
could be obtained by integrating the
posterior density curve between 0.45 and
0.55.
Box 3. Prior and posterior probabilities
TRENDS in Ecology & Evolution
0.00 0.20 0.40 0.60 0.80 1.00
0.00
0.55
1.10
1.66
2.21
2.76
(a) (b)
0.00
0.59
1.19
1.78
2.38
2.97
Posterior
Prior
0.00 0.20 0.40 0.60 0.80 1.00
Posterior
Prior
Probabilitydensity
Probabilitydensity
p p
Fig. I. Prior and posterior density curves for p(probability of heads) in a coin-flipping experiment, illustrating a
flat [Beta(1,1)] (a) and an inform ative [Beta(2,2)] (b) prior distr ibuti on.
)Pr(
)Pr()Pr()Pr(
A
BABAB
=
31
6.0
)4.0)(5.0()chosenmarbleA/blackurn(Pr ==
32
6.0)8.0)(5.0()chosenmarbleA/blackurn(Pr ==
-
8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf
5/8
respectively (Box 1). By estima ting only five
param eters (the two rate para meters and thr ee
nucleotide frequen cy par am eters), the entire 61 61ma trix of tra nsition probabilities required to compu te
the likelihood can be specified. The Muse an d Ga ut
codon model is implemen ted in Mu se an d Kosakovskys
Hypoth esis Test ing Usin g Ph ylogenies (HYPHY)
compu ter pa ckage19. The commonly employed HKY85
model5 also requires five param eters, but th e two rat e
param eters in this case represent t he ra tes of
tran sitions an d tran sversions, rather t han
nonsynonymous and synonymous substitut ions.
The codon model proposed by Goldman an d Yang18
goes a step furt her in a llowing both
tra nsitiontran sversion bias an d synonymous-
nonsynonymous su bstitut ion bias. This m odel also
allows biochemical differen ces among amino acids to
play a role in determining the ra te at wh ich changes
am ong amino a cids occur. For exam ple, a chan ge from
one hydrophobic to an other hydrophobic residue can
occur at a higher r ate t han a chan ge from a hydrophobic
to a h ydrophilic residue. Goldman a nd Yangs model is
implemen ted in Yangs Ph ylogenetic Analysis By
Maximum Likelihood (PAML) compu ter package16.
Secondary structure models
Secondary structure models represent an a ttempt to
add biological rea lism to an alyses of genes with RNAproducts th at have a secondary stru cture. Tillier and
Collins20 and Muse21 independently introduced
TRENDS in Ecology & EvolutionVol.16 No.1 January 2001
http://tree.trends.com
34 ReviewReview
The principles behind M arkov Chain
Monte Carlo (M CMC) methods can be
illustrated w ith a simpl e analogy. A robot
is allow ed to w alk in a square field. The
robo t takes steps that vary in length and
random ly selects a direction fo r each
step. The robot never leaves the field,
because if a step is about to take itoutside the boundary, it is reflected back
into th e field. A representative path
followed by such a robot is illustrated in
Fig. Iac for 100 (Fig. Ia), 1000 (Fig. Ib)
and 10 000 steps (Fig. Ic). (The path l ines
have been removed from Fig. Ic to clarify
the distribu tion o f steps.) Note that, even
though the robot i s walking randoml y,
eventually every portion of the field is
visited if it is allowed to walk for long
enough.
Now suppose that two hills are present
(represented by bivariate normal densities
one uncorrelated and w ith a correlation
coefficient of 0.9) and the robot is
programm ed to use the following simpl e
rules (based on the local environm ent)
when taking steps:
If a proposed step will take the robotuphi ll, it autom atically takes the step.
If a proposed step will take the robot
downhil l, it divides the elevation at the
proposed location by i ts current
elevation and on ly takes the step if it
draws a random number (uniform on
the open in terval 0,1) that is smaller
than this quotient.
The proposal distribution is
symm etrical, wh ich means that the
probability of p roposing a step from
poin t A to B is the same as that of
proposing a step from point B to A.
By follow ing these rules, the pathfollowed by the robot w ill take it to points
in the field in proportion to their
elevation, with higher points being
visited more often than low er ones.
Letting the robot begin i ts walk near the
upper left-hand corner, its mov ements
are illustrated in Fig. Idf for 100 (Fig. Id),
1000 (Fig. Ie) and 10 000 steps (Fig. If).
(As in Fig. Ic, path lines are not show n in
Fig. If.)
The robots movem ents can be used to
estimate the volum e under any specified
portion of thi s landscape. In the exampleabove, if the fir st few steps (called the
burn-in stage) are disregarded then the
resulting distribution of points is a good
approximation of the landscape, which is a
mixtu re of two separate bivariate normal
densities.The longer the robot is allowed to
walk, the closer the approxim ation
becomes.
This analogy also illustrates one of the
pitfalls of approximating surfaces using
MCMC methods. Were the robot to take
only a few steps, it is clear that it would not
be likely to encounter the second hi ll.Therefore, care should be taken to ensure
that Markov chains are run for long enough
to provide a good approximation of the
surface over the entire parameter space.
One simple w ay to investigate this is to run
several chains, each starting from a
different (randomly selected) starting point.
If the resulting approximations are
substantially di fferent, this indicates that
none of the chains were run for long
enough.
Box 4. Markov Chain Monte Carlo methods
TRENDS in Ecology & Evolution
(d) (e) (f)
I (a) (b) (c)
-
8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf
6/8
models that can accoun t for th e evolutiona ry
dependence between nu cleotide sites tha t a re
opposite one anoth er in th e stem r egions of an RNA
gene product. Following subst itut ions in t hese sites,
selection favors compen sat ory substitu tions to
preser ve the sta bility of stem r egions an d hence alsothe secondar y structure t hat is needed for the
cat alyt ic act ivities of th ese RNA molecules. Muses21
model requires only a single additiona l free
par am eter t o accoun t for th e evolutiona ry correlat ion
between paired stem sites and reduces to a sta ndar d
nu cleotide substitu tion model (the HKY85 model) if
no such correla tion exists in the stem regions.
Applications of likelihood to discrete morphological
characters
The most common met hod for obtainin g estimat es of
an cestral chara cter sta tes for discrete morphological
chara cters involves par simony optimization,alth ough likelihood-based a ltern at ives have been
proposed2224. Cunn ingham 25 reviewed an cestral
chara cter sta te estim at ion. I will therefore focus on a
differen t a pplication of likelihood models in
inferences concerning morphological features,
na mely testin g for evolutiona ry correlat ion.
Pagel26 described a likelihood ra tio test (Box 2) th at
allows an investigator to determine whether two
discret e morphological tr aits a re corr elated t o a grea ter
extent t ha n can be explained simply by th e phylogeny.
Two traits a re evolut iona rily correlated if a cha nge in
sta te of one of the chara cters predisposes the other
character t o chan ge state soon after. For example, theevolution of feat her s on t he forelimbs of dinosaurs is
corr elated with th e evolution of wings. In th is case, th e
appea ra nce of feather s for reasons oth er th an flight 27
appa ren tly encour aged a chan ge from th e stat e wings
absen t to wings presen t, which might not h ave
otherwise occur red. P hylogenies impose correlations
on chara cters, even if th ose cha ra cters evolve
independently. Pa gels test addresses wheth er a n
evolutionar y correlation exists between t wo char acters
th at a cts to increase th e observed correlation above and
beyond t he level imposed by the ph ylogeny.
Suppose tha t t wo characters each change state just
once on a phylogeny and h appen to cha nge along thesam e segment of th e phylogeny. At first sight , this
coincidence might a ppear to be the resu lt of an
evolutionar y corr elation. However, evidence to supp ort
th is interpr eta tion would be weak, becau se the co-
evolutionar y event occur red only once. By contra st, th e
evidence of correlated evolution of the t wo cha ra cters
would be stronger if cha nges in t he t wo cha ra cters co-
occur on ma ny different par ts of the phylogeny. Pa gel
and others have emph asized th e importance of using
explicit phylogeny-based met hods for assessing t he
str ength of evidence for such correla tions, becau se th e
nu mber of degrees of freedom a vailable for the t est can
otherwise be grea tly exaggera ted26.In Pa gels test, the null hypothesis is tha t t he two
char acters of inter est ha ve evolved indepen dent ly.
The gener al (unconstra ined) model allows for some
correla tion in the evolution of th e two cha ra cters.
The nu ll model is a const ra ined version of th e genera l
model, because the indepen dence assum ption
const ra ins th e corr elation to equal zero. The
ma ximum of th e likelihood fun ction un der th eunconstr ained model will, thu s, approach tha t of the
constra ined model as the correlation a pproaches
zero. If th e tru e corr elation is not zero, th en th e
ma ximum of th e likelihood fun ction un der th e
genera l model will be significant ly larger t ha n th at
which can be att ained by th e null model, reflecting
the bett er fit of the general model. Pagel argues th at
examination of the estimated values of the
para meters of the general model enables a
deter mina tion of which chara cter is most likely to
have changed sta te first, precipitat ing a chan ge of
stat e in t he other character. However, such fine-
scaled interpret ations should be made with caution,because several paramet ers are estimated un der this
model using data from only two cha ra cters.
An inter esting exam ple of the app licat ion of
Pa gels meth od involves testin g wheth er switches to
mu tu alism in fungi, eith er in th e form of associat ion
with algae (i.e. lichen ization) or liverworts, a re
associat ed with chan ges in the ra te of molecular
evolution 28. Lutzoni an d Pa gel28 began by classifying
a grou p of closely rela ted Omphalina mushroom
species a s eith er fast (F) or slow (S) evolving a nd as
either m ut ua listic (M) or nonmu tu alistic (N).
Est imat es of th e ra te of evolution were ba sed on
nu clear ribosoma l DNA (nr DNA) from t he 25S an d5.8S genes and t he intern al tr anscribed spacer (ITS)
region. They then obtained likelihoods for both t he
nu ll model (assumin g independen t evolution of both
mut ualism an d th e rat e of molecular evolution) and
th e genera l model (in which th e rat e of evolution an d
mut ualism can be correlated). The three set s of data
fit th e correlation model better t han the
indepen dence model. However, the impr oved fit wa s
only significant for t he 25S da ta . Therefore, Lut zoni
and Pa gel28 concluded th at t her e is a corr elation
between the rat e of molecular evolution a t t he 25S
nr DNA locus and t he evolut ion of mu tu alism. These
aut hors also looked at individual rat e para meterstha t were estimat ed in the nonindependence model.
For the 25S data, the estima ted rat e for the
tra nsition M,SM,F was significant ly greater t hantha t for N,SN,F, indicating th at increases in th era te of molecular evolution a re m ore likely to occur in
lineages tha t ar e alread y mutu alistic. Also (again for
the 25S dat a), the ra te of the t ran sition N,SM,Swas significant ly greater tha n t hat for N,SN,F,indicating t ha t t he t enden cy for slowly evolving
lineages to evolve mut ualism is greater t han the
ten dency for nonmu tu alistic linea ges to evolve fast
ra tes of molecular evolution. This ability to tease
apar t t he correlations am ong char acters sets Pagelsmethod apa rt from other m ethods, which simply
assess wheth er a correlation exists.
TRENDS in Ecology & EvolutionVol.16 No.1 January 2001
http://tree.trends.com
35ReviewReview
-
8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf
7/8
TRENDS in Ecology & EvolutionVol.16 No.1 January 2001
http://tree.trends.com
36 Review
References
1 Felsenstein, J. (1981) Evolutionary trees from
DNA sequences: a ma ximum likelihood approach.
J. Mol. Evol. 17, 368376
2 Felsenstein, J . (1999) PHYLIP, Phylogeny
Inference Package (University of Washin gton,
Seat tle), Version 3.572
3 Ju kes, T.H. and Cantor, C.R. (1969) Evolution of
protein m olecules. In Mamm alian Protein
Metabolism (Munro, H.N., ed.), pp. 21132,Academ ic Press
4 Kimura, M. (1980) A simple method for
estimating evolutionary rat e of base
substitut ions t hrough comparative studies of
nucleotide sequences.J. Mol. Evol . 16, 111120
5 Hasegawa , M. et a l. (1985) Dating of th e hum an-
ape split ting by a m olecular clock of
mitochondria l DNA.J. Mol. Evol. 21, 160174
6 Kishino, H. and Hasegawa, M. (1989)
Evaluation of the ma ximum likelihood estimate
of the evolut ionary t ree topologies from DNA
sequence data , and the bra nching order in
hominoidea.J. Mol. Evol . 29, 1701797 Rodrguez, F. et a l. (1990) The gener al
stochast ic model of nucleotide subst itut ion.
J. Th eor. Biol . 142, 485501
8 Churchill, G.A. et al. (1992) Sample size for a
phylogenetic inference.Mol. Biol. Evol . 9, 753769
9 Yang, Z. (1994) Maximu m likelihood
phylogenetic estimat ion from DNA sequences
with variable rat es over sites: approximate
methods.J. Mol. Evol . 39, 306314
10 Yang, Z. (1993) Maximu m-likelihood estima tion
of phylogeny from DNA sequen ces when
substitut ion r ates differ over sites.Mol. Biol.
Evol. 10, 1396140111 Yang, Z. (1996) Among-site rat e variation an d
its impact on phylogenetic ana lyses. Trends
Ecol. Evol. 11, 367372
Acknowledgements
I would like to thank Kent
Holsinger, Louise A.
Lewis, Mark Pagel, Chris
Simo n and Zih eng Yang
for their careful reading
of earlier drafts of this
article. I also grateful ly
acknow ledge th e Alfr ed P.Sloan Foundation/
National Science
Foundation for funding
(grant 98-4-5 ME).
A Bayesian future for model-based phylogenetics?
Likelihood m ethods in phylogenetics t ake longer tha n
other met hods. For large data sets, it is not uncommon for
single heuristic searches to take days, if not month s, for a
compu ter t o complete. Even when complete, such
ana lyses only represent a point estimate of a ph ylogeny. Ameasu re of support for individual clades requires BOOT-
STRAPPING, which effectively tur ns a n a lready time
consum ing analysis into one that r equires a far grea ter
amount of time. This situat ion might eventua lly be
ameliora ted by taking a Bayesian approach. Larget and
Simon29 have summ arized the recent litera tu re on Baye-
sian approaches to making inferences about
phylogenies2934 (S. Li, PhD t hesis, Ohio State U niversity,
1996; B. Mau , PhD t hesis, University of Wisconsin, 1996)
and h ave presented novel algorithm s to estimat e
simultan eoulsy the t ree topology and provide a measure
of nodal support. These algorithm s have the potential t o
tr ansform model-based phylogenetic inference.Bayesian a pproaches to sta tistical problems involve
ma king inferences using th e posterior distr ibution (or
simply th e posterior) of hypotheses or pa ra met ers of
inter est (Box 3). This is feasible when problems are
simple enough for a n a na lytical form ula for the
poster ior to exist. However , for most rea l problems,
models involve several para meters a nd, th us, a
complicated joint poster ior for which t here is no simple
form ula. The poster ior for a n u nr ooted ph ylogenetic
tr ee, for examp le, involves at least th e topology (a
discrete par ameter) and 2N3 branch length
para meters, whereNis the n um ber of tip nodes.
Fortu na tely, a technique exists for approxima tingcomplicated posteriors, such as th ose th at a re
chara cteristic of problems th at involve phylogenies.
The technique is called Mark ov Chain Mont e Carlo
(MCMC) or th e Metr opolisHastin gs algorithm 35,36.
The idea un derlying MCMC is tha t a Ma rkov cha in
th at t akes t he form of a correla ted ra ndom walk
thr ough th e para meter spa ce can be conducted in such
a way th at a ny probability distribut ion (however
complicated) can be a pproximated by periodically
sam pling values (Box 4). The approximation can be
made a rbitrarily accurate by runn ing the Markov
chain for a sufficient nu mber of steps.
In ph ylogenetic applicat ions, each step in aMark ov cha in involves a ra ndom m odification of th e
tree topology, a bra nch length or a param eter in t he
substitu tion model (e.g. substitu tion rat e ra tio). If th e
posterior t ha t is compu ted for a proposed step is
larger than tha t of the current tree topology and
param eter values, the proposed step is taken.
Proposed steps that result in downh ill moves on th e
posterior sur face are not au tomat ic an d depend on the
ma gnitude of the decrease. The function th atdeterm ines th e probability of a downhill step in th is
case is based on the ra tio of th e new an d current
posteriors. Using th ese rules, the Ma rkov cha in
visits regions of tr ee spa ce (sensu lato, including th e
tree topology space an d dimen sions for other
par am eters, such as branch lengths) in proport ion to
their posterior.
For a pr actical examp le of how th is meth od could be
used to answer a question posed by a systemat ist,
consider t he qu estion, What is th e probability t ha t
group X is monophyletic? To answer th is quest ion, we
would run the Mar kov chain, sampling trees
periodically. Suppose tha t in a sa mple of 100 000 trees,group X appeared a s a m onophyletic group in 74 695
trees. The pr obability (given th e observed dat a) tha t
group X is monophyletic is appr oxima tely 0.74695,
because t he Mar kov chain visits t rees in proportion t o
their posterior pr obability. This value is a n atu ral
measu re of nodal support an d is easier to interpret
than existing measures th at are based on parsimony
(decay ind ex and bootst ra p values ) or likelihood
(bootstr ap valu es). Lar get an d Simon29 mak e the point
tha t only one r un of the Mar kov chain is needed to gain
such informa tion, compared t o the ma ny bootstr ap
run s needed in a m aximum likelihood analysis.
Another import an t app licat ion of MCMC in Bayesianphylogenetic inference involves estimat ing divergence
times in a relaxed molecular clock model, that is, a
model in which substitut ion r ates var y across the
phylogeny, but in which rat es in descendan t lineages are
correlated with the rat e in th eir common a ncestor37,38.
MCMC meth ods provide Bayesian credibility int ervals
for divergence dates with out ha ving to assu me a per fect
clock and also allow consider able flexibility (an d even
un certain ty) in how th e clock is calibrated.
The a bility of MCMC models to provide an swers t o
virtu ally any question t hat might be imagined by the
investigator is a t hrilling prospect. The a bility to
provide an swers mu ch more efficiently th an ispossible with curr ent likelihood met hods should
ma ke th e Bayesian ap proach to phylogenetics well-
wort h wat ching in th e next few year s.
Review
-
8/6/2019 Lewis, 2001. Phylogenetic System a Tics Turns Over a New Leaf
8/8
During the past decade, the explosion of molecular
techniques h as led to the a ccumu lation of a
considera ble amoun t of compara tive genetic
informa tion at t he population level. At the sam e time,
recent a dvances in population genetics theory,
especially coalescent theory, ha ve genera ted p owerful
tools for t he an alysis of intr aspecific data . These t wo
developmen ts h ave converted int rasp ecific
phylogenies int o useful tools for test ing a var iety of
evolutionary a nd populat ion genet ic hypotheses.Several phylogenetic m ethods, especially NETWORK
(see Glossary) approaches, ha ve been developed t o
ta ke advan ta ge of the u nique chara cteristics of
intr aspecific dat a. In th is art icle, we summ ar ize some
population genetics principles, explain why net works
ar e appropriat e represent at ions of intr aspecific
genetic variation, describe an d compa re a vailablemeth ods an d software for net work estimat ion, and
give exam ples of their application.
Gene genealogies
Given a sam ple ofGENES, the relationships among
th em can be tra ced back in time to a comm on
an cestral gene. The genealogical pat hways
interconnecting the current sam ple to th e common
an cestor const itut e a GENE TREE or gene gene alogy. A
gene tr ee is the p edigree of a set of genes a nd exists
independen tly of potent ial mut at ions. The only
portion of a gene tr ee tha t can genera lly be estimat ed
with genetic data is that portion m arked by the(potent ial) mu ta tional events th at d efine the different
ALLELES (Box 1). This lower r esolution tr ee is th e allele
Intraspecific gene evolution cannot always be represented by a bifurcating t ree.
Rather, population genealogies are often multifurcated,descendant genes
coexist with persistent ancestors and recombination events produce reticulate
relationships.Whereas traditional phylogenetic methods assume bifurcating
trees,several networking approaches have recently been developed toestimate intraspecific genealogies that take into account these population-
level phenomena.
Intraspecific gene genealogies:trees grafting into networksDavid Posadaand Keith A.Crandall
TRENDS in Ecology & EvolutionVol.16 No.1 January 2001
http://tree.trends.com 01695347/01/$ see fro nt m atter 2001 Elsevier Science Ltd. A ll righ ts reserved . PII: S0169-5347(00)02026-7
37ReviewReview
12 Swofford, D.L. et a l. (1996) Phylogenetic
inference. In Molecular Syst ematics (Hillis,
D.M. et al., eds), pp. 407514, Sinau er
Associates
13 Huelsenbeck, J.P. an d Cranda ll, K.A. (1997)
Phylogeny estimation and hypothesis testing
using maximum likelihood.Annu . Rev. Ecol.Syst . 28, 437466
14 Lewis, P.O. (1998) Maximum likelihood as an
alterna tive to parsimony for inferring
phylogeny using nucleotide sequen ce dat a. In
Molecular S ystematics of Plants II (Soltis, D.E .
et al., eds), pp. 132163, Kluwer
15 Swofford, D.L. (2000) PAUP*, Ph ylogenetic
Analysis Using Parsimony (Sinauer,
Sun derlan d, MA), Version 4.0b3,
16 Yang, Z. (1997) PAML: a program pa ckage for
phylogenetic analysis by maximu m likelihood.
CABIOS 13, 555556
17 Muse, S.V. an d Gaut , B.S. (1994) A likelihood
approach for comparing synonymous and
nonsynonymous substitut ion r ates, with
application to the chloroplast genome.Mol. Biol.
Evol. 11, 715724
18 Goldman, N. an d Yang, Z. (1994) A codon-based
model of nucleotide subst itut ion for protein -
coding DNA sequ ences.Mol. Biol. Evol . 11,
725736
19 Muse, S.V. and Kosakovsky Pond, S.L. (2000)
HYPHY , Hypothesis Testing Using Phylogenies ,
(North Carolina State University, Raleigh)
Version 1
20 Tillier, E.R.M. an d Collins, R.A. (1995)
Neighbor joining an d maximu m likelihood with
RNA sequences: addressing th e interdependence
of sites.Mol. Biol. Evol . 12, 715
21 Muse, S.V. (1995) Evolutionary an alyses of DNA
sequences subject t o constr aints on secondar y
structure. Genetics 139, 14291439
22 Schluter, D. et al. (1997) Likelihood of ancestor
stat es in adaptive radiat ion.Evolution 51,16991711
23 Mooers, A.. and Schluter, D. (1999)
Reconstructing ancestor states with m aximum
likelihood: support for one- and t wo-ra te models.
Syst. Biol. 48, 623633
24 Pagel, M. (1999) The maximum likelihood
approach to reconstr ucting ancestral chara cter
stat es of discrete characters on phylogenies. Syst.
Biol. 48, 612622
25 Cunningham, C.W. et al. (1998) Reconst ru cting
ancestral character sta tes: a critical reappraisal.
Trend s Ecol. Evol. 13, 361366
26 Pagel, M. (1994) Detecting corr elated evolution on
phylogenies: a general met hod for t he
compar ative an alysis of discrete chara cters. Proc.
R. Soc. London B Biol. Sci. 255, 3745
27 Qiang, J . et al. (1998) Two feather ed dinosau rs
from northeaster n China.Nature 393, 753761
28 Lutzoni, F. and P agel, M. (1997) Accelerated
evolut ion as a consequ ence of tran sitions to
mutualism. Proc. Natl. Acad. Sci. U. S. A . 94,
1142211427
29 Larget, B. and Simon, D.L. (1999) Markov chain
Monte Ca rlo algorithms for t he Bayesian a nalysis
of phylogenetic tr ees.Mol. Biol. Evol . 16, 750759
30 Mau, B. and Newton, M.A. (1997) Phylogenetic
inference for binar y data on dendrograms using
Markov chain Monte Car lo.J. Comput. Graph.
Stat . 6, 122131
31 Mau, B. et al. (1999) Bayesian phylogenetic
inference via Mar kov chain Monte Carlo methods.
Biometrics 55, 112
32 Newton, M. et al. (1999) Mark ov cha in Monte
Car lo for th e Bayesian an alysis of evolutionarytrees from a ligned m olecular sequences. In
S tatistics in Molecular Biology (Seillier-
Moseiwitch, F. et al., eds), Instit ute of
Mathema tical Statist ics
33 Ranna la, B. and Yang, Z. (1996) Probability
distr ibution of molecular evolutionar y trees: a
new met hod of phylogenetic inference.J. Mol.
Evol. 43, 304311
34 Yang, Z.H. and Rann ala, B. (1997) Bayesian
phylogenetic inference usin g DNA sequences: a
Markov Chain Monte Car lo method.Mol. Biol.
Evol. 14, 717724
35 Metropolis, N. et al . (1953) Equ at ion of sta te
calculations by fast computing ma chines.
J. Chem. Phys. 21, 10871092
36 Hast ings, W.K. (1970) Monte Carlo sampling
methods using Markov chains a nd th eir
applications.Biometrika 57, 97109
37 Huelsenbeck, J.P. etal. (2000) A compoun d
Poisson process for relaxing t he molecular clock.
Genetics 154, 18791892
38 Thorne, J .L. et al . (1998) Estimating th e rat e of
evolution of th e rat e of molecular evolution.
Mol. Biol. Evol . 15, 16471657
39 Felsenstein, J. (1985) Confidence int ervals on
phylogenies: an approach using t he bootstr ap.
Evolution 39, 783791