Systematics and molecular evolution: some history of numerical methods

Joe Felsenstein

Department of Genome Sciences and Department of Biology

University of Washington, Seattle

Systematics and molecular evolution: some history of numerical methods – p.1/32

Irrelevant?

In the 1960s molecular biologists commonly considered all population biology as an irrelevancy at best: mere “stamp collecting”.

In science there is only physics, all else is stamp collecting.

Ernest Rutherford

By contrast, population biologists have been using molecular methods since the late 1960s, but to a degree limited by their limited funding.

The problem

Population biologists have been working on numerical methods for inferring phylogenies (evolutionary trees) since the 1960s. Their importance has only recently been recognized by molecular biologists. The inference and use of phylogenies is a central problem in computational molecular biology, though in bioinformatics textbooks they usually are covered in only a small part of the book.

Why evolution?

Nothing in biology makes sense except in the light of evolution.

Theodosius Dobzhansky, 1973

Perhaps that is an overstatement (what about physics and chemistry?), but we can say that evolution is how the data came to be, and taking evolution into account is the only efficient way of analyzing it.

Ernst Mayr and George Gaylord Simpson

Ernst Mayr (1905-2005) George Gaylord Simpson (1902-1984)

Major figures in the completion of the “modern synthesis” or “Neodarwinian synthesis” in the 1940s, and leaders of the “evolutionary systematics” approach to taxonomic classification, dominant until the 1970s.

A horse tree drawn by Simpson

(tree is getting obese and nonspecific.)

Willi Hennig (1913-1976)

The major advocate and developer of “phylogenetic systematics”, which advocates that all groups be monophyletic. Outlined a simple method for inferring phylogenies that can be used when characters do not conflict.

Positions on classification as of about 1960

Evolutionary systematics. George Gaylord Simpson and Ernst Mayr led a movement that allowed non-monophyletic (paraphyletic) groups such as reptiles, on the assumption that groups could be separated by real differences of rates of evolution (sometimes “grades” rather than “clades”).

Phylogenetic systematics. Willi Hennig advocated purely monophyletic classification.

Phenetics. Sokal and Sneath advocated making a classification without reference to evolution, using numerical clustering methods.

Molecular evolution gets off the ground

Zuckerkandl and Pauling in 1962 discussed using trees to infer ancestral sequences, and named this “chemical paleogenetics”. They were about 30 years ahead of their time.

Emile Zuckerkandl recently

Linus in 1962, from Nobel Peace Prize web page

(But then it isn’t fair to anyone to compare them unfavorably to Linus Pauling).

Peter Sneath in 1962 and Robert Sokal in 1964

Developers in the late 1950s and early 1960s of numerical clustering methods and chief originators and advocates of the “phenetic” approach to classification, which clusters organisms by similarity without reference to evolutionary history.

The first numerical phylogeny, by Sokal and Michener 1957

Cavalli-Sforza and Edwards, 1963; Edwards, 1970

Introduced (in 1963-1964) the parsimony and likelihood methods for inferring phylogenies, and were co-inventors (with Fitch and Margoliash) of the distance matrix methods. These are the three major methods for reconstructing phylogenies.

The first phylogeny by parsimony

Camin in the 1970s, one of the Caminalcules

Camin noticed (in 1965) that students who did the best job recovering the true “phylogeny” of the Caminalcules made the reconstruction which required the fewest changes of state.

J. S. Farris and Arnold Kluge in the 1980s

Further developments of parsimony methods, starting in 1969, advocacy of them and of Hennig’s approach to classification. Central in the rise (in 1980) of the Willi Hennig Society.

Margaret Dayhoff

The late Margaret Dayhoff was a pioneer of molecular databases (starting in 1965), made the first use of parsimony methods on molecular data, and presented trees of gene families in the Atlas of Protein Sequence and Structure (which became the Protein Information Resource) in 1966. In the recognition of gene families she was 30 years ahead of her time.

Margaret Dayhoff in about 1966. Courtesy of Ed Dayhoff.

Walter Fitch: distance methods and DNA parsimony

Walter Fitch (1929-2011) was a central figure in methods for analyzing molecular evolution data, including the first major distance matrix method (1967), and developing the algorithm (in 1971) that counts changes of state in the parsimony method for DNA sequences.
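The counting step mentioned above can be sketched in a few lines. This is only an illustration of the usual textbook form of Fitch's procedure on a rooted binary tree, not a reproduction of the 1971 paper; the nested-tuple tree representation, the function names and the toy alignment are all assumptions made for the example.

```python
def fitch_count(tree, states):
    """Return (state_set, changes) for one site on the subtree `tree`.

    tree   : a leaf name (str) or a (left, right) tuple (assumed representation)
    states : dict mapping each leaf name to its base at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch_count(tree[0], states)
    right_set, right_changes = fitch_count(tree[1], states)
    changes = left_changes + right_changes
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, changes
    # The children cannot agree: take the union and count one change.
    return left_set | right_set, changes + 1


def parsimony_length(tree, alignment):
    """Sum the minimum number of changes over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_count(tree, site)[1]
    return total


# Hypothetical four-species example with four aligned sites.
tree = (("human", "chimp"), ("mouse", "rat"))
alignment = {"human": "ACGT", "chimp": "ACGA", "mouse": "TCGA", "rat": "TCTA"}
print(parsimony_length(tree, alignment))
```

Scoring every candidate tree this way and keeping the one with the smallest total is the parsimony criterion; the search over trees is a separate and much harder problem that this sketch does not address.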

Fitch and Margoliash’s 1967 distance tree

Thomas Jukes and Charles Cantor, middle, in the 1990s

In 1969 Jukes and Cantor introduced the first stochastic process model of DNA change, as one paragraph buried in the midst of a giant review of protein sequence evolution.
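For readers who have not met the model, here is a minimal sketch of it in its standard textbook form: all four bases are equally frequent and every base changes to each of the other three at the same rate. The function names and the `rate` parameter (expected substitutions per site per unit time) are assumptions made for the example, not notation from the 1969 review.

```python
import math


def jc69_prob(t, same, rate=1.0):
    """Probability, after time t, of seeing the same base (same=True)
    or one particular different base (same=False) at a site."""
    decay = math.exp(-4.0 * rate * t / 3.0)
    if same:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def jc69_distance(p):
    """Expected substitutions per site implied by an observed fraction p
    of differing sites between two sequences."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) under this model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two sequences differing at 10% of their sites are a bit more than
# 0.10 substitutions per site apart, because some changes are hidden.
print(jc69_distance(0.10))
```

The correction grows with p because repeated changes at the same site become more likely as sequences diverge, which is why raw percent difference understates the amount of evolution.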

Cantor later made important technical discoveries in genomics. Jukes was a nutritional biochemist who was the primary person responsible for ensuring that pregnant women get folic acid in their diet.

Jerzy Neyman

A major figure in mathematical statistics, Neyman was enticed in 1971 into doing the first likelihood analysis of molecular sequence data.

Further development of statistical methods

“Pruning” algorithm for efficient calculation of likelihoods on trees (me, 1973, but not yet applied to molecules)

Proof of inconsistency of parsimony (me, 1978; James Cavender too, 1978)

Programs distributed (WAGNER78, J. S. Farris, 1978; PHYLIP, me, 1980; PAUP, Swofford, 1983)

Pruning algorithm used to make ML work for DNA (me, 1981)

Bootstrap and other sampling methods assess confidence of clades (me, 1985; Penny and Hendy, 1985)

Neighbor-Joining method (Saitou and Nei, 1987)

Bayesian MCMC inference (Yang/Rannala/Mau/Newton/Larget/Li/Pearl/Doss, 1997-2000)
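To give a feel for the pruning idea listed first above, here is a minimal sketch for a single DNA site. It works from the tips toward the root, storing at each node the probability of everything below it given each possible base at that node. The Jukes-Cantor substitution model, the nested-tuple tree format with branch lengths, and all function names are assumptions chosen to keep the example short; they are not taken from the 1973 or 1981 papers.

```python
import math

BASES = "ACGT"


def jc69_matrix(t):
    """4x4 transition probabilities for a branch of length t
    (expected substitutions per site) under Jukes-Cantor."""
    decay = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * decay, 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def conditional_likelihoods(node, site):
    """Likelihood of the data below `node` for each base at `node`.

    node : a leaf name (str) or a tuple (left, left_len, right, right_len)
    site : dict mapping each leaf name to its observed base
    """
    if isinstance(node, str):
        return [1.0 if base == site[node] else 0.0 for base in BASES]
    left, left_len, right, right_len = node
    result = [1.0] * 4
    for child, length in ((left, left_len), (right, right_len)):
        child_like = conditional_likelihoods(child, site)
        p = jc69_matrix(length)
        for i in range(4):
            # Sum over the unknown base at the far end of this branch.
            result[i] *= sum(p[i][j] * child_like[j] for j in range(4))
    return result


def site_likelihood(tree, site):
    """Average the root's conditional likelihoods over equal base frequencies."""
    return sum(0.25 * x for x in conditional_likelihoods(tree, site))


# Hypothetical three-species tree with branch lengths.
tree = (("a", 0.1, "b", 0.1), 0.05, "c", 0.2)
print(site_likelihood(tree, {"a": "A", "b": "A", "c": "G"}))
```

Each node is visited once, so the work grows linearly with the number of species rather than with the number of possible assignments of bases to interior nodes. Multiplying the per-site values over sites (or summing their logarithms) gives the likelihood of the whole alignment for one tree.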

Sources of work on molecular phylogenies

[Diagram. Field labels: Population genetics; Systematics; Numerical phylogenies; Molecular evolution; Biochemistry (and molecular biology). People shown: Fitch, Dayhoff, Wilson, Cavalli-Sforza, Sokal, Edwards, me, Goodman, Farris, (Sankoff), (Sneath).]

A noticeable tendency is for much of the influence to come from outside systematics, and for biochemists (as opposed to “molecular biologists”) to be an important influence on work in molecular evolution.

Conflict over phylogeny methods, 1970s to 1980s

Hennig’s 1950 book Grundzüge einer Theorie der phylogenetischen Systematik is translated as Phylogenetic Systematics in 1966.

Advocacy of his views is taken up by a number of systematists (Brundin, Nelson, Platnick) in the 1970s.

Some numerical systematists, chiefly J. S. Farris, ally with them.

They adopt an evangelical style and attack entrenched evolutionary systematists, calling themselves “phylogenetic systematists”.

Tensions between phylogenetic systematists and others within the Numerical Taxonomy Conference lead to major conflicts at its 1979 meeting at Harvard University. Not fun.

They found a separate organization, the Willi Hennig Society, in 1980 (today their house journal is Cladistics).

More conflict over phylogeny methods, 1980s to today

Many other numerically-oriented people, of various views, found themselves outside this group.

Molecular evolutionists mostly avoided these conflicts and have remained pragmatic and eclectic.

After about 1990, the Society for Systematic Biology and the Society for Molecular Biology and Evolution became the gathering points for all these other folks, many of whom defined themselves as phylogenetic systematists.

In the 1990s use of likelihood and Bayesian methods (and distance methods interpreted statistically) gradually became recognized as “statistical phylogenetics”, an alternative to the Hennig Society’s parsimony-only approach to reconstructing phylogenies.

The Hennig Society spurns compromise and has centers of strength among morphological systematists.

Positions on classification nowadays

Phylogenetic systematics. Willi Hennig advocated purely monophyletic classification. Now the (strongly) dominant approach.

Evolutionary systematics. Has almost faded away. Its adherents were reluctant to make it algorithmic.

Phenetics. Although Sokal and Sneath strongly influenced the field of numerical clustering, their approach to biological classification has few adherents.

IDMVM. One person (me) takes the view that It Doesn’t Matter Very Much, as we use the phylogeny and, given that, we never use the classification system. This is widely regarded as a marginal crackpot view [“A bizarre thumb in the eye to systematists” – Michael Sanderson].

Classification versus phylogenies

It is critically important to realize that the task of making a classification system and the task of making inferences about phylogeny are logically separable. You can infer the phylogeny without yet deciding how it will be used (or not used) in determining the classification.

Many biologists do not understand this.

Most textbooks muddle it thoroughly.

Historians of science and philosophers of science do the same.

Rise of interest in things phylogenetic

[Figure: Web of Science citations that have "phylogen*", plotted by year, 1995 to 2010.]

Rise of phylogenetic considerations in genomics

[Figure: fraction of Web of Science citations with "genom*" that have "phylogen*", plotted by year, 1995 to 2010; vertical axis from 0.00 to 0.20.]

Rise of statistical-model-based methods

[Figure: fraction of those Web of Science citations that have "phylog*" or "cladist*" that also have "likelihood" or "bayesian", plotted by year, 1995 to 2010; vertical axis from 0.00 to 0.15.]

References

Eck, R. V. and M. O. Dayhoff. 1966. Atlas of Protein Sequence and Structure 1966. National Biomedical Research Foundation, Silver Spring, Maryland.

Edwards, A. W. F. and L. L. Cavalli-Sforza. 1964. Reconstruction of evolutionary trees. pp. 67-76 in Phenetic and Phylogenetic Classification, ed. V. H. Heywood and J. McNeill. Systematics Association Publ. No. 6, London.

Felsenstein, J. 2001. The troubled growth of statistical phylogenetics. Systematic Biology 50: 465–467.

Felsenstein, J. 2004. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts. (especially Chapter 10).

Fitch, W. M. and E. Margoliash. 1967. Construction of phylogenetic trees. Science 155: 279–284.

Hennig, W. 1950. Grundzüge einer Theorie der phylogenetischen Systematik. Deutscher Zentralverlag, Berlin.

Hennig, W. 1966. Phylogenetic Systematics. Translated by D. D. Davis and R. Zangerl. University of Illinois Press, Urbana.

Jukes, T. H. and C. Cantor. 1969. Evolution of protein molecules. pp. 21-132 in Mammalian Protein Metabolism, ed. M. N. Munro. Academic Press, New York.

Neyman, J. 1971. Molecular studies of evolution: a source of novel statistical problems. In Statistical Decision Theory and Related Topics, ed. S. S. Gupta and J. Yackel, pp. 1-27. New York: Academic Press.

Sokal, R. R. and P. H. A. Sneath. 1963. Principles of Numerical Taxonomy. W. H. Freeman and Co., San Francisco.

Zuckerkandl, E. and L. Pauling. 1962. Molecular disease, evolution, and genetic heterogeneity. pp. 189-225 in Horizons in Biochemistry, ed. M. Kasha and B. Pullman. Academic Press, New York.

Zuckerkandl, E. and L. Pauling. 1965. Evolutionary divergence and convergence in proteins. pp. 97-166 in V. Bryson and H. J. Vogel (eds.), Evolving Genes and Proteins, Academic Press, New York.

Zuckerkandl, E., and L. Pauling. 1965. Molecules as documents of evolutionary history. Journal of Theoretical Biology 8(2): 357-366.
