1)Origins of Classification -Organization of variation 2) Modern Systematics -Taxonomy and...

Post on 29-Jan-2016



1) Origins of Classification
- Organization of variation

2) Modern Systematics
- Taxonomy and phylogenetics

3) Cladistics
- Shared derived characters
- Outgroup
- Parsimony

4) Maximum Likelihood and Bayesian Inference

Lecture 2: Principles of Phylogenetics

Origins of Biological Classification

Aristotle, 384-322 BC

“An effort to show the relationships of living things as a scala naturae” [1]

[1] C. Singer, A Short History of Biology (1931)

Scala Naturae, from Charles Bonnet's Œuvres d'histoire naturelle et de philosophie, 1781

Linnaeus, 1707-1778

"God created, Linnaeus organized."

Systematics

Phylogenetic Systematics
- Relationships reflected in taxonomy

vertebral column

complete jaw

“bony vertebrates”

4 legs

amniotic egg

Maxilla separated from quadratojugal by jugal

Anatomy of a phylogenetic tree

Node

Outgroup

Terminal taxa

Terminal branch

Sister-taxa

Internal branch

older splits

younger splits

Common Ancestor

Bifurcating vs multifurcating trees

polytomy

trichotomy

A German entomologist, Willi Hennig, developed the field of “Phylogenetic Systematics”, which provides a framework for reconstructing phylogenies and using them to study evolutionary history.

Hennig (1950)

Cladistics
- Builds trees by identifying monophyletic groups
- All other widely used methods are derived

How do you identify synapomorphies?

Close Outgroups

Distant Outgroups

Amphioxus (Cephalochordate)


Principle of Parsimony

Heuristic = educated guess; rule of thumb; common sense; a general way to approach problem solving.

Character matrix (rows = characters, columns = taxa; happy = outgroup)

character        wiley  rr  bugs  daffy  tweety  happy
1) Gloves:         0     0    1     0      0       0
2) Long ears:      1     0    1     0      0       0
3) Beak:           0     1    0     1      1       0
4) Tail:           1     1    1     1      1       0
5) Appendages:     1     1    1     1      1       0
6) Feathers:       0     1    0     1      1       0
7) Thumb:          1     0    1     1      1       0

Make a tree:
1) use only derived character states
2) minimize evolutionary change
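A minimal sketch of the first rule ("use only derived character states"): for each character, list the ingroup taxa that carry the derived state and flag whether it can group taxa. The matrix below is one reading of the flattened slide table (rows assumed to be in character order 1 to 7, 1 = derived, 0 = ancestral, happy as the all-zero outgroup), and the function name `classify_characters` is hypothetical.

```python
# Character matrix as read from the slide (an assumed reconstruction):
# 1 = derived state, 0 = ancestral state; "happy" is the outgroup.
TAXA = ["wiley", "rr", "bugs", "daffy", "tweety", "happy"]
OUTGROUP = "happy"
MATRIX = {
    "gloves":     [0, 0, 1, 0, 0, 0],
    "long ears":  [1, 0, 1, 0, 0, 0],
    "beak":       [0, 1, 0, 1, 1, 0],
    "tail":       [1, 1, 1, 1, 1, 0],
    "appendages": [1, 1, 1, 1, 1, 0],
    "feathers":   [0, 1, 0, 1, 1, 0],
    "thumb":      [1, 0, 1, 1, 1, 0],
}


def classify_characters(matrix, taxa, outgroup):
    """Map each character to (ingroup taxa with the derived state, label)."""
    result = {}
    for name, states in matrix.items():
        derived = [t for t, s in zip(taxa, states) if s == 1 and t != outgroup]
        if len(derived) == 1:
            label = "autapomorphy (uninformative)"
        elif len(derived) >= 2:
            label = "shared derived (candidate synapomorphy)"
        else:
            label = "no derived state"
        result[name] = (derived, label)
    return result


CLASSIFIED = classify_characters(MATRIX, TAXA, OUTGROUP)
for name, (derived, label) in CLASSIFIED.items():
    print(f"{name:11s} {label:40s} {', '.join(derived)}")
```

Under this reading, gloves occur in a single taxon, so they say nothing about grouping, while beak and feathers mark the same three taxa.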

Characters 4 & 5 (tree figure; taxa: bugs, happy, wiley, daffy, tweety, rr)

+ tail

+ appendages

Characters 3 & 6 (tree figure; taxa: bugs, happy, wiley, daffy, tweety, rr)

+ beak

+ feathers

Characters 3, 4, 5, & 6 (tree figure; taxa: bugs, happy, wiley, daffy, tweety, rr)

+ beak

+ tail

+ appendages

+ feathers

Characters 1, 2, 3, 4, 5, & 6 (tree figure; taxa: bugs, happy, wiley, daffy, tweety, rr)

+ beak

+ gloves

+ long ears

+ tail

+ appendages

+ feathers

Autapomorphy

Phylogenetically uninformative

Characters 1, 2, 3, 4, 5, 6, & 7 (two tree figures; taxa: bugs, happy, wiley, daffy, tweety, rr)

Tree with one gain and one loss of the thumb: + beak, + gloves, + long ears, + tail, + appendages, + feathers, + thumb, - thumb

Tree with two gains of the thumb: + beak, + gloves, + long ears, + tail, + appendages, + feathers, + thumb, + thumb
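Both thumb reconstructions cost two changes, which is the quantity parsimony tries to minimize. As a hedged sketch, the count can be obtained with Fitch's small-parsimony algorithm for a single character on a fixed rooted binary tree. Fitch's method is swapped in here for illustration and is not named on the slides; the tree shape (with daffy and tweety resolved as sisters), the state assignments, and the function name `fitch_steps` are assumptions.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree:   nested 2-tuples with taxon names (str) at the tips.
    states: dict mapping taxon name -> observed state (0 or 1).
    """
    steps = 0

    def state_set(node):
        nonlocal steps
        if isinstance(node, str):       # tip: its observed state
            return {states[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:                      # children agree: no change needed here
            return common
        steps += 1                      # children disagree: count one change
        return left | right

    state_set(tree)
    return steps


# Assumed tree: outgroup happy, then (wiley, bugs) and (rr, (daffy, tweety)).
TREE = ("happy", (("wiley", "bugs"), ("rr", ("daffy", "tweety"))))
THUMB = {"wiley": 1, "rr": 0, "bugs": 1, "daffy": 1, "tweety": 1, "happy": 0}
BEAK = {"wiley": 0, "rr": 1, "bugs": 0, "daffy": 1, "tweety": 1, "happy": 0}

print("thumb steps:", fitch_steps(TREE, THUMB))  # 2
print("beak steps:", fitch_steps(TREE, BEAK))    # 1
```

Summing `fitch_steps` over all characters gives the length of a tree; comparing lengths across candidate trees is what the search strategies try to do efficiently.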

Finding the Most Parsimonious Tree

1) Exhaustive Search

2) Branch and Bound Search

3) Heuristic Search

1) Exhaustive Search

with stepwise addition of taxa

Exhaustive Searches Rarely Used

The number of bifurcating unrooted trees:

$$N = \frac{(2n-5)!}{2^{\,n-3}\,(n-3)!}$$

where n = the number of terminal taxa

For 6 taxa: 105 trees

For 20 taxa: about $2 \times 10^{20}$ trees
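A short sketch that evaluates the tree-count formula with exact integer arithmetic, to show how quickly exhaustive search becomes impractical. The function name `num_unrooted_trees` is hypothetical.

```python
from math import factorial


def num_unrooted_trees(n):
    """Number of bifurcating unrooted trees for n terminal taxa (n >= 3)."""
    if n < 3:
        raise ValueError("the formula needs at least 3 taxa")
    # (2n-5)! / (2^(n-3) * (n-3)!), which is always a whole number
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (4, 5, 6, 10, 20):
    print(n, num_unrooted_trees(n))
```

For 6 taxa this prints 105, and for 20 taxa a number on the order of 2 x 10^20, in line with the figures on the slide.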

3) Heuristic Search

No guarantee the best tree will be found

Impossible to “pass through” poorer trees to get to more parsimonious trees

Purines: Adenine, Guanine

Pyrimidines: Thymine, Cytosine

Transitions: substitutions within a class (A ↔ G, C ↔ T)

Transversions: substitutions between a purine and a pyrimidine
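The purine/pyrimidine distinction can be sketched as a small classifier: a substitution within a class is a transition, and one between classes is a transversion. The function name `substitution_type` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a, b):
    """Classify a change from base a to base b."""
    a, b = a.upper(), b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if a == b:
        return "none"
    # Same chemical class on both sides means a transition.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


print(substitution_type("A", "G"))  # transition
print(substitution_type("A", "C"))  # transversion
```

Of the 12 possible ordered changes between different bases, 4 are transitions and 8 are transversions.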

The Problem with Parsimony:

Molecular Phylogenetics

Multiple Substitutions at single sites can lead to “Long-branch attraction”

Weighted Parsimony

(Unweighted) Parsimony

C

CG

A

Maximum Likelihood

1) Start with one tree

2) Sum probabilities across all ancestral reconstructions

3) Combine probabilities across all sites (sum of log probabilities)

4) Repeat for all trees (in a heuristic search)

A C

G T

4 bases, 6 different types of substitutions

But…we don’t know:

Simplest Model: Jukes-Cantor (JC)

All 6 substitutions - equal probability (α)
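Under JC, the "sum across all ancestral reconstructions" step can be sketched for a fixed tree with known branch lengths. The closed-form JC probabilities used here are standard, with α as the rate of each substitution type. The tree encoding (tuples of `(child, branch_length)` pairs), the taxon names `a`, `b`, `c`, the example sites, and all function names are assumptions made for illustration.

```python
from math import exp, log

BASES = "ACGT"


def jc_prob(x, y, alpha, t):
    """JC probability that base x is replaced by base y after time t."""
    decay = exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def conditional(node, site, alpha):
    """Return {base: P(tip data below node | node carries that base)}.

    node is a tip name (str) or a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {b: 1.0 if b == site[node] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        below = conditional(child, site, alpha)
        for b in BASES:
            # Sum over every possible state at the lower end of the branch.
            result[b] *= sum(jc_prob(b, c, alpha, t) * below[c] for c in BASES)
    return result


def site_likelihood(tree, site, alpha):
    """Likelihood of one site, summed over all ancestral states."""
    root = conditional(tree, site, alpha)
    return sum(0.25 * root[b] for b in BASES)  # equal base frequencies


def log_likelihood(tree, sites, alpha):
    """Sites are treated as independent, so log-likelihoods add up."""
    return sum(log(site_likelihood(tree, s, alpha)) for s in sites)


# Hypothetical three-taxon tree: ((a:0.1, b:0.1):0.05, c:0.2)
TREE = (((("a", 0.1), ("b", 0.1)), 0.05), ("c", 0.2))
SITES = [
    {"a": "A", "b": "A", "c": "A"},
    {"a": "A", "b": "A", "c": "G"},
    {"a": "C", "b": "T", "c": "T"},
]
print(log_likelihood(TREE, SITES, alpha=1.0))
```

A tree search would repeat this calculation for each candidate topology and keep the one with the highest log-likelihood.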

Kimura 2-parameter model (K2P)

α = transitions, β = transversions
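As a sketch of how the two K2P rates enter the calculation, the block below uses the standard closed-form K2P probabilities, with α the transition rate and β the rate of each of the two possible transversions. The function name `k2p_prob` and the example values are hypothetical.

```python
from math import exp

PURINES = {"A", "G"}


def k2p_prob(x, y, alpha, beta, t):
    """K2P probability that base x is replaced by base y after time t."""
    e_tv = exp(-4.0 * beta * t)
    e_ts = exp(-2.0 * (alpha + beta) * t)
    if x == y:
        return 0.25 + 0.25 * e_tv + 0.5 * e_ts
    if (x in PURINES) == (y in PURINES):  # same class: transition
        return 0.25 + 0.25 * e_tv - 0.5 * e_ts
    return 0.25 - 0.25 * e_tv             # different class: transversion


# With transitions four times as fast as each transversion:
print(k2p_prob("A", "G", alpha=2.0, beta=0.5, t=0.1))  # transition
print(k2p_prob("A", "C", alpha=2.0, beta=0.5, t=0.1))  # transversion
```

Setting α equal to β collapses these expressions to the single-rate JC case, which is one way to see that JC is nested inside K2P.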

General Time Reversible (GTR)

C

CG

A

Wait…we’re using a tree to infer the model parameters that we will then use to find…the best tree?

Where do the parameter values come from?

T

C

T

ts

tv

tv

Maximum Likelihood Operationally

1. Select a model of sequence evolution; infer parameter values

2. With fixed parameter values, search tree space heuristically, with branch swapping

3. Select the topology that yields the greatest likelihood for the

Summary

Symmetrical Branch Lengths

Asymmetrical Branch Lengths

Positively misleading

Disadvantages of ML

Bayesian Phylogenetic Inference

Similar to ML except:

1. Model parameters:

2. Simultaneously search

Pr(p|k)

p p

Bayesian Phylogenetic Inference

3. Save trees

Tree topology

Model parameters

Bayesian Phylogenetic Inference

Searching for trees and parameters

Markov-Chain Monte Carlo Search

Start: random tree, model parameter values. Calculate likelihood (L).

Slightly change the tree and/or parameter values; re-calculate L.

Accept or reject new tree/parameter values based on L scores.

Better L scores (fewer changes) are always accepted; lower or equal scores are accepted with some probability (“hill-climbing” algorithm = Metropolis sampling).
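The accept/reject rule can be sketched on a deliberately tiny problem: sampling one parameter instead of a tree. The toy setup is an assumption for illustration only: two sequences of n sites that differ at k sites, a JC model, a single parameter v (rate times time), a flat prior on v > 0, and a symmetric Gaussian proposal. The function names, step size, and seed are hypothetical.

```python
import math
import random


def log_likelihood(v, k, n):
    """JC log-likelihood of k differing sites out of n for divergence v."""
    if v <= 0.0:
        return -math.inf
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * v))
    if p_diff <= 0.0:
        return -math.inf
    return k * math.log(p_diff) + (n - k) * math.log(1.0 - p_diff)


def metropolis(k, n, steps=20000, step_size=0.02, start=0.1, seed=1):
    """Metropolis sampler for v under a flat prior (posterior ~ likelihood)."""
    rng = random.Random(seed)
    current = start
    current_ll = log_likelihood(current, k, n)
    samples = []
    for _ in range(steps):
        # Slightly change the parameter value and re-calculate L.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_ll = log_likelihood(proposal, k, n)
        # Better L is always accepted; worse L with probability L_new / L_old.
        accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
        if rng.random() < accept_prob:
            current, current_ll = proposal, proposal_ll
        samples.append(current)
    return samples


samples = metropolis(k=10, n=100)
kept = samples[2000:]  # discard early samples as burn-in
print("posterior mean of v:", sum(kept) / len(kept))
```

In a phylogenetic analysis the proposal would also change the tree topology and branch lengths, and the saved samples would be trees, so clade support can be read off as the fraction of saved trees containing each clade.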

Advantages of Bayesian Inference

1) Simultaneous exploration of parameter space and trees

2) Support for clades: evaluated across a large set of likely trees

3) MCMC: Faster

Reed et al. (2002)

ML heuristic search: 93 days

MCMC search: 9 days

Nearly identical topologies