Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.

  • date post

    21-Dec-2015

Class 9: Phylogenetic Trees

The Tree of Life

D’après Ernst Haeckel, 1891

Evolution

Many theories of evolution

Basic idea:

speciation events lead to creation of different species

Speciation caused by physical separation into groups where different genetic variants become dominant

Any two species share a (possibly distant) common ancestor

Phylogenies

A phylogeny is a tree that describes the sequence of speciation events that led to a set of present-day species

Leaves - present-day species

Nodes - hypothetical most recent common ancestors

Edge lengths - “time” from one speciation to the next

Aardvark Bison Chimp Dog Elephant

Phylogenetic Tree

Topology: bifurcating

Leaves - 1…N

Internal nodes - N+1…2N-2

[Figure: tree with a leaf, a branch, and an internal node labeled]

Example: Primate evolution

[Figure: primate tree with divergence times of 40-45 mya, 35-37 mya, and 20-25 mya]

How to construct a Phylogeny?

Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria)

Since then, focus on objective criteria for constructing phylogenetic trees

Thousands of articles in the last decades

Important for many aspects of biology:

Classification (systematics)

Understanding biological mechanisms

Morphological vs. Molecular

Classical phylogenetic analysis: morphological features

number of legs, lengths of legs, etc.

Modern biological methods allow the use of molecular features

Gene sequences

Protein sequences

Analysis based on homologous sequences (e.g., globins) in different species

Dangers in Molecular Phylogenies

We have to remember that gene/protein sequences can be homologous for different reasons:

Orthologs -- sequences diverged after a speciation event

Paralogs -- sequences diverged after a duplication event

Xenologs -- sequences diverged after a horizontal transfer (e.g., by virus)

Dangers of Paralogs

[Figure: gene tree with leaves 1A, 2A, 3A, 3B, 2B, 1B, marking the gene duplication and the speciation events]

If we only consider 1A, 2B, and 3A...

Types of Trees

A natural model to consider is that of rooted trees

[Figure: rooted tree with the root labeled as the common ancestor]

Types of Trees

Depending on the model, data from current day species does not distinguish between different placements of the root

[Figure: two rooted trees shown side by side for comparison]

Types of trees

An unrooted tree represents the same phylogeny without the root node

Positioning Roots in Unrooted Trees

We can estimate the position of the root by introducing an outgroup:

a set of species that are definitely distant from all the species of interest

[Figure: tree of Aardvark, Bison, Chimp, Dog, Elephant with Falcon added as the outgroup and the proposed root marked]

Types of Data

Distance-based: input is a matrix of distances between species. A distance can be the fraction of residues on which two sequences disagree, or the negative of the alignment score between them, or …

Character-based: examine each character (e.g., residue) separately
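As an illustration of the first distance mentioned above (the fraction of residues on which two sequences disagree), here is a minimal sketch. It assumes the sequences are already aligned to equal length and treats every position alike; the function names and the toy sequences are made up for this example.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return mismatches / len(seq_a)


def distance_matrix(seqs):
    """Pairwise distances as a dict of dicts keyed by species name."""
    names = list(seqs)
    return {a: {b: p_distance(seqs[a], seqs[b]) for b in names} for a in names}


# Toy aligned sequences, invented for illustration (not real data).
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA"}
D = distance_matrix(toy)
```

The resulting matrix is symmetric with zeros on the diagonal, which is the input format the distance-based methods expect.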

Simple Distance-Based Method

Input: distance matrix between species

Outline:

Cluster species together

Initially clusters are singletons

At each iteration combine the two “closest” clusters to get a new one

UPGMA Clustering

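Let $C_i$ and $C_j$ be clusters, and define the distance between them to be

$$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$$

When combining two clusters, $C_i$ and $C_j$, to form a new cluster $C_k$, then for any other cluster $C_l$

$$d(C_k, C_l) = \frac{|C_i|\, d(C_i, C_l) + |C_j|\, d(C_j, C_l)}{|C_i| + |C_j|}$$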
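The clustering loop can be sketched in a few lines. This is one possible implementation, not a reference one: it repeatedly merges the two closest clusters and updates distances with the size-weighted average d(Ck, Cl) = (|Ci| d(Ci, Cl) + |Cj| d(Cj, Cl)) / (|Ci| + |Cj|). The input format (a dict keyed by `frozenset` pairs), the nested-tuple output, and the toy distances are assumptions made for this example.

```python
from itertools import combinations


def upgma(names, d):
    """Cluster species by UPGMA.

    names: list of species names.
    d: dict mapping frozenset({a, b}) to the distance between species a and b.
    Returns a nested tuple (left, right, merge_distance); leaves are plain names.
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    dist = {
        frozenset((i, j)): d[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the two closest clusters (ties go to the first pair found).
        i, j = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        d_ij = dist[frozenset((i, j))]
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Size-weighted average distance from the merged cluster to every other one.
        for l in clusters:
            dist[frozenset((next_id, l))] = (
                n_i * dist[frozenset((i, l))] + n_j * dist[frozenset((j, l))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j, d_ij), n_i + n_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# Toy distances, invented for illustration.
toy = {
    frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 6,
    frozenset("AD"): 10, frozenset("BD"): 10, frozenset("CD"): 10,
}
tree = upgma(["A", "B", "C", "D"], toy)
```

On this toy input, A and B are merged first, then C joins them, and D joins last. Each internal tuple records the cluster distance at which the merge happened.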

Molecular Clock

UPGMA implicitly assumes that all distances measure time in the same way

[Figure: two trees on leaves 1, 2, 3, 4]

Additivity

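A weaker requirement is additivity

In the “real” tree, distances between species are the sum of distances between intermediate nodes

[Figure: leaves i, j, k joined at one internal node by branches of length a, b, c]

$$d(i,j) = a + b$$

$$d(i,k) = a + c$$

$$d(j,k) = b + c$$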

Consequences of Additivity

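Suppose input distances are additive

For any three leaves i, j, k, with m the internal node where their paths meet:

$$d(i,j) = a + b$$

$$d(i,k) = a + c$$

$$d(j,k) = b + c$$

Thus

$$d(m,k) = \frac{1}{2}\bigl(d(i,k) + d(j,k) - d(i,j)\bigr)$$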
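A small numeric check of this consequence: given the three pairwise distances between leaves i, j, k, the three equations d(i,j) = a + b, d(i,k) = a + c, d(j,k) = b + c can be solved for the branch lengths to the meeting node m. The function name and the numbers are chosen only for illustration.

```python
def branch_lengths(d_ij, d_ik, d_jk):
    """Solve d(i,j)=a+b, d(i,k)=a+c, d(j,k)=b+c for the branch lengths a, b, c."""
    a = (d_ij + d_ik - d_jk) / 2  # d(m, i)
    b = (d_ij + d_jk - d_ik) / 2  # d(m, j)
    c = (d_ik + d_jk - d_ij) / 2  # d(m, k)
    return a, b, c


# Illustrative distances: d(i,j) = 5, d(i,k) = 6, d(j,k) = 7.
a, b, c = branch_lengths(5, 6, 7)
```

The third value is exactly the expression for d(m,k); the other two follow by symmetry, and summing any two branch lengths gives back the corresponding input distance.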

Neighbor Joining

Can we use this fact to construct trees?

Let

$$D(i,j) = d(i,j) - (r_i + r_j)$$

where L is the set of leaves and

$$r_i = \frac{1}{|L| - 2} \sum_{k \in L} d(i,k)$$

Theorem: if D(i,j) is minimal (among all pairs of leaves), then i and j are neighbors in the tree
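To see why the corrected score D is used rather than the raw distance d, consider a hypothetical four-leaf tree in which A and B are neighbors with branches of length 1 and 4, C and D are neighbors with branches of length 1 and 4, and the internal edge has length 1. The sketch below computes D(i,j) = d(i,j) - (r_i + r_j), with r_i = (sum over k of d(i,k)) / (|L| - 2), from the additive distances of that tree. The tree, names, and input format are assumptions made for this example.

```python
from itertools import combinations


def corrected_distances(leaves, d):
    """Return D(i,j) = d(i,j) - (r_i + r_j) for every pair of leaves.

    d maps frozenset({x, y}) to the distance between x and y.
    """
    r = {
        i: sum(d[frozenset((i, k))] for k in leaves if k != i) / (len(leaves) - 2)
        for i in leaves
    }
    return {
        (i, j): d[frozenset((i, j))] - (r[i] + r[j])
        for i, j in combinations(leaves, 2)
    }


leaves = ["A", "B", "C", "D"]
# Additive distances read off the hypothetical tree described above.
d = {
    frozenset("AB"): 5, frozenset("AC"): 3, frozenset("AD"): 6,
    frozenset("BC"): 6, frozenset("BD"): 9, frozenset("CD"): 5,
}
D = corrected_distances(leaves, d)
closest_raw = min(combinations(leaves, 2), key=lambda p: d[frozenset(p)])
closest_corrected = min(D, key=D.get)
```

Here the smallest raw distance is between A and C, which are not neighbors, while the smallest corrected score falls on a true neighbor pair.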

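Neighbor Joining

Set L to contain all leaves

Iteration:

Choose i, j such that D(i,j) is minimal

Create new node k, and set

$$d(k,m) = \frac{1}{2}\bigl(d(i,m) + d(j,m) - d(i,j)\bigr) \quad \text{for every other node } m \in L$$

$$d(i,k) = \frac{1}{2}\bigl(d(i,j) + r_i - r_j\bigr)$$

$$d(j,k) = d(i,j) - d(i,k)$$

Remove i, j from L, and add k

Terminate: when |L| = 2, connect the two remaining nodes

[Figure: leaves i and j joined at the new node k, with m another node of the tree]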
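The iteration can be sketched as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: distances are passed as a dict keyed by `frozenset` pairs, new internal nodes get the made-up names `n0`, `n1`, …, ties are broken by taking the first minimal pair, and the toy distances come from a hypothetical tree with leaf branches A=1, B=4, C=1, D=4 and an internal edge of length 1.

```python
from itertools import combinations


def neighbor_joining(leaves, d):
    """Return the tree as a list of (node, node, branch_length) edges.

    d maps frozenset({x, y}) to the distance between x and y.
    """
    d = dict(d)  # work on a copy; distances to new nodes are added here
    L = list(leaves)
    edges = []
    new_id = 0
    while len(L) > 2:
        r = {
            i: sum(d[frozenset((i, k))] for k in L if k != i) / (len(L) - 2)
            for i in L
        }
        # Choose the pair minimizing D(i,j) = d(i,j) - (r_i + r_j).
        i, j = min(
            combinations(L, 2),
            key=lambda p: d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        k = f"n{new_id}"  # hypothetical name for the new internal node
        new_id += 1
        d_ij = d[frozenset((i, j))]
        d_ik = (d_ij + r[i] - r[j]) / 2
        d_jk = d_ij - d_ik
        edges += [(i, k, d_ik), (j, k, d_jk)]
        # Distances from the new node k to every remaining node m.
        for m in L:
            if m not in (i, j):
                d[frozenset((k, m))] = (
                    d[frozenset((i, m))] + d[frozenset((j, m))] - d_ij
                ) / 2
        L = [m for m in L if m not in (i, j)] + [k]
    x, y = L
    edges.append((x, y, d[frozenset((x, y))]))
    return edges


# Additive toy distances from the hypothetical tree described above.
toy = {
    frozenset("AB"): 5, frozenset("AC"): 3, frozenset("AD"): 6,
    frozenset("BC"): 6, frozenset("BD"): 9, frozenset("CD"): 5,
}
edges = neighbor_joining(["A", "B", "C", "D"], toy)
```

Because the toy distances are additive, the edge list returned reproduces the branch lengths of the tree they were generated from.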

Distance Based Methods

If we make strong assumptions about the distances, we can reconstruct trees

In real life, distances are not additive

Sometimes they are close to additive