Class 9: Phylogenetic Trees
[Figure: The Tree of Life, after Ernst Haeckel, 1891]
Date posted: 21-Dec-2015
Evolution
Many theories of evolution
Basic idea:
speciation events lead to creation of different species
Speciation caused by physical separation into groups where different genetic variants become dominant
Any two species share a (possibly distant) common ancestor
Phylogenies
A phylogeny is a tree that describes the sequence of speciation events that led to the formation of a set of current-day species
Leaves - current-day species
Nodes - hypothetical most recent common ancestors
Edge lengths - “time” from one speciation to the next
[Figure: example tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]
Phylogenetic Tree
Topology: bifurcating
Leaves - 1…N
Internal nodes - N+1…2N-2
[Figure: tree with a leaf, a branch, and an internal node labeled]
How to construct a Phylogeny?
Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria)
Since then, focus on objective criteria for constructing phylogenetic trees
Thousands of articles in the last decades
Important for many aspects of biology
Classification (systematics)
Understanding biological mechanisms
Morphological vs. Molecular
Classical phylogenetic analysis: morphological features
number of legs, lengths of legs, etc.
Modern biological methods allow the use of molecular features
Gene sequences
Protein sequences
Analysis based on homologous sequences (e.g., globins) in different species
Dangers in Molecular Phylogenies
We have to remember that gene/protein sequences can be homologous for different reasons:
Orthologs -- sequences diverged after a speciation event
Paralogs -- sequences diverged after a duplication event
Xenologs -- sequences diverged after a horizontal transfer (e.g., by virus)
Dangers of Paralogs
[Figure: gene tree with speciation events and a gene duplication; leaves 1A, 2A, 3A, 3B, 2B, 1B]
If we only consider 1A, 2B, and 3A...
Types of Trees
Depending on the model, data from current day species does not distinguish between different placements of the root
[Figure: two alternative placements of the root, shown side by side]
Positioning Roots in Unrooted Trees
We can estimate the position of the root by introducing an outgroup:
a set of species that are definitely distant from all the species of interest
[Figure: tree over Aardvark, Bison, Chimp, Dog, Elephant with Falcon added as the outgroup and the proposed root marked]
Types of Data
Distance-based
Input is a matrix of distances between species
Can be the fraction of residues they disagree on, or the alignment score between them, or …
Character-based
Examine each character (e.g., residue) separately
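As an illustration of the distance-based input, here is a minimal sketch that builds a distance matrix from the fraction of residues on which two aligned sequences disagree. The function names and the toy sequences are hypothetical, and the sketch assumes the sequences are already aligned to equal length, with no special treatment of gaps.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def distance_matrix(seqs):
    """Map every ordered pair of species names to its p-distance."""
    names = sorted(seqs)
    return {(x, y): p_distance(seqs[x], seqs[y]) for x in names for y in names}


# Toy aligned sequences (made up for illustration only).
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA"}
dist = distance_matrix(toy)
print(dist[("A", "B")], dist[("A", "C")], dist[("B", "C")])
```

The resulting dictionary can be handed to any distance-based method; a character-based method would work on the columns of the alignment directly.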
Simple Distance-Based Method
Input: distance matrix between species
Outline:
Cluster species together
Initially clusters are singletons
At each iteration combine two “closest” clusters to get a new one
UPGMA Clustering
Let Ci and Cj be clusters, define distance between them to be
$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$
When combining two clusters, Ci and Cj, to form a new cluster Ck, then
$d(C_k, C_l) = \frac{|C_i|\, d(C_i, C_l) + |C_j|\, d(C_j, C_l)}{|C_i| + |C_j|}$
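The clustering loop and the size-weighted update can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name, the use of `frozenset` keys for unordered pairs, and the toy distances are all assumptions, and the sketch returns only the merge order (topology), not branch lengths.

```python
from itertools import combinations


def upgma(names, d):
    """Cluster with UPGMA.

    `d` maps frozenset({a, b}) to the distance between leaves a and b.
    Returns a nested tuple giving the merge order.
    """
    # Each cluster is identified by its subtree (a leaf name or a nested tuple).
    size = {n: 1 for n in names}
    dist = {frozenset(p): d[frozenset(p)] for p in combinations(names, 2)}
    while len(size) > 1:
        # Pick the two closest current clusters.
        ci, cj = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        ck = (ci, cj)
        ni, nj = size.pop(ci), size.pop(cj)
        # Size-weighted average of the distances from Ci and Cj to every other Cl.
        for cl in size:
            dist[frozenset((ck, cl))] = (
                ni * dist[frozenset((ci, cl))] + nj * dist[frozenset((cj, cl))]
            ) / (ni + nj)
        size[ck] = ni + nj
    return next(iter(size))


# Toy distances (made up): A,B are close, C,D are close, the two pairs are far apart.
names = ["A", "B", "C", "D"]
pairs = {("A", "B"): 2, ("C", "D"): 4,
         ("A", "C"): 8, ("A", "D"): 8, ("B", "C"): 8, ("B", "D"): 8}
d = {frozenset(k): v for k, v in pairs.items()}
tree = upgma(names, d)
print(tree)
```

Because the update is a weighted average, the distance between two merged clusters always equals the mean pairwise distance between their leaves, which matches the first definition.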
Molecular Clock
UPGMA implicitly assumes a molecular clock: all distances measure time in the same way, so all leaves are equally distant from the root
[Figure: two trees over leaves 1, 2, 3, 4]
Additivity
A weaker requirement is additivity
In a “real” tree, distances between species are the sum of distances between intermediate nodes
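[Figure: leaves i, j, k joined at one internal node by edges of length a, b, c respectively]
$d(i,j) = a + b$
$d(i,k) = a + c$
$d(j,k) = b + c$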
Consequences of Additivity
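Suppose input distances are additive
For any three leaves i, j, k, meeting at an internal node m through edges of length a, b, c respectively:
$d(i,j) = a + b$
$d(i,k) = a + c$
$d(j,k) = b + c$
Thus
$d(k,m) = \frac{1}{2}\left(d(i,k) + d(j,k) - d(i,j)\right)$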
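A quick numeric check of this consequence, as a sketch with made-up edge lengths: adding d(i,k) and d(j,k) counts the edge to k twice and the edges to i and j once each, so subtracting d(i,j) and halving leaves exactly the edge from k to m. The function name is hypothetical.

```python
def distance_to_junction(d_ik, d_jk, d_ij):
    """Length of the edge from leaf k to the node m where i, j, k meet."""
    return 0.5 * (d_ik + d_jk - d_ij)


# Made-up edge lengths from m to i, j, k.
a, b, c = 2.0, 3.0, 4.0
d_ij, d_ik, d_jk = a + b, a + c, b + c

# Recovers c, the edge length between k and m.
d_km = distance_to_junction(d_ik, d_jk, d_ij)
print(d_km)
```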
Neighbor Joining
Can we use this fact to construct trees?
Let
$D(i,j) = d(i,j) - (r_i + r_j)$
where
$r_i = \frac{1}{|L| - 2} \sum_{k \in L} d(i,k)$
Theorem: if D(i,j) is minimal (among all pairs of leaves), then i and j are neighbors in the tree
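Neighbor Joining
Set L to contain all leaves
Iteration:
Choose i,j such that D(i,j) is minimal
Create new node k, and set
$d(i,k) = \frac{1}{2}\left(d(i,j) + r_i - r_j\right)$
$d(j,k) = d(i,j) - d(i,k)$
$d(k,m) = \frac{1}{2}\left(d(i,m) + d(j,m) - d(i,j)\right)$ for every other node m in L
Remove i,j from L, and add k
Terminate:
when |L| = 2, connect two remaining nodes
[Figure: leaves i and j joined at the new node k; m is another node]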
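The iteration can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name, the internal node names `n0`, `n1`, … (which must not clash with leaf names), the dictionary layout with both key orders present, and the toy distances are all hypothetical. Ties in D(i,j) are broken by taking the first minimal pair.

```python
from itertools import combinations


def neighbor_joining(leaves, d):
    """Neighbor joining on a symmetric dict d[(x, y)] (both key orders present).

    Returns a list of (node, node, edge_length) tuples describing the tree.
    """
    d = dict(d)          # work on a copy so the caller's matrix is untouched
    L = list(leaves)
    edges = []
    next_id = 0
    while len(L) > 2:
        # r_i = 1/(|L|-2) * sum over k in L of d(i, k)
        r = {i: sum(d[i, k] for k in L if k != i) / (len(L) - 2) for i in L}
        # Choose i, j minimizing D(i, j) = d(i, j) - (r_i + r_j)
        i, j = min(combinations(L, 2), key=lambda p: d[p] - r[p[0]] - r[p[1]])
        k = f"n{next_id}"  # hypothetical name for the new internal node
        next_id += 1
        d_ik = 0.5 * (d[i, j] + r[i] - r[j])
        d_jk = d[i, j] - d_ik
        edges += [(i, k, d_ik), (j, k, d_jk)]
        # Distances from the new node k to every other node m
        for m in L:
            if m not in (i, j):
                d[k, m] = d[m, k] = 0.5 * (d[i, m] + d[j, m] - d[i, j])
        L = [m for m in L if m not in (i, j)] + [k]
    a, b = L
    edges.append((a, b, d[a, b]))
    return edges


# Toy additive distances (made up) from a tree with edges
# A-x: 1, B-x: 2, C-y: 3, D-y: 4, x-y: 5.
pairs = {("A", "B"): 3, ("A", "C"): 9, ("A", "D"): 10,
         ("B", "C"): 10, ("B", "D"): 11, ("C", "D"): 7}
d = {}
for (x, y), v in pairs.items():
    d[x, y] = d[y, x] = v

tree = neighbor_joining(["A", "B", "C", "D"], d)
for edge in tree:
    print(edge)
```

On additive input such as this toy matrix, the recovered edge lengths match the tree that generated the distances, which is the property the theorem guarantees.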