Distance and Bark Photosynthesis In Aspen Trees In The Front Range
Bioinformatics 1 · Cladistic trees: conserved characters Phenetic trees: measure of distance...
Transcript of Bioinformatics 1 · Cladistic trees: conserved characters Phenetic trees: measure of distance...
Bioinformatics 1: Biology, Sequences, Phylogenetics
Bioinformatics 1 Biology, Sequences, Phylogenetics
Part 4
Sepp Hochreiter
Bioinformatics 1: Biology, Sequences, Phylogenetics
Klausur
Mo. 30.01.2011
Zeit: 15:30 – 17:00
Raum: HS14
Anmeldung Kusss
Bioinformatics 1: Biology, Sequences, Phylogenetics
Contents
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular Phylogenies
5.1.3 Methods
5.2 Maximum Parsimony Methods
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted Parsimony and Bootstrapping
5.2.4 Inconsistency of Maximum Parsimony
5.3 Distance-based Methods
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum Evolution
5.3.4 Neighbor Joining
5.3.5 Distance Measures
5.4 Maximum Likelihood Methods
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Motivation
central field in biology: relation between species in form of a tree
Root: beginning of life
Leaves: taxa (current species)
Branches: relationship “is ancestor of'' between nodes
Node: split of a species into two
phylogeny (phylo = tribe and genesis)
Cladistic trees: conserved characters
Phenetic trees: measure of distance between the leaves of the tree
(distance as a whole and not based on single features)
Phenetic problems: simultaneous development of features and
different evolution rates
Convergent evolution e.g. finding the best form in water
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Motivation
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Motivation
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Motivation
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Molecular Phylogenies
Not characteristics like wings, feathers, (morphological)
differences between organisms: RNA, DNA, and proteins
α-hemoglobin
Molecular phylogenetics
is more precise can also distinguish bacteria or viruses
connects all species by DNA
uses mathematical and statistical methods
is model-based as mutations can be modeled
detects remote homologies
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Molecular Phylogenies
Difficulties in constructing a phylogenetic tree:
different mutation rates
horizontal transfer of genetic material (Horizontal Gene
Transfer, Glycosyl Hydrolase from E.coli to B.subtilis)
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Species tree Gene tree
Bioinformatics 1: Biology, Sequences, Phylogenetics
Molecular Phylogenies
Branches: time in number of mutations
molecular clock: same evolution / mutation rate
number of substitution: Poisson distribution
mutation rate: equally distributed over the sequence
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Methods
choose the sequences e.g. rRNA (RNA of ribosomes) and
mitochondrial genes (in most organism, enough mutations)
pairwise and multiple sequence alignments
method for constructing a phylogenetic tree
distance-based
maximum parsimony
maximum likelihood
• Maximum parsimony: strong sequence similarities
(few trees), few sequences
• Distance based (CLUSTALW): less similarity, many
sequences
• Maximum likelihood: very variable sequences, high
computational costs
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Methods
Software
Name Author URL
PHYLIP Felsenstein 89,96 http://evolution.genetics.
washington.edu/phylip.html
PAUP Sinauer Asssociates http://www.lms.si.edu/PAUP
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Maximum Parsimony
minimize number of mutations
mutations are branches in the tree
tree explains the evolution of the sequences
surviving mutations are rare tree with minimal mutations is
most likely explanation
maximum parsimony PHYLIP programs:
DNAPARS, DNAPENNY, DNACOMP, DNAMOVE, and
PROTPARS
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Tree Length
maximum parsimony tree: tree with smallest tree length
tree length: number of substitutions in the tree
Example protein triosephosphate isomerase for the taxa
“Human”, “Pig”, “Rye”, “Rice”, and “Chicken”: Human ISPGMI
Pig IGPGMI
Rye ISAEQL
Rice VSAEML
Chicken ISPAMI
If we focus on column 4: Human G
Pig G
Rye E
Rice E
Chicken A
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Fitch (71) alg. for computing the tree length with taxa at leaves:
1. Root node added to an arbitrary branch
2. bottom-up pass: sets of symbols (amino acids) for a
hypothetical sequence at this node.
Minimize the number mutations by maximal agreement of the
subtrees avoid a mutation at the actual node
In the first case is leave, the second case enforces a
mutation, and the third case avoids a mutation
Tree Length
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
$m_
{12
}$
Bioinformatics 1: Biology, Sequences, Phylogenetics
3. top town pass: hypothetical sequences at the interior nodes;
counts the number of mutations
means that the formula holds for and for
Tree Length
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Non-informative columns: not used
Columns with one symbol occur multiple an others only single
(number of mutations independent of topology)
Columns with only one symbol
minimal number of substitutions:
maximal subs., star tree (center: most frequent symbol):
number of substitutions for the topology:
consistency index :
High values support the according tree as being plausible
retention index
rescaled consistency index
Tree Length
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
few sequences: all trees and their length can be constructed
for a larger number of sequences heuristics are used
Branch and Bound (Hendy & Penny, 82) for 20 and more taxa:
1. This step determines the addition order of the taxa as follows.
First, compute the core tree of three taxa with maximal length of all three taxa trees. Next the taxa is added to one of
the three branches which leads to maximal tree length. For
the tree with four branches we determine the next taxa which
leads to maximal tree length. …
2. This step determines an upper bound for the tree length by
either distance based methods (neighbor joining) or heuristic
search (stepwise addition algorithm).
Tree Search
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Branch and Bound (continued):
3. Start with the core tree of three taxa.
4. Construct new tree topologies by stepwise adding new taxa to
the trees which do not possess a STOP mark. The next taxa is
chosen according to the list in step 1 and added to each tree
at all of its branches. Tree lengths are computed.
5. Assign STOP marks if upper bound is exceeded. Terminate if
all trees possess a STOP mark. List of step 1 leads to many
early STOP signals.
Tree Search
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Tree Search
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Heuristics for Tree Search (Step 2 in previous algorithm)
Stepwise Addition Algorithm: extend only tree with shortest
length. If all taxa are inserted then perform branch swapping
Branch Swapping:
(1) neighbor interchange: two taxa connected to the same
node,
(2) subtree pruning and regrafting: remove small tree and
connect it with the root to a branch,
(3) bisection-reconnection: remove branch to obtain two trees
and reconnect them by inserting a new branch where each
branch of the subtrees can be connected in contrast to (2)
where the root is connected
Tree Search
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Heuristics for Tree Search
Branch and Bound Like:
Use step 1. of the branch-and-bound algorithm to obtain the
minimal tree (not maximal!).
Upper local bounds Un for n taxa are constructed in this way.
These upper bounds server for stopping signals.
Tree Search
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Type of substitution is weighted according to PAM and
BLOSUM matrices to address survival of substituition
Bootstrapping - accesses the variability of the tree with respect to the data
(“variance”) and identifies stable substructures
- is possible because the temporal order of the alignment
columns does not matter
- cannot access the quality of a method but only its robustness
Weighted Parsimony / Bootstrapping
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Inconsistency of Maximum Parsimony
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
True Tree
Maximum Parsimony
Bioinformatics 1: Biology, Sequences, Phylogenetics
a and b are similar to each other and match to 99% whereas c and
d are not similar to any other sequence and only match 5% by
chance (1 out of 20).
Informative columns: only two symbols and each appears twice.
Probabilities of informative columns and their rate:
90% of the cases of informative columns we observe ci = di.
Maximum parsimony will judge c and d as similar as a and b.
Inconsistency of Maximum Parsimony
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
matrix of pairwise distances between the sequences
A distance D is produced by a metric d (function) on objects x
indexed by i,j,k:
metric d must fulfill
not all scoring schemes are
a metric, e.g. the e-value
Distance-based Methods
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Unweighted Pair Group Method using arithmetic Averages
(constructive clustering method based on joining pairs of clusters)
It works as follows:
1. each sequence i is a cluster ci with one element ni =1 and
height li = 0. Put all i into a list.
2. Select cluster pair (i,j) from the list with minimal Dij and create
a new cluster ck by joining ci and cj with height lk = Dij / 2 and
number of elements nk = ni + nj.
3. Compute the distances for ck:
4. Remove i and j from the list and add k to the list. If the list
contains only one element then terminate else go to step 2.
UPGMA
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
assumption of constant rate of evolution in different lineages
bootstrap can evaluate the reliability to data variation
Positive interior branches contribute to the quality of the tree
UPGMA
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
minimize (Dij – Eij), where Eij is the sum of distances in the tree on
the path from taxa i to taxa j (the path metric)
The objective is
Fitch and Margoliash, 1967, introduced weighted least squares:
optimized under the constraint of nonnegative branch length
Least Squares
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
If matrix A is the binary topology matrix with N(N-1)/2 rows, one for
each Dij, and v columns for the v branches of the topology. In
each row (i,j) all branches contained in the path from i to j are
marked by 1 and all other branches are 0
l is the v -dimensional vector of branch weights, then
least squares assumption: Dij deviates from Eij according
to a Gaussian distribution εij with mean 0 and variance D2ij:
maximum likelihood estimator (least squares) is
Gaussianity assumption: justified by sufficient large sequences
li are Gaussian and, therefore, also Dij
Least Squares
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
The objective is the sum of branch length' l:
Given an unbiased branch length estimator l˄
The expected value of L is smallest for the true topology
independent of the number of sequences
(Rzhetsky and Nei, 1993)
Minimum evolution is computational expensive
Minimum Evolution
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
The neighbor joining (Saitou and Nei, 1987) simplifies the
minimum evolution method (for fewer than six taxa both
methods give the same result)
Neighbors: taxa that are connected by a single node
Additive metric d: any four elements fulfill
path metric (counting the branch weights) is an additive metric
Neighbor Joining
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
i
k
m
j
Bioinformatics 1: Biology, Sequences, Phylogenetics
An additive metric can be represented by an unique additive tree
construction of an additive tree where a node v is inserted:
Path metric holds:
Neighbor Joining
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
objective of the neighbor joining algorithm is
S the sum of all branch length' lij
starts with a star tree
We assume N taxa with initial (star tree)
objective S0:
where the comes from the fact that , therefore
is part of distances
Neighbor Joining
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
In the next step taxa 1 and 2 are joined and a new internal node Y
is introduced and computed as
set all paths from i to j containing equal to
and solve for . These are all path'
from nodes 1 and 2 to . Therefore
paths start from nodes 1 and 2, each, giving
paths. is in all node 1 paths and
in all node 2 paths. The tail is in one
node 1 and one node 2 path.
Above equation is obtained by averaging
over these equations for .
Neighbor Joining
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
similar as for the initial star tree
inserted into the last equation:
Neighbor Joining
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
generalized from joining (1,2) to (k,l):
net divergences rk are accumulated distances of k to all other taxa:
giving
is constant for all objectives , therefore an
equivalent objective is
If k and l are evolutionary neighbors but is large due to
fast evolution of k and/or l, then rk and/or rl are large and Qkl small
Neighbor Joining
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
The algorithm:
1. Given start with a star tree where the taxa are the leaves.
Put all taxa in a set of objects.
2. For each leave i compute
3. For each pair (i,j) compute
4. Determine the minimal Qij. Join these (i,j) to new leave u.
Compute new branch length’ and new distances of u:
Delete i and j from the set of objects and add.
Stop if the set of objects contains only u otherwise go to Step 1
Neighbor Joining
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Neighbor joining: algorithm; for larger data sets
The formula for Qij accounts for differences in evolution rates
The objective S is only minimized approximatively
CLUSTALW uses neighbor-joining for multiple alignments
Neighbor Joining
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
we focus on nucleotides!
substitution rates:
Distance Measures
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Jukes Cantor
Mutation probability:
Identical positions of 2 seq. remain identical:
different nucleotides will be identical:
One changes and the other not:
Two of these events:
Distance Measures
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Jukes Cantor
difference equation:
continuous model:
The solution for :
Distance Measures
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Jukes Cantor
The substitutions per position d for two sequences is 2r t
Estimating q and inserting in above equation: estimate d.
The variance of the estimate for d:
Distance Measures
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Kimura
group nucleotide pairs:
Distance Measures
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Kimura
transitional substitutions:
transversional substitutions:
equilibrium frequency of each nucleotide: 0.25
However occurrence of GC Drosophila mitochondrial DNA is 0.1
Distance Measures
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Felsenstein / Tajima-Nei
xij: relative frequency of nucleotide pair (i,j)
Distance Measures
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Tamura
extends Kimura's model for GC content different from 0.5
Distance Measures
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Hasegawa (HKY)
hybrid of Kimuras and Felsenstein / Tajima-Nei: GC content and
transition / transversion
Distance Measures
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Tamura-Nei
includes Hasegawa's model
P1: proportion of transitional differences between A and G
P2: proportion of transitional differences between T and C
Q: proportion of transversional differences
Distance Measures
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Tree probability: product of the mutation rates in each branch
Mutation rate: product between substitution rate and branch length
D: data, multiple alignment of N sequences (taxa)
Dk: N-dimensional vector at position k of the multiple alignment
A: tree topology (see least squares)
l: vector of branch length
H: number of hidden nodes of the topology A
: model for nucleotide substitution
: set of letters (e.g. the amino acids)
Hidden nodes are indexed from 1 to H
taxa are indexed from H+1 to H+N
Root node has index 1
Maximum Likelihood Methods
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
The likelihood of the tree at the k-th position:
: prior probability of the root node assigned with
: indicates an existing branch
: probability of branch length between and
hidden states are summed out
Prior is estimated or given
Maximum Likelihood Methods
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
If is the Felsenstein / Tajima-Nei equal-input model, the
branch length probabilities are
For and we obtain Jukes-Cantor:
reversible models:
choice of the root does not matter because branch lengths
count independent of their substitution direction
Maximum Likelihood Methods
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
(independent positions)
Felsenstein's (81) pruning algorithm to compute
: probability of a letter a at node i
recursive formula:
computed by dynamic programming from leaves to root
Likelihood:
Maximum Likelihood Methods
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
best tree: both the branch length' and the topology optimized
branch length‘: gradient based / EM (expectation-maximization)
tree topology: Felsenstein (81) growing (constructive) algorithm
start with 3 taxa, at k-th taxa test all (2k-5) branches for insertion
further optimized by local changing the topology
tree topology: small N all topologies can be tested, then local
changes similar to parsimony tree
ML estimator is computationally expensive but unbiased
(sequence length) and asymptotically efficient (minimal variance)
fast heuristics: Strimmer and v. Haeseler (96): all topologies of 4
taxa then build the final tree (software: http://www.tree-puzzle.de/)
Maximum Likelihood Methods
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
From triosephosphat isomerase of different species trees are
constructed with PHYLIP (Phylogeny Inference Package) 3.5c
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Fitch-Margoliash
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Fitch-Margoliash with assumption of a molecular clock (“kitsch”)
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
neighbor joining
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
UPGMA
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
UPGMA
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Fitch-Margoliash
neighbor joining
Kitsch
Bioinformatics 1: Biology, Sequences, Phylogenetics
Phylogenetic tree based on triosephosphat isomerase for:
Human
Monkey
Mouse
Rat
Cow
Pig
Goose
Chicken
Zebrafish
Fruit FLY
Rye
Rice
Corn
Soybean
Bacterium
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Fitch-Margoliash
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Fitch-Margoliash with assumption of a molecular clock (“kitsch”)
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
neighbor joining
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
UPGMA
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
UPGMA
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Fitch-Margoliash
neighbor joining
Kitsch
Bioinformatics 1: Biology, Sequences, Phylogenetics
relationship between humans and apes
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
relationship between humans and apes
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Examples
5 Phylogenetics
5.1 Motivation
5.1.1 Tree of Life
5.1.2 Molecular
Phylogenies
5.1.3 Methods
5.2 Maximum
Parsimony
5.2.1 Tree Length
5.2.2 Tree Search
5.2.3 Weighted
Parsimony
5.2.4 Inconsistency
5.3 Distance-based
5.3.1 UPGMA
5.3.2 Least Squares
5.3.3 Minimum
Evolution
5.3.4 Neighbor
Joining
5.3.5 Distance
Measures
5.4 Maximum
Likelihood
5.5 Examples
Bioinformatics 1: Biology, Sequences, Phylogenetics
Three Anwers
From where came
the first human?
Africa!
Is Anna Anderson the tsar‘s daughter Anastasia?
No!
Are the neanderthals the
ancestors of the humans?
No! Separate Species
Asiats Europeans Australians Africans
Ancestor
Modern Humans Neanderthals
T
T
T
337
C
C
C
C
C
C
A
A
A
T
T
T
G
G
G
A
A
A
A
A
A
T
T
T
A
A
A
T
T
T
T
T
T
G
G
G
106
T
C
C
C
T
T
324
A
A
A
G
G
G
T
T
T
C
C
C
A
A
A
A
A
A
A
A
A
T
T
T
C
C
C
C
C
C
C
C
C
A
A
A
T
C
C
91
C C Prince Philip (Grand nephew zar)
T C Carl Maucher (Grand nephew
F. Schanzkowska)
T C Anna Anderson
Bioinformatics 1: Biology, Sequences, Phylogenetics
END