Phylogenetic Trees
Tutorial 6
• Measuring distance
• Bottom-up algorithm (Neighbor Joining)
  - Distance based algorithm
  - Relative distance based
Measuring Distance
• Problem: for unrelated sequences, the fraction of differing sites approaches the fraction expected by chance, so the raw distance measure converges.
• Jukes-Cantor correction:
  f = fraction of sites where residues differ
  $d_{i,j} = -\frac{3}{4} \log\left(1 - \frac{4}{3} f\right)$
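The Jukes-Cantor correction can be sketched in a few lines of Python. This is a minimal illustration, assuming `log` on the slide means the natural logarithm; the function names `fraction_differing` and `jukes_cantor_distance` are hypothetical and not part of any package.

```python
import math


def fraction_differing(a: str, b: str) -> float:
    """Fraction f of aligned sites at which the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor_distance(f: float) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4/3 * f) (natural log assumed)."""
    if not 0.0 <= f < 0.75:
        # At f = 0.75 the sequences look random and the correction diverges.
        raise ValueError("f must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * f)


f = fraction_differing("ACGTACGTAC", "ACGTTCGTAA")  # 2 of 10 sites differ
d = jukes_cantor_distance(f)
print(f, d)
```

The corrected distance is always at least as large as f, and it grows without bound as f approaches 0.75, which is the saturation problem described above.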
Measuring Distance (cont)
• Euclidean distance: given a multiple sequence alignment, calculate the square root of the sum of the squared scores at every position between two sequences.
• The score increases in proportion to the dissimilarity between residues.
  $d_{a,b} = \sqrt{\sum_{i=1}^{n} s(a_i, b_i)^2}$
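A small sketch of the Euclidean distance between two aligned sequences follows. The slides do not specify the score function s, so `mismatch_score` below (0 for identical residues, 1 otherwise) is purely an assumption for illustration; any dissimilarity score could be passed in its place.

```python
import math


def mismatch_score(x: str, y: str) -> float:
    """Hypothetical score s(x, y): 0 for identical residues, 1 otherwise."""
    return 0.0 if x == y else 1.0


def euclidean_distance(a: str, b: str, score=mismatch_score) -> float:
    """Square root of the sum of squared per-position scores."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return math.sqrt(sum(score(x, y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance("ACGT", "ACGT"))
print(euclidean_distance("ACGT", "TGCA"))
```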
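Star Structure
Assumption: divergence of sequences is assumed to occur at a constant rate, so the distance to the root is equal for all leaves.
[Figure: star diagram of leaves a, b, c, d and a tree over the same leaves with internal nodes e and f]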
Unweighted Pair Group Method using Arithmetic Averages (UPGMA)
Basic Algorithm
UPGMA constructs a rooted tree.
Distance matrix:
     a   b   c   d
a    0   8   7   5
b    8   0   3   9
c    7   3   0   8
d    5   9   8   0
[Figure: initial star diagram with leaves a, b, c, d]
UPGMA: Selection step
Choose the two nodes with the shortest distance and fuse them (in the distance matrix above, b and c at distance 3).
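UPGMA: Distance recalculation step
• The new node e lies at an equal distance from c and b: $D_{ce} = D_{be} = D_{cb} / 2$.
• The distances e-a and e-d are the averages of the distances from c and b: $D_{ea} = (D_{ca} + D_{ba}) / 2$ and $D_{ed} = (D_{cd} + D_{bd}) / 2$.
[Figure: tree over leaves a, d, c, b with internal nodes e and f]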
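The selection and recalculation steps can be traced on the slide's distance matrix with a short sketch. The dictionary layout and the helper `dist` are choices made for this illustration only.

```python
# Pairwise distances from the slide's matrix (upper triangle only).
D = {
    ("a", "b"): 8, ("a", "c"): 7, ("a", "d"): 5,
    ("b", "c"): 3, ("b", "d"): 9, ("c", "d"): 8,
}


def dist(x: str, y: str) -> float:
    """Symmetric lookup into the upper-triangle dictionary."""
    return D[(x, y)] if (x, y) in D else D[(y, x)]


# Selection step: the pair with the shortest distance.
i, j = min(D, key=D.get)

# Recalculation step: the new node e sits halfway between i and j,
# and its distance to every other node is the average of the two old distances.
half = dist(i, j) / 2
to_e = {m: (dist(i, m) + dist(j, m)) / 2 for m in "abcd" if m not in (i, j)}

print((i, j), half, to_e)
```

For this matrix the fused pair is b and c, each 1.5 from the new node e, with e at 7.5 from a and 8.5 from d.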
[Figure: successive UPGMA fusion steps 1 to 4 for the distance matrix above, with leaves a, b, c, d and internal nodes e and f; the distance Dbf is marked]
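Repeating the selection and recalculation steps until one cluster remains gives the full procedure. The sketch below is one possible way to write it, not the slides' own code: it uses the standard size-weighted average when merging clusters (which reduces to the plain average when both clusters are single leaves), represents the tree as nested tuples, and the function name `upgma` is hypothetical.

```python
from itertools import combinations


def upgma(names, pair_dist):
    """Return (root, height): root is a nested-tuple tree, height maps node -> height."""
    dist = {frozenset(p): v for p, v in pair_dist.items()}
    size = {n: 1 for n in names}
    height = {n: 0.0 for n in names}
    clusters = list(names)
    while len(clusters) > 1:
        # Selection: the pair of clusters with the shortest distance.
        x, y = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        new = (x, y)
        size[new] = size[x] + size[y]
        height[new] = dist[frozenset((x, y))] / 2
        clusters = [c for c in clusters if c not in (x, y)]
        # Recalculation: size-weighted average distance to every remaining cluster.
        for m in clusters:
            dist[frozenset((new, m))] = (
                size[x] * dist[frozenset((x, m))] + size[y] * dist[frozenset((y, m))]
            ) / size[new]
        clusters.append(new)
    return clusters[0], height


pairs = {
    ("a", "b"): 8, ("a", "c"): 7, ("a", "d"): 5,
    ("b", "c"): 3, ("b", "d"): 9, ("c", "d"): 8,
}
root, height = upgma(["a", "b", "c", "d"], pairs)
print(root, height[root])
```

On the slide's matrix this fuses b with c, then a with d, and finally joins the two clusters at the root.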
Neighbor Joining Algorithm
Constructs an unrooted tree.
Step by step summary:
1. Calculate all pairwise distances.
2. Pick the two nodes (i and j) for which the distance is minimal.
3. Define a new node (x) and re-calculate the distances from the free nodes to the new node.
4. Calculate Dix and Djx: the distances of the chosen nodes i and j to the new node x, as well as the distances from x to all other nodes.
5. Continue until two nodes remain, then connect them with an edge.
"Neighbor Joining" (merging close sequences; not the actual algorithm)
Pick the two nodes (i, j) for which the distance is minimal, here nodes 5 and 6.
Node 10 is the new node.
Re-calculate the distances from the new node
• i, j: the fused nodes (5, 6)
• X: the newly added node (node 10)
• m: the remaining nodes in the star
$d_{X,m} = \frac{d_{i,m} + d_{j,m} - d_{i,j}}{2}$
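Calculate Dix and Djx
• r: approximately the average distance to the other nodes
• L: number of leaves left in the tree (leaf nodes represent taxa, sequences, etc.)
$r_i = \frac{\sum_k d_{i,k}}{L-2}, \quad r_j = \frac{\sum_k d_{j,k}}{L-2}$
$d_{X,i} = \frac{d_{i,j} + r_i - r_j}{2}$
$d_{X,j} = \frac{d_{i,j} + r_j - r_i}{2} = d_{i,j} - d_{X,i}$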
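As a quick consistency check on the two branch-length formulas, adding them shows that the new node X splits the original distance between i and j exactly, which is why the second length can be obtained by subtraction:

```latex
\begin{aligned}
d_{X,i} + d_{X,j}
  &= \frac{d_{i,j} + r_i - r_j}{2} + \frac{d_{i,j} + r_j - r_i}{2}
   = d_{i,j} \\
\Rightarrow\quad d_{X,j} &= d_{i,j} - d_{X,i}
\end{aligned}
```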
Calculate Dix and Djx (worked example, L = 9)
r5 = ΣD5k/(L-2) = 3.22406/(9-2) = 0.46058
r6 = ΣD6k/(L-2) = 3.22758/(9-2) = 0.461083
D10,5 = (D5,6 + r5 - r6)/2 = (0.06088 + 0.46058 - 0.461083)/2 = 0.0301886
D10,6 = D5,6 - D10,5 = 0.06088 - 0.0301886 = 0.0306914
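The arithmetic of this worked example can be re-run with a short sketch. The function name `branch_lengths` is hypothetical; the inputs are the sums and the distance D5,6 quoted on the slide.

```python
def branch_lengths(d_ij: float, sum_i: float, sum_j: float, n_leaves: int):
    """Return (r_i, r_j, d_Xi, d_Xj) for fusing nodes i and j into a new node X."""
    r_i = sum_i / (n_leaves - 2)
    r_j = sum_j / (n_leaves - 2)
    d_xi = (d_ij + r_i - r_j) / 2
    d_xj = d_ij - d_xi  # the two branches add up to d_ij
    return r_i, r_j, d_xi, d_xj


# Values from the slide: D5,6 = 0.06088, sum of D5k = 3.22406, sum of D6k = 3.22758, L = 9.
r5, r6, d10_5, d10_6 = branch_lengths(0.06088, 3.22406, 3.22758, 9)
print(round(r5, 5), round(r6, 6), round(d10_5, 7), round(d10_6, 7))
```

The rounded results agree with the values shown on the slide.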
[Figures: Steps 2 to 7 of the tree construction. Branch lengths shown: 0.080375 and 0.044625 (Step 2); 0.069258 and 0.040447 (Step 3).]
Problems
[Figure: unrooted tree with four leaves (1 to 4) and branch lengths 0.1, 0.1, 0.1, 0.4, 0.4]
Neighbor Joining (not assuming equal divergence)
Step by step summary:
1. Calculate all pairwise distances.
2. Pick the two nodes (i and j) for which the relative distance is minimal (lowest).
3. Define a new node (x) and re-calculate the distances from the free nodes to the new node.
4. Calculate Dix and Djx: the distances of the chosen nodes i and j to the new node x, as well as the distances from x to all other nodes.
5. Continue until two nodes remain, then connect them with an edge.
Step 2. Pick the two nodes (i and j) for which the relative distance is minimal (lowest).
$M_{i,j} = d_{i,j} - (r_i + r_j)$
$r_i = \frac{\sum_k d_{i,k}}{L-2}, \quad r_j = \frac{\sum_k d_{j,k}}{L-2}$
• $M_{i,j}$ takes negative values.
• As the average distance from the common ancestor to the rest of the nodes increases, $M_{i,j}$ has a lower value.
• Select the pair that produces the lowest value.
• Re-evaluate M with every iteration.
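The difference between the plain and the relative distance can be illustrated on a four-leaf tree with branch lengths 0.1, 0.1, 0.1, 0.4, 0.4, as in the "Problems" figure. The topology used below is an assumption, since it cannot be read off the figure residue: leaves 1 and 3 share one internal node, leaves 2 and 4 share the other, the branches to leaves 1 and 2 and the internal edge are 0.1, and the branches to leaves 3 and 4 are 0.4.

```python
from itertools import combinations

# Path lengths on the ASSUMED topology described above.
d = {
    (1, 2): 0.3, (1, 3): 0.5, (1, 4): 0.6,
    (2, 3): 0.6, (2, 4): 0.5, (3, 4): 0.9,
}
leaves = [1, 2, 3, 4]


def dist(i: int, j: int) -> float:
    return d[(min(i, j), max(i, j))]


def r(i: int) -> float:
    """Sum of distances from i to all other leaves, divided by L - 2."""
    return sum(dist(i, k) for k in leaves if k != i) / (len(leaves) - 2)


# Relative distance M_ij = d_ij - (r_i + r_j).
M = {(i, j): dist(i, j) - (r(i) + r(j)) for i, j in combinations(leaves, 2)}

closest_by_d = min(d, key=d.get)  # pair with the smallest plain distance
closest_by_M = min(M, key=M.get)  # pair with the smallest relative distance

print(closest_by_d, closest_by_M)
```

Under this assumption the smallest plain distance belongs to leaves 1 and 2, which are not neighbours, while the smallest M belongs to a true neighbour pair (1 with 3, or equally 2 with 4).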
[Figure: nodes i and j joined at the new node X, with a remaining node m; the four-leaf tree with branch lengths 0.1, 0.1, 0.1, 0.4, 0.4 is shown again]
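Re-calculate the distances from the new node
$d_{X,m} = \frac{d_{i,m} + d_{j,m} - d_{i,j}}{2}$
$d_{X,i} = \frac{d_{i,j} + r_i - r_j}{2}$
$d_{X,j} = \frac{d_{i,j} + r_j - r_i}{2} = d_{i,j} - d_{X,i}$
$r_i = \frac{\sum_k d_{i,k}}{L-2}, \quad r_j = \frac{\sum_k d_{j,k}}{L-2}$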
EXAMPLE
Original distance matrix
      A     B     C     D     E
B     5
C     4     7
D     7    10     7
E     6     9     6     5
F     8    11     8     9     8

Relative distance matrix (Mij)
      A       B       C       D       E
B   -13
C   -11.5   -11.5
D   -10     -10     -10.5
E   -10     -10     -10.5   -13
F   -10.5   -10.5   -11     -11.5   -11.5

The Mij table is used only to choose the closest pair, not to calculate the distances.
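The relative distances and the first join can be computed directly from the original distance matrix with the formulas for r, Mij, Dix, Djx and the distances to the new node. This is a minimal sketch of a single iteration; the variable names and the dictionary layout are choices made for this illustration.

```python
from itertools import combinations

names = "ABCDEF"
# Lower triangle of the original distance matrix, row by row.
lower = {
    "B": [5],
    "C": [4, 7],
    "D": [7, 10, 7],
    "E": [6, 9, 6, 5],
    "F": [8, 11, 8, 9, 8],
}
d = {frozenset((row, col)): v
     for row, values in lower.items()
     for col, v in zip(names, values)}


def dist(x: str, y: str) -> float:
    return d[frozenset((x, y))]


L = len(names)
r = {i: sum(dist(i, k) for k in names if k != i) / (L - 2) for i in names}
M = {(i, j): dist(i, j) - (r[i] + r[j]) for i, j in combinations(names, 2)}

# Choose the pair with the lowest M (ties are broken by taking the first pair).
i, j = min(M, key=M.get)

# Branch lengths from the new node X to i and j, then distances from X to the rest.
d_xi = (dist(i, j) + r[i] - r[j]) / 2
d_xj = dist(i, j) - d_xi
d_xm = {m: (dist(i, m) + dist(j, m) - dist(i, j)) / 2
        for m in names if m not in (i, j)}

print((i, j), d_xi, d_xj, d_xm)
```

Here the pairs A, B and D, E tie for the lowest M; the sketch takes A, B first, and the remaining distances are then measured from the new node using the original distances, not the M values.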
Problems with phylogenetic trees
[Figure: four tree panels for the same seven taxa (Bacillus, E. coli, Pseudomonas, Salmonella, Aeromonas, Lechevaliera, Burkholderias), with leaves numbered 1 to 7 in a different order in each panel; scale bar 0.2]
Software
• PHYLIP: http://evolution.gs.washington.edu/phylip.html
• PAUP: http://paup.csit.fsu.edu/
• MEGA3: http://www.megasoftware.net/
• More: http://evolution.genetics.washington.edu/phylip/software.html