. Comput. Genomics, Lecture 5b Character Based Methods for Reconstructing Phylogenetic Trees:...
Comput. Genomics, Lecture 5b
Character Based Methods for Reconstructing Phylogenetic Trees:
Maximum Parsimony
Based on presentations by Dan Geiger, Shlomo Moran, and Ido Wexler. Modified by Benny Chor.
References: Durbin et al 7.4, Gusfield 17.1-17.3, Setubal&Meidanis 6.1
Phylogenetic Trees - Reminder
• Leaves represent objects (genes, species) being compared
• Internal nodes are hypothetical ancestral objects
• In a rooted tree, path from root to a node corresponds to a path in evolutionary time
• An unrooted tree specifies relationships among objects, but not evolutionary time
Parsimony Based Approach
Input: Character data (aligned sequences)
Goal/Output: A labeled tree (labeled internal nodes) that “explains” the data with a minimal number of changes across edges
Parsimony: An Example
Various trees that could explain the phylogeny of the following four sequences: AAG, AAA, GGA, AGA. For example,
Tree 1: root AAA; internal nodes AAA and AAA; leaves AGA, AAA, AAG, GGA.
Tree 2: root AAA; internal nodes AAA and AGA; leaves AGA, AAA, AAG, GGA.
Parsimony prefers the second tree to the first, because it requires fewer substitution events (three vs. four changes).
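The two counts can be checked by summing Hamming distances over the edges of each labeled tree. The sketch below is only an illustration: the helper names are made up, and the pairing of leaves under the internal nodes of the second tree (AAA over AAA and AAG, AGA over AGA and GGA) is an assumption chosen to be consistent with the stated three changes.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def tree_cost(edges) -> int:
    """Total number of changes over a list of (parent_label, child_label) edges."""
    return sum(hamming(parent, child) for parent, child in edges)


# Tree 1: every internal node is labeled AAA.
tree1 = [
    ("AAA", "AAA"), ("AAA", "AAA"),   # root -> two internal nodes
    ("AAA", "AGA"), ("AAA", "AAA"),   # first internal node -> its leaves
    ("AAA", "AAG"), ("AAA", "GGA"),   # second internal node -> its leaves
]

# Tree 2 (assumed leaf pairing): internal AAA over {AAA, AAG}, internal AGA over {AGA, GGA}.
tree2 = [
    ("AAA", "AAA"), ("AAA", "AGA"),   # root -> two internal nodes
    ("AAA", "AAA"), ("AAA", "AAG"),   # internal AAA -> its leaves
    ("AGA", "AGA"), ("AGA", "GGA"),   # internal AGA -> its leaves
]

print(tree_cost(tree1), tree_cost(tree2))  # 4 3
```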
Big and Small Parsimony
Usually the approaches to finding a maximum parsimony tree have two separate components:
A search through the space of trees (BIG parsimony)
Given a specific tree topology, find an assignment of “ancestral labels” to internal nodes so as to minimize the total number of changes across tree edges (small parsimony)
Formally: Big Parsimony
Input: Character data (aligned sequences)
Goal/Output: A labeled tree (labeled internal nodes) that minimizes number of changes across edges (over all trees and internal labelings).
Formally: Small Parsimony
Input: Character data (aligned sequences) and a tree with sequences at leaves.
Goal/Output: A labeling of internal nodes that minimizes number of changes across edges (over all internal labelings).
Big, Small, and Weighted Parsimony
Small parsimony has a linear time solution (Fitch’s algorithm).
BIG parsimony is NP hard: an easy reduction from vertex cover will be shown soon (on the board).
Weighted small parsimony also has a linear time solution (Sankoff’s algorithm, dynamic programming).
Small Parsimony: Fitch’s Algorithm
Traverse tree “up”, from leaves to root, finding sets of possible ancestral states (labels) for each internal node.
Traverse tree “down”, from root to leaves, determining ancestral states (labels) for internal nodes.
Key observation: Different sites are independent. Can solve one site at a time.
Fitch’s Algorithm – Step 1
Do a post-order (from leaves to root) traversal of tree
Find out possible states Ri of internal node i with children j and k
$$R_i = \begin{cases} R_j \cap R_k & \text{if } R_j \cap R_k \neq \emptyset \\ R_j \cup R_k & \text{otherwise} \end{cases}$$
Fitch’s Algorithm – Step 1
# of changes = # union operations
[Figure: example tree annotated with the state sets found in the post-order pass, such as CT, GT, AGT and T]
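A minimal sketch of the post-order pass for a single site, assuming a binary tree written as nested pairs with one-character strings at the leaves. The function name and the example tree are hypothetical (the tree is not necessarily the one from the slide figure); the count returned is the number of union operations.

```python
def fitch_up(node):
    """Post-order pass of Fitch's algorithm for one site.

    A leaf is a one-character string; an internal node is a pair (left, right).
    Returns (candidate_set, number_of_union_operations) for the subtree.
    """
    if isinstance(node, str):                 # leaf: its own state, no changes yet
        return {node}, 0
    left, right = node
    r_j, cost_j = fitch_up(left)
    r_k, cost_k = fitch_up(right)
    common = r_j & r_k
    if common:                                # non-empty intersection: no new change
        return common, cost_j + cost_k
    return r_j | r_k, cost_j + cost_k + 1     # union: one more change


# Hypothetical example tree with leaf states C, T, G, T, A, T.
tree = (("C", "T"), ("G", ("T", ("A", "T"))))
root_set, changes = fitch_up(tree)
print(root_set, changes)  # {'T'} 3
```

Because sites are independent, a full alignment can be scored by running this pass once per column and adding up the counts.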
Fitch’s Algorithm – Step 2
Do a pre-order (from root to leaves) traversal of tree
Select state rj of internal node j with parent i
$$r_j = \begin{cases} r_i & \text{if } r_i \in R_j \\ \text{arbitrary state} \in R_j & \text{otherwise} \end{cases}$$
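The selection rule can be sketched as follows, assuming the candidate sets R have already been computed by the post-order pass. The node names, the `children` map and the sets are hypothetical example data; `min()` stands in for the "arbitrary" choice only to make the result deterministic.

```python
# Children of each internal node, and candidate sets R from the post-order pass.
# Both are hypothetical example data; leaves carry singleton sets.
children = {"root": ("u", "v"), "u": ("l1", "l2"), "v": ("l3", "l4")}
R = {
    "l1": {"C"}, "l2": {"T"}, "l3": {"G"}, "l4": {"T"},
    "u": {"C", "T"}, "v": {"G", "T"}, "root": {"T"},
}


def fitch_down(children, R, root):
    """Pre-order pass: fix one state per node, reusing the parent's state when allowed."""
    state = {root: min(R[root])}              # arbitrary member of the root set
    stack = [root]
    while stack:
        i = stack.pop()
        for j in children.get(i, ()):         # leaves have no entry in `children`
            state[j] = state[i] if state[i] in R[j] else min(R[j])
            stack.append(j)
    return state


state = fitch_down(children, R, "root")
# Count edges whose two endpoints received different states.
changes = sum(state[i] != state[j] for i in children for j in children[i])
print(state["u"], state["v"], changes)  # T T 2
```

In this example the two changes match the two union operations (at u and at v) that the post-order pass would have performed.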
Fitch’s Algorithm – Step 2
[Figure: the same example tree, shown in successive frames as the pre-order pass assigns a state to each internal node]
Weighted Version
Instead of assuming all state changes are unit cost (equally likely), use different costs S(a,b) for different changes
1st step of algorithm is to propagate costs up through tree
Weighted Version of Fitch’s Algorithm
Want to determine min. cost Ri(a) of assigning character a to node i
for leaves:
$$R_i(a) = \begin{cases} 0 & \text{if } a \text{ is the character at leaf } i \\ \infty & \text{otherwise} \end{cases}$$
Weighted Version of Fitch’s Algorithm
Want to determine min. cost Ri(a) of assigning character a to node i
for internal nodes:
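$$R_i(a) = \min_b \bigl( R_j(b) + S(a,b) \bigr) + \min_b \bigl( R_k(b) + S(a,b) \bigr)$$
[Figure: node i with state a and its children j and k, each with state b]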
Weighted Version of Fitch’s Algorithm – Step 2
Do a pre-order (from root to leaves) traversal of tree
Select the minimal cost character for the root
For each internal node j, select the character that produced the minimal cost at parent i
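Both passes of the weighted version can be sketched together for one site. The code assumes a tree of nested tuples with one-character leaves and the DNA alphabet; the cost function S(a,b) used here (transitions cost 1, transversions cost 2) is a hypothetical choice for illustration, not one given in the lecture.

```python
import math

ALPHABET = "ACGT"
PURINES = {"A", "G"}


def cost(a, b):
    """Hypothetical S(a,b): 0 if equal, 1 for a transition, 2 for a transversion."""
    if a == b:
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 2


def sankoff_up(node):
    """Post-order pass: compute the cost table R_i(a) for every node."""
    if isinstance(node, str):                          # leaf: 0 for its own state, inf otherwise
        table = {a: (0 if a == node else math.inf) for a in ALPHABET}
        return {"R": table, "kids": ()}
    kids = tuple(sankoff_up(child) for child in node)
    table = {
        a: sum(min(k["R"][b] + cost(a, b) for b in ALPHABET) for k in kids)
        for a in ALPHABET
    }
    return {"R": table, "kids": kids}


def sankoff_down(info, parent_state=None):
    """Pre-order pass: pick the state that produced the minimum at the parent."""
    if parent_state is None:                           # root: overall minimal cost state
        state = min(ALPHABET, key=lambda a: info["R"][a])
    else:
        state = min(ALPHABET, key=lambda b: info["R"][b] + cost(parent_state, b))
    if not info["kids"]:
        return state
    return (state, tuple(sankoff_down(k, state) for k in info["kids"]))


tree = (("A", "G"), ("C", "T"))
info = sankoff_up(tree)
total = min(info["R"].values())
labels = sankoff_down(info)
print(total)   # 4
print(labels)  # ('A', (('A', ('A', 'G')), ('C', ('C', 'T'))))
```

Each node does work proportional to the square of the alphabet size, so for a fixed alphabet the running time stays linear in the number of nodes.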
Big Parsimony: Exploring the Space of Trees
We’ve considered small parsimony: how to find the minimum number of changes for a given tree topology.
To solve big parsimony, we need some search procedure for exploring the space of tree topologies.
There are $(2n-3)!!$ rooted binary trees on $n$ leaves (the corresponding count for unrooted binary trees is $(2n-5)!!$):
$$(2n-3)!! = 3 \cdot 5 \cdots (2n-3)$$
Exploring the Space of Trees
taxa (n)    # trees, (2n-3)!!
4           15
5           105
6           945
8           135,135
10          34,459,425
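These counts are easy to recompute. The sketch below evaluates the odd double factorial and reports both (2n-3)!!, which counts rooted binary trees on n labeled leaves, and (2n-5)!!, which counts unrooted ones; the function names are made up for this example.

```python
def double_factorial_odd(m: int) -> int:
    """Product 1 * 3 * 5 * ... * m for odd m (returns 1 when m < 3)."""
    result = 1
    for k in range(3, m + 1, 2):
        result *= k
    return result


def rooted_trees(n: int) -> int:
    """Number of rooted binary trees on n labeled leaves: (2n-3)!!."""
    return double_factorial_odd(2 * n - 3)


def unrooted_trees(n: int) -> int:
    """Number of unrooted binary trees on n labeled leaves: (2n-5)!!."""
    return double_factorial_odd(2 * n - 5)


for n in (4, 5, 6, 8, 10):
    print(n, rooted_trees(n), unrooted_trees(n))
# 4 15 3
# 5 105 15
# 6 945 105
# 8 135135 10395
# 10 34459425 2027025
```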
Does This Imply Big MP is Hard?
Not necessarily: There could be some smarter way to zoom directly to best topology.
But: We will show hardness of Big MP by a (simple) reduction from vertex cover (VC).
Big MP is NP Hard!
First, define VC and VC for triangle free graphs. Then…
1. You will show a poly time reduction from VC to VC for triangle free graphs as part of a home assignment (easy).
2. In class, I will show a poly time reduction from VC for triangle free graphs to Big MP (old style, white board proof).
• This establishes NP hardness of Big MP.
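For reference, the two notions used in the reduction can be made concrete, assuming the standard definitions: a vertex cover is a set of vertices touching every edge, and a graph is triangle free if no two adjacent vertices share a neighbor. The sketch below only checks these definitions on tiny simple graphs (no self-loops); it does not implement either reduction, and all names are hypothetical.

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


def is_triangle_free(edges):
    """True if no edge's endpoints have a common neighbor (simple graphs only)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return not any(adj[u] & adj[v] for u, v in edges)


def min_vertex_cover(edges):
    """Brute force search by increasing size; exponential, for tiny graphs only."""
    vertices = sorted({x for e in edges for x in e})
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_vertex_cover(edges, set(subset)):
                return set(subset)


square = [(1, 2), (2, 3), (3, 4), (4, 1)]   # 4-cycle: triangle free
triangle = [(1, 2), (2, 3), (3, 1)]

print(is_triangle_free(square), is_triangle_free(triangle))  # True False
print(min_vertex_cover(square))                              # {1, 3}
```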