Perfect Phylogeny, MLE for Phylogeny (Lecture 14). Based on: Setubal & Meidanis 6.2, Durbin et al. 8.1.

Posted: 22-Dec-2015


Perfect Phylogeny MLE for Phylogeny

Lecture 14

Based on: Setubal & Meidanis 6.2, Durbin et al. 8.1

January 2005: added slides on parsimony at the end of the lecture.

Some Announcements:

The final exam will take place on Friday, 17.2.04, at 09:00, in Taub 8.

Allowed material: course and tutorial slides, plus the course textbooks (Durbin et al., Setubal & Meidanis, Gusfield).

Lab offered next semester: algorithms for constructing phylogenetic trees,

http://www.cs.technion.ac.il/~moran/lab06.htm


2. The perfect phylogeny problem

A character is a property that distinguishes between species (e.g., dental structure).

A character's state is a value of the character (e.g., human dental structure).

Problem: Given a set of species, each specified by its characters, reconstruct their evolutionary tree.


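The Perfect Phylogeny Problem (pure graph-theoretic setting)

Input: Partial colorings C1, …, Ck of a set of vertices U (in the example: 3 total colorings, left, center and right, each using two colors).

Problem: Is there a tree T = (V, E) such that U ⊆ V and, for i = 1, …, k, Ci is a convex (partial) coloring of T? (A partial coloring of T is convex if it can be extended to a coloring of all of T in which every color class induces a connected subtree.)

[Figure: the three example colorings, with vertices colored R and B.]

NP-hard in general; in P for some special cases.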
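Deciding whether such a tree exists is the hard part; checking convexity on a given tree is easy. The Python sketch below (not from the lecture; function names are illustrative) takes a tree as an adjacency dict and a partial coloring as a dict, and uses the characterization that a partial coloring is convex iff the minimal subtrees spanning the different color classes are pairwise vertex-disjoint.

```python
from collections import defaultdict


def spanning_subtree(adj, terminals):
    """Vertices of the smallest subtree of the tree `adj` containing all `terminals`.

    adj: dict vertex -> set of neighbours (assumed to describe a tree).
    Repeatedly prunes leaves that are not terminals.
    """
    if not terminals:
        return set()
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] <= 1 and v not in terminals]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] <= 1 and u not in terminals:
                    stack.append(u)
    return alive


def is_convex(adj, coloring):
    """coloring: dict vertex -> colour, for the coloured vertices only.

    Convex iff the spanning subtrees of the colour classes are pairwise disjoint.
    """
    classes = defaultdict(set)
    for v, c in coloring.items():
        classes[c].add(v)
    covered = set()
    for members in classes.values():
        sub = spanning_subtree(adj, members)
        if covered & sub:
            return False
        covered |= sub
    return True


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(is_convex(path, {1: "R", 2: "R", 4: "B"}))  # True
print(is_convex(path, {1: "R", 3: "B", 4: "R"}))  # False: B sits between the two Rs
```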


Perfect Phylogeny for directed binary characters

Input: a matrix where rows correspond to objects (species), columns to characters.

Each character has two states: 0 (absent) or 1 (present).

Question: Is there a directed perfect phylogeny for the given species in which all characters have value 0 at the root?

   C1 C2 C3 C4 C5
A  1  1  0  0  0
B  0  0  1  0  0
C  1  1  0  0  1
D  0  0  1  1  0
E  0  1  0  0  0

[Figure: a directed perfect phylogeny for this matrix, with root (00000), leaves A to E, and node labels (11000), (00100), (01000), (00110), (11001).]
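The definition can be checked mechanically on a tree whose nodes carry state vectors. A minimal Python sketch, assuming the tree is given as a `parent` map; the internal node names r, u, v, w, x, y are hypothetical, chosen to give one tree consistent with the matrix above.

```python
def is_directed_perfect_phylogeny(parent, label):
    """parent: node -> parent node (None for the root); label: node -> tuple of 0/1."""
    root = next(v for v, p in parent.items() if p is None)
    if any(label[root]):
        return False                      # every character must be 0 at the root
    m = len(label[root])
    gains = [0] * m                       # number of 0 -> 1 edges per character
    for child, par in parent.items():
        if par is None:
            continue
        for k in range(m):
            before, after = label[par][k], label[child][k]
            if before == 1 and after == 0:
                return False              # a reversal 1 -> 0 is not allowed
            if before == 0 and after == 1:
                gains[k] += 1
    # each character is gained on exactly one edge
    # (assumes every character occurs in some object, as in the example)
    return all(g == 1 for g in gains)


# Hypothetical tree for the example matrix: r is the root, u..y are internal nodes.
PARENT = {"r": None, "u": "r", "v": "u", "w": "v", "x": "r", "y": "x",
          "A": "v", "B": "x", "C": "w", "D": "y", "E": "u"}
STATES = {"r": "00000", "u": "01000", "v": "11000", "w": "11001",
          "x": "00100", "y": "00110",
          "A": "11000", "B": "00100", "C": "11001", "D": "00110", "E": "01000"}
LABEL = {node: tuple(int(ch) for ch in s) for node, s in STATES.items()}

print(is_directed_perfect_phylogeny(PARENT, LABEL))  # True
```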


Perfect Phylogeny for directed binary characters

By definition, for each character there is exactly one edge on which it changes from 0 to 1. In the tree below, the edge on which character C2 changes to 1 is marked. The resulting tree is convex for this character.

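Column C2: A = 1, B = 0, C = 1, D = 0, E = 1.

[Figure: the tree for the example, each node labeled with its C2 state: 1 in the subtree below the marked edge (the edge on which character C2 is converted to 1), 0 elsewhere.]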


Perfect Phylogeny for directed binary characters

A tree is a directed perfect phylogeny for a given 0-1 matrix M iff we can map each character to an edge such that the edge labeled Ci represents the change of character Ci's state from 0 to 1. Below we show such a tree for the given matrix:

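   C1 C2 C3 C4 C5
A  1  1  0  0  0
B  0  0  1  0  0
C  1  1  0  0  1
D  0  0  1  1  0
E  0  1  0  0  0

[Figure: the tree with edges labeled C1 to C5. From the root, edge C2 leads to the subtree of A, C and E, inside which edge C1 leads to A and C, and edge C5 leads to C; edge C3 leads to the subtree of B and D, inside which edge C4 leads to D.]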
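Conversely, an edge labeling determines the matrix: an object has character Ci exactly when the edge labeled Ci lies on its path from the root. The sketch below (hypothetical node names; characters indexed from 0, so C1 is index 0) recovers M from the edge labels of the tree above.

```python
def rows_from_edge_labels(children, edge_char, root, num_chars):
    """children: node -> list of children; edge_char: node -> index of the character
    labelling the edge from its parent (unlabelled edges are simply omitted).
    A leaf has character k iff the edge labelled k lies on its root-to-leaf path."""
    rows = {}
    stack = [(root, frozenset())]
    while stack:
        node, gained = stack.pop()
        if node in edge_char:
            gained = gained | {edge_char[node]}
        kids = children.get(node, [])
        if not kids:
            rows[node] = tuple(1 if k in gained else 0 for k in range(num_chars))
        for kid in kids:
            stack.append((kid, gained))
    return rows


# Hypothetical encoding of the tree above (internal nodes r, u, v, w, x, y).
CHILDREN = {"r": ["u", "x"], "u": ["v", "E"], "v": ["w", "A"], "w": ["C"],
            "x": ["y", "B"], "y": ["D"]}
EDGE_CHAR = {"u": 1, "v": 0, "w": 4, "x": 2, "y": 3}   # C2, C1, C5, C3, C4

print(rows_from_edge_labels(CHILDREN, EDGE_CHAR, "r", 5)["C"])  # (1, 1, 0, 0, 1)
```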


Efficient algorithm for the Binary Perfect Phylogeny Problem

Definition: Given a 0-1 matrix M, O_k = {j : M_jk = 1}, i.e., O_k is the set of objects that have character Ck.

Theorem: M has a perfect phylogenetic tree iff the sets {O_i} are laminar, i.e., for all i, j, either O_i and O_j are disjoint, or one includes the other.

Laminar:
   C1 C2 C3 C4 C5
A  1  1  0  0  0
B  0  0  1  0  0
C  1  1  0  0  1
D  0  0  1  1  0
E  0  1  0  0  0

Not laminar:
   C1 C2 C3 C4 C5
A  1  1  0  0  0
B  0  0  1  0  1
C  1  1  0  0  1
D  0  0  1  1  0
E  0  1  0  0  1
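Before the efficient method of the following slides, the theorem's condition can be tested directly by comparing every pair of sets O_i, O_j (quadratic in the number of characters). A Python sketch on the two matrices above; the function names are our own.

```python
from itertools import combinations


def object_sets(matrix):
    """matrix: object -> string of 0/1. Returns [O_1, ..., O_m], O_k = objects having Ck."""
    m = len(next(iter(matrix.values())))
    return [frozenset(o for o, row in matrix.items() if row[k] == "1") for k in range(m)]


def is_laminar(sets):
    """True iff every two sets are disjoint or one contains the other."""
    return all(not (a & b) or a <= b or b <= a for a, b in combinations(sets, 2))


LAMINAR = {"A": "11000", "B": "00100", "C": "11001", "D": "00110", "E": "01000"}
NOT_LAMINAR = {"A": "11000", "B": "00101", "C": "11001", "D": "00110", "E": "01001"}

print(is_laminar(object_sets(LAMINAR)))      # True
print(is_laminar(object_sets(NOT_LAMINAR)))  # False: O_1 = {A, C}, O_5 = {B, C, E}
```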


Proof (⇒)

Assume M has a perfect phylogeny, and let Ci, Cj be given. Consider the edges labeled Ci and Cj; O_i is the set of leaves below edge Ci, and O_j the set of leaves below edge Cj.

Case 1: There is a root-to-leaf path containing both edges. Then one set includes the other (C2 and C1 below: O_1 ⊆ O_2).

Case 2: Not case 1. Then O_i and O_j are disjoint (C2 and C3).

[Figure: the perfect phylogeny for the example matrix, with edges labeled C1 to C5 and leaves A to E.]


Proof (cont.)

(⇐) Assume that for all i, j, either O_i and O_j are disjoint, or one includes the other. We prove by induction on the number of characters that M has a perfect phylogenetic tree.

Basis: one character. Then there are at most two distinct objects (rows): one with this character and one without it.

   C1
A  1
B  0

[Figure: a root with leaf B attached directly and leaf A reached through an edge labeled C1.]


Proof (cont.)

(⇐) Induction step: Assume correctness for n-1 characters, and consider a matrix with n characters (non-zero columns). WLOG assume that O_1 is not contained in O_j for j > 1.

Let S1 be the set of objects j for which M_j1 = 1, and S2 the remaining objects. Then each character belongs to objects in S1 or in S2, but not both (prove!). Since C1 is constant on each of S1 and S2, each side has at most n-1 non-zero columns, so by induction there are trees T1 and T2 for S1 and S2. Combining them as below gives the desired tree.

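   C1 C2 C3 C4 C5
A  1  1  0  0  0
B  0  0  1  0  0
C  1  1  0  0  1
D  0  0  1  1  0
E  1  0  0  0  0

S1 = {A, C, E}, S2 = {B, D}

[Figure: the combined tree: from a common root, an edge labeled C1 leads to T1 (on S1), and T2 (on S2) hangs off the same root.]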
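The induction is constructive. Below is a Python sketch of the recursive algorithm it suggests: pick a character with the largest O_k (such a set cannot be properly contained in another), split the objects into S1 and S2, and recurse with the remaining characters. It raises `ValueError` when some character meets both S1 and S2, which cannot happen for a laminar matrix. The nested-list tree encoding and the function names are our own choices.

```python
def build(rows, objects, chars):
    """rows: object -> string of 0/1; objects: list of objects; chars: column indices.
    Returns the children of the current node: a leaf is an object name, and an
    edge labelled by character k is the pair ("C<k+1>", children below that edge)."""
    # drop characters that no object in this subset has (zero columns)
    chars = [k for k in chars if any(rows[o][k] == "1" for o in objects)]
    if not chars:
        return sorted(objects)        # remaining objects hang directly off this node
    # a character with the largest O_k is not properly contained in any other O_j
    c = max(chars, key=lambda k: sum(rows[o][k] == "1" for o in objects))
    s1 = [o for o in objects if rows[o][c] == "1"]
    s2 = [o for o in objects if rows[o][c] == "0"]
    rest = [k for k in chars if k != c]
    for k in rest:
        # in a laminar matrix every other character lies entirely in S1 or in S2
        if any(rows[o][k] == "1" for o in s1) and any(rows[o][k] == "1" for o in s2):
            raise ValueError(f"C{c + 1} and C{k + 1} violate laminarity")
    return [(f"C{c + 1}", build(rows, s1, rest))] + build(rows, s2, rest)


def perfect_phylogeny(rows):
    m = len(next(iter(rows.values())))
    return build(rows, list(rows), list(range(m)))


M = {"A": "11000", "B": "00100", "C": "11001", "D": "00110", "E": "10000"}
print(perfect_phylogeny(M))
# [('C1', [('C2', [('C5', ['C']), 'A']), 'E']), ('C3', [('C4', ['D']), 'B'])]
```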


Efficient Implementation (1)

1. Sort the columns (characters) by decreasing value, each column read as a binary number (time complexity: O(mn), using radix sort).

Claim: If the binary value of column i is larger than that of column j, then O_i is not a proper subset of O_j.

Proof: If O_i ⊆ O_j, then every 1 in column i is matched by a 1 in the same row of column j, so the binary value of column i is at most that of column j.

Before sorting:
   C1 C2 C3 C4 C5
A  1  1  0  0  0
B  0  0  1  0  0
C  1  1  0  0  1
D  0  0  1  1  0
E  0  1  0  0  0

After sorting:
   C2 C1 C3 C5 C4
A  1  1  0  0  0
B  0  0  1  0  0
C  1  1  0  1  0
D  0  0  1  0  1
E  1  0  0  0  0
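Step 1 can be sketched as an LSD radix sort over the rows: one stable pass per row, from the last (least significant) row to the first, moving the columns with a 1 in that row to the front. Each pass is linear in the number of columns, so the whole sort is O(mn). Python sketch on the example above; the function name is illustrative.

```python
def sort_columns_desc(matrix):
    """matrix: list of rows (strings of 0/1); the first row is the most significant bit.
    Returns column indices by decreasing binary value via an LSD radix sort:
    one stable pass per row, from the last row to the first."""
    order = list(range(len(matrix[0])))
    for row in reversed(matrix):
        # stable partition: columns with a 1 in this row come first
        order = [c for c in order if row[c] == "1"] + [c for c in order if row[c] == "0"]
    return order


M = ["11000", "00100", "11001", "00110", "01000"]   # rows A..E, columns C1..C5
order = sort_columns_desc(M)
print(["C%d" % (c + 1) for c in order])             # ['C2', 'C1', 'C3', 'C5', 'C4']
sorted_rows = ["".join(row[c] for c in order) for row in M]
print(sorted_rows)  # ['11000', '00100', '11010', '00101', '10000']
```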


Efficient Implementation (2)

2. Make a backwards linked list of the 1's in each row: each 1 points at the previous 1 in its row, and the leftmost 1 in each row points at itself. Time complexity: O(mn).

   C2 C1 C3 C5 C4
A  1  1  0  0  0
B  0  0  1  0  0
C  1  1  0  1  0
D  0  0  1  0  1
E  1  0  0  0  0

Claim: If the columns are sorted, then the set of columns is laminar iff for each column i, all the links leaving column i point at the same column. Can be checked in O(mn) time.
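Step 2 and the claim amount to one scan of the sorted matrix. A Python sketch (our own function names), taking the sorted matrix as a list of 0/1 strings; it is run on the two matrices of the next slide.

```python
def backward_links(sorted_matrix):
    """For every row, map each column holding a 1 to the column of the previous 1
    in that row; the leftmost 1 of a row points at its own column."""
    links = []
    for row in sorted_matrix:
        prev, row_links = None, {}
        for c, bit in enumerate(row):
            if bit == "1":
                row_links[c] = c if prev is None else prev
                prev = c
        links.append(row_links)
    return links


def laminar_by_links(sorted_matrix):
    """Assumes columns sorted by decreasing binary value. Laminar iff, for each
    column, all links leaving it (over all rows) point at the same column."""
    target = {}
    for row_links in backward_links(sorted_matrix):
        for c, t in row_links.items():
            if target.setdefault(c, t) != t:
                return False
    return True


print(laminar_by_links(["11000", "00100", "11010", "00101", "10000"]))  # True
print(laminar_by_links(["11000", "00100", "11010", "00101", "10110"]))  # False
```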


Examples

Laminar:
A  1  1  0  0  0
B  0  0  1  0  0
C  1  1  0  1  0
D  0  0  1  0  1
E  1  0  0  0  0

Not laminar:
A  1  1  0  0  0
B  0  0  1  0  0
C  1  1  0  1  0
D  0  0  1  0  1
E  1  0  1  1  0


Efficient Implementation (3)

3. When the matrix is laminar, the tree edges corresponding to characters are defined by the backwards links in the matrix.

   C2 C1 C3 C5 C4
A  1  1  0  0  0
B  0  0  1  0  0
C  1  1  0  1  0
D  0  0  1  0  1
E  1  0  0  0  0

[Figure: the resulting perfect phylogeny, with edges labeled C1 to C5 and leaves A to E.]

The remaining edges and the leaves are determined by the characters of each object. This needs O(mn) time.
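Step 3 as a Python sketch, assuming the matrix is laminar and its columns are already sorted: the edge of each character hangs below the edge its backward links point at (directly below the root for a self-link), and each object hangs below the edge of the rightmost 1 in its row. The names `root`, `edge_parent` and `leaf_parent` are illustrative.

```python
def tree_from_links(sorted_matrix, col_names, obj_names):
    """Assumes a laminar matrix with columns sorted by decreasing binary value.
    Returns (edge_parent, leaf_parent): the parent edge of each character's edge,
    and the edge each object hangs below ("root" when there is none)."""
    edge_parent, leaf_parent = {}, {}
    for obj, row in zip(obj_names, sorted_matrix):
        prev = "root"                      # the leftmost 1 of a row hangs off the root
        for c, bit in enumerate(row):
            if bit == "1":
                edge_parent[col_names[c]] = prev   # same value in every row (laminarity)
                prev = col_names[c]
        leaf_parent[obj] = prev            # all-zero rows hang off the root
    return edge_parent, leaf_parent


edges, leaves = tree_from_links(["11000", "00100", "11010", "00101", "10000"],
                                ["C2", "C1", "C3", "C5", "C4"], "ABCDE")
print(edges)   # {'C2': 'root', 'C1': 'C2', 'C3': 'root', 'C5': 'C1', 'C4': 'C3'}
print(leaves)  # {'A': 'C1', 'B': 'C3', 'C': 'C5', 'D': 'C4', 'E': 'C2'}
```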


A scenario where Maximum Parsimony (and Perfect Phylogeny) are misleading

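Consider a model with 4 letters (DNA), in which the probability of a substitution is proportional to time.

In the following topology, leaves 2 and 3 are likely to keep the ancestral state, but leaves 1 and 4 are likely to differ from it. In this case, the Maximum Parsimony principle may be useless or misleading.

[Figure: a four-leaf tree whose internal nodes have state A; leaves 1 and 2 lie on one side of the central edge, leaves 3 and 4 on the other.]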


Parsimony may be useless/misleading

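[Figure: the four cases for the states of leaves 1 and 4 on this topology. I, II and III are uninformative; IV is misinformative.]

Assume the (likely) scenario in which leaves 2 and 3 are the same (both A). There are 4 combinations of substitutions for leaves 1 and 4: (I) neither changes; (II) only one of them changes; (III) both change, to different letters; (IV) both change to the same letter. In the first three, all three topologies obtain the same parsimony score.

In the fourth, a wrong topology scores best.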


Case I: Parsimony is useless

[Figure: leaves 1 to 4 all in state A, drawn on the three possible topologies ((1,2),(3,4)), ((1,3),(2,4)) and ((1,4),(2,3)).]

Score=0 Score=0 Score=0


Case II: Parsimony is useless

[Figure: one of leaves 1 and 4 changed to G, the other leaves in state A, drawn on the three possible topologies ((1,2),(3,4)), ((1,3),(2,4)) and ((1,4),(2,3)).]

Score=1 Score=1 Score=1


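Case III: Parsimony is useless

[Figure: leaves 1 and 4 changed to two different letters (G and C), leaves 2 and 3 in state A, drawn on the three possible topologies ((1,2),(3,4)), ((1,3),(2,4)) and ((1,4),(2,3)).]

Score=2 Score=2 Score=2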


Case IV: Parsimony is misleading

[Figure: leaves 1 and 4 both changed to C, leaves 2 and 3 in state A, drawn on the three possible topologies ((1,2),(3,4)), ((1,3),(2,4)) and ((1,4),(2,3)).]

Score=2 Score=2 Score=1
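The scores in cases I to IV can be reproduced with Fitch's small-parsimony algorithm, a standard method that these slides do not spell out. In this Python sketch each unrooted quartet is rooted on its central edge (which does not change the parsimony score), `12|34` is our shorthand for ((1,2),(3,4)), and the leaf states follow the slides; in case II we assume it is leaf 4 that changed.

```python
def fitch(tree, states):
    """Fitch small parsimony for one site on a rooted binary tree.
    tree: a leaf name, or a pair (left, right) of subtrees; states: leaf -> letter.
    Returns (candidate letter set at the root, number of substitutions)."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left, cost_left = fitch(tree[0], states)
    right, cost_right = fitch(tree[1], states)
    common = left & right
    if common:
        return common, cost_left + cost_right
    return left | right, cost_left + cost_right + 1


QUARTETS = {"12|34": ((1, 2), (3, 4)), "13|24": ((1, 3), (2, 4)), "14|23": ((1, 4), (2, 3))}


def quartet_scores(states):
    return {name: fitch(t, states)[1] for name, t in QUARTETS.items()}


CASES = {
    "I": {1: "A", 2: "A", 3: "A", 4: "A"},
    "II": {1: "A", 2: "A", 3: "A", 4: "G"},
    "III": {1: "G", 2: "A", 3: "A", 4: "C"},
    "IV": {1: "C", 2: "A", 3: "A", 4: "C"},
}
for name, states in CASES.items():
    print(name, quartet_scores(states))
# I {'12|34': 0, '13|24': 0, '14|23': 0}
# II {'12|34': 1, '13|24': 1, '14|23': 1}
# III {'12|34': 2, '13|24': 2, '14|23': 2}
# IV {'12|34': 2, '13|24': 2, '14|23': 1}
```

Only case IV separates the topologies, and it favors 14|23 although the tree pairs leaf 1 with leaf 2.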


Parsimony is correct only in rare cases

[Figure: two scenarios on the true topology in which parsimony infers the correct tree.]

Parsimony will infer the correct topology only in the rare case of a change on the central edge, or in the even rarer case of parallel changes from A to C on the pendant edges to leaves 1 and 2.


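3. Maximum Likelihood Approach

Consider the phylogenetic tree as a stochastic process.

[Figure: a tree with root sequence AAA, internal sequences AAA and AGA, and leaf sequences AGA, GGA, AAA and AAG.]

A simple model assumes that on each edge, the probability of a transition from character a to character b is given by a parameter θ_{b|a}. The probability of letter a at the root is q_a.

Given the complete tree (sequences at all nodes), its probability is determined by the values of the θ_{b|a}'s and the q_a's. For a single site, with letter x_v at node v and root r: Pr = q_{x_r} · ∏_{edges (u,v)} θ_{x_v|x_u}.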
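A Python sketch of this computation in log space, summed over all sites under the assumption that sites evolve independently. The tree shape (which leaves hang below which internal node) and the parameter values (uniform q_a, θ_{a|a} = 0.9 on every edge) are hypothetical, chosen only to use the sequences from the figure.

```python
import math

ALPHABET = "ACGT"
# Hypothetical parameters: uniform root distribution, THETA[(b, a)] = Pr(b | a) on every edge.
Q = {a: 0.25 for a in ALPHABET}
THETA = {(b, a): 0.9 if a == b else 0.1 / 3 for a in ALPHABET for b in ALPHABET}


def complete_tree_log_prob(parent, seq, q, theta):
    """log Pr of a tree whose nodes all carry sequences: q for the root letter and
    theta[(child letter, parent letter)] for every edge, at every site,
    with sites treated as independent."""
    logp = 0.0
    for node, par in parent.items():
        for i, b in enumerate(seq[node]):
            if par is None:
                logp += math.log(q[b])
            else:
                logp += math.log(theta[(b, seq[par][i])])
    return logp


# Hypothetical arrangement of the sequences from the figure.
PARENT = {"root": None, "n1": "root", "n2": "root",
          "l1": "n1", "l2": "n1", "l3": "n2", "l4": "n2"}
SEQ = {"root": "AAA", "n1": "AAA", "n2": "AGA",
       "l1": "AAA", "l2": "AAG", "l3": "AGA", "l4": "GGA"}
print(complete_tree_log_prob(PARENT, SEQ, Q, THETA))
```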


Maximum Likelihood Approach (2)

When the data consist only of the leaf sequences (and the topology is fixed):

[Figure: the same tree, with only the leaf sequences AGA, GGA, AAA and AAG known.]

Write down the likelihood of the data (the leaf sequences) given the tree, and use EM to estimate the θ_{b|a} parameters.

When the tree is not given: search for the tree T that maximizes Prob(data | T, θ), where θ is estimated by EM for T.
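With only the leaves observed, the likelihood sums the complete-tree probability over all letters at the internal nodes. That sum can be computed bottom-up with Felsenstein's pruning recursion, a standard technique that the slides do not name. The Python sketch below evaluates it for a fixed topology and fixed parameters; an EM step or a search over trees would call it repeatedly and is not shown. Tree shape and parameter values are hypothetical.

```python
import math

ALPHABET = "ACGT"
# Hypothetical parameters: uniform root distribution, THETA[(b, a)] = Pr(b | a) on every edge.
Q = {a: 0.25 for a in ALPHABET}
THETA = {(b, a): 0.9 if a == b else 0.1 / 3 for a in ALPHABET for b in ALPHABET}


def site_likelihood(children, root, letters, q, theta):
    """Pr(observed leaf letters at one site | tree, q, theta), summing over all
    letters at internal nodes with Felsenstein's pruning recursion.
    children: internal node -> list of children; letters: leaf -> observed letter."""
    def partial(node):
        # partial(node)[a] = Pr(leaf letters below node | node has letter a)
        if node not in children:
            return {a: 1.0 if a == letters[node] else 0.0 for a in ALPHABET}
        result = {a: 1.0 for a in ALPHABET}
        for child in children[node]:
            below = partial(child)
            for a in ALPHABET:
                result[a] *= sum(theta[(b, a)] * below[b] for b in ALPHABET)
        return result

    at_root = partial(root)
    return sum(q[a] * at_root[a] for a in ALPHABET)


def log_likelihood(children, root, leaf_seqs, q, theta):
    """Sum of per-site log-likelihoods (sites treated as independent)."""
    length = len(next(iter(leaf_seqs.values())))
    return sum(
        math.log(site_likelihood(children, root,
                                 {leaf: s[i] for leaf, s in leaf_seqs.items()}, q, theta))
        for i in range(length))


CHILDREN = {"root": ["n1", "n2"], "n1": ["l1", "l2"], "n2": ["l3", "l4"]}
LEAVES = {"l1": "AAA", "l2": "AAG", "l3": "AGA", "l4": "GGA"}
print(log_likelihood(CHILDREN, "root", LEAVES, Q, THETA))
```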