Inferring Phylogeny using Permutation Patterns on Genomic Data


Page 1: Inferring Phylogeny using Permutation Patterns on Genomic Data

Inferring Phylogeny using Permutation Patterns on Genomic Data

Md Enamul Karim (1), Laxmi Parida (2), Arun Lakhotia (1)

(1) University of Louisiana at Lafayette

(2) IBM T. J. Watson Research Center

Page 2: Inferring Phylogeny using Permutation Patterns on Genomic Data

Phylogeny

Reconstruction of the evolutionary relationship of a collection of organisms, usually in the form of a tree.

Page 3: Inferring Phylogeny using Permutation Patterns on Genomic Data

Phylogenetic data

Behavioral, morphological, metabolic, etc.

Molecular data: sequence data, gene-order data, etc.

Page 4: Inferring Phylogeny using Permutation Patterns on Genomic Data

Why gene order data?

Low error rate.

Rearrangements are rare evolutionary events and are unlikely to cause "silent" changes; they can help infer evolutionary history across millions of years.

Page 5: Inferring Phylogeny using Permutation Patterns on Genomic Data

Genome rearrangements

• Original gene order

1 2 3 4 5 6 7 8 9 10

• Inversion

1 2 3 -8 -7 -6 -5 -4 9 10

• Transposition

1 2 3 9 4 5 6 7 8 10

• Inverted transposition

1 2 3 9 -8 -7 -6 -5 -4 10
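The three rearrangements can be reproduced on a signed gene list. The following is a minimal sketch, not code from the presentation: the function names and the half-open index convention (`genome[i:j]`) are assumptions chosen for illustration.

```python
def inversion(genome, i, j):
    """Reverse genome[i:j] and flip the sign of every gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Move the segment genome[i:j] so that it follows genome[j:k]."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome, i, j, k):
    """Move genome[i:j] past genome[j:k] and invert it at the new position."""
    moved = [-g for g in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + moved + genome[k:]


original = list(range(1, 11))                    # 1 2 3 ... 10
inv = inversion(original, 3, 8)                  # invert genes 4..8
tra = transposition(original, 3, 8, 9)           # move genes 4..8 past gene 9
itr = inverted_transposition(original, 3, 8, 9)  # move and invert genes 4..8
```

Applied to the gene order 1 through 10, the three calls give the three rearranged genomes listed on this slide.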

Page 6: Inferring Phylogeny using Permutation Patterns on Genomic Data

Breakpoint distance

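The breakpoint distance is the number of adjacencies present in one genome but not in the other.

1 2 3 4 5 6 7 8 9 10

1 -3 -2 4 5 9 6 7 8 10

These two genomes differ by 5 breakpoints: the adjacencies 1 2, 3 4, 5 6, 8 9 and 9 10 of the first genome are absent from the second.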

For some datasets, a close-to-linear relationship between the breakpoints and evolutionary events may exist.

Can be used for building phylogeny (Blanchette et al.).
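The breakpoint count for the two genomes on this slide can be checked with a short sketch. It assumes linear genomes without end markers and treats the signed pair (a, b) as the same adjacency as (-b, -a), which is the usual convention for signed gene orders; the function names are illustrative.

```python
def adjacencies(genome):
    """Set of signed adjacencies; (a, b) and (-b, -a) count as the same one."""
    pairs = set()
    for a, b in zip(genome, genome[1:]):
        # Store one canonical representative for each adjacency.
        pairs.add(min((a, b), (-b, -a)))
    return pairs


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that do not appear in g2."""
    return len(adjacencies(g1) - adjacencies(g2))


g1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
g2 = [1, -3, -2, 4, 5, 9, 6, 7, 8, 10]
distance = breakpoint_distance(g1, g2)
```

Here the shared adjacencies are 2 3 (read as -3 -2 in the second genome), 4 5, 6 7 and 7 8, so the remaining five adjacencies of the first genome are breakpoints.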

Page 7: Inferring Phylogeny using Permutation Patterns on Genomic Data

Limitations of breakpoint distance

The number of breakpoints created by a certain number of inversions may vary.

Also, transpositions generally create more breakpoints than inversions.

Computing the breakpoint phylogeny is NP-hard.

Page 8: Inferring Phylogeny using Permutation Patterns on Genomic Data

MPBE (Maximum Parsimony on Binary Encoding)

A heuristic for the breakpoint phylogeny (Cosner et al.).

All ordered pairs of signed genes appearing consecutively are coded as binary features.

Exponential time complexity, but much faster than BPAnalysis.
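The encoding step can be sketched as follows. This is an illustration only, not the MPBE implementation: the two toy genomes `A` and `B` are hypothetical, merging (a, b) with (-b, -a) into one feature is an assumption, and the maximum parsimony search that would run on the resulting matrix is not shown.

```python
def adjacencies(genome):
    """Set of signed adjacencies; (a, b) and (-b, -a) count as the same one."""
    return {min((a, b), (-b, -a)) for a, b in zip(genome, genome[1:])}


def binary_encoding(genomes):
    """Encode each genome as a 0/1 row over all adjacencies seen in any genome.

    genomes: dict mapping a genome name to its signed gene list.
    Returns the sorted feature list and a dict of binary rows.
    """
    per_genome = {name: adjacencies(g) for name, g in genomes.items()}
    features = sorted(set().union(*per_genome.values()))
    rows = {name: [int(f in adj) for f in features]
            for name, adj in per_genome.items()}
    return features, rows


# Hypothetical toy input: B is A with genes 2..3 inverted.
genomes = {"A": [1, 2, 3, 4], "B": [1, -3, -2, 4]}
features, rows = binary_encoding(genomes)
```

Each column of the resulting matrix records whether one adjacency is present in a genome, which is the kind of binary character a parsimony method can work with.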

Page 9: Inferring Phylogeny using Permutation Patterns on Genomic Data

Limitations

May fail to find feasible solutions to the breakpoint phylogeny problem.

Page 10: Inferring Phylogeny using Permutation Patterns on Genomic Data

Observation: the closer the evolutionary history of two genomes, the more permutations (at different granularities) they have in common.

1 2 3 4 5 6 7 8 9 10

1 2 3 –8 –7 –6 –5 –4 9 10

1 8 –3 –2 –7 –6 –5 –4 9 10

Page 11: Inferring Phylogeny using Permutation Patterns on Genomic Data

Maximal pi-pattern (Eres et al.)

Matches permutations at different granularity.

Polynomial time complexity.

Page 12: Inferring Phylogeny using Permutation Patterns on Genomic Data

pi-pattern

A pattern that occurs at least k times, where each occurrence may be any permutation of the pattern's characters.

Example:

For S = acbcab and k = 2

All pi-patterns are: ac, bc, abc, abcc
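The example can be checked by brute force on the string acbcab used in the slides. The sketch below slides a window of every size over the string and groups windows that are permutations of one another. It is not the algorithm of Eres et al.; the function name and the `min_size=2` cutoff (single characters are not listed as patterns in the example) are assumptions.

```python
from collections import defaultdict


def pi_patterns(s, k, min_size=2):
    """Return {pattern: start positions} for patterns with at least k occurrences.

    A pattern is written as the sorted characters of a window, so all
    windows that are permutations of each other share one key.
    """
    found = {}
    for size in range(min_size, len(s) + 1):
        starts = defaultdict(list)
        for i in range(len(s) - size + 1):
            key = "".join(sorted(s[i:i + size]))
            starts[key].append(i)
        for pattern, positions in starts.items():
            if len(positions) >= k:
                found[pattern] = positions
    return found


patterns = pi_patterns("acbcab", 2)
```

For this input the keys of `patterns` are ac, bc, abc and abcc, matching the slide; abc, for instance, occurs as acb, bca and cab.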

Page 13: Inferring Phylogeny using Permutation Patterns on Genomic Data

Cover

P1 covers P2 => every occurrence of P1 contains an occurrence of P2, and every occurrence of P2 lies within an occurrence of P1.

Example: in S = acbcab, abc covers ac.
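The cover relation can be tested directly from the occurrence positions. This is a minimal sketch under the reading that "has" and "is within" refer to containment of occurrence spans; the helper names are hypothetical.

```python
def occurrences(s, pattern):
    """(start, end) spans of windows in s that are permutations of pattern."""
    n, key = len(pattern), sorted(pattern)
    return [(i, i + n) for i in range(len(s) - n + 1)
            if sorted(s[i:i + n]) == key]


def inside(inner, outer):
    """True if span inner lies within span outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def covers(s, p1, p2):
    """True if every p1 occurrence contains a p2 and every p2 lies in a p1."""
    occ1, occ2 = occurrences(s, p1), occurrences(s, p2)
    if not occ1 or not occ2:
        return False
    every_p1_has_p2 = all(any(inside(o2, o1) for o2 in occ2) for o1 in occ1)
    every_p2_in_p1 = all(any(inside(o2, o1) for o1 in occ1) for o2 in occ2)
    return every_p1_has_p2 and every_p2_in_p1


abc_covers_ac = covers("acbcab", "abc", "ac")
```

In acbcab the occurrences of abc start at positions 0, 2 and 3, and each of them contains one of the two occurrences of ac (at positions 0 and 3), so the check succeeds.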

Page 14: Inferring Phylogeny using Permutation Patterns on Genomic Data

Maximal pi-pattern

A pi-pattern that is not covered by any other pi-pattern.

Example: in S = acbcab with k = 2, the pi-patterns are ac, bc, abc, abcc.

Maximal pi-patterns: abc, abcc

(abc is not covered by abcc)
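Maximality can be checked by filtering out every pattern that some other pattern covers. The sketch below takes the four pi-patterns of the example as given and uses a span-containment reading of the cover relation; the function names are assumptions, and this quadratic filter is for illustration only.

```python
def occurrences(s, pattern):
    """(start, end) spans of windows in s that are permutations of pattern."""
    n, key = len(pattern), sorted(pattern)
    return [(i, i + n) for i in range(len(s) - n + 1)
            if sorted(s[i:i + n]) == key]


def covers(s, p1, p2):
    """True if every p1 occurrence contains a p2 and every p2 lies in a p1."""
    occ1, occ2 = occurrences(s, p1), occurrences(s, p2)

    def inside(inner, outer):
        return outer[0] <= inner[0] and inner[1] <= outer[1]

    return (bool(occ1) and bool(occ2)
            and all(any(inside(o2, o1) for o2 in occ2) for o1 in occ1)
            and all(any(inside(o2, o1) for o1 in occ1) for o2 in occ2))


def maximal(s, patterns):
    """Keep the patterns that no other pattern in the list covers."""
    return [p for p in patterns
            if not any(q != p and covers(s, q, p) for q in patterns)]


result = maximal("acbcab", ["ac", "bc", "abc", "abcc"])
```

Here ac is covered by abc and bc is covered by abcc, while abc survives because its occurrence at the end of the string (cab) does not lie within any occurrence of abcc.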

Page 15: Inferring Phylogeny using Permutation Patterns on Genomic Data

Results

Page 16: Inferring Phylogeny using Permutation Patterns on Genomic Data

Phylogeny for simulated evolution on synthetic data

[Figure: phylogenetic tree over the simulated taxa a through p]

Page 17: Inferring Phylogeny using Permutation Patterns on Genomic Data

12 genera of Campanulaceae and the outgroup tobacco

Page 18: Inferring Phylogeny using Permutation Patterns on Genomic Data

Tree1: MPBE tree

Page 19: Inferring Phylogeny using Permutation Patterns on Genomic Data

Tree2: Neighbor-joining tree (using a few different distances)

[Figure: tree with leaves, in order, Tra, Sym, Cam, Ade, Wah, Mer, Leg, Asy, Tri, Cod, Cya, Pla, Tob]

Page 20: Inferring Phylogeny using Permutation Patterns on Genomic Data

Tree3: Neighbor-joining tree using permutation patterns

[Figure: tree with leaves, in order, Tra, Sym, Cam, Ade, Wah, Mer, Asy, Leg, Tri, Cod, Cya, Pla, Tob]

167 maximal pi-patterns (from 10769 pi-patterns) used as binary features

XOR distance measure

A distance/similarity matrix is created to build the neighbor-joining tree
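One way to read the XOR distance is as the number of binary features on which two genomes differ. The sketch below builds a distance matrix on that assumption; the taxa `X`, `Y`, `Z` and their four-feature vectors are hypothetical toy data, not the Campanulaceae features, and the neighbor-joining step itself is not shown.

```python
def xor_distance(u, v):
    """Count the features present in exactly one of the two binary vectors."""
    return sum(a ^ b for a, b in zip(u, v))


def distance_matrix(rows):
    """Pairwise XOR distances; rows maps a taxon name to its 0/1 vector."""
    names = sorted(rows)
    matrix = [[xor_distance(rows[a], rows[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical toy features: one column per maximal pi-pattern.
rows = {
    "X": [1, 0, 1, 1],
    "Y": [1, 1, 0, 1],
    "Z": [0, 0, 1, 0],
}
names, matrix = distance_matrix(rows)
```

The resulting matrix is symmetric with a zero diagonal, so it can be passed to any neighbor-joining implementation that accepts a pairwise distance matrix.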

Page 21: Inferring Phylogeny using Permutation Patterns on Genomic Data

Tree3 vs Tree2

Page 22: Inferring Phylogeny using Permutation Patterns on Genomic Data

Conclusion

Permutation patterns may preserve more evolutionary information.

Evolutionary events could be counted within permuted segments to develop a hybrid scheme.

Current approaches remain unable to handle unequal gene content, a problem that maximal pi-patterns could address.