Pasaniuc Presentation
-
Transcript of Pasaniuc Presentation
-
8/2/2019 Pasaniuc Presentation
1/26
Sorting by reversals
Bogdan Pasaniuc
Dept. of Computer Science & Engineering
-
Overview
Biological background
Definitions
Unsigned Permutations
Approximation Algorithm
Sorting Signed Permutations
Simplified Algorithm
-
What is the evolutionary path?
What is the ancestor chromosome?
Chromosomes → lists of genes → permutations
Unknown ancestor
Human (X chrom.)
Mouse (X chrom.)
-
Mutations at the chromosome level:
Inversion: (1 2 3 4 5 6 7) → (1 4 3 2 5 6 7)
Transposition: (1 2 3 4 5 6 7) → (1 5 6 2 3 4 7)
Translocation: (1 2 3 4 5 6 7) → (1 2 3 4 5 2 3 4 6 7)
Inversions:
Known as reversals
The most common
Most often reflect the differences between and within species
What is the minimum number of reversals required to transform one permutation into another?
Reversal distance: a good approximation for evolutionary distance
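The question on this slide can be made concrete with a small sketch. The names `reversal` and `reversal_distance` are illustrative choices, not taken from the slides, and the breadth-first search shown here is only practical for short permutations because it may visit every arrangement of the genes.

```python
from collections import deque


def reversal(perm, i, j):
    """Return a copy of perm with the block at 1-based positions i..j reversed."""
    if not 1 <= i <= j <= len(perm):
        raise ValueError("need 1 <= i <= j <= len(perm)")
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


def reversal_distance(source, target):
    """Minimum number of reversals turning source into target (brute-force BFS)."""
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        n = len(perm)
        for i in range(1, n):
            for j in range(i + 1, n + 1):
                nxt = reversal(perm, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("target is not a rearrangement of source")


# The inversion example from the slide: reverse positions 2..4.
print(reversal([1, 2, 3, 4, 5, 6, 7], 2, 4))
print(reversal_distance([3, 4, 1, 2], [1, 2, 3, 4]))
```

The exhaustive search is exponential in the number of genes, which is why the rest of the talk turns to breakpoints and approximation.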
-
Reversals
[Figure: a chromosome drawn as gene blocks 1 to 10]
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Genes (blocks)
-
Reversals
[Figure: the same chromosome after reversing blocks 4 to 8]
1, 2, 3, 8, 7, 6, 5, 4, 9, 10
-
Reversals
[Figure: the reversed chromosome, with the breakpoints marked]
1, 2, 3, 8, 7, 6, 5, 4, 9, 10
-
Breakpoint: a pair of adjacent positions $(i, i+1)$ such that $|\pi_i - \pi_{i+1}| \neq 1$
The values $\pi_i, \pi_{i+1}$ are not consecutive
If $|\pi_i - \pi_{i+1}| = 1$ then the values $\pi_i, \pi_{i+1}$ are adjacent
Introduce $\pi_0 = 0$, $\pi_{n+1} = n+1$
$(0,1)$ is a breakpoint if $\pi_1 \neq 1$
$(n,n+1)$ is a breakpoint if $\pi_n \neq n$
A reversal affects the breakpoints only at its endpoints
Any reversal can remove or induce at most 2 breakpoints.
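The definition translates directly into code. This is a minimal sketch; the function name `breakpoints` and the choice to report positions in the extended permutation (with 0 prepended and n+1 appended) are assumptions made for illustration.

```python
def breakpoints(perm):
    """List the breakpoints (i, i+1) of perm, a permutation of 1..n.

    Positions refer to the extended permutation 0, perm[0], ..., perm[n-1], n+1,
    so position 0 holds the value 0 and position n+1 holds the value n+1.
    """
    ext = [0] + list(perm) + [len(perm) + 1]
    return [(i, i + 1)
            for i in range(len(ext) - 1)
            if abs(ext[i] - ext[i + 1]) != 1]


# The permutation from the earlier slides: two breakpoints, one at each end
# of the reversed block 8 7 6 5 4.
print(breakpoints([1, 2, 3, 8, 7, 6, 5, 4, 9, 10]))
```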
-
Strip: a maximal run of increasing (decreasing) elements.
The identity permutation has no breakpoints, and any other permutation has at least one breakpoint.
Greedy: at each step remove the maximum number of breakpoints.
$\Phi(\pi)$ = number of breakpoints in $\pi$
While ($\Phi(\pi) > 0$):
    Choose a reversal that removes the maximum number of breakpoints (if there is a tie, favor the reversal that leaves a decreasing strip).
Greedy ends in at most $\Phi(\pi)$ steps.
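The loop above can be sketched as follows. This is an illustrative brute-force version, not the implementation behind the slides: it tries every reversal at each step, and it assumes the convention that a single-element strip counts as decreasing unless it holds 0 or n+1 (the slides do not state how singletons are classified). The helper names are hypothetical.

```python
def count_breakpoints(ext):
    """Number of adjacent pairs in the extended permutation that are not consecutive."""
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def has_decreasing_strip(ext):
    """True if some interior element is not part of an increasing run.

    Assumed convention: singleton strips count as decreasing.
    """
    return any(ext[k] - ext[k - 1] != 1 and ext[k + 1] - ext[k] != 1
               for k in range(1, len(ext) - 1))


def greedy_sort(perm):
    """Sort perm by reversals, greedily removing as many breakpoints as possible.

    Returns (steps, sorted_perm); each step (i, j) reverses 1-based positions i..j.
    """
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    steps = []
    while count_breakpoints(ext) > 0:
        current = count_breakpoints(ext)
        best = None
        for i in range(1, n):
            for j in range(i + 1, n + 1):
                cand = ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]
                # Primary criterion: breakpoints removed.
                # Tie-break: prefer a result that still has a decreasing strip.
                key = (current - count_breakpoints(cand), has_decreasing_strip(cand))
                if best is None or key > best[0]:
                    best = (key, (i, j), cand)
        _, move, ext = best
        steps.append(move)
    return steps, ext[1:-1]


steps, result = greedy_sort([1, 2, 3, 8, 7, 6, 5, 4, 9, 10])
print(steps, result)
```

When no reversal removes a breakpoint, the tie-break picks a reversal that creates a decreasing strip, which is what lets the next step make progress.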
-
Quality of approximation
Lemma 1: Every permutation with a decreasing strip has a reversal that removes one breakpoint.
Proof:
Consider the decreasing strip with $\pi_i$ being the smallest element found in any decreasing strip.
$\pi_i - 1$ must be in an increasing strip that lies to the left or to the right.
[Figure: the two cases, with the breakpoint that will be removed marked]
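The proof is constructive, so the reversal it describes can be computed. The sketch below is one way to do that, assuming single-element strips count as decreasing (except those holding 0 or n+1); the function names are hypothetical.

```python
def count_breakpoints(perm):
    """Breakpoints of perm after framing it with 0 and n+1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def apply_reversal(perm, i, j):
    """Reverse the block at 1-based positions i..j."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


def lemma1_reversal(perm):
    """Return (i, j) for a reversal that removes a breakpoint, or None
    when perm has no decreasing strip."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    # Positions whose element is not part of an increasing run.
    in_decreasing = [k for k in range(1, n + 1)
                     if ext[k] - ext[k - 1] != 1 and ext[k + 1] - ext[k] != 1]
    if not in_decreasing:
        return None
    p = min(in_decreasing, key=lambda k: ext[k])  # smallest such element
    q = ext.index(ext[p] - 1)                     # where its predecessor sits
    if q < p:
        return (q + 1, p)  # bring the element next to its predecessor on the left
    return (p + 1, q)      # bring the predecessor next to the element on the right


perm = [5, 1, 2, 3, 4]
move = lemma1_reversal(perm)
print(move, apply_reversal(perm, *move))
```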
-
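Lemma 2: Let $\pi$ have a decreasing strip. If every reversal that removes one breakpoint leaves a permutation with no decreasing strips, then $\pi$ has a reversal that removes two breakpoints.
Proof:
Consider the decreasing strip with $\pi_i$ being the smallest element.
The increasing strip containing $\pi_i - 1$ must be to the left. Call the corresponding reversal $\rho_i$.
Consider the decreasing strip with $\pi_j$ being the largest element.
The increasing strip containing $\pi_j + 1$ must be to the right. Call the corresponding reversal $\rho_j$.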
-
Fact 1: $\rho_i$ and $\rho_j$ must overlap.
$\pi_j$ must lie in $\rho_i$: if it does not, then $\pi \circ \rho_i$ has the decreasing strip that contains $\pi_j$.
$\pi_i$ must lie in $\rho_j$: if it does not, then $\pi \circ \rho_j$ has the decreasing strip that contains $\pi_i$.
-
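Fact 2: $\rho_i = \rho_j$.
If $\rho_i - \rho_j \neq \emptyset$ then
- if $\rho_i - \rho_j$ contains an increasing strip, $\pi \circ \rho_i$ has a decreasing strip (the strip is reversed by $\rho_i$)
- if $\rho_i - \rho_j$ contains a decreasing strip, $\pi \circ \rho_j$ has a decreasing strip (the strip is untouched by $\rho_j$)
Then $\rho = \rho_i = \rho_j$ removes 2 breakpoints.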
-
Lemma 3: Greedy sorts a permutation with a decreasing strip in at most $\Phi(\pi) - 1$ reversals.
Obs: if the permutation at step $i$ has no decreasing strip, then
at step $i-1$ the reversal removed 2 breakpoints;
we can use one reversal to create a decreasing strip, after which there exists a reversal that removes at least one breakpoint.
Theorem 1: Greedy sorts every permutation $\pi$ in at most $\Phi(\pi)$ reversals.
If $\pi$ has a decreasing strip: at most $\Phi(\pi) - 1$ reversals.
If $\pi$ has no decreasing strip: every reversal induces a decreasing strip, so after one step we can apply Lemma 3: at most $\Phi(\pi)$ reversals.
-
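Corollary: Greedy is a 2-approximation algorithm.
Every reversal removes at most 2 breakpoints, so $\mathrm{OPT}(\pi) \geq \Phi(\pi)/2 \geq \mathrm{Greedy}(\pi)/2$.
Hence $\mathrm{Greedy}(\pi) \leq 2 \cdot \mathrm{OPT}(\pi)$.
Runtime: number of steps is $O(n)$.
At each step we need to analyze $\binom{n}{2} = O(n^2)$ reversals.
Total runtime: $O(n^3)$.
If we analyze only reversals that remove breakpoints: $O(n^2)$.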
-
Signed permutations:
Reversals change the sign: (1,2,3,4,5,6,7,8,9,10) → (1,2,3,-8,-7,-6,-5,-4,9,10)
Problem:
Given a signed permutation, find the minimum-length series of reversals that transforms it into the identity permutation.
polynomial algorithm (Hannenhalli & Pevzner 95)
relies on several intermediary constructions
these constructions have been simplified
first completely elementary treatment of the problem (Bergeron 05)
-
Oriented pair: a pair of consecutive integers with different signs.
In (0,3,1,6,5,-2,4,7) the oriented pairs are (3,-2) and (1,-2).
Oriented pairs indicate reversals that create consecutive integers (adjacent, increasing, with the same sign):
(3,-2): (0,3,1,6,5,-2,4,7) → (0,-5,-6,-1,-3,-2,4,7)
(1,-2): (0,3,1,6,5,-2,4,7) → (0,3,1,2,-5,-6,4,7)
Oriented reversal: a reversal that creates consecutive integers.
Score of a reversal: the number of oriented pairs in the resulting permutation.
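A short sketch of these definitions follows. It assumes the convention from Bergeron's presentation: for an oriented pair at positions i < j, the reversal is chosen so that the two elements end up adjacent, in increasing order, and with the same sign, and the score counts the oriented pairs of the resulting permutation. Function names and 0-based indexing are illustrative choices.

```python
def oriented_pairs(perm):
    """Index pairs (i, j), i < j, whose values are consecutive in absolute
    value and have opposite signs (0 counts as positive)."""
    pairs = []
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            a, b = perm[i], perm[j]
            if abs(abs(a) - abs(b)) == 1 and (a < 0) != (b < 0):
                pairs.append((i, j))
    return pairs


def signed_reversal(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the signs in that block."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def oriented_reversal(perm, i, j):
    """Reversal induced by the oriented pair at indices i < j."""
    if perm[i] + perm[j] == 1:
        return signed_reversal(perm, i, j - 1)
    return signed_reversal(perm, i + 1, j)   # here perm[i] + perm[j] == -1


def score(perm, i, j):
    """Number of oriented pairs left after the oriented reversal for (i, j)."""
    return len(oriented_pairs(oriented_reversal(perm, i, j)))


pi = [0, 3, 1, 6, 5, -2, 4, 7]
for i, j in oriented_pairs(pi):
    print((pi[i], pi[j]), oriented_reversal(pi, i, j), score(pi, i, j))
```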
-
Algorithm 1: As long as $\pi$ has an oriented pair, choose the oriented reversal that has the maximal score.
The output will be a permutation with positive elements:
0 and n+1 are positive;
if there is a negative element, there exists an oriented pair.
Claim 1: If Algorithm 1 applies $k$ reversals to $\pi$, yielding $\pi'$, then $d(\pi) = d(\pi') + k$.
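Algorithm 1 can be sketched in a few lines. The version below is illustrative only: it assumes the input is framed by 0 and n+1, uses the convention that an oriented reversal makes the pair adjacent and increasing with equal signs, scores a reversal by the oriented pairs remaining afterwards, and breaks ties by taking the first maximal candidate. All names are hypothetical.

```python
def oriented_pairs(perm):
    """Index pairs (i, j), i < j, with consecutive absolute values and opposite signs."""
    return [(i, j)
            for i in range(len(perm))
            for j in range(i + 1, len(perm))
            if abs(abs(perm[i]) - abs(perm[j])) == 1
            and (perm[i] < 0) != (perm[j] < 0)]


def oriented_reversal(perm, i, j):
    """Apply the reversal induced by the oriented pair at indices i < j."""
    lo, hi = (i, j - 1) if perm[i] + perm[j] == 1 else (i + 1, j)
    return perm[:lo] + [-x for x in reversed(perm[lo:hi + 1])] + perm[hi + 1:]


def algorithm1(perm):
    """Repeatedly apply the oriented reversal of maximal score.

    Returns (final_perm, history), where history lists the permutation
    obtained after each reversal. The final permutation has no negative
    elements but is not necessarily sorted.
    """
    perm = list(perm)
    history = []
    while True:
        pairs = oriented_pairs(perm)
        if not pairs:
            return perm, history
        candidates = (oriented_reversal(perm, i, j) for i, j in pairs)
        perm = max(candidates, key=lambda p: len(oriented_pairs(p)))
        history.append(perm)


final, history = algorithm1([0, 3, 1, 6, 5, -2, 4, 7])
for step in history:
    print(step)
```

Each oriented reversal creates a new adjacency without destroying an existing one, so the loop stops after at most n+1 reversals.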
-
Sorting positive permutations:
- $\pi$: signed permutation with positive elements
- circular order: 0 is the successor of n+1
- reduced if it does not contain consecutive elements
Framed interval in $\pi$: $i\ \pi_{j+1}\ \pi_{j+2} \ldots \pi_{j+k-1}\ i+k$
such that $i < \pi_{j+1}, \pi_{j+2}, \ldots, \pi_{j+k-1} < i+k$
(0 2 5 4 3 6 1 7)
Hurdle: a framed interval that contains no shorter framed interval.
(0 2 5 4 3 6 1 7) [the slide repeats this permutation with the framed intervals and hurdles marked]
-
Idea: create oriented pairs and then apply Algorithm 1.
Operations on hurdles:
Hurdle cutting: $i\ \pi_{j+1}\ \pi_{j+2} \ldots i+1 \ldots \pi_{j+k-1}\ i+k$
(0 1 4 3 2 5) → (0 -3 -4 -1 2 5)
Hurdle merging: $i \ldots i+k \ldots i' \ldots i'+k'$
(0 2 5 4 3 6 1 7) → (0 2 5 4 3 -6 1 7)
Simple hurdle: cutting it decreases the number of hurdles.
Super hurdle: cutting it increases the number of hurdles.
-
Algorithm 2:
If $\pi$ has 2k hurdles: merge any two non-consecutive hurdles.
If $\pi$ has 2k+1 hurdles: cut one simple hurdle (if it has none, merge any two non-consecutive hurdles).
Claim 2: Algorithm 1 + Algorithm 2 optimally sort any signed permutation.
-
Proof of claims:
Breakpoint graph
1. Replace each positive element $x$ by $2x-1, 2x$ and each negative element $-x$ by $2x, 2x-1$.
(0 -1 3 5 4 6 -2 7) → (0 2 1 5 6 9 10 7 8 11 12 4 3 13)
[Figure: the transformed permutation with its arcs drawn]
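The transformation in step 1 is mechanical. A minimal sketch, assuming the input is already framed by 0 and n+1 and that the frame elements map to 0 and 2(n+1)-1 as in the example on the slide; the function name is illustrative.

```python
def to_unsigned(framed):
    """Map a framed signed permutation (0, ..., n+1) to its unsigned image.

    Each positive x becomes 2x-1, 2x; each negative -x becomes 2x, 2x-1.
    The frame elements 0 and n+1 become 0 and 2(n+1)-1.
    """
    out = [0]
    for x in framed[1:-1]:
        if x > 0:
            out += [2 * x - 1, 2 * x]
        else:
            out += [-2 * x, -2 * x - 1]
    out.append(2 * framed[-1] - 1)
    return out


print(to_unsigned([0, -1, 3, 5, 4, 6, -2, 7]))
```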
-
Arcs are oriented if they span an odd number of elements.
Arc overlap graph:
Vertices → arcs from the breakpoint graph
Edges → pairs of arcs that overlap
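The overlap graph can be built directly from the unsigned image of the permutation. This sketch makes two assumptions that the slide leaves implicit: arcs join the values 2i and 2i+1 (as in Bergeron's presentation), and two arcs overlap when their position intervals intersect without one containing the other. The span of an arc is counted inclusively. Names are illustrative.

```python
def build_arcs(unsigned):
    """Return one arc per pair of values (2i, 2i+1), as (left, right, oriented),
    where left and right are positions in the unsigned permutation."""
    pos = {value: index for index, value in enumerate(unsigned)}
    arcs = []
    for i in range(len(unsigned) // 2):
        left, right = sorted((pos[2 * i], pos[2 * i + 1]))
        spanned = right - left + 1            # elements covered, endpoints included
        arcs.append((left, right, spanned % 2 == 1))
    return arcs


def overlap_edges(arcs):
    """Pairs of arc indices whose intervals overlap without containment."""
    edges = []
    for u in range(len(arcs)):
        for v in range(u + 1, len(arcs)):
            a, b, _ = arcs[u]
            c, d, _ = arcs[v]
            if a < c < b < d or c < a < d < b:
                edges.append((u, v))
    return edges


# Unsigned image of (0 -1 3 5 4 6 -2 7) from the previous slide.
unsigned = [0, 2, 1, 5, 6, 9, 10, 7, 8, 11, 12, 4, 3, 13]
arcs = build_arcs(unsigned)
print([k for k, arc in enumerate(arcs) if arc[2]])   # indices of oriented arcs
print(overlap_edges(arcs))
```

In this example the oriented arcs are those for the value pairs (0, 1) and (4, 5), which match the oriented pairs (0, -1) and (3, -2) of the signed permutation.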
-
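Every oriented vertex corresponds to an oriented pair.
Fact 2: The score of an oriented reversal (oriented vertex $v$) is $T + U - O - 1$, where
$T$ = number of oriented vertices,
$U$ = number of unoriented vertices adjacent to $v$,
$O$ = number of oriented vertices adjacent to $v$.
(After the reversal, $v$ becomes unoriented and every vertex adjacent to $v$ changes orientation.)
Oriented component: a component that contains an oriented vertex.
Safe reversal: a reversal that does not create new unoriented components.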
-
Theorem (Hannenhalli & Pevzner). Any sequence of oriented safe reversals is optimal.
Theorem. An oriented reversal of maximal score is safe.
Hence Claim 1 holds.
Claim 2 is proven in a similar manner.
-
J. Kececioglu and D. Sankoff. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. 1995.
A. Bergeron. A very elementary presentation of the Hannenhalli-Pevzner theory. 2005.
A. Caprara. Sorting by reversals is difficult. 1997.
S. Hannenhalli and P. Pevzner. Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. 1999.