Pasaniuc Presentation

  • 8/2/2019 Pasaniuc Presentation

    1/26

    Sorting by reversals

    Bogdan Pasaniuc

    Dept. of Computer Science & Engineering

    2/26

    Overview

    Biological background

    Definitions

    Unsigned Permutations

    Approximation Algorithm

    Sorting Signed Permutations

    Simplified Algorithm

    3/26

    What is the evolutionary path?

    What is the ancestor chromosome?

    Chromosomes → lists of genes → permutations

    [Figure: unknown ancestor, Human (X chrom.), Mouse (X chrom.)]

    4/26

    Mutation at chromosome level

    Inversion: (1 2 3 4 5 6 7) → (1 4 3 2 5 6 7)

    Transposition: (1 2 3 4 5 6 7) → (1 5 6 2 3 4 7)

    Translocation: (1 2 3 4 5 6 7) → (1 2 3 4 5 2 3 4 6 7)

    Inversions

    Known as reversals

    The most common

    Most often reflect the differences between and within species

    What is the minimum number of reversals required to transform one perm. into another?

    Reversal distance → good approx. for evolutionary distance
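A minimal sketch of the inversion (reversal) operation on an unsigned permutation, reproducing the inversion example above. The function name `reverse_segment` and the 0-based inclusive indexing are assumptions made for illustration, not notation from the slides.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with positions i..j (0-based, inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


identity = [1, 2, 3, 4, 5, 6, 7]
print(reverse_segment(identity, 1, 3))  # [1, 4, 3, 2, 5, 6, 7]
```

Returning a new list rather than reversing in place keeps the original permutation available for comparison.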

    5/26

    Reversals

    [Figure: genes (blocks) 1 to 10 in order]

    1, 2, 3, 4, 5, 6, 7, 8, 9, 10

    6/26

    Reversals

    [Figure: blocks 4 to 8 reversed]

    1, 2, 3, 8, 7, 6, 5, 4, 9, 10

    7/26

    Reversals

    [Figure: blocks 4 to 8 reversed, with the breakpoints marked]

    Breakpoints

    1, 2, 3, 8, 7, 6, 5, 4, 9, 10

    8/26

    Breakpoint: a pair of adjacent positions $(i, i+1)$ s.t. $|\pi_i - \pi_{i+1}| \neq 1$

    The values $\pi_i$, $\pi_{i+1}$ are not consecutive

    If $|\pi_i - \pi_{i+1}| = 1$ then the values $\pi_i$, $\pi_{i+1}$ are adjacent

    Introduce $\pi_0 = 0$, $\pi_{n+1} = n+1$

    $(0, 1)$ is a breakpoint if $\pi_1 \neq 1$

    $(n, n+1)$ is a breakpoint if $\pi_n \neq n$

    A reversal affects the breakpoints only at its endpoints

    Any reversal can remove or induce at most 2 bkpts.
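The breakpoint definition can be sketched as a short counting function. It frames the permutation with 0 and n+1 as described above; the name `breakpoints` is a hypothetical helper chosen for illustration.

```python
def breakpoints(perm):
    """Count breakpoints of an unsigned permutation of 1..n."""
    n = len(perm)
    extended = [0] + list(perm) + [n + 1]  # pi_0 = 0, pi_{n+1} = n + 1
    return sum(1 for a, b in zip(extended, extended[1:]) if abs(a - b) != 1)


print(breakpoints([1, 2, 3, 8, 7, 6, 5, 4, 9, 10]))  # 2
print(breakpoints(list(range(1, 11))))               # 0
```

In the first call the two breakpoints sit between 3 and 8 and between 4 and 9, the endpoints of the reversed block.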

    9/26

    Strip: a maximal run of increasing (decreasing) elements, i.e. a maximal run with no breakpoint inside it.

    Identity permutation has no breakpoints and any other permutation has at least one breakpoint

    Greedy: at each step remove the maximum number of breakpoints.

    $b(\pi)$ = number of breakpoints in $\pi$

    While ($b(\pi) > 0$)

    Choose a reversal $\rho$ that removes the maximum number of breakpoints (if there is a tie, favor the reversal that leaves a decreasing strip)

    Greedy ends in at most $b(\pi)$ steps.
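The greedy loop above can be sketched as follows. This is a naive illustration, not the presenter's implementation: it tries every reversal and recounts breakpoints from scratch, so it is much slower than the bounds discussed later. All function names are hypothetical, and counting a single-element strip as decreasing is an assumption about the tie-breaking convention.

```python
def breakpoints(perm):
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def has_decreasing_strip(perm):
    ext = [0] + list(perm) + [len(perm) + 1]
    for k in range(1, len(ext) - 1):
        left_joined = abs(ext[k] - ext[k - 1]) == 1
        right_joined = abs(ext[k] - ext[k + 1]) == 1
        if not left_joined and not right_joined:
            return True  # assumption: a singleton strip counts as decreasing
        if ext[k] - ext[k + 1] == 1:
            return True  # two neighbours descending by exactly one
    return False


def greedy_sort(perm):
    """Return the list of reversals (i, j), 0-based inclusive, chosen by greedy."""
    perm = list(perm)
    steps = []
    while breakpoints(perm) > 0:
        current = breakpoints(perm)
        best = None
        for i in range(len(perm)):
            for j in range(i + 1, len(perm)):
                cand = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
                # Primary key: breakpoints removed; tie-break: leaves a decreasing strip.
                key = (current - breakpoints(cand), has_decreasing_strip(cand))
                if best is None or key > best[0]:
                    best = (key, (i, j), cand)
        steps.append(best[1])
        perm = best[2]
    return steps


print(greedy_sort([1, 2, 3, 8, 7, 6, 5, 4, 9, 10]))  # [(3, 7)]
```

Each reversal is reported as a pair of 0-based positions; replaying them in order on the input yields the identity permutation.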

    10/26

    Quality of approximation

    Lemma 1: Every permutation with a decreasing strip has a reversal that removes one breakpoint.

    Proof:

    consider the decreasing strip containing $\pi_i$, the smallest element in any decreasing strip

    $\pi_i - 1$ must be in an increasing strip that lies to the left or right

    [Figure: breakpoint that will be removed]

    11/26

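    Lemma 2: $\pi$ has a decreasing strip. If every reversal that removes one bkpt leaves a permutation with no decreasing strips, then $\pi$ has a reversal that removes two bkpts.

    Proof:

    consider the decreasing strip containing $\pi_i$, the smallest element in any decreasing strip

    the increasing strip containing $\pi_i - 1$ must be to the left. Call the corresponding reversal $\rho_i$

    consider the decreasing strip containing $\pi_j$, the largest element in any decreasing strip

    the increasing strip containing $\pi_j + 1$ must be to the right. Call the corresponding reversal $\rho_j$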

    12/26

    Fact 1: $\rho_i$ and $\rho_j$ must overlap

    $\pi_j$ must lie in $\rho_i$: if it doesn't, then $\pi \circ \rho_i$ has the decreasing strip that contains $\pi_j$

    $\pi_i$ must lie in $\rho_j$: if it doesn't, then $\pi \circ \rho_j$ has the decreasing strip that contains $\pi_i$

    13/26

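    Fact 2: $\rho_i = \rho_j$

    If $\rho_i - \rho_j \neq \emptyset$ then

    - if $\rho_i - \rho_j$ contains an increasing strip, $\pi \circ \rho_i$ has a decreasing strip (the strip is reversed by $\rho_i$)

    - if $\rho_i - \rho_j$ contains a decreasing strip, $\pi \circ \rho_j$ has a decreasing strip (the strip is untouched by $\rho_j$)

    Then $\rho = \rho_i = \rho_j$ removes 2 breakpoints.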

    14/26

    Lemma 3: Greedy solves a permutation $\pi$ with a decreasing strip in at most $b(\pi) - 1$ reversals

    Obs:

    if the permutation at step $i$ has no decreasing strip, then at step $i-1$ the reversal removed 2 bkpts.

    we can use one reversal to create a decr. strip → there exists a reversal that removes at least one bkpt

    Theorem 1: Greedy sorts every permutation $\pi$ in at most $b(\pi)$ reversals.

    If $\pi$ has a decreasing strip → at most $b(\pi) - 1$ reversals

    If $\pi$ has no decreasing strip → every reversal induces a decreasing strip → after one step we can apply Lemma 3 → at most $b(\pi)$ reversals

    15/26

    Corollary: Greedy is a 2-approximation algorithm

    Every reversal removes at most 2 bkpts → $\mathrm{OPT}(\pi) \geq b(\pi)/2 \geq \mathrm{Greedy}(\pi)/2$

    $\mathrm{Greedy}(\pi) \leq 2 \cdot \mathrm{OPT}(\pi)$

    Runtime

    # of steps: $O(n)$.

    At each step we need to analyze $\binom{n}{2}$ reversals → $O(n^2)$.

    Total runtime = $O(n^3)$.

    Analyze only reversals that remove bkpts → $O(n^2)$.
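To make the bounds $b(\pi)/2 \leq \mathrm{OPT}(\pi) \leq b(\pi)$ concrete, the sketch below computes the exact reversal distance of a tiny permutation by breadth-first search and compares it with the breakpoint count. It is a brute-force check that is only practical for very small n, and the function names are hypothetical.

```python
from collections import deque


def breakpoints(perm):
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def reversal_distance(perm):
    """Exact unsigned reversal distance by breadth-first search (tiny n only)."""
    start = tuple(perm)
    goal = tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for i in range(len(cur)):
            for j in range(i + 1, len(cur)):
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)


perm = [3, 4, 1, 2]
print(breakpoints(perm), reversal_distance(perm))  # 3 2
```

Here the optimum of 2 reversals lies between $b(\pi)/2 = 1.5$ and $b(\pi) = 3$, as the corollary requires.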

    16/26

    Signed permutations:

    reversals change the sign: (1,2,3,4,5,6,7,8,9,10) → (1,2,3,-8,-7,-6,-5,-4,9,10)

    Problem:

    Given a signed perm., find the minimum-length series of reversals that transforms it into the identity perm.

    polynomial algorithm (Hannenhalli & Pevzner 95)

    relies on several intermediary constructions

    these constructions have been simplified

    first completely elementary treatment of the problem (Bergeron 05)
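A signed reversal both reverses the order of a segment and flips the sign of every element in it. The following minimal sketch reproduces the example above; `signed_reversal` and its 0-based inclusive indices are assumptions for illustration.

```python
def signed_reversal(perm, i, j):
    """Reverse positions i..j (0-based, inclusive) and negate each element."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


print(signed_reversal(list(range(1, 11)), 3, 7))
# [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
```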

    17/26

    Oriented pair: a pair of consecutive integers with different signs

    (0,3,1,6,5,-2,4,7) → o.p. (3,-2) and (1,-2).

    o.p. → reversals that create consecutive integers

    (3,-2): (0,3,1,6,5,-2,4,7) → (0,-5,-6,-1,-3,-2,4,7), creating the adjacent pair -3 -2

    (1,-2): (0,3,1,6,5,-2,4,7) → (0,3,1,2,-5,-6,4,7), creating the adjacent pair 1 2

    Oriented reversal: reversal that creates consecutive integers

    Score of a reversal: # of oriented pairs in the resulting permutation.
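The definitions of oriented pair, oriented reversal, and score can be sketched as below. The rule for which segment to reverse (positions i..j-1 when the two values sum to +1, positions i+1..j when they sum to -1) is the convention assumed here, following the elementary treatment cited in the slides; function names are hypothetical.

```python
def signed_reversal(perm, i, j):
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def oriented_pairs(perm):
    """Index pairs (i, j), i < j, holding consecutive integers of opposite sign."""
    pairs = []
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            a, b = perm[i], perm[j]
            if (a < 0) != (b < 0) and abs(abs(a) - abs(b)) == 1:
                pairs.append((i, j))
    return pairs


def oriented_reversal(perm, i, j):
    """Apply the reversal induced by the oriented pair at positions i < j."""
    if perm[i] + perm[j] == 1:
        return signed_reversal(perm, i, j - 1)
    return signed_reversal(perm, i + 1, j)  # the sum is -1


def score(perm, i, j):
    """Number of oriented pairs left after the induced reversal."""
    return len(oriented_pairs(oriented_reversal(perm, i, j)))


pi = [0, 3, 1, 6, 5, -2, 4, 7]
for i, j in oriented_pairs(pi):
    print((pi[i], pi[j]), oriented_reversal(pi, i, j), score(pi, i, j))
```

For this permutation the pair (3, -2) scores 4 and the pair (1, -2) scores 2, so a maximal-score strategy would pick the first.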

    18/26

    Algorithm 1: As long as $\pi$ has an oriented pair, choose the oriented reversal that has the maximal score.

    → output will be a permutation with positive elements.

    0 and n+1 are positive;

    if there is a negative element there exists an o.p.

    Claim 1: If Alg1 applies $k$ reversals to $\pi$, yielding $\pi'$, then $d(\pi) = d(\pi') + k$.
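Algorithm 1 can be sketched as a loop that repeatedly applies a maximal-score oriented reversal. This is an illustrative, unoptimized version: ties are broken by taking the first maximal candidate, which is an arbitrary choice, and the helper names are hypothetical.

```python
def signed_reversal(perm, i, j):
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def oriented_pairs(perm):
    return [(i, j)
            for i in range(len(perm))
            for j in range(i + 1, len(perm))
            if (perm[i] < 0) != (perm[j] < 0)
            and abs(abs(perm[i]) - abs(perm[j])) == 1]


def oriented_reversal(perm, i, j):
    # Assumed convention: reverse i..j-1 if the pair sums to +1, else i+1..j.
    if perm[i] + perm[j] == 1:
        return signed_reversal(perm, i, j - 1)
    return signed_reversal(perm, i + 1, j)


def algorithm1(perm):
    """Apply maximal-score oriented reversals until no oriented pair is left.

    Returns the final permutation and the number k of reversals applied.
    """
    perm = list(perm)
    applied = 0
    while True:
        pairs = oriented_pairs(perm)
        if not pairs:
            return perm, applied
        candidates = [oriented_reversal(perm, i, j) for i, j in pairs]
        perm = max(candidates, key=lambda p: len(oriented_pairs(p)))
        applied += 1


result, k = algorithm1([0, 3, 1, 6, 5, -2, 4, 7])
print(result, k)  # [0, 1, 2, 3, 4, 5, 6, 7] 5
```

On this input the loop happens to reach the identity directly; in general it stops at a positive permutation, which is where hurdles come in.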

    19/26

    Sorting positive perms.:

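    - $\pi$: signed perm. with positive elements

    - circular order: 0 successor of n+1.

    - reduced if it does not contain consecutive elements.

    framed interval in $\pi$: $i \;\, \pi_{j+1} \;\, \pi_{j+2} \,\dots\, \pi_{j+k-1} \;\, i+k$

    s.t. $i < \pi_{j+1}, \pi_{j+2}, \dots, \pi_{j+k-1} < i+k$

    (0 2 5 4 3 6 1 7): the framed intervals are the whole permutation, 2 5 4 3 6, and (circularly) 6 1 7 0 2

    hurdle: a framed int. that contains no shorter framed int.

    (0 2 5 4 3 6 1 7): the hurdles are 2 5 4 3 6 and 6 1 7 0 2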
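Framed intervals and hurdles can be found by direct search on a small example. The sketch below assumes a reduced positive permutation given with its frame elements 0 and n+1, treats positions and values circularly (modulo n+2), and skips frames of length 1. All names are hypothetical.

```python
def framed_intervals(perm):
    """Return (start, k) pairs: the interval starts at position `start` with
    value i and ends k positions later (wrapping around) with value i + k."""
    m = len(perm)
    found = []
    for start in range(m):
        lo = perm[start]
        for k in range(2, m):
            if perm[(start + k) % m] != (lo + k) % m:
                continue
            inside = [perm[(start + t) % m] for t in range(1, k)]
            # Every inner value must lie strictly between the frame values.
            if all(0 < (x - lo) % m < k for x in inside):
                found.append((start, k))
    return found


def elements(perm, start, k):
    m = len(perm)
    return [perm[(start + t) % m] for t in range(k + 1)]


def hurdles(perm):
    """Framed intervals that contain no shorter framed interval."""
    m = len(perm)
    intervals = framed_intervals(perm)

    def positions(start, k):
        return {(start + t) % m for t in range(k + 1)}

    result = []
    for start, k in intervals:
        own = positions(start, k)
        if not any(k2 < k and positions(s2, k2) <= own for s2, k2 in intervals):
            result.append(elements(perm, start, k))
    return result


pi = [0, 2, 5, 4, 3, 6, 1, 7]
print([elements(pi, s, k) for s, k in framed_intervals(pi)])
print(hurdles(pi))  # [[2, 5, 4, 3, 6], [6, 1, 7, 0, 2]]
```

The whole permutation is framed by 0 and 7 but contains the shorter interval 2 5 4 3 6, so it is not reported as a hurdle.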

    20/26

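    Idea: create oriented pairs and then apply Algorithm 1

    Operations on Hurdles:

    Hurdle Cutting: $i \;\, \pi_{j+1} \;\, \pi_{j+2} \,\dots\, i+1 \,\dots\, \pi_{j+k-1} \;\, i+k$

    (0 1 4 3 2 5) → (0 -3 -4 -1 2 5)

    Hurdle Merging: $i \,\dots\, i+k \,\dots\, i' \,\dots\, i'+k'$, reversing the segment between the endpoints $i+k$ and $i'$

    (0 2 5 4 3 6 1 7) → (0 2 5 4 3 -6 1 7)

    Simple hurdle: cutting it decreases the # of hurdles

    Super hurdle: cutting it does not decrease the # of hurdles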

    21/26

    Algorithm 2:

    $\pi$ has 2k hurdles → merge any two non-consecutive hurdles

    $\pi$ has 2k+1 hurdles → cut one simple hurdle (if it has none, merge any two non-consecutive)

    Claim 2: Alg1 + Alg2 optimally sort any signed perm.

    22/26

    Proof of claims:

    breakpoint graph

    1. each positive el. $x$ → $2x-1, 2x$ and each negative el. $-x$ → $2x, 2x-1$

    (0 -1 3 5 4 6 -2 7) → (0 2 1 5 6 9 10 7 8 11 12 4 3 13)

    [Figure: arcs of the breakpoint graph]
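The signed-to-unsigned mapping in step 1 can be sketched directly. The input is assumed to include the frame elements 0 and n+1, which map to 0 and 2(n+1)-1 as in the example above; `to_unsigned` is a hypothetical name.

```python
def to_unsigned(signed):
    """Map a framed signed permutation [0, ..., n+1] to its unsigned image."""
    out = [0]
    for x in signed[1:-1]:
        # Positive x becomes 2x-1, 2x; negative -x becomes 2x, 2x-1.
        out += [2 * x - 1, 2 * x] if x > 0 else [2 * -x, 2 * -x - 1]
    out.append(2 * signed[-1] - 1)
    return out


print(to_unsigned([0, -1, 3, 5, 4, 6, -2, 7]))
# [0, 2, 1, 5, 6, 9, 10, 7, 8, 11, 12, 4, 3, 13]
```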

    23/26

    Arcs: oriented if they span an odd # of elements

    Arc overlap graph:

    Vertices → arcs from breakpoint graph

    Edges → arcs overlap
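A sketch of the arc overlap graph for the example (0 -1 3 5 4 6 -2 7). It assumes that arc number i joins the values 2i and 2i+1 of the unsigned image, that the span of an arc counts both endpoints, and that two arcs overlap when their position intervals intersect without one containing the other. Names are hypothetical.

```python
from itertools import combinations


def to_unsigned(signed):
    out = [0]
    for x in signed[1:-1]:
        out += [2 * x - 1, 2 * x] if x > 0 else [2 * -x, 2 * -x - 1]
    out.append(2 * signed[-1] - 1)
    return out


def arcs(unsigned):
    """Arc i joins the positions of values 2i and 2i+1 (assumed convention)."""
    pos = {value: p for p, value in enumerate(unsigned)}
    return [tuple(sorted((pos[2 * i], pos[2 * i + 1])))
            for i in range(len(unsigned) // 2)]


def is_oriented(arc):
    left, right = arc
    return (right - left + 1) % 2 == 1  # spans an odd number of elements


def overlaps(a, b):
    (a1, a2), (b1, b2) = sorted([a, b])
    return a1 < b1 < a2 < b2


arc_list = arcs(to_unsigned([0, -1, 3, 5, 4, 6, -2, 7]))
oriented = [i for i, arc in enumerate(arc_list) if is_oriented(arc)]
edges = [(i, j) for i, j in combinations(range(len(arc_list)), 2)
         if overlaps(arc_list[i], arc_list[j])]
print(oriented)  # [0, 2]
print(edges)
```

The two oriented vertices, arcs 0 and 2, line up with the oriented pairs (0, -1) and (3, -2) of the signed permutation.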

    24/26

    Every oriented vertex corresponds to an oriented pair.

    Fact 2: Score of an oriented reversal (oriented vertex $v$) is $T + U - O - 1$.

    $T$ = # oriented vertices.

    $U$ = # unoriented vertices adjacent to $v$

    $O$ = # oriented vertices adjacent to $v$

    Oriented component: a component that contains an oriented vertex

    Safe reversal: does not create new unoriented components.
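The score formula can be checked on (0 3 1 6 5 -2 4 7) by building the overlap graph and evaluating $T + U - O - 1$ for each oriented vertex. As before, this sketch assumes arc i joins values 2i and 2i+1 of the unsigned image; all names are hypothetical.

```python
def to_unsigned(signed):
    out = [0]
    for x in signed[1:-1]:
        out += [2 * x - 1, 2 * x] if x > 0 else [2 * -x, 2 * -x - 1]
    out.append(2 * signed[-1] - 1)
    return out


def arcs(unsigned):
    pos = {value: p for p, value in enumerate(unsigned)}
    return [tuple(sorted((pos[2 * i], pos[2 * i + 1])))
            for i in range(len(unsigned) // 2)]


def is_oriented(arc):
    return (arc[1] - arc[0] + 1) % 2 == 1


def overlaps(a, b):
    (a1, a2), (b1, b2) = sorted([a, b])
    return a1 < b1 < a2 < b2


def scores_from_graph(signed):
    """Map each oriented vertex v to T + U - O - 1."""
    arc_list = arcs(to_unsigned(signed))
    oriented = [is_oriented(a) for a in arc_list]
    total_oriented = sum(oriented)  # T
    result = {}
    for v, arc in enumerate(arc_list):
        if not oriented[v]:
            continue
        neighbours = [w for w, other in enumerate(arc_list)
                      if w != v and overlaps(arc, other)]
        o = sum(1 for w in neighbours if oriented[w])  # O
        u = len(neighbours) - o                        # U
        result[v] = total_oriented + u - o - 1
    return result


print(scores_from_graph([0, 3, 1, 6, 5, -2, 4, 7]))  # {1: 2, 2: 4}
```

Vertex 1 corresponds to the pair (1, -2) and vertex 2 to the pair (3, -2); the values can be compared with a direct count of oriented pairs after each of the two reversals.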

    25/26

    Theorem (Hannenhalli & Pevzner). Any sequence of oriented safe reversals is optimal.

    Theorem. An oriented reversal of maximal score is safe.

    → Claim 1 holds.

    Claim 2 is proven in a similar manner.

    26/26

    J. Kececioglu and D. Sankoff. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. 1995.

    A. Bergeron. A very elementary presentation of the Hannenhalli-Pevzner theory. 2005.

    A. Caprara. Sorting by reversals is difficult. 1997.

    S. Hannenhalli and P. Pevzner. Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. 1999.