6.096 Lecture 10
![Page 1: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/1.jpg)
6.096 Lecture 10
Evolution
Gene correspondence - Rearrangements - Genome duplication
![Page 2: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/2.jpg)
Evolutionary change
Lecture 1 - Introduction
Lecture 2 - Hashing / BLAST
Lecture 3 - Combinatorial Motif Finding
Lecture 4 - Statistical Motif Finding
Lecture 5 - Sequence alignment and Dynamic Programming
Lecture 6 - RNA structure and Context Free Grammars
Lecture 7 - Gene finding and Hidden Markov Models
Lecture 8 - HMMs algorithms and Dynamic Programming
Lecture 9 - Evolutionary change, phylogenetic trees
Lecture 10 - Genome rearrangements, genome duplication
6.096 – Algorithms for Computational Biology – Lecture 9
![Page 3: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/3.jpg)
Challenges in Computational Biology
[Figure: map of course topics arranged around DNA and the RNA transcript: genome assembly, gene finding, regulatory motif discovery, database lookup, gene expression analysis, sequence alignment, evolutionary theory, cluster discovery, Gibbs sampling, protein network analysis, emerging network properties, regulatory network inference, comparative genomics, RNA folding]
![Page 4: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/4.jpg)
Overview
Genome correspondence
Chromosome evolution
Genome rearrangements
Sorting by reversals
Genome duplication
Duplicate gene evolution
Duplication and rearrangements
![Page 5: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/5.jpg)
Framework: graph of gene correspondence
[Figure: bipartite graph between S. cerevisiae and S. bayanus genes, with edge patterns labeled duplication, loss, ortholog and merge]
• Weighted bipartite graph
– Graph represents gene correspondence
– Nodes: genes (w/ coordinates)
– Edges: sequence similarity (w/ weights)
• Two types of evolutionary relationships
– Orthologs (1-to-1 matches)
– Paralogs (1-to-many / many-to-many)
• Method
– Eliminate spurious edges (simplify graph)
– Select edges based on available information
• Blocks of conserved gene order
• Protein sequence similarity
![Page 6: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/6.jpg)
Inferring orthologous gene relationships
• BBH - Best bi-directional hits
• COG - Clusters of orthologous genes
• BUS - Best unambiguous subgraphs
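As a rough illustration of the first method in this list, best bi-directional hits can be sketched in a few lines. The function name and the similarity scores below are hypothetical, chosen only for the example; real pipelines would take scores from a sequence-similarity search.

```python
def best_bidirectional_hits(weights):
    """Return (x, y) pairs where y is x's best hit and x is y's best hit.

    `weights` maps (gene_in_species1, gene_in_species2) -> similarity score.
    Ties are broken by first occurrence, which is an arbitrary choice.
    """
    best_for_x = {}
    best_for_y = {}
    for (x, y), w in weights.items():
        if x not in best_for_x or w > weights[(x, best_for_x[x])]:
            best_for_x[x] = y
        if y not in best_for_y or w > weights[(best_for_y[y], y)]:
            best_for_y[y] = x
    return sorted((x, y) for x, y in best_for_x.items() if best_for_y[y] == x)


# Hypothetical similarity scores, invented for illustration only.
example = {
    ("A", "1"): 90, ("A", "2"): 40,
    ("B", "1"): 50, ("B", "2"): 85,
    ("C", "2"): 60,
}
print(best_bidirectional_hits(example))
```

Gene C has a best hit (gene 2) that does not point back to it, so it is left without an ortholog call; this is the kind of case the subgraph-based methods are meant to handle.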
![Page 7: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/7.jpg)
BUS: Best Unambiguous Subgraphs
[Figure: genes A, B, C of Species 1 and genes 1, 2, 3 of Species 2, drawn as a weighted bipartite graph and as the matching similarity matrix (weights between 30 and 80); only the best edges of each gene are kept]
Connected components of all best edges
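The idea of taking connected components of all best edges can be sketched as follows. This is a minimal sketch, not the lecture's implementation: the function name is hypothetical, and the weights reuse numbers visible on the slide but their arrangement into a matrix is an assumption.

```python
from collections import defaultdict


def best_edge_components(weights):
    """Group genes into connected components of the 'best edge' graph.

    An edge is kept if it is a best-scoring edge for at least one endpoint
    (ties are kept). `weights` maps (species1_gene, species2_gene) -> score.
    """
    best_x = defaultdict(int)
    best_y = defaultdict(int)
    for (x, y), w in weights.items():
        best_x[x] = max(best_x[x], w)
        best_y[y] = max(best_y[y], w)
    kept = [(x, y) for (x, y), w in weights.items()
            if w == best_x[x] or w == best_y[y]]

    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for x, y in kept:
        parent[find(("sp1", x))] = find(("sp2", y))

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node[1])
    return sorted(sorted(group) for group in groups.values())


# Assumed arrangement of the slide's weights (rows A, B, C; columns 1, 2, 3).
slide_like = {
    ("A", "1"): 80, ("A", "2"): 60, ("A", "3"): 40,
    ("B", "1"): 35, ("B", "2"): 30, ("B", "3"): 80,
    ("C", "1"): 40, ("C", "2"): 35, ("C", "3"): 80,
}
print(best_edge_components(slide_like))
```

Under this assumed matrix, genes B and C both point to gene 3 with equal weight, so they end up in one ambiguous component rather than being forced into a 1-to-1 match.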
![Page 8: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/8.jpg)
Implementation: Iterative refinement
[Figure: four successive views of the bipartite graph on genes A, B, C and 1, 2, 3]
Full bipartite graph G=(X+Y,E)
Directed Graph D
Maximal out-edges M, relative threshold T
Separate connected components of M
Iterate
Iterative refinement with increasing relative threshold
![Page 9: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/9.jpg)
Conservation of gene order (synteny)
[Figure: gene correspondence between S. paradoxus and S. cerevisiae, with matches falling into blocks of conserved order]
Preferentially select edges in synteny blocks
![Page 10: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/10.jpg)
Genomic Dot-Plot (6000 x 6000)
[Figure: dot plot of S. paradoxus scaffolds against S. cerevisiae chromosomes I to XVI]
![Page 11: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/11.jpg)
Conservation of local gene order
S.cerevisiae
S.cerevisiae Chromosome VI 250-300kbp
S.paradoxus
S.mikatae
S.bayanus
![Page 12: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/12.jpg)
Overview
Genome correspondence
Chromosome evolution
Genome rearrangements
Sorting by reversals
Genome duplication
Duplicate gene evolution
Duplication and rearrangements
![Page 13: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/13.jpg)
Regions of rapid change
Protein family expansions in chromosome ends
![Page 14: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/14.jpg)
Specific regions of rapid evolution
[Figure: S. cerevisiae chromosomes plotted against position on chromosome]
• HXT, FLO, COS, PAU, YRF families
• 80% of ambiguities in 5% of the genome
• 31 of 32 telomeres in ambiguity clusters
![Page 15: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/15.jpg)
Specific mechanisms mediate rearrangements
Evolutionary features
• 10 translocations
– 8 across Ty elements
– 2 across nearly identical genes
• 19 inversions
– All flanked by tRNA genes
[Figure: a translocation between two Ty elements and an inversion between two tRNA genes]
![Page 16: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/16.jpg)
Transposon locations are conserved
• Transposons are active
– Full-length Ty elements are recent
– Typically appear in only one genome
• Transposon locations are conserved
– Recent insertions reuse old loci
– LTR remnants found in other genomes
![Page 17: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/17.jpg)
Evolutionary advantage of transposons ?
• Studied 8 strains resulting from experimental evolution
– 3 strains duplicate Chr4R, containing HXT genes
– 3 strains display extensive overlapping Chr5R deletions
– 3 strains reuse same breakpoint on Chr14R
• Population advantage in transposon location
– Ability to mediate reversible rearrangements
Dunham et al, PNAS, Dec 10, 2002 (Botstein Lab, Stanford)
Transposons selectively kept in specific loci
![Page 18: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/18.jpg)
Differences in gene content
• 8-10 genes unique to each genome
– Metabolism, regulation/silencing, stress
• Changes in gene dosage
– 10-20 tandem duplications (1-2 genes)
– 2 segment duplications (5-6 genes)
• Protein family expansions
– 211 genes (3%) with ambiguous correspondence
– Paralog duplication and/or loss
Segment duplication
Different species, few novel genes
![Page 19: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/19.jpg)
Chromosomal Evolution
[Figure: panels i to vi illustrating transposition, chromosomal exchange, telomeric expansion, inversions, intein insertion (intein sequence) and segmental duplication]
![Page 20: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/20.jpg)
Overview
Genome correspondence
Chromosome evolution
Genome rearrangements
Sorting by reversals
Genome duplication
Duplicate gene evolution
Duplication and rearrangements
![Page 21: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/21.jpg)
Gene order rearrangement: overlapping inversions
1 2 3 4 5 6 7 8 9 10
1 2 -6 -5 -4 -3 7 8 9 10
1 2 -6 -8 -7 3 4 5 9 10
1 2 -6 -8 -7 3 4 -10 -9 -5
![Page 22: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/22.jpg)
1 2 3 4 5 6 7 8 9 10
1 2 -6 -5 -4 -3 7 8 9 10
1 2 -6 -8 -7 3 4 5 9 10
1 2 -6 -8 -7 3 4 -10 -9 -5
Inference of inversion history: “sorting signed permutations by reversals”
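The inversion history shown above can be replayed with a small helper. This is a sketch under stated assumptions: the function name is hypothetical, positions are 1-based and inclusive, and the three reversal endpoints are read off the rows of the slide rather than given by it.

```python
def signed_reversal(perm, i, j):
    """Reverse perm[i..j] (1-based, inclusive) and flip the sign of each element."""
    return perm[:i - 1] + [-x for x in reversed(perm[i - 1:j])] + perm[j:]


genome = list(range(1, 11))
history = [(3, 6), (4, 8), (8, 10)]  # endpoints inferred from the slide rows
for i, j in history:
    genome = signed_reversal(genome, i, j)
    print(genome)
```

Each printed row should match the corresponding row of the slide; inferring the history means recovering such a list of endpoints when only the first and last rows are known.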
![Page 23: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/23.jpg)
Reversals: Example
π = 1 2 3 4 5 6 7 8
ρ(3,5)
1 2 5 4 3 6 7 8
![Page 24: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/24.jpg)
Reversals: Example
π = 1 2 3 4 5 6 7 8
ρ(3,5)
1 2 5 4 3 6 7 8
ρ(5,6)
1 2 5 4 6 3 7 8
![Page 25: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/25.jpg)
Reversals and Gene Orders
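• Gene order is represented by a permutation π:
$\pi = \pi_1 \ldots \pi_{i-1}\ \pi_i\ \pi_{i+1} \ldots \pi_{j-1}\ \pi_j\ \pi_{j+1} \ldots \pi_n$
$\pi \cdot \rho(i, j) = \pi_1 \ldots \pi_{i-1}\ \pi_j\ \pi_{j-1} \ldots \pi_{i+1}\ \pi_i\ \pi_{j+1} \ldots \pi_n$
• Reversal ρ(i, j) reverses (flips) the elements from i to j in π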
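For unsigned permutations the reversal defined above is a one-line slice operation. The function name is hypothetical and positions are assumed to be 1-based and inclusive, as on the slides.

```python
def reversal(perm, i, j):
    """Return a copy of perm with positions i..j (1-based, inclusive) reversed."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


pi = [1, 2, 3, 4, 5, 6, 7, 8]
pi = reversal(pi, 3, 5)
print(pi)
pi = reversal(pi, 5, 6)
print(pi)
```

The two printed rows reproduce the two-step example from the preceding slides.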
![Page 26: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/26.jpg)
Reversal Distance Problem
• Goal: Given two permutations, find the shortest series of reversals that transforms one into the other
• Input: Permutations π and σ
• Output: A series of reversals ρ1, …, ρt transforming π into σ such that t is minimum
• t - reversal distance between π and σ
• d(π, σ) - smallest possible value of t, given π and σ
![Page 27: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/27.jpg)
Sorting By Reversals Problem
• Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation (1 2 … n)
• Input: Permutation π
• Output: A series of reversals ρ1, …, ρt transforming π into the identity permutation such that t is minimum
![Page 28: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/28.jpg)
Sorting By Reversals: Example
• t = d(π) - reversal distance of π
• Example:
π = 3 4 2 1 5 6 7 10 9 8
4 3 2 1 5 6 7 10 9 8
4 3 2 1 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
So d(π) = 3
![Page 29: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/29.jpg)
Sorting by reversals: 5 steps
Step 0: 2 -4 -3 5 -8 -7 -6 1
Step 1: 2 3 4 5 -8 -7 -6 1
Step 2: 2 3 4 5 6 7 8 1
Step 3: 2 3 4 5 6 7 8 -1
Step 4: -8 -7 -6 -5 -4 -3 -2 -1
Step 5: 1 2 3 4 5 6 7 8
![Page 30: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/30.jpg)
Sorting by reversals: 4 steps
Step 0: 2 -4 -3 5 -8 -7 -6 1
Step 1: 2 3 4 5 -8 -7 -6 1
Step 2: -5 -4 -3 -2 -8 -7 -6 1
Step 3: -5 -4 -3 -2 -1 6 7 8
Step 4: 1 2 3 4 5 6 7 8
![Page 31: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/31.jpg)
Sorting by reversals: 4 steps
Step 0: 2 -4 -3 5 -8 -7 -6 1
Step 1: 2 3 4 5 -8 -7 -6 1
Step 2: -5 -4 -3 -2 -8 -7 -6 1
Step 3: -5 -4 -3 -2 -1 6 7 8
Step 4: 1 2 3 4 5 6 7 8
What is the reversal distance for this permutation? Can it be sorted in 3 steps?
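For a permutation this small, the question can be checked by brute force. The sketch below is not an efficient algorithm and is not taken from the lecture: it searches outward from both the input and the identity and looks for a meeting point, which is only practical for short distances. All function names are hypothetical.

```python
def neighbors(perm):
    """Yield every signed permutation one reversal away from `perm` (a tuple)."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            flipped = tuple(-x for x in reversed(perm[i:j + 1]))
            yield perm[:i] + flipped + perm[j + 1:]


def ball(center, radius):
    """Breadth-first search: distance from `center` for everything within `radius`."""
    dist = {center: 0}
    frontier = [center]
    for d in range(1, radius + 1):
        next_frontier = []
        for p in frontier:
            for q in neighbors(p):
                if q not in dist:
                    dist[q] = d
                    next_frontier.append(q)
        frontier = next_frontier
    return dist


def reversal_distance_up_to(perm, limit):
    """Exact signed reversal distance if it is at most `limit`, otherwise None."""
    start = tuple(perm)
    identity = tuple(range(1, len(perm) + 1))
    half = (limit + 1) // 2
    near_start = ball(start, half)
    near_identity = ball(identity, limit - half)
    common = near_start.keys() & near_identity.keys()
    if not common:
        return None
    return min(near_start[p] + near_identity[p] for p in common)


slide_perm = [2, -4, -3, 5, -8, -7, -6, 1]
print(reversal_distance_up_to(slide_perm, 3))
print(reversal_distance_up_to(slide_perm, 4))
```

The search works because every reversal is its own inverse, so a shortest path of length d has a midpoint reachable from both ends. For the slide's permutation it finds no 3-step solution and confirms the 4-step one.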
![Page 32: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/32.jpg)
Pancake Flipping Problem
• The chef is sloppy; he prepares an unordered stack of pancakes of different sizes
• The waiter wants to rearrange them (so that the smallest winds up on top, and so on, down to the largest at the bottom)
• He does it by flipping over several from the top, repeating this as many times as necessary
Christos Papadimitriou and Bill Gates flip pancakes
![Page 33: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/33.jpg)
Pancake Flipping Problem: Formulation
• Goal: Given a stack of n pancakes, what is the minimum number of flips to rearrange them into a perfect stack?
• Input: Permutation π
• Output: A series of prefix reversals ρ1, …, ρt transforming π into the identity permutation such that t is minimum
![Page 34: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/34.jpg)
Pancake Flipping Problem: Greedy Algorithm
• Greedy approach: at most 2 prefix reversals to place a pancake in its right position, at most 2n – 2 steps in total
• William Gates and Christos Papadimitriou showed in the mid-1970s that this problem can be solved by at most 5/3 (n + 1) prefix reversals
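The greedy approach can be sketched as below: bring the largest unplaced pancake to the top, then flip it down into place. This is an illustrative sketch of the simple 2n – 2 strategy only, not the Gates and Papadimitriou method; the function name is hypothetical and the list is assumed to hold the stack from top to bottom.

```python
def pancake_sort(stack):
    """Sort `stack` (top of the stack first) using prefix reversals only.

    Returns the sorted stack and the list of prefix lengths that were flipped.
    """
    stack = list(stack)
    flips = []
    for size in range(len(stack), 1, -1):
        biggest = stack.index(max(stack[:size]))
        if biggest == size - 1:
            continue  # already in its final position
        if biggest != 0:
            # First flip: bring the largest unplaced pancake to the top.
            stack[:biggest + 1] = stack[:biggest + 1][::-1]
            flips.append(biggest + 1)
        # Second flip: send it from the top down to position `size`.
        stack[:size] = stack[:size][::-1]
        flips.append(size)
    return stack, flips


result, flips = pancake_sort([3, 1, 4, 2, 5])
print(result, flips)
```

Each pancake costs at most two flips and the last one needs none, which is where the 2n – 2 bound comes from.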
![Page 35: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/35.jpg)
Sorting By Reversals: A Greedy Algorithm
• If sorting permutation π = 1 2 3 6 4 5, the first three elements are already in order, so it does not make sense to break them.
• The length of the already sorted prefix of π is denoted prefix(π)
– prefix(π) = 3
• This results in an idea for a greedy algorithm: increase prefix(π) at every step
![Page 36: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/36.jpg)
Greedy Algorithm: An Example
• Doing so, π can be sorted:
1 2 3 6 4 5
1 2 3 4 6 5
1 2 3 4 5 6
• Number of steps to sort a permutation of length n is at most (n – 1)
![Page 37: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/37.jpg)
Greedy Algorithm: Pseudocode
SimpleReversalSort(π)
for i ← 1 to n – 1
    j ← position of element i in π (i.e., πj = i)
    if j ≠ i
        π ← π · ρ(i, j)
        output π
    if π is the identity permutation
        return
![Page 38: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/38.jpg)
Analyzing SimpleReversalSort
• SimpleReversalSort does not guarantee the smallest number of reversals and takes five steps on π = 6 1 2 3 4 5:
• Step 1: 1 6 2 3 4 5
• Step 2: 1 2 6 3 4 5
• Step 3: 1 2 3 6 4 5
• Step 4: 1 2 3 4 6 5
• Step 5: 1 2 3 4 5 6
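A direct Python rendering of SimpleReversalSort reproduces these five steps. The sketch uses 0-based list indices internally and returns the intermediate permutations instead of printing them; the function name is a hypothetical choice.

```python
def simple_reversal_sort(perm):
    """Greedy prefix-extending sort; returns the permutation after each reversal."""
    perm = list(perm)
    steps = []
    for i in range(len(perm) - 1):
        j = perm.index(i + 1)  # where element i+1 currently sits
        if j != i:
            perm[i:j + 1] = perm[i:j + 1][::-1]
            steps.append(list(perm))
    return steps


for step in simple_reversal_sort([6, 1, 2, 3, 4, 5]):
    print(step)
```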
![Page 39: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/39.jpg)
Analyzing SimpleReversalSort (cont’d)
• But it can be sorted in two steps:
π = 6 1 2 3 4 5
– Step 1: 5 4 3 2 1 6
– Step 2: 1 2 3 4 5 6
• So, SimpleReversalSort(π) is not optimal
• Optimal algorithms are unknown for many problems; approximation algorithms are used
![Page 40: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/40.jpg)
Approximation Algorithms
• These algorithms find approximate solutions rather than optimal solutions
• The approximation ratio of an algorithm A on input π is A(π) / OPT(π), where
– A(π) - solution produced by algorithm A
– OPT(π) - optimal solution of the problem
![Page 41: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/41.jpg)
Approximation Ratio/Performance Guarantee
• Approximation ratio (performance guarantee) of algorithm A: max approximation ratio of all inputs of size n
– For algorithm A that minimizes objective function (minimization algorithm):
• $\max_{|\pi| = n} A(\pi) / OPT(\pi)$
![Page 42: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/42.jpg)
Approximation Ratio/Performance Guarantee
• Approximation ratio (performance guarantee) of algorithm A: max approximation ratio of all inputs of size n
– For algorithm A that minimizes objective function (minimization algorithm):
• $\max_{|\pi| = n} A(\pi) / OPT(\pi)$
– For maximization algorithm:
• $\min_{|\pi| = n} A(\pi) / OPT(\pi)$
![Page 43: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/43.jpg)
Adjacencies and Breakpoints
$\pi = \pi_1 \pi_2 \pi_3 \ldots \pi_{n-1} \pi_n$
• A pair of elements $\pi_i$ and $\pi_{i+1}$ are adjacent if $\pi_{i+1} = \pi_i \pm 1$
• For example:
π = 1 9 3 4 7 8 2 6 5
• (3, 4), (7, 8) and (6, 5) are adjacent pairs
![Page 44: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/44.jpg)
Breakpoints: An Example
There is a breakpoint between any pair of neighboring elements that are not adjacent:
π = 1 9 3 4 7 8 2 6 5
• Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form breakpoints of permutation π
• b(π) - # breakpoints in permutation π
![Page 45: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/45.jpg)
Extending Permutations
• We put two elements $\pi_0 = 0$ and $\pi_{n+1} = n + 1$ at the ends of π
Example:
π = 1 9 3 4 7 8 2 6 5
Extending with 0 and 10:
π = 0 1 9 3 4 7 8 2 6 5 10
Note: A new breakpoint was created after extending
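Counting breakpoints on the extended permutation takes only a few lines. The function name is hypothetical; the sketch always adds 0 and n + 1 before counting, so its result includes any breakpoint created by the extension.

```python
def breakpoints(perm):
    """Number of breakpoints of `perm` after extending it with 0 and n + 1."""
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if abs(a - b) != 1)


print(breakpoints([1, 9, 3, 4, 7, 8, 2, 6, 5]))
```

For the example above this counts the five interior breakpoints plus the new one between 5 and 10.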
![Page 46: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/46.jpg)
Reversal Distance and Breakpoints
Each reversal eliminates at most 2 breakpoints.
π = 2 3 1 4 6 5
0 2 3 1 4 6 5 7 b(π) = 5
0 1 3 2 4 6 5 7 b(π) = 4
0 1 2 3 4 6 5 7 b(π) = 2
0 1 2 3 4 5 6 7 b(π) = 0
![Page 47: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/47.jpg)
Reversal Distance and Breakpoints
Each reversal eliminates at most 2 breakpoints. This implies:
reversal distance ≥ #breakpoints / 2
π = 2 3 1 4 6 5
0 2 3 1 4 6 5 7 b(π) = 5
0 1 3 2 4 6 5 7 b(π) = 4
0 1 2 3 4 6 5 7 b(π) = 2
0 1 2 3 4 5 6 7 b(π) = 0
![Page 48: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/48.jpg)
Sorting By Reversals: A Better Greedy Algorithm
BreakPointReversalSort(π)
while b(π) > 0
    Among all possible reversals, choose reversal ρ minimizing b(π · ρ)
    π ← π · ρ(i, j)
    output π
return
![Page 49: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/49.jpg)
Sorting By Reversals: A Better Greedy Algorithm
BreakPointReversalSort(π)
while b(π) > 0
    Among all possible reversals, choose reversal ρ minimizing b(π · ρ)
    π ← π · ρ(i, j)
    output π
return
Problem: this algorithm may work forever
![Page 50: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/50.jpg)
Strips
• Strip: an interval between two consecutive breakpoints in a permutation
– Decreasing strip: strip of elements in decreasing order (e.g. 6 5 and 3 2)
– Increasing strip: strip of elements in increasing order (e.g. 7 8)
0 1 9 4 3 7 8 2 5 6 10
– A single-element strip can be declared either increasing or decreasing. We will choose to declare them as decreasing, with the exception of the strips with 0 and n+1
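The strip decomposition, including the convention for single-element strips, can be sketched as follows. The function name and the "inc"/"dec" labels are hypothetical choices for this example.

```python
def strips(perm):
    """Split the extended permutation into strips, each labelled 'inc' or 'dec'."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    result = []
    start = 0
    for k in range(1, len(ext) + 1):
        # A strip ends at the end of the list or at a breakpoint.
        if k == len(ext) or abs(ext[k] - ext[k - 1]) != 1:
            strip = ext[start:k]
            if len(strip) == 1:
                # Singletons are decreasing, except those holding 0 or n + 1.
                kind = "inc" if strip[0] in (0, n + 1) else "dec"
            else:
                kind = "inc" if strip[1] > strip[0] else "dec"
            result.append((kind, strip))
            start = k
    return result


for kind, strip in strips([1, 9, 4, 3, 7, 8, 2, 5, 6]):
    print(kind, strip)
```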
![Page 51: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/51.jpg)
Reducing the Number of Breakpoints
Theorem 1:
If permutation π contains at least one decreasing strip, then there exists a reversal ρ which decreases the number of breakpoints (i.e. b(π · ρ) < b(π))
![Page 52: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/52.jpg)
Things To Consider
• For π = 1 4 6 5 7 8 3 2
0 1 4 6 5 7 8 3 2 9 b(π) = 5
– Choose decreasing strip with the smallest element k in π (k = 2 in this case)
![Page 53: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/53.jpg)
Things To Consider (cont’d)
• For π = 1 4 6 5 7 8 3 2
0 1 4 6 5 7 8 3 2 9 b(π) = 5
– Choose decreasing strip with the smallest element k in π (k = 2 in this case)
![Page 54: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/54.jpg)
Things To Consider (cont’d)
• For π = 1 4 6 5 7 8 3 2
0 1 4 6 5 7 8 3 2 9 b(π) = 5
– Choose decreasing strip with the smallest element k in π (k = 2 in this case)
– Find k – 1 in the permutation
![Page 55: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/55.jpg)
Things To Consider (cont’d)
• For π = 1 4 6 5 7 8 3 2
0 1 4 6 5 7 8 3 2 9 b(π) = 5
– Choose decreasing strip with the smallest element k in π (k = 2 in this case)
– Find k – 1 in the permutation
– Reverse the segment between k and k – 1:
– 0 1 4 6 5 7 8 3 2 9 b(π) = 5
– 0 1 2 3 8 7 5 6 4 9 b(π) = 4
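The construction behind Theorem 1 can be sketched directly from these steps. The function name is hypothetical, and the input is assumed to be an extended permutation (starting with 0, ending with n + 1) that contains at least one decreasing strip.

```python
def theorem1_reversal(ext):
    """Apply the reversal that puts the smallest decreasing-strip element k next to k - 1."""
    n = len(ext) - 2
    position = {value: p for p, value in enumerate(ext)}

    def in_increasing_strip(p):
        return ext[p - 1] == ext[p] - 1 or ext[p + 1] == ext[p] + 1

    # Interior elements not in an increasing strip count as decreasing
    # (this includes single-element strips).
    decreasing = [ext[p] for p in range(1, n + 1) if not in_increasing_strip(p)]
    k = min(decreasing)  # raises ValueError if there is no decreasing strip
    p, q = position[k], position[k - 1]
    lo, hi = (q + 1, p) if q < p else (p + 1, q)
    return ext[:lo] + ext[lo:hi + 1][::-1] + ext[hi + 1:]


print(theorem1_reversal([0, 1, 4, 6, 5, 7, 8, 3, 2, 9]))
```

When k – 1 lies to the left of k, the segment just after k – 1 up to k is flipped; when it lies to the right, the segment just after k up to k – 1 is flipped. Either way k and k – 1 become neighbors, removing a breakpoint.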
![Page 56: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/56.jpg)
Reducing the Number of Breakpoints Again
– If there is no decreasing strip, there may be no reversal ρ that reduces the number of breakpoints (i.e. b(π · ρ) ≥ b(π) for any reversal ρ).
– By reversing an increasing strip (the # of breakpoints stays unchanged), we create a decreasing strip. Then the number of breakpoints will be reduced in the next step (Theorem 1).
![Page 57: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/57.jpg)
Things To Consider (cont’d)
• There are no decreasing strips in π, for:
π = 0 1 2 5 6 7 3 4 8 b(π) = 3
π · ρ(6,7) = 0 1 2 5 6 7 4 3 8 b(π) = 3
• ρ(6,7) does not change the # of breakpoints
• ρ(6,7) creates a decreasing strip, thus guaranteeing that the next step will decrease the # of breakpoints.
![Page 58: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/58.jpg)
ImprovedBreakpointReversalSort
ImprovedBreakpointReversalSort(π)
while b(π) > 0
    if π has a decreasing strip
        Among all possible reversals, choose reversal ρ that minimizes b(π · ρ)
    else
        Choose a reversal ρ that flips an increasing strip in π
    π ← π · ρ
    output π
return
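A runnable sketch of this procedure for unsigned permutations is given below. It is an illustration under stated assumptions, not a reference implementation: the helper names are hypothetical, the best reversal is found by trying all of them, and when no decreasing strip exists the first interior increasing strip is flipped (the pseudocode allows any).

```python
def count_breakpoints(ext):
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def has_decreasing_strip(ext):
    # An interior element is in a decreasing strip unless an increasing neighbour flanks it.
    return any(ext[p - 1] != ext[p] - 1 and ext[p + 1] != ext[p] + 1
               for p in range(1, len(ext) - 1))


def flip(ext, i, j):
    return ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]


def improved_breakpoint_reversal_sort(perm):
    """Return the list of intermediate permutations, ending with the identity."""
    ext = [0] + list(perm) + [len(perm) + 1]
    n = len(perm)
    steps = []
    while count_breakpoints(ext) > 0:
        if has_decreasing_strip(ext):
            candidates = [flip(ext, i, j)
                          for i in range(1, n + 1) for j in range(i + 1, n + 1)]
            ext = min(candidates, key=count_breakpoints)
        else:
            # Flip the first increasing strip that does not contain 0.
            i = next(p for p in range(1, n + 1) if ext[p] != ext[p - 1] + 1)
            j = i
            while ext[j + 1] == ext[j] + 1:
                j += 1
            ext = flip(ext, i, j)
        steps.append(ext[1:-1])
    return steps


for step in improved_breakpoint_reversal_sort([1, 2, 5, 6, 7, 3, 4]):
    print(step)
```

Because a decreasing strip always allows a reversal that removes a breakpoint, and flipping an increasing strip always creates a decreasing one, the loop terminates after at most two steps per initial breakpoint.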
![Page 59: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/59.jpg)
ImprovedBreakpointReversalSort: Performance Guarantee
• ImprovedBreakPointReversalSort is an approximation algorithm with a performance guarantee of at most 4
– It eliminates at least one breakpoint in every two steps; at most 2b(π) steps
– Approximation ratio: 2b(π) / d(π)
– Optimal algorithm eliminates at most 2 breakpoints in every step: d(π) ≥ b(π) / 2
– Performance guarantee:
• $2b(\pi) / d(\pi) \le 2b(\pi) / (b(\pi) / 2) = 4$
![Page 60: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/60.jpg)
Signed Permutations
• Up to this point, all permutations to sort were unsigned
• But genes have directions… so we should consider signed permutations
5’ 3’
π = 1 -2 -3 4 -5
![Page 61: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/61.jpg)
GRIMM Web Server
• Real genome architectures are represented by signed permutations
• Efficient algorithms to sort signed permutations have been developed
• GRIMM web server computes the reversal distances between signed permutations:
http://nbcr.sdsc.edu/GRIMM/grimm.cgi
![Page 62: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/62.jpg)
GRIMM Web Server
http://www-cse.ucsd.edu/groups/bioinformatics/GRIMM
![Page 63: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/63.jpg)
Sorting by reversals
![Page 64: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/64.jpg)
Overview
Genome correspondence
Chromosome evolution
Genome rearrangements
Sorting by reversals
Genome duplication
Duplicate gene evolution
Duplication and rearrangements
![Page 65: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/65.jpg)
[Figure: phylogenetic tree of S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus and K. waltii, with time marks at 5 Myr, 20 Myr and 100 Myr]
Further back in evolutionary time
Ability to ask a different set of questions
![Page 66: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/66.jpg)
Whole Genome Duplication (WGD) in Yeast?
Wolfe ‘97
[Figure: paralogous genes between Scer chr 4 and Scer chr 12]
• Genomic evidence
– Conserved order of paralogous genes
– Same transcriptional orientation
• However
– Interspersed with single-copy genes
Interpretation: Genome duplication followed by gene loss
![Page 67: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/67.jpg)
Whole genome duplication is controversial
• “There was a whole-genome duplication.” Wolfe, Nature ‘97
• “There was no whole-genome duplication.” Dujon, FEBS 2000
• “At least some chrom dup. occurred independently” Langkjaer, JMB, 2000
• “Dynamic equilibrium of duplications and loss” Llorente, FEBS, 2000
• “Recent evidence supports single event”. Wong, PNAS ‘02
• “Continuous block duplications and deletions” Dujon, Yeast 2003
• “Dup. precedes divergence from Kluyveromyces.” Piskur, Nature, 2003
• “Telomere-mediated duplication events” Coissac, Mol Bio Evo 1997
• “Multiple closely spaced events” Friedman, Genome Res, 2003
• “Spontaneous duplication of large chromosomal segments” Koszul, EMBO ’04
• Insufficient evidence
– Only 50% of genome in duplicate regions
– Only 8% of genes present in two copies
– Extensive redundancy outside duplicate regions
• Evidence against WGD
– Divergence-based dating shows multiple times
– Other species have similar level of redundancy
• Alternative evolutionary scenario proposed
– Independent segmental duplications
– Also consistent with the evidence
Evidence remains inconclusive
![Page 68: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/68.jpg)
Missing evidence supporting WGD
Non-duplicated relative
![Page 69: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/69.jpg)
[Figure: dot plot of K. waltii scaffolds against S. cerevisiae chromosomes I to XVI; genome sizes marked as 10.5 Mb and 12 Mb]
Gene correspondence
![Page 70: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/70.jpg)
[Figure: dot plot of K. waltii scaffolds against S. cerevisiae chromosomes I to XVI]
Gene correspondence
![Page 71: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/71.jpg)
Sister regions show gene interleaving
[Figure: K. waltii regions shown alongside their matching S. cerevisiae sister regions]
Gene interleaving is evidence of complete duplication
Few genes remain in 2 copies
![Page 72: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/72.jpg)
Duplicate mapping tiles K. waltii
[Figure: K. waltii chromosomes (Chr 1 to Chr 8); each K. waltii region is shown with its two matching S. cerevisiae regions (Scer copy1, Scer copy2)]
![Page 73: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/73.jpg)
Doubly Conserved Synteny Blocks (DCS)
• 253 DCS blocks were identified containing 75% of K. waltii genes and 81% of S. cerevisiae genes
• A typical DCS block has 27 genes (largest block has 81 genes).
• DCS blocks are separated by ~3 genes on average.
• In a DCS block 90% of Kw genes have a match in at least 1 of the 2 Sc regions.
• 47 blocks have no duplicated gene.
![Page 74: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/74.jpg)
Duplicate mapping of centromeres
Recognize sister regions solely based on gene order
![Page 75: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/75.jpg)
Duplicate mapping tiles S. cerevisiae
[Figure: S. cerevisiae chromosomes tiled by blocks]
145 blocks cover 88% of genome
![Page 76: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/76.jpg)
Whole-genome duplication resolved
[Figure: number of genes over time: about 5,000 before the WGD, 10,000 after it (100 Myrs ago), then gene loss down to 5,500 today, with ~500 gained]
![Page 77: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/77.jpg)
Overview
Genome correspondence
Chromosome evolution
Genome rearrangements
Sorting by reversals
Genome duplication
Duplicate gene evolution
Duplication and rearrangements
![Page 78: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/78.jpg)
Accelerated gene divergence
• Ohno hypothesized that after duplication, one copy would preserve the original function, and the other copy would be free to diverge. Others argued that both copies would diverge.
• 76 of 457 duplicated gene pairs show accelerated evolution. In 95% of the cases, acceleration was limited to one of the 2 paralogs.
• Deletion of the ancestral paralog is lethal in 18% of the cases.
• Deletion of a derived paralog is never lethal.
![Page 79: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/79.jpg)
Fate of duplicated genes
S. cerevisiae copy 1
S. cerevisiae copy 2
K. waltii
Evidence of accelerated protein divergence ?
• 457 genes kept in two copies, result of selection
– Involved in sugar metabolism and fermentation
WGD
![Page 80: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/80.jpg)
Scenarios for rapid gene evolution
[Figure: two trees relating Kwal, Scer - copy1 and Scer - copy2]
One copy faster (Ohno, 1970)
Both copies faster (Lynch, 2000)
20% of duplicated genes show acceleration
95% of cases: Only one copy faster
![Page 81: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/81.jpg)
Emerging gene functions after duplication
Asymmetric divergence lets us recognize the ancestral and derived copies
Scer - Sir3 (silencing)
Scer - Orc1 (origin of replication)
Kwal - Orc1
4-fold acceleration
Scer - Ski7 (anti-viral defense)
Scer - Hbs1 (translation initiation)
Kwal - Hbs1
3-fold acceleration
• Origin of replication → silencing
• Translation initiation → anti-viral defense
![Page 82: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/82.jpg)
Distinct functional properties
Gain new function and lose ancestral function
| | Ancestral function | Derived function |
|---|---|---|
| Gene deletion | Lethal (20%) | Never lethal |
![Page 83: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/83.jpg)
Distinct functional properties
Gain new function and lose ancestral function
| | Ancestral function | Derived function |
|---|---|---|
| Gene deletion | Lethal (20%) | Never lethal |
| Expression | Abundant | Specific (stress, starvation) |
| Localization | General | Specific (mitochondrion, spores) |
![Page 84: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/84.jpg)
Decelerated evolution
• 60 gene pairs (13% of 457 pairs)
– 98% protein identity (all pairs: 55%)
– 90% identity in 4-fold degenerate sites (all pairs: 41%)
• Not recent duplication
– Gene order argues for ancestral WGD pairs
[Figure: tree of Scer copy1, Scer copy2 and Kwal]
Gene conversion?
![Page 85: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/85.jpg)
Evidence of gene conversion
Periodic gene conversion
[Figure: gene trees of YER102W and YBL072C in S. cerevisiae and S. bayanus, with K. waltii and A. gossypii; WGD marked]
• Tree root reveals time of duplication
– No acceleration in the K. waltii branch
– The two genes have recently replaced each other
• Branching order reveals gene conversion
– Paralogs are closer to each other than to their ortholog
– Both S. cerevisiae and S. bayanus show gene conversion
![Page 86: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/86.jpg)
Evolutionary genomics in yeast
• Genome ancestry resolved
– Whole-genome duplication
– Massive gene loss
• Emergence of new functions
– Asymmetric acceleration
– Ancestral and derived functions
– Repository for buffering mutations
![Page 87: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/87.jpg)
![Page 88: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/88.jpg)
Genome duplication in a vertebrate
![Page 89: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/89.jpg)
![Page 90: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/90.jpg)
Mammals: How many WGDs?
![Page 91: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/91.jpg)
What did the pre-duplicated ancestor look like?
• Can we derive the architecture of the current genomes (human and Tetraodon) in terms of the common ancestor?
• What was the sequence of rearrangement events after WGD?
![Page 92: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/92.jpg)
• What is the architecture of the ancestral genome?
• What is the evolutionary scenario for transforming one genome into the other?
Unknown ancestor, ~80 million years ago
Mouse (X chrom.)
Human (X chrom.)
Genome rearrangements
![Page 93: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/93.jpg)
History of Chromosome X
Rat Consortium, Nature, 2004
![Page 94: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/94.jpg)
Reversals
• Blocks represent conserved genes.
• In the course of evolution, blocks 1, 2, …, 9, 10 could be transformed into 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
![Page 95: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/95.jpg)
Reversals
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
Blocks represent conserved genes. In the course of evolution, blocks 1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.
Such reversals occurred once or twice every million years on the evolutionary path between human and mouse.
![Page 96: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/96.jpg)
Reversals
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
The inversion introduced two breakpoints (disruptions in order).
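The breakpoint count on this slide can be checked with a short sketch. It assumes the common convention of framing the signed permutation with 0 on the left and n+1 on the right, and of calling a neighbouring pair (a, b) an adjacency when b - a = 1; the function name `count_breakpoints` is illustrative only.

```python
def count_breakpoints(perm):
    """Count breakpoints of a signed permutation of 1..n.

    Assumption: the permutation is framed by 0 and n + 1, and a
    neighbouring pair (a, b) is an adjacency exactly when b - a == 1.
    """
    n = len(perm)
    extended = [0] + list(perm) + [n + 1]
    # Every neighbouring pair that is not an adjacency is a breakpoint.
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


inverted = [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
print(count_breakpoints(inverted))
```

Under this convention the inverted order above has breakpoints only at the two ends of the reversed segment, between 3 and -8 and between -4 and 9.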
![Page 97: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/97.jpg)
Sorting by reversals
Most parsimonious scenarios
Step 0: 2 -4 -3 5 -8 -7 -6 1
Step 1: 2 3 4 5 -8 -7 -6 1
Step 2: -5 -4 -3 -2 -8 -7 -6 1
Step 3: -5 -4 -3 -2 -1 6 7 8
Step 4: 1 2 3 4 5 6 7 8 (= g)
The reversal distance is the minimum number of reversals required to transform the starting permutation into g.
Here, the reversal distance is d = 4.
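The four-step scenario on this slide can be replayed mechanically. The sketch below assumes a reversal acts on a contiguous segment by reversing its order and flipping every sign; the index pairs in `scenario` are read off the slide's steps and are 0-based and inclusive. It only replays the given scenario and does not prove that four is the minimum.

```python
def reverse_segment(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the sign of each element."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


start = [2, -4, -3, 5, -8, -7, -6, 1]
# Segment boundaries for Step 1 to Step 4, inferred from the slide.
scenario = [(1, 2), (0, 3), (4, 7), (0, 4)]

history = [start]
for i, j in scenario:
    history.append(reverse_segment(history[-1], i, j))

for step, perm in enumerate(history):
    print(f"Step {step}:", " ".join(str(x) for x in perm))
```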
![Page 98: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/98.jpg)
Breakpoint graph
G(g)
0 2 -4 -3 5 -8 -7 -6 1 9
0b 2a 2b 4b 4a 3b 3a 5a 5b 8b 8a 7b 7a 6b 6a 1a 1b 9a
6 breakpoints (consecutive elements in both genomes disagree)
3 adjacencies (consecutive elements in both genomes agree)
Duality Theorem:
reversal distance = #genes + 1 – #cycles + h
where h is rather complicated, but can be computed from the breakpoint graph in polynomial time.
Here, reversal distance = 8 + 1 – 5 + 0 = 4 (h = 0)
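The cycle count used in the arithmetic above can be reproduced with a small sketch. It assumes the usual construction: each signed element x is split into two points (numbered 2|x|-1 and 2|x| here, playing the role of the a/b labels on the slide), black edges join neighbouring points of the given permutation, and gray edges join neighbouring points of the identity. The function ignores the correction term h, so it implements only the simple form of the formula.

```python
def count_cycles(perm):
    """Count alternating black-gray cycles of a signed permutation against the identity."""
    n = len(perm)
    # Unsigned image: 0, then two points per element, then 2n + 1.
    points = [0]
    for x in perm:
        a, b = 2 * abs(x) - 1, 2 * abs(x)
        points += [a, b] if x > 0 else [b, a]
    points.append(2 * n + 1)

    # Black edges: consecutive point pairs in the given permutation.
    black = {}
    for k in range(0, len(points), 2):
        u, v = points[k], points[k + 1]
        black[u], black[v] = v, u
    # Gray edges: consecutive point pairs in the identity permutation.
    gray = {}
    for k in range(0, 2 * n + 2, 2):
        gray[k], gray[k + 1] = k + 1, k

    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:  # follow black, then gray, until the cycle closes
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray[w]
    return cycles


perm = [2, -4, -3, 5, -8, -7, -6, 1]
cycles = count_cycles(perm)
print(cycles, len(perm) + 1 - cycles)
```

For this permutation the hurdle term is zero according to the slide, so the simple formula and the full theorem agree.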
![Page 99: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/99.jpg)
Breakpoint graph
Duality Theorem for Sorting by Reversals - simple and imprecise version.
reversal distance = number of elements + 1 – number of cycles
![Page 100: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/100.jpg)
Constructing Breakpoint Graph: Dot Plot
![Page 101: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/101.jpg)
Constructing Breakpoint Graph: Black Path
![Page 102: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/102.jpg)
Constructing Breakpoint Graph: Gray Path
![Page 103: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/103.jpg)
Constructing Breakpoint Graph: Superimposing Two Paths
![Page 104: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/104.jpg)
Constructing Breakpoint Graph: Removing Dot-Plot
![Page 105: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/105.jpg)
Human-mouse breakpoint graph
![Page 106: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/106.jpg)
Constructing Rearrangement Scenarios
![Page 107: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/107.jpg)
Reconstructing pre-duplicated Genome
• WGD of genome R results in perfect duplicated genome R+R
• R+R becomes subject to rearrangements that shuffle genes in R+R and result in some rearranged duplicated genome P
• Problem: reconstruct pre-duplicated genome R from rearranged duplicated genome P.
![Page 108: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/108.jpg)
Genome Halving Problem
• Genome Halving Problem: Given a duplicated genome P, recover the ancestral pre-duplicated genome R minimizing the reversal distance from R+R to P
![Page 109: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/109.jpg)
Illustration
R = +a -d +e -c +b
R+R = +a -d +e -c +b +a -d +e -c +b
      +a -d +d -a -b +c -e +e -c +b
      +a -d +d +c -e +e -c +b +a +b
      +a -d +d -e +e -c -c +b +a +b
P   = +a -d +e -d +e -c -c +b +a +b
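The chain of genomes in this illustration can be checked step by step: each genome should be obtainable from the previous one by a single reversal. The sketch below is a brute-force check under that assumption; `find_reversal` is a hypothetical helper that tries every segment, and the genome strings are the five 10-gene rows from R+R down to P.

```python
def flip(gene):
    """Flip the sign of a signed gene such as '+a' or '-d'."""
    return ("-" if gene[0] == "+" else "+") + gene[1:]


def reverse_segment(genome, i, j):
    """Reverse genome[i..j] (0-based, inclusive) and flip every sign in it."""
    return genome[:i] + [flip(g) for g in reversed(genome[i:j + 1])] + genome[j + 1:]


def find_reversal(src, dst):
    """Return the first (i, j) whose reversal turns src into dst, or None."""
    n = len(src)
    for i in range(n):
        for j in range(i, n):
            if reverse_segment(src, i, j) == dst:
                return i, j
    return None


steps = [
    "+a -d +e -c +b +a -d +e -c +b",  # R+R
    "+a -d +d -a -b +c -e +e -c +b",
    "+a -d +d +c -e +e -c +b +a +b",
    "+a -d +d -e +e -c -c +b +a +b",
    "+a -d +e -d +e -c -c +b +a +b",  # P
]
genomes = [s.split() for s in steps]
reversals = [find_reversal(a, b) for a, b in zip(genomes, genomes[1:])]
print(reversals)
```

Every consecutive pair is connected by one reversal, so this scenario transforms R+R into P in four reversals.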
![Page 110: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/110.jpg)
Recover it!
R = ?? ?? ?? ?? ??
P = +a -d +e -d +e -c -c +b +a +b
![Page 111: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/111.jpg)
Even worse!
R = ?? ?? ?? ?? ??
P = +a -d +e -d +e -c -c +b +a +b
![Page 112: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/112.jpg)
Suppose we somehow figured out what R is: would it help to find d(P, R+R)?
R = +a +c +d +e +b
R+R = +a +c +d +e +b +a +c +d +e +b
P = +a -d +e -d +e -c -c +b +a +b
![Page 113: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/113.jpg)
???
+a +c +d +e +b
+a +c +d +e +b +a +c +d +e +b
+a -e -d -c -a -b -e -d -c +b
+a -e -d -c +c +d +e +b +a +b
+a -e -d -e -d -c +c +b +a +b
+a +d +e +d +e -c +c +b +a +b
+a +d +e +d +e -c -c +b +a +b
+a +d +e -d +e -c -c +b +a +b
+a -d +e -d +e -c -c +b +a +b
![Page 114: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/114.jpg)
HP-theory: reminder
• Transforming signed gene order
+a +b -c
into unsigned gene order
a_t a_h b_t b_h c_h c_t
• Elements x_t and x_h are called an obverse pair
• t stands for tail and h stands for head
Breakpoint graph is formed by 3 matchings:
obverse matching
black matching (adjacent elements in 1st permutation)
gray matching (adjacent elements in 2nd permutation)
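The signed-to-unsigned transformation and two of the matchings above can be sketched as follows. This is an illustration only: it treats the gene order as linear (the slides later speak of circular permutations, where one more adjacency would close the circle), and the function names are hypothetical.

```python
def to_unsigned(genome):
    """Replace each signed gene by its tail/head pair; a minus sign swaps the pair."""
    out = []
    for gene in genome.split():
        sign, name = gene[0], gene[1:]
        ends = [name + "t", name + "h"]
        out += ends if sign == "+" else ends[::-1]
    return out


def obverse_matching(unsigned):
    """Pair the two ends of the same gene (positions 0-1, 2-3, ...)."""
    return [(unsigned[k], unsigned[k + 1]) for k in range(0, len(unsigned), 2)]


def adjacency_matching(unsigned):
    """Pair the ends of neighbouring genes (positions 1-2, 3-4, ...), linear case."""
    return [(unsigned[k], unsigned[k + 1]) for k in range(1, len(unsigned) - 1, 2)]


unsigned = to_unsigned("+a +b -c")
print(unsigned)
print(obverse_matching(unsigned))
print(adjacency_matching(unsigned))
```

Applying `adjacency_matching` to the first permutation gives a black matching and applying it to the second gives a gray matching, in the terminology of the slide.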
![Page 115: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/115.jpg)
HP-theory: reminder
• Breakpoint graph is formed by obverse, black and gray matchings.
• Every pair of matchings forms a collection of alternating cycles:
– black-gray cycles (#cycles in HP theory)
– a single black-obverse cycle (1st permutation)
– a single gray-obverse cycle (2nd permutation)
reversal distance between two circular permutations = #elements - #black-gray cycles
![Page 116: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/116.jpg)
Reversal distance between duplicated genomes
• While there exist fast algorithms for computing reversal distance between permutations (i.e., no duplicate genes), the problem of computing reversal distance between genomes with duplicated genes remains unsolved.
• Solution: label different copies of each gene (k! different labelings for a gene with k copies)
• One of these labelings is necessarily an optimal labeling, corresponding to the optimal rearrangement scenario
• Running time: (k!)^n invocations of the HP algorithm for a genome with n genes, each present in k copies.
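To make the size of the labeling search space concrete, the sketch below enumerates every labeling of one duplicated genome. It is only an illustration of the counting argument: the copy-number suffixes (`+a1`, `+a2`, ...) and the function name `labelings` are assumptions, and the HP algorithm itself is not invoked.

```python
from itertools import permutations, product
from math import factorial


def labelings(genome):
    """Yield every assignment of distinct copy numbers to repeated genes."""
    genes = genome.split()
    positions = {}
    for idx, g in enumerate(genes):
        positions.setdefault(g[1:], []).append(idx)
    names = sorted(positions)
    # For a gene with k copies there are k! ways to number its copies.
    choices = [list(permutations(range(1, len(positions[name]) + 1)))
               for name in names]
    for combo in product(*choices):
        labeled = list(genes)
        for name, order in zip(names, combo):
            for idx, copy in zip(positions[name], order):
                labeled[idx] = f"{genes[idx]}{copy}"
        yield labeled


P = "+a -d +e -d +e -c -c +b +a +b"
all_labelings = list(labelings(P))
print(len(all_labelings), factorial(2) ** 5)
```

With n = 5 genes in k = 2 copies the count is already (2!)^5 = 32, and it grows exponentially in n, which is why exhaustive labeling is not a practical strategy for real genomes.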
![Page 117: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/117.jpg)
Labelings and breakpoint graphs
• Every labeling transforms genomes with duplicated genes into genomes without duplicated genes and enables applications of HP algorithm.
• Every labeling corresponds to a breakpoint graph
• Good labelings correspond to breakpoint graphs with large number of cycles.
• Can we construct a labeling corresponding to a large number of cycles?
![Page 118: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/118.jpg)
Rearrangements in Duplicated Genomes: Challenges.
• Computing d(P,Q). Can we construct a labeling of duplicated genomes P and Q maximizing the number of cycles? NO
![Page 119: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/119.jpg)
![Page 120: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/120.jpg)
![Page 121: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/121.jpg)
Rearrangements in Duplicated Genomes: Challenges.
• Computing d(P,Q). Can we construct a labeling of duplicated genomes P and Q maximizing the number of cycles? NO
• Computing d(P,R+R). Can we construct a labeling of duplicated genomes P and R+R maximizing the number of cycles? NO
• Computing minR d(P,R+R). YES!
![Page 122: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/122.jpg)
Rearrangements in Duplicated Genomes: Challenges.
• Breakpoint graphs are not defined for duplicated genomes.
• Can we generalize the notion of breakpoint graph for the case of duplicated genomes?
• Idea: Explore the connection between de Bruijn graphs and breakpoint graphs.
![Page 123: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/123.jpg)
![Page 124: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/124.jpg)
De Bruijn Graphs
• De Bruijn graph: Given a set of edge-labeled graphs, de Bruijn graph of this set is the result of “gluing” edges with the same label in all graphs in the set.
• Did we see de Bruijn graphs today?
![Page 125: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/125.jpg)
De Bruijn graph of a path
![Page 126: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/126.jpg)
de Bruijn graph of another path
![Page 127: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/127.jpg)
![Page 128: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/128.jpg)
De Bruijn Graphs
• De Bruijn graph: Given a set of edge-labeled graphs, de Bruijn graph of this set is the result of “gluing” edges with the same label in all graphs in the set.
• Did we see de Bruijn graphs today?
• Breakpoint graph of permutations P and Q == de Bruijn graph of P-cycle and Q-cycle
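The "gluing" in the definition above can be sketched with a union-find structure. This is one possible reading, stated as an assumption: gluing two edges with the same label means identifying their start vertices with each other and their end vertices with each other. The input format (lists of `(u, v, label)` triples with vertex names local to each graph) and the function name `glue_edges` are hypothetical.

```python
def glue_edges(graphs):
    """Glue equally labeled edges across a list of edge-labeled graphs.

    Returns the set of glued vertices (one representative per class)
    and the set of distinct glued edges.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    first_seen = {}
    for gid, edges in enumerate(graphs):
        for u, v, label in edges:
            gu, gv = (gid, u), (gid, v)
            find(gu)
            find(gv)
            if label in first_seen:
                fu, fv = first_seen[label]
                union(fu, gu)  # identify start vertices
                union(fv, gv)  # identify end vertices
            else:
                first_seen[label] = (gu, gv)

    glued = {(find((gid, u)), find((gid, v)), label)
             for gid, edges in enumerate(graphs) for u, v, label in edges}
    vertices = {find(x) for x in list(parent)}
    return vertices, glued


# A single path spelling the labels a, b, a, c.
path = [(0, 1, "a"), (1, 2, "b"), (2, 3, "a"), (3, 4, "c")]
vertices, edges = glue_edges([path])
print(len(vertices), len(edges))
```

In this toy path the two edges labeled a collapse into one, so five path vertices become three glued vertices and four path edges become three glued edges.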
![Page 129: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/129.jpg)
![Page 130: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/130.jpg)
De Bruijn Graphs
• Breakpoint graph of permutations P and Q
==
de Bruijn graph of P-cycle and Q-cycle
• Breakpoint graph of any genomes P and Q
(with multiple gene copies)
==
de Bruijn graph of P-cycle and Q-cycle
![Page 131: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/131.jpg)
![Page 132: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/132.jpg)
![Page 133: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/133.jpg)
Greedy Algorithms And
Genome Rearrangements
![Page 134: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/134.jpg)
Outline
• Transforming Cabbage into Turnip
• Genome Rearrangements
• Sorting By Reversals
• Pancake Flipping Problem
• Greedy Algorithm for Sorting by Reversals
• Approximation Algorithms
• Breakpoints: a Different Face of Greed
![Page 135: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/135.jpg)
Turnip vs Cabbage: Look and Taste Different
• Although cabbages and turnips share a recent common ancestor, they look and taste different
![Page 136: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/136.jpg)
Turnip vs Cabbage: Comparing Gene Sequences Yields No Evolutionary Information
![Page 137: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/137.jpg)
Turnip vs Cabbage: Almost Identical mtDNA gene sequences
• In the 1980s Jeffrey Palmer studied the evolution of plant organelles by comparing mitochondrial genomes of the cabbage and turnip
• 99% similarity between genes
• These surprisingly identical gene sequences differed in gene order
• This study helped pave the way to analyzing genome rearrangements in molecular evolution
![Page 138: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/138.jpg)
Turnip vs Cabbage: Different mtDNA Gene Order
• Gene order comparison:
![Page 139: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/139.jpg)
![Page 140: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/140.jpg)
![Page 141: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/141.jpg)
![Page 142: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/142.jpg)
Turnip vs Cabbage: Different mtDNA Gene Order
• Gene order comparison:
Before
After
Evolution is manifested as the divergence in gene order
![Page 143: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/143.jpg)
Transforming Cabbage into Turnip
![Page 144: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/144.jpg)
• What are the similarity blocks and how to find them?
• What is the architecture of the ancestral genome?
• What is the evolutionary scenario for transforming one genome into the other?
Unknown ancestor, ~75 million years ago
Mouse (X chrom.)
Human (X chrom.)
Genome rearrangements
![Page 145: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/145.jpg)
History of Chromosome X
Rat Consortium, Nature, 2004
![Page 146: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/146.jpg)
Reversals
• Blocks represent conserved genes.
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
![Page 147: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/147.jpg)
Reversals
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
Blocks represent conserved genes. In the course of evolution or in a clinical context, blocks
1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.
![Page 148: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/148.jpg)
Reversals and Breakpoints
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
The reversal introduced two breakpoints (disruptions in order).
![Page 149: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/149.jpg)
Reversals: Example
5’ ATGCCTGTACTA 3’
3’ TACGGACATGAT 5’
Break and Invert
5’ ATGTACAGGCTA 3’
3’ TACATGTCCGAT 5’
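At the sequence level, inverting a double-stranded segment replaces it on the top strand by its reverse complement. The sketch below reproduces the example on this slide; the segment boundaries (positions 3 to 8, 0-based) are inferred by comparing the two top strands and are not stated on the slide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(seq, start, end):
    """Invert seq[start:end] (0-based, end exclusive) on the top strand."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


top = "ATGCCTGTACTA"
inverted_top = invert_segment(top, 3, 9)
# The bottom strand, written 3' to 5', is the plain complement of the top strand.
inverted_bottom = inverted_top.translate(COMPLEMENT)
print("5' " + inverted_top + " 3'")
print("3' " + inverted_bottom + " 5'")
```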
![Page 150: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/150.jpg)
Types of Rearrangements
Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6
Translocation: 1 2 3 and 4 5 6 → 1 2 6 and 4 5 3
Fusion: two chromosomes → 1 2 3 4 5 6
Fission: 1 2 3 4 5 6 → two chromosomes
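The four rearrangement types can be written as small list operations. This is a minimal sketch with chromosomes as Python lists of signed block numbers; the split points used for translocation, fusion and fission are assumptions chosen for illustration, since the slide's figure does not survive in the text.

```python
def reversal(chrom, i, j):
    """Reverse chrom[i..j] (0-based, inclusive) and flip the signs."""
    return chrom[:i] + [-g for g in reversed(chrom[i:j + 1])] + chrom[j + 1:]


def translocation(c1, c2, i, j):
    """Exchange the tails c1[i:] and c2[j:] between two chromosomes."""
    return c1[:i] + c2[j:], c2[:j] + c1[i:]


def fusion(c1, c2):
    """Join two chromosomes end to end."""
    return c1 + c2


def fission(chrom, i):
    """Split one chromosome into two at position i."""
    return chrom[:i], chrom[i:]


print(reversal([1, 2, 3, 4, 5, 6], 2, 4))
print(translocation([1, 2, 3], [4, 5, 6], 2, 2))
print(fusion([1, 2, 3, 4], [5, 6]))
print(fission([1, 2, 3, 4, 5, 6], 4))
```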
![Page 151: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/151.jpg)
Comparative Genomic Architectures: Mouse vs Human Genome
• Humans and mice have similar genomes, but their genes are ordered differently
• ~245 rearrangements
– Reversals
– Fusions
– Fissions
– Translocation
![Page 152: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/152.jpg)
Waardenburg’s Syndrome: Mouse Provides Insight into Human Genetic Disorder
• Waardenburg’s syndrome is characterized by pigmentary dysplasia
• The gene implicated in the disease was linked to human chromosome 2, but it was not clear where exactly it was located on chromosome 2
![Page 153: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/153.jpg)
Waardenburg’s syndrome and splotch mice
• A breed of mice (with splotch gene) had similar symptoms caused by the same type of gene as in humans
• Scientists succeeded in identifying location of gene responsible for disorder in mice
• Finding the gene in mice gives clues to where the same gene is located in humans
![Page 154: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/154.jpg)
Comparative Genomic Architecture of Human and Mouse Genomes
To locate where corresponding gene is in humans, we have to analyze the relative architecture of human and mouse genomes
![Page 155: 6.096 Lecture 10](https://reader034.fdocuments.net/reader034/viewer/2022051001/56814960550346895db6b4ba/html5/thumbnails/155.jpg)