Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.
-
Upload
ethan-hill -
Category
Documents
-
view
216 -
download
0
Transcript of Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.
![Page 1: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/1.jpg)
Physical Mapping of DNAPhysical Mapping of DNA
BIO/CS 471 – Algorithms for BioinformaticsBIO/CS 471 – Algorithms for Bioinformatics
![Page 2: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/2.jpg)
Physical Mapping 2
Landmarks on the genomeLandmarks on the genome Identify the order and/or location of sequence
landmarks on the DNA
BamH1 – GGATCC
Restriction Enzyme Digests Hybridization Mapping
y x z w
Probes
Clones
![Page 3: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/3.jpg)
Physical Mapping 3
Producing a map of the genomeProducing a map of the genome
x ij a
m u
z d w f
e
m u z d
![Page 4: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/4.jpg)
Physical Mapping 4
Restriction Fragment MappingRestriction Fragment Mapping
3 8 6 10
4 5 11 7
3 1 5 2 6 3 7
A:
B:
A + B:
![Page 5: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/5.jpg)
Physical Mapping 5
Set PartitioningSet Partitioning The Set Partition Problem
• Input: X = {x1, x2, x3, … xn}
• Output: Partition of X into Y and Z such thatY = Z
This problem is NP Complete Suppose we have X = {3, 9, 6, 5, 1}
• Can we recast the problem as a double digest problem?
![Page 6: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/6.jpg)
Physical Mapping 6
Reduction from Set PartitionReduction from Set Partition X = {3, 9, 6, 5, 1}
1 3 5 6 9
12 12
1 3 5 6 9
19 3 5 6
19 3 5 6
12 12
![Page 7: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/7.jpg)
Physical Mapping 7
NP-CompletenessNP-Completeness Suppose we could solve the Double Digest
Problem in polynomial time…
Instance ofSet Partition Convert*
Instance ofDouble Digest
Solve in polynomial time
*polynomial time conversion
![Page 8: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/8.jpg)
Physical Mapping 8
Hybridization MappingHybridization Mapping The sequence of the
clones remains unknown
The relative order of the probes is identified
The sequence of the probes is known in advance
y x z w
Probes
Clones
![Page 9: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/9.jpg)
Physical Mapping 9
Interval graph representationInterval graph representation
ab
dc
e
b
d
a c e
Becomes a graph coloring problem, which is (you guessed it) NP-Complete
Becomes a graph coloring problem, which is (you guessed it) NP-Complete
![Page 10: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/10.jpg)
Physical Mapping 10
Simplifying AssumptionsSimplifying Assumptions Probes are unique
• Hybridize only once along the target DNA
There are no errors Every probe hybridizes at every possible
position on every possible clone
![Page 11: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/11.jpg)
Physical Mapping 11
Consecutive Ones ProblemConsecutive Ones Problem (C1P) (C1P)
c 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8 c 9
l 1 1 1 0 1 1 0 1 0 1
l 2 0 1 1 1 1 1 1 1 1
l 3 0 1 0 1 1 0 1 0 1
l 4 0 0 1 0 0 0 0 1 0
l 5 0 0 1 0 0 1 0 0 0
l 6 0 0 0 1 0 0 1 0 0
l 7 0 1 0 0 0 0 1 0 0
l 8 0 0 0 1 1 0 0 0 1
Clones:
ProbesRearrange the columns such that all the ones in every row are together:
![Page 12: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/12.jpg)
Physical Mapping 12
An algorithm for An algorithm for C1PC1P
1. Separate the rows (clones) into components
2. Permute the components
3. Merge the permuted components
c 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8 c 9
l 1 1 1 0 1 1 0 1 0 1
l 2 0 1 1 1 1 1 1 1 1
S1 = {1, 2, 4, 5, 7, 9}S2 = {2, 3, 4, 5, 6, 7, 8, 9}
![Page 13: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/13.jpg)
Physical Mapping 13
Partitioning clones into componentsPartitioning clones into components Component graph Gc
• Nodes correspond to clones
• Connect li and lj iff:
ij
ji
ji
SS
SS
SS
3.
2.
1.
![Page 14: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/14.jpg)
Physical Mapping 14
Component GraphComponent Graph
c 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8 c 9
l 1 1 1 0 1 1 0 1 0 1
l 2 0 1 1 1 1 1 1 1 1
l 3 0 1 0 1 1 0 1 0 1
l 4 0 0 1 0 0 0 0 1 0
l 5 0 0 1 0 0 1 0 0 0
l 6 0 0 0 1 0 0 1 0 0
l 7 0 1 0 0 0 0 1 0 0
l 8 0 0 0 1 1 0 0 0 1
l1
l8
l4
l5
l2 l3
l6 l7
Here, connected components are labeled with greek letters.
![Page 15: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/15.jpg)
Physical Mapping 15
Assembling a componentAssembling a component Consider only row 1 of the following:
Placing all of the ones together, we can place columns 2, 7, and 8 in any order
c 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8
l 1 0 1 0 0 0 0 1 1
l 2 0 1 0 0 1 0 1 0
l 3 1 0 0 1 0 0 1 1
… 0 1 1 1 0 …{2, 7, 8} {2, 7, 8} {2, 7, 8}
l1
l1
l2
l3
![Page 16: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/16.jpg)
Physical Mapping 16
Row 2Row 2 Because of the way we have constructed the
component, l2 will have some columns with 1’s where l1 has 1’s, and some where l2 does not.
Shall we place the new 1’s to the right or left? Doesn’t matter because the reverse permutation
is the same answer.
![Page 17: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/17.jpg)
Physical Mapping 17
Adding row 2Adding row 2
Placing column 5 to the left partially resolves the {2, 7, 8} columns
c 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8
l 1 0 1 0 0 0 0 1 1
l 2 0 1 0 0 1 0 1 0
l 3 1 0 0 1 0 0 1 1
… 0 0 1 1 1 0 …
{5} {2, 7} {2, 7} {8}
l1
… 0 1 1 1 0 0 …l2
S1 = {2, 7, 8}S2 = {2, 5, 7}
![Page 18: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/18.jpg)
Physical Mapping 18
Additional RowsAdditional Rows Select a new row k from the component such
that edges (i, j) and (i, k) exist for two already added rows i, and k.
Look at the relationship between i and k, and between i and j to determine if k goes on the same side or the opposite side of i as j.
i
j
k
![Page 19: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/19.jpg)
Physical Mapping 19
DefinitionsDefinitions
Let Place i on the same side as j if
Else, place on the opposite side
yx SSyx
kiijkj ,min
i
j
k
![Page 20: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/20.jpg)
Physical Mapping 20
Placing RowsPlacing Rows
Place k on the same side as j if
Or:
c 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8
l 1 0 1 0 0 0 0 1 1
l 2 0 1 0 0 1 0 1 0
l 3 1 0 0 1 0 0 1 1
i j
k
322131 ,min llllll
1,2min2 So we place l3 on the same side of l2 as l1, which is the right.
l1
l2
l3
i
![Page 21: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/21.jpg)
Physical Mapping 21
Placing rowsPlacing rows
Repeat for every row in the component
… 0 0 1 1 1 0 0 0 …
{5} {2} {7} {8} {1,4} {1,4}
l1
… 0 1 1 1 0 0 0 0 …l2
c 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8
l 1 0 1 0 0 0 0 1 1
l 2 0 1 0 0 1 0 1 0
l 3 1 0 0 1 0 0 1 1
… 0 0 0 1 1 1 1 0 …l3
![Page 22: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/22.jpg)
Physical Mapping 22
Joining ComponentsJoining Components New graph: Gm – the merge graph (directed)
• Nodes are connected components of Gc
• Edge (, ) iff every set in is a subset of a set in c 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8 c 9
l 1 1 1 0 1 1 0 1 0 1
l 2 0 1 1 1 1 1 1 1 1
l 3 0 1 0 1 1 0 1 0 1
l 4 0 0 1 0 0 0 0 1 0
l1
l2
l3
![Page 23: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/23.jpg)
Physical Mapping 23
Constructing Constructing GGmm
c 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8 c 9
l 1 1 1 0 1 1 0 1 0 1
l 2 0 1 1 1 1 1 1 1 1
l 3 0 1 0 1 1 0 1 0 1
l 4 0 0 1 0 0 0 0 1 0
l 5 0 0 1 0 0 1 0 0 0
l 6 0 0 0 1 0 0 1 0 0
l 7 0 1 0 0 0 0 1 0 0
l 8 0 0 0 1 1 0 0 0 1
l1
l8
l4
l5
l2 l3
l6 l7
![Page 24: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/24.jpg)
Physical Mapping 24
Properties of Properties of GGmm
All rows in will share the same disjoint/subset relationship with each row of • Different compenents
disjoint or subset
• Same component shares a 1 column
• That column matches in a row of , then subset, else disjoint
c 1 c 2 c 3 c 4 c 5 c 6 c 7 c 8 c 9
l 1 1 1 0 1 1 0 1 0 1
l 2 0 1 1 1 1 1 1 1 1
l 3 0 1 0 1 1 0 1 0 1
l 4 0 0 1 0 0 0 0 1 0
l 5 0 0 1 0 0 1 0 0 0
l 6 0 0 0 1 0 0 1 0 0
l 7 0 1 0 0 0 0 1 0 0
l 8 0 0 0 1 1 0 0 0 1
![Page 25: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/25.jpg)
Physical Mapping 25
Ordering componentsOrdering components Vertices without
incoming edges: freeze their columns
Process the rest in topological order. is a singleton and a
subset
![Page 26: Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.](https://reader030.fdocuments.net/reader030/viewer/2022032607/56649ed95503460f94be81c2/html5/thumbnails/26.jpg)
Physical Mapping 26
Ordering Components (2)Ordering Components (2) Find the leftmost
column with a 1 In the current
assembly, find the rows that contain all ones for that column
Merge the columns