Maximum clique
1 Introduction
2 Theoretical background Biochemistry/molecular biology
3 Theoretical background computer science
4 History of the field
5 Splicing systems
6 P systems
7 Hairpins
8 Detection techniques
9 Micro technology introduction
10 Microchips and fluidics
11 Self assembly
12 Regulatory networks
13 Molecular motors
14 DNA nanowires
15 Protein computers
16 DNA computing - summary
17 Presentation of essay and discussion
Course outline
NP complete continued
Some problems are undecidable: no computer can
solve them.
e.g. Turing’s “Halting Problem”
Other problems are decidable, but intractable:
as they grow large, we are unable to solve them
in reasonable time
What constitutes “reasonable time”?
tractability
P = set of problems that can be solved
in polynomial time
NP = set of problems for which a solution
can be verified in polynomial time
P ⊆ NP
The big question: Does P = NP?
P and NP summary
The NP-Complete problems are an interesting
class of problems whose status is unknown
No polynomial-time algorithm has been
discovered for an NP-Complete problem
No superpolynomial lower bound has been
proved for any NP-Complete problem, either
Intuitively and informally, what does it mean
for a problem to be NP-Complete?
NP-complete problems
A problem P can be reduced to another problem
Q if any instance of P can be rephrased to an
instance of Q, the solution to which provides
a solution to the instance of P. This
rephrasing is called a transformation
Intuitively: If P reduces in polynomial time
to Q, P is “no harder to solve” than Q
reduction
Though nobody has proven that P != NP, if
you prove a problem NP-Complete, most people
accept that it is probably intractable
Therefore it can be important to prove that
a problem is NP-Complete
Don’t need to come up with an efficient
algorithm
Can instead work on approximation
algorithms
Why prove NP-completeness
What is a clique of a graph G?
Answer: a subset of vertices fully connected to
each other, i.e. a complete subgraph of G
The clique problem: how large is the maximum-
size clique in a graph?
Can we turn this into a decision problem?
Answer: Yes, we call this the k-clique problem
Is the k-clique problem within NP?
clique
What should the reduction do?
Answer: Transform a 3-CNF formula to a
graph, for which a k-clique will exist (for
some k) iff the 3-CNF formula is
satisfiable
clique
The reduction:
Let B = C1 ∧ C2 ∧ … ∧ Ck be a 3-CNF formula with k
clauses, each of which has 3 distinct literals
For each clause put a triple of vertices in the
graph, one for each literal
Put an edge between two vertices if they are in
different triples and their literals are
consistent, meaning not each other’s negation
Run an example:
B = (x ∨ y ∨ z) ∧ (x ∨ y ∨ z) ∧ (x ∨ y ∨ z)
clique
Prove the reduction works:
If B has a satisfying assignment, then each
clause has at least one literal (vertex) that
evaluates to 1
Picking one such “true” literal from each clause
gives a set V’ of k vertices. V’ is a clique
(Why?)
If G has a clique V’ of size k, it must contain
one vertex in each triple (Why?)
We can assign 1 to each literal corresponding
with a vertex in V’, without fear of
contradiction
clique
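The reduction described above can be sketched in a few lines. This is a minimal illustration, not the lecturer's code: literals are encoded as nonzero integers (`-v` meaning "not v"), the function names are hypothetical, and the formula is an assumed example rather than necessarily the one on the slide. The brute-force clique search is exponential and is only there to check the small instance.

```python
from itertools import combinations

def build_graph(clauses):
    """One vertex per (clause index, literal); literals are nonzero ints, -v means NOT v."""
    vertices = [(i, lit) for i, clause in enumerate(clauses) for lit in clause]
    edges = {
        frozenset((u, v))
        for u, v in combinations(vertices, 2)
        if u[0] != v[0] and u[1] != -v[1]  # different triples, consistent literals
    }
    return vertices, edges

def has_k_clique(vertices, edges, k):
    """Brute-force check (exponential time), adequate only for tiny graphs."""
    return any(
        all(frozenset(pair) in edges for pair in combinations(subset, 2))
        for subset in combinations(vertices, k)
    )

# Assumed example: (x or not y or not z) and (not x or y or z) and (x or y or z)
clauses = [(1, -2, -3), (-1, 2, 3), (1, 2, 3)]
vertices, edges = build_graph(clauses)
print(has_k_clique(vertices, edges, len(clauses)))  # True: the formula is satisfiable
```

Because no edge joins two vertices of the same triple, any clique of size k has to take exactly one vertex from each of the k triples, which is the point behind the second "Why?".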
A clique of a graph G=(V,E) is a subgraph C
that is fully-connected (every pair in C has
an edge).
CLIQUE: Given a graph G and an integer K, is
there a clique in G of size at least K?
CLIQUE is in NP: non-deterministically
choose a subset C of size K and check that
every pair in C has an edge in G.
This graph has a clique of size 5
Clique problem, summary
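The "CLIQUE is in NP" argument rests on the check being cheap once a candidate subset is given. A minimal sketch of such a certificate check follows; the graph and the function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def is_clique(edges, candidate):
    """Return True if every pair of vertices in `candidate` is joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(candidate, 2))

def verify_clique(edges, candidate, k):
    """Certificate check: at least k distinct vertices, all pairwise adjacent."""
    distinct = set(candidate)
    return len(distinct) >= k and is_clique(edges, distinct)

# Hypothetical 5-vertex graph: a triangle 0-1-2 plus a path 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
print(verify_clique(edges, [0, 1, 2], 3))  # True
print(verify_clique(edges, [1, 2, 3], 3))  # False
```

The check looks at each pair of candidate vertices once, so its running time is polynomial in the size of the graph; finding the subset in the first place is the hard part.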
Maximum clique with DNA
Clique
defined as a set of vertices in
which every vertex is connected
to every other vertex by an edge
Maximal clique problem
Given a network containing N
vertices and M edges, how many
vertices are in the largest
clique?
Finding the size of the largest
clique has been proven to be an NP-
complete problem
Introduction
Step 1 Make the complete data pool
For a graph with N vertices, each possible
clique is represented by an N-digit binary
number
1: a vertex in the clique
0: a vertex out of the clique
e.g. clique (4,1,0) → binary number 010011
Step 2
Find pairs of vertices in the graph that
are not connected by an edge
(0,2) (0,5) (1,5) (1,3)
The complementary graph
Algorithm
Step 3
Eliminate from the complete data pool all
numbers containing connections in the
complementary graph
xxx1x1 or 1xxxx1 or 1xxx1x or xx1x1x
Step 4
Sort the remaining data pool to find the
data containing the largest number of 1’s
the clique with the largest number of 1’s
tells us the size of the maximal clique
Algorithm
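Steps 1 to 4 can be simulated in software on the six-vertex example. The sketch below is only an in-silico restatement of the algorithm, using the non-edges listed in Step 2 and the digit order implied by the example (vertex 0 is the rightmost digit); the helper name `bit` is hypothetical.

```python
from itertools import product

N = 6
# Non-edges listed in Step 2 (edges of the complementary graph).
complement_edges = [(0, 2), (0, 5), (1, 5), (1, 3)]

def bit(s, v):
    """Digit for vertex v; vertex 0 is the rightmost digit, as in 010011 for clique (4,1,0)."""
    return s[N - 1 - v]

# Step 1: complete data pool of all N-digit binary numbers.
pool = ["".join(bits) for bits in product("01", repeat=N)]

# Step 3: discard every string that contains both endpoints of a non-edge.
pool = [
    s for s in pool
    if not any(bit(s, u) == "1" and bit(s, v) == "1" for u, v in complement_edges)
]

# Step 4: the string with the most 1's encodes the largest clique.
best = max(pool, key=lambda s: s.count("1"))
print(best, best.count("1"))  # 111100 4
```

The software version enumerates the pool one string at a time; the appeal of the DNA version is that every string is processed in parallel in the test tube.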
two DNA sections
bit’s value (Vi), V0~V5: 0 bp when Vi = 1, 10 bp when Vi = 0
position value (Pi), P0~P6: 20 bp
Longest = 6×10 + 7×20 = 200 bp (000000)
Shortest = 6×0 + 7×20 = 140 bp (111111)
dsDNA
Construction of DNA molecules
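The length arithmetic of the encoding is easy to reproduce. The sketch below assumes only the section lengths given above (0 or 10 bp per value section, 20 bp per position section); the function name is hypothetical.

```python
V_LEN = {"1": 0, "0": 10}  # bp contributed by a value section Vi
P_LEN = 20                 # bp of each position section Pi
N_BITS = 6

def strand_length(bits):
    """Length in bp of the strand encoding `bits`: 6 value sections plus 7 position sections."""
    if len(bits) != N_BITS:
        raise ValueError("expected a 6-digit binary string")
    return sum(V_LEN[b] for b in bits) + (N_BITS + 1) * P_LEN

for bits in ("000000", "111111", "111100"):
    print(bits, strand_length(bits), "bp")
```

Every additional 1 shortens the strand by 10 bp, which is why the largest clique ends up as the shortest DNA.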
sequence construction - randomly generated
to avoid mispairing, avoid accidental
homologies longer than 4bp
embedded restriction sequences within each Vi = 1
POA (parallel overlap assembly)
Construction of DNA molecules
POA (parallel overlap assembly) with 12 oligonucleotides
PiViPi+1 for even i
<Pi+1ViPi> for odd i
P0V0P1, P2V2P3, P4V4P5
<P2V1P1>, <P4V3P3>, <P6V5P5>
PCR with P0 and <P6> as primers (lane 2 in Fig. 3)
POA
Construction of DNA molecules
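The naming pattern of the POA building blocks can be listed programmatically. This sketch assumes that the 12 oligonucleotides are the six positions in two versions each (Vi = 0 and Vi = 1), which matches the count on the slide but is not stated there explicitly; `<...>` marks a complementary strand, as in the slide.

```python
def poa_oligos(n_bits=6):
    """Label the 2 * n_bits oligonucleotides used as POA building blocks."""
    oligos = []
    for i in range(n_bits):
        for value in (0, 1):
            v = f"V{i}={value}"
            if i % 2 == 0:
                oligos.append(f"P{i} {v} P{i + 1}")      # PiViPi+1 for even i
            else:
                oligos.append(f"<P{i + 1} {v} P{i}>")    # <Pi+1ViPi> for odd i
    return oligos

for label in poa_oligos():
    print(label)
```

Neighbouring blocks share one position section (for example P1 in P0V0P1 and <P2V1P1>), and that overlap is what lets them anneal and be extended into full-length strands.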
Break DNA: cut at the restriction sequence embedded in Vi = 1
PCR with P0 and <P6> as primers
broken strings are not amplified
Division of the data pool into two test tubes
t0: Afl II cuts strings with V0 = 1
t1: Spe I cuts strings with V2 = 1
combine t0 and t1 into test tube t, which no longer contains xxx1x1
Digestion with restriction enzymes
Elimination of all strings containing a pair of vertices joined by an edge of the complementary graph
xxx1x1, 1xxxx1, 1xxx1x, xx1x1x
PCR amplification of the remaining data DNA (Fig. 3)
Lane 5: digestion result
Lane 6: PCR result
Digestion and PCR amplification
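The split, cut and recombine procedure can be mimicked with sets. This is an idealized sketch: it assumes both tubes receive copies of every string, that cutting is complete, and that cut strings are never re-amplified. The function names are hypothetical.

```python
from itertools import product

N = 6
complement_edges = [(0, 2), (0, 5), (1, 5), (1, 3)]

def has_one(s, v):
    """True if vertex v is set in string s (vertex 0 is the rightmost digit)."""
    return s[N - 1 - v] == "1"

def digest_pair(pool, u, v):
    """Split into two tubes, cut Vu=1 in one and Vv=1 in the other, then recombine."""
    t0 = {s for s in pool if not has_one(s, u)}  # strings with Vu=1 are cut and lost
    t1 = {s for s in pool if not has_one(s, v)}  # strings with Vv=1 are cut and lost
    return t0 | t1

pool = {"".join(b) for b in product("01", repeat=N)}
for u, v in complement_edges:
    pool = digest_pair(pool, u, v)

survivors = sorted(pool, key=lambda s: s.count("1"))
print(survivors[-1])  # 111100
```

A string survives one round unless it is cut in both tubes, that is, unless it carries a 1 at both endpoints of the non-edge; after four rounds only strings that encode cliques remain.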
Reading the size of the largest clique(s)
shortest length: 160 bp, i.e. four vertices
What is the maximal clique?
6C4 = 15 different strings have four 1’s; read the answer by molecular cloning
1 insertion of the DNA into M13 bacteriophage through site-directed mutagenesis
2 transfection of the mutagenized M13 phage DNA into E. coli
3 cloning
4 DNA extraction and sequencing
Readout
correct answer 111100
Readout
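The two numbers used in the readout follow directly from the encoding. A small sketch, assuming the section lengths given earlier in the deck (20 bp per position section, 10 bp per 0); the function name is hypothetical.

```python
from math import comb

def clique_size_from_length(length_bp, n_bits=6, p_len=20, v0_len=10):
    """Each 0 adds v0_len bp, so recover the number of zeros from the strand length."""
    zeros, remainder = divmod(length_bp - (n_bits + 1) * p_len, v0_len)
    if remainder or not 0 <= zeros <= n_bits:
        raise ValueError("length does not match the encoding")
    return n_bits - zeros

print(clique_size_from_length(160))  # 4
print(comb(6, 4))                    # 15
```

The band length gives only the size of the clique; since 15 strings share that length, sequencing is still needed to tell which one is present.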
Production of ssDNA during PCR
cannot be cut by restriction enzymes
solution : digestion of the ssDNA with S1
nuclease before restriction digestion
Incomplete cutting by restriction enzymes
repetition of digestion-PCR process
increases the signal-to-noise ratio
discussion - major error
Strengths
high parallelism
Weaknesses
limitation on the number of vertices that
this algorithm can handle
maximum number of vertices with picomole
operations = 27 (36 vertices with
nanomole)
exponential increase in the size of the
pool with the size of the problem
Further scale-up becomes impractical
New algorithms are needed
Discussion - strengths and weaknesses
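The exponential growth of the pool can be made concrete with a little arithmetic. The sketch below only computes how many molecules would be available per distinct string if a given amount of DNA were spread evenly over all 2^N strings; how the limits of 27 and 36 vertices were actually derived is not stated on the slide, so this is an illustration rather than a reconstruction.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def copies_per_string(n_vertices, moles):
    """Average number of molecules per distinct string in an evenly distributed pool."""
    return moles * AVOGADRO / 2 ** n_vertices

for n, moles, label in [(27, 1e-12, "picomole"), (36, 1e-9, "nanomole")]:
    print(f"N={n}: {2 ** n:.3g} strings, "
          f"about {copies_per_string(n, moles):.0f} copies each in one {label}")
```

Each extra vertex doubles the pool, so a thousandfold increase in material buys only about ten more vertices.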
Rapid and accurate data access is needed
biotin-avidin purification
electrophoresis
DNA cloning
too slow / too noisy
biochip is needed to accelerate readout
Discussion – future direction
Clique in microreactors
all possible solutions: {000} {001} {010} {011} {100} {101} {110} {111}
clauses: (x=1) ∧ (y=0) ∧ (z=1)
Selection principle
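The selection principle can be illustrated with the three-variable example: start from all eight candidate strings and keep, clause by clause, only those carrying the required value. The bit order x, y, z is an assumption made for this sketch.

```python
from itertools import product

solutions = ["".join(b) for b in product("01", repeat=3)]  # {000} ... {111}
clauses = {"x": "1", "y": "0", "z": "1"}                    # (x=1) AND (y=0) AND (z=1)
position = {"x": 0, "y": 1, "z": 2}                        # assumed bit order xyz

pool = solutions
for var, value in clauses.items():
    # Positive selection: keep only the strings that carry the required value.
    pool = [s for s in pool if s[position[var]] == value]

print(pool)  # ['101']
```

Each selection step halves the pool here, and chaining the steps implements the logical AND of the clauses.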
Positive selection
Negative selection
Logical operations
logical NOT operations
Logical operations
a b
logical AND operations
Logical operations
a b
logical OR operations
Logical operations
magnet
Microreactor structure
Selection principle
DNA input and transport principle
6 nodes, 2^6 initial answers
Max: S_ABCDEF = 101001
Maximal cliques
    A  B  C  D  E  F
A   1  0  1  0  0  1
B   0  1  1  1  0  0
C   1  1  1  0  1  1
D   0  1  0  1  1  0
E   0  0  1  1  1  0
F   1  0  1  0  0  1
Maximal cliques – connectivity matrix
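The maximum clique of this six-node example can be checked by brute force directly from the connectivity matrix. This is a plain software check for comparison, not the microreactor procedure; the function name is hypothetical.

```python
from itertools import combinations

nodes = "ABCDEF"
matrix = [
    [1, 0, 1, 0, 0, 1],
    [0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def maximum_cliques(matrix):
    """Return all cliques of the largest size, searching from large subsets to small ones."""
    n = len(matrix)
    for size in range(n, 0, -1):
        found = [c for c in combinations(range(n), size)
                 if all(matrix[i][j] for i, j in combinations(c, 2))]
        if found:
            return found
    return []

for clique in maximum_cliques(matrix):
    bits = "".join("1" if i in clique else "0" for i in range(len(nodes)))
    print("".join(nodes[i] for i in clique), bits)  # ACF 101001
```

The single result, nodes A, C and F, agrees with the string 101001 given as the maximum.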
Start: xxxxxx with x = {0,1}
Selection steps shown: SA=0 / SA=1, SB=0, SC=0 / SC=1, SD=0, SE=0, SF=0 / SF=1
Intermediate pools shown: 0xxxxx, x0xxxx, 00xxxx, 0xx0xx, x0x0xx, 00x0xx, 0xxx0x, 00xx0x, 0xx00x, x0x00x, 00x00x
Maximal cliques – flow diagram
DNA in
DNA out
Optical control
DNA computer design
         node A        node B        node C        node D        node E
node B   A0 B0 B1/Ø
node C   A0 C0 C1/Ø    B0 C0 C1/Ø
node D   A0 D0 D1/Ø    B0 D0 D1/Ø    C0 D0 D1/Ø
node E   A0 E0 E1/Ø    B0 E0 E1/Ø    C0 E0 E1/Ø    D0 E0 E1/Ø
node F   A0 F0 F1/Ø    B0 F0 F1/Ø    C0 F0 F1/Ø    D0 F0 F1/Ø    E0 F0 F1/Ø
DNA computer design – selection modules
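One way to read the selection modules is sketched below. It assumes that the module for a pair of nodes lets a string through if the first node is 0, or the second node is 0, or the second node is 1 and the two nodes are connected (the "1/Ø" entry being taken as "pass 1 if connected, otherwise nothing"). That reading is an interpretation of the table, not something the slides spell out. The connectivity matrix is the six-node example shown earlier in the deck.

```python
from itertools import combinations, product

matrix = [
    [1, 0, 1, 0, 0, 1],
    [0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def module_passes(s, i, j, connected):
    """Pair module (i, j): pass if S_i = 0, or S_j = 0, or S_j = 1 with the nodes connected."""
    return s[i] == "0" or s[j] == "0" or connected

pool = ["".join(b) for b in product("01", repeat=6)]  # digits in the order A..F
for i, j in combinations(range(6), 2):
    pool = [s for s in pool if module_passes(s, i, j, matrix[i][j] == 1)]

best = max(pool, key=lambda s: s.count("1"))
print(best)  # 101001
```

Under this reading, programming the device for a different graph only changes which modules pass the "1" strings, which fits the optically programmable design described in the deck.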
DNA information flow
100 µm
Flow separation – laminar flow
Micro fabrication
DNA computer design – 20 nodes
word codes
optical programmability
usage of masks to programme
immobilisation of DNA to paramagnetic beads
hybridisation of DNA-strands
DNA sequence handling
Bead
Capture probe (Vn = 1): 3'-ATCGTCGAAGGAATGC-5'
Strand with Vn = 1: Vn-2 Vn-1 [5'-TAGCAGCTTCCTTACG-3'] Vn+1 Vn+2 Vn+3
Strand with Vn = 0: Vn-2 Vn-1 [5'-ACACTGTGCTGATCTC-3'] Vn+1 Vn+2 Vn+3
The DNA library
PBS1:  5'-GCCCTAAAGGATCCACGTAAGGTCCTATGC
V0-1:  5'-AACCACCAACCAAACC    V0-0:  5'-AAAACGCGGCAACAAG
V1-1:  5'-TCAGTCAGGAGAAGTC    V1-0:  5'-TCTTGGGTTTCCTGCA
V2-1:  5'-TTTTCCCCCACACACA    V2-0:  5'-TTGGACCATACGAGGA
V3-1:  5'-CGTTCATCTCGATAGC    V3-0:  5'-AGAGTCTCACACGACA
V4-1:  5'-AAGGACGTACCATTGG    V4-0:  5'-CTCTAGTCCCATCTAC
V5-1:  5'-CAACGGTTTTATGGCG    V5-0:  5'-GCGCAATTTGGTAACC
V6-1:  5'-TAGCAGCTTCCTTACG    V6-0:  5'-ACACTGTGCTGATCTC
V7-1:  5'-CACATGTGTCAGCACT    V7-0:  5'-TGTGTGTGCCTACTTG
V8-1:  5'-GATGGGATAGAGAGAG    V8-0:  5'-AATCCCACCAGTTGAC
V9-1:  5'-ATGCAGGAGCGAATCA    V9-0:  5'-GCTTGTTCAACCTGGT
V10-1: 5'-CCCAGTATGAGATCAG    V10-0: 5'-CTGTCCAAGTACGCTA
V11-1: 5'-ATCGAGCTTCTCAGAG    V11-0: 5'-TGTAGAGGCTAGCGAT
PBS2:  5'-TGGTTTGGCGGCTTTAGAATTCTGTGACAC
The DNA library
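The capture probe shown with the bead can be checked against the library words for position 6. The sketch below assumes standard Watson-Crick pairing and uses the sequences as printed in the deck; the variable and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a sequence given 5'->3', returned 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

V6_1 = "TAGCAGCTTCCTTACG"          # library word V6-1, 5'->3'
V6_0 = "ACACTGTGCTGATCTC"          # library word V6-0, 5'->3'
capture_3to5 = "ATCGTCGAAGGAATGC"  # capture probe as written on the slide, 3'->5'

probe = capture_3to5[::-1]  # the same probe read 5'->3'
print(probe == reverse_complement(V6_1))  # True
print(probe == reverse_complement(V6_0))  # False
```

The probe is the exact complement of V6-1 and not of V6-0, so under ideal hybridisation a bead carrying it would retain only strands with bit 6 set to 1.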
DNA hybridisation
100 µm
liquid handling DNA computer
robotics
detection system
sorting module
computer control
DNA computer control
3.5mm