DNA Computing: Mathematics with Molecules
Russell Deaton
Professor, Comp. Sci. & Engr.
The University of Arkansas
Fayetteville, AR
What is DNA Computing (DNAC)?
The use of biological molecules, primarily DNA, DNA analogs, and RNA, for computational purposes.
Why Nucleic Acids?
• Density (Adleman, Baum):
– DNA: 1 bit per nm³, 10²⁰ molecules
– Video: 1 bit per 10¹² nm³
• Efficiency (Adleman):
– DNA: 10¹⁹ ops/J
– Supercomputer: 10⁹ ops/J
• Speed (Adleman):
– DNA: 10¹⁴ ops/s
– Supercomputer: 10¹² ops/s
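These figures can be compared with a few lines of arithmetic. The sketch below takes the slide's orders of magnitude at face value, reads the video density of 1 bit per 10¹² nm³ as 10⁻¹² bits/nm³, and reports how many orders of magnitude separate the DNA figure from the conventional one. The dictionary keys and the function name are illustrative only.

```python
import math

# Orders of magnitude quoted on the slide (Adleman, Baum).
# Conventional column: video for density, supercomputer for efficiency and speed.
DNA = {"density (bits/nm^3)": 1.0, "efficiency (ops/J)": 1e19, "speed (ops/s)": 1e14}
CONVENTIONAL = {"density (bits/nm^3)": 1e-12, "efficiency (ops/J)": 1e9, "speed (ops/s)": 1e12}


def advantage_orders(metric: str) -> int:
    """Orders of magnitude by which the DNA figure exceeds the conventional one."""
    return round(math.log10(DNA[metric] / CONVENTIONAL[metric]))


for metric in DNA:
    print(f"{metric}: DNA ahead by a factor of about 10^{advantage_orders(metric)}")
```

With these inputs the gaps come out to about 10^12 in density, 10^10 in efficiency and 10^2 in speed.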
What makes DNAC possible?
• Great advances in molecular biology
– PCR (Polymerase Chain Reaction)
– DNA Microarrays
– New enzymes and proteins
– Better understanding of biological molecules
• Ability to produce massive numbers of DNA molecules with specified sequence and size
• DNA molecules interact through template matching reactions
What is the typical methodology?
• Encoding: Map problem instance onto set of biological molecules and molecular biology protocols
• Molecular Operations: Let molecules react to form potential solutions
• Extraction/Detection: Use protocols to extract result in molecular form
PHYSICAL STRUCTURE OF DNA
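[Figure: DNA double helix showing the nitrogenous bases, major and minor grooves, central axis, and two sugar-phosphate backbones running antiparallel (5' to 3' and 3' to 5'); one helical turn spans 34 Å and the helix is 20 Å wide.]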
What is an example?
• “Molecular Computation of Solutions to Combinatorial Problems”
• Adleman, Science, v. 266, p. 1021 (1994).
Algorithm
• Generate Random Paths through the graph.
• Keep only those paths that begin with vin and end with vout.
• If graph has n vertices, then keep only those paths that enter exactly n vertices.
• Keep only those paths that enter all the vertices at least once.
• If any paths remain, say “Yes”; otherwise, say “No”.
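To make the five steps concrete, here is a minimal in-silico sketch, assuming random walks stand in for random ligation and list filters stand in for the laboratory separations (PCR, gel electrophoresis, affinity purification). The function name, the walk-length bound `max_steps`, and the example graph (built from edges that appear in the later slide figures, not Adleman's full graph) are hypothetical.

```python
import random


def adleman_search(n, edges, v_in, v_out, num_paths=20000, max_steps=None, seed=0):
    """Generate-and-filter search for a Hamiltonian path, simulated in software.

    Vertices are labelled 0..n-1; `edges` is a list of directed (u, v) pairs.
    Returns ("Yes" or "No", list of surviving paths).
    """
    rng = random.Random(seed)
    if max_steps is None:
        max_steps = 2 * n
    successors = {v: [] for v in range(n)}
    for u, v in edges:
        successors[u].append(v)

    # Step 1: random paths (stand-in for random ligation of vertex/edge strands).
    paths = []
    for _ in range(num_paths):
        path = [rng.randrange(n)]
        for _ in range(rng.randint(0, max_steps)):
            options = successors[path[-1]]
            if not options:
                break  # dead end: no outgoing edge to extend along
            path.append(rng.choice(options))
        paths.append(path)

    # Step 2: keep paths that begin with v_in and end with v_out.
    paths = [p for p in paths if p[0] == v_in and p[-1] == v_out]
    # Step 3: keep paths that enter exactly n vertices.
    paths = [p for p in paths if len(p) == n]
    # Step 4: keep paths that enter every vertex at least once.
    for v in range(n):
        paths = [p for p in paths if v in p]
    # Step 5: answer according to whether anything survived.
    return ("Yes" if paths else "No"), paths


# Hypothetical example graph using edges drawn in the slide figures.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (0, 6), (0, 3), (3, 2), (5, 1)]
answer, survivors = adleman_search(7, EDGES, v_in=0, v_out=6)
print(answer, survivors[:1])
```

In the test tube each filter acts on all molecules at once; here each list comprehension plays that role sequentially, so the sketch shows the logic of the algorithm, not its parallelism.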
INTER-STRAND HYDROGEN BONDING
[Figure: Watson-Crick base pairs. Adenine pairs with thymine through two hydrogen bonds and guanine with cytosine through three; each base is attached to its strand's sugar-phosphate backbone, and (+)/(-) mark the partial charges that form each hydrogen bond.]
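Base pairing can be expressed as a lookup table. This sketch computes a strand's complement (read in the same left-to-right order, as sequences are drawn beneath each other on these slides), its reverse complement (the pairing strand read 5' to 3'), and the number of hydrogen bonds in the perfect duplex, counting two per A·T pair and three per G·C pair. The function names are illustrative.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # bonds formed by each base with its partner


def complement(strand: str) -> str:
    """Base-by-base Watson-Crick complement, aligned under the original."""
    return "".join(PAIR[b] for b in strand)


def reverse_complement(strand: str) -> str:
    """The pairing strand written 5' to 3'."""
    return complement(strand)[::-1]


def duplex_h_bonds(strand: str) -> int:
    """Hydrogen bonds holding `strand` to its perfect complement."""
    return sum(H_BONDS[b] for b in strand)
```

For the 11-mer CACCATGTGAC used on the affinity slide, the aligned complement is GTGGTACACTG, the strand printed beneath it there.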
STRAND HYBRIDIZATION
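[Figure: heating a duplex (strands A B paired with complements a b) to 100° C separates the strands; on cooling, complementary strands anneal again.]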
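Heating and cooling work because duplex stability depends on sequence. The slides give no formula, so this sketch swaps in the Wallace rule, a common rule of thumb for short oligonucleotides (Tm ≈ 2 °C per A or T plus 4 °C per G or C). It ignores salt, strand concentration and mismatches, and the function names are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) of a short oligo by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def state_at(seq: str, temp_c: float) -> str:
    """Crude two-state model: paired below the estimated Tm, melted at or above it."""
    return "double-stranded" if temp_c < wallace_tm(seq) else "single-stranded"
```

By this estimate the 11-mer CACCATGTGAC melts near 34 °C, so its duplex is melted at 100 °C and re-forms at room temperature.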
DNA LIGATION
[Figure: ligase joins the 5' phosphate of one strand to the 3' hydroxyl of the adjacent strand.]
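Encoding
Vertex 0: GCATGGCC
Vertex 1: AGCTTAGG
Vertex 2: ATGGCATG
Edge 0->1 (complementary splint): CCGGTCGA
Edge 0->2 (complementary splint): CCGGTACC
Path 0->1: GCATGGCCAGCTTAGG, held together by CCGGTCGA
Path 0->2: GCATGGCCATGGCATG, held together by CCGGTACC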
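The encoding can be reproduced mechanically. Reading the three vertex codes above in order as vertices 0, 1 and 2 (an assumption about the slide's labels), this sketch builds each edge splint as the base-by-base complement of the junction between the second half of one vertex code and the first half of the next, and concatenates codes to form the ligated path. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Vertex codes from the slide; the 0/1/2 labels are an assumption.
CODES = {0: "GCATGGCC", 1: "AGCTTAGG", 2: "ATGGCATG"}


def splint(code_from: str, code_to: str) -> str:
    """Edge strand for code_from -> code_to, written base-by-base under the junction.

    It is the complement of the second half of `code_from` followed by the
    first half of `code_to`, so it holds the two vertex strands end to end
    until ligase seals the nick.
    """
    junction = code_from[len(code_from) // 2:] + code_to[: len(code_to) // 2]
    return junction.translate(COMPLEMENT)


def ligate(path, codes) -> str:
    """Sequence of the ligated strand for a path given as vertex labels."""
    return "".join(codes[v] for v in path)
```

`splint(CODES[0], CODES[1])` gives CCGGTCGA and `ligate([0, 2], CODES)` gives GCATGGCCATGGCATG, matching the strands on the slide.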
Massively Parallel Search
[Figure: many random paths through a graph with vertices V0 to V6, assembled from edge strands such as E0->1, E1->2, E0->3, E0->6, E3->2 and E5->1.]
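POLYMERASE CHAIN REACTION
[Figure: DNA polymerase extends primers annealed to the template; repeated heat/cool cycles amplify the region between the two primers.]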
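Step 2 relies on PCR amplifying only molecules that carry both primer sites. This idealized sketch assumes a fixed per-cycle efficiency, exponential growth for paths that begin at the start vertex and end at the stop vertex, linear growth for paths with only one primer site, and no copying otherwise; real reactions saturate. The function name and the pool format (path tuple mapped to copy number) are hypothetical.

```python
def pcr(pool: dict, start: int, stop: int, cycles: int, efficiency: float = 1.0) -> dict:
    """Idealized PCR on a pool mapping path (tuple of vertices) -> copy number."""
    out = {}
    for path, copies in pool.items():
        has_start = path[0] == start
        has_stop = path[-1] == stop
        if has_start and has_stop:
            factor = (1 + efficiency) ** cycles  # both primers: exponential
        elif has_start or has_stop:
            factor = 1 + efficiency * cycles     # one primer: linear copying
        else:
            factor = 1                           # no primer site: not copied
        out[path] = copies * factor
    return out


pool = {(0, 1, 2, 3, 4, 5, 6): 1, (0, 3, 2, 3, 4): 1, (2, 3, 4, 5, 6): 1, (1, 2, 3): 1}
print(pcr(pool, start=0, stop=6, cycles=20))
```

After 20 perfect cycles the path from V0 to V6 grows 2^20-fold, while each path with only one primer site reaches 21 copies, which is why the start/stop filter works.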
Start = V0, Stop = V6
[Figure: the random paths from the previous slide; only those starting at V0 and ending at V6 are kept.]
GEL ELECTROPHORESIS - SIZE SORTING
[Figure: samples are loaded into a gel in buffer between two electrodes; shorter molecules migrate faster, longer ones slower.]
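Size sorting amounts to grouping molecules by length and cutting out one band. Assuming every vertex is encoded by a fixed-length code (8 nt on the encoding slide), a path through n vertices is n × code-length nucleotides long. This sketch models the gel as that grouping and nothing more (no mobility model); the names are hypothetical.

```python
from collections import defaultdict


def gel_bands(strands):
    """Group strands by length; shortest first, since short molecules run fastest."""
    bands = defaultdict(list)
    for s in strands:
        bands[len(s)].append(s)
    return dict(sorted(bands.items()))


def excise_band(strands, n_vertices: int, code_len: int):
    """Keep only strands whose length matches a path through exactly n_vertices."""
    return gel_bands(strands).get(n_vertices * code_len, [])
```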
Right Length
[Figure: surviving paths from V0 to V6 that enter exactly seven vertices, e.g. V0 V1 V2 V3 V4 V5 V6 and V0 V3 V2 V3 V4 V5 V6.]
ANTIBODY AFFINITY
CACCATGTGAC (target strand)
GTGGTACACTG B (complementary oligo with biotin label B)
• Add oligo with biotin label
• Heat and cool: the oligo anneals to the target
• Add paramagnetic-streptavidin particles (PMP): streptavidin binds the biotin
• Isolate with magnet (N/S poles)
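Affinity purification keeps strands that anneal to a biotinylated probe. This sketch treats annealing as exact substring matching (no mismatches), writes probes 5' to 3', and applies one capture per vertex, as step 4 of the algorithm requires. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement: the 5'->3' sequence of the strand that pairs with `seq`."""
    return seq.translate(COMPLEMENT)[::-1]


def capture(pool, probe: str):
    """Keep strands a biotinylated probe (written 5'->3') can anneal to and pull out."""
    site = revcomp(probe)  # stretch a strand must contain to pair with the probe
    return [s for s in pool if site in s]


def every_vertex(pool, vertex_codes):
    """Repeat the capture once per vertex, each time probing with that vertex's complement."""
    for code in vertex_codes:
        pool = capture(pool, revcomp(code))
    return pool
```

Read 5' to 3', the biotinylated oligo on the slide is GTCACATGGTG, and `capture` with that probe keeps any strand containing CACCATGTGAC.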
Every Vertex
[Figure: candidate paths V0 V1 V2 V3 V4 V5 V6 and V0 V3 V2 V3 V4 V5 V6 checked for the presence of every vertex.]
Hamiltonian Path
[Figure: the surviving path V0 → V1 → V2 → V3 → V4 → V5 → V6, joined by edges E0->1, E1->2, E2->3, E3->4, E4->5 and E5->6.]
Mismatches
DNA Word Design
• Importance of Template-Matching Hybridization Reactions in DNA Computing (DNAC)
• Sequence design should implement DNAC architecture:
– Planned Hybridizations
– Problem Size
– Subsequent Processing Reactions
• Designed sequences should minimize unplanned “cross-hybridizations.”
• Consequences of Bad Designs: Errors and Poor Efficiency
DNA Word Design
• The design problem is hard.
• As the number of sequences required to represent the problem increases, this requirement increasingly conflicts with the requirement of non-crosshybridization.
• How much of DNA sequence space is available for computation?
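One way to see how quickly sequence space runs out is to pick words greedily under a distance constraint. This sketch uses Hamming distance as a simplified stand-in for cross-hybridization (it is not the measure used in this work and ignores shifted alignments and thermodynamics): every word must differ in at least `d` positions from every other chosen word, from each chosen word's reverse complement, and from its own reverse complement. The names and parameters are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def compatible(w: str, chosen, d: int) -> bool:
    rc = revcomp(w)
    if hamming(w, rc) < d:  # w could pair with another copy of itself
        return False
    for v in chosen:
        # Too similar to v, or too close to pairing with v.
        if hamming(w, v) < d or hamming(rc, v) < d:
            return False
    return True


def greedy_words(length: int, d: int, limit=None):
    """Scan all 4**length words in lexicographic order, keeping compatible ones."""
    chosen = []
    for letters in product("ACGT", repeat=length):
        w = "".join(letters)
        if compatible(w, chosen, d):
            chosen.append(w)
            if limit is not None and len(chosen) >= limit:
                break
    return chosen
```

Raising `d` or shortening the words shrinks the set that can be found, which is the conflict between problem size and non-crosshybridization described above.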
Why In Vitro?
• In vitro selection and evolution
• PCR as a tool for selection
• Ability to synthesize huge, random starting populations
• Mutagenesis
• Oligos manufactured under the conditions in which they will be used
• Use the massive parallelism of DNAC to solve the word design problem
Protocol Outline
• Start with huge population of random sequences with attached primers.
• Anneal rapidly to quench oligos in mismatched configurations.
• Using temperature as a control, melt most mismatched pairs.
• Amplify and purify
• Repeat
Experimental Results
Latest Results
DNA Memories
Overview
[Figure: Learning, Recall and Output. Learning: input DNAs (unknown sequence) and sequences complementary to them produce memory DNA strands, each a tag (Tag1) plus a random probe, with the 3' end complementary to the input DNAs. Recall: new unknown input DNAs separate memory DNA strands that match or partially match the new inputs from those that don't match. Output: labeled tag sequence complements.]
Learning
• Learning: Information acquired from examples rather than programmed
• Protocol to store input DNAs (possibly of unknown sequence)
• Higher-level representation of the input sequences
• Not individual sequence memories but whole populations
• Clustering of input sequences in vitro
• Massively random and parallel copying or sampling, depending on the number of inputs and probes
Base-by-Base Amplification
[Figure: tag probe annealed to the input DNA and extended.]
Sampling
[Figure: tag probes annealed at different sites on the input DNA and extended.]
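The tag-probe extension can be sketched in a few lines, assuming each memory probe is a tag followed by a random 3' segment that anneals wherever the input contains its reverse complement, and that polymerase then copies the input all the way to its 5' end. Exact matching, a single binding site per input (the first occurrence), and the names `extend_probe` and `learn` are simplifying assumptions, not the published protocol.

```python
import random
from typing import List, Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extend_probe(tag: str, random_3p: str, template: str) -> Optional[str]:
    """Anneal probe 5'-tag+random_3p-3' to `template` (5'->3') and extend its 3' end.

    Returns the extended memory strand, or None if the probe finds no site.
    """
    pos = template.find(revcomp(random_3p))
    if pos < 0:
        return None
    # Polymerase copies the template region lying 5' of the binding site.
    return tag + random_3p + revcomp(template[:pos])


def learn(inputs: List[str], tag: str, n_probes: int, k: int, seed: int = 0) -> List[str]:
    """Sample inputs with randomly generated probes; keep every extended strand."""
    rng = random.Random(seed)
    memories = []
    for _ in range(n_probes):
        random_3p = "".join(rng.choice("ACGT") for _ in range(k))
        for template in inputs:
            strand = extend_probe(tag, random_3p, template)
            if strand is not None:
                memories.append(strand)
    return memories
```

Each extended strand ends in a copy complementary to part of the input, consistent with memory strands whose 3' end is complementary to the input DNAs.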
Energy Surface Manipulation through Learning
[Figure: energy plotted against input sequence, before learning and after learning.]
Tags
• Non-crosshybridizing sequences
• Convenient for input/output in the absence of input sequence information
• Manipulate memory without input sequences
• Implement DNA2DNA computations (Landweber and Lipton, DNA 3)
Recall
• Hybridization to retrieve memories
• Similar sequence patterns are matched
• Pattern matching is done against the whole memory
• Single memory associated with a single tag
• Memory as a composite of the output on multiple tags
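Recall by hybridization can be approximated by shared subsequences. In this sketch each memory is stored as tag → learned region (the complementary copy made during learning); a new input recalls a memory when enough k-mers of the sequence that region pairs with also occur in the input, so partial matches score between 0 and 1. The k-mer score, the threshold and the names are assumptions standing in for real hybridization.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recall(memories: dict, new_input: str, k: int = 4, threshold: float = 0.5) -> dict:
    """Return {tag: score} for memories the new input would (partially) hybridize to."""
    recalled = {}
    for tag, learned in memories.items():
        target = revcomp(learned)  # sequence the memory strand can pair with
        words = kmers(target, k)
        score = len(words & kmers(new_input, k)) / len(words) if words else 0.0
        if score >= threshold:
            recalled[tag] = score
    return recalled
```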
Experiments
• Test learning and recall with plasmids
• Test of sensitivity to concentration
• Test coverage of input sequence space with:
– Plasmids (5 kbp)
– E. coli (5 Mbp)
• Test sequence resolution of protocols
Learning
Input 1 is a 3 kb linear DNA (pBluescript).
Input 2 is a 5 kb linear DNA (φX174).
[Figure: stained gel of input DNAs (I1, I2) and memory DNAs (M1, M2), with markers at 40, 60 and 100; blot with Memory 1 (pBluescript) and blot with Memory 2 (φX174), each against Input 1 and Input 2.]
Plasmid inputs learned, similar sequences recalled, and dissimilar sequences not matched.
Recall
Concentration Sensitivity
1. Plasmids digested with Hpa II
2. 1 µg pBluescript
3. 10 ng to 800 ng φX174
4. Blotted with φX174 memory
5. 1% φX174 detected in a background of pBluescript
Input Space Coverage
• Randomly digested input
• Learning on both inputs
• Blots nearly identical
E. coli
1. E. coli digested
2. 219 bp fragment of φX174 added
3. Learning with and without fragment
4. Fragment distinguished when learned
Application
Team
• Russell Deaton, University of Arkansas, Computer Science and Engineering
• Junghuei Chen, University of Delaware, Chemistry and Biochemistry
• Hong Bi, University of Delaware, Chemistry and Biochemistry
• Max Garzon, University of Memphis, Computer Science
• Harvey Rubin, University of Pennsylvania, School of Medicine
• David Wood, University of Delaware, Computer and Information Science
Acknowledgement
• This work was supported by the NSF QuBIC program, award number EIA-0130385