Post on 16-Dec-2015
1
Systems and Synthetic Biology: A programming languages point of view
Saurabh SrivastavaAssistant Research Engineer
Computer Science + Bioengineering
University of California, Berkeley
• Systems biology– Modeling complete biological processes
• Genomics– Reading DNA sequences got incredibly fast, and cheap– Algorithms for sequence data analytics within organisms
• Synthetic biology– DNA synthesis got incredibly cheap– Functional characterization on the way– Algorithms to predict tweaks to organisms
2
One view: biological specifications
One view: syntax of components
One view: semantics of components
In perspective
3
Different scales in biology
Systems Biology operates here
Synthetic Biology operates here
Intracellular processes
Intercellular processes
Protein interactions
Protein function
Organisms
4
Systems Biology operates here
Results and techniques used
Synthetic Biology operates here
Intracellular processes
Intercellular processes
Protein interactions
Protein function
5
Systems Biology operates here
Results and techniques used
Synthetic Biology operates here
Intracellular processes
Intercellular processes
Protein interactions
Protein function
Results:Constructed a bacteria that produces Paracetamol/Depon
Model of cell communication allows understanding cancer
Techniques:Automatically generating concurrent programs using input-output examples
Big data analysis and abstraction
7
Synthesizing Systems Biology “Programs”
Synthesizing concurrent programs from examplesPrograms ≡ biological modelsExamples ≡ biological experiments
We assist natural sciences with formal methods • Given experiments, are there other models?• If so, compute a new, disambiguating experiment
Part I: how stem cells coordinate their fates
8
Understanding Diseases
• “Cancer is fundamentally a disease of failure of regulation of tissue growth. In order for a normal cell to transform into a cancer cell, the genes which regulate cell growth and differentiation must be altered.” – from Wikipedia
• Research on cell differentiation helps understanding diseases such as cancer.
9
C. elegans: A Model Organism
Earthworm used in developmental biology.
959 cells; its organs found in other animals.
Differentiation studied on vulval development.
10
Differentiation and then development
into organ parts
Initial division of embryo
Identical precursor cells collaborate to
decide their fate
11
Modeling Goal
What is the mechanism (program) within each cell for deciding fates through communication?
12
Building Blocks of these Programs
Cells contain communicating proteins.
Protein interaction: a protein senses the concentration of other proteins.
Interaction is either activation or inhibition.
A B A B
13
How the Vulval Cells Differentiate
let-23
lin-1
2
sem-5
let-60
mpk-1
lst
let-23lin
-12
sem-5
let-60
mpk-1
lst
let-23
lin-1
2
sem-5
let-60
mpk-1
lst
AnchorCell
high med low
1º 2º 3º
If cells sense the same signalstrength, data races occur.
...
14
How Biologists Discover Interactions
• Measuring protein levels over time is infeasible
• If such “cell tracing” is infeasible, infer protein interaction from end-to-end experiments
• That is, mutate cells observe resulting fates
15
A Mutation Experiment
let-23
lin-1
2
let-23lin
-12
let-23
lin-1
2
AnchorCell
high med low
1º 2º 3º1º
...
16
Putting Experiments Together
Experiment AC lin-12 lin-15 let-23 lst Fate decisions
1 ON ON ON ON ON {332123}
2 ON OFF ON ON ON {331113}
3 ON ON OFF ON ON {112121,122121, 212121}
...
48 ...
No protein is mutated.
lin-12 is turned off. Multiple outcomes observed
Fate of six neighboring cells
Experiments over 35 years by 11 groups
17
How to Build Accurate Models?
let-23
lin-1
2
sem-5
let-60
mpk-1
lst
let-23lin
-12
sem-5
let-60
mpk-1
lst
let-23
lin-1
2
sem-5
let-60
mpk-1
lst
AnchorCell
high med low
1º 2º 3º
Inhibition discoveredby predictive modeling
[Fisher et al. 2007]
...
18
Semantics of the Modeling Language
Cell 1 Cell 2• Program has cells• Non-deterministic outcomes
via schedule interleaving
let-23
lin-1
2
sem-5
let-60
mpk-1
lst
• Cell has proteins• All proteins advance
synchronously
• Proteins have discrete state and update functions.
20
Synthesis of Programs
specification
biologicalinsight
synthesizer completedprogram
Experiment AC lin-12 lin-15 let-23 lst Fate decisions
1 ON ON ON ON ON {332123}
2 ON OFF ON ON ON {331113}
3 ON ON OFF ON ON {112121,122121, 212121}
...
Given as a partial program
21
Partial ProgramsPartial programs express biological insight:• Which proteins are in the cell• Which proteins may interact
Update functions can be unknown.
let-23
lin-1
2
sem-5
let-60
mpk-1
lst
??
23
Classical CEGIS
synthesizer
initial input/output setcandidatesolution
SAT
add input-outputcounterexample
SAT UNSATUNSAT
verifier
24
Correctness Condition
Safety: all schedules must lead the program to produce experiment outcomes observed in the wet lab.
∀mutation m. schedule s. P(m, s) E(m)∀ ∈
Completeness: each observed experiment outcome must be reproducible by the program for some schedule.
∀mutation m. fate f E(m). schedule s. P(m, s) = f∀ ∈ ∃
Experiment AC lin-12 lin-15 let-23 lst Fate decisions
1 ON ON ON ON ON {332123}
2 ON OFF ON ON ON {331113}
3 ON ON OFF ON ON {112121, 122121, 212121}
...
25
Counterexample-Guided Inductive Synthesis
inductivesynthesizer
safetyverifier
counterexample:execution P(m, s)
with bad outcome
counterexample:observation (m, f)not reproducible
completenessverifier
candidatecompletion
initial set ofinput/output examples
no candidatecompletion
ok
26
Verifying for Safety
• Safety: ∀mutation m. schedule s. P(m, s) E(m)∀ ∈
• Attempt to disprove by searching for a demonic schedule:∃mutation m. schedule s. P(m, s) E(m)∃ ∉
Unroll over the set ofperformed experiments
Search symbolically fora demonic schedule
27
Verifying for Completeness
• Completeness:∀mutation m. fate f E(m). schedule s. P(m, s) = f∀ ∈ ∃
• Attempt to disprove by showing lack of an angelic schedule for some outcome:∃mutation m. fate f E(m). schedule s. P(m, s) ≠ f∃ ∈ ∀
Unroll over pairs ofmutation and fate
Query symbolically foran angelic schedule
28
Synthesized Models
• We synthesized two models of VPCs.• Input: Partial model that specifies known,
simple protein behaviors.• Output: Synthesized update functions for two
key proteins.
model 1 model 2
29
Additional Algorithms for Going Beyond Synthesis
to Assist Scientists
Does there exist alternative model that
differs on new experiment?
For two or more models within the space, do there
exist disambiguating experiments?
What are the minimal number of experiments that constrain the space
to current model?
• Microbial chemical factories– Sustainable bacterial production of chemicals. E.g., drugs, polymers
• Bacteria as a sensor– Agricultural apps: e.g., sense nitrogen depletion in soil, change color
• Tumor killing bacteria– Sense multiple environmental factors (lower oxygen, high lactic acid)– Invade cancerous cell– Release drug inside cell
31
Synthetic Biology inserts DNAApplications enabled
Does it actually work?
• Jay D. Keasling
– Artemisinin: from Artemisia annua to Yeast– Amyris company: 219M incidence, 300M cure target– 8 years of work; manual insights
• Computationally predicted– Our tylenol E. coli strain Sugar
Tyenol
35
Sugar
Acetaminophen Tylenol
Bio repositoriesData dedup/correlation
+Search
DNA synthesis+ Plasmid
Transformation
Abstractions or rules from data
PolymersNylon etc.Biofuels
Sugar
Opportunity
Current status
+
Rule application to predict
36
Tylenol4-aminophenol4-aminobenzoate
Chorismatepathway
4ABH genefrom Mushroom
Prediction for Tylenol
38
Abstractions or rules from data
PolymersNylon etc.Biofuels
Sugar
Opportunity
Rule application to predict
Biochemical rules/abstractions?
fGraph
transformation function
Operator taking (chemical) graph and transforming it
f’
f’’
fclass* * * *
Open problem 1: Language for chemicals and transforms
C1=CC=CC=C1
3D 2D 1D
XYZ, CML, PDB SMILES, SYBYL
Good for crystallographers
Good for biochemists
Good for computational
storage, retrieval
Alternative new representation
• Transformation representation– encmolecule
– Be able to efficiently compute fclass(encmolecule)
• Conflicting objectives:– Trace-based encoding
• Difficulty at cycle cuts
– True graph encoding• Subgraph isomorphism
Need midway encoding
* * * *
Applying reaction operators
We can now predict new edges: I.e., external enzymes or even new constructed enzymes
Open problem 2: Deriving biochemical programs
• fclass(encmolecule) is a single step
• Function composition to get “pathway programs”– fclass0
(fclass1 (fclass2 (fclass3 (encmolecule))))
• Complications– Path can be “acyclic” not just straight-line– Mixed concrete and rule instantiation
Crux of the research problemIntelligent rule instantiation: ------ edges
Given target
chemical
Phrase it as a model checking reachability problem: initial experiments point to a need for more efficient representation of search space.
Need better search methods
Arbekacin(semi-synthetic)MRSA Drug$20,900/gram
Amikacin (natural)Nothing interesting$118/gram
Eventual targets
Arbekacin(semi-synthetic)MRSA Drug$20,900/gram
Modular decomposition through function summaries
Amikacin (natural)Nothing interesting$118/gram
49
http://pathway.berkeley.edu
Sugar
Acetaminophen Tylenol
Bio repositoriesData dedup/correlation
+Search
DNA synthesis+ Plasmid
Transformation
Chemical transformation functions from data
Language for biochemistry
Synthesis of biochemical program (i.e., DNA modifications)
Synthesis of internals of transformation functionPolymersNylon etc.Biofuels
Sugar
Opportunity
Current status
+