De Novo Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Schneider.pdf · De Novo Design...
Transcript of De Novo Design - Unistrainfochim.u-strasbg.fr/CS3/program/material/Schneider.pdf · De Novo Design...
De Novo Design
Adaptive Systems for Molecular Design
Gisbert Schneider
Beilstein Endowed Chair for Chem- & Bioinformatics
Goethe-University Frankfurt
23 June 2008, (c) G. Schneider
“[..] Thus between A & B immense
gap of relation. C & B the finest
Transmutation of Species
gap of relation. C & B the finest
gradation, B & D rather greater
distinction. Thus genera would be
formed. — bearing relation”
Charles R. Darwin (*1809)
Notebook B: Transmutation of Species (1837-1838), p.36
Source: www.darwin-online.org.uk
23 June 2008, (c) G. Schneider
Voyages to the (un)known
Darwin and Henslow, letters 1831-1860.
The growth of an idea (N. Barlow, ed. 1967).
Voyages of the H.M.S. Beagle
23 June 2008, (c) G. Schneider
Requirements
• Structure sampling
• Structure assessment
• Grow
• Link
• Lattice
• Stochastic
• Primary constraints
(receptor, ligand)
Implementation
De Novo Design Concepts
• Structure assessment
• Search / Optimization Method
(receptor, ligand)
• Secondary constraints
• Depth-first search
• Breadth-first search
• Random search
• Evolutionary Algorithm
• Monte Carlo / Metropolis
• Exhaustive enumeration
• (Free Energy Perturbation)23 June 2008, (c) G. SchneiderSchneider & Fechner (2005) Nat. Rev. Drug Discov. 4, 649.
Global optimum
Local optimum
Fit
ne
ss
Adaptive Walk
in a “Fitness Landscape”
Local optimum
X1X2
23 June 2008, (c) G. Schneider
Schneider & Fechner (2005) Nat. Rev. Drug Discov. 4, 649.
Fechner & Schneider (2007) JCIM 47, 656.
Fit
ne
ssTypes of Fitness Landscapes
Fit
ne
ss
The Dream The Nightmare
23 June 2008, (c) G. Schneider
à Chemical Similarity Principle!
O
O
O OH
Aspirin®
What is a Molecule?
Aspirin®
Increasing “Fuzziness”
• Pharmacophoric representation
on different levels of abstraction
23 June 2008, (c) G. Schneider
Tanrikulu et al. (2007) ChemBioChem 8, 1932.
Renner & Schneider (2004) J. Med. Chem. 47, 4653.
Start
q C
BA
Local Neighborhood Search
Neighborhood N Local
optimumx
• fixed N• variable (“adaptive”) N
23 June 2008, (c) G. Schneider
Local (neighborhood) search
Any-ascent/Stochastic
hill-climbing
Steepest/First-ascent
hill-climbing
Population-based
Local Neighborhood Strategies
Threshold-acceptingSimulated
annealing
Tabu search
Genetic
algorithms
Evolution
strategies
Evolutionary
programming
23 June 2008, (c) G. Schneider
• Structures that undergo adaptation
• Initial structures (starting solutions)
• Fitness measure that evaluates the structures
• Operations to modify the structures
Elements of Artificial Adaptive Systems (Koza, 1992)
• Operations to modify the structures
• State (memory) of the system at each stage
• Method for designating a result
• Method for terminating the process
• Parameters that control the process
23 June 2008, (c) G. Schneider
Start solution
Final
solution
Adaptive
neighborhood
Why “Adaptive”?
neighborhood
Adaptive optimization copes with …
• non-additive fitness functions
• multiple fitness functions
• very large search spaces
• nonlinear system behavior
23 June 2008, (c) G. Schneider
• Genetic algorithms
• Evolution strategies
• Particle Swarm Optimization (PSO)
Learning from Nature
• Particle Swarm Optimization (PSO)
• Ant Colony Optimization (ACO)
23 June 2008, (c) G. Schneider
Ant Colony Optimization
A Simple Combinatorial Case: Peptide Design
T-cell receptor
• Design of novel antigens presented
by MHC I (H-2Kb)
• Length: 8 residues (208 possibilities)
à Neural network ensemble + jury
(PDB: 1HSA, Madden et al. 1993)
à Neural network ensemble + jury
à Artificial ant system
Assay confirms the predictions:
89% correct for binding peptides
95% correct for non-binding peptides
Hiss et al. (2007) PEDS 20, 99.
Schneider et al. (1998) PNAS 95, 12179.
Which Search Strategy Should be Applied?
Global & local search control guided by
• Random parameter variation
• Stepwise systematic parameter variation
• Gradients in the fitness landscape ...
There is no one best method.
Evolutionary algorithms are
• easily implemented
• robust
• able to cope with high-dimensional optimization tasks
23 June 2008, (c) G. Schneider
Principle of Evolutionary Strategies
Step 1: Random searchStep 2 and following:
“Local hill climbing”
Best value
Optimum
Generate a diverse set of parameter
values and hope for a hit
Generate localized distributions of
parameter values
and improve steadily
Best value
à Adaptive neighborhood23 June 2008, (c) G. Schneider
Algorithm of the (1,λ),λ),λ),λ) Evolution Strategy
Initialize parent (xP,sP,FP)
For each generation:
Generate λ variants (xV,sV,FV) around the parent:
sV = abs(sP + G)
Fitness
Stepsize
Variables
(Rechenberg, 1973)
sV = abs(sP + G)
xV = xP + sV• G
Calculate fitness FV
Select best variant according to FV
Set (xP,sP,FP) = (xV,sV,FV)best
)2sin()ln(2),G( jiji π⋅−=
i,j: pseudo-random numbers in ]0,1]
in [-∞,∞]
23 June 2008, (c) G. Schneider
De novo Fragment Assembly
DB
DB
Known drugs and leadsKnown drugs and leads
Set of reactionsSet of reactions
Generate µ initial
parent molecules
Generate λ new moleculesfrom µ parent(s)
Start
R10
R10
R3OHO
R1DB
Stock of building blocks
with reaction labels
Determine fitness
Select µ best molecules
End
No
Yes
Terminate?
R10
NR7
OR4
R1
• Template compound(s)
• Prediction tools• Biochemical assay
• Template compound(s)
• Prediction tools
• Biochemical assay
Gen
era
tio
n l
oo
p
23 June 2008, (c) G. SchneiderSchneider et al. (2000) Angew. Chem. Int. Ed. 39:4130
Operators of Evolutionary Optimization
• Mutation
• Crossover
VariationAdds new information to a population
Exploits information within a population• Crossover
• Selection
Exploits information within a population
23 June 2008, (c) G. Schneider
HN
F
F
O
HN
F
HN
O
HN
Rx1
O
F
F HN
F
Rx1NH O
Rx1
Rx1
N
O
Rx1O
N
N Rx1
O
N
N
HN
HN
O
F
F
Breeding New Molecules
Virtual reaction
Mutation
HN
F
F
O
HN
F
HN
O
N
NHN
C
O
NHN
C
O
HN
Rx1
O
F
F
HN
F
Rx1NH O
Rx1
Rx1
C
O
Rx1NN
N
NHRx1
NH
Rx1C
O
Rx1
C
O
N HN OHN
F
HN
O
F
F
HNC
O N
NHN
Crossover
23 June 2008, (c) G. Schneider
IC50 [µM]
40
50
60
70
80
90
100
Random Search
Evolution Strategy
How Many Iterations? How Many Compounds?
Limited resources:
300 compounds
à There is an optimum!
Trypsin
inhibition
Population size × Generations
1 x
300
2 x
150
3 x
100
5 x
60
10 x
30
15 x
20
20 x
15
30 x
10
37 x
8
50 x
6
60 x
5
75 x
4
100 x
3
150 x
2
0
10
20
30
preferred combinations
à There is an optimum!
23 June 2008, (c) G. SchneiderSchüller & Schneider (2008) JCIM, in press.
ONH
NH
HO
O O
NNH2
NH
ONH
NH
OH
OH
HONH2O OH
ONH
NH
HO
NH
NH2
O
OHO NH2
1IC50 = 22.5 µM
2IC50 = 13.0 µM
6IC50 = inactive
ONH
NH
O O
NNH2
NH
HN
ES PSO
“Molecular Evolution”
Trajectotries through
inactive regions of
chemical space are
possible O O ONH
NHN
O
ONH
NH NH2
NH
O
ONH
NH
O
NH2
NH
N
ONH
NH NH2
NH
N
3IC50 = 47.5 µM
4IC50 = 74.7 µM
5IC50 = 37.2 µM
7IC50 = inactive
8IC50 = 15.1 µM
9IC50 = 54.7 µM
ONH
NH
O
NH
NH2
N
possible
23 June 2008, (c) G. Schneider
Particle Swarm Optimization (PSO)
• Biological motivated optimization paradigm
• Individuals move through a fitness landscape
• Particles can change their movement direction and velocities
• “Social” interactions between individuals of a population• “Social” interactions between individuals of a population
• Each particle knows about its own best position
• All particles know about the overall best position
23 June 2008, (c) G. Schneider
PSO: Update Rule 1 (Standard Type)
Individual
memory
Social
memory
))((2))((2)(v)1(v 21ii txprtxprtt ibii −⋅⋅+−⋅⋅+=+
v: Velocity vector
x: Coordinates of a particle
i: Dimension
t: Time (epochs)
r1 , r2: Random numbers in ]0,1]
pi , pb: Best positions (individual and social)
23 June 2008, (c) G. Schneider
Global optimum
PSO convergence after 100 epochs (Griewangk‘s function)
PSO: Individual and Social Memory
n1 = 1
n2 = 2
n1 = 2
n2 = 1
Higher weight on
the social memory
23 June 2008, (c) G. Schneider
Visualization of Particle Trajectories
• The particle visited several local optima bevor finding
the global optimum
à PSO Demo23 June 2008, (c) G. Schneider
COLIBREE®
Combinatorial Library BreedingCombinatorial Library Breeding
23 June 2008, (c) G. Schneider
Disassembly Rules
Ar Ar Ar
Retain building blocks with
MW < 200 Da
à 7,184 Building Blocks
avg MW = 141 ± 35 Da
23 June 2008, (c) G. SchneiderHartenfeller et al. (2008) Chem. Biol. Drug Des., in press
PSO Progress Over Time
Library size:
40 molecules40 molecules
Fitness:
Distance to Reference
(minimization!)
23 June 2008, (c) G. Schneider
Positive and Negative Design
PPARα(negative weight)
PPARγ(positive weight)
23 June 2008, (c) G. Schneider
Nonlinear
mapping
1
2
3
4
5
SOM projection
Self-Organizing Map (SOM)Kohonen , 1982
5
6
1 2 3 4 5 6 7 8
Original
data space
à SOMMER Demo
• Nonlinear projection of high-dimensional space
• Topology-preserving projection
• Data clustering
• Prediction with probability estimation
23 June 2008, (c) G. Schneider
Pharmacophore Road Map of Chemical Space
Proteases MMPsKinases
Ion channelsGPCRs Unidentified (orphan)
• Topological pharmacophores (CATS descriptor)
• COBRA collection (10,500 drugs and lead compounds)
23 June 2008, (c) G. Schneider
Thrombin inhibitors PPARα modulators
Land Ho! Exploration of Chemical Space
SO H
N
O
O
NH
O
N
NH2H2N
NAPAP
HO
O
O
HN
O
S
N
CF3
GW590735
23 June 2008, (c) G. Schneider
Let’s Be Positive!
De Novo Design of Thrombin Inhibitors
CATS descriptor no. 10:
PP0 =
Top-ranking designed compounds O O
NH
OHN
S
O
O
HN NH2
O
NH
O N
NHH2N
HN
S
O
OSCl
O O
NH
OHN
S
O
O
S
Cl
HN NH2
O O
NH
OHN
S
O
O
HN NH2
O O
NH
OHN
SO
OF S
NHH2N
23 June 2008, (c) G. Schneider
Climbing “Acid hill”:
De Novo Design of PPARαααα Agonists
CATS descriptor no. 19:
DN1 =
HO
O
Top-ranking designed compounds
O
HOO
HN
O
O
Cl
O
HO O
HN O
CF3
O
HO
OHN
O
NS
O
HO
O
HN
O
NS
Cl
CF3
O
CF3
O
HOO
HN
O
O
O
HO O
HN
O
N
O
HOO
HN
O
O
I
HO
O
HO
O
HN
O
NS
CF3
O
HOO
HN
O
NS
S
O
HO
OHN
O
NS
Cl
Cl
23 June 2008, (c) G. Schneider
Conclusions
• Fragment-based design often results in bioactive compounds
• “Novelty” of the designs depends on local convergence
• Adaptive optimization enables “scaffold-hopping”• Adaptive optimization enables “scaffold-hopping”
• Visual inspection / expert knowledge is crucial
23 June 2008, (c) G. Schneider