A knowledge-based approach for reaction generation
Development, validation and applications
Dimitar Hristozov, 04.06.2009
Motivation
Public reaction databases: >1,500,000 reactions covering general organic chemistry
Proprietary reaction databases: medicinal chemists' lab notebooks (eLN)
Public data ∪ commercial reaction databases
Large number of reactions per year, strong medicinal chemistry bias
A wealth of reaction data is available:
extract some of the knowledge hidden in these data
use this knowledge to assist the medicinal chemist
suggest new, synthetically feasible molecules with the desired biological profile
Reaction vectors
From reaction database to knowledge base
Bond counts for the example reaction, R1 + R2 → P:

Vector                          C-C  C=O  C-OH  C-OR
reactant vector, R = R1 + R2     4    1    2     0
product vector, P                4    1    0     2
reaction vector, D = P - R       0    0   -2     2
Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
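The bond-count bookkeeping on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic D = P - R using the four bond classes and counts shown above; the function name `reaction_vector` and the dictionary layout are assumptions made for the example, not part of the published method.

```python
# Bond counts as given on the slide (bond classes C-C, C=O, C-OH, C-OR).
reactants = {"C-C": 4, "C=O": 1, "C-OH": 2, "C-OR": 0}  # R = R1 + R2
product = {"C-C": 4, "C=O": 1, "C-OH": 0, "C-OR": 2}    # P


def reaction_vector(p, r):
    """Return D = P - R, keeping zero and negative entries."""
    keys = sorted(set(p) | set(r))
    return {k: p.get(k, 0) - r.get(k, 0) for k in keys}


D = reaction_vector(product, reactants)
print(D)
```

Negative entries are bonds consumed by the reaction and positive entries are bonds formed; here two C-OH bonds are lost and two C-OR bonds are gained.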
From reaction vector to products (I)
The reaction vector, D, equals the difference between the product vector, P, and the reactant vector, R:
D = P – R
Given a reaction vector, D, and a reactant vector, R, the product vector, P, can be obtained:
P = D + R
Given a product vector, P, can we reconstruct the product molecule(s)?

Bond  C-C  C=O  C-OH  C-OR
#      4    1    0     2

[Figure: chemical structures]
A better descriptor is required.
Extended atom pairs
[Figure: example molecule with heavy atoms numbered 1 to 8; atoms 5 and 6 are oxygen]

Atom types
No.  Symbol  n  p  r  Type
4    C       3  1  0  C(3,1,0)
5    O       2  0  0  O(2,0,0)
7    C       2  0  0  C(2,0,0)

n: number of bonds to heavy atoms
p: number of π bonds
r: number of ring memberships

Atom pairs
Atom pair                 Atoms
C(3,1,0)-2(1)-O(2,0,0)    4-5
C(2,0,0)-2(1)-O(2,0,0)    7-5
C(2,0,0)-3-C(3,1,0)       2-4; 7-4

AP2: atoms 1 bond away
AP3: atoms 2 bonds away
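A minimal sketch of how the atom types and atom pairs above could be enumerated. The molecular graph is an assumption: it is taken to be ethyl butanoate, C1-C2-C3-C4(=O6)-O5-C7-C8, which is consistent with the atom numbers and pairs listed on the slide. The ordering of the two atom types inside a pair string (alphabetical here) is also an assumption, as are all function names.

```python
from collections import Counter
from itertools import combinations

# Assumed heavy-atom graph: C1-C2-C3-C4(=O6)-O5-C7-C8 (bond -> bond order).
symbols = {1: "C", 2: "C", 3: "C", 4: "C", 5: "O", 6: "O", 7: "C", 8: "C"}
bonds = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1,
         (4, 6): 2, (5, 7): 1, (7, 8): 1}
ring_memberships = {a: 0 for a in symbols}  # acyclic molecule, so r = 0

neighbours = {a: {} for a in symbols}
for (a, b), order in bonds.items():
    neighbours[a][b] = order
    neighbours[b][a] = order


def atom_type(a):
    """Return the type string Symbol(n,p,r) for atom a."""
    n = len(neighbours[a])                          # bonds to heavy atoms
    p = sum(o - 1 for o in neighbours[a].values())  # number of pi bonds
    r = ring_memberships[a]                         # ring memberships
    return f"{symbols[a]}({n},{p},{r})"


def atom_pairs():
    """Count AP2 (bonded) and AP3 (two bonds apart) atom pairs."""
    pairs = Counter()
    for (a, b), order in bonds.items():
        t1, t2 = sorted((atom_type(a), atom_type(b)))
        pairs[f"{t1}-2({order})-{t2}"] += 1
    # Two neighbours of a common atom are two bonds apart (acyclic case).
    for centre in symbols:
        for a, b in combinations(sorted(neighbours[centre]), 2):
            t1, t2 = sorted((atom_type(a), atom_type(b)))
            pairs[f"{t1}-3-{t2}"] += 1
    return pairs


for pair, count in sorted(atom_pairs().items()):
    print(pair, count)
```

The AP3 enumeration via shared neighbours is only adequate for molecules without three-membered rings; a general implementation would use shortest-path distances.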
From reaction vector to products (II)
Atom pair                 Count
C(1,0,0)-2(1)-C(2,0,0)    2
C(2,0,0)-2(1)-C(2,0,0)    1
C(2,0,0)-2(1)-C(3,1,0)    1
C(3,1,0)-2(1)-O(2,0,0)    1
C(3,1,0)-2(2)-O(1,1,0)    1
C(2,0,0)-2(1)-O(2,0,0)    1
C(1,0,0)-3-C(2,0,0)       1
C(2,0,0)-3-C(3,1,0)       2
C(2,0,0)-3-O(2,0,0)       1
C(2,0,0)-3-O(1,1,0)       1
O(2,0,0)-3-O(1,1,0)       1
C(1,0,0)-3-O(2,0,0)       1

product vector (P = D + R)

[Figure: chemical structures annotated with "wrong" or "missing" atom pairs:]
C(2,1,0)-2(2)-O(1,1,0)
C(3,1,0)-2(1)-O(2,0,0)
C(3,0,0)-2(1)-O(2,0,0)
C(3,1,0)-2(1)-O(2,0,0)
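One way to read the "wrong" or "missing" annotation is as a comparison between the atom pairs of a candidate structure and the target product vector. The sketch below is an interpretation, not the author's stated procedure: "missing" is assumed to mean pairs required by the product vector but absent from the candidate, and "wrong" pairs present in the candidate but not in the product vector. The candidate shown is hypothetical.

```python
from collections import Counter


def compare_to_product_vector(candidate, target):
    """Return (missing, wrong) atom pairs of a candidate structure.

    Counter subtraction keeps only positive counts, so each result holds
    just the pairs that are in excess on one side.
    """
    missing = target - candidate  # required by P, absent from the candidate
    wrong = candidate - target    # present in the candidate, not in P
    return missing, wrong


# Bonded (AP2) part of the product vector from the table above.
target = Counter({
    "C(1,0,0)-2(1)-C(2,0,0)": 2,
    "C(2,0,0)-2(1)-C(2,0,0)": 1,
    "C(2,0,0)-2(1)-C(3,1,0)": 1,
    "C(3,1,0)-2(1)-O(2,0,0)": 1,
    "C(3,1,0)-2(2)-O(1,1,0)": 1,
    "C(2,0,0)-2(1)-O(2,0,0)": 1,
})

# Hypothetical candidate whose C-O pair has the wrong carbon type.
candidate = Counter(target)
candidate["C(3,1,0)-2(1)-O(2,0,0)"] -= 1
candidate["C(3,0,0)-2(1)-O(2,0,0)"] += 1

missing, wrong = compare_to_product_vector(candidate, target)
print("missing:", dict(missing))
print("wrong:", dict(wrong))
```

A candidate matches the product vector exactly when both results are empty.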
Reaction vectors in action
Reaction vector
APs "gained"                  APs "lost"
+1 C(2,1,0)-2(2)-C(1,1,0)     -2 C(2,0,0)-2(1)-C(2,0,0)
+1 C(2,1,0)-2(1)-C(2,0,0)     -1 C(2,0,0)-2(1)-O(1,0,0)

[Figure: starting molecule (atoms numbered 1 to 5, atom 1 in the OH group), reaction vector and product]
Atoms/bonds selected for removal using APs lost
New atoms/bonds added using APs gained
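The gained/lost split can be reproduced with simple count arithmetic. In this sketch the starting molecule and product are assumed to be butan-1-ol and but-1-ene, which is consistent with the pairs on the slide; only bonded (AP2) pairs are used. The code covers the vector arithmetic only, not the editing of atoms and bonds in the molecule, and all function names are illustrative.

```python
from collections import Counter

# AP2 counts for the assumed molecules (butan-1-ol -> but-1-ene).
reactant = Counter({
    "C(1,0,0)-2(1)-C(2,0,0)": 1,
    "C(2,0,0)-2(1)-C(2,0,0)": 2,
    "C(2,0,0)-2(1)-O(1,0,0)": 1,
})
product = Counter({
    "C(1,0,0)-2(1)-C(2,0,0)": 1,
    "C(2,1,0)-2(1)-C(2,0,0)": 1,
    "C(2,1,0)-2(2)-C(1,1,0)": 1,
})


def reaction_vector(p, r):
    """D = P - R, with zero entries dropped."""
    d = {k: p[k] - r[k] for k in set(p) | set(r)}
    return {k: v for k, v in d.items() if v != 0}


def split(d):
    """Separate a reaction vector into APs gained and APs lost."""
    gained = {k: v for k, v in d.items() if v > 0}
    lost = {k: v for k, v in d.items() if v < 0}
    return gained, lost


def apply_vector(r, d):
    """P = D + R; fails if a pair to be lost is not available in R."""
    p = Counter(r)
    for k, v in d.items():
        if p[k] + v < 0:
            raise ValueError(f"atom pair not available: {k}")
        p[k] += v
    return +p  # unary plus drops zero counts


D = reaction_vector(product, reactant)
gained, lost = split(D)
print("gained:", gained)
print("lost:", lost)
```

Applying `D` back to the reactant counts with `apply_vector(reactant, D)` returns the product counts, which is the P = D + R relation from the earlier slide.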
Advantages
Does not require manual atom-atom mapping of the reaction centre
Makes use of the synthetic chemistry data collected through the years
Accounts for the synthetic accessibility of the proposed molecules – all transformations are derived from successful reactions
Is fast to apply – no substructure searching is required
Reproducing reactions
5,695 diverse reactions, 2,902 reaction vectors

1. create knowledge base
2. for each reaction
3. retrieve its reaction vector
4. apply the reaction vector to the starting materials
5. is the product obtained in less than 30 seconds?

Example reaction vector (reaction with loss of H2O):
APs gained                    APs lost
+1 C(2,1,0)-2(2)-C(1,1,0)     -2 C(2,0,0)-2(1)-C(2,0,0)
+1 C(2,1,0)-2(1)-C(2,0,0)     -1 C(2,0,0)-2(1)-O(1,0,0)
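The validation protocol above can be outlined as a small harness. This is a sketch under stated assumptions: molecules are represented only by atom-pair counts, `toy_apply` is a hypothetical stand-in for the real structure-generation step (it performs plain count arithmetic), and the 30-second limit is checked after the call returns rather than enforced as a hard timeout.

```python
import time
from collections import Counter


def vector_of(reactant, product):
    """D = P - R over atom-pair counts, dropping zero entries."""
    keys = set(reactant) | set(product)
    return {k: product[k] - reactant[k] for k in keys
            if product[k] != reactant[k]}


def build_knowledge_base(reactions):
    """Step 1: one reaction vector per reaction id."""
    return {rid: vector_of(r, p) for rid, (r, p) in reactions.items()}


def toy_apply(reactant, vector):
    """Hypothetical stand-in for applying a vector: P = D + R on counts."""
    result = Counter(reactant)
    result.update(vector)  # adds the signed counts
    return +result         # unary plus drops zero and negative counts


def validate(reactions, knowledge_base, apply_vector, time_limit=30.0):
    """Steps 2 to 5: fraction of reactions reproduced within the limit."""
    reproduced = 0
    for rid, (reactant, product) in reactions.items():  # 2. each reaction
        vector = knowledge_base[rid]                     # 3. its vector
        start = time.perf_counter()
        generated = apply_vector(reactant, vector)       # 4. apply it
        elapsed = time.perf_counter() - start
        if generated == product and elapsed < time_limit:  # 5. in time?
            reproduced += 1
    return reproduced / len(reactions)


reactions = {
    "dehydration": (
        Counter({"C(1,0,0)-2(1)-C(2,0,0)": 1,
                 "C(2,0,0)-2(1)-C(2,0,0)": 2,
                 "C(2,0,0)-2(1)-O(1,0,0)": 1}),
        Counter({"C(1,0,0)-2(1)-C(2,0,0)": 1,
                 "C(2,1,0)-2(1)-C(2,0,0)": 1,
                 "C(2,1,0)-2(2)-C(1,1,0)": 1}),
    ),
}
kb = build_knowledge_base(reactions)
print(validate(reactions, kb, toy_apply))
```

With the toy application step every reaction is trivially reproduced; the interesting failures in practice come from rebuilding an actual structure from the vector, which this harness leaves to the supplied `apply_vector` callable.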
How well did it work?
Products generated for ~90% of the 5,695 reactions
[Bar chart "Reproducibility": per cent of reactions (0 to 100) with product(s) generated vs. no product generated]
How fast did it work?
[Histogram "Execution Times": per cent of reactions (0 to 80) vs. time / s, bins 0.05, 0.1, 0.5, 1, 5, 10, 15, 20, 25, 30, > 30]
Median run time: 0.015 seconds per reaction
Epoxide reduction
Reproduced in a large variety of environments (350 reactions); only one reaction was not reproduced.
Works like a charm…
More than 95% reproduced successfully
epoxide reduction
epoxide formation
ester to amide
alcohol dehydration
Friedel-Crafts acylation
nitro reduction
acid to aldehyde
nitrile to aldehyde
nitrile hydrolysis
alcohol amination
aldol condensation
alkene oxidation
Still works like a charm…
More than 90% reproduced successfully
olefin metathesis
amide reduction
ether halogenation
ozonolysis
alkene halogenation
Wittig-Horner olefination
Beckmann rearrangement
Claisen rearrangement
Dieckmann condensation
Robinson annulation
Claisen condensation
A variety of environments were tested.
79 out of 100 reactions were successfully reproduced.
21% of the reactions were not reproduced, mainly condensations (intra- and intermolecular) which result in ring closures.
Still works
More than 50% reproduced successfully
A large variety of reactions were successfully reproduced.
Small difficulties with complex cycle formations; improvements are on their way.
Cope rearrangement (67% success)
hetero Diels-Alder (73% success)
Claisen condensation (79% success)
Diels-Alder cycloaddition (49% success)
Fischer indole synthesis (57% success)
Generating new molecules
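[Flowchart, drawing on the knowledge base and the reagents database]
1. Starting molecule
2. Select reaction transform (from the knowledge base)
3. Can the transform be applied? If no: discard reaction vector
4. If yes: is a second reagent required? If yes: select suitable reagent (from the reagents database)
5. Apply reaction transform
6. New molecule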
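The control flow of the flowchart can be sketched as a loop over the knowledge base. Everything below is a toy model under stated assumptions: molecules are `Counter` objects of atom-pair counts, each knowledge-base entry is a dictionary with a `vector` and a `needs_reagent` flag, the applicability test (`touches`) and the reagent suitability test (`satisfies`) are illustrative rules rather than the method's real criteria, and the first suitable reagent is used.

```python
from collections import Counter


def touches(molecule, vector):
    """Toy applicability test: the molecule holds a pair the vector removes."""
    return any(v < 0 and molecule[k] > 0 for k, v in vector.items())


def satisfies(molecule, vector):
    """Every pair the vector removes is present in sufficient number."""
    return all(molecule[k] + v >= 0 for k, v in vector.items())


def generate(start, knowledge_base, reagents):
    """Return the new molecules obtained in one step from `start`."""
    new_molecules = []
    for transform in knowledge_base:          # select reaction transform
        vector = transform["vector"]
        if not touches(start, vector):        # can the transform be applied?
            continue                          # no: discard reaction vector
        inputs = start
        if transform["needs_reagent"]:        # is a second reagent required?
            suitable = [r for r in reagents if satisfies(start + r, vector)]
            if not suitable:
                continue
            inputs = start + suitable[0]      # select suitable reagent
        if not satisfies(inputs, vector):
            continue
        product = Counter(inputs)
        product.update(vector)                # apply reaction transform
        new_molecules.append(+product)        # new molecule
    return new_molecules


start = Counter({"C(1,0,0)-2(1)-C(2,0,0)": 1,
                 "C(2,0,0)-2(1)-C(2,0,0)": 2,
                 "C(2,0,0)-2(1)-O(1,0,0)": 1})
knowledge_base = [
    {"needs_reagent": False,
     "vector": {"C(2,0,0)-2(1)-C(2,0,0)": -2,
                "C(2,0,0)-2(1)-O(1,0,0)": -1,
                "C(2,1,0)-2(2)-C(1,1,0)": 1,
                "C(2,1,0)-2(1)-C(2,0,0)": 1}},
    {"needs_reagent": False,
     "vector": {"C(3,1,0)-2(1)-O(1,0,0)": -1}},  # not applicable to start
]
for molecule in generate(start, knowledge_base, reagents=[]):
    print(dict(molecule))
```

Ranking the proposed molecules and steering the search towards desired ones would sit on top of this loop and is not modelled here.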
rank the proposed new molecules
direct the generation towards desired new molecules
Multi-objective de novo design
Use case one: Lead optimisation
Here is my starting material. What kind of (feasible) one step transformations may I make?
starting molecule: Penicillin G
An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
Lead optimisation (cntd.)
[Figure: molecules generated from the starting molecule]
An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
Penicillin G
Use case two: Synthetic route
I have this (active) fragment. Is there a route from it to the molecule I have in mind?
reproducing known synthetic route – Plavix
Synthetic route from Wang, L. et al., Synthetic Improvements in the Preparation of Clopidogrel, Org. Process Res. Dev., 2007, 11 (3), 487-489
An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
Step  No. applicable reaction vectors  Total no. products generated
1     17                               158
2     11                               123
3     12                               124
4     41                               386
Use case three: Library design
With which of these reagents will my starting material undergo reaction X?
enumerate a library using a single reaction and a number of different reagents
An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
starting material
reaction X (X = Suzuki coupling)
628 boronic acids as reagents
Library design (cntd.)
292 products generated
Summary
Reaction vectors offer a good way to explore the knowledge hidden inside reaction databases
A variety of chemical reactions can be reproduced with this approach
The method is fast
The method is applicable in different medicinal chemistry related scenarios
The use of the method is made easy by a variety of KNIME nodes which have been implemented
Acknowledgements
Michael Bodkin for his continuous support both in and outside my daily work
Hina Patel for creating the first prototype, which brought the reaction vectors to life (http://pubs.acs.org/doi/abs/10.1021/ci800413m)
Dave Evans, Fred Ludlow, Swanand Gore, Dave Thorner, Maria Whatton, Juliette Pradon for many stimulating discussions and for their continuous support