A knowledge-based approach for reaction generation: Development, validation and applications

Dimitar Hristozov, 04.06.2009



Motivation

Sources of reaction data:
- public and commercial reaction databases (public data): >1,500,000 reactions covering general organic chemistry
- proprietary reaction databases, compiled from medicinal chemists' electronic lab notebooks (eLN): a large number of reactions per year, with a strong medicinal chemistry bias

The union (∪) of these sources is a wealth of reaction data. The goals are to:
- extract some of the knowledge hidden in these data
- use this knowledge to assist the medicinal chemist
- suggest new, synthetically feasible molecules with the desired biological profile

Reaction vectors

From reaction database to knowledge base

Example reaction: R1 + R2 → P

Bond                            C-C   C=O   C-OH   C-OR
reactant vector, R = R1 + R2     4     1     2      0
product vector, P                4     1     0      2
reaction vector, D = P - R       0     0    -2      2

Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
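The table's arithmetic can be checked with a minimal Python sketch. It assumes bond counts are kept in a `collections.Counter` keyed by the bond labels used above; `reactant_vector` and `reaction_vector` are illustrative names, not the paper's code. The split of R into an alcohol R1 (1 C-C bond) and a carboxylic acid R2 (3 C-C bonds) is also an assumption, since the slide gives only the sum R.

```python
from collections import Counter

def reactant_vector(*reactants: Counter) -> Counter:
    """R = R1 + R2 + ...: add up the bond counts of all reactants."""
    total = Counter()
    for reactant in reactants:
        total.update(reactant)
    return total

def reaction_vector(product: Counter, reactants: Counter) -> Counter:
    """D = P - R; Counter.subtract() keeps zero and negative entries."""
    d = Counter(product)
    d.subtract(reactants)
    return d

BONDS = ("C-C", "C=O", "C-OH", "C-OR")

# Assumed split of R: an alcohol (R1) and a carboxylic acid (R2); water is not counted.
R1 = Counter({"C-C": 1, "C-OH": 1})
R2 = Counter({"C-C": 3, "C=O": 1, "C-OH": 1})
P = Counter({"C-C": 4, "C=O": 1, "C-OR": 2})

R = reactant_vector(R1, R2)
D = reaction_vector(P, R)
print([R[b] for b in BONDS], [P[b] for b in BONDS], [D[b] for b in BONDS])
# [4, 1, 2, 0] [4, 1, 0, 2] [0, 0, -2, 2]
```

Using `subtract()` rather than the `-` operator matters here: Counter's `-` discards zero and negative counts, which would silently drop the -2 for C-OH.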

From reaction vector to products (I)

The reaction vector, D, equals the difference between the product vector, P, and the reactant vector, R:

D = P - R

Conversely, given a reaction vector, D, and a reactant vector, R, the product vector, P, can be obtained:

P = D + R

But given a product vector, P, can we reconstruct the product molecule(s)? The product vector

Bond   C-C   C=O   C-OH   C-OR
#       4     1     0      2

fits several different structures (drawn on the slide), so a better descriptor is required.
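The ambiguity is easy to demonstrate. The sketch below uses a hypothetical hydrogen-suppressed encoding (element symbols plus `(i, j, bond_order)` tuples) and two esters chosen only for illustration: it computes the bond-count vector of ethyl butanoate and of propyl propanoate, and both come out as (4, 1, 0, 2).

```python
from collections import Counter

def bond_count_vector(atoms, bonds):
    """Count C-C, C=O, C-OH and C-OR bonds of a hydrogen-suppressed molecule.

    atoms: element symbols; bonds: (i, j, bond_order) tuples (a hypothetical encoding).
    """
    degree = Counter()
    for i, j, _ in bonds:
        degree[i] += 1
        degree[j] += 1
    counts = Counter()
    for i, j, order in bonds:
        elements = {atoms[i], atoms[j]}
        if elements == {"C"} and order == 1:
            counts["C-C"] += 1
        elif elements == {"C", "O"} and order == 2:
            counts["C=O"] += 1
        elif elements == {"C", "O"} and order == 1:
            oxygen = i if atoms[i] == "O" else j
            # An O with one heavy neighbour carries the H (hydroxyl); otherwise it is O-R.
            counts["C-OH" if degree[oxygen] == 1 else "C-OR"] += 1
    return counts

# Ethyl butanoate: C-C-C-C(=O)-O-C-C
ethyl_butanoate = (list("CCCCOOCC"),
                   [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (3, 5, 2), (4, 6, 1), (6, 7, 1)])
# Propyl propanoate: C-C-C(=O)-O-C-C-C
propyl_propanoate = (list("CCCOOCCC"),
                     [(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 4, 1), (4, 5, 1), (5, 6, 1), (6, 7, 1)])

v1 = bond_count_vector(*ethyl_butanoate)
v2 = bond_count_vector(*propyl_propanoate)
print(v1 == v2)  # True: same bond-count vector, different molecules
```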

Extended atom pairs

Example: ethyl butanoate, heavy atoms numbered 1-8 (C1-C2-C3-C4(=O6)-O5-C7-C8)

Atom types, written Symbol(n,p,r):
n: number of bonds to heavy atoms
p: number of π bonds
r: number of ring memberships

No.   Symbol   n   p   r   Type
4     C        3   1   0   C(3,1,0)
5     O        2   0   0   O(2,0,0)
7     C        2   0   0   C(2,0,0)

Atom pairs:
AP2: atoms 1 bond away, written X-2(b)-Y, where b is the bond order
AP3: atoms 2 bonds away, written X-3-Y

Atom pair                  Atoms
C(3,1,0)-2(1)-O(2,0,0)     4-5
C(2,0,0)-2(1)-O(2,0,0)     7-5
C(2,0,0)-3-C(3,1,0)        2-4; 7-4
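A minimal sketch of the typing and pairing rules, applied to the ethyl butanoate example. Assumptions not taken from the slides: the molecule is encoded as plain lists (hypothetical), the two atom types in a pair are ordered alphabetically (so the slide's O(2,0,0)-3-O(1,1,0) comes out as O(1,1,0)-3-O(2,0,0)), and r is fixed at 0 because ring perception is left out.

```python
from collections import Counter
from itertools import combinations

def atom_types(atoms, bonds):
    """Type each heavy atom as Symbol(n,p,r)."""
    n, p = Counter(), Counter()
    for i, j, order in bonds:
        for a in (i, j):
            n[a] += 1              # bonds to heavy atoms
            p[a] += order - 1      # pi bonds: 1 for a double bond, 2 for a triple bond
    # r (ring memberships) is fixed at 0 because ring perception is omitted (acyclic input only).
    return [f"{symbol}({n[a]},{p[a]},0)" for a, symbol in enumerate(atoms)]

def atom_pairs(atoms, bonds):
    """Count AP2 pairs (bonded atoms, with bond order) and AP3 pairs (atoms two bonds apart)."""
    types = atom_types(atoms, bonds)
    neighbours = {a: set() for a in range(len(atoms))}
    for i, j, _ in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)
    aps = Counter()
    for i, j, order in bonds:
        t1, t2 = sorted((types[i], types[j]))   # assumed canonical order: alphabetical
        aps[f"{t1}-2({order})-{t2}"] += 1
    two_apart = set()
    for centre in neighbours:
        for i, j in combinations(sorted(neighbours[centre]), 2):
            if j not in neighbours[i]:          # a bonded pair is not two bonds apart
                two_apart.add((i, j))           # the set counts each pair once
    for i, j in two_apart:
        t1, t2 = sorted((types[i], types[j]))
        aps[f"{t1}-3-{t2}"] += 1
    return types, aps

# Ethyl butanoate, slide numbering 1-8 mapped to indices 0-7: C1 C2 C3 C4 O5 O6 C7 C8
atoms = list("CCCCOOCC")
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (3, 5, 2), (4, 6, 1), (6, 7, 1)]
types, aps = atom_pairs(atoms, bonds)
print(types[3], types[4], types[6])  # C(3,1,0) O(2,0,0) C(2,0,0)
print(aps["C(2,0,0)-3-C(3,1,0)"])    # 2 (atom pairs 2-4 and 7-4)
```

The resulting counts match the atom-pair table on the next slide, apart from the pair ordering noted above.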

From reaction vector to products (II)

Product vector (P = D + R), as atom pairs:

Atom pair                  Count
AP2
C(1,0,0)-2(1)-C(2,0,0)     2
C(2,0,0)-2(1)-C(2,0,0)     1
C(2,0,0)-2(1)-C(3,1,0)     1
C(3,1,0)-2(1)-O(2,0,0)     1
C(3,1,0)-2(2)-O(1,1,0)     1
C(2,0,0)-2(1)-O(2,0,0)     1
AP3
C(1,0,0)-3-C(2,0,0)        1
C(2,0,0)-3-C(3,1,0)        2
C(2,0,0)-3-O(2,0,0)        1
C(2,0,0)-3-O(1,1,0)        1
O(2,0,0)-3-O(1,1,0)        1
C(1,0,0)-3-O(2,0,0)        1

Incorrect candidate structures contain "wrong" atom pairs, or lack ("missing") atom pairs of P, for example:
candidate 1: wrong C(2,1,0)-2(2)-O(1,1,0); missing C(3,1,0)-2(1)-O(2,0,0)
candidate 2: wrong C(3,0,0)-2(1)-O(2,0,0); missing C(3,1,0)-2(1)-O(2,0,0)
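In descriptor terms, P = D + R and the wrong/missing check are simple multiset operations. The sketch below assumes atom-pair vectors are `Counter` objects. To keep it short, it tracks only the O-containing AP2 pairs of an acid + alcohol esterification. The incorrect candidate (4-ethoxybutanal, which has the same bond counts as ethyl butanoate) is a hypothetical example chosen for illustration.

```python
from collections import Counter

def apply_reaction_vector(r: Counter, d: Counter) -> Counter:
    """P = D + R on atom-pair counts."""
    p = Counter(r)
    p.update(d)  # update() adds counts and, unlike '+', keeps negative results
    if any(count < 0 for count in p.values()):
        raise ValueError("D removes atom pairs that R does not contain")
    return +p    # unary plus drops the zero entries

def wrong_and_missing(candidate: Counter, p: Counter):
    """'Wrong': pairs the candidate has beyond P. 'Missing': pairs of P the candidate lacks."""
    return candidate - p, p - candidate

# O-containing AP2 pairs only (an excerpt), for an acid + alcohol -> ester reaction.
R = Counter({"C(3,1,0)-2(1)-O(1,0,0)": 1, "C(3,1,0)-2(2)-O(1,1,0)": 1,
             "C(2,0,0)-2(1)-O(1,0,0)": 1})
D = Counter({"C(3,1,0)-2(1)-O(1,0,0)": -1, "C(2,0,0)-2(1)-O(1,0,0)": -1,
             "C(3,1,0)-2(1)-O(2,0,0)": 1, "C(2,0,0)-2(1)-O(2,0,0)": 1})
P = apply_reaction_vector(R, D)

# Hypothetical incorrect candidate: 4-ethoxybutanal (aldehyde + ether, same bond counts).
candidate = Counter({"C(2,1,0)-2(2)-O(1,1,0)": 1, "C(2,0,0)-2(1)-O(2,0,0)": 2})
wrong, missing = wrong_and_missing(candidate, P)
print(sorted(wrong))    # ['C(2,0,0)-2(1)-O(2,0,0)', 'C(2,1,0)-2(2)-O(1,1,0)']
print(sorted(missing))  # ['C(3,1,0)-2(1)-O(2,0,0)', 'C(3,1,0)-2(2)-O(1,1,0)']
```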

Reaction vectors in action

Starting molecule + reaction vector → product (an alcohol dehydration)

APs gained                       APs lost
+1 C(2,1,0)-2(2)-C(1,1,0)        -2 C(2,0,0)-2(1)-C(2,0,0)
+1 C(2,1,0)-2(1)-C(2,0,0)        -1 C(2,0,0)-2(1)-O(1,0,0)

New atoms/bonds are added using the APs gained; atoms/bonds are selected for removal using the APs lost.
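At the descriptor level, the gained/lost split gives both an applicability test and the product's atom-pair counts. The sketch assumes butan-1-ol as the starting molecule (consistent with, but not confirmed by, the slide) and uses only the AP2 changes listed above. Building the actual product structure, that is, choosing which atoms and bonds to edit, is not shown.

```python
from collections import Counter

# Reaction vector from the slide (the AP2 changes listed), split into gained and lost pairs.
GAINED = Counter({"C(2,1,0)-2(2)-C(1,1,0)": 1, "C(2,1,0)-2(1)-C(2,0,0)": 1})
LOST = Counter({"C(2,0,0)-2(1)-C(2,0,0)": 2, "C(2,0,0)-2(1)-O(1,0,0)": 1})

def can_apply(start: Counter, lost: Counter) -> bool:
    """The starting molecule must contain every atom pair that the vector removes."""
    return all(start[ap] >= count for ap, count in lost.items())

def product_pairs(start: Counter, gained: Counter, lost: Counter) -> Counter:
    """Atom-pair counts of the product: remove the lost pairs, then add the gained ones."""
    if not can_apply(start, lost):
        raise ValueError("reaction vector is not applicable to this molecule")
    return start - lost + gained

# AP2 pairs of butan-1-ol, HO-CH2-CH2-CH2-CH3 (assumed example)
butanol = Counter({"C(2,0,0)-2(1)-O(1,0,0)": 1, "C(2,0,0)-2(1)-C(2,0,0)": 2,
                   "C(1,0,0)-2(1)-C(2,0,0)": 1})
# AP2 pairs of ethanol, HO-CH2-CH3
ethanol = Counter({"C(2,0,0)-2(1)-O(1,0,0)": 1, "C(1,0,0)-2(1)-C(2,0,0)": 1})

print(can_apply(ethanol, LOST))  # False: too few C(2,0,0)-C(2,0,0) pairs
print(sorted(product_pairs(butanol, GAINED, LOST)))
# ['C(1,0,0)-2(1)-C(2,0,0)', 'C(2,1,0)-2(1)-C(2,0,0)', 'C(2,1,0)-2(2)-C(1,1,0)']
```

The three product pairs are the AP2 pairs of but-1-ene, CH2=CH-CH2-CH3.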

Advantages

Does not require manual atom-atom mapping of the reaction centre

Makes use of the synthetic chemistry data collected through the years

Accounts for the synthetic accessibility of the proposed molecules – all transformations are derived from successful reactions

Is fast to apply – no substructure searching is required

Good approach… so how is it implemented?

Optimisation made easy

Built as an Eclipse plug-in, 100% Java: KNIME meets ChemAxon

Workflow nodes: Sketcher, File reader, Reaction generator, Convertor, Multi-objective ranking, File writer, Marvin Views

Looks great… but does it work?

Reproducing reactions

1. Create a knowledge base from 5,695 diverse reactions (2,902 reaction vectors).
2. For each reaction:
3. retrieve its reaction vector,
4. apply the reaction vector to the starting materials (the slide example combines two reactants with loss of H2O),
5. and check whether the product is obtained in less than 30 seconds.
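The validation loop in steps 2-5 might look like the sketch below. `Reaction`, `reproduce_all` and the `apply_vector` callable are hypothetical; `toy_apply` is a placeholder that simply returns the stored product string so the loop can run. The 30-second limit is checked after each run, so this version does not interrupt long runs.

```python
import time
from dataclasses import dataclass
from statistics import median
from typing import Callable, Iterable, Set, Tuple

TIME_LIMIT_S = 30.0  # success criterion from the slide: product obtained in < 30 s

@dataclass
class Reaction:
    reactants: Tuple[str, ...]  # starting materials, e.g. as SMILES
    product: str                # recorded product
    vector: object              # its reaction vector, retrieved from the knowledge base

def reproduce_all(reactions: Iterable[Reaction],
                  apply_vector: Callable[[object, Tuple[str, ...]], Set[str]]) -> Tuple[float, float]:
    """Return (fraction of reactions reproduced, median run time in seconds)."""
    reproduced, times = 0, []
    for rxn in reactions:
        start = time.perf_counter()
        products = apply_vector(rxn.vector, rxn.reactants)
        elapsed = time.perf_counter() - start
        times.append(elapsed)
        # The limit is checked afterwards; a long run is not interrupted.
        if rxn.product in products and elapsed < TIME_LIMIT_S:
            reproduced += 1
    if not times:
        raise ValueError("no reactions given")
    return reproduced / len(times), median(times)

def toy_apply(vector, reactants):
    # Hypothetical stand-in for the real generator: the "vector" is just a product string.
    return {vector}

reactions = [
    Reaction(("CCCC(=O)O", "CCO"), "CCCC(=O)OCC", "CCCC(=O)OCC"),  # reproduced
    Reaction(("CCO",), "C=C", "CC"),                               # wrong product
]
rate, median_s = reproduce_all(reactions, toy_apply)
print(rate)  # 0.5
```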

How well did it work?

Products generated for ~90% of the 5,695 reactions

[Chart: Reproducibility, percentage of reactions for which product(s) were generated vs. no product generated]

How fast did it work?

[Chart: Execution times, percentage of reactions per run-time bin, from 0.05 s to > 30 s]

Median run time: 0.015 seconds per reaction

Epoxide reduction

Reproduced in a large variety of environments (350 reactions); only one reaction was not reproduced.

Works like a charm…

More than 95% reproduced successfully: epoxide reduction, epoxide formation, ester to amide, alcohol dehydration, Friedel-Crafts acylation, nitro reduction, acid to aldehyde, nitrile to aldehyde, nitrile hydrolysis, alcohol amination, aldol condensation, alkene oxidation

Still works like a charm…

More than 90% reproduced successfully: olefin metathesis, amide reduction, ether halogenation, ozonolysis, alkene halogenation, Wittig-Horner, Beckmann rearrangement, Claisen rearrangement, Dieckmann condensation, olefination, Robinson annulation

Claisen condensation

A variety of environments were tested: 79 out of 100 reactions were successfully reproduced. The 21% of reactions that were not reproduced were mainly condensations (intra- and intermolecular) that result in ring closures.

Still works

More than 50% reproduced successfully

A large variety of reactions was successfully reproduced, with small difficulties for complex ring formations; improvements are on their way:
Cope rearrangement (67% success)
hetero Diels-Alder (73% success)
Claisen condensation (79% success)
Diels-Alder cycloaddition (49% success)
Fischer indole synthesis (57% success)

Wow! Cool! It works! But what is its use?

Generating new molecules

1. Take the starting molecule and select a reaction transform from the knowledge base.
2. Can the transform be applied? If not, discard the reaction vector.
3. If it can: is a second reagent required? If yes, select a suitable reagent from the reagents database.
4. Apply the reaction transform to obtain a new molecule.
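The flowchart maps onto a simple loop. In this sketch the `Transform` fields are hypothetical stand-ins for a reaction vector, and the SMILES string edits in the example are toy rules for illustration, not real reaction-vector application.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Transform:
    """A knowledge-base entry; all fields are hypothetical stand-ins for a reaction vector."""
    name: str
    can_apply: Callable[[str], bool]              # can the transform be applied?
    needs_reagent: bool                           # is a second reagent required?
    reagent_ok: Callable[[str], bool]             # is this reagent suitable?
    apply: Callable[[str, Optional[str]], str]    # returns the new molecule

def generate(start: str, knowledge_base: List[Transform], reagents: List[str]) -> List[str]:
    new_molecules = []
    for t in knowledge_base:
        if not t.can_apply(start):
            continue                              # discard this reaction vector
        if t.needs_reagent:
            for reagent in filter(t.reagent_ok, reagents):
                new_molecules.append(t.apply(start, reagent))
        else:
            new_molecules.append(t.apply(start, None))
    return new_molecules

# Toy SMILES string surgery, for illustration only.
kb = [
    Transform("esterification", lambda m: m.endswith("C(=O)O"), True,
              lambda r: r.endswith("CO"), lambda m, r: m + r[:-1]),
    Transform("nitrile hydrolysis", lambda m: m.endswith("C#N"), False,
              lambda r: False, lambda m, r: m[:-3] + "C(=O)O"),
]
print(generate("CCCC(=O)O", kb, ["CCO", "CO", "CCN"]))
# ['CCCC(=O)OCC', 'CCCC(=O)OC']
```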

Multi-objective de novo design

Rank the proposed new molecules and direct the generation towards the desired new molecules.
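The slides do not say which ranking scheme the multi-objective node uses. One common choice is Pareto (non-dominated) ranking, sketched below under that assumption; the molecule names and the two objective scores are hypothetical, and both objectives are maximised.

```python
from typing import Dict, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and better on one (maximising)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_ranks(scores: Dict[str, Sequence[float]]) -> Dict[str, int]:
    """Rank 1 is the non-dominated front; rank k is the front left after removing ranks < k."""
    remaining = dict(scores)
    ranks, rank = {}, 1
    while remaining:
        front = [m for m, s in remaining.items()
                 if not any(dominates(t, s) for n, t in remaining.items() if n != m)]
        for m in front:
            ranks[m] = rank
            del remaining[m]
        rank += 1
    return ranks

# Hypothetical molecules scored on two objectives to be maximised
scores = {"A": (0.9, 0.2), "B": (0.5, 0.5), "C": (0.4, 0.4), "D": (0.1, 0.9)}
print(pareto_ranks(scores))  # {'A': 1, 'B': 1, 'D': 1, 'C': 2}
```

Pareto ranking avoids choosing weights for the objectives up front; the trade-off is that a front can contain many molecules that need further ordering.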

Use case one: Lead optimisation

Here is my starting material. What kind of (feasible) one-step transformations may I make?

Starting molecule: Penicillin G

An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article

Lead optimisation (cntd.)

[Figure: Penicillin G and new molecules generated from it by one-step transformations]

An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article


Use case two: Synthetic route

I have this (active) fragment. Is there a route from it to the molecule I have in mind?

Example: reproducing a known synthetic route to Plavix (clopidogrel)

Synthetic route from Wang, L. et al., Synthetic Improvements in the Preparation of Clopidogrel, Org. Process Res. Dev., 2007, 11 (3), 487-489

An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article

Step   No. applicable reaction vectors   Total no. products generated
1      17                                158
2      11                                123
3      12                                124
4      41                                386
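One reading of the table is that every reaction vector was applied to each known intermediate of the route, counting the applicable vectors and the products, and checking whether the next intermediate appears among them. A sketch under that assumption; the transforms are toy string rules standing in for reaction vectors.

```python
from typing import Callable, List, Set, Tuple

Transform = Callable[[str], Set[str]]  # hypothetical: products of one vector, empty if not applicable

def reproduce_route(route: List[str], transforms: List[Transform]) -> List[Tuple[int, int, int, bool]]:
    """For each step: (step, no. applicable vectors, no. products, next intermediate found?)."""
    rows = []
    for step, (current, expected) in enumerate(zip(route, route[1:]), start=1):
        applicable, products = 0, set()
        for transform in transforms:
            made = transform(current)
            if made:
                applicable += 1
                products |= made
        rows.append((step, applicable, len(products), expected in products))
    return rows

# Toy string transforms, standing in for reaction vectors.
def extend(m: str) -> Set[str]:
    return {m + "C"} if len(m) < 5 else set()

def swap(m: str) -> Set[str]:
    return {m.replace("C", "N", 1)} if "C" in m else set()

rows = reproduce_route(["C", "CC", "NC"], [extend, swap])
print(rows)  # [(1, 2, 2, True), (2, 2, 2, True)]
```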

Use case three: Library design

With which of these reagents will my starting material undergo reaction X?

Enumerate a library using a single reaction and a number of different reagents.

An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article

Starting material: a bromide (structure shown on the slide)
Reaction X: Suzuki coupling
Reagents: 628 boronic acids, R-B(OH)2

Library design (cntd.)

292 products generated (examples shown on the slide)
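Library enumeration with a single reaction reduces to one loop over the reagents. A sketch, assuming a `react` callable that returns `None` when the reaction does not apply; `toy_suzuki` is a hypothetical string-based stand-in for Suzuki coupling, and 2-bromopyridine is an illustrative starting material, not the one on the slide.

```python
from typing import Callable, Dict, List, Optional

def enumerate_library(start: str, reagents: List[str],
                      react: Callable[[str, str], Optional[str]]) -> Dict[str, str]:
    """Apply one reaction to every reagent; keep the reagents that give a product."""
    library = {}
    for reagent in reagents:
        product = react(start, reagent)
        if product is not None:
            library[reagent] = product
    return library

def toy_suzuki(aryl_bromide: str, boronic_acid: str) -> Optional[str]:
    # Hypothetical string surgery, not a reaction engine: Br-Ar + Ar'-B(O)O -> Ar'-Ar.
    if not (aryl_bromide.startswith("Br") and boronic_acid.endswith("B(O)O")):
        return None
    return boronic_acid[:-len("B(O)O")] + "-" + aryl_bromide[len("Br"):]

lib = enumerate_library("Brc1ccccn1",
                        ["c1ccccc1B(O)O", "Cc1ccc(cc1)B(O)O", "c1ccccc1Cl"], toy_suzuki)
print(lib["c1ccccc1B(O)O"])  # c1ccccc1-c1ccccn1
```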

Summary

Reaction vectors offer a good way to explore the knowledge hidden inside reaction databases

A variety of chemical reactions can be reproduced with this approach

The method is fast

The method is applicable in different medicinal chemistry scenarios

The method is easy to use, thanks to a variety of KNIME nodes that have been implemented

Acknowledgements

Michael Bodkin for his continuous support both in and outside my daily work

Hina Patel for creating the first prototype, which brought the reaction vectors to life

(http://pubs.acs.org/doi/abs/10.1021/ci800413m)

Dave Evans, Fred Ludlow, Swanand Gore, Dave Thorner, Maria Whatton, Juliette Pradon for many stimulating discussions and for their continuous support

Thank You!

Do you have any questions, comments, recommendations?