De Novo Design

Adaptive Systems for Molecular Design

Gisbert Schneider

Beilstein Endowed Chair for Chem- & Bioinformatics

Goethe-University Frankfurt

23 June 2008, (c) G. Schneider

Transmutation of Species

“[..] Thus between A & B immense

gap of relation. C & B the finest

gradation, B & D rather greater

distinction. Thus genera would be

formed. — bearing relation”

Charles R. Darwin (*1809)

Notebook B: Transmutation of Species (1837-1838), p.36

Source: www.darwin-online.org.uk


Voyages to the (un)known

Darwin and Henslow, letters 1831-1860.

The growth of an idea (N. Barlow, ed. 1967).

Voyages of the H.M.S. Beagle


De Novo Design Concepts

Requirements

• Structure sampling

• Structure assessment

• Search / Optimization Method

Implementation

• Structure sampling: Grow, Link, Lattice, Stochastic

• Structure assessment: Primary constraints (receptor, ligand), Secondary constraints

• Search / Optimization Method: Depth-first search, Breadth-first search, Random search, Evolutionary Algorithm, Monte Carlo / Metropolis, Exhaustive enumeration, (Free Energy Perturbation)

Schneider & Fechner (2005) Nat. Rev. Drug Discov. 4, 649.

Adaptive Walk in a “Fitness Landscape”

[Figure: fitness surface over X1 and X2 with local optima and the global optimum]

Schneider & Fechner (2005) Nat. Rev. Drug Discov. 4, 649.

Fechner & Schneider (2007) JCIM 47, 656.

Types of Fitness Landscapes

[Figure: two fitness landscapes, “The Dream” and “The Nightmare”]

→ Chemical Similarity Principle!

What is a Molecule?

[Figure: Aspirin® drawn with increasing “fuzziness”]

• Pharmacophoric representation on different levels of abstraction

Tanrikulu et al. (2007) ChemBioChem 8, 1932.

Renner & Schneider (2004) J. Med. Chem. 47, 4653.

Local Neighborhood Search

[Figure: search from the start point through neighborhoods N to a local optimum]

• fixed N

• variable (“adaptive”) N
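The difference between a fixed and a variable neighborhood N can be illustrated with a minimal any-ascent hill climber. This is only a sketch under stated assumptions: the function name `local_search`, the one-dimensional search space, and the widen/narrow factors (1.5 and 0.9) are hypothetical choices for illustration and do not come from the slides.

```python
import random

def local_search(fitness, x0, radius=1.0, steps=200, adaptive=False, seed=0):
    """Any-ascent hill climbing inside a neighborhood of the given radius."""
    rng = random.Random(seed)
    x, fx = x0, fitness(x0)
    for _ in range(steps):
        # Sample one candidate from the current neighborhood N.
        candidate = x + rng.uniform(-radius, radius)
        fc = fitness(candidate)
        if fc > fx:
            # Accept any improvement.
            x, fx = candidate, fc
            if adaptive:
                radius *= 1.5   # hypothetical rule: widen N after a success
        elif adaptive:
            radius *= 0.9       # hypothetical rule: narrow N after a failure
    return x, fx

# Toy fitness with a single optimum at x = 3.
def toy_fitness(x):
    return -(x - 3.0) ** 2

fixed_result = local_search(toy_fitness, 0.0, adaptive=False)
adaptive_result = local_search(toy_fitness, 0.0, adaptive=True)
```

Because only improvements are accepted, both variants can get stuck in a local optimum on a rugged landscape; the adaptive variant merely changes how far it looks for the next improvement.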

Local Neighborhood Strategies

Local (neighborhood) search

• Any-ascent / Stochastic hill-climbing

• Steepest / First-ascent hill-climbing

• Threshold-accepting

• Simulated annealing

• Tabu search

Population-based

• Genetic algorithms

• Evolution strategies

• Evolutionary programming

Elements of Artificial Adaptive Systems (Koza, 1992)

• Structures that undergo adaptation

• Initial structures (starting solutions)

• Fitness measure that evaluates the structures

• Operations to modify the structures

• State (memory) of the system at each stage

• Method for designating a result

• Method for terminating the process

• Parameters that control the process

Why “Adaptive”?

[Figure: path from the start solution to the final solution through an adaptive neighborhood]

Adaptive optimization copes with …

• non-additive fitness functions

• multiple fitness functions

• very large search spaces

• nonlinear system behavior


Learning from Nature

• Genetic algorithms

• Evolution strategies

• Particle Swarm Optimization (PSO)

• Ant Colony Optimization (ACO)

Ant Colony Optimization

A Simple Combinatorial Case: Peptide Design

[Figure: peptide presentation to the T-cell receptor (PDB: 1HSA, Madden et al. 1993)]

• Design of novel antigens presented by MHC I (H-2Kb)

• Length: 8 residues (20^8 possibilities)

→ Neural network ensemble + jury

→ Artificial ant system

Assay confirms the predictions:

89% correct for binding peptides

95% correct for non-binding peptides

Hiss et al. (2007) PEDS 20, 99.

Schneider et al. (1998) PNAS 95, 12179.

Ant System for Combinatorial Design
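With 8 positions and 20 residues per position there are 20^8 = 25,600,000,000 octapeptides, too many to test one by one. The following is a minimal sketch of how an ant system could sample this space, not the published implementation: `toy_score` is a hypothetical stand-in for the neural network ensemble, the target string is arbitrary, and the ant count, iteration count, and evaporation rate are illustrative assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues
LENGTH = 8                              # octapeptides: 20**8 sequences

def toy_score(peptide, target="SIINFEKL"):
    """Hypothetical stand-in for a predictor: fraction of matching positions."""
    return sum(a == b for a, b in zip(peptide, target)) / len(target)

def ant_system(score, n_ants=20, iterations=60, evaporation=0.1, seed=0):
    rng = random.Random(seed)
    # One pheromone value per (position, residue), initially uniform.
    pheromone = [{aa: 1.0 for aa in AMINO_ACIDS} for _ in range(LENGTH)]
    best, best_score = None, -1.0
    for _ in range(iterations):
        ants = []
        for _ in range(n_ants):
            # Each ant picks one residue per position, biased by pheromone.
            peptide = "".join(
                rng.choices(AMINO_ACIDS,
                            weights=[row[aa] for aa in AMINO_ACIDS])[0]
                for row in pheromone
            )
            ants.append((score(peptide), peptide))
        # Evaporate old pheromone, then deposit in proportion to each score.
        for row in pheromone:
            for aa in row:
                row[aa] *= 1.0 - evaporation
        for s, peptide in ants:
            for row, aa in zip(pheromone, peptide):
                row[aa] += s
        top_score, top = max(ants)
        if top_score > best_score:
            best, best_score = top, top_score
    return best, best_score

best_peptide, best_value = ant_system(toy_score)
```

In a real design cycle the scoring function would be the trained predictor, and the top-ranked peptides would go to the binding assay.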

Which Search Strategy Should be Applied?

Global & local search control guided by

• Random parameter variation

• Stepwise systematic parameter variation

• Gradients in the fitness landscape ...

There is no one best method.

Evolutionary algorithms are

• easily implemented

• robust

• able to cope with high-dimensional optimization tasks


Principle of Evolutionary Strategies

Step 1: Random search

Generate a diverse set of parameter values and hope for a hit

Step 2 and following: “Local hill climbing”

Generate localized distributions of parameter values and improve steadily

→ Adaptive neighborhood

[Figure: parameter distributions for both steps, with the best value and the optimum marked]

Algorithm of the (1,λ) Evolution Strategy (Rechenberg, 1973)

Initialize parent (xP, sP, FP), with x: variables, s: stepsize, F: fitness

For each generation:

    Generate λ variants (xV, sV, FV) around the parent:

        sV = abs(sP + G)

        xV = xP + sV · G

    Calculate fitness FV

    Select best variant according to FV

    Set (xP, sP, FP) = (xV, sV, FV)best

$G(i,j) = \sqrt{-2 \ln(i)} \cdot \sin(2 \pi j)$

i, j: pseudo-random numbers in ]0,1]

G in [-∞,∞]
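The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming that fitness is maximized and that a fresh G is drawn for the stepsize and for every variable; the function names, default parameters, and the toy fitness function are hypothetical and not taken from the slides.

```python
import math
import random

def gauss_box_muller(rng):
    """G(i, j) = sqrt(-2 ln i) * sin(2 pi j), with i, j uniform in ]0, 1]."""
    i = 1.0 - rng.random()   # random() lies in [0, 1), so this lies in ]0, 1]
    j = 1.0 - rng.random()
    return math.sqrt(-2.0 * math.log(i)) * math.sin(2.0 * math.pi * j)

def one_comma_lambda_es(fitness, x0, s0=1.0, lam=10, generations=100, seed=0):
    rng = random.Random(seed)
    x_p, s_p = list(x0), s0
    f_p = fitness(x_p)
    for _ in range(generations):
        variants = []
        for _ in range(lam):
            # Mutate the stepsize first, then the variables with the new stepsize.
            s_v = abs(s_p + gauss_box_muller(rng))
            x_v = [xi + s_v * gauss_box_muller(rng) for xi in x_p]
            variants.append((fitness(x_v), x_v, s_v))
        # Comma selection: the best variant replaces the parent,
        # even when it is worse than the parent.
        f_p, x_p, s_p = max(variants, key=lambda v: v[0])
    return x_p, s_p, f_p

# Toy fitness (to be maximized) with its optimum at the origin.
def sphere(x):
    return -sum(xi * xi for xi in x)

x_best, s_best, f_best = one_comma_lambda_es(sphere, [5.0, 5.0], generations=200)
```

Because the stepsize is inherited together with the variables, variants with a well-suited stepsize tend to win the selection, which is what makes the neighborhood adaptive.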

De novo Fragment Assembly

Inputs: DB of known drugs and leads; set of reactions; stock of building blocks with reaction labels

Start

Generate µ initial parent molecules

Generation loop:

    Generate λ new molecules from µ parent(s)

    Determine fitness

    • Template compound(s)

    • Prediction tools

    • Biochemical assay

    Select µ best molecules

    Terminate? No: repeat the generation loop. Yes: End

Schneider et al. (2000) Angew. Chem. Int. Ed. 39:4130

Operators of Evolutionary Optimization

Variation

• Mutation: Adds new information to a population

• Crossover: Exploits information within a population

• Selection
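The generation loop and the three operators can be combined in a toy sketch. Everything here is a simplifying assumption made for illustration: molecules are tuples of integer building-block IDs rather than chemical structures, the operators ignore reaction labels, and the names `BLOCKS`, `mutate`, `crossover`, and `evolve` are hypothetical.

```python
import random

BLOCKS = list(range(50))  # hypothetical building-block IDs standing in for fragments

def mutate(molecule, rng):
    """Mutation: replace one building block with a random one from the stock."""
    child = list(molecule)
    child[rng.randrange(len(child))] = rng.choice(BLOCKS)
    return tuple(child)

def crossover(a, b, rng):
    """Crossover: join the head of one parent to the tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(fitness, mu=3, lam=30, size=4, generations=40, seed=0):
    rng = random.Random(seed)
    # Generate mu initial parent molecules.
    parents = [tuple(rng.choice(BLOCKS) for _ in range(size)) for _ in range(mu)]
    for _ in range(generations):
        # Generate lam new molecules from the mu parents.
        offspring = []
        for _ in range(lam):
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            offspring.append(mutate(child, rng))
        # Selection: keep the mu best offspring as the next parents.
        parents = sorted(offspring, key=fitness, reverse=True)[:mu]
    return parents

# Toy fitness: number of positions that match a hypothetical template.
TEMPLATE = (7, 13, 21, 42)

def match_count(molecule):
    return sum(a == b for a, b in zip(molecule, TEMPLATE))

final_parents = evolve(match_count)
```

In the workflow on the slide, the fitness call would be replaced by a similarity to template compounds, a prediction tool, or a biochemical assay.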

Breeding New Molecules

[Figure: virtual reaction schemes showing mutation and crossover of molecules at labeled attachment points (Rx1)]

How Many Iterations? How Many Compounds?

Limited resources: 300 compounds

[Figure: Trypsin inhibition, IC50 [µM] (0 to 100) versus Population size × Generations, for Random Search and Evolution Strategy; preferred combinations are marked]

Population size × Generations: 1 × 300, 2 × 150, 3 × 100, 5 × 60, 10 × 30, 15 × 20, 20 × 15, 30 × 10, 37 × 8, 50 × 6, 60 × 5, 75 × 4, 100 × 3, 150 × 2

→ There is an optimum!

Schüller & Schneider (2008) JCIM, in press.
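The combinations on the axis follow from splitting a fixed budget of 300 compounds between population size and number of generations. A small sketch of that arithmetic is given below; the variable names are illustrative, and integer division is an assumption that reproduces the 37 × 8 entry, which uses only 296 compounds.

```python
BUDGET = 300  # total number of compounds that can be tested

population_sizes = [1, 2, 3, 5, 10, 15, 20, 30, 37, 50, 60, 75, 100, 150]

# For each population size, the number of full generations the budget allows.
combinations = [(p, BUDGET // p) for p in population_sizes]

for p, g in combinations:
    print(f"{p:>3} x {g:<3} -> {p * g} compounds")
```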

“Molecular Evolution”

[Figure: structures of designed compounds 1 to 9 along the ES and PSO trajectories]

1: IC50 = 22.5 µM

2: IC50 = 13.0 µM

3: IC50 = 47.5 µM

4: IC50 = 74.7 µM

5: IC50 = 37.2 µM

6: IC50 = inactive

7: IC50 = inactive

8: IC50 = 15.1 µM

9: IC50 = 54.7 µM

Trajectories through inactive regions of chemical space are possible

Particle Swarm Optimization (PSO)

• Biologically motivated optimization paradigm

• Individuals move through a fitness landscape

• Particles can change their movement direction and velocities

• “Social” interactions between individuals of a population

• Each particle knows about its own best position

• All particles know about the overall best position


Particle Swarm Optimization (PSO)

T. Krink, University of Aarhus

PSO: Update Rule 1 (Standard Type)

$v_i(t+1) = v_i(t) + 2 \cdot r_1 \cdot (p_i - x_i(t)) + 2 \cdot r_2 \cdot (p_b - x_i(t))$

Second term: individual memory. Third term: social memory.

v: Velocity vector

x: Coordinates of a particle

i: Dimension

t: Time (epochs)

r1 , r2: Random numbers in ]0,1]

pi , pb: Best positions (individual and social)
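A single application of the update rule to one particle might look as follows. This is a sketch under assumptions: the position update x(t+1) = x(t) + v(t+1) is the usual PSO convention and is not stated on the slide, fresh random numbers are drawn per dimension, and the function name `pso_update` is hypothetical.

```python
import random

def pso_update(x, v, p_i, p_b, rng):
    """One velocity and position update for a single particle."""
    new_x, new_v = [], []
    for i in range(len(x)):          # i: dimension
        r1 = 1.0 - rng.random()      # uniform in ]0, 1]
        r2 = 1.0 - rng.random()
        vi = (v[i]
              + 2.0 * r1 * (p_i[i] - x[i])    # individual memory
              + 2.0 * r2 * (p_b[i] - x[i]))   # social memory
        new_v.append(vi)
        # Assumed standard position update: x(t+1) = x(t) + v(t+1).
        new_x.append(x[i] + vi)
    return new_x, new_v

rng = random.Random(0)
# A resting particle that already sits on both best positions does not move.
still_x, still_v = pso_update([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0], rng)
# A resting particle at 0 is pulled toward best positions at 1.
moved_x, moved_v = pso_update([0.0], [0.0], [1.0], [1.0], rng)
```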


PSO: Individual and Social Memory

PSO convergence after 100 epochs (Griewangk's function)

[Figure: two panels showing the swarm relative to the global optimum, one for n1 = 1, n2 = 2 and one for n1 = 2, n2 = 1]

Higher weight on the social memory
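An experiment of this kind can be sketched with a small swarm on the Griewank function. The code below is an illustration under assumptions, not the setup behind the figure: n1 and n2 are taken to be the weights on the individual and social memory terms, and the swarm size, search bounds, and seed are arbitrary. Practical implementations often add an inertia weight or a velocity limit, which this sketch omits.

```python
import math
import random

def griewank(x):
    """Griewank function; its global minimum is 0 at the origin."""
    s = sum(xi * xi for xi in x) / 4000.0
    p = math.prod(math.cos(xi / math.sqrt(k)) for k, xi in enumerate(x, start=1))
    return 1.0 + s - p

def pso(f, dim=2, n_particles=20, epochs=100, n1=2.0, n2=2.0, bound=10.0, seed=0):
    rng = random.Random(seed)
    xs = [[rng.uniform(-bound, bound) for _ in range(dim)]
          for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_ind = [list(x) for x in xs]            # individual best positions
    f_ind = [f(x) for x in xs]
    b = min(range(n_particles), key=f_ind.__getitem__)
    p_best, f_best = list(p_ind[b]), f_ind[b]   # overall best position
    for _ in range(epochs):
        for k in range(n_particles):
            for i in range(dim):
                r1, r2 = 1.0 - rng.random(), 1.0 - rng.random()
                vs[k][i] += (n1 * r1 * (p_ind[k][i] - xs[k][i])
                             + n2 * r2 * (p_best[i] - xs[k][i]))
                xs[k][i] += vs[k][i]
            fx = f(xs[k])
            if fx < f_ind[k]:                # minimization
                p_ind[k], f_ind[k] = list(xs[k]), fx
                if fx < f_best:
                    p_best, f_best = list(xs[k]), fx
    return p_best, f_best

social_pos, social_val = pso(griewank, n1=1.0, n2=2.0)
individual_pos, individual_val = pso(griewank, n1=2.0, n2=1.0)
```

Comparing `social_val` and `individual_val` over several seeds gives a rough impression of how the two weightings behave on this landscape.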

Visualization of Particle Trajectories

• The particle visited several local optima before finding the global optimum

→ PSO Demo

COLIBREE®

Combinatorial Library Breeding

Disassembly Rules

[Figure: disassembly rules, drawn as structure diagrams with Ar groups]

Retain building blocks with MW < 200 Da

→ 7,184 Building Blocks

avg MW = 141 ± 35 Da

Hartenfeller et al. (2008) Chem. Biol. Drug Des., in press

Linker Library


Molecule Dissection


PSO Progress Over Time

Library size: 40 molecules

Fitness: Distance to Reference (minimization!)

Positive and Negative Design

PPARα (negative weight)

PPARγ (positive weight)

Self-Organizing Map (SOM) (Kohonen, 1982)

[Figure: nonlinear mapping from the original data space to the SOM projection, a two-dimensional grid of numbered map units]

→ SOMMER Demo

• Nonlinear projection of high-dimensional space

• Topology-preserving projection

• Data clustering

• Prediction with probability estimation
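How such a map is trained can be sketched in plain Python. This is a minimal illustration and not the SOMMER software: the grid size, the linear decay schedules for learning rate and neighborhood radius, the Gaussian neighborhood function, and all names are assumptions chosen for brevity.

```python
import math
import random

def best_matching_unit(weights, x):
    """Map cell whose weight vector is closest to x (squared Euclidean distance)."""
    def dist2(cell):
        return sum((wi - xi) ** 2 for wi, xi in zip(weights[cell], x))
    return min(weights, key=dist2)

def train_som(data, rows=6, cols=8, epochs=20, lr0=0.5, radius0=3.0, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialized weight vector per map cell.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):       # shuffled presentation
            frac = step / total
            lr = lr0 * (1.0 - frac)                      # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighborhood
            win = best_matching_unit(weights, x)
            for cell, w in weights.items():
                d2 = (cell[0] - win[0]) ** 2 + (cell[1] - win[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))
                # Pull the cell toward x, most strongly near the winner.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights

# Two toy clusters in a two-dimensional data space.
toy_data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(toy_data)
cell_a = best_matching_unit(som, [0.0, 0.0])
cell_b = best_matching_unit(som, [1.0, 1.0])
```

Because neighboring cells are updated together, nearby inputs tend to land on nearby cells, which is the topology-preserving property listed above.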


Pharmacophore Road Map of Chemical Space

[Figure: map panels for Proteases, MMPs, Kinases, Ion channels, GPCRs, and Unidentified (orphan)]

• Topological pharmacophores (CATS descriptor)

• COBRA collection (10,500 drugs and lead compounds)


Land Ho! Exploration of Chemical Space

[Figure: map regions of thrombin inhibitors (example structure: NAPAP) and PPARα modulators (example structure: GW590735)]

Let’s Be Positive!

De Novo Design of Thrombin Inhibitors

CATS descriptor no. 10: PP0 [Figure: corresponding substructure]

Top-ranking designed compounds

[Figure: structures of the top-ranking designed compounds]

Climbing “Acid hill”:

De Novo Design of PPARα Agonists

CATS descriptor no. 19: DN1 [Figure: corresponding substructure]

Top-ranking designed compounds

[Figure: structures of the top-ranking designed compounds]

Conclusions

• Fragment-based design often results in bioactive compounds

• “Novelty” of the designs depends on local convergence

• Adaptive optimization enables “scaffold-hopping”

• Visual inspection / expert knowledge is crucial


Acknowledgement

Dr. Karl-Heinz Baringhaus

Prof. Manfred Schubert-Zsilavecz

Prof. Holger Stark

Prof. Dieter Steinhilber

Dr. Tanja Weil

modlab® Team

www.modlab.de