Docking: Modeling of binding of macromolecules between themselves or with small-molecule ligands....


Page 1: Docking

Docking: modeling of the binding of macromolecules to each other or to small-molecule ligands.

Factors that determine specific binding:

Phenomenologically:
- shape complementarity (lock and key); the basis for early geometry-based docking algorithms
- property complementarity: hydrophobic atoms to hydrophobic atoms, hydrogen-bond donor to hydrogen-bond acceptor, positively charged to negatively charged

Physically:
- for a stable complex, the bound conformation is the global free-energy minimum: favorable interactions are maximized and unfavorable ones minimized

[Figure: schematic of complementary contacts between two molecules: paired -CH2-CH2- groups, opposite charges (- and +), and a C=O ... H hydrogen bond]

Page 2: Docking as a global energy optimization problem

Energy function

A fast approximation, but accurate enough that the global minimum corresponds to the (near-)native conformation.

Potentials derived from:
- Molecular mechanics force fields: physical terms, with parameters based on QM and/or experimental physical properties (ECEPP, MMFF, etc.)
- Statistical/phenomenological: 1. ad hoc; 2. mean-force: observed statistics of inter-atomic distances + Boltzmann

Search algorithm

Quickly locates the global minimum on a (typically) extremely rugged energy landscape.

- geometry-based: rigid-body, possibly followed by local minimization
- incremental construction: split into rigid fragments, dock, rebuild from ‘anchors’
- genetic algorithm: ‘chromosomes’ of variables, recombination/mutations, Darwinian evolution
- Monte-Carlo (+ local minimization)
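
The mean-force option in the energy-function list can be illustrated with a small inverse-Boltzmann sketch. It is only an illustration: the function name, the binning into distance histograms, the reference distribution and the kT value (about 0.592 kcal/mol near 298 K) are assumptions, not details given in the slides.

```python
import math

KT_KCAL_MOL = 0.592  # assumed: approximate kT near 298 K, in kcal/mol


def mean_force_potential(observed_counts, reference_counts, kt=KT_KCAL_MOL):
    """Inverse-Boltzmann energy per distance bin: E = -kT * ln(p_obs / p_ref).

    observed_counts: how often an atom pair is seen at each distance bin.
    reference_counts: counts expected with no specific interaction (assumed given).
    Bins with a zero count have no defined energy and are returned as None.
    """
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            energies.append(None)
            continue
        p_obs = obs / obs_total
        p_ref = ref / ref_total
        energies.append(-kt * math.log(p_obs / p_ref))
    return energies


# Distances seen more often than expected get a favorable (negative) energy.
example = mean_force_potential([10, 40, 50], [30, 40, 30])
```

Bins that are over-represented relative to the reference come out negative, under-represented bins positive, which is the sense in which observed statistics are turned into a potential.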

Page 3: MCM global optimization procedure

Monte-Carlo minimization:

1. Random step: perturb one of the torsions or the position/orientation of the ligand
2. Local gradient minimization
3. Compare the new energy to the previous value: if it improved, accept the new conformation; otherwise apply the Metropolis criterion and accept with probability exp(-ΔE/kT)
4. Go back to step 1

Termination: adaptive heuristics set the optimal MC run length based on ligand size and flexibility.
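
The four steps above can be sketched on a toy problem. This is a minimal illustration, not the ICM implementation: the one-dimensional energy function, the plain gradient-descent minimizer, the fixed step count (instead of the adaptive termination) and all parameter values are assumptions.

```python
import math
import random


def energy(x):
    """Toy rugged 1D landscape standing in for a docking energy (hypothetical)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def gradient(x):
    return 0.2 * x + 3.0 * math.cos(3.0 * x)


def local_minimize(x, step=0.01, iterations=500):
    """Step 2: plain gradient descent to the nearest local minimum."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def mc_minimization(x0, n_steps=200, kt=0.5, max_move=2.0, seed=0):
    rng = random.Random(seed)
    current = local_minimize(x0)
    e_current = energy(current)
    best, e_best = current, e_current
    for _ in range(n_steps):
        # Step 1: random perturbation, followed by step 2: local minimization.
        trial = local_minimize(current + rng.uniform(-max_move, max_move))
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Step 3: accept improvements, otherwise apply the Metropolis criterion.
        if delta <= 0 or rng.random() < math.exp(-delta / kt):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
        # Step 4: loop back to step 1.
    return best, e_best


best_x, best_e = mc_minimization(4.0)
```

Because every trial point is minimized before it is compared, the walk moves between local minima rather than across the raw landscape, which is what lets large random steps remain acceptable.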

Page 4: Fast Grid Protein/Flexible Ligand Docking in ICM

Global energy optimization:
- ligand position and internal torsions optimized by stochastic Monte-Carlo search in the framework of Internal Coordinate Mechanics (ICM)
- local gradient minimization after each random move
- ligand is continuously flexible
- receptor represented by pre-calculated grid potentials
- energy terms include ligand internal force-field energy and grid receptor interaction potentials

Page 5: Grid potentials

Continuously differentiable grid potentials, using spline interpolation, for efficient gradient minimization.

Terms:
• Van der Waals: steric repulsion and dispersion attraction
• Electrostatics
• Directional (anisotropic) hydrogen bonding
• Hydrophobic interaction

Acceleration: ~100-fold faster than explicit receptor.

Implicit minor receptor flexibility: smoothing the grid potentials and truncating van der Waals repulsion limits the adverse effect of minor steric clashes (‘soft’ docking). Soft potentials also make minimization more efficient.

[Figure: van der Waals energy E versus distance d, with the repulsion truncated at Emax]
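
One simple way to truncate van der Waals repulsion smoothly is sketched below. The slides do not give the functional form, so the capping formula, the 12-6 Lennard-Jones term and every parameter value here are assumptions chosen for illustration.

```python
def lennard_jones(d, well_depth=0.2, r_min=3.8):
    """Plain 12-6 van der Waals energy at distance d (hypothetical parameters)."""
    ratio = r_min / d
    return well_depth * (ratio ** 12 - 2.0 * ratio ** 6)


def soften(e, e_max=4.0):
    """Keep attractive energies unchanged; cap repulsion smoothly below e_max.

    For e > 0 the value e * e_max / (e + e_max) tends to e_max as e grows,
    so a minor steric clash costs at most e_max instead of a huge penalty.
    """
    if e <= 0.0:
        return e
    return e * e_max / (e + e_max)


# A severe overlap at 2.0 (same length units as r_min) is heavily penalized
# by the raw term but bounded by e_max after softening.
hard = lennard_jones(2.0)
soft = soften(hard)
```

The capped form is continuous at zero energy and keeps a finite gradient in the clash region, which fits the remark that soft potentials also help minimization.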

Page 6: Method: Internal Coordinate Mechanics (ICM)

• Internal coordinates
• Large radius of convergence
• Efficient global energy optimization algorithm

Applications: folding, protein modeling, docking, virtual screening

ICM References:
• Abagyan, Mazur (1989)
• Abagyan et al. (1994) “ICM - a new method for protein modeling..” J. Comp. Chem. 15, 488-506
• Abagyan and Totrov (1994) “Biased Probability Monte Carlo searches …” J. Mol. Biol. 235, 983-1002

Page 7: Global optimization procedure (2)

Tricks to improve search efficiency:

Conformational stack: low-energy conformations are accumulated, and the trajectory is monitored by comparison with previously found minima.

Multiple start: if the simulation is ‘trapped’ in the vicinity of a certain conformation, or if the energy remains higher than the energy of a number of already found conformations, restart from another initial conformation.

‘Grid annealing’: first dock into smoothed grid potentials, then into ‘hard’ exact grids.

‘Reverse’ torsion steps, symmetry, Cartesian relaxation, etc.

Page 8: Accuracy/Speed of Flexible Ligand Docking

From: Schapira M, Abagyan R, Totrov M. Nuclear hormone receptor targeted virtual screening. J Med Chem. 2003 Jul 3;46(14):3045-59.

PDB    RMSD (Å)
1a28   0.032
1bsx   0.38
1db1   0.74
1e3g   0.22
1e3k   0.28
1fby   0.37
1fcz   0.84
1fm6   1.39
1fm6   0.78
1fm9   1.72
1i37   0.21
1ilh   2.24
1l2i   0.29
1qkm   0.79
3erd   1.61
3ert   1.41

Large benchmarks (100-300 complexes):

• For a ‘clean’ benchmark (high resolution, good X-ray density for both ligand and binding site, no obvious crystallography errors), typically ~80% of ligand/receptor complexes are reproduced within 2 Å RMSD.

• For broader benchmarks: ~70% within 2 Å, >75% within 3 Å, ~60% within 1.5 Å and only ~45% within 1 Å RMSD. For >85% of complexes, a pose within 2 Å is found among the top 10 solutions.
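
The success rates quoted above are fractions of complexes whose docked pose falls under an RMSD cutoff. The sketch below shows both calculations, assuming atoms are already matched one-to-one and both poses share the receptor frame (no superposition or symmetry correction); the function names are hypothetical, and the RMSD values are the 16 entries of the table on this slide.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) atom positions."""
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def fraction_within(rmsd_values, cutoff):
    """Share of complexes reproduced within the given RMSD cutoff."""
    return sum(1 for value in rmsd_values if value <= cutoff) / len(rmsd_values)


# RMSD values (in angstroms) from the nuclear-receptor table on this slide.
TABLE_RMSD = [0.032, 0.38, 0.74, 0.22, 0.28, 0.37, 0.84, 1.39,
              0.78, 1.72, 0.21, 2.24, 0.29, 0.79, 1.61, 1.41]

within_2A = fraction_within(TABLE_RMSD, 2.0)  # 15 of the 16 entries
```

For this small table, 15 of 16 entries fall within 2 Å; only 1ilh (2.24 Å) is outside.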

Page 9: Receptor analysis/issues affecting docking

Quality of the X-ray structure:
- low resolution or NMR
- missing residues
- missing side-chain atoms
- high B-factors
- clashes

Special features:
- covalent binding
- rare residue charge state: protonated Asp or Glu (HIV protease), deprotonated Cys or Tyr
- coupled ligand/ion binding (kinases: ATP/Mg)
- tightly bound water molecules (coordinated by 2-3 h-bonds)

‘Druggability’:
• binding pocket identification: PocketFinder

Page 10: Receptor preparation

• Conversion of PDB structure to ICM:
- hydrogens and missing heavy atoms added
- polar hydrogens optimized (possibly including water)
- atom types and partial charges assigned

Specific cases may involve:

• regularization/refinement in cases of poor structure quality: idealized amino acid covalent geometry imposed, energy annealing, possibly in the presence of a ligand (site ‘molding’)

• sampling of alternative side-chain conformations, loops

• homology modelling

Page 11: Potential pitfalls

Difficult cases:
• Highly flexible ligands (more than 10-15 torsions)
• Shallow pockets
• Water-mediated binding
• Covalent binding
• Poor quality of the receptor structure (low resolution X-ray, NMR, homology models)
• Receptor flexibility

Remedies:
- longer simulations
- include water
- constrained docking
- explicit receptor docking
- multiple receptor structures

Page 12: Receptor flexibility

Approaches:

• Explicit continuously flexible receptor:
- in principle, more comprehensive
- slow even for side-chains, very slow for backbone movements
- prone to artefacts: dozens of new variables, new local minima
- large backbone movements are still generally beyond reach

• Ensemble of pre-defined receptor conformations, either from multiple experimental structures or from simulations such as side-chain or loop sampling, or homology modelling:
- can be fast
- any type of movement can be handled
- success mostly defined by the quality of the ensemble

Page 13: Hybrid grid/explicit protocol: SCARE

‘SCARE’ (Bottegoni et al., JCAMD 2008, “A new method for ligand docking to flexible receptors by dual alanine scanning and refinement”)

Observation: typically, steric clashes resolved by induced fit involve 1-2 side-chains.

• Grid docking is performed into multiple versions of the binding site, generated by systematic replacement of various pairs of side-chains with alanines

• The top-scoring ligand pose from each grid simulation is refined with an explicit flexible receptor. The best-scoring refined conformation is selected as the final answer

On a benchmark of 30 cross-docking pairs, a top-ranking near-native solution was found in 80% of cases. The protocol takes ~2 h of CPU time.
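
The systematic pair replacement can be pictured as enumerating residue pairs in the pocket. The sketch below simply takes every pair; how the published protocol actually chooses pairs is not stated in the slides, and the residue numbers, the dictionary representation and the function name are all hypothetical.

```python
from itertools import combinations


def alanine_pair_variants(pocket_residues):
    """Build one pocket variant per residue pair, with both members set to ALA.

    pocket_residues maps residue number -> residue type (hypothetical input).
    Returns a list of ((first, second), variant_dict) tuples.
    """
    variants = []
    for first, second in combinations(sorted(pocket_residues), 2):
        variant = dict(pocket_residues)  # copy, so the input stays untouched
        variant[first] = "ALA"
        variant[second] = "ALA"
        variants.append(((first, second), variant))
    return variants


# Four pocket side-chains give 4 * 3 / 2 = 6 binding-site versions to grid-dock.
pocket = {45: "PHE", 78: "LEU", 102: "MET", 130: "TYR"}
variants = alanine_pair_variants(pocket)
```

The number of grids grows quadratically with the number of side-chains considered, which is one reason a cheap grid stage precedes the explicit refinement.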

Page 14: Receptor ensemble docking in ICM: 4D grids

Direct incorporation of discrete receptor flexibility in grid-based simulation:
• Receptor conformations are provided by the user in a conformational stack
• Displaceable bound water molecules can be included
• Potentials are pre-calculated for each receptor conformation
• Stored as ‘4D’ grids: the 4th dimension is the receptor conformation
• During the MC simulation, an additional type of stochastic step is included: the 4D grid layer switch

Benchmarking recently published: Bottegoni et al., J Med Chem 2009, “Four-dimensional docking: a fast and accurate account of discrete receptor flexibility in ligand docking”. 99 therapeutically relevant proteins and 300 diverse ligands. 77% of complexes correctly reproduced. On average 4-fold faster than independent grid docking into all available receptor conformations.

Page 16: Virtual Ligand Screening

• Virtual ligand screening (VLS) algorithms identify potential novel ligands from databases in silico.

Search large databases (100K-1M or more compounds) and select subsets enriched with hits using:
- 2D pharmacophore
- similarity measures (fingerprints, Tanimoto)
- 3D pharmacophore
- receptor structure: no prior ligand knowledge necessary; the search is not biased to known chemistry

Receptor structure based VLS:
- dock each ligand in the DB to the receptor structure
- evaluate the quality of fit in the docked structures to select potential binders

[Figure: results of ICM docking of a virtual library of 200,000 compounds into an FGFR pocket]

Page 17: Measure of VLS efficiency: enrichment factors

The full screening collection contains Ntotal compounds, of which Atotal are active. VLS is used to select Nsel compounds, of which Asel are active.

Typically Nsel << Ntotal, but in real life also Asel < Nsel (false positives) and Asel < Atotal (false negatives).

Enrichment factor: (Asel/Nsel)/(Atotal/Ntotal)

Choice of threshold cutoff:

[Figure: Asel plotted against Nsel, with axes running to 100%]
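
The enrichment factor is a ratio of two hit rates, which a few lines make concrete. The function name and the numbers in the example are hypothetical; only the formula comes from the slide.

```python
def enrichment_factor(a_sel, n_sel, a_total, n_total):
    """(Asel/Nsel) / (Atotal/Ntotal): hit rate in the selection vs. the whole set."""
    if n_sel == 0 or a_total == 0:
        raise ValueError("need a non-empty selection and at least one active")
    return (a_sel / n_sel) / (a_total / n_total)


# Hypothetical screen: 10,000 compounds with 100 actives (1% base hit rate).
# Selecting the top 100 and finding 30 actives there gives a 30% hit rate,
# i.e. roughly a 30-fold enrichment over random picking.
ef = enrichment_factor(a_sel=30, n_sel=100, a_total=100, n_total=10_000)
```

A value of 1 means the selection is no better than random; the largest possible value for a given selection size is Ntotal/Nsel, reached when every selected compound is active (possible while Nsel does not exceed Atotal).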

Page 18: Receptor structure based VLS: Docking+Scoring

Docking:

Find a putative docked conformation for each compound, native-like for the binding ligands.

* Efficient conformational search routine
* Docking potential:
- must be fast
- has to rank the native-like conformation at the top among many different docking conformations of the same ligand

Scoring:

* One or a few conformations per compound are evaluated
* Potential (screening score):
- must rank the binding ligands above a large number of chemically diverse inactive compounds

Binding energy?

Page 19: Binding energy

Ligand/receptor binding energy in solution:

• Van der Waals - favorable, but partially compensated by solvent
• Electrostatics - mostly compensated by solvation, only becomes favorable for charged ligands (salt bridges)
• Hydrogen bonds - mostly compensated by solvation, but a major determinant of specificity
• Hydrophobic - often provides most of the affinity
• Strain - unfavorable
• Entropy loss - unfavorable, limits affinity for highly flexible ligands

Complex                                 Kd     H-bonds   dASAhp (Å²)
Fucose binding protein/fucose (1abf)    4 µM   8         128
Antibody/progesterone (1dbb)            1 nM   2         390

Page 20: Binding energy calculations

Ligand/receptor binding energy in solution is a highly complex concept:

E_vdW-attraction + E_steric-repulsion + E_el-interaction + E_el-desolvation + E_h-bond + E_acc/donor-desolvation + E_hydrophobic + E_strain + E_entropy-loss

- Multiple opposing (favorable and unfavorable) contributions that largely compensate each other. Accumulation of errors: (-100±5) + (90±5) = -10±7.

- Solvent (water) effects: hydrophobicity, electrostatic desolvation, solute-solvent hydrogen bonds. Explicit water is too computationally expensive and not always more accurate than implicit methods (continuum dielectric, surface tension).

- Some contributions are extremely sensitive to the accuracy of the geometry.

Binding energy predictions remain, in general, only qualitatively accurate. Special model fitting for a specific receptor and/or chemotype of ligands is necessary to achieve quantitative agreement with experiment.
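
The error-accumulation example, (-100±5) + (90±5) = -10±7, follows from adding independent uncertainties in quadrature. The sketch below reproduces it; independence of the errors is an assumption, and the function name is hypothetical.

```python
import math


def add_with_uncertainty(terms):
    """Sum (value, sigma) pairs, combining independent errors in quadrature."""
    total = sum(value for value, _ in terms)
    sigma = math.sqrt(sum(s * s for _, s in terms))
    return total, sigma


# Two large opposing contributions, each known to about 5%:
total, sigma = add_with_uncertainty([(-100.0, 5.0), (90.0, 5.0)])
relative_error = sigma / abs(total)  # about 0.7, i.e. ~70% of the result
```

Each input is known to about 5%, yet the small difference carries an uncertainty of roughly 70% of its value, which is why near-cancelling terms make quantitative binding energies hard to compute.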

Page 21: Energy function optimization

In practice, E_docking, E_scoring and E_bindingEnergy are different approximations:

- E_bindingEnergy estimates the binding energy; it is typically slow and often does not work unless tuned for a specific system
- E_docking and E_scoring are not accurate estimates of binding energy, but:
  - E_docking discriminates the correct bound pose very fast
  - E_scoring discriminates active ligands reasonably fast

Traditional approach: fitting to experimental binding energy values, with no non-binders in the training set.

Goal of VLS: discrimination between binders and non-binders.

ICM Score: optimize E_scoring performance for active ligand discrimination.

Page 22: ICM Scoring function

Components are physical terms:

1. Internal force-field energy of the ligand
2. Conformational entropy loss of the ligand
3. Receptor-ligand hydrogen-bond interaction
4. Solvation electrostatic energy change
5. Hydrogen-bond donor/acceptor desolvation
6. Hydrophobic energy

Due to imperfect term evaluation (errors in geometry, charges, etc.), the components have to be adjusted/weighted to obtain the best performance.

• Evaluation of discrimination potential performance on a benchmark (‘score the score’)

• Five weighting factors optimized
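
The idea of weighting score components and then ‘scoring the score’ can be sketched with a toy example. Everything concrete here is an assumption: the two-component score, the data, and the pairwise ranking measure (how often a binder outscores a decoy) used as the discrimination objective. The actual ICM terms, weights and objective are not given in the slides.

```python
def weighted_score(components, weights):
    """Linear combination of precomputed score components (lower = better)."""
    return sum(w * c for w, c in zip(weights, components))


def discrimination(binder_scores, decoy_scores):
    """Fraction of binder/decoy pairs in which the binder scores lower (better)."""
    wins = 0.0
    for b in binder_scores:
        for d in decoy_scores:
            if b < d:
                wins += 1.0
            elif b == d:
                wins += 0.5
    return wins / (len(binder_scores) * len(decoy_scores))


def score_the_score(weights, binders, decoys):
    """Objective for weight fitting: how well these weights separate binders."""
    return discrimination(
        [weighted_score(c, weights) for c in binders],
        [weighted_score(c, weights) for c in decoys],
    )


# Hypothetical two-component data: (interaction term, penalty term).
binders = [(-5.0, 2.0), (-4.0, 1.0)]
decoys = [(-6.0, 8.0), (-1.0, 0.5)]
unweighted = score_the_score((1.0, 0.0), binders, decoys)  # penalty ignored
weighted = score_the_score((1.0, 0.5), binders, decoys)    # penalty included
```

In this toy set, ignoring the penalty term lets one decoy outrank both binders, while giving it weight 0.5 separates the two groups completely; an optimizer would search the weight space for the best value of such an objective.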

Page 23: Training VLS scoring function

Benchmark set generation:

* Diverse set of ligands and receptors
* Structures generated by the docking procedure (not X-ray)
* Artificial non-binding complexes are included

* Structures of 25 receptors and 75 ligands extracted from high-resolution (<2 Å) PDB structures of complexes
* 10,000 random ligands from ACD added
* Exhaustive cross-docking: all ligands to all receptors

The score components are pre-calculated for all 25 × 10,075 = 251,875 putative complexes.

• Multiple runs of ‘Amoeba’ simplex minimization to ensure convergence

Result: recognition significantly improved

Page 24: Discrimination of active ligands

Page 25: Virtual Database Screening Efficiency

Schapira M, Abagyan R, Totrov M. Nuclear hormone receptor targeted virtual screening. J Med Chem. 2003 Jul 3;46(14):3045-59.

• 19 structures for 10 nuclear hormone receptors
• One structure used (glucocorticoid receptor) was a homology model
• 5000 random molecules from CDL screening collection
• A library of 78 known NR ligands, 3 to 8 per receptor

Receptor (structure)   Enrichment for top 1%
AR[+](1E3G)             33     33
AR[+](1I37)             67     50
ERa[+](1L2I)            71      0
ERa[+](3ERD)            71     14
ERb[+](1QKM)            57     14
ERa[-](3ERT)            87     87
GR[+](model)           100    100
PXR[+](1ILH)             0     14
PR[+](1A28)             83     20
PR[+](1E3K)             50     60
PPARg[+](1FM6)          80     10
PPARg[+](1FM9)          40     30
PPARa[+](1K7L)          70     20
PPARd[+](1GWX)          60     20
RXRa[+](1FBY)          100     71
RXRa[+](1FM6)          100     29
RARg[+](1FCZ)           88     89
TRb[+](1BSX)            33      0
VDR[+](1DB1)           100     71
average                 68     39

Page 26: Virtual Database Screening Efficiency

Chen H, Lyne PD, Giordanetto F, Lovell T, Li J. On Evaluating Molecular-Docking Methods for Pose Prediction and Enrichment Factors. J. Chem. Inf. Model. (2005)

12 protein targets of therapeutic interest, 17 to 622 active ligands per target, 20000 random compounds

[Figure: enrichment factors at 1% of database subsetting]

Page 27: Ligand pre-processing

• 2D to 3D conversion, type and charge assignment:
- using the MMFF force field

• Pre-selection of ligands in the database for drug-likeness according to Lipinski-like criteria:
- size/weight
- number of h-bond donors and acceptors
- number of flexible torsions (Veber)

• Protonation/charge states for ligands: charge carboxyls and amino groups, possibly generate tautomers. Important, especially for correct scoring. New: pKa prediction will allow automatic protonation of non-trivial chargeable groups.
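
A drug-likeness pre-filter of this kind reduces to a few threshold checks. The slides list the criteria but not the cutoffs, so the defaults below are assumptions taken from the commonly quoted Lipinski limits (weight ≤ 500, donors ≤ 5, acceptors ≤ 10) and the Veber limit on rotatable bonds (≤ 10); the class and function names are hypothetical, and the properties are assumed to be precomputed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class LigandProperties:
    """Precomputed descriptors for one database compound (hypothetical record)."""
    name: str
    mol_weight: float
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int


def passes_prefilter(lig, max_weight=500.0, max_donors=5,
                     max_acceptors=10, max_rotatable=10):
    """True if the compound satisfies all size, h-bond and flexibility limits."""
    return (lig.mol_weight <= max_weight
            and lig.h_bond_donors <= max_donors
            and lig.h_bond_acceptors <= max_acceptors
            and lig.rotatable_bonds <= max_rotatable)


database = [
    LigandProperties("small_rigid", 310.4, 2, 5, 4),
    LigandProperties("too_flexible", 480.6, 3, 8, 15),
    LigandProperties("too_heavy", 720.9, 4, 9, 8),
]
kept = [lig.name for lig in database if passes_prefilter(lig)]
```

Limiting the number of flexible torsions also keeps ligands away from the highly flexible regime (more than 10-15 torsions) listed among the difficult cases for docking.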

Page 28: Post-processing VLS hit lists

The top-scoring ~1% is still 1000-5000 compounds. Further tightening of the score cutoff typically does not improve hit rates. What is the optimal way to select 100-500 for experimental validation?

Rationale for further filtering:
- improve hit rate
- diversify hits
- ensure desired effect (e.g. inhibition)
- achieve specificity with respect to homologous receptors

Consensus scoring: while lowering the primary score cutoff beyond 1% typically does not improve enrichment, using a secondary scoring function may further enhance enrichment.

Selection according to additional criteria, such as:
- formation of specific h-bonds
- contact with certain parts of the receptor

Page 29: Post-processing VLS hit lists: chem. clustering

Chemical clustering for improved diversity and hit rates:
- the top-scoring list is often dominated by one or a few compound families
- to diversify the final selection, cluster compounds by chemical similarity and select the best-scoring compounds from each cluster

Once activity is confirmed for a chemotype, other compounds from the same cluster can be tested.
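
Picking the best-scoring compound from each chemical cluster can be sketched with a simple leader-style pass over the score-sorted list. This is one possible clustering scheme, not necessarily the one used in ICM: the set-based fingerprints, the Tanimoto threshold of 0.6 and all names are assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def pick_diverse(compounds, threshold=0.6):
    """Return one best-scoring representative per cluster.

    compounds: list of (name, score, fingerprint_set); lower score = better.
    A compound starts a new cluster only if it is less similar than
    `threshold` to every representative chosen so far.
    """
    representatives = []
    for name, score, fp in sorted(compounds, key=lambda c: c[1]):
        if all(tanimoto(fp, rep_fp) < threshold for _, _, rep_fp in representatives):
            representatives.append((name, score, fp))
    return [name for name, _, _ in representatives]


# Hypothetical hit list: three members of family "a" and one of family "b".
hits = [
    ("a1", -40.0, {1, 2, 3, 4}),
    ("a2", -38.0, {1, 2, 3, 5}),
    ("b1", -35.0, {7, 8, 9}),
    ("a3", -30.0, {1, 2, 3, 4, 5}),
]
selection = pick_diverse(hits)
```

Processing the list in score order guarantees that each cluster is represented by its best-scoring member; the skipped members stay available for follow-up once a chemotype is confirmed.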

Page 30: Lead optimization: screening focused combinatorial library

- Starting point: initial lead compound
- Identify the scaffold and variable substituents (R1, R2, etc.)
- Create substituent lists for each Ri position
- Assemble a Markush virtual combinatorial library

R1 = H, CH3, Ph, …
R2 = H, CH3, Ph, …
R3 = H, CH3, Ph, …

VLS of the fully enumerated Markush combinatorial library.

Alternative: two-step procedure:

1. VLS each Ri with the other positions held at a constant small substituent (H?); select the best subset for each Ri.

2. Assemble the Markush library; VLS full enumeration of this smaller combinatorial library.

For large Ri lists (>~1000 compounds), a dramatically larger virtual chemical space can be explored by the two-step procedure.
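
The benefit of the two-step procedure is easiest to see by counting docking runs. The sketch below compares full enumeration with the two-step scheme; the list sizes and the number of substituents kept per position are hypothetical.

```python
import math


def full_enumeration_size(list_sizes):
    """Number of compounds when every combination of substituents is built."""
    return math.prod(list_sizes)


def two_step_cost(list_sizes, keep_per_position):
    """Docking runs for the two-step procedure.

    Step 1 screens each substituent list on its own (sum of list sizes);
    step 2 enumerates only the retained subsets (product of kept counts).
    """
    step_one = sum(list_sizes)
    step_two = math.prod(min(keep_per_position, n) for n in list_sizes)
    return step_one + step_two


# Three positions with 1000 candidate substituents each, keeping the best 20:
sizes = [1000, 1000, 1000]
full = full_enumeration_size(sizes)                   # 1,000,000,000 compounds
reduced = two_step_cost(sizes, keep_per_position=20)  # 3,000 + 8,000 runs
```

The saving rests on the assumption that a substituent's merit at one position is roughly independent of the other positions, which is what step 1 implicitly relies on.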

Page 32: Protein-Protein Docking

Page 33: Protein-Protein Docking in ICM

First demonstration:
- Global stochastic free-energy optimization with pseudo-Brownian moves and Biased Probability Monte Carlo (JCC, 1994)

Explicit all-atom docking and flexible side-chain refinement:
- Lysozyme-antibody (Nature SB, 1994)
- Beta-lactamase/inhibitor docking challenge (1995, 96)

Grid docking and refinement:
- 24 known protein-protein complexes (Protein Sci. 2002)

Global grid docking and refinement:
- CAPRI docking competition (on-going, 2003 Proteins)

Page 34: A faster model: Atoms to Grids

Atoms-to-Grids docking:
• One molecule is static (receptor) and is represented by grid potentials
• Pros: energy calculation time does not depend on the receptor size; induced fit can be partially approximated by soft grids
• Cons: non-symmetrical; some energy terms have to be adapted/simplified for grid representation

Atoms-to-Atoms docking:
• Pros: symmetrical; explicit flexibility can be introduced for both molecules
• Cons: extremely time-consuming; scales poorly with size

Page 35: Multiple start MC global optimization

Stochastic search is good for local sampling, but diffusion gets slow on a larger scale (d ~ √t).

• Pre-generate starting points spread evenly around receptor and ligand (Fig a)
• Match each starting point on the receptor (Nr) with each starting point on the ligand (Nl) (Fig b)
• Six rotations around the match axis, for a total of 6 Nr Nl starting configurations
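
The count of 6 Nr Nl starting configurations comes from pairing every receptor point with every ligand point and every rotation. A minimal sketch follows; the evenly spaced 60-degree rotation steps and the point labels are assumptions, since the slide only states that six rotations are used.

```python
from itertools import product


def starting_configurations(receptor_points, ligand_points, n_rotations=6):
    """List every (receptor point, ligand point, rotation angle) combination.

    Rotation angles are assumed to be evenly spaced around the match axis.
    """
    step = 360.0 / n_rotations
    return [
        (r_point, l_point, k * step)
        for r_point, l_point, k in product(
            receptor_points, ligand_points, range(n_rotations)
        )
    ]


# Hypothetical surface points: Nr = 4 on the receptor, Nl = 3 on the ligand.
receptor_points = ["r0", "r1", "r2", "r3"]
ligand_points = ["l0", "l1", "l2"]
starts = starting_configurations(receptor_points, ligand_points)
```

Each starting configuration then seeds its own Monte-Carlo run, so the independent runs parallelize trivially and avoid relying on slow large-scale diffusion.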

Page 36: Optimized scoring of docked solutions

• Scoring function including the grid terms and three ASA-based solvation components: polar, aliphatic and aromatic
• Term contributions weighted:

E = E_vw + E_el + E_hb + E_pol + E_ar + E_al

• For each of the 24 complexes in the benchmark, 6000-12000 docked conformations
• Weighting factors optimized for best ranking of near-native solutions

Page 37: ICM docking in CAPRI

Best result in the worldwide Critical Assessment of PRedicted Interactions (CAPRI).

Proteins July 2003

Top prediction used Molsoft’s ICM Protein-Protein Docking procedure

Page 38: Best Results for 3 targets

A: Target 3, hemagglutinin / Fab
B: Target 6, α-amylase / VHH
C: Target 7, TCR-β / SpeA

Improvement of the best rigid body docking solution for Target 6 (in gray) after refinement (in red)

[Figure legend: X-ray structure; predicted ligand]

Page 39: CAPRI Round 2 and 3 results

• Good models for 8 out of 9 targets
• One failure: T9, large hinge-bending movements
• Successfully used new scoring function for T14, T18 & T19
• 64-71% of native contacts
• 0.4-1 Å interface RMSD
• For T14: RMSD 0.6 Å, rank 1 by energy
• T19: antibody - prion. Used no CDR bias + NMR model for prion.