3D-RISM‑DOCK: A New Fragment-Based Drug Design Protocol

    MM-ISMSA: An Ultrafast and Accurate Scoring Function for Protein-Protein Docking

    Javier Klett, Alfonso Núñez-Salgado, Helena G. Dos Santos, Álvaro Cortés-Cabrera, Almudena Perona, Rubén Gil-Redondo, David Abia, Federico Gago, and Antonio Morreale*

    Unidad de Bioinformática, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Campus de Cantoblanco UAM, E-28049 Madrid, Spain
    Departamento de Farmacología, Universidad de Alcalá, Alcalá de Henares, E-28871 Madrid, Spain
    SmartLigs Bioinformática S.L., Fundación Parque Científico de Madrid, c/Faraday, 7. Campus de Cantoblanco UAM, E-28049 Madrid, Spain

    ABSTRACT: An ultrafast and accurate scoring function for protein-protein docking is presented. It includes (1) a molecular mechanics (MM) part based on a 12-6 Lennard-Jones potential; (2) an electrostatic component based on an implicit solvent model (ISM) with individual desolvation penalties for each partner in the protein-protein complex plus a hydrogen bonding term; and (3) a surface area (SA) contribution to account for the loss of water contacts upon protein-protein complex formation. The accuracy and performance of the scoring function, termed MM-ISMSA, have been assessed by (1) comparing the total binding energies, the electrostatic term, and its components (charge-charge and individual desolvation energies), as well as the per residue contributions, to results obtained with well-established methods such as APBSA or MM-PB(GB)SA for a set of 1242 decoy protein-protein complexes and (2) testing its ability to recognize the docking solution closest to the experimental structure as that providing the most favorable total binding energy. For this purpose, a test set consisting of 15 protein-protein complexes with known 3D structure mixed with 10 decoys for each complex was used. The correlation between the values afforded by MM-ISMSA and those from the other methods is quite remarkable ($r^2 \sim 0.9$), and only 0.2-5.0 s (depending on the number of residues) are spent on a single calculation including an all vs all pairwise energy decomposition. On the other hand, MM-ISMSA correctly identifies the best docking solution as that closest to the experimental structure in 80% of the cases. Finally, MM-ISMSA can process molecular dynamics trajectories and reports the results as averaged values with their standard deviations. MM-ISMSA has been implemented as a plugin to the widely used molecular graphics program PyMOL, although it can also be executed in command-line mode. MM-ISMSA is distributed free of charge to nonprofit organizations.

    1. INTRODUCTION

    Molecular association (binding) plays a key role in cellular function and communication, and many illnesses can be directly linked to an improper balance of interactions among distinct molecular species. Of special importance are those established between different proteins or between small molecules and proteins. In the latter case, we usually talk about ligands and receptors, but in the following, we will use this terminology to refer to the two binding partners. Understanding how binding takes place and how this event can be theoretically modeled is of paramount importance in today's drug discovery campaigns,1 especially in fields like protein engineering,2 ligand and fragment docking,3 virtual screening (VS),4 and computational mutagenesis,5 among others.

    A large number of methodological advances have been introduced since the first theoretical simulation of a biologically relevant system6 that, in favorable cases, allow one to reproduce experimental binding affinities with an error comparable to that of the experimental measurements.7 On top of that, modern computers, sometimes including tailor-made architectures (e.g., the Anton machine8), and supercomputers9 allow researchers to undertake calculations that were unimaginable just a few years ago to address difficult problems such as simulating protein folding,10 long molecular dynamics (MD) simulations of very large systems,11 or unbiased drug binding patterns.12

    Nevertheless, although theoretical methods and computer technologies are continuously improving, there are still some bottlenecks. Representative examples are intrinsically massive calculations, as undertaken in VS, where the number of molecules to be simulated can reach the order of several millions, and the analysis and interpretation of long MD trajectories, especially when solvent effects must be properly accounted for, per-residue analyses must be performed, or one wishes to estimate elusive entropic effects.

    Here, we are interested in the electrostatic contribution to the free energy of binding, and more specifically in the effect played by the solvent. There are two opposite, although complementary, ways to account for this effect:13 either representing the solvent explicitly by means of a set of discrete water molecules surrounding the solute or via a mathematical function able to describe the behavior of the bulk solvent. There are also examples in which both methods have been combined in a sort of hybrid solvent model.14 However, both have advantages and caveats, and selecting the most appropriate description for a particular study greatly depends on the computational power available, as explicit models require the calculation of a large number of interactions due to the presence of numerous water molecules.

    Received: June 15, 2012. Published: August 2, 2012.

    Article

    pubs.acs.org/JCTC

    © 2012 American Chemical Society | dx.doi.org/10.1021/ct300497z | J. Chem. Theory Comput. 2012, 8, 3395-3408

    For docking-related tools, implicit models are usually preferred, as they are faster than their explicit counterparts while performing quite similarly. Implicit models start by solving the classical Poisson equation (PE).15 However, in many cases this is too computationally expensive, and other alternatives such as the generalized Born (GB) model are employed instead.16 In addition, solvation models based on group contributions (effective energy function, EEF1) have also been proposed and successfully employed in proteins17 and protein-ligand force fields.18

    In this paper, we extend our previously developed GB-like solvent model, called ISM (Implicit Solvent Model),19 to the protein-protein docking problem and compare its performance to well-established methods like APBS (PE solver) and MM-PB(GB)SA (as implemented in AmberTools20) using two different test sets. For the procedures to be strictly comparable, we first incorporated MM (Molecular Mechanics) and SA (Surface Area) terms to the APBS (MM-APBSSA) and ISM (MM-ISMSA) methods. Comparisons between the three different techniques were performed in order to assess their relative speeds and to validate the different values provided by our method for total binding free energies, individual components, and per residue energy decompositions. Finally, to extend the usability of MM-ISMSA within the scientific community, we have developed a graphical user interface (GUI) that allows its use via the popular PyMOL program.21 This plugin, which can be used to analyze single structures or complete MD trajectories, can be downloaded following free registration from the CBM Bioinformatics Unit's web page (http://ub.cbm.uam.es).

    2. THEORETICAL BACKGROUND

    2.1. Statistical Thermodynamics of Binding: Interaction Terms. According to classical thermodynamics,22 molecular association can be described as an equilibrium ([R] + [L] ⇌ [RL], where R and L represent receptor and ligand, respectively, and RL the complex formed between them) governed by the association and dissociation rate constants. The ratio between them is the equilibrium binding constant, K, which is related to the free energy change taking place in the process ($\Delta G_{\mathrm{binding}}$) by the well-known equation:

    $$\Delta G_{\mathrm{binding}} = G_{\mathrm{RL}} - G_{\mathrm{R}} - G_{\mathrm{L}} = -R_{\mathrm{gas}} T \ln K \tag{1}$$

    where $R_{\mathrm{gas}}$ is the universal gas constant, T is the temperature in Kelvin, and $G_{X}$ represents the free energy corresponding to the complex (X = RL), receptor (X = R), and ligand (X = L).
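As a small worked illustration of eq 1, the sketch below converts an association constant into a binding free energy and back. The function names, the choice of kcal/mol units, the default temperature of 298.15 K, and the 1 M standard state are assumptions of this example and are not taken from the MM-ISMSA code.

```python
import math

# Universal gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_GAS = 1.987204e-3


def binding_free_energy(k_assoc, temperature=298.15):
    """Eq 1: dG_binding = -R_gas * T * ln K, with K taken relative to a 1 M standard state."""
    if k_assoc <= 0:
        raise ValueError("K must be positive")
    return -R_GAS * temperature * math.log(k_assoc)


def association_constant(delta_g, temperature=298.15):
    """Inverse of eq 1: K = exp(-dG_binding / (R_gas * T))."""
    return math.exp(-delta_g / (R_GAS * temperature))


if __name__ == "__main__":
    # Hypothetical nanomolar binder, K = 1e9 M^-1
    print(f"dG_binding = {binding_free_energy(1.0e9):.2f} kcal/mol")
```

With these assumptions, a nanomolar binder corresponds to roughly -12.3 kcal/mol at room temperature, which gives a feel for the energy scale that a scoring function has to resolve.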

    The calculation of $\Delta G_{\mathrm{binding}}$ would entail extremely lengthy simulations in which the ligand diffuses into the receptor's binding site, but this is hardly ever done. As useful alternatives, the ligand can be grown slowly both in the bulk solvent and inside the binding site to calculate free energy differences,7 or $\Delta G_{\mathrm{binding}}$ can be estimated as the difference between the free energies of bound and unbound states, as in linear interaction energy (LIE)23 and MM-PB(GB)SA24 end-point methods.

    It has been customary to describe the binding process through the thermodynamic cycle in Figure 1, where $\Delta G_{\mathrm{int}}$ refers to the binding process in the gas phase, and $\Delta G_{\mathrm{solv}}^{\mathrm{RL}}$, $\Delta G_{\mathrm{solv}}^{\mathrm{R}}$, and $\Delta G_{\mathrm{solv}}^{\mathrm{L}}$ are the free energies of solvation for the complex, receptor, and ligand, respectively. Next, because $\Delta G_{\mathrm{binding}}$ is a state variable, the cycle can be solved to yield:

    $$\begin{aligned} \Delta G_{\mathrm{binding}} &= \Delta G_{\mathrm{int}} + \Delta G_{\mathrm{solv}}^{\mathrm{RL}} - \Delta G_{\mathrm{solv}}^{\mathrm{R}} - \Delta G_{\mathrm{solv}}^{\mathrm{L}} \\ &= \Delta G_{\mathrm{int}} + \Delta\Delta G_{\mathrm{solv}} \end{aligned} \tag{2}$$

    Usually, from Figure 1 and eq 2 the solvation contribution ($\Delta\Delta G_{\mathrm{solv}}$) is obtained as a single term (e.g., in GB) and, as a consequence, individual solvation energies for the ligand and receptor are not available. As an alternative, a different description of the binding process25 can be considered (Figure 2) that consists of first desolvating the apposing surfaces of both ligand and receptor and then letting the charges of the two molecules interact.
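The bookkeeping of eq 2 can be sketched in a few lines. The function name and the numbers in the usage note are hypothetical and serve only to show how the cycle of Figure 1 is closed; any consistent energy unit can be used.

```python
def binding_free_energy_from_cycle(dg_int, dg_solv_complex, dg_solv_receptor, dg_solv_ligand):
    """Eq 2: combine the gas-phase interaction with the three solvation free energies.

    Returns (dG_binding, ddG_solv), where ddG_solv is the net solvation
    contribution dG_solv(RL) - dG_solv(R) - dG_solv(L).
    """
    ddg_solv = dg_solv_complex - dg_solv_receptor - dg_solv_ligand
    return dg_int + ddg_solv, ddg_solv
```

For instance, with made-up values of -50 for the gas-phase interaction and -300, -280, and -40 for the solvation of complex, receptor, and ligand, the net solvation term is +20 and the binding free energy is -30: a favorable gas-phase interaction partly offset by the desolvation cost.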

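    Thermodynamically, $\Delta G_{\mathrm{binding}}$ has an enthalpic ($\Delta H_{\mathrm{binding}}$) and an entropic ($\Delta S_{\mathrm{binding}}$) component:

    $$\Delta G_{\mathrm{binding}} = \Delta H_{\mathrm{binding}} - T \Delta S_{\mathrm{binding}} \tag{3}$$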

    $\Delta H_{\mathrm{binding}}$ contains van der Waals ($\Delta G_{\mathrm{binding}}^{\mathrm{vdW}}$), hydrogen bonding ($\Delta G_{\mathrm{binding}}^{\mathrm{hb}}$), and solvent-related contributions that can be further subdivided into polar ($\Delta G_{\mathrm{binding}}^{\mathrm{p}}$) and apolar ($\Delta G_{\mathrm{binding}}^{\mathrm{np}}$) components. The former contains the Coulombic interactions ($\Delta G_{\mathrm{binding}}^{\mathrm{elec,coul}}$) together with ligand ($\Delta G_{\mathrm{binding}}^{\mathrm{elec,desolvL}}$) and receptor ($\Delta G_{\mathrm{binding}}^{\mathrm{elec,desolvR}}$) desolvation terms, whereas the latter includes the cavitation term (the work required to create a cavity within the solvent to introduce the solute) and the van der Waals solute-solvent interactions. A linear relationship is usually assumed between the composite of the latter two components and the change in solvent-accessible surface area (SASA) of the ligand and receptor upon binding. On the other hand, the entropic contribution arises from the loss of some protein degrees of freedom that become frozen when the complex is formed and also from solvent reorganization, as some water molecules present within the binding site will be released to the bulk solvent

    Figure 1. Graphical representation of the commonly used thermodynamic cycle to estimate $\Delta G_{\mathrm{binding}}$. The shadowed boxes represent the systems (receptor, ligand, and complex) immersed in the solvent.

    Figure 2. Alternative description of the binding process to estimate $\Delta G_{\mathrm{binding}}$. This type of equilibrium is commonly used in combination with PE solvers. White molecules are uncharged, and they are used to replace the high-dielectric solvent with a low-dielectric medium in the desolvation calculations.


    as a consequence of the binding event. This entropic contribution is rarely taken into account when $\Delta G_{\mathrm{binding}}$ is computed due to its complexity, high computational demand, and slow convergence.26 The electrostatic component is the most challenging term, and this will be the focus of the present work.

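    2.2. The PE Model. The classical way to deal with the electrostatic contribution to the binding energy ($\Delta G_{\mathrm{elec}}$) is by solving the PE, which relates the electrostatic potential $\phi(\mathbf{r})$ and the charge distribution $\rho(\mathbf{r})$:

    $$\nabla \cdot [\varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r})] = -4\pi\rho(\mathbf{r}) \tag{4}$$

    where $\varepsilon(\mathbf{r})$ is a distance-dependent dielectric function. For a given $\rho(\mathbf{r})$, $\phi(\mathbf{r})$ can be calculated via the PE so that

    $$\Delta G_{\mathrm{elec}} = \frac{1}{2} \int \rho(\mathbf{r})\, \phi(\mathbf{r})\, \mathrm{d}v \tag{5}$$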

    Because the analytical solution of PE is possible only for very simple geometries, for biological molecules we have to rely on numerical methods such as finite differences,27 finite elements,28 or boundary elements.29 Nonetheless, solving the PE is still a computationally demanding task in many molecular modeling areas, despite constant improvements over the years.30

    Equation 5 can be used in different ways to obtain an estimation of the binding free energy, either by computing the desolvation terms of the thermodynamic cycle depicted in Figure 1 (as implemented in the MM-PBSA method) or by applying the cycle shown in Figure 2 as described in section 3.2.1, where the Coulombic contribution is obtained by computing the product of ligand charges times the electrostatic potential generated by the protein on the ligand charge centers. On the other hand, receptor and ligand electrostatic desolvation energies are calculated in two successive steps (Figure 2): a first one, where a calculation is performed for the receptor and ligand alone, and a second one, for the ligand in the complex, with uncharged receptor, and for the receptor in the complex, with uncharged ligand.
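The Coulombic contribution described above (ligand charges times the receptor potential at the ligand charge centers) can be sketched as follows. This is only an illustration: the potential here comes from point charges in a uniform dielectric, used as a stand-in for the grid potential that a PE solver would provide, and all function names and the 332 kcal·Å/(mol·e²) conversion factor in this context are assumptions of the example.

```python
import math

COULOMB_CONSTANT = 332.0  # kcal*Angstrom/(mol*e^2), the same factor that appears in eq 7


def potential_at(point, source_charges, source_coords, dielectric=1.0):
    """Potential (kcal/(mol*e)) generated by point charges in a uniform dielectric.

    Stand-in for a PE solver: a real calculation would interpolate the
    potential from the solver's grid instead of using this closed form.
    """
    return sum(
        COULOMB_CONSTANT * q / (dielectric * math.dist(point, xyz))
        for q, xyz in zip(source_charges, source_coords)
    )


def coulombic_contribution(ligand_charges, ligand_coords,
                           receptor_charges, receptor_coords, dielectric=1.0):
    """Sum over ligand atoms of q_i times the receptor potential at atom i."""
    return sum(
        q * potential_at(xyz, receptor_charges, receptor_coords, dielectric)
        for q, xyz in zip(ligand_charges, ligand_coords)
    )
```

Swapping `potential_at` for a lookup into a precomputed potential grid would keep the outer sum unchanged, which is the point of this formulation: the receptor potential is computed once and reused for every ligand pose.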

    2.3. The GB Model. The GB model is based on the Born approximation and can be easily derived from the PE, assuming a spherical solute that has the whole charge located at its center, according to

    $$\Delta G_{\mathrm{solv}}^{\mathrm{elec}}(\mathrm{Born}) = -166 \left(1 - \frac{1}{\varepsilon}\right) \frac{q^{2}}{r} \tag{6}$$

    where $\varepsilon$ is the dielectric constant of the solvent, and q and r are the charge and the radius of the sphere, respectively. Taking into account that molecules can be represented as a set of interacting spheres, eq 6 can be generalized to the expression commonly used in the GB models:
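A minimal sketch of eq 6 follows, assuming charges in units of the electron charge, radii in Å, and energies in kcal/mol (the units implied by the 166 prefactor). The function name and the default dielectric constant of 80 for water are choices made for this example.

```python
def born_solvation_energy(charge, radius, solvent_dielectric=80.0):
    """Eq 6: -166 * (1 - 1/eps) * q^2 / r for a single charged sphere."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    return -166.0 * (1.0 - 1.0 / solvent_dielectric) * charge**2 / radius
```

Because the energy depends on q² and on 1/r, the sign of the charge is irrelevant and small ions are solvated much more strongly than large ones, which is why burying a small charged group at an interface carries a large desolvation penalty.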

    $$\Delta G_{\mathrm{solv}}^{\mathrm{elec}}(\mathrm{GB}) = \Delta G_{\mathrm{vac}} + \Delta G_{\mathrm{pol}} = 332 \sum_{i=1}^{N} \sum_{j>i}^{N} \frac{q_i q_j}{r_{ij}} - 166 \left(1 - \frac{1}{\varepsilon}\right) \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{q_i q_j}{f_{\mathrm{GB}}} \tag{7}$$

    where N is the total number of atoms, $r_{ij}$ is the distance between atoms i and j, $q_i$ and $q_j$ are the atomic charges of atoms i and j, and $f_{\mathrm{GB}}$ is the GB function defined as

    $$f_{\mathrm{GB}}(r_{ij}, \alpha_i, \alpha_j) = \left[ r_{ij}^{2} + \alpha_i \alpha_j\, e^{-r_{ij}^{2}/(4 \alpha_i \alpha_j)} \right]^{1/2} \tag{8}$$

    $\alpha$ is the so-called effective Born radius, which is the distance from an atom to the molecular surface. Note that $f_{\mathrm{GB}}$ is different depending on the system, that is, the receptor, the ligand, or the complex. In eq 7, the first term ($\Delta G_{\mathrm{vac}}$) represents the electrostatic interaction in vacuo, while the second ($\Delta G_{\mathrm{pol}}$) accounts for the polarization effects due to the solvent. In fact, it is this second term that is calculated as the electrostatic component of the free energy of solvation in GB-based methods, and as such it has been implemented in many different programs and, in particular, in the AmberTools package.
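Equations 7 and 8 can be sketched directly as a double loop over atoms. This is an illustrative O(N²) implementation, not the AmberTools one: the function names are hypothetical, and the effective Born radii are taken as given inputs, whereas in practice computing them is the expensive part of any GB method.

```python
import math


def f_gb(r_ij, alpha_i, alpha_j):
    """Eq 8: GB interpolation function (r_ij and Born radii in Angstrom)."""
    aa = alpha_i * alpha_j
    return math.sqrt(r_ij**2 + aa * math.exp(-(r_ij**2) / (4.0 * aa)))


def gb_electrostatics(charges, coords, born_radii, solvent_dielectric=80.0):
    """Eq 7: return (dG_vac, dG_pol) in kcal/mol for one molecular system."""
    n = len(charges)
    prefactor = -166.0 * (1.0 - 1.0 / solvent_dielectric)
    dg_vac = 0.0
    dg_pol = 0.0
    for i in range(n):
        for j in range(n):
            r_ij = math.dist(coords[i], coords[j])
            qq = charges[i] * charges[j]
            if j > i:
                # Vacuum Coulomb term: each pair counted once
                dg_vac += 332.0 * qq / r_ij
            # Polarization term: full double sum, including the i == j self terms
            dg_pol += prefactor * qq / f_gb(r_ij, born_radii[i], born_radii[j])
    return dg_vac, dg_pol
```

Two limits are worth noting. For i = j the distance is zero and f_GB reduces to the Born radius, so a single atom recovers eq 6. For well-separated atoms the exponential vanishes and f_GB approaches r_ij, so the polarization term becomes a dielectric-screened Coulomb interaction.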

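    Applying eq 7 to the thermodynamic equilibrium depicted in Figure 1, it is possible to separate the electrostatic interaction between charges (in a vacuum and in the solvent) from the pure desolvation terms for the receptor and ligand:

    $$\begin{aligned} \Delta G_{\mathrm{solv}}^{\mathrm{elec}}(\mathrm{GB}) = {} & 332 \sum_{i=1}^{N_{\mathrm{R}}} \sum_{j=1}^{N_{\mathrm{L}}} \frac{q_i^{\mathrm{R}} q_j^{\mathrm{L}}}{r_{ij}} - 332 \left(1 - \frac{1}{\varepsilon}\right) \sum_{i=1}^{N_{\mathrm{R}}} \sum_{j=1}^{N_{\mathrm{L}}} \frac{q_i^{\mathrm{R}} q_j^{\mathrm{L}}}{f_{\mathrm{GB}}^{\mathrm{RL}}} \\ & - 166 \left(1 - \frac{1}{\varepsilon}\right) \sum_{i=1}^{N_{\mathrm{R}}} \sum_{j=1}^{N_{\mathrm{R}}} \left( \frac{q_i^{\mathrm{R}} q_j^{\mathrm{R}}}{f_{\mathrm{GB}}^{\mathrm{RL}}} - \frac{q_i^{\mathrm{R}} q_j^{\mathrm{R}}}{f_{\mathrm{GB}}^{\mathrm{R}}} \right) \\ & - 166 \left(1 - \frac{1}{\varepsilon}\right) \sum_{i=1}^{N_{\mathrm{L}}} \sum_{j=1}^{N_{\mathrm{L}}} \left( \frac{q_i^{\mathrm{L}} q_j^{\mathrm{L}}}{f_{\mathrm{GB}}^{\mathrm{RL}}} - \frac{q_i^{\mathrm{L}} q_j^{\mathrm{L}}}{f_{\mathrm{GB}}^{\mathrm{L}}} \right) \end{aligned} \tag{9}$$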
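The decomposition in eq 9 can be checked numerically against eq 7 applied to complex, receptor, and ligand. The sketch below is written under stated assumptions: all function and key names are hypothetical, the geometry is rigid (same coordinates in the free and bound states), and the effective Born radii of each partner in the free and bound states are supplied by the caller.

```python
import math


def f_gb(r_ij, alpha_i, alpha_j):
    """Eq 8: GB interpolation function."""
    aa = alpha_i * alpha_j
    return math.sqrt(r_ij**2 + aa * math.exp(-(r_ij**2) / (4.0 * aa)))


def pair_sum(q_a, xyz_a, alpha_a, q_b, xyz_b, alpha_b):
    """Sum of q_i * q_j / f_GB over every i in set a and j in set b."""
    return sum(
        qi * qj / f_gb(math.dist(ri, rj), ai, aj)
        for qi, ri, ai in zip(q_a, xyz_a, alpha_a)
        for qj, rj, aj in zip(q_b, xyz_b, alpha_b)
    )


def gb_total(q, xyz, alpha, eps=80.0):
    """Eq 7 for one system: vacuum Coulomb plus GB polarization (kcal/mol)."""
    n = len(q)
    vac = 332.0 * sum(
        q[i] * q[j] / math.dist(xyz[i], xyz[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return vac - 166.0 * (1.0 - 1.0 / eps) * pair_sum(q, xyz, alpha, q, xyz, alpha)


def decompose_binding_electrostatics(q_r, xyz_r, q_l, xyz_l,
                                     alpha_r_free, alpha_l_free,
                                     alpha_r_bound, alpha_l_bound, eps=80.0):
    """Return the four right-hand terms of eq 9 as a dict (kcal/mol)."""
    tau = 1.0 - 1.0 / eps
    coulomb = 332.0 * sum(
        qi * qj / math.dist(ri, rj)
        for qi, ri in zip(q_r, xyz_r)
        for qj, rj in zip(q_l, xyz_l)
    )
    cross_pol = -332.0 * tau * pair_sum(q_r, xyz_r, alpha_r_bound,
                                        q_l, xyz_l, alpha_l_bound)
    # Desolvation: same charges, Born radii of the complex minus those of the free partner
    desolv_r = -166.0 * tau * (
        pair_sum(q_r, xyz_r, alpha_r_bound, q_r, xyz_r, alpha_r_bound)
        - pair_sum(q_r, xyz_r, alpha_r_free, q_r, xyz_r, alpha_r_free)
    )
    desolv_l = -166.0 * tau * (
        pair_sum(q_l, xyz_l, alpha_l_bound, q_l, xyz_l, alpha_l_bound)
        - pair_sum(q_l, xyz_l, alpha_l_free, q_l, xyz_l, alpha_l_free)
    )
    return {
        "coulomb_vacuum": coulomb,
        "cross_polarization": cross_pol,
        "desolv_receptor": desolv_r,
        "desolv_ligand": desolv_l,
    }
```

Summing the four terms should reproduce `gb_total(complex) - gb_total(receptor) - gb_total(ligand)` when the complex uses the bound-state radii. The desolvation terms depend only on how the Born radii of one partner change on binding, not on the charges of the other partner, which is what makes a per-partner desolvation penalty well defined.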

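    Unfortunately, as commented upon above, the GB method as implemented in AmberTools does not allow for such decomposition, and only the total polarization energy is obtained ($\Delta G_{\mathrm{pol}}$ in eq 7). Nevertheless, setting the atomic charges in the receptor/ligand to zero and subtracting the resulting GB term from the GB term of a standard calculation would afford the ligand/receptor desolvation terms independently. Accordingly, zeroing the ligand's charges:

    $$\Delta G_{\mathrm{desolv}}^{\mathrm{R}} = \Delta G_{\mathrm{elec}}^{\mathrm{MMGBSA}}(\mathrm{GB}) - \Delta G_{\mathrm{elec}}^{\mathrm{MMGBSA}}(\mathrm{GB}, q_i^{\mathrm{L}} = 0) = -166 \left(1 - \frac{1}{\varepsilon}\right) \sum_{i=1}^{N_{\mathrm{R}}} \sum_{j=1}^{N_{\mathrm{R}}} q_i^{\mathrm{R}} q_j^{\mathrm{R}} \left( \frac{1}{f_{\mathrm{GB}}^{\mathrm{RL}}} - \frac{1}{f_{\mathrm{GB}}^{\mathrm{R}}} \right) \tag{10}$$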

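    The same can be done if the receptor's charges are zeroed:

    $$\Delta G_{\mathrm{desolv}}^{\mathrm{L}} = \Delta G_{\mathrm{elec}}^{\mathrm{MMGBSA}}(\mathrm{GB}) - \Delta G_{\mathrm{elec}}^{\mathrm{MMGBSA}}(\mathrm{GB}, q_i^{\mathrm{R}} = 0) = -166 \left(1 - \frac{1}{\varepsilon}\right) \sum_{i=1}^{N_{\mathrm{L}}} \sum_{j=1}^{N_{\mathrm{L}}} q_i^{\mathrm{L}} q_j^{\mathrm{L}} \left( \frac{1}{f_{\mathrm{GB}}^{\mathrm{RL}}} - \frac{1}{f_{\mathrm{GB}}^{\mathrm{L}}} \right) \tag{11}$$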

    Choosing this alternative would require (a) manipulating the topology (top) files used by AmberTools to set the charges of the ligand (or the receptor) to zero and (b) an additional GB calculation ($\Delta G_{\mathrm{pol}}$ in eq 7) to obtain the Coulombic interaction screened by the solvent ($\Delta G_{\mathrm{screened}}$, the first right-hand term in eq 9). That is, taking into account

    $$\Delta G_{\mathrm{pol}} = \Delta G_{\mathrm{screened}} + \Delta G_{\mathrm{desolv}}^{\mathrm{R}} + \Delta G_{\mathrm{desolv}}^{\mathrm{L}}$$

    and the fact that we can calculate both desolvation energies as shown above:

    $$\Delta G_{\mathrm{screened}} = \Delta G_{\mathrm{solv}}^{\mathrm{elec}}(\mathrm{GB}) - \Delta G_{\mathrm{desolv}}^{\mathrm{R}} - \Delta G_{\mathrm{desolv}}^{\mathrm{L}}$$

    three calculations are needed to obtain the total free energy of solvation and its components.

    2.4. The ISM Model. The model starts from the Lorentz-Debye-Sack theory of polar liquids,31 which establishes that the screening effect due to the solvent shows a sigmoidal distance-dependent dielectric function of the form:

    $$D(r) = \frac{\varepsilon + 1}{1 + k\, e^{-\lambda(\varepsilon + 1) r}} - 1 \tag{12}$$

    where $\varepsilon$ is the solvent dielectric constant, $k = (\varepsilon - 1)/2$, $\lambda$ is a parameter controlling the rate of change of D(r), and r is the distance. ISM considers that the main contribution to the electrostatic desolvation of an atom originates from the displacement of the first shell of water molecules that surrounds that atom. Taking these two facts into account, ISM's starting equation, as proposed by Hassan et al.,32 is the following:

    =

    +