
Protein Folding and Binding: Effective Potentials, Replica Exchange Simulations, and Network Models

Anthony K. Felts, Michael Andrec, Emilio Gallicchio, and Ronald M. Levy

Department of Chemistry and Chemical Biology and BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway, New Jersey 08854
ronlevy@lutece.rutgers.edu

1.1 Introduction

1 Replica Exchange and AGBNP 3

Molecular simulations of protein structural changes and ligand binding are built upon two foundations: (i) the design of effective potentials which are matched to the requirements of accuracy and speed appropriate to particular modeling problems, and (ii) the design of algorithms to sample the effective potentials in highly efficient ways so as to facilitate the convergence of the simulations in a thermodynamic sense. Developing algorithms to satisfy the competing goals of accuracy and speed is at the heart of the problem when considering computational models for use in structural biology, and strategies for achieving these twin goals in different molecular modeling contexts are emphasized throughout this review.

The protein folding problem is of fundamental importance in modern structural biology. Recent advances in experimental techniques have helped to elucidate thermodynamic and kinetic mechanisms which underlie different stages of the folding process. [1-6] Computer simulations performed at various levels of molecular detail have played a central role in the interpretation of experimental studies. Molecular simulations using models based on fully atomic representations are becoming more accurate and more practical and are increasingly employed to simulate protein folding and predict protein structures. [7-15] Due to the large number of degrees of freedom, however, these simulations require extensive computer resources to obtain meaningful results, especially when the solvent environment is treated explicitly. [16] Because of this, many recent computational studies have been carried out with implicit solvent models. [15, 17-20] The question of how well implicit solvent effective potentials, when combined with detailed atomic protein models, can predict thermodynamic as well as kinetic aspects of protein folding is under active investigation. [9, 10, 12, 13, 15, 19-27]

Understanding the molecular basis for the thermodynamic stability and folding kinetics of typical secondary structural elements is a necessary step in the development of accurate protein folding models. The study of α-helical and β-sheet peptide fragments isolated from their protein environments has therefore been actively pursued. [1, 4, 28] Until the mid-1980s, linear peptides were believed to be poorly structured in water, [29, 30] a view also supported by theoretical models. [31, 32] Several linear peptides which retain many structural features of full size proteins have since been discovered. [28, 33] The C-peptide of ribonuclease A has been found to retain a significant fraction of α-helical content in solution. [34, 35] More recently, the C-terminal peptide of protein G has been shown to retain in solution many features of its original β-hairpin structure [36] and has been the subject of extensive studies, both experimental [36-41] and theoretical. [17-20, 42-50] Due to their small size, peptides of this kind are particularly useful as target systems for analysis and refinement of molecular force fields. [51]

Numerous stringent requirements make the development of practically useful solvation free energy models for biological applications very challenging. In


We developed the Analytical Generalized Born plus Non-Polar (AGBNP) model, an implicit solvent model based on the Generalized Born model [61-64, 68, 90] for the electrostatic component, and on the decomposition of the non-polar hydration free energy into a cavity component based on the solute surface area and a solute-solvent van der Waals interaction free energy component modeled using an estimator based on the Born radius of each atom.

In early studies it was not possible to analyze a large enough region of conformational space to offer a meaningful comparison between the thermodynamic predictions of different force field models concerning the propensities of various peptides towards forming either α-helical or β-hairpin conformations. Recent advances in parallel sampling techniques [51, 91, 92] and the widespread availability of large numbers of processors have now made possible the calculation of the full potential of mean force of small to medium sized peptides in solution. [15, 19, 47, 48, 51]

One class of methods for studying equilibrium properties of quasi-ergodic systems that has received a great deal of recent attention is based on the Replica Exchange (RE) [93, 94] algorithm (also known as parallel tempering). To accomplish barrier crossings, RE methods simulate a series of replicas over a range of temperatures. Periodically, coordinates are exchanged using a Metropolis criterion [95] that ensures that at any given temperature a canonical distribution is realized. RE methods, particularly Replica Exchange Molecular Dynamics (REMD) [91], have become very popular for the study of protein biophysics, including peptide and protein folding [15, 96, 97], aggregation [98-100], and protein-ligand interactions [101, 102]. Previous studies of protein folding appear to show a significant increase in the number of reversible folding events in REMD simulations versus conventional MD [103, 104]. Given the wide use of REMD, a better understanding of the RE algorithm and how it can be utilized most effectively for the study of protein folding and binding is of considerable interest.

The effectiveness of RE methods is determined by the number of temperatures (replicas) that are simulated, their range and spacing, the rate at which exchanges are attempted, and the kinetics of the system at each temperature. While the determination of "optimal" Metropolis acceptance rates and temperature spacings has been the subject of a variety of studies [94, 105-110], the role played by the intrinsic temperature-dependent conformational kinetics, which is central to understanding RE, has not received much attention. Recent work [110-113] recognizes the importance of exploration of conformational space and the crossing of barriers between conformational states as the key limiting factor for the RE algorithm. Molecular kinetics can have a strong effect on RE beyond the entropic effects that have been discussed [111, 113], particularly if the kinetics does not have a simple temperature dependence. It is known from experimental and computational studies that the folding rates of proteins and peptides can exhibit anti-Arrhenius behavior, where the fold-


and intramolecular degrees of freedom. The solvation free energy, $\Delta G_{\mathrm{solv}}$, of each structure is estimated using the analytical generalized Born model [27] with nonpolar free energy estimator (AGBNP, as described below) as implemented in the IMPACT modeling program. [128]

The total internal energy of a protein is given by the OPLS-AA force field,

$$\Delta U_{\mathrm{int}} \equiv \Delta U_{\text{OPLS-AA}} = \Delta U_{\mathrm{bond}} + \Delta U_{\mathrm{angle}} + \Delta U_{\mathrm{torsion}} + \Delta U_{\mathrm{coulomb}} + \Delta U_{\mathrm{vdw}}, \tag{1.4}$$

where the first three terms refer to intramolecular interactions arising from the connectivity of the molecule and the last two terms reflect nonlocal interactions within the protein. In the original development of the OPLS-AA force field, the partial charges and van der Waals parameters were adjusted to reproduce experimental heats of vaporization and densities for a series of pure liquids. [126, 129-133] These parameters were further tested by comparison to experimental solvation energies, using explicit-solvent simulations. Additional comparisons were made in some cases to hydrogen-bond dimer-interaction energies obtained from quantum-chemical calculations. These comparisons were used to detect large discrepancies that, when present, called for a re-investigation of the non-bonded parameters. The OPLS-AA torsional parameters were fit to reproduce gas-phase conformational energies obtained from quantum-chemical calculations, [127] and stretching and bending parameters were adapted from the CHARMM22 or AMBER force fields.

1.2.2 Analytical Generalized Born with Non-Polar Free Energy Estimator

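The generalized Born model is given by the following equation, [61]

$$\Delta G_{\mathrm{GB}} = -\frac{1}{2}\left(\frac{1}{\epsilon_{\mathrm{in}}} - \frac{1}{\epsilon_{w}}\right)\sum_{i,j}\frac{q_i q_j}{f_{ij}(r_{ij})}, \tag{1.5}$$

where $q_i$ is the charge of atom $i$ and $r_{ij}$ is the distance between atoms $i$ and $j$. Equation (1.5) gives the electrostatic component of the free energy of transfer of a molecule with interior dielectric $\epsilon_{\mathrm{in}}$ from vacuum to a continuum medium of dielectric constant $\epsilon_{w}$, by interpolating between the two extreme cases that can be solved analytically: the one in which the atoms are infinitely separated and the other in which the atoms are completely overlapped. The interpolation function $f_{ij}$ in Eq. (1.5) is defined as

$$f_{ij} = \left[r_{ij}^{2} + B_i B_j \exp\left(-\frac{r_{ij}^{2}}{4 B_i B_j}\right)\right]^{1/2}, \tag{1.6}$$

where $B_i$ is the Born radius of atom $i$ defined as the effective radius that reproduces through the Born equation

$$\Delta G_{\mathrm{single}}^{i} = -\frac{1}{2}\left(\frac{1}{\epsilon_{\mathrm{in}}} - \frac{1}{\epsilon_{w}}\right)\frac{q_i^{2}}{B_i} \tag{1.7}$$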


solute-solvent van der Waals interaction energy component has been derived on the basis of simple physical arguments. [27]

We use two sets of parameterizations of $\gamma$ and $\alpha$ to test the full nonpolar function described above relative to a simpler nonpolar function. In past implementations, [14] the total nonpolar solvation free energy is given by a term proportional to the solvent-accessible surface area, or in terms of Eq. 1.9, setting all values of $\alpha_i$ to zero,

where $\gamma_i$ is set for all atoms to 0.015 kcal/mol/Å². This implicit solvent model with the less-detailed nonpolar function is referred to as "AGB-γ". When we use the full nonpolar function including the dispersion term using the parameters set forth in the work of Gallicchio and Levy [27], the implicit solvent model is referred to as "AGBNP".

A third parameterization aimed at implementing a correction for salt bridge interactions (which are generally overestimated by generalized Born solvent models) [97, 135] is also investigated. To correct for the overstabilization of salt bridges by the generalized Born model, we used modified radii and $\gamma_i$ for carboxylate oxygens. The radius of the carboxylate oxygen is decreased from 1.48 Å, as in the original AGBNP, to 1.30 Å; $\gamma$ of the carboxylate oxygen is set to -0.313 kcal/mol/Å². These have the combined effect of increasing the solubility of carboxylate oxygens and decreasing the likelihood of ion pairing between the carboxylate groups on glutamate and aspartate, and positively charged groups found on lysine and arginine. We have parameterized this radius and $\gamma$ to experimental data for small molecules and to provide results which matched those generated with explicit solvent (unpublished results). The implicit solvent model that has additional descreening of ion pairing is referred to as "AGBNP+".
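As a rough illustration of the electrostatic term described in this section, the following Python sketch evaluates the standard Still-type generalized Born sum, $-\tfrac{1}{2}(1/\epsilon_{\mathrm{in}} - 1/\epsilon_{w})\sum_{i,j} q_i q_j / f_{ij}$, for a handful of point charges. It is not the IMPACT implementation: the Born radii are simply passed in rather than computed by the AGBNP pairwise descreening scheme, and the function names, the default dielectric constants, and the Coulomb conversion factor of 332.06 kcal Å/(mol e²) are assumptions made for the example.

```python
import math

def gb_pair_function(r_ij, b_i, b_j):
    """Still-type interpolation function f_ij for distance r_ij and Born radii b_i, b_j."""
    bb = b_i * b_j
    return math.sqrt(r_ij * r_ij + bb * math.exp(-r_ij * r_ij / (4.0 * bb)))

def gb_energy(charges, born_radii, coords,
              eps_in=1.0, eps_w=80.0, coulomb_const=332.06):
    """Generalized Born electrostatic solvation energy (kcal/mol).

    charges in units of e, lengths in Angstrom. The double sum runs over all
    ordered pairs, so i == j supplies the Born self term q_i^2 / B_i.
    """
    prefactor = -0.5 * coulomb_const * (1.0 / eps_in - 1.0 / eps_w)
    total = 0.0
    for i in range(len(charges)):
        for j in range(len(charges)):
            r_ij = math.dist(coords[i], coords[j])
            f_ij = gb_pair_function(r_ij, born_radii[i], born_radii[j])
            total += charges[i] * charges[j] / f_ij
    return prefactor * total

# Hypothetical ion pair 4 Angstrom apart, each with a 2 Angstrom Born radius.
energy = gb_energy([1.0, -1.0], [2.0, 2.0], [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)])
print(f"GB energy of the toy ion pair: {energy:.2f} kcal/mol")
```

The two limits mentioned in the text are visible directly in `gb_pair_function`: at zero separation it returns the geometric mean of the two Born radii, and at large separation it approaches the interatomic distance.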

1.2.3 Replica Exchange Molecular Dynamics

The MD replica exchange canonical sampling method (REMD) has been implemented in the molecular simulation package IMPACT following the approach proposed by Sugita and Okamoto. [91] In this method, a series of structures (the replicas) are simulated in parallel using MD at different temperatures. The temperatures, $T_m$ and $T_n$, of two replicas, $i$ and $j$, respectively, are exchanged with the following Metropolis transition probability: [91]

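$$W(\{T_m, T_n\} \rightarrow \{T_n, T_m\}) = \begin{cases} 1 & \text{for } \Delta \le 0 \\ \exp(-\Delta) & \text{for } \Delta > 0 \end{cases} \tag{1.11}$$

where

$$\Delta = (\beta_n - \beta_m)(E_i - E_j), \tag{1.12}$$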
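The exchange rule can be sketched in a few lines of Python. The sketch assumes the standard temperature replica exchange form in which replica $i$ at $T_m$ has potential energy $E_i$, replica $j$ at $T_n$ has potential energy $E_j$, $\beta = 1/(k_B T)$, and $\Delta = (\beta_n - \beta_m)(E_i - E_j)$. The function names and the choice of kcal/mol units are illustrative assumptions, not the IMPACT interface.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def exchange_delta(temp_m, temp_n, energy_i, energy_j):
    """Delta for swapping replica i (at temp_m) with replica j (at temp_n)."""
    beta_m = 1.0 / (K_B * temp_m)
    beta_n = 1.0 / (K_B * temp_n)
    return (beta_n - beta_m) * (energy_i - energy_j)

def acceptance_probability(delta):
    """Metropolis probability: 1 for delta <= 0, exp(-delta) otherwise."""
    return 1.0 if delta <= 0.0 else math.exp(-delta)

def attempt_exchange(temp_m, temp_n, energy_i, energy_j, rng=random):
    """Return True if the temperature swap is accepted."""
    delta = exchange_delta(temp_m, temp_n, energy_i, energy_j)
    return rng.random() < acceptance_probability(delta)

# Hypothetical energies: the colder replica (300 K) sits 10 kcal/mol below the hotter one.
p_swap = acceptance_probability(exchange_delta(300.0, 330.0, -100.0, -90.0))
print(f"swap acceptance probability: {p_swap:.3f}")
```

When the hotter replica happens to have the lower energy, $\Delta \le 0$ and the swap is always accepted; otherwise it is accepted with probability $\exp(-\Delta)$, which is what preserves a canonical distribution at each temperature.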


of being in states extracted from different replica exchange temperatures is peaked near a "reference" or "simulation" temperature that is a parameter of the kinetic model and is derived based on the distribution of instantaneous temperatures in a classical system. This model allows a given path to sample states having instantaneous temperatures above or below the reference temperature $T_0$ in a physically realistic manner. For the simulations described here, all of the intertemperature microscopic rate constants corresponding to a decrease in temperature were set equal to 1, whereas the rate constants for the reverse transition were determined by $k_{ij} = P(T_j)/P(T_i)$. The replica exchange temperature corresponding to a given state will be referred to as its instantaneous temperature in the results below. [121]
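The rate assignment described above satisfies detailed balance with respect to $P(T)$, since $P(T_i)\,k_{ij} = P(T_j) \cdot 1$. A minimal Python sketch makes this explicit. The functional form of $P(T)$ is not given in this passage, so the weights below are arbitrary placeholders, and restricting transitions to adjacent temperatures is an assumption made only to keep the example small.

```python
def intertemperature_rates(weights):
    """Rate constants between adjacent temperature levels.

    weights[k] is the relative probability P(T_k) of temperature level k,
    with levels sorted in increasing temperature. Returns a dict mapping
    (from_level, to_level) to a rate constant.
    """
    rates = {}
    for low in range(len(weights) - 1):
        high = low + 1
        rates[(high, low)] = 1.0                         # decrease in temperature
        rates[(low, high)] = weights[high] / weights[low]  # increase in temperature
    return rates

# Placeholder weights peaked at the second level (the "reference" temperature).
weights = [0.20, 0.45, 0.25, 0.10]
rates = intertemperature_rates(weights)
for (src, dst), k in sorted(rates.items()):
    print(f"k[{src} -> {dst}] = {k:.3f}")
```

With these rates the stationary distribution over temperature levels is proportional to the supplied weights, so a trajectory spends most of its time near the reference temperature while still making occasional excursions to higher temperatures.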

1.2.5 Loop Prediction with Torsion Angle Sampling and REMD

The loop prediction algorithm implemented in the Protein Local Optimization Program (PLOP) is described in detail in Ref. 144. During loop build-up, a series of filters of increasing complexity is applied to eliminate unreasonable conformations as early as possible. Some of these filters detect clashes between backbone atoms and the atoms of the rest of the protein (referred to as the "frame"), and check that enough space is available to place the sidechain of each residue. On the order of hundreds to thousands of loop conformations are generated in the loop build-up stage. To reduce the number of conformations passed to the next stages, loop conformations are clustered based on backbone RMSD using the K-means algorithm. [145] The basic loop prediction algorithm described above is often insufficient for loops with nine or more residues. For these longer loops we have adopted prediction schemes based on multiple executions of PLOP with different parameters. [144, 146] These schemes are based on focusing conformational sampling in promising and progressively smaller regions of conformational space. The initial predictions with the most favorable energy scores are subjected to a series of constrained refinement calculations with PLOP in which selected loop backbone atoms are not allowed to move or move only within a given range. [144] Further enhancements, such as allowing for more atomic overlaps and increasing the number of clusters in the K-means algorithm, have been incorporated into the loop sampling algorithms. [122]

We also investigated whether a technique based on replica exchange molecular dynamics importance sampling could predict loop conformations. We selected 9-residue loops which were not successfully predicted by the standard sampling algorithm built around PLOP to see if importance sampling would succeed. This subset of the 9-residue loops (Table 1.3) was investigated with the temperature replica exchange sampling method (T-REMD) [91, 97, 128] as implemented in the IMPACT software package. [128] The lowest energy loop configuration obtained in the third stage of PLOP optimization was chosen as a starting point for the corresponding T-REMD run. Each loop was minimized in the field of the surrounding immobilized protein frame.


formation of the β-hairpin.) However, significantly more of the structures generated with the full AGBNP nonpolar function have a collapsed hydrophobic core as compared to those generated with the surface area nonpolar model. The full nonpolar model of the OPLS-AA/AGBNP potential favors the formation of the collapsed hydrophobic core of the peptide even in the presence of the destructive salt bridge between residues K50 and E56.

While previous replica exchange simulations of the C-terminal polypeptide from the B1 domain of protein G in explicit and implicit solvent have been carried out using the capped peptide [19, 20, 47, 48], the experiments have been performed on the zwitterion form of the peptide (i.e. the uncapped peptide). [36-39] A salt bridge between the N- and C-termini can form in the uncapped polypeptide but cannot form in the capped peptide. The β-hairpin population of the uncapped peptide (26%) is significantly larger than the β-hairpin population of the capped peptide (10%) with the same solvation model (AGBNP). [97] This is due to the stabilizing effects of the salt bridge between the N- and C-termini. The terminal salt bridge compensates for the disruptive interaction between the charged residues K50 and E56, which form a salt bridge preventing the formation of the hairpin. The salt bridge between K50 and E56 is found in many of the structures of the zwitterion generated without additional dielectric screening (AGBNP). The population of this disruptive salt bridge is reduced when increased dielectric screening of the charged side chains is applied in AGBNP+, and consequently the β-hairpin population is increased from 26% to 40%. The predicted β-hairpin population of the uncapped peptide agrees well with the experimental results of Blanco et al. (42% at 283 K) [36] carried out on the same uncapped system. The degree of hydrophobic collapse (98%) agrees reasonably well with the experimental results reported by Muñoz et al., who observed around 80% hydrophobic collapse at 270 K. [38]

1.3.2 Folding of Other Small Peptides

To demonstrate the accuracy of OPLS-AA/AGBNP+, we predicted the conformations of a series of small peptides which adopt either an α-helical conformation (CheY2-mu peptide [150], C-peptide [34], and the S-peptide-analog [151]), no secondary structure (the CheY2 peptide [150]), or a mix of β and α conformation (the FSD1 mini-protein [152]). We performed REMD simulations to sample the conformational space available to these peptides. The results are summarized in Table 1.1. We achieve reasonable accuracy for these peptides. It is also apparent that there is no bias towards forming α-helical conformation with OPLS-AA/AGBNP+, as is evident from the prediction of the coil conformation for the CheY2 peptide, which is similar in sequence to the α-helical CheY2-mu. [150]


Table 1.2. Summary of the loop conformational prediction results with the combination of standard and enhanced sampling procedures. ddd refers to distance-dependent dielectric; E, S, and M are energy, sampling, and marginal errors, respectively; and ⟨RMSD⟩ is the average RMSD (in Å) of the lowest energy loops. [122]

                 9-residue                           13-residue
                 ddd      AGB-γ    AGBNP    AGBNP+   AGBNP+
E                19       6        4        2        2
S                4        4        4        5        5
M                3        1        0        1        1
E+S+M            26       11       8        8        8
⟨RMSD⟩           2.31     1.10     1.04     1.00     1.87
median RMSD      1.27     0.52     0.52     0.58     0.67

The percentage of predictions they report within 2 Å RMSD (described as good and medium predictions) is 55%. [147] Using a tighter RMSD cutoff of 1.5 Å, we obtain with PLOP and AGBNP+ an 86% success rate in our predictions for 9-residue loops. For a set of 13-residue loops, Fiser et al., using the same 2 Å RMSD cutoff, report a very low 15% success rate, [147] compared to the 77% success rate we obtained using the AGBNP+ scoring function. Xiang et al. performed a search over a discrete rotamer library with scoring based on their colony energy. For 9-residue loops, they report an average RMSD of 2.68 Å. [148] In comparison, the average RMSD we have obtained with PLOP and AGBNP+ is 1.00 Å. De Bakker et al. [154] generated loop conformations with their program RAPPER [155] and scored them with a knowledge-based potential and with a physics-based potential, AMBER/GBSA. For 9-residue loops from the Fiser set [147], the average RMSD of the lowest energy loops was over 2 Å when scored with the AMBER/GBSA potential, which produced their best results. [154]

Jacobson et al. [144] performed loop prediction calculations on a large set of 9-residue loops using the SGB/NP model [64, 156] with the crystal symmetry included and using the standard conformational sampling algorithm used here. [144] They obtained ten energy errors and eight sampling errors. [144] In comparison, we find eleven energy and seven sampling errors with SGB/NP without crystal symmetry, but we find only eight energy errors and five sampling errors with SGB/NP with crystal symmetry. [122] This might indicate that crystal symmetry is important for prediction accuracy; however, we obtained two energy errors and five sampling errors using AGBNP+ without the presence of the crystal environment. A recent study based on the comparison of X-ray and NMR structures of identical proteins suggests that in most cases the impact of the crystal environment on protein structures is relatively small and not strongly correlated with crystal packing. [157] Recently, Zhu et al. [146, 153] have reported loop prediction results for the same 35 13-residue loops investigated here using the SGB/NP potential with crys-


loops that resulted in global sampling errors with the standard loop sampling procedure. The results are shown in Table 1.3. As has been demonstrated, [144, 146] the conformational search algorithms based on PLOP perform well for predicting the conformation of protein loops of up to thirteen residues in length; however, because of the exponential explosion in the number of possible loop configurations that need to be examined, the application of this method to longer loops and situations which involve several interacting loops, as well as simultaneous refinement of the protein region surrounding the loops, is problematic. In contrast, importance sampling schemes concentrate sampling in the most thermodynamically relevant regions of the conformational space and scale linearly with the increase of the number of degrees of freedom.

Table 1.3. Summary of the loop conformational prediction results with the OPLS-AA/AGBNP+ force field and REMD conformational sampling, compared to the corresponding predictions with the PLOP-based standard sampling procedure. In parentheses is the range of residue numbers in the loop.

PDB (R_first-R_last)    PLOP RMSD (Å)    REMD RMSD (Å)
1npk (102-110)          3.60             4.30
1onc (70-78)            7.43             2.06
1fus (31-39)            6.03             1.78
1byb (246-254)          4.00             4.95
1noa (99-107)           5.67             3.94
1wer (942-950)          4.29             1.34

Prior to applying the REMD procedure to the group of protein loops classified as sampling errors by the standard loop prediction routine, we tested the REMD protocol on a less challenging set of five 9-residue loops for which the PLOP scheme was successful. The REMD approach produced matching results within reasonable simulation times, indicating that the REMD protocol can also easily provide good predictions in these cases. However, as the results summarized in Table 1.3 show, the more challenging cases of conformational sampling, although improved over the PLOP predictions, remain problematic. Within the allocated simulation time, the REMD scheme was able to substantially improve half of the PLOP sampling errors, resulting in much higher quality structures. The RMSDs of the predictions for 1onc, 1fus, and 1wer improved from the range of 4 Å to 7.5 Å to ~2 Å or less. Only one case (1wer), however, resulted in a correct prediction based on the 1.5 Å RMSD threshold.

The REMD trajectory for the 1fus (31-39) loop is illustrated in Fig. 1.1, where the energies of conformations sampled in the last 5 ns of simulation at various temperatures are plotted. The patchy pattern of the lowest tempera-


Previous experimental work in the Eaton laboratory [38] has shown that the time dependence of loss of hairpin structure in the G-peptide after a small temperature-jump perturbation is well fit by a single exponential. To confirm that our kinetic model is consistent with this previous experimental kinetic work, we performed a series of simulations modeling this temperature-jump experiment. We began each simulation by constructing an ensemble of starting points distributed according to an equilibrium distribution with $T_0$ ranging from 300 to 615 K. We then performed a Markov process simulation for 2,000-5,000 time units beginning from each starting point by using a reference temperature 60° higher than the temperature used to construct the initial starting point ensemble. For each temperature, the number of trajectories residing in a β-hairpin state was monitored as a function of time. In all cases, the loss of hairpin structure is fit well by single exponential decay with the exception of a small initial "burst phase." [121] Our results are qualitatively consistent with experimental observations [38].

1.4.2 The G-Peptide Has an α-Helical Intermediate During Folding from Coil Conformations.

Protein folding is a process by which conformations without identifiable secondary structure adopt a native conformation. To study this process in the G-peptide with our kinetic network model, we performed a temperature quench experiment similar to the temperature-jump experiment described above but for which the starting ensemble was chosen from the equilibrium distribution at $T_0$ = 700 K, and the simulation was run at a reference temperature of 300 K. The fraction of α-helix and β-hairpin states as a function of time displays a rapid rise in the amount of α-helix initially, which reaches a maximum and then decreases. Simultaneously, the amount of β-hairpin rises initially at a rapid rate, then continues to rise with a slower rate similar to the rate of decrease in the fraction of α-helix. This finding is suggestive of a mechanism in which there are a small number of fast direct paths from unfolded coil states to the β-hairpin, but that the majority quickly fold to α-helical states, which then convert into β-hairpins on a longer time scale. A similar phenomenon is not observed for the unfolding process: temperature-jump simulations from 300 to 700 K do not show appreciable α-helix formation. That the folding and unfolding kinetic paths are different reflects the quite different nonequilibrium cooling and heating conditions that are being simulated. [121]

We can assign approximate absolute time scales to the processes observed here by equating the "two-state" equilibration rate observed after a small temperature perturbation with that experimentally observed (6 μs) [38]. Based on this finding, the appearance of β-hairpin has a time constant of ~2,500 time units, which would correspond in physical units to ~50 μs, whereas the rapid initial formation of α-helix occurs with a time constant of 9 time units or ~180 ns. [121] These rates are in qualitative agreement with experimental observations [38].
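The unit conversion implied by these numbers can be checked with a few lines of arithmetic. The sketch below uses only the values quoted in the text; the figure of about 300 time units for the two-state relaxation is inferred from them here and is not stated in the source. Variable names are illustrative.

```python
# Values quoted in the text.
HAIRPIN_TIME_UNITS = 2500.0   # model time constant for beta-hairpin appearance
HAIRPIN_TIME_US = 50.0        # same time constant in microseconds
HELIX_TIME_UNITS = 9.0        # model time constant for alpha-helix formation
EXPERIMENTAL_RELAXATION_US = 6.0

# One model time unit expressed in nanoseconds.
ns_per_time_unit = HAIRPIN_TIME_US * 1000.0 / HAIRPIN_TIME_UNITS

# Helix formation time in nanoseconds, for comparison with the quoted ~180 ns.
helix_time_ns = HELIX_TIME_UNITS * ns_per_time_unit

# Inferred: model time units corresponding to the 6 microsecond relaxation.
relaxation_time_units = EXPERIMENTAL_RELAXATION_US * 1000.0 / ns_per_time_unit

print(f"1 time unit ~ {ns_per_time_unit:.0f} ns")
print(f"helix formation ~ {helix_time_ns:.0f} ns")
print(f"two-state relaxation ~ {relaxation_time_units:.0f} time units")
```

Both quoted conversions are consistent with a single scale factor of roughly 20 ns per model time unit.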


find that there are two major clusters. These account for 95% of the states and consist mainly of hairpin and helical states. Because none of the clusters observed for a 400 K cutoff temperature contain both a hairpin and a helical state, it follows that it is not possible to find a path connecting these two macrostates without accessing a state with an instantaneous temperature >400 K. In fact, the lowest temperature at which the hairpin and helical states coexist in the same connected cluster is 488 K, that is, slightly above $T_m$ for the OPLS-AA/AGBNP effective potential model. Therefore, if the system is in the metastable helical macrostate at room temperature, it must experience a thermal fluctuation sufficiently large to bring its instantaneous temperature above $T_m$ to convert into the hairpin macrostate. [121]
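The cutoff analysis described here amounts to asking for the lowest temperature cutoff at which two states fall in the same connected component of the network. One way to sketch this, assuming the network is available as a list of undirected edges between states with known instantaneous temperatures, is to add states in order of increasing temperature and merge components with a union-find structure. The toy graph and all names below are hypothetical and are not taken from the G-peptide network.

```python
def lowest_connecting_temperature(temps, edges, source, target):
    """Lowest cutoff T such that source and target are connected using only
    states with instantaneous temperature <= T. Returns None if never connected."""
    parent = list(range(len(temps)))

    def find(x):
        # Find the component root with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    neighbors = {k: [] for k in range(len(temps))}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    active = set()
    # Activate states from coldest to hottest, merging with active neighbors.
    for node in sorted(range(len(temps)), key=lambda k: temps[k]):
        active.add(node)
        for other in neighbors[node]:
            if other in active:
                parent[find(node)] = find(other)
        if source in active and target in active and find(source) == find(target):
            return temps[node]
    return None

# Toy network: state 0 plays the role of a helical state, state 4 a hairpin state.
temps = [300.0, 320.0, 450.0, 350.0, 310.0, 600.0]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 3)]
cutoff = lowest_connecting_temperature(temps, edges, source=0, target=4)
print(f"lowest connecting temperature: {cutoff} K")
```

In the toy network the two end states can be joined through state 2 (450 K) or through state 5 (600 K), so the function reports 450 K, the smaller of the two bottleneck temperatures.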

1.4.3 A Molecular View of Kinetic Pathways

One of the advantages of the kinetic network model proposed here is that we are able to explore a large number of potential pathways that join two macrostates. The number of such paths will typically be extremely large. Furthermore, each state along the path has associated with it all of the atomic coordinates from the REMD simulation. Therefore, the molecular aspects of the paths can be analyzed in detail. This ability allows us to explore the multitude of folding pathways that the system can potentially have at its disposal. One way in which this model can be used is to generate many paths by using Markovian kinetic Monte Carlo simulations. Such an approach with all-atom models has been useful for enumerating and quantifying the relative flux through parallel kinetic pathways in small systems [140, 141]. Alternatively, it is possible to investigate thermodynamically favorable pathways by a detailed analysis of the structure of the kinetic network, for example, by searching for a small number of short paths connecting the two macrostates under the constraint that the instantaneous temperature remain below a predetermined maximum value. We use this approach to analyze pathways connecting the α-helix and β-hairpin macrostates in the G-peptide. [121]

Two short pathways that link the α-helical and β-hairpin macrostates without making use of microstates with an instantaneous temperature above 488 K are shown in Fig. 1.2. The path shown in Fig. 1.2 Upper involves the unwinding of both ends of the helix, leaving approximately one turn of helix in the middle of the molecule. This turn then serves as a nucleation point for the formation of the β-turn, which is stabilized by hydrophobic interactions between the side chains of Y45 and F52. The native hydrogen bonds nearest the turn then form, after which the remainder of the native hairpin structure forms. This pathway is similar to previously proposed mechanisms for the folding of the G-peptide β-hairpin from a coil state, which emphasize the formation of hydrophobic contacts before hydrogen bond formation [17, 18, 44, 46, 158, 159] and the persistence of the β-turn even in the unfolded state [159]. The novel aspect of the path shown in Fig. 1.2 Upper is the preformation of the β-turn from a residual turn in an otherwise unfolded α-helix.


the structure of one of these complexes (P450 BM-3 bound to NPG [163]) depends on temperature and that at biologically relevant temperatures the ligand moves from a position distant from the heme iron, as seen in the low temperature X-ray crystal structure, into a position proximal to the iron, leading to the displacement of the iron coordinated water molecule and the initiation of the oxidation mechanism.

In this study we use replica exchange molecular dynamics (REMD) [91, 97] with the OPLS-AA force field [126, 127] and AGBNP+ [27] to study the thermodynamic equilibrium between the conformations of the P450 BM-3/NPG complex in which the terminal carbon atoms of NPG are distant from the heme iron as in the low temperature X-ray crystal structure [163] (henceforth addressed as distal state, see Fig. 1.3(A)) and conformations with the terminal carbon atoms of NPG proximal to the heme iron as in the conformation proposed by Jovanovic et al. [167] (henceforth addressed as proximal state, see Fig. 1.3(B)). REMD is ideally suited for this problem not only because it improves conformational sampling but also because it yields the populations of conformational states over a range of temperatures. In this study we compute the relative populations of the distal and proximal states as a function of temperature and we formulate a model for the origin of the entropic stabilization of the proximal state over the distal state. To obtain information about relevant high energy transition state regions which are scarcely resolved at room temperature we use the temperature weighted histogram (T-WHAM) [160] method, which combines data from simulations at several temperatures. Using this tool we postulate a mechanism for the conformational interconversion between the distal and proximal states. [102]

Fig. 1.3. Active site of the P450 BM-3/NPG complex in (a) the low temperature X-ray conformation (PDB 1jpz) representative of the distal state, where the NPG (shown in green) is distant from the heme iron, with Phe87 (shown in magenta) interposed between NPG and heme iron (shown in blue), and (b) the alternative active site of the conformation predicted by Jovanovic et al. representative of the proximal state, where Phe87 has changed its rotameric state to allow NPG to approach the heme iron.


thermal activation mechanism proposed by Jovanovic et al. [167]. The predicted midpoint of the transition from the distal to the proximal state is 268 K (see Figure 1.4), ~20 degrees higher than the observed transition temperature. [167] The increase in population of the proximal state with increasing temperature indicates that the proximal state is stabilized by conformational entropy. [102]

[Figure 1.4 plot: proximal state population (vertical axis, 0.4 to 0.9) versus Temperature (K) (horizontal axis, 260 to 350).]

Fig. 1.4. Population as a function of temperature, p(T), corresponding to theconformations in which ligand is proximal to the heme iron. The proximal statepopulation increases monotonically with temperature indicating that the proximalstate is stabilized by conformational entropy at temperatures greater than at least268 K. This is borne out by the expression for the conformational entropy differencebetween the proximal and the distal states: S = k ln[p/(1-p)] +kT/[p(1 - p)]Op/OT,where the second term is positive and the first term is positive for T > 268 K(p(T) > 1/2).
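The entropy expression in the caption follows from $\Delta G = -kT \ln[p/(1-p)]$ and $S = -\partial \Delta G / \partial T$. A short Python sketch can be used to check it numerically. The population curve below is a hypothetical two-state model with constant enthalpy and entropy differences, chosen so that its midpoint falls at 268 K; it is not the simulated $p(T)$ of Fig. 1.4, and the parameter values and function names are assumptions made for the example.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def entropy_difference(p, dp_dT, temperature):
    """S = k ln[p/(1-p)] + kT/[p(1-p)] dp/dT, in kcal/(mol K)."""
    first = K_B * math.log(p / (1.0 - p))
    second = K_B * temperature / (p * (1.0 - p)) * dp_dT
    return first + second

# Hypothetical two-state model: Delta H and Delta S are placeholders.
DELTA_H = 2.68   # kcal/mol
DELTA_S = 0.01   # kcal/(mol K); midpoint at DELTA_H / DELTA_S = 268 K

def population(temperature):
    """Proximal-state population for the hypothetical two-state model."""
    delta_g = DELTA_H - temperature * DELTA_S
    return 1.0 / (1.0 + math.exp(delta_g / (K_B * temperature)))

def numerical_slope(temperature, step=1e-3):
    """Central finite-difference estimate of dp/dT."""
    return (population(temperature + step) - population(temperature - step)) / (2.0 * step)

T = 300.0
s_est = entropy_difference(population(T), numerical_slope(T), T)
print(f"p({T:.0f} K) = {population(T):.3f}, recovered S = {s_est:.5f} kcal/(mol K)")
```

For this model the formula recovers the input entropy difference, and both terms are positive above the midpoint temperature, in line with the argument given in the caption.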

Structural analysis of the ensemble of conformations at ambient temperature

The conformational change in the active site of the P450 BM-3/NPG complex involves primarily NPG and the active site residue Phe87. Analysis of the simulation trajectories identified three distinct states at room temperature. The first state (20% population) corresponds to the distal state (Fig. 1.3A); the second state (4% population) contributes to the proximal state and includes the previously proposed [167] proximal conformation (Fig. 1.3B). Besides these two states corresponding to previously identified conformations, we observed


analysis the locked ligand proximal state appears to be an intermediate in the conversion from the distal to free ligand proximal state. [102]

1.6 Simple Continuous and Discrete Models for Simulating Replica Exchange

One cannot systematically explore the convergence properties of RE as a function of the simulation parameters and/or the underlying kinetics of the molecular system by brute force molecular simulations, since RE simulations of protein folding are very difficult to converge. As an alternative, it is useful to study simplified low-dimensionality systems. While these models do not capture all of the complexities of the "real" molecular simulation, they do capture some of the essential features of RE and allow us to study these fundamental aspects of the algorithm at relatively low computational cost and in a controlled setting. We discuss here two simplified models of RE. The first is a discrete two-state network model, containing two conformational states (Folded and Unfolded) at each of several temperatures [123]. This model reduces the atomic complexity of the system to discrete conformational states which evolve in continuous time according to Markovian kinetics for both conformational transitions and exchange between replicas. The second makes use of a continuous two-dimensional potential which is sufficiently simple to be amenable to accurate analytical and numerical solution, while including some characteristics of molecular systems that were absent from the discrete network model. In both cases, the efficiency of RE conformational sampling will be monitored by measuring N_TE, the number of round-trip transitions in the conformational state of a replica, conditional on the low temperature of interest T_0, that occur in a given observation time. A transition event is a transit of a given replica from one conformation at T_0 to the other conformation at T_0 and back again, regardless of route. Conceptually, this measure reflects the potential of RE to achieve rapid equilibration at the temperature of interest by means of conformational transitions at temperatures other than the temperature of interest.
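The round-trip count (N_TE) described above can be sketched for a single replica as follows. The trajectory format, a sequence of `(conformation, temperature_index)` pairs with index 0 standing for the temperature of interest, and the function name `count_transition_events` are assumptions made for illustration; they are not taken from the cited work.

```python
def count_transition_events(trajectory, target_temp=0):
    """Count round trips in conformation as observed at the target temperature.

    trajectory: iterable of (conformation, temperature_index) pairs for ONE
    replica, in time order. Frames at other temperatures are skipped, so the
    route taken between two visits to the target temperature does not matter.
    """
    last_conf = None    # conformation most recently seen at the target temperature
    half_transits = 0   # one-way changes F -> U or U -> F seen at the target temperature
    for conf, temp in trajectory:
        if temp != target_temp:
            continue
        if last_conf is not None and conf != last_conf:
            half_transits += 1
        last_conf = conf
    # A transition event is a there-and-back trip, i.e. two one-way changes.
    return half_transits // 2

# Hypothetical replica history: folded at T_0, heated, unfolds at the higher
# temperature, cools back to T_0 unfolded, then repeats the cycle to refold.
history = [("F", 0), ("F", 1), ("U", 1), ("U", 0), ("U", 1), ("F", 1), ("F", 0)]
print(count_transition_events(history))
```

In this toy history every conformational change happens at the higher temperature, yet one full transition event is registered at the temperature of interest, which is the situation the measure is designed to capture.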

1.6.1 Discrete Network Replica Exchange (NRE)

In the NRE model, the protein is assumed to exist in one of the two macrostates F and U (for "folded" and "unfolded"), which do not possess any internal structure. Instead, it is assumed that the system evolves in time as a Poisson process, in which instantaneous transitions between F and U occur after waiting periods given by exponentially distributed random variables with means equal to the reciprocals of the folding or unfolding rates. If the transition events are Markovian, then the simultaneous behavior of two uncoupled non-interacting replicas can be represented by the four composite states

Fig. 1.5. The kinetic network model for the discrete NRE model used by Zheng et al. [123] The state labels represent the conformation (letter) and temperature (subscript) for each replica. For example, F2U1 represents the state in which replica 1 is folded and at temperature T2, while replica 2 is unfolded at temperature T1. Gray and black arrows correspond to folding and unfolding transitions, respectively, while the temperature at which the transition occurs is indicated by the solid and dashed lines (for T2 and T1, respectively). The bold arrows correspond to temperature exchange transitions, with the solid and dashed lines denoting transitions with rate parameters α and wα, respectively.
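The waiting-time picture behind the network model can be illustrated for a single replica held at one temperature. This is a minimal sketch, not the simulation protocol of Zheng et al.: the function `simulate_two_state`, the rate values, and the time units are all assumptions chosen for illustration, and temperature exchange is left out entirely.

```python
import random

def simulate_two_state(k_fold, k_unfold, total_time, seed=0):
    """Simulate F <-> U jumps with exponentially distributed waiting times.

    Returns (fraction of time spent folded, number of jumps).
    """
    rng = random.Random(seed)
    state, t = "U", 0.0
    time_folded, n_jumps = 0.0, 0
    while True:
        # Rate of leaving the current state: folding from U, unfolding from F.
        rate = k_fold if state == "U" else k_unfold
        wait = rng.expovariate(rate)  # exponential waiting time with mean 1/rate
        dwell = min(wait, total_time - t)  # truncate the last dwell at total_time
        if state == "F":
            time_folded += dwell
        if t + wait >= total_time:
            break
        t += wait
        state = "F" if state == "U" else "U"
        n_jumps += 1
    return time_folded / total_time, n_jumps

# Illustrative values in arbitrary units (assumed, not from the source).
K_FOLD, K_UNFOLD, TOTAL_TIME = 2.0, 1.0, 20000.0
frac_folded, n_jumps = simulate_two_state(K_FOLD, K_UNFOLD, TOTAL_TIME)
expected_frac = K_FOLD / (K_FOLD + K_UNFOLD)
expected_jump_rate = 2.0 * K_FOLD * K_UNFOLD / (K_FOLD + K_UNFOLD)
print("folded fraction:", frac_folded, "expected:", expected_frac)
print("jumps per unit time:", n_jumps / TOTAL_TIME, "expected:", expected_jump_rate)
```

For such a two-state process the long-run folded fraction is k_f/(k_f + k_u), and the long-run number of jumps per unit time equals the harmonic mean of the two rates, 2 k_f k_u/(k_f + k_u), because each F-to-U-to-F cycle takes 1/k_f + 1/k_u on average.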

in agreement with the activation free energies obtained from the free energy profile along the folding coordinate. Replica exchange simulations were performed with a kinetic MC propagator, and exchanges of configurations were attempted every Nx MC steps.

Behavior similar to that seen for the NRE model is also observed for the continuous potential: the efficiency is non-monotonic and exhibits a maximum at an optimal high temperature given by the maximal harmonic mean of the folding and unfolding rates. However, the number of transitions is significantly lower than that predicted from the average of the harmonic means of the rates as seen in the NRE model [123].
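How a maximum in the harmonic mean of the rates can arise is sketched below. The rate laws are hypothetical and chosen only for illustration: an Arrhenius unfolding rate that rises with temperature and an anti-Arrhenius folding rate that falls with temperature, with assumed activation parameters of 10 and 5 kcal/mol and both rates set equal to 1 (arbitrary units) at a reference temperature of 400 K. None of these numbers come from the models discussed in the text.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)

def unfolding_rate(temp, e_u=10.0, t_ref=400.0):
    """Hypothetical Arrhenius unfolding rate: increases with temperature."""
    return math.exp(-(e_u / K_B) * (1.0 / temp - 1.0 / t_ref))

def folding_rate(temp, e_f=5.0, t_ref=400.0):
    """Hypothetical anti-Arrhenius folding rate: decreases with temperature."""
    return math.exp((e_f / K_B) * (1.0 / temp - 1.0 / t_ref))

def harmonic_mean_rate(temp):
    """Harmonic mean of the folding and unfolding rates at one temperature."""
    k_f, k_u = folding_rate(temp), unfolding_rate(temp)
    return 2.0 * k_f * k_u / (k_f + k_u)

# Scan a grid of candidate high temperatures and pick the best one.
temperatures = list(range(300, 601, 5))
best_temp = max(temperatures, key=harmonic_mean_rate)
print("temperature maximizing the harmonic mean:", best_temp, "K")
for temp in (300, best_temp, 600):
    print(temp, "K:", round(harmonic_mean_rate(temp), 4))
```

The harmonic mean is dominated by the slower of the two rates, so it is small at low temperature (slow unfolding) and at high temperature (slow folding) and peaks in between, which is the qualitative origin of the non-monotonic efficiency under these assumptions.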

1.6.3 Non-Markovian effects revealed by comparison of continuous and discrete RE simulations

The origin of this discrepancy was clarified by a detailed comparison of simulations performed using the continuous potential with those performed with the


of differences in simulation protocols and parameters. The use of simplified model systems allows for thorough theoretical, conceptual, and computational analysis of the problem that can provide insights into the factors that limit the efficiency of replica exchange in more realistic molecular systems. By simultaneously studying a discrete network model of RE and RE on a simplified two-dimensional potential, it is possible to clarify to some degree the origins and effects of anti-Arrhenius and non-Markovian kinetics on the efficiency of RE. Furthermore, these results suggest that the use of "training" simulations to explore some aspects of the temperature dependence for folding of the atomic level models prior to performing replica exchange studies could be useful in improving the overall efficiency of the calculation.

There are still unresolved questions that could be profitably addressed by simplified models of RE. One question for which the two-dimensional system would be appropriate is the study of the relationship between conformational and thermal diffusion. Optimization of the diffusion of replicas in temperature space has been a major focus of recent theoretical and computational study of the replica exchange method [94,105-110,168]. However, the convergence of thermodynamic quantities is not limited by thermal diffusion per se, but by the exploration of the conformational space of the system. While very poor thermal diffusion obviously defeats the purpose of replica exchange by effectively reducing it to a set of parallel uncoupled simulations, it is not clear that further optimization of thermal diffusion that is already "reasonably good" will automatically improve convergence. The exact relationship between thermal and conformational diffusion remains to be fully clarified, and we look forward to studying this and other questions using simplified continuous and discrete models of replica exchange. [124]

1.7 Conclusion

We have demonstrated that OPLS-AA/AGBNP+ and REMD can capture the thermodynamics of peptide folding (for instance, the G-peptide and C-peptide [97]) and protein-ligand binding (N-palmitoylglycine complexed to cytochrome P450 BM-3 [102]). OPLS-AA/AGBNP+ is effective in discriminating the correct fold of a loop on a protein from competing misfolded conformations. [122] This is an indication that our effective potential is suitable for protein folding when considered in conjunction with our previous work on detecting native folds from misfolded decoys. [14] While thermodynamics can be calculated directly from replica exchange, kinetics cannot. We have shown, however, that network models can be constructed from the conformations generated from REMD to calculate the kinetics of the system. [121] We have also shown that a kinetic network model with a discrete model of the RE system can provide insights into the kinetics of RE. [123] We have extended our investigation into the behavior of RE with a simple continuous potential which captures some of the kinetics of protein folding. [124] These simple models


References

1. W.A. Eaton, V. Muñoz, S.J. Hagen, G.S. Jas, L.J. Lapidus, E.R. Henry, J. Hofrichter, Annu. Rev. Biophys. Biomol. Struct. 29, 327 (2000)
2. J.K. Myers, T.G. Oas, Annu. Rev. Biochem. 71, 783 (2002)
3. A.R. Dinner, A. Šali, L.J. Smith, C.M. Dobson, M. Karplus, Trends Biochem. Sci. 25, 331 (2000)
4. J. Rumbley, L. Hoang, L. Mayne, S.W. Englander, Proc. Natl. Acad. Sci. USA 98, 105 (2001)
5. A.R. Fersht, V. Daggett, Cell 108, 573 (2002)
6. M. Vendruscolo, E. Paci, Curr. Opin. Struct. Biol. 13, 82 (2003)
7. T. Lazaridis, M. Karplus, J. Mol. Biol. 288, 477 (1999)
8. D. Petrey, B. Honig, Protein Science 9, 2181 (2000)
9. T. Lazaridis, M. Karplus, Curr. Opin. Struct. Biol. 10, 139 (2000)
10. B.D. Bursulaya, C.L. Brooks III, J. Phys. Chem. B 104, 12378 (2000)
11. B.N. Dominy, C.L. Brooks III, J. Comput. Chem. 23, 147 (2002)
12. Y. Liu, D.L. Beveridge, Proteins: Struct. Funct. Genet. 46, 128 (2002)
13. M. Feig, C.L. Brooks III, Proteins: Struct. Funct. Genet. 49, 232 (2002)
14. A.K. Felts, E. Gallicchio, A. Wallqvist, R.M. Levy, Proteins: Struct. Funct. Genet. 48, 404 (2002)
15. Y.M. Rhee, V.S. Pande, Biophys. J. 84, 775 (2003)
16. R.M. Levy, E. Gallicchio, Annu. Rev. Phys. Chem. 49, 531 (1998)
17. A.R. Dinner, T. Lazaridis, M. Karplus, Proc. Natl. Acad. Sci. USA 96, 9068 (1999)
18. B. Zagrovic, E.J. Sorin, V. Pande, J. Mol. Biol. 313, 151 (2001)
19. R. Zhou, B.J. Berne, Proc. Natl. Acad. Sci. USA 99, 12777 (2002)
20. R. Zhou, Proteins: Struct. Funct. Genet. 53, 148 (2003)
21. B. Roux, T. Simonson, Biophys. Chem. 78, 1 (1999)
22. D. Bashford, D.A. Case, Annu. Rev. Phys. Chem. 51, 129 (2000)
23. T. Simonson, Curr. Opin. Struct. Biol. 11, 243 (2001)
24. J. Zhu, Y. Shi, H. Liu, J. Phys. Chem. B 106, 4844 (2002)
25. M. Król, J. Comput. Chem. 24, 531 (2003)
26. A. Suenaga, J. Mol. Struct. (Theochem) 634, 235 (2003)
27. E. Gallicchio, R.M. Levy, J. Comput. Chem. 25, 479 (2004)
28. P.E. Wright, H.J. Dyson, R.A. Lerner, Biochemistry 27, 7167 (1988)
29. R.M. Epand, H.A. Scheraga, Biochemistry 7, 2864 (1968)
30. J.C. Howard, A. Ali, H.A. Scheraga, F.A. Momany, Macromol. 8, 607 (1975)
31. B.H. Zimm, J.K. Bragg, J. Chem. Phys. 31, 526 (1959)
32. S. Lifson, A. Roig, J. Chem. Phys. 34, 1963 (1961)
33. F. Blanco, M. Ramírez-Alvarado, L. Serrano, Curr. Opin. Struct. Biol. 8, 107 (1998)
34. A. Bierzynski, P.S. Kim, R.L. Baldwin, Proc. Natl. Acad. Sci. USA 79, 2470 (1982)
35. K.R. Shoemaker, P.S. Kim, D.N. Brems, S. Marqusee, E.J. York, I.M. Chaiken, J.M. Stewart, R.L. Baldwin, Proc. Natl. Acad. Sci. USA 82, 2349 (1985)
36. F.J. Blanco, G. Rivas, L. Serrano, Nature Struct. Biol. 1, 584 (1994)
37. F.J. Blanco, L. Serrano, Eur. J. Biochem. 230, 634 (1995)
38. V. Muñoz, P.A. Thompson, J. Hofrichter, W.A. Eaton, Nature 390, 196 (1997)
39. V. Muñoz, E.R. Henry, J. Hofrichter, W.A. Eaton, Proc. Natl. Acad. Sci. USA 95, 5872 (1998)


80. P.H. Hünenberger, V. Helms, N. Narayana, S.S. Taylor, J.A. McCammon, Biochemistry 38(8), 2358 (1999)
81. T. Simonson, A.T. Brünger, J. Phys. Chem. 98, 4683 (1994)
82. D. Sitkoff, K.A. Sharp, B. Honig, J. Phys. Chem. 98, 1978 (1994)
83. C.S. Rapp, R.A. Friesner, Proteins: Struct. Funct. Genet. 35, 173 (1999)
84. F. Fogolari, G. Esposito, P. Viglino, H. Molinari, J. Comp. Chem. 22, 1830 (2001)
85. E. Pellegrini, M.J. Field, J. Phys. Chem. A 106, 1316 (2002)
86. C. Curutchet, C.J. Cramer, D.C. Truhlar, M.F. Ruiz-López, D. Rinaldi, M. Orozco, F.J. Luque, J. Comp. Chem. 24, 284 (2003)
87. A. Wallqvist, D.C. Covell, J. Phys. Chem. 99, 13118 (1995)
88. E. Gallicchio, M.M. Kubo, R.M. Levy, J. Phys. Chem. B 104, 6271 (2000)
89. R.M. Levy, L.Y. Zhang, E. Gallicchio, A.K. Felts, J. Am. Chem. Soc. 125(31), 9523 (2003)
90. M. Nina, D. Beglov, B. Roux, J. Phys. Chem. B 101, 5239 (1997)
91. Y. Sugita, Y. Okamoto, Chem. Phys. Lett. 314, 141 (1999)
92. A. Mitsutake, Y. Sugita, Y. Okamoto, Biopolymers 60, 96 (2001)
93. R.H. Swendsen, J.S. Wang, Phys. Rev. Lett. 57, 2607 (1986)
94. K. Hukushima, K. Nemoto, J. Phys. Soc. Jpn. 65, 1604 (1996)
95. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller, J. Chem. Phys. 21, 1087 (1953)
96. H. Nymeyer, S. Gnanakaran, A.E. García, Meth. Enzymol. 383, 119 (2004)
97. A.K. Felts, Y. Harano, E. Gallicchio, R.M. Levy, Proteins: Struct. Funct. Bioinform. 56, 310 (2004)
98. M. Cecchini, F. Rao, M. Seeber, A. Caflisch, J. Chem. Phys. 121, 10748 (2004)
99. H.H.G. Tsai, M. Reches, C.J. Tsai, K. Gunasekaran, E. Gazit, R. Nussinov, Proc. Natl. Acad. Sci. USA 102, 8174 (2005)
100. A. Baumketner, J.E. Shea, Biophys. J. 89, 1493 (2005)
101. G.M. Verkhivker, P.A. Rejto, D. Bouzida, S. Arthurs, A.B. Colson, S.T. Freer, D.K. Gehlhaar, V. Larson, B.A. Luty, T. Marrone, P.W. Rose, Chem. Phys. Lett. 337, 181 (2001)
102. K.P. Ravindranathan, E. Gallicchio, R.A. Friesner, A.E. McDermott, R.M. Levy, J. Am. Chem. Soc. 128, 5786 (2006)
103. F. Rao, A. Caflisch, J. Chem. Phys. 119, 4035 (2003)
104. M.M. Seibert, A. Patriksson, B. Hess, D. van der Spoel, J. Mol. Biol. 354, 173 (2005)
105. D.A. Kofke, J. Chem. Phys. 117, 6911 (2002)
106. A. Kone, D.A. Kofke, J. Chem. Phys. 122, 206101 (2005)
107. C. Predescu, M. Predescu, C.V. Ciobanu, J. Chem. Phys. 120, 4119 (2004)
108. C. Predescu, M. Predescu, C.V. Ciobanu, J. Phys. Chem. B 109, 4189 (2005)
109. N. Rathore, M. Chopra, J.J. de Pablo, J. Chem. Phys. 122, 024111 (2005)
110. S. Trebst, M. Troyer, U.H.E. Hansmann, J. Chem. Phys. 124, 174903 (2006)
111. D.M. Zuckerman, E. Lyman, J. Chem. Theory Comput. 2, 1200 (2006)
112. D.M. Zuckerman, J. Chem. Theory Comput. 2, 1693 (2006)
113. D.A.C. Beck, G.W.N. White, V. Daggett, J. Struct. Biol. 157, 514 (2007)
114. S.I. Segawa, M. Sugihara, Biopolymers 23, 2473 (1984)
115. M. Oliveberg, Y.J. Tan, A.R. Fersht, Proc. Natl. Acad. Sci. USA 92, 8926 (1995)
116. M. Karplus, J. Phys. Chem. B 104, 11 (2000)


150. V. Muñoz, L. Serrano, J. Mol. Biol. 245, 275 (1995)
151. C. Mitchinson, R.L. Baldwin, Proteins: Struct. Funct. Genet. 1, 23 (1986)
152. B.I. Dahiyat, S.L. Mayo, Science 278, 82 (1997)
153. K. Zhu, M.R. Shirts, R.A. Friesner, J. Chem. Theory Comput. 3, 2108 (2007)
154. P.I.W. de Bakker, M.A. DePristo, D.F. Burke, T.L. Blundell, Proteins: Struct. Funct. Bioinform. 51, 21 (2003)
155. M.A. DePristo, P.I.W. de Bakker, S.C. Lovell, T.L. Blundell, Proteins: Struct. Funct. Bioinform. 51, 44 (2003)
156. A. Ghosh, C.S. Rapp, R.A. Friesner, J. Phys. Chem. B 102, 10983 (1998)
157. M. Andrec, D.A. Snyder, Z. Zhou, J. Young, M.G.T., R.M. Levy, Proteins: Struct. Funct. Bioinform. 69, 449 (2007)
158. D.K. Klimov, D. Thirumalai, Proc. Natl. Acad. Sci. USA 97, 2544 (2000)
159. P.G. Bolhuis, Proc. Natl. Acad. Sci. USA 100, 12129 (2003)
160. E. Gallicchio, M. Andrec, A.K. Felts, R.M. Levy, J. Phys. Chem. B 109, 6722 (2005)
161. P.R.O. Montellano, Cytochrome P450: Structure, Mechanism and Biochemistry, 2nd edn. (Plenum Press, New York, 1995)
162. V. Guallar, R.A. Friesner, J. Am. Chem. Soc. 126, 8501 (2004)
163. D.C. Haines, D.R. Tomchick, M. Machius, J.A. Peterson, Biochemistry 40, 13456 (2001)
164. P.A. Williams, J. Cosme, A. Ward, H.C. Angove, D.M. Vinkovic, H. Jhoti, Nature 424, 464 (2003)
165. P.A. Williams, J. Cosme, D.M. Vinkovic, A. Ward, H.C. Angove, P.J. Day, C. Vonrhein, I.J. Tickle, H. Jhoti, Science 305, 683 (2004)
166. G.A. Schoch, J.K. Yano, M.R. Wester, K.J. Griffin, C.D. Stout, E.F. Johnson, J. Biol. Chem. 279, 9497 (2004)
167. T. Jovanovic, R. Farid, R.A. Friesner, A.E. McDermott, J. Am. Chem. Soc. 127, 13548 (2005)
168. W. Nadler, U.H.E. Hansmann, arXiv:0709.3289v1 (2007)
