Sequence, structure, and cooperativity in folding of elementary ...



Correction

BIOPHYSICS AND COMPUTATIONAL BIOLOGY, CHEMISTRY

Correction for “Sequence, structure, and cooperativity in folding of elementary protein structural motifs,” by Jason K. Lai, Ginka S. Kubelka, and Jan Kubelka, which appeared in issue 32, August 11, 2015, of Proc Natl Acad Sci USA (112:9890–9895; first published July 27, 2015; 10.1073/pnas.1506309112).

The authors note that Fig. 5 appeared incorrectly. The corrected figure and its legend appear below.

www.pnas.org/cgi/doi/10.1073/pnas.1601618113

Fig. 5. Free energy profiles for experimental reaction coordinates. (A) Free energy profiles as a function of the total number of helical residues at approximately every 14 K from 274 K (blue) to 344 K (yellow) for the P22 subdomain (Left) and αtα (Right). (B) The free energy as a function of the folded probability for each individual 13C-labeled stretch at two temperatures. The colors correspond to the label color scheme in Fig. 2. The apparent noise in some of the plots is due to the limited number of configurations for certain values of P, as stretches as short as two peptide bonds are considered.

E1126 | PNAS | February 23, 2016 | vol. 113 | no. 8 www.pnas.org


Sequence, structure, and cooperativity in folding of elementary protein structural motifs

Jason K. Lai^a, Ginka S. Kubelka^b, and Jan Kubelka^b,1

^aDepartment of Molecular Biology, University of Wyoming, Laramie, WY 82071; and ^bDepartment of Chemistry, University of Wyoming, Laramie, WY 82071

Edited by William A. Eaton, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, and approved July 7, 2015 (received for review March 31, 2015)

Residue-level unfolding of two helix-turn-helix proteins, one naturally occurring and one de novo designed, is reconstructed from multiple sets of site-specific 13C isotopically edited infrared (IR) and circular dichroism (CD) data using Ising-like statistical-mechanical models. Several model variants are parameterized to test the importance of sequence-specific interactions (approximated by Miyazawa–Jernigan statistical potentials), local structural flexibility (derived from the ensemble of NMR structures), interhelical hydrogen bonds, and native contacts separated by intervening disordered regions (through the Wako–Saitô–Muñoz–Eaton scheme, which disallows such configurations). The models are optimized by directly simulating experimental observables: CD ellipticity at 222 nm for model proteins and their fragments and 13C-amide I′ bands for multiple isotopologues of each protein. We find that data can be quantitatively reproduced by the model that allows two interacting segments flanking a disordered loop (double sequence approximation) and incorporates flexibility in the native contact maps, but neither sequence-specific interactions nor hydrogen bonds are required. The near-identical free energy profiles as a function of the global order parameter are consistent with expected similar folding kinetics for nearly identical structures. However, the predicted folding mechanism for the two motifs is different, reflecting the order of local stability. We introduce free energy profiles for “experimental” reaction coordinates, namely, the degree of local folding as sensed by site-specific 13C-edited IR, which highlight folding heterogeneity and contrast its overall, average description with the detailed, local picture.

protein thermodynamics | site-specific folding | Ising-like models | statistical mechanics

The original protein-folding problem of structure prediction from amino acid sequences is for many small proteins an accomplished goal (1–3). By contrast, the seemingly much simpler problem of the folding mechanism of a known protein structure remains unsolved, even for the smallest proteins (4). The main difficulty stems from limited experimental information about partially folded and intermediate states of proteins along their folding pathways. Experiments that use multiple spectroscopic probes with complementary or site-specific structure sensitivities can, under favorable circumstances, overcome this obstacle. Noncoincident equilibrium unfolding curves from different probes are a clear sign of noncooperative transitions (5), which, in principle, allow intermediate states to be detected and characterized (6). Muñoz and coworkers pioneered the multiprobe equilibrium approach for studies of “downhill” folding (5, 7, 8), where cooperativity is minimal, but subsequently demonstrated its applicability to other fast-folding proteins (9–13) and extended the analysis to kinetics (14). Following their work, experiments from other laboratories have reported probe-dependent folding equilibria and kinetics in a number of small proteins (15–26). However, because the spectroscopic signals do not directly report on structure and are often subject to interferences from nonstructural effects that lead to ambiguities in their interpretations, the challenge lies in relating the observed experimental data to the underlying structural and energetic states of the protein.

In our laboratory, the telltale nonoverlapping unfolding transitions were observed by site-specific 13C isotopically edited IR spectroscopy (27) in two small helix–turn–helix (hth) proteins (19, 25) (Fig. 1). hth is an important structural motif, often found as an autonomously stable unit in larger helical domains. Such autonomous motifs, of which the P22 subdomain (28) is an example (Fig. 1A), are believed to be important as potential folding nuclei, or “foldons” (16, 29). hth motifs are also excellent models for studying folding of secondary and tertiary structure; the αtα (Fig. 1B) was de novo designed (30) specifically for that purpose. Both motifs have in common that tertiary, interhelical interactions are critical for their folding, as evident from the lack of any residual structure in peptide fragments corresponding to the individual helices (19, 25). Conversely, site-specific unfolding data reveal quite distinct patterns of local thermal stabilities: Whereas the P22 subdomain unfolds from its N terminus toward the turn (19), αtα appears to unfold from the turn toward the chain ends (25). Here, to explain these data within the unifying framework of a physical model, we use one of the simplest, but powerful, descriptions of protein folding: the Ising-like statistical–mechanical model.

Ising-like models have an impressive track record in reproducing folding experimental data (7, 31–34) and even yield folding mechanisms consistent with all-atom molecular dynamics (MD) simulations (35). However, replicating unfolding data from multiple local sites represents a new test for these simple models. In addition, because the Ising-like models are derived solely from the protein native structure, the even greater challenge is to explain the differences in the local unfolding, as indicated by 13C experimental data (19, 25), of two proteins with nearly identical structures. Thirdly, because one of the model proteins (the P22 subdomain) is naturally occurring, whereas the other (αtα) is de novo designed, the native-centric assumption, which is usually justified by natural evolution, may not be strictly valid (36). The significance of the modeling is therefore not just in interpreting experimental data, but also in validating the underlying assumptions of the Ising-like models, which have important implications for the general understanding of protein folding. Furthermore, evaluating the importance of additional approximations and specific model parameters for the correct description of the experimental data can highlight their respective roles in determining the observed folding behavior. We evaluate the effects of sequence-specific interactions, which we approximate by Miyazawa–Jernigan (MJ) statistical contact potentials (37), backbone hydrogen bonds (38), and formation of nonlocal contacts between separate native segments. For the latter, we compare two widely used variants of the Ising-like model: the double-sequence approximation with loops (DSA/L) (32–35), which allows such contacts, and Wako–Saitô–Muñoz–Eaton (WSME) (39, 40), where the contacts only form within a contiguous native stretch. Because the hth motifs are among the simplest structures that combine both short- and long-range contacts (Fig. 1), they are ideal models for such tests.

Significance

Although novel experimental approaches open exciting opportunities for understanding the elusive protein-folding pathways, interpreting experiments in terms of microscopic folding mechanisms often poses a serious challenge. Here, we demonstrate that extensive sets of global and site-specific unfolding data for two small proteins can be quantitatively explained by a simple statistical–mechanical model derived from known native protein structures. Remarkably, differences in folding between two proteins with similar structures are captured without need to consider sequence-specific interresidue interactions. This finding is significant because it implies that knowledge of the native structure, which implicitly includes sequence information, is sufficient for predicting detailed, site-specific folding mechanisms.

Author contributions: J.K. designed research; J.K.L. and G.S.K. performed research; J.K.L. and J.K. analyzed data; and J.K.L., G.S.K., and J.K. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

1To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1506309112/-/DCSupplemental.

9890–9895 | PNAS | August 11, 2015 | vol. 112 | no. 32 www.pnas.org/cgi/doi/10.1073/pnas.1506309112

Results and Discussion

Optimization of Ising-Like Models. The detailed description of the Ising-like models and their parameterization is given in SI Appendix. All are derived from coarse-grained contact maps, with each contact weighted by its relative abundance in the NMR structural set (Fig. 1). This weighting not only deals with the technical problem of choosing which NMR structure to use, but also includes additional information about how well defined the structure of particular regions is. We test the significance of the weighting by considering a contact map for an average structure (SI Appendix, Fig. S1 A and B). The adjustable parameters of the model (the contact interaction energy, the entropy cost of ordering a native bond, and the Flory characteristic ratio for the DSA/L model) were optimized for each hth motif by simultaneously fitting all experimental data, including the circular dichroism (CD) for the fragments (Fig. 2). The site-specific interaction potentials are approximated by the MJ matrix (37) with a single adjustable scaling factor such that the number of adjustable parameters is the same as for nonspecific interactions. Including hydrogen bonds adds an extra parameter for the hydrogen-bond energy.

Fig. 1. Structure of hth motifs. (A, Upper and B, Upper) Representative structures solved by NMR for the P22 subdomain (A; PDB ID code 1GP8) and de novo designed αtα (B; PDB ID code 1ABZ) are shown here. (A, Lower and B, Lower) Contact maps weighted by the fractional occurrence of each contact in the NMR ensemble (on the scale 0.0–1.0, white to red).

Fig. 2. Modeling experimental unfolding data with different Ising-like model parameterizations. The P22 subdomain (A) and αtα (B) thermal unfolding probed by CD ellipticity and site-specific 13C isotope-edited spectroscopy. (A, Lower and B, Lower) Results of fitting the experimental data with several variants of the Ising-like model. (A, Upper and B, Upper) Fractional populations of the isotopically labeled stretches, derived from the sets of temperature-dependent experimental 13C amide I′ spectra (SI Appendix) by SMSA decomposition (symbols) and best model predictions (solid lines). The color scheme corresponds to the highlighted 13C-labeled regions in the cartoon protein representations. (A, Lower and B, Lower) Mean residue molar ellipticity at 222 nm of each protein (red), a fragment corresponding to the N-terminal helix (blue), and the C-terminal helix (black). (A, i and B, i) DSA/L with a weighted contact map (CM) and a single contact energy term. (A, ii and B, ii) DSA/L with a weighted CM and residue-specific potentials approximated by MJ map. (A, iii and B, iii) DSA/L with a CM for an average structure and a single contact energy term. (A, iv and B, iv) The WSME model with a weighted CM and a single contact energy term.

Additional parameters are necessary for simulation of experimental signals (SI Appendix). The CD ellipticities at 222 nm are calculated by using a standard length-dependent formula for the α-helix and random coil (41) and generally accepted ranges for temperature-dependent baselines (SI Appendix). Including the fragment CD is crucial; otherwise, the models tend to fold the helices independently. To reproduce temperature-dependent 13C amide I′ IR spectra, the Ising-like model is combined with the recently developed Shifted Multivariate Spectra Analysis (SMSA) method (42). Because of the extensive amount of data, we only show fractional folded populations (Fig. 2). The actual experimental spectra and their model predictions are found in SI Appendix, Figs. S2–S5. The reliability of the model can be further verified by inspection of resulting spectral components (SI Appendix, Figs. S2–S5) and by the consistency between estimated temperature-dependent amide I′ frequency shifts (SI Appendix, Table S3) and known shifts of the amide I′ in model compounds (19, 25). The resulting model parameters are also summarized in SI Appendix, Tables S1 and S2.
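As an illustration of the contact-map weighting described in this section, the following minimal Python sketch counts how often each residue pair is in contact across an ensemble of structures. It is not the authors' code: the one-point-per-residue coordinates, the 6.0 Å cutoff, the minimum sequence separation, and the function names are all assumptions made for this example; the actual definitions are given in SI Appendix.

```python
from itertools import combinations
from math import dist

def contact_map(coords, cutoff=6.0, min_sep=2):
    """Binary contacts for one structure: set of (i, j) pairs within the cutoff.

    coords is a list of (x, y, z) tuples, one representative point per residue
    (an assumption; any coarse-grained site could be used).
    """
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_sep and dist(coords[i], coords[j]) <= cutoff
    }

def weighted_contact_map(ensemble, cutoff=6.0, min_sep=2):
    """Fraction of ensemble members (0.0 to 1.0) in which each contact occurs."""
    counts = {}
    for coords in ensemble:
        for pair in contact_map(coords, cutoff, min_sep):
            counts[pair] = counts.get(pair, 0) + 1
    n_models = len(ensemble)
    return {pair: count / n_models for pair, count in counts.items()}
```

A contact present in every member of the ensemble receives a weight of 1.0, whereas a contact in a poorly defined region receives a proportionally lower weight, which is how the weighting carries information about local flexibility.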

Significance of Residue-Specific Interactions. The most remarkable and quite unexpected result is that just a single energy parameter for all interresidue contacts in the DSA/L Ising-like model (32–35) is sufficient for explaining all of the experimental data for both motifs (Fig. 2 A, i and B, i). By contrast, inclusion of sequence-specific interactions via the MJ matrix (37) produces worse results, although it is not immediately obvious from Fig. 2 A, ii and B, ii, which only compares CD data and fractional populations of native states. However, examination of the underlying component amide I′ IR spectra (SI Appendix, Fig. S3) shows that several are unphysical. Specifically, for the N terminus of the P22 subdomain (SI Appendix, Fig. S3A) and the N terminus of the turn in αtα (SI Appendix, Fig. S3J), the folded component spectra are too intense, which compensates for the folded population being too low on the corresponding segments predicted by the model with MJ potentials. In addition, for the P22 subdomain, other spectral components are suspect as well, notably the unfolded ones for both the N and C termini of the first helix. By contrast, without MJ potentials, both intensities and bandshapes (SI Appendix, Fig. S2) are clearly more in line with expectations. The folded spectrum for the C terminus of helix 1 in αtα still appears too intense (SI Appendix, Fig. S2J), perhaps implying that the folded population is too low, but the intensity is now comparable to that measured for other double-labeled segments (e.g., SI Appendix, Fig. S2 D and L). The difficulty with modeling this particular segment may also stem from the fact that it is essentially disordered, even at the lowest temperature. Conversely, for turn-C, which is another segment with similarly low native population, the component spectra do not get excessively intense, even with MJ potentials.

These results have two important implications. First, the residue-specific interactions do not seem to be necessary for inferring the folding mechanism; rather, the same energy parameter common to all interresidue contacts is sufficient to account for all of the data. This result parallels the multiple successful applications of the model of Henry, Eaton, and others, which likewise uses only a single contact energy (7, 31–35), and it is also consistent with the findings of Muñoz and coworkers (43, 44) that sequence variability plays a less significant role in determining folding energy landscapes of natural proteins in comparison with topology and size. Obviously, this result does not mean that the specific amino acid sequence is not important, because it is responsible for folding and stability of the protein in the first place. It merely suggests that all of the sequence-specific interactions are already incorporated in the folded structure or, strictly speaking, in its contact map (Fig. 1). Second, even in such a case, if described correctly, amino acid-specific parameters should not make the results worse. This result means that MJ potentials do not correctly capture the interresidue contact interactions, at least not in the context of the Ising-like model for protein folding, in either the P22 subdomain or in αtα.

Dynamics in the Native Structure: Weighted Contact Maps. A notable result in Fig. 2 is that most of the isotopically labeled segments have <100% folded populations at the start of the unfolding curves (lowest temperatures), but retain >0% at high temperatures. That the protein is never completely folded is an important result of the model and cannot be captured, for example, by a simple chemical mass-action scheme that assumes transitions from fully folded to fully unfolded (19, 25). Partly folded, flexible regions are present in all proteins, but are expected to be abundant, particularly in small proteins and fragments that are frequently used as models for folding studies. Likewise, incomplete unfolding at high temperature is common and evidenced here by the residual α-helical CD at high temperatures, particularly for the P22 subdomain (Fig. 2A).

Because our model is derived from weighted contact maps (Fig. 1), it contains information about the relative flexibility of particular regions. The significance of this weighting can be tested by comparing the model derived from the contact map of an average structure (SI Appendix, Fig. S1 A and B). Here, the fit exhibits little change for the P22 subdomain (Fig. 2 A, iii), but is notably worse for αtα (Fig. 2 B, iii). Note that the fragment CD could not be fitted at all. The effect on the P22 subdomain is small, most likely because the most flexible region is on the N terminus, which has few contacts, whereas the most critical interhelical contacts near the turn are well established (SI Appendix, Fig. S1C). It should also be noted that averaging the structure and its associated contact map may be somewhat problematic because it may not lead to physically meaningful results. However, from a purely practical standpoint, weighting the contact maps according to the set of structures at hand was the most straightforward way to represent the experimental structural information and, at least for αtα, is clearly beneficial to the modeling.

Importance of Nonlocal Contacts: The WSME Model. The popular WSME scheme (39, 40) only considers contacts within a native stretch. This approximation greatly simplifies the enumeration of the partition function (45), which can be done exactly without any restriction on the number of native stretches. However, the neglect of contacts between separate native segments also results in a failure of the model to reproduce experimental data (Fig. 2 A, iv and B, iv). For αtα, where data clearly show that helices must come into contact with the unfolded section in between, this result can be expected. However, the same result for the P22 subdomain demonstrates that the importance of nonlocal contacts is general. Despite the restriction to only two native segments, the DSA/L scheme provides a more realistic representation of the unfolding than an unlimited number of native segments allowing only local interactions. Even additional modifications, such as consideration of two separate entropy parameters for helical and nonhelical parts to compensate for the additional unordered loop entropy term in DSA/L, did not lead to any significant improvement (SI Appendix, Fig. S6 C and D). This finding is consistent with recent results of Henry et al. (35), who show that DSA/L is generally a very good approximation for small proteins with fewer than ∼50 amino acids.
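The difference between the two contact rules can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration, not the model used in this work: it enumerates all microstates by brute force, treats the entropy cost as a single value per native unit, and replaces the full DSA/L treatment (at most two native stretches plus a loop entropy term) with a crude rule under which a contact forms whenever both of its ends are native. The names `contact_energy` and `partition_function` and all numerical values are hypothetical.

```python
from itertools import product
from math import exp

R = 1.987e-3  # gas constant in kcal/(mol K)

def contact_energy(state, contacts, eps, contiguous_only):
    """Sum contact energies for one microstate (tuple of 0/1, one per unit).

    contacts maps (i, j) pairs to contact-map weights; eps is the single
    contact energy in kcal/mol (negative values are stabilizing).
    """
    total = 0.0
    for (i, j), weight in contacts.items():
        if contiguous_only:
            formed = all(state[i:j + 1])    # WSME-style: every unit from i to j native
        else:
            formed = state[i] and state[j]  # crude nonlocal rule (assumption, not DSA/L)
        if formed:
            total += eps * weight
    return total

def partition_function(n, contacts, eps, dS, T, contiguous_only):
    """Brute-force sum over all 2**n microstates; feasible only for very small n.

    dS is the entropy change per ordered unit in kcal/(mol K), negative for a cost.
    """
    z = 0.0
    for state in product((0, 1), repeat=n):
        energy = contact_energy(state, contacts, eps, contiguous_only)
        entropy = dS * sum(state)
        z += exp(-(energy - T * entropy) / (R * T))
    return z
```

For a state such as (1, 1, 0, 1, 1) with contacts (0, 1) and (0, 4), the contiguous rule counts only the first contact, whereas the nonlocal rule counts both, so two helices flanking a disordered turn gain no interhelical stabilization under the contiguous rule.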

Hydrogen Bonds. The role of polypeptide backbone hydrogen bonds vs. specific side-chain interactions in protein folding is still questioned (38). The models above did not explicitly consider interhelical hydrogen bonds, but their success in describing the experimental data suggests that the contribution of hydrogen bonds is not significant. Not surprisingly, when this hypothesis is tested directly by including an additional hydrogen-bond energy term, the effect is minimal (SI Appendix, Fig. S6E), and the resulting hydrogen-bond energy parameters are near zero (SI Appendix, Table S1). This result is in agreement with past implementations of Ising-like models (32–35) that do not explicitly consider hydrogen-bond energies, but is somewhat at odds with experimental results, which suggest that protein stability is substantially affected by the backbone hydrogen bonds (46, 47). The likely explanation for this discrepancy is that the hydrogen bonds may be effectively included in the local (i, i + 4) interresidue contacts. As our tests show, when these local contacts are omitted from the energy function, hydrogen bonds become critical for folding (SI Appendix, Fig. S6F and Table S1). For the P22 subdomain, these give essentially the same results as the local contacts (Fig. 2 A, i); in contrast, the latter yield much better fits to the experimental data for αtα (Fig. 2 B, i).

Comparison of the Thermal Unfolding of the Two hth Motifs. Successfully optimizing a model that explains experimental data for both studied proteins provides the basis for quantitative, detailed comparison of folding. Such comparison of the P22 subdomain and αtα is interesting for several reasons. First, it underlines the roles of the overall topology and sequence-specific interactions (48, 49) in determining the local stability of native structural elements. As demonstrated above, even a simple model based on a coarse-grained representation of native contacts with a single, common interaction energy parameter can capture quite distinct unfolding behavior in very similar structures. Although ideally one would like to fit the data for both proteins with the same set of parameters, such goals are unfeasible for the heavily coarse-grained approach tested here. Generally, the higher the level of coarse-graining, the more specific the parameters become (50). Nevertheless, although the parameter values differ somewhat, consistency between different variants of the model in fitting the sets of data for both proteins is remarkable (SI Appendix, Tables S1 and S2).

Second, because αtα is de novo designed, the neglect of all nonnative interactions by the Ising-like model, which can be rationalized on the basis of molecular evolution, becomes questionable. However, the Ising-like model works just as well for the αtα as it does for the naturally occurring P22 subdomain, and the effects of modifying the model and imposing additional simplifications are the same for both (Fig. 2). This finding implies that native interactions must dominate the stabilization of the partially folded, intermediate structures in both motifs. Conversely, the details of these interactions are very different, as evident from distinct patterns of thermal stability in individual labeled regions of the two motifs. From the heat maps in Fig. 3, the differences in unfolding captured by site-specific experiments are apparent: The P22 subdomain unfolds from the N terminus and is most stable at the helical segments near the turn, compared with αtα, which has little defined structure in the turn region and the highest stability near the centers and toward the termini of the α-helices. It is interesting to note that even in the P22 subdomain the turn itself is predicted to be less stable than the helices, which was missed by analysis with chemical mass-action models (19), although the general order of unfolding of the other segments agreed with the Ising-like model. For αtα, the unfolding curves of the two turn segments could not even be fitted with a two-state model; this result was attributed to these segments being essentially disordered (25), which the present Ising-like model analysis confirms.

Free Energy Profiles: Kinetics and Mechanism of Folding. Our experimental data only report on equilibrium folding and, consequently, contain no information about how the motifs actually fold in time. However, previous work has established empirical correlations between the degrees of unfolding cooperativity in equilibrium and the free energy barriers and folding kinetics (9, 51). In addition, because the Ising-like model provides a complete statistical–mechanical description of all states of the amino acid chain, the folding kinetics and mechanism can be inferred. The kinetics is often well described as diffusion on a one-dimensional free energy profile (34, 52), calculated as a function of a suitable coordinate that measures the degree of folding (34, 35). Because each residue is by definition native or unfolded in the Ising-like models, the number of native peptide bonds is a natural choice for a reaction coordinate. The free energy surfaces for both motifs (Fig. 4A) are similar, and the only notable difference is the greater variation of the αtα free energy with temperature, reflecting a more pronounced and sharper unfolding transition (Fig. 2). The profiles for both proteins are essentially barrier-less at low temperatures, and only an insignificant barrier to folding appears near the transition midpoints (∼0.4 and ∼0.3 kcal·mol−1, respectively). Negligible folding free-energy barriers are consistent with the heterogeneous unfolding process, where effectively a continuum of states with varying degrees of native structure are populated, and suggest very fast folding paralleling experimental data on other similar motifs (29, 53) and general trends expected for helical proteins of this size (44, 54).

Fig. 3. Residue-by-residue thermal unfolding of two hth motifs. Population folded (from 0.0 blue, to 1.0 red) of individual peptide bonds for P22 subdomain (Left) and αtα (Right) as a function of temperature. Cartoons on the right of each map show the location of the N-terminal (h1) and C-terminal (h2) helices.

Fig. 4. Comparison of the folding mechanism as predicted by the model. (A) Free energy profiles for the P22 subdomain (Left) and αtα (Right) as the function of the number of native peptide bonds, plotted approximately every 14 K from 274 K (blue) to 344 K (yellow). (B) Probability of being folded (0.0 blue to 1.0 red) for each peptide bond at each value of the overall number of native peptide bonds at 270 K and 350 K.

The similarity of the one-dimensional free energy profiles obscures the differences in folding mechanism that would be expected from distinct patterns of local thermodynamic stability, because each value of the reaction coordinate averages many actual microstates. More detail is revealed if the contribution of each individual peptide bond to the given value of the reaction coordinate is plotted, as in Fig. 4B. Viewed together with the free energy profile (Fig. 4A), these plots illustrate the predicted order of structure formation during the folding transition, i.e., diffusion of the population distribution from low to high degree of native structure. It is evident that the folding progresses from the segments with higher stability, which start forming near the top of the free energy barrier, to the least stable parts, which do not reach their native states until the folded minimum or not at all. Specifically, in the P22 subdomain, the first to form would be the helical structure in the vicinity of the turn, whereas in αtα it is near the helix termini. The N-terminal helix in αtα is predicted to start folding before the other helix, which is expected because the fragment experiments show that helix 1 does show some degree of autonomous stability (Fig. 2).
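The two quantities discussed in this section, the free energy as a function of the number of native units and the probability that each unit is native at a given value of that coordinate, can be sketched for any enumerated ensemble as follows. This is a minimal illustration under stated assumptions: the statistical weights are taken as given (how they are computed is model specific and described in SI Appendix), and the function name `profile_and_order` and the dictionary layout are hypothetical.

```python
from math import log

R = 1.987e-3  # gas constant in kcal/(mol K)

def profile_and_order(state_weights, T):
    """Return F(n) and P(unit k native | n) from microstate statistical weights.

    state_weights maps each microstate (tuple of 0/1, one per unit) to its
    unnormalized Boltzmann weight. F(n) = -RT ln Z(n), where Z(n) sums the
    weights of all states with exactly n native units.
    """
    z_n = {}       # summed weight of states at each n
    native_n = {}  # per-unit summed weight of native occurrences at each n
    for state, w in state_weights.items():
        n = sum(state)
        z_n[n] = z_n.get(n, 0.0) + w
        acc = native_n.setdefault(n, [0.0] * len(state))
        for k, m in enumerate(state):
            acc[k] += w * m
    free_energy = {n: -R * T * log(z) for n, z in z_n.items()}
    cond_prob = {n: [x / z_n[n] for x in native_n[n]] for n in z_n}
    return free_energy, cond_prob
```

Two ensembles can share nearly the same F(n) while differing in the conditional probabilities, which is the sense in which a one-dimensional profile can hide differences in the order of structure formation.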

“Experimental” Reaction Coordinates: Probe-Specific Free Energy Profiles. Although the number of native peptide bonds is the natural reaction coordinate for Ising-like models, experiments do not directly measure the overall fraction of native or unfolded amide bonds. The CD is sensitive to the amount of α-helix and its length, whereas the 13C edited amide I′ IR spectrum reports on the change in backbone conformation of the labeled segments. The experimental signals are often “built” into the free energy profiles as a dividing surface (14, 34, 52), but a more straightforward alternative is to consider a specific reaction coordinate for each probe. One could then construct a free energy profile as sensed by the particular experimental technique used to measure folding. Comparison of such surfaces with the global one can reveal how closely each probe captures the overall folding and, conversely, how well the overall reaction coordinate reflects unfolding measured by the specific probes. Moreover, the probe-specific profiles can be used to estimate, or analyze, the folding kinetics measured by each particular method.

Fig. 5 displays free energy profiles calculated for such probe-specific reaction coordinates (SI Appendix). The total number of α-helical residues, which mostly accounts for the observed CD ellipticity at 222 nm, gives very similar profiles to those in Fig. 4A. This finding is hardly surprising, because the folded structure is mostly α-helix, and the α-helical content, as measured by CD, should therefore closely follow the overall unfolding. On the other hand, profiles calculated as a function of the folded fraction of each 13C-labeled stretch, as measured by the IR (Fig. 5B), are quite distinct. The less stable regions (e.g., N terminus of the P22 subdomain and turn segments in αtα) show broad free energy minima consistent with their considerable flexibility, which, upon increase in temperature, tend to further broaden and shift toward the lower values of the order parameter. By contrast, the most stable parts (e.g., C terminus of the first helix of the P22 subdomain and N terminus of helix 1 of αtα) have deep, narrow native minima that do not shift with temperature. Therefore, although proteins are often divided into two-state folders and non-two-state folders, both gradual and two-state-like folding scenarios may actually be observed within a single protein, depending on the type and position of the experimental probe used for its detection. Moreover, although Ising-like models assume only two states for each individual amide bond, the two-state assumption is obviously not generally valid, even for stretches as short as two amides; in this case, it would be justified only for the most highly thermodynamically stable segments of each model hth motif.
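The idea of a probe-specific reaction coordinate amounts to projecting the same ensemble of microstates onto a different order parameter. The sketch below shows one way this could be done; it is not the procedure of SI Appendix. The coordinate used here, the fraction of native units within a labeled stretch, is an assumption chosen for simplicity, and the names `project_free_energy` and `stretch_fraction` are hypothetical.

```python
from math import log

R = 1.987e-3  # gas constant in kcal/(mol K)

def project_free_energy(state_weights, coordinate, T):
    """F(q) = -RT ln P(q), where P(q) is the population with coordinate(state) == q.

    state_weights maps microstates (tuples of 0/1) to unnormalized weights;
    coordinate is any function from a microstate to a hashable, sortable value.
    """
    z_q = {}
    for state, w in state_weights.items():
        q = coordinate(state)
        z_q[q] = z_q.get(q, 0.0) + w
    z_total = sum(z_q.values())
    return {q: -R * T * log(z / z_total) for q, z in sorted(z_q.items())}

def stretch_fraction(first, last):
    """Coordinate factory: fraction of native units in the stretch first..last."""
    length = last - first + 1
    return lambda state: sum(state[first:last + 1]) / length
```

Passing the built-in `sum` as the coordinate recovers a global profile over the total number of native units, while `stretch_fraction(first, last)` gives a local profile for one labeled segment, so the two views can be compared on the same set of weights.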

Concluding Remarks
Site-specific experiments offer valuable indications of the folding cooperativity and of the relative thermodynamic stability of the individual probed structural segments. Only when combined with a microscopic model for protein folding, however, do the details of the underlying conformational states emerge, along with insights into the origins of the observed behavior. Ising-like statistical–mechanical models continue to demonstrate their utility for interpreting protein-folding experimental data. With very few free parameters, these simple models capture local thermodynamic stabilities, as probed by site-specific experiments, and, most notably, their differences between two structurally nearly identical proteins. The latter is particularly remarkable, considering that the Ising-like models are based solely on the coarse-grained representation of the native structure. Although one of the studied proteins is de novo designed, the success of the Ising-like model in reproducing its unfolding suggests that the basic assumptions, namely the dominance of native interactions, are equally valid. Furthermore, no sequence-specific interactions are needed, but subtle differences in the native contacts and their rigidity, as reflected in the weighted contact maps, are sufficient to explain the unfolding stability patterns. Although this finding does not imply that the specific amino acid sequence is unimportant, it suggests that its effects are already implicit in the details of the folded structure. Model comparison also highlights the importance of nonlocal contacts between disjoint native stretches for the correct description of the noncooperative unfolding of the studied proteins. Disallowing such long-range contact formation (WSME model) generally leads to predictions of a more cooperative behavior, at odds with the experimental data. Finally, the free energy surfaces constructed from the optimized model, although based only on equilibrium experiments, hint at the mechanism of folding. The stability appears to be key, because the more stable native segments are generally predicted to form first, consistent with both experimental studies of directed stability perturbations (49, 55) and with the recent state-of-the-art MD simulations (2).

Fig. 5. Free energy profiles for experimental reaction coordinates. (A) Free energy profiles as a function of the total number of helical residues at approximately every 14 K from 274 K (blue) to 344 K (yellow) for the P22 subdomain (Left) and αtα (Right). (B) The free energy as function of the folded probability for each individual 13C-labeled stretch at two temperatures. The colors correspond to the label color scheme in Fig. 2. The apparent noise in some of the plots is due to the limited number of configurations for certain values of P, as stretches as short as two peptide bonds are considered.

9894 | www.pnas.org/cgi/doi/10.1073/pnas.1506309112 Lai et al.


Materials and Methods
The P22 subdomain [Protein Data Bank (PDB) ID code 1GP8] and de novo designed αtα (PDB ID code 1ABZ) were synthesized by using Fmoc-based solid-phase peptide synthesis techniques on PS-3 and Tribute automated peptide synthesizers (Protein Technologies), respectively, with isotope variants synthesized with amino acids 13C labeled at C=O. CD was performed on a Jasco J-815 spectropolarimeter in a 1-mm path length quartz cuvette and IR on a Bruker Tensor 27 FTIR spectrometer, equipped with a DLaTGS detector. Both CD and IR measurements were conducted in D2O-based buffer solution at pH 7.4 and pH 2.3 (uncorrected) for P22 and αtα, respectively.

Ising-like model partition function derivations for the DSA/L and WSME models generally followed Kubelka et al. (34) and Bruscolini and Pelizzola (45), respectively. Variants of the models included altering the energy function to incorporate the MJ potential matrix (37) or interhelical hydrogen bonds and defining the contact map to describe a single averaged structure or with contacts fractionally weighted according to occurrence in the NMR ensemble. The CD ellipticity at 222 nm was modeled directly using well-established methods (41). For simulating 13C amide I′ IR data, the Ising-like model was combined with the SMSA method (42).
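One plausible reading of "contacts fractionally weighted according to occurrence in the NMR ensemble" is that each contact receives a weight equal to the fraction of ensemble models in which it is present. The sketch below illustrates that reading only; the function name `weighted_contact_map`, the input layout (one set of residue-index pairs per model), and the example contacts are assumptions made here, and the contact definition actually used is given in the SI Appendix.

```python
from collections import Counter

def weighted_contact_map(model_contacts):
    """Weight each contact by the fraction of models in which it occurs.

    model_contacts: one iterable of (i, j) residue-index pairs per model
    of the ensemble (a hypothetical input layout).
    Returns {(i, j): weight} with weights in (0, 1].
    """
    n_models = len(model_contacts)
    counts = Counter()
    for contacts in model_contacts:
        counts.update(set(contacts))  # count each contact once per model
    return {pair: n / n_models for pair, n in counts.items()}

# Four hypothetical models of a small ensemble.
models = [
    {(1, 5), (2, 6)},
    {(1, 5)},
    {(1, 5), (3, 7)},
    {(1, 5), (2, 6)},
]
weights = weighted_contact_map(models)
# (1, 5) is present in every model, (2, 6) in half, (3, 7) in one of four.
```

A single averaged structure corresponds to the limiting case in which every retained contact has weight 1; fractional weights instead let contacts in flexible regions contribute less.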

ACKNOWLEDGMENTS. We thank Milan Balaz for the use of circular dichroism and high performance liquid chromatography instruments and William A. Eaton for helpful discussions. This work was supported by National Science Foundation CAREER Grant 0846140.

1. Freddolino PL, Harrison CB, Liu Y, Schulten K (2010) Challenges in protein folding simulations: Timescale, representation, and analysis. Nat Phys 6(10):751–758.

2. Lindorff-Larsen K, Piana S, Dror RO, Shaw DE (2011) How fast-folding proteins fold. Science 334(6055):517–520.

3. Adhikari AN, Freed KF, Sosnick TR (2012) De novo prediction of protein folding pathways and structure using the principle of sequential stabilization. Proc Natl Acad Sci USA 109(43):17442–17447.

4. Skinner JJ, et al. (2014) Benchmarking all-atom simulations using hydrogen exchange. Proc Natl Acad Sci USA 111(45):15975–15980.

5. Muñoz V (2002) Thermodynamics and kinetics of downhill protein folding investigated with a simple statistical mechanical model. Int J Quantum Chem 90(4-5):1522–1528.

6. Eaton WA (1999) Searching for “downhill scenarios” in protein folding. Proc Natl Acad Sci USA 96(11):5897–5899.

7. Garcia-Mira MM, Sadqi M, Fischer N, Sanchez-Ruiz JM, Muñoz V (2002) Experimental identification of downhill protein folding. Science 298(5601):2191–2195.

8. Sadqi M, Fushman D, Muñoz V (2006) Atom-by-atom analysis of global downhill protein folding. Nature 442(7100):317–321.

9. Naganathan AN, Doshi U, Muñoz V (2007) Protein folding kinetics: Barrier effects in chemical and thermal denaturation experiments. J Am Chem Soc 129(17):5673–5682.

10. Fung A, Li P, Godoy-Ruiz R, Sanchez-Ruiz JM, Muñoz V (2008) Expanding the realm of ultrafast protein folding: gpW, a midsize natural single-domain with α+β topology that folds downhill. J Am Chem Soc 130(23):7489–7495.

11. Naganathan AN, Li P, Perez-Jimenez R, Sanchez-Ruiz JM, Muñoz V (2010) Navigating the downhill protein folding regime via structural homologues. J Am Chem Soc 132(32):11183–11190.

12. Naganathan AN, Muñoz V (2014) Thermodynamics of downhill folding: Multi-probe analysis of PDD, a protein that folds over a marginal free energy barrier. J Phys Chem B 118(30):8982–8994.

13. Sborgi L, et al. (2015) Interaction networks in protein folding via atomic-resolution experiments and long-time-scale molecular dynamics simulations. J Am Chem Soc 137(20):6506–6516.

14. Li P, Oliva FY, Naganathan AN, Muñoz V (2009) Dynamics of one-state downhill protein folding. Proc Natl Acad Sci USA 106(1):103–108.

15. Yang WY, Pitera JW, Swope WC, Gruebele M (2004) Heterogeneous folding of the trpzip hairpin: Full atom simulation and experiment. J Mol Biol 336(1):241–251.

16. Maity H, Maity M, Krishna MMG, Mayne L, Englander SW (2005) Protein folding: The stepwise assembly of foldon units. Proc Natl Acad Sci USA 102(13):4741–4746.

17. Ma H, Gruebele M (2005) Kinetics are probe-dependent during downhill folding of an engineered lambda6-85 protein. Proc Natl Acad Sci USA 102(7):2283–2287.

18. Hauser K, Krejtschi C, Huang R, Wu L, Keiderling TA (2008) Site-specific relaxation kinetics of a tryptophan zipper hairpin peptide using temperature-jump IR spectroscopy and isotopic labeling. J Am Chem Soc 130(10):2984–2992.

19. Amunson KE, Ackels L, Kubelka J (2008) Site-specific unfolding thermodynamics of a helix-turn-helix protein. J Am Chem Soc 130(26):8146–8147.

20. Liu F, Gao YG, Gruebele M (2010) A survey of λ repressor fragments from two-state to downhill folding. J Mol Biol 397(3):789–798.

21. Nagarajan S, et al. (2011) Differential ordering of the protein backbone and side chains during protein folding revealed by site-specific recombinant infrared probes. J Am Chem Soc 133(50):20335–20340.

22. Jones KC, Peng CS, Tokmakoff A (2013) Folding of a heterogeneous β-hairpin peptide from temperature-jump 2D IR spectroscopy. Proc Natl Acad Sci USA 110(8):2828–2833.

23. Kishore M, Krishnamoorthy G, Udgaonkar JB (2013) Critical evaluation of the two-state model describing the equilibrium unfolding of the PI3K SH3 domain by time-resolved fluorescence resonance energy transfer. Biochemistry 52(52):9482–9496.

24. Walters BT, Mayne L, Hinshaw JR, Sosnick TR, Englander SW (2013) Folding of a large protein at high structural resolution. Proc Natl Acad Sci USA 110(47):18898–18903.

25. Kubelka GS, Kubelka J (2014) Site-specific thermodynamic stability and unfolding of a de novo designed protein structural motif mapped by 13C isotopically edited IR spectroscopy. J Am Chem Soc 136(16):6037–6048.

26. Davis CM, Cooper AK, Dyer RB (2015) Fast helix formation in the B domain of protein A revealed by site-specific infrared probes. Biochemistry 54(9):1758–1766.

27. Decatur SM (2006) Elucidation of residue-level structure and dynamics of polypeptides via isotope-edited infrared spectroscopy. Acc Chem Res 39(3):169–175.

28. Sun Y, et al. (2000) Structure of the coat protein-binding domain of the scaffolding protein from a double-stranded DNA virus. J Mol Biol 297(5):1195–1202.

29. Religa TL, et al. (2007) The helix-turn-helix motif as an ultrafast independently folding domain: the pathway of folding of Engrailed homeodomain. Proc Natl Acad Sci USA 104(22):9272–9277.

30. Fezoui Y, Connolly PJ, Osterhout JJ (1997) Solution structure of alpha t alpha, a helical hairpin peptide of de novo design. Protein Sci 6(9):1869–1877.

31. Muñoz V, Thompson PA, Hofrichter J, Eaton WA (1997) Folding dynamics and mechanism of beta-hairpin formation. Nature 390(6656):196–199.

32. Henry ER, Eaton WA (2004) Combinatorial modeling of protein folding kinetics: Free energy profiles and rates. Chem Phys 307(2-3):163–185.

33. Cellmer T, Henry ER, Hofrichter J, Eaton WA (2008) Measuring internal friction of an ultrafast-folding protein. Proc Natl Acad Sci USA 105(47):18320–18325.

34. Kubelka J, Henry ER, Cellmer T, Hofrichter J, Eaton WA (2008) Chemical, physical, and theoretical kinetics of an ultrafast folding protein. Proc Natl Acad Sci USA 105(48):18655–18662.

35. Henry ER, Best RB, Eaton WA (2013) Comparing a simple theoretical model for protein folding with all-atom molecular dynamics simulations. Proc Natl Acad Sci USA 110(44):17880–17885.

36. Best RB, Hummer G, Eaton WA (2013) Native contacts determine protein folding mechanisms in atomistic simulations. Proc Natl Acad Sci USA 110(44):17874–17879.

37. Miyazawa S, Jernigan RL (1996) Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J Mol Biol 256(3):623–644.

38. Rose GD, Fleming PJ, Banavar JR, Maritan A (2006) A backbone-based theory of protein folding. Proc Natl Acad Sci USA 103(45):16623–16633.

39. Wako H, Saitô N (1978) Statistical mechanical theory of the protein conformation. II. Folding pathway for protein. J Phys Soc Jpn 44(6):1939–1945.

40. Muñoz V, Eaton WA (1999) A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc Natl Acad Sci USA 96(20):11311–11316.

41. Scholtz JM, Qian H, York EJ, Stewart JM, Baldwin RL (1991) Parameters of helix-coil transition theory for alanine-based peptides of varying chain lengths in water. Biopolymers 31(13):1463–1470.

42. Kubelka J (2013) Multivariate analysis of spectral data with frequency shifts: Application to temperature dependent infrared spectra of peptides and proteins. Anal Chem 85(20):9588–9595.

43. De Sancho D, Doshi U, Muñoz V (2009) Protein folding rates and stability: How much is there beyond size? J Am Chem Soc 131(6):2074–2075.

44. De Sancho D, Muñoz V (2011) Integrated prediction of protein folding and unfolding rates from only size and structural class. Phys Chem Chem Phys 13(38):17030–17043.

45. Bruscolini P, Pelizzola A (2002) Exact solution of the Muñoz-Eaton model for protein folding. Phys Rev Lett 88(25 Pt 1):258101.

46. Bunagan MR, Gao J, Kelly JW, Gai F (2009) Probing the folding transition state structure of the villin headpiece subdomain via side chain and backbone mutagenesis. J Am Chem Soc 131(21):7470–7476.

47. Culik RM, Jo H, DeGrado WF, Gai F (2012) Using thioamides to site-specifically interrogate the dynamics of hydrogen bond formation in β-sheet folding. J Am Chem Soc 134(19):8026–8029.

48. Baker D (2000) A surprising simplicity to protein folding. Nature 405(6782):39–42.

49. Zarrine-Afsar A, Larson SM, Davidson AR (2005) The family feud: Do proteins with similar structures fold via the same pathway? Curr Opin Struct Biol 15(1):42–49.

50. Zhang Z, Pfaendtner J, Grafmüller A, Voth GA (2009) Defining coarse-grained representations of large biomolecules and biomolecular complexes from elastic network models. Biophys J 97(8):2327–2337.

51. Naganathan AN, Sanchez-Ruiz JM, Muñoz V (2005) Direct measurement of barrier heights in protein folding. J Am Chem Soc 127(51):17970–17971.

52. Yang WY, Gruebele M (2003) Folding at the speed limit. Nature 423(6936):193–197.

53. Du D, Gai F (2006) Understanding the folding mechanism of an α-helical hairpin. Biochemistry 45(44):13131–13139.

54. Naganathan AN, Muñoz V (2005) Scaling of folding times with protein size. J Am Chem Soc 127(2):480–481.

55. McCallister EL, Alm E, Baker D (2000) Critical role of beta-hairpin formation in protein G folding. Nat Struct Biol 7(8):669–673.

Lai et al. PNAS | August 11, 2015 | vol. 112 | no. 32 | 9895
