Cezary Czaplewski Faculty of Chemistry University of Gdańsk Poland


Page 1

All-atom molecular simulations of protein folding and unfolded-state dynamics and structure with accelerated calculations on GPU

Cezary Czaplewski, Faculty of Chemistry, University of Gdańsk, Poland

The 10th Protein Folding Winter School, KIAS, February 7-11, 2011

Page 2

Molecular Simulation of ab Initio Protein Folding for a Millisecond Folder NTL9(1-39)

Vincent A. Voelz (1), Gregory R. Bowman (2), Kyle Beauchamp (2), Vijay S. Pande (1,2,3)

(1) Department of Chemistry, Stanford University; (2) Biophysics Program, Stanford University; (3) Department of Structural Biology, Stanford University

J. Am. Chem. Soc. 2010, 132, 1526–1528

Page 3

• Computer simulations, validated by experiment, can help gain a complete understanding of how proteins fold.

• Folding rates span more than a million-fold range, which suggests possible diversity in folding mechanisms.

• Folding@Home running on GPUs yields several folding trajectories of the 39-residue NTL9(1-39), the slowest-folding protein (~1.5 ms folding time) folded ab initio by all-atom MD to date.

• Insights into folding mechanism based on Markov state model (MSM).

Page 4

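Figure: characteristic timescales of molecular processes, on an axis running from 10^-15 s (femto) through 10^-12 s (pico), 10^-9 s (nano), 10^-6 s (micro) and 10^-3 s (milli) to 10^0 s (seconds). Labeled processes: bond vibration, all-atom MD step, side-chain rotation, loop closure, helix formation, folding of β-hairpins, protein folding.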

Page 5

GPU

• A processor attached to a graphics card, dedicated to floating-point calculations

• Incorporates stream processors with dedicated hardware for special mathematical operations

• Stream Processing: applications can use multiple computational units without explicitly managing allocation, synchronization, or communication among those units.

Page 6

CPU vs. GPU

CPU – 4 cores

Page 7

Floating-Point Operations per Second for the CPU and GPU

Page 8

Page 9

Proteins folded ab initio by all-atom MD

• Trp-cage, 4.1 μs (Pitera, Swope, PNAS 2003)

• Fip35 WW, 13 μs (Ensign, Pande, Biophys. J. 2009)

• Villin headpiece, 10 μs (Zagrovic, Snow, Shirts, Pande, JMB 2002)

• Fast-folding villin variant, <1 μs (Ensign, Kasson, Pande, JMB 2007)

Page 10

NTL9(1-39): ~1.5 ms experimental folding time

Page 11

• Folding@Home uses Gromacs with the OpenMM library, written specially for GPUs, which allows dramatically longer trajectories

• AMBER ff96 with Onufriev, Bashford, Case GBSA

• Up to 10000 parallel MD simulations at 300, 330, 370 and 450 K

• Starting from native, random coil, extended

• Aggregate 1.52 ms

• Out of the ~3000 trajectories started from unfolded states at 370 K, only two reach <3.5 Å RMSD and eight <4 Å RMSD

• Number of folding events is consistent with a simple model of parallel uncoupled folding as a two-state Poisson process: ⟨n⟩ = ∫ M(t) k exp(−M(t) k t) dt

M(t) is the number of parallel simulations that reach time t; k is the ~640/s experimental folding rate.
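As a rough cross-check of the Poisson argument, the sketch below estimates how many folding events one might expect if every trajectory folds independently at a constant rate k, so that a trajectory of length T folds with probability 1 − exp(−kT). This per-trajectory form is a simplifying assumption of the sketch, not a transcription of the slide's integrand. The function name and the trajectory lengths are hypothetical; only k ≈ 640/s comes from the slide.

```python
import math

def expected_folding_events(traj_lengths_s, k_per_s):
    """Expected number of folded trajectories for independent two-state folders.

    A trajectory of length T seconds folds with probability 1 - exp(-k*T);
    expm1 keeps the result accurate when k*T is very small.
    """
    return sum(-math.expm1(-k_per_s * t) for t in traj_lengths_s)

# Hypothetical ensemble: 3000 trajectories of 1 microsecond each (illustrative only).
lengths = [1e-6] * 3000
k = 640.0  # experimental folding rate quoted on the slide, in 1/s
n_expected = expected_folding_events(lengths, k)
print(f"expected folding events: {n_expected:.2f}")
```

With real data, `lengths` would hold the actual length of each simulation, which plays the role of M(t) above: the number of simulations that reach time t.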

Page 12

Distributions of rmsd for native-state simulations of NTL9(1−39) after 10 μs

The number of parallel simulations at 370 K that reach time t.

Posterior predictions of the folding rate

Page 13

A snapshot from a folding trajectory (3.1 Å RMSD)

Non-native and native-like hydrophobic core arrangements

Page 14

Markov state model (MSM)

• MSM constitutes a kinetic clustering

• Conformations that can interconvert rapidly are grouped into the same state

• Conformations that can only interconvert slowly are grouped into separate states

• Satisfies the Markov property: the identity of the next state depends only on the identity of the current state and not on any of the previous states

• Transition probability matrix T propagates state probabilities p

• An implied timescale τ_k for a given lag time τ can be calculated from the eigenvalues μ_k of matrix T: τ_k = −τ / ln μ_k
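The two operations on T can be sketched in a few lines of NumPy. This is a minimal illustration and not the MSMBuilder implementation: the 3-state count matrix, the 10 ns lag time and the function names are hypothetical. It assumes the row-vector convention p(t + lag) = p(t) T and the standard relation, implied timescale = −lag / ln(eigenvalue).

```python
import numpy as np

def transition_matrix(counts):
    """Row-normalize a transition count matrix into transition probabilities."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def implied_timescales(T, lag_time):
    """Implied timescales -lag/ln(mu_k) from the non-stationary eigenvalues of T."""
    mu = np.sort(np.linalg.eigvals(T).real)[::-1]
    mu = mu[1:]                      # drop the stationary eigenvalue (equal to 1)
    mu = mu[(mu > 0) & (mu < 1)]     # the logarithm needs eigenvalues in (0, 1)
    return -lag_time / np.log(mu)

# Hypothetical 3-state count matrix observed at a lag time of 10 ns.
counts = [[90, 10, 0],
          [10, 80, 10],
          [0, 10, 90]]
T = transition_matrix(counts)

p = np.array([1.0, 0.0, 0.0])  # all population starts in state 0
for _ in range(5):
    p = p @ T                   # propagate by one lag time per step

timescales = implied_timescales(T, lag_time=10.0)  # in ns
print(p, timescales)
```

If the implied timescales stay roughly constant as the lag time is increased, the model is usually taken to be approximately Markovian at that lag time.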

Page 15

Detail of MSMBuilder package

• 100,000 microstates were generated by clustering conformations separated by 10 ns using the k-centers algorithm

• The remaining 90% of the data was then assigned to these clusters

• The resulting microstates had an average radius of ~4.5 Å

• A macrostate model was generated by lumping microstates into 2,000 macrostates using the Robust Perron Cluster Analysis (PCCA+) algorithm

• Although only a few folding trajectories were observed directly, a network of many possible pathways can be inferred from the overlapping sampling of local transitions.

• Top 10 folding fluxes, calculated by a greedy backtracking algorithm
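The k-centers step can be illustrated with a small greedy sketch: start from one seed point, then repeatedly promote the point farthest from its nearest existing center. This is a generic version under stated assumptions and not the MSMBuilder code. It uses Euclidean distance on toy 2-D points where the real analysis would use RMSD between conformations, and the function name and data are hypothetical.

```python
import numpy as np

def k_centers(points, k, seed_index=0):
    """Greedy k-centers clustering.

    Returns the indices of the chosen centers, the cluster assignment of
    every point, and each point's distance to its assigned center.
    """
    points = np.asarray(points, dtype=float)
    centers = [seed_index]
    # Distance from every point to its nearest center chosen so far.
    dist = np.linalg.norm(points - points[seed_index], axis=1)
    assign = np.zeros(len(points), dtype=int)
    for c in range(1, k):
        new_center = int(np.argmax(dist))   # farthest point becomes a center
        centers.append(new_center)
        d_new = np.linalg.norm(points - points[new_center], axis=1)
        closer = d_new < dist
        assign[closer] = c                  # reassign points nearer to the new center
        dist = np.minimum(dist, d_new)
    return centers, assign, dist

rng = np.random.default_rng(0)
# Toy "conformations": three well-separated 2-D blobs (hypothetical data).
data = np.vstack([rng.normal(loc, 0.1, size=(50, 2))
                  for loc in ([0, 0], [5, 0], [0, 5])])
centers, assign, dist = k_centers(data, k=3)
print(centers, dist.max())
```

The largest value in `dist` is the cluster radius that k-centers tries to keep small, comparable in spirit to the ~4.5 Å average microstate radius quoted above.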

Page 16

Implied timescales of Markov State Models (MSMs) built at lag times between 1 and 32 ns

Panels: 100,000-microstate model; 2000-macrostate model

Page 17

A scatter plot of the 2000 macrostates. Shown in red are the 14 macrostates transited by the top ten pathway fluxes.

Page 18

A 2000-state Markov State Model (MSM).

The top 10 folding pathways account for ∼25% of the total flux and transit 14 of the 2000 macrostates.

Page 19

Contact profile subspaces used to calculate Qα, Q12, Q13

Q = (c · c_nat) / (c_nat · c_nat)

c(x): contact profile indexed by x = (i, j)
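A minimal sketch of such a Q value, assuming Q is the projection of a conformation's contact profile c onto the native profile c_nat, normalized so that the native state gives Q = 1. That normalization is an assumption of this sketch. The five-element profiles and the function name are hypothetical; a subspace Q would use only the contact pairs (i, j) that belong to the chosen structural element.

```python
import numpy as np

def contact_q(c, c_nat):
    """Projection of contact profile c onto the native profile c_nat.

    Normalized by c_nat . c_nat, so the native profile itself gives Q = 1.
    """
    c = np.asarray(c, dtype=float)
    c_nat = np.asarray(c_nat, dtype=float)
    return float(np.dot(c, c_nat) / np.dot(c_nat, c_nat))

# Hypothetical contact profiles over five residue pairs (i, j); 1 = contact formed.
c_native = [1, 1, 1, 0, 1]
c_partial = [1, 0, 1, 1, 0]
q_native = contact_q(c_native, c_native)
q_partial = contact_q(c_partial, c_native)
print(q_native, q_partial)
```

Non-native contacts, such as the fourth pair in `c_partial`, do not contribute because the native profile is zero there.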

Page 20

The 14 macrostates plotted along structural and kinetic reaction coordinates

Page 21

Contact profiles for the 14 macrostates involved in the top folding pathways

Page 22

Values of Q for each of the 14 macrostates involved in the top ten folding pathways

Page 23

Q-values plotted versus pfold (committor) values
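For reference, pfold can be computed from an MSM transition matrix by solving a small linear system: the committor is 0 on the unfolded states, 1 on the folded states, and satisfies q_i = Σ_j T_ij q_j on every other state. The sketch below is a generic illustration under those assumptions, not the procedure used for this figure; the 4-state matrix and the function name are hypothetical.

```python
import numpy as np

def committor(T, unfolded, folded):
    """Probability of reaching a folded state before an unfolded state."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    q = np.zeros(n)
    q[list(folded)] = 1.0
    boundary = set(unfolded) | set(folded)
    inter = [i for i in range(n) if i not in boundary]
    if inter:
        # (I - T_II) q_I = sum over folded states of T_IF
        A = np.eye(len(inter)) - T[np.ix_(inter, inter)]
        b = T[np.ix_(inter, list(folded))].sum(axis=1)
        q[inter] = np.linalg.solve(A, b)
    return q

# Hypothetical 4-state chain: 0 = unfolded, 3 = folded, 1 and 2 are intermediates.
T = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.1, 0.8, 0.1, 0.0],
              [0.0, 0.1, 0.8, 0.1],
              [0.0, 0.0, 0.1, 0.9]])
pfold = committor(T, unfolded=[0], folded=[3])
print(pfold)
```

States with pfold near 0.5 are equally likely to fold or unfold next, which is why pfold serves as a kinetic reaction coordinate alongside the structural Q values.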

Page 24

Macrostates l, m and n have very similar structural ensembles and similar pfold values

These states differ mostly in their hairpin registrations and packing of the hairpin loop.

Page 25

Conclusions

• Existing force field models using implicit solvent are accurate enough to fold proteins ab initio at long time scales, opening the door to simulating more structurally complex proteins.

• There need not be a single pathway or single, dominant mechanism for the folding of a given protein.

• Multiple mechanisms could be simultaneously present.

• The sequence of the protein, coupled with the chemical environment, controls the extent to which each mechanistic pathway is seen.

Page 26

Page 27

Take-home message

• A GPU can speed up your simulations 10 times.

• Existing force field models using implicit solvent are accurate enough to fold proteins during MD.

• With only a few folding trajectories observed directly, a network of many possible pathways can be inferred from kinetic clustering using the Markov State Model.

• Several pathways can exist for the folding of a given protein.

• Multiple folding mechanisms (diffusion-collision or nucleation-condensation) could be simultaneously present.