Protein folding - Dalhousie University


Protein folding
Molecules spend more time in low-energy conformations
Predicting a structure is approximated as the search for the lowest energy conformation on a protein model
Modeling requires data, a molecular model and a strategy to explore protein conformations
All practical methods are heuristics, which means that they are not guaranteed to find THE lowest energy conformation
Bigger, faster computers will not get us out of this situation
What is a model?
Model: a hypothetical description of a complex entity or process.
wordnet.princeton.edu/perl/webwn
What do we need to model a protein?
A description of the system.
(atom coordinates)
A way to know whether we are going anywhere.
(Force field)
(optimization)
Force fields
A set of mathematical functions and parameters that model each relevant “force” (VdW, etc.).
The energy of 1 atom is the sum of all these terms.
The energy of a protein is the sum of energies of all atoms.
Force fields are mathematical expressions that include all known important factors for molecular interactions.
The natural choice is to use the chemical energy as a score and try to find the structure with the lowest free energy.
It doesn’t work with free energies
$G = H - TS$
Free energy is temperature-dependent.
Entropic contribution cannot be calculated from a snapshot.
Something else must be used.
Modeling all relevant energy terms.
Energy function
A sum of terms that approximate the contributions of known theoretical microscopic forces.
$E_{FF} = E_{str} + E_{bend} + E_{tors} + E_{VdW} + E_{el} + E_{cross}$
Modeling bond stretch/compression
In the case of bond stretching/compression, we need to measure the distance $r$ between two atoms, and look up in the force field the optimal distance $r_0$ for that pair of atoms.
Modeling bond stretch/compression
Realistic models of bond stretch-compression are computationally expensive.
This figure shows how simpler models fare at modeling bond stretching.
$E_{str}(R^{ab}) = k_2^{ab}(R^{ab} - R_0^{ab})^2 + k_3^{ab}(R^{ab} - R_0^{ab})^3 + k_4^{ab}(R^{ab} - R_0^{ab})^4$
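As a minimal sketch of the polynomial stretch term, the function below evaluates the energy for a displacement from the optimal distance. The name `stretch_energy` and the constants used in the loop are illustrative assumptions, not parameters from any real force field.

```python
def stretch_energy(r, r0, k2, k3=0.0, k4=0.0):
    """Polynomial bond-stretch energy around the optimal distance r0.

    With k3 = k4 = 0 this is the plain harmonic (second-order) model;
    the higher-order terms add anharmonic corrections.
    """
    d = r - r0
    return k2 * d**2 + k3 * d**3 + k4 * d**4


# Hypothetical bond: optimal length 1.5, force constants chosen for illustration.
for r in (1.3, 1.5, 1.7):
    harmonic = stretch_energy(r, 1.5, k2=300.0)
    quartic = stretch_energy(r, 1.5, k2=300.0, k3=-600.0, k4=700.0)
    print(r, round(harmonic, 3), round(quartic, 3))
```

The harmonic model is symmetric around the optimal distance, while the cubic term makes compression cost more than stretching, which is closer to how a real bond behaves.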

Generalization
How robust the simulation is across a range of conditions.
Efficiency
The longer it takes to perform a single task, the fewer iterations will be computed in the same amount of time.
Modeling VdW
Lennard-Jones
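The Lennard-Jones form is actually a computational shortcut: there is no need to compute R itself (which requires a square root). It uses $R^n$ where n is an even power, which can be obtained directly from $R^2$.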
$R_{ij}^2 = (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2$
$E_{EXP-6}(R) = A e^{-BR} - \frac{C}{R^6}$
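Because the 12-6 Lennard-Jones form only involves even powers of R, it can be evaluated from the squared distance without ever taking a square root. The sketch below assumes the common epsilon/sigma parameterization (well depth and zero-crossing distance); the function names and the test values are illustrative.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two (x, y, z) coordinates."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def lennard_jones(r2, epsilon, sigma):
    """12-6 Lennard-Jones energy computed from the squared distance r2.

    (sigma/R)^6 is obtained as (sigma^2 / R^2)^3, so no sqrt is needed.
    """
    s2 = sigma * sigma / r2
    s6 = s2 ** 3
    return 4.0 * epsilon * (s6 * s6 - s6)


r2 = squared_distance((0.0, 0.0, 0.0), (1.0, 2.0, 2.0))
print(r2, lennard_jones(r2, epsilon=0.5, sigma=3.0))
```

An exponential repulsion term cannot use this shortcut, since it needs R itself.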
Why?
Electrostatic interaction energies decay as 1/distance, which makes them the longest-ranged interactions.
Coulomb’s Law
$E_{el}(R^{AB}) = \frac{Q^A Q^B}{\varepsilon R^{AB}}$
Computational cost of non-bonded energy (VdW, El)
Non-bonded terms account for ~99.88% of the computation in protein-sized models. Most of these pairwise terms are very small and do not contribute significantly to the total energy.
The number of non-bonded interactions increases with the square of the number of atoms, while the number of bonded interactions increases roughly linearly.
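The scaling argument can be checked with a quick count. This is a sketch under a simplifying assumption: bonded terms are counted as the bonds of an unbranched chain, and every distinct atom pair is counted as a non-bonded pair (real force fields exclude directly bonded neighbours and usually apply a distance cutoff).

```python
def nonbonded_pairs(n_atoms):
    """Number of distinct atom pairs, n(n-1)/2, which grows as n^2."""
    return n_atoms * (n_atoms - 1) // 2


def bonded_terms_linear_chain(n_atoms):
    """Bonds in an unbranched chain of n atoms (simplifying assumption)."""
    return max(n_atoms - 1, 0)


for n in (10, 100, 1000, 10000):
    print(n, bonded_terms_linear_chain(n), nonbonded_pairs(n))
```

Multiplying the atom count by 10 multiplies the bonded count by about 10 and the pair count by about 100.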
Hydrophobic forces
This is not an explicit term.
Hydrophobic interactions are due to the difference in free energy between water molecules and polar/non-polar side chains.
The effect is thus intrinsic to the computations of electrostatic forces.
A process to find the best possible conformation
Force fields provide all the functions and parameters to compute the energy of 1 structure.
Finding the best structure comes down to finding the structure with the lowest energy.
Need a process to change the conformation toward a better structure: optimization.
The optimization is iterated until it is reasonable to believe that no better structure can be found.
Principle of optimization
You start with a protein for which you know all coordinates.
Evaluate the energy
Find a better structure, usually with small changes
Repeat until no better structure can be found.
This task is almost NEVER straightforward, unless the system is made of a small number of atoms.
Minimizing
The gradient method
For each atom:
1. Compute the force vectors for each term
2. Sum all vectors
3. Move the atom over a small distance along its resultant vector.
Repeat until all resultant force vectors are of length 0.
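The steps above can be sketched for a toy system. The example assumes the only energy term is a harmonic bond stretch, $E = k(r - r_0)^2$, and stops when the largest resultant force falls below a small tolerance rather than exactly 0. The function names, the step size and the constants are all illustrative assumptions.

```python
import math


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def bond_forces(coords, bonds, k, r0):
    """Resultant force on each atom from harmonic bonds E = k (r - r0)^2."""
    forces = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j in bonds:
        d = [coords[j][a] - coords[i][a] for a in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        mag = 2.0 * k * (r - r0)  # dE/dr; positive when the bond is stretched
        for a in range(3):
            f = mag * d[a] / r
            forces[i][a] += f  # a stretched bond pulls atom i toward atom j
            forces[j][a] -= f  # and atom j toward atom i
    return forces


def minimize(coords, bonds, k=1.0, r0=1.5, step=0.05, tol=1e-6, max_iter=10000):
    """Move every atom a small distance along its resultant force until forces vanish."""
    coords = [list(c) for c in coords]
    for _ in range(max_iter):
        forces = bond_forces(coords, bonds, k, r0)
        largest = max(math.sqrt(sum(c * c for c in f)) for f in forces)
        if largest < tol:
            break
        for atom, f in zip(coords, forces):
            for a in range(3):
                atom[a] += step * f[a]
    return coords


start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.5, 0.0)]
final = minimize(start, bonds=[(0, 1), (1, 2)])
print(distance(final[0], final[1]), distance(final[1], final[2]))
```

On this single-well toy problem the method converges to the optimal bond lengths. On a real protein energy surface the same procedure stops in the nearest local minimum.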
Optimizing simple functions
Make use of classical mechanics equations such as:
$F = ma$
Each atom gets a random kinetic energy vector which is added to the resultant force vector. This simulates thermal motion.
Molecular Simulations
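Verlet Algorithm
Numerical solution to Newton’s equations
$r_{i+1} = r_i + v_i \Delta t + \frac{1}{2} a_i \Delta t^2$
$r_{i-1} = r_i - v_i \Delta t + \frac{1}{2} a_i \Delta t^2$
Adding the two expansions gives:
$r_{i+1} = (2 r_i - r_{i-1}) + a_i \Delta t^2$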
a can be computed from F = ma
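A minimal sketch of the Verlet update for a single coordinate follows. It assumes unit mass so that the acceleration equals the force, and uses a harmonic oscillator as a stand-in for a force field; the function name `verlet` and the time step are illustrative choices.

```python
def verlet(r0, v0, accel, dt, n_steps):
    """Integrate one coordinate with r_next = 2 r - r_prev + a dt^2.

    The position before the first step is estimated with the backward
    Taylor expansion r_prev = r0 - v0 dt + 1/2 a0 dt^2.
    """
    r_prev = r0 - v0 * dt + 0.5 * accel(r0) * dt * dt
    r = r0
    trajectory = [r0]
    for _ in range(n_steps):
        r_next = 2.0 * r - r_prev + accel(r) * dt * dt
        r_prev, r = r, r_next
        trajectory.append(r)
    return trajectory


# Harmonic oscillator with unit mass and unit force constant: a = F/m = -r.
traj = verlet(r0=1.0, v0=0.0, accel=lambda r: -r, dt=0.01, n_steps=628)
print(traj[314], traj[628])  # about half a period and one full period
```

Velocities never appear in the update itself, which is why only the current and previous positions need to be stored.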
Molecular Simulations
(practical): Microsecond, $10^{-6}$ s
Timesteps
To simulate a microsecond, it takes a very, very long time…
To simulate slow processes, the time scale isn’t realistic.
Other Optimization strategies
Simulated Annealing
Scaling down the energy landscape makes the crossing of barriers more probable.
Time to do so is in short supply, however!
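One common way to realize this idea is a Metropolis acceptance rule with a temperature that is lowered gradually: dividing the energy difference by a high temperature is equivalent to scaling down the landscape. The sketch below is an assumption-laden toy, not the method of any particular package; the 1D double-well `energy` function, the cooling schedule and the step size are all hypothetical.

```python
import math
import random


def energy(x):
    """Hypothetical 1D double-well landscape with its lower minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def accept_probability(delta_e, temperature):
    """Metropolis rule: downhill moves always pass, uphill moves pass with exp(-dE/T)."""
    if delta_e <= 0.0:
        return 1.0
    return math.exp(-delta_e / temperature)


def anneal(x_start, t_start=2.0, t_end=0.01, cooling=0.999, step=0.2, seed=0):
    """Random walk with Metropolis acceptance while the temperature decays geometrically."""
    rng = random.Random(seed)
    x, e = x_start, energy(x_start)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        if rng.random() < accept_probability(e_new - e, t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling
        
    return best_x, best_e


print(anneal(x_start=1.0))  # starts in the higher of the two wells
```

At high temperature the walker crosses the barrier between the wells easily; as the temperature drops, uphill moves become rare and the walker settles into whichever well it occupies. A slower cooling schedule improves the odds of ending in the lower well but costs more steps.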
Why model proteins?
From phylogenetic information, a few residues were identified as players.
Use molecular mechanics to “see” whether the surface of the protein can accommodate an anti-codon.
Inagaki, Y., Blouin, C., Doolittle, W.F., and Roger, A.J. 2002. Convergence and constraint in eukaryotic release factor 1 (eRF1) domain 1: the evolution of stop codon specificity. Nucleic Acids Res 30: 532-544.
Why model proteins?
Modeling a weird substrate into an active site.
Mandelate racemase can bind a substrate with two rings! Is there room for this in the wild type active site?
The answer is yes, although a bit counter-intuitive.
Siddiqi, F, Bourque, J., Jiang, H., Gardner, M., St. Maurice, M., Blouin, C., and Bearne S.L., Perturbing the Hydrophobic Pocket of Mandelate Racemase to Probe Phenyl Motion During Catalysis. Biochemistry 44(25):9013-21
Folding a polypeptide from scratch isn’t expected to work out as well because…
Empirical models are parameterized with pre-folded proteins.
The role of water in partially folded proteins is significant.
The time scale for folding a protein is still a bit out of range for simulation.
Assistance in folding, whether from chaperones or from other monomers, isn’t there.
The folding process is seeded by the chain extension during translation.
Folding of a peptide occurs on a time scale COMPLETELY beyond what we can get away with today.
Protein folding from Scratch
Two Trp-cage miniproteins: TC5b and TC3b
Both have reference structures for validation.
Sequences
Solvation: Generalized Born/solvent-accessible surface area
This means that the water molecules are not explicitly defined in the simulation and the effect of the solvent is treated as a macroscopic property.
Understanding folding and design: Replica-exchange simulation of “Trp-Cage” miniproteins.
Pitera, JW., Swope, W. 2003. Proc. Natl. Acad. Sci. USA, 100: 7587-7592
The GIST
Run 23 simulations (4 ns each) in parallel, each at a different temperature from hot to cold. Every 5 ps, redistribute the best conformation to the coldest simulation.
This is much more effective at exploring solutions than a single 23 × 4 ns simulation.
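The description above is a simplification. In standard replica-exchange schemes, neighbouring replicas attempt to swap conformations and the swap is accepted with a Metropolis probability, $\min(1, \exp[(\beta_{cold} - \beta_{hot})(E_{cold} - E_{hot})])$ with $\beta = 1/(k_B T)$. The sketch below shows only that acceptance test; reduced units with $k_B = 1$ and the example energies are assumptions for illustration.

```python
import math


def swap_probability(e_cold, e_hot, t_cold, t_hot, k_b=1.0):
    """Metropolis acceptance probability for swapping two replicas.

    If the hot replica has found a lower-energy conformation than the cold
    one, the swap is always accepted; otherwise it is accepted with a
    probability that shrinks as the energy and temperature gaps grow.
    """
    beta_cold = 1.0 / (k_b * t_cold)
    beta_hot = 1.0 / (k_b * t_hot)
    exponent = (beta_cold - beta_hot) * (e_cold - e_hot)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


print(swap_probability(e_cold=-5.0, e_hot=-10.0, t_cold=1.0, t_hot=2.0))
print(swap_probability(e_cold=-10.0, e_hot=-5.0, t_cold=1.0, t_hot=2.0))
```

Low-energy conformations discovered at high temperature therefore tend to migrate toward the coldest replica, where they are refined.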
Protein folding from Scratch
RED is the reference, GREEN is the computed model.
Large Energy barriers are not as high in small, isolated structures.
It is reasonable to limit the scope of these simulations to protein domains.
Total energy remains constant:
$E_T = E_{FF} + E_k$
So, at higher temperatures, higher-energy structures are observed more often.
This means more alternative conformations are sampled in the same amount of time.
Each of these conformational groups is in turn refined at lower temperature.
Statistically, most of the simulation time of the coldest chain will be spent in the energy well with the lowest energy.
IBM Blue Gene project (65K processors, never enough, however)
High-performance achievement in MD: NAMD
Open source
Benchmark system
Parallel computing and Molecular dynamics
Folding proteins from an extended conformation is a difficult problem because of the crossing of energy barriers.
The following slides describe the limitations of simulating the crossing of energy barriers using “massively” parallel techniques.
Limitations of Parallel computing
It takes 1500 days to complete a thesis for one student
If the student is helped by someone, the work may go 2X as fast: 750 days.
What if 1500 students are working on the same thesis?
Overhead
Communication
Some work has to be executed in sequence.
Communicating the task and the results takes up an increasingly large share of the time as the tasks become smaller.
Each individual process has to wait for the slowest one to finish, leading to a loss of efficiency.
It doesn't make sense to have many more CPUs than atoms in the system!
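The thesis analogy can be made quantitative with Amdahl's law, which the slides do not name but which captures the sequential-work limit: if a fraction s of the work must be done in sequence, M workers reduce the time to T(s + (1 - s)/M). The 5% serial fraction below is a hypothetical value chosen for illustration, and the sketch ignores communication and waiting costs, which make things worse.

```python
def parallel_time(total_days, workers, serial_fraction):
    """Amdahl's law: the serial part is not shortened by adding workers."""
    return total_days * (serial_fraction + (1.0 - serial_fraction) / workers)


# Perfectly divisible work: 2 students halve the 1500 days.
print(parallel_time(1500, 2, serial_fraction=0.0))

# With a hypothetical 5% of the work being strictly sequential:
for students in (1, 2, 10, 100, 1500):
    print(students, round(parallel_time(1500, students, serial_fraction=0.05), 2))
```

With 5% sequential work, 1500 students still need about 76 days, and no number of students can bring the total below 75 days.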
Time scale in protein folding
On the order of microseconds to milliseconds
This is not achievable by modern computers.
~10 000 days for 1 experiment (~27 years)
folding@home
(PS3, Xboxes, PC (Screensaver) )
Crossing energy barrier
Most of the time is spent waiting for the thermal motion to topple a structure over a barrier.
Principle of Ensemble dynamics
M CPUs should take M times less time to go over a barrier.
For breaking 3-10 H-bonds (~22.3 kJ/mol)
If an event occurs on average every 10,000 ns, the chance to witness this event during a 30 ns simulation is 0.3%.
If the same simulation runs on 10,000 machines, one expects to observe the event ~30 times over 30 ns.
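The arithmetic behind these two figures can be reproduced in a few lines. The linear estimate (simulated time divided by mean waiting time) is the one used in the text; the Poisson version is an added assumption that barrier crossings are independent random events, and it gives nearly the same answer when the event is rare.

```python
import math


def chance_single_run(sim_ns, mean_wait_ns):
    """Linear estimate used in the text: simulated time / mean waiting time."""
    return sim_ns / mean_wait_ns


def chance_single_run_poisson(sim_ns, mean_wait_ns):
    """Chance of at least one event if crossings follow a Poisson process (assumption)."""
    return 1.0 - math.exp(-sim_ns / mean_wait_ns)


def expected_events(machines, sim_ns, mean_wait_ns):
    """Expected number of crossings observed across independent simulations."""
    return machines * sim_ns / mean_wait_ns


print(chance_single_run(30, 10_000))          # fraction, i.e. 0.3%
print(chance_single_run_poisson(30, 10_000))  # slightly below the linear estimate
print(expected_events(10_000, 30, 10_000))    # about 30 crossings
```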
Ensemble Dynamics
Start M dynamic calculations with the same initial structure.
Once 1 thread finds a barrier and goes over it, copy the state of this thread into all other M-1 replicate processes.
The communication overhead is negligible if the crossings are rare events, which is the case.
folded protein are non-native.
This means that in order to resume folding, these must be broken.
The Villin headpiece is one of the fastest-folding peptides known! What about simulating anything else?
Progress in last 2 years
Rates of protein folding appear to be correctly predicted using ensemble dynamics.
Progress on large systems
fusion have been published.
membrane fusion
Peter M. Kasson, Nicholas W. Kelley, Nina Singhal, Marija Vrljic, Axel T. Brunger, and Vijay S. Pande
11921
Summary
Biological models are assumed to have the lowest energy.
Optimization is used to find the lowest energy structures, and thus the biologically relevant conformation.
Simulation time is the bottleneck. The more you sample, the more likely that the solution will be good.
There is some progress in solving protein folding using heuristics and parallel computing, but the solution depends on theoretical breakthroughs, not the addition of hardware.