Post on 16-Jan-2016
Molecular Simulations & Sampling Techniques 1
Bioinf. Data Analysis & Tools
17 Jan 2006
Bioinformatics Data Analysis & Tools
Molecular simulations & sampling techniques
Molecular Simulations: Brief History
1936 Gelatine balls (Morell and Hildebrand)
1953 MC simulations (Metropolis et al.)
1957 MC of Lennard-Jones spheres (Wood and Parker)
1964 MD of liquid argon 10 ps (Rahman)
1970’s Non-equilibrium methods
1970’s Stochastic dynamics methods
1974 MD of liquid water (Stillinger and Rahman)
1977 MD of protein in vacuo 20 ps (McCammon et al.)
1980’s Quantum-mechanical effects
1983 MD of protein in water 20 ps (van Gunsteren et al.)
1998 MD of peptide folding 100 ns (Daura et al.)
1998 MD of protein folding 1 μs (Duan and Kollman)
Today Large proteins or complexes in water or membrane; up to microseconds (10-100 CPU days; ~10^14 times slower than nature; computer speed x10 every 6 years)
2029 Protein folding 1 ms
2034 E-coli, 10^11 atoms 1 ns
2056 Cell, 10^15 atoms 1 ns
2080 Protein folding as fast as in nature
Protein flexibility
• Even a correctly folded protein is dynamic
– Crystal structure yields the average position of the atoms
– Overall ‘breathing’ motion is possible
B-factors
• The average motion of an atom around its average position
[Figure labels: alpha helices, beta-sheet]
Peptide folding from simulation
• A small (beta-)peptide forms a helical structure according to NMR
• Computer simulations of the atomic motions: molecular dynamics
Folding and un-folding in 200 ns
[Figure: RMSD [nm] (0 to 0.4) versus t [ns] (0 to 200); the trajectory alternates between folded and unfolded states]
• Unfolded structures: all different? How different? 3^21 ≈ 10^10 possibilities!
• Folded structures: all the same
Temperature dependence
• Folding equilibrium depends on temperature
[Figure: folded/unfolded trajectories at 298 K, 320 K, 340 K, 350 K and 360 K]
Pressure dependence
• Folding equilibrium depends on pressure
[Figure: folded/unfolded trajectories at 1 atm, 1000 atm and 2000 atm]
Surprising result
• The number of relevant non-folded structures is very much smaller than the number of possible non-folded structures
• If the number of relevant non-folded structures increases proportionally with the folding time, only 10^9 protein structures need to be simulated instead of 10^90 structures
• Folding mechanism perhaps simpler after all…

           Number of amino acids   Folding time (exp/sim)   Number of possible   Number of relevant
           in protein chain        (seconds)                structures           (observed) structures
peptide    10                      10^-8                    3^20 ≈ 10^9          10^3
protein    100                     10^-2                    3^200 ≈ 10^90        10^9
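The possible-structure counts in the table above can be checked with a few lines of Python. This is only a sketch: it assumes, as the 3^20 and 3^200 entries suggest, 2 torsion angles per residue with 3 discrete states each, and the function name is illustrative. Evaluated exactly, 3^200 is closer to 10^95, so the table's 10^90 is best read as a rough order-of-magnitude figure.

```python
import math

def log10_conformations(n_residues, states_per_torsion=3, torsions_per_residue=2):
    """Return log10 of the number of possible chain conformations.

    Assumes every torsion independently adopts one of a few discrete
    states (an assumption made for this sketch, not a measured value).
    """
    n_torsions = n_residues * torsions_per_residue
    return n_torsions * math.log10(states_per_torsion)

peptide = log10_conformations(10)    # 3^20
protein = log10_conformations(100)   # 3^200
print(f"peptide: 10^{peptide:.1f} possible structures")
print(f"protein: 10^{protein:.1f} possible structures")
```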
Phase Space
• Defines state of classical system of N particles:
– coordinates q = (x1, y1, z1, x2, … , zN)
– momenta p = (px1, py1, pz1, px2, … , pzN)
• One conformation (+ momenta) is one point (p,q) in phase space
• Motion is a curved line in phase space
– trajectory: (p(t),q(t))
Molecular Motions: Time & Length-scales
Newton Dynamics
[Figure: Sir Isaac Newton; state of the system at time t and at time t + Δt]
Classical (Newton) Mechanics
• A system has coordinates q and momenta p (= mv):
p = ( p1, p2, … , pN )
q = ( q1, q2, … , qN )
• Together, (p, q) span the phase space; the coordinates q alone span the configuration space.
• The total energy can be split into two components:
– kinetic energy (K): K(p) = ½ mv^2 = ½ p^2/m
– potential energy (V): V(q) depends on interaction(s)
• The potential energy is described by
– bonded interactions (e.g. bond stretching, angle bending)
– non-bonded interactions (e.g. van der Waals, electrostatic)
• Non-bonded interactions determine the conformational variation that we observe for example in protein motions.
The Hamilton Function
• The Hamiltonian function represents the total energy:
H(p,q) = K(p) + V(q)
• It is the generalised expression of classical mechanics
• In two differential expressions:
$\dot{q}_k = \frac{\mathrm{d} q_k}{\mathrm{d} t} = \frac{\partial H}{\partial p_k}$
$\dot{p}_k = \frac{\mathrm{d} p_k}{\mathrm{d} t} = -\frac{\partial H}{\partial q_k}$
• These are Newton's equations of motion, but in a very elegant way
• Use 'generalised coordinates' (p and q):
– can use any coordinate system
• e.g., Cartesian coordinates or Euler angles
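As a worked check, consider the simplest case: a single particle of mass m in one Cartesian coordinate, with H = p²/2m + V(q). The choice of system is an assumption made for illustration. Under it, Hamilton's two equations reduce to Newton's second law:

```latex
\begin{align}
H(p,q) &= \frac{p^2}{2m} + V(q) \\
\dot{q} &= \frac{\partial H}{\partial p} = \frac{p}{m} \\
\dot{p} &= -\frac{\partial H}{\partial q} = -\frac{\mathrm{d}V}{\mathrm{d}q} = F(q) \\
m\,\ddot{q} &= \dot{p} = F(q)
\end{align}
```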
Hamilton's Principle
• The variation of the time integral of ( p q̇ − H(p,q) ) vanishes:
$\delta \int_{t_1}^{t_2} \left( p\,\dot{q} - H(p,q) \right) \mathrm{d}t = 0$
• Hamilton's principle is most fundamental
– Newton's equations of motion are only one set of equations that can be derived from Hamilton's principle.
• The integral is called the 'action', meaning:
– If we integrate along the trajectory of an object in the phase space given by positions q and momenta p between time points (integration limits) t1 and t2, then the value of the integral (= the 'action') of a 'real' trajectory is a minimum (more precisely an extremum) compared to all other trajectories.
• Example: Why does a thrown stone follow a parabolic trajectory?
– If you vary the trajectory and calculate the action, the parabolic trajectory will yield the smallest 'action'.
Harmonic oscillator:
• 1-dimensional motion
• 2 dimensions in phase-space:
– position (1-dimensional)
– momentum (1-dimensional)
• analytical solution for integration:
– q(t) = b · cos( √(k/m) · t )
– p(t) = −b · √(mk) · sin( √(k/m) · t )
[Figure: phase-space plot of p(t) versus q(t)]
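The analytical solution above can be checked numerically: along the trajectory the total energy H = p²/2m + ½kq² should stay constant, which is why the path in phase space is a closed ellipse. The sketch below assumes b = k = m = 1 (arbitrary illustrative values), and the function names are not from the slides.

```python
import math

def oscillator_state(t, b=1.0, k=1.0, m=1.0):
    """Analytical position q(t) and momentum p(t) of a 1-D harmonic oscillator."""
    omega = math.sqrt(k / m)
    q = b * math.cos(omega * t)
    p = -b * math.sqrt(m * k) * math.sin(omega * t)
    return q, p

def total_energy(q, p, k=1.0, m=1.0):
    """H(p, q) = p^2 / (2m) + k q^2 / 2."""
    return p * p / (2.0 * m) + 0.5 * k * q * q

# Energy at 100 time points along the trajectory.
energies = [total_energy(*oscillator_state(0.1 * i)) for i in range(100)]
print(min(energies), max(energies))
```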
Calculating Averages
• Integration of phase space:
– 1 particle, 2 values per coordinate (e.g. up, down):
• 1*6 degrees of freedom (dof); 2^6 = 64 points
• 2 particles: 2*6 dof; 2^12 = 4,096 points
• 3 particles: 3*6 dof; 2^18 = 262,144 points
• 4 particles: 4*6 dof; 2^24 = 16,777,216 points
• Need whole of phase space?
– only low energy states are relevant
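The exponential growth in the list above follows directly from raising the number of values per degree of freedom to the number of degrees of freedom. A minimal sketch (the function name is illustrative):

```python
def phase_space_points(n_particles, values_per_dof=2, dof_per_particle=6):
    """Grid points when every degree of freedom takes a fixed number of values.

    Each particle contributes 6 degrees of freedom: 3 coordinates and 3 momenta.
    """
    return values_per_dof ** (n_particles * dof_per_particle)

for n in range(1, 5):
    print(n, phase_space_points(n))
```

Even with only two values per degree of freedom, exhaustive enumeration is out of reach for any system of realistic size, which is why sampling is restricted to the relevant low-energy states.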
Solving Complex systems
• No analytical solutions
• Numerical integration:
– by time (Molecular Dynamics)
– by ensemble (Monte-Carlo)
• Molecular Dynamics: Numerical integration in time
– Euler’s approximation:
• q(t + Δt) = q(t) + p(t)/m · Δt
• p(t + Δt) = p(t) + m · a(t) · Δt
– Verlet / Leap-frog
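Euler's approximation can be sketched in a few lines. The example below applies it to a harmonic oscillator with k = m = 1 (an assumed test system; all names are illustrative) and shows that the total energy grows steadily, one reason why Verlet or leap-frog schemes are preferred in practice.

```python
def euler_step(q, p, force, m, dt):
    """One explicit Euler step; both updates use the state at time t."""
    a = force(q) / m
    q_new = q + p / m * dt
    p_new = p + m * a * dt
    return q_new, p_new

def harmonic_force(q, k=1.0):
    """Force of a harmonic spring, F = -k q."""
    return -k * q

def energy(q, p, k=1.0, m=1.0):
    """Total energy H = p^2 / (2m) + k q^2 / 2."""
    return p * p / (2.0 * m) + 0.5 * k * q * q

q, p = 1.0, 0.0
e_start = energy(q, p)
for _ in range(1000):
    q, p = euler_step(q, p, harmonic_force, m=1.0, dt=0.01)
e_end = energy(q, p)
print(e_start, e_end)   # e_end is larger: Euler does not conserve energy
```

For this particular system each Euler step multiplies the energy by (1 + Δt²), so the drift can only be reduced by shrinking the time step, never removed.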
Features of Newton Dynamics
• Newton’s equations:
– Energy conservative
– Time reversible
– Deterministic
• Numeric integration by Verlet algorithm: ‘Simulation’
r(t + Δt) ≈ 2 r(t) − r(t − Δt) + F(t)/m · Δt^2 [ + O(Δt^4) ]
• In ‘real’ simulation: rounding errors (cumulative) → not fully reversible, no full energy conservation
• Coupling to thermal bath (re-scaling) → not fully deterministic
• ‘Lyapunov’ instability → trajectories diverge
Derivation: Verlet
• Taylor expansion:
– q(t+Δt) = q(t) + q’(t)Δt + 1/2! q’’(t)Δt^2 + 1/3! q’’’(t)Δt^3 + …
• where: q’(t) = v(t) (1st derivative, velocity)
• and: q’’(t) = a(t) (2nd derivative, acceleration)
– Add the forward and backward expansions:
q(t+Δt) = q(t) + q’(t)Δt + 1/2! q’’(t)Δt^2 + 1/3! q’’’(t)Δt^3 + O(Δt^4)
q(t−Δt) = q(t) − q’(t)Δt + 1/2! q’’(t)Δt^2 − 1/3! q’’’(t)Δt^3 + O(Δt^4)
q(t+Δt) + q(t−Δt) = 2q(t) + 2·1/2! q’’(t)Δt^2 + O(Δt^4)
– Rearrange:
q(t+Δt) = 2q(t) − q(t−Δt) + a(t)Δt^2 + O(Δt^4)
• Truncated after the 2nd-order term, but accurate to 3rd order: the odd-order terms cancel in the sum
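The recurrence derived above translates almost literally into code. The sketch below is one possible implementation, not the lecture's own: the start-up step that generates the second position from a Taylor expansion and the harmonic test system (a(q) = −q, exact solution cos t) are assumptions made for illustration.

```python
import math

def verlet_trajectory(q0, v0, accel, dt, n_steps):
    """Positions q(0), q(dt), ..., q(n_steps * dt) from the Verlet recurrence."""
    q_prev = q0
    # Start-up: Verlet needs two positions, so take one Taylor step first.
    q_curr = q0 + v0 * dt + 0.5 * accel(q0) * dt * dt
    traj = [q_prev, q_curr]
    for _ in range(n_steps - 1):
        # q(t + dt) = 2 q(t) - q(t - dt) + a(t) dt^2
        q_next = 2.0 * q_curr - q_prev + accel(q_curr) * dt * dt
        traj.append(q_next)
        q_prev, q_curr = q_curr, q_next
    return traj

traj = verlet_trajectory(1.0, 0.0, lambda q: -q, 0.01, 1000)
error = abs(traj[-1] - math.cos(0.01 * 1000))
print(error)   # deviation from the exact solution cos(t) at t = 10
```

Velocities do not appear in the recurrence; if they are needed (for example for the kinetic energy) they have to be estimated separately from the positions.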
What do we obtain?
• Trajectory: q(t) and p(t)
• Probability of occurrence:
$P(p,q) = \frac{1}{Z}\, e^{-H(p,q)/kT}$
• Averages along trajectory:
$\langle A(p,q) \rangle_T = \frac{1}{T} \int_0^T A(q(t),p(t))\,\mathrm{d}t$ (where T denotes total time, not temperature)
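In practice the average of a quantity A along a trajectory is approximated by the mean over equally spaced stored frames. The sketch below assumes frames taken from the analytical harmonic oscillator with b = k = m = 1 over ten full periods; the helper name and the test system are illustrative only.

```python
import math

def time_average(observable, trajectory):
    """Approximate (1/T) * integral of A dt by the mean over equally spaced frames."""
    values = [observable(q, p) for q, p in trajectory]
    return sum(values) / len(values)

# Frames (q, p) of a harmonic oscillator, sampled over 10 full periods.
n_frames = 10000
total_time = 10 * 2.0 * math.pi
trajectory = [
    (math.cos(total_time * i / n_frames), -math.sin(total_time * i / n_frames))
    for i in range(n_frames)
]
mean_potential = time_average(lambda q, p: 0.5 * q * q, trajectory)
mean_kinetic = time_average(lambda q, p: 0.5 * p * p, trajectory)
print(mean_potential, mean_kinetic)
```

For this system both averages come out as half of the total energy of 0.5; for a real simulation the quality of such an average depends on how much of phase space the trajectory has visited.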
Convergence
• Amount of phase-space covered
– “Sampling”
• Impossible to prove: you cannot know what you don’t know
• Energy “landscape” in phase-space
– there might be a “next valley”
Example: Convergence (1)
Example: Convergence (2)
Example: Convergence (3)
• Apparent convergence on all timescales: 100 ps – 10 ns!
Efficiency
• Time step limited by vibrational frequencies
– heavy-atom–hydrogen bond vibration: 10^-14 s (10 fs)
– 10-20 integration steps per vibrational period:
• 0.5 fs time step; 2,000,000 steps for 1 ns
• Removal of fast vibrations (constraining):
– hydrogen atom bond and angle motion
– heavy-atom bond motion
– out-of-plane motions (e.g. aromatic groups)
• In practice: 1-2 fs time step
– 5-7 fs maximum
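The arithmetic behind these numbers is simple enough to script. The sketch below (function name illustrative) reproduces the 0.5 fs time step and the 2,000,000 steps per nanosecond, and shows the saving from a 2 fs step.

```python
def steps_needed(simulated_time_ns, time_step_fs):
    """Number of integration steps needed to cover a simulated time span."""
    fs_per_ns = 1_000_000
    return round(simulated_time_ns * fs_per_ns / time_step_fs)

period_fs = 10.0                 # heavy-atom-hydrogen bond vibration period
time_step_fs = period_fs / 20    # 20 integration steps per period
unconstrained = steps_needed(1, time_step_fs)
constrained = steps_needed(1, 2.0)   # 2 fs step once fast vibrations are constrained
print(time_step_fs, unconstrained, constrained)
```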
Constraining
• to remove degrees of freedom, e.g.:
– bond i-j vibrations → keep distance i-j constant
– angle i-j-k vibrations → keep distance i-k constant
• Constraint Algorithms
– SHAKE
• iterative adjustment of Lagrange multipliers
– LINCS
• Taylor expansion of matrix inversion
• non-iterative (more stable)
• no highly connected constraints
– SETTLE
• analytical solution for symmetric 3-atom molecules (like water)
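To make the iterative idea behind SHAKE concrete, here is a minimal sketch for a single bond constraint between two atoms. It is a simplified illustration, not the implementation of any particular MD package: the function name, tolerance and example coordinates are all assumptions. After an unconstrained step, the positions are corrected along the old bond direction until the bond length is restored.

```python
def shake_bond(r1, r2, r1_ref, r2_ref, d, m1=1.0, m2=1.0, tol=1e-10, max_iter=100):
    """Iteratively restore |r1 - r2| = d after an unconstrained step.

    r1, r2: positions after the unconstrained update (3 floats each)
    r1_ref, r2_ref: positions before the update; they set the correction direction
    """
    r1, r2 = list(r1), list(r2)
    ref = [a - b for a, b in zip(r1_ref, r2_ref)]
    inv_mass = 1.0 / m1 + 1.0 / m2
    for _ in range(max_iter):
        s = [a - b for a, b in zip(r1, r2)]
        diff = sum(x * x for x in s) - d * d
        if abs(diff) < tol * d * d:
            break
        # Multiplier-like factor from a first-order solution of the constraint.
        g = diff / (2.0 * inv_mass * sum(x * y for x, y in zip(s, ref)))
        r1 = [a - g * x / m1 for a, x in zip(r1, ref)]
        r2 = [b + g * x / m2 for b, x in zip(r2, ref)]
    return r1, r2

new_r1, new_r2 = shake_bond([0.0, 0.0, 0.0], [0.12, 0.02, 0.0],
                            [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], d=0.1)
bond = sum((a - b) ** 2 for a, b in zip(new_r1, new_r2)) ** 0.5
print(bond)
```

With several coupled constraints the same correction is applied to each constraint in turn and the loop repeats until all are satisfied, which is where the iterative character of the algorithm comes from.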
Improving Performance
• Pairwise potential: Fij = −Fji
• Potential E(r) ~ 0 at large r: cut-off
– Coulomb: ~ 1/r
– Lennard-Jones: ~ 1/r^6
• Atoms move little in one step: pair-list
– Evaluating r is expensive: r = √( (rj − ri)^2 )
• Large distances change less: twin-range
– short-range each step; long range less often
• Multiple time-step methods
• Many processor/compiler/language-specific optimizations:
– use of Fortran vs. C
– optimize cache performance
• arrays of positions, velocities, forces, parameters are very large
– compiler optimizations
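Two of the ideas above, the cut-off and the pair-list, can be sketched together. The example is deliberately simple and makes several assumptions: no periodic boundaries, a plain O(N²) list build, and made-up coordinates. It stores each pair only once (since Fij = −Fji) and compares squared distances so that no square root is needed.

```python
def build_pair_list(positions, cutoff):
    """Return all pairs (i, j) with i < j that lie within the cut-off."""
    cutoff_sq = cutoff * cutoff
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist_sq = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if dist_sq < cutoff_sq:   # squared comparison avoids the square root
                pairs.append((i, j))
    return pairs

positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.9, 0.0), (3.0, 3.0, 3.0)]
pairs = build_pair_list(positions, cutoff=1.0)
print(pairs)
```

Because atoms move little per step, such a list can be reused for a number of steps before it is rebuilt, so the expensive all-pairs search is not repeated every step.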
Ignoring Degrees of Freedom
• Internal:
– bonds, angles → Constraint algorithm
• larger time steps
• External:
– “Solvent” → Langevin dynamics
• fewer (explicit) particles
– Inertia & “solvent” → Brownian dynamics
• larger time steps
Trajectory on Energy Surface
Sampling in Conformational Space
• Most of the computational time is spent on calculating (local, harmonic) vibrations.
[Figure: energy versus entropy; barriers with E >> kT separate regions of local vibration]
Barriers
• Kitao et al. (1998) Proteins 33, 496-517.
Psychology of Theorists
[Figure: optimist scale running from 0% to 100%]
“In theory, there should be no difference between theory and practice. In practice, however, there is always a difference...” (Witten and Frank)
“For every complex question there is a simple and wrong solution.” (Albert Einstein)
“All models are wrong, but some are useful.” (George Box)
Monte Carlo Sampling
• Ergodic hypothesis:
– Sampling over time (Molecular Dynamics approach); and
– Ensemble averaging (Monte Carlo approach)
• Yield the same result:
$\overline{\rho}(r) = \langle \rho_i(r) \rangle_{NVE}$
• Detailed balance condition:
$p(o)\,\pi(o \to n) = p(n)\,\pi(n \to o)$
Metropolis Selection Scheme
• Metropolis acceptance rule that satisfies detailed balance:
$\mathrm{acc}(o \to n) = p(n)/p(o) = e^{-\Delta E/kT}$ if $p(n) < p(o)$
$\mathrm{acc}(o \to n) = 1$ if $p(n) \ge p(o)$
Metropolis Monte Carlo
• Ergodic probability density for configurations around r^N:
$p(r^N) = \frac{e^{-E(r^N)/kT}}{\int e^{-E(r^N)/kT}\,\mathrm{d}r^N}$
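The acceptance rule above is short enough to show in full. The sketch below applies it to a one-dimensional harmonic potential E(x) = ½kx² with kT = k = 1; the test system, the move size and the function names are assumptions chosen for illustration. Moves that lower the energy (p(n) ≥ p(o)) are always accepted, uphill moves with probability e^(−ΔE/kT).

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def sample_harmonic(n_steps, kT=1.0, k=1.0, max_move=1.0, seed=0):
    """Metropolis sampling of x in the potential E(x) = k x^2 / 2."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_move, max_move)   # symmetric trial move
        delta_e = 0.5 * k * (trial * trial - x * x)
        if metropolis_accept(delta_e, kT, rng):
            x = trial
        samples.append(x)   # a rejected move counts the old state again
    return samples

samples = sample_harmonic(200000)
mean_sq = sum(x * x for x in samples) / len(samples)
print(mean_sq)   # should approach kT / k for long runs
```

Note that the normalising integral in the denominator of p(r^N) never has to be evaluated: only the ratio p(n)/p(o) enters the rule, which is what makes the scheme practical.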
Search Strategies
Leaps
Computational Scheme
• Reduction of the leaps will lead to classical dynamics
• Control parameter:
– RMSD
– Angle deviation
Computational Load: Solvation
• Most computational time (>95%) spent on calculating (bulk) water-water interactions
Implicit Solvation
POPS
• Solvent accessible area
– fast and accurate area calculation
– resolution:
• POPS-A (per atom)
• POPS-R (per residue)
– parametrised on 120000 atoms and 12000 residues
– differentiable → usable in MD
• Free energy of solvation
$\Delta G^{solv}_i = \mathrm{area}_i \cdot \sigma_i$
• POPS is implemented in GROMOS96
• parameters 'sigma' from simulations in water:
– amino acids in helix, sheet and extended conformation
– peptides in helix and sheet conformation
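The per-atom (or per-residue) solvation term above is a simple product of area and 'sigma' parameter. The sketch below sums these terms over a molecule; that the total is a plain sum is an assumption of this example, and the numbers are made up for illustration and are not POPS parameters.

```python
def solvation_free_energy(areas, sigmas):
    """Sum of area_i * sigma_i over all atoms (or residues)."""
    if len(areas) != len(sigmas):
        raise ValueError("need one sigma parameter per area")
    return sum(area * sigma for area, sigma in zip(areas, sigmas))

# Made-up values for illustration only.
areas = [10.0, 25.0, 0.0]      # solvent accessible areas; a buried atom has area 0
sigmas = [0.5, -0.1, 1.0]      # solvation parameters, one per atom
total = solvation_free_energy(areas, sigmas)
print(total)
```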
POPS server
Test molecules: alanine dipeptide
Test molecules: BPTI / Y35G-BPTI
[Figure panels: Classical MD, Leap-dynamics, Essential dynamics]
Calmodulin domains
• Apparent unfolding temperatures (CD)
– C-domain: 315 K (42 °C)
– N-domain: 328 K (55 °C)
• LD simulations:
– 3 ns
– 4 trajectories
• 290 K
• 325 K
• 360 K
Snapshots
Trajectories
Example: Protein & Ligand Dynamics
Example: Essential Dynamics Analysis
Cyt-P450BM3: 7 x 10 ns “free” MD simulations
CD
Comparison CD / simulation
Example: Minima
Example: Conformations
Levinthal’s paradox
• Protein folding problem:
– Predict the 3D structure from the sequence
– Understand the folding process
Folding energy
• Each protein conformation has a certain energy and a certain flexibility (entropy)
• Corresponds to a point on a multidimensional free energy surface
[Figure: energy E(x) versus coordinate x; one conformation may have higher energy but lower free energy than another]
• Three coordinates per atom: 3N−6 dimensions possible
• G = H − TS
Folded state
• Native state = lowest point on the free energy landscape
• Many possible routes
• Many possible local minima (misfolded structures)
Molten globule
• First step: hydrophobic collapse
• Molten globule: globular structure, not yet correctly folded
• Local minimum on the free energy surface
Force Field
“the collection of all forces that we consider to occur in a mechanical atomic system”
• A generalised description:
E_total = E_bonded + E_non-bonded + E_crossterm
• Crossterms:
– non-bonded interactions influence the bonded interactions (and vice versa).
– Some force fields neglect those terms.
• Note that force fields are (mostly) designed for pairwise atom interactions.
– Higher order interactions are implicitly included in the pairwise interaction parameters.
Force Field Components: Bonded Interactions
Force Field Components: Non-Bonded Interactions
All Together…
Reduced Units
• Generalise description of (atomic) systems
– express all quantities in basic units derived from system's dimensions
• For example, a Lennard-Jones interaction: V_LJ = ε ƒ(r/σ)
ε is the characteristic interaction energy; σ is the characteristic distance
• Choose basic units:
– unit of length, σ
– unit of energy, ε
– unit of mass, m (mass of the atoms in the system)
• all other units can be derived from these, e.g.:
– time: σ √(m/ε)
– temperature: ε/kB
(from: Frenkel and Smit, 'Understanding Molecular Simulation', Academic Press.)
• Other choices, e.g., ‘MD’ units:
– length nm (10^-9 m), mass u, time ps (10^-12 s), charge e, temp K
– energy kJ mol^-1, velocity nm ps^-1, pressure kJ mol^-1 nm^-3
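As a worked example of deriving units from σ, ε and m, the sketch below converts the reduced time and temperature units to SI for argon. The Lennard-Jones parameters used (σ = 3.405 Å, ε/kB = 119.8 K, m = 39.948 u) are commonly quoted values and are an assumption of this example, not taken from the slides.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
AMU = 1.66053907e-27     # atomic mass unit, kg

def reduced_units(sigma_m, epsilon_j, mass_kg):
    """Derive the time and temperature units from sigma, epsilon and m."""
    time_unit = sigma_m * math.sqrt(mass_kg / epsilon_j)   # seconds
    temperature_unit = epsilon_j / K_B                     # kelvin
    return time_unit, temperature_unit

# Assumed Lennard-Jones parameters for argon.
sigma = 3.405e-10            # m
epsilon = 119.8 * K_B        # J
mass = 39.948 * AMU          # kg
time_unit, temperature_unit = reduced_units(sigma, epsilon, mass)
print(time_unit, temperature_unit)
```

With these parameters one reduced time unit comes out at roughly 2 ps, so a reduced time step of 0.005 would correspond to about 10 fs.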
Main points