Parameterisation of a custom amino-acid, PCA
description
Transcript of Parameterisation of a custom amino-acid, PCA
Parameterisation of a custom amino-acid, PCA
Contents
• Force-fields and how they work– What kind of interactions do we need to
parameterise?
• Quantum Mechanical calculations– Where force-fields obtain basic information
• CHARMM parameterisation process– The goal of developers => follow same goal
• Worked example of pyroglutamic acid
What this talk will cover• It will:
– Teach the necessary background
– Advise starting points for parameterisation, by analogy with existing compounds
– Follow the process for a CHARMM style strategy
• pyroglutamic acid under CHARMM27
- We will be following a published process from developers
• It will not:• Teach you how to
create a force-field from scratch– Full parameterisation is a
long process, with many potential pit-falls
– This applies to new atoms and geometries not seen in force-fields
– If your molecule falls under this category, it’s best to ask the experts whether someone else has done it
NB: read Vanommeasleagh's home page and tutorials: http://dogmans.umaryland.edu/~kenno/#CGenFF
1: Force fields• All force-fields have a purpose
– There are several force-fields, each with their own goals (e.g. following developer’s research interest)
– They are thus not fully compatible
• All force-fields are made from simple components– They are not really black-boxes– Each terms can be fully understood– Therefore, researchers like us can make force-fields
• I will describe CHARMM-ffs from above points
Purpose of force-fields• Force-fields are used to
replicate chemical or biological environments.– This means they should
produce reasonable results compared to experimental data
• Force-fields must also be derived from quantum mechanical (QM) simulations for molecular accuracy– i.e. FFs must approximate
behaviour of electrons and atomic bonds
– FFs must choose basic functions with which to do this
• FF-developers must choose what kind of environment or interaction they will replicate
• Thus, FFs must make choices– AMBER targets proteins
conformations– OPLS-AA targets organic
liquid environments– CHARMM targets hydrous
solvent interactions and energies
• .˙. Many force-fields can be used for task X, but some will perform better.
Purpose of force-fieldsDue to these differences:
• Force-fields have an optimal range of performances, and limits– E.g. GROMOS works
particularly well for studies of lipid behaviour
– Results may be surprising if ffs used in exotic conditions
• Force-fields should not be mixed with each other• The detailed mechanics are
different
Analogy: TIP3 water do not melt or boil at expected temperatures – they were never
meant to!
CHARMM-CGenFF force field
• Basis: General forcefield for all organic molecules– Match MM-properties to QM calculations (MP2/6-31G*)– Minimum necessary work, to allow parameters to be
shared across many different molecules– Can use for, e.g. drug-binding studies in MD
• Overall process for parametrisation– Charges <= Dipole moments, interaction with TIP3 water– Bond and angle values <= QM Equilibrium geometry– Bond and angle constraints <= Vibrational modes– Dihedral constraints <= Potential energy scans– Van der Waals <= Heat of solvation, molecular volume
CHARMM force field for biomolecules
• Basis: Energetically accurate set of parameters– Replicate observed experimental quantities specific to
proteins, nucleic-acids, etc.– i.e. Aim is for CHARMM proteins/DNA which act exactly like
their counterparts in real life• Overall process for parametrisation
– Fit parameters like CGenFF.– Modify to fit with relevant experimental observations. This
means heat of solvation, solvent/solute interactions, etc.– Operating conditions: This force-field is optimised for room
temperature, liquid phase experiments – Similar process: CGenFF can combine with CHARMM
Comparison: AMBER’s protein FF• Basis: To replicate protein behaviour by modelling amino
acid conformations accurately– (subtle difference to CHARMM)
• Process:– Point charges fit to higher QM calculations
(B3LYP/cc-pVTZ/HF/6-31G**) at dielectric constant = 4– Bonds and angles matched to X-ray crystals and vibrational
spectra– Torsions fit to reproduce peptide conformations and phi-psi
energy surfaces.• Performs similar to CHARMM by different means
– Complete amino acids are parameterised, where as CHARMM uses fragments with known experimental data
– More dependent on QM and biochemical observations, less on strict chemical data
Inner-working of CHARMM
• CHARMM and CGenFF (from Duan e.t al., 2003) :– Harmonic bonds– Harmonic angles
(modified)– Harmonic
impropers– Cosine dihedrals
Inner-working of CHARMM
Compositions• Basic interactions between
bonded atoms are harmonic potentials– This reduces computation
cost by using simple functions and eliminating electrons from the equation
• Electrostatics and vdW forces are preserved– They are essential to
molecular systems!
Observations• Intra-molecular terms are very
simple in nature• They must be fitted to
approximate real covalently bonded molecules
• QM-calculations form the basis of these target data
• During parameterisation, we won’t usually need to modify vdW terms
QM calculations: Gaussian• CHARMM developers
use Gaussian to produce target data– Process documented
for Gaussian– However, one can use
other QM programs• I’ll explain what
calculations are required and how to do them
Using Gaussian with GaussView
Gaussian: Scripting• Textual interface
provides all input necessary– One file for one
simulation• GaussView (GUI) is
provided to assist process
• The QM-level for parameterisation is MP2/6-31G*– Enough to describe
common interactions– Probably not sufficient for
certain organo-metal interactions
Gaussian: ScriptingConditions and general
purpose of this simulation
Coordinates:different representations
possible
Title
Detailed commands, follow-on simulations,
etc.
QM calculations required:
1: Equilibrium geometry
2: Vibrational spectra
3: Dihedral energy surfaces
NB: CHARMM website provides a fully worked tutorial
CHARMM parametrisation process
• Priorities of different parameters– The energetics needs to be replicated ultimately
to < 1 kcal/mol– Some parameters have wide-spread effects, other
fill in important details
• Flowchart– Parameterisation is iterative, and will take time
• Illustrate process with pyroglutamic acid
CHARMM parametrisation process
• The priority of different parameters:– Charges– Equilibrium bond and
angle values– Bonds and angles force
constants– Torsions
• This order will permeate throughout parameterisation
• Each set of parameters depends on everything above
• Hence, refinement of parameters follows:• Create/modify data• Refine, check results• repeat.
Flowchart and notations• The rest of this talk follows like so:• 1 – Prepare entries and coordinates• 2 – Optimise charges• 3 – Optimise bonds and angles
– 3.1 – equilibrium values by QM geometry– 3.2 – force constants by molecular vibrations
• 4 – Optimise dihedrals– 4.1 – Generate Potential Energy Scans– 4.2 – Dihedral constants by matching and chemcial
knowledge• 5 – Validation
Individual ProcessesSet initial topology
and geometry
QM-minim. geometrywith water molecules
Set charges and vdW
MM-minim. geometrywith water molecules
Do theymatch?
charge and vdWfits complete
Set initial geometry
QM-minim. geometry
Modify bonds and angles
MM-minim. geometry
Do theymatch?
bond and anglefits complete
...etc.
Calculate QM vib. spec.in CHARMM
Modify bond/angleforce const.
Calculate MM vib. spec.in CHARMM
Do theymatch?
bond/angleforce const. complete
Validation• After the last fit to
dihedrals is complete, all outputs need to be verified– Starting again from
partial charges and water interaction simulations...
• Any changes likely propagate itself downwards through flowchart• Main reason why
parameterisation is time-consuming
First parametrisation
Do charges
Do bonds and angles
Do dihedrals
Validate all data
all complete
Worked example:
• Pyroglutamic acid– As its name suggests,
an amino acid– N-terminal only,
cyclisation of glumatic acid or enzymatic activity
– Used in protein-ligand simulation with scorpion toxins.
1.1: prepare topology (idea)
• Observe existing molecules in the database– What chemical
groups are your molecules?
• Cut and paste sections together, borrowing charges and groupings
1.1: prepare topology (details)• Adapt from existing residues.
– PRO as base residue, referred to GLN and backbone values.
• Consider its use and clashes with existing parameters
• Right now, backbone angles and torsion parameters will be used (bad idea for cyclic molecule)
• Change atom-typings to allow modification of important atom bonds and angles
• NB: Creating new atom-types are not preferred, as it will be more difficult for future work
1.1: atom-typings• From existing ff:• neutral Ns in CHARMM
– NH2 is primary amine– NH1 is secondary amine – N is tertiary amine
• carboxyl Cs in CHARMM– CC/CD are proteins– CE1/CE2 are elementary
alkanes• Use NH1/CC typing, no need
to modify O and H.• Topology done! Return to
ICs later.
1.1: prepare parameters• Work out required bonds, angles,
dihedrals– running CHARMM can help, as it stops
with a warning when bonds and angles are missing
• Borrow known bond and angle values– PRO provides most of existing– CGenFF has same philosophy in making
bonded parameters• 2PDO is a very similar molecule in
CGenFF. Use its values for dihedrals as an initial guess.
• TIP: Establish what you *need* to parameterise now, and change only them in future edits.
1.2: Create molecule for Gaussian
• The starting geometry will affect results– We grab some experimental coordinates from, e.g.
DrugBank or PDB– These coordinates can also be created with some
chemical softwares
• NB: CHARMM uses IC tables which it will use to generate the residues when coordinates are not given– Either paste by analogy or transfer from
minimised geometry
1.2: QM Minimisation
• Gaussian outputs in formats offering more accuracy than pdb– Outputs all bond, angle
and dihedral data– file conversions may be
necessary for visualisation
1.3: Create IC table
• Each entry in the IC table (see below) lists 4 connected atoms, I, J, K and L,; for a normal IC table entry, the I-J bond length, R(IJ); the I-J-K bond angle, T(IJK); the dihedral angle I-J-K-L, PHI; the bond angle T(JKL); and the K-L bond length, R(KL) are listed. Improper dihedral angles, which are used to keep sp2 atoms planar and sp3 atoms in a tetrahedral geometry, are marked with a star. The center atom of an improper dihedral angle is marked with a star. For an improper dihedral angle entry, the I-K bond length, R(IK); the I-K-J bond angle, T(IKJ); the dihedral angle I-J-K-L , PHI; the J-K-L bond angle T(JKL); and the K-L bond length, R(KL) are listed. The atom entry "-99" indicates an undefined atom.
• CHARMM begins with a seed of three atoms, then defines the rest of the molecule with these three– Protein conventions
follow backbone N-CA-C• When you create an IC
table, remember to base your first entries on these and ‘grow’ the molecule from there
CHARMM manual entry(!)
1.3: Create IC tableAtom set Bond
I-JAngleI-J-K
Dihedral I-J-K-L
AngleJ-K-L
BondK-L
N CA C O 1.441 111.58 22.93 120.82 1.23
CB CA C O 1.545 110.65 -172.00 120.82 1.23
Atom set BondI-K
AngleI-K-J
Dihedral I-J-K-L
AngleJ-K-L
BondK-L
N C *CA HA 1.441 110.86 -122.40 109.09 1.102
N C *CA CB 1.441 110.86 113.74 110.65 1.545
1.3: IC table notes• CHARMM does not need
every possible entry in a given IC table– Only the dihedrals are
necessary• The command “IC
PARAM” will fill in bonds/angles using existing parameters
• Will save you a lot of time
• Number of entries: about 1 less than the total number of atoms.
2: Optimise charges• In CHARMM ff, protein charges are not
parameterised by QM calculations alone.• Consistent behaviour with other amino-acids
means that PCA should obey similar charges, rather than QM data per-se– Obtained charges from analogy are “good enough”
when compared with QM data– No unique chemical groups such that charge
optimisation is required
• This step is thus skipped for PCA• I will show you a mocked example
2.1: Optimise charges
• Begin with charges from Gaussian/analogy
• (can read from output)
2.1: Optimise charges
• Run water interaction simulation and also check molecule dipole– i.e. imitate H-bonding
interaction with charges
• Modify charges to fit both data
• NB: MM dipole needs to overestimate QM dipole by 30-50%, since QM data is in vacuum
3: Optimise bonds and angles • Begin by setting equilibrium values to
QM or crystal values• Compare results and modify
equilibrium values– Using an IC table by analogy gives the
wrong ring conformation to CHARMM– I constructed an IC to start PCA near the
other minimum. CHARMM then finds the equatorial conformer
• It is important to check that your conformation agrees after dihedras are fitted
3.1: Bond/angle equilibrium values
• CHARMM developers used 0.2 Å and 3° as the upper limit– Can usually do much
better– Developers seek to
use same parameters to describe many ligands, we do not need to
3.1: Bond/angle force constants• Method: Comparison of QM and MM
vibrational spectra • Analogous residues are good starting points
for force constants
• However, perfect agreement is impossible– This is due to differences between MM and QM
minima, and mixing with torsion parameters– May need to modify again during validation– A “general” agreement (about 10%) is good
enough when all parameterisation is finished
3.1: Vibrational spectra• Vibrations can be
collected into components– change in dipole
determines IR absorption• Components are defined
by motions of collective parts– bond stretches and angle
bending– rocking, scissoring,
wagging motions
• Tip: revise IR and Raman spectroscopy
dihedrals, out-of-plane
motion
N-H,O-H stretch
C-H stretch
C=O stretch
N-C stretch
3.1: Vibrational notation
• In CHARMM, you need to convert motions of atoms into bonds, angles and dihedrals (internal coordinates, IC)– Forms basis set of all the degrees of freedoms– # Vibrations = DoG = 3N-6– CHARMM then uses these to fit vibrational spetra
• Then you need to convert these ICs into vibrational modes– Read Pulay et. al. (1979)
NB: I wrote a relatively simple tutorial on the CHARMM forums to run users through the Pulay conversion for water and propane.
QMMM
• Visualising these vibrations with GaussView and others will help you identify important vibrations
3.1: Bond/angle force constants
• NB: Remember there is a distinction between fitting to QM calculations, and fitting to experimental spectra
4: Optimise dihedrals• Potential Energy Scans
involve: fixing a dihedral at discrete points, minimising the rest of the geometry, and calculating absolute energy.
• Determines conformational preferences of residues, especially important for packing
4: Optimise dihedrals
• PCA with initial dihedrals favour the opposite ring conformer (top)
• One will need to work out which parameter affect the rotations you need
• After a series of fits and rationales for given parameters, a closer agreement can be obtained (bottom)
Some rationales for the PCA case:
• Fit only dihedrals that contains re-typed atoms– This includes NH1 and CC
• Carboxyl backbone prefers equatorial over axial orientation w.r.t. ring– Fit a single 1-fold or 2-fold
dihedral to CD-N-CA-C to express this
• Keep all the 2PDO dihedrals as is, except where necessary– So, planar amide retains
2-fold only due to symmetry
– Ring-dihedrals contain only 3-fold
– Modify numbers to produce correct energy surface
• The rest is heuristic searching
• Pointers for a good fit:• Single parameters of 1-fold,2-fold, 3-fold, 6-fold.• Accuracy to about 0.2 kcal/mol• Attention to barrier heights and relative minima
positions
• Using multiple dihedral parameters and no restrictions on phase, one can achieve a fit like this.
• The energy surface is very close to QM. However, the parameters used are arguably unphysical
5: Iteration
• Now that you have your molecule, re-run all the tests and check if discrepancies have arisen during the process
• The molecular vibrational spectrum should look better– Dihedrals factor into lower vibrational modes
• Adjust as necessary.
5: Comments
• You can spend as much time as you wish tweaking the numbers, but keep in mind that:– Simulation accuracy is going to be larger than the
residual errors in your parameters– Even experimental accuracy is 1 kcal/mol
• If it is convenient, post the major results and check with the developers (be nice)
Finished product• Once you are satisfied
with the result, it’s time to test the new molecule in MD
• As force-field are always in constant development...– if you did your
parameterisation well, CHARMM developers may add it to the collection.
• Good luck!