Molecular Replacement in CCP4 Martyn Winn CCP4 group, Daresbury Laboratory.
Molecular Replacement in CCP4
Martyn Winn
CCP4 group, Daresbury Laboratory
Data analysis before MR
• Matthews coefficient (number of copies in a.s.u.)
• Native Patterson (translational NCS)
• B factor analysis
• Self RF (rotational NCS)
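As a rough illustration of the Matthews coefficient check, the sketch below uses the standard definition V_M = V_cell / (n_symops × n_copies × MW) and the usual protein approximation solvent fraction ≈ 1 − 1.23 / V_M. The function names and the example crystal are hypothetical, not taken from a CCP4 program.

```python
def matthews_coefficient(cell_volume, n_symops, mol_weight, n_copies):
    """Return V_M in cubic angstroms per dalton.

    cell_volume: unit cell volume in cubic angstroms
    n_symops:    number of symmetry operators (asymmetric units per cell)
    mol_weight:  molecular weight of one copy in daltons
    n_copies:    assumed number of copies in the asymmetric unit
    """
    return cell_volume / (n_symops * n_copies * mol_weight)


def solvent_fraction(vm):
    """Approximate solvent fraction of a protein crystal from V_M."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    # Hypothetical crystal: 400000 A^3 cell, 4 symmetry operators, 20 kDa protein.
    for n_copies in range(1, 5):
        vm = matthews_coefficient(400000.0, 4, 20000.0, n_copies)
        print(n_copies, round(vm, 2), round(solvent_fraction(vm), 2))
```

Copy numbers that give V_M far outside the range seen for most protein crystals (very roughly 1.7 to 3.5 Å³/Da) are unlikely candidates.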
Data analysis before MR
Interface to Sfcheck (currently in Validation&Deposition module)
completeness, anisotropy, Wilson B, twinning check, pseudo-translation check
Finding search models
Need a PDB file for a structurally similar protein. This usually means a homologous protein.
Either you already have one, or you search the Protein Data Bank.
Search is based on sequence alignment between target protein and proteins in PDB. Several bioinformatics tools can help here:
• OCA, MSDlite, MSDtarget - all use FASTA (www.ebi.ac.uk/msd)
• psiBLAST - iterative searching (www.ncbi.nlm.nih.gov/BLAST)
• FFAS - profile-profile alignment (ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl)
Editing search models
Don’t use a raw PDB file for Molecular Replacement unless it is very similar (e.g. same protein, different conditions, ligand, etc.). Edit it to:
• remove residues that don’t occur in the target
• remove side chain atoms that don’t occur in the target
(these assume a known alignment from model to target)
• remove uncertain regions of model (check B factors, occupancies)
• remove flexible loops
Note that we don’t add anything!! Homology modelling?
Consider use of individual domains and multimers (see MrBUMP below)
Chainsaw
Norman Stein, Daresbury Lab.
MR model preparation: chainsaw
• Molecular replacement model preparation utility that edits a PDB search model according to a sequence alignment.
• Features:
– Removes unaligned residues from the model
– Prunes non-conserved residues back to the gamma atom
– Preserves more atoms than in polyalanine model
Example of 1mr6 used as a template for 1tgx (38% sequence identity)
[Figure panels: unmodified template, Chainsaw template, polyalanine template]
Running Chainsaw:
Inputs:
• complete PDB file
• model-to-target alignment
Alignment from:
• original search tool (FASTA, psiBLAST, etc.)
• multiple alignment (set of search models, protein family, etc.)
• hand-created
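The pruning rules described for Chainsaw can be sketched roughly as below. This is a simplified illustration of the idea only, not Chainsaw's actual code: the data structures, the function name and the list of gamma atom names are assumptions made for the example.

```python
BACKBONE = {"N", "CA", "C", "O"}
# Atom names treated as "gamma" positions in this sketch (an assumption).
GAMMA = {"CG", "CG1", "CG2", "OG", "OG1", "SG"}


def prune_model(aligned_model, aligned_target, residues):
    """Prune a search model according to a pairwise alignment.

    aligned_model, aligned_target: equal-length alignment strings, '-' for gaps.
    residues: one list of atom names per model residue, in sequence order.
    Returns a list of (model_residue_index, kept_atom_names).
    """
    if len(aligned_model) != len(aligned_target):
        raise ValueError("alignment strings must have equal length")
    kept = []
    index = 0  # position in the model sequence
    for m, t in zip(aligned_model, aligned_target):
        if m == "-":
            continue  # target insertion: the model has no residue here
        atoms = residues[index]
        if t == "-":
            pass  # model residue not aligned to the target: removed
        elif m == t:
            kept.append((index, list(atoms)))  # conserved: keep every atom
        else:
            allowed = BACKBONE | {"CB"} | GAMMA
            kept.append((index, [a for a in atoms if a in allowed]))
        index += 1
    return kept


if __name__ == "__main__":
    model_residues = [
        ["N", "CA", "C", "O", "CB"],                          # Ala
        ["N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"],  # Lys
        ["N", "CA", "C", "O", "CB", "OG"],                    # Ser
    ]
    for entry in prune_model("AKS-", "AR-G", model_residues):
        print(entry)
```

In the example, the conserved alanine is kept whole, the lysine aligned to arginine is cut back to CG, and the serine with no target partner is dropped.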
Molrep
Alexei Vagin, York
http://www.ysbl.york.ac.uk/~alexei/molrep.html
Molrep: overview of functionality
Performs complete MR in single step:
• Inputs: expt. data (MTZ), search model (PDB)
• Output: positioned search model
Further functionality:
• Individual steps for more difficult cases: CRF, TF, rigid-body
• Multi-copy search: locked CRF, dyad search
• Self RF
• Phased TF, spherically-averaged phased TF
• Improve search model
• Other search models: electron density map, NMR models
• Fit model in electron density map / EM map
MR for straightforward case via GUI:
[GUI screenshot; fields to set: title, mode, MTZ file, MTZ labels, search model, then RUN IT!]
Other parameters
Low resolution cut-off
• Molrep uses soft cut-off, Boff (BOFF, COMPL, RESMIN)
High resolution cut-off
• Molrep uses soft cut-off, Badd (BADD, SIM)
|F|new = |F|input * exp(-Badd*s^2) * (1 - exp(-Boff*s^2))
Defaults estimated
High resolution limit
• Absolute cut-off (RESMAX)
• Default estimated
Radius of Patterson sphere for CRF
• Default is twice radius of gyration of search model
• Keyword RAD, under Infrequently Used Parameters in GUI
DEFAULTS ARE GOOD
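To see what the soft cut-offs do, the weighting formula can be evaluated directly. This is a minimal sketch: the function names are hypothetical, and s^2 is passed in as given because the slides do not define its exact convention.

```python
import math


def soft_cutoff_weight(s2, badd, boff):
    """Weight applied to |F| for a reflection with squared scattering variable s2.

    exp(-badd * s2) damps high-resolution terms (large s2);
    (1 - exp(-boff * s2)) damps low-resolution terms (small s2).
    """
    return math.exp(-badd * s2) * (1.0 - math.exp(-boff * s2))


def apply_soft_cutoff(f_input, s2, badd, boff):
    """Return the modified amplitude |F|new for one reflection."""
    return f_input * soft_cutoff_weight(s2, badd, boff)


if __name__ == "__main__":
    # Illustrative values only, not Molrep defaults.
    for s2 in (0.001, 0.01, 0.05, 0.1):
        print(s2, soft_cutoff_weight(s2, badd=10.0, boff=200.0))
```

The weight goes to zero at s^2 = 0 and again at large s^2, so both ends of the resolution range are faded out rather than truncated sharply.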
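Cross Rotation Function
[Annotated log output: list of top RF peaks, with more details for each peak given as polar angles, Euler angles (CCP4) and R factor]
Translation Function
[Annotated log output: list of solutions, i.e. top TF for each RF solution, given as polar angles, fractional translation, R factor and Score, followed by the contrast of the solution]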
Identification of solutions
SCORE = product of Correlation Coefficient and maximal value of Packing Function
Packing Function integrated into TF search removes solutions with overlapping molecules
CONTRAST = ratio of top score to mean score:
> 2.5: definitely solution
1.8 to 2.5: solution
1.5 to 1.8: maybe solution
1.3 to 1.5: maybe not solution, but program accepts it
< 1.3: probably not solution
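The contrast thresholds can be written as a small lookup, for example when scanning many Molrep jobs. The function name is hypothetical, and values falling exactly on a boundary are assigned to the lower band, which is an assumption since the slide does not say.

```python
def interpret_contrast(contrast):
    """Map a Molrep contrast value to the verdict listed on the slide."""
    if contrast > 2.5:
        return "definitely solution"
    if contrast > 1.8:
        return "solution"
    if contrast > 1.5:
        return "maybe solution"
    if contrast > 1.3:
        return "maybe not solution, but program accepts it"
    return "probably not solution"


if __name__ == "__main__":
    for value in (3.1, 2.0, 1.6, 1.4, 1.1):
        print(value, interpret_contrast(value))
```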
Finding more than one copy in the asu
By default, Molrep will estimate number of copies to find. Override with NMON keyword.
Program flow:
1. CRF
2. TF for first copy
3. Fix first copy
4. TF for second copy
5. Fix second copy
6. TF for third copy
...
Solving complexes
• Choose first component (largest, highest similarity)
• Solve for first component (probably need to specify NMON explicitly)
• New Molrep job:
– Model in: second component
– Fixed in: positioned first component
• Repeat for all other components
Possibility to use spherically-averaged phased TF using phases from first component
Phaser
Phaser website: http://www-structmed.cimr.cam.ac.uk/phaser/
Randy Read, Airlie McCoy, Cambridge
Performs complete MR in single step:
• Inputs: expt. data (MTZ), search model (PDB)
• Output: positioned search model
Use “MODE MR_AUTO” or “automated search” in the GUI. Steps, with a loop over models:
• anisotropy correction
• fast rotation function
• fast translation function
• packing
• refinement and phasing
More functionality ...
• All steps can be run separately
• Search over spacegroups:
– MTZ spacegroup and enantiomorph
– All spacegroups in MTZ point-group
– Selected spacegroups
• Ensemble models (see later)
• Brute RF and TF - slow and accurate
• Normal mode analysis
– Generates perturbed models
MR for straightforward case via GUI:
[GUI screenshot; fields to set: mode, MTZ file, target details, search model, specify search, then RUN IT!]
FRF
[Annotated log output: Euler angles (CCP4); top LLG and Z-scores for FRF]
FTF
[Annotated log output: FRF solution number; fractional translation; top LLG and Z-scores for FTF]
Packing
Phaser does packing check after FTF
• Clashes = C atoms closer than 2Å
• Default number of clashes = 0
• Think about increasing to 2 or 5
Solution files:
• .sol file produced at end of job
• Contains summary of all solutions
• Each solution contains rotations and usually translations - 3DIM vs 6DIM
• One line per model located
• .sol file can be read back into Phaser in later jobs
Z-score        Have I solved it?
less than 5    no
5 - 6          unlikely
6 - 7          possibly
7 - 8          probably
more than 8    definitely
RFZ = RF Z-score
TFZ = TF Z-score
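The Z-score table can be turned into a small helper for scanning Phaser results. This is a sketch with a hypothetical function name; the table leaves the boundary values ambiguous, so here a boundary value goes to the higher band, except that exactly 8 counts as "probably" because "definitely" requires more than 8.

```python
# Upper bounds (exclusive) for the first three bands of the table.
Z_SCORE_BANDS = [
    (5.0, "no"),
    (6.0, "unlikely"),
    (7.0, "possibly"),
]


def interpret_z_score(z):
    """Return the "Have I solved it?" verdict for a Phaser Z-score."""
    for upper, verdict in Z_SCORE_BANDS:
        if z < upper:
            return verdict
    return "probably" if z <= 8.0 else "definitely"


if __name__ == "__main__":
    for z in (4.2, 5.5, 6.5, 7.5, 9.1):
        print(z, interpret_z_score(z))
```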
Ensemble models
Phaser refers to search models as “ensembles”
Often, ensemble contains single model, as in traditional MR
But Phaser can use an ensemble of more than one model, which may work better than any single model
Models in an ensemble must be superposed prior to use in Phaser - use e.g. Superpose in CCP4
N.B. Phaser will complain if:
– MW of models in ensemble are too different
– RMS between models is too large
(In Molrep, construct ensemble as pseudo-NMR PDB file)
Finding more than one copy in the asu
• Specify > 1 in Composition of the asymmetric unit (keyword COMPOSITION ... NUMBER)
• Specify > 1 in Number of copies to search for (keyword SEARCH ... NUMBER)
Phaser will issue warnings if these numbers are wrong.
Program flow:
1. CRF
2. TF for first copy
3. Fix first copy (possibly multiple sets)
4. CRF for second copy
5. TF for second copy
6. Fix second copy (possibly multiple sets)
...
Complexes
E.g. beta-blip example in Phaser tutorial: http://www-structmed.cimr.cam.ac.uk/phaser/tutorial/Phaser_MR_tute.html
As before, but:
• Define > 1 type of component (Composition of the asymmetric unit: Define another component)
• Define > 1 ensemble (Define ensembles: Add ensemble)
• Specify all searches (Search details: Add another search)
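For command-line use, the same settings go into a Phaser keyword script. The sketch below assembles one as text. MODE MR_AUTO, COMPOSITION ... NUMBER and SEARCH ... NUMBER are named on the slides; the HKLIN, LABIN, ENSEMBLE ... PDBFILE ... IDENTITY and ROOT spellings are recalled from Phaser documentation and should be checked against the installed version. All file names, column labels and molecular weights are hypothetical.

```python
def phaser_mr_auto_script(mtz, f_label, sigf_label, components, root):
    """Build a Phaser MR_AUTO keyword script for one or more components.

    components: list of dicts with keys
        name, pdb, identity, mw, copies_in_asu, copies_to_search
    """
    lines = [
        "MODE MR_AUTO",
        f"HKLIN {mtz}",
        f"LABIN F={f_label} SIGF={sigf_label}",
    ]
    # One ensemble per search model.
    for c in components:
        lines.append(f"ENSEMBLE {c['name']} PDBFILE {c['pdb']} IDENTITY {c['identity']}")
    # Composition of the asymmetric unit, one line per component type.
    for c in components:
        lines.append(f"COMPOSITION PROTEIN MW {c['mw']} NUMBER {c['copies_in_asu']}")
    # One search per ensemble.
    for c in components:
        lines.append(f"SEARCH ENSEMBLE {c['name']} NUMBER {c['copies_to_search']}")
    lines.append(f"ROOT {root}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-component complex.
    complex_components = [
        {"name": "beta", "pdb": "beta.pdb", "identity": 100,
         "mw": 29000, "copies_in_asu": 1, "copies_to_search": 1},
        {"name": "blip", "pdb": "blip.pdb", "identity": 100,
         "mw": 18000, "copies_in_asu": 1, "copies_to_search": 1},
    ]
    print(phaser_mr_auto_script("complex.mtz", "F", "SIGF",
                                complex_components, "complex_auto"))
```

The composition and search counts are kept as separate fields because the number of copies in the asymmetric unit and the number searched for in a given job need not be the same.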
MrBUMP
Ronan Keegan, Martyn Winn, Daresbury Lab.
The aim of MrBUMP
• An automation framework for Molecular Replacement.
• Particular emphasis on generating a variety of search models.
• Can be used to generate models only.
• In favourable cases, gives “one-button” solution
• In unfavourable cases, will suggest likely search models for manual investigation (lead generation)
• Wraps Phaser and/or Molrep.
• Also uses a variety of helper applications (e.g. Chainsaw) and bioinformatics tools (e.g. Fasta, Mafft)
• Uses on-line databases (e.g. PDB, Scop)
The Pipeline
1. Target MTZ & Sequence
2. Target Details
3. Template Search
4. Model Preparation
5. Molecular Replacement & Refinement
6. Check scores and exit or select the next model
Search for homologous proteins
FASTA search of PDB
• Sequence based search using sequence of target structure.
• Can be run locally if user has fasta34 program installed or remotely using the OCA web-based service hosted by the EBI.
• All of the resulting PDB id codes are added to a list
• These structures are called model templates
Search for additional similar structures
• Additional structure-based search (optional)
– Top hit from the FASTA search is used as the template structure for a secondary structure based search.
– Uses the SSM webservice provided by the EBI (a.k.a. MSDfold)
– Any new structures found are added to the list.
– Provides structural variation, not based on direct sequence similarity to target
• Manual addition
– Can add additional PDB id codes to the list, e.g. from FFAS or psiBLAST searches
– Can add local PDB files
Multiple Alignment
• After the set of PDB ids is collected in the FASTA and SSM searches, their coordinate-based sequences are collected and put through a multiple alignment with the target sequence
• Aims:
– Score template structures in a consistent manner, in order to prioritise them for subsequent steps
– Extract pairwise alignment between template and target for use in Chainsaw step. Multiple alignment should give a better set of alignments than the original pair-wise FASTA alignments
Multiple Alignment
[Figure labels: target, model templates, pairwise alignment. Displayed with Jalview 2.08.1 (Barton group, Dundee)]
Currently support ClustalW or MAFFT for multiple alignment
Template Model Scoring
• Sequence identity:
– Ungapped sequence identity i.e. sequence identity of aligned target residues
• Alignment quality:
– Dependent on the alignment length, the number of gaps created in the template alignment and the extent of each of these gaps.
– The penalties given for gaps and the size of the gaps are biased so that alignments that preserve domains of the structure rather than spreading the aligned residues out score higher.
• Alignment scoring:
score = sequence identity × alignment quality
The top scoring models are then used for further processing
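The scoring formula can be illustrated as below. The ungapped identity is computed over alignment columns where both sequences have a residue; the alignment quality term is taken as an input between 0 and 1, because the slides describe what it depends on but not its formula. Function names are hypothetical and this is not MrBUMP's own code.

```python
def ungapped_identity(aligned_target, aligned_template):
    """Fraction of identical residues among columns with no gap ('-')."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("alignment strings must have equal length")
    pairs = [
        (t, m)
        for t, m in zip(aligned_target.upper(), aligned_template.upper())
        if t != "-" and m != "-"
    ]
    if not pairs:
        return 0.0
    return sum(1 for t, m in pairs if t == m) / len(pairs)


def template_score(sequence_identity, alignment_quality):
    """score = sequence identity x alignment quality."""
    return sequence_identity * alignment_quality


if __name__ == "__main__":
    identity = ungapped_identity("ACDKEF", "ACE-EF")
    # 0.5 is an arbitrary placeholder for the alignment quality term.
    print(identity, template_score(identity, 0.5))
```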
Domains
• Suitable templates for target domains may exist in isolation in PDB, or in combination with dissimilar domains
• In case of relative domain motion, may want to solve domains separately
Domains
• Domains search:
– Top scoring templates from multiple alignment are tested to see if they contain any domains.
– Uses the SCOP database. This only lists domains that appear more than once in the PDB.
– The database is scanned to see if domains exist for each of the PDBs in the list of templates
– Domains are then extracted from the parent PDB structure file and added to the list of template models as additional search models for MR.
Multimers
• Multimer search:
– Search for quaternary structures that may be used as search models.
– Better signal-to-noise ratio than monomer, if assembly is correct for the target.
– Multimeric structures based on top templates are retrieved using the PQS service at the EBI, and added to the list of search models
– PQS will soon be replaced by the use of the PISA service at the EBI (Eugene Krissinel)
1n5a SPLIT-ASU into 4 Oligomeric files of type TRIMERIC
1n5b SPLIT-ASU into 2 Oligomeric files of type DIMERIC
1n5c SYMMETRY-COMPLEX Oligomeric file of type DIMERIC
1n5d SYMMETRY-COMPLEX Oligomeric file of type DIMERIC
Search Model Preparation
Search models prepared in four ways:
1. PDBclip
– original PDB with waters removed, hydrogens removed, most probable conformations for side chains selected and chain IDs added if missing.
2. Molrep
– Molrep contains a model preparation function which will align the template sequence with the target sequence and prune the non-conserved side chains accordingly.
3. Chainsaw
– Can be given any alignment between the target and template sequences.
– Non-conserved residues are pruned back to the gamma atom.
4. Polyalanine
– Created by excluding all of the side chain atoms beyond the CB atom using the Pdbset program
Also create an ensemble model for Phaser based on top 5 models
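The polyalanine preparation amounts to a filter on atom names. The sketch below applies it to fixed-column PDB ATOM records (atom name in columns 13 to 16). It only illustrates the idea and is not how Pdbset is implemented; residue names are left unchanged and non-ATOM records pass through untouched, both of which are simplifying assumptions.

```python
# Atoms kept in a polyalanine model: backbone plus CB.
POLYALA_ATOMS = {"N", "CA", "C", "O", "CB"}


def to_polyalanine(pdb_lines):
    """Drop side chain atoms beyond CB from ATOM records."""
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            atom_name = line[12:16].strip()  # PDB columns 13-16
            if atom_name in POLYALA_ATOMS:
                kept.append(line)
        else:
            kept.append(line)  # headers, HETATM, TER, END etc. are left as-is
    return kept


if __name__ == "__main__":
    names = ["N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"]
    # Coordinates are omitted in this toy record; only the leading columns matter here.
    lysine = [f"ATOM  {i:5d}  {name:<3} LYS A   1" for i, name in enumerate(names, 1)]
    for record in to_polyalanine(lysine + ["END"]):
        print(record)
```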
Molecular Replacement and Refinement
• The search models can be processed with Molrep or Phaser or both.
• The resulting models from molecular replacement are passed to Refmac for restrained refinement.
• The change in the Rfree value during refinement is used as rough estimate of how good the resulting model is:
– “success”: final Rfree < 0.35 or final Rfree < 0.5 and dropped by 20%
– “marginal”: final Rfree < 0.48 or final Rfree < 0.52 and dropped by 5%
– “failure”: otherwise
• MR scores and un-refined models available for later inspection.
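The Rfree criteria for “success”, “marginal” and “failure” can be sketched as a short function. The slide does not say whether “dropped by 20%” means a relative or an absolute change; this sketch assumes a relative drop from the initial Rfree, and the function name is hypothetical.

```python
def classify_refinement(initial_rfree, final_rfree):
    """Classify an MR solution from its Rfree before and after refinement."""
    # Relative drop in Rfree during refinement (assumed interpretation).
    drop = (initial_rfree - final_rfree) / initial_rfree
    if final_rfree < 0.35 or (final_rfree < 0.5 and drop >= 0.20):
        return "success"
    if final_rfree < 0.48 or (final_rfree < 0.52 and drop >= 0.05):
        return "marginal"
    return "failure"


if __name__ == "__main__":
    for start, end in [(0.55, 0.33), (0.60, 0.47), (0.50, 0.47), (0.55, 0.54)]:
        print(start, end, classify_refinement(start, end))
```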
MrBUMP on compute clusters
• MrBUMP can take advantage of a compute cluster to farm out the Molecular Replacement jobs.
• Currently Sun Grid Engine enabled clusters are supported, but support will be added for LSF, Condor and other queuing systems if there is enough demand.
• All nodes terminate when one finds a solution
Pre-release version of MrBUMP
• Pre-release made available in Jan 06
• Simple installation
• Currently runs on Linux and OSX.
• Windows version almost ready.
• Comes with CCP4 GUI.
• Can also be run from the command line with keyword input
• First citation in Obiero et al., Acta Cryst. (2006). F62, 757-760
• Regular updates (currently version 0.3.2)
http://www.ccp4.ac.uk/MrBUMP
A few observations ...
• In difficult cases, success in MrBUMP may depend on particular template, chain and model preparation method
• Nevertheless, may get several putative solutions
• Ease of subsequent model re-building, model completion may depend on choice of solution
• First solution or check everything?
• Expectation that quick solution required - in fact, most users seem happy to let MrBUMP run for long time (hours, days)
• Worth checking “failed” solutions!