12th April 2007
What’s new and Automation developments in CCP4
Ronan Keegan
CCP4, STFC Daresbury Laboratory, U.K.

Quick Overview
• Brief introduction to CCP4
• New programs and features in CCP4
• Upcoming features in version 6.1
• Automation projects
– MrBUMP – automated Molecular Replacement
– Other automation projects

What is CCP4?
• Collaborative Computational Project Number 4
• Set up in the late 1970s to support collaboration between researchers working on protein crystallography software in the UK and to assemble a comprehensive collection of software to satisfy the computational requirements of the relevant UK groups.
• Many functions:
– Support and distribution of the CCP4 suite of programs for PX
– Education – workshops, university visits, summer schools, study weekend
– Maintaining the CCP4 bulletin board and website
• Academic users can use the suite for free. Licence fee for commercial users.

CCP4 Organisational Structure
[Organisation chart. Labels: Steering Committees, STAB, Exec, WG 1, WG 2, Project Leader, DL CCP4 Group (core developments & activities), Funded Developers, Associated Developers, Occasional Contributors]
• Core projects, e.g. CCP4mg, mmdb, PIMS, Automation, BIOXHIT …
• Major programs, e.g. Mosflm, Refmac, Scala, Phaser, Clipper, Coot …
• Lots of other useful software, e.g. PDBExtract

Downloads by Month
[Chart: downloads per month, Apr to Mar. Axes: Month vs. Downloads (0 to 3500). Series: Source, Windows, Linux, OS X, IRIX, OSF1, SunOS, Total]

Downloads by type
• Source: 2474
• Windows: 9361
• Linux: 7168
• OS X: 1512
• IRIX: 199
• OSF: 9
• SunOS: 5

New programs and features in CCP4
• New Packages in CCP4 6.0:
– CCP4mg – Molecular Graphics
– Coot – graphical toolkit for model building, model completion and validation
– Phaser – molecular replacement (version 1.3.3)
– Chainsaw – MR model preparation
– Pirate: statistical phase improvement
– Superpose: secondary structure alignment
– BP3: heavy atom phasing and refinement
– Chooch: anomalous scattering factors from raw fluorescence spectra
– New features in CCP4i

CCP4mg
• The aim is to provide a molecular graphics program that is fully compatible with the CCP4 environment and programs.
• Features:
– Displays molecules with simple, flexible selection tools and a variety of display styles and colouring schemes.
– A simple graphical interface to select the atoms to display, the colour scheme and the display style.
– Surfaces and electrostatic potential calculations
– Displays maps with a 'continuous crystal' and real time update of contouring level.
• Superpose two or more protein structures automatically. Also structure analysis: secondary structure, solvent accessible surface area, hydrogen bonds, close contacts.
• Writes 'snapshot' images and creates movies. Also creates POV-Ray input files and PostScript files.
• Runs on Linux and Windows (2000, NT and XP) and Mac OSX.

• Normal mode analysis
• CCP4mg can currently perform approximate normal mode calculations using two elastic network models.
– Only consider one atom per residue (CA)
– Assume all force constants to be the same
– Gaussian Network and Anisotropic Network methods employed
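
As a rough illustration of the elastic network assumptions listed above (one CA atom per residue, identical force constants), the connectivity (Kirchhoff) matrix of a Gaussian Network Model can be sketched as follows. This is not CCP4mg's code: the function name and the 7.0 Å contact cutoff are assumptions chosen for the example, and the eigen-decomposition that would give the actual modes is left out.

```python
import math


def kirchhoff_matrix(ca_coords, cutoff=7.0):
    """Build a Gaussian Network Model connectivity matrix.

    ca_coords: list of (x, y, z) CA positions, one per residue.
    cutoff:    contact distance in Angstrom (assumed value, not from the slides).
    All springs share the same force constant, so each contact contributes -1
    off the diagonal and +1 to the diagonal entries of both residues.
    """
    n = len(ca_coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Three CA atoms on a line, 5 Angstrom apart: only neighbours are in contact.
toy_chain = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_matrix = kirchhoff_matrix(toy_chain)
```

The low-frequency eigenvectors of this matrix would then approximate the slow collective motions of the chain.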

Coot
• Coot is for model building, model completion and validation.
• It will display maps and models and allows model manipulations such as idealization, real space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations, rotamers, and Ramachandran plots.
• File formats handled: PDB, mmCIF, MTZ files, Phases (.phs) and others.
• Most of its functions are also accessible for scripting.
http://www.ysbl.york.ac.uk/~emsley/coot/

Phaser
• Phaser is a program for phasing macromolecular crystal structures with maximum likelihood methods. Version 1.3.3 in CCP4 6.0.2 supports the molecular replacement method. The next version will include the experimental phasing method.
• Features:
– brute-force rotation and translation searches
– FFT-based fast rotation and translation searches
– correction for anisotropic diffraction
– search for multiple molecules in multiple space groups
http://www-structmed.cimr.cam.ac.uk/phaser/

Pirate & Superpose
• Pirate:
– Pirate is a new statistical phase improvement program.
– 'pirate' performs statistical phase improvement by classifying the electron density map by sparseness/denseness and order/disorder, with the aim of obtaining superior results to conventional solvent mask based methods without requiring knowledge of the solvent content.
– Currently available for Linux and Mac OSX.
• Superpose:
– superpose aligns two structures by matching graphs built on the protein's secondary-structure elements, followed by an iterative three-dimensional alignment of protein backbone C-alpha atoms.

BP3
• BP3 is a new program for obtaining phase information from an S/MIR(AS) and/or S/MAD experiment(s) by multivariate likelihood estimation.
• It will refine heavy and/or anomalously scattering atomic parameters along with error parameters to generate phase information.

Chooch
• Program to determine what wavelengths to use to do your MAD experiment.
• Determines values of anomalous scattering factors from raw fluorescence spectra and pinpoints the position of the f'' maximum and the f' minimum values.
• Command line driven with all options controlled by switches.
• Optional PGPLOT visual output.
• Publication quality PS output generated on request.
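
The final selection step described above, locating the f'' maximum and the f' minimum, can be sketched in a few lines. This is only an illustration and not Chooch's code: it assumes the f' and f'' curves have already been derived from the fluorescence scan, the function name is hypothetical, and the numbers in the toy scan are made up for the example.

```python
def pick_mad_energies(curve):
    """Pick MAD energies from tabulated anomalous scattering factors.

    curve: list of (energy_eV, f_prime, f_double_prime) tuples.
    Returns the energy of the f'' maximum ("peak") and of the
    f' minimum ("inflection").
    """
    points = list(curve)
    if not points:
        raise ValueError("empty scattering-factor curve")
    peak = max(points, key=lambda p: p[2])        # largest f''
    inflection = min(points, key=lambda p: p[1])  # most negative f'
    return {"peak": peak[0], "inflection": inflection[0]}


# Made-up values around an absorption edge, for illustration only.
toy_scan = [
    (12650.0, -6.0, 1.0),
    (12658.0, -10.5, 3.5),
    (12660.0, -8.0, 5.9),
    (12670.0, -5.0, 4.0),
]
chosen = pick_mad_energies(toy_scan)
```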

Chainsaw
• Molecular replacement model preparation utility that mutates a template PDB file according to a sequence alignment.
• Features:
– examines the sequence alignment between target and template and modifies the template PDB file by pruning non-conserved residues back to the gamma atom
– more atoms are preserved than in a polyalanine model, but parts of the model which are unlikely to be present in the crystal structure and thus would only degrade the signal are pruned.
1mr6 used as a template for 1tgx (38% sequence identity). From left to right: unmodified template, chainsaw template, polyalanine template.
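
The pruning rule described above can be sketched per residue as below, with the polyalanine rule alongside for comparison. This is a simplified illustration, not Chainsaw's implementation: the atom-name sets, the function names and the handling of alignment gaps (template residue dropped) are assumptions made for the example, and "back to the gamma atom" is read here as keeping atoms up to and including the gamma position.

```python
BACKBONE = {"N", "CA", "C", "O"}
# Atoms kept when pruning to the gamma position (assumed naming, PDB style).
UP_TO_GAMMA = BACKBONE | {"CB", "CG", "CG1", "CG2", "OG", "OG1", "SG"}
# Atoms kept in a polyalanine model.
UP_TO_BETA = BACKBONE | {"CB"}


def chainsaw_like_prune(atom_names, template_aa, target_aa):
    """Return the atoms to keep for one aligned template residue."""
    if target_aa == "-":
        return []                      # assumption: no target residue, drop it
    if template_aa == target_aa:
        return list(atom_names)        # conserved: keep the full side chain
    return [a for a in atom_names if a in UP_TO_GAMMA]


def polyalanine_prune(atom_names):
    """Return the atoms kept in a polyalanine version of the residue."""
    return [a for a in atom_names if a in UP_TO_BETA]


lysine = ["N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"]
kept_conserved = chainsaw_like_prune(lysine, "K", "K")
kept_mutated = chainsaw_like_prune(lysine, "K", "R")
kept_polyala = polyalanine_prune(lysine)
```

For a conserved lysine all nine atoms survive, for a lysine aligned to an arginine six survive, and the polyalanine rule keeps five, which matches the point that more atoms are preserved than in a polyalanine model.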

New features in CCP4i
• Interfaces for new programs:
– Phaser
– Pirate/Clipper
– BP3
– Chainsaw
– CCP4mg launcher
– CRANK
– Shelx_C/D/E
• Database search and sort
• Project shortcuts
• Customise job database view
• Help shortcuts

CCP4 6.1 and beyond
• Version 6.1 in 6-12 months' time
• New programs for 6.1
– Rapper – protein modelling, automated conformer generation
– Rampage – generate Ramachandran plots for structure validation
– Buccaneer – chain tracing
– Pointless – determine space/Laue group from unmerged data
– Oasis
– Crunch2
– Afro
– Clipper2 libraries
– Automation scripts
  • MrBUMP
  • XIA2
iMosflmiMosflm
• New improved mosflm graphical user interface.
• More user friendly than the old one.

Updates to popular CCP4 programs
• Acorn
– ab initio procedure for the determination of protein structure using atomic resolution data or artificially extended data to atomic resolution, and for finding sub-structures from anomalous or isomorphous differences.
• Truncate (Uboat)
– New improved version written in C++.
– In the longer term there will be new tests for twinning, anisotropy corrections and the ability to handle unmerged data (useful if radiation damage occurs), but these won't be in the initial release.
• Phaser 2.0/2.1
– Will include experimental phasing
• Refmac 5.3/6.0
– The latest version of Refmac; it will supersede version 5.2.x in the CCP4 6.0.x series.

CCP4 6.1 and beyond
• Plans for CCP4i
– CCP4i Classic reworked
– CCP4i Auto – automation scripts
• CCP4i database
– New database handler
– Allow for greater flexibility and control of jobs
– Job/DB viewer program built on top of the DB (more about this later)

CCP4 6.1 and beyond
• Long term plans
– Better integration between CCP4i, CCP4mg and Coot
– More intuitive interfaces to programs
– More automation

CCP4 Automation
• Reasons
– Higher throughput at synchrotron beamlines
– Crystallography is increasingly becoming a tool for researchers in other fields. Not all have the time to learn how to use the complex set of programs for solving structures. Users prefer to concentrate on the biology.

MrBUMP - Molecular Replacement with Bulk Model Preparation

Aim of MrBUMP
• Automated framework for Molecular Replacement
• Particular emphasis on generating a variety of search models
• Wraps Phaser, Molrep and Acorn
• Uses a variety of helper applications (e.g. Chainsaw) and bioinformatics tools (e.g. FASTA, Mafft)
• Uses on-line databases (e.g. PDB, Scop)
• Can make use of computational cluster resources to speed up the processing
• In favourable cases, gives “one-button” solution
• In unfavourable cases, suggests likely search models for manual investigation

Pipeline
[Flow diagram: Target MTZ & Sequence → Target Details → Template Search → Model Preparation → Molecular Replacement & Refinement → Check scores and exit or select the next model]
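
The last stage of the pipeline, "check scores and exit or select the next model", amounts to a loop over the prepared search models. A minimal sketch is given below. It is not MrBUMP's code: `run_pipeline`, `solve` and `is_solved` are hypothetical names, and the stand-in scores are made up for the example.

```python
def run_pipeline(search_models, solve, is_solved):
    """Try search models in order and stop at the first acceptable one.

    search_models: prepared models, best candidates first.
    solve:         callable running MR + refinement, returning a score.
    is_solved:     callable deciding whether a score is good enough to stop.
    Returns (winning_model_or_None, list of (model, score) tried so far).
    """
    tried = []
    for model in search_models:
        score = solve(model)
        tried.append((model, score))
        if is_solved(score):
            return model, tried   # exit on the first acceptable solution
    return None, tried            # nothing passed; report all scores


# Stand-in final Rfree values per model, for illustration only.
fake_rfree = {"model_A": 0.55, "model_B": 0.33, "model_C": 0.50}
winner, history = run_pipeline(
    ["model_A", "model_B", "model_C"],
    solve=fake_rfree.get,
    is_solved=lambda rfree: rfree < 0.35,
)
```

Returning the full history even on failure fits the aim stated earlier: in unfavourable cases the scores still suggest which search models deserve manual investigation.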

Template Search
• Sequence based search (FASTA)
• Secondary structure based search (SSM)
• Domain search (SCOP)
• Identification of possible multimers (PQS & PISA)
• Users can also enter their own templates by ID or from locally held files.

Model Preparation
• Search models can be prepared for MR in several ways
– Chainsaw – non-conserved residues are pruned (sequence provided)
– Molrep – pruning of non-conserved side-chains (internal sequence alignment)
– Polyalanine – all side chain atoms are pruned beyond the CB atom
– PDBclip – models are not modified
• An ensemble of the best models is also created for Phaser

Molecular Replacement & Refinement
• For each search model, MR is done with Molrep or Phaser or both.
• MR programs run mostly with defaults
• MrBUMP provides LABIN columns, MW of target, sequence identity of search model, number of copies to search for, number of clashes tolerated
• Allow Molrep / Phaser to set resolution limits and weights
• After MR, models are passed to Refmac for restrained refinement
• The refinement result is then classified:
– “success”: final Rfree < 0.35, or final Rfree < 0.5 and dropped by 20%
– “marginal”: final Rfree < 0.48, or final Rfree < 0.52 and dropped by 5%
– “failure”: otherwise
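
The Rfree thresholds on this slide can be written as a small decision function. This is a sketch rather than MrBUMP's code: the function name is hypothetical, and "dropped by 20%" is interpreted here as a relative drop from the initial to the final Rfree, which the slide does not spell out.

```python
def classify_refinement(initial_rfree, final_rfree):
    """Classify an MR + refinement result from its Rfree values.

    Assumption: "dropped by N%" means (initial - final) / initial >= N/100.
    """
    if initial_rfree <= 0:
        raise ValueError("initial Rfree must be positive")
    drop = (initial_rfree - final_rfree) / initial_rfree
    if final_rfree < 0.35 or (final_rfree < 0.5 and drop >= 0.20):
        return "success"
    if final_rfree < 0.48 or (final_rfree < 0.52 and drop >= 0.05):
        return "marginal"
    return "failure"


examples = {
    (0.40, 0.33): classify_refinement(0.40, 0.33),  # below 0.35 outright
    (0.60, 0.47): classify_refinement(0.60, 0.47),  # below 0.5 with a >20% drop
    (0.50, 0.47): classify_refinement(0.50, 0.47),  # below 0.48, small drop
    (0.55, 0.51): classify_refinement(0.55, 0.51),  # below 0.52 with a >5% drop
    (0.53, 0.53): classify_refinement(0.53, 0.53),  # no improvement
}
```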

MrBUMP and cluster computing
• MrBUMP is usually run on a desktop from ccp4i or the command line
• However, MrBUMP can take advantage of a compute cluster to farm out the Molecular Replacement jobs.
• Currently Sun Grid Engine enabled clusters are supported but support will be added for other types of queuing system (e.g. LSF, Condor) if there is enough demand.
• Job control: All nodes terminate when one finds a solution
• Current (known) cluster installations at Daresbury, Diamond and University of Dundee.
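
The job-control idea above (all jobs stop once one finds a solution) can be illustrated on a single machine with a thread pool and a shared stop flag. This sketch does not use Sun Grid Engine and is not how MrBUMP talks to a cluster; `farm_out` and `run_mr` are hypothetical names, and a real MR job would need to poll the flag itself to stop early.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def farm_out(models, run_mr, max_workers=4):
    """Run MR jobs in parallel; return the first model that solves, else None.

    run_mr(model, stop) must return True if the model gives a solution.
    It receives the shared stop event so long jobs can abandon work early.
    """
    stop = threading.Event()

    def task(model):
        if stop.is_set():              # a solution already exists: skip
            return model, False
        return model, run_mr(model, stop)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task, m) for m in models]
        for future in as_completed(futures):
            model, solved = future.result()
            if solved:
                stop.set()             # tell running jobs to stop
                for other in futures:
                    other.cancel()     # drop jobs that have not started
                return model
    return None


# Stand-in job: only "model_B" counts as a solution.
solved_by = farm_out(
    ["model_A", "model_B", "model_C"],
    run_mr=lambda model, stop: model == "model_B",
)
```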

MrBUMP on the Grid
• Currently under development
• Large parameter space searches. Submit many jobs to U.K. computational grid resources using recently developed e-Science tools (MCS, AgentX, Rcommands, SRB)
• Goals:
– To improve the performance/success rate of the method
– Possibly extract useful biological information
– Make grid-enabled version available to users

MrBUMP Output
• Currently produces a long log file listing search results, model preparation steps, summaries from each MR and refinement job and relevant references for programs used.
• Not ideal: there’s a lot of information to trawl through. A summary of results is now provided at the end of the log file.
• Future versions will provide results in marked-up web page format for more clarity.

MrBUMP Output – CCP4i dbviewer

MrBUMP pre-release
• Beta version first released in Jan ’06 (current version is 0.3.3)
• Currently supported on Linux and Mac OSX, Windows version will be available when included in suite.
• Will be included in next release of CCP4 (version 6.1)
• MrBUMP paper to be published in Acta Cryst. D in April ‘07
• First citations in Obiero et al., Acta Cryst. (2006). F62, 757-760; El Omari et al., Acta Cryst. (2006). F62, 949-953
http://www.ccp4.ac.uk/MrBUMP/

New features
• Run Acorn after refinement for phase improvement (high resolution data)
• Support for searching in enantiomorphic spacegroups.
• Users can now specify template models by PDB ID or add local PDB files.
• “Generate models only” option.
• XML Output.
• Additional multiple alignment programs supported – Tcoffee and Probcons.

Future versions
• Improvements to multimeric search models (using PISA)
• Supplement multiple alignment with additional sequences and/or structural information
• Model completion and/or re-building
• Target complexes.
• Improved output presentation

Conclusions
• Test cases and the examples demonstrated the utility of trying a range of search models, a protocol that can only be attempted adequately by automation.
• MrBUMP is not meant to compete with careful analysis of the data and model by an experienced crystallographer. However, it may succeed in difficult cases by finding a combination of models and protocols that would not otherwise have been tried.
• In more straightforward cases the advantage is simply one of convenience.

CCP4 Automation - BALBES
• Authors: Garib Murshudov, Alexei Vagin, Fei Long (YSBL)
• Built around Molrep MR and model preparation, Refmac and Sfcheck
• Model preparation based on using a custom database derived from the PDB database
• Best model is derived from the database and used in Molrep.
• Protocols
– Simple molecular replacement
– Domains iterated with refinement
– Use of tertiary structure if available
– Completion of MR using phased MR and refinement
• Released early 2007

XIA2 Automated Data Reduction
• xia2 is a new automated data reduction system designed to work from raw diffraction data and a little metadata, and produce usefully reduced data in a form suitable for immediately starting phasing and structure solution.
• Pre-release version is currently available.
http://www.ccp4.ac.uk/xia/

XIA2
BEGIN PROJECT TM1553
BEGIN CRYSTAL 13185

BEGIN AA_SEQUENCE
MHKMWPSDSNDHRVTRRNVIIFSSLLLGSLAILLALLLIRTKDQYYELRDFALGTSVRIVVSSQKINPRTIAEAILEDMKRITYKFSFTDERSVVKKINDHPNEWVEVDEETYSLIKAACAFAELTDGAFDPTVGRLLELWGFTGNYENLRVPSREEIEEALKHTGYKNVLFDDKNMRVMVKNGVKIDLGGIAKGYALDRARQIALSFDENATGFVEAGGDVRIIGPKFGKYPWVIGVKDPRGDDVIDYIYLKSGAVATSGDYERYFVVDGVRYHHILDPSTGYPARGVWSVTIIAEDATTADALSTAGFVMAGKDWRKVVLDFPNMGAHLLIVLEGGAIERSETFKLFERE
END AA_SEQUENCE

BEGIN HA_INFO
ATOM SE
NUMBER_PER_MONOMER 5
END HA_INFO

BEGIN WAVELENGTH INFL
WAVELENGTH 0.97950
F' -12.1
F'' 5.8
END WAVELENGTH INFL

BEGIN WAVELENGTH LREM
WAVELENGTH 1.00000
F' -2.5
F'' 0.5
END WAVELENGTH LREM

BEGIN SWEEP INFL
WAVELENGTH INFL
BEAM 109.0 105.0
IMAGE 13185_2_E1_001.img
DIRECTORY /data/jcsg/als1/8.2.1/20050121/collection/TM1553/13185/
END SWEEP

BEGIN SWEEP LREM
WAVELENGTH LREM
BEAM 109.0 105.0
IMAGE 13185_2_E2_001.img
DIRECTORY /data/jcsg/als1/8.2.1/20050121/collection/TM1553/13185/
END SWEEP

END CRYSTAL 13185
END PROJECT TM1553

• Requires image data + input specification script with target and experiment data:
– Sequence
– Number of heavy atoms
– Wavelength
– Location of image data
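
To show how the nested BEGIN/END layout of the input script above hangs together, here is a small reader that turns such text into a tree. It is a hypothetical helper written for this example, not xia2's own parser, and it assumes one keyword per line with every END matching the block kind of the most recent open BEGIN.

```python
def parse_blocks(text):
    """Parse BEGIN/END-delimited text into a tree of dicts.

    Each node has: kind (e.g. "WAVELENGTH"), name (e.g. "INFL"),
    fields (list of (keyword, value) pairs) and children (nested blocks).
    """
    root = {"kind": "ROOT", "name": "", "fields": [], "children": []}
    stack = [root]
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue                                   # skip blank lines
        if words[0] in ("BEGIN", "END") and len(words) < 2:
            raise ValueError("missing block kind: " + raw)
        if words[0] == "BEGIN":
            node = {"kind": words[1], "name": " ".join(words[2:]),
                    "fields": [], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif words[0] == "END":
            if len(stack) == 1 or stack[-1]["kind"] != words[1]:
                raise ValueError("unexpected line: " + raw)
            stack.pop()
        else:
            stack[-1]["fields"].append((words[0], " ".join(words[1:])))
    if len(stack) != 1:
        raise ValueError("unclosed block: " + stack[-1]["kind"])
    return root


SAMPLE = """\
BEGIN PROJECT TM1553
BEGIN CRYSTAL 13185
BEGIN WAVELENGTH INFL
WAVELENGTH 0.97950
F' -12.1
F'' 5.8
END WAVELENGTH INFL
END CRYSTAL 13185
END PROJECT TM1553
"""

tree = parse_blocks(SAMPLE)
project = tree["children"][0]
wavelength = project["children"][0]["children"][0]
wavelength_fields = dict(wavelength["fields"])
```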
Through your favourite phasing pipeline…

CCP4 Automation - HAPPy – Heavy Atom Phasing in Python
• What it is:
– Automated experimental phasing pipeline
– Replaces and expands on the capabilities of the CHART package
• What it will do:
– Take integrated and merged experimental data amplitudes (post-TRUNCATE), de-twinned and consistently indexed.
– Determine the heavy atom structure and phase probabilities.
– Optimize the density map to give an interpretable map.
– Build structure.
• First release will handle SAD data only. MAD, MIR, MIRAS modes later.
http://www.ccp4.ac.uk/HAPPy

Acknowledgements:
• Core Group (Daresbury):
– Martyn Winn, Charles Ballard, Peter Briggs, Francois Remacle, Norman Stein, Wendy Yang, Maeri Howard.
• CCP4MG (York):
– Liz Potterton, Stuart McNicholas
• Coot (Oxford & York):
– Paul Emsley, Kevin Cowtan
• Program Developers (York, Cambridge, Diamond & Leiden University):
– Garib Murshudov, Alexei Vagin, Fei Long, Randy Read, Airlie McCoy, Harry Powell, Gwyndaf Evans, Phil Evans, Eleanor Dodson, Nick Furnham, Steve Ness.
• BBSRC for their funding
• And many others…