From crystals to pdb: building a high throughput crystallography pipeline for structural genomics...
Date posted: 19-Dec-2015
From crystals to PDB: building a high-throughput crystallography pipeline for structural genomics
Chiu HJ¹, Wolf G¹, West W², van den Bedem H¹, Miller MD¹, Zhang Z¹, Morse A², Wang X², Xu Q¹, Levin I¹, von Delft F³, Elsliger MA³, Godzik A², Grzechnik SK² and Deacon AM¹
¹ Stanford Synchrotron Radiation Laboratory, 2575 Sand Hill Road, Menlo Park, CA 94025.
² University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093.
³ The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037.
The Structure Determination Core (SDC) of the Joint Center for Structural Genomics (JCSG) is dedicated to developing technologies that streamline every step of the structure determination process, from crystals to PDB-ready atomic coordinates. Over the last year the JCSG production capacity has increased dramatically: the SDC has screened more than 7000 crystals from 192 protein targets, collected 232 datasets from 106 targets, and solved 90 structures. To handle the rapidly growing flow of experimental data, we have developed a set of crystallographic and database tools to both track and streamline our workflow.
Crystal cassettes are shipped to SDC from the Crystallomics Core. All relevant crystal information is captured in the central JCSG database and is downloaded in a “Beamline Report”. Crystals are screened automatically using the Stanford Auto-Mounter and Blu-Ice software. The visual and diffraction properties of each crystal are recorded. A computer program, DISTIL, is under development to automatically analyze diffraction images and provide an objective screening evaluation for each crystal. The best crystals for each target are flagged for data collection.
A computer program, Xsolve, is used for automatic crystallographic data processing and structure solution. A model building tool providing crystallographers with the best possible initial model for refinement is under development. The results of the analysis are uploaded to a Structure Solution Tracking System. A Refinement Tracking System requests weekly updates and collects all the data necessary for a peer-review Quality Control step, before the coordinates are deposited to the Protein Data Bank.
The Joint Center for Structural Genomics
Mission: To establish a robust and scalable protein structure determination pipeline that will form the foundation for a large-scale, cost-effective production center for structural genomics.
Structural Genomics of Thermotoga maritima
T. maritima genome
A system to test the pipeline
• Small bacterial genome: 1877 gene products
• Proteins should express well in E. coli
• Proteins from a thermophile may be more stable
• Process entire genome
• Establish trends in process, e.g. crystallization
Category                      Number       %
Nucleic acid binding             170     9.2
DNA binding                      109     5.9
DNA repair                        11     0.5
DNA replication factor             3     0.1
Transcription factor              37     1.9
RNA binding                       43     2.3
Structural Ribosomal protein      52     2.8
Translation factor                12     0.6
Motor                              5     0.2
Enzyme                           600    32.4
Peptidase                         27     1.5
Protein Kinase                    17     0.9
Protein Phosphatase                8     0.4
Signal transducer                 32     1.7
Cell adhesion                      1     0.0
Structural Protein                61     3.3
Transporter                      202    10.9
Ion channel                        3     0.2
Ligand Binding or carrier        255    13.8
Electron transporter              52     2.8
Unknown or unclassified          713    38.5
Total                           1877    100%
Pipeline overview (figure):
• Target Selection
• HT Expression
• HT Purification
• HT Crystallization
• HT Imaging: 1st Generation Hardware, 6th Generation Software
• HT Data Collection: 1st Generation Prototype, 3rd Generation Software
• HT Structure Determination: 2nd Generation
• Structure Validation & Deposition: Autosubmission of electronic publication
• PDB
Data flow parallels the experimental pipeline, harvesting ~300 parameters from 19 stages.
HT Pipeline Processes, Bottlenecks and Leaks
target selection → cloning → expression → purification → crystallization → imaging → harvesting → bl xtal mounting → xtal screening → data collection → phasing → tracing → struc. refinement → struc. validation → annotation → publication
All relevant crystal information is captured in the central JCSG database in the form of a Beamline Report
Report columns (figure):
• Target ID
• Crystallization condition
• Visual properties
• Diffraction properties: Resolution, Spot quality, Diffraction strength
• Beamline
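As an illustration of how the report columns above could be held in code, here is a minimal Python sketch. The class name, field names, and the rule "lowest resolution value wins" are assumptions made for this example; they are not the JCSG database schema or the DISTIL scoring method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CrystalRecord:
    """One screened crystal; field names are illustrative, not the JCSG schema."""
    target_id: str
    crystal_id: str
    crystallization_condition: str
    visual_properties: str
    resolution_angstrom: Optional[float]  # None if no usable diffraction
    spot_quality: str
    diffraction_strength: str
    beamline: str


def flag_best_per_target(records: List[CrystalRecord]) -> Dict[str, CrystalRecord]:
    """For each target, keep the crystal with the lowest (best) resolution value."""
    best: Dict[str, CrystalRecord] = {}
    for rec in records:
        if rec.resolution_angstrom is None:
            continue  # crystal did not diffract, so it cannot be ranked
        current = best.get(rec.target_id)
        if current is None or rec.resolution_angstrom < current.resolution_angstrom:
            best[rec.target_id] = rec
    return best
```

A real screening evaluation would weigh spot quality and diffraction strength as well; resolution alone is used here only to keep the sketch short.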
Robust and automated crystal screening
Initial design to production
• Large-scale capacity
• Shipping, storage and screening
• Used by JCSG since June 2002
• Implemented on all SSRL beamlines
• Cassette kits distributed to PX user groups

Integration with BLU-ICE
• Automated sample mounting
• Automated sample alignment
• Automated diffraction images

Increased screening capacity during SSRL shutdown
• Leverage existing infrastructure
• X-ray MicroMax-002 generator installed June 2003
• SSRL automated screening system used
• >4200 crystals screened in 9 months

All data uploaded to JCSG DB
• Screening, collection and structure solution
• Work closely with BIC on implementation and debugging
• Still more features needed to handle expanding production
Structure solution tracking
Local SDC “dataset” database
Active crystal report
Xsolve: automation of structure determination
2004 developments
• Improve success rate: better autoindexing, determine optimal resolution for scaling sweeps
• More general: handle crystallographic details: re-indexing screw axes, merging sweeps
• More robust operation: catch timeouts, core dumps, infinite loops, etc.
• Implement parallelization: develop tools to monitor and control processing on a Linux cluster
• New program support: HKL2000, SHARP, SHELXD (not completely tested)
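The "more robust operation" item can be illustrated with a small wrapper that runs one external step under a time limit and classifies how it ended. This is a sketch using Python's standard `subprocess` module, not Xsolve code. The function name `run_step` and the status strings are invented for the example, and the Python interpreter stands in for a real crystallographic program.

```python
import subprocess
import sys
from typing import List

# Stand-in executable for the example; a real pipeline would call its own programs.
PYTHON = sys.executable


def run_step(cmd: List[str], timeout_s: float) -> str:
    """Run one external processing step and report how it ended."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # Covers hangs and infinite loops: the child is killed after timeout_s.
        return "timeout"
    if result.returncode < 0:
        # On POSIX a negative return code means the child died from a signal
        # (for example a segmentation fault that would produce a core dump).
        return "killed by signal %d" % -result.returncode
    if result.returncode != 0:
        return "failed"
    return "ok"
```

For example, `run_step([PYTHON, "-c", "import time; time.sleep(30)"], 0.5)` reports a timeout instead of blocking the rest of the workflow.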
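Processing steps (figure): Autoindex → Integrate → Scale → Solve → Trace
• Mosflm: Autoindex
• Mosflm: Integrate
• Scala: Scale
• Solve: Solve
• Resolve: Trace

Parallel Solve jobs (figure):
• Solve P422 1 mol 2 ...
• Solve P422 2 mols 3...
• Solve P4122 1 mol 2 ...
• Solve P4122 2 mol 3...
• Solve P4222 1 mol 2 ...
• Solve P4222 2 mols 3...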
Main goals
• Handle majority cases
• Organize data and workflow
• Ease information flow to JCSG DB
• Allow integration of new programs.
• Use parallel execution of jobs
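The goal of parallel job execution can be sketched as follows: every combination of candidate space group and molecule count is tried concurrently, and the results are ranked. This is an illustration, not the Xsolve implementation. The `solve` callable is a hypothetical stand-in that returns a score; in practice it would launch an external program and parse a figure of merit from its output.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, List, Sequence, Tuple

Trial = Tuple[str, int]  # (space group, number of molecules)


def run_trials(
    space_groups: Sequence[str],
    mol_counts: Sequence[int],
    solve: Callable[[str, int], float],
    max_workers: int = 4,
) -> List[Tuple[Trial, float]]:
    """Run solve() for every (space group, molecule count) pair in parallel.

    Returns the trials sorted by score, best first.
    """
    trials: List[Trial] = list(product(space_groups, mol_counts))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so scores line up with trials.
        scores = list(pool.map(lambda t: solve(*t), trials))
    return sorted(zip(trials, scores), key=lambda pair: pair[1], reverse=True)
```

Threads are sufficient when each trial mostly waits on an external process; on a cluster the same fan-out would be handed to a batch scheduler instead.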
Refinement Tracking System
Automation of protein model completion: an inverse kinematics approach
Automatically Build Backbone Fragments:
• Build candidate closing conformations using IK techniques (robotics)
• Rank according to electron density fit and conformational likelihood
• Subject top-ranking candidates to real-space, torsion angle SA refinement
Results:
• Closed missing fragments of up to 12 residues in length to within 0.6 Å all-atom RMSD in a 2.8 Å model
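To give a feel for what "closing" a chain with inverse kinematics means, here is a toy planar example using cyclic coordinate descent (CCD), a generic robotics technique. It is not the algorithm of the cited work, which operates on protein backbone torsion angles in three dimensions and scores candidates against electron density. All names and numbers below are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def forward(lengths: Sequence[float], angles: Sequence[float]) -> List[Point]:
    """Joint positions of a planar chain; each angle is relative to the previous link."""
    pts: List[Point] = [(0.0, 0.0)]
    heading = 0.0
    for length, angle in zip(lengths, angles):
        heading += angle
        x, y = pts[-1]
        pts.append((x + length * math.cos(heading), y + length * math.sin(heading)))
    return pts


def close_chain(
    lengths: Sequence[float],
    angles: Sequence[float],
    target: Point,
    tol: float = 1e-3,
    max_sweeps: int = 200,
) -> List[float]:
    """Adjust joint angles so the chain end reaches target (CCD)."""
    angles = list(angles)
    for _ in range(max_sweeps):
        # Sweep from the last joint back to the first.
        for i in reversed(range(len(angles))):
            pts = forward(lengths, angles)
            jx, jy = pts[i]    # pivot: the joint whose angle is being changed
            ex, ey = pts[-1]   # current end of the chain
            to_end = math.atan2(ey - jy, ex - jx)
            to_target = math.atan2(target[1] - jy, target[0] - jx)
            delta = to_target - to_end
            # Wrap the correction into [-pi, pi] to take the short way round.
            angles[i] += math.atan2(math.sin(delta), math.cos(delta))
        ex, ey = forward(lengths, angles)[-1]
        if math.hypot(ex - target[0], ey - target[1]) < tol:
            break
    return angles
```

Each inner step rotates everything downstream of one joint so that the chain end points at the target, which is the planar analogue of adjusting one torsion angle at a time to bring a fragment's free end onto its anchor.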
Manually Finalizing Model:
• Labor intensive, time consuming
• Existing aids are highly interactive
Lotan et al. submitted
van den Bedem et al. in preparation
JCSG production statistics (August 10, 2004)
Total crystals screened at SDC: 10778
  Unique targets represented: 356 (299 TM, 57 non-TM)
Datasets collected: 394 (288 TM, 106 non-TM)
  Unique targets represented: 194 (146 TM, 48 non-TM)
Structures solved: 155 (94 MAD; 51 MR; 3 SAD; 7 NMR) (125 TM, 30 non-TM)
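The yield between pipeline stages can be read off these production statistics with a few divisions. The short Python sketch below uses only the numbers quoted in the statistics; the dictionary keys and the choice of ratios are ours, added for illustration.

```python
# Figures copied from the production statistics (August 10, 2004).
stats = {
    "crystals_screened": 10778,
    "targets_screened": 356,
    "datasets": 394,
    "targets_with_data": 194,
    "structures": 155,
}
by_method = {"MAD": 94, "MR": 51, "SAD": 3, "NMR": 7}


def ratio(numerator: str, denominator: str) -> float:
    """Ratio of two entries in the stats table."""
    return stats[numerator] / stats[denominator]


# The per-method counts should add up to the total number of structures.
assert sum(by_method.values()) == stats["structures"]

print("crystals screened per target: %.1f" % ratio("crystals_screened", "targets_screened"))
print("fraction of screened targets with data: %.2f" % ratio("targets_with_data", "targets_screened"))
print("datasets per structure: %.1f" % ratio("datasets", "structures"))
print("crystals screened per structure: %.1f" % ratio("crystals_screened", "structures"))
```

These ratios give rough stage-to-stage yields only; they do not track individual crystals or targets through the pipeline.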
Can be searched by: Shipment ID, Dewar, Target ID, Cassette/puck
Installation of a Microsource X-ray generator at 9-2
JCSG production statistics (August 10, 2004)
More to come…
• 22 targets: data collected, not yet solved
• 92 targets: diffraction better than 3.5 Å, not yet solved

Growing reliance on the JCSG DB
• 500 crystals and 8 structures per month
• 20 cassettes (2000 crystals) inventory
• 30-40 structures in refinement

• 2.0 TB of diffraction images
• 0.5 TB of processing files
• >100,000 diffraction images

• Average resolution of structures in PDB: 2.0 Å
• Average protein chain length: 260 aa
• Average number of residues in asu: 480 aa
TSRI Administrative Core
Ian Wilson, Peter Kuhn, Marc Elsliger, Frank von Delft, Tina Montgomery, Gye Won Han, Rong Chen, Angela Walker

UCSD Bioinformatics Core
John Wooley, Adam Godzik, Susan Taylor, Slawomir Grzechnik, Bill West, Andrew Morse, Jie Quyang, Xianhong Wang, Jaume Canaves, Lukasz Jaroszewski, Robert Schwarzenbacher, Marc Robinson Rechavi, Chris Edwards, Olga Kirillova, Ray Bean, Josie Alaoen

Stanford/SSRL Structure Determination Core
Keith Hodgson, Ashley Deacon, Britt Hedman, Guenter Wolf, Mitch Miller, Henry van den Bedem, Qingping Xu, Herbert Axelrod, Christopher Rife, Inna Levin, R. Paul Phizackerley, Amanda Prado, John Kovarik, Ross Floyd, Irimpan Mathews, Michael Solits, Aina Cohen, Paul Ellis

GNF & TSRI Crystallomics Core
Ray Stevens, Scott Lesley, Rebbeca Page, Carina Grittini, Glen Spraggon, Andreas Kreusch, Michael DiDonato, Daniel McMullan, Heath Klock, Polat Abdubek, Eileen Ambing, Tanya Biorac, Joanna C. Hale, Justin Haugen, Mike Hornsby, Eric Koesema, Edward Nigoghossian, Kevin Quijano, Megan Wemmer, Aprilfawn White, Juli Vincent, Jeff Velasquez, Kin Moy, Vandana Sridhar, Bernard Collins, Thomas Clayton

Scientific Advisory Board
Carl-Ivar Brändén, Karolinska Inst., Stockholm (retired 2003)
Elbert Branscomb, DOE Joint Genome Inst., Walnut Creek
Stephen Cusack, EMBL – Outstation Grenoble
Leroy Hood, Inst. for Systems Biology, Seattle
John Kuriyan, U.C. Berkeley
Erkki Ruoslahti, The Burnham Institute
James Wells, Sunesis Pharmaceuticals, Inc.
Charles Cantor, Sequenom, Inc.
Todd Yeates, UCLA-DOE, Inst. for Genomics and Proteomics
James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute

Exploratory Projects
Kurt Wüthrich (NMR), Linda Columbus, Touraj Etezady-Esfarjani, Wolfgang Peti, Virgil Woods (DXMS)
Acknowledgements
NIH Protein Structure Initiative Grant P50 GM62411