From crystals to pdb: building a high throughput crystallography pipeline for structural genomics...

  • date post

    19-Dec-2015


From crystals to pdb: building a high throughput crystallography pipeline for structural genomics

Chiu HJ¹, Wolf G¹, West W², van den Bedem H¹, Miller MD¹, Zhang Z¹, Morse A², Wang X², Xu Q¹, Levin I¹, von Delft F³, Elsliger MA³, Godzik A², Grzechnik SK² and Deacon AM¹

¹ Stanford Synchrotron Radiation Laboratory, 2575 Sand Hill Road, Menlo Park, CA 94025
² University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093
³ The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037

The Structure Determination Core (SDC) of the Joint Center for Structural Genomics (JCSG) is dedicated to developing technologies that streamline all the steps in the structure determination process, from crystals to PDB-ready atomic coordinates. Over the last year the JCSG production capacity has increased dramatically. SDC has screened more than 7000 crystals from 192 protein targets. A total of 232 datasets from 106 targets have been collected and 90 structures have been solved. To handle the rapidly growing flow of experimental data, we have developed a set of crystallographic and database tools to both track and streamline our workflow.

Crystal cassettes are shipped to SDC from the Crystallomics Core. All relevant crystal information is captured in the central JCSG database and is downloaded in a “Beamline Report”. Crystals are screened automatically using the Stanford Auto-Mounter and Blu-Ice software. The visual and diffraction properties of each crystal are recorded. A computer program, DISTIL, is under development to automatically analyze diffraction images and provide an objective screening evaluation for each crystal. The best crystals for each target are flagged for data collection.

A computer program, Xsolve, is used for automatic crystallographic data processing and structure solution. A model building tool that provides crystallographers with the best possible initial model for refinement is under development. The results of the analysis are uploaded to a Structure Solution Tracking System. A Refinement Tracking System requests weekly updates and collects all the data necessary for a peer-review Quality Control step before the coordinates are deposited in the Protein Data Bank.

The Joint Center for Structural Genomics

Mission: To establish a robust and scalable protein structure determination pipeline that will form the foundation for a large-scale, cost-effective production center for structural genomics.

Structural Genomics of Thermotoga maritima

T. maritima genome

A system to test the pipeline
• Small bacterial genome: 1877 gene products
• Proteins should express well in E. coli
• Proteins from a thermophile may be more stable
• Process entire genome
• Establish trends in process, e.g. crystallization

Category                        Number       %
Nucleic acid binding               170     9.2
DNA binding                        109     5.9
DNA repair                          11     0.5
DNA replication factor               3     0.1
Transcription factor                37     1.9
RNA binding                         43     2.3
Structural Ribosomal protein        52     2.8
Translation factor                  12     0.6
Motor                                5     0.2
Enzyme                             600    32.4
Peptidase                           27     1.5
Protein Kinase                      17     0.9
Protein Phosphatase                  8     0.4
Signal transducer                   32     1.7
Cell adhesion                        1     0.0
Structural Protein                  61     3.3
Transporter                        202    10.9
Ion channel                          3     0.2
Ligand Binding or carrier          255    13.8
Electron transporter                52     2.8
Unknown or unclassified            713    38.5
Total                             1877    100%

Pipeline overview (figure labels):
• Target Selection
• HT Expression
• HT Purification
• HT Crystallization
• HT Imaging: 1st Generation Hardware, 6th Generation Software
• HT Data Collection: 1st Generation Prototype, 3rd Generation Software
• HT Structure Determination: 2nd Generation
• Structure Validation & Deposition
• Autosubmission of electronic publication
• PDB

Data flow parallels the experimental pipeline, harvesting ~300 parameters from 19 stages.

HT Pipeline Processes, Bottlenecks and Leaks

Stages shown in the figure: target selection, cloning, expression, purification, crystallization, imaging, harvesting, bl xtal mounting, xtal screening, data collection, phasing, tracing, struc. refinement, struc. validation, annotation, publication.

All relevant crystal information is captured in the central JCSG database in the form of Beamline Report

Beamline Report fields (figure labels): Target ID, Beamline, Crystallization condition, Visual properties, Diffraction properties (Resolution, Spot quality, Diffraction strength).
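The Beamline Report fields above map naturally onto one record per screened crystal. The following is a minimal sketch, not the JCSG database schema or the DISTIL evaluation: the class name, field names, target and crystal IDs, and the "lowest resolution value wins" rule are all assumptions made for illustration. It shows how the best crystal per target could be flagged for data collection.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class CrystalRecord:
    """Hypothetical per-crystal screening record (field names are assumed)."""
    target_id: str
    crystal_id: str
    beamline: str
    resolution: float  # diffraction limit in Å; a smaller value is better


def best_per_target(records: Iterable[CrystalRecord]) -> Dict[str, CrystalRecord]:
    """Return, for each target, the crystal with the best (smallest) resolution."""
    best: Dict[str, CrystalRecord] = {}
    for rec in records:
        current = best.get(rec.target_id)
        if current is None or rec.resolution < current.resolution:
            best[rec.target_id] = rec
    return best


# Made-up example data; the IDs and values are not taken from the poster.
records = [
    CrystalRecord("TARGET-A", "c1", "BL-1", 3.1),
    CrystalRecord("TARGET-A", "c2", "BL-1", 2.2),
    CrystalRecord("TARGET-B", "c3", "BL-2", 2.8),
]
flagged = best_per_target(records)
for target_id, rec in flagged.items():
    print(target_id, rec.crystal_id, rec.resolution)
```

A real screening score would likely combine resolution with spot quality and diffraction strength; a single numeric field keeps the sketch short.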

Robust and automated crystal screening

Initial design to production
• Large-scale capacity
• Shipping, storage and screening
• Used by JCSG since June 2002
• Implemented on all SSRL beamlines

Cassette kits distributed to PX user groups

Integration with BLU-ICE
• Automated sample mounting
• Automated sample alignment
• Automated diffraction images

Increased screening capacity during SSRL shutdown
• Leverage existing infrastructure
• X-ray MicroMax-002 generator installed June 2003
• SSRL automated screening system used
• >4200 crystals screened in 9 months

All data uploaded to JCSG DB
• Screening, collection and structure solution
• Work closely with BIC on implementation and debugging
• Still more features needed to handle expanding production

Structure solution tracking

Local SDC “dataset” database

Active crystal report

Xsolve: automation of structure determination

2004 developments
• Improve success rate: better autoindexing, determine optimal resolution for scaling sweeps
• More general: handle crystallographic details: re-indexing screw axes, merging sweeps
• More robust operation: catch timeouts, core dumps, infinite loops, etc.
• Implement parallelization: develop tools to monitor and control processing on a Linux cluster
• New program support: HKL2000, SHARP, SHELXD (not completely tested)
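The "more robust operation" item can be illustrated with a small wrapper that runs one processing step as a child process and classifies the outcome. This is a sketch under stated assumptions, not Xsolve's actual implementation: the `StepResult` type and `run_step` function are hypothetical, and the commands below are tiny Python one-liners standing in for real crystallographic programs.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StepResult:
    """Outcome of one processing step (hypothetical structure)."""
    name: str
    status: str  # "ok", "failed", or "timeout"
    detail: str = ""


def run_step(name, argv, timeout_s):
    """Run a command with a time limit and report how it ended."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so a hung or
        # endlessly looping step cannot block the pipeline.
        return StepResult(name, "timeout", f"exceeded {timeout_s} s")
    if proc.returncode != 0:
        # On POSIX a negative return code means the child died from a
        # signal, which is how a crash such as a core dump shows up.
        return StepResult(name, "failed", f"exit code {proc.returncode}")
    return StepResult(name, "ok", proc.stdout.strip())


# Stand-in commands: one succeeds, one hangs, one exits with an error.
ok = run_step("autoindex", [sys.executable, "-c", "print('indexed')"], 30)
hung = run_step("integrate", [sys.executable, "-c", "import time; time.sleep(60)"], 0.5)
crashed = run_step("scale", [sys.executable, "-c", "import sys; sys.exit(3)"], 30)
for result in (ok, hung, crashed):
    print(result.name, result.status, result.detail)
```

Returning a status record instead of raising lets a driver decide per step whether to retry, skip, or stop.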

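Workflow (figure labels, program: step): Mosflm: Autoindex → Mosflm: Integrate → Scala: Scale → Solve: Solve → Resolve: Trace

Parallel Solve jobs shown in the figure:
• Solve P422 1 mol 2 ...
• Solve P422 2 mols 3 ...
• Solve P4122 1 mol 2 ...
• Solve P4122 2 mol 3 ...
• Solve P4222 1 mol 2 ...
• Solve P4222 2 mols 3 ...

Steps: Autoindex, Integrate, Scale, Solve, Trace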

Main goals

• Handle majority cases

• Organize data and workflow

• Ease information flow to JCSG DB

• Allow integration of new programs.

• Use parallel execution of jobs
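The parallel execution goal, together with the Solve jobs listed per space group and molecule count, can be sketched as a job fan-out. This is an illustration only: `run_solve_trial` is a hypothetical placeholder that does no crystallography, and a local thread pool stands in for whatever scheduler drives the Linux cluster. Only the space groups (P422, P4122, P4222) and molecule counts (1, 2) come from the poster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

SPACE_GROUPS = ["P422", "P4122", "P4222"]  # candidates named on the poster
MOLECULE_COUNTS = [1, 2]                   # molecules per asymmetric unit to try


def run_solve_trial(space_group, n_mol):
    """Placeholder for one structure-solution attempt (does no real work)."""
    return {"space_group": space_group, "n_mol": n_mol, "status": "done"}


# One job per (space group, molecule count) combination: 3 x 2 = 6 jobs.
jobs = list(product(SPACE_GROUPS, MOLECULE_COUNTS))

# Executor.map returns results in the same order as the submitted jobs.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda job: run_solve_trial(*job), jobs))

for result in results:
    print(result["space_group"], result["n_mol"], result["status"])
```

Because the trials are independent of one another, they can run concurrently and be compared once all of them have finished.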

Refinement Tracking System

Automation of protein model completion: an inverse kinematics approach

Automatically Build Backbone Fragments:

• Build candidate closing conformations using IK techniques (robotics)
• Rank according to electron density fit and conformational likelihood
• Subject top-ranking candidates to real-space, torsion angle SA refinement

Results:

• Closed missing fragments of up to 12 residues in length to within 0.6 Å all-atom RMSD in a 2.8 Å model

Manually Finalizing Model:

• Labor intensive, time consuming
• Existing aids are highly interactive

Lotan et al., submitted
van den Bedem et al., in preparation

JCSG production statistics (August 10, 2004)

Total crystals screened at SDC   10778
  Unique targets represented     356
  TM/non-TM targets              299/57
Datasets collected               394 (288 TM, 106 non-TM)
  Unique targets represented     194
  TM/non-TM targets              146/48
Structures solved                155 (94 MAD; 51 MR; 3 SAD; 7 NMR) (125 TM, 30 non-TM)

Can be searched by: Shipment ID, Dewar, Target ID, Cassette/puck

Installation of a Microsource X-ray generator at 9-2

JCSG production statistics (August 10, 2004)

More to come…
• 22 targets: data collected, not yet solved
• 92 targets: diffraction better than 3.5 Å, not yet solved

Growing reliance on the JCSG DB
• 500 crystals and 8 structures per month
• 20 cassettes (2000 crystals) inventory
• 30-40 structures in refinement

• 2.0 TB of diffraction images
• 0.5 TB of processing files
• >100,000 diffraction images

• Average resolution of structures in PDB: 2.0 Å
• Average protein chain length: 260 aa
• Average number of residues in asu: 480 aa

TSRI Administrative Core
Ian Wilson, Peter Kuhn, Marc Elsliger, Frank von Delft, Tina Montgomery, Gye Won Han, Rong Chen, Angela Walker

UCSD Bioinformatics Core
John Wooley, Adam Godzik, Susan Taylor, Slawomir Grzechnik, Bill West, Andrew Morse, Jie Quyang, Xianhong Wang, Jaume Canaves, Lukasz Jaroszewski, Robert Schwarzenbacher, Marc Robinson Rechavi, Chris Edwards, Olga Kirillova, Ray Bean, Josie Alaoen

Stanford/SSRL Structure Determination Core
Keith Hodgson, Ashley Deacon, Britt Hedman, Guenter Wolf, Mitch Miller, Henry van den Bedem, Qingping Xu, Herbert Axelrod, Christopher Rife, Inna Levin, R. Paul Phizackerley, Amanda Prado, John Kovarik, Ross Floyd, Irimpan Mathews, Michael Solits, Aina Cohen, Paul Ellis

GNF & TSRI Crystallomics Core
Ray Stevens, Scott Lesley, Rebbeca Page, Carina Grittini, Glen Spraggon, Andreas Kreusch, Michael DiDonato, Daniel McMullan, Heath Klock, Polat Abdubek, Eileen Ambing, Tanya Biorac, Joanna C. Hale, Justin Haugen, Mike Hornsby, Eric Koesema, Edward Nigoghossian, Kevin Quijano, Megan Wemmer, Aprilfawn White, Juli Vincent, Jeff Velasquez, Kin Moy, Vandana Sridhar, Bernard Collins, Thomas Clayton

Scientific Advisory Board
Carl-Ivar Brändén, Karolinska Inst., Stockholm (retired 2003)
Elbert Branscomb, DOE Joint Genome Inst., Walnut Creek
Stephen Cusack, EMBL – Outstation Grenoble
Leroy Hood, Inst. for Systems Biology, Seattle
John Kuriyan, U.C. Berkeley
Erkki Ruoslahti, The Burnham Institute
James Wells, Sunesis Pharmaceuticals, Inc.
Charles Cantor, Sequenom, Inc.
Todd Yeates, UCLA-DOE, Inst. for Genomics and Proteomics
James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute

Exploratory Projects
Kurt Wüthrich (NMR), Linda Columbus, Touraj Etezady-Esfarjani, Wolfgang Peti, Virgil Woods (DXMS)

Acknowledgements

NIH Protein Structure Initiative Grant P50 GM62411