Management and Distribution of Chemical Data in the Protein Data Bank

Management and Distribution of Chemical Data in the Protein Data Bank John Westbrook, Dimitris Dimitropoulos, Jasmine Young, Peter Rose, Philip E. Bourne and Helen Berman RCSB Protein Data Bank U.S. Government Chemical Databases and Open Chemistry August 26, 2011


Transcript of Management and Distribution of Chemical Data in the Protein Data Bank

Page 1: Management and Distribution of Chemical Data in the  Protein Data Bank

Management and Distribution of Chemical Data in the Protein Data Bank

John Westbrook, Dimitris Dimitropoulos, Jasmine Young, Peter Rose, Philip E. Bourne and Helen Berman

RCSB Protein Data Bank

U.S. Government Chemical Databases and Open Chemistry August 26, 2011

Page 2: Management and Distribution of Chemical Data in the  Protein Data Bank

What is the Protein Data Bank?

Single international archive for information about the structure of large biological molecules

PDB depositions should be restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules

Outcome of a Workshop on Archiving Structural Models of Biological Macromolecules (2006) Structure 14: 1211-1217

Page 3: Management and Distribution of Chemical Data in the  Protein Data Bank

What is the content of the PDB?

Public archive (August 2011): more than 75,000 entries; more than 550,000 files; requires over 115 GB of storage; data dictionaries; derived data files

For each entry: atomic coordinates; sequence information; description of structure; experimental data; release status information

Internal archive: depositor correspondence; depositor contact information; paper records; documentation; historical records from Day One

Page 4: Management and Distribution of Chemical Data in the  Protein Data Bank

Who manages the PDB?

NSF, NIGMS, DOE, NLM, NCI, NINDS, NIDDK NLM

EMBL-EBI, Wellcome Trust, BBSRC, NIGMS, EU NBDC-JST

Page 5: Management and Distribution of Chemical Data in the  Protein Data Bank

Who uses the PDB?

Depositors

Users

Page 6: Management and Distribution of Chemical Data in the  Protein Data Bank

[Figure: number of released entries by year]

Page 7: Management and Distribution of Chemical Data in the  Protein Data Bank

Chemical data in PDB

Understanding the interactions between proteins and small molecules is key to understanding biological function.

Providing accurate chemical descriptions is a major focus of PDB annotation.

All polymer and small molecule chemical components are described in the PDB Chemical Component Dictionary.

Significant software and data infrastructure has been created to maintain this dictionary and to provide a consistent chemical representation across the PDB archive.

Chemical representation in the PDB is under constant scrutiny and is continuously improved.

Page 8: Management and Distribution of Chemical Data in the  Protein Data Bank

How does new chemistry enter the PDB?

[Flowchart; box labels: Deposited coordinates; Perceived covalent structure; Chemical components; Compare with dictionary; New? (Yes / No); Annotate chemical definition; Chemical Component Dictionary; Standardize residue/atom nomenclature; Process deposited entry]
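
The box labels on this slide suggest a lookup-then-branch workflow. What follows is a minimal sketch only: the order of steps is an assumption (the flowchart's arrows did not survive in this transcript), matching on a single descriptor string is an assumption, and every name in the code (`Component`, `process_component`, the identifiers and descriptor strings) is hypothetical rather than part of the wwPDB annotation software.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A perceived chemical component (hypothetical, simplified record)."""
    comp_id: str      # identifier supplied with the deposition
    descriptor: str   # structure descriptor used for matching, e.g. an InChIKey


def process_component(perceived, dictionary):
    """Compare a perceived component with the dictionary.

    `dictionary` maps descriptor -> dictionary component id.
    Returns (component id to use, whether a new definition was added).
    """
    known_id = dictionary.get(perceived.descriptor)
    if known_id is not None:
        # "No" branch (assumed): not new, so adopt the dictionary nomenclature.
        return known_id, False
    # "Yes" branch (assumed): new chemistry, so a definition is annotated
    # and added to the dictionary.
    dictionary[perceived.descriptor] = perceived.comp_id
    return perceived.comp_id, True


# Placeholder descriptor strings, not real InChIKeys.
dictionary = {"KEY-FOR-KNOWN-LIGAND": "ABC"}
known = process_component(Component("LIG", "KEY-FOR-KNOWN-LIGAND"), dictionary)
new = process_component(Component("NEW1", "KEY-FOR-NEW-LIGAND"), dictionary)
print(known)
print(new)
```

In this sketch a component that matches an existing definition takes the dictionary identifier, while an unmatched one extends the dictionary.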

Page 9: Management and Distribution of Chemical Data in the  Protein Data Bank

Assessing data quality

Chemical data in PDB are experimentally derived subject to modeling restraints

PDB entry 3dnb; 1.3 Å resolution

PDB entry 6bna; 2.21 Å resolution

Page 10: Management and Distribution of Chemical Data in the  Protein Data Bank

How are data checked now?

Chemistry: polymer (match to sequence DB and internal consistency); ligands, ions, inhibitors (match to dictionary)

Geometry: close contacts; valence geometry; torsion angles

Experimental data: model vs. structure factors
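
As an illustration of one of the geometry checks on this slide, a close-contact check can be sketched as a pairwise distance scan. The 2.2 Å cutoff, the function name, and the coordinates are assumptions chosen for the example; a real validation program would also exclude bonded atom pairs and use element-specific limits.

```python
import math
from itertools import combinations


def close_contacts(atoms, cutoff=2.2):
    """Return (name_a, name_b, distance) for atom pairs closer than cutoff (Å).

    `atoms` is a list of (name, (x, y, z)) tuples. Bonded pairs are not
    excluded here, which a real check would need to do.
    """
    hits = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(atoms, 2):
        d = math.dist(xyz_a, xyz_b)
        if d < cutoff:
            hits.append((name_a, name_b, round(d, 2)))
    return hits


# Made-up coordinates for illustration only.
atoms = [
    ("A:O1", (0.0, 0.0, 0.0)),
    ("B:C7", (1.5, 0.0, 0.0)),   # 1.5 Å from A:O1, so this pair is flagged
    ("B:N2", (0.0, 4.0, 0.0)),   # more than 4 Å from both other atoms
]
contacts = close_contacts(atoms)
print(contacts)
```

The all-pairs scan is quadratic in the number of atoms; that is fine for a ligand and its neighbors, though a whole entry would normally call for a spatial grid or tree.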

Page 11: Management and Distribution of Chemical Data in the  Protein Data Bank

On-going focus on data quality

Method-specific Validation Task Forces have been convened to collect recommendations and develop consensus on method-specific issues, including validation checks that should be performed and identification of validation software applications.

X-ray Validation: 2008 Workshop on Next Generation Validation Tools for the wwPDB; white paper accepted by Structure; Chair: Randy J. Read (University of Cambridge)

3DEM Validation: meeting September 2010; Chairs: Richard Henderson (Maps, Cambridge University), Andrej Sali (Models, UCSF); white paper in progress

NMR Validation: meetings held September 2009, January 2011; report in progress; Chairs: Gaetano Montelione (Rutgers), Michael Nilges (Institut Pasteur)

Small-Angle Scattering: Members: Jill Trewhella (University of Sydney), Dmitri Svergun (EMBL Hamburg), Andrej Sali (UCSF), Mamoru Sato (Yokohama City University), John Tainer (Scripps)

Page 12: Management and Distribution of Chemical Data in the  Protein Data Bank

Documenting PDB chemistry in the Chemical Component Dictionary

Library of all polymer and non-polymer chemical components in PDB: ~13,000 chemical component definitions; 400 additional definitions of amino acid protonation variants; ~700 new components released this year; ~1700 component definitions updated this year; maintained by members of the wwPDB

Page 13: Management and Distribution of Chemical Data in the  Protein Data Bank

wwPDB resources: wwpdb.org

Page 14: Management and Distribution of Chemical Data in the  Protein Data Bank

Chemical Component Dictionary and data download options

Chemical definitions in mmCIF, PDBML/XML and SDF/MOL formats

Tabulations of SMILES, InChI and InChI key descriptors for each chemical definition

Bundles of coordinates extracted from PDB entries for each ligand in the archive, stored in mmCIF, PDBML and SDF/MOL formats
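
A descriptor tabulation of this kind is straightforward to load into a lookup table. The sketch below assumes a tab-delimited layout of descriptor, component identifier, and name on each line; the actual column order should be checked against the downloaded file. The two rows are small, well-known components (water and ethanol) typed by hand for the example, and `load_descriptors` is a hypothetical helper name.

```python
import csv
import io

# Hand-written sample in an assumed layout: descriptor <TAB> id <TAB> name.
SAMPLE = "O\tHOH\tWATER\nCCO\tEOH\tETHANOL\n"


def load_descriptors(handle):
    """Map component id -> (descriptor, name) from a tab-delimited table."""
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        descriptor, comp_id, name = row[0], row[1], row[2]
        table[comp_id] = (descriptor, name)
    return table


descriptors = load_descriptors(io.StringIO(SAMPLE))
print(descriptors["EOH"])
```

For a real download, the `io.StringIO` object would be replaced by an open file handle.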

Page 15: Management and Distribution of Chemical Data in the  Protein Data Bank

Chemical Component Dictionary content

Molecular names and synonyms; chemical formula, formula weight, and formal charge; atom and residue nomenclature; polymer linking type; model coordinates (an example from a PDB entry); computed coordinates (Corina or OpenEye); connectivity and bond types; stereochemistry and aromaticity; systematic names (ACDLabs & OpenEye); SMILES, InChI, and InChIKey descriptors; release status and revision history
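
In the mmCIF form of a definition, several of these items appear as simple key-value pairs in the `_chem_comp` category. The sketch below reads such pairs from an abridged, hand-typed fragment for alanine. It handles only single-line items and is not a general mmCIF parser: loops, multi-line values, and the atom and bond categories are out of scope, and `read_items` is a hypothetical helper name.

```python
import shlex

# Abridged, hand-typed fragment of a component definition (alanine).
FRAGMENT = '''\
data_ALA
_chem_comp.id              ALA
_chem_comp.name            ALANINE
_chem_comp.type            "L-PEPTIDE LINKING"
_chem_comp.formula         "C3 H7 N O2"
_chem_comp.formula_weight  89.093
'''


def read_items(text, category="_chem_comp."):
    """Collect single-line key-value items belonging to one mmCIF category."""
    items = {}
    for line in text.splitlines():
        if not line.startswith(category):
            continue  # the data_ header and other categories are ignored
        tokens = shlex.split(line)  # keeps quoted values as one token
        if len(tokens) == 2:
            items[tokens[0][len(category):]] = tokens[1]
    return items


ala = read_items(FRAGMENT)
print(ala["formula"], float(ala["formula_weight"]))
```

For anything beyond a quick look at a single file, a dedicated mmCIF library is the safer choice.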

Page 16: Management and Distribution of Chemical Data in the  Protein Data Bank

Chemical Component Dictionary Interpretation

Definitions include: common or representative forms of the molecule; generally neutral and complete molecules; off-the-shelf reagents used to prepare an experimental sample; model coordinates from a single experimental observation; computed coordinates from programs (Corina or OpenEye/Omega)

Page 17: Management and Distribution of Chemical Data in the  Protein Data Bank

Searching the Chemical Component Dictionary

ligand-expo.rcsb.org

Search options: molecular name; formula; SMILES; InChI/InChIKey; PDB component identifier; chemical substructure

Browsing options: standard and modified amino acids; standard and modified nucleotides; selected top-selling pharmaceuticals; common aromatic ring systems

Page 18: Management and Distribution of Chemical Data in the  Protein Data Bank

Ligand Expo: Browse dictionary content

Page 19: Management and Distribution of Chemical Data in the  Protein Data Bank

Ligand Expo: View chemical details

Page 20: Management and Distribution of Chemical Data in the  Protein Data Bank

Ligand Expo: View chemical details

Page 21: Management and Distribution of Chemical Data in the  Protein Data Bank

Ligand Expo: Find data in related resources

Page 22: Management and Distribution of Chemical Data in the  Protein Data Bank

Find small molecules at the RCSB PDB: http://www.pdb.org

Simple search for all entries containing a particular ligand

Page 23: Management and Distribution of Chemical Data in the  Protein Data Bank

RCSB PDB Small molecule Advanced Search

Interactive chemical structure search with graphics; exact, substructure, superstructure, and MW searches; restricted formula searches

Page 24: Management and Distribution of Chemical Data in the  Protein Data Bank

RCSB PDB report and display of molecular interactions

Page 25: Management and Distribution of Chemical Data in the  Protein Data Bank

Access

RCSB Protein Data Bank: www.pdb.org

Ligand Expo: ligand-expo.rcsb.org

wwPDB: www.wwpdb.org

Dictionary Resources: mmcif.pdb.org, pdbml.pdb.org

Page 26: Management and Distribution of Chemical Data in the  Protein Data Bank

Acknowledgements

Operated by two members of the RCSB:

Supported by:

NIGMS

The RCSB PDB is a member of the