The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Swiss-Prot - 20 Year Celebration www.pdb.org • [email protected] The RCSB Protein Data Bank Teaching an Old Dog New Tricks Philip E. Bourne [email protected]

Transcript of The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Page 1: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Swiss-Prot - 20 Year Celebration

www.pdb.org • [email protected]

The RCSB Protein Data Bank: Teaching an Old Dog New Tricks

Philip E. Bourne

[email protected]

Page 2: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

A Tribute

From the guardian of a resource (institution) to all those men and women who make biology possible – may we never take you for granted

Biocurator Perspectives

Page 3: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Agenda

• The old dog

• New tricks

  – Thinking differently about proteins

  – Virtual Communities

    • Internal (wwPDB)

    • External

• What will the resource look like in 2-5 years?

Page 4: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

History of the Old Dog

1970s

• Community discussions about how to establish an archive of protein structures

• Cold Spring Harbor meeting in protein crystallography

• PDB established at Brookhaven (October 1971; 7 structures)

1980s

• Number of structures increases as technology improves

• Community discussions about requiring depositions

• IUCr guidelines established

• Number of structures deposited increases

1990s

• Ontology defined

• Structural genomics begins

• PDB moves to RCSB

2000s

• wwPDB formed

Page 6: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Unchanging Core Mission

• Create and maintain a well-curated database of macromolecular structure data derived using experimental methods

that will…

• Facilitate and support scientific research and education

that is…

• Always accessible to a diverse user community worldwide

• Developed in collaboration with that community

Page 7: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Challenges - Scientific

• More complex structures – molecular machines, complexes

• New methods (e.g. EM)

• Lack of a vocabulary to provide reductionism in complex structures

• Partially solved problems in analyzing structures – structure alignments, domain definitions, functional site determination and characterization, pathway relationships, interaction partners

• Integrating microscopic and macroscopic views

• Disease relationships

Page 8: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Growth and Complexity

[Figure: number of released entries by year]

Page 9: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Data Integration

NAR 2005, 33: D233-D237

[Figure: diagram linking a PDB structure to external resources, with some of the actions those links enable]

Linked resources

• SWISS-PROT / GenBank IDs

• Gene Ontology

• Enzyme Commission

• Source Organism

• OMIM / Disease

• Genomes (NCBI Gene)

• Structural Genomics Targets

• Pubmed

• NCBI Taxonomy

• Domains / Families (SCOP, CATH, PFAM)

• Primary References, Derived References

• Human Proteome & Homology Models

Some Actions

• Source Organism Browser

• GO Browsers

• Find Structures by GO ID

• Enzyme Browser

• Reactome

• Genome Browser

• SNPs Mapped to Structure

• Find Structures by SP ID

• Disease Browser

• CATH Browser

• SCOP Browser

• PFAM Display

• Abstract Search

• Target Search

• Function Coverage

• Target Selection

Page 10: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Challenges - Technical

• Sheer numbers

• Efficient visualization

• Improved annotation

• Demands from a more diverse user base

• Centralization versus decentralization

• Web V2

Page 11: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Diverse User Community (180,000 individuals per month) and Diversifying Further

• Structural biologists

• Computational biologists

• Experimental biologists

• Educators

• Students

• Lay public

Page 12: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Agenda

• The old dog

• New tricks

  – Thinking differently about proteins

  – Virtual Communities

    • Internal (wwPDB)

    • External

• What will the resource look like in 2-5 years?

Page 13: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

New Tricks – Protein Representation

The conventional view of a protein (left) has had a remarkable impact on our understanding of living systems, but is it time for a new view? It is not how one protein sees another after all.

Page 14: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Limitations of a Cartesian Viewpoint

• A local viewpoint – does not capture the global properties of the protein

• Limited to a single scale descriptor

• Limits comparative analysis

New Tricks – Protein Representation

Page 15: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Protein Kinase A – Open Book View

Page 16: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Superfamily Members – The Same But Different

Page 17: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Alignment Violates the Triangle Inequality

Protein kinase like superfamily. Left: rmsd distance matrix. Right: number of violations of the triangle inequality at each pair of proteins.

$$|d(i,j) - d(j,k)| \le d(i,k) \le d(i,j) + d(j,k)$$

Many of the features in the distance matrix may be due to “distortions” induced by the failure to satisfy the TI.
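The violation count shown in the right-hand panel can be illustrated with a minimal sketch. It assumes a symmetric pairwise distance matrix `d` (for example, rmsd values after pairwise alignment) and charges each violation to the pair whose distance exceeds the sum of the other two sides. The function name `count_ti_violations`, the tolerance, and that attribution rule are assumptions made for illustration; the slide does not say how the counts were assigned.

```python
from itertools import combinations


def count_ti_violations(d, tol=1e-9):
    """Count, for each pair (a, b), the third points c with d[a][b] > d[a][c] + d[c][b].

    `d` is a symmetric matrix (list of lists) of pairwise distances.
    """
    n = len(d)
    counts = [[0] * n for _ in range(n)]
    for i, j, k in combinations(range(n), 3):
        # Let each side of the triangle play the "too long" side in turn.
        for a, b, c in ((i, j, k), (i, k, j), (j, k, i)):
            if d[a][b] > d[a][c] + d[c][b] + tol:
                counts[a][b] += 1
                counts[b][a] += 1
    return counts


# Hypothetical 3-structure example: 0 and 2 are each close to 1 but far from each other.
bad = [[0.0, 1.0, 5.0],
       [1.0, 0.0, 1.0],
       [5.0, 1.0, 0.0]]
violations = count_ti_violations(bad)
```

Distances between fixed points in Euclidean space never produce a nonzero count, so any nonzero entry signals that the pairwise alignments compared different sets of atoms.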

New Tricks – Protein Representation

Page 18: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

• Roots in spherical harmonics

• Parameter space and boundary conditions can be a variety of properties

• Order of the multipoles defines the granularity of the descriptors

• Bottom line – interpreted as shape descriptors

An Alternative Approach: Multipolar Representation

Gramada & Bourne 2006 BMC Bioinformatics 7:242

Page 19: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Results – Protein Kinase Like Superfamily Alignment

Scheeff & Bourne 2005 PLoS Comp. Biol., 1(5) e49

Clear distinction between families.

Some clustering is seen inside the TPKs that resembles various groups, even though there is little shape discrimination at this level.

New Tricks – Protein Representation

Page 20: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Possibilities – Structure Based Phylogenetic Analysis

Scheeff & Bourne Multipoles

New Tricks – Protein Representation

Page 21: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

New Tricks – Protein Motion

Ordered Structures

Disordered Structures

Structures exist in a spectrum from order to disorder

Page 22: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Obtaining Protein Dynamic Information

Protein Structures Treated as a 3-D Elastic Network

Bahar, I., A.R. Atilgan, and B. Erman. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design, 1997. 2(3): p. 173-181.

New Tricks – Protein Motion

Page 23: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Gaussian Network Model

• Each Cα atom is a node in the network.

• Each node undergoes Gaussian-distributed fluctuations influenced by neighboring interactions within a given cutoff distance (7 Å).

• Decompose protein fluctuation into a summation of different modes.
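The three bullets above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the code used for the talk: nodes are Cα coordinates, contacts are defined by the 7 Å cutoff, fluctuations are in arbitrary units (the physical prefactor is omitted), and the helix-like coordinates are synthetic test data. The function names `kirchhoff_matrix` and `mode_fluctuations` are hypothetical.

```python
import numpy as np


def kirchhoff_matrix(coords, cutoff=7.0):
    """Connectivity matrix: -1 for node pairs within `cutoff`, contact count on the diagonal."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    gamma = -(dist <= cutoff).astype(float)
    np.fill_diagonal(gamma, 0.0)                  # no self-contacts
    np.fill_diagonal(gamma, -gamma.sum(axis=1))   # diagonal = number of neighbors
    return gamma


def mode_fluctuations(gamma, tol=1e-8):
    """Per-mode contribution to each node's mean-square fluctuation (arbitrary units)."""
    evals, evecs = np.linalg.eigh(gamma)          # eigenvalues in ascending order
    keep = evals > tol                            # drop the zero (rigid-body) mode
    return evals[keep], evecs[:, keep] ** 2 / evals[keep]


# Synthetic alpha-helix-like C-alpha trace (hypothetical test data, not a PDB entry).
idx = np.arange(20)
helix = np.column_stack([2.3 * np.cos(np.radians(100.0) * idx),
                         2.3 * np.sin(np.radians(100.0) * idx),
                         1.5 * idx])
gamma = kirchhoff_matrix(helix)
evals, per_mode = mode_fluctuations(gamma)
total = per_mode.sum(axis=1)   # summation over all modes
slowest = per_mode[:, 0]       # contribution of the slowest mode alone
```

Summing the columns of `per_mode` recovers the diagonal of the pseudo-inverse of the connectivity matrix, which is the usual GNM estimate of residue fluctuations; keeping only the first few columns isolates the slow, collective modes.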

New Tricks – Protein Motion

Page 24: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Functional Flexibility Score

• Utilize correlated movements to help define regional flexibility with functional importance.

Functionally Flexible Score

For each residue:

1. Find Maximum and Minimum Correlation.

2. Use to scale normalized fluctuation to determine functional importance.
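The two steps can be sketched as follows. The slide does not give the scaling formula, so this sketch assumes one plausible form: the normalized fluctuation of each residue multiplied by the range (maximum minus minimum) of its normalized cross-correlations with all other residues. That choice, the function name `flexibility_sketch`, and the toy chain network are assumptions for illustration only; see Gu, Gribskov & Bourne 2006 for the published definition.

```python
import numpy as np


def flexibility_sketch(gamma, tol=1e-8):
    """Scale normalized GNM fluctuations by each residue's correlation range (ASSUMED form)."""
    evals, evecs = np.linalg.eigh(gamma)
    keep = evals > tol                                        # drop the zero mode
    cov = (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T   # pseudo-inverse of gamma
    fluct = np.diag(cov)                                      # mean-square fluctuations
    corr = cov / np.sqrt(np.outer(fluct, fluct))              # normalized cross-correlations
    np.fill_diagonal(corr, np.nan)                            # ignore self-correlation
    c_max = np.nanmax(corr, axis=1)                           # step 1: maximum correlation
    c_min = np.nanmin(corr, axis=1)                           #         minimum correlation
    norm_fluct = fluct / fluct.max()                          # normalized fluctuation
    return norm_fluct * (c_max - c_min)                       # step 2: ASSUMED scaling


# Hypothetical 6-node chain network, used only to exercise the function.
n = 6
adj = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
chain = np.diag(adj.sum(axis=1)) - adj
scores = flexibility_sketch(chain)
```

Under this assumed form, a residue scores highly only if it both moves a lot and moves in a strongly correlated or anti-correlated way with other parts of the structure.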

Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

Page 25: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Identifying FFRs in HIV Protease

Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

Page 26: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Other Examples BPTI and Calmodulin

Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

Page 27: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Side Note: Gaussian Network Model vs Molecular Dynamics

• GNM is relatively coarse grained

• GNM is fast to compute vs MD

  – Look over larger time scales

  – Suitable for high throughput

New Tricks – Protein Motion

Page 28: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

An Active Research Program Around the Resource is Good for the Resource

Page 29: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Agenda

• The old dog

• New tricks

  – Thinking differently about proteins

  – Virtual Communities

    • Internal (wwPDB)

    • External

• What will the resource look like in 2-5 years?

Page 30: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

• Ensures that the PDB remains a single & uniform archive publicly available to the worldwide community

• 3 founding members: RCSB PDB, PDBj, MSD-EBI

Single worldwide archive of macromolecular structural data

Virtual Communities - Internal

Page 31: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

wwPDB Activities

• Collaborative projects

  – Remediation

    • taxonomy, ligands, literature

  – Single data processing system

Virtual Communities - Internal

Page 32: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Agenda

• The old dog

• New tricks

  – Thinking differently about proteins

  – Virtual Communities

    • Internal (wwPDB)

    • External (modeling, other…)

• What will the resource look like in 2-5 years?

Page 33: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Virtual Communities - External

Consider the PDB a gathering point through which a virtual and real community interacts with each other around a common interest

Page 34: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Virtual Communities - External

[Figure: outreach activities arranged along an axis from Real to Virtual]

• PDB-in-a-CAVE

• NJ Science Olympiad

• Science Expo

• Traveling art exhibit for lay audiences

• Website Tutorials/Feedback

• Molecule of the Month

Page 35: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Virtual Communities - Modelers

• Recommendations of Workshop

– PDB depositions should be restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules

– A central, publicly available archive (or technical equivalent thereof) or portal should be established for models

– It was unanimously agreed that methods for assessing model quality are essential

Structure 2006 To be published

Page 36: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Agenda

• The old dog

• New tricks

  – Thinking differently about proteins

  – Virtual Communities

    • Internal (wwPDB)

    • External

• What will the resource look like in 2-5 years?

Page 37: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

What Will the Resource Look Like in the Next 2-5 Years?

• Upwards of 75,000 structures

• Consensus (and different) views at the micro and macro scale – domains, SNPs, gene structure, cell localization, pathways, interactions, post-translational modification…

• Community annotation cf Wikipedia

• Distributed subsets - External Reference Files (XML)

• MyPDB

• PDB-in-a-box

• Specialized visualization tools (mbt.sdsc.edu)

Page 38: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

The Knowledge and Data Cycle

Is a database really different than a biological journal?

PLoS Comp Biol 2005 1(3) e34

0. Full text of PLoS papers stored in a database

1. A link brings up figures from the paper

2. Clicking the paper figure retrieves data from the PDB which is analyzed

3. A composite view of journal and database content results

4. The composite view has links to pertinent blocks of literature text and back to the PDB

Now assigning DOIs to structures

Page 39: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Acknowledgements

The RCSB PDB

Jenny Gu: Protein Motions

Apostol Gramada: Multipole Analysis

NIH, NSF, DOE

Page 40: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

A Protein is More than the Union of its Parts

• Breaking the protein into parts changes the object of the comparison

• This is interpreted in many cases to imply that the rmsd measure is inadequate.

• The reality is that it is the aligning of structures that breaks the triangle inequality, not the measure per se. The reason for the failure is that we effectively compare different objects than we say we do.

From Røgen & Fain (2003), PNAS 100:119-124

New Tricks – Protein Representation

Page 41: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

An Alternative Approach: Multipolar Representation

Roots in Spherical Harmonics

Spatial distribution of a scalar quantity

Charge distribution (i.e. structure) → Scalar potential

• Parameterization + boundary conditions ⇒ $\{\, q_{lm}^{\mathrm{out}};\ M_{lm}^{\mathrm{in}};\ q_{lm}^{i};\ M_{lm}^{i} \,\}$

Gramada & Bourne 2006 BMC Bioinformatics 7:242

New Tricks – Protein Representation

Page 42: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

• “Out” Multipoles

$$q_{lm} = \sum_{i=1}^{N} r_i^{l}\, Y^{*}_{lm}(\theta_i, \phi_i), \qquad l = 0, \dots, \infty; \quad m = -l, \dots, l$$

For a given rank l, they form a 2l+1 dimensional vector under 3D rotations

$$q_l = \{\, q_{l,m} \,\}_{m = -l, \dots, l}$$

Vector algebra applies => metric properties
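A minimal pure-Python sketch of the "out" multipoles, assuming the sum on this slide reads q_lm = Σ_i r_i^l Y*_lm(θ_i, φ_i) with unit weight per point, θ the polar angle and φ the azimuth. The helper names (`assoc_legendre`, `y_lm`, `out_multipoles`, `rank_norm`) and the four test points are hypothetical; weighting and centering choices in the published method may differ.

```python
import cmath
import math


def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x), 0 <= m <= l, with Condon-Shortley phase."""
    pmm = 1.0
    if m > 0:
        somx2 = math.sqrt(max(0.0, 1.0 - x * x))
        fact = 1.0
        for _ in range(m):
            pmm *= -fact * somx2
            fact += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):   # upward recurrence in the degree
        pmm, pmmp1 = pmmp1, ((2 * ll - 1) * x * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
    return pmmp1


def y_lm(l, m, theta, phi):
    """Spherical harmonic Y_lm(theta, phi); theta is the polar angle, phi the azimuth."""
    if m < 0:
        return (-1) ** (-m) * y_lm(l, -m, theta, phi).conjugate()
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, math.cos(theta)) * cmath.exp(1j * m * phi)


def out_multipoles(coords, l):
    """Rank-l vector [q_lm for m = -l..l], with q_lm = sum_i r_i^l * conj(Y_lm)."""
    q = []
    for m in range(-l, l + 1):
        total = 0j
        for px, py, pz in coords:
            r = math.sqrt(px * px + py * py + pz * pz)
            theta = math.acos(max(-1.0, min(1.0, pz / r))) if r > 0 else 0.0
            phi = math.atan2(py, px)
            total += r ** l * y_lm(l, m, theta, phi).conjugate()
        q.append(total)
    return q


def rank_norm(coords, l):
    """Euclidean norm of the rank-l multipole vector; unchanged by 3D rotations."""
    return math.sqrt(sum(abs(q) ** 2 for q in out_multipoles(coords, l)))


# Hypothetical points, and the same points rotated by 0.7 rad about the x axis.
pts = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5), (-1.0, 1.0, 3.0), (0.3, -0.7, 1.1)]
ang = 0.7
rot = [(px, py * math.cos(ang) - pz * math.sin(ang),
        py * math.sin(ang) + pz * math.cos(ang)) for px, py, pz in pts]
norms_before = [rank_norm(pts, l) for l in range(4)]
norms_after = [rank_norm(rot, l) for l in range(4)]
```

Because each rank-l vector only mixes within itself under rotation, its norm is the same before and after rotating the points, which is what lets ordinary vector distances between multipole vectors serve as shape comparisons.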

Gramada & Bourne 2006 BMC Bioinformatics 7:242

An Alternative Approach: Multipolar Representation

New Tricks – Protein Representation

Page 43: The RCSB Protein Data Bank Teaching an Old Dog New Tricks

The multipoles can be interpreted as shape descriptors

In principle, from the entire series of multipoles one can reconstruct the scalar field and therefore the density, i.e. the entire set of Cartesian coordinates, and hence the structure at a geometric level of detail

The partitioning of the multipole series according to the various representations of the rotation group allows for a multi-scale description of the structure

An Alternative Approach: Multipolar Representation

Gramada & Bourne 2006 BMC Bioinformatics 7:242

New Tricks – Protein Representation