Classification: understanding the diversity and principles of



Page 1: Classification: understanding the diversity and principles of

Classification: understanding the diversity and principles of protein structure and function

MCSG 2001 structures

Page 2: Classification: understanding the diversity and principles of

Protein structure classification

Main reference: Robert B. Russell (2002) Classification of Protein Folds. Molecular Biotechnology 20:17-28.

Importance: central to studies of protein structure, function, and evolution

Philosophy: phyletic vs. phenetic

Method: structure comparison + human knowledge

Page 3: Classification: understanding the diversity and principles of

Philosophy of classification

Phyletic: based on phylogenetic relationship

Phenetic: based on study of phenomena (phenomenological)

Page 4: Classification: understanding the diversity and principles of

Classification Unit: Domain, a LEGO piece

Ranganathan

Page 5: Classification: understanding the diversity and principles of

From domain to assembly

Domains are shuffled, duplicated and fused to make proteins

On average, a domain is 173 a.a. in size, compared to 466 a.a. for a yeast protein

Most of the natural domain sequences assume one of a few thousand folds, of which ~1000 are already known

No satisfactory estimate yet for the number of macromolecular complexes

On average, a yeast complex may consist of 7.5 proteins

Sali et al. 2003
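The averages above imply a rough domain count per protein and per complex. The sketch below is back-of-the-envelope arithmetic only: it assumes a protein is made entirely of average-sized domains and ignores linkers and disordered regions, and the function name is illustrative rather than taken from the slides.

```python
# Figures quoted on the slide (Sali et al. 2003).
AVG_DOMAIN_LENGTH = 173          # residues per domain
AVG_YEAST_PROTEIN_LENGTH = 466   # residues per yeast protein
AVG_PROTEINS_PER_COMPLEX = 7.5   # proteins per yeast complex


def domains_per_protein(protein_len: float, domain_len: float) -> float:
    """Naive ratio; assumes the chain consists only of average-sized domains."""
    return protein_len / domain_len


per_protein = domains_per_protein(AVG_YEAST_PROTEIN_LENGTH, AVG_DOMAIN_LENGTH)
per_complex = per_protein * AVG_PROTEINS_PER_COMPLEX

print(f"~{per_protein:.1f} domains per protein")   # ~2.7 domains per protein
print(f"~{per_complex:.0f} domains per complex")   # ~20 domains per complex
```

Under these assumptions a typical yeast protein carries two to three domains, which is consistent with the idea that proteins are assembled from shuffled, duplicated and fused domains.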

Page 6: Classification: understanding the diversity and principles of

Distribution of Protein size

Swiss-prot

Page 7: Classification: understanding the diversity and principles of

Structural vs. functional domain

Page 8: Classification: understanding the diversity and principles of

Russian doll: a conceptual problem

Singh

Page 9: Classification: understanding the diversity and principles of

Approaches

Hierarchical

Based on the types and arrangements of secondary structures

Unit (level): domain

Domain assignment

- structural vs. functional (fold or function in isolation)

- automated assignment methods (structure vs. sequence)

Page 10: Classification: understanding the diversity and principles of

A. P. Singh

Page 11: Classification: understanding the diversity and principles of

Assignment of Class

All α or all β (could be subjective)

α/β (βαβ unit) or α+β

Other classes

Page 12: Classification: understanding the diversity and principles of

Class assignment could be subjective
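One way to see why class assignment is subjective is to assign a class from secondary-structure content with an explicit cut-off. This is a minimal sketch, not the procedure used by SCOP or CATH: the input is assumed to be a per-residue string with 'H' for helix and 'E' for strand, and the `min_frac` threshold is a hypothetical parameter chosen for illustration. Composition alone also cannot separate α/β from α+β, which depends on how the elements are arranged.

```python
def assign_class(ss: str, min_frac: float = 0.15) -> str:
    """Assign a coarse class from a per-residue secondary-structure string.

    ss: 'H' = helix, 'E' = strand, any other character = coil.
    min_frac: hypothetical minimum fraction for a state to count as present.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    helix_frac = ss.count("H") / len(ss)
    strand_frac = ss.count("E") / len(ss)
    has_helix = helix_frac >= min_frac
    has_strand = strand_frac >= min_frac
    if has_helix and has_strand:
        return "mixed alpha and beta"
    if has_helix:
        return "all-alpha"
    if has_strand:
        return "all-beta"
    return "other"


# Two helices and one short strand pair: the answer depends on the cut-off.
example = "HHHHHHHHCCEECCHHHHHHHH"
print(assign_class(example))                  # all-alpha
print(assign_class(example, min_frac=0.05))   # mixed alpha and beta
```

The same domain moves between classes when the threshold changes, which is the kind of borderline case where human judgment enters.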

Page 13: Classification: understanding the diversity and principles of

All-alpha structures

Page 14: Classification: understanding the diversity and principles of

All-beta structures

Superoxide dismutase

Page 15: Classification: understanding the diversity and principles of

Alpha/beta structures

Closed barrel Open twisted sheet

Page 16: Classification: understanding the diversity and principles of

β-α-β motif

(barrel) (sheet)

Page 17: Classification: understanding the diversity and principles of

α/β vs. α+β

Page 18: Classification: understanding the diversity and principles of
Page 19: Classification: understanding the diversity and principles of

Assignment of Fold

Defined by the number, type, and arrangement of SSEs

Connectivity (e.g. circular permutation, scrambled proteins)
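The number, type and sequential order of SSEs can be read off a per-residue secondary-structure string. The sketch below assumes 'H' marks helix and 'E' marks strand, and uses a hypothetical minimum run length to ignore very short stretches. It captures connectivity along the chain only; the spatial arrangement of the elements, which also defines a fold, needs 3D coordinates and is not modeled here.

```python
from itertools import groupby


def sse_elements(ss: str, min_len: int = 3) -> list[tuple[str, int]]:
    """Return (type, length) for each helix or strand run of at least min_len."""
    elements = []
    for state, run in groupby(ss):
        length = len(list(run))
        if state in ("H", "E") and length >= min_len:
            elements.append((state, length))
    return elements


def sse_order(ss: str, min_len: int = 3) -> str:
    """Compact connectivity string, e.g. 'EHE' for a strand-helix-strand unit."""
    return "".join(state for state, _ in sse_elements(ss, min_len))


ss = "CEEEEECCHHHHHHHHCCEEEEEC"
print(sse_elements(ss))   # [('E', 5), ('H', 8), ('E', 5)]
print(sse_order(ss))      # EHE
```

Two domains with the same set of elements but a different order string differ in connectivity, which is what circular permutation and scrambling change.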

Page 20: Classification: understanding the diversity and principles of

Assignment of Superfamily

Homologous even in the absence of significant sequence similarity

- certain level of structural similarity

- unusual structural features

- low but significant sequence similarity from structural alignment

- key active site residues

- sequence similarity bridges

Divergence vs. convergence

Page 21: Classification: understanding the diversity and principles of

Divergent vs. convergent evolution

Divergent evolution: descent from a common ancestor; sequences diverge due to mutation

Convergent evolution: no common ancestor; become similar due to functional or physical constraint

Page 22: Classification: understanding the diversity and principles of

Anti-freeze protein: convergent evolution

crystal.biochem.queensu.ca

Page 23: Classification: understanding the diversity and principles of

Homologous fold

Ranganathan

Page 24: Classification: understanding the diversity and principles of

Analogous fold

Ranganathan

Page 25: Classification: understanding the diversity and principles of

(Figure: two domain structures with chain termini labeled N, C and N’, C’)

Scallop Myosin Regulatory Domain C chain

Aldehyde Oxidoreductase A chain

Analogous or homologous?

Page 26: Classification: understanding the diversity and principles of

Assignment of Family

significant sequence similarity
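Sequence similarity is commonly summarized as percent identity over an alignment. The sketch below is one simple convention, assuming the two sequences are already aligned to equal length with '-' as the gap character, and counting identity over ungapped columns only. Other denominators are in use, and statistical significance additionally depends on alignment length and scoring, which this does not address.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Toy alignment: 8 ungapped columns, 6 identical.
print(percent_identity("MKT-AYIAK", "MKTLAYLSK"))   # 75.0
```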

Page 27: Classification: understanding the diversity and principles of

Classification databases

SCOP - careful assignment of evolutionary relationships; homologous vs. analogous

CATH - A: architecture

FSSP - a list of structural neighbors

Page 28: Classification: understanding the diversity and principles of
Page 29: Classification: understanding the diversity and principles of
Page 30: Classification: understanding the diversity and principles of

CATH

Singh

Class: SSE composition & packing

Architecture: overall shape of domain, ignore SSE connectivity

Topology (Fold): consider connectivity

Homologous superfamily: a common ancestor
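The four CATH levels above are usually written as a dotted numeric code, one number per level. A minimal sketch of how such a code could be parsed and compared follows; the `CathCode` class and its method are illustrative names, not part of any CATH software, and the two example codes are used only as inputs to show the comparison.

```python
from dataclasses import astuple, dataclass
from typing import Optional

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

# Names of the four main CATH classes.
CLASS_NAMES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


@dataclass(frozen=True)
class CathCode:
    c: int  # Class
    a: int  # Architecture
    t: int  # Topology (fold)
    h: int  # Homologous superfamily

    @classmethod
    def parse(cls, code: str) -> "CathCode":
        parts = code.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected C.A.T.H, got {code!r}")
        return cls(*(int(p) for p in parts))

    def deepest_shared_level(self, other: "CathCode") -> Optional[str]:
        """Name of the deepest level at which two domains still agree."""
        shared = None
        for level, mine, theirs in zip(LEVELS, astuple(self), astuple(other)):
            if mine != theirs:
                break
            shared = level
        return shared


x = CathCode.parse("3.40.50.300")
y = CathCode.parse("3.40.50.720")
print(CLASS_NAMES[x.c])             # Alpha Beta
print(x.deepest_shared_level(y))    # Topology
```

Two domains that agree down to Topology but not Homologous superfamily share a fold without an assigned common ancestor, which is the homologous vs. analogous distinction made earlier.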

Page 31: Classification: understanding the diversity and principles of

Classification databases

CATH: Class, Architecture, Topology, and Homologous superfamily, a hierarchical classification of protein domain structures

http://www.biochem.ucl.ac.uk/bsm/cath_new/

SCOP: Structural Classification Of Proteins: augmented manual classification

http://scop.mrc-lmb.cam.ac.uk/scop/

FSSP: Fold classification based on Structure-Structure alignment of Proteins

http://www2.ebi.ac.uk/dali/fssp/

Page 32: Classification: understanding the diversity and principles of

Genome-scale structure analysis

Curr. Opin. Str. Biol., 2003

Page 33: Classification: understanding the diversity and principles of

genome-scale structure annotation

Page 34: Classification: understanding the diversity and principles of

Some statistics

80% of sequence families belong to 400 folds (top 10 folds account for 40% of sequence families)

>60% of genes encode multi-domain proteins (80% for eukaryotes)

~50,000 protein families and ~150,000 singletons

structural superfamilies ~1800 (+/-50) and ~10,000 unifolds

50-60% of distant homologs (<25% seq. id.) can be recognized by profile-based sequence comparison methods (e.g. PSI-BLAST, HMM, etc.)

50-60% of the enzymes in yeast and E. coli are common, and >80% of pathways are shared

Page 35: Classification: understanding the diversity and principles of

superfolds, superfamilies, supersites

TIM barrel, Rossmann-like, ferredoxin-like, β-propellers, 4-helix bundle, Ig-like, β-jelly rolls, oligonucleotide/oligosaccharide binding (OB) fold, SH3-like.

Structure -> function (only 50% correct)

Page 36: Classification: understanding the diversity and principles of

Structure implicates function?

Page 37: Classification: understanding the diversity and principles of

Assessing the Progress of Structural Genomics Projects

1 Nov. 2002, Science

Page 38: Classification: understanding the diversity and principles of

Target Tracking by PDB (Sep 2002)

Page 39: Classification: understanding the diversity and principles of

PDB content growth (May 2005)

Page 40: Classification: understanding the diversity and principles of
Page 41: Classification: understanding the diversity and principles of

Some statistics

Contributed 316 non-redundant PDB entries comprising 459 CATH and 393 SCOP domains by 11 SG consortia.

14% of the targets have a homolog (>30% sequence identity) solved by another consortium

67% of SG domains in CATH are unique vs. 21% of non-SG domains.

19% and 11% contributed new superfamilies and new folds, respectively.

Allow new and reliable homology models for 9287 non-redundant gene sequences in 208 completely sequenced genomes.

Page 42: Classification: understanding the diversity and principles of

PSI Structure Statistics 2002-2003

Unique structures (30% seq. ID): PSI 70%, PDB 10%

New folds: PSI 12%, PDB 3%

NIGMS Protein Structure Initiative

Page 43: Classification: understanding the diversity and principles of

Average total cost per structure

PSI Pilot phase

01 $650 K (7 centers)

02 $400 K (9 centers)

03 $240 K

04 ?

05 $100 K (goal)

PSI-2 Production phase

06-10 $50 K (goal)

Comparison ~$250-300 K

NIGMS Protein Structure Initiative

Page 44: Classification: understanding the diversity and principles of

PSI Pilot Phase -- Lessons Learned

1. Structural genomics pipelines can be constructed and scaled-up

2. High throughput operation works for many proteins

3. Genomic approach works for structures

4. Bottlenecks remain for some proteins

5. A coordinated, 5-year target selection policy must be developed

6. Homology modeling methods need improvement

NIGMS Protein Structure Initiative

Page 45: Classification: understanding the diversity and principles of

PSI-2 Production Phase (2005)

Interacting network for high throughput protein structure determination with three components

Large-scale centers for protein structure production of selected targets

Specialized centers for technology development leading to high throughput structure determination of difficult proteins

Specialized centers for protein structures relevant to disease (other NIH Institutes and Centers)

Included in NIH Structural Biology Roadmap plans

NIGMS Protein Structure Initiative

Page 46: Classification: understanding the diversity and principles of

Computational structural genomics

Page 47: Classification: understanding the diversity and principles of

Summary table

Page 48: Classification: understanding the diversity and principles of

Fold occurrence matrix

Page 49: Classification: understanding the diversity and principles of

Common Folds

Page 50: Classification: understanding the diversity and principles of

Unique Folds

Page 51: Classification: understanding the diversity and principles of

Main findings

Folds can be assigned to ~25% of ORFs and ~20% of amino acids for the 20 genomes

>80% of SCOP folds identified in one of the 20 organisms

Worm and E. coli have most distinct folds

Level of gene duplication (2.4 folds in MG, 32 in worm) higher than observed based on sequence only

Top three most common folds: P-loop NTP hydrolase, the ferredoxin fold, TIM-barrel

Unique folds tend to be those involved in cell defense (e.g. toxins)

Common folds tend to be more “symmetrical”

Page 52: Classification: understanding the diversity and principles of

Fold evolution

Page 53: Classification: understanding the diversity and principles of

Insertion, deletion, substitution

Page 54: Classification: understanding the diversity and principles of

α-helix & β-sheet substitution in Rossmann-fold like proteins

Page 55: Classification: understanding the diversity and principles of

A path from all-β to all-α proteins

Page 56: Classification: understanding the diversity and principles of

(Figure: two diagrams of segments A, B, C, D with N- and C-termini labeled)

..A..B..C..D..    ..C..D..A..B..

Circular Permutation (CP)
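The ..A..B..C..D.. versus ..C..D..A..B.. relationship can be checked with the classic rotation test: one ordering is a circular permutation of another exactly when it appears inside the first ordering written twice. The sketch below works on exact segment labels only and is a toy illustration; detecting circular permutation between real, diverged proteins requires alignment-based methods that tolerate mutations.

```python
def is_circular_permutation(a: str, b: str) -> bool:
    """True if b is a rotation of a (same segments, shifted start point)."""
    return len(a) == len(b) and b in a + a


def rotation_offset(a: str, b: str) -> int:
    """Index in a where b starts, or -1 if b is not a rotation of a."""
    if len(a) != len(b):
        return -1
    return (a + a).find(b)


print(is_circular_permutation("ABCD", "CDAB"))   # True
print(is_circular_permutation("ABCD", "ACBD"))   # False
print(rotation_offset("ABCD", "CDAB"))           # 2
```

The offset marks where the new N-terminus falls in the original order: here the chain starts at segment C, so the old termini are joined and a new break lies between B and C.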

Page 57: Classification: understanding the diversity and principles of

1nls (Concanavalin) 1led (Lectin)


Circular permutation example

Page 58: Classification: understanding the diversity and principles of

Strand invasion/withdrawal

Page 59: Classification: understanding the diversity and principles of

Strand invasion/withdrawal

Page 60: Classification: understanding the diversity and principles of

Strand invasion/withdrawal

Page 61: Classification: understanding the diversity and principles of

Hairpin flips/swaps

Page 62: Classification: understanding the diversity and principles of

Hairpin flips/swaps

Page 63: Classification: understanding the diversity and principles of

Sickle-cell hemoglobin confers resistance to malaria

Hemoglobin & sickle cell anemia

Page 64: Classification: understanding the diversity and principles of

Lethal legos as killer clumps

The inherited form of Lou Gehrig's disease--familial amyotrophic lateral sclerosis (FALS)--causes a decay of the motor neurons in the spinal cord and brain, a devastating loss of bodily control, and death within 2 to 5 years.

Elam et al. Nat. Str. Biol., 2003