ISSCB 09/03 M. Linial Discussion of Function talk IV 1. Enzyme Seq-Fun 2. Annotation 3. Integration


Page 2

Sequence and Function relationship

Taking one example: Enzymes
• well known
• functionality is defined
• conserved
• essential
• tree-like classification

Page 3

Relatively easy function

ENZYMES

Enzymes, WIT, KEGG, etc.

Page 4

Functionally characterized Enzymes: By Cofactors

6-hydroxyDOPA, Ammonia, Ascorbate, ATP, Bicarbonate, Bile salts, Biotin, Cadmium, Calcium, Cobalamin, Cobalt, Coenzyme F430, Coenzyme-A, Copper, Dipyrromethane, Dithiothreitol, Divalent cation, F420, FAD, Fe(II), Flavin, Flavoprotein, FMN, Glutathione, Heme, Heme-thiolate, Iron, Iron(II), Iron-molybdenum, Iron-sulfur, Lipoyl group, Magnesium, Manganese, Molybdenum, Molybdopterin, Monovalent cation, NAD, NAD(P)H, Nickel, Potassium, PQQ, Proto heme IX, Pterin, Pyridoxal phosphate, Pyridoxal-phosphate, Pyruvate, Reduced flavin, Selenium, Siroheme, Sodium, Tetrahydropteridine, Thiamine pyrophosphate, Thiol-dependent, Tryptophan, …


Page 6

Functionally characterized Enzymes

1.-.-.- Oxidoreductases
1.1.-.- Acting on the CH-OH group of donors
1.2.-.- Acting on the aldehyde or oxo group of donors
1.3.-.- Acting on the CH-CH group of donors
1.4.-.- Acting on the CH-NH(2) group of donors
1.5.-.- Acting on the CH-NH group of donors
1.6.-.- Acting on NADH or NADPH

5.-.-.- Isomerases
5.1.-.- Racemases and epimerases
5.2.-.- Cis-trans-isomerases
5.3.-.- Intramolecular oxidoreductases
5.4.-.- Intramolecular transferases (mutases)
5.5.-.- Intramolecular lyases

Catalysis
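EC numbers encode a four-level hierarchy (class, subclass, sub-subclass, serial number), with "-" marking an unspecified level, as in the list above. A minimal Python sketch, assuming EC numbers are written as dotted strings like those on the slide; the function name `ec_ancestors` is hypothetical:

```python
def ec_ancestors(ec):
    """Chain of EC levels from the class down to `ec` itself.

    Partially specified numbers (e.g. '1.1.-.-') stop at their last known level.
    """
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    known = []
    for p in parts:
        if p == "-":
            break
        known.append(p)
    chain = []
    for depth in range(1, len(known) + 1):
        # Pad the unspecified levels with '-' as in the ENZYME notation
        chain.append(".".join(known[:depth] + ["-"] * (4 - depth)))
    return chain


print(ec_ancestors("5.3.1.1"))  # ['5.-.-.-', '5.3.-.-', '5.3.1.-', '5.3.1.1']
print(ec_ancestors("1.1.-.-"))  # ['1.-.-.-', '1.1.-.-']
```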

Page 7

Structurally based alignments of structurally and functionally characterized sequences

[Figure: sequence pairs at 90%, 45% and 20% identity (organisms: Human, Chick, E. coli, B. ster., Yeast) set against function: same exact function (5.3.1.1 TP Isomerase); both Class 5 isomerases (5.3.1.1 TP Isomerase, 5.3.1.24 PRA Isomerase, 5.3.1.15 Xylose Isomerase); different classes (4.1.3.3 Aldolase, 4.2.1.11 Enolase).]
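The pairs on this slide are graded by how much of the EC number they share. A small Python sketch (the helper name `shared_ec_depth` is hypothetical) that counts the shared leading EC levels, so 4 means the same exact activity and 0 means different classes:

```python
def shared_ec_depth(a, b):
    """Number of leading EC levels (0-4) that two EC numbers have in common."""
    depth = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y or x == "-":
            break
        depth += 1
    return depth


pairs = [
    ("5.3.1.1", "5.3.1.1"),   # TP isomerase vs TP isomerase
    ("5.3.1.1", "5.3.1.24"),  # TP isomerase vs PRA isomerase
    ("5.3.1.1", "5.3.1.15"),  # TP isomerase vs xylose isomerase
    ("5.3.1.1", "4.1.3.3"),   # class 5 isomerase vs class 4 aldolase
]
for a, b in pairs:
    print(a, b, shared_ec_depth(a, b))
```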

Page 8

[Figure: Relationship of Similarity in Sequence to that in Function. Axes: sequence similarity of pairs of proteins (%ID) and % Same Function, the percentage of pairs that have the same precise function as defined by the Enzyme & FlyBase functional classifications.]
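A curve like this is a per-bin fraction: protein pairs are grouped by sequence identity and, within each bin, the share of pairs with the same precise function is computed. A minimal Python sketch of that computation, assuming each pair has already been labelled same/different function (for example by comparing full EC numbers); the bin width and the toy pairs are illustrative assumptions, not the data behind the figure:

```python
from collections import defaultdict


def same_function_by_identity(pairs, bin_width=10):
    """pairs: iterable of (percent_identity, same_function).

    Returns {bin_start: fraction of pairs in that bin with the same function}.
    """
    totals = defaultdict(int)
    same = defaultdict(int)
    for pid, is_same in pairs:
        # Put 100% identity into the top bin instead of a bin of its own
        start = min(int(pid // bin_width) * bin_width, 100 - bin_width)
        totals[start] += 1
        same[start] += int(is_same)
    return {b: same[b] / totals[b] for b in sorted(totals)}


# Hypothetical toy pairs (not real measurements)
toy = [(15, False), (18, True), (22, False), (35, True), (38, False),
       (55, True), (58, True), (92, True), (100, True)]
print(same_function_by_identity(toy))
# {10: 0.5, 20: 0.0, 30: 0.5, 50: 1.0, 90: 1.0}
```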

Page 9

[Figure: Relationship of Similarity in Sequence to that in Function (axes: %ID, % Same Function). Credit: M. Gerstein]

Page 10

Can transfer both Fold & Functional Annotation

[Figure: Relationship of Similarity in Sequence to that in Function (axes: %ID, % Same Function). Credit: M. Gerstein]

Page 11

• Cannot transfer Fold or Functional Annotation ("Twilight Zone")

• Can transfer Annotation related to Fold, but not Function

• Can transfer both Fold & Functional Annotation

[Figure: Relationship of Similarity in Sequence to that in Function (axes: %ID, % Same Function). Credit: M. Gerstein]

Page 12

• Cannot transfer Fold or Functional Annotation ("Twilight Zone")

• Can transfer Annotation related to Fold, but not Function

• Can transfer both Fold & Functional Annotation

Broad vs Narrow Similarity

[Figure: Relationship of Similarity in Sequence to that in Function (axes: %ID, % Same Function). Credit: M. Gerstein]

Page 13

Annotation-Based Analysis of Protein Sets

Page 14

Few words on

Protein Function

Protein Annotation

Page 15

Prediction of Function

What is function? This is not a simple term.

Function may be:

• a biological process (e.g. serine protease activity)

• a molecular event (e.g. proteolysis of a specific substrate)

• a cellular structure (e.g. membrane; chromatin, etc.)

• relevance to a whole process (e.g. cell cycle)

• relevance to the whole organism (e.g. ovulation)

Page 16

“omics”: genomics and proteomics

• Main idea: use high throughput as a means of tackling biological complexity.

[Figure panels: DIGE 2D gel, DNA microarray, SELDI-TOF spectrum]

Page 17

“omic” research

• Experimental Stage: data collection

• Computational Stage: statistical analysis

• Result: “graveyards” of genes/proteins

CD44 HSP CAT ERP2 RPL1 ENO

SODa TRD PMS DUF ACT GLU

Page 18

A protein graveyard

CRZ 1 HMO 1 POL 1 SNU 13 RPC 10 SFL 1 SNU 13 RPC 10

BEM 2 NHP 6A DPB 3 PRP 19 RPB 8 GAL 4 PRP 19 RPB 8SYF 1 EPL 1 POL 2 KEM 1 RPB 10 MIG1 KEM 1 RPB 10

CDC 13 SCC 4 RFA 2 SEH 1 RPO 26 HSF 1 SEH 1 RPO 26SHE 3 RSC 9 RFA 3 NPL 6 STB 4 MOT3 NPL 6 STB 4

NCE 4 ISW 1 RFA 1 HOT1 TOA2 STE 12 HOT1 TOA2- ISW 2 RFC 3 DAL 82 TOA1 NUT 1 DAL 82 TOA1

ECM 5 TRA 1 RFC 2 ACE 2 SUA 7 BDF 1 ACE 2 SUA 7EAF 3 - RFC 4 BUR 6/NCB 1TAF1 UME 6 BUR 6/NCB 1TAF1

HFI1 IOC3 TOP1 NCB 2 TAF9 MMS 4 NCB 2 TAF9MSI 1 - TOF2 SSU 72 TAF10 ABF 2 SSU 72 TAF10

CAC 2 CST 6 RNH 35 KIN 28 TAF11 GAT1 KIN 28 TAF11RSC 1 or 2 HOF 1 FOB 1 MOT1 TAF3 RTG1 MOT1 TAF3

RSC 6 ACT 1 SSA 3 RPB 9 TAF6 SKN 7 RPB 9 TAF6RSC 8 ARP 4 SSA 2 RPB 2 TAF12 TAF4/MPT1RPB 2 TAF12

STH 1 ARP 9 GLC 7 RPB 7 TAF7 SIR 2 RPB 7 TAF7SFH 1 ARP 8 TDH 1, 2, 3RPO 21 TAF5 MSN 2 RPO 21 TAF5

RSC 2 ARP 7 PDC 2 RPB 4 SPT 15 MET 31 RPB 4 SPT 15CHD 1 APN 1 HPR 5 RPB 3 TAF2 HAC 1 RPB 3 TAF2

SMC 3 PHR 1 - MED 8 TAF8 SSL 2 MED 8 TAF8IRR 1 NTG2 RIM 1 SRB 8 TFA2 RAD 3 SRB 8 TFA2

SWI 3 MSH 6 MGM 101 MED 2 TFA1 UBP 8 MED 2 TFA1SNF 12 MSH 2 CBF 5 SRB 7 TFG1 CCT 4 SRB 7 TFG1

SNF 2 RAD 26 CBF 2 SSN 2 TAF14 /TFG3RPL 10 SSN 2 TAF14 /TFGSWI 1 RPH 1 CHL 1 SRB 4 TFG2 RPP 0 SRB 4 TFG2

GCN 5 MUS 81 CDC 14 FHL 1 TFB4 RPL 11 A or BFHL 1 TFB4SPT 7 MEC 1 SMC 1 SRB 5 TFB3 RPL 12 A or BSRB 5 TFB3

NGG 1 RAD 52 SMC 2 SRB 2 TFB2 RPL 15 B SRB 2 TFB2SPT 3 RAD 59 SGS 1 MED 6 TFB1 RPL 19 A or BMED 6 TFB1

ADA 2 MSH 3 YCS 4 RGR 1 SSL 1 RPL 1A or BRGR 1 SSL 1YNG 2 RAD 7 MCD 1 MED 11 CCL 1 RPL 25 MED 11 CCL 1

SPT 8 RAD 4 SCC 2 SIN 4 REB 1 RPL 3 SIN 4 REB 1SPT 20 RAD 14 CFT1 CSE 2/MED 9FHL 1 RPL 30 CSE 2/MED 9FHL 1

ESA 1 RAD 23 YSH 1 GAL 11 SUB 1 RPL 34 A or BGAL 11 SUB 1RPD 3 DPB 4 REF 2 MED 7 GIS1 RPL 35 A or BMED 7 GIS1

HTB 2 or 1 RFC 5 NAM 7 MED 4 ARO 80 RPL 4A or BMED 4 ARO 80RSC 58 RFC 1 PAP 1 MED 1 FKH 2 RPL 8A or BMED 1 FKH 2

IOC4 TOP2 PRP 43 RPB 11 - RPS 0A or BRPB 11 -ITC1 TOP3 PRP 9 ROX3 IXR1 RPS 11 A or BROX3 IXR1

NHP 10 MIP 1 PRP 46 SRB 6 SGV 1 RPS 1A SRB 6 SGV 1

Page 19

Biological analysis of protein sets

• Biological interpretation requires intimate knowledge of the proteins and is time-consuming.

• Usually only a few proteins are examined.

• How can we interpret the results efficiently?

• How can we understand the results at a proteomic level?

• Solution: analysis of protein annotations.

Page 20

Protein annotations

• Annotation (keyword): a binary property of a protein, from a “library” of properties.

• Annotations cover various biological aspects: function, structure, taxonomy, localization, biological pathway…

• Annotations come from different sources.

• The amount and variety of annotations keep growing.

• Libraries of annotations allow computational analysis.

Page 21

Major sources of incorrect annotations

In the protein world:

1. Wrong gene finding (exon-intron)

2. Premature cleavage, wrong tails (nt sequencing mistakes)

3. ESTs may be misleading

4. Automatic assignment of features

5. Rush due to publication/public/government pressure (the human genome is the worst)

6. No replacement for manual curators


Page 24

Annotation types

Page 25

Gene Ontology (GO)

GO provides controlled annotations of:

1. Molecular function

2. Biological process

3. Cellular component

The annotations are part of a hierarchical graph, in which each GO term has a parent or parents, and might have child terms.
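Because a GO term can have several parents, its full set of ancestors is found by walking the parent links upward, and annotating a protein with a term implicitly annotates it with all of those ancestors (GO's "true path rule"). A minimal Python sketch; the parent map is a hypothetical hand-written fragment that follows the molecular-function terms on the "Semantic similarity measures" slide, whereas a real analysis would load the full GO DAG from an ontology file:

```python
from collections import deque

# Hypothetical fragment of the GO molecular-function DAG (term -> parent terms)
PARENTS = {
    "GO:0004871": ["GO:0003674"],  # signal transducer -> molecular function
    "GO:0004872": ["GO:0004871"],  # receptor -> signal transducer
    "GO:0009881": ["GO:0004872"],  # photoreceptor -> receptor
    "GO:0004888": ["GO:0004872"],  # transmembrane receptor -> receptor
}


def ancestors(term, parents=PARENTS):
    """Every term reachable from `term` by following parent links (term excluded)."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            # A term may have several parents; enqueue all of them
            queue.extend(parents.get(t, []))
    return seen


def propagate(terms, parents=PARENTS):
    """Close a protein's annotation set upward: each term implies its ancestors."""
    closed = set(terms)
    for t in terms:
        closed |= ancestors(t, parents)
    return closed


print(sorted(ancestors("GO:0009881")))
# ['GO:0003674', 'GO:0004871', 'GO:0004872']
```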

Page 26

Assigning GO terms to proteins

Nuclear Protein Hcc-1 (SWISS-PROT: P82979)

Cellular Component
GO:0005634 – nucleus

Molecular Function
GO:0003676 – nucleic acid binding
GO:0003677 – DNA binding

Biological Process
GO:0006350 – transcription
GO:0006355 – transcription regulation
GO:0006417 – general regulation of protein biosynthesis
GO:0006355 – translational regulation

Page 27

Advantages of using GO terms

• Assigned to most of the proteins in SWISS-PROT (~100%).

• Checked one-by-one by experts (EBI).

• Comprehensive: based on SWISS-PROT keywords, EC numbers, InterPro keywords and PubMed abstracts.

• Tree-like structure (DAG): the hierarchy is used to find the “best” GO term describing a protein or a set of proteins.
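One way to use the hierarchy to pick the "best" term for a set of proteins is to take the terms that all of them share (directly or through ancestors) and keep only the most specific of those. A Python sketch under that assumption; the selection criterion, the function names and the small DAG fragment (term names instead of GO IDs) are illustrative, not the method of any particular tool:

```python
# Hypothetical DAG fragment: term -> parent terms
PARENTS = {
    "signal transducer": ["molecular function"],
    "receptor": ["signal transducer"],
    "photoreceptor": ["receptor"],
    "transmembrane receptor": ["receptor"],
}


def with_ancestors(term):
    """`term` plus everything above it in the DAG."""
    out, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in out:
            out.add(t)
            stack.extend(PARENTS.get(t, []))
    return out


def most_specific_common_terms(terms):
    """Common ancestors of all `terms` (non-empty), keeping only the most specific."""
    common = set.intersection(*(with_ancestors(t) for t in terms))
    # Drop any common term that is an ancestor of another common term
    return {c for c in common
            if not any(c != d and c in with_ancestors(d) for d in common)}


print(most_specific_common_terms(["photoreceptor", "transmembrane receptor"]))
# {'receptor'}
```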

Page 28

Semantic similarity measures

“Information content”: “chaperone” is a more informative term than “signal transducer”, because the former is used several hundred times while the latter is used several thousand times.

GO:0003674 : molecular function

GO:0004871 : signal transducer

GO:0004872 : receptor

GO:0009881 : photoreceptor

GO:0004888 : transmembrane receptor
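Information content is commonly defined as IC(t) = -log p(t), where p(t) is the fraction of annotations that use t or any of its descendants, so rarely used terms score higher. A Python sketch of that definition, plus Resnik's similarity (the IC of the most informative common ancestor), which is one common IC-based measure and is named here as an assumption rather than as the measure used in the talk. The DAG fragment and the annotation counts are hypothetical placeholders chosen only to mirror the slide's "hundreds vs thousands" contrast:

```python
import math

# Hypothetical DAG fragment: term -> parent terms
PARENTS = {
    "signal transducer": ["molecular function"],
    "receptor": ["signal transducer"],
    "photoreceptor": ["receptor"],
    "transmembrane receptor": ["receptor"],
    "chaperone": ["molecular function"],
}

# Hypothetical direct annotation counts per term (placeholders, not GO statistics)
DIRECT = {
    "molecular function": 0,
    "signal transducer": 2000,
    "receptor": 1000,
    "photoreceptor": 100,
    "transmembrane receptor": 900,
    "chaperone": 300,
}


def with_ancestors(term):
    out, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in out:
            out.add(t)
            stack.extend(PARENTS.get(t, []))
    return out


def cumulative_counts():
    """A term's usage includes every annotation made to its descendants."""
    counts = {t: 0 for t in DIRECT}
    for term, n in DIRECT.items():
        for t in with_ancestors(term):
            counts[t] += n
    return counts


COUNTS = cumulative_counts()
TOTAL = COUNTS["molecular function"]  # the root covers every annotation


def ic(term):
    return -math.log(COUNTS[term] / TOTAL)


def resnik(a, b):
    """IC of the most informative ancestor shared by `a` and `b`."""
    return max(ic(t) for t in with_ancestors(a) & with_ancestors(b))


print(ic("chaperone") > ic("signal transducer"))  # True
```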

Page 29

Protein annotations

9% of proteins have more than 20 annotations (not including Taxonomy)

Page 30

Annotation types

• GO annotation has a broad distribution; its accuracy level varies greatly.

• Some keywords overlap, but with different definitions.

Page 31

Computational analysis – naïve

100 proteins (annotation: amount):
• membrane: 60
• enzyme: 40

Summation: a naïve method for protein set analysis.

Something is missing…

Page 32

Intersection and inclusion

[Figure: the same counts (60 membrane, 40 enzyme) drawn as different intersection and inclusion diagrams of the enzyme and membrane sets]
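Per-keyword counts (summation) cannot tell apart sets whose keywords relate very differently. A Python sketch with three hypothetical 100-protein scenarios that all give 60 "membrane" and 40 "enzyme" proteins, yet differ once intersection and inclusion are checked; the helper name `relate` and the protein IDs are illustrative:

```python
def relate(a_name, a, b_name, b):
    """Per-keyword counts plus how the two protein sets relate."""
    if a == b:
        relation = "identical sets"
    elif b <= a:
        relation = f"every {b_name} protein is also {a_name}"
    elif a <= b:
        relation = f"every {a_name} protein is also {b_name}"
    elif a & b:
        relation = f"{len(a & b)} proteins are both {a_name} and {b_name}"
    else:
        relation = "disjoint"
    return len(a), len(b), relation


membrane = set(range(60))             # hypothetical protein IDs 0-59
enzyme_inside = set(range(40))        # scenario 1: enzymes within membrane
enzyme_outside = set(range(60, 100))  # scenario 2: no overlap at all
enzyme_partial = set(range(40, 80))   # scenario 3: partial overlap

for enzyme in (enzyme_inside, enzyme_outside, enzyme_partial):
    print(relate("membrane", membrane, "enzyme", enzyme))
```

Summation alone would report (60, 40) in all three cases.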


Page 35

• A web-based tool aimed at biological analysis of protein sets.

• Biological information is shown through intersection and inclusion.

• Goal: provide a “biological roadmap” of the protein set.

Page 36

Method

Binary protein-keyword matrix (columns: enzyme, cytoplasm, hydrolase, transcription, nucleus, kinase):

P1  1 0 0 1 1 0
P2  1 1 0 1 1 0
P3  1 1 1 0 0 0
P4  1 1 1 0 0 1
P5  1 1 1 0 0 0
P6  1 1 1 0 0 1

Resulting graph nodes (keywords: proteins):

enzyme: P1 P2 P3 P4 P5 P6 = 6
cytoplasm: P2 P3 P4 P5 P6 = 5
hydrolase: P3 P4 P5 P6 = 4
kinase: P4 P6 = 2
transcription, nucleus: P1 P2 = 2
cytoplasm, transcription, nucleus: P2 = 1
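The six nodes of this example can be reproduced by treating each keyword as the set of proteins carrying it, closing those sets under intersection, and linking each node to its smallest strict supersets (inclusion). A Python sketch of that construction; it is a reconstruction consistent with the six-protein example, not PANDORA's actual code, and the function names are hypothetical:

```python
def intersection_nodes(annotations):
    """annotations: protein -> set of keywords.

    Returns {frozenset(proteins): keywords shared by all of those proteins}
    for every non-empty intersection of keyword protein-sets.
    """
    # Invert to keyword -> proteins carrying it
    by_keyword = {}
    for protein, keywords in annotations.items():
        for kw in keywords:
            by_keyword.setdefault(kw, set()).add(protein)

    # Close the keyword protein-sets under intersection
    nodes = {frozenset(s) for s in by_keyword.values()}
    frontier = set(nodes)
    while frontier:
        found = {a & b for a in frontier for b in nodes} - nodes - {frozenset()}
        nodes |= found
        frontier = found

    # Label each node with every keyword whose protein set contains it
    return {n: {kw for kw, s in by_keyword.items() if n <= s} for n in nodes}


def inclusion_edges(nodes):
    """(parent, child) pairs where parent is a smallest strict superset of child."""
    edges = []
    for child in nodes:
        supers = [p for p in nodes if child < p]
        for p in supers:
            if not any(child < q < p for q in supers):
                edges.append((p, child))
    return edges


# The six-protein example from the slide
example = {
    "P1": {"enzyme", "transcription", "nucleus"},
    "P2": {"enzyme", "cytoplasm", "transcription", "nucleus"},
    "P3": {"enzyme", "cytoplasm", "hydrolase"},
    "P4": {"enzyme", "cytoplasm", "hydrolase", "kinase"},
    "P5": {"enzyme", "cytoplasm", "hydrolase"},
    "P6": {"enzyme", "cytoplasm", "hydrolase", "kinase"},
}
nodes = intersection_nodes(example)
for n in sorted(nodes, key=len, reverse=True):
    print(sorted(n), sorted(nodes[n]))
print(len(inclusion_edges(nodes)), "inclusion edges")
```

Each node's label lists all keywords it inherits; the slide shows only the keywords that are new at each node.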

Page 37

[Figure: a large binary annotation matrix]

Page 38

Graph complexity

• 20 keywords: >1,000,000 nodes

• This worst case doesn’t occur for large K values in the protein-keyword world.

• Still, highly complex graphs do occur.

Theoretical complexity:

$$\sum_{n=1}^{K} \binom{K}{n} = 2^K - 1$$
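A quick numeric check of this bound in Python (`math.comb` requires Python 3.8 or later): summing the binomial coefficients gives 2^K - 1, which for 20 keywords is just over a million nodes. The function name is illustrative:

```python
from math import comb


def worst_case_nodes(k):
    """Upper bound on graph nodes for k keywords: sum of C(k, n) for n = 1..k."""
    return sum(comb(k, n) for n in range(1, k + 1))


for k in (6, 10, 20):
    print(k, worst_case_nodes(k), 2**k - 1)
```

For the six keywords of the Method example the bound is 63 nodes, while only six nodes actually occur, which illustrates why the worst case is rare in practice.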

Page 39

Resolution

• A user-controlled threshold trading graph accuracy for simplicity.

• Represents the maximal level of error allowed, in proteins.

[Figure: example graph simplified at Resolution = 2 proteins]
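The slides do not spell out the merging rule, so the following Python sketch assumes one: visiting nodes from largest to smallest, a node is absorbed into an already kept superset node when that superset has at most `resolution` extra proteins, so each absorbed keyword is wrongly attached to at most `resolution` proteins. The node table reuses the six-protein Method example; the rule and the function name `simplify` are hypothetical:

```python
# Nodes of the six-protein Method example: protein set -> keywords shared by all
NODES = {
    frozenset({"P1", "P2", "P3", "P4", "P5", "P6"}): {"enzyme"},
    frozenset({"P2", "P3", "P4", "P5", "P6"}): {"enzyme", "cytoplasm"},
    frozenset({"P3", "P4", "P5", "P6"}): {"enzyme", "cytoplasm", "hydrolase"},
    frozenset({"P1", "P2"}): {"enzyme", "transcription", "nucleus"},
    frozenset({"P4", "P6"}): {"enzyme", "cytoplasm", "hydrolase", "kinase"},
    frozenset({"P2"}): {"enzyme", "cytoplasm", "transcription", "nucleus"},
}


def simplify(nodes, resolution):
    """Greedy merge (assumed rule): absorb a node into a kept superset node that
    has at most `resolution` extra proteins; keep it as a separate node otherwise."""
    kept = {}
    for prots in sorted(nodes, key=len, reverse=True):
        target = next((k for k in kept
                       if prots <= k and len(k - prots) <= resolution), None)
        if target is None:
            kept[prots] = set(nodes[prots])
        else:
            # Absorbed keywords now also label up to `resolution` proteins lacking them
            kept[target] |= nodes[prots]
    return kept


for r in (0, 1, 2):
    print(r, len(simplify(NODES, r)))
```

Raising the resolution shrinks the graph (6, 4 and 3 nodes here) at the cost of less precise keyword-protein assignments.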

Page 40

Annotation types

Some non-informative words: complete genome, Disease…

Some are partially annotated: EC x.y.

Growing very fast; still, many terms are inconsistent.

Page 41

The power of integration

• Integration of various biological aspects: structure, function, taxonomy, localization, pathways…

• Integration of various characteristics (e.g. SCOP).

• Methods: full integration, “zooming”.

Page 42

Biological examples

• Take the set of all 576 proteins annotated by ‘GO molecular function’ as ‘anion channel’.

• View it through InterPro keywords (sequence signatures).

Page 43

[Figure: “zoom” view of the basic set through InterPro signatures (axes: InterPro signature, number of proteins): GABA A receptor; Neurotransmitter-gated ion channel; Nicotinic acetylcholine receptor; Voltage-gated chloride channel; Intracellular chloride channel; H+ transporting ATPase; Eukaryotic porin]

Sensitivity: TP/(TP+FN) (red = FN, white = TP)
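Reading each bar as "white = signature members inside the basic set (TP), red = signature members outside it (FN)", which is our interpretation of the legend, the per-signature sensitivity is a simple set ratio. A Python sketch with hypothetical protein IDs and a hypothetical helper name:

```python
def sensitivity(family, basic_set):
    """TP / (TP + FN) for one InterPro signature viewed against the basic set.

    TP: proteins carrying the signature that are in the basic set.
    FN: proteins carrying the signature that the basic set misses.
    """
    tp = len(family & basic_set)
    fn = len(family - basic_set)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical IDs: the basic set (e.g. GO 'anion channel') and one signature's members
basic_set = {"Q1", "Q2", "Q3", "Q4", "Q5"}
signature = {"Q1", "Q2", "Q3", "Q9"}
print(sensitivity(signature, basic_set))  # 0.75
```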

Page 44

[Figure: InterPro view of the GABA A receptor: alpha subunit, beta subunit, gamma subunit]

Page 45: 2. Annotation 1. Enzyme Seq-Fun talk IV - huji.ac.il · PDF file1. Enzyme Seq-Fun 2. Annotation 3. ... TFG2 RPP 0 SRB 4 TFG2 ... FKH2 RPL 8A or BMED 1 FKH2 IOC4 TOP2 PRP 43 RPB 11-RPS

ISSCB 09/03 M. Linial

[Figure: Taxonomy view: Eukaryota, Chordata, Drosophila, C. elegans, human, chicken, Mammalia, Rodentia]

Page 46

Resolution: 0 proteins

Page 47

Resolution: 1

Page 48

Resolution: 2

Page 49

Resolution: 5

Page 50

Resolution: 8

Page 51

Resolution: 15

Page 52

Resolution: 30

Page 53

Biological examples - ProtoNet

• Very large cluster of GTP binding proteins (A244299)

Page 54

Biological examples - ProtoNet

• Very large cluster of UREASE SF (as in talk III)

17 different enzymatic groups

Highly pure (white)

Page 55

Biological examples - experimental

• Comparative proteomic experiment: E. coli response to benzoic acid (Yan et al., 2002).

• A set of 51 proteins is down-regulated by a factor of 1.3 or more.

• Benzoic acid is known to inhibit E. coli growth (Lambert et al., 1997).

• Could we guess this without examining individual proteins?
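The selection step and the annotation-level view can be sketched in Python: keep proteins whose abundance falls by at least the chosen factor (control / treated ≥ 1.3), then count keywords over that set instead of inspecting proteins one by one. The protein names, abundances and annotations below are hypothetical placeholders, not data from Yan et al.:

```python
from collections import Counter


def down_regulated(control, treated, min_fold=1.3):
    """IDs whose abundance drops by at least `min_fold` (control / treated).

    Assumes both dicts hold positive abundances for the same IDs.
    """
    return sorted(p for p in control if control[p] / treated[p] >= min_fold)


def keyword_profile(proteins, annotations):
    """How many of the selected proteins carry each keyword."""
    return Counter(kw for p in proteins for kw in annotations.get(p, ()))


# Hypothetical measurements and annotations (placeholders)
control = {"prot1": 2.0, "prot2": 1.0, "prot3": 1.2, "prot4": 3.0}
treated = {"prot1": 1.0, "prot2": 1.0, "prot3": 1.0, "prot4": 1.5}
annotations = {
    "prot1": {"amino acid biosynthesis"},
    "prot2": {"transport"},
    "prot4": {"amino acid biosynthesis", "protein biosynthesis"},
}

hits = down_regulated(control, treated)
print(hits)  # ['prot1', 'prot4']
print(keyword_profile(hits, annotations).most_common())
```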

Page 56

[Figure: GO biological process terms: Transport; Metabolism; Biosynthesis; Cell growth and/or maintenance; Amino acid biosynthesis; Vitamin biosynthesis; Protein biosynthesis; Nitrogen metabolism; Coenzyme biosynthesis; Lipid metabolism; Phosphate metabolism; Carbohydrate metabolism]

Page 57

False Annotations

Automated: consistency of keyword connectivity

Tested against ProSite (manually)

78% correct + 9% full separation (TP/FP)

Page 58

Detecting false annotations

• Automatic statistical annotation methods are susceptible to both type I and type II errors.

• False positives are especially problematic because they lead to incorrect annotation transfer.

• InterPro ‘Glutamine synthetase’ – 131 proteins

• Glutamine synthetase (glutamate-ammonia ligase) reaction:

ATP + L-glutamate + NH3 = ADP + phosphate + L-glutamine

Page 59

False-positives

[Figure: keywords from InterPro, SwissProt and ENZYME on the InterPro ‘Glutamine synthetase’ proteins: Glutamine synthetase; Ligase; Glutamine Synthetase class I adenylation site; Outer-membrane; Virulence; Bacterial Ig-like; Bacterial adhesion mediator; Peptidoglycan-binding; Actin-binding; WD repeat; Coiled coil]
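A simple consistency check in the spirit of this example flags members of a signature family that carry none of the annotations an independent source would be expected to confirm, such as the ENZYME entry for glutamine synthetase (EC 6.3.1.2) or the SwissProt keyword "Ligase". A Python sketch; the protein IDs and keyword sets are hypothetical, and the rule illustrates the idea rather than reproducing the detection method used in the talk:

```python
def flag_suspects(family, annotations, corroborating):
    """Members of `family` that carry none of the `corroborating` keywords."""
    return sorted(p for p in family
                  if not (annotations.get(p, set()) & corroborating))


family = {"protA", "protB", "protC"}  # hypothetical InterPro family members
annotations = {                       # hypothetical keywords from other sources
    "protA": {"Ligase", "EC 6.3.1.2"},
    "protB": {"Ligase"},
    "protC": {"Outer membrane", "Virulence"},
}
print(flag_suspects(family, annotations, {"Ligase", "EC 6.3.1.2"}))  # ['protC']
```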

Page 60

Conclusion

• PANDORA offers:

– Interactive, comprehensible graph display.

– Full protein-keyword intersection and inclusion relations.

– User-controlled data simplification.

– Integration of 6 annotation sources.

– Detection of false annotations.

Page 61

Future plans

• Enlarge and enrich sources.

• An automatic method for detecting false annotations.

• Deal with quantitative properties.

Page 62

Quantitative properties

• Not all biological properties are naturally binary.

• Some interesting quantitative properties are user-specific (e.g. change in expression level).


Page 65

www.pandora.cs.huji.ac.il

We come in peace…