You can request PRO terms by using the SourceForge PRO tracker (Fig 3A) or by directly contributing...



You can request PRO terms through the SourceForge PRO tracker (Fig 3A), or contribute directly to PRO by entering the information in the RACE-PRO annotation tool (Fig 3B), a user-oriented interface for entering experimental information about protein forms that enables and fosters contribution by domain experts. Submissions are subject to editorial review, and the submitted data are the input to a program that generates PRO terms (Fig 3C).

Protein Ontology to provide specificity to protein and complex annotations

Cecilia Arighi 1, Harold Drabkin 2, and PRO consortium*

1 Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE; 2 The Jackson Laboratory, Bar Harbor, ME

Natale D A et al. Nucl. Acids Res. 2014;42:D415-D421

Fig.1- Illustration of PRO categories and relation to external resources. Categories are listed along the top, with example terms for IRF5 (interferon regulatory factor 5) shown directly below. Not all terms and relationships are shown. IRF, interferon regulatory factor; Phos, phosphorylated; m, mouse; h, human; iso, isoform; BMv, bone marrow variant.

Fig.1 labels: Protein; Complex: use GO terms; Relations: protein gene level -> UniProtKB; Key: is_a, has_component.

Background

The Protein Ontology (PRO; http://www.proconsortium.org/) defines protein and protein complex entities, representing their major forms and the relations among them. Protein entities in PRO denote single amino acid chains and are categorized by level of specificity into evolutionary families, products of a single gene in one species, products of a single transcript, and post-translationally modified forms, among others. PRO specializes in organism-specific protein complexes, their components and their modified forms. PRO’s scope includes the 12 GO reference genomes. PRO works with and complements established sequence-oriented databases such as UniProtKB. It is also interoperable with other biomedical and biological ontologies such as the Gene Ontology (GO): the PRO organism-specific complexes are subclasses of the organism-agnostic protein complex terms in the GO Cellular Component Ontology (Fig.1).
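The levels and relations described above can be pictured as a small data structure. The following is only a minimal sketch, not PRO's actual data model: the class names, the level strings, the `EX:` identifiers and the term names are hypothetical placeholders chosen for illustration (loosely following the IRF5 example of Fig.1), and only the `is_a` and `has_component` relation names come from the figure key.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Level names paraphrase the categories listed in the Background text;
# the exact strings are an assumption made for this sketch.
LEVELS = ("family", "gene", "sequence", "modification")


@dataclass
class ProteinTerm:
    term_id: str                          # placeholder ID, not a real PRO accession
    name: str
    level: str
    is_a: Optional["ProteinTerm"] = None  # parent term, if any

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")

    def lineage(self) -> List[str]:
        """Return term names from this term up to the root, following is_a."""
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.is_a
        return names


@dataclass
class ComplexTerm:
    term_id: str                          # placeholder ID
    name: str
    go_parent: str                        # label of the GO complex term it specializes
    has_component: List[ProteinTerm] = field(default_factory=list)


# Illustrative terms only; names and IDs are invented for this example.
family = ProteinTerm("EX:0001", "interferon regulatory factor family", "family")
gene = ProteinTerm("EX:0002", "IRF5 (human)", "gene", is_a=family)
isoform = ProteinTerm("EX:0003", "IRF5 isoform 1 (human)", "sequence", is_a=gene)
phos = ProteinTerm("EX:0004", "IRF5 isoform 1, phosphorylated (human)",
                   "modification", is_a=isoform)

cplx = ComplexTerm("EX:0005", "example IRF5-containing complex (human)",
                   go_parent="GO protein complex term (placeholder)",
                   has_component=[phos])

print(" -> ".join(phos.lineage()))
```

Walking `is_a` from the most specific term upward is what lets an annotation made at the modified-form level still be related to the gene-level and family-level terms.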

Fig 3. External contributions via the PRO website. A- PRO homepage with links to the annotation resources: SourceForge tracker and RACE-PRO. B- RACE-PRO annotation interface, with an example annotation of the BH3-interacting domain death agonist p15 cleaved form. C- Integrated view of information (ontology, annotation, and mapping) in the PRO entry report.

Fig 3B panel labels (RACE-PRO interface): A-Enter accession or paste sequence; B-Define protein region and/or PTMs; C-Enter protein form name; D-Data source; E-Annotation.

PRO enables annotation to multiple levels of granularity

How do I request PRO terms?

Annotation tool

Term request via SourceForge

How is PRO being used?

Fig.2- Examples of PRO levels of granularity with accompanying annotation. A) Different isoforms of PIP5K1C, B) individual dengue proteins, C) sequence variants of APP protein, D) TLR4 complex in human and mouse with subunit composition including PTMs, and E) family-type terms depicting the Orexin Receptor.

Fig.2 panel labels and annotations:

A-Isoforms: sumoylated form of isoform 1; unsumoylated form of isoform 1

B-Individual processed proteins from polyproteins (d = derives from). Figure source: http://viralzone.expasy.org/all_by_species/43.html

P: GO:0039563, suppression by virus of host STAT1 activity (EXP, PMID:14612562)

C-Protein variants

agent in DOID:9246, cerebral amyloid angiopathy

agent in DOID:10652, Alzheimer’s disease

D-Complex with detailed subunit composition (GO: species-agnostic complex; PRO: species-specific complex)

P: GO:0002756, MyD88-independent toll-like receptor signaling pathway (EXP, PMID:18222170)

P: GO:0002756, MyD88-independent toll-like receptor signaling pathway (EXP, Reactome:REACT_6809)

E-Family-type terms

has part PF:00001, 7 transmembrane receptor

has part PF:03827, Orexin receptor type 2
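The GO annotation lines shown with Fig.2 (for example `P: GO:0002756, MyD88-independent toll-like receptor signaling pathway (EXP, PMID:18222170 )`) follow a regular pattern: an aspect letter, a GO identifier, the term name, then an evidence code and a reference in parentheses. As a minimal sketch, assuming that layout holds, such lines could be split into fields as below. The function name `parse_annotation` and the `Annotation` class are hypothetical and not part of any PRO or GO tool; the pattern is inferred only from the lines on this poster.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern inferred from the poster's annotation lines (an assumption):
#   <aspect>: GO:<7 digits>, <term name> (<evidence code>, <reference>)
# Aspect letters follow GO usage: P = biological process,
# F = molecular function, C = cellular component.
_LINE = re.compile(
    r"^(?P<aspect>[PFC]):\s*"
    r"(?P<go_id>GO:\d{7}),\s*"
    r"(?P<label>.+?)\s*"
    r"\((?P<evidence>[A-Z]+),\s*(?P<reference>[^)]+?)\s*\)$"
)


@dataclass
class Annotation:
    aspect: str
    go_id: str
    label: str
    evidence: str
    reference: str


def parse_annotation(line: str) -> Optional[Annotation]:
    """Parse one annotation line; return None if it does not fit the pattern."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return Annotation(**match.groupdict())


example = parse_annotation(
    "P: GO:0002756, MyD88-independent toll-like receptor signaling pathway "
    "(EXP, PMID:18222170 )"
)
print(example)
```

Lines that are not GO annotations, such as the `agent in DOID:...` disease relations, do not match and return `None`, so they would need their own handling.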

The gene product of Irf8 appears to be involved in transcription regulation … or NOT

PRO is being used in multiple ways. Some examples:

Entity tagging and semantic integration

Definition of terms in other ontologies

Description of protein/complex networks

Gene Ontology Annotation

GO curators at MGI and PomBase are actively requesting PRO terms for isoforms and modified forms. The annotations can be viewed in the new AmiGO 2 interface.

PRO entry report (report for LCK human). Panel labels: A-Ontology information; B-Count of LCK human-related terms at different levels; C-Visualization; D-Sequence of LCK human; E-Multiple alignment of protein forms related to LCK human, with modification sites highlighted; F-List of all protein forms related to LCK human; G-List of complexes that LCK human is a component of; H-Annotations to terms related to LCK human.

Fig.4- The PRO entry report for a species-specific protein (gene level) contains a summary of the protein forms, complexes, annotations and sequences for that protein.

*PRO Consortium

Conclusion: PRO’s versatile representation of proteins and complexes enables its use at multiple levels of granularity.