UniProt-GOA

40
Introduction to the Gene Ontology and GO annotation resources Rachael Huntley UniProt-GOA EBI PAGXX January 2012

description

Event: Plant and Animal Genomes conference 2012 Speaker: Rachael Huntley The Gene Ontology (GO) is a well-established, structured vocabulary used in the functional annotation of gene products. GO terms are used to replace the multiple nomenclatures used by scientific databases that can hamper data integration. Currently, GO consists of more than 35,000 terms describing the molecular function, biological process and subcellular location of a gene product in a generic cell. The UniProt-Gene Ontology Annotation (UniProt-GOA) database1 provides high-quality manual and electronic GO annotations to proteins within UniProt. By annotating well-studied proteins with GO terms and transferring this knowledge to less well-studied and novel proteins that are highly similar, we offer a valuable contribution to the understanding of all proteomes. UniProt-GOA provides annotated entries for over 387,000 species and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. Annotation files for various proteomes are released each month, including human, mouse, rat, zebrafish, cow, chicken, dog, pig, Arabidopsis and Dictyostelium, as well as a file for the multiple species within UniProt. The UniProt-GOA dataset can be queried through our user-friendly QuickGO browser2 or downloaded in a parsable format via the EBI3 and GO Consortium FTP4 sites. The UniProt-GOA dataset has increasingly been integrated into tools that aid in the analysis of large datasets resulting from high-throughput experiments thus assisting researchers in biological interpretation of their results. The annotations produced by UniProt-GOA are additionally cross-referenced in databases such as Ensembl and NCBI Entrez Gene. 1 http://www.ebi.ac.uk/GOA 2 http://www.ebi.ac.uk/QuickGO 3 ftp://ftp.ebi.ac.uk/pub/databases/GO/goa 4 ftp://ftp.geneontology.org/pub/go/gene-associations

Transcript of UniProt-GOA

Page 1: UniProt-GOA

Introduction to the Gene Ontologyand GO annotation resources

Rachael Huntley

UniProt-GOA

EBI

PAGXX

January 2012

Page 2: UniProt-GOA

2

Why do we need GO?

Page 3: UniProt-GOA

Reasons for the Gene Ontology

www.geneontology.org

• Inconsistency in English language

Page 4: UniProt-GOA

Inconsistency in English languauge

• Same name for different concepts

or

??

Cell

Page 5: UniProt-GOA

Comparison is difficult – in particular across species or across databases

Just one reason why the Gene Ontology (GO) is is needed…

• Different names for the same concept

Eggplant

Aubergine

Brinjal

Melongene

Same for biological concepts

Page 6: UniProt-GOA

Reasons for the Gene Ontology

www.geneontology.org

• Inconsistency in English language

• Increasing amounts of biological data available

• Increasing amounts of biological data to come

Page 7: UniProt-GOA

Search on ‘DNA repair’...get almost 65,000 results

Increasing amounts of biological data available

Expansion of sequence information

Page 8: UniProt-GOA

Reasons for the Gene Ontology

www.geneontology.org

• Inconsistency in English language

• Large datasets need to be interpreted quickly

• Increasing amounts of biological data available

• Increasing amounts of biological data to come

Page 9: UniProt-GOA

• A way to capture biological knowledge for individual gene productsin a written and computable form

The Gene Ontology

• A set of concepts and their relationships to each other arrangedas a hierarchy

www.ebi.ac.uk/QuickGO

Less specific concepts

More specific concepts

Page 10: UniProt-GOA

The Concepts in GO

1. Molecular Function

2. Biological Process

3. Cellular Component

An elemental activity or task or job

• protein kinase activity• insulin receptor activity

A commonly recognised series of events

• cell division

Where a gene product is located

• mitochondrion

• mitochondrial matrix

• mitochondrial inner membrane

Page 11: UniProt-GOA

Anatomy of a GO term

Unique identifierUnique identifier

Term nameTerm name

DefinitionDefinitionSynonymsSynonyms

Cross-referencesCross-references

Page 12: UniProt-GOA

Ontology structure

• Directed acyclic graphTerms can have more than one parent

• Terms are linked by relationships

is_a

part_of

regulates (and +/- regulates)

www.ebi.ac.uk/QuickGOoccurs_in

has_part

These relationships allow for complex analysis of large datasets

Page 13: UniProt-GOA

Reactome

http://www.geneontology.org

Page 14: UniProt-GOA

• Compile the ontologies

- currently over 35,000 terms - constantly increasing and improving

• Annotate gene products using ontology terms

- around 30 groups provide annotations

• Provide a public resource of data and tools

- regular releases of annotations - tools for browsing/querying annotations and editing the ontology

Aims of the GO project

Page 15: UniProt-GOA

GO Annotation

Page 16: UniProt-GOA

UniProt-Gene Ontology Annotation (UniProt-GOA) database at the EBI

• Largest open-source contributor of annotations to GO

• Provides annotation for more than 397,000 species

• Our priority is to annotate the human proteome

Page 17: UniProt-GOA

A GO annotation is … …a statement that a gene product;

1. has a particular molecular function or is involved in a particular biological process

or is located within a certain cellular component

2. as determined by a particular method

3. as described in a particular reference

P00505

Accession Name GO ID GO term name Reference Evidence code

IDAPMID:2731362aspartate transaminase activityGO:0004069GOT2

Page 18: UniProt-GOA

Electronic Annotation

Manual Annotation

UniProt-GOA incorporates annotations made using two methods

• Quick way of producing large numbers of annotations• Annotations use less-specific GO terms

• Time-consuming process producing lower numbers of annotations• Annotations tend to use very specific GO terms

• Only source of annotation for many non-model organism species

Page 19: UniProt-GOA

1. Mapping of external concepts to GO terms

e.g. InterPro2GO, UniProt Keyword2GO, Enzyme Commission2GO

Electronic annotation methods

GO:0004715 ; non-membrane spanning protein tyrosine kinase activity

Page 20: UniProt-GOA

Annotations are high-quality and have an explanation of the method (GO_REF)

Macaque

Mouse

DogCow

Guinea PigChimpanzee Rat

Chicken

Ensembl compara

2. Automatic transfer of manual annotations to orthologs

...and more

e.g. Human

Arabidopsis

Rice

Brachypodium

Maize

Poplar

Grape

…and moreEnsembl compara

Electronic annotation methods

Page 21: UniProt-GOA

Manual annotation by GOA

High–quality, specific annotations made using:

• Full text peer-reviewed papers

• A range of evidence codes to categorise the types of evidence found in a

papere.g. IDA, IMP, IPI

http://www.ebi.ac.uk/GOA

Page 22: UniProt-GOA

* Includes manual annotations integrated from external model organism and specialist groups

920,557Manual annotations*

110,247,289Electronic annotations

Dec 2011 Statistics

Number of annotations in UniProt-GOA database

Page 23: UniProt-GOA

How to access and use

GO annotation data

Page 24: UniProt-GOA

Where can you find annotations?

UniProtKB

Ensembl

Entrez gene

Page 25: UniProt-GOA

GO Consortium website

Gene Association Files17 column files containing all information for each annotation

http://www.ebi.ac.uk/GOA/downloads.html

UniProt-GOA websiteNumerous species-specific files

Page 26: UniProt-GOA

GO browsers

Page 27: UniProt-GOA

http://www.ebi.ac.uk/QuickGO

The EBI's QuickGO browser

Search GO terms or proteins

Find sets of GO annotations

Page 28: UniProt-GOA

• Access gene product functional information

• Analyse high-throughput genomic or proteomic datasets

• Validation of experimental techniques

• Get a broad overview of a proteome

• Obtain functional information for novel gene products 

How scientists use the GO

Some examples…

Page 29: UniProt-GOA

Term enrichment

• Most popular type of GO analysis

• Determines which GO terms are more often associated with a specified list of genes/proteins compared with a control list or rest of genome

• Many tools available to do this analysis

• User must decide which is best for their analysis

Page 30: UniProt-GOA

Numerous Third Party Tools

www.geneontology.org/GO.tools.shtml

Page 31: UniProt-GOA

Selected Gene Tree: pearson lw n3d ...Branch color classification:Set_LW_n3d_5p_...

Colored by: Copy of Copy of C5_RMA (Defa...Gene List: all genes (14010)

attacked

time

control

Puparial adhesionMolting cycleHemocyanin

Defense responseImmune responseResponse to stimulusToll regulated genesJAK-STAT regulated genes

Immune responseToll regulated genes

Amino acid catabolismLipid metobolism

Peptidase activityProtein catabolismImmune response

Selected Gene Tree: pearson lw n3d ...Branch color classification:Set_LW_n3d_5p_...

Colored by: Copy of Copy of C5_RMA (Defa...Gene List: all genes (14010)

Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

MicroArray data analysis

Analysis of high-throughput genomic datasets

Page 32: UniProt-GOA

Validation of experimental techniques

(Cao et al., Journal of Proteome Research 2006)

Rat liver plasma membrane isolation

Page 33: UniProt-GOA

Annotating novel sequences

• Can use BLAST queries to find similar sequences with GO annotation which can be transferred to the new sequence

• Two tools currently available;

AmiGO BLAST – searches the GO Consortium database

BLAST2GO – searches the NCBI database

Page 34: UniProt-GOA

Using the GO to provide a functional overview for a large dataset

• Many GO analysis tools use GO slims to give a broad overview of the dataset

• GO slims are cut-down versions of the GO andcontain a subset of the terms in the whole GO

• GO slims usually contain less-specialised GO terms

Page 35: UniProt-GOA

Slimming the GO using the ‘true path rule’ Many gene products are associated with a large number of descriptive, leaf GO nodes:

Page 36: UniProt-GOA

Slimming the GO using the ‘true path rule’ …however annotations can be mapped up to a smaller set of parent GO terms:

Page 37: UniProt-GOA

GO slims

or you can make your own using;

Custom slims are available for download;

http://www.geneontology.org/GO.slims.shtml

• AmiGO's GO slimmer

• QuickGOhttp://www.ebi.ac.uk/QuickGO

http://amigo.geneontology.org/cgi-bin/amigo/slimmer

Page 38: UniProt-GOA

www.ebi.ac.uk/QuickGO

Map-up annotations with GO slims

The EBI's QuickGO browser

Search GO terms or proteins

Find sets of GO annotations

Questions on how to use QuickGO?Contact [email protected]

Page 39: UniProt-GOA

The UniProt-GOA group

Curators:

Software developer:

Team leaders:

Emily DimmerRachael Huntley

Rolf Apweiler

Email: [email protected]

http://www.ebi.ac.uk/GOA

Claire O’Donovan

Tony Sawford

Yasmin Alam-FaruquePrudence Mutowo

Page 40: UniProt-GOA

Acknowledgements

GO ConsortiumMembers of;

UniProtKB

InterPro

IntAct

HAMAP

Ensembl

Ensembl Genomes

FundingNational Human Genome Research Institute (NHGRI)Kidney Research UKBritish Heart FoundationEMBL