ICAR2016 TAIR talk
TAIR: A Sustainable Community Resource for Arabidopsis Research
International Conference on Arabidopsis Research (ICAR 2016), GyeongJu, Korea
1. TAIR: a sustainable community resource for Arabidopsis research (Eva Huala)
2. Using biological ontologies to accelerate progress in plant biology research (Donghui Li)
3. Community annotation: making your data and publication more discoverable (Donghui Li)
Using biological ontologies to accelerate progress in plant biology research
Donghui Li
TAIR/Phoenix Bioinformatics
Every year, an average of:
• Over 3000 Arabidopsis research articles are added
• Over 2000 papers are associated with genes
• Over 400 articles have gene function, expression or phenotype data extracted
• Over 5000 experiment-based annotations are added using controlled vocabularies (GO and PO ontologies)
Producing a ‘gold standard’ annotated reference plant genome
Highly structured, searchable, computable functional annotations
• How do we use biological ontologies to annotate Arabidopsis gene function?
• How to read/interpret annotations?
• What can you do with these annotations?
Outline
Why do we need ontologies?
Inconsistency in free text:
• Different names for the same concept: translation, protein synthesis
• Same name for different concepts: Bud initiation?
A Gene Ontology (GO) term
Accession: GO:0006412
Name: translation
Ontology: biological_process
Synonyms: protein anabolism, protein biosynthesis, protein biosynthetic process, protein formation, protein synthesis, protein translation
Definition: The cellular metabolic process in which a protein is formed, using the sequence of a mature mRNA molecule to specify the sequence of amino acids in a polypeptide chain. Translation is mediated by the ribosome, and begins with the formation of a ternary complex between aminoacylated initiator methionine tRNA, GTP, and initiation factor 2, which subsequently associates with the small subunit of the ribosome and an mRNA. Translation ends with the release of a polypeptide chain from the ribosome.
Source: GOC:go_curators
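To illustrate how a term with synonyms resolves "different names for the same concept", here is a minimal Python sketch. The `GOTerm` class and the `build_label_index` and `resolve` helpers are hypothetical names made up for this example, not part of any GO or TAIR tool; only the accession, name and synonyms come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOTerm:
    """Hypothetical container for the fields shown on the slide."""
    accession: str
    name: str
    ontology: str
    synonyms: tuple = ()


TRANSLATION = GOTerm(
    accession="GO:0006412",
    name="translation",
    ontology="biological_process",
    synonyms=(
        "protein anabolism",
        "protein biosynthesis",
        "protein biosynthetic process",
        "protein formation",
        "protein synthesis",
        "protein translation",
    ),
)


def build_label_index(terms):
    """Map every name and synonym (lower-cased) to its accession."""
    index = {}
    for term in terms:
        for label in (term.name, *term.synonyms):
            index[label.lower()] = term.accession
    return index


def resolve(label, index):
    """Return the accession for a free-text label, or None if unknown."""
    return index.get(label.strip().lower())


index = build_label_index([TRANSLATION])
print(resolve("Protein synthesis", index))
print(resolve("translation", index))
```

Both free-text labels map to the same accession, which is what makes annotations searchable and computable regardless of the wording an author used.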
molecular function: catalytic / binding activities
kinase activity, DNA binding activity
biological process: biological goal or objective
protein translation, mitosis
cellular component: location or complex
nucleus, ribosome, proteasome
More info at www.geneontology.org
Gene Ontology (GO)
Experimental evidence codes (EXP)
IDA Inferred from Direct Assay (enzyme assays, in situ hybridization)
IMP Inferred from Mutant Phenotype (analysis of visible trait)
IPI Inferred from Physical Interaction (yeast two-hybrid)
IEP Inferred from Expression Pattern (RT-PCR, Western blot)
IGI Inferred from Genetic Interaction (double mutant analysis)
Computational Analysis Evidence Codes (non-EXP)
ISS Inferred from Sequence or Structural Similarity - based on published sequence alignment
IEA Inferred from Electronic Annotation - InterPro2GO
Examples
http://geneontology.org/page/guide-go-evidence-codes
Commonly used evidence codes
Evidence code    Annotation counts    %
EXP              95,435               34.7
  IDA            56,271               20.4
  IEP            6,651                2.4
  IGI            4,286                1.6
  IMP            19,441               7.1
  IPI            8,786                3.2
Non-EXP          179,801              65.3
Total            275,236              100
Summary of Arabidopsis GO annotations in TAIR
Notes:
9,186 unique publications used in EXP annotations
Based on TAIR ATH_GO_GOSLIM.txt 2016-06-05
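The percentage column follows directly from the annotation counts. As a small sketch, the arithmetic can be recomputed from the counts on the slide; the variable names and the `pct` helper are made up for this example.

```python
# Annotation counts per experimental evidence code, copied from the slide.
exp_counts = {"IDA": 56271, "IEP": 6651, "IGI": 4286, "IMP": 19441, "IPI": 8786}
non_exp = 179801

exp_total = sum(exp_counts.values())
total = exp_total + non_exp


def pct(count, whole):
    """Share of `whole` as a percentage, rounded to one decimal place."""
    return round(100 * count / whole, 1)


print("EXP", exp_total, pct(exp_total, total))
for code, count in exp_counts.items():
    print(code, count, pct(count, total))
print("Non-EXP", non_exp, pct(non_exp, total))
print("Total", total)
```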
- Query gene function information
- GO annotation projection
- Functional categorization
- Term enrichment
Application: What can you do with TAIR GO/PO annotations?
Get annotations for individual genes from the TAIR locus page
Gene Ontology annotations
Plant Ontology annotations
Other functional information:
• Gene summary
• Polymorphism
• Phenotype
• Publications
• Gene symbols
GO annotations by species
[Chart: GO annotations per species, covering Rat, Human, Mouse, Arabidopsis, Zebrafish, Worm, Chicken, Fly, Yeast, Rice and E. coli]
Source: http://geneontology.org/page/current-go-statistics 2016-06-03
Annotating new plant genomes by projecting GO terms from Arabidopsis onto other non-model plant species based on gene orthology
EnsemblPlants Compara
• Use the Compara pipeline to build orthology
• Automatically transfer GO annotations to plant orthologs
Rules
✓ at least a 40% peptide identity to each other
✓ only GO annotations with an evidence type of IDA, IEP, IGI, IMP or IPI are projected
✓ no annotations with a 'NOT' qualifier are projected
✓ annotations to the GO:0005515 protein binding term are not projected
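The four projection rules can be read as a simple filter. The sketch below is an illustration only, not the Compara pipeline itself: the `Annotation` class, the `is_projectable` and `project` helpers, the gene identifiers and the toy data are all hypothetical, and peptide identity is simplified to a single percentage per ortholog pair.

```python
from dataclasses import dataclass

# Thresholds and codes taken from the rules on the slide.
PROJECTABLE_EVIDENCE = {"IDA", "IEP", "IGI", "IMP", "IPI"}
PROTEIN_BINDING = "GO:0005515"
MIN_PEPTIDE_IDENTITY = 40.0


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, minimal view of one GO annotation."""
    gene: str
    go_id: str
    evidence: str
    qualifier: str = ""  # e.g. "NOT"; several qualifiers assumed joined by "|"


def is_projectable(annotation, percent_identity):
    """Apply the four projection rules to one annotation/ortholog pair."""
    qualifiers = annotation.qualifier.upper().split("|")
    return (
        percent_identity >= MIN_PEPTIDE_IDENTITY
        and annotation.evidence in PROJECTABLE_EVIDENCE
        and "NOT" not in qualifiers
        and annotation.go_id != PROTEIN_BINDING
    )


def project(annotations, orthologs):
    """Return (target_gene, go_id) pairs that pass every rule.

    `orthologs` maps a source gene to a list of
    (target_gene, percent_identity) tuples.
    """
    projected = []
    for ann in annotations:
        for target, identity in orthologs.get(ann.gene, []):
            if is_projectable(ann, identity):
                projected.append((target, ann.go_id))
    return projected


# Toy data: identifiers and annotations are invented for illustration.
annotations = [
    Annotation("ATH_GENE_A", "GO:0006412", "IDA"),
    Annotation("ATH_GENE_A", "GO:0005515", "IPI"),         # protein binding
    Annotation("ATH_GENE_A", "GO:0006412", "IEA"),         # not experimental
    Annotation("ATH_GENE_B", "GO:0006412", "IMP", "NOT"),  # NOT qualifier
]
orthologs = {
    "ATH_GENE_A": [("TARGET_1", 72.5), ("TARGET_2", 31.0)],
    "ATH_GENE_B": [("TARGET_3", 88.0)],
}
print(project(annotations, orthologs))
```

Only the experimentally supported annotation on the ortholog pair above the identity threshold survives the filter.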
[Chart: gene count per functional category (biological process)]
Overrepresentation statistical test:
In my list of genes, are any functional classes (for example a GO process) found more often than
expected when compared with the reference list?
Term enrichment analysis
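The slides do not name the statistical test used. One common choice for this kind of overrepresentation question is a one-sided hypergeometric test, sketched below with the Python standard library only. The function name `enrichment_p_value` and all the numbers in the example are assumptions for illustration; a real analysis would also correct for testing many terms at once.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, ref_hits, ref_size):
    """One-sided hypergeometric p-value.

    Probability of seeing at least `list_hits` genes annotated to a term
    in a gene list of `list_size` genes, when `ref_hits` of the `ref_size`
    genes in the reference list carry that term and the gene list is
    treated as a random draw from the reference list.
    """
    max_hits = min(list_size, ref_hits)
    all_draws = comb(ref_size, list_size)
    tail = sum(
        comb(ref_hits, k) * comb(ref_size - ref_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / all_draws


# Made-up numbers: 200 of 20,000 reference genes carry the term (1%),
# so a 100-gene list would be expected to contain about 1 such gene.
p_enriched = enrichment_p_value(list_hits=8, list_size=100, ref_hits=200, ref_size=20000)
p_expected = enrichment_p_value(list_hits=1, list_size=100, ref_hits=200, ref_size=20000)
print(p_enriched, p_expected)
```

A small p-value for a term means it appears in the gene list more often than the reference list would predict, which is the question posed on the slide.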
Model for the regulation of long-term drought responses in Q. suber root
Model for ABA-dependent drought response in cork oak
1. The main activity of TAIR curators is producing a ‘gold standard’ annotated reference genome dataset by integrating experimental data from the research literature. New annotations are constantly added.
2. One common use of TAIR is to infer the function of genes in agriculturally important species based on orthology to Arabidopsis genes.
3. TAIR’s annotations are used in applications such as functional categorization and term enrichment. It is important to use the latest annotation file from TAIR.
Summary
1. Pre-publication: register your gene symbol to minimize accidental duplications in gene nomenclature
2. Preparing your manuscript: include AGI locus identifiers
3. Post-publication: submit your annotation to us (any journal)
Tips to make your research more discoverable
AT1G56650 PAP1 PRODUCTION OF ANTHOCYANIN PIGMENT 1
AT2G01180 PAP1 PHOSPHATIDIC ACID PHOSPHATASE 1
AT2G27190 PAP1 PURPLE ACID PHOSPHATASE 1
AT3G16500 PAP1 PHYTOCHROME-ASSOCIATED PROTEIN 1
Gene name duplication makes it harder to find the right gene
Plant Cell Physiol. 2010 Jun;51(6):866-76
Plant Cell Physiol. Jun;51(6):877-83
Conflicting nomenclature / error in publication not uncommon
• “I do profit a lot from the data on TAIR, thus this submission is a small contribution to extend the data present on TAIR.”
• “I gratefully did it [data submission] because I already benefit from similar information for other genes.”
Community feedback