Modeling Functional Genomics Datasets CVM8890-101
![Page 1: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/1.jpg)
Modeling Functional Genomics Datasets
CVM8890-101
Lesson 2
13 June 2007 Teresia Buza
![Page 2: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/2.jpg)
Lesson 2: Introduction to functional annotation.
Orthologs and homologs; clusters of orthologous genes (COGs) and the gene ontology (GO); and how to find what functional annotation is available.
![Page 3: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/3.jpg)
1. Introduction to Functional Annotation
![Page 4: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/4.jpg)
Where are we?
[Diagram. Example sequence: ATGTCCTATCCATGTCGTACAGATTGACGAGAT ("What is all this?"). Central dogma: Gene, mRNA transcript, Protein. Genome scale: Genome (genomic hypothesis), Transcriptome, Proteome. New technology: genome sequencing, transcript profiling, protein quantification. What next? Structural annotation and functional annotation.]
![Page 5: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/5.jpg)
Genome Annotation
Biologists refer to both the annotation of the genome and functional annotation of gene products:
"Structural" Annotation & "Functional" Annotation
![Page 6: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/6.jpg)
Structural & Functional Annotation
Structural annotation
Identification of genomic elements.
• ORFs predicted during genome assembly
• Location of ORFs
• Gene structure
• Coding regions
• Location of regulatory motifs etc
Functional annotation
Attaching biological information to genomic elements.
• Biochemical function
• Biological function
• Involved regulation and interactions
• Expression etc
These steps may involve both biological experiments and in silico analysis.
http://en.wikipedia.org/wiki/Genome_annotation#Genome_annotation (with modifications)
![Page 7: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/7.jpg)
Why Functional Annotation?
Enables you to take large “laundry lists” of genes/proteins and turn them into a biologically useful model
![Page 8: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/8.jpg)
Functional Annotation
• Annotation of gene products = Gene Ontology (GO) annotation
• Initially, predicted ORFs have no functional literature, and GO annotation relies on computational methods (rapid, but quantity vs. quality?)
• Functional literature exists for many genes/proteins prior to genome sequencing (slow, but provides high-quality annotations)
• GO annotation does not rely on a completed genome sequence!
![Page 9: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/9.jpg)
Types of Functional annotation
Based on direct experimental evidence of function
Experiments in the same ORGANISM, for example:
• Enzyme assays
• Binding experiments
• Pathway analysis
• Synthetic lethals
• Functional complementation
• Gene mutations
• RNAi
• 2-hybrid interactions etc
Indirect evidence of function
• Expression analysis
• Structure analysis
• Sequence analysis
![Page 10: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/10.jpg)
Functional Annotation
Problem:
• Many genes/proteins have no annotation
• Some have unknown functions
Challenge:
• We want to get the maximum functional annotation for modeling our data
Solution:
• Read papers (Pubmed etc)
• Search for homologs/orthologs of known function
• Homologs and orthologs help assign function….
![Page 11: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/11.jpg)
2. Finding Function: orthologs and homologs
![Page 12: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/12.jpg)
What are Homologs, Orthologs, Paralogs?
Homolog
Homology is a relationship between genes separated by the event of speciation or genetic duplication.
Ortholog
Orthologs are homologous genes in different species that evolved from a common ancestor gene by speciation. Normally (not always), orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.
Paralog
Paralogs are homologous genes related by duplication within a genome. Paralogs often evolve new functions, even if these are related to the original one.
http://homepage.usask.ca/~ctl271/857/def_homolog.shtml
![Page 13: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/13.jpg)
Orthologs & Paralogs
[Figure: example gene tree with orthologs and paralogs labeled. Source: http://www.ensembl.org/info/data/compara/tree_example1.jpg]
![Page 14: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/14.jpg)
How to search for Orthology?
BLAST: http://www.ncbi.nlm.nih.gov/BLAST/
• Sequence alignment search tool
• Uses a heuristic algorithm
MPsrch: http://www.ebi.ac.uk/MPsrch/
• Sequence comparison tool
• Implements the Smith-Waterman algorithm
• Uses an exhaustive algorithm
Domain analysis: http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
• Analysis of regions of sequence homology among sets of proteins that are not all full-length homologs.
• Homology domains often, but not always, correspond to recognizable protein folding domains
Protein family databases (e.g. COGs & KOGs)
• Superfamily: Complete set of proteins having sequence homology over essentially their full length.
• Subfamilies: Incomplete set of homologous proteins which yet encompass proteins of diverse function
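To make the "exhaustive" label on the Smith-Waterman approach concrete, here is a minimal sketch of local alignment scoring. It is not the MPsrch implementation: the function name `smith_waterman_score` and the scoring values (match, mismatch, linear gap penalty) are illustrative assumptions, and real protein searches normally use a substitution matrix rather than a flat match/mismatch score.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Every cell of the dynamic-programming table is filled (this is what
    makes the method exhaustive, unlike heuristic tools that skip most
    of the table). Scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The run time grows with the product of the two sequence lengths, which is why heuristic tools such as BLAST are preferred for large database searches.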
![Page 15: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/15.jpg)
Systems for Functional Annotation
1. Clusters of Orthologous Groups (COGs): Prokaryotes
2. euKaryote Orthologous Groups (KOGs): Eukaryotes
3. Gene Ontology (GO)
![Page 16: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/16.jpg)
COGs & KOGs
• Both are based on orthology.
• Genes are assigned to broad categories (A-Z)
• Each category corresponds to an ancient conserved domain
• COGs - prokaryotes
• KOGs - eukaryotes
![Page 17: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/17.jpg)
Clusters of Orthologous Groups (COGs)
http://www.ncbi.nlm.nih.gov/COG/
COGs has 25 functional categories (A – Z) in four broad groups:
1. Information storage and processing
2. Cellular processes and signaling
3. Metabolism
4. Poorly characterized
[Screenshot: COGs text search]
![Page 18: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/18.jpg)
COGs Categories
INFORMATION STORAGE AND PROCESSING
[J] Translation, ribosomal structure and biogenesis
[A] RNA processing and modification
[K] Transcription
[L] Replication, recombination and repair
[B] Chromatin structure and dynamics
CELLULAR PROCESSES AND SIGNALING
[D] Cell cycle control, cell division, chromosome partitioning
[Y] Nuclear structure
[V] Defense mechanisms
[T] Signal transduction mechanisms
[M] Cell wall/membrane/envelope biogenesis
[N] Cell motility
[Z] Cytoskeleton
[W] Extracellular structures
[U] Intracellular trafficking, secretion, and vesicular transport
[O] Posttranslational modification, protein turnover, chaperones
ftp://ftp.ncbi.nih.gov/pub/COG/COG/fun.txt
![Page 19: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/19.jpg)
COGs Categories
METABOLISM
[C] Energy production and conversion
[G] Carbohydrate transport and metabolism
[E] Amino acid transport and metabolism
[F] Nucleotide transport and metabolism
[H] Coenzyme transport and metabolism
[I] Lipid transport and metabolism
[P] Inorganic ion transport and metabolism
[Q] Secondary metabolites biosynthesis, transport and catabolism
POORLY CHARACTERIZED
[R] General function prediction only
[S] Function unknown
ftp://ftp.ncbi.nih.gov/pub/COG/COG/fun.txt
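The category letters listed on these two slides can be held in a small lookup table, which is handy when rolling individual COG letters up into the four broad groups. This is only a sketch: the names `COG_GROUPS`, `LETTER_TO_GROUP` and `broad_group` are made up for illustration, and the letters are copied from the slides rather than read from fun.txt.

```python
# Broad group -> single-letter COG categories, as listed on the slides
COG_GROUPS = {
    "Information storage and processing": "JAKLB",
    "Cellular processes and signaling": "DYVTMNZWUO",
    "Metabolism": "CGEFHIPQ",
    "Poorly characterized": "RS",
}

# Inverted table: category letter -> broad group
LETTER_TO_GROUP = {
    letter: group
    for group, letters in COG_GROUPS.items()
    for letter in letters
}


def broad_group(letter):
    """Return the broad group for a COG category letter, or None if unknown."""
    return LETTER_TO_GROUP.get(letter.upper())


if __name__ == "__main__":
    for letter in "JTCS":
        print(letter, "->", broad_group(letter))
```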
![Page 20: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/20.jpg)
Example 1
Classification of COGs by functional categories
[Figure from Tatusov et al., 2000: The COG database: a tool for genome-scale analysis of protein functions and evolution]
![Page 21: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/21.jpg)
Example 2
Effects of Antibiotics on Pasteurella multocida transcriptome
Nanduri et al 2006
[Figure: three bar-chart panels (AMX, CTC, ENR) showing Increase and Decrease by COG category. X-axis: COG categories (-, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V). Y-axis: 0 to 40.]
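A chart of this kind can be built by tallying, for each COG category, how many genes increased and how many decreased. The sketch below shows only that bookkeeping step. The function name `tally_by_category` is hypothetical, the records are invented placeholders rather than data from Nanduri et al 2006, and treating "-" as "no COG category assigned" is an assumption about the figure.

```python
from collections import Counter


def tally_by_category(records):
    """Count increased and decreased genes per COG category.

    records: iterable of (gene_id, cog_letter, direction) tuples, where
    direction is "increase" or "decrease". Any other direction is ignored.
    """
    increased, decreased = Counter(), Counter()
    for _gene_id, cog_letter, direction in records:
        if direction == "increase":
            increased[cog_letter] += 1
        elif direction == "decrease":
            decreased[cog_letter] += 1
    return increased, decreased


if __name__ == "__main__":
    # Invented example records, for illustration only
    example = [
        ("gene1", "J", "decrease"),
        ("gene2", "J", "decrease"),
        ("gene3", "C", "increase"),
        ("gene4", "-", "increase"),  # "-" assumed to mean no COG category
    ]
    up, down = tally_by_category(example)
    for letter in sorted(set(up) | set(down)):
        print(letter, "increase:", up[letter], "decrease:", down[letter])
```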
![Page 22: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/22.jpg)
The Gene Ontology (GO)
• The Gene Ontology (GO) is the de facto standard for functional annotation
• GO functional annotation is based on orthology AND direct experimental evidence
• GO terms allow much more detailed functional analysis (> 24,000 terms) than COGs & KOGs (25 broad terms)
• GO is a controlled vocabulary of terms split into three related ontologies covering basic areas of molecular biology:
molecular function: 8,123 terms
biological process: 13,960 terms
cellular component: 2,071 terms
GO Report 2007-04
![Page 23: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/23.jpg)
Example 3
Functional Annotation of Chicken Proteomic data
[Figure: bar chart, Cellular Component. X-axis: Number of GO terms (0 to 350). Bars: Nucleus, Cell, Cytoplasm, Mitochondrion, Plasma membrane, Cytosol, Cytoskeleton, Extracellular matrix, Nucleoplasm, Endoplasmic, Golgi apparatus, Intracellular, Endosome, Cytoplasmic, Chromosome, Nucleolus, Lysosome, Nuclear envelope, Extracellular space, Extracellular region, Cellular_component, Cilium, Nuclear chromosome, Ribosome, Peroxisome, Microtubule, Vacuole, Unlocalized protein.]
![Page 24: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/24.jpg)
Use GO for…….
• Modeling function in high-throughput datasets (arrays!) started by Fly, Yeast, Mouse (Ashburner et al 2000, 2001)
• Grouping gene products by biological function
• Determining which classes of gene products are over-represented or under-represented
• Focusing on particular biological pathways and functions (hypothesis-driven)
• Relating a protein's location to its function
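One common way to decide whether a GO class is over-represented or under-represented in a gene list is a hypergeometric test; the slide does not name a specific test, so this choice is an assumption. The sketch below computes both tail probabilities with the standard library only. The function names and the numbers in the usage example are made up for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing k or more annotated genes in the list.

    N: genes in the background, K: background genes carrying the GO term,
    n: genes in the list of interest, k: list genes carrying the GO term.
    A small value suggests over-representation.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def hypergeom_lower_tail(k, n, K, N):
    """P(X <= k). A small value suggests under-representation."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, min(k, n, K) + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Invented numbers: 12 of 50 list genes carry a term that
    # 100 of 5000 background genes carry.
    print(hypergeom_upper_tail(12, 50, 100, 5000))
```

When many GO terms are tested at once, the resulting p-values would normally also need a multiple-testing correction.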
![Page 25: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/25.jpg)
Annotating to the GO
• Need to show type of evidence of function
Literature curation: read and interpret reviewed literature (IDA, IGI, IMP, IPI, IGC) (TAS, NAS)
Computational analysis (RCA, ISS, IEA)
http://www.geneontology.org/GO.evidence.shtml
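The evidence codes on this slide can be kept in a lookup table so that annotations can be filtered by how they were made. The sketch below follows the slide's own two groupings; the names `EVIDENCE_CODES`, `evidence_group` and `keep_literature_curated` are hypothetical, and the example annotations are invented.

```python
# Evidence code -> (full name, grouping used on the slide)
EVIDENCE_CODES = {
    "IDA": ("Inferred from Direct Assay", "literature curation"),
    "IGI": ("Inferred from Genetic Interaction", "literature curation"),
    "IMP": ("Inferred from Mutant Phenotype", "literature curation"),
    "IPI": ("Inferred from Physical Interaction", "literature curation"),
    "IGC": ("Inferred from Genomic Context", "literature curation"),
    "TAS": ("Traceable Author Statement", "literature curation"),
    "NAS": ("Non-traceable Author Statement", "literature curation"),
    "RCA": ("Inferred from Reviewed Computational Analysis", "computational analysis"),
    "ISS": ("Inferred from Sequence or Structural Similarity", "computational analysis"),
    "IEA": ("Inferred from Electronic Annotation", "computational analysis"),
}


def evidence_group(code):
    """Return the slide's grouping for an evidence code, or None if not listed."""
    entry = EVIDENCE_CODES.get(code)
    return entry[1] if entry else None


def keep_literature_curated(annotations):
    """Keep (gene, go_id, code) tuples whose code is literature-curated."""
    return [a for a in annotations if evidence_group(a[2]) == "literature curation"]


if __name__ == "__main__":
    # Invented annotations, for illustration only
    example = [("geneA", "GO:0005634", "IDA"), ("geneB", "GO:0005737", "IEA")]
    print(keep_literature_curated(example))
```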
![Page 26: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/26.jpg)
4. How to find functional annotation for your species
![Page 27: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/27.jpg)
How to find functional annotation
For a quick search you need to know:
• Name of your species (e.g. Sus scrofa, Aspergillus flavus)
• Taxonomy ID (e.g. 9823 – S. scrofa, 5059 – A. flavus etc)
• Database to look in (e.g. NCBI, Uniprot, EBI-GOA, GOC, AgBase etc)
Not all functional annotation for a species will be in one database!
Not very many species have a broad coverage of GO annotation…
BUT do not worry
• Searching for their homologs might help
• You may rely on manual annotation from literature (refer to the Manual annotation course by Fiona McCarthy)
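Once an annotation file has been downloaded, the Taxonomy ID is what lets you pull out the rows for your species. The sketch below assumes the file is in the tab-delimited GO annotation (gene association) format, where column 2 is the object ID, column 5 the GO ID, column 7 the evidence code and column 13 the taxon. The function name `annotations_for_taxon` is hypothetical and the example rows are invented.

```python
def annotations_for_taxon(gaf_lines, taxon_id):
    """Yield (object_id, go_id, evidence_code) for one NCBI Taxonomy ID.

    gaf_lines: iterable of text lines in gene association format.
    Comment lines start with "!" and are skipped.
    """
    wanted = f"taxon:{taxon_id}"
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13:
            continue  # malformed or truncated row
        # Column 13 can hold two taxa separated by "|"; the first one
        # is the organism that the gene product belongs to.
        if cols[12].split("|")[0] == wanted:
            yield cols[1], cols[4], cols[6]


if __name__ == "__main__":
    # Invented rows (IDs and references are placeholders)
    pig = "\t".join(["DB", "X1", "sym1", "", "GO:0005634", "REF:1", "IEA",
                     "", "C", "", "", "protein", "taxon:9823", "20070401", "SRC"])
    chicken = "\t".join(["DB", "X2", "sym2", "", "GO:0005737", "REF:2", "IDA",
                         "", "C", "", "", "protein", "taxon:9031", "20070401", "SRC"])
    for row in annotations_for_taxon(["!comment line", pig, chicken], 9823):
        print(row)
```

For a real file, pass an open file handle instead of the list of example lines.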
![Page 28: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/28.jpg)
Functional annotation
[Flowchart. Box labels as extracted; the arrows between boxes are not recoverable from the transcript.]
• Are the genes/proteins in GenBank? Check by Taxon ID (Yes / No)
• Known? NM_, NP_ (Yes / No)
• UniProtKB
• UniParc/IPI
• GOA make GO annotations (IEA) using automated methods
• GOA collect all GO annotations & submit to GOC
• GOA maintain annotation file
• GOC maintain annotation files: unfiltered GOA, filtered GOA
• No GO (appears on three branches): Manual annotations from literature (IDA, IMP, IPI, IGI, IEP codes); Annotate by structural/sequence similarity ORTHOLOGS (ISS code)
• Fill in GO association file
• Submit to AgBase (Agricultural Species)
• AgBase maintains annotation file
![Page 29: Modeling Functional Genomics Datasets CVM8890-101](https://reader036.fdocuments.net/reader036/viewer/2022062301/568140e2550346895dacaf50/html5/thumbnails/29.jpg)
Demonstration