Genomics of Microbial Eukaryotes
Igor Grigoriev, Fungal Genomics Program Head
US DOE Joint Genome Institute, Walnut Creek, CA
Outline
Eukaryotic Genome Annotation
Fungal Genomics Program
MycoCosm
Are you in the right room?
genome.jgi.doe.gov
IMG
MycoCosm
100+ annotated eukaryotic genomes
Started with Human Genome Project
Eukaryotic Gene Prediction
Protein-based methods build CDS exons around alignments to known proteins, e.g. GenBank proteins (Fgenesh+, GeneWise).
Transcript-based methods map or assemble transcripts, e.g. EST contigs, on the genome, including UTRs (EST_map, CombEST).
Ab initio methods are trained on known genes and use their structure to predict start, stop, and splice sites, in the CDS only (Fgenesh, GeneMark).
[Figure: gene model from promoter to polyA site: 5′ UTR, exons and introns between the ATG start and TGA stop codons with GT...AG splice sites, and 3′ UTR; each method yields a predicted model]
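To make the ab initio signals concrete, here is a toy check that splices a hypothetical gene model and verifies the ATG start, a single terminal stop codon (TAA, TAG, or TGA), and canonical GT...AG introns. It is not how Fgenesh or GeneMark work internally (they score such signals with trained probabilistic models); the sequence is made up and coordinates are assumed 0-based and end-exclusive.

```python
# Toy check of the signals an ab initio predictor scores:
# ATG start, a stop codon only at the end, and canonical GT...AG introns.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(genomic: str, introns: list[tuple[int, int]]) -> str:
    """Join exons by removing introns given as 0-based, end-exclusive spans."""
    parts, pos = [], 0
    for start, end in sorted(introns):
        intron = genomic[start:end]
        # Canonical introns begin with GT (donor) and end with AG (acceptor).
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"non-canonical intron at {start}-{end}")
        parts.append(genomic[pos:start])
        pos = end
    parts.append(genomic[pos:])
    return "".join(parts)


def is_complete_cds(cds: str) -> bool:
    """True if cds starts with ATG, ends with a stop codon, and has no internal stops."""
    if len(cds) % 3 != 0 or not cds.startswith("ATG"):
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


# Hypothetical gene: exon 1 (0-6), intron (6-17), exon 2 (17-23).
genomic = "ATGAAA" + "GTAAGTTTTAG" + "CCCTGA"
cds = splice(genomic, [(6, 17)])
print(cds, is_complete_cds(cds))  # ATGAAACCCTGA True
```

A predicted model that fails this check (for example, one with an in-frame TAA inside the CDS) would be a poor candidate at that locus.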
Protein Annotation
Each predicted protein is annotated with:
• Possible orthologs (in nr, SwissProt, KEGG, KOG)
• Possible paralogs (BLASTP + MCL)
• Domains (InterPro, TMHMM)
• Signal peptide (SignalP)
Higher-order assignments:
• Gene Ontology terms
• EC numbers --> KEGG pathways
• Gene families, with and without other species
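The slide lists BLASTP + MCL for grouping paralogs. As a rough illustration of the clustering step only, the sketch below implements the core Markov Clustering loop (expansion by squaring the matrix, inflation by element-wise powering and column renormalization) in pure Python. It assumes all-vs-all BLASTP hits have already been converted into (query, subject, weight) triples with weights in (0, 1]; that weighting, the self-loop value of 1.0, and the cutoffs are assumptions, and the real `mcl` program works on sparse matrices with pruning rather than this dense toy.

```python
# Minimal Markov Clustering (MCL) over a protein similarity graph.
def normalize(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(proteins, hits, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster proteins from (query, subject, weight) hits; returns sorted clusters."""
    idx = {p: i for i, p in enumerate(proteins)}
    n = len(proteins)
    # Symmetric similarity matrix with self-loops (assumed weight 1.0).
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for q, s, w in hits:
        i, j = idx[q], idx[s]
        m[i][j] = m[j][i] = max(m[i][j], w)
    m = normalize(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: spread flow along longer random walks
        # Inflation: sharpen strong flows, weaken weak ones, then renormalize.
        new = normalize([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows that retain mass (attractors) define clusters via their non-zero columns.
    clusters = set()
    for i in range(n):
        members = tuple(sorted(proteins[j] for j in range(n) if m[i][j] > 1e-6))
        if members:
            clusters.add(members)
    return sorted(clusters)


proteins = ["A", "B", "C", "D", "E"]
hits = [("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0), ("D", "E", 1.0)]
print(mcl(proteins, hits))  # [('A', 'B', 'C'), ('D', 'E')]
```

Raising `inflation` generally yields smaller, tighter families; lowering it merges more proteins into each cluster.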
EST Support is Critical for Eukaryotes
[Chart: share of gene models supported by ESTs vs. other genes (0 to 100%) for N. haematococca, L. bicolor, P. blakesleeanus, S. roseus, M. graminicola, A. niger*, N. discreta*, W. cocos, G. trabeum, and A. aculeatus, with ESTs from Sanger, 454, and Illumina sequencing; inset: EST profile and CombEST gene models]
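One simple way to arrive at a per-genome figure like the EST support in the chart is to count gene models that overlap at least one EST alignment. The sketch below assumes gene models and EST alignments are hypothetical (scaffold, start, end) tuples, 0-based and end-exclusive, and uses any overlap as the support criterion; a production pipeline would likely demand stricter evidence, such as matching splice junctions.

```python
from collections import defaultdict


def est_supported_fraction(genes, ests):
    """Fraction of gene models overlapping at least one EST on the same scaffold."""
    ests_by_scaffold = defaultdict(list)
    for scaffold, start, end in ests:
        ests_by_scaffold[scaffold].append((start, end))

    def supported(gene):
        scaffold, g_start, g_end = gene
        # Two half-open intervals overlap if each starts before the other ends.
        return any(e_start < g_end and g_start < e_end
                   for e_start, e_end in ests_by_scaffold[scaffold])

    if not genes:
        return 0.0
    return sum(supported(g) for g in genes) / len(genes)


genes = [("scaffold_1", 100, 900), ("scaffold_1", 2000, 3500),
         ("scaffold_2", 50, 400), ("scaffold_2", 800, 1200)]
ests = [("scaffold_1", 850, 1300), ("scaffold_2", 10, 60), ("scaffold_2", 1200, 1500)]
print(f"{est_supported_fraction(genes, ests):.0%}")  # 50%
```

The last gene only touches an EST at position 1200 and, with half-open intervals, is not counted as supported.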
Best Models
[Diagram: FGENESH, GENEWISE, and external models are combined into a representative set]
• Multiple gene predictors offer several different gene models at each locus.
• A single best model per locus is selected automatically based on homology and EST support.
• These best models compose a non-redundant (filtered) gene set for further analysis.
• This set is further improved by community-driven manual curation.
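The automatic selection of one best model per locus can be sketched as picking the highest-scoring candidate at each locus. The `GeneModel` fields, the 0 to 1 homology and EST scores, and the 50/50 weighting in `best_models` are all hypothetical stand-ins for whatever homology and EST metrics the actual pipeline uses.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    locus: str
    predictor: str       # e.g. "FGENESH", "GENEWISE", "external"
    homology: float      # hypothetical homology score, 0-1
    est_support: float   # hypothetical EST support score, 0-1


def best_models(models, est_weight=0.5):
    """Keep one model per locus using an assumed weighted homology + EST score."""
    best = {}
    for m in models:
        score = est_weight * m.est_support + (1 - est_weight) * m.homology
        # Strict '>' keeps the first-seen model when scores tie.
        if m.locus not in best or score > best[m.locus][0]:
            best[m.locus] = (score, m)
    return {locus: m for locus, (_, m) in best.items()}


models = [
    GeneModel("locus_1", "FGENESH", homology=0.60, est_support=0.20),
    GeneModel("locus_1", "GENEWISE", homology=0.95, est_support=0.50),
    GeneModel("locus_2", "FGENESH", homology=0.30, est_support=0.90),
    GeneModel("locus_2", "external", homology=0.40, est_support=0.40),
]
filtered = best_models(models)
print({locus: m.predictor for locus, m in filtered.items()})
# {'locus_1': 'GENEWISE', 'locus_2': 'FGENESH'}
```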
Annotation
Input: genomic assembly and EST contigs
Annotation pipeline:
• Gene predictions
• Protein annotations
• Transcript + protein maps
• Repeat mask
• Manual curation
Bring it all together for analysis: gene families, gene expression, phylogenomics, proteomics, protein targeting, etc.
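The repeat-mask step is commonly done by soft-masking: repeat regions are lowercased so that downstream gene predictors can ignore or down-weight them while the sequence itself stays intact. This sketch assumes repeat intervals have already been found by some repeat-detection tool and are given as hypothetical 0-based, end-exclusive (start, end) pairs; overlapping intervals mark each base only once.

```python
def soft_mask(seq, repeats):
    """Lowercase repeat intervals; returns (masked sequence, masked fraction).

    Any existing lowercase in seq is reset to uppercase first.
    """
    masked = list(seq.upper())
    covered = [False] * len(seq)
    for start, end in repeats:
        for i in range(max(0, start), min(end, len(seq))):
            masked[i] = masked[i].lower()
            covered[i] = True
    fraction = sum(covered) / len(seq) if seq else 0.0
    return "".join(masked), fraction


masked, frac = soft_mask("ACGTACGTACGTTTGCA", [(0, 4), (2, 8)])
print(masked, round(frac, 2))  # acgtacgtACGTTTGCA 0.47
```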
Many Genes of Eco-responsive Daphnia pulex
• First crustacean genome sequenced; an aquatic animal and a new model organism
• 30,940 predicted D. pulex genes in a ~200 Mb genome
• 85% supported by one or more lines of evidence
Colbourne et al., Science, 2011
Half of Daphnia Genes have no Homologs
With Evgeny Zdobnov’s group (Univ. Genève)
* Of 716 highly conserved single-copy orthologs, Daphnia is missing only two.
Outline
Eukaryotic Genome Annotation
Fungal Genomics Program
MycoCosm
Fungal Genomics for Energy and Environment
[Diagram: Grow (plant symbionts and pathogens) --> Degrade (lignocellulose degradation) --> Ferment (sugar fermentation); bio-refinery]
GOAL: Scale up sequencing and analysis of fungal diversity for DOE science and applications
• Plant feedstock health
  • Symbiosis
  • Plant pathogenicity
  • Biocontrol
• Biorefinery fungi
  • Lignocellulose degradation
  • Sugar fermentation
  • Industrial organisms
• Fungal diversity
  • Phylogenetic
  • Ecologic
Genomic Encyclopedia of Fungi Launched
www.jgi.doe.gov/fungi
100+ fungal genomes
600+ registered users
5000+ visitors/month
Distinct Mechanisms of Cellulose Degradation
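[Diagram: enzymatic (white rot) vs. Fenton-based (brown rot) attack on cellulose]
White rot: P. chrysosporium
• Cellobiohydrolase II: GH6 (CBH50)
• Cellobiohydrolase I: GH7 (CBH58, 62)
• Endoglucanases: GH5-CBM1, GH12
• GH3 β-glucosidase
Brown rot: Postia placenta
• No cellulose-binding domain CBM1 in brown rot!
• Fenton chemistry: $\mathrm{Fe^{2+} + H_2O_2 \rightarrow Fe^{3+} + HO^{-} + HO^{\bullet}}$
• Iron reductase (regenerates Fe2+ from Fe3+); copper radical oxidases and glucose oxidases (H2O2 production)
Martinez et al., PNAS, 2009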
Diverse Basidiomycota
• FGP09 pilots
• Basidio jam (Mar 2010)
• 3 CSP11 proposals
• Basidio jam (Mar 2011)
Future Grand Challenges
[Diagram: scope grows from fungal isolates & groups, to systems of interacting organisms, to systems in the wild, across sequence, function, and modeling]
1. 1000 fungal genomes: sampling fungal diversity
2. Model fungi: sampling 100s of conditions
3. Fungal ecosystems: bioenergy crop symbionts & pathogens, biorefinery, fungal metagenomes
Leadership in Sequencing Fungi
[Charts: fungal genomes sequenced, by phylum (Ascomycota, Basidiomycota, Blastocladiomycota, Chytridiomycota, Glomeromycota, Microsporidia, Neocallimastigomycota, Zygomycota, unknown) and by sequencing center (DOE Joint Genome Institute, Broad Institute, Sanger Institute, Washington Univ, other)]
Annotation and Analysis Tools
• Automated annotation
  • Pipeline
• Genomics analysis platform
  • Genome-centric
  • Comparative genomics
• Community resource
  • Integrated data
  • User tools
  • Training
Genome-Centric View
Comparative View
www.jgi.doe.gov/fungi
Genome-Centric View
Focus: functional genomics, user data deposition and curation
New Comparative View
Community Building Tools
• Jamborees: genome analysis for publications
• MycoCosm tutorials: online video, MGM, workshops at large meetings (Asilomar, JGI Users, MSA)
• Preparation for CSP: large meetings and focused groups
Summary
Eukaryotic annotation recipe:
• Combined gene predictors, experimental data, and community annotation
Fungal Genomics Program:
• Scaled-up sequencing & comparative analysis of fungi relevant for energy & environment (jgi.doe.gov/fungi)
Outline
Eukaryotic Genome Annotation
Fungal Genomics Program
MycoCosm