Genomics-sequencing of microbial genomes
Genomics: 1
Genomics-sequencing of microbial
genomes
This lecture illustrates the strategies used in microbial genome
sequencing projects, compares genome content and
organisation amongst microbes, and shows how to derive
information on gene function across a genome.
Objectives for students:
• Expected to describe strategies involved in microbial genome
sequencing and functional genomics
• Provide examples of information that can be derived from
genomics
Genomics: 2
Microbial Genome Sequencing
• Genome Sequencing Projects
– strategy & methods
– annotation
• Comparative genomics
– organisation
– gene content
• Functional genomics
– transcriptome
– proteome
– genome-wide mutation
• Concentrate on strategy & ideas
Genomics: 3
Bacterial genome projects
• Many completed:
– Haemophilus influenzae
– Escherichia coli
– Bacillus subtilis
– Mycoplasma genitalium
– Helicobacter pylori (x2)
– Campylobacter jejuni
– Treponema pallidum
– Neisseria meningitidis
– Neisseria gonorrhoeae
– Vibrio cholerae
– E. coli O157
• Good link to projects:
– http://www.tigr.org/
– http://www.ncbi.nlm.nih.gov/
– http://www.sanger.ac.uk/
– http://www.genomesonline.org/
Genomics: 4
Genome sequencing progress (www.genomesonline.org)
• Complete:
– Archaeal: 70 (2007: 49; 2008: 55)
– Bacterial: 945 (2007: 554; 2008: 728)
– (Eukaryal: 121) (2007: 76; 2008: 97)
• Ongoing:
– Prokaryotic: 3498 Archaeal: 111
– (Eukaryotic: 1223)
• Metagenome projects: 200
Genomics: 5
Microbial eukaryote projects
• Complete
– Yeast - Saccharomyces cerevisiae
– Plasmodium falciparum
– Aspergillus nidulans, A. niger, A. oryzae & A. fumigatus
– Trypanosoma cruzi & T. brucei
– Leishmania
– Entamoeba histolytica
– Giardia lamblia
– Candida albicans & glabrata
– Paramecium
• Underway
– Pneumocystis carinii
– Plasmodium vivax
– some complete chromosomes finished
– Other species and isolates from completed list
Genomics: 6
Why bother? -To sequence or not to sequence
(considerations in the pre-genome era)
• piecemeal collection of sequenced genes
– slow
– costly
– ever complete?
• genome project
– rational approach
– efficient and rapid
– quality assurance
– address novel questions
• problems/issues
– ownership
– strain choice
– cost
– approach
– data release
– some now less relevant
• Post genomic era
– Comparative genomics
– Functional genomics
Genomics: 7
Genome sequencing strategy
• Strategy choice
• large collaborative cosmid/BAC-based projects
– now better suited for larger genomes
– slow
• small insert shotgun approach
– centralised
– rapid and efficient
– choice for bacteria
• Strain choice
– fresh isolate vs lab strain
– clinical vs environmental
– subsequent genetic analysis
Genomics: 8
Yeast genome sequence strategy
• Yeast chromosomes (16) individually sequenced
• several approaches used
• Make genome library in cosmids
• order cosmid library – which cosmid overlaps with which
– link cosmid to genome map
– produce tiled set of cosmids
– only sequence minimum number
• Use chromosome specific probe to identify chr-specific cosmids
• sequence cosmid inserts by subcloning
• Solve problems by direct PCR sequencing, walking and other libraries (lambda)
• Telomeres
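The idea of sequencing only the minimum number of ordered cosmids can be sketched as a greedy interval-cover problem. This is an illustrative sketch, not the method used in the yeast project: the clone names and map coordinates are hypothetical, and it assumes each cosmid has already been placed on the genome map as a (start, end) interval.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick the fewest clones that together cover [region_start, region_end].

    clones: list of (name, start, end) tuples in map coordinates (hypothetical).
    Greedy rule: among clones starting at or before the covered point,
    always take the one that reaches furthest.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no clone covers map position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical cosmids on a 120 kb region
cosmids = [("c1", 0, 40), ("c2", 10, 45), ("c3", 30, 80),
           ("c4", 35, 70), ("c5", 75, 120)]
tiling = minimal_tiling_path(cosmids, 0, 120)
print(tiling)
```

Clones c2 and c4 are skipped because their inserts are already covered by neighbours, which is the saving the tiled set is meant to deliver.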
Genomics: 9
Tiled set
Genomics: 10
[Figure: Ordering clones. Cosmids c1 (ends A, B), c2 (C, D), c3 (E, F), c4 (G, H) and c5 (I, J) are placed in the order c1 c2 c3 c4 c5 along markers A B C D E F G H I J]
Genomics: 11
[Figure: map PH011, scale 80 to 200, showing overlapping clones 70512, 70449, 70893, 70515, 70124, 70266, 7202, 70265, 70871 and 70463]
Genomics: 12
Whole genome/chromosome shotgun strategy (WGS)
• Rapid
• Generation of small insert genomic library
• Library is not initially ordered
• DNA sequence ends of inserts
• Depends on powerful computing to
assemble sequence reads
Genomics: 13
Main steps in generating a complete genome sequence
Step: minimum time period (weeks)
Isolation: 2
Construction: 4-6
Shotgun sequencing: 2-4
Finishing: 12
Annotation: 12
Genomics: 14
[Figure: shotgun library construction. Bacterial chromosome, random shearing, size selection, cloning into plasmid vector, library of clones, individual clones, sequence end of each clone]
Genomics: 15
Assembly
Sequencing individual clones
genome sequence with gaps
Genomics: 16
Automated sequencers: ABI 3700
• Made by Applied Biosystems
• Most widely used automated sequencers:
– 96 capillaries
– robot loading from 384-well plates
• Two to three hours per run
• 600–700 bases per run
[Figure labels: 96-well plate; robotic arm and syringe; 96 glass capillaries; load bar]
Genomics: 17
Automated sequencers: MegaBACE
• Made by Amersham
• 96 capillaries
• Robotic loading from 384-well plate
• Two to four hours per run
• Can read up to 800 bases
Source : GE Healthcare Life Science, Uppsala, Sweden
Genomics: 18
Automatic gel reading
• Top image: confocal detection by the MegaBACE sequencer of fluorescently labeled DNA
• Bottom image: computer image of sequence read by automated sequencer
Genomics: 19
Industrialization of sequencing
• Most genome sequencing projects divide tasks among different teams
– Genome libraries
– Production sequencing
– Finishing
• Sequencing machines run 24/7
• Many tasks performed by robots
The Broad Institute of MIT and Harvard, www.genome.gov
Genomics: 20
The future is here? 454 sequencing
Reprinted by permission from Macmillan Publishers Ltd: [NATURE] (Margulies et al., 437: 376), copyright (2005)
454 sequencing: the system
Genomics: 21
DNA library preparation (4.5 hours), emPCR (8 hours), sequencing (7.5 hours)
• Well diameter: average 44 μm
• 400,000 reads obtained in parallel
• A single cloned amplified sstDNA bead is deposited per well
• 4 bases (TACG) cycled 100 times
• Chemiluminescent signal generation
• Signal processing to determine base sequence and quality score
Source: 454 Sequencing © Roche Diagnostics
Genomics: 22
WGS: Just how much effort?
• individual sequencing reads accumulate
– each read about 500 bp
– computing used to assemble reads
– contiguous sequences called contigs
• Aim for 8-10 read coverage of genome for accuracy
• example:
– H. influenzae
• 19,687 templates
• 24,304 reads assembled
• 11,631,485 bp
• 9
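The coverage arithmetic behind these figures can be sketched as below. It is a rough illustration only: it uses the 11,631,485 bp of assembled reads from this slide together with the 1.830 Mbp H. influenzae genome size quoted later in the lecture, and the function names are made up for the example.

```python
import math


def coverage(total_read_bases, genome_length):
    """Average number of times each base of the genome is read."""
    return total_read_bases / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


GENOME_BP = 1_830_000        # H. influenzae, 1.830 Mbp (from the lecture)
ASSEMBLED_BP = 11_631_485    # bases in assembled reads (from the lecture)

print(round(coverage(ASSEMBLED_BP, GENOME_BP), 2))
print(reads_needed(8, 500, GENOME_BP))
```

Under these assumptions, 500 bp reads would need to number in the tens of thousands to reach the 8-10 fold target for a genome of this size.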
Genomics: 23
Sequencing a genome
[Figure: many overlapping fragments of sequence (for example cisahubofevaluatedgen, luatedgeneticsrel, atedgene, ourcesforteach, esforteachershealt, chershealthprofession, hprofessionalsandgeneralpub) are joined through their overlaps into the contiguous sequence]
contiguous sequence: vgecisahubofevaluatedgeneticsrelatedresourcesforteachershealthprofessionalsandgeneralpublic
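The way overlaps between fragments build a contiguous sequence can be sketched with a toy greedy assembler. This is a simplified illustration, not the algorithm used by real assembly software: it assumes error-free reads from one strand, uses a handful of the text fragments shown on this slide, and the minimum overlap of 4 characters is an arbitrary choice.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = sorted(set(reads))
    while True:
        # Discard any sequence wholly contained in another one.
        contigs = [c for c in sorted(set(contigs))
                   if not any(c != o and c in o for o in contigs)]
        best_n, best_a, best_b = 0, None, None
        for a in contigs:
            for b in contigs:
                if a != b:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_a, best_b = n, a, b
        if best_n == 0:
            return sorted(contigs, key=len, reverse=True)
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_n:])


fragments = ["cisahubofevaluatedgen", "luatedgeneticsrel", "atedgene",
             "ourcesforteach", "esforteachershealt",
             "chershealthprofession", "hprofessionalsandgeneralpub"]
for contig in greedy_assemble(fragments):
    print(contig)
```

With only these fragments the result is two contigs rather than one, because no fragment spans the stretch between "rel" and "ources". That leftover break is the kind of gap that finishing has to close.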
Genomics: 24
Gaps
Physical Gap
Sequence Gap
[Figure labels: genome; library clone; sequence read; contig]
Genomics: 25
Bridging Gaps
• rise in contig number as amount of reads increases
• steady fall as accumulating sequence bridges gaps between contigs
• levels off as new reads more likely in known contig than gap
• start finishing
[Figure: number of contigs (y-axis) plotted against number of reads (x-axis), falling towards 1; phases marked: rapid gap bridging, difficult gap bridging, finishing]
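The rise and fall in contig number can be sketched with the Lander-Waterman model, in which the expected number of contigs for N random reads is N·e^(-cσ), where c is the coverage and σ = 1 - (minimum detectable overlap / read length). This is an idealised model brought in for illustration and not taken from the lecture; the read length and genome size below reuse the lecture's figures, and real projects level off earlier because of repeats and unclonable regions.

```python
import math


def expected_contigs(n_reads, read_length, genome_length, min_overlap=0):
    """Lander-Waterman estimate of contig number for randomly placed reads."""
    c = n_reads * read_length / genome_length      # average coverage
    sigma = 1 - min_overlap / read_length          # usable fraction of a read
    return n_reads * math.exp(-c * sigma)


GENOME_BP = 1_830_000   # H. influenzae genome size quoted in the lecture
READ_BP = 500           # typical read length quoted in the lecture

for n in (1_000, 3_660, 10_000, 24_304, 36_600):
    print(n, round(expected_contigs(n, READ_BP, GENOME_BP), 1))
```

In this model the contig count peaks when coverage is about one-fold (3,660 reads here) and then falls steeply, after which each extra read is far more likely to land inside an existing contig than in a gap.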
Genomics: 26
Finishing
• Why are gaps present?
• Gap bridging
– sequence gaps
• choose appropriate clone and walk
– physical gaps
• alternative libraries (which?)
• PCR across gap
• Mistakes/poor sequence
– areas where read coverage is less than 8-10 fold
– repeated sequences, e.g. rRNA
• closure and completion
Genomics: 27
Finished Yet?
[Slide shows several kilobases of raw, unannotated DNA sequence, starting atgaatccaagccaaatacttgaaaatttaaaaaaagaattaagtgaaaacgaatacgaaaac and continuing ………………….]
Genomics: 28
Sequencing a genome
annotation: VGEC is a hub of evaluated genetics related resources for teachers, health professionals and general public.
contiguous sequence: vgecisahubofevaluatedgeneticsrelatedresourcesforteachershealthprofessionalsandgeneralpublic
[Figure: the overlapping fragments of sequence from the earlier "Sequencing a genome" slide are repeated beneath the contiguous sequence]
Genomics: 29
Genome Annotation
• Find ORFs
– look for ATG-Stop (+alternatives)
– over certain size
– overlaps
– computer based (“Glimmer” & “Orpheus”) and trained eye.
• ORF function
– Search databases with predicted translated sequences – BLASTX
– Consider level of similarity and context
– Domain comparisons
• Pfam/Prosite
• Other features
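The "ATG to Stop, over a certain size" rule for finding ORFs can be sketched as a six-frame scan. This is a minimal illustration and not how Glimmer or Orpheus work (those use trained statistical models); the function names, the default size cut-off and the test sequence are all invented for the example, and alternative start codons can be passed in through the `starts` parameter.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100, starts=("ATG",)):
    """Return (strand, frame, start, end) for each start-to-stop ORF.

    Coordinates are 0-based on the strand that was scanned; end is exclusive
    and includes the stop codon. min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in starts:
                    start = i                      # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None                   # look for the next ORF in this frame
    return orfs


toy = "ccatgaaatttgggtaacc"   # invented sequence with one short ORF
print(find_orfs(toy, min_codons=3))
```

A size cut-off is needed because short start-to-stop stretches occur by chance in any sequence; overlapping candidates on different frames or strands still need the "trained eye" or a gene-finding program to resolve.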
Genomics: 30
www.yeastgenome.org
Genomics: 31
http://mips.gsf.de/genre/proj/yeast/index.jsp
http://www.yeastgenome.org/MAP/GENOMICVIEW/GenomicView.shtml
Genomics: 32
Artemis: sequence viewer and annotation tool from the Sanger
Centre (http://www.sanger.ac.uk/Software/Artemis/)
Genomics: 33
Genomics: 34
Genomics: 35
http://xbase.bham.ac.uk/
xBASE is a database for comparative genome analysis of all
bacterial genome sequences
Chaudhuri RR, Pallen MJ. xBASE, a collection of online
databases for bacterial comparative genomics. Nucleic Acids
Res. 2006 Jan 1;34(Database issue):D335-7.
Genomics: 36
[Figure: network of sequencing laboratories (S) linked to a Coordinator and a Bioinformatics Lab. Flows shown: DNA, shotgun templates, shotgun sequences, finishing instructions, finishing sequences, annotation tasks and annotations. Outputs: working draft sequence, finished sequence, finished annotated sequence.]
A conceptual diagram of the flux and information in a network-based genome-sequencing project
Genomics: 37
Post Genome Sequence
• Comparative genomics
– comparing genome organisation and content
– genome size
– genome repeats/Tn/phages
– gene content
– minimal gene content
• Functional genomics – ascribing gene function across a genome
– gene function – knowns
– phenotype prediction
– gene function – unknowns
– investigating function
• Bacteria-Yeast
Genomics: 38
Bacteria: Does size matter?
• Link genome size to adaptive capability
– biosynthetic capability
• synthesis of nutrients
– Stress resistance
• resist environmental insults
– structural complexity
• surface structures, sporogenesis
– Regulation –sensing signals and transcriptional responses
• detect change or requirement and respond appropriately
• transcriptional regulation
Genomics: 39
Not just Size but how you use it…..
• Small genomes
– Mycoplasma genitalium
• 580,070 bp
• smallest genome for self-replicating organism
• free living but only just..infects host cells (guess which!)
• few biosynthesis and regulatory systems
• has replication & transcription & translation, metabolism etc functions
– Borrelia burgdorferi
• 910,725 bp
• Lyme disease
• few cellular biosynthetic systems
– Mycoplasma pneumoniae (0.8 Mbp); Chlamydia trachomatis (1.0 Mbp);
Genomics: 40
bigger genomes
• Haemophilus influenzae
– 1.830 Mbp
– colonises human respiratory tract
– limited environment
• Helicobacter pylori
– 1.667 Mbp
– colonises human stomach
– limited environment
• Campylobacter jejuni
– 1.641 Mbp
– colonises intestine
– limited environment
Genomics: 41
and bigger….
• Escherichia coli (K-12)
– 4.639 Mbp
• Bacillus subtilis
– 4.214 Mbp
– soil/plant organism
– secondary metabolites
• Pseudomonas aeruginosa
– incomplete (5.9 Mbp)
• Yersinia pestis (4.4 Mbp)
• Clostridium spp (4-5 Mbp)
• Mycobacterium tuberculosis
– 4.411 Mbp
– slow growing (double in 24h)
– large proportion of genome on lipid metabolism
• Streptomyces coelicolor (~8 Mbp)
– secondary metabolites –antibiotics!
Genomics: 42
Organisation
• Linear chromosomes
– Borrelia burgdorferi
– Streptomyces coelicolor
• Multiple chromosomes
– Vibrio cholerae
• Plasmids
– Borrelia burgdorferi
– 17 linear & circular plasmids
– 50% genome size
– plasmid replication, “decaying genes”, ?Ag variation
• Transposons, IS elements, phages
– found in most genomes
– Campylobacter has none
• Repeats
Genomics: 43
Replication
• Origin (oriC) and termination (terC) of replication
– oriC often near dnaA gene (replication initiation protein)
– In Borrelia burgdorferi (linear) oriC (& dnaA) in centre
• strand bias
– which strand is each gene on?
– transcription in same direction as replication is more efficient
– variation in level of strand bias
• M. tuberculosis 55% vs B. subtilis 75%
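Strand bias can be sketched as the fraction of genes transcribed in the same direction as the replication fork that passes through them. This is a simplified sketch for a circular chromosome with one origin and one terminus; the gene list and coordinates are hypothetical, and a gene is reduced to a single position plus strand.

```python
def on_clockwise_replichore(pos, ori, ter, genome_length):
    """True if pos lies on the arm replicated from ori towards increasing coordinates."""
    return (pos - ori) % genome_length < (ter - ori) % genome_length


def codirectional_fraction(genes, ori, ter, genome_length):
    """Fraction of genes transcribed in the direction of replication.

    genes: list of (position, strand) with strand '+' (increasing coordinates)
    or '-' (decreasing coordinates).
    """
    same = 0
    for pos, strand in genes:
        clockwise = on_clockwise_replichore(pos, ori, ter, genome_length)
        if (clockwise and strand == "+") or (not clockwise and strand == "-"):
            same += 1
    return same / len(genes)


# Hypothetical 100 kb circular genome: oriC at 0, terC at 50
toy_genes = [(10, "+"), (20, "-"), (60, "-"), (70, "+"), (80, "-")]
print(codirectional_fraction(toy_genes, ori=0, ter=50, genome_length=100))
```

On this toy genome three of the five genes run with the fork, giving 0.6; the same calculation on an annotated genome gives the kind of percentage quoted on the slide.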
Genomics: 44
Gene Content
• Annotation
– sequence similarity
• gene families
• regulators, transport, biosynthesis
– domain matches
• trans-membrane domains, DNA binding
• Paralogues and Orthologues
– Paralogues:
• Members of same family (homologous) in same genome.
• Likely to have different exact function
– Orthologues:
• homologues (same family) in different genomes
• May have identical function
Vibrio cholerae as predicted by genome........
Reprinted by permission from Macmillan Publishers Ltd: [NATURE] (Heidelberg et al., 406, 477-483), copyright (2000)
Genomics: 45
Genomics: 46
Gene content (cont.)
• ORFans
– significant proportion of genome contains ORFs of unknown function
– some may be orthologues of unknowns in other organisms
– some unique to organism
• important for biology of organism
– examples:
• H. influenzae: 42%
• H. pylori: 33%
• E. coli: 38%
• M. tuberculosis: 60% to 16%
– number decreasing
• Gene size – most about 1 kb
Genomics: 47
Genomic rearrangements
• Example comparison
• Comparison of: S. enterica Typhi CT18 with S. enterica Typhi Ty2
• inversion that spans terminus
http://www.sanger.ac.uk/resources/software/act/
Genomics: 48
Variation by gain and/or loss
• Core regions
– shared by closely related species
• Additional “flexible” gene pool
– variable regions
– acquired from mobile genetic elements
• First described as pathogenicity islands
– in non-pathogens too
– wider role
• Genomic Islands
– pathogens
– commensals
– symbionts
– environmental
• Gain of GI sometimes associated with gene loss
– reduction in obligate intracellular pathogens
• Genome organisation as well as genome content correlates with microbial lifestyle
[Figure: routes of genome evolution from a common bacterial ancestor.
Genome reduction by deletion events: intracellular bacterium, obligate intracellular pathogen, endosymbiont.
Gene acquisition by HGT (GEI, plasmid): extracellular bacterium, facultative pathogen, symbiont.
Mutations, rearrangements: all lifestyles.]
Genomics: 49
Other tRNA-associated elements:
tRNAPProL
Black arrows=Sal+Ec; white arrows=Sal or Ec; grey=strain/serovar specific
GC is for S. Typhi
Infection and Immunity, May 2002, p. 2351-2360, Vol. 70, No. 5
Genomics: 50
Other tRNA-associated elements:
tRNAArgU
Infection and Immunity, May 2002, p. 2351-2360, Vol. 70, No. 5
The supragenome
• The distributed-genome hypothesis (DGH)
• Bacteria have a (supra) genome much larger than the genome of any single bacterium.
• Core and non-core gene sets
– Example: Hiller et al. sequenced 8 strains of Streptococcus pneumoniae + 9 already available
– Core set of genes in all strains
– 20-30% genes non-core (not present in all strains)
• Genetic recombination generates diversity across strains.
• Also for Haemophilus influenzae (Hogg et al.)
– ~1400 in core set and ~1300 non-core in subset of strains
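The split into core and non-core gene sets can be sketched with plain set operations. The strain names and gene labels below are hypothetical and are not data from Hiller et al. or Hogg et al.; the sketch also assumes orthologous genes have already been given the same label in every strain.

```python
def core_and_noncore(strain_genes):
    """Split the supragenome into genes found in all strains and the rest.

    strain_genes: dict mapping strain name -> iterable of gene labels.
    Returns (core, noncore) as sets.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    supragenome = set().union(*gene_sets)       # every gene seen in any strain
    core = set.intersection(*gene_sets)         # genes present in every strain
    return core, supragenome - core


strains = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2", "g3", "g5"},
}
core, noncore = core_and_noncore(strains)
print(sorted(core), sorted(noncore))
print(len(noncore) / (len(core) + len(noncore)))
```

The supragenome (five genes here) is larger than the gene set of any single strain, which is the point of the distributed-genome hypothesis.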
Genomics: 51
• Hiller et al. Journal of Bacteriology, November 2007, p. 8186-8195, Vol. 189, No. 22
• Hogg et al. Genome Biology 2007, 8:R103 (doi:10.1186/gb-2007-8-6-r103).
Genomics: 52
Yeast
• 16 chromosomes totalling 12.068 Mbp
• 5885 ORFs (6275 predicted, but 390 unlikely to be translated)
• Few introns (~4%)
• Average gene size 2 kb (worm ~6 kb and human >30 kb)
• GC content varies along chromosome length
– low GC at telomere & centromere
– GC-rich regions correlate with higher recombination
• Tn and remnants in genome
– evidence of hotspots
• 50% of ORFs have known function
– some exact role unclear
• http://genome-www.stanford.edu/Saccharomyces/
• http://mips.gsf.de/projects/fungi
Genomics: 53
Functional genomics
• Functional genomics – ascribing gene function across a genome
• function and inter-relationships
• strategy
– [bioinformatic analysis - gene identification]
– Transcriptome - expression pattern
– Proteome - expression pattern
– Mutantome - mutant phenotype
– Interactome – protein-protein interactions
[Figure: GENOME; TRANSCRIPTOME (RNA copies of the active protein-coding genes); PROTEOME (the cell’s repertoire)]
Genomics: 54
Arrays: micro and chip
• Microarrays
– Glass slides with <10000 individual samples applied in known position
– Use of robotics
– Samples can be PCR products or oligos
– example: oligo/PCR product complementary to each ORF
• Chip arrays
– silicon based
– >10,000 sequences
– http://www.affymetrix.com/index.html
• Redundancy
• fluorescent labels
Genomics: 55
[Figure: chip arrays. One cell = one specific sequence; a laser reads the individual sequences and bound sample]
Genomics: 56
Transcriptome
• Genome-wide determination of expression level of each ORF
• when an ORF is expressed relates to its role
• also assess mutants
• compare expression of each ORF in different conditions
• Genome wide expression maps
• global patterns of expression
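Comparing the expression of each ORF between two conditions is usually summarised as a log2 ratio of the two signals. The sketch below assumes a two-colour array where the red channel is the test condition and the green channel is the reference; that assignment, the ORF names, the intensities and the two-fold threshold are all illustrative assumptions, and real analyses also need normalisation and replicates.

```python
import math


def log2_ratios(red, green, pseudocount=1.0):
    """log2(test / reference) for every ORF measured in both channels.

    The pseudocount avoids division by zero for spots with no signal.
    """
    return {orf: math.log2((red[orf] + pseudocount) / (green[orf] + pseudocount))
            for orf in red if orf in green}


def changed_orfs(ratios, threshold=1.0):
    """ORFs whose expression changes by at least 2**threshold fold, either way."""
    return sorted(orf for orf, r in ratios.items() if abs(r) >= threshold)


# Hypothetical spot intensities
red_channel = {"orf1": 100.0, "orf2": 1600.0}     # test condition
green_channel = {"orf1": 100.0, "orf2": 100.0}    # reference condition

ratios = log2_ratios(red_channel, green_channel, pseudocount=0.0)
print(ratios)
print(changed_orfs(ratios))
```

A ratio of 0 corresponds to the combined (yellow) signal where both channels are equal, while strongly positive or negative values correspond to predominantly red or green spots.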
Genomics: 57
When expressed?
[Figure: "Bacillus genieae" with 2 x ORF: orf 1 (AGGCAT) and orf 2 (AATGAA). Grow in conditions when only orf 2 is expressed, isolate mRNAs and make a cDNA copy (TTACTT), which pairs with the orf 2 sequence AATGAA but not with orf 1]
Genomics: 58
[Figure: grow under different conditions, extract mRNA, probe array with labelled copy of mRNA]
Genomics: 59
Differentially labelled probes
Red
channel
Green
channel
Combined
Genomics: 60
http://www.bio.davidson.edu/courses/genomics/chip/chip.html
Genomics: 61
Expression profiling C. jejuni in low iron
Cj1659 (P19)
Cj0177
Cj0037c
Genomics: 62
Proteome
• Genome-wide determination of protein expression
• Gives information on stimulons
• protein expression linked to function
• assess mutants (regulatory mutants affect several proteins)
• Grow bacteria under defined conditions
• Extract proteins
• 2D-gel electrophoresis
• Protein spot identification
• Mass Spectrometry
• peptide size predictions from Genome data
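Predicting peptide sizes from genome data amounts to digesting each predicted protein in silico and summing residue masses, so that measured masses can be matched back to an ORF. The sketch below assumes trypsin as the protease (cleaving after K or R unless the next residue is P); the lecture does not name the enzyme, and the protein sequence is invented. The residue values are standard monoisotopic masses.

```python
# Monoisotopic residue masses in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Cut after K or R, except when the following residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        last = i + 1 == len(protein)
        if aa in "KR" and (last or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


predicted_protein = "MKRPGAKLL"   # invented translation of a predicted ORF
for pep in tryptic_digest(predicted_protein):
    print(pep, round(peptide_mass(pep), 4))
```

Running this over every predicted ORF gives a table of expected peptide masses against which the mass spectrum from a gel spot can be searched.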
Genomics: 63
Defining the Campylobacter proteome – chasing spots
Which protein? Which conditions?
Which other proteins are co-expressed?
Genomics: 64
C. jejuni iron example
Genomics: 65
[Figure: a protein spot (marked *) from the 2D gel (axes: pI and Mol mass) is digested with protease and analysed by Mass Spec]
http://depts.washington.edu/yeastrc/pages/ms.html
Genomics: 66
Mass Mutagenesis: mutantome
• Mutate every ORF in genome
– organism specific technology
• High throughput analysis of phenotype
– need to analyse many 1000s of mutants under many conditions
• Signature-tagged technology
– enables analysis of mutant pools
– requires array technology for genome-wide projects
• Association of ORFs with mutant phenotypes
• Regulators might be pleiotropic
Genomics: 67
Arrays: micro and chip
• Microarrays
– Glass slides with <10000 individual samples applied in known position
– Use of robotics
– Samples can be PCR products or oligos
– example: oligos complementary to each unique Tag
– example: oligo/PCR product complementary to each ORF
• Chip arrays
– silicon based
– >10,000 sequences
– http://www.affymetrix.com/index.html
• Redundancy
• fluorescent labels
Genomics: 68
[Figure: chip arrays. One cell = one specific sequence; a laser reads the individual sequences and bound sample]
Genomics: 69
Signature Tagged
• Tags are short unique DNA sequences
• Tag linked to mutation
• Each individual mutant has unique tag
• Each mutant ORF has unique Tag
ORF X
Chromosomal Mutants
Genomics: 70
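[Figure: chromosomal mutants (for example in ORF X) are combined into mutant pools; the pool after a test condition is compared with the pool after 'normal' conditions to ask: functional role?]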
Genomics: 71
Bar coding genes
[Figure: “normal”, un-mutated Campylobacter alongside mutant 1, mutant 2, mutant 3, mutant 4 and so on… to mutant 1654, each carrying a mutant-specific DNA sequence]
Genomics: 72
Which bar codes are missing?
• Which bar coded mutants are missing?
• Gene involved in process
[Figure: copies of the barcodes present in the mutant pool and in the post-treatment mutant pool are hybridised to a bar code array (positions 1 to 100); spots score + where a barcode is present and - where it is missing]
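Reading such an array comes down to asking which barcodes were detected in the starting pool but not in the pool recovered after treatment. The sketch below is illustrative only: the tag names, signal values, detection threshold and tag-to-ORF table are hypothetical.

```python
def missing_barcodes(input_pool, output_pool, min_signal=1):
    """Tags detected in the input pool but not in the post-treatment pool.

    Each pool maps tag -> array signal; a tag counts as present when its
    signal is at least min_signal.
    """
    present_before = {t for t, s in input_pool.items() if s >= min_signal}
    present_after = {t for t, s in output_pool.items() if s >= min_signal}
    return sorted(present_before - present_after)


def implicated_orfs(missing_tags, tag_to_orf):
    """Look up which mutated ORF each missing tag marks."""
    return [tag_to_orf[tag] for tag in missing_tags if tag in tag_to_orf]


# Hypothetical array signals
before = {"tag01": 5, "tag02": 7, "tag03": 4}
after = {"tag01": 6, "tag02": 0}
tag_to_orf = {"tag01": "orfA", "tag02": "orfB", "tag03": "orfC"}

lost = missing_barcodes(before, after)
print(lost, implicated_orfs(lost, tag_to_orf))
```

Mutants whose barcodes disappear carry mutations in genes that are candidates for involvement in the process being tested.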
www.freedigitalphotos.net/
Reprinted by permission from Macmillan Publishers Ltd: [NATURE REVIEWS GENETICS] (Mazurkiewicz et al. 7 929-939), copyright (2006)
Genomics: 73
Interactome
Yeast 2 hybrid
Genomics: 74
http://en.wikipedia.org/wiki/Two-hybrid_screening
Which proteins can interact?
• Expression library of binding-domain::protein 1 (bait)
• Expression library of activation-domain::protein 2 (prey)
• Test combinations of all genome ORFs
• Which combinations turn on the reporter gene?
Protein-protein interaction networks
Genomics: 75
Parrish et al. 2007. A proteome-wide protein interaction map
for Campylobacter jejuni. Genome Biol 8:R130.
Genomics: 76
Genomotyping or Genomic indexing
[Figure: array layout of genes 1 to 15, repeated for each hybridisation]
• Array of all known genes in microbe
• Genes 1, 2, 3 & 14 form minimal gene set
• Hybridise array with labelled chromosomal DNA
[Figure: hybridisation results for Isolate 1, Isolate 2 and Isolate 3; genes 1, 2, 3 and 14 are detected together with isolate-specific subsets drawn from genes 4, 5, 6, 8, 9, 11 and 15]