GN3502: Bacterial Genetics Ken Forbes Medical Microbiology.
Lecture synopsis
1. “Classical” bacterial genetics
2. New approaches
– Physical mapping of genomes
– Whole genome sequencing
– Functional analysis
3. New perspectives on bacterial genetics
– Origin of species
– Bacterial lifestyles
“Classical” view of bacteria
• Single chromosome
• May have plasmids and phage
• Simple gene structure
• Genes have a recognisable phenotype
• Can do genetics in the lab
– gene transfer: transformation, transduction, conjugation
– molecular biology
Classical methods are not adequate
• Bacteria live in many diverse habitats
• Much diversity within a species
• Most genes in most species have not yet been identified
Have most of the genes in any species been identified?
• Traditional genetic and molecular methods have identified a function for only half of the genes in E. coli
• Constraints from
– methodologies
– many genes will not be expressed in the lab
• New approaches needed
– genome oriented
– sequence oriented
Lecture synopsis: 2. New approaches
• Physical mapping of genomes
– Methods: PFGE, clone libraries
– Discoveries: bacterial genomes (size, shape, replicons)
• Whole genome sequencing
– Methods: sequencing strategies
– Discoveries: gene organisation, assigning function
• Functional analysis
– Methods: for individual genes; for whole genomes (DNA arrays, proteome)
– Discoveries: new genes
Physical mapping of genomes
• Low-resolution restriction enzyme (RE) maps of the whole genome
• Locate genes on the map using DNA-based techniques
• A PHYSICAL map of the chromosome, not a GENETIC map
• Restriction map the whole chromosome with rare-cutting REs
– complete digests
– partial digests
– double digests
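To show how complete and double digests constrain a map, here is a minimal Python sketch that computes the fragment sizes a PFGE gel would show for a circular chromosome. The cut positions, the enzyme labels and the 4.6 Mb genome length are hypothetical, and partial digests are not modelled. The map itself would be deduced by finding site positions that are consistent with every observed set of band sizes.

```python
GENOME_KB = 4600  # assumed chromosome length (kb), roughly E. coli

def fragments(cuts, genome_len=GENOME_KB):
    """Fragment sizes (kb, largest first) from a complete digest of a circular chromosome."""
    sites = sorted(set(cuts))
    if not sites:
        return [genome_len]  # no site: the intact circle
    # each fragment runs from one site to the next ...
    sizes = [b - a for a, b in zip(sites, sites[1:])]
    # ... and the last one wraps around the origin back to the first site
    sizes.append(genome_len - sites[-1] + sites[0])
    return sorted(sizes, reverse=True)

def double_digest(cuts_a, cuts_b, genome_len=GENOME_KB):
    """A double digest cuts at the union of both enzymes' sites."""
    return fragments(list(cuts_a) + list(cuts_b), genome_len)

# Hypothetical site positions (kb) for two rare-cutting enzymes
e_sites = [300, 1500, 2900]
h_sites = [800, 3600]

print("E digest:  ", fragments(e_sites))
print("H digest:  ", fragments(h_sites))
print("E+H digest:", double_digest(e_sites, h_sites))
```

With these made-up sites, the E digest gives bands of 2000, 1400 and 1200 kb. The double digest splits the 1200 kb and 2000 kb E bands but leaves the 1400 kb band intact, which places the two H sites inside those two E fragments.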
Pulsed-Field Gel Electrophoresis
[Figure: restriction map of the chromosome with rare-cutter sites E, H and S; scale bar 1 Mb]
• Embed cultured cells in molten agarose
• Incubate the embedded cells with proteinase K
– high-molecular-weight (HMW) DNA is trapped in the agarose
• Inactivate proteinase K and wash to remove cell debris
• Digest with a rare-cutting restriction enzyme
• Electrophorese with periodic switching (pulsing) between electrode pairs
– net migration of DNA through the gel
Mapping genes on whole-genome RE maps
[Figure: restriction map (E, H and S sites) with geneA, geneB and geneC each located on a particular fragment]
• Hybridise a cloned-gene DNA fragment to the PFGE fragments
– locates the gene on the map
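The hybridisation step amounts to intersecting fragments: the gene must lie on the band that lights up in every digest. This is a minimal sketch that assumes a hypothetical map already built from the digests (site positions in kb). It also assumes each hybridising band size identifies a single fragment, so two equal-sized bands would be ambiguous.

```python
GENOME_KB = 4600  # assumed chromosome length (kb)

def fragment_containing(pos, cuts, genome_len=GENOME_KB):
    """(start, end, size) of the fragment of a circular complete digest containing pos.
    Assumes at least one cut site."""
    sites = sorted(cuts)
    for a, b in zip(sites, sites[1:]):
        if a <= pos < b:
            return a, b, b - a
    # otherwise pos lies on the fragment that spans the origin
    return sites[-1], sites[0], genome_len - sites[-1] + sites[0]

def consistent_positions(probe_bands, genome_len=GENOME_KB, step=10):
    """Map positions (kb, sampled every `step` kb) lying on the hybridising band in
    every digest. probe_bands: list of (cut_sites, band_size_kb), one per digest."""
    return [pos for pos in range(0, genome_len, step)
            if all(fragment_containing(pos, cuts, genome_len)[2] == size
                   for cuts, size in probe_bands)]

# Hypothetical map and blot: the gene probe lights up the 1200 kb band of the
# E digest and the 2800 kb band of the H digest
e_sites = [300, 1500, 2900]
h_sites = [800, 3600]
hits = consistent_positions([(e_sites, 1200), (h_sites, 2800)])
print(f"consistent positions: {hits[0]}-{hits[-1]} kb (sampled every 10 kb)")
```

Each extra digest narrows the interval further. This is why the resolution of the method depends on how many rare-cutting enzymes have been mapped.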
Ordered clone libraries: method
• Make clones of the entire genome
– phage clones of the whole genome
• small (tens of kb) inserts mean thousands of clones are required to cover the whole chromosome
– Bacterial Artificial Chromosomes (BACs)
• cloned in the E. coli F plasmid
• large (hundreds of kb) inserts mean fewer clones are needed
• Order the clones into contigs
– overlapping clones will cross-hybridise
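How many random clones does “thousands” mean? One standard estimate, not given in the lecture, is the Clarke–Carbon formula N = ln(1 − P) / ln(1 − f). Here f is the fraction of the genome carried by one insert and P is the probability that any given sequence is present in the library. The sketch assumes a 4.6 Mb genome (E. coli) and illustrative insert sizes of 20 kb and 150 kb.

```python
import math

def clones_needed(insert_kb, genome_kb, prob=0.99):
    """Clarke-Carbon estimate: random clones needed so that any given sequence
    is present in the library with probability `prob`."""
    f = insert_kb / genome_kb  # fraction of the genome carried by one insert
    return math.ceil(math.log(1 - prob) / math.log(1 - f))

GENOME_KB = 4600  # assumed E. coli genome size

for label, insert_kb in [("small-insert clones", 20), ("BACs", 150)]:
    print(f"{label} ({insert_kb} kb inserts): {clones_needed(insert_kb, GENOME_KB)} clones")
```

This gives roughly a thousand small-insert clones against fewer than 150 BACs. Ordering the clones into contigs then allows a minimally redundant set to be picked.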
Ordered clone libraries
• Disadvantages
– not all regions are clonable
– labour intensive and expensive
• Advantages
– immortalised source of genomic DNA
– minimally redundant
– easy to find and sub-clone a gene of interest
– identify adjacent genes
– use in genome sequencing projects
Ordered clone libraries: applications
• E. coli K12
– widely used lab strain
• Mycobacterium leprae
– obligate human pathogen
– not cultivable in vitro
– genetic analysis impossible
– ordered clone library allowed molecular genetic analysis
Physical mapping
• Pros
– only need DNA of the organism
– standard molecular biology methods used
• Cons
– low resolution
– no phenotypic information about genes
Bacterial genomes come in many different sizes
• Range 0.6–9 Mb
• Bigger genomes encode more genes
• < 2 Mb: specialist species
– restricted ecological niche (Mycoplasma)
– fastidious growth (Haemophilus influenzae)
– obligate intracellular parasites (Chlamydia)
• 3–5 Mb: generalist species
– broad metabolic potential, few organic growth requirements (E. coli)
• > 5 Mb: species with developmental cycles
– Streptomyces: mycelial growth, spores, complex bioactive compounds
Bacterial genomes come in different conformations
• Circular chromosomes
– the traditional view: E. coli
• Linear chromosomes
– Borrelia
• Plasmids
– circular and linear forms
Bacterial genomes can have several chromosomes
• “Chromosomes must harbour some essential genes”
– e.g. ribosomal RNA (rrn)
• “Plasmids should not be required for viability”
– only encode supplementary functions
– can be very large (1–2 Mb)
Bacterial genomes
• Most species have one chromosome, e.g. E. coli
– 1 circular chromosome with rrn and housekeeping genes
• Some species have 2 chromosomes (a few have 3), e.g. Agrobacterium tumefaciens
– 2 chromosomes, each with rrn and housekeeping genes: 1 circular (3 Mb), 1 linear (2 Mb)
– 2 circular plasmids (200 kb, 450 kb)
Physical mapping: conclusions
• Bacterial genomes are very variable
– chromosome size, conformation, number
– plasmids often very important, but not essential
• Genomes have a large coding capacity
– this reflects bacterial biodiversity
– there are many genes of unknown function
– laboratory analysis imposes constraints on understanding many genes
• How can you identify all of the genes in a species?
Whole genome sequencing
• Whole genome sequences now available for
– 300 bacterial species/strains
– most pathogens
– representatives of most bacterial lineages
• Haemophilus influenzae genome: published 1995
Whole genome sequencing
• Advantages
– inexpensive
– all of the genome sequence available
– all genes identified
• Requirements
– automated DNA sequencing machines
– massive computing power
• “Factory sequencing”
Fluorescent sequencing
• DNA sequencing reaction
– Sanger terminator chemistry: the nucleotide chain is extended until it is blocked by incorporation of a terminator nucleotide
– each terminator nucleotide has a fluorescent dye attached, with a different colour for each base
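The read-out logic can be sketched in Python. Each terminator carries its own colour, so sorting all products by length and reading their colours in order spells out the sequence. This is a toy model that assumes a hypothetical colour for each base, counts product lengths from the end of the primer, and treats size resolution as perfect.

```python
# Hypothetical dye colour for each terminator; real dye sets and colours differ.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def terminated_products(template_3to5):
    """Every product of the sequencing reaction as (length, dye colour).
    The template is written 3'->5' so its i-th base pairs with the i-th base
    added to the primer. Extension stops wherever a terminator is incorporated,
    so across many molecules there is one product ending at each position."""
    new_strand = [COMPLEMENT[base] for base in template_3to5]
    return [(i + 1, DYE[base]) for i, base in enumerate(new_strand)]

def read_trace(products):
    """Electrophoresis sorts products by length; reading the colours from the
    shortest product upwards gives the newly synthesised strand 5'->3'."""
    return "".join(COLOUR_TO_BASE[colour] for _, colour in sorted(products))

products = terminated_products("TACGGA")
print(read_trace(list(reversed(products))))  # input order does not matter
```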
Phases of a sequencing project
• Primary sequencing phase
– random accumulation of sequence into contigs
• Linking phase
– contigs linked together using directed sequencing methods
• Polishing phase
– removal of sequence ambiguities from the single contig
• Finished sequence
– analyse, annotate
Genome sequencing strategies
• Total-genome shotgun sequencing
• Primer walking
• Mixed strategy
Total-genome shotgun sequencing
• Shotgun cloning
– shear DNA into random fragments of 1–5 kb
– clone into a vector
• Sequencing primers in the vector
[Figure: cloned insert within the vector, with sequencing primers in the vector flanking the insert]
Total-genome shotgun sequencing: advantages
• Doesn’t require a map of the genome
• Sequencing machines run continuously at full capacity
• Sequence polishing only done once
• Greater accuracy through multiple coverage
– 6–10-fold coverage (genome equivalents)
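Why 6 to 10 genome equivalents? The Lander–Waterman model, a standard idealisation not covered in the lecture, treats reads as landing at random positions. Under that model, the fraction of bases covered by no read is about e^(−c) at c-fold coverage. The sketch assumes a 4.6 Mb genome and 500-base reads. It ignores repeats, unclonable regions and the overlap needed to join reads.

```python
import math

def shotgun_estimates(genome_bp, read_bp, coverage):
    """Idealised Lander-Waterman numbers for random shotgun sequencing.
    Ignores repeats, unclonable regions and the minimum overlap needed to join reads."""
    n_reads = coverage * genome_bp / read_bp
    uncovered_bp = genome_bp * math.exp(-coverage)  # P(base in no read) = e^-c
    gaps = n_reads * math.exp(-coverage)            # expected number of contigs (and gaps)
    return n_reads, uncovered_bp, gaps

GENOME_BP = 4_600_000  # assumed: E. coli-sized genome
READ_BP = 500          # assumed read length

for c in (3, 6, 8, 10):
    n, missing, gaps = shotgun_estimates(GENOME_BP, READ_BP, c)
    print(f"{c:>2}x: {n:>6.0f} reads, ~{missing:>7.0f} bp unsequenced, ~{gaps:.0f} gaps")
```

Going from 3x to 6x coverage cuts the expected number of gaps from over a thousand to roughly 140. Each further fold removes fewer gaps while the number of reads keeps rising linearly.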
Total-genome shotgun sequencing: disadvantages
• Repeat coverage is wasteful
• Can’t clone some genomic regions
• Repetitive regions in the genome
– can’t map each repeat copy to its correct genomic position
– prevents contigs from being joined together
• other methods required to span across each repeat
• Sequence assembly and analysis can only be done at the end of the sequencing phase
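A toy greedy assembler illustrates both the assembly step and why some stretches end up as separate contigs. This is a sketch, not how production assemblers work. It repeatedly merges the pair of contigs with the longest suffix-prefix overlap. The 3-base minimum overlap and the example reads are assumptions.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two contigs with the longest overlap until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # drop reads lying wholly inside another read (they add no new sequence)
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining contigs cannot be joined
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

reads = ["CCAGTGCA", "TTACGGAT", "GGATCCAG"]  # hypothetical shotgun reads
print(greedy_assemble(reads))
print(greedy_assemble(["TTACGGAT", "CCAGTGCA"]))  # no overlap: two contigs remain
```

Three overlapping reads collapse into one contig. Two reads with no overlap stay separate, leaving a gap that other methods must close. A repeat longer than the reads causes a similar problem: reads from different copies are identical, so overlaps alone cannot show which flanking sequences belong together.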
Primer walking
• Requires an ordered clone library
• Primer walk along each cloned fragment
– first primer in the vector: sequence into the cloned DNA
– next primer in the new sequence: sequence further into the cloned DNA
– start at each end of the cloned fragment
– cycles of: sequencing → polishing → primer design → primer synthesis
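The walking cycle can be sketched in Python under some assumptions: 500-base reads, 20-base primers, a 50-base overlap between successive reads, and walking from one end only (in practice walks start from both ends of the clone). Real primer design also checks that a primer binds uniquely and has a suitable melting temperature, and this sketch ignores both.

```python
import random

def primer_walk(insert, read_len=500, primer_len=20, overlap=50):
    """Walk along a cloned insert from one end.
    Returns the reads as (start, sequence) and the primers that had to be synthesised."""
    reads, primers = [], []
    start = 0  # the first primer sits in the vector and reads from insert position 0
    while True:
        reads.append((start, insert[start:start + read_len]))
        if start + read_len >= len(insert):
            return reads, primers  # reached the far end of the insert
        # design the next primer from the newly read sequence, so that the next
        # read begins `overlap` bases before the end of this one
        primer_pos = start + read_len - overlap - primer_len
        primers.append(insert[primer_pos:primer_pos + primer_len])
        start = primer_pos + primer_len

insert = "".join(random.Random(1).choices("ACGT", k=2000))  # hypothetical 2 kb insert
reads, primers = primer_walk(insert)
print(len(reads), "reads,", len(primers), "new primers")
```

Every step after the first needs a newly synthesised primer and must wait for the previous read. This is where the cost and the time lag of primer walking come from.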
Primer walking
• Advantages
– high-quality, usable sequence obtained from the start
– sequence produced in large contigs
– no repeat coverage
– both strands sequenced
• Disadvantages
– many expensive primers needed
– time lag between walks
– little automation; sequencing machines often idle
Mixed strategy
• Most popular strategy
• Combines the advantages of both methods
– initial random-sequencing phase
• on either the whole genome or a set of ordered clones
• typically 3–6-fold coverage
– final primer walking over gaps
Ultrahigh-throughput sequencing
• Sequencing by synthesis (SBS)
– e.g. Solexa
– generates short (18–35 base) reads
Ultrahigh-throughput sequencing
• Sequencing by synthesis (SBS)
– template of tens of millions of individual, clonally amplified DNA fragments
– yields up to 1 gigabase of sequence in total
– avoids cloning steps
– inexpensive: £500 per bacterial genome
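Some back-of-envelope arithmetic puts these figures together. The sketch assumes that the full 1 Gb yield goes to a single 4.6 Mb E. coli genome and that every read is 35 bases long. In practice several genomes would share a run.

```python
def run_summary(yield_bp, read_bp, genome_bp):
    """Number of reads and fold coverage from one short-read run."""
    n_reads = yield_bp // read_bp
    coverage = yield_bp / genome_bp
    return n_reads, coverage

n_reads, coverage = run_summary(1_000_000_000, 35, 4_600_000)
print(f"{n_reads:,} reads of 35 bases, about {coverage:.0f}-fold coverage")
```

That comes to about 28.6 million reads and over 200-fold coverage, far more than the 6–10 genome equivalents used in Sanger shotgun sequencing.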
Genome organisation
• Can identify
– all protein- and RNA-coding genes
– organisation of genes, both within the genome and with respect to each other
E. coli genome
• Traditional genetic and molecular methods have identified 2220 genes in E. coli
• Whole genome sequencing has identified 4288 protein-coding genes in the E. coli genome
• Genetic map = 100 min; physical map = 4.6 Mb; 1 min = 46 kb
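Converting between the two maps is a simple proportion. The genetic map's units are minutes of transfer during Hfr conjugation. This minimal sketch assumes the figure of 46 kb per minute holds uniformly around the chromosome, which is only approximately true.

```python
GENETIC_MAP_MIN = 100   # E. coli genetic map length (minutes)
PHYSICAL_MAP_KB = 4600  # E. coli physical map length (kb)
KB_PER_MIN = PHYSICAL_MAP_KB / GENETIC_MAP_MIN  # 46 kb per minute

def minutes_to_kb(minutes):
    """Approximate physical position (kb) for a genetic map position (min)."""
    return minutes * KB_PER_MIN

def kb_to_minutes(kb):
    """Approximate genetic map position (min) for a physical position (kb)."""
    return kb / KB_PER_MIN

print(minutes_to_kb(10))    # a marker at 10 min
print(kb_to_minutes(2300))  # halfway round the physical map
```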
Genome organisation
• >90% of the genome codes for genes
• Genes identified in the genome sequence by
– Open Reading Frame (ORF)
– homology to known genes in other species
• Regulation of gene expression
– promoter and ribosome binding site sequences
– operons and linked genes
Identifying genes: by phenotype
• Genes traditionally identified by genetic analysis
– robust identification of a gene by its function
Identifying genes: by DNA homology
• Identify a gene by sequence homology
• Needs a previously characterised gene in another species
– high homology between them gives robust identification as that previously characterised gene
– but the new gene may have a different biological role
Identifying genes: by Open Reading Frame
• ORF: a DNA sequence with no in-frame stop codons
• Only finds genes coding for proteins
• Ends of the gene are not easily defined
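A simple ORF scan can be sketched as follows. The sketch assumes that an ORF runs from an ATG start codon to the first in-frame stop codon (TAA, TAG or TGA), and it checks all three frames on both strands. A hypothetical 100-codon minimum skips short ORFs that arise by chance. Alternative start codons such as GTG and TTG are ignored, which is one reason the ends of a gene are hard to define.

```python
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, min_codons=100):
    """ATG-to-stop ORFs in all three frames on both strands.
    Returns (strand, start, end) with 0-based, end-exclusive coordinates on that
    strand's own sequence (the reverse complement for '-'); end includes the stop codon."""
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF (gives the longest ORF)
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:  # codon count incl. stop
                        orfs.append((strand, start, i + 3))
                    start = None  # ORFs that run off the end are never reported
    return orfs

seq = "CC" + "ATG" + "AAA" * 5 + "TAA" + "GG"  # toy sequence
print(find_orfs(seq, min_codons=3))
```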
Bacterial genomes have many genes with no known function
• 60% of genes have a recognisable function
– but the specific role of many is unknown
• 40% of genes have no known function
– 10% found in other species
• conserved protein families
• important housekeeping genes?
– 30% unique to each species
• determine pathogenicity, lifestyle
Assigning function to novel genes
• How do you determine the function of genes identified by sequence rather than by phenotype?
• For individual genes, use an appropriate molecular genetic technique
– gene knockouts
– conditional lethal mutations
– control region probes
Assigning function to new genes
• Individual genes
– gene knockouts
– conditional lethal mutations
– control region probes
• Whole genome
– DNA arrays
– proteome analysis
DNA arrays
• Macroarrays
– DNA fragment probes (e.g. PCR products), one per gene
– arrayed on a membrane (~10³ spots)
• Microarrays
– oligonucleotide probes, several oligonucleotides per gene
– arrayed on glass (~10⁵ spots)
DNA arrays
[Figure: array hybridised with Sample A and Sample B; spot colour = relative ORF expression, spot intensity = extent of ORF expression; some spots show expression in both samples]
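Reading a two-colour array comes down to a ratio and an intensity for each spot. The sketch assumes background-corrected intensities for each ORF in two samples (the numbers are hypothetical), a two-fold cut-off and a detection threshold of 500 units. Real analyses also normalise between the two channels. Spots with a ratio near 1 correspond to expression in both samples.

```python
import math

def classify_spots(spots, fold=2.0, min_intensity=500):
    """spots: {orf: (intensity_A, intensity_B)}.
    Returns {orf: (log2 ratio B/A or None, call)}: the ratio plays the role of the
    spot colour and the raw intensity decides whether the ORF is expressed at all."""
    cutoff = math.log2(fold)
    calls = {}
    for orf, (a, b) in spots.items():
        if max(a, b) < min_intensity:
            calls[orf] = (None, "not detected")
            continue
        ratio = math.log2((b + 1) / (a + 1))  # +1 guards against division by zero
        if ratio >= cutoff:
            call = "higher in B"
        elif ratio <= -cutoff:
            call = "higher in A"
        else:
            call = "expressed in both"
        calls[orf] = (ratio, call)
    return calls

spots = {"orf1": (4000, 500), "orf2": (600, 5000), "orf3": (3000, 3200), "orf4": (50, 80)}
for orf, (ratio, call) in classify_spots(spots).items():
    print(orf, call if ratio is None else f"{call} (log2 ratio {ratio:+.1f})")
```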
DNA arrays: applications
• Gene expression (mRNA)
– transcriptome
• Presence/absence of genes (DNA)
– genome polymorphisms
Proteomics
• 2D electrophoresis of cellular proteins
– separate by charge, then by size
– determine the amino acid sequence of a spot of interest
– refer back to the genome sequence
• Characterisation of all expressed proteins
Lecture synopsis: 3. New perspectives on bacterial genetics
– Origin of species
– Lifestyles
Why have bacteria so many genes?
• 60% have a recognisable function
– the specific role of many genes is unknown
• e.g. assigned only to an enzyme class
• 40% have no known function
– 10% common, conserved gene families
– 30% unique to each species
Some genes are common to many species
• Conserved gene families
• Presumably housekeeping genes
• Potential targets for novel antibacterials
Some genes are unique to one species
• These genes give a species its unique characteristics
• Allow adaptation to a particular lifestyle
• Virulence genes
How many genes does a pathogen need?
• Mycobacterium tuberculosis
– mechanism of pathogenesis unknown
– 4.4 Mb genome
– 3994 genes
• 1/3 known function
• 1/3 similar proteins
• 1/3 unknown
[Figure: genes not required in vivo: 300; genes not required in vitro: 3000]
Some species are apparently “missing genes”
• Many pathogens have complex growth requirements
• Some functions or pathways absent
– genes for some pathways eliminated
• nutrients supplied by host
– adaptation to niche
• H. pylori lives in the acidic environment of the stomach
• does not ferment sugars (acidic products); does ferment amino acids (alkaline products)
“Community Genomics Among Stratified Microbial Assemblages in the Ocean's Interior”
DeLong et al. (2006) Science 311, pp. 496–503
• Planktonic microbial communities in the Pacific Ocean
– sampled from ocean surface to sea floor
– sequenced 64 million base pairs
– thousands of new genes
• Variations in sequences at different depths
– near the ocean surface
• photosynthetic and mobile microorganisms
• more genes for iron uptake
– deeps
• a predominance of "adhesive" microbes
• antibiotic synthesis genes
• Organisms do not live in isolation
• Organisms interact with host/environment
• Organisms are often dependent on each other
– nutrient flow through biological systems
• Use genomics to understand the interactions between species at the gene level
Bacteria are diverse
[Figure: stereo micrograph of dental plaque; nutrient flow from cocci to filamentous bacteria]