Genomics-sequencing of microbial genomes


Genomics: 1

Genomics-sequencing of microbial genomes

This lecture illustrates the strategies used in microbial genome sequencing projects, compares genome content and organisation amongst microbes, and shows how to derive information on gene function across a genome.

Objectives for students:

• Describe the strategies involved in microbial genome sequencing and functional genomics

• Provide examples of information that can be derived from genomics


Genomics: 2

Microbial Genome Sequencing

• Genome Sequencing Projects

– strategy & methods

– annotation

• Comparative genomics

– organisation

– gene content

• Functional genomics

– transcriptome

– proteome

– genome-wide mutation

• Concentrate on strategy & ideas


Genomics: 3

Bacterial genome projects

• Many completed:

– Haemophilus influenzae

– Escherichia coli

– Bacillus subtilis

– Mycoplasma genitalium

– Helicobacter pylori (x2)

– Campylobacter jejuni

– Treponema pallidum

– Neisseria meningitidis

– Neisseria gonorrhoeae

– Vibrio cholerae

– E. coli O157

• Good links to projects:

– http://www.tigr.org/

– http://www.ncbi.nlm.nih.gov/

– http://www.sanger.ac.uk/

– http://www.genomesonline.org/

Genome sequencing progress

• Complete:

– Archaeal: 70 (2007&2008: 49&55)

– Bacterial: 945 (554&728)

– (Eukaryal: 121) (76&97)

• Ongoing:

– Prokaryotic: 3498 Archaeal: 111

– (Eukaryotic: 1223)

• Metagenome projects: 200


Genomics: 4

www.genomesonline.org


Genomics: 5

Microbial eukaryote projects

• Complete

– Yeast - Saccharomyces cerevisiae

– Plasmodium falciparum

– Aspergillus nidulans, A. niger, A. oryzae & A. fumigatus

– Trypanosoma cruzi & T. brucei

– Leishmania

– Entamoeba histolytica

– Giardia lamblia

– Candida albicans & glabrata

– Paramecium

• Underway

– Pneumocystis carinii

– Plasmodium vivax

– some complete chromosomes finished

– Other species and isolates from completed list


Genomics: 6

Why bother? -To sequence or not to sequence

(considerations in the pre-genome era)

• piecemeal collection of sequenced genes

– slow

– costly

– ever complete?

• genome project

– rational approach

– efficient and rapid

– quality assurance

– address novel questions

• problems/issues

– ownership

– strain choice

– cost

– approach

– data release

– some now less relevant

• Post genomic era

– Comparative genomics

– Functional genomics


Genomics: 7

Genome sequencing strategy

• Strategy choice

• large collaborative cosmid/BAC-based projects

– now better suited for larger genomes

– slow

• small insert shotgun approach

– centralised

– rapid and efficient

– choice for bacteria

• Strain choice

– fresh isolate vs lab strain

– clinical vs environmental

– subsequent genetic analysis


Genomics: 8

Yeast genome sequence strategy

• Yeast chromosomes (16) individually sequenced

• several approaches used

• Make genome library in cosmids

• order cosmid library – which cosmid overlaps with which

– link cosmid to genome map

– produced tiled set of cosmids

– only sequence minimum number

• Use chromosome specific probe to identify chr-specific cosmids

• sequence cosmid inserts by subcloning

• Solve problems by direct PCR sequencing, walking and other libraries (lambda)

• Telomeres


Genomics: 9

Tiled set


Genomics: 10

Ordering clones

[Figure: cosmids c1 (markers A, B), c2 (C, D), c3 (E, F), c4 (G, H) and c5 (I, J), placed in the order c1 c2 c3 c4 c5 along markers A B C D E F G H I J]


Genomics: 11

[Figure: map labelled PH011 with a numeric scale (80 to 200), showing numbered clones 70512, 70449, 70893, 70515, 70124, 70266, 7202, 70265, 70871 and 70463]


Genomics: 12

Whole genome/chromosome shotgun strategy (WGS)

• Rapid

• Generation of small insert genomic library

• Library is not initially ordered

• Sequence the ends of the inserts

• Depends on powerful computing to assemble sequence reads


Genomics: 13

Main steps in generating a complete genome sequence

Step: minimum time period (weeks)

• Isolation: 2

• Construction: 4-6

• Shotgun sequencing: 2-4

• Finishing: 12

• Annotation: 12


Genomics: 14

[Figure: shotgun library construction. Labels: bacterial chromosome; random shearing; size selection; plasmid vector; library of clones; individual clones; sequence end of each clone]


Genomics: 15

Assembly

Sequencing individual clones

genome sequence with gaps


Genomics: 16

Automated sequencers: ABI 3700

• Made by Applied Biosystems

• Most widely used automated sequencers:

– 96 capillaries

– robot loading from 384-well plates

• Two to three hours per run

• 600–700 bases per read (one read per capillary per run)

[Figure labels: 96-well plate; robotic arm and syringe; 96 glass capillaries; load bar]
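The figures on this slide can be turned into a rough daily throughput estimate. The sketch below reads 600–700 bases as the read length from each capillary and assumes back-to-back runs with no downtime; both are simplifying assumptions, and the function name is illustrative only.

```python
# Figures from the slide: 96 capillaries, 600-700 bases per read, 2-3 hours per run.
CAPILLARIES = 96

def bases_per_day(read_len, hours_per_run, capillaries=CAPILLARIES):
    """Raw bases per 24 h, assuming (hypothetically) continuous runs with no downtime."""
    runs_per_day = 24 / hours_per_run
    return capillaries * read_len * runs_per_day

low = bases_per_day(read_len=600, hours_per_run=3)   # slowest run, shortest reads
high = bases_per_day(read_len=700, hours_per_run=2)  # fastest run, longest reads
print(f"{low:,.0f} to {high:,.0f} raw bases per machine per day")
```

On these assumptions a single machine yields somewhat less than one megabase of raw sequence per day, which is why production centres run many machines in parallel.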


Genomics: 17

Automated sequencers: MegaBACE

• Made by Amersham

• 96 capillaries

• Robotic loading from 384-well plate

• Two to four hours per run

• Can read up to 800 bases

Source : GE Healthcare Life Science, Uppsala, Sweden


Genomics: 18

Automatic gel reading

• Top image: confocal detection by the MegaBACE sequencer of fluorescently labeled DNA

• Bottom image: computer image of sequence read by automated sequencer


Genomics: 19

Industrialization of sequencing

• Most genome sequencing projects divide tasks among different teams

– Genome libraries

– Production sequencing

– Finishing

• Sequencing machines run 24/7

• Many tasks performed by robots

The Broad Institute of MIT and Harvard, www.genome.gov


Genomics: 20

The future is here? 454 sequencing

Reprinted by permission from Macmillan Publishers Ltd: Nature (Margulies et al., 437: 376), copyright (2005)


Genomics: 21

454 sequencing: the system

• DNA library preparation: 4.5 hours

• emPCR: 8 hours

• Sequencing: 7.5 hours

• Well diameter: average 44 μm

• 400,000 reads obtained in parallel

• A single cloned amplified sstDNA bead is deposited per well

• 4 bases (TACG) cycled 100 times

• Chemiluminescent signal generation

• Signal processing to determine base sequence and quality score

Source: 454 Sequencing © Roche Diagnostics
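The cycling of the four bases can be pictured with a small idealised model. In this sketch each flow of T, A, C or G yields a signal equal to the number of bases incorporated in that flow, and base calling simply reverses the process. Real signals are noisy light intensities rather than integers; the function names and the toy read are assumptions made for illustration.

```python
FLOW_ORDER = "TACG"  # nucleotide flow order given on the slide

def flowgram(read, cycles=100):
    """Idealised signal per flow: the number of bases incorporated (homopolymer length)."""
    signals, pos = [], 0
    for _ in range(cycles):
        for base in FLOW_ORDER:
            run = 0
            while pos < len(read) and read[pos] == base:
                run += 1
                pos += 1
            signals.append(run)
        if pos == len(read):
            break  # whole read synthesised, stop early
    return signals

def base_call(signals):
    """Rebuild the read from the per-flow signals."""
    return "".join(FLOW_ORDER[i % 4] * n for i, n in enumerate(signals))

read = "TTACGGA"  # toy read, not real data
signals = flowgram(read)
print(signals)
print(base_call(signals))
```

Because a homopolymer run is read in a single flow, its length has to be inferred from signal strength alone, which is where this chemistry is most prone to error.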


Genomics: 22

WGS: Just how much effort?

• individual sequencing reads accumulate

– each read about 500 bp

– computing used to assemble reads

– contiguous sequences called contigs

• Aim for 8- to 10-fold read coverage of genome for accuracy

• example:

– H.influenzae

• 19,687 templates

• 24,304 reads assembled

• 11,631,485 bp

• 9
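As a worked check on the H. influenzae figures, the sketch below derives the mean read length and the fold coverage. The genome size of 1.830 Mbp is the value quoted later in this lecture; the variable names are illustrative only.

```python
reads_assembled = 24_304
total_bp = 11_631_485
genome_bp = 1_830_000  # H. influenzae genome size quoted later in the lecture (1.830 Mbp)

mean_read_len = total_bp / reads_assembled  # average bases per assembled read
coverage = total_bp / genome_bp             # average number of reads covering each base

print(f"mean read length: {mean_read_len:.0f} bp")
print(f"coverage: {coverage:.1f}-fold")
```

The mean read length comes out close to the "about 500 bp" stated above, and the coverage computed this way is a little over 6-fold.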


Genomics: 23

Sequencing a genome

[Figure: a text analogy for shotgun assembly]

• fragments of sequence: cisahubofevaluatedgen; luatedgeneticsrel; atedgene; ourcesforteach; esforteachershealt; chershealthprofession; hprofessionalsandgeneralpub

• overlaps: vgecisahubofbofevaluatedgenetics; icsrelatedresourcesforteachershealthlthprofessionalsandgeneralp; generalpublic

• contiguous sequence: vgecisahubofevaluatedgeneticsrelatedresourcesforteachershealthprofessionalsandgeneralpublic
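The text analogy on this slide can be run as a toy assembly. The sketch below uses a simple greedy merge (repeatedly join the pair of fragments with the longest suffix-to-prefix overlap); this is one textbook approach chosen for illustration, not necessarily what production assemblers do, and the minimum overlap of 3 characters is an arbitrary assumption.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    unique = set(fragments)
    # Drop fragments wholly contained in another fragment.
    frags = [f for f in unique if not any(f != g and f in g for g in unique)]
    while True:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return sorted(frags)  # no overlaps left: these are the contigs
        merged = frags[i] + frags[j][k:]
        frags = [f for n, f in enumerate(frags) if n not in (i, j)] + [merged]

# Fragments as they appear on the slide.
fragments = [
    "cisahubofevaluatedgen", "luatedgeneticsrel", "atedgene", "ourcesforteach",
    "esforteachershealt", "chershealthprofession", "hprofessionalsandgeneralpub",
]
contigs = greedy_assemble(fragments)
for c in contigs:
    print(c)
```

With only these fragments the result is two contigs rather than the full sentence: nothing spans the join between "...rel" and "ources...", which is exactly the kind of gap that finishing has to close.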


Genomics: 24

Gaps

[Figure labels: genome; library clone; sequence read; contig; sequence gap; physical gap]


Genomics: 25

Bridging Gaps

• contig number rises at first as the number of reads increases

• then falls steadily as accumulating sequence bridges gaps between contigs

• levels off as new reads are more likely to land in a known contig than in a gap

• start finishing

[Figure: number of contigs (y-axis, falling towards 1) against number of reads (x-axis); phases labelled rapid gap bridging, difficult gap bridging and finishing]
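The rise and fall of contig number can be sketched with the simplest form of the Lander-Waterman model, in which N random reads at coverage c give roughly N·e^(-c) contigs. This model is not named on the slide and is brought in here as an assumption; it ignores minimum overlap length, repeats and cloning bias. The read length of 500 bp and genome size of 1.83 Mbp are borrowed from the H. influenzae figures in this lecture.

```python
import math

def expected_contigs(n_reads, read_len=500, genome_len=1_830_000):
    """Simplified Lander-Waterman estimate: N * exp(-coverage)."""
    coverage = n_reads * read_len / genome_len
    return n_reads * math.exp(-coverage)

for n in (1_000, 3_660, 10_000, 20_000, 40_000):
    cov = n * 500 / 1_830_000
    print(f"{n:>6} reads  coverage {cov:4.1f}x  ~{expected_contigs(n):7.1f} contigs")
```

The estimate peaks near 1-fold coverage and then decays. In this idealised model it keeps falling, whereas real projects level off with a handful of stubborn gaps, which is the point at which directed finishing takes over from random sequencing.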


Genomics: 26

Finishing

• Why are gaps present?

• Gap bridging

– sequence gaps

• choose appropriate clone and walk

– physical gaps

• alternative libraries (which?)

• PCR across gap

• Mistakes/poor sequence

– areas where read coverage is below 8- to 10-fold

– repeated sequences, e.g. rRNA

• closure and completion


Genomics: 27

Finished Yet?atgaatccaagccaaatacttgaaaatttaaaaaaagaattaagtgaaaacgaatacgaaaactatttatcaaatttaaaattcaacgaaaaacaaagcaaagcagatcttttagtttttaatgctccaaatgaactcatggctaaattcatacaaacaaaatacggcaaaaaaatcgcgcatttttatgaagtgcaaagcggaaataaagccatcataaatatacaagcacaaagtgctaaacaaagcaacaaaagcacaaaaatcgacatagctcatataaaagcacaaagcacgattttaaatccttcttttacttttgaaagttttgttgtaggggattctaacaaatacgcttatggagcatgtaaagccatagcacataaagacaaacttggaaaactttataatccaatctttgtttatggacctacaggacttggaaaaacacatttacttcaagcagttggaaatgcaagcttagaaatgggaaaaaaagttatttacgctaccagtgaaaatttcatcaacgattttacttcaaatttaaaaaatggttctttagataaatttcatgaaaagtatagaaactgcgatgttttacttatagatgatgtacagtttttaggaaaaaccgataaaattcaagaagaatttttctttatatttaatgaaatcaaaaataacgatggacaaatcatcatgacttcagacaatccacccaacatgctaaaaggtataaccgaacgcttaaaaagtcgttttgcacatgggatcatagctgatataactccacctcaactagatacaaaaatagccatcataagaaaaaaatgtgaatttaacgatatcaatctttctaatgatattataaactatatcgctacttctttaggggataatataagagaaatcgaaggtatcatcataagtttaaatgcttatgcaaccatactaggacaagaaatcacactcgaacttgccaaaagtgtgatgaaagatcatatcaaagaaaagaaagaaaatatcactatagatgacattttatctttggtatgtaaagaatttaacatcaaaccaagcgatgtgaaatccaataaaaaaactcaaaatatagtcacagcaagacgcattgtgatttacctagctagggcacttacggctttgactatgccacaacttgcgaattattttgaaatgaaagatcatacagctatttcacataatgttaaaaaaatcacagaaatgatagaaaatgatgcttctttaaaagcaaaaatcgaagaacttaaaaacaaaattcttgttaaaagtcaaagttaagtgaaaggatgtgaaaaataaattctagagtgtgaaaaaaagaaattaagcaaagtatgataaaatacaaatttgattattttgctttgaaaaatttcacaatttcaacaagcttattattacaacgaatttaaaattaaaataaaccaaggagaaaaaatgaagttaagtatcaataaaaatactttagaatctgcagtgattttatgtaatgcttatgtagaaaaaaaagactcaagcaccattacttctcatcttttttttcatgctgatgaagataaacttcttattaaagctagtgattatgaaataggtatcaactataaaataaaaaaaatccgcgtagaatcaagtggttttgctactgcaaatgcaaaaagtattgcagatgttattaaaagcttaaacaatgaagaagttgttttagaaaccattgataattttttatttgtaagacaaaaaagtacaaaatacaaacttcctatgtttaatcatgaagattttccaaattttccaaatacagaaggaaaaaaccaatttgacattgattcaagtgatttaagccgttctcttaaaaagatattaccaagtattgatacaaataacccaaaatactccttaaatggtgcatttttagatataaaaacagataaaattaacttcg
taggaactgatacaaaacgccttgcaatctatactttagaaaaagcaaataatcaagaatttagttttagtatccctaaaaaagctattatggaaatgcaaaaacttttctatgaaaaaatagaaattttttatgatcaaaatatgcttattgccaaaaatgaaaattttgaattctttacaaaacttatcaatgataaatttccagattatgaaaaagttataccaaaaactttcaaacaagaactcagtttttcaactgaagattttatagatagtcttaaaaaaatcagcgttgtaactgaaaaaatgagacttcattttaacaaagataaaatcatctttgaaggtataagtttagacaatatggaagcaaaaacagaacttgaaattcaaacaggagtaagtgaagaatttaatcttactataaaaatcaaacatttacttgatttcttaacttctatagaagaagaaaaattcactttaagtgtaaatgaacctaattcagcatttatagtcaaatcccaaggactatcaatgattatcatgcctatgattttgtaataaaacaagtaaaagataaaggaaaaatatgcaagaaaattacggtgcgagtaatattaaagtcctaaaaggcttagaagctgttagaaaacgcccaggtatgtatataggagatacaaacataggcggacttcatcatatgatttatgaagttgtggataattctatcgatgaagctatggcaggacattgcgatactatagatgtagaaatcactactgaaggaagctgtatagttagtgataatggtcgtggtattcctgttgatatgcacccaactgaaaatatgccaactttaactgttgttttaactgtcctacatgcagggggaaaattcgataaagatacttataaagtttcaggcggtttgcacggtgttggggtttcggttgtaaatgcactctctaaaaaacttgtagctacagttgaaagaaatggagaaatttatcgtcaagaattttcagaaggtaaagttatcagtgaatttggtgtgataggaaaaagtaaaaaaacaggaacaactatagaattttggcctgatgatcaaatttttgaagtgactgaatttgattatgaaattttggctaaaagatttcgtgaacttgcatacttaaatccaaaaatcactataaattttaaagataaccgcgtaggcaaacatgaaagttttcactttgaaggtggaatttctcagtttgttacagacttaaataaaaaagaagctttaactaaagcaattttctttagtgtagatgaagaagatgtgaatgttgaagtagctttgctttacaatgatacttatagtgaaaatttactctcttttgtaaataatattaaaaccccagatggtggaacacacgaagctggttttagaatgggtttaactcgtgtgataagtaactatatagaagcaaatgcaagtgctagagaaaaggataataaaatcacgggtgatgatgtgcgtgaaggtttgatcgctattgtgagtgtaaaggtacctgaaccacaatttgaaggacaaaccaaaggaaaacttggttcaacttatgtgcgtcctatagtttcaaaagcaagttttgagtatttgactaaatattttgaagaaaatcctatcgaagctaaagctataatgaataaagctttaatggcagctagaggaagagaagcagcgaaaaaagctagagaattaacgcgcaaaaaagaaagtttaagcgtaggaactttaccagggaaattagctgattgtcaaagtaaagatccaagtgaaagtgaaatttatcttgtggaaggggattctgcaggaggttctgcaaaacaaggtagagaaagatctttccaagctatactgcctttgcgtggtaaaattttaaatgttgaaaaagcaagactagataaaattttaaaatctgagcaaattcaaaatatgattaccgcttt
tggctgtggtataggtgaagattttgatctttcaaaacttagatatcataaaatcatcatcatgacagatgcggatgttgatggatctcatatacaaaccttgcttttaactttcttcttccgttttatgaatgaacttgtggcaaatggacatatttatctagcacaaccacctttatatctttataaaaaagctaaaaagcaaatttatttaaaagatgaaaaagctttgagcgaatacctgatagaaacgggaatagaaggtttaaactatgaaggtataggaatgaatgatttaaaagattatttaaaaatcgttgcagcttatcgtgcgattttaaaagatcttgaaaagcgttttaatgtgatttctgtgatacgctatatgatagaaaattcaaatttagttaaaggaaataatgaagaattatttagtgtaatcaaacaatttttagaaacacaaggacacaatatcttaaatcattatatcaacgaaaatgaaattcgagctttcgttcaaactcaaaatggcttagaagaacttgtgatcaatgaagaacttttcactcatccactatatgaagaagcgagttatatttttgataagattaaagatagaagcttggaatttgataaagatattttagaagttcttgaagatgttgaaaccaatgctaaaaaaggtgctactatacaacgctataaaggtttaggggaaatgaatcctgagcaactttgggaaaccacaatggatccaagcgtaagaagacttttaaaaatcactattgaagatgcacaaagtgcaaatgatacctttaatctctttatgggtgatgaggttgaaccaagacgcgattatatccaagcgcacgctaaagatgtaaagcatttggatgtgtaaaaatttatcattgaagaaatcatttcttcaatgagttttgttttgtaagagtatagctagaggaattcttcttcttgtatcgtatttttctccataatatttttcaagataatttaaaattttttcttcatcttcaggttctatttcccaaagtccttcactatcttgcatccatcttatagctgctaaccaagcttttctacttgcatgcatattggtaatgagattggatccatgacaagctaaacaatttgcttccactaaaggtgaatcaggatcgataatcaatcctgtatcagggttaatttcaagattttgagcccaacttgcacttaaaaacaatgctaagatcaatataatttttttcatacttaaactccataaacattaactctatggcatgcattattgatatatcctcctggattccactgtgctaaaaccataggttgactgttaccttgactatcgatagctcttgcccaaatttcataatatccttttgttggtattgatatttgagcactccatttttgccatgctaatctatttaatggtttttctacctttgc ………………….


Genomics: 28

Sequencing a genome

[Figure: the text analogy from slide 23, with annotation as the final step]

• fragments of sequence and overlaps: as on slide 23

• contiguous sequence: vgecisahubofevaluatedgeneticsrelatedresourcesforteachershealthprofessionalsandgeneralpublic

• annotation: VGEC is a hub of evaluated genetics related resources for teachers, health professionals and general public.


Genomics: 29

Genome Annotation

• Find ORFs

– look for ATG to stop codon (+ alternative start codons)

– over certain size

– overlaps

– computer based (“Glimmer” & “Orpheus”) and trained eye

• ORF function

– Search databases with predicted translated sequences – BLASTX

– Consider level of similarity and context

– Domain comparisons

• Pfam/Prosite

• Other features
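The first step, finding ORFs by scanning from a start codon to the next in-frame stop and keeping those over a certain size, can be sketched as below. This is a minimal illustration and not how Glimmer or Orpheus work; the function names, the size threshold and the toy sequence are assumptions. Only ATG is used as a start here, although alternatives such as GTG and TTG could be added to `STARTS`.

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG"}  # alternatives such as GTG and TTG could be added

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end, sequence) for each start-to-stop ORF."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs

toy = "CCATGAAATTTTAGGG"  # toy sequence, not real data
orfs = find_orfs(toy)
for orf in orfs:
    print(orf)
```

Coordinates on the "-" strand refer to the reverse-complemented sequence. A real annotation pass would also weigh codon usage, overlaps between candidate ORFs and database similarity, as the bullets above indicate.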


Genomics: 30

www.yeastgenome.org


Genomics: 31

http://mips.gsf.de/genre/proj/yeast/index.jsp

http://www.yeastgenome.org/MAP/GENOMICVIEW/GenomicView.shtml


Genomics: 32

Artemis: sequence viewer and annotation tool from the Sanger Centre (http://www.sanger.ac.uk/Software/Artemis/)


Genomics: 33


Genomics: 34


Genomics: 35

http://xbase.bham.ac.uk/

xBASE is a database for comparative genome analysis of all bacterial genome sequences

Chaudhuri RR, Pallen MJ. xBASE, a collection of online databases for bacterial comparative genomics. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D335-7.


Genomics: 36

[Figure: a conceptual diagram of the flux and information in a network-based genome-sequencing project. Labels: Coordinator; DNA; shotgun templates; shotgun sequences; finishing instructions; finishing sequences; annotation tasks; annotations; Bioinformatics Lab; many nodes marked S; working draft sequence; finished sequence; finished annotated sequence]


Genomics: 37

Post Genome Sequence

• Comparative genomics

– comparing genome organisation and content

– genome size

– genome repeats/Tn/phages

– gene content

– minimal gene content

• Functional genomics –ascribing gene function across a genome

– gene function –knowns

– phenotype prediction

– gene function –unknowns

– investigating function

• Bacteria-Yeast


Genomics: 38

Bacteria: Does size matter?

• Link genome size to adaptive capability

– biosynthetic capability

• synthesis of nutrients

– Stress resistance

• resist environmental insults

– structural complexity

• surface structures, sporogenesis

– Regulation –sensing signals and transcriptional responses

• detect change or requirement and respond appropriately

• transcriptional regulation


Genomics: 39

Not just Size but how you use it…..

• Small genomes

– Mycoplasma genitalium

• 580,070 bp

• smallest genome for self-replicating organism

• free living, but only just: infects host cells (guess which!)

• few biosynthesis and regulatory systems

• retains replication, transcription, translation, metabolism etc. functions

– Borrelia burgdorferi

• 910,725 bp

• Lyme disease

• few cellular biosynthetic systems

– Mycoplasma pneumoniae (0.8 Mbp); Chlamydia trachomatis (1.0 Mbp)


Genomics: 40

bigger genomes

• Haemophilus influenzae

– 1.830 Mbp

– colonises human respiratory tract

– limited environment

• Helicobacter pylori

– 1.667 Mbp

– colonises human stomach

– limited environment

• Campylobacter jejuni

– 1.641 Mbp

– colonises intestine

– limited environment


Genomics: 41

and bigger….

• Escherichia coli (K-12)

– 4.639 Mbp

• Bacillus subtilis

– 4.214 Mbp

– soil/plant organism

– secondary metabolites

• Pseudomonas aeruginosa

– incomplete (5.9 Mbp)

• Yersinia pestis (4.4 Mbp)

• Clostridium spp (4-5 Mbp)

• Mycobacterium tuberculosis

– 4.411 Mbp

– slow growing (doubles in 24 h)

– large proportion of genome devoted to lipid metabolism

• Streptomyces coelicolor (~8 Mbp)

– secondary metabolites –antibiotics!


Genomics: 42

Organisation

• Linear chromosomes

– Borrelia burgdorferi

– Streptomyces coelicolor

• Multiple chromosomes

– Vibrio cholerae

• Plasmids

– Borrelia burgdorferi

– 17 linear & circular plasmids

– 50% genome size

– plasmid replication, “decaying genes”, antigenic variation?

• Transposons, IS elements, phages

– found in most genomes

– Campylobacter has none

• Repeats


Genomics: 43

Replication

• Origin (oriC) and terminus (terC) of replication

– oriC often near dnaA gene (replication initiation protein)

– In Borrelia burgdorferi (linear) oriC (& dnaA) in centre

• Strand bias

– which strand is each gene on?

– transcription in same direction as replication is more efficient

– variation in level of strand bias

• M. tuberculosis 55% vs B. subtilis 75%
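Strand bias can be quantified as the fraction of genes transcribed in the same direction as the replication fork that passes through them. The sketch below assumes a circular chromosome replicated bidirectionally from the origin to the terminus, with each gene given as a (position, strand) pair; the coordinates and gene list are invented for illustration.

```python
def codirectional_fraction(genes, ori, ter, genome_len):
    """Fraction of genes whose strand matches the direction of fork movement."""
    ter_offset = (ter - ori) % genome_len
    same = 0
    for pos, strand in genes:
        offset = (pos - ori) % genome_len  # distance clockwise from the origin
        # Clockwise replichore: fork moves along '+'; the other replichore: along '-'.
        fork_direction = "+" if offset < ter_offset else "-"
        same += strand == fork_direction
    return same / len(genes)

# Toy chromosome of 1000 bp, origin at 0, terminus at 500 (hypothetical values).
toy_genes = [(100, "+"), (200, "+"), (300, "-"), (400, "+"),
             (600, "-"), (700, "-"), (800, "+"), (900, "-")]
bias = codirectional_fraction(toy_genes, ori=0, ter=500, genome_len=1000)
print(f"{bias:.0%} of genes are co-directional with replication")
```

In this toy set six of the eight genes are co-directional. A value near 50% means no preference; the percentages quoted on the slide sit between that and a strong bias.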


Genomics: 44

Gene Content

• Annotation

– sequence similarity

• gene families

• regulators, transport, biosynthesis

– domain matches

• trans-membrane domains, DNA binding

• Paralogues and Orthologues

– Paralogues:

• Members of same family (homologous) in same genome.

• Likely to have different exact function

– Orthologues:

• homologues (same family) in different genomes

• May have identical function


Genomics: 45

Vibrio cholerae as predicted by genome

Reprinted by permission from Macmillan Publishers Ltd: Nature (Heidelberg et al., 406, 477-483), copyright (2000)


Genomics: 46

Gene content (cont.)

• ORFans

– significant proportion of genome contains ORFs of unknown function

– some may be orthologues of unknowns in other organisms

– some unique to organism

• important for biology of organism

– examples:

• H. influenzae: 42%

• H. pylori: 33%

• E. coli: 38%

• M. tuberculosis: 60% to 16%

– number decreasing

• Gene size –most about 1kb
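The "about 1 kb per gene" figure gives a quick back-of-envelope estimate of gene number from genome size. The sketch below applies it to three genome sizes quoted in this lecture. It assumes genes are packed end to end with no intergenic space, so it should be read as a rough ceiling rather than a gene count.

```python
# Genome sizes in bp, taken from earlier slides in this lecture.
genome_sizes_bp = {
    "Mycoplasma genitalium": 580_070,
    "Haemophilus influenzae": 1_830_000,
    "Escherichia coli K-12": 4_639_000,
}

def estimate_gene_count(genome_bp, gene_bp=1_000):
    """Rough estimate: genome length divided by a typical gene length."""
    return round(genome_bp / gene_bp)

estimates = {name: estimate_gene_count(bp) for name, bp in genome_sizes_bp.items()}
for name, n in estimates.items():
    print(f"{name}: roughly {n} genes")
```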


Genomics: 47

Genomic rearrangements

• Example comparison

• Comparison of: S. enterica Typhi CT18 with S. enterica Typhi Ty2

• inversion that spans terminus

http://www.sanger.ac.uk/resources/software/act/


Genomics: 48

Variation by gain and/or loss

• Core regions

– shared by closely related species

• Additional “flexible” gene pool

– variable regions

– acquired from mobile genetic elements

• First described as pathogenicity islands

– in non-pathogens too

– wider role

• Genomic islands

– pathogens

– commensals

– symbionts

– environmental

• Gain of GI sometimes associated with gene loss

– reduction in obligate intracellular pathogens

• Genome organisation as well as genome content correlates with microbial lifestyle

[Figure: from a common bacterial ancestor, genome reduction by deletion events leads to intracellular bacteria, obligate intracellular pathogens and endosymbionts; gene acquisition by HGT (GEI, plasmid) leads to extracellular bacteria, facultative pathogens and symbionts; mutations and rearrangements occur in all lifestyles]


Genomics: 49

Other tRNA-associated elements:

tRNAPProL

Black arrows=Sal+Ec; white arrows=Sal or Ec; grey=strain/serovar specific

GC is for S. Typhi

Infection and Immunity, May 2002, p. 2351-2360, Vol. 70, No. 5


Genomics: 50

Other tRNA-associated elements:

tRNAArgU

Infection and Immunity, May 2002, p. 2351-2360, Vol. 70, No. 5


Genomics: 51

The supragenome

• The distributed-genome hypothesis (DGH)

• Bacteria have a (supra)genome much larger than the genome of any single bacterium.

• Core and non-core gene sets

– Example: Hiller et al. sequenced 8 strains of Streptococcus pneumoniae + 9 already available

– Core set of genes in all strains

– 20-30% genes non-core (not present in all strains)

• Genetic recombination generates diversity across strains.

• Also for Haemophilus influenzae (Hogg et al.)

– ~1400 in core set and ~1300 non-core in subset of strains

• Hiller et al. Journal of Bacteriology, November 2007, p. 8186-8195, Vol. 189, No. 22

• Hogg et al. Genome Biology 2007, 8:R103 (doi:10.1186/gb-2007-8-6-r103).
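The core and non-core split described on this slide is, computationally, a set intersection and a set difference over the gene content of each strain. The sketch below shows the idea on invented strain and gene names; real analyses first have to cluster genes into orthologue groups, which is not shown.

```python
def core_and_noncore(strain_genes):
    """Split the supragenome into genes found in every strain and the rest."""
    gene_sets = [set(genes) for genes in strain_genes.values()]
    supragenome = set().union(*gene_sets)   # every gene seen in any strain
    core = set.intersection(*gene_sets)     # genes present in all strains
    return core, supragenome - core

# Hypothetical strains and gene names, for illustration only.
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "capsule1"},
    "strain_B": {"dnaA", "gyrB", "recA", "phageX"},
    "strain_C": {"dnaA", "gyrB", "recA", "capsule1", "islandY"},
}
core, noncore = core_and_noncore(strains)
print("core:", sorted(core))
print("non-core:", sorted(noncore))
print(f"non-core share of supragenome: {len(noncore) / (len(core) + len(noncore)):.0%}")
```

Adding more strains can only shrink the core and grow the supragenome, which is why estimates of both depend on how many strains have been sequenced.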


Genomics: 52

Yeast

• 16 chromosomes totalling 12.068 Mbp

• 5885 ORFs (6275 predicted, but 390 unlikely to be translated)

• Few introns (~4%)

• Average gene size 2 kb (worm ~6 kb and human >30 kb)

• GC content varies along chromosome length

– low GC at telomeres & centromeres

– GC-rich regions correlate with higher recombination

• Tn and remnants in genome

– evidence of hotspots

• 50% of ORFs have known function

– exact role of some unclear

• http://genome-www.stanford.edu/Saccharomyces/

• http://mips.gsf.de/projects/fungi
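Variation of GC content along a chromosome is usually examined with a sliding window. The sketch below is one simple way to do that in Python; the function names, the window size and the toy sequence are all assumptions for illustration, not the method used by the yeast genome project.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_windows(seq, window, step):
    """Return (start, GC fraction) for each full window along the sequence."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical toy "chromosome": AT-rich ends with a GC-rich middle.
toy_chromosome = "ATATATATGCGCGCGCATATATAT"
profile = gc_windows(toy_chromosome, window=8, step=8)
for start, gc in profile:
    print(start, gc)
```

On real chromosomes the window would be thousands of bases wide, and the resulting profile is what reveals features such as low GC near telomeres and centromeres.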


Genomics: 53

Functional genomics

• Functional genomics: ascribing gene function across a genome

• function and inter-relationships

• strategy

– [bioinformatic analysis: gene identification]

– Transcriptome: expression pattern

– Proteome: expression pattern

– Mutantome: mutant phenotype

– Interactome: protein-protein interactions

[Figure: GENOME → TRANSCRIPTOME (RNA copies of the active protein-coding genes) → PROTEOME (the cell's repertoire)]


Genomics: 54

Arrays: micro and chip

• Microarrays

– Glass slides with <10,000 individual samples applied in known positions

– Use of robotics

– Samples can be PCR products or oligos

– example: oligo/PCR product complementary to each ORF

• Chip arrays

– silicon based

– >10,000 sequences

– http://www.affymetrix.com/index.html

• Redundancy

• fluorescent labels


Genomics: 55

Chip arrays

One cell = one specific sequence

[Figure: chip array read by laser, showing individual sequences and bound sample]


Genomics: 56

Transcriptome

• Genome-wide determination of the expression level of each ORF

• when a gene is expressed relates to its role

• also assess mutants

• compare expression of each ORF in different conditions

• Genome-wide expression maps

• global patterns of expression
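Comparing the expression of each ORF between two conditions is commonly summarised as a log2 ratio per ORF. The following Python sketch shows that calculation; the intensities, the pseudocount and the 2-fold threshold are hypothetical choices for illustration only.

```python
import math


def log2_ratios(test, reference, pseudocount=1.0):
    """log2(test / reference) for every ORF measured in both conditions.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return {
        orf: math.log2((test[orf] + pseudocount) / (reference[orf] + pseudocount))
        for orf in test
        if orf in reference
    }


def differentially_expressed(ratios, threshold=1.0):
    """Keep ORFs whose expression changes at least 2**threshold fold, up or down."""
    return {orf: r for orf, r in ratios.items() if abs(r) >= threshold}


# Hypothetical array signal intensities for three ORFs.
reference = {"orf1": 99.0, "orf2": 199.0, "orf3": 399.0}
test = {"orf1": 399.0, "orf2": 199.0, "orf3": 99.0}

ratios = log2_ratios(test, reference)
changed = differentially_expressed(ratios)
print(ratios)
print(sorted(changed))
```

A positive ratio means higher expression in the test condition, a negative ratio lower expression, and a value near zero no change.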


Genomics: 57

When expressed?

[Figure: Bacillus genieae with 2 x ORF, orf 1 (AGGCAT) and orf 2 (AATGAA), both represented on the array. Grow in conditions when only orf 2 is expressed, isolate mRNAs (AATGAA) and make a cDNA copy (TTACTT).]
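The base pairing in this example (mRNA AATGAA, cDNA copy TTACTT) can be sketched in Python. The assignment of AGGCAT to orf 1 and AATGAA to orf 2 is read off the slide layout and is an assumption, as are the function names. The slide draws the two strands antiparallel, so a plain base-by-base complement is used here; with both strands written 5' to 3' one would take the reverse complement instead.

```python
# Translation table for base-by-base complementing of DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, read alongside the original strand."""
    return seq.translate(COMPLEMENT)


def hybridising_spots(cdna, spots):
    """Names of array spots whose sequence is complementary to the labelled cDNA."""
    return [name for name, seq in spots.items() if complement(seq) == cdna]


# Array spots, one per ORF (assignment assumed from the slide layout).
spots = {"orf 1": "AGGCAT", "orf 2": "AATGAA"}

# Only orf 2 is expressed, so the cDNA pool contains the copy of its mRNA.
cdna = complement("AATGAA")
print(cdna)
print(hybridising_spots(cdna, spots))
```

Only the orf 2 spot binds the labelled cDNA, so only that spot would give a signal under these growth conditions.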


Genomics: 58

[Figure: grow under different conditions → extract mRNA → probe array with labelled copy of mRNA]


Genomics: 59

Differentially labelled probes

[Figure: array images for the red channel, the green channel, and the two combined]


Genomics: 60

http://www.bio.davidson.edu/courses/genomics/chip/chip.html


Genomics: 61

Expression profiling C. jejuni in low iron

[Figure: expression profile with labelled genes Cj1659 (P19), Cj0177 and Cj0037c]


Genomics: 62

Proteome

• Genome-wide determination of protein expression

• Gives information on stimulons

• protein expression linked to function

• assess mutants (regulatory mutants affect several proteins)

• Grow bacteria under defined conditions

• Extract proteins

• 2D-gel electrophoresis

• Protein spot identification

• Mass Spectrometry

• peptide size predictions from Genome data
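Spot identification by mass spectrometry relies on predicting peptide masses from the genome sequence and comparing them with the measured masses. The Python sketch below illustrates the idea under stated assumptions: trypsin as the protease (cleaving after K or R, but not before P), standard monoisotopic residue masses, and a hypothetical two-protein "proteome". The function names and the 0.5 Da tolerance are illustrative, not a description of any particular software.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (simple trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_spot(observed_masses, proteome, tolerance=0.5):
    """Score each predicted protein by the number of observed masses it explains."""
    scores = {}
    for name, seq in proteome.items():
        predicted = [peptide_mass(p) for p in tryptic_digest(seq)]
        scores[name] = sum(
            any(abs(m - p) <= tolerance for p in predicted) for m in observed_masses
        )
    return max(scores, key=scores.get), scores


# Hypothetical predicted proteins and hypothetical measured peptide masses.
proteome = {"protA": "MKAGRPLK", "protB": "GKAAR"}
observed = [277.15, 640.40]
best, scores = match_spot(observed, proteome)
print(tryptic_digest("MKAGRPLK"))
print(best, scores)
```

The protein whose predicted peptides explain the most measured masses is the likeliest identity of the gel spot.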


Genomics: 63

Defining the Campylobacter proteome: chasing spots

Which protein? Which conditions?

Which other proteins are co-expressed?


Genomics: 64

C. jejuni iron example


Genomics: 65

[Figure: 2D gel (axes: pI and mol mass); a spot is digested with protease and analysed by Mass Spec]

http://depts.washington.edu/yeastrc/pages/ms.html


Genomics: 66

Mass Mutagenesis: mutantome

• Mutate every ORF in genome

– organism specific technology

• High throughput analysis of phenotype

– need to analyse many 1000s of mutants under many conditions

• Signature-tagged technology

– enables analysis of mutant pools

– requires array technology for genome-wide projects

• Association of ORF with mutant phenotypes

• Regulators might be pleiotropic


Genomics: 67

Arrays: micro and chip

• Microarrays

– Glass slides with <10,000 individual samples applied in known positions

– Use of robotics

– Samples can be PCR products or oligos

– example: oligos complementary to each unique Tag

– example: oligo/PCR product complementary to each ORF

• Chip arrays

– silicon based

– >10,000 sequences

– http://www.affymetrix.com/index.html

• Redundancy

• fluorescent labels


Genomics: 68

Chip arrays

One cell = one specific sequence

[Figure: chip array read by laser, showing individual sequences and bound sample]


Genomics: 69

Signature Tagged

• Tags are short unique DNA sequences

• Tag linked to mutation

• Each individual mutant has unique tag

• Each mutant ORF has unique Tag

[Figure: chromosomal mutants; example ORF X]


Genomics: 70

[Figure: chromosomal mutants (example ORF X) are combined into mutant pools; compare the pool under a test condition with "normal" to ask: functional role?]


Genomics: 71

Bar coding genes

[Figure: "normal", un-mutated Campylobacter alongside mutant 1, mutant 2, mutant 3, mutant 4 and so on to mutant 1654, each carrying a mutant-specific DNA sequence]


Genomics: 72

Which bar codes are missing?

• Which bar coded mutants are missing?

• Gene involved in process
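Finding the missing bar codes amounts to a set difference between the tags detected in the starting pool and those detected after treatment. A minimal Python sketch, with hypothetical tag numbers and ORF names:

```python
def missing_barcodes(input_pool, output_pool):
    """Tags detected before treatment but not recovered afterwards."""
    return sorted(set(input_pool) - set(output_pool))


# Hypothetical lookup from each unique tag to the ORF it disrupts.
tag_to_orf = {1: "orfA", 2: "orfB", 3: "orfC", 4: "orfD"}

input_pool = [1, 2, 3, 4]   # tags detected in the mutant pool
output_pool = [1, 3]        # tags detected in the post-treatment pool

lost = missing_barcodes(input_pool, output_pool)
candidates = [tag_to_orf[tag] for tag in lost]
print(lost)
print(candidates)
```

The ORFs behind the lost tags are candidates for genes involved in the process under test, since mutants lacking them did not survive the treatment.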

[Figure: bar code array. Copies of the barcodes present in the mutant pool and in the post-treatment mutant pool are applied to an array of bar codes 1 to 100; each position scores + or -.]

www.freedigitalphotos.net/


Genomics: 73

Reprinted by permission from Macmillan Publishers Ltd: [NATURE REVIEWS GENETICS] (Mazurkiewicz et al. 7 929-939), copyright (2006)


Interactome

Yeast 2 hybrid

Genomics: 74

http://en.wikipedia.org/wiki/Two-hybrid_screening

Which proteins can interact?

• Expression library of binding-domain::protein 1 (bait)

• Expression library of activation-domain::protein 2 (prey)

• Test combinations of all genome ORFs

• Which combinations turn on the reporter gene?
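The all-against-all logic of a two-hybrid screen can be sketched in Python. Everything here is hypothetical: the ORF names, the set of true interactions, and the `reporter_on` function, which merely stands in for the wet-lab reporter readout.

```python
from itertools import product


def screen_pairs(orfs, reporter_on):
    """Test every bait x prey combination; keep pairs that switch on the reporter."""
    return [
        (bait, prey)
        for bait, prey in product(orfs, repeat=2)
        if reporter_on(bait, prey)
    ]


def interaction_network(pairs):
    """Build an undirected network: each protein maps to its interaction partners."""
    network = {}
    for bait, prey in pairs:
        network.setdefault(bait, set()).add(prey)
        network.setdefault(prey, set()).add(bait)
    return network


# Hypothetical "true" interactions used to simulate the reporter readout.
known = {("orf1", "orf2"), ("orf2", "orf3")}


def reporter_on(bait, prey):
    """Stand-in for the experiment: reporter is on if the two proteins interact."""
    return (bait, prey) in known or (prey, bait) in known


orfs = ["orf1", "orf2", "orf3", "orf4"]
hits = screen_pairs(orfs, reporter_on)
network = interaction_network(hits)
print(hits)
print(network)
```

With n ORFs the screen tests n x n combinations, which is why genome-wide projects depend on expression libraries and automation.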


Protein-protein interaction networks

Genomics: 75

Parrish et al. 2007. A proteome-wide protein interaction map for Campylobacter jejuni. Genome Biol 8:R130.


Genomics: 76

Genomotyping or Genomic indexing

• Array of all known genes in the microbe

• Genes 1, 2, 3 & 14 form the minimal gene set

• Hybridise array with labelled chromosomal DNA

[Figure: arrays of genes 1 to 15 hybridised with labelled chromosomal DNA from Isolate 1, Isolate 2 and Isolate 3]
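Genomotyping can be sketched in Python as calling each gene present or absent from its hybridisation signal and then comparing isolates. The signal values, the 0.5 threshold and the extra genes per isolate are invented for illustration, because the per-isolate content of the original figure is not recoverable; only the shared set (genes 1, 2, 3 and 14) comes from the slide.

```python
def call_present(signals, threshold=0.5):
    """Genes whose hybridisation signal reaches the threshold are called present."""
    return {gene for gene, signal in signals.items() if signal >= threshold}


def genomotype(present, all_genes):
    """Presence/absence profile as a string of 1s and 0s, one character per gene."""
    return "".join("1" if gene in present else "0" for gene in all_genes)


ALL_GENES = range(1, 16)  # the array carries genes 1 to 15

# Hypothetical signals; genes not listed gave no signal.
signals = {
    "Isolate 1": {1: 0.9, 2: 0.8, 3: 0.9, 14: 0.7, 6: 0.8, 5: 0.1},
    "Isolate 2": {1: 0.8, 2: 0.9, 3: 0.7, 14: 0.9, 9: 0.8, 8: 0.6, 6: 0.2},
    "Isolate 3": {1: 0.9, 2: 0.9, 3: 0.8, 14: 0.8, 11: 0.7, 4: 0.9},
}

present = {name: call_present(s) for name, s in signals.items()}
profiles = {name: genomotype(p, ALL_GENES) for name, p in present.items()}
minimal_set = sorted(set.intersection(*present.values()))

for name, profile in profiles.items():
    print(name, profile)
print("minimal gene set:", minimal_set)
```

Genes detected in every isolate make up the minimal gene set, while the differing profiles distinguish one isolate from another.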