Whole genome sequencing of bacteria & analysis
ELAMURUGAN. A
Ph.D Scholar,
Vet. Immunology
INTRODUCTION
1977: the first complete genome to be sequenced was that of bacteriophage φX174 (5,386 bp)
1995: the first complete genome sequence of a free-living organism, Haemophilus influenzae (1.83 Mb), was obtained by the whole-genome shotgun approach
Sanger & Coulson (1977): chain-termination sequencing using dideoxynucleotide analogues
Maxam & Gilbert (1977): chemical degradation DNA sequencing, in which end-labelled DNA fragments are chemically cleaved at specific bases and separated by gel electrophoresis
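The logic of reading a chain-termination ladder can be sketched in a few lines of Python. This is a toy model, not a description of any instrument: it assumes the newly synthesised strand is known in advance, ignores the primer, and assumes that every position is terminated in some molecules, so each ddNTP lane holds one band per occurrence of that base. The function names are hypothetical.

```python
def chain_termination_lanes(synthesised: str) -> dict[str, list[int]]:
    """Toy model: for each ddNTP lane, list the lengths of fragments ending in that base."""
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesised, start=1):
        # Incorporating the dideoxy form of this base stops extension at this length.
        lanes[base].append(length)
    return lanes


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Read bands from the shortest fragment to the longest, as on a sequencing gel."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = chain_termination_lanes("ATGCGTTA")
print(lanes)               # {'A': [1, 8], 'C': [4], 'G': [3, 5], 'T': [2, 6, 7]}
print(read_ladder(lanes))  # ATGCGTTA
```

Reading the four lanes together from the bottom of the gel upward recovers the synthesised sequence, which is why a single base-specific termination reaction per lane is enough.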
[Figure: distribution of genome projects by sequencing status, Genomes OnLine Database (GOLD), http://www.genomesonline.org/cgi-bin/GOLD/sequencing_status_distribution.cgi]
ARCHON X PRIZE
The X PRIZE Foundation in Santa Monica, CA, has introduced the Archon X PRIZE for Genomics, which will award $10 million to the first team that designs a system capable of sequencing 100 human genomes in 10 days
SEQUENCING TECHNOLOGY
First generation: Sanger's dideoxy chain-termination technique; Maxam & Gilbert chemical degradation technique
Next-generation sequencing (NGS), second-generation HT-NGS (sequencing after amplification): 454/Roche, pyrosequencing; Illumina/Solexa, reversible dye terminators; SOLiD/ABI, sequential ligation of oligonucleotide probes
Third-generation HT-NGS (single-molecule sequencing): HeliScope; SMRT (Pacific Biosciences); single-molecule real-time (RNAP) sequencer; nanopore DNA sequencer; Ion Torrent sequencing technology (Post-Light); VisiGen Biotechnologies (FRET)
Advantages of third-generation over second-generation HT-NGS: higher throughput, faster turnaround time, longer read lengths, higher consensus accuracy, small amounts of starting material, low cost
ADVANTAGES OF HT-NGS
Massively parallel sequencing of hundreds of thousands or millions of templates
Preliminary and tedious cloning work is eliminated and replaced by PCR amplification
In the most recent technologies even PCR is eliminated, because single DNA molecules are sequenced directly
Lower cost and reduced time
DISADVANTAGES OF HT-NGS
Most NGS technologies produce short reads
Construction of fragment libraries remains tricky and involves several steps of fragmentation, adaptor ligation and PCR amplification
The 454 technology has difficulty reading homopolymer stretches accurately
Modified nucleotides can cause misincorporation, or block further incorporation if the fluorescent moiety cannot be completely removed
Assembly of short reads into longer sequences is difficult
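Why homopolymers are a problem for 454 pyrosequencing can be illustrated with an idealised flowgram. In pyrosequencing the nucleotides are flowed one at a time and the light signal in each flow is roughly proportional to the number of identical bases incorporated, so a run of n bases is read as a single signal of height about n. The sketch below assumes a cyclic flow order of T, A, C, G and noise-free signals when generating the flowgram; the function names are hypothetical.

```python
FLOW_ORDER = "TACG"  # assumed cyclic order of nucleotide flows


def flowgram(sequence: str, n_cycles: int) -> list[tuple[str, int]]:
    """Idealised signal per flow: the number of bases incorporated in that flow."""
    signals = []
    i = 0
    for _ in range(n_cycles):
        for nucleotide in FLOW_ORDER:
            run = 0
            # A whole homopolymer run is extended during a single flow.
            while i < len(sequence) and sequence[i] == nucleotide:
                run += 1
                i += 1
            signals.append((nucleotide, run))
    return signals


def call_bases(signals: list[tuple[str, float]]) -> str:
    """Base calling by rounding each (possibly noisy) signal to a run length."""
    return "".join(nucleotide * round(signal) for nucleotide, signal in signals)


print(flowgram("TTTAGGC", n_cycles=2))
# [('T', 3), ('A', 1), ('C', 0), ('G', 2), ('T', 0), ('A', 0), ('C', 1), ('G', 0)]
# A noisy 5-mer signal of 5.6 is rounded to 6: a one-base insertion error.
print(call_bases([("A", 5.6)]))  # AAAAAA
```

The relative difference between the signals for n and n + 1 bases shrinks as n grows, so long homopolymer runs are where such miscalls tend to concentrate.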
[Figure: Illumina/Solexa technology]
[Figure: zero-mode waveguides (ZMWs)]
[Figure: selection of a technology for an experiment]
GENOME ASSEMBLY
Assemblers join sequences together based on overlapping regions between them
An assembly is composed of contigs and scaffolds
Contigs: contiguous consensus sequences derived from collections of overlapping reads
Scaffolds: ordered and oriented sets of contigs that are linked to one another by mate pairs of sequencing reads
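N50: a basic statistic describing the contiguity of a genome assembly. It is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The longer the N50, the more contiguous the assembly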
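The N50 definition translates directly into code: sort the contigs from longest to shortest and stop at the contig whose cumulative length first reaches half of the total. A minimal sketch; the contig lengths in the example are hypothetical.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    covered = 0
    # Walk from the longest contig down, accumulating assembled bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    raise AssertionError("unreachable: covered always reaches total")


print(n50([100, 80, 50, 40, 30]))  # 80
```

Here the assembly totals 300 bases; the two longest contigs (100 + 80 = 180) already cover more than half, so the N50 is 80.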
Reference-guided assembly: alignment of reads against a reference genome sequence
De novo assembly: construction of longer sequences, such as contigs or genomes, from shorter sequences, such as sequence reads, without prior knowledge of the order of the reads or reference to a closely related sequence
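Joining reads by their overlaps, without a reference, can be sketched as a greedy merge: repeatedly join the two sequences with the longest suffix-prefix overlap. This is a minimal illustration assuming error-free reads from one strand and no repeats; it is not how production de novo assemblers work (they typically build overlap or de Bruijn graphs and must handle sequencing errors, repeats and both strands). The reads and function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap into one contig."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Reads contained entirely within another read add no new sequence.
    contigs = [r for r in contigs if not any(r != other and r in other for other in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_len)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs  # no overlaps left: remaining sequences are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]


reads = ["ATTAGACCTG", "CCTGCCGGA", "CGGAATAC"]  # hypothetical error-free reads
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

The `min_len` threshold stands in for the minimum overlap an assembler requires before trusting that two reads come from adjacent genome positions.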
GENE PREDICTION
Ab initio gene prediction uses mathematical models, rather than external evidence such as EST and protein alignments, to identify genes and determine their intron–exon structures
Evidence-driven gene prediction uses ESTs, which can identify exon boundaries unambiguously; ESTs and proteins must first be aligned to the genome. This approach has great potential to improve the quality of gene prediction in newly sequenced genomes
Commonly used tools for gene prediction in prokaryotes: Glimmer, GeneMark
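Glimmer and GeneMark score candidate genes with statistical (Markov) models of coding sequence; the step they build on, scanning all six reading frames for open reading frames, can be sketched simply. This sketch assumes an uppercase genome, ATG as the only start codon (bacteria also use GTG and TTG) and an arbitrary default minimum length; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(genome: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """ATG-to-stop ORFs in all six reading frames.

    Returns (strand, start, end) with 0-based, end-exclusive coordinates on the
    scanned strand (the reverse complement for strand "-").
    """
    orfs = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:  # length includes the stop codon
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


print(find_orfs("CCATGAAATTTTAAGG", min_codons=3))  # [('+', 2, 14)]
```

Real genomes contain many short ORFs by chance, which is why a length cut-off, and in practice a coding-sequence model, is needed to separate genes from noise.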
GENOME ANNOTATION
Genome annotation is the extraction of biological knowledge from raw nucleotide sequences
It seeks to identify every potential protein-coding gene (open reading frames, ORFs)
Predicted proteins are compared against available databases using tools such as BLASTP
‘Structural’ genome annotation is the process of identifying genes and their intron–exon structures
‘Functional’ genome annotation is the process of attaching meta-data such as gene ontology terms to structural annotations
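Before predicted genes can be compared against protein databases with BLASTP, their ORFs must be translated into protein. The sketch below uses the standard genetic code (bacterial translation table 11 assigns the same amino acids but allows alternative start codons, which this sketch does not treat specially) and writes FASTA text that could serve as a query file; running BLASTP itself is not shown, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def to_fasta(proteins: dict[str, str], width: int = 60) -> str:
    """Format named protein sequences as FASTA text, wrapped at `width` residues."""
    lines = []
    for name, protein in proteins.items():
        lines.append(f">{name}")
        lines.extend(protein[i:i + width] for i in range(0, len(protein), width))
    return "\n".join(lines) + "\n"


print(to_fasta({"orf_1": translate("ATGAAATTTTAA")}), end="")  # >orf_1 / MKF
```

Comparing proteins rather than nucleotides is the usual choice for functional annotation, because protein sequences stay recognisably similar over longer evolutionary distances.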
APPLICATIONS
The very large number of short reads helps to identify single nucleotide polymorphisms (SNPs) when the reads are compared with a reference genome
Identification of rearrangements, deletions, insertions and inversions
Generation of expressed sequence tags (ESTs) by RNA sequencing
Detection of small regulatory RNAs
ChIP-Seq (Illumina technology) to study protein–DNA interactions
Metagenomics
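How many short reads compared with a reference reveal SNPs can be sketched as a pileup: count the bases observed at each reference position and report positions where a well-supported base differs from the reference. This assumes the reads have already been aligned without gaps (the alignment step is not shown); the thresholds `min_depth` and `min_fraction` are arbitrary illustrative choices, and the function is hypothetical, not any real variant caller's API.

```python
from collections import Counter


def call_snps(reference: str, alignments: list[tuple[int, str]],
              min_depth: int = 3, min_fraction: float = 0.8) -> list[tuple[int, str, str, int]]:
    """Naive SNP calling from gap-free read placements on a reference.

    alignments: (0-based start position on the reference, read sequence).
    Returns (position, reference base, alternative base, depth) tuples.
    """
    pileup = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1
    snps = []
    for position, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, support = counts.most_common(1)[0]
        if base != reference[position] and support / depth >= min_fraction:
            snps.append((position, reference[position], base, depth))
    return snps


reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TTCGTAC")]  # hypothetical aligned reads
print(call_snps("ACGTACGTAC", reads))  # [(4, 'A', 'T', 3)]
```

Requiring several agreeing reads at a position is what lets high read depth separate true SNPs from individual sequencing errors.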
LEADS TO THE DEVELOPMENT OF
Functional genomics, comparative genomics, environmental genomics (metagenomics)
FUNCTIONAL GENOMICS
Relates genome structure to gene function
Orthologs are genes in different organisms derived from a common ancestral gene that diverged as a result of the divergence of the organisms (speciation); they tend to have similar functions
Paralogs are homologs produced by gene duplication: genes derived from a common ancestral gene that duplicated within an organism and then diverged; they tend to have different functions
Xenologs are homologs resulting from the horizontal transfer of a gene between two organisms. The function of xenologs can be variable, depending on how significant the change in context was for the horizontally moving gene. In general, though, the function tends to be similar
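One common heuristic for finding candidate orthologs between two genomes, not described in the slides, is the reciprocal best hit: gene a in genome A and gene b in genome B are paired only if each is the other's highest-scoring match (for example by BLASTP bit score). A minimal sketch with hypothetical genes and scores:

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """For each query gene, the subject gene with the highest similarity score."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: dict[tuple[str, str], float],
                         b_vs_a: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Gene pairs that are each other's best hit: candidate orthologs."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical similarity scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0, ("a2", "b2"): 180.0, ("a3", "b2"): 170.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 175.0, ("b2", "a3"): 168.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Gene a3 matches b2 best, but b2 prefers a2, so a3 is not paired; this is the pattern expected if a2 and a3 are paralogs arising from a duplication in genome A.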
PHYLOGENETIC ANALYSIS
Phylogenetic trees are used to classify the evolutionary relationships between homologous genes represented in the genomes of divergent species
[Figure: rooted phylogenetic tree with terminal nodes A–E, labelling the ancestral node (root of the tree), internal nodes (divergence points), branches (lineages) and terminal nodes]
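How a rooted tree with internal nodes, branches and terminal nodes can be built from pairwise distances is sketched below using UPGMA, one of the simplest distance methods. The slides do not name a tree-building method; UPGMA is chosen here only for brevity, and it assumes a constant evolutionary rate (methods such as neighbour joining or maximum likelihood relax this). Taxa and distances are hypothetical.

```python
def upgma(dist: dict[frozenset[str], float]) -> str:
    """Rooted UPGMA tree in Newick format from pairwise distances between taxa."""
    labels = sorted({name for pair in dist for name in pair})
    size = {name: 1 for name in labels}      # leaves in each current cluster
    height = {name: 0.0 for name in labels}  # height of each cluster's root node
    d = dict(dist)
    while len(size) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        x, y = sorted(pair)
        h = d.pop(pair) / 2  # the new internal node sits halfway up the distance
        node = f"({x}:{h - height[x]:g},{y}:{h - height[y]:g})"
        for z in size:
            if z not in (x, y):
                # Distance to the new cluster: leaf-weighted average of the old distances.
                dxz = d.pop(frozenset((x, z)))
                dyz = d.pop(frozenset((y, z)))
                d[frozenset((node, z))] = (size[x] * dxz + size[y] * dyz) / (size[x] + size[y])
        size[node] = size.pop(x) + size.pop(y)
        height[node] = h
    (root,) = size
    return root + ";"


pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6, ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
tree = upgma({frozenset(p): float(v) for p, v in pairs.items()})
print(tree)  # (((A:1,B:1):2,C:3):2,D:5);
```

In the output, A, B, C and D are terminal nodes, each pair of parentheses is an internal node (divergence point), the numbers are branch lengths, and the outermost node is the root.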
COMPARATIVE GENOMICS
Comparison of genome sequences reveals much information about genome structure and evolution, including importance of lateral gene transfer
A tool to discover how microbes have adapted to particular ecological niches, and an aid in the development of new therapeutic agents
METAGENOMICS
Genomics-based study of genetic material recovered directly from environmental samples, without laboratory culture, and compared with all previously sequenced genes
Enables study of how microbes adapt to extreme environments, which helps to discover new metabolic pathways and protective mechanisms
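Comparing environmental reads with previously sequenced genes can be illustrated with shared k-mers: index every k-mer of the known genes, then assign each read to the gene it shares the most k-mers with, leaving reads with no match unassigned as potentially novel sequence. This is a simplified sketch of one possible comparison strategy, not a description of any particular metagenomics tool; the reference genes, reads, k and function names are all hypothetical.

```python
from collections import Counter
from typing import Optional


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the reference genes that contain it."""
    index: dict[str, set[str]] = {}
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def classify(read: str, index: dict[str, set[str]], k: int, min_shared: int = 1) -> Optional[str]:
    """Assign a read to the reference gene sharing the most k-mers with it, if any."""
    votes = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            votes[name] += 1
    if not votes:
        return None  # no shared k-mers: a potentially novel sequence
    name, shared = votes.most_common(1)[0]
    return name if shared >= min_shared else None


references = {"geneX": "ATGGCGTACGTTAGC", "geneY": "TTGACCGGTAAACTG"}
index = build_index(references, k=5)
print(classify("GCGTACGTT", index, k=5))  # geneX
print(classify("GGGGGGGGG", index, k=5))  # None
```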
IMPACT OF GENOME SEQUENCING
Revealed genome reduction in intracellular bacteria
Genome plasticity (rearrangements, mobile elements)
Gene duplication and diversification of protein function
Lateral gene transfer and acquisition of new functions
Adaptation to environments; virulence
Industrial processes: fermentation technology, bioremediation, biotransformation
Development of vaccines
Bacterial diversity
Synthetic biology
Epigenetics
REVERSE VACCINOLOGY
Use of genomic sequence information to identify novel and better suited protein candidates for vaccine
Serogroup B Neisseria meningitidis: based on genomic data, all proteins predicted to be surface-exposed, and therefore accessible to antibodies, were considered as candidates
Suitable candidates were selected after sequencing various strains
Streptococcus agalactiae: the pan-genome is composed of the core genome (genes present in all sequenced strains) and the dispensable genome (genes present in a subset of strains)
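The pan-genome definition maps directly onto set operations: the core genome is the intersection of the strains' gene sets and the dispensable genome is everything in the union that is not core. A minimal sketch, assuming genes have already been grouped into homologous families with shared identifiers; the strains and gene names are hypothetical.

```python
def pan_genome(strain_genes: dict[str, set[str]]) -> tuple[set[str], set[str], set[str]]:
    """Split gene families into pan, core (in every strain) and dispensable (in some strains) sets."""
    gene_sets = list(strain_genes.values())
    if not gene_sets:
        return set(), set(), set()
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    dispensable = pan - core
    return pan, core, dispensable


# Hypothetical gene-family content of three sequenced strains.
strains = {
    "strain_1": {"g1", "g2", "g3"},
    "strain_2": {"g1", "g2", "g4"},
    "strain_3": {"g1", "g2", "g3", "g5"},
}
pan, core, dispensable = pan_genome(strains)
print(sorted(core), sorted(dispensable))  # ['g1', 'g2'] ['g3', 'g4', 'g5']
```

In a reverse vaccinology setting, candidates drawn from the core genome are attractive because they are, by definition, present in every sequenced strain.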
Synthetic biology: using the sequence of an entire genome to synthesize genes de novo
Identification of the minimal genome, the smallest set of genes that enables life: Mycoplasma genitalium
DATABASES AND TOOLS RELATED TO BACTERIAL GENOMIC DATA
NCBI Entrez Genome Project database: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj
A searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms
NCBI Bacteria Genome Database: http://www.ncbi.nlm.nih.gov/genomes/static/eub.html
The Genome database provides views for a variety of genomes, complete chromosomes, sequence maps with contigs, and integrated genetic and physical maps
Bacterial Genomes at The Sanger Institute: http://www.sanger.ac.uk/Projects/Microbes/
This website lists funded, ongoing, or completed projects on pathogens sequenced at the institute
TIGR Comprehensive Microbial Resource (CMR): http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi
A free website displaying information on all publicly available, complete prokaryotic genomes
GOLD (Genomes OnLine Database): http://www.genomesonline.org/
A genome database containing information about which genomes have been sequenced or are in progress
Microbial Genome Database for Comparative Analysis (MBGD): http://mbgd.genome.ad.jp/
A database for comparative analysis of completely sequenced microbial genomes
Virulence Factors of Bacterial Pathogens (VFDB): http://zdsys.chgb.org.cn/VFs/main.htm
VFDB is an integrated and comprehensive database of virulence factors of bacterial pathogens
Genome Information Broker: http://gib.genes.nig.ac.jp/
A comprehensive data repository of complete microbial genomes in the public domain; many microbial genomes can be explored graphically
Islander, a Database of Genomic Islands: http://www.indiana.edu/~islander
This database contains genomic islands discovered in completely sequenced bacterial genomes
GenoList genome browser at Institut Pasteur: http://genolist.pasteur.fr/
Provides access to diverse genome browsers for pathogenic bacteria
IslandPath: http://www.pathogenomics.sfu.ca/islandpath/update/IPindex.pl
An aid to the identification of genomic islands, including pathogenicity islands, of potentially horizontally transferred genes
HGT-DB: http://www.tinet.org/~debb/HGT/
A database of predicted horizontally transferred genes in several complete prokaryotic genomes
E. coli genome project: http://www.genome.wisc.edu
A site devoted to the E. coli genome project, with an updated annotation of the genome
Thank you