Big Picture
• Of ≈1.7 million species classified so far, roughly 6000 are microbes
• True number of microbial species is obviously larger than 6000
• “Imagine if our entire understanding of biology was based on a visit to the zoo. That’s where we’ve been in microbiology.” – Norman Pace, Univ of Colorado, Boulder
Diversity of bacteria and archaea
• Only ~1% of all microbial species can be cultured
• 97% of prokaryotic isolates in stock centers are from just 4 phyla:
– Proteobacteria (Escherichia, Helicobacter, Pseudomonas)
– Firmicutes (Bacillus, Streptococcus, Staphylococcus)
– Actinobacteria (Mycobacterium)
– Bacteroidetes (Porphyromonas gingivalis)
Current estimates of microbial community diversity
• Curtis et al. (2002, PNAS) estimated up to 160 species in a typical milliliter of seawater, versus somewhere between 6,400 and 38,000 in a typical gram of soil.
What if we combined environmental sampling and shotgun sequencing?
• How many genomes would be sampled and from what organisms?
• How many novel genes would be discovered?
• How many genomes could we completely assemble?
Methods: Sargasso
• Performed whole-genome shotgun sequencing of surface water samples from Sargasso Sea
• Samples were filtered to isolate microbes
• Created genomic libraries with 2 to 6 kb inserts
• Sequenced plasmid clones
• Resulted in >1.5 Gbp of microbial DNA sequence
The Sargasso Sea
• A sea with no coastline (bounded by ocean currents)
– It moves!
– Generally between the West Indies and the Azores
– Water is very placid (the ‘doldrums’)
– Covered by a lens of warm, nutrient-poor water and a vast mat of algae (Sargassum)
• As simple a microbial community as is likely to be found in the ocean
Sampling scheme
• 1700 liters of surface water sampled from four different sites in Feb or May
• Filters allowed only cells in the 0.1-3.0 micron range
– Excluded dissolved DNA and free virus
– Excluded most eukaryotes
Lots and lots of sequences
• 2 million cloned fragments 2-6 kbp in size were sequenced
• This yielded 1.6 billion base pairs total
– 1 billion bp non-redundant
– For comparison, the human genome is 3 billion bp
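The figures on this slide can be related with a little arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the per-clone figure is a simple average that says nothing about how many reads were taken from each clone.

```python
# All four inputs are the figures quoted on the slide.
clones_sequenced = 2_000_000
total_bp = 1_600_000_000
nonredundant_bp = 1_000_000_000
human_genome_bp = 3_000_000_000

# Average amount of sequence obtained per cloned fragment.
bp_per_clone = total_bp / clones_sequenced

# Share of the raw sequence that is non-redundant.
nonredundant_share = nonredundant_bp / total_bp

# Size of the data set relative to one human genome.
relative_to_human = total_bp / human_genome_bp

print(f"{bp_per_clone:.0f} bp per clone on average")
print(f"{nonredundant_share:.1%} of the sequence is non-redundant")
print(f"{relative_to_human:.2f} human-genome equivalents of raw sequence")
```

Since the inserts are 2-6 kbp, an average of this size per clone implies that most of each insert was not sequenced directly.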
Assembly issues
• Organisms differ in
– abundance
– genome size
• Cannot rely on assumption that coverage is uniformly random
• Some contigs will have extremely deep coverage, which is a challenge for assembly algorithms
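To see why coverage cannot be uniform in a mixed sample, here is a minimal sketch. It assumes reads are drawn in proportion to each organism's share of the total DNA and uses the Lander-Waterman expectation for the fraction of a genome covered. The three-member community and all of its numbers are invented for illustration and are not from the Sargasso data.

```python
import math

def expected_depths(community, n_reads, read_len):
    """Expected depth of coverage per organism in a mixed shotgun sample.

    community maps a name to (relative cell abundance, genome size in bp).
    Assumes reads are sampled in proportion to each organism's share of total DNA.
    """
    total_dna = sum(abund * size for abund, size in community.values())
    total_bases = n_reads * read_len
    depths = {}
    for name, (abund, size) in community.items():
        dna_share = abund * size / total_dna  # fraction of reads from this organism
        depths[name] = total_bases * dna_share / size
    return depths

def fraction_covered(depth):
    """Lander-Waterman expectation: fraction of a genome hit by at least one read."""
    return 1.0 - math.exp(-depth)

# Hypothetical community (values invented for illustration only).
community = {
    "abundant_small": (0.90, 2_000_000),
    "moderate_large": (0.09, 6_000_000),
    "rare_medium": (0.01, 4_000_000),
}
depths = expected_depths(community, n_reads=100_000, read_len=800)
for name, depth in depths.items():
    print(f"{name}: depth {depth:.2f}x, covered {fraction_covered(depth):.1%}")
```

Under this model the depth ratio between two organisms equals their cell abundance ratio, so a dominant organism is covered very deeply while a rare one is still mostly unsampled.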
Assembly results
• Assembly was only successful in February sample
– 64,000 scaffolds, most less than 10 kbp
– 500,000 clones did not assemble
• Of those with 3X or greater coverage
– About ½ could be classified taxonomically
• 21 scaffolds with greater than 14X coverage
– SNPs occur at 1/10 kbp, suggesting genetic diversity within ‘species’
• Only two genomes were fully assembled, and then only with the aid of an existing reference sequence for both
Unexpected sequences
• Relatives of Burkholderia and Shewanella, typical of much more nutrient-rich environments.
– Probable contaminants (at least the Burkholderia)
• At least two abundant Archaeal organisms, typical of much greater depths (200 meters)
• At least 10 mega-plasmids, many with genes related to trace metal utilization or toxicity
• Not too surprising
– Some phage genomes, presumably integrated
– About 70 different eukaryote species (based mainly on the presence of 18S rDNA)
How many new genes are discovered?
• 1.2 million genes identified
– Equal to the number of genes submitted to the Swissprot/TrEMBL database from the last 8 years!
• Interesting findings
– Ammonium oxidation in Archaea, which was previously unknown
– Widespread presence of genes allowing unconventional forms of phosphate uptake
– Only ~37 Rubisco sequences were found, but ~800 proteorhodopsin-like genes
Organism Identification
• Focused analysis on scaffolds with at least 3X coverage depth
• This amounts to 333 scaffolds; 2226 contigs; 30.9 Mbp; 25% of the data set
• Used oligonucleotide frequencies, depth of coverage, and similarity to previously sequenced genomes to separate some sequences into organism “bins”
• Identified several populations related to known species
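The binning idea above can be sketched as follows. This is a simplified illustration, not the method used in the study: it combines only oligonucleotide frequencies and depth of coverage, it assumes tetranucleotides (the slides do not state the oligonucleotide length), and the function names, the depth-ratio cutoff, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product
import math

K = 4  # assumed oligonucleotide length (tetranucleotides)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_profile(seq):
    """Normalized k-mer frequencies; k-mers with non-ACGT symbols are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[kmer] for kmer in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[kmer] / total for kmer in KMERS]

def profile_distance(p, q):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign_bin(contig_seq, contig_depth, bins, max_depth_ratio=2.0):
    """Pick the closest bin by composition among bins with a compatible depth.

    bins maps a bin name to (reference profile, reference depth > 0).
    Returns None when no bin has a depth within max_depth_ratio of the contig.
    """
    profile = kmer_profile(contig_seq)
    best_name, best_dist = None, float("inf")
    for name, (ref_profile, ref_depth) in bins.items():
        ratio = max(contig_depth, ref_depth) / min(contig_depth, ref_depth)
        if ratio > max_depth_ratio:
            continue  # depth too different to be the same population
        dist = profile_distance(profile, ref_profile)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Toy reference sequences, invented for illustration.
at_rich = "ATTATAATTTAAATATTA" * 20
gc_rich = "GCCGGCGCGGCCCGCGGC" * 20
bins = {
    "at_bin": (kmer_profile(at_rich), 10.0),
    "gc_bin": (kmer_profile(gc_rich), 10.0),
}
print(assign_bin(at_rich[:200], 12.0, bins))
```

A real pipeline would also use similarity to previously sequenced genomes, as the slide notes, and would need a principled way to choose the distance and depth cutoffs.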
Photosynthesis in Sargasso Sea
• Thought to be dominated by the cyanobacteria Prochlorococcus and Synechococcus
• But, >90% of cyanobacteria scaffolds appear to be Prochlorococcus
• Could be due to the gradient sampled and the larger size of Synechococcus
Bacteriorhodopsin
• Transmembrane protein that is a green light-driven proton pump
• Protons pumped out of cell; then flow back in through ATP Synthase to create ATP
• Some rhodopsins found on scaffolds of organisms previously unknown to contain them
Rhodopsin-like Sequences
• Identified 13 subfamilies of rhodopsin-like genes
• Four families of proteins from cultured organisms and nine families from uncultured organisms
• Expression levels of these genes are unknown
Problems
• Large data dump in NCBI angered some
• Unclear how effective filtering was (some apparent eukaryotic DNA found)
• Some questioning of how samples were collected
• Had some trouble getting permits from countries to collect ocean water samples
How many species are there?
• Number of distinct SSU genes
– 1164 in February
– 248 in May
– Dominated by proteobacteria
– 148 are new phylotypes (at 97% identity)
• Can we estimate how many remain to be sampled?
– From a model of assembly completeness, one can estimate 1,800 to 48,000 species in the combined sample
– With 5-10 fold deeper coverage, one can estimate that 50 genomes could be fully assembled
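The "97% identity" cutoff for phylotypes mentioned above can be illustrated with a small sketch. The slides do not say how sequences were grouped, so the greedy clustering here is an assumption chosen for simplicity. The sequences are assumed to be already aligned to equal length, and the toy sequences are invented.

```python
def percent_identity(a, b):
    """Identity between two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0

def greedy_phylotypes(aligned_seqs, threshold=0.97):
    """Greedy clustering: each sequence joins the first representative it matches."""
    representatives = []
    clusters = []
    for seq in aligned_seqs:
        for i, rep in enumerate(representatives):
            if percent_identity(seq, rep) >= threshold:
                clusters[i].append(seq)
                break
        else:
            # No representative is close enough, so this sequence starts a new phylotype.
            representatives.append(seq)
            clusters.append([seq])
    return clusters

# Toy 100-nt sequences, invented for illustration.
base = "ACGT" * 25
near = "TT" + base[2:]        # 2 mismatches against base
far = "T" * 10 + base[10:]    # 8 mismatches against base
clusters = greedy_phylotypes([base, near, far])
print(len(clusters), "phylotypes")
```

Greedy clustering depends on input order, so the number of phylotypes it reports is only an approximation of what a full pairwise analysis would give.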
Patchiness
• Patchiness is well documented in marine macro-organisms in the open ocean, but is not known for microbial communities
• Of the species represented by assemblies with more than 50 fragments, more than half differed in abundance among sites