Big Picture

• Of ≈1.7 million species classified so far, roughly 6000 are microbes

• The true number of microbial species is obviously far larger than 6000

• “Imagine if our entire understanding of biology was based on a visit to the zoo. That’s where we’ve been in microbiology.” – Norman Pace, Univ of Colorado, Boulder

Diversity of bacteria and archaea

• Only ~1% of all microbial species can be cultured

• 97% of prokaryotic isolates in stock centers are from just 4 phyla:
– Proteobacteria (Escherichia, Helicobacter, Pseudomonas)
– Firmicutes (Bacillus, Streptococcus, Staphylococcus)
– Actinobacteria (Mycobacterium)
– Bacteroidetes (Porphyromonas gingivalis)

Unculturable microbes

Hugenholtz P (2002) Genome Biology 3, 1-8

Current estimates of microbial community diversity

• Curtis et al. (2002, PNAS) estimated up to 160 species in a typical milliliter of seawater, versus 6,400 to 38,000 species in a typical gram of soil.

What if we combined environmental sampling and shotgun sequencing?

• How many genomes would be sampled and from what organisms?

• How many novel genes would be discovered?

• How many genomes could we completely assemble?

Environmental Genome Shotgun Sequencing of the Sargasso Sea

Venter JC, et al. (2004) Science 304, 66-74

Methods: Sargasso

• Performed whole-genome shotgun sequencing of surface water samples from the Sargasso Sea

• Samples were filtered to isolate microbes

• Created genomic libraries with 2 to 6 kb inserts

• Sequenced plasmid clones

• Resulted in >1.5 Gbp of microbial DNA sequence

The Sargasso Sea

• A sea with no coastline (bounded by ocean currents)
– It moves!
– Generally between the West Indies and the Azores
– Water is very placid (the ‘doldrums’)
– Covered by a lens of warm, nutrient-poor water and a vast mat of algae (Sargassum)

• As simple a microbial community as is likely to be found in the ocean

The Sargasso Sea

Sampling scheme

• 1700 liters of surface water sampled from four different sites in Feb or May

• Size filtration kept only cells in the 0.1-3.0 micron range
– Excluded dissolved DNA and free viruses
– Excluded most eukaryotes

Lots and lots of sequences

• 2 million cloned fragments 2-6 kbp in size were sequenced

• This yielded 1.6 billion base pairs in total
– 1 billion bp non-redundant
– For comparison, the human genome is 3 billion bp

Assembly issues

• Organisms differ in
– abundance
– genome size

• Cannot rely on the single-genome assumption that reads fall uniformly at random, giving similar coverage everywhere

• Some contigs will have extremely deep coverage, which is a challenge for assembly algorithms
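To see why uneven abundance breaks the usual assumption, here is a minimal Python sketch. It assumes (hypothetically) that each organism contributes DNA in proportion to its relative cell abundance times its genome size, and that within one genome reads still land uniformly at random, so the Lander-Waterman expectation 1 - e^(-c) gives the fraction of that genome covered at mean depth c. The community names, abundances, genome sizes, read count, and 800-bp read length are illustrative assumptions, not values from the Sargasso Sea data.

```python
import math


def expected_coverage(community, total_reads, read_len):
    """Return {name: (mean depth, expected fraction of genome covered)}.

    Assumptions (illustrative): each organism contributes DNA in proportion to
    relative cell abundance x genome size, and within a genome reads start
    uniformly at random (Lander-Waterman), so fraction covered = 1 - exp(-depth).
    """
    total_dna = sum(abundance * size for _, abundance, size in community)
    result = {}
    for name, abundance, size in community:
        reads = total_reads * abundance * size / total_dna  # reads drawn from this genome
        depth = reads * read_len / size                      # mean coverage c = N*L/G
        result[name] = (depth, 1.0 - math.exp(-depth))       # expected fraction covered
    return result


# Hypothetical community: (name, relative cell abundance, genome size in bp).
community = [
    ("dominant", 0.60, 1_800_000),
    ("common", 0.30, 3_000_000),
    ("rare", 0.09, 5_000_000),
    ("very_rare", 0.01, 5_000_000),
]

coverage = expected_coverage(community, total_reads=200_000, read_len=800)
for name, (depth, covered) in coverage.items():
    print(f"{name:<10} depth = {depth:6.2f}X  fraction covered = {covered:.3f}")
```

With these made-up numbers the dominant organism ends up near 39X depth while the rarest sits below 1X with roughly half of its genome covered, so a single assembly run has to cope with very deep and very shallow coverage at the same time.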

Assembly results

• Assembly succeeded only for the February sample
– 64,000 scaffolds, most shorter than 10 kbp
– 500,000 clones did not assemble

• Of the scaffolds with 3X or greater coverage
– About half could be classified taxonomically

• 21 scaffolds with greater than 14X coverage
– SNPs occur at about 1 per 10 kbp, suggesting genetic diversity within ‘species’

• Only two genomes were fully assembled, and then only with the aid of an existing reference sequence for both

Unexpected sequences

• Relatives of Burkholderia and Shewanella, typical of much more nutrient-rich environments
– Probable contaminants (at least the Burkholderia)

• At least two abundant Archaeal organisms, typical of much greater depths (200 meters)

• At least 10 mega-plasmids, many with genes related to trace metal utilization or toxicity

• Not too surprising
– Some phage genomes, presumably integrated
– About 70 different eukaryote species (based mainly on the presence of 18S rDNA)

How many new genes are discovered?

• 1.2 million genes identified
– Roughly equal to the number of genes submitted to the Swiss-Prot/TrEMBL database over the previous 8 years!

• Interesting findings
– Evidence for ammonium oxidation in Archaea, which was previously unknown
– Widespread presence of genes allowing unconventional forms of phosphate uptake
– Only ~37 Rubisco sequences were found, but ~800 proteorhodopsin-like genes

Organism Identification

• Focused analysis on scaffolds with at least 3X coverage depth

• This subset: 333 scaffolds; 2,226 contigs; 30.9 Mbp; 25% of the data set

• Used oligonucleotide frequencies, depth of coverage, and similarity to previously sequenced genomes to separate some sequences into organism “bins”

• Identified several populations related to known species
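The composition part of this binning can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it uses tetranucleotide (k = 4) frequencies only, ignores coverage depth and similarity to known genomes, and assigns scaffolds greedily to the nearest bin seed by Euclidean distance. The choice of k, the distance measure, the threshold, and the synthetic GC-rich/AT-rich test sequences are all assumptions.

```python
import math
import random
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Normalized k-mer frequency vector over the 4**k DNA words.

    K-mers containing characters other than A/C/G/T are ignored.
    """
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def distance(p, q):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def bin_scaffolds(scaffolds, threshold):
    """Greedy binning: each scaffold joins the bin whose seed profile is
    closest, if that distance is below `threshold`; otherwise it seeds a new
    bin. Returns a list of bins, each a list of scaffold names."""
    bins = []  # each entry: (seed_profile, member_names)
    for name, seq in scaffolds.items():
        profile = kmer_profile(seq)
        best_dist, best_bin = None, None
        for seed, members in bins:
            d = distance(profile, seed)
            if d < threshold and (best_dist is None or d < best_dist):
                best_dist, best_bin = d, members
        if best_bin is None:
            bins.append((profile, [name]))
        else:
            best_bin.append(name)
    return [members for _, members in bins]


# Synthetic test data (hypothetical): GC-rich vs AT-rich random sequences.
rng = random.Random(0)


def random_seq(weights, n=5000):
    return "".join(rng.choices("ACGT", weights=weights, k=n))


scaffolds = {f"gc_rich_{i}": random_seq([0.1, 0.4, 0.4, 0.1]) for i in range(3)}
scaffolds.update({f"at_rich_{i}": random_seq([0.4, 0.1, 0.1, 0.4]) for i in range(3)})

bins = bin_scaffolds(scaffolds, threshold=0.08)
print(bins)
```

Because the two synthetic groups differ strongly in base composition, their profiles fall far apart and the six scaffolds should split into two bins of three. Real metagenomic scaffolds overlap much more in composition, which is one reason the study also used coverage depth and similarity to sequenced genomes.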

Prochlorococcus gene conservation

Phylotypes

Photosynthesis in Sargasso Sea

• Thought to be dominated by the cyanobacteria Prochlorococcus and Synechococcus

• But >90% of cyanobacterial scaffolds appear to be Prochlorococcus

• Could be due to the gradient sampled and the larger size of Synechococcus

Bacteriorhodopsin

• Transmembrane protein that is a green light-driven proton pump

• Protons are pumped out of the cell, then flow back in through ATP synthase, driving ATP synthesis

• Some rhodopsins were found on scaffolds from organisms not previously known to contain them

Rhodopsin-like Sequences

• Identified 13 subfamilies of rhodopsin-like genes

• Four subfamilies with proteins from cultured organisms and nine subfamilies from uncultured organisms

• Expression levels of these genes are unknown

Problems

• The large data dump into NCBI angered some researchers

• Unclear how effective filtering was (some apparent eukaryotic DNA found)

• Some questioned how the samples were collected

• The project had trouble getting permits from some countries to collect ocean water samples

How many species are there?

• Number of distinct SSU (small-subunit rRNA) genes
– 1164 in February
– 248 in May
– Dominated by Proteobacteria
– 148 are new phylotypes (at 97% identity)

• Can we estimate how many species remain to be sampled?
– From a model of assembly completeness, one can estimate 1,800 to 48,000 species in the combined sample
– With 5-10 fold deeper coverage, an estimated 50 genomes could be fully assembled
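The "97% identity" phylotype cutoff can be illustrated with a greedy clustering sketch. It assumes the SSU sequences are already aligned to equal length, scores identity only over columns without gaps, and lets each sequence join the first existing cluster centroid it matches at 97% or better. The toy 100-bp sequences are hypothetical, and the slides do not say which clustering method the study used, so treat this as one common approach rather than theirs.

```python
def identity(a, b):
    """Fraction of identical bases over aligned columns where neither
    sequence has a gap ('-'). Assumes a and b are pre-aligned."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def cluster_otus(seqs, cutoff=0.97):
    """Greedy centroid clustering: each sequence joins the first centroid it
    matches at >= cutoff identity; otherwise it becomes a new centroid.
    Returns {centroid_name: [member names]}."""
    centroids = []  # (name, sequence) pairs, in order of creation
    clusters = {}
    for name, seq in seqs.items():
        for cname, cseq in centroids:
            if identity(seq, cseq) >= cutoff:
                clusters[cname].append(name)
                break
        else:
            centroids.append((name, seq))
            clusters[name] = [name]
    return clusters


def mutate(seq, positions):
    """Return a copy of seq with a substitution at each given position."""
    bases = list(seq)
    for p in positions:
        bases[p] = "C" if bases[p] == "A" else "A"
    return "".join(bases)


base = "ACGT" * 25  # hypothetical 100-bp aligned SSU fragment
seqs = {
    "read_1": base,
    "read_2": mutate(base, [10]),              # 99% identical to read_1
    "read_3": mutate(base, [10, 50]),          # 98% identical
    "read_4": mutate(base, range(0, 50, 10)),  # 95% identical
}

otus = cluster_otus(seqs)
print(len(otus), otus)
```

Here read_2 (99% identical) and read_3 (98%) join read_1's cluster, while read_4 (95%) founds a second phylotype. Greedy clustering depends on input order, so the order in which sequences are processed can change which sequences become centroids.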

Patchiness

• Patchiness is well documented for marine macro-organisms in the open ocean, but had not been shown for microbial communities

• Of the species represented by assemblies with more than 50 fragments, more than half differed in abundance among sites
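One simple way to ask whether an organism's abundance differs among sites is a chi-square test on fragment counts. The sketch below assumes that, without patchiness, an organism's fragments would be spread across sites in proportion to the total fragments sequenced at each site. The site totals and counts are hypothetical, and the slides do not state which test the study actually used. The chi-square tail probability uses the closed forms for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer
    df, using the closed-form expressions for even and odd degrees of freedom."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sum_{i=1}^{(df-1)/2} exp(-x/2) (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    if half > 0:
        for i in range(1, (df - 1) // 2 + 1):
            total += math.exp(-half + (i - 0.5) * math.log(half) - math.lgamma(i + 0.5))
    return total


def patchiness_test(counts, site_totals):
    """Chi-square test that one organism's fragment counts per site are
    proportional to the total fragments sequenced at each site.
    Returns (chi-square statistic, p-value)."""
    n = sum(counts)
    grand_total = sum(site_totals)
    stat = 0.0
    for observed, site_total in zip(counts, site_totals):
        expected = n * site_total / grand_total
        stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf(stat, len(counts) - 1)


# Hypothetical data: fragments sequenced per site, and counts for two organisms.
site_totals = [250_000, 250_000, 250_000, 250_000]
even_organism = [20, 20, 20, 20]
patchy_organism = [70, 5, 3, 2]

print(patchiness_test(even_organism, site_totals))  # statistic 0, p = 1
print(patchiness_test(patchy_organism, site_totals))
```

The chi-square approximation is unreliable when expected counts are small, so a minimum-fragment cutoff like the one above also helps keep the test reasonable.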

Links

• http://www.ploscollections.org/article/browseIssue.action?issue=info:doi/10.1371/issue.pcol.v06.i02

• Video: http://plos.cnpg.com/lsca/webinar/venter/20070306/index.html

• General about GOS at JCVI: http://www.jcvi.org/cms/research/projects/gos/overview/