Enigmatic, ultrasmall, uncultivated
Embed Size (px)
Transcript of Enigmatic, ultrasmall, uncultivated
Enigmatic, ultrasmall, uncultivated ArchaeaBrett J. Bakera, Luis R. Comollib, Gregory J. Dicka,1, Loren J. Hauserc, Doug Hyattc, Brian D. Dilld, Miriam L. Landc,Nathan C. VerBerkmoesd, Robert L. Hettichd, and Jillian F. Banelda,e,2
aDepartment of Earth and Planetary Science and eEnvironmental Science, Policy, and Management, University of California, Berkeley, CA 94720; bLawrenceBerkeley National Laboratories, Berkeley, CA 94720; and cBiosciences and dChemical Sciences Divisions, Oak Ridge National Laboratory, Oak Ridge, TN 37831
Edited by Norman R. Pace, University of Colorado, Boulder, CO, and approved March 30, 2010 (received for review December 16, 2009)
Metagenomics has provided access to genomes of as yet unculti-vated microorganisms in natural environments, yet there are gapsin our knowledgeparticularly for Archaeathat occur at rela-tively lowabundance and in extreme environments. Ultrasmall cells(
dataset, 117 Mb) (24). The resulting 229,084 reads were assem-bled into 13,256 contigs containing at least two reads and totaling34.1 Mb of sequence. 16S rRNA genes and other phylogeneticmarkers indicated that the dataset included the near complete(15 coverage) genome of an ARMAN-2 population (ARMAN-2reads represent 7% of the sequences from the two biolms).ARMAN cells from a third biolm (UBA) were enriched by
ltration (targeting cells 1.5 kb were
binned using a variety of methods (SI Materials and Methods), in-cluding clustering of tetranucleotide frequencies within emergentself-organizing maps. Tetranucleotide mapping created distinctclusters, consistently binned ARMAN fragments of known aflia-tion, and conrmed that essentially all ARMAN fragments wereincluded in the genomic reconstructions (5). A subset of low cov-erage (3) ARMAN fragments could not be rmly assigned to ei-ther group and were designated ARMAN-unassigned (430,607bp of sequence). These fragments may belong to variant genotypes.To further evaluate genome completeness, we checked for the
presence of 40 genes expected in all genomes (25). ARMAN-2has a single copy of each; ARMAN-4 and -5 lack nine and fourgenes, respectively. Thus, we estimate that ARMAN-2 is 99%,ARMAN-4 is 80%, and ARMAN-5 90% complete, with averagegenome sizes of 1 Mb. The 552 predicted orthologous proteinsof ARMAN-4 and -5 share 71% average amino acid identity.The genome of ARMAN-2 is in three fragments and ARMAN-4and -5 are in 46 and 75 fragments, respectively. Genome statisticsare summarized in Table 1. For group two, we propose the nameCandidatus Micrarchaeum acidiphilum ARMAN-2. We suggestthe names Candidatus Parvarchaeum acidiphilum for ARMAN-4and Candidatus Parvarchaeum acidophilus for ARMAN-5.
Comparative Genomics. ARMAN proteins have an unusually largenumber of topBLASThits to bacterial proteins (Fig. 1 andFig. S3)and 12 only have bacterial orthologs in the current clusters oforthologous groups (COG) database (26) (Table S1). Other no-table features of the ARMANgenomes are the anomalously smallfraction of genes that can be assigned to archaeal COGs (arCOGs)(105)to sequences in public databases (Fig. S3).We examined the taxonomic distributions of the top hits of all
genes of ARMAN, Euryarchaea, Crenarchaea, and two separatebasal Archaea (N. equitans and Candidatus Korarchaeum crypto-lum) to the KEGG database (Fig. 1). The ARMANs have a largerpercentage of their genes, with best matches to crenarchaealorganisms than any other Euryarchaea. They also have many geneswith no matches (e-value of 105) in the KEGG database. Thetaxonomic match-based proles of the ARMANs are most similarto the deeply branched N. equitans and Ca. K. cryptolum.
Proteomic Analysis of ARMAN. ARMAN proteins were identiedfrom lysates of seven biolm communities via shotgun proteomicsbased on assignment of tandem mass spectra to peptides pre-dicted from the genomic datasets. The database used for peptideidentication included predicted proteins from four bacterial andsix Thermoplasmatales archaeal species and from ARMAN-2, -4,and -5. Most identied ARMAN peptides were uniquely assignedto proteins of a single ARMAN type. Searches yielded 173 pro-teins from ARMAN-2 (17% of predicted proteins), 32 fromARMAN-4 (3% of predicted proteins), and 27 from ARMAN-5(3% of predicted proteins) (Tables S4S6). The most abundantARMAN-2, -4, and -5 proteins were involved in energy pro-duction, translation, protein modication, amino acid metabo-lism, or had unknown functions (Tables S4S6). H+-transportingsubunits of ATP synthase are the most highly expressed proteinswith assigned function fromARMAN-4, and -5, and account for 8,31, and 40% of the total spectral counts for proteins fromARMAN-2, -4, and -5, respectively).A protein with homology to the alpha subunit of a multisubunit
archaeal thermosome (ARMAN-2, UNLARM2_0780), likelya group II chaperonin (27), is also among the most highlyexpressed ARMAN proteins with known function. Chaperoninsare generally highly expressed in the acid mine drainage (AMD)organisms (28) and may play a role in protein refolding or stabi-lization. However, this thermosome protein alternatively may bea component of the intracellular ARMAN tube, given that ar-
Table 1. Summary of genomic information for ARMAN
ARMAN-2 ARMAN-4 ARMAN-5 Unassigned
Total bp 999,043 800,887 921,220 430,607G+C content % 47.2 34.8 34.1 36.5Number of fragments 3 44 73 145Average contig size(bp)
333,014 17,546 12,381 2,935
Predicted proteincoding genes(CDS/ORF)
1,033 916 1046 NA
Percent of genomeprotein coding
90.7 90.1 89.5 NA
Number of KEGGmatches
735 569 687 NA
Number of arCOGmatches
685 587 687 NA
Average CDS/ORFlength (bp)
892 774 788 NA
668 539 561 NA
152 139 159 NA
tRNAs 35 39 42 6
Baker et al. PNAS | May 11, 2010 | vol. 107 | no. 19 | 8807
chaeal chaperonins have previously been shown to form hetero-oligomeric complex cylinders (29). The ARMAN-4 homolog ofthe archaeal thermosomeprotein has been detected by proteomicsas well (BJBARM4_0177 in two datasets).Roughly 25% of the ARMAN proteins identied were annotated
ashypothetical or conservedhypothetical and, across all samples, 32%of spectra assigned to ARMAN derived from hypothetical proteins.ThemostabundantproteinofARMAN-2(UNLARM2_0493)hasnoassigned function. Interestingly, this proteinwasnot identied in somecommunity proteomic datasets, suggesting that it may be involved ina specic environmental response.
Metabolism.Despite few genes for glycolysis, ARMAN-2 has genesfor breakdown of fatty acids via beta oxidation and all threeARMANs have complete or near complete tricarboxylic acid(TCA) cycles. ARMAN-4 and -5 have near complete glycolyticpathways, use the pentose phosphate pathway for carbohydratemetabolism, and have genes required for glycerol utilization. Fiveproteins of the TCA cycle were detected in the proteomic data andhigh levels of succinate dehydrogenase were detected from allthree ARMAN groups (two homologs in ARMAN-2). Therefore,we know that aerobic respiration is active in the ARMANs in thebiolm. Aconitate hydratase was also detected in the proteomicdata from all three ARMAN groups (two homologs in ARMAN-5), but succinate-CoA ligase and fumarate lyase were detectedonly from ARMAN-4. All ARMAN genomes encode a singlesuperoxide dismutase, two peroxiredoxin-like genes, and alkylhydroperoxide reductase to respond to oxidative stress, consistentwith an aerobic lifestyle, although cytochrome c-oxidase geneshave not been found. All three genomes encode components re-quired for oxidative phosphorylation and ARMAN-2 has hemesynthesis genes needed to produce cytochromes. ARMAN-2 hasan alternative thymidylate synthase (ThyX) for thymine synthesis(30).We have not identied the ribulosemonophosphate pathwaythat is common to Archaea in ARMAN-2. For CoA synthesis, allthree groups are missing the bifunctional phosphopantothe-noylcysteine synthetase/decarboxylase that is found in all otherArchaea except N. equitans, Ca. K. cryptolum (31), and Ther-molum pendens (32).
There are many pathways for which genes were not identied,but this could be because of genome incompleteness or high levelsof divergence from characterized genes. For example, we identiedonly two genes for cobalamin (vitamin B12) biosynthesis, cobala-min adenosyltransferase, and precorrin-3B synthetase (cobZ).CobZ has only been reported previously in bacteria and is a keyenzyme in a poorly understood cobalamin biosynthetic pathway inphotosynthetic bacteria (33). Searches of the UBA-BS and UBAdatasets revealed that this is the only CobZ in either community.Interestingly, the ARMAN-2 gene UNLARM2_0870, encodinga protein of unknown function, contains a cobalamin binding site.
DNA Replication and Cell Cycle. All three ARMAN groups haveorc1/cdc6 replication initiation protein homologs, indicating thatthey have circular chromosomes. ARMAN-2 has primases, priS,and multiple DNA polymerases. Additionally, the ARMAN-2 ge-nome encodes a sliding clamp proliferating cell nuclear antigen(PcnA) protein (UNLARM2_0836), and this protein was identi-ed in the UBA-BS sample. All three groups have a clamp-loadercomplex that encodes the large and small subunit of replicationfactor C and ARMAN-4 and -5 have a replication factor A ho-molog. Chromatin-associated genes include Alba (in ARMAN-5)andA3 archaeal-type histone (just in ARMAN-2). ARMAN-2 hasa MinD protein involved in chromosome partitioning, as well astwo structural maintenance of chromosomes-like proteins forchromosomal segregation, both of which were identied by pro-teomics. It also has a HerA (FtsK-like) previously only found inCrenarchaea. ARMAN-2 has four cell-division proteins (FtsZ),whereas ARMAN-4 and -5 have two. One ARMAN-2 cell-division FtsZ p