
Comparative day/night metatranscriptomic analysis of microbial communities in the North Pacific subtropical gyre

Rachel S. Poretsky,1 Ian Hewson,2 Shulei Sun,1 Andrew E. Allen,3 Jonathan P. Zehr2 and Mary Ann Moran1*

1University of Georgia, Department of Marine Sciences, Athens, GA 30602, USA.
2University of California Santa Cruz, Department of Ocean Sciences, Santa Cruz, CA 95064, USA.
3J. Craig Venter Institute, Microbial and Environmental Genomics, San Diego, CA 92121, USA.

Summary

Metatranscriptomic analyses of microbial assemblages (< 5 µm) from surface water at the Hawaii Ocean Time-Series (HOT) revealed community-wide metabolic activities and day/night patterns of differential gene expression. Pyrosequencing produced 75 558 putative mRNA reads from a day transcriptome and 75 946 from a night transcriptome. Taxonomic binning of annotated mRNAs indicated that Cyanobacteria contributed a greater percentage of the transcripts (54% of annotated sequences) than expected based on abundance (35% of cell counts and 21% of 16S rRNA libraries), and may represent the most actively transcribing cells in this surface ocean community in both the day and night. Major heterotrophic taxa contributing to the community transcriptome included α-Proteobacteria (19% of annotated sequences, most of which were SAR11-related) and γ-Proteobacteria (4%). The composition of transcript pools was consistent with models of prokaryotic gene expression, including operon-based transcription patterns and an abundance of genes predicted to be highly expressed. Metabolic activities that are shared by many microbial taxa (e.g. glycolysis, citric acid cycle, amino acid biosynthesis and transcription and translation machinery) were well represented among the community transcripts. There was an overabundance of transcripts for photosynthesis, C1 metabolism and oxidative phosphorylation in the day compared with night, and evidence that energy acquisition is coordinated with solar radiation levels for both autotrophic and heterotrophic microbes. In contrast, housekeeping activities such as amino acid biosynthesis, membrane synthesis and repair, and vitamin biosynthesis were overrepresented in the night transcriptome. Direct sequencing of these environmental transcripts has provided detailed information on metabolic and biogeochemical responses of a microbial community to solar forcing.

Introduction

Oceanic subtropical gyres make up 40% of the Earth's surface and play critical roles in carbon fixation and nutrient cycling. The Hawaii Ocean Time-Series (HOT) in the North Pacific subtropical gyre was established to provide a long-term perspective on oceanographic properties of such systems (Karl and Lukas, 1996) and has served as the focus of substantial research into the role of marine microorganisms in ocean biogeochemistry (Karl et al., 1997; Cavender-Bares et al., 2001; Zehr et al., 2001). Station ALOHA, the core study site at HOT, is characterized by warm (> 23°C) surface waters with low NO₃⁻ concentrations (< 15 nM), seasonally variable surface mixed-layers (10–120 m), low standing biomass of living organisms (10–15 µg C l⁻¹) and a persistent deep (75–140 m) chlorophyll a maximum layer. Since 1988, regular measurements of physical, chemical and biological parameters have been obtained with monthly ship-based monitoring as well as bottom-moored instruments and buoys. Recent metagenomic sampling efforts at Station ALOHA have provided information about the genes harboured by the bacterioplankton community and how they are distributed with depth (DeLong et al., 2006). Characterizing patterns of expression of these microbial genes and identifying what factors induce their expression is the next critical step in understanding this oceanic ecosystem.

Analogous to metagenomics, environmental transcriptomics (metatranscriptomics) retrieves and sequences environmental mRNAs from a microbial assemblage without prior knowledge of what genes the community might be expressing (Poretsky et al., 2005; Frias-Lopez et al., 2008). Thus it provides a less biased perspective on microbial gene expression in situ compared with other approaches (Wawrik et al., 2002; Bürgmann et al., 2003; Zhou, 2003). Environmental transcriptomics protocols are technically difficult, however, as prokaryotic mRNAs generally lack the poly(A) tails that make isolation of eukaryotic messages relatively straightforward (Liang and Pardee, 1992) and because of the relatively short half-lives of mRNAs (Belasco, 1993). In addition, mRNAs are much less abundant than rRNAs in total RNA extracts, thus an rRNA background often overwhelms mRNA signals.

Received 17 September, 2008; accepted 3 December, 2008. *For correspondence. E-mail [email protected]; Tel. 706-542-6481; Fax 706-542-5888.

Environmental Microbiology (2009) 11(6), 1358–1375 doi:10.1111/j.1462-2920.2008.01863.x

© 2009 The Authors. Journal compilation © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd

A first analysis of environmental transcriptomes by creating clone libraries using random primers to reverse-transcribe and amplify environmental mRNAs was successful in two different natural environments (Poretsky et al., 2005), but results were biased by selection of the random primers used to initiate cDNA synthesis. Techniques to linearly amplify mRNA obviate the need for random primers in the amplification step and make it possible to use less starting material (Gelder et al., 1990), while recently developed pyrosequencing technologies allow direct sequencing (without cloning) (Margulies et al., 2005). Initial application of this approach at Station ALOHA (Frias-Lopez et al., 2008) and in coastal water mesocosms (Gilbert et al., 2008) demonstrated its utility for characterizing microbial community gene expression.

Here we use environmental transcriptomics to elucidate day/night differences in gene expression in surface waters of the North Pacific subtropical gyre (Karl and Lukas, 1996). This analysis provides information on the dominant metabolic processes within the bacterioplankton assemblages and reveals changes in expression patterns of biogeochemically relevant processes.

Results

cDNA sequence annotation

The cDNAs prepared from amplified RNA (collected from the 0.2–5 µm size fraction) ranged in size from 100 bp to 1 kb, with the majority between 200 and 500 bp. The average picoliter reactor pyrosequencing read length was 99 bp, typical for the GS 20 sequencing platform. Predicted rRNA sequences were removed based on sequence similarity to the nt database using BLASTN. While more laborious than our initial approach that used sequence similarity to the RDP II database supplemented with an 18S, 23S and 28S rRNA database from genome sequences, it identified nearly all of the rRNA sequences in our libraries. Accurate identification of rRNAs is crucial because of numerous misidentified sequences in the RefSeq protein database (i.e. rRNA sequences that are incorrectly annotated as putative proteins). Relatively low rRNA sequence contamination (37%) compared with the rRNA content of prokaryotic cells (> 80%; Ingraham et al., 1983) indicated that the steps for excluding rRNAs through selective degradation and subtractive hybridization were largely successful.

Sequences remaining after deletion of rRNA sequences (75 558 from the day and 75 946 from the night) were categorized as possible protein encoding sequences and BLASTX-queried against the NCBI curated, non-redundant reference sequence database (RefSeq) to determine putative functions (Fig. 1). About one-third of HOT pyrosequences in each library met the criteria for gene predictions determined empirically by in silico analysis of known functional gene sequences fragmented into 100 bp pieces (see Experimental procedures for more details). This is nearly twice the fraction of reads identified in metagenomic efforts with similar pyrosequencing read lengths (Frias-Lopez et al., 2008; Mou et al., 2008), as might be expected for sequences biased towards coding regions of genomes. These sequences were subsequently assigned to the function of their best hit in RefSeq. Transcript abundance was analysed as relative abundance within the collective community transcriptome rather than per-gene expression levels (see Frias-Lopez et al., 2008). Empirically derived criteria were established in separate in silico analyses for the Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, which contain fewer sequences than RefSeq (Fig. 1). Some of the sequences without hits in RefSeq were similar to proteins in the Global Ocean Sampling database, indicating that similar sequences have been found in marine bacterioplankton communities, but functional annotation is not currently possible.
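As a quick arithmetic check on the read counts quoted above, the following minimal sketch recomputes the rRNA and putative mRNA shares of the pipeline input. The counts are those reported in the text and in Fig. 1; the variable and function names are illustrative only and are not part of the authors' pipeline.

```python
# Counts as given in the text and in Fig. 1 (day and night libraries combined).
TOTAL_READS = 240_422        # all 454 reads entering the pipeline
RRNA_READS = 88_916          # reads flagged as rRNA by BLASTN against nt
DAY_MRNA_READS = 75_558      # putative mRNA reads, day library
NIGHT_MRNA_READS = 75_946    # putative mRNA reads, night library


def percent_of_total(count, total=TOTAL_READS):
    """Return count as a whole-number percentage of the pipeline input."""
    return round(100 * count / total)


putative_mrna = DAY_MRNA_READS + NIGHT_MRNA_READS

print(f"possible protein-encoding reads: {putative_mrna}")
print(f"rRNA share: {percent_of_total(RRNA_READS)}%")
print(f"putative mRNA share: {percent_of_total(putative_mrna)}%")
```

The combined day and night count matches the number of possible protein-encoding sequences shown in Fig. 1, and the rounded shares agree with the 37% rRNA contamination reported in the text.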

At the end of the annotation pipeline, half of the possible protein-encoding sequences in each library had no significant hits to previously sequenced genes. To examine how sequences from uncultured marine bacterial taxa might decrease annotation success or skew taxonomic assignments, we randomly selected 100 bp sequences from the coding regions of genome fragments from SAR86 and SAR116 cells captured in environmental BAC libraries (SAR86 BAC, AF279106; SAR86 BAC, AY552545; SAR116 BAC, AY744399). Excluding self-hits, approximately 60% of the sequences from the BACs had no hits in RefSeq (Table S1). In a similar analysis of coding sequences from cultured taxa with genome sequences available (Pelagibacter ubique HTCC1062 and Prochlorococcus marinus MIT9312), only ~20% of the sequences had no hits in RefSeq. Many unannotated sequences in the HOT libraries are therefore likely to be transcripts from poorly known taxa, but also include some transcripts from well-known taxa with poor identity to sequence databases for that particular 100 bp fragment. In support of the latter, a preliminary analysis of a marine environmental transcriptome consisting of longer reads (~200 bp; 454 GS FLX sequencing platform; R.S. Poretsky and M.A. Moran, unpublished; and Table S1) resulted in twice the frequency of annotated sequences as the HOT metatranscriptome. For the 100 bp genome fragments from uncultured taxa that had significant hits in RefSeq, the hits were almost always to a gene from an organism in the same phylum (90%) or subphylum (70%), and thus did not significantly skew the taxonomic assignments (Table S1). SAR86, SAR116 and other currently recognized uncultured groups made up ~4% of the 16S rRNA amplicons from these samples (see below). Finally, to examine the possibility that the unidentified sequences were from non-protein-coding regions, these sequences were BLAST-queried to tRNA genes, 5S rRNA genes and intergenic region sequences from three P. marinus genomes (MIT9301, MIT9312 and AS9601) and two P. ubique genomes (HTCC1002 and HTCC1062). Based on this analysis, ~4% of the 76 327 unidentified sequences were from non-protein-coding regions of these genomes, and these primarily hit intergenic regions.
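The in silico test described above draws random 100 bp fragments from known coding regions to mimic pyrosequencing reads. The sketch below illustrates one way such fragments could be drawn. It is an assumption-laden illustration, not the authors' procedure: the function name `sample_fragments` is hypothetical, and genes are chosen uniformly rather than in proportion to their length.

```python
import random


def sample_fragments(coding_seqs, n_fragments, frag_len=100, seed=0):
    """Draw fixed-length fragments at random positions within coding sequences.

    Assumption: each fragment comes from a gene chosen uniformly at random,
    which over-samples short genes relative to a length-weighted draw.
    """
    rng = random.Random(seed)
    # Only sequences at least as long as the fragment can be sampled.
    eligible = [seq for seq in coding_seqs if len(seq) >= frag_len]
    if not eligible:
        raise ValueError("no coding sequence is long enough to fragment")
    fragments = []
    for _ in range(n_fragments):
        seq = rng.choice(eligible)
        start = rng.randrange(len(seq) - frag_len + 1)
        fragments.append(seq[start:start + frag_len])
    return fragments


# Toy input: two made-up coding sequences (not real genes).
toy_genes = ["ATGC" * 60, "GGCATTAC" * 20]
reads = sample_fragments(toy_genes, n_fragments=5)
print([len(r) for r in reads])
```

Fragments produced this way would then be queried against the reference database, with self-hits excluded, to estimate what fraction of short reads from a given taxon can be annotated.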

Community composition and taxonomic origin of transcripts

Prochlorococcus are the most abundant Cyanobacteria at Station ALOHA (> 95% of photosynthetic picoplankton cells; Campbell and Vaulot, 1993) and in this study accounted for approximately 2 × 10⁵ cells ml⁻¹ (based on flow cytometric counting; http://hahana.soest.hawaii.edu/hot/hot-dogs/), or ~30% of the total microbial community (Fig. 2). Heterotrophic bacteria (including phototrophs) were numerically dominant with ~5 × 10⁵ cells ml⁻¹, accounting for ~65% of the microbial community present at the time of sampling. Direct counts also indicated the presence of ~800 cells ml⁻¹ of pigmented nanoeukaryotes (0.2%; Fig. 2).

Companion PCR-based 16S rRNA clone libraries were generated from DNA collected in tandem with the RNA samples and demonstrated close agreement with the flow cytometric data in terms of taxonomic composition at Station ALOHA. Cyanobacteria accounted for ~20% of the 16S rRNA sequences, and heterotrophic bacterial groups were ~80% (Fig. 3). Among the heterotrophic 16S rRNA sequences, Proteobacteria were most abundant (41%; Fig. 3) and were dominated by α-Proteobacteria (22%), β-Proteobacteria (8%) and γ-Proteobacteria (8%). Bacteroidetes (8%) and Firmicutes (12%, biased towards the day sample) were also well represented.

Fig. 1. The mRNA annotation pipeline developed for 454 transcript reads showing combined counts for the day and night transcriptomes. All percentages are relative to the total number of sequences entering the pipeline. [Flow chart not reproduced. Recoverable counts: 240,422 total 454 sequences; 88,916 rRNA sequences (BLASTN against nt); 151,504 possible protein-encoding sequences; 48,648 identified sequences; 102,856 unidentified; 26,366 GOS sequences; 76,327 remaining unidentified.]

Taxonomically binned mRNA sequences were compared with community composition data to ask whether taxa contributed to the HOT community mRNA in proportion to their representation in the microbial assemblage (i.e. whether taxa are equally transcriptionally active on a per-cell basis). Cyanobacteria dominated the transcript libraries (55% of sequences) with about twofold higher representation than in the 16S rRNA amplicons or the cell count data (Fig. 3), indicating that there is more gene expression in these autotrophic bacterioplankton than in co-occurring heterotrophs (or possibly that their transcripts are longer-lived). When relative 16S rRNA abundance was calculated among just the heterotrophic groups (i.e. with cyanobacterial sequences removed), many taxa had similar contributions to the transcript pool and amplicon pool, suggesting comparable levels of transcriptional activity on a per-gene basis within the limits of recognized biases of PCR amplification (Fig. 3).

Proteobacteria contributed the second largest number of transcript sequences (28%), most of which were attributed to α-Proteobacteria (19%) and γ-Proteobacteria (4%). Approximately 2% of the total transcripts were of eukaryotic origin. Comparing putative taxonomic assignments of transcripts between day and night, Cyanobacteria contributed equally to the day and night transcriptome (55% versus 56%) as did α-Proteobacteria (40% versus 45% of heterotrophic transcripts) and γ-Proteobacteria (11% versus 8% of heterotrophic transcripts) (Fig. 3).

More detailed taxonomic assignment of transcripts was carried out for the best represented clades. The Cyanobacteria transcripts were dominated by Prochlorococcus-like sequences most similar to P. marinus AS9601, P. marinus MIT 9301 and P. marinus MIT 9312 (Table 1). The α-Proteobacteria, the most transcriptionally active among the heterotrophic groups, mostly contained sequences with similarity to the SAR11 group members P. ubique HTCC1002 and P. ubique HTCC1062 (~10% of prokaryotic transcripts). Roseobacter-like sequences were also represented and were primarily assigned to Dinoroseobacter shibae DFL 12, Jannaschia sp. CCS1, Silicibacter pomeroyi DSS-3, Roseobacter denitrificans Och 114 and Silicibacter sp. TM1040 (Table 1 and Fig. 4). These assignments do not imply that these actual species were present at the time of sample collection, but rather they represent the best current sequence matches for some of the more abundant environmental transcripts.

Fig. 2. Depth profiles of Prochlorococcus-like, Synechococcus-like, heterotrophic bacteria and pigmented nanoeukaryotes during the HOT-175 cruise, as determined by flow cytometry. The horizontal line indicates the mixed layer depth. The depth profile for chlorophyll a is also indicated. Data were collected through the HOT project and downloaded from the HOT Data Organization and Graphical System (http://hahana.soest.hawaii.edu/hot/hot-dogs/). [Plot not reproduced; y-axis: depth (m), 0–200.]

Transcriptome coverage

To estimate transcriptome coverage, 16S rRNA clone library data were used to establish a taxon-abundance model for the HOT community at an identity level of 99%. Assuming that each taxon expresses 1000 different genes at any given time (based on the Escherichia coli model; Ingraham et al., 1983) and that genome coverage follows a Lander–Waterman model (Lander and Waterman, 1988), we estimate that the most abundant taxon in the day or night sample had over 90% transcriptome coverage (i.e. 90% of the expressed genes were sequenced at least once), while the 15 most abundant taxa had more than half of their transcriptome represented (Table S2). Alternatively, we determined the singletons and doubletons among the COG categories (i.e. the number of COGs containing only one or two sequences) and applied the Chao1 index of diversity to determine the theoretical abundance of COGs in the day and night. The sequencing effort captured about 80% of the COGs predicted to be present in the night transcriptome and 70% of the COGs predicted for the day transcriptome (Table S2).
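The two coverage estimates above can be illustrated with a short sketch. It rests on stated assumptions rather than on the authors' exact calculation: the Lander–Waterman expectation is reduced to its Poisson form, 1 − exp(−c), with c taken as reads per expressed gene and all genes treated as equally expressed; and the classic Chao1 formula is used, with the bias-corrected variant as a fallback when there are no doubletons. The function names and the example inputs are hypothetical.

```python
import math


def expected_fraction_detected(reads_for_taxon, expressed_genes=1000):
    """Poisson (Lander-Waterman style) expectation of genes seen at least once.

    Assumes reads fall uniformly across the expressed genes, which real
    transcript pools do not satisfy; treat the result as a rough guide.
    """
    coverage = reads_for_taxon / expressed_genes
    return 1.0 - math.exp(-coverage)


def chao1(observed, singletons, doubletons):
    """Chao1 richness estimate from singleton and doubleton category counts."""
    if doubletons > 0:
        return observed + singletons ** 2 / (2 * doubletons)
    # Bias-corrected form, used here only when no doubletons were observed.
    return observed + singletons * (singletons - 1) / 2


# Hypothetical inputs chosen only to exercise the functions.
print(expected_fraction_detected(2500))          # fraction of 1000 genes detected
print(chao1(observed=100, singletons=20, doubletons=10))
```

Under these assumptions a taxon needs roughly 2.3 reads per expressed gene before the expected coverage passes 90%, and the ratio of observed to Chao1-predicted COGs gives the captured fraction.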

Fig. 3. Contribution of taxa to the 16S rRNA amplicon pool and transcript pool for the day (A) and night (B) samples. Taxonomy is presented to the phylum level (based on NCBI taxonomy) except for Proteobacteria, which is at the subphylum level. The dashed red lines indicate cyanobacterial abundance in the night sample as determined by flow cytometric counting. [Charts not reproduced. Across the two panels, Cyanobacteria are labelled as 18% and 21% of the 16S rRNA gene pools and 55% and 56% of the mRNA pools.]

Based on these coverage estimates, increased sequencing depth would have been required to fully capture some specialized processes carried out by rarer members of the HOT community, but frequently transcribed genes from abundant taxa were well represented. In support of this, transcript mapping to the three P. marinus and two P. ubique reference genomes showed sequences with homology to approximately half the genes, at coverage depths ranging from 1 to nearly 500 hits per gene (Fig. 5). Moreover, many of the reference genes with the greatest coverage are those mediating metabolic processes expected to be dominant in the HOT bacterioplankton community (e.g. the photosynthesis genes psaA and psaB, the light-harvesting complex and RuBisCo, ammonium transporters and transcription-related genes; Fig. 5). Other genes on the reference genomes for which there is similarly deep transcript coverage (e.g. proteorhodopsin, Na⁺/solute symporters, colicin V production and several hypothetical proteins) can be hypothesized to also represent dominant metabolic activities (Fig. 5).

Table 1. Number of sequences from the community transcriptome with highest homology to the listed reference genomes, as determined by top BLASTX hit to RefSeq.

Reference genome                           Night    Day
Prochlorococcus marinus str. MIT 9301       6309   6292
Prochlorococcus marinus str. AS9601         3214   2849
Pelagibacter ubique HTCC1002                2541   1851
Prochlorococcus marinus str. MIT 9312       1430   1264
Pelagibacter ubique HTCC1062                1308    944
Dinoroseobacter shibae DFL 12                 48     34
Jannaschia sp. CCS1                           41     27
Silicibacter pomeroyi DSS-3                   39     30
Roseobacter denitrificans Och 114             30     28
Silicibacter sp. TM1040                       19     26

Fig. 4. Evidence for prokaryotic gene expression patterns in the community transcriptome based on P. marinus, P. ubique and Roseobacter genome bins.
A. Operon-based expression was evaluated by comparing the number of adjacent transcripts (closed circles) to the number of adjacent genes found in 1000 random samples of the same size from the reference genome (black lines).
B. Preferential representation of transcripts from genes predicted to be highly expressed was evaluated by comparing the per cent of PHX genes in the reference genome (grey bar) to the per cent in the transcript pool (black bar). Differences between transcript pools and reference genomes were significant for both operon and PHX analyses (Wilcoxon signed-rank test; P < 0.05).
[Plots not reproduced; axes: frequency versus number of adjacent genes (A) and % PHX genes (B).]

Fig. 5. Mapping of transcripts to five reference genomes. A–C are P. marinus strains; D–E are P. ubique strains. The x-axis shows gene number in the reference genome. Shaded areas represent possible hypervariable regions with few mapped transcripts. [Plots not reproduced; y-axis: occurrences.]

Operon signature in environmental transcript pools

Genes that encode steps in the same metabolic pathway are frequently clustered into operons in prokaryotic genomes (Overbeek et al., 1999) to facilitate coordinated transcription. Thus a cell's transcript pool is anticipated to include more mRNAs from adjacent genes than what is expected from a random sampling of the genome. We tested this using the transcripts assigned to taxonomic bins for P. marinus, P. ubique and Roseobacter by counting the frequency with which transcripts from two adjacent genes on the reference strain genome (defined as ≤ 1 gene intervening) were both present in the bin, recognizing that the wild and reference organisms will not be fully syntenic. In all cases, the transcript bins had significantly more adjacent genes than a null distribution generated from the reference genomes (Fig. 4A), suggesting that random transcript sequencing captures operon-based expression patterns in natural marine bacterioplankton communities.
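The adjacency comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: genes are represented by their integer position on a linear reference genome, adjacency means at most one intervening gene, and the null distribution comes from 1000 random gene sets of the same size. The empirical P-value at the end is a simple stand-in and is not the Wilcoxon signed-rank test reported in the Fig. 4 legend; all function names are hypothetical.

```python
import random


def count_adjacent_pairs(gene_indices, max_intervening=1):
    """Count detected gene pairs separated by at most max_intervening genes."""
    present = set(gene_indices)
    pairs = 0
    for gene in present:
        # A step of 1 is a direct neighbour; a step of 2 has one gene between.
        for step in range(1, max_intervening + 2):
            if gene + step in present:
                pairs += 1
    return pairs


def adjacency_null(n_genes_in_genome, n_detected, n_samples=1000, seed=0):
    """Adjacent-pair counts for random gene sets of the same size."""
    rng = random.Random(seed)
    return [
        count_adjacent_pairs(rng.sample(range(n_genes_in_genome), n_detected))
        for _ in range(n_samples)
    ]


def empirical_p(observed, null_counts):
    """Share of null samples at least as extreme as the observation."""
    as_extreme = sum(1 for count in null_counts if count >= observed)
    return (as_extreme + 1) / (len(null_counts) + 1)


# Toy example: 50 detected genes forming one contiguous block in a
# 2000-gene genome (an exaggerated operon-like signal).
detected = list(range(50))
observed = count_adjacent_pairs(detected)
null = adjacency_null(n_genes_in_genome=2000, n_detected=len(detected))
print(observed, max(null), empirical_p(observed, null))
```

A transcript bin whose observed pair count sits far above every value in the null distribution shows the clustering expected from operon-based transcription.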

Predicted highly expressed genes in environmental transcript pools

Genes that are frequently transcribed by a cell can be identified based on patterns in codon usage (Karlin and Mrázek, 2000). We identified predicted highly expressed (PHX) genes for the reference genomes, and then assigned PHX status to the transcripts with best hits to that reference genome based on homology. For all taxa, and in accordance with biological expectations, the environmental transcript bins had a significantly higher percentage of PHX genes than the reference genomes (Fig. 4B). This pattern was particularly evident for the Roseobacters (9% of the genes in the reference genomes are PHX versus 30% of the transcripts; 3.1-fold enrichment) and for P. marinus MIT9301 (4.6% versus 12.9%; 2.8-fold enrichment). A larger proportion of PHX transcripts were found in the day for all P. marinus bins and the Roseobacter bin (although not for P. ubique), suggesting that highly expressed genes more frequently mediate daytime-biased processes (data not shown).

Metatranscriptomic comparison of day and night samples

The majority of annotated transcripts (~80%) were assigned to genes related to metabolism, and in particular to three KEGG categories: amino acid transport and metabolism, energy production and conversion (particularly oxidative phosphorylation, carbon fixation and nitrogen metabolism), and carbohydrate transport (Fig. 6). Membrane transport and signal transduction pathways were also common in the community transcriptome, specifically for ABC transporters of amino acids, glycine betaine/L-proline, polyamines (spermidine and putrescine), iron and nutrients in the form of nitrate, phosphate and phosphonate.

The day/night samples allowed comparison of dominant expression patterns in the presence and absence of solar radiation in the bacterioplankton community. Among the 167 KEGG metabolic pathways represented in the annotated sequences, four pathways were better represented at night (including those for glycosphingolipid biosynthesis and nucleotide sugars metabolism) and six were better represented in the day (including photosynthesis and oxidative phosphorylation) (95% confidence level; Table 2). Some KEGG pathways had significant diel differences in frequency for individual taxonomic bins. These include: histidine biosynthesis, with evidence for expression of all or nearly all genes in the pathway (both P. ubique and P. marinus at night; Fig. 7A and Fig. S1A); metabolism of glutathione, a reductant with multiple detoxifying and cytoprotective capabilities (P. marinus at night); the photosynthesis pathway (phycobilisome, photosystem I and II, cytochromes, ATP synthase) and nearly all genes involved in biosynthesis of phytoene, and subsequent conversion into carotenoids (P. marinus in the day; Fig. 7B); nucleotide sugars metabolism, glycosphingolipid biosynthesis, carotenoid biosynthesis and vitamin B6 metabolism (P. ubique in the night; Fig. S1B); and transfer of methyl groups for C1 metabolism (P. ubique and Roseobacter in the day) (Table S3).

Transcript annotation based on the COG database was comparable. Among the 1577 COGs represented, statistical comparisons identified 12 that were better represented at night and 13 that were better represented in the day (Table S4). These included amino acid and nucleotide metabolism, membrane biosynthesis and polyamine dehydrogenation at night, and light-mediated energy production, protein turnover, catalase synthesis and inorganic ion transport and metabolism in the day.

Statistically significant differences in the distribution of transcripts between the day and night samples were also assessed independently of KEGG and COG assignments in order to capture signals from genes not currently classified by these annotation systems. Among the additional significant functions overrepresented in the night transcriptome were those for ABC-type spermidine/putrescine transport system permeases, RNA methyltransferases and signal transduction histidine kinases. For the day transcriptome, genes encoding proteorhodopsin and an aromatic-ring hydroxylase were significantly overrepresented (Table S5).

Eukaryotic sequences

The majority of eukaryotic transcripts were most closely affiliated with sequences from green-lineage organisms (Viridiplantae), such as the picoeukaryotic prasinophytes Ostreococcus spp. (Derelle et al., 2006) and Micromonas spp. A large number of transcripts also appeared to be


most closely related to genes in Chromalveolata (Stramenopile or Alveolate) genomes. These groups are major components of the picoeukaryotic phytoplankton (McDonald et al., 2007) and are small enough to pass the 5 µm prefilter used in this study. Gene transcripts that most closely matched reference genomes of photosynthetic eukaryotes were more abundant in the day compared with night sample. Among the most highly

Fig. 6. The 50 most abundant KEGG pathways in the night (black) and day (gray) transcriptomes. The pathways marked with stars were significantly overexpressed in one of the pools as determined by comparisons with P < 0.05 (Rodriguez-Brito et al., 2006).


expressed genes detected from eukaryotic organisms were those encoding chlorophyll binding proteins, light harvesting reactions and photosynthetic machinery (Fig. 8). These included a photosystem II D1 reaction-centre protein related to that from the diatom Thalassiosira pseudonana, as well as the plastid-encoded photosystem I subunit protein similar to psaB from the diatom Odontella sinensis. Evidence for stramenopile nitrogen metabolism via urea cycle activity was also detected based on several transcripts that most closely matched stramenopile carbamoyl phosphate synthetase III, indicating that the unique diatom urea cycle (Armbrust et al., 2004; Allen et al., 2006) is likely active in natural populations of stramenopile picophytoplankton.

qPCR quality control

The half-life of microbial transcripts can be as short as 30 s based on studies of mRNAs of cultured bacteria (Belasco, 1993), while processing times for environmental nucleic acid samples can take hours (Fuhrman et al., 1988). Linear amplification of RNA greatly reduces the time between initiation of sampling and capture of transcripts because sample volumes can be reduced, but it has potential to introduce bias into the sequenced mRNA pool. A previous test with mRNA from the cultured marine bacterium S. pomeroyi DSS-3 demonstrated minor bias and good repeatability during linear amplification (Bürgmann et al., 2007). Here, we assessed the full environmental transcriptomic sequencing protocol by comparing qPCR-based ratios of selected genes in day versus night total RNA fractions to the pyrosequencing-based ratio of these same genes in the sequenced transcript pools. Five genes common in the transcriptome (P. marinus-like recA and psaA, P. ubique-like proteorhodopsin and Na+/solute symporter, and P. torquis-like membrane proteinase) showed a strong positive correlation between night and day ratios in the original RNA pool and the pyrosequence data sets (r = 0.94, Fig. S2), indicating that the sequenced metatranscriptome was representative of the unamplified mRNA pool.
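The comparison above reduces to a Pearson correlation between two sets of per-gene ratios. The following sketch shows that calculation; the ratio values are hypothetical placeholders for five genes and are not the measurements behind the reported r = 0.94.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical log2(night:day) ratios for five genes (illustration only).
qpcr_ratios = [-1.8, -0.9, 0.2, 0.7, 1.5]
pyroseq_ratios = [-1.5, -1.1, 0.4, 0.5, 1.9]
print(pearson_r(qpcr_ratios, pyroseq_ratios))
```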

Discussion

The HOT program provides comprehensive, long-term oceanographic information for the oligotrophic North Pacific Ocean (Karl and Lukas, 1996). In situ dissolved organic constituents at 25 m depth at Station ALOHA are typically 70–110 µM for carbon, 5–6 µM for nitrogen and 0.2–0.3 µM for phosphorus; ammonium concentrations in these waters (~50 nM) are below the detection limit of standard nutrient analysis (http://hahana.soest.hawaii.edu/hot/hot-dogs/). Surface water nutrient data over the past several decades for the month of November, the month in which the community transcriptomes in this study were obtained, and taken during various times of day show no discernible differences in organic and inorganic carbon, nitrogen, and/or phosphorus concentrations at Station ALOHA on a diel basis.

Building on previous metagenomic and transcriptomic analyses of this system (DeLong et al., 2006; Frias-Lopez et al., 2008), this day/night environmental transcriptomics effort provides insight into the temporal patterns of bacterioplankton metabolic processes and ecological activities (Table 3). Three important caveats of the analysis are that: (i) the composition of the environmental transcriptomes may be inadvertently shaped by collection and filtration manipulations, (ii) mRNAs with intrinsically shorter half-lives are less likely to be stabilized and sequenced and (iii) only 32% of the 151 000 possible transcript sequences could be confidently assigned to a known function (Fig. 1). Despite these concerns, the community transcriptomes provided reasonable coverage of mRNAs from the dominant organisms, and the relative representation of transcripts was corroborated by RT-qPCR-based expression analyses (Fig. S2).

The community transcriptomes had properties consistent with expected attributes of the HOT ecosystem, including the apparent taxonomic affiliations of transcripts. Closely related P. marinus reference strains that are members of high light clade eMIT9312 comprised the most populated transcript bin. This clade has been shown to dominate in the upper euphotic zone (< 50 m) at low

Table 2. KEGG pathways significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.05).

Pathway ID   Pathway                                            Category
path00520    Nucleotide sugars metabolism                       Carbohydrate Metabolism
path00521    Streptomycin biosynthesis                          Biosynthesis of Secondary Metabolites
path00602    Glycosphingolipid biosynthesis – neo-lactoseries   Glycan Biosynthesis and Metabolism
path00603    Glycosphingolipid biosynthesis – globoseries       Glycan Biosynthesis and Metabolism
path00190    Oxidative phosphorylation                          Energy Metabolism
path00195    Photosynthesis                                     Energy Metabolism
path03010    Ribosome                                           Translation
path03020    RNA polymerase                                     Transcription
path04940    Chaperonin                                         N/A
path05060    Chaperonin                                         N/A


and mid latitudes (below 30°) (Johnson et al., 2006), much like the HOT stations from which our samples were collected. SAR11-like sequences comprised the second largest taxonomic bin. This taxon is the most numerous heterotrophic marine bacterioplankton group, particularly in oligotrophic oceans where it makes up 30–40% of cells in the euphotic zone (Morris et al., 2002).

Studies of taxonomic composition of ocean assemblages consistently show the numerical importance of α- and γ-Proteobacteria, Cyanobacteria, and Bacteroidetes (Morris et al., 2002; DeLong et al., 2006; Rusch et al., 2007), but little is known about how abundance specifically relates to activity levels. Based on comparisons of the relative abundance of taxa (flow cytometry counts and 16S rRNA amplicons) to their representation in the community transcriptome, by far the highest per-cell transcriptional activity level in the HOT ecosystem was seen for the Cyanobacteria. Assuming similar mRNA half-lives across the prokaryotic taxa, dominant autotrophs produced more transcripts per gene than any co-occurring heterotrophic group not only in the day, but also at night (Fig. 3). This may reflect an advantage of autotrophy over heterotrophy for maintaining cellular activity levels given the low concentration and refractory nature of organic carbon fuelling heterotrophic activity in the oligotrophic ocean (Bauer et al., 1992).

As expected, many transcripts involved in light-mediated processes, such as photosynthesis and proteorhodopsin activity, were among those overrepresented in the community transcriptome in the day. Transcripts involved in protection or repair of light-induced DNA and protein damage (e.g. catalase, chaperones, photolyases, superoxide dismutase and various DNA repair proteins) were also common in the day sample. Evidence of daytime C1 utilization by some heterotrophs suggests a source of C1 compounds or methyl groups in this

Fig. 7. Transcript mapping to the KEGG histidine metabolism pathway for P. ubique, overrepresented at night (A) and the biosynthesis of steroids and carotenoids pathway for P. marinus, overrepresented in the day (B). Colour (blue for night, yellow for day) indicates that transcripts were found; grey indicates that genes were present in the reference genome but no transcripts were found; white indicates that genes were not present in the reference genomes.

Fig. 8. Number of eukaryotic transcripts in day (top bars) compared with night (bottom bars) samples. The relative contribution of Viridiplantae (green), photosynthetic Chromist algae (yellow), and other Chromist (red) transcripts to each Gene Ontology (GO) annotation category are depicted. GO categories shown: electron transport; photosynthesis, light reaction; phosphorus metabolic process; oxidative phosphorylation; ion transmembrane transporter activity; energy derivation by oxidation of organic compounds; heme binding; cellular biosynthetic process; protein metabolic process; cellular macromolecule metabolic process; organelle organization and biogenesis; DNA metabolism; organic acid metabolic process; carbon utilization by fixation of carbon dioxide; aldehyde metabolic process; macromolecular complex assembly; cellular component assembly; ribonucleoprotein complex biogenesis and assembly; macromolecule biosynthetic process; intracellular transport; aromatic compound metabolic process; biopolymer metabolic process; amino acid and derivative metabolic process. The horizontal axis spans 0–180 transcripts.


ecosystem. Compounds such as methanol and formaldehyde (Heikes et al., 2002; Carpenter et al., 2004; Giovannoni et al., 2008), methane (Ward et al., 1987), and methyl halides (Woodall et al., 2001; Schaefer et al., 2002) may be available to heterotrophic bacterioplankton in surface sea water. Dimethylsulphoniopropionate, an organic sulphur compound produced in abundance by marine phytoplankton (Kiene et al., 2000), is a rich source of methyl groups for surface ocean bacterioplankton, and tetrahydrofolate-mediated C1 transfer (i.e. transcripts mapping to the C1 pool by folate and methane metabolism KEGG pathway; Table S5) has been shown to play a role in its metabolism (Howard et al., 2006). Recovery of nearly four times as much mRNA per volume of sea water in the day (~30 ng l⁻¹) compared with night (~8 ng l⁻¹) is consistent with high relative abundance of RNA polymerase transcripts in the day (Table 2) and likely reflects increased gene expression when solar radiation is available.

Night-biased synthesis of vitamin B6, essential for a variety of amino acid conversions including transaminations, decarboxylations and dehydrations, in conjunction with evidence for other night-time activities such as the γ-glutamyl pathway for amino acid uptake, the overrepresentation of amino acid transport and metabolism genes, and the histidine synthesis pathway (Table 3 and Tables S4–S6), indicate that amino acid acquisition in

Table 3. Selected biogeochemically relevant genes in the HOT metatranscriptome.

Night Day

Nitrogen
  Nitrogenase (N fixation) nifH, nifU, nifS, nifB   + +
  Ammonium transport amt   + +*
  Ammonia monooxygenase amoA
  Assimilatory nitrate reductase narB   +
  Hydroxylamine oxidoreductase hao
  Nitrate permease napA   +
  Nitrite reductase nirA   +
  Dissimilatory nitrite reductase nirK, nirS
  Nitric oxide reductase norQ   +
  Nitrate transporter narK   +
  Urease ureC, ureE, ureF   + +

Methylotrophy
  Serine-glyoxylate aminotransferase   + +
  Formate dehydrogenase fdh, fdsD   + +
  Methylene tetrahydrofolate reductase metF   + +
  Methane monooxygenase mmo
  Methanol dehydrogenase mxa   +
  Methenyltetrahydromethanopterin cyclohydrolase mch   + +
  Crotonyl-CoA reductase   + +
  Formaldehyde-activating enzyme fae   +

Polyamine degradation
  Deoxyhypusine synthase dys2   +* +
  Spermidine/putrescine transport system permease potC   +* +
  Acetylpolyamine aminohydrolase aphA

Sulphur cycle
  Sulphur oxidation soxB, soxC, soxA, soxZ, soxF   + +
  Dimethylsulphoniopropionate demethylase dmdA

Glycine betaine
  Dimethylglycine dehydrogenase dmgdh   + +
  Glycine cleavage system (aminomethyltransferase) gcvT   +* +

Aromatic compounds
  Aromatic ring hydroxylase chlP   + +*
  Protocatechuate 3,4-dioxygenase pcaH
  Benzoyl-CoA oxygenase boxA   +

Carbon monoxide
  Carbon monoxide dehydrogenase coxS, coxM, coxL   + +

Phototrophy and C fixation
  Photosystem I multiple   + +*
  Photosystem II multiple   + +*
  Rubisco rbcL, rbcS   + +*
  Photosynthetic reaction centre, M subunit pufM   +
  Proteorhodopsin   + +*

Phosphate assimilation
  Phosphonate uptake phnD, phnC   + +
  Alkaline phosphatase phoA   + +
  Phosphate uptake pstA, pstS   + +

Amino acid metabolism
  Glutamate synthase gltB   + +
  Glutathione reductase gor   +* +
  Histidine kinase baeS   +* +
  Threonine synthase thrC   +* +

Trace metal uptake
  Selenium   +* +
  Iron tonB   + +
  Arsenite   +
  Arsenate reductase arsC   + +

A ‘+’ indicates occurrence in the night or day sample. An asterisk indicates significantly higher transcript frequency in one.


general may be a relatively more important metabolic activity in the night. Prochlorococcus marinus has recently been shown to exhibit diel patterns of amino acid uptake, with acquisition occurring predominantly at dusk (Mary et al., 2008). Our data agree with this and further suggest that heterotrophic taxa also devote a greater percentage of their transcriptome to transporting and synthesizing amino acids at night. Night-time accumulation of amino acids might be a mechanism for nitrogen storage by many organisms, particularly for P. marinus, which undergoes cell division at night. Histidine, the amino acid with the most consistent signal for synthesis at night by both autotrophs and heterotrophs (Fig. 7A and Fig. S1), is one of the most nitrogen-rich amino acids (only arginine has more amino groups).

Overall, bacterial community investment in this oligotrophic ocean system was skewed towards energy acquisition and metabolism during the day, while biosynthesis (specifically of membranes, amino acids and vitamins) received relatively greater investments at night. Many microbial processes expected to be differentially expressed over a day/night cycle, such as photosynthesis, oxidative phosphorylation and proteorhodopsin activity, were indeed captured in the sequence data. Less anticipated processes that emerged included the utilization of C1 compounds, the uptake of polyamines and the degradation of aromatic compounds (Table 3). Other metabolic processes ongoing in this microbial community, although without statistical evidence for day/night patterns, included: use of nitrate and urea as nitrogen sources; use of phosphate, phosphonate and carbon-oxygen-phosphorus (C-O-P) compounds as phosphorus sources; oxidation of reduced sulphur compounds; oxidation of carbon monoxide; and uptake of multiple trace metals (Table 3). This comparative analysis of microbial community transcripts has provided an inventory of ongoing metabolic processes, offered insights into their temporal patterns and supplied a new type of data for predictive modelling of environmental controls on ecosystem properties.

Experimental procedures

Sample collection

Samples were collected at the Hawaiian Ocean Time-series (HOT) Station ALOHA, defined by the 6-nautical-mile radius circle centred at 22°45′N, 158°W in November, 2005 (HOT-175). For RNA extraction, sea water was collected from a depth of 25 m using Niskin bottles on a conductivity-temperature-depth rosette sampler. A night sample was collected at 03:00 on 11 November 2005, and a daytime sample was collected at 13:00 on 13 November 2005. During HOT-175, the peak PAR level was at 12:00, with sunrise occurring around 07:00 and sunset just before 18:00. Sea water (80 l for the night sample and 40 l for the day sample) was prefiltered through a 5 µm, 142 mm polycarbonate filter (GE Osmonics, Minnetonka, MN) followed by a 0.2 µm, 142 mm Durapore (Millipore) filter using positive air pressure. The 0.2 µm filters were placed in a 15 ml tube containing 2 ml Buffer RLT (containing β-mercaptoethanol) from the RNeasy kit (Qiagen, Valencia, CA) and flash-frozen in liquid nitrogen for RNA extraction. For DNA extraction, an additional 20 l of sea water were simultaneously filtered using the protocol outlined above at both time points. The 0.2 µm filters were placed in Whirl-Pak bags and flash-frozen. The total sampling time from initiation of collection until freezing in liquid nitrogen was approximately 1.5 h. We obtained ~1 µg of total RNA from 40 to 80 l of sea water. Following mRNA enrichment and amplification, 30–100 µg of mRNA was available for conversion to cDNA for sequencing. Typically, only 3–5 µg of DNA was required for pyrosequencing.

RNA and DNA preparation

DNA was extracted using a phenol : chloroform-based protocol (Fuhrman et al., 1988). Briefly, frozen filters inside Whirl-Pak bags were transferred to 50 ml Falcon centrifuge tubes. Ten millilitre extraction buffer [SDS (10% Sodium Dodecyl Sulphate) : STE (100 mM NaCl, 10 mM Tris, 1 mM EDTA), 9:1] was added to the tubes and boiled in a water bath for 5 min. The extraction buffer was then removed from the tubes, placed into Oak Ridge round-bottom centrifuge tubes, to which 3 ml NaOAc and 28 ml 100% EtOH were added. Organic macromolecules were precipitated overnight at −20°C, before the tubes were centrifuged for 1 h at 15 000 g. The supernatant was decanted, and pellets dried for 30 min in the air. The pellets were resuspended in 600 µl deionized water, and sequentially extracted with 500 µl phenol, 500 µl phenol : chloroform : isoamyl alcohol (24:1:0.1), and 500 µl chloroform : isoamyl alcohol (9:1); after each extraction the organic phase was removed and discarded. The supernatant was removed into a fresh tube at the end of the last extraction, amended with 150 µl NaOAc and 1.2 ml 100% EtOH, and precipitated overnight. The tube contents were then centrifuged at 15 000 g for 1 h, the supernatant decanted, and pellets dried in a speed vacuum dryer for 10 min. The DNA pellets were resuspended in 100 µl DNAse and RNAse-free deionized water (Ambion).

RNA was extracted using a modified version of the RNeasy kit (Qiagen) that results in high RNA yields from material on polycarbonate filters (Poretsky et al., 2008). Frozen samples were first thawed slightly for 2 min in a 40–50°C water bath and then vortexed for 10 min with RNase-free beads from the Mo-Bio RNA PowerSoil kit (Carlsbad, CA). Following centrifugation for 5 min at 3000–5000 g, the supernatant was transferred to a new tube. Beginning with the RNeasy Midi kit, 1 vol. of 70% ethanol was added to the lysate and, in order to shear large-molecular-weight nucleic acids, the lysate was drawn through a 22-gauge needle several (~5) times. RNA extraction then continued with the RNeasy Mini kit according to the manufacturer’s instructions.

Following extraction, RNA was treated with DNase using the TURBO DNA-free kit (Ambion, Austin, TX). Two methods were employed to rid the RNA samples of rRNA. The RNA was first treated enzymatically with the mRNA-ONLY


Prokaryotic mRNA Isolation Kit (Epicentre Biotechnologies, Madison, WI) that uses a 5′-phosphate-dependent exonuclease to degrade rRNAs. The MICROBExpress kit (Ambion) subtractive hybridization with capture oligonucleotides hybridized to magnetic beads was subsequently used as an additional mRNA enrichment step.

In order to obtain µg quantities of mRNA, approximately 500 ng of RNA was linearly amplified using the MessageAmp II-Bacteria Kit (Ambion) according to the manufacturer’s instructions. Finally, the amplified, antisense RNA (aRNA) was converted to double-stranded cDNA with random hexamers using the Universal RiboClone cDNA Synthesis System (Promega, Madison, WI). The cDNA was purified with the Wizard DNA Clean-up System (Promega). The quality and quantity of the total RNA, mRNA, aRNA and cDNA were assessed by measurement on the NanoDrop-1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE) and the Experion Automated Electrophoresis System (Bio-Rad, Hercules, CA).

cDNA sequencing and quality control

cDNAs from each sample (night and day) were sequenced using the GS 20 sequencing system by 454 Life Sciences (Branford, CT) (Margulies et al., 2005), resulting in 10 682 120 bp from 106 907 reads for the night sample and 13 255 704 bp from 133 515 reads for the day sample. The average sequence length was 99 bp. The sequences have been deposited in the NCBI Short Read Archive with the Genome Project ID #33463.

rRNA identification and removal

For rRNA sequence identification, the sequences were clustered at an identity threshold of 98% based on a local alignment (number of identical residues divided by length of alignment) using the program Cd-hit (Li and Godzik, 2006). Ribosomal RNA sequences were identified by BLASTN queries of the reference sequence of each cluster against the non-curated, GenBank nucleotide database (nt) (Benson et al., 2007) using cut-off criteria of E-value ≤ 10⁻³, nucleic acid length ≥ 69 and per cent identity ≥ 40% previously established with in silico tests for rRNA sequence predictions of short pyrosequences (Frias-Lopez et al., 2008; Mou et al., 2008). We conservatively identified a sequence as rRNA-derived and removed it from the analysis pipeline if any of the top three BLASTN hits were to an rRNA gene.
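The rRNA screening rule can be sketched as a simple filter over BLASTN hits. The sketch reads the cut-offs as E-value ≤ 10⁻³, alignment length ≥ 69 nt and identity ≥ 40%, and assumes that the cut-offs are applied before the top three hits are inspected, an ordering the text does not specify. The `BlastnHit` record and its `is_rrna` flag are hypothetical; in practice the flag would have to be derived from the subject annotation.

```python
from dataclasses import dataclass

@dataclass
class BlastnHit:
    evalue: float
    alignment_length: int      # aligned length in nucleotides
    percent_identity: float
    is_rrna: bool              # hypothetical flag: subject is annotated as an rRNA gene

def passes_rrna_cutoffs(hit):
    """Apply the E-value, length and identity cut-offs to a single hit."""
    return (hit.evalue <= 1e-3
            and hit.alignment_length >= 69
            and hit.percent_identity >= 40.0)

def is_rrna_derived(hits, top_n=3):
    """Flag a read as rRNA if any of its top qualifying hits is to an rRNA gene.

    `hits` is assumed to be sorted from best to worst, as BLAST reports them.
    """
    qualifying = [h for h in hits if passes_rrna_cutoffs(h)]
    return any(h.is_rrna for h in qualifying[:top_n])
```

Because a single rRNA hit among the top three is enough to discard a read, the rule errs towards removing ambiguous sequences rather than retaining them.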

cDNA sequence annotation

The criteria for protein predictions generated using BLASTX against the NCBI curated, non-redundant reference sequence database (RefSeq) (Pruitt et al., 2005) were established with in silico tests to determine suitable cut-off limits for reliable functional prediction. For these tests, 100 arbitrarily selected, known functional gene sequences were fragmented into 20–500 bp fragments and analysed using BLASTX against RefSeq to determine if the best BLAST hit was to the correct gene function, excluding self-hits. Based on these analyses, the cut-off criteria for protein prediction were set as E-value < 0.01, identity > 40% and overlapping length > 23 aa to the corresponding best hit.
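A minimal sketch of the in silico test set-up follows. It assumes that each gene is cut into consecutive, non-overlapping fragments of random length between 20 and 500 bp, which is one plausible reading of the description and not necessarily the procedure the authors used; the function names are illustrative. The second function encodes the resulting cut-offs exactly as stated in the text.

```python
import random

def fragment_sequence(seq, min_len=20, max_len=500, rng=None):
    """Cut a sequence into consecutive fragments of random length (assumed scheme)."""
    rng = rng or random.Random(0)
    fragments = []
    pos = 0
    while pos < len(seq):
        length = rng.randint(min_len, max_len)
        fragment = seq[pos:pos + length]
        if len(fragment) >= min_len:   # drop a trailing piece that is too short
            fragments.append(fragment)
        pos += length
    return fragments

def passes_protein_cutoffs(evalue, percent_identity, overlap_aa):
    """Cut-offs for BLASTX-based protein prediction against RefSeq."""
    return evalue < 0.01 and percent_identity > 40.0 and overlap_aa > 23
```

Each fragment would then be queried with BLASTX and the best hit compared with the known function of the parent gene, which lies outside the scope of this sketch.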

Sequences with hits to RefSeq were assigned functional protein or pathway predictions based on the COG database (Tatusov et al., 2000) or KEGG database (Kanehisa and Goto, 2000). The cut-off criteria for functional protein prediction based on orthologous groups using BLASTX analysis against the COG database were established using the same in silico approach with 100 bp fragments of known functional genes as E-value < 0.1, identity > 40% and overlapping length > 23 aa to the corresponding best hit. The COG cut-off criteria were also applied to the KEGG database for pathway prediction because of the similarity in database size. Taxonomic binning of the sequences was carried out using MEGAN with the default settings for all parameters (Huson et al., 2007); this program assigns likely taxonomic origin to sequences based on the NCBI taxonomy of closest BLAST hits. The taxonomic affiliations of the putative mRNA sequences were predicted using MEGAN to the family level, and the top BLAST hit for any higher-resolution taxonomic assignments. All non-rRNA sequences that had no RefSeq hits were BLASTX-queried against the nr database as well as against CAMERA un-assembled ORFs predicted from the Global Ocean Survey reads (http://camera.calit2.net/index.php) (Seshadri et al., 2007).
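MEGAN places each read at the lowest common ancestor (LCA) of the taxa of its significant BLAST hits. The sketch below illustrates only that core idea on root-to-leaf lineages given as lists of names; it omits MEGAN's score thresholds and other parameters, and the function name and input format are assumptions made for illustration.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages, or "root" if none is shared.

    Each lineage is a list of taxon names ordered from the highest rank downwards.
    """
    if not lineages:
        return None
    shared = []
    for ranks in zip(*lineages):          # walk the lineages rank by rank
        if all(name == ranks[0] for name in ranks):
            shared.append(ranks[0])
        else:
            break                         # lineages diverge below this rank
    return shared[-1] if shared else "root"

hits = [
    ["Bacteria", "Cyanobacteria", "Prochlorococcus"],
    ["Bacteria", "Cyanobacteria", "Synechococcus"],
]
print(lowest_common_ancestor(hits))  # prints Cyanobacteria
```

A read whose hits span several genera is therefore binned at a higher rank, which is consistent with reporting MEGAN assignments only to the family level.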

Eukaryotic sequence annotation

Eukaryotic transcripts were binned by MEGAN. Sequences were queried (BLASTX) against a curated database of protein sequences derived from all available complete eukaryotic organelle and nuclear genomes (currently, 46 eukaryotic genomes). Transcripts that matched a reference protein sequence with > 60% identity and an E-value < 1e-10 were retained and the reference protein for the cluster was used for functional annotation. Functional annotation was performed using Java-based Blast2go (Conesa et al., 2005) that annotates genes based on similarity searches with statistical analysis and highlighted visualization on directed acyclic graphs.

16S rRNA gene libraries

PCR amplification of ribosomal DNA was carried out using primers 27F and 1522R (Johnson, 1994). The PCR conditions were as follows: 3 min at 96°C, followed by 30 cycles of denaturation at 95°C for 50 s, annealing at 58°C for 50 s, primer extension at 72°C for 1 min and a final extension at 72°C for 10 min. PCR products were cleaned using the QIAquick PCR Purification Kit (Qiagen) and multiple PCR reactions were pooled and cloned into pCR2.1 vector using the TOPO TA cloning kit (Invitrogen, Carlsbad, CA). PCR amplifications included standard no-template controls. Clones from each sample (192) were sequenced at the University of Georgia Sequencing Facility on an ABI 3100 (Applied Biosystems, Foster City, CA).

Predicted highly expressed genes

The PHX genes were determined for cultured representatives of three prokaryotic taxa that were well represented in the transcript libraries (Prochlorococcus, Roseobacter and


SAR11) using an algorithm developed by Karlin and Mrázek (2000). The algorithm is based on comparisons with codon usage patterns in genes expected to be frequently transcribed in a prokaryotic genome (ribosomal proteins, chaperone proteins, etc.). Environmental transcript sequences that had best BLAST hits to one of the PHX genes were similarly designated as PHX.

Statistical analysis

A statistical program designed for comparing gene frequency in metagenomic data sets (Rodriguez-Brito et al., 2006) was used to compare the night and day mRNA sequences categorized based on COGs, KEGGs and proteins. The program was run with 20 000 repeated samplings with a sample size of 10 000 for COGs, 9000 for KEGGs and 25 000 for proteins. The significance level (P) was set at < 0.05.
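The general logic of a resampling comparison can be illustrated with a simplified stand-in, which is not the program of Rodriguez-Brito et al. (2006). The sketch assumes a pooled null hypothesis: both samples share one frequency for a given category, equal-sized samples are drawn repeatedly under that frequency, and the observed difference is compared with the resulting distribution of differences. The function name and the small default sizes (chosen for speed) are illustrative only.

```python
import random

def resampling_p_value(count_a, total_a, count_b, total_b,
                       sample_size=1000, n_resamples=1000, seed=0):
    """Two-sided resampling p-value for a difference in category frequency."""
    rng = random.Random(seed)
    # Null hypothesis: one shared frequency for the category in both samples.
    pooled = (count_a + count_b) / (total_a + total_b)
    # Observed difference, scaled to the number of draws per resample.
    observed = abs(count_a / total_a - count_b / total_b) * sample_size
    as_extreme = 0
    for _ in range(n_resamples):
        draw_a = sum(rng.random() < pooled for _ in range(sample_size))
        draw_b = sum(rng.random() < pooled for _ in range(sample_size))
        if abs(draw_a - draw_b) >= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_resamples + 1)
```

With the settings reported in the text (20 000 samplings of 9000 to 25 000 sequences) a pure-Python loop of this kind would be slow, so a vectorized implementation would be preferable in practice.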

qPCR verifications

To confirm that the composition of the pyrosequence library was representative of the initial mRNAs, transcripts of five genes that were top hits to multiple sequences in both transcript pools were quantified in the total RNA pool. The qPCR primer sets were designed for the P. marinus str. AS9601 recA and psaA, a proteorhodopsin gene and a Na+/solute symporter (Ssf family) gene from P. ubique HTCC1062, and a probable integral membrane proteinase attributed to Psychroflexus torquis ATCC 700755 (sequences and annealing temps in Table S6). Reverse transcription reactions were carried out on 200 ng of RNA using the Omniscript RT kit (Qiagen) in 20 µl volumes containing 1× RT buffer, 0.3 mg ml⁻¹ of random hexamers (Invitrogen), 1 µl of 5 mM dNTPs, 2 U of reverse transcriptase and 20 U of RNase inhibitor (Promega) at 37°C for 1 h, followed by inactivation of the reverse transcriptase at 95°C for 2 min. The day : night ratio of each gene transcript in the RNA pools was determined by qPCR amplification of a serial dilution of cDNAs in triplicate, and calculation of the difference in cycle threshold values (ΔCT) between the two samples. Quantitative amplification was done using the iCycler iQ RT PCR detection system (Bio-Rad) in a 20 µl reaction volume containing 10 µl of iQ SYBR Green Supermix (Bio-Rad), 0.4 µl each of 10 µM of the forward and reverse primers and 1 µl of the cDNA template. PCR conditions included a preliminary denaturation at 95°C for 3 min followed by 45 cycles of 95°C for 15 s, annealing for 1.5 s, 95°C for 1 min and 55°C for 1 min. A melt curve was generated following the PCR, beginning with 55°C and increasing 0.4°C every 10 s until 95°C. A PCR control without an initial RT step was included with every set of reactions.
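The conversion from a cycle threshold difference to a day : night ratio can be sketched as follows. The sketch assumes perfect doubling of product per cycle (an amplification efficiency of 2) and equal template input for both samples, neither of which is stated in the text; the function name and the example CT values are hypothetical.

```python
def day_night_ratio(ct_day, ct_night, efficiency=2.0):
    """Day : night transcript ratio from cycle threshold (CT) values.

    A lower CT means more starting template, so the ratio is
    efficiency ** (ct_night - ct_day).
    """
    delta_ct = ct_night - ct_day
    return efficiency ** delta_ct

# Hypothetical example: the day sample crosses the threshold two cycles earlier.
print(day_night_ratio(ct_day=22.0, ct_night=24.0))  # prints 4.0
```

If primer efficiencies estimated from the serial dilutions differ from 2, the measured value can be passed through the `efficiency` argument.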

Acknowledgements

We thank the Captain and crew of the R/V Kilo Moana and Dr David Karl. Jennifer Oliver assisted with sample processing. Jonathan Badger assisted with data processing. Funding was provided by The Gordon and Betty Moore Foundation, National Science Foundation grants MCB-0702125 (M.A.M.), EF-0722374 (A.E.A.) and OCE-0425363 (J.P.Z.), and the NSF C-MORE Center for Microbial Oceanography.

References

Allen, A.E., Vardi, A., and Bowler, C. (2006) An ecological and evolutionary context for integrated nitrogen metabolism and related signaling pathways in marine diatoms. Curr Opin Plant Biol 9: 264–273.

Armbrust, E.V., Berges, J.A., Bowler, C., Green, B.R., Martinez, D., Putnam, N.H., et al. (2004) The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science 306: 79–86.

Bauer, J.E., Williams, P.M., and Druffel, E.R.M. (1992) 14C activity of dissolved organic carbon fractions in the north-central Pacific and Sargasso Sea. Nature 357: 667–670.

Belasco, J.G. (1993) mRNA degradation in prokaryotic cells: an overview. In Control of Messenger RNA Stability. Belasco, J.G., Brawerman, G. (eds). San Diego, CA, USA: Academic Press, pp. 3–11.

Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and Wheeler, D.L. (2007) GenBank. Nucleic Acids Res 35: D21–D25.

Bürgmann, H., Widmer, F., Sigler, W.V., and Zeyer, J. (2003) mRNA extraction and reverse transcription-PCR protocol for detection of nifH gene expression by Azotobacter vinelandii in soil. Appl Environ Microbiol 69: 1928–1935.

Bürgmann, H., Howard, E.C., Ye, W., Sun, F., Sun, S., Napierala, S., and Moran, M.A. (2007) Transcriptional response of Silicibacter pomeroyi DSS-3 to dimethylsulfoniopropionate (DMSP). Environ Microbiol 9: 2742–2755.

Campbell, L., and Vaulot, D. (1993) Photosynthetic picoplankton community structure in the subtropical North Pacific Ocean near Hawaii (Station ALOHA). Deep Sea Res. Part I Oceanogr Res Pap 40: 2043–2060.

Carpenter, L.J., Lewis, A.C., Hopkins, J.R., Read, K.A., Longley, I.D., and Gallagher, M.W. (2004) Uptake of methanol to the North Atlantic Ocean surface. Global Biogeochem Cycles 18: GB4027.

Cavender-Bares, K.K., Karl, D.M., and Chisholm, S.W. (2001) Nutrient gradients in the western North Atlantic Ocean: relationship to microbial community structure and comparison to patterns in the Pacific Ocean. Deep Sea Res. Part I Oceanogr Res Pap 48: 2373–2395.

Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon, M., and Robles, M. (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674–3676.

DeLong, E.F., Preston, C.M., Mincer, T., Rich, V., Hallam, S.J., Frigaard, N.-U., et al. (2006) Community genomics among stratified microbial assemblages in the ocean’s interior. Science 311: 496–503.

Derelle, E., Ferraz, C., Rombauts, S., Rouze, P., Worden, A.Z., Robbens, S., et al. (2006) Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc Natl Acad Sci USA 103: 11647–11652.

Frias-Lopez, J., Shi, Y., Tyson, G.W., Coleman, M.L., Schuster, S.C., Chisholm, S.W., and DeLong, E.F. (2008) Microbial community gene expression in ocean surface waters. Proc Natl Acad Sci USA 105: 3805–3810.

Fuhrman, J.A., Comeau, D.E., Hagstrom, A., and Chan, A.M. (1988) Extraction from natural planktonic microorganisms of DNA suitable for molecular biological studies. Appl Environ Microbiol 54: 1426–1429.

© 2009 The Authors. Journal compilation © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology, 11, 1358–1375

Gelder, R.N.V., von Zastrow, M.E., Yool, A., Dement, W.C., Barchas, J.D., and Eberwine, J.H. (1990) Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci USA 87: 1663–1667.

Gilbert, J.A., Field, D., Huang, Y., Edwards, R., Li, W., Gilna, P., and Joint, I. (2008) Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE 3: e3042.

Giovannoni, S.J., Hayakawa, D.H., Tripp, H.J., Stingl, U., Givan, S.A., Cho, J.-C., et al. (2008) The small genome of an abundant coastal ocean methylotroph. Environ Microbiol 10: 1771–1782.

Heikes, B.G., Chang, W.N., Pilson, M.E.Q., Swift, E., Singh, H.B., Guenther, A., et al. (2002) Atmospheric methanol budget and ocean implication. Global Biogeochem Cycles 16: 80.81–80.80.13.

Howard, E.C., Henriksen, J.R., Buchan, A., Reisch, C.R., Bürgmann, H., Welsh, R., et al. (2006) Bacterial taxa that limit sulfur flux from the ocean. Science 314: 649–652.

Huson, D.H., Auch, A.F., Qi, J., and Schuster, S.C. (2007) MEGAN analysis of metagenomic data. Genome Res 17: 377–386.

Ingraham, J.L., Maaløe, O., and Neidhardt, F.C. (1983) Growth of the Bacterial Cell. Sunderland, MA, USA: Sinauer Associates.

Johnson, J.L. (1994) Similarity analysis of rRNAs. In Methods for General and Molecular Bacteriology. Gerhardt, P., Murray, R.G.E., Wood, W.A., and Krieg, N.R. (eds). Washington, DC: American Society for Microbiology, pp. 683–700.

Johnson, Z.I., Zinser, E.R., Coe, A., McNulty, N.P., Woodward, E.M.S., and Chisholm, S.W. (2006) Niche partitioning among Prochlorococcus ecotypes along ocean-scale environmental gradients. Science 311: 1737–1740.

Kanehisa, M., and Goto, S. (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27–30.

Karl, D., Letelier, R., Tupas, L., Dore, J., Christian, J., and Hebel, D. (1997) The role of nitrogen fixation in biogeochemical cycling in the subtropical North Pacific Ocean. Nature 388: 533–538.

Karl, D.M., and Lukas, R. (1996) The Hawaii Ocean Time-series (HOT) program: background, rationale and field implementation. Deep Sea Res. Part II Top Stud Oceanogr 43: 129–156.

Karlin, S., and Mrázek, J. (2000) Predicted highly expressed genes of diverse prokaryotic genomes. J Bacteriol 182: 5238–5250.

Kiene, R.P., Linn, L.J., and Bruton, J.A. (2000) New and important roles for DMSP in marine microbial communities. J Sea Res 43: 209–224.

Lander, E.S., and Waterman, M.S. (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2: 231–239.

Li, W., and Godzik, A. (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22: 1658–1659.

Liang, P., and Pardee, A.B. (1992) Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257: 967–971.

McDonald, S.M., Sarno, D., Scanlan, D.J., and Zingone, A. (2007) Genetic diversity of eukaryotic ultraphytoplankton in the Gulf of Naples during an annual cycle. Aquat Microb Ecol 50: 75–89.

Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376–380.

Mary, I., Garczarek, L., Tarran, G.A., Kolowrat, C., Terry, M.J., Scanlan, D.J., et al. (2008) Diel rhythmicity in amino acid uptake by Prochlorococcus. Environ Microbiol 10: 2124–2131.

Morris, R.M., Rappe, M.S., Connon, S.A., Vergin, K.L., Siebold, W.A., Carlson, C.A., and Giovannoni, S.J. (2002) SAR11 clade dominates ocean surface bacterioplankton communities. Nature 420: 806–810.

Mou, X., Sun, S., Edwards, R.A., Hodson, R.E., and Moran, M.A. (2008) Bacterial carbon processing by generalist species in the coastal ocean. Nature 451: 708–711.

Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G.D., and Maltsev, N. (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 96: 2896–2901.

Poretsky, R.S., Bano, N., Buchan, A., LeCleir, G., Kleikemper, J., Pickering, M., et al. (2005) Analysis of microbial gene transcripts in environmental samples. Appl Environ Microbiol 71: 4121–4126.

Poretsky, R.S., Bano, N., Buchan, A., Moran, M.A., and Hollibaugh, J.T. (2008) Environmental transcriptomics: a method to access expressed genes in complex microbial communities. In Molecular Microbial Ecology Manual. Kowalchuk, G.A., de Bruijn, F.J., Head, I.M., Akkermans, A.D.L., and van Elsas, J.D. (eds). Dordrecht, Netherlands: Springer, pp. 1892–1904.

Pruitt, K.D., Tatusova, T., and Maglott, D.R. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33: D501–D504.

Rodriguez-Brito, B., Rohwer, F., and Edwards, R. (2006) An application of statistics to comparative metagenomics. BMC Bioinformatics 7: 162.

Rusch, D.B., Halpern, A.L., Sutton, G., Heidelberg, K.B., Williamson, S., Yooseph, S., et al. (2007) The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol 5: e77.

Schaefer, J.K., Goodwin, K.D., McDonald, I.R., Murrell, J.C., and Oremland, R.S. (2002) Leisingera methylohalidivorans gen. nov., sp. nov., a marine methylotroph that grows on methyl bromide. Int J Syst Evol Microbiol 52: 851–859.

Seshadri, R., Kravitz, S.A., Smarr, L., Gilna, P., and Frazier, M. (2007) CAMERA: a community resource for metagenomics. PLoS Biol 5: 394–397.

Tatusov, R.L., Galperin, M.Y., Natale, D.A., and Koonin, E.V. (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28: 33–36.

Ward, B.B., Kilpatrick, K.A., Novelli, P.C., and Scranton, M.I. (1987) Methane oxidation and methane fluxes in the ocean surface-layer and deep anoxic waters. Nature 327: 226–229.

Wawrik, B., Paul, J.H., and Tabita, F.R. (2002) Real-time PCR quantification of rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase) mRNA in diatoms and pelagophytes. Appl Environ Microbiol 68: 3771–3779.

Woodall, C.A., Warner, K.L., Oremland, R.S., Murrell, J.C., and McDonald, I.R. (2001) Identification of methyl halide-utilizing genes in the methyl bromide-utilizing bacterial strain IMB-1 suggests a high degree of conservation of methyl halide-specific genes in gram-negative bacteria. Appl Environ Microbiol 67: 1959–1963.

Zehr, J.P., Waterbury, J.B., Turner, P.J., Montoya, J.P., Omoregie, E., Steward, G.F., et al. (2001) Unicellular cyanobacteria fix N2 in the subtropical North Pacific Ocean. Nature 412: 635–638.

Zhou, J.H. (2003) Microarrays for bacterial detection and microbial community analysis. Curr Opin Microbiol 6: 288–294.

Supporting information

Additional Supporting Information may be found in the online version of this article:

Fig. S1. Transcript mapping to the KEGG histidine metabolism pathway for P. marinus (A) and the vitamin B6 metabolism pathway for P. ubique (B) at night. Blue shading indicates that transcripts were found; grey indicates genes that are present in the genome, but no transcripts were found; white indicates genes that are not present in the reference genomes.

Fig. S2. Quality control of the pyrosequences using qPCR verifications of transcript ratios for five genes: recA and psaA from P. marinus str. AS9601, a bacteriorhodopsin and a Na+/solute symporter (Ssf family) gene from P. ubique HTCC1062, and a probable integral membrane proteinase attributed to P. torquis ATCC 700755. The night : day ratio of transcripts in the pyrosequence libraries is plotted against the same ratio in the original total RNA fraction.

Table S1. Results of bioinformatic pipeline for 100 and 200 bp fragments from groups for which there are no genome sequences currently available. BACs from uncultured marine taxa (two from SAR86 and one from SAR116) were fragmented into random 100 bp pieces, using just the coding regions. Fragments were blasted against RefSeq, not allowing a self-hit. As controls, we did the same for P. ubique HTCC1062 and P. marinus MIT9312.

Table S2. Estimates of coverage using two different models. The Lander–Waterman model uses the 16S rRNA clone library data to establish a taxon-abundance model for the system at a similarity level of 99%, and is based on the assumptions that each taxon produces 1000 transcripts at any given time and all expressed genes are expressed equally. The Chao1 richness estimators for COGs are computed using EstimateS (version 8.0, R. K. Colwell, http://purl.oclc.org/estimates).

Table S3. KEGG pathways for three taxonomic bins (P. marinus, P. ubique and Roseobacters) significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.10).

Table S4. COGs significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.05).

Table S5. Genes significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.05).

Table S6. Primer sets used in qPCR.

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.
