Describing polar marine microbial communities by their metabolic structure



Can we bridge the gap between community structure and ecosystem function?

Jeff S. Bowman* and Hugh W. Ducklow
Lamont-Doherty Earth Observatory, Columbia University

*[email protected]

Introduction and Motivation

Ecologists typically describe microbial communities by the diversity of a taxonomic marker gene, such as the 16S rRNA gene. Although these data are well suited to evaluating differences between communities and to correlating community structure with environmental parameters (e.g. chlorophyll concentration, temperature, salinity), they are less well suited to describing the metabolic capabilities (i.e. ecosystem function) of the community. Metagenomics and other techniques can bridge the gap between microbial community structure and ecosystem function, but these techniques are costly, data intensive, and low throughput.

Our goal was to develop a high-throughput method for inferring community metabolism from community taxonomy. By evaluating metabolic structure in place of community structure we capture key inter-sample relationships and their impact on microbial ecosystem function. Our method produces pathway genome databases (PGDBs) that describe the metabolic pathways likely to be present in the sample. These PGDBs are amenable to flux-based metabolic modeling. Future work will focus on predicting the flow of elements and energy through these pathways, providing a way to model the impact of changing community structure on biogeochemical cycles.

Here we apply our method to a seasonally variable, depth-stratified microbial community from the West Antarctic Peninsula, a region undergoing unprecedented environmental change.

Key Points

• Microbial communities can be described by their metabolic structure.
• Metabolic structure provides information on potential microbial ecosystem functions.
• Representing a microbial community by metabolic structure may provide a way to model the flow of elements and energy through the community.

[Fig. 4 graphic. Map: axes Longitude and Latitude; quadrants NW, NE, SW, SE; region label WAP. Panels A and B: dendrograms (y-axis: Height). Panel C: scatter plot (x-axis: Distance by pathway abundance; y-axis: Distance by edge abundance). Legend: Deep, Winter surface, Surface. Sample labels: srr27.SW.surface.A, srr28.NE.surface.B, srr29.NW.deep.C, srr30.SW.deep.D, srr31.SE.deep.E, srr32.NE.surface.B, srr33.SE.deep.E, srr34.SW.deep.D, srr35.NE.deep.F, srr36.NW.surface, srr37.NE.deep.F, srr38.SW.surface.A, srr39.SE.surface.H, srr40.NW.surface, srr41.SE.surface.H, srr42.NW.deep.C, srr43.NE.surface.I, srr44.NE.surface.I.]

[Fig. 5 graphic. Panels A and B: PCA plots (axes: PC1, PC2; legend: Deep, Surface, Winter surface). Panels C and D: heatmaps with the same 18 samples as columns.]

Pathway labels on the metabolism heatmap: phenylalanine degradation II (anaerobic); phenylacetate degradation II (anaerobic); alginate degradation; maltose degradation; spheroidene and spheroidenone biosynthesis; thiamin salvage III; formate oxidation to CO2; salicylate degradation I; chlorosalicylate degradation; methylsalicylate degradation; guanylyl molybdenum cofactor biosynthesis; proline degradation; phenylacetate degradation I (aerobic); lysine biosynthesis I; triclosan resistance.

Taxon labels on the taxonomy heatmap: Robiginitalea biformata HTCC2501; Lactobacillus sanfranciscensis TMW 1 1304; Actinosynnema mirum DSM 43827; Alteromonadales spp.; Arthrobacter aurescens TC1; Thermodesulfovibrio yellowstonii DSM 11347; Bartonella bacilliformis KC583; Colwellia psychrerythraea 34H; Nitrosopumilus maritimus SCM1; Bartonella quintana Toulouse; Thalassobaculum spp.; Ruegeria pomeroyi DSS 3; Saccharophagus degradans 2 40; Halothiobacillus spp.; Parvibaculum lavamentivorans DS 1; Capnocytophaga/Cellulophaga spp.; Hippea maritima DSM 10411; Tetragenococcus halophilus; Melissococcus plutonius spp.

[Fig. 1 graphic: flowchart with three panels labeled Sample Analysis, Database Construction, and Confidence Score. Box labels, in extraction order: 16S sequence library, the bigger the better!; Obtain all completed genomes; Build 16S rRNA reference tree; Find consensus genome for each tree node; Place reads on reference tree; Extract pathways for each placement; Generate confidence score for sample; Predict metabolic pathways; Calculate confidence for each node; Evaluate genomic plasticity for terminal nodes; Evaluate relative core genome size.]

Fig. 1. Methods. Our metabolic inference pipeline uses a phylogenetic placement program (pplacer) [1] to place query reads on a reference tree of 16S rRNA genes from all completed genomes. We determine a consensus genome for each point of placement on the tree, and determine the metabolic pathways represented in these genomes. Separately we determine a confidence score for each point of placement on the reference tree from a novel indicator of genomic stability.
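The step from placements to predicted pathways can be illustrated with a minimal sketch. It assumes that each node of the reference tree already has a set of pathways from its consensus genome and that each pathway is weighted by the number of reads placed on that node. The node names, pathway sets, read counts, and the function `pathway_abundance` are hypothetical and are not part of the pipeline described on the poster.

```python
from collections import Counter

# Hypothetical inputs, invented for illustration only.
# consensus_pathways: reference-tree node -> pathways predicted for the
# consensus genome at that node.
consensus_pathways = {
    "node_A": {"alginate degradation", "proline degradation"},
    "node_B": {"proline degradation", "lysine biosynthesis I"},
}

# placements: reference-tree node -> number of query 16S reads placed there.
placements = {"node_A": 3, "node_B": 1}


def pathway_abundance(placements, consensus_pathways):
    """Add each node's read count to every pathway that node carries."""
    abundance = Counter()
    for node, n_reads in placements.items():
        for pathway in consensus_pathways.get(node, ()):
            abundance[pathway] += n_reads
    return abundance


abundance = pathway_abundance(placements, consensus_pathways)
```

Weighting by read count is only one possible choice; a presence/absence table could be built the same way by adding 1 instead of `n_reads`.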

Fig. 4. Sample locations within the Palmer LTER off the WAP (left) and inter-sample similarity (right). The location of Palmer Station is given by the star. Summer surface and deep samples along with winter surface samples were analyzed [2]. A) Hierarchical clustering of samples by metabolic structure. B) Hierarchical clustering of samples by taxonomic structure. Note duplicate samples in both A and B. C) Distances between samples are in good agreement between the two methods (R² = 0.65).
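The comparison in panel C can be sketched as follows: compute pairwise distances between samples from a pathway-abundance table and from an edge-abundance table, then take the squared correlation of the two sets of distances. The poster does not name its distance metric, so Bray-Curtis dissimilarity is an assumption here, and the sample names and abundance values are invented.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def pairwise_distances(table):
    """Distances for every unordered pair of samples, in sorted-name order."""
    names = sorted(table)
    return [bray_curtis(table[p], table[q]) for p, q in combinations(names, 2)]


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical abundance tables: sample -> abundance vector (values invented).
pathway_table = {"s1": [10, 0, 5], "s2": [8, 2, 5], "s3": [0, 9, 1]}
edge_table = {"s1": [20, 1], "s2": [18, 3], "s3": [2, 15]}

d_pathway = pairwise_distances(pathway_table)
d_edge = pairwise_distances(edge_table)
r2 = r_squared(d_pathway, d_edge)
```

Either list of distances could also be passed to a hierarchical clustering routine to produce dendrograms like those in panels A and B.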

Fig. 5. What taxa and metabolic pathways account for the most variance? Having determined that the relationship between samples can be accurately represented by metabolic structure, we can begin to ask ecologically relevant questions. A question frequently posed of community structure data is: which taxa account for the most variability? We can ask the same question of metabolic structure: which metabolisms account for the most variability? A) PCA of taxonomic structure. B) PCA of metabolic structure. C) Heatmap of high-variance taxa. D) Heatmap of high-variance metabolisms. These metabolisms represent ecosystem functions that may be differentially provided by the microbial communities.
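One simple way to pick the features shown in such heatmaps is to rank them by their variance across samples. The sketch below does this for a small pathway table. The poster does not say how its high-variance taxa and metabolisms were selected, so ranking by raw population variance is an assumption, and the abundance values and the function `top_variance_features` are hypothetical.

```python
from statistics import pvariance


def top_variance_features(table, k):
    """Return the k feature names with the largest variance across samples."""
    ranked = sorted(table, key=lambda name: pvariance(table[name]), reverse=True)
    return ranked[:k]


# Hypothetical pathway abundances across four samples (values invented).
pathway_table = {
    "alginate degradation": [0, 12, 1, 14],
    "proline degradation": [5, 5, 6, 5],
    "formate oxidation to CO2": [9, 2, 8, 1],
}

top = top_variance_features(pathway_table, 2)
```

The same function applies unchanged to a table of taxon abundances.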

1. Matsen, F, R Kodner, E Armbrust. 2010. pplacer: Linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11:538.
2. Luria, C, H Ducklow, L Amaral-Zettler. 2014. Marine bacterial, archaeal and eukaryotic diversity and community structure on the continental shelf of the western Antarctic Peninsula. Aquatic Microbial Ecology, 73(2):107-121.

www.polarmicrobes.org


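[Fig. 3 graphic: scatter plot of relative plasticity (y-axis, 0.0 to 1.0) against terminal node (x-axis, 0 to 2500), with unusually plastic genomes marked I to IX.]

[Fig. 2 graphic: tree diagram with two terminal nodes and one internal node, distinguishing core genome from accessory genome.]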

Fig. 2. Confidence score. Placements can be made to terminal and internal nodes. To determine the confidence (c) of a metabolic inference for a given placement we consider the core genome size (S_core), the mean genome size of the clade (S_clade), and the mean index of plasticity for the clade (r_d; Fig. 3).

2015 GRS and GRC, Polar Marine Science

Fig. 3. Genomic plasticity of genomes in our database. A major impediment to accurate metabolic inference is the genetic diversity that can exist within even a narrow taxonomic clade. We developed a confidence metric for our inferred metabolisms that is based on the degree of genomic plasticity inherent to each genome. The x-axis gives the position of each genome on our reference tree; the y-axis gives the degree of plasticity. Unusually plastic genomes are indicated by Roman numerals: I) Nanoarchaeum equitans, II) the Mycobacteria, III) a butyrate-producing bacterium within the Clostridium, IV) Candidatus Hodgkinia cicadicola, V) the Mycoplasma, VI) Sulcia muelleri, VII) Portiera aleyrodidarum, VIII) Buchnera aphidicola, IX) the Oxalobacteraceae.