UC Davis EVE161 Lecture 9 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 Lecture 9: EVE 161: Microbial Phylogenomics Lecture #9: Era II: rRNA Case Study UC Davis, Winter 2014 Instructor: Jonathan Eisen 1

UC Davis EVE161 Course Lecture Slides

Transcript of UC Davis EVE161 Lecture 9 by @phylogenomics

Page 1: UC Davis EVE161 Lecture 9 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Lecture 9:

EVE 161: Microbial Phylogenomics

Lecture #9:

Era II: rRNA Case Study

UC Davis, Winter 2014 Instructor: Jonathan Eisen

Page 2: UC Davis EVE161 Lecture 9 by @phylogenomics

Where we are going and where we have been

• Previous Lecture: 8: Era II: rRNA ecology

• Current Lecture: 9: rRNA Case Study - Built Environment

• Next Lecture: 10: Genome Sequencing

Page 4: UC Davis EVE161 Lecture 9 by @phylogenomics

Microbial Ecology of the Built Environment

• New Sloan Foundation Program

• Culture-independent microbial studies linked to building science

• Many facilities being looked at including schools, homes, hospitals, offices, planes, cars

• More information at http://microBE.net

Page 5: UC Davis EVE161 Lecture 9 by @phylogenomics

Why Care?

• Humans spend most of their time in built environments

• Most microbial ecology studies have focused on natural environments

• Building design is governed by esthetic and engineering considerations, and to some extent by health considerations, but generally takes little microbiology into account

• Likely an important source of microbiomes of humans and other organisms in the built environment

Page 6: UC Davis EVE161 Lecture 9 by @phylogenomics

Architectural Design Drives the Biogeography of Indoor Bacterial Communities

Steven W. Kembel1,2,3☯, James F. Meadow2,3*☯, Timothy K. O’Connor2,3,4, Gwynne Mhuireach2,5, Dale Northcutt2,5, Jeff Kline2,5, Maxwell Moriyama2,5, G. Z. Brown2,5,6, Brendan J. M. Bohannan2,3, Jessica L. Green2,3,7

1 Département des sciences biologiques, Université du Québec à Montréal, Montréal, Québec, Canada, 2 Biology and the Built Environment Center, University of Oregon, Eugene, Oregon, United States of America, 3 Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America, 4 Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona, United States of America, 5 Energy Studies in Buildings Laboratory, University of Oregon, Eugene, Oregon, United States of America, 6 Department of Architecture, University of Oregon, Eugene, Oregon, United States of America, 7 Santa Fe Institute, Santa Fe, New Mexico, United States of America

Abstract

Background: Architectural design has the potential to influence the microbiology of the built environment, with implications for human health and well-being, but the impact of design on the microbial biogeography of buildings remains poorly understood. In this study we combined microbiological data with information on the function, form, and organization of spaces from a classroom and office building to understand how design choices influence the biogeography of the built environment microbiome.

Results: Sequencing of the bacterial 16S gene from dust samples revealed that indoor bacterial communities were extremely diverse, containing more than 32,750 OTUs (operational taxonomic units, 97% sequence similarity cutoff), but most communities were dominated by Proteobacteria, Firmicutes, and Deinococci. Architectural design characteristics related to space type, building arrangement, human use and movement, and ventilation source had a large influence on the structure of bacterial communities. Restrooms contained bacterial communities that were highly distinct from all other rooms, and spaces with high human occupant diversity and a high degree of connectedness to other spaces via ventilation or human movement contained a distinct set of bacterial taxa when compared to spaces with low occupant diversity and low connectedness. Within offices, the source of ventilation air had the greatest effect on bacterial community structure.

Conclusions: Our study indicates that humans have a guiding impact on the microbial biodiversity in buildings, both indirectly through the effects of architectural design on microbial community structure, and more directly through the effects of human occupancy and use patterns on the microbes found in different spaces and space types. The impact of design decisions in structuring the indoor microbiome offers the possibility to use ecological knowledge to shape our buildings in a way that will select for an indoor microbiome that promotes our health and well-being.

Citation: Kembel SW, Meadow JF, O’Connor TK, Mhuireach G, Northcutt D, et al. (2014) Architectural Design Drives the Biogeography of Indoor Bacterial Communities. PLoS ONE 9(1): e87093. doi:10.1371/journal.pone.0087093

Editor: Bryan A. White, University of Illinois, United States of America

Received July 18, 2013; Accepted December 18, 2013; Published January 29, 2014

Copyright: © 2014 Kembel et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This research was funded by a grant to the Biology and the Built Environment Center from the Alfred P. Sloan Foundation Microbiology for the Built Environment Program (http://www.sloan.org/major-program-areas/basic-research/microbiology-of-the-built-environment/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

☯ These authors contributed equally to this work.

Introduction

Biologists and designers are beginning to collaborate in a new field focused on the microbiology of the built environment [1,2]. These collaborations, which integrate perspectives from ecology and evolution, architecture, engineering and building science, are driven by a number of interrelated observations. First, it is increasingly recognized that buildings are complex ecosystems comprised of microorganisms interacting with each other and their environment [3–5]. Second, the built environment is the primary habitat of humans; humans spend the majority of their lives indoors where they are constantly coming into contact with the built environment microbiome (the microbial communities within buildings) [6]. Third, evidence is growing that the microbes living in and on people, the human microbiome, play a critical role in human health and well-being [7–9]. Together, these observations suggest that it may be possible to influence the human microbiome and ultimately human health, by modifying the built environment microbiome through architectural design.

Despite this potential, we remain in the very early stages of understanding the link between design and the microbiology of the indoor environment. A comprehensive understanding of the mechanisms that shape indoor ecosystems will entail disentangling the relative contributions of biological processes including

PLOS ONE | www.plosone.org 1 January 2014 | Volume 9 | Issue 1 | e87093

Page 7: UC Davis EVE161 Lecture 9 by @phylogenomics

Methods

Study Location

Page 8: UC Davis EVE161 Lecture 9 by @phylogenomics

environmental selection, dispersal, diversification, and ecological drift [10]. To date, most research has focused on understanding the influence of environmental selection and dispersal on the built environment microbiome. Environmental conditions including humidity and air temperature have been shown to influence the growth rate and survival of many microbial taxa [3,5,11] and correlate with the composition of bacterial communities indoors [4]. Many bacteria and fungi exhibit strong microhabitat associations and increased growth under conditions of higher humidity and in the presence of water sources, such as in kitchens and restrooms [12,13]. The dispersal of microbes into and within the built environment also appears to have a significant influence on indoor ecosystems. The sources of microbes include those from outdoor habitats such as air and soil brought into the building via ventilation systems or carried into the building by macroorganisms [4,14–16], microbes from indoor sources such as water, carpets and other surfaces within a building [13,17], and microbes emitted from macroorganisms within the building including humans, pets and plants [18,19]. The relative importance of these different sources of microbes indoors is not well understood, but is likely to differ as a function of space (e.g. geographic location [20]), time (e.g. year and season of sampling [15]), and building design and operation [4].

The biological processes described above can be fundamentally altered by building design. However, many questions remain unanswered regarding how design aspects – such as the function, form and organization of a building – shape the indoor microbiome. Function refers to the collection of activities and uses that a building and its spaces serve. Functional requirements are translated into the variety and number of space types within a building – for example offices, restrooms, and hallways. Function is also a key determinant of the design criteria for environmental conditions including temperature, relative humidity, and light levels. Form refers to the geometry of a building and the spaces within it, while organization refers to the spatial relationships among indoor spaces. Form and organization are highly interrelated and both involve design choices that influence human circulation (the source, variation and movement of people), air circulation (the source, variation and movement of air), and environmental conditions throughout a building.

To understand how design choices influence the biogeography of indoor bacterial communities, we collected microbiological, architectural, and environmental data in 155 rooms throughout a multi-use classroom and office building (Lillis Hall; Fig. 1). We focus on the bacterial communities in settled dust, because it represents an integrative record of microbial biodiversity in indoor

Figure 1. Architectural layout for two of four floors in Lillis Hall. Restrooms (brown), offices (blue) and classrooms (yellow) are shown to illustrate space type distribution throughout Lillis. The first two floors of the building are primarily devoted to classrooms and share a similar floor-plan. The 3rd and 4th floors contain most offices in the building and also share a similar floor-plan. The building has a basement and penthouse spaces; these are largely building support spaces, including mechanical rooms and storage. doi:10.1371/journal.pone.0087093.g001

Biogeography of Indoor Bacterial Communities

Page 9: UC Davis EVE161 Lecture 9 by @phylogenomics

Methods

Study Location

We analyzed bacterial communities in dust collected from 155 spaces in Lillis Hall, a four-story classroom and office building on the University of Oregon campus in Eugene, Oregon, USA. This building was chosen as a study site for several reasons. Architecturally, Lillis Hall was designed to accommodate natural ventilation for both fresh air and cooling; the building is thin, allowing most rooms access to the building skin for supplying outside air directly through windows and louvers, and it has a central atrium used for exhausting air through stack ventilation. From a study design perspective, diverse space types, occupancy levels, and building management strategies were located in close proximity within the same building, making it possible to compare their relative influences on indoor biogeography.

Page 10: UC Davis EVE161 Lecture 9 by @phylogenomics

Architectural Design Data

Page 11: UC Davis EVE161 Lecture 9 by @phylogenomics

Architectural Design Data

Data on architectural design attributes of each space including function, form, and organization were obtained using architectural plans, field observation, and a building information model (Fig. 1). Spaces in the building were classified into one of seven space types. This classification system was developed for the present study based on the Oregon University System’s space type codes and definitions [40]. These categories are based on the overall architectural design and intended human use pattern for each space, and include circulation (e.g. hallways, atria), classrooms, classroom support (e.g. reading and practice rooms), offices, office support (e.g. most storage spaces, conference rooms), building support (e.g. mechanical equipment rooms, janitor closets), and restrooms. We measured numerous spatial and architectural attributes of each space including level (floor), wing (east versus west), size (net floor area), air handling unit (AHU) (13 different AHUs supply air to different rooms, so AHU is a categorical variable with 15 levels, one for each AHU as well as a ‘none’ category for rooms without mechanically supplied air, and a ‘multiple’ category for circulation spaces fed by multiple supply sources), and a separate binary variable denoting whether the space was only capable of being naturally ventilated by unfiltered outside air (e.g. via windows or louvers; 41 rooms) or by dedicated mechanical AHU supply (114 rooms).
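A minimal Python sketch of how the two ventilation variables described above could be derived from a room-by-room list of mechanical supply sources. The room names and AHU labels are hypothetical, and the sketch assumes that a room counts as naturally ventilated only when it has no mechanical supply source at all; the paper does not spell out that mapping.

```python
def ahu_level(supply_sources: set[str]) -> str:
    """Collapse a room's mechanical supply sources into one categorical level.

    With 13 AHU labels plus 'none' and 'multiple', there are 15 levels.
    """
    if not supply_sources:
        return "none"        # no mechanically supplied air
    if len(supply_sources) > 1:
        return "multiple"    # e.g. circulation spaces fed by several AHUs
    (only,) = supply_sources  # exactly one AHU serves this room
    return only


def naturally_ventilated_only(supply_sources: set[str]) -> bool:
    """Binary variable; ASSUMPTION: True exactly when no AHU supplies the room."""
    return len(supply_sources) == 0


# Hypothetical rooms and AHU labels, for illustration only.
rooms = {
    "office_301": {"AHU-7"},
    "classroom_102": set(),
    "atrium": {"AHU-1", "AHU-2"},
}

for name, sources in rooms.items():
    print(name, ahu_level(sources), naturally_ventilated_only(sources))
```

Keeping 'none' and 'multiple' as explicit levels, rather than dropping those rooms, lets every space enter the same categorical model.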

Metrics related to form and organization were quantified using network analysis (Fig. 2) and information from building construction drawings. Spaces were considered to be spatially connected if they shared a doorway or other physical connection that would permit a person to move directly between the two spaces. The network of spatial connections among spaces was used to calculate two measures of network centrality [22], [41] for each space in the building: betweenness, a measure of the fraction of shortest paths among all spaces in the building that would pass through a space, and degree, the number of connections a space has to other spaces. The network of spatial connections between spaces was also used to define a connectance distance between all pairs of spaces in the building, defined as the minimum number of spaces a person would need to travel through to move between two spaces. We considered using ventilation-based distance (how much duct length separates two connected spaces) as a connectance distance; however, preliminary investigation indicated that connectance distance and ventilation distance were strongly correlated.
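The three network metrics can be illustrated on a toy floor plan. This is a sketch in plain Python (no graph library), not the authors' code: the floor plan and space names are hypothetical, betweenness is computed with Brandes' algorithm and normalized to a fraction of shortest paths, and connectance distance is counted in doors (shortest-path length in edges), following the Figure 2 caption; the number of intermediate spaces a person passes through is one less than that.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy floor plan: an edge means two spaces share a doorway.
FLOOR = {
    "hall": ["office_a", "office_b", "restroom", "atrium"],
    "office_a": ["hall"],
    "office_b": ["hall"],
    "restroom": ["hall"],
    "atrium": ["hall", "classroom"],
    "classroom": ["atrium"],
}


def degree(graph):
    """Number of spaces directly reachable through a doorway."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def door_distances(graph, source):
    """Breadth-first search: minimum number of doors from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    Returns, for each space, the fraction of shortest paths between other
    pairs of spaces that pass through it.
    """
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest s->v paths
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(graph)
    # bc sums over ordered (s, t) pairs, so divide by the number of ordered
    # pairs that exclude v, i.e. (n - 1)(n - 2).
    return {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()}


dist_from = {a: door_distances(FLOOR, a) for a in FLOOR}
connectance = {(a, b): dist_from[a][b] for a, b in combinations(FLOOR, 2)}

print(degree(FLOOR))
print(betweenness(FLOOR))
print(connectance[("office_a", "classroom")])
```

On this toy plan the hallway has the highest degree (4) and lies on 90% of shortest paths between other spaces, while the atrium, the only route to the classroom, lies on 40%; walking from office_a to the classroom crosses 3 doors.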

Page 13: UC Davis EVE161 Lecture 9 by @phylogenomics

spaces [21]. Our study addresses two overarching questions. First, at the scale of the entire building, do function, form and organization predict variation in the built environment microbiome? Second, for rooms that serve the same function (rooms that are of the same space type), which aspects of form and organization most influence the built environment microbiome?

Figure 2. Network analysis metrics used to quantify spatial arrangement of spaces within Lillis Hall. Examples in the left column follow classic network representation, while those in the right column embody the architectural translation of networks. Shaded nodes and building spaces correspond to centrality measures [22] of betweenness (the number of shortest paths between all pairs of spaces that pass through a given space over the sum of all shortest paths between all pairs of spaces in the building) and degree (the number of connections a space has to other spaces); connectance distance (the number of doors between any two spaces) is a pairwise metric, shown here as the range of connectance distance values for each complete network/building. Since betweenness and degree strongly co-vary and are both measures of network centrality [22], they are considered together in some analyses. doi:10.1371/journal.pone.0087093.g002

Page 14: UC Davis EVE161 Lecture 9 by @phylogenomics

Human use patterns are a product of functional classification, but they also dictate form and organizational attributes of building design. In this study, human use patterns for each space were estimated based on a qualitative assessment of the expected patterns of human diversity and annual occupied hours in each space. Briefly, human diversity was defined on a three-point scale, ranging from low human diversity (spaces likely to be occupied by at most a single individual during a typical day; e.g. a closet) to high human diversity (spaces likely to be occupied by numerous different individuals during a typical day; e.g. a hallway). Annual occupied hours (person-hours per year) were similarly defined along a three-point scale from low (spaces that are typically vacant or occupied at low density; e.g. a mechanical support space) to high (spaces that are frequently occupied at relatively high density; e.g. administrative offices). Both of these human occupancy variables are explained in more detail in Table S1.

At the time of microbial community sampling, ambient air temperature and relative humidity measurements were taken from each space. Relative humidity measurements were detrended using daily mean values to account for temporal changes over the sampling period.
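One plausible reading of "detrended using daily mean values" is to subtract, from each reading, the mean of all readings taken on the same day, so that day-to-day weather swings over the sampling period do not masquerade as between-room differences. The sketch below assumes that reading; the function name is hypothetical and the readings are invented for illustration, not data from the study.

```python
from collections import defaultdict
from statistics import mean


def detrend_by_day(readings):
    """Subtract each day's mean relative humidity from that day's readings.

    `readings` is a list of (day, room, rh_percent) tuples, one per room,
    so the result maps each room to its RH residual in percentage points.
    """
    by_day = defaultdict(list)
    for day, _room, rh in readings:
        by_day[day].append(rh)
    daily_mean = {day: mean(values) for day, values in by_day.items()}
    return {room: rh - daily_mean[day] for day, room, rh in readings}


# Invented readings for illustration only (not data from the study).
readings = [
    ("2012-06-22", "office_301", 48.0),
    ("2012-06-22", "restroom_2", 54.0),
    ("2012-06-23", "office_302", 40.0),
    ("2012-06-23", "classroom_102", 44.0),
]

print(detrend_by_day(readings))
# {'office_301': -3.0, 'restroom_2': 3.0, 'office_302': -2.0, 'classroom_102': 2.0}
```

Residuals from each day average to zero, so only a room's humidity relative to other rooms sampled that day is retained.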

Page 15: UC Davis EVE161 Lecture 9 by @phylogenomics

Biological Sampling

Sampling of dust was carried out with a Shop-Vac® 9.4L Hang Up vacuum (www.shopvac.com; #215726) fitted with a Dustream™ Collector vacuum filter sampling device (www.inbio.com/dustream.html). Dust samples were collected by vacuuming an area of approximately 2 m² on horizontal surfaces above head level for 2 minutes in each space. We preferentially chose these surfaces for sampling since they minimized the frequency of disturbance by cleaning, and thus likely serve as a long-term sample of airborne particles in each space [21]. All samples were collected during June 22–24, 2012. Building construction was completed in 2003, and dust has presumably been accumulating in some sampled spaces since that time.

Page 16: UC Davis EVE161 Lecture 9 by @phylogenomics

• Then a semi-standard rRNA PCR workflow

Page 17: UC Davis EVE161 Lecture 9 by @phylogenomics

Metrics related to form and organization were quantified using network analysis (Fig. 2) and information from building construction drawings. Spaces were considered to be spatially connected if they shared a doorway or other physical connection that would permit a person to move directly between the two spaces. The network of spatial connections among spaces was used to calculate two measures of network centrality [22,41] for each space in the building: betweenness, a measure of the fraction of shortest paths among all spaces in the building that would pass through a space, and degree, the number of connections a space has to other spaces. The network of spatial connections between spaces was also used to define a connectance distance between all pairs of spaces in the building, defined as the minimum number of spaces a person would need to travel through to move between two spaces. We considered using ventilation-based distance (how much duct length separates two connected spaces) as a connectance distance; however, preliminary investigation indicated that connectance distance and ventilation distance were strongly correlated.

Human use patterns are a product of functional classification, but they also dictate form and organizational attributes of building design. In this study, human use patterns for each space were estimated based on a qualitative assessment of the expected patterns of human diversity and annual occupied hours in each space. Briefly, human diversity was defined on a three-point scale, ranging from low human diversity (spaces likely to be occupied by at most a single individual during a typical day; e.g. a closet) to high human diversity (spaces likely to be occupied by numerous different individuals during a typical day; e.g. a hallway). Annual occupied hours (person-hours per year) were similarly defined along a three-point scale from low (spaces that are typically vacant or occupied at low density; e.g. a mechanical support space) to high (spaces that are frequently occupied at relatively high density; e.g. administrative offices). Both of these human occupancy variables are explained in more detail in Table S1.

At the time of microbial community sampling, ambient air temperature and relative humidity measurements were taken from each space. Relative humidity measurements were detrended using daily mean values to account for temporal changes over the sampling period.
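The centrality measures and connectance distance defined above can be illustrated with a minimal Python sketch. The floor plan is hypothetical (invented rooms and doorways); betweenness is computed with Brandes' algorithm and expressed as the fraction of shortest paths between pairs of other spaces that pass through each space, and connectance distance is assumed to count doorways crossed on the shortest route, as in the Fig. 6 caption. The excerpt does not say which software the authors used for these calculations.

```python
from collections import deque

# Hypothetical floor plan: each space lists the spaces it shares a doorway with.
FLOOR = {
    "hallway": ["office_a", "office_b", "restroom", "lobby"],
    "office_a": ["hallway"],
    "office_b": ["hallway"],
    "restroom": ["hallway"],
    "lobby": ["hallway", "classroom"],
    "classroom": ["lobby"],
}


def degree(graph):
    """Number of spaces directly connected to each space."""
    return {space: len(neighbours) for space, neighbours in graph.items()}


def connectance_distance(graph, source):
    """Doorways crossed on the shortest route from `source` to each reachable space (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def betweenness(graph):
    """Brandes betweenness for an unweighted, undirected graph, normalized to the
    fraction of shortest paths between pairs of other spaces passing through each space."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # back-propagate dependencies, farthest spaces first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(graph)
    pairs = (n - 1) * (n - 2) / 2 if n > 2 else 1
    # Each unordered pair was visited from both endpoints, hence the division by 2.
    return {v: score / 2 / pairs for v, score in bc.items()}


if __name__ == "__main__":
    print(degree(FLOOR))
    print(betweenness(FLOOR))  # hallway 0.9, lobby 0.4, dead-end rooms 0.0
    print(connectance_distance(FLOOR, "restroom")["classroom"])  # 3
```

In this toy plan the restroom, like the Lillis Hall restrooms described in the Results, has a single door and zero betweenness, while the hallway lies on most shortest routes.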

Figure 3. The taxonomic composition of bacterial communities sampled from dust in Lillis Hall. Samples are organized by space type, and relative abundances are shown for groups comprising more than 1% (for phylum and class level) and 4% (for order level). doi:10.1371/journal.pone.0087093.g003

Biogeography of Indoor Bacterial Communities

PLOS ONE | www.plosone.org 4 January 2014 | Volume 9 | Issue 1 | e87093

Building-scale Design Influences on the Built Environment Microbiome

Page 18: UC Davis EVE161 Lecture 9 by @phylogenomics

Biological Sampling

Sampling of dust was carried out with a Shop-Vac® 9.4L Hang Up vacuum (www.shopvac.com; #215726) fitted with a Dustream™ Collector vacuum filter sampling device (www.inbio.com/dustream.html). Dust samples were collected by vacuuming an area of approximately 2 m² on horizontal surfaces above head level for 2 minutes in each space. We preferentially chose these surfaces for sampling since they minimized the frequency of disturbance by cleaning, and thus likely serve as a long-term sample of airborne particles in each space [21]. All samples were collected during June 22–24, 2012. Building construction was completed in 2003, and dust has presumably been accumulating in some sampled spaces since that time.

Dust samples were stored at -80°C until DNA extraction. Dust was manually extracted from filters and used for DNA extraction. Whole genomic DNA was isolated from samples using the MO BIO PowerLyzer™ PowerSoil® DNA Isolation Kit (MO BIO, Carlsbad, CA) according to the manufacturer's instructions with the following modifications: bead tubes were vortexed for 10 min; solutions C4 and C5 were substituted for PW3 and PW4/PW5 solutions from the same manufacturer's PowerWater® DNA isolation kit. Bacterial communities were profiled by sequencing a ~420 bp fragment of the V4 region of the bacterial 16S rRNA gene using a custom library preparation protocol [24]. Briefly, the protocol consisted of two PCRs. The first amplified the V4/V5 region using the primers 5′-AYTGGGYDTAAAGNG-3′ and 5′-CCGTCAATTYYTTTRAGTTT-3′ [42,43] and appended a 6 bp barcode and partial Illumina sequencing adaptor. Forward and reverse strands were labeled with different barcodes, and the unique combination of these barcodes was used to pool samples in post-processing.

All extracted samples were amplified in triplicate for PCR1, and triplicates were pooled before PCR2. PCR1 (25 µL total volume per reaction) consisted of the following ingredients: 5 µL 5× HF buffer (Thermo Fisher Scientific, U.S.A.), 0.5 µL dNTPs (10 mM), 0.25 µL Phusion Hotstart II polymerase (Thermo Fisher Scientific, U.S.A.), 13.25 µL certified nucleic-acid-free water, 0.5 µL forward primer (10 µM), 0.5 µL reverse primer (10 µM), and 5 µL template DNA. The PCR1 conditions were as follows: initial denaturation for 30 s at 98°C; 20 cycles of 20 s at 98°C, 30 s at 50°C and 30 s at 72°C; and 72°C for 10 min for final extension. After PCR1, the triplicate reactions were pooled and cleaned with the QIAGEN MinElute PCR Purification Kit according to the manufacturer's protocol (QIAGEN, Germantown, MD). Amplified products from PCR1 were eluted in 11.5 µL of Buffer EB. For PCR2, a single primer pair was used to add the remaining Illumina adaptor segments to the ends of the concentrated amplicons of PCR1. PCR2 (25 µL volume per reaction) consisted of the same combination of reagents that was used in PCR1, along with 5 µL concentrated PCR1 product as template. The PCR2 conditions were as follows: 30 s denaturation at 98°C; 15 cycles of 10 s at 98°C, 30 s at 64°C and 30 s at 72°C; and 10 min at 72°C for final extension.

Amplicons were size-selected by gel electrophoresis: gel bands at c. 500 bp were extracted and concentrated using the ZR-96 Zymoclean Gel DNA Recovery Kit (ZYMO Research, Irvine, CA) following the manufacturer's instructions, quantified using a Qubit Fluorometer (Invitrogen, NY), and pooled in equimolar concentrations for library preparation for sequencing. The resulting libraries were sequenced in two multiplexed Illumina MiSeq lanes (paired-end 150 bp sequencing) at the Dana Farber Cancer Institute (Boston, MA). All sequence data and metadata have been deposited in the open-access data repository Figshare (http://figshare.com/articles/Lillis_Dust_Sequencing_Data/709596).
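As a quick check, the PCR1 volumes listed above add up to the stated 25 µL reaction. The sketch below scales them into a master mix for a batch of samples run in triplicate; the sample count, the 10% pipetting overage and the decision to leave template out of the mix are illustrative assumptions, not part of the published protocol.

```python
# PCR1 per-reaction volumes (µL) as listed in the protocol above.
PCR1_UL = {
    "5x HF buffer": 5.0,
    "dNTPs (10 mM)": 0.5,
    "Phusion Hotstart II polymerase": 0.25,
    "nucleic-acid-free water": 13.25,
    "forward primer (10 uM)": 0.5,
    "reverse primer (10 uM)": 0.5,
    "template DNA": 5.0,
}


def master_mix(n_samples, replicates=3, overage=0.10, exclude=("template DNA",)):
    """Scale per-reaction volumes to a master mix for n_samples run in replicate.

    Template is excluded by default because it is added to each tube separately;
    the 10% overage for pipetting loss is an assumption, not part of the protocol.
    """
    factor = n_samples * replicates * (1 + overage)
    return {
        reagent: round(volume * factor, 2)
        for reagent, volume in PCR1_UL.items()
        if reagent not in exclude
    }


if __name__ == "__main__":
    print(sum(PCR1_UL.values()))  # 25.0, the stated reaction volume
    for reagent, volume in master_mix(n_samples=8).items():
        print(f"{reagent}: {volume} µL")
```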

Sequence Processing

We processed raw sequence data with the FastX_Toolkit (http://hannonlab.cshl.edu/fastx_toolkit) and QIIME [44] software pipelines to eliminate low-quality sequences and de-multiplex sequences into samples. Sequences were trimmed to a length of 200 bp (100 bp from each paired end). We retained sequences with an average quality score of 30 over 97% of the sequence length after trimming. After trimming, quality filtering and rarefaction of each sample to 2,100 sequences to ensure equal sampling depth across samples, 329,700 sequences from 155 samples remained and were included in all subsequent analyses. We binned sequences into operational taxonomic units (OTUs) at a 97% sequence similarity cutoff using UCLUST [45] and assigned taxonomy to each OTU using the BLAST taxon assignment algorithm and Greengenes version 4feb2011 core set [46] as implemented in QIIME version 1.4. We inferred phylogenetic relationships among all bacterial OTUs using a maximum likelihood GTR+Gamma phylogenetic model in FastTree [47].
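Rarefaction, as described above, draws a fixed number of reads (here 2,100) from each sample without replacement so that all samples have equal depth. The sketch below uses invented OTU counts and Python's standard library; QIIME's implementation differs in detail, and treating samples with fewer reads than the target depth as dropped is an assumption. At 2,100 reads per sample, 155 samples correspond to 155 × 2,100 = 325,500 sequences, the post-rarefaction total given in the Results.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=None):
    """Subsample reads without replacement to `depth`.

    Returns None when the sample has fewer than `depth` reads
    (assumed here to mean the sample is dropped).
    """
    if sum(otu_counts.values()) < depth:
        return None
    rng = random.Random(seed)
    # Expand counts into one entry per read, then draw without replacement.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))


if __name__ == "__main__":
    sample = {"OTU_1": 1500, "OTU_2": 900, "OTU_3": 300}  # invented counts
    print(rarefy(sample, depth=2100, seed=1))
    print(rarefy({"OTU_1": 50}, depth=2100))  # None: too few reads
```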

Data Analysis

Statistical analysis was performed in R [48]. Pairwise community dissimilarity was calculated using the quantitative, taxonomy-based Canberra distance metric, implemented in the vegan package

Table 1. Variance in biological dissimilarity (Canberra distance) among bacterial communities from all spaces, and from offices only, explained by different variables in Lillis Hall.

Room types  Explanatory variable                  R²     P-value
all rooms   Space type                            0.06   0.001
            Air source - air handling unit (AHU)  0.13   0.001
            Building floor                        0.01   0.001
            Space size                            0.01   0.001
            Building wing - East/West             0.01   0.341
            Building side - North/South           0.01   0.001
            Occupant diversity                    0.01   0.001
            Annual occupied hours                 0.01   0.015
            Centrality (betweenness)              0.01   0.001
            Centrality (degree)                   0.01   0.001
            Temperature                           0.01   0.024
            Relative Humidity*                    0.01   0.001
            Natural ventilation capability        0.01   0.001
offices     Air source - air handling unit (AHU)  0.07   0.001
            Building floor                        0.07   0.001
            Space size                            0.02   0.025
            Building wing - East/West             0.01   0.541
            Centrality (betweenness)              0.02   0.005
            Centrality (degree)                   0.02   0.016
            Temperature                           0.02   0.002
            Relative Humidity*                    0.01   0.786
            Natural ventilation capability        0.02   0.001

Variance explained (R²) and statistical significance (P-value) were quantified with a PERMANOVA test; since P-values are from permutational tests involving 999 permutations, they are only reported down to 0.001. All variables and their respective units are described in the Methods section and Table S1. *Detrended using daily averages. doi:10.1371/journal.pone.0087093.t001
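A minimal one-way PERMANOVA sketch (the Anderson 2001 sum-of-squares partitioning that underlies vegan's adonis) shows where the R² values in Table 1 come from and why P-values stop at 0.001: with 999 permutations, P = (hits + 1) / (999 + 1), so the smallest attainable value is 1/1000. The six-sample distance matrix is invented, and the multi-term models, strata and sequential sums of squares that adonis supports are not reproduced.

```python
import random


def _ss(dist, members):
    """Sum of squared pairwise distances within `members`, divided by their number."""
    total = 0.0
    for a in range(len(members)):
        for b in range(a + 1, len(members)):
            total += dist[members[a]][members[b]] ** 2
    return total / len(members)


def pseudo_f(dist, groups):
    """One-way PERMANOVA pseudo-F and R² from a square distance matrix."""
    n, labels = len(groups), sorted(set(groups))
    ss_total = _ss(dist, list(range(n)))
    ss_within = sum(
        _ss(dist, [i for i, g in enumerate(groups) if g == label]) for label in labels
    )
    ss_between = ss_total - ss_within
    k = len(labels)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, ss_between / ss_total


def permanova(dist, groups, permutations=999, seed=None):
    """Return (pseudo-F, R², P); P can never fall below 1 / (permutations + 1)."""
    rng = random.Random(seed)
    f_obs, r2 = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # permute group labels across samples
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (permutations + 1)


# Invented example: six samples on one axis, two space types.
POSITIONS = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
DIST = [[abs(a - b) for b in POSITIONS] for a in POSITIONS]
GROUPS = ["office"] * 3 + ["restroom"] * 3

if __name__ == "__main__":
    f, r2, p = permanova(DIST, GROUPS, seed=42)
    print(f"pseudo-F = {f:.1f}, R2 = {r2:.3f}, P = {p:.3f}")
```

With only six samples there are few distinct label arrangements, so even this very clear separation cannot reach P = 0.001; the floor matters only for larger data sets like the 155 Lillis Hall samples.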


Building-scale Design Influences on the Built Environment Microbiome

Analysis of the variance in bacterial community composition explained by different factors (Table 1; PERMANOVA on Canberra distances) indicated that space type and air handling unit (AHU) explained the greatest proportion of variance (R2 = 0.06 & 0.13, respectively; both P = 0.001).

Page 19: UC Davis EVE161 Lecture 9 by @phylogenomics

[49] in R. We also assessed the consequences of beta-diversity metric choice on our results; correlations between potential metrics are included as Fig. S1. Constrained ordinations (distance-based redundancy analysis; DB-RDA) were created utilizing the capscale function in vegan. Correlations reported on ordination axes, indicated by arrows, are based on simple linear models of environmental variables against ordination axes. Indicator taxa analysis [50] was performed using the indval function in the labdsv package [51]. Mantel and partial Mantel tests were used to investigate the correlations between community and environmental distance matrices, including a distance-decay comparison, using the mantel function in vegan. Permutational multivariate analysis of variance (PERMANOVA) was used to test community differences between groups of samples as a way to identify drivers of variation in community structure, using the adonis function in vegan. All permutational tests were conducted with 999 permutations, and thus P-values are reported down to, but not below, 0.001.
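Indicator taxa analysis of the kind run with labdsv's indval scores each taxon against each group as specificity (the taxon's mean abundance in that group divided by the sum of its group means) times fidelity (the fraction of the group's samples in which it occurs), following Dufrêne and Legendre's indicator value. The sketch below computes only the observed values on invented counts; it is not labdsv's code, and the permutation test used to judge significance is omitted.

```python
from collections import defaultdict


def indval(abundance, groups):
    """Maximum indicator value of each taxon and the group that attains it.

    abundance: {sample: {taxon: count}}; groups: {sample: group label}.
    Specificity = mean abundance in the group / sum of the group means;
    fidelity = fraction of the group's samples in which the taxon occurs.
    """
    members = defaultdict(list)
    for sample, label in groups.items():
        members[label].append(sample)
    taxa = {taxon for counts in abundance.values() for taxon in counts}
    result = {}
    for taxon in sorted(taxa):
        means, fidelity = {}, {}
        for label, samples in members.items():
            values = [abundance[s].get(taxon, 0) for s in samples]
            means[label] = sum(values) / len(values)
            fidelity[label] = sum(v > 0 for v in values) / len(values)
        total = sum(means.values())
        scores = {
            label: (means[label] / total if total else 0.0) * fidelity[label]
            for label in members
        }
        best = max(scores, key=scores.get)
        result[taxon] = (best, scores[best])
    return result


# Invented counts for two space types.
ABUNDANCE = {
    "restroom_1": {"Lactobacillus": 40, "Sphingomonas": 5},
    "restroom_2": {"Lactobacillus": 30, "Sphingomonas": 10},
    "office_1": {"Sphingomonas": 20},
    "office_2": {"Lactobacillus": 10, "Sphingomonas": 25},
}
GROUPS = {"restroom_1": "restroom", "restroom_2": "restroom",
          "office_1": "office", "office_2": "office"}

if __name__ == "__main__":
    # Lactobacillus: restroom (0.875), Sphingomonas: office (0.750)
    for taxon, (group, value) in indval(ABUNDANCE, GROUPS).items():
        print(f"{taxon}: {group} ({value:.3f})")
```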

Results

Building-scale Design Influences on the Built Environment Microbiome

Bacterial communities in dust from Lillis Hall were highly diverse. Using barcoded Illumina sequencing of 16S rRNA genes, we detected 32,964 operational taxonomic units (OTUs; defined at a 97% sequence similarity cut-off) in 791,192 sequences from 155 samples (19,403 OTUs and 325,500 sequences after rarefaction to 2,100 sequences per sample). Most of these OTUs were rare, occurring in one (49.9%) or two (13.3%) samples, and at low relative abundance (61.1% of OTUs were singletons or doubletons). However, OTUs from several taxonomic groups including Alpha-, Beta-, and Gamma-Proteobacteria, Firmicutes, and Deinococci were abundant and common in almost all dust samples we collected (Fig. 3 and Fig. S2). There were 58 OTUs belonging to these taxonomic groups that were present in 95% or more of all samples we collected. These ubiquitous OTUs were also abundant, representing 0.1% of the OTU richness but >28% of all sequences.

Spaces differing in their architectural design characteristics contained distinctive bacterial communities. Analysis of the variance in bacterial community composition explained by

Figure 4. Dust communities within a building cluster by space type and are strongly correlated with building centrality and human occupancy. Points represent centroids (±SE) from distance-based redundancy analysis (DB-RDA). Space types hold significantly different communities (P = 0.005), though this is driven primarily by restrooms. Bacterial OTUs that have the strongest influence in sample dissimilarities are shown at the margins; numbers in parentheses indicate multiple OTUs in the same genus. Centrality (along y-axis) represents network betweenness and degree; human occupancy (along x-axis) represents annual occupied hours and human diversity. All four correlates (simple linear models as a factor of ordination axis) are significant along their respective axes (all P < 0.001). doi:10.1371/journal.pone.0087093.g004


Page 20: UC Davis EVE161 Lecture 9 by @phylogenomics


Design Influences on the Built Environment Microbiome within a Space Type

The large number of office spaces (73 offices) made it possible to test for drivers of microbial community variation among offices.

Page 21: UC Davis EVE161 Lecture 9 by @phylogenomics

different factors (Table 1; PERMANOVA on Canberra distances) indicated that space type and air handling unit (AHU) explained the greatest proportion of variance (R² = 0.06 & 0.13, respectively; both P = 0.001). Nearly all other variables considered in this study (Table 1) were significantly correlated with biological variation as well, but explained a far smaller portion of the overall variance in microbial community structure at the scale of the building. Thus Table 1 can be seen as a potential list of building features that can, in the future, be targeted when attempting to account for microbiological variation in architectural design.

Restrooms explained a substantial amount of the variation observed between space types; bacterial communities in restrooms were compositionally distinct from other space types (R² = 0.06; P = 0.001; from PERMANOVA on Canberra distances). In addition to serving a distinct function, restrooms were characterized architecturally by relatively low network centrality (quantified as network betweenness and degree [22]; network terminology outlined in Fig. 2). This is because in Lillis Hall, restrooms generally have only a single door and are rarely or never on a path between any two other spaces. Restrooms also had a high diversity of human occupants (defined as a high number of different occupants throughout the day; explicit definitions of occupancy variables provided in Table S1). Indicator taxa analysis detected numerous OTUs that were associated with restrooms, predominantly belonging to taxa that are commonly associated with the human gut and skin microbiome, including Lactobacillus, Staphylococcus, and Streptococcus. Taxa including Lactobacillus, Staphylococcus and Clostridiales were also more abundant in restrooms compared with other space types, while Sphingomonas were relatively less abundant in restrooms (Fig. 4).

Aside from restrooms, bacterial communities in Lillis Hall tended to vary with both human occupancy and room centrality (Fig. 4). For instance, hallways, which had high human occupancy and high occupant diversity (e.g., relatively many occupants and many different occupants throughout the day) as well as high centrality (hallways often serve as a pathway between rooms), were distinct from spaces such as mechanical support rooms and faculty offices with the opposite set of attributes (Fig. 4). While there were few statistically significant indicator taxa from individual space types other than restrooms, there was variation in the abundance

Figure 5. Offices contain significantly different dust microbial communities depending on ventilation source. a) The first axis is constrained by whether offices have operable window louvers (blue) or not (red). Taxon names on either side are grouped from the 25 strongest-weighting OTUs in either direction. b) Deinococcus were 1.7 times more abundant in mechanically ventilated offices than in window-ventilated offices. c) The opposite pattern was observed for Methylobacterium OTUs, which were 1.8 times more abundant in window-ventilated offices. Boxplots delineate (from bottom) minimum, Q1, median, Q3, and maximum values; notches indicate 95% confidence intervals. d) Cross-sectional view of representative Lillis Hall offices. Offices on the south side of the building (left) receive primarily mechanically ventilated air, while offices on the north side of the building (right) are equipped with operable windows as a primary ventilation air source. doi:10.1371/journal.pone.0087093.g005


Design Influences on the Built Environment Microbiome within a Space Type

Page 22: UC Davis EVE161 Lecture 9 by @phylogenomics

of major bacterial taxa among these spaces. Taxa including Lactococcus, Pseudomonas, and Streptococcus were more abundant in the centrally located and highly occupied spaces (Fig. 4), while Achromobacter and Methylobacterium were more abundant in the less central and less occupied spaces. Space types did not vary significantly in terms of their overall bacterial OTU richness or diversity (ANOVA using rarefied OTU richness and Shannon diversity; P = 0.2 & 0.9, respectively).
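The richness and diversity comparison above rests on two per-sample summaries: OTU richness (the number of OTUs observed in a rarefied sample) and Shannon diversity H' = -Σ pᵢ ln pᵢ, where pᵢ is the proportion of reads in OTU i. The sketch uses invented counts and assumes the natural logarithm (the excerpt does not state the base); the ANOVA across space types is not reproduced.

```python
import math


def richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    even = {"OTU_1": 50, "OTU_2": 50}
    skewed = {"OTU_1": 90, "OTU_2": 10}
    print(richness(even), round(shannon(even), 3))  # 2 0.693
    print(richness(skewed), round(shannon(skewed), 3))
```

Both toy samples have the same richness but different Shannon diversity, which is why the two measures are reported separately.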

Design Influences on the Built Environment Microbiome within a Space Type

The large number of office spaces (73 offices) made it possible to test for drivers of microbial community variation among offices. Using a single space type also allowed us to hold several building parameters relatively constant. Specifically, parameters including space size, relative humidity, and occupancy varied less across offices than across all rooms at the building scale. Variation in bacterial community structure among faculty offices was largely explained by the ventilation source in offices, with mechanically ventilated faculty offices containing a distinctive set of bacterial taxa when compared with window-ventilated faculty offices (Fig. 5; R² = 0.025; P = 0.005). Taxa including Deinococcus, Achromobacter, and Roseomonas were associated with mechanically ventilated faculty offices, while Methylobacterium, Sphingomonas, and Streptococcus were more closely associated with window-ventilated faculty offices. Two of the most abundant of these strongly weighting taxa, Deinococcus and Methylobacterium, when grouped by genus, show consistent abundance differences between offices with different ventilation strategies. We found a strong association between the spatial connectance distance of offices (the number of doors through which one must walk between any two spaces) and the microbial community similarity of offices (Fig. 6; R = 0.19; P = 0.002; from a Mantel test of Canberra distance vs. spatial connectance distance). This association was also significant at the building scale, regardless of space type (R = 0.11; P = 0.001).
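The Mantel statistic reported here is a correlation between the off-diagonal entries of two distance matrices, with significance assessed by permuting the rows and columns of one matrix together. A minimal sketch under stated assumptions: the five-office matrices are invented, Pearson correlation is assumed (the default in vegan's mantel), and the P-value uses the same (hits + 1) / (permutations + 1) convention as the other permutational tests in this study.

```python
import random


def _upper(m):
    """Off-diagonal upper-triangle entries of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations=999, seed=None):
    """Mantel r (Pearson) and one-sided permutation P-value."""
    rng = random.Random(seed)
    x = _upper(d1)
    r_obs = _pearson(x, _upper(d2))
    n = len(d2)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of d2 together so it stays a distance matrix.
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(x, _upper(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Invented data for five offices: doorways between them and community dissimilarity.
DOORS = [[abs(i - j) for j in range(5)] for i in range(5)]
COMMUNITY = [
    [0.00, 0.55, 0.60, 0.70, 0.72],
    [0.55, 0.00, 0.58, 0.62, 0.69],
    [0.60, 0.58, 0.00, 0.57, 0.66],
    [0.70, 0.62, 0.57, 0.00, 0.61],
    [0.72, 0.69, 0.66, 0.61, 0.00],
]

if __name__ == "__main__":
    r, p = mantel(COMMUNITY, DOORS, seed=1)
    print(f"Mantel r = {r:.3f}, P = {p:.3f}")
```

A positive r between community dissimilarity and door count is the distance-decay pattern: offices separated by more doors tend to hold less similar dust communities.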

Discussion

In this paper we first asked: at the scale of the entire building, do function, form and organization predict variation in the built environment microbiome? Our data suggest that the answer is yes. In architecture, function translates to space type, which in Lillis Hall was the strongest predictor of microbiome variation throughout the building. Due to the integrative nature of architectural design, function often drives patterns in the form and organization of spaces throughout a building, and form and organization are necessarily difficult to disentangle. Although form and organization are distinct aspects of architectural design, we did not attempt to draw a distinction between them in our analyses, since nearly every building variable herein relates to both. In Lillis Hall, design choices resulted in distinct space types that greatly differed in terms of their architectural characteristics, which were related to variation in microbial community composition at the building scale. We also focused our analyses on the most common space type in Lillis Hall: offices. Specifically, we asked which aspects of form and organization most influenced the built environment microbiome in offices. We found that network betweenness, building floor, space size, and ventilation source were the strongest predictors for microbiome variation, even after holding function constant.

Despite the microbiome variation across space types, we detected a core built environment microbiome [23] of bacterial taxa that were present in nearly every indoor space we sampled. This core microbiome was dominated by taxa including members of the Proteobacteria and Firmicutes that are commonly found in indoor dust [15], although other common indoor dust taxa such as Actinobacteria were rare in this building (c. 1% of sequences). Many of the common taxa in the indoor dust microbiome were also detected in air and surface samples from the same building [24], suggesting that resuspension and settling of microbes from these pools of potential colonists are contributing to the communities detected in dust. The synchrony among these three microbial pools (air, surfaces and dust) within Lillis Hall suggests a conserved core building microbiome. Likely sources of this core microbiome include humans, soils and plants. We found that several of the bacterial taxa most strongly associated with restrooms as well as with high occupant diversity space types, such as classrooms, are also known to be associated with the human microbiome (e.g. Lactobacillus and Staphylococcus), while bacteria in low occupant diversity space types such as faculty offices and mechanical support spaces were more indicative of outdoor environments such as soils and the phyllosphere (e.g. Methylobacterium).

There has been a recent debate regarding the relative importance of dispersal from outdoor sources versus the conditions within buildings for determining the structure of indoor microbial communities [16,24,25]. We found evidence for the importance of both types of processes: the potential for dispersal from outdoor sources (e.g. ventilation air source, natural ventilation capacity) and conditions within the building (e.g. space type, building floor, temperature and relative humidity) influenced microbial community structure. This suggests that dispersal- and niche-based explanations will be required to understand the dynamics of the built environment microbiome. As in any ecological community, the spatial and temporal scale used to define indoor communities will have a large impact on the processes that give rise to patterns

Figure 6. Offices in Lillis Hall show a strong distance-decay pattern. When only considering a single space type, biological similarity (y-axis; 1 - Canberra distance) decreases with connectance distance (number of intermediate space boundaries [e.g., doors] one would walk through to travel the shortest distance between any two spaces) (Mantel test; R = 0.189; P = 0.002). The same pattern was also observed at the whole-building scale (not shown; Mantel test; R = 0.112; P = 0.001). doi:10.1371/journal.pone.0087093.g006


Page 23: UC Davis EVE161 Lecture 9 by @phylogenomics


SFig1. High degree of correlation among three beta-diversity metrics. Multivariate community analysis was carried out with the Canberra taxonomic metric; this choice de-emphasizes the most abundant species (as opposed to the Bray-Curtis dissimilarity metric) and ignores nuanced evolutionary relationships between bacterial OTUs (as opposed to the phylogenetic Weighted UniFrac distance). While the choice of a beta-diversity metric can affect results, the three candidate metrics we explored produced largely the same distances between samples in multivariate space. All three metrics are bounded between 0 and 1. Pearson's correlations (r) are given in the upper-right panels.
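The caption's point that Canberra de-emphasizes the most abundant taxa relative to Bray-Curtis can be seen on a toy example. In this sketch Canberra is averaged over taxa present in at least one of the two samples, an assumption chosen to match the 0 to 1 bound stated above (this is the scaling described for vegan's vegdist); the counts are invented, and Weighted UniFrac, which needs a phylogeny, is not sketched.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two samples given as {taxon: count}."""
    taxa = set(x) | set(y)
    num = sum(abs(x.get(t, 0) - y.get(t, 0)) for t in taxa)
    den = sum(x.get(t, 0) + y.get(t, 0) for t in taxa)
    return num / den


def canberra(x, y):
    """Canberra distance averaged over taxa present in at least one sample (0 to 1)."""
    terms = []
    for t in set(x) | set(y):
        a, b = x.get(t, 0), y.get(t, 0)
        if a + b > 0:  # skip taxa absent from both samples
            terms.append(abs(a - b) / (a + b))
    return sum(terms) / len(terms)


# Invented counts: one dominant taxon and two rare ones.
BASE = {"dominant": 1000, "rare_1": 2, "rare_2": 3}
DOMINANT_HALVED = {"dominant": 500, "rare_1": 2, "rare_2": 3}
RARE_REPLACED = {"dominant": 1000, "rare_3": 2, "rare_4": 3}

if __name__ == "__main__":
    for name, other in [("dominant halved", DOMINANT_HALVED),
                        ("rare replaced", RARE_REPLACED)]:
        print(f"{name}: Bray-Curtis {bray_curtis(BASE, other):.3f}, "
              f"Canberra {canberra(BASE, other):.3f}")
```

Halving the dominant taxon gives a Bray-Curtis value of about 0.33 but a Canberra value of about 0.11, while swapping the two rare taxa gives about 0.005 and 0.80: Bray-Curtis tracks changes in abundant taxa, Canberra gives each taxon equal weight.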

Page 24: UC Davis EVE161 Lecture 9 by @phylogenomics


[Krona chart of dust bacterial taxonomy in Lillis Hall; the largest labeled groups are Deinococcus (13%), Streptococcus (8%), Methylobacterium (7%), Oxalobacteraceae (5%), Staphylococcus (5%), Roseomonas (4%) and Lactobacillus (4%).]

Figure S2. The taxonomic composition of bacterial communities sampled from dust in the Lillis Business Complex. The relative abundance of sequences assigned to taxa at different taxonomic levels is indicated by the relative width of categories at each level. Bacterial taxonomy was visualized using Krona (http://sourceforge.net/projects/krona/; Ondov et al. 2011). doi:10.1371/journal.pone.0087093.s002 (PDF)

Page 25: UC Davis EVE161 Lecture 9 by @phylogenomics


Conclusion

Churchill famously stated that “[w]e shape our buildings, and afterwards our buildings shape us.” Humans help to direct microbial biodiversity patterns in buildings – not only as building occupants, but also through architectural design strategies. The impact of human design decisions in structuring the indoor microbiome offers the possibility to use ecological knowledge to shape our buildings in a way that will select for an indoor microbiome that promotes our health and well-being.

Page 26: UC Davis EVE161 Lecture 9 by @phylogenomics


RESEARCH Open Access

Microbes in the neonatal intensive care unit resemble those found in the gut of premature infants

Brandon Brooks¹, Brian A Firek², Christopher S Miller¹,³, Itai Sharon¹, Brian C Thomas¹, Robyn Baker⁴, Michael J Morowitz² and Jillian F Banfield¹*

Abstract

Background: The source inoculum of gastrointestinal tract (GIT) microbes is largely influenced by delivery mode in full-term infants, but these influences may be decoupled in very low birth weight (VLBW, <1,500 g) neonates via conventional broad-spectrum antibiotic treatment. We hypothesize the built environment (BE), specifically room surfaces frequently touched by humans, is a predominant source of colonizing microbes in the gut of premature VLBW infants. Here, we present the first matched fecal-BE time series analysis of two preterm VLBW neonates housed in a neonatal intensive care unit (NICU) over the first month of life.

Results: Fresh fecal samples were collected every 3 days and metagenomes sequenced on an Illumina HiSeq2000 device. For each fecal sample, approximately 33 swabs were collected from each NICU room from 6 specified areas: sink, feeding and intubation tubing, hands of healthcare providers and parents, general surfaces, and nurse station electronics (keyboard, mouse, and cell phone). Swabs were processed using a recently developed ‘expectation maximization iterative reconstruction of genes from the environment’ (EMIRGE) amplicon pipeline in which full-length 16S rRNA amplicons were sheared and sequenced using an Illumina platform, and short reads reassembled into full-length genes. Over 24,000 full-length 16S rRNA sequences were produced, generating an average of approximately 12,000 operational taxonomic units (OTUs) (clustered at 97% nucleotide identity) per room-infant pair. Dominant gut taxa, including Staphylococcus epidermidis, Klebsiella pneumoniae, Bacteroides fragilis, and Escherichia coli, were widely distributed throughout the room environment, with many gut colonizers detected in more than half of samples. Reconstructed genomes from infant gut colonizers revealed a suite of genes that confer resistance to antibiotics (for example, tetracycline, fluoroquinolone, and aminoglycoside) and sterilizing agents, which likely offer a competitive advantage in the NICU environment.

Conclusions: We have developed a high-throughput culture-independent approach that integrates room surveys based on full-length 16S rRNA gene sequences with metagenomic analysis of fecal samples collected from infants in the room. The approach enabled identification of discrete ICU reservoirs of microbes that also colonized the infant gut and provided evidence for the presence of certain organisms in the room prior to their detection in the gut.

* Correspondence: [email protected]
¹Department of Earth and Planetary Sciences, University of California Berkeley, Berkeley, CA 94720, USA
Full list of author information is available at the end of the article

© 2014 Brooks et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Brooks et al. Microbiome 2014, 2:1 http://www.microbiomejournal.com/content/2/1/1

Page 27: UC Davis EVE161 Lecture 9 by @phylogenomics


Background

From birth to death, humans spend approximately 90% of their time indoors [1]. This realization, coupled with advancements in DNA sequencing technologies, has spawned a new interest in studying buildings as ecosystems. Pioneering efforts have revealed a built environment (BE), a term used here to collectively describe both the biotic and abiotic features of a building structure, that is far more complex than originally imagined [2,3]. Diverse microbial communities have been uncovered in a variety of BEs [4] and, surprisingly, in sites engineered to be sterile or near sterile, such as NASA clean rooms [5,6] and high-risk hospital wards [7-10]. Additionally, recent studies characterizing different building types have revealed general trends suggesting a room's function or architecture dictates the BE's microbiome [8,11]. Intrabuilding experiments in hospitals have corroborated this notion, showing that general use areas, such as waiting rooms and lobbies, have a markedly different microbial community compared to more restrictive hospital zones such as intensive care units [8]. The exchange between the BE microbiome and the human microbiome remains unclear; however, the observation that human pathogens are enriched in hospital settings is of obvious concern [11]. Here, we aimed to characterize the interaction between the BE's microbiome and the human microbiome through study of very low birth weight (VLBW, <1,500 g) infants housed in a neonatal intensive care unit (NICU) as our model system.

Infants housed in a NICU are well suited to studies that aim to characterize interactions between the BE and occupants. In utero, infants are canonically thought to exist in a sterile or near-sterile environment [12]. Acquisition of the microbiome starts at birth and is strongly influenced by mode of delivery [13]. Patterns of colonization in full-term infants tend to follow a well-documented trajectory affected by diet, host genotype, and a limited set of other variables, with the infant gut converging on an adult-like state around 2.5 years of life [14,15]. In VLBW infants, early gut succession is characterized by extremely limited diversity, chaotic flux in community composition, and an abundance of opportunistic pathogens [16-19]. It is possible that a high rate of caesarean deliveries and the routine use of broad-spectrum antibiotics during the first week of life serve to decouple VLBW infants from source inoculum introduced during the birthing process. These influences likely render premature infant microbiomes especially susceptible to environmental influences.

There is strong evidence suggesting that the ICU serves as a reservoir of clinically relevant pathogens. ‘Outbreaks’ of disease in ICUs are relatively common, and a recent study estimated that at least 38% of all ICU outbreaks could be attributed to microbial sources within the ICU environment, such as equipment or personnel [20]. In addition, upward of 63% of extremely preterm infants develop life-threatening infections [21]. Epidemiologic investigations indicate environmental sources of infective agents in air [22], infant incubators [23,24], sink drains [25], soap dispensers [26], thermometers [27], and baby toys [28]. Clearly, there is a growing need for comprehensive ecological surveys of the hospital BE to better understand the overall process of microbe migration and establishment on and in the body of occupants. Here, we performed the first matched time series characterization of the NICU and infant gut. Our analysis used metagenomic sequencing of microbial community DNA extracted from fecal samples to evaluate the metabolic potential of gut-colonizing microorganisms and a recently developed ‘expectation maximization iterative reconstruction of genes from the environment’ (EMIRGE) amplicon protocol to profile the microbial community composition of BE samples collected from six environment types [29]. Our protocol was aimed at addressing the hypothesis that the BE, specifically room surfaces frequently touched by humans, is a predominant source of colonizing microbes in the GI tract of premature infants.

Methods

Sample collection

Fecal samples were collected every third day, starting on the third day of life, for 1 month from two infants. Infants were enrolled in the study based on the criteria that they were <31 weeks’ gestation, <1,250 g at birth, and housed in the same physical location within the NICU during the first month of life. A summary of health-related metadata, including antibiotic exposure, is provided in Table 1. Fecal samples were collected using a previously established perineal stimulation procedure and were stored at -80°C within 10 minutes [16]. All samples were collected after signed guardian consent was obtained, as outlined in our protocol to the ethical

Table 1 Health profile of premature infant cohort

Characteristic | Infant 1 | Infant 2
Gestational age | 26 3/7 weeks | 28 2/7 weeks
Weight | 951 g | 1,148 g
Multiple gestation | No | Twin
Delivery mode | Vaginal | Vaginal
Chorioamnionitis | Yes | Yes
Day of life (DOL) 1 to 7 antibiotics | Ampicillin, gentamicin | Ampicillin, gentamicin
Other antibiotics | No | DOL 14 to 16, vancomycin, cefotaxime
Feeding initiated | DOL 3, maternal milk | DOL 8, artificial formula
Survived to discharge | Yes | Yes

Brooks et al. Microbiome 2014, 2:1, Page 2 of 16, http://www.microbiomejournal.com/content/2/1/1

Page 28: UC Davis EVE161 Lecture 9 by @phylogenomics


Methods

Page 29: UC Davis EVE161 Lecture 9 by @phylogenomics


Methods

Background

From birth to death, humans spend approximately 90% of their time indoors [1]. This realization, coupled with advancements in DNA sequencing technologies, has spawned a new interest in studying buildings as ecosystems. Pioneering efforts have revealed a built environment (BE), a term used here to collectively describe both the biotic and abiotic features of a building structure, that is far more complex than originally imagined [2,3]. Diverse microbial communities have been uncovered in a variety of BEs [4] and, surprisingly, in sites engineered to be sterile or near sterile, such as NASA clean rooms [5,6] and high-risk hospital wards [7-10]. Additionally, recent studies characterizing different building types have revealed general trends suggesting that a room’s function or architecture dictates the BE’s microbiome [8,11]. Intrabuilding experiments in hospitals have corroborated this notion, showing that general use areas, such as waiting rooms and lobbies, have a markedly different microbial community compared to more restrictive hospital zones such as intensive care units [8]. The exchange between the BE microbiome and the human microbiome remains unclear; however, the observation that human pathogens are enriched in hospital settings is of obvious concern [11]. Here, we aimed to characterize the interaction between the BE’s microbiome and the human microbiome through study of very low birth weight (VLBW, <1,500 g) infants housed in a neonatal intensive care unit (NICU) as our model system.

Infants housed in a NICU are well suited to studies


research board of the University of Pittsburgh (IRB PRO11060238). This consent included sample collection permissions and consent to publish study findings.

All samples were obtained from a private-style NICU at Magee-Womens Hospital of the University of Pittsburgh Medical Center. Room samples were collected concurrently with fecal samples and spanned four timepoints on days of collection (9:00, 12:00, 13:00, and 16:00). The most frequently touched surfaces were determined by visual observation and health care provider interviews in the weeks leading up to sample collection. Microbial cells were removed from surfaces using foam-tipped swabs (BBL CultureSwab EZ Collection and Transport System, Franklin Lakes, NJ, USA) and a sampling buffer of 0.15 M NaCl and 0.1% Tween 20. Six frequently touched areas were processed per infant room: sink, feeding and intubation tubing, hands of healthcare providers and parents, general surfaces, access knobs on the incubator, and nurse station electronics (keyboard, mouse, and cell phone). All samples were placed in a sterile transport tube and stored at -80°C within 30 minutes until further processing.
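As a rough illustration of the room-sampling design, the sketch below builds a swab manifest for one room on one collection day (six surface types × four timepoints) and groups the swabs so that, as the DNA extraction section describes, the four timepoints for each environment on a given day collapse into a single pooled sample. The room label and day number are hypothetical placeholders, and the surface-type strings are shortened paraphrases of the Methods list.

```python
from collections import defaultdict

# Timepoints and surface types as listed in the Methods; the strings are paraphrased labels.
TIMEPOINTS = ["09:00", "12:00", "13:00", "16:00"]
ENVIRONMENTS = [
    "sink",
    "feeding and intubation tubing",
    "hands of healthcare providers and parents",
    "general surfaces",
    "incubator access knobs",
    "nurse station electronics",
]

Swab = tuple[str, int, str, str]  # (room, day, environment, timepoint)


def swab_manifest(room: str, day: int) -> list[Swab]:
    """One swab record per (environment, timepoint) for a room on a collection day."""
    return [(room, day, env, t) for env in ENVIRONMENTS for t in TIMEPOINTS]


def pool_by_environment(swabs: list[Swab]) -> dict[tuple[str, int, str], list[str]]:
    """Group swabs so all timepoints of one (room, day, environment) form one pooled sample."""
    pools: dict[tuple[str, int, str], list[str]] = defaultdict(list)
    for room, day, env, timepoint in swabs:
        pools[(room, day, env)].append(timepoint)
    return dict(pools)


# Hypothetical room label and collection day, for illustration only.
swabs = swab_manifest("room_A", 3)
pools = pool_by_environment(swabs)
print(len(swabs), "swabs ->", len(pools), "pooled samples")  # prints: 24 swabs -> 6 pooled samples
```

Under this design each room contributes 24 swabs per collection day but only six DNA samples after pooling, which is why downstream counts are per environment rather than per timepoint.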

DNA extraction and PCR amplification

Frozen fecal samples were thawed on ice, and 0.25 g of thawed sample was added to tubes with prewarmed (65°C) lysis solution from the PowerSoil DNA Isolation Kit (MoBio Laboratories, Carlsbad, CA, USA). The incubation was conducted for 5 minutes and the manufacturer’s protocol followed thereafter. Swab heads followed the same procedure, except that heads were cut with sterilized scissors into the extraction tube before starting the protocol.

DNA extracted from swabs was pooled such that the four timepoints sampled in 1 day per environment were consolidated into one sample. Pooled DNA was used as template for amplification of the full-length 16S rRNA gene with 27F (5’-AGAGTTTGATCCTGGCTCAG-3’) and 1492R (5’-GGTTACCTTGTTACGACTT-3’) primers [30]. To limit PCR bias, gradient PCR was performed with 5 units/μL of TaKaRa Ex Taq™ (Takara Bio Inc., Otsu, Japan) across 7 different annealing temperatures using the following cycling program: 1 minute at 94°C; 35 cycles of 1 minute at 94°C, 30 s at 48°C to 58°C (seven-temperature gradient), and 1 minute at 72°C; and a final extension for 7 minutes at 72°C. Amplicons were combined across gradients and cleaned with the QIAquick PCR Purification Kit (Qiagen, Hilden, Germany) as directed by the manufacturer. Cleaned amplicons were quantified via Qubit (Life Technologies, Carlsbad, CA, USA) and input into an Illumina library preparation pipeline.
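As a rough check on how the 48°C to 58°C annealing range relates to the two primers, the sketch below computes each primer's GC fraction and a crude Wallace-rule melting temperature, Tm ≈ 2(A+T) + 4(G+C) °C, and lists seven evenly spaced annealing temperatures across the gradient. Two assumptions apply: the Wallace rule is only a rough estimate (it is usually quoted for short oligonucleotides and ignores salt and primer concentration), and real thermocycler gradient blocks are not necessarily linear, so the evenly spaced values are illustrative rather than the temperatures actually used.

```python
# Primer sequences as given in the Methods.
PRIMERS = {
    "27F": "AGAGTTTGATCCTGGCTCAG",
    "1492R": "GGTTACCTTGTTACGACTT",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Crude melting temperature in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def gradient(low: float, high: float, steps: int) -> list[float]:
    """Evenly spaced annealing temperatures across a gradient (assumes a linear block)."""
    step = (high - low) / (steps - 1)
    return [round(low + i * step, 1) for i in range(steps)]


for name, seq in PRIMERS.items():
    print(name, len(seq), "nt", f"GC={gc_fraction(seq):.2f}", f"Tm~{wallace_tm(seq)}C")
print(gradient(48.0, 58.0, 7))
```

By this rough estimate, 27F (20 nt, 50% GC) comes out near 60°C and 1492R (19 nt, about 42% GC) near 54°C, so the seven annealing temperatures span values at and below both estimates.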

Sequencing preparation and sequencing

Illumina library construction followed standard protocols at the University of California Davis DNA Technologies Core Facility (http://dnatech.genomecenter.ucdavis.edu) as previously described [29]. Briefly, amplicons were fragmented to an average size of 225 bp using the Bioruptor NGS (Diagenode, Seraing, Belgium), and sheared fragments were used in a robotic library preparation protocol on the Apollo 324 robot (IntegenX, Pleasanton, CA, USA) following the manufacturer’s instructions. Each sample was tagged with a unique six-nucleotide barcode, internal to the adapter and read as a separate indexing read, which was ligated to each fragment. Twelve cycles of PCR were used to enrich for adapter-ligated fragments before library quantification and validation. Fecal samples underwent the same preparation with two exceptions: (1) genomic DNA was used, and (2) DNA was fragmented to 550 bp. Libraries were added, in equimolar amounts, to the Illumina HiSeq 2000 platform. Paired-end sequences were obtained with 100 cycles, and the data were processed with Casava version 1.8.2. Raw read data have been deposited in the NCBI Short Read Archive (accession number SRP033353).
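Because each library carries a six-nucleotide barcode read as a separate index read, reads can be assigned back to their samples by comparing the index read against the barcode list. In this study that step was handled by the Casava 1.8.2 pipeline; the sketch below is only a generic illustration of barcode assignment, not Casava's implementation. The barcode sequences and sample names are hypothetical, and the tolerance of one mismatch is an assumed parameter.

```python
from typing import Optional

# Hypothetical six-nucleotide barcodes mapped to hypothetical sample names.
BARCODES = {
    "ATCACG": "sink",
    "CGATGT": "hands",
    "TTAGGC": "incubator_knobs",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, barcodes: dict[str, str], max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode matches the index read within max_mismatches.

    Reads that match no barcode, or more than one, are left unassigned (None).
    """
    index_read = index_read.upper()
    hits = [
        sample
        for barcode, sample in barcodes.items()
        if len(barcode) == len(index_read) and hamming(barcode, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


for read in ["ATCACG", "ATCACC", "GGGGGG"]:
    print(read, "->", assign_sample(read, BARCODES))
```

Rejecting ambiguous matches keeps a read from being credited to the wrong room surface when two barcodes are close; with well-separated barcodes like these, a single sequencing error in the index read is still recoverable.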

EMIRGE assembly of full-length 16S rRNA gene amplicons

EMIRGE is an iterative template-guided assembler that relies on a database of 16S rRNA gene sequences to probabilistically generate full-length 16S rRNA gene sequences and provide the relative abundance of these sequences in the assayed consortia [31]. For the reference database, we used version 108 of the SILVA SSU database, filtered to exclude sequences <1,200 bp and >1,900 bp [32]. To remove closely related sequences, we clustered the database at 97% identity with USEARCH [33]. A total of 1 million paired-end reads from each barcoded library were sampled randomly without replacement to accommodate computational restrictions associated with use of the full dataset. Reads from each library's subsample were stringently trimmed using Sickle [34] for quality scores >30 and length >60 bp. Trimmed reads were input into an amplicon-optimized version of EMIRGE [29] for assembly using default parameters. A total of 80 iterations were performed for each subsample. EMIRGE-reconstructed sequences without Ns and with an estimated abundance of 0.01% or greater were kept for analysis. Putative chimeras were removed using the intersection of two chimera detection programs, DECIPHER [35] and UCHIME v6.0 [36], searched against the 2011 Greengenes database [37]. Finally, reconstructed sequences from a spike-in control experiment (data not shown) were removed before downstream analysis. Sequences used in the analysis are publicly available as a project attachment at http://ggkbase.berkeley.edu/NICU-Micro/.
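Several of the filtering steps above are simple enough to sketch directly. The block below is a hedged illustration, not the authors' code. It draws a fixed number of read pairs uniformly without replacement in one pass (reservoir sampling, an assumed technique for the random subsample); applies a simplified 3'-end sliding-window quality trim in the spirit of Sickle (not Sickle's exact algorithm, and the window of 10% of read length is an assumed default); keeps EMIRGE reconstructions with no Ns and an estimated abundance of at least 0.01% (expressed here as the fraction 0.0001); and removes only sequences flagged as chimeric by both detectors, which is one reading of "the intersection" of the two programs. Thresholds follow the Methods; all function names and data layouts are hypothetical.

```python
import random
from typing import Iterable, Optional

Pair = tuple[str, str]  # (read 1, read 2) of one paired-end fragment


def subsample_pairs(pairs: Iterable[Pair], k: int, seed: int = 0) -> list[Pair]:
    """Reservoir sampling: k read pairs drawn uniformly without replacement in one pass."""
    rng = random.Random(seed)
    reservoir: list[Pair] = []
    for i, pair in enumerate(pairs):
        if i < k:
            reservoir.append(pair)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = pair
    return reservoir


def trim_length(quals: list[int], q: int = 30, min_len: int = 60, window_frac: float = 0.1) -> Optional[int]:
    """Simplified sliding-window 3' trim on Phred scores.

    Returns the number of bases to keep, or None if the trimmed read is shorter than min_len.
    """
    n = len(quals)
    w = max(1, int(n * window_frac))
    keep = n
    for start in range(n - w + 1):
        window = quals[start:start + w]
        if sum(window) / w < q:
            # Cut at the first below-threshold base inside the first failing window.
            keep = start + next(i for i, v in enumerate(window) if v < q)
            break
    return keep if keep >= min_len else None


def keep_reconstruction(seq: str, abundance: float, min_abundance: float = 0.0001) -> bool:
    """Keep EMIRGE sequences with no N bases and relative abundance >= 0.01%."""
    return "N" not in seq.upper() and abundance >= min_abundance


def remove_chimeras(ids: set[str], flagged_a: set[str], flagged_b: set[str]) -> set[str]:
    """Drop only sequences flagged as chimeric by both detectors."""
    return ids - (flagged_a & flagged_b)


# Toy inputs for illustration only.
pairs = [(f"read{i}/1", f"read{i}/2") for i in range(1000)]
print(len(subsample_pairs(pairs, 100, seed=1)))
print(trim_length([38] * 80 + [10] * 20))
print(keep_reconstruction("ACGTN", 0.02), keep_reconstruction("ACGT", 0.00005))
print(sorted(remove_chimeras({"s1", "s2", "s3"}, {"s1", "s2"}, {"s2", "s3"})))
```

Sampling read pairs as units keeps mates together, which the paired-end assembly relies on; requiring agreement from both chimera detectors is the more conservative choice, since a sequence is discarded only when both programs flag it.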

Metagenomic EMIRGE assembly of 16S rRNA gene

Metagenomic sequencing of 16 fecal samples on 1 lane of an Illumina HiSeq 2000 produced approximately 350 Mbp of 101 bp paired-end reads. Trimmed reads were input into EMIRGE and default parameters run for 80 iterations


research board of the University of Pittsburgh (IRBPRO11060238). This consent included sample collectionpermissions and consent to publish study findings.All samples were obtained from a private-style NICU at

Magee-Womens Hospital of the University of PittsburghMedical Center. Room samples were collected concur-rently with fecal samples and spanned four timepointson days of collection (9:00, 12:00, 13:00, and 16:00).Most frequently touched surfaces were determined byvisual observation and health care provider interviews inthe weeks leading up to sample collection. Microbial cellswere removed from surfaces using foam tipped swabs(BBL CultureSwab EZ Collection and Transport System,Franklin Lakes, NJ, USA) and a sampling buffer of 0.15 MNaCl and 0.1% Tween20. Six frequently touched areaswere processed per infant room: sink, feeding and intub-ation tubing, hands of healthcare providers and parents,general surfaces, access knobs on the incubator, and nursestation electronics (keyboard, mouse, and cell phone). Allsamples were placed in a sterile transport tube and storedwithin 30 minutes at -80°C until further processing.

DNA extraction and PCR amplificationFrozen fecal samples were thawed on ice and 0.25 g ofthawed sample added to tubes with prewarmed (65°C) lysissolution from the PowerSoil DNA Isolation Kit (MoBioLaboratories, Carlsbad, CA, USA). The incubation wasconducted for 5 minutes and the manufacturer’s protocolfollowed thereafter. Swab heads followed the same proced-ure, except heads were cut with sterilized scissors into theextraction tube before starting the protocol.DNA extracted from swabs was pooled such that the four

timepoints sampled in 1 day per environment were consoli-dated into one sample. Pooled DNA was used as templatefor amplification of the full-length 16S rRNA gene with 27F (5’-AGAGTTTGATCCTGGCTCAG-3’) and 1492R (5’-GGTTACCTTGTTACGACTT-3’) primers [30]. To limitPCR bias, gradient PCR was performed with 5 units/μL ofTaKaRa Ex Taq™ (Takara Bio Inc., Otsu, Japan) across 7 dif-ferent annealing temperatures with the following reaction: 1minute at 94°C; 35 cycles of 1 minute at 94°C, 30 s at 48°Cto 58°C (7°C temperature gradient) and 1 minute at 72°C;and a final extension for 7 minutes at 72°C. Amplicons werecombined across gradients and cleaned with the QIAquickPCR Purification Kit (Qiagen, Hilden, Germany) as directedby the manufacturer. Cleaned amplicons were quantifiedvia Qubit (Life Technologies, Carlsbad, CA, USA) andinput into an Illumina library preparation pipeline.

Sequencing preparation and sequencingIllumina library construction followed standard protocolsat the University of California Davis DNA TechnologiesCore Facility (http://dnatech.genomecenter.ucdavis.edu)as previously described [29]. Briefly, amplicons were

fragmented to an average size of 225 bp using the Biorup-tor NGS (Diagenode, Seraing, Belgium), and sheared frag-ments were used in a robotic library preparation protocolusing the Appollo 324 robot (Integenx, Pleasanton, CA,USA) following the manufacturer’s instructions. Each sam-ple was tagged with unique barcodes consisting of six nucle-otides internal to the adapter read as a separate indexingread, and ligated to each fragment. There were 12 cycles ofPCR enriched for adapter-ligated fragments before libraryquantification and validation. Fecal samples underwent thesame preparation with two exceptions: (1) genomic DNAwas used and (2) DNA was fragmented to 550 bp. Librarieswere added, in equimolar amounts, to the Illumina HiSeq2000 platform. Paired-end sequences were obtained with100 cycles and the data processed with Casava version1.8.2. Raw read data has been deposited in the NCBI ShortRead Archive (accession number SRP033353).

EMIRGE assembly of full-length 16S rRNA gene ampliconsEMIRGE is an iterative template-guided assembler that re-lies on a database of 16S rRNA gene sequences to prob-abilistically generate full-length 16S rRNA gene sequencesand provide the relative abundance of these sequences inthe assayed consortia [31]. For the reference database, weused version 108 of the SILVA SSU database, filtered toexclude sequences <1,200 bp and >1,900 bp [32]. To re-move closely related sequences, we clustered the databaseat 97% identity with USEARCH [33]. A total of 1 millionpaired-end reads from each barcoded library were sam-pled randomly without replacement to accommodatecomputational restrictions associated with use of thefull dataset. Reads from the subsample from each li-brary were stringently trimmed using Sickle [34] forquality scores >30 and length >60 bp. Trimmed readswere input into an amplicon-optimized version ofEMIRGE [29] for assembly using default parameters. Atotal of 80 iterations were performed for each sub-sample. EMIRGE-reconstructed sequences without Nsand with an estimated abundance of 0.01% or greaterwere kept for analysis. Putative chimeras were removedby using the intersection between two chimera detec-tion programs, DECIPHER [35] and UCHIME v6.0 [36]searched against the 2011 Greengenes database [37]. Fi-nally, reconstructed sequences from a spike-in controlexperiment (data not shown) were removed for down-stream analysis. Sequences used in the analysis are pub-licly available as a project attachment at http://ggkbase.berkeley.edu/NICU-Micro/.

Metagenomic EMIRGE assembly of 16S rRNA geneMetagenomic sequencing of 16 fecal samples on 1 lane ofan Illumina HiSeq 2000 produced approximately 350 Mbpof 101 bp paired-end reads. Trimmed reads were inputinto EMIRGE and default parameters run for 80 iterations

Brooks et al. Microbiome 2014, 2:1 Page 3 of 16http://www.microbiomejournal.com/content/2/1/1

research board of the University of Pittsburgh (IRBPRO11060238). This consent included sample collectionpermissions and consent to publish study findings.All samples were obtained from a private-style NICU at

Magee-Womens Hospital of the University of PittsburghMedical Center. Room samples were collected concur-rently with fecal samples and spanned four timepointson days of collection (9:00, 12:00, 13:00, and 16:00).Most frequently touched surfaces were determined byvisual observation and health care provider interviews inthe weeks leading up to sample collection. Microbial cellswere removed from surfaces using foam tipped swabs(BBL CultureSwab EZ Collection and Transport System,Franklin Lakes, NJ, USA) and a sampling buffer of 0.15 MNaCl and 0.1% Tween20. Six frequently touched areaswere processed per infant room: sink, feeding and intub-ation tubing, hands of healthcare providers and parents,general surfaces, access knobs on the incubator, and nursestation electronics (keyboard, mouse, and cell phone). Allsamples were placed in a sterile transport tube and storedwithin 30 minutes at -80°C until further processing.

DNA extraction and PCR amplificationFrozen fecal samples were thawed on ice and 0.25 g ofthawed sample added to tubes with prewarmed (65°C) lysissolution from the PowerSoil DNA Isolation Kit (MoBioLaboratories, Carlsbad, CA, USA). The incubation wasconducted for 5 minutes and the manufacturer’s protocolfollowed thereafter. Swab heads followed the same proced-ure, except heads were cut with sterilized scissors into theextraction tube before starting the protocol.DNA extracted from swabs was pooled such that the four

timepoints sampled in 1 day per environment were consoli-dated into one sample. Pooled DNA was used as templatefor amplification of the full-length 16S rRNA gene with 27F (5’-AGAGTTTGATCCTGGCTCAG-3’) and 1492R (5’-GGTTACCTTGTTACGACTT-3’) primers [30]. To limitPCR bias, gradient PCR was performed with 5 units/μL ofTaKaRa Ex Taq™ (Takara Bio Inc., Otsu, Japan) across 7 dif-ferent annealing temperatures with the following reaction: 1minute at 94°C; 35 cycles of 1 minute at 94°C, 30 s at 48°Cto 58°C (7°C temperature gradient) and 1 minute at 72°C;and a final extension for 7 minutes at 72°C. Amplicons werecombined across gradients and cleaned with the QIAquickPCR Purification Kit (Qiagen, Hilden, Germany) as directedby the manufacturer. Cleaned amplicons were quantifiedvia Qubit (Life Technologies, Carlsbad, CA, USA) andinput into an Illumina library preparation pipeline.

Sequencing preparation and sequencingIllumina library construction followed standard protocolsat the University of California Davis DNA TechnologiesCore Facility (http://dnatech.genomecenter.ucdavis.edu)as previously described [29]. Briefly, amplicons were

fragmented to an average size of 225 bp using the Biorup-tor NGS (Diagenode, Seraing, Belgium), and sheared frag-ments were used in a robotic library preparation protocolusing the Appollo 324 robot (Integenx, Pleasanton, CA,USA) following the manufacturer’s instructions. Each sam-ple was tagged with unique barcodes consisting of six nucle-otides internal to the adapter read as a separate indexingread, and ligated to each fragment. There were 12 cycles ofPCR enriched for adapter-ligated fragments before libraryquantification and validation. Fecal samples underwent thesame preparation with two exceptions: (1) genomic DNAwas used and (2) DNA was fragmented to 550 bp. Librarieswere added, in equimolar amounts, to the Illumina HiSeq2000 platform. Paired-end sequences were obtained with100 cycles and the data processed with Casava version1.8.2. Raw read data has been deposited in the NCBI ShortRead Archive (accession number SRP033353).

EMIRGE assembly of full-length 16S rRNA gene ampliconsEMIRGE is an iterative template-guided assembler that re-lies on a database of 16S rRNA gene sequences to prob-abilistically generate full-length 16S rRNA gene sequencesand provide the relative abundance of these sequences inthe assayed consortia [31]. For the reference database, weused version 108 of the SILVA SSU database, filtered toexclude sequences <1,200 bp and >1,900 bp [32]. To re-move closely related sequences, we clustered the databaseat 97% identity with USEARCH [33]. A total of 1 millionpaired-end reads from each barcoded library were sam-pled randomly without replacement to accommodatecomputational restrictions associated with use of thefull dataset. Reads from the subsample from each li-brary were stringently trimmed using Sickle [34] forquality scores >30 and length >60 bp. Trimmed readswere input into an amplicon-optimized version ofEMIRGE [29] for assembly using default parameters. Atotal of 80 iterations were performed for each sub-sample. EMIRGE-reconstructed sequences without Nsand with an estimated abundance of 0.01% or greaterwere kept for analysis. Putative chimeras were removedby using the intersection between two chimera detec-tion programs, DECIPHER [35] and UCHIME v6.0 [36]searched against the 2011 Greengenes database [37]. Fi-nally, reconstructed sequences from a spike-in controlexperiment (data not shown) were removed for down-stream analysis. Sequences used in the analysis are pub-licly available as a project attachment at http://ggkbase.berkeley.edu/NICU-Micro/.

Metagenomic EMIRGE assembly of 16S rRNA geneMetagenomic sequencing of 16 fecal samples on 1 lane ofan Illumina HiSeq 2000 produced approximately 350 Mbpof 101 bp paired-end reads. Trimmed reads were inputinto EMIRGE and default parameters run for 80 iterations

Brooks et al. Microbiome 2014, 2:1 Page 3 of 16http://www.microbiomejournal.com/content/2/1/1

research board of the University of Pittsburgh (IRBPRO11060238). This consent included sample collectionpermissions and consent to publish study findings.All samples were obtained from a private-style NICU at

Magee-Womens Hospital of the University of PittsburghMedical Center. Room samples were collected concur-rently with fecal samples and spanned four timepointson days of collection (9:00, 12:00, 13:00, and 16:00).Most frequently touched surfaces were determined byvisual observation and health care provider interviews inthe weeks leading up to sample collection. Microbial cellswere removed from surfaces using foam tipped swabs(BBL CultureSwab EZ Collection and Transport System,Franklin Lakes, NJ, USA) and a sampling buffer of 0.15 MNaCl and 0.1% Tween20. Six frequently touched areaswere processed per infant room: sink, feeding and intub-ation tubing, hands of healthcare providers and parents,general surfaces, access knobs on the incubator, and nursestation electronics (keyboard, mouse, and cell phone). Allsamples were placed in a sterile transport tube and storedwithin 30 minutes at -80°C until further processing.

DNA extraction and PCR amplificationFrozen fecal samples were thawed on ice and 0.25 g ofthawed sample added to tubes with prewarmed (65°C) lysissolution from the PowerSoil DNA Isolation Kit (MoBioLaboratories, Carlsbad, CA, USA). The incubation wasconducted for 5 minutes and the manufacturer’s protocolfollowed thereafter. Swab heads followed the same proced-ure, except heads were cut with sterilized scissors into theextraction tube before starting the protocol.DNA extracted from swabs was pooled such that the four

timepoints sampled in 1 day per environment were consoli-dated into one sample. Pooled DNA was used as templatefor amplification of the full-length 16S rRNA gene with 27F (5’-AGAGTTTGATCCTGGCTCAG-3’) and 1492R (5’-GGTTACCTTGTTACGACTT-3’) primers [30]. To limitPCR bias, gradient PCR was performed with 5 units/μL ofTaKaRa Ex Taq™ (Takara Bio Inc., Otsu, Japan) across 7 dif-ferent annealing temperatures with the following reaction: 1minute at 94°C; 35 cycles of 1 minute at 94°C, 30 s at 48°Cto 58°C (7°C temperature gradient) and 1 minute at 72°C;and a final extension for 7 minutes at 72°C. Amplicons werecombined across gradients and cleaned with the QIAquickPCR Purification Kit (Qiagen, Hilden, Germany) as directedby the manufacturer. Cleaned amplicons were quantifiedvia Qubit (Life Technologies, Carlsbad, CA, USA) andinput into an Illumina library preparation pipeline.

Sequencing preparation and sequencingIllumina library construction followed standard protocolsat the University of California Davis DNA TechnologiesCore Facility (http://dnatech.genomecenter.ucdavis.edu)as previously described [29]. Briefly, amplicons were

fragmented to an average size of 225 bp using the Biorup-tor NGS (Diagenode, Seraing, Belgium), and sheared frag-ments were used in a robotic library preparation protocolusing the Appollo 324 robot (Integenx, Pleasanton, CA,USA) following the manufacturer’s instructions. Each sam-ple was tagged with unique barcodes consisting of six nucle-otides internal to the adapter read as a separate indexingread, and ligated to each fragment. There were 12 cycles ofPCR enriched for adapter-ligated fragments before libraryquantification and validation. Fecal samples underwent thesame preparation with two exceptions: (1) genomic DNAwas used and (2) DNA was fragmented to 550 bp. Librarieswere added, in equimolar amounts, to the Illumina HiSeq2000 platform. Paired-end sequences were obtained with100 cycles and the data processed with Casava version1.8.2. Raw read data has been deposited in the NCBI ShortRead Archive (accession number SRP033353).

EMIRGE assembly of full-length 16S rRNA gene ampliconsEMIRGE is an iterative template-guided assembler that re-lies on a database of 16S rRNA gene sequences to prob-abilistically generate full-length 16S rRNA gene sequencesand provide the relative abundance of these sequences inthe assayed consortia [31]. For the reference database, weused version 108 of the SILVA SSU database, filtered toexclude sequences <1,200 bp and >1,900 bp [32]. To re-move closely related sequences, we clustered the databaseat 97% identity with USEARCH [33]. A total of 1 millionpaired-end reads from each barcoded library were sam-pled randomly without replacement to accommodatecomputational restrictions associated with use of thefull dataset. Reads from the subsample from each li-brary were stringently trimmed using Sickle [34] forquality scores >30 and length >60 bp. Trimmed readswere input into an amplicon-optimized version ofEMIRGE [29] for assembly using default parameters. Atotal of 80 iterations were performed for each sub-sample. EMIRGE-reconstructed sequences without Nsand with an estimated abundance of 0.01% or greaterwere kept for analysis. Putative chimeras were removedby using the intersection between two chimera detec-tion programs, DECIPHER [35] and UCHIME v6.0 [36]searched against the 2011 Greengenes database [37]. Fi-nally, reconstructed sequences from a spike-in controlexperiment (data not shown) were removed for down-stream analysis. Sequences used in the analysis are pub-licly available as a project attachment at http://ggkbase.berkeley.edu/NICU-Micro/.

Metagenomic EMIRGE assembly of 16S rRNA geneMetagenomic sequencing of 16 fecal samples on 1 lane ofan Illumina HiSeq 2000 produced approximately 350 Mbpof 101 bp paired-end reads. Trimmed reads were inputinto EMIRGE and default parameters run for 80 iterations

Brooks et al. Microbiome 2014, 2:1 Page 3 of 16http://www.microbiomejournal.com/content/2/1/1

research board of the University of Pittsburgh (IRBPRO11060238). This consent included sample collectionpermissions and consent to publish study findings.All samples were obtained from a private-style NICU at

Magee-Womens Hospital of the University of PittsburghMedical Center. Room samples were collected concur-rently with fecal samples and spanned four timepointson days of collection (9:00, 12:00, 13:00, and 16:00).Most frequently touched surfaces were determined byvisual observation and health care provider interviews inthe weeks leading up to sample collection. Microbial cellswere removed from surfaces using foam tipped swabs(BBL CultureSwab EZ Collection and Transport System,Franklin Lakes, NJ, USA) and a sampling buffer of 0.15 MNaCl and 0.1% Tween20. Six frequently touched areaswere processed per infant room: sink, feeding and intub-ation tubing, hands of healthcare providers and parents,general surfaces, access knobs on the incubator, and nursestation electronics (keyboard, mouse, and cell phone). Allsamples were placed in a sterile transport tube and storedwithin 30 minutes at -80°C until further processing.

DNA extraction and PCR amplificationFrozen fecal samples were thawed on ice and 0.25 g ofthawed sample added to tubes with prewarmed (65°C) lysissolution from the PowerSoil DNA Isolation Kit (MoBioLaboratories, Carlsbad, CA, USA). The incubation wasconducted for 5 minutes and the manufacturer’s protocolfollowed thereafter. Swab heads followed the same proced-ure, except heads were cut with sterilized scissors into theextraction tube before starting the protocol.DNA extracted from swabs was pooled such that the four

timepoints sampled in 1 day per environment were consoli-dated into one sample. Pooled DNA was used as templatefor amplification of the full-length 16S rRNA gene with 27F (5’-AGAGTTTGATCCTGGCTCAG-3’) and 1492R (5’-GGTTACCTTGTTACGACTT-3’) primers [30]. To limitPCR bias, gradient PCR was performed with 5 units/μL ofTaKaRa Ex Taq™ (Takara Bio Inc., Otsu, Japan) across 7 dif-ferent annealing temperatures with the following reaction: 1minute at 94°C; 35 cycles of 1 minute at 94°C, 30 s at 48°Cto 58°C (7°C temperature gradient) and 1 minute at 72°C;and a final extension for 7 minutes at 72°C. Amplicons werecombined across gradients and cleaned with the QIAquickPCR Purification Kit (Qiagen, Hilden, Germany) as directedby the manufacturer. Cleaned amplicons were quantifiedvia Qubit (Life Technologies, Carlsbad, CA, USA) andinput into an Illumina library preparation pipeline.

Sequencing preparation and sequencing

Illumina library construction followed standard protocols at the University of California Davis DNA Technologies Core Facility (http://dnatech.genomecenter.ucdavis.edu) as previously described [29]. Briefly, amplicons were fragmented to an average size of 225 bp using the Bioruptor NGS (Diagenode, Seraing, Belgium), and the sheared fragments were used in a robotic library preparation protocol on the Apollo 324 robot (IntegenX, Pleasanton, CA, USA) following the manufacturer’s instructions. Each sample was tagged with a unique six-nucleotide barcode, internal to the adapter and read as a separate indexing read, and the adapters were ligated to each fragment. Adapter-ligated fragments were enriched with 12 cycles of PCR before library quantification and validation. Fecal samples underwent the same preparation with two exceptions: (1) genomic DNA was used and (2) DNA was fragmented to 550 bp. Libraries were loaded in equimolar amounts onto the Illumina HiSeq 2000 platform. Paired-end sequences were obtained with 100 cycles, and the data were processed with Casava version 1.8.2. Raw read data have been deposited in the NCBI Short Read Archive (accession number SRP033353).
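Because each library carries a six-nucleotide barcode that is read as a separate index read, reads can be assigned back to their samples by comparing the index read with the known barcodes. In the study this step was handled by the core's Casava processing; the sketch below only illustrates the idea, with hypothetical barcodes and sample names and an assumed tolerance of one mismatch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical barcode -> sample map; the study's actual six-nucleotide barcodes are not given.
BARCODES = {
    "ACAGTG": "infant1_sink",
    "GCCAAT": "infant1_hands",
    "CTTGTA": "infant2_tubes",
}


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, barcodes: Dict[str, str], max_mismatch: int = 1) -> str:
    """Return the single sample whose barcode is within max_mismatch of the read, else 'undetermined'."""
    hits = [
        sample
        for barcode, sample in barcodes.items()
        if len(barcode) == len(index_read) and hamming(barcode, index_read) <= max_mismatch
    ]
    return hits[0] if len(hits) == 1 else "undetermined"


def demultiplex(reads: Iterable[Tuple[str, str]], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group (read_id, index_read) pairs into per-sample lists of read IDs."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read_id, index_read in reads:
        bins[assign_sample(index_read, barcodes)].append(read_id)
    return dict(bins)


reads = [("r1", "ACAGTG"), ("r2", "ACAGTC"), ("r3", "NNNNNN")]
print(demultiplex(reads, BARCODES))
# {'infant1_sink': ['r1', 'r2'], 'undetermined': ['r3']}
```

Requiring a unique hit keeps a read from being assigned when two barcodes are equally close; well-designed barcode sets are spaced far enough apart that this rarely matters.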

EMIRGE assembly of full-length 16S rRNA gene amplicons

EMIRGE is an iterative template-guided assembler that relies on a database of 16S rRNA gene sequences to probabilistically generate full-length 16S rRNA gene sequences and to estimate the relative abundance of these sequences in the assayed consortia [31]. For the reference database, we used version 108 of the SILVA SSU database, filtered to exclude sequences <1,200 bp and >1,900 bp [32]. To remove closely related sequences, we clustered the database at 97% identity with USEARCH [33]. Because of the computational restrictions associated with using the full dataset, 1 million paired-end reads from each barcoded library were sampled randomly without replacement. Reads from each library's subsample were stringently trimmed using Sickle [34] for quality scores >30 and length >60 bp. Trimmed reads were input into an amplicon-optimized version of EMIRGE [29] for assembly using default parameters, with 80 iterations performed for each subsample. EMIRGE-reconstructed sequences without Ns and with an estimated abundance of 0.01% or greater were kept for analysis. Putative chimeras, defined as the intersection of the sequences flagged by two chimera detection programs, DECIPHER [35] and UCHIME v6.0 [36] (searched against the 2011 Greengenes database [37]), were removed. Finally, reconstructed sequences from a spike-in control experiment (data not shown) were removed before downstream analysis. Sequences used in the analysis are publicly available as a project attachment at http://ggkbase.berkeley.edu/NICU-Micro/.
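The read subsampling and the post-assembly filters described above reduce to simple sampling, threshold, and set operations. The sketch below assumes EMIRGE results have already been parsed into hypothetical (sequence ID, sequence, abundance fraction) tuples and that each chimera checker's output has been reduced to a set of flagged IDs; these inputs and function names are illustrative, not EMIRGE's, DECIPHER's, or UCHIME's actual output formats.

```python
import random
from typing import Iterable, List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")
MIN_ABUNDANCE = 0.0001            # 0.01% expressed as a fraction
Record = Tuple[str, str, float]   # (sequence ID, sequence, estimated abundance fraction)


def subsample_pairs(pairs: Sequence[T], k: int, seed: int = 0) -> List[T]:
    """Draw k read pairs uniformly at random without replacement, keeping mates together."""
    return random.Random(seed).sample(list(pairs), k)


def filter_reconstructions(
    records: Iterable[Record],
    decipher_flagged: Set[str],
    uchime_flagged: Set[str],
    spike_in_ids: Set[str],
) -> List[str]:
    """Keep IDs with no Ns, abundance >= 0.01%, not flagged by both checkers, and not spike-ins."""
    chimeras = decipher_flagged & uchime_flagged  # intersection: flagged by BOTH programs
    kept = []
    for seq_id, seq, abundance in records:
        if "N" in seq.upper() or abundance < MIN_ABUNDANCE:
            continue
        if seq_id in chimeras or seq_id in spike_in_ids:
            continue
        kept.append(seq_id)
    return kept


records = [
    ("a", "ACGT", 0.5),       # passes every filter
    ("b", "ACNT", 0.2),       # contains an N
    ("c", "ACGT", 0.00005),   # below 0.01%
    ("d", "ACGT", 0.01),      # flagged by both checkers
    ("e", "ACGT", 0.01),      # flagged by DECIPHER only, so kept
    ("f", "ACGT", 0.001),     # spike-in control
]
print(filter_reconstructions(records, {"d", "e"}, {"d"}, {"f"}))  # ['a', 'e']
```

Using the intersection is the conservative choice: a sequence is discarded only when both detectors agree it is chimeric.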

Metagenomic EMIRGE assembly of 16S rRNA gene

Metagenomic sequencing of 16 fecal samples on 1 lane of an Illumina HiSeq 2000 produced approximately 350 Mbp of 101 bp paired-end reads. Trimmed reads were input into EMIRGE and run with default parameters for 80 iterations using the aforementioned database. After the final iteration, 153,980 reads, spanning all samples, were used in reconstructing fecal 16S rRNA sequences. Downstream filtering and analysis of reconstructed 16S rRNA gene sequences from fecal samples followed that of the room samples.

Brooks et al. Microbiome 2014, 2:1 Page 3 of 16 http://www.microbiomejournal.com/content/2/1/1

Community analysis of room and fecal samples

For community analysis, EMIRGE-reconstructed sequences were input into the standard QIIME 1.5.0 workflow [38]. For presence/absence analyses, representative operational taxonomic units (OTUs) were clustered at the >97% identity level using USEARCH [33], and an OTU table was constructed using QIIME’s pick_otus_through_otu_table.py script. An adjusted OTU table that incorporated EMIRGE-generated abundances was constructed using an in-house script [29] and is publicly available as a project attachment at http://ggkbase.berkeley.edu/NICU-Micro/. OTUs were aligned to the Greengenes [39] reference alignment (gg_97_otus_4feb2011.fasta) using the PyNAST aligner [40], and a phylogenetic tree was built using FastTree v.2.1.3 [41] with default parameters. Beta diversity was calculated from similar trees using Fast UniFrac scores and visualized with principal coordinates analysis (PCoA) [42]. Taxonomy was assigned to each OTU at the genus and/or species level using the Ribosomal Database Project (RDP) classifier [43] at a confidence threshold of 0.8, trained with the same Greengenes database. OTUs were visualized across room-infant pairs in a spring-weighted, edge-embedded network plot using QIIME’s make_otu_network.py script [38] with the modified OTU table as input.
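OTU picking at 97% identity groups sequences around representative centroids. The sketch below is a simplified greedy centroid clustering in the spirit of USEARCH, not its actual algorithm: it assumes the sequences are already aligned to equal length, measures identity as the fraction of matching columns (ignoring columns that are gaps in both), and processes sequences from most to least abundant. The sequence IDs and abundances are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical columns between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    return sum(x == y for x, y in cols) / len(cols)


def greedy_otus(seqs: Dict[str, str], abundance: Dict[str, float], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Assign each sequence to the first centroid within threshold identity, else make it a new centroid."""
    centroids: Dict[str, List[str]] = {}
    for seq_id in sorted(seqs, key=lambda s: abundance[s], reverse=True):
        for centroid in centroids:
            if identity(seqs[seq_id], seqs[centroid]) >= threshold:
                centroids[centroid].append(seq_id)
                break
        else:  # no existing centroid was close enough
            centroids[seq_id] = [seq_id]
    return centroids


seqs = {"base": "A" * 100, "s2": "A" * 98 + "CC", "s3": "C" * 10 + "A" * 90}
abundance = {"base": 10, "s2": 5, "s3": 3}
print(greedy_otus(seqs, abundance))  # {'base': ['base', 's2'], 's3': ['s3']}
```

Sorting by abundance first means the most common sequence variants become centroids, which is a common heuristic for limiting the influence of low-abundance error variants.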

Metagenomic assembly and gene prediction

Assemblies were constructed using idba_ud [44] and an iterative implementation of Velvet [45,46]. For idba_ud assemblies, trimmed reads were assembled using default parameters. For the Velvet assemblies, sequence coverage bins representing major genomes in the dataset were identified by first running the program with permissive parameters in which the k-mer size covered the whole range of observed coverages. We summed the k-mer coverages for all contigs generated by this assembly to define the coverage bins (each of which contains one or more genomes). This provided bin-specific expected coverage, k-mer size, coverage cutoff, and coverage collection threshold parameters for the iterative assembly. After each iteration targeting a specific bin, the bin-specific reads were removed from the dataset. Time-series-coverage-based emergent self-organizing maps (ESOMs) were used to bin scaffolds generated by metagenomic assembly [47]. Genes were predicted and translated into protein sequences using Prodigal [48]. Functional annotation was added with an in-house pipeline [46]. Genome completeness was determined based on the number of single-copy genes and other conserved genes [49,50] identified in each bin. The relative abundance of each organism in each sample was calculated by mapping reads to unique regions on the assembled genomes. Metagenomic assemblies along with their annotations are publicly available at http://ggkbase.berkeley.edu/NICU-Micro/.
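Genome completeness and relative abundance, as described above, are both ratios. In the sketch below, completeness is the fraction of an expected single-copy marker set detected in a bin, and relative abundance is the number of reads mapped to each genome's unique regions divided by the length of those regions, scaled to sum to 1 within a sample. The marker names, bin names, and counts are hypothetical, and this length normalization is an assumption; the study's in-house pipeline may compute abundance differently.

```python
from typing import Dict, Iterable


def completeness(found_genes: Iterable[str], expected_single_copy: Iterable[str]) -> float:
    """Fraction of the expected single-copy marker set detected in a genome bin."""
    expected = set(expected_single_copy)
    return len(expected & set(found_genes)) / len(expected)


def relative_abundance(mapped_reads: Dict[str, int], unique_region_bp: Dict[str, int]) -> Dict[str, float]:
    """Normalize reads mapped to each genome's unique regions by region length, then scale to sum to 1."""
    depth = {g: mapped_reads[g] / unique_region_bp[g] for g in mapped_reads}  # reads per unique bp
    total = sum(depth.values())
    return {g: d / total for g, d in depth.items()}


markers = ["rplB", "rplC", "rpsC", "rpsE"]  # hypothetical 4-gene marker set (real sets are larger)
print(completeness(["rplB", "rplC", "rpsC"], markers))  # 0.75
print(relative_abundance({"binA": 1000, "binB": 1000}, {"binA": 1_000_000, "binB": 2_000_000}))
```

Normalizing by region length matters because a genome with twice as much unique sequence attracts twice as many reads at the same cell abundance.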

Enterococcus faecalis concatenated ribosomal protein phylogeny

For phylogenetic resolution beyond the 16S rRNA gene, 32 highly conserved, single-copy ribosomal proteins were used from infant 1 and 2’s assemblies (RpL10, 13, 14, 16, 17, 18, 19, 2, 20, 21, 22, 24, 27, 29, 3, 30, 4, 5, and RpS10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 5, 6, 7, 8). The same genes from recently sequenced E. faecalis genomes, in addition to genes from more distantly related taxa, were obtained from the JGI IMG database. Each gene set was aligned using MUSCLE 3.8.31 [51,52] and manually curated to remove ambiguously aligned regions and end gaps [53]. The curated alignments were concatenated to form a 32-gene, 39-taxon, 4,101-position alignment. A maximum likelihood phylogeny was inferred from the concatenated alignment using PhyML under the LG + α + γ model of evolution with 100 bootstrap replicates.
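Concatenating curated per-gene alignments yields a single supermatrix whose width is the sum of the gene alignment widths (4,101 positions across 32 genes here). The sketch below shows one common way to do this, assuming each gene alignment is a hypothetical mapping from taxon name to aligned protein sequence; taxa missing from a gene are padded with gaps so every row stays the same length. It is an illustration, not the authors' pipeline.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate per-gene alignments (taxon -> aligned sequence) into one supermatrix.

    Taxa absent from a gene are filled with gaps of that gene's alignment width.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("all sequences in a gene alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Two toy "genes" with hypothetical taxon labels; Lmono lacks the first gene, Efae_B the second.
genes = [
    {"Efae_A": "MKV", "Efae_B": "MKI"},
    {"Efae_A": "GA", "Lmono": "GS"},
]
print(concatenate_alignments(genes))
# {'Efae_A': 'MKVGA', 'Efae_B': 'MKI--', 'Lmono': '---GS'}
```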

Results

Stability of NICU room samples over time and space

After sample preparation, 57 and 36 samples amplified successfully and were subsequently analyzed for infant 1 and infant 2, respectively (Table 2). EMIRGE generated approximately 12,000 full-length 16S rRNA sequences and OTUs (clustered at the 97% nucleotide identity level) for each room-infant pair. Broadly speaking, species richness decreased in the order electronics > sinks > surfaces > incubators > hands > tubes, a finding that was corroborated by several alpha diversity indexes (Table 3). Nearly 300 genera were detected in the NICU. To broadly visualize the temporal stability of environments across time and space, the phylum-level classifications are plotted in Figure 1. Actinobacteria, Firmicutes, and Proteobacteria dominate the sampled environments, with the areas most exposed to human skin deposition showing the most variation over time. Similar trends are observed at lower taxonomic levels. Based on the 20 most abundant families, frequently touched surfaces are distinct from infrequently touched surfaces (Figure 1). PCoA of UniFrac distance-based community composition reveals four discernible ecosystem types (skin-associated communities, sinks, tubes, and feces) and confirms the clustering of samples prone to skin deposition via touching (Figure 2).
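PCoA turns a sample-by-sample distance matrix, here UniFrac distances, into coordinates whose Euclidean distances approximate the originals, so clusters such as the skin-associated, sink, tube, and fecal groups become visible in two dimensions. The sketch below shows classical PCoA (Gower double-centering followed by an eigendecomposition) with NumPy, assuming the distance matrix has already been computed. It is not QIIME's implementation, and negative eigenvalues, which non-Euclidean distances such as UniFrac can produce, are simply clipped rather than corrected.

```python
import numpy as np


def pcoa(distances, n_axes: int = 2):
    """Classical PCoA: double-center the squared distances, then eigendecompose.

    Returns (coordinates, fraction of positive-eigenvalue variance per axis).
    """
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n   # centering matrix J = I - 11^T / n
    b = -0.5 * j @ (d ** 2) @ j           # Gower-centered matrix B
    eigvals, eigvecs = np.linalg.eigh(b)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (possible with non-Euclidean distances) are clipped, not corrected.
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0.0, None))
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained


# Toy distances between three samples lying on a line at positions 0, 1, and 3.
toy = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
coords, explained = pcoa(toy)
print(round(float(explained[0]), 3))  # 1.0
```

Because the toy distances come from points on a line, the first axis captures all of the variance and reproduces the original distances exactly; real UniFrac matrices spread variance over many axes.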

Brooks et al. Microbiome 2014, 2:1 Page 4 of 16 http://www.microbiomejournal.com/content/2/1/1


Page 30: UC Davis EVE161 Lecture 9 by @phylogenomics

Time-series characterization of fecal samples

More than 94% of the reads from infant 1’s samples mapped to scaffolds generated by the idba_ud assembly, so this assembly was accepted for further analysis. In comparison, the initial idba_ud assembly of metagenomic data from infant 2 was highly fragmented, and less than 40% of reads could be mapped to the assembled scaffolds. Reassembly of infant 2’s metagenomic data using the iterative Velvet-based assembly approach [54] gave a substantially better result: >90% of reads could be mapped to the resulting scaffolds, so this assembly was chosen for further analysis.

The de novo assemblies reconstructed a majority of the genomes for 4 of the 5 most abundant bacterial colonists in infant 1’s metagenome and for 8 of the 11 most abundant in infant 2’s. For infant 1, time-series organism abundance patterns in the sample sets analyzed via ESOM (Figure 3) defined five major genome bins for which between 37% and 99% of the single-copy genes were identified, based on standard analyses of the single-copy gene inventory (Table 4). For infant 2, the same analysis (Figure 3) defined 11 major genome bins for which between 27% and 99% of the single-copy genes were identified (Table 4).

Infant 1 and infant 2’s gastrointestinal tract (GIT) microbial communities are distinctly different. Infant 1’s colonization pattern echoes the canonical observation in infant GIT succession that facultative anaerobes dominate early-phase colonization whereas late-stage colonizers are primarily obligate anaerobes [12]. This shift is observed on day of life 12 in infant 1 but not in infant 2, in whom facultative anaerobes were observed throughout the study period. The metagenomic EMIRGE analyses corroborated the binning-based compositional analyses in that no sequences for new taxa were assembled for scaffolds included in the ESOM. Some 16S rRNA genes were identified in the metagenomic assemblies and matched EMIRGE-generated sequences with approximately 100% identity. The E. faecalis sequence from infant 1 was not identified by EMIRGE because of its low abundance, but it was extracted from the assembly using RNAmmer for the phylogenetic analysis [55].
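The choice between assemblies above rests on one number: the fraction of reads that map back to the assembled scaffolds. The sketch below computes that fraction from SAM-format alignment lines using the standard FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary), counting each read or mate once via its primary record. The `choose_assembly` helper, its 0.9 cutoff, and the example fractions are hypothetical; the source reports >94%, <40%, and >90% but does not state a formal threshold.

```python
from typing import Dict, Iterable, Optional

UNMAPPED = 0x4         # SAM FLAG bit: segment unmapped
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def mapped_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of reads whose primary SAM record is mapped (each mate counted separately)."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0


def choose_assembly(fractions: Dict[str, float], min_fraction: float = 0.9) -> Optional[str]:
    """Pick the assembly with the highest mapped fraction if it clears a (hypothetical) cutoff."""
    best = max(fractions, key=fractions.get)
    return best if fractions[best] >= min_fraction else None


print(choose_assembly({"idba_ud": 0.38, "velvet_iterative": 0.91}))  # velvet_iterative
```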

Table 2 Sample collection summary and number of 16S rRNA genes assembled

| Characteristic | Infant 1 | Infant 2 |
|---|---|---|
| No. of samples | | |
| Electronics | 10 | 4 |
| Surfaces | 7 | 5 |
| Incubator | 8 | 4 |
| Sink | 9 | 10 |
| Hands | 8 | 2 |
| Tubes | 6 | 4 |
| Fecal | 9 | 7 |
| Total | 57 | 36 |
| No. of EMIRGE sequences | | |
| Electronics | 3,359 | 1,298 |
| Surfaces | 2,440 | 2,205 |
| Incubators | 2,270 | 1,751 |
| Sinks | 2,936 | 4,766 |
| Hands | 1,783 | 812 |
| Tubes | 272 | 198 |
| Fecal | 33 | 32 |
| Total | 13,093 | 11,062 |
| No. of OTUs | | |
| Electronics | 3,353 | 1,293 |
| Surfaces | 2,436 | 2,197 |
| Incubators | 2,264 | 1,749 |
| Sinks | 2,933 | 4,762 |
| Hands | 1,781 | 812 |
| Tubes | 271 | 198 |
| Fecal | 33 | 32 |
| Total | 13,071 | 11,043 |
| Shared OTUs: 3,822 | | |
| No. of unique OTUs | | |
| Electronics | 2,486 | 1,202 |
| Surfaces | 2,211 | 2,015 |
| Incubators | 2,048 | 1,606 |
| Sinks | 2,756 | 4,453 |
| Hands | 1,603 | 801 |
| Tubes | 256 | 185 |
| Fecal | 11 | 11 |
| Total | 10,371 | 10,273 |

EMIRGE, ‘expectation maximization iterative reconstruction of genes from the environment’; OTU, operational taxonomic unit.
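As a quick consistency check on Table 2, the per-environment rows in each section can be summed and compared with the printed totals. The values below are transcribed from the table; the check itself is an illustrative addition. On these numbers every total matches except infant 1's unique-OTU count, whose rows sum to 11,371 rather than the printed 10,371; the source does not indicate which figure is correct.

```python
# Values transcribed from Table 2, in row order:
# electronics, surfaces, incubators, sinks, hands, tubes, fecal.
TABLE2 = {
    "No. of samples": ([10, 7, 8, 9, 8, 6, 9], [4, 5, 4, 10, 2, 4, 7], (57, 36)),
    "No. of EMIRGE sequences": (
        [3359, 2440, 2270, 2936, 1783, 272, 33],
        [1298, 2205, 1751, 4766, 812, 198, 32],
        (13093, 11062),
    ),
    "No. of OTUs": (
        [3353, 2436, 2264, 2933, 1781, 271, 33],
        [1293, 2197, 1749, 4762, 812, 198, 32],
        (13071, 11043),
    ),
    "No. of unique OTUs": (
        [2486, 2211, 2048, 2756, 1603, 256, 11],
        [1202, 2015, 1606, 4453, 801, 185, 11],
        (10371, 10273),
    ),
}


def check_totals(table):
    """Return {(section, infant): (row_sum, printed_total)} for every section whose rows don't sum to its total."""
    mismatches = {}
    for section, (infant1, infant2, (total1, total2)) in table.items():
        for infant, rows, printed in ((1, infant1, total1), (2, infant2, total2)):
            if sum(rows) != printed:
                mismatches[(section, infant)] = (sum(rows), printed)
    return mismatches


print(check_totals(TABLE2))  # {('No. of unique OTUs', 1): (11371, 10371)}
```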

Table 3 Alpha diversity indexes from neonatal intensive care unit (NICU) room and fecal samples

| Sample type | Shannon (infant 1) | Shannon (infant 2) | Simpson (infant 1) | Simpson (infant 2) | Chao 1 (infant 1) | Chao 1 (infant 2) |
|---|---|---|---|---|---|---|
| Surfaces | 8.42848 | 8.76498 | 0.997065 | 0.997677 | 42,978.9 | 47,467.2 |
| Electronics | 8.36375 | 8.27527 | 0.996905 | 0.996620 | 45,519.9 | 33,602.8 |
| Incubators | 8.11070 | 8.76042 | 0.996291 | 0.997674 | 30,216.9 | 76,196.9 |
| Sinks | 8.29052 | 8.82959 | 0.996676 | 0.997687 | 41,104.6 | 96,694.1 |
| Hands | 7.56186 | 8.60501 | 0.993397 | 0.997322 | 27,708.1 | 89,233.5 |
| Tubes | 5.06097 | 5.20681 | 0.961848 | 0.963895 | 1,756.60 | 1,828.00 |
| Fecal | 1.71097 | 2.10295 | 0.640741 | 0.747619 | 9.70000 | 13.7000 |
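The three indexes in Table 3 can be computed from a vector of OTU counts. The sketch below uses common textbook definitions: Shannon entropy with base-2 logarithms, Simpson's index as 1 - Σp², and Chao1 as S_obs + F1²/(2·F2), where F1 and F2 are the numbers of singleton and doubleton OTUs, falling back to the bias-corrected F1(F1 - 1)/2 term when there are no doubletons. Which exact variants QIIME 1.5.0 applied is an assumption here; base 2 is suggested by Shannon values such as 8.43 for infant 1's surfaces, which exceed ln(2,436) ≈ 7.80, the natural-log maximum for that environment's 2,436 OTUs.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int], base: float = 2.0) -> float:
    """Shannon entropy H = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n, base) for c in counts if c > 0)


def simpson(counts: Sequence[int]) -> float:
    """Simpson's diversity as 1 - sum(p_i^2): the chance two random draws are different OTUs."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def chao1(counts: Sequence[int]) -> float:
    """Chao1 richness estimate from singleton (F1) and doubleton (F2) counts."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2  # bias-corrected form when there are no doubletons


counts = [10, 5, 1, 1, 2]  # hypothetical OTU counts for one sample
print(chao1(counts))       # 7.0
```

Chao1 rises above the observed OTU count whenever many OTUs are seen only once, which is why the room samples, with many rare OTUs, have Chao1 values far larger than their observed richness.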

Brooks et al. Microbiome 2014, 2:1 Page 5 of 16 http://www.microbiomejournal.com/content/2/1/1

Page 31: UC Davis EVE161 Lecture 9 by @phylogenomics


Results

Page 32: UC Davis EVE161 Lecture 9 by @phylogenomics


Stability

using the aforementioned database. After the final iter-ation, 153,980 reads, spanning all samples, were used inreconstructing fecal 16S rRNA sequences. Downstreamfiltering and analysis of reconstructed 16S rRNA gene se-quences from fecal samples followed that of the roomsamples.

Community analysis of room and fecal samplesFor community analysis, EMIRGE-reconstructed se-quences were input into the standard QIIME 1.5.0 work-flow [38]. For presence/absence analyses, representativeoperational taxonomic units (OTUs) were clustered atthe >97% identity level using USEARCH [33] and anOTU table was constructed using QIIME’s pick_otus_through_otu_table.py script. An adjusted OTU table thatincorporated EMIRGE generated abundances was con-structed using an in-house script [29] and is publiclyavailable as a project attachment at http://ggkbase.berkeley.edu/NICU-Micro/. OTUs were aligned to the Greengenes[39] reference alignment (gg_97_otus_4feb2011.fasta) usingthe PyNAST aligner [40] and a phylogenetic tree builtusing FastTree v.2.1.3 [41] with default parameters. Betadiversity was calculated from similar trees using FastUniFrac scores and visualized with principle coordinatesanalysis (PCoA) [42]. Taxonomy was assigned to each OTUat the genera and/or species level using the RibosomalDatabase Project (RDP) classifier [43] at a confidenceinterval of 0.8 and trained with the same Greengenesdatabase. OTUs were visualized across room-infant pairsin a spring-weighted, edge-embedded network plot byusing QIIME’s make_otu_network.py script [38] with themodified OTU table as input.

Metagenomic assembly and gene predictionAssemblies were constructed using idba_ud [44] and aniterative implementation of Velvet [45,46]. For idba_udassemblies, trimmed reads were assembled using defaultparameters. For the Velvet assemblies, sequence cover-age bins representing major genomes in the dataset wereidentified by first running the program with permissiveparameters in which the k-mer size covered the wholerange of observed coverages. We summed the k-mercoverages for all contigs generated by this assembly todefine the coverage bins (each of which contains one ormore genomes). This provided bin-specific expectedcoverage, k-mer size, coverage cutoff, and coverage col-lection threshold parameters for the iterative assembly.After each iteration targeting a specific bin, the bin-specific reads were removed from the dataset.Time-series-coverage-based emergent self-organizing

maps (ESOMs) were used to bin scaffolds generated bymetagenomic assembly [47]. Genes were predicted andtranslated into protein sequences using Prodigal [48].Functional annotation was added with an in-house

pipeline [46]. Genome completeness was determinedbased on the number of single-copy genes and otherconserved genes [49,50] identified in each bin. The rela-tive abundance of each organism in each sample was cal-culated by mapping reads to unique regions on theassembled genomes. Metagenomic assemblies along withtheir annotations are publicly available at http://ggkbase.berkeley.edu/NICU-Micro/.

Enterococcus faecalis concatenated ribosomal proteinphylogenyFor phylogenetic resolution beyond the 16S rRNA gene,32 highly conserved, single copy ribosomal proteins wereused from infant 1 and 2’s assemblies (RpL10, 13, 14, 16,17, 18, 19, 2, 20, 21, 22, 24, 27, 29, 3, 30, 4, 5, andRpS10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 5, 6, 7, 8). Thesame genes from recently sequenced E. faecalis ge-nomes, in addition to genes from more distantly relatedtaxa, were obtained from the JGI IMG database. To-gether, each gene set was aligned using MUSCLE 3.8.31[51,52] and manually curated to remove ambiguouslyaligned regions and end gaps [53]. The curated align-ments were concatenated to form a 32-gene, 39-taxa,4,101-position alignment. A maximum likelihood phyl-ogeny for the concatenated alignment was conductedusing PhyML under the LG + α + γ model of evolutionwith 100 bootstrap replicates.

ResultsStability of NICU room samples over time and spaceAfter sample preparation, 57 and 36 samples amplifiedsuccessfully and were subsequently analyzed for infant 1and infant 2, respectively (Table 2). EMIRGE generatedapproximately 12,000 full-length 16S rRNA sequencesand OTUs for each room-infant pair (clustered at the97% nucleotide identity level). Broadly speaking, speciesrichness decreased from electronics > sinks > surfaces >incubators > hands > tubes, a finding that was corrobo-rated with several alpha diversity indexes (Table 3).Nearly 300 genera were detected in the NICU. Tobroadly visualize temporal stability of environmentsacross time and space, the phylum level classificationsare plotted in Figure 1. Actinobacteria, Firmicutes, andProteobacteria dominate the sampled environments, withareas most exposed to human skin deposition having themost variation over time. At lower taxonomic levels, simi-lar trends are observed. Based on the 20 most abundantfamilies, frequently touched surfaces are distinct from in-frequently touched surfaces (Figure 1). UniFrac distance-based community composition PCoA reveals four discern-ible ecosystem types (skin associated communities, sinks,tubes, and feces) and confirms clustering of samples proneto skin deposition via touching (Figure 2).

Brooks et al. Microbiome 2014, 2:1 Page 4 of 16http://www.microbiomejournal.com/content/2/1/1

Time-series characterization of fecal samplesMore than 94% of the reads from infant 1’s samplesmapped to scaffolds generated by the idba_ud assembly.Consequently, this assembly was accepted for furtheranalysis. In comparison, the initial idba_ud assembly of

metagenomic data from infant 2 was highly fragmented,and less than 40% of reads could be mapped to the as-sembled scaffolds. Subsequent reassembly of metage-nomic data from infant 2’s samples using the iterativeVelvet-based assembly approach [54] generated a signifi-cantly better result. As >90% of reads could be mappedto the scaffolds generated by the Velvet assembly, thisassembly was chosen for further analysis.The de novo assemblies reconstructed a majority of

the genomes for 4 of the 5 and 8 of the 11 most abun-dant bacterial colonists from infant 1 and infant 2’smetagenomes, respectively. For infant 1, time-series or-ganism abundance patterns in the sample sets analyzedvia ESOM (Figure 3) defined five major genome bins forwhich between 37% and 99% of the single copy geneswere identified, based on standard analyses of the singlecopy gene inventory (Table 4). For infant 2, time-seriesorganism abundance patterns in the sample sets ana-lyzed via ESOM (Figure 3) defined 11 major genomebins for which between 27 and 99% of the single copygenes were identified (Table 4).Infant 1 and infant 2’s gastrointestinal tract (GIT) mi-

crobial communities are distinctly different. Infant 1’scolonization pattern echoes the canonical observation ininfant GIT succession that facultative anaerobes domin-ate early phase colonization whereas late stage colo-nizers are primarily obligate anaerobes [12]. This shift isobserved on day of life 12 in infant 1, but is not ob-served in infant 2, in whom facultative anaerobes wereobserved throughout the study period. The metage-nomic EMIRGE analyses corroborated the binning-based compositional analyses in that no sequences fornew taxa were assembled for scaffolds included in theESOM. Some 16S rRNA genes were identified in themetagenomic assemblies and match EMIRGE generatedsequences with approximately 100% identity. The E.faecalis sequence from infant 1 was not identified byEMIRGE due to low abundance, but was extracted fromthe assembly using RNAmmer for the phylogeneticanalysis [55].

Table 2 Sample collection summary and summary of thenumber of 16S rRNA genes assembledCharacteristic Infant 1 Infant 2

No. of samples

Electronics 10 4

Surfaces 7 5

Incubator 8 4

Sink 9 10

Hands 8 2

Tubes 6 4

Fecal 9 7

Total 57 36

No. of EMIRGE sequences

Electronics 3,359 1,298

Surfaces 2,440 2,205

Incubators 2,270 1,751

Sinks 2,936 4,766

Hands 1,783 812

Tubes 272 198

Fecal 33 32

Total 13,093 11,062

No. of OTUs

Electronics 3,353 1,293

Surfaces 2,436 2,197

Incubators 2,264 1,749

Sinks 2,933 4,762

Hands 1,781 812

Tubes 271 198

Fecal 33 32

Total 13,071 11,043

Shared OTUs 3,822

No. of unique OTUs

Electronics 2,486 1,202

Surfaces 2,211 2,015

Incubators 2,048 1,606

Sinks 2,756 4,453

Hands 1,603 801

Tubes 256 185

Fecal 11 11

Total 10,371 10,273

EMIRGE ‘expectation maximization iterative reconstruction of genes from theenvironment’, OTU operational taxonomic unit.

Table 3 Alpha diversity indexes from neonatal intensive care unit (NICU) room and fecal samples

                          Shannon             Simpson               Chao1
Infant                1         2         1         2         1         2
Surfaces        8.42848   8.76498  0.997065  0.997677  42,978.9  47,467.2
Electronics     8.36375   8.27527  0.996905  0.996620  45,519.9  33,602.8
Incubators      8.11070   8.76042  0.996291  0.997674  30,216.9  76,196.9
Sinks           8.29052   8.82959  0.996676  0.997687  41,104.6  96,694.1
Hands           7.56186   8.60501  0.993397  0.997322  27,708.1  89,233.5
Tubes           5.06097   5.20681  0.961848  0.963895  1,756.60  1,828.00
Fecal           1.71097   2.10295  0.640741  0.747619   9.70000   13.7000
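The three indexes in Table 3 all summarize a sample's OTU count profile. The table does not state the logarithm base or which Chao1 variant was used, so this sketch assumes base 2 for Shannon and the bias-corrected Chao1 estimator. The Simpson values close to 1 for diverse room samples suggest the 1 - Σp² (Gini-Simpson) form, which is what is computed here. The OTU counts are hypothetical.

```python
import math


def shannon(counts, base=2.0):
    """Shannon entropy H = -sum(p_i * log(p_i)); base 2 is an assumption."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n, base) for c in counts if c > 0)


def gini_simpson(counts):
    """1 - sum(p_i^2): probability that two draws (with replacement) are different OTUs."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singleton OTUs
    f2 = sum(1 for c in counts if c == 2)  # doubleton OTUs
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


otu_counts = [10, 5, 1, 1, 2, 1]  # hypothetical reads per OTU in one sample
print(round(shannon([1, 1, 1, 1]), 6))     # 2.0
print(round(gini_simpson(otu_counts), 6))  # 0.67
print(chao1(otu_counts))                   # 7.5
```

Chao1 grows with the number of singletons, which is why richly sampled, rare-taxon-heavy room sites can reach estimates far above their observed OTU counts.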

Brooks et al. Microbiome 2014, 2:1 Page 5 of 16 http://www.microbiomejournal.com/content/2/1/1

Page 33: UC Davis EVE161 Lecture 9 by @phylogenomics


Time Series of Rooms

[Figure 1 panels, infant 1 and infant 2: relative abundance (y axis, 0.00 to 1.00) vs. day of life (x axis, days 3 to 30) for electronics, hands, incubators, sinks, surfaces and tubes. Phylum legend: Actinobacteria, Bacteroidetes, Cyanobacteria, Firmicutes, Fusobacteria, Proteobacteria, Other, Unclassified. Family legend: Aerococcaceae, Aeromonadaceae, Bacillaceae, Carnobacteriaceae, Caulobacteraceae, Clostridiaceae, Comamonadaceae, Corynebacteriaceae, Enterobacteriaceae, Enterococcaceae, Gemellaceae, Lactobacillaceae, Micrococcaceae, Moraxellaceae, Neisseriaceae, Pasteurellaceae, Propionibacteriaceae, Pseudomonadaceae, Rhizobiaceae, Sphingomonadaceae, Staphylococcaceae, Streptococcaceae, Xanthomonadaceae, Other, Unclassified.]

Figure 1 Taxonomic classification of neonatal intensive care unit (NICU) room microbes for infants 1 and 2. Phylum-level (top) and family-level (bottom) classifications were assigned using the Ribosomal Database Project (RDP) classifier on assembled full-length 16S rRNA genes. Day of life (DOL) is plotted on the X axis and relative abundance, generated by ‘expectation maximization iterative reconstruction of genes from the environment’ (EMIRGE), is plotted on the Y axis.

Brooks et al. Microbiome 2014, 2:1 Page 6 of 16 http://www.microbiomejournal.com/content/2/1/1

Page 34: UC Davis EVE161 Lecture 9 by @phylogenomics


Time Series of Rooms

Page 35: UC Davis EVE161 Lecture 9 by @phylogenomics


Time Series of Rooms

Page 36: UC Davis EVE161 Lecture 9 by @phylogenomics


Room and Gut

[Figure 2 panels: infant 1 and infant 2; sample types: hands, electronics, surfaces, incubator, tubes, fecal, sinks.]

Figure 2 Principal coordinates analysis (PCoA) based on UniFrac distances of room and gut microbes. Analysis reveals four discernible ecosystem clusters: skin-associated communities, sinks, tubes, and feces.

Figure 3 Time-series coverage emergent self-organizing maps (ESOMs) reveal discrete genome bins for each infant’s dataset. The underlying ESOMs are shown in a tiled display with each data point colored by its taxonomic assignment. Labels to the left are colored to match their respective data points and numbers in parentheses correspond to the bin numbers in Table 4.

Brooks et al. Microbiome 2014, 2:1 Page 7 of 16 http://www.microbiomejournal.com/content/2/1/1
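The Figure 2 ordination starts from pairwise UniFrac distances between samples. Computing UniFrac requires a phylogeny and is not shown. This sketch assumes a precomputed distance matrix and performs classical PCoA (Gower centering followed by an eigendecomposition) with NumPy. It illustrates the method and is not the authors' pipeline; the three-sample distance matrix is hypothetical.

```python
import numpy as np


def pcoa(distances, n_axes=2):
    """Classical principal coordinates analysis of a symmetric distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Gower centering: B = -1/2 * J D^2 J, with J = I - (1/n) 11^T
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean distances can give negative eigenvalues; clip them for the coordinates.
    scale = np.sqrt(np.clip(eigvals[:n_axes], 0.0, None))
    coords = eigvecs[:, :n_axes] * scale
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained


# Hypothetical distances between three samples lying on a line at 0, 1 and 3.
dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
coords, explained = pcoa(dist)
print(np.round(np.abs(np.diff(coords[:, 0])), 6))  # [1. 2.]
print(round(float(explained[0]), 6))                # 1.0
```

The sign of each axis is arbitrary, so only distances and clustering in the plot are meaningful, not the direction of an axis.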

Page 37: UC Davis EVE161 Lecture 9 by @phylogenomics


Different Infants


Page 38: UC Davis EVE161 Lecture 9 by @phylogenomics


Genomes

Highly connected BE microbes

The distribution of shared OTUs across sampled sites was visualized through a spring-weighted edge-embedded network plot. To limit the noise from infrequently detected microorganism types, we restricted the plot to OTUs occurring in two or more samples from each infant (Figure 4). The spring weight is derived from EMIRGE-generated abundances, and the distribution of OTUs in the plot is governed both by frequency of occurrence and abundance. In Figure 4, the circular white nodes (representing OTUs) found in many environment types (more edges) are pulled closer to the middle of the network whereas OTUs shared by only two samples (fewer edges) are positioned closer to the periphery of the network. The top 5% of most frequently occurring OTUs aggregate in a central cluster in the middle of the network. Similar to the PCoA plot, general clustering is observed based on environment type (that is, skin-associated sites cluster together, as do sink samples). When restricting the network to OTUs only found in fecal samples (Figure 4, enlargements), one can visualize the OTU distribution across the sampled NICU environments. Three highly connected OTUs are present in fecal samples, two of which are in the top 5% most frequently occurring OTUs in infant 1’s room samples. Several of the OTUs in
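The filtering and centrality described above can be sketched without the drawing step: keep OTUs seen in at least two samples, emit one sample-to-OTU edge per detection with the abundance as its weight, and rank OTUs by how many samples they occur in. A force-directed layout (for example networkx's `spring_layout`, which accepts a `weight` argument) could then position the nodes; the layout is not shown, the sample and OTU names are hypothetical, and the paper does not say which plotting software was used.

```python
from collections import Counter


def network_edges(abundance, min_samples=2):
    """Sample-OTU edges for OTUs detected in at least `min_samples` samples.

    abundance: {sample_id: {otu_id: relative_abundance}}. Each edge keeps the
    abundance so a force-directed layout can use it as the spring weight.
    """
    occurrences = Counter(
        otu for otus in abundance.values() for otu, a in otus.items() if a > 0
    )
    degree = {otu: n for otu, n in occurrences.items() if n >= min_samples}
    edges = [
        (sample, otu, a)
        for sample, otus in abundance.items()
        for otu, a in otus.items()
        if otu in degree and a > 0
    ]
    return edges, degree


def most_frequent(degree, fraction=0.05):
    """OTUs in the top `fraction` by number of samples (ties broken by name)."""
    k = max(1, round(len(degree) * fraction))
    return sorted(degree, key=lambda otu: (-degree[otu], otu))[:k]


# Hypothetical relative abundances for three samples.
abundance = {
    "sink_d6": {"otuA": 0.5, "otuB": 0.5},
    "hands_d6": {"otuA": 0.2, "otuC": 0.8},
    "tubes_d9": {"otuA": 0.1, "otuB": 0.9},
}
edges, degree = network_edges(abundance)
print(len(edges), degree)     # 5 {'otuA': 3, 'otuB': 2}
print(most_frequent(degree))  # ['otuA']
```

OTUs with more edges are pulled toward the center of a spring layout, which matches the central cluster of frequently occurring OTUs described in the text.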

Table 4 Genome summaries

Taxa                                Bin no.         bp  Contigs      N50  % GC      Cvg  % SCG
Infant 1:
Bacteroides fragilis                      6  4,551,095       39  249,654  43.3  1,930.3     99
Bacteroides phage1                        4    205,842        1  205,842  41.9  2,221.4      0
Bacteroides phage2                        5    144,903        1  144,903  42.0  2,060.8      0
Enterococcus faecalis                     8  2,649,897       93   40,945  37.8      7.6     99
Clostridium ramosum                       7  3,630,043       63   78,436  31.4     23.5     99
Escherichia coli                          3  5,035,302       53  218,574  50.5  1,254.1     57
Klebsiella pneumoniae                     1  5,447,442       78  189,741  57.3    345.0     37
Staphylococcus epidermidis plasmid        2     20,739        2   11,095  31.5     14.5      0
Infant 2:
Actinomyces neuii strain 1               18  1,580,717       37  280,583  56.9     15.6     27
Actinomyces neuii strain 2               24  2,375,188       27  179,095  56.7     17.6     70
Actinomyces sp.                           6  2,666,449       11  345,356  59.3     55.4     99
Anaerococcus prevotii                     1  1,599,845       13  225,571  33.1     39.2     99
Caudovirales bacteriophage               26     18,308        1   18,308  29.5  1,169.7      0
Dermabacter sp.                           4  2,040,279       12  289,797  62.8     51.9     90
Enterococcus faecalis                     9  3,011,019       26  499,183  37.1    147.3     99
Enterococcus faecalis phage              14    335,286       39   12,896  34.8    103.7      0
Enterococcus faecalis plasmid            22      8,514        2    4,866  30.4     90.6      0
Finegoldia magna                          7  1,729,913       42   78,482  32.0     93.0     99
Finegoldia phage                         25      3,168        1    3,168  32.3    138.5      0
Finegoldia plasmid 1                     23      7,589        2    3,969  33.0    103.4      0
Finegoldia plasmid 2                     21     28,958        3   15,674  55.4     10.9      0
Pseudomonas aeruginosa                    5  6,755,599       64  212,603  66.0     51.5     99
Staphylococcus epidermidis               10  1,902,759       82   40,484  33.0     65.4      7
Staphylococcus epidermidis mobile        17     55,503       10    6,452  31.7     54.5     43
Staphylococcus epidermidis phage 2       11     19,082        2   12,983  29.4     84.3      0
Staphylococcus epidermidis strain         3     81,754        9   14,965  29.4     67.1      0
Staphylococcus phage 1                   13    216,785       13    8,080  29.5     45.7      0
Staphylococcus phage 2                   16    198,742       14   20,782   0.3     79.3      0
Staphylococcus phage 3 and plasmid       15    137,609       12   19,343  29.3     67.8      0
Staphylococcus warneri                    8  2,363,750       22  198,467  32.8     33.9     53
Veillonella sp.                           2  2,281,484      223   12,637  37.8     56.2     70

Cvg, coverage; % SCG, percentage of single copy genes identified.
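The N50 column in Table 4 is the standard assembly contiguity statistic: sort contigs from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The single-contig bins (for example Bacteroides phage1, 1 contig of 205,842 bp) therefore have an N50 equal to their total length. A minimal sketch with illustrative contig lengths:

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L contain at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2.0
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
print(n50([205842]))                       # 205842 (a single-contig bin)
```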

Brooks et al. Microbiome 2014, 2:1 Page 8 of 16 http://www.microbiomejournal.com/content/2/1/1

Page 39: UC Davis EVE161 Lecture 9 by @phylogenomics


[Figure 4 panels: (a) infant 1, (b) infant 2; edge colors by environment: fecal, sinks, tubes, hands, electronics, surfaces, incubators; node shapes mark OTUs and samples.]

Figure 4 Spring-weighted edge-embedded network plots of room and fecal operational taxonomic units (OTUs) found in two or more samples (infant 1 (a), infant 2 (b)). Left, the entire network is displayed. To better visualize the distribution of gut colonizers across room samples, only room samples sharing fecal OTUs are shown in the excerpt (right). Triangles represent samples and circles represent OTUs. The spring weight is derived from ‘expectation maximization iterative reconstruction of genes from the environment’ (EMIRGE)-generated abundances, and edges are colored by environment type. Each OTU has a taxonomic label, and asterisks indicate OTUs detected in room samples before detection in the gut.

Brooks et al. Microbiome 2014, 2:1 Page 9 of 16 http://www.microbiomejournal.com/content/2/1/1
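The asterisks in Figure 4 mark OTUs detected in room samples before they were detected in the gut. Given a list of detections with sampling days, that flag is a comparison of first-detection days. A sketch with hypothetical identifiers, assuming that only a strictly earlier room detection counts (a same-day detection is not flagged) and that "fecal" is the gut label:

```python
def detected_in_room_first(detections, gut_env="fecal"):
    """OTUs whose first room detection precedes their first gut detection.

    detections: iterable of (otu_id, environment, day_of_life) tuples.
    """
    first_room, first_gut = {}, {}
    for otu, env, day in detections:
        target = first_gut if env == gut_env else first_room
        if otu not in target or day < target[otu]:
            target[otu] = day
    return sorted(
        otu for otu, gut_day in first_gut.items()
        if otu in first_room and first_room[otu] < gut_day
    )


# Hypothetical detections.
detections = [
    ("otuX", "sinks", 5), ("otuX", "fecal", 8),
    ("otuY", "fecal", 3), ("otuY", "hands", 6),
    ("otuZ", "tubes", 4),
]
print(detected_in_room_first(detections))  # ['otuX']
```

An earlier room detection is consistent with, but does not prove, transfer from room to gut, since both may share an unsampled source.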

Page 40: UC Davis EVE161 Lecture 9 by @phylogenomics


Gut vs Room

infant 2’s fecal samples fall within the top ten most frequently occurring OTUs in the room environment. Interestingly, infant 2’s most abundant gut colonists, Staphylococcus sp. and E. faecalis, are the two most frequently occurring OTUs in the room environment.

The NICU as a reservoir for gut colonists

Figure 5 summarizes the gut colonizing organisms found in room samples at the genus level. Typically, for both infants, electronics had the lowest relative abundance of organisms detected in the gut whereas tubing had the

[Figure 5 panels, infant 1 and infant 2: relative abundance (y axis, 0.00 to 1.00) vs. day of life (x axis) of gut-colonizing taxa in fecal samples (left) and in electronics, hands, incubator, sinks, surfaces and tubes samples (right). Species legend: A. prevotii, Actinomyces sp., B. fragilis, C. ramosum, Dermabacter sp., E. coli, E. faecalis, F. magna, K. pneumoniae, P. aeruginosa, Prochlorothrix sp., S. epidermidis, Veillonella sp. Phylum / genus legend: Actinobacteria / Actinomyces, Actinobacteria / Dermabacter, Bacteroidetes / Bacteroides, Cyanobacteria / Prochlorothrix, Firmicutes / Anaerococcus, Firmicutes / Clostridium, Firmicutes / Enterococcus, Firmicutes / Finegoldia, Firmicutes / Staphylococcus, Firmicutes / Veillonella, Proteobacteria / Escherichia, Proteobacteria / Klebsiella, Proteobacteria / Pseudomonas, and a category for taxa that did not colonize the gut.]

Figure 5 Community composition of gut colonizing microbes and room microbes through the first month of life. Time-series characterization of the fecal microbial community (left) and fecal microbes concurrently collected from the room (right) displays discrete reservoirs of gut colonizers in the neonatal intensive care unit.

Brooks et al. Microbiome 2014, 2:1 Page 10 of 16 http://www.microbiomejournal.com/content/2/1/1

Page 41: UC Davis EVE161 Lecture 9 by @phylogenomics


SourceTracker

highest. Temporal variation of gut genera was extreme in most environments.

The use of Bayesian microbial source tracking software [56], treating room samples as sources and fecal samples as sinks, produced mixed results in terms of finding likely gut reservoirs (Figure 6). In infant 1, tubing, surfaces, and electronics had the highest probabilities as sources, but the bloom of Bacteroides fragilis, from a source not detected by our sampling regime, lowered the probability of sampled source environments for the latter half of the sampling period. Infant 2’s samples showed the opposite pattern in that early gut colonists migrated from an unknown reservoir, whereas later in sampling, incubator, tubing, surfaces, and hands were the most probable reservoirs.
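SourceTracker [56] estimates, in a Bayesian framework with Gibbs sampling, what fraction of a sink community is attributable to each candidate source, including an unknown one. The sketch below swaps in a simpler technique, an expectation-maximization estimate of mixing proportions given fixed source profiles, to show the core idea: each sink taxon's reads are split among sources in proportion to how likely each source is to have produced them. It is not SourceTracker's algorithm and has no "unknown" component, so taxa absent from every sampled source are ignored rather than assigned to an unsampled reservoir. Source names and profiles are hypothetical.

```python
def source_proportions(sink_counts, sources, iterations=100):
    """Maximum-likelihood mixing proportions of known sources in one sink (EM).

    sink_counts: {taxon: count} for one fecal (sink) sample.
    sources: {source_name: {taxon: proportion}}, each profile summing to 1.
    Unlike SourceTracker there is no 'unknown' source and no Gibbs sampling;
    sink taxa absent from every source are skipped.
    """
    names = list(sources)
    pi = {s: 1.0 / len(names) for s in names}
    for _ in range(iterations):
        expected = {s: 0.0 for s in names}
        explained = 0.0
        for taxon, count in sink_counts.items():
            weights = {s: pi[s] * sources[s].get(taxon, 0.0) for s in names}
            total = sum(weights.values())
            if total == 0.0:
                continue
            for s in names:  # E-step: split this taxon's reads among sources
                expected[s] += count * weights[s] / total
            explained += count
        if explained == 0.0:
            break
        pi = {s: expected[s] / explained for s in names}  # M-step
    return pi


# Hypothetical room sources and one fecal sample.
sources = {"tubes": {"t1": 0.5, "t2": 0.5}, "sinks": {"t3": 1.0}}
sink = {"t1": 15, "t2": 15, "t3": 10}
print({s: round(p, 3) for s, p in source_proportions(sink, sources).items()})
# {'tubes': 0.75, 'sinks': 0.25}
```

Because this version forces every explained read onto a sampled source, it cannot reproduce the large "Unknown" share in Figure 6; that is the main reason a model with an unknown source is preferred.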

Shared gut colonizers

The infant cohort shared only one gut colonizer, E. faecalis, whose 16S rRNA gene sequences were 100% identical. A higher-resolution analysis using a concatenated alignment of 32 highly conserved, single-copy genes shows that the strains differ by only 2 amino acids across the 4,101 positions. These two E. faecalis strains phylogenetically cluster most closely to each other, but are also very closely related to other E. faecalis strains (Figure 7). To further explore similarity of shared strains, reads from infant 1 were mapped to infant 2’s assembled contigs. Infant 1’s reads covered 95% of the length of infant 2’s assembly at an average of 4.66X coverage. Read mapping revealed two distinct SNP profiles for infant 1’s reads: a major strain divergent from infant 2’s assembly and a minor strain identical to the strain in infant 2. In all, 77% of the length of infant 2’s E. faecalis assembly is covered by infant 1’s reads mapped as mate pairs with no mismatches. This suggests that infant 1’s E. faecalis minor strain is the same strain dominating infant 2’s gut. Pheromone-responsive plasmids were found in both infants. The plasmid from infant 2 occurs in low abundance in infant 1 (as expected based on the low representation of E. faecalis in infant 1), but with high sequence identity.
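Two numbers in this paragraph are simple ratios. The amino acid identity of the concatenated alignment is 1 - 2/4,101, about 99.95%. Breadth of coverage (95%) and average depth (4.66X) summarize a per-position read depth profile. A minimal sketch, assuming gap columns are excluded from the identity and that "average coverage" means mean read depth over all assembly positions; the paper does not spell out either convention, and the sequences and depths are toy data.

```python
def aligned_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    return sum(a == b for a, b in columns) / len(columns)


def breadth_and_depth(depths):
    """Fraction of positions with >= 1 read, and mean depth over all positions."""
    if not depths:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d > 0)
    return covered / len(depths), sum(depths) / len(depths)


# 2 differing residues out of 4,101 aligned positions (toy sequences).
a = "M" * 4101
b = "M" * 4099 + "KK"
print(round(aligned_identity(a, b), 5))    # 0.99951
print(breadth_and_depth([0, 2, 3, 0, 5]))  # (0.6, 2.0)
```

Breadth and depth answer different questions: a low-abundance strain can cover most of a reference thinly, which fits infant 1's reads covering 95% of infant 2's assembly at under 5X.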

Genes relevant to adaptation to the NICU environment

Analysis of reconstructed genomes for gut microorganisms can lend clues as to how organisms detected in the GIT and room environment are able to persist in the NICU, which is subjected to regular cleaning/sterilization. Numerous antibiotic resistance genes were found in genomes of microorganisms in fecal samples of both infants. A large portion of these were efflux pumps, with representatives from all four families of multidrug transporters: major facilitator superfamily (MFS), small multidrug resistant (SMR), resistance-nodulation-cell division (RND), and multidrug and toxic compound extrusion (MATE) proteins [57]. Particularly interesting are genes encoding the QacA/B MFS, SugE SMR, and MexA/B RND proteins, which are a growing concern in hospitals due to coselection through the practice of combining two or more types

[Figure 6 panels, infant 1 and infant 2: % probability of source (y axis, 0.00 to 1.00) vs. day of life (x axis) for the sources electronics, hands, incubators, sinks, surfaces, tubes and unknown.]

Figure 6 The most probable source of gut colonizing microbes. This was generated using the source-sink characterization software SourceTracker. Neonatal intensive care unit room sequences were designated as putative sources and fecal sequences as sinks.

Brooks et al. Microbiome 2014, 2:1 Page 11 of 16 http://www.microbiomejournal.com/content/2/1/1

Page 42: UC Davis EVE161 Lecture 9 by @phylogenomics


E. faecalis - the one taxon shared between infants

of antibiotic treatments [58]. Resistance to multiple types of antibiotics can arise from a single resistance mechanism such as efflux pumping [59]. In addition to antibiotics, these pumps can expel quaternary ammonium compounds (QACs), the active biocide in the detergent used to clean hospital surfaces during the study. Other notable observations were the presence of biofilm-forming genes in most colonizers, which can be induced by exposure to aminoglycosides [60], a suite of genes that confer resistance to starvation, and the presence of antibiotic resistance genes encoded on several phage and plasmid genomes, as well as microbial genomes.

Discussion

Increasing throughput, decreasing cost, and rapid development of informatics and sequencing pipelines have reshaped the field of microbial ecology, allowing researchers to survey a breadth of new environments [34,61-63]. Recently, the first ICU survey to use next-generation sequencing technology was published [8] and showed a surprising amount of bacterial diversity for an environment under constant attack via aggressive sanitation and antibiotic treatment efforts. The consortia were generally diverse, but some consortia contained a high representation of members of the family Enterobacteriaceae, typically considered to be gut microbes. Shortly after this publication, a study characterizing a snapshot of surfaces and sinks in two NICU rooms corroborated high proportions of fecal coliform bacteria on surface samples [10]. Certainly the NICU has the capacity to retain enteric microbes, but their propensity to migrate to the gut remains unclear.

Next-generation sequencing surveys in the ICU have reported high levels of community diversity. Poza et al. found 1,145 distinct OTUs in an ICU in Spain [8] and subsequent studies reported 1,621 and 3,925 OTUs in a NICU in the US and in an Austrian ICU, respectively [9,10]. While comparing these studies is difficult due to differences in sample size and protocols, we can begin to appreciate the need to better understand why so many types of bacteria can be found in a regularly cleaned environment. Our study, the first time-series survey of an ICU using next-generation sequencing technologies, unveiled over 20,000 OTUs across 2 NICU rooms occupied by different infants with partial time overlap. Our study is distinct from prior NICU surveys in that it used amplicon-EMIRGE, a 16S rRNA gene assembly tool that can be more sensitive in OTU detection [29] and can provide increased confidence when making lower taxonomic level classifications [64]. The increase in OTUs from study to study might be attributed to increases in sequencing read lengths and, in this study, increased information from reassembled, full-length genes, but the biological relevance of this increase is unclear. Notably, of the over 20,000 OTUs characterized here, only 984 were found in 2 or more samples. Further surveys are needed, integrating time-series sampling and samples from multiple surface types from different hospitals, to better characterize the expected number of OTUs in an ICU and the implications of this number for ICU occupants.

The increased sensitivity provided by EMIRGE was helpful when evaluating temporal patterns, especially pertaining to source-sink characterization. Similarly, our

[Figure 7 tree. Left: maximum likelihood phylogeny (scale bar 0.1; bootstrap values shown) of reference genomes from Streptococcus (S. suis, S. gallolyticus, S. agalactiae, S. dysgalactiae, S. equi, S. mutans), Enterococcus (E. casseliflavus, E. hirae, E. faecium, E. faecalis), Staphylococcus (S. aureus, S. epidermidis) and Klebsiella pneumoniae, plus the two infant E. faecalis genomes. Right: excerpt of the E. faecalis clade (scale bar 0.01), in which Infant1_Enterococcus_faecalis and Infant2_Enterococcus_faecalis sit next to each other among reference strains V583, 62, TX0645, SLO2C-1, TX0312, TX1346, D811610-10, TX0630, B83616-1, KI-6-1-110608-1, ERV63, OG1X, Symbioflor_1 and OG1RF_ATCC_47077.]

Figure 7 Enterococcus faecalis phylogeny using 32 concatenated ribosomal proteins reveals closely related strains. The maximum likelihood phylogeny of E. faecalis strains was based on a concatenation of single-copy, highly conserved ribosomal proteins from our data set and available reference genomes. Bootstrap values greater than 50 are shown. An excerpt of the E. faecalis clade is shown to the right.

Brooks et al. Microbiome 2014, 2:1 Page 12 of 16 http://www.microbiomejournal.com/content/2/1/1

Page 43: UC Davis EVE161 Lecture 9 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

highest. Temporal variation of gut genera was extremein most environments.The use of Bayesian microbial source tracking software

[56], with the perspective of room samples as the sourceand fecal samples as the sink, produced mixed results interms of finding likely gut reservoirs (Figure 6). In infant1, tubing, surfaces, and electronics had the highest prob-abilities as sources, but the bloom of Bacteroides fragilis,from a source not detected by our sampling regime, low-ered the probability of sampled source environments forthe latter half of the sampling period. Infant 2’s samplesshowed the opposite pattern in that early gut colonistsmigrated from an unknown reservoir, whereas later insampling, incubator, tubing, surfaces, and hands werethe most probable reservoir.

Shared gut colonizersThe infant cohort shared only one gut colonizer, E.faecalis, which contained 100% 16S rRNA gene levelsequence identity. A higher resolution analysis using aconcatenated alignment of 32 highly conserved, single-copy genes show the strains differ by only 2 amino acidsacross the 4,101 positions. These two E. faecalis strainsphylogenetically cluster most closely to each other, but arevery closely related to other E. faecalis strains (Figure 7).To further explore similarity of shared strains, reads

from infant 1 were mapped to infant 2’s assembled con-tigs. Infant 1’s reads covered 95% of the length of infant2’s assembly at an average of 4.66X coverage. Read

mapping revealed two distinct SNP profiles for infant 1’sreads, a major strain divergent from infant 2’s assemblyand a minor strain identical to the strain in infant 2. Inall, 77% of the length of infant 2’s E. faecalis assembly iscovered by infant 1’s reads mapped as mate pairs withno mismatches. This suggests that infant 1’s E. faecalisminor strain is the same strain dominating infant 2’s gut.Pheromone-responsive plasmids were found in both in-fants. The plasmid from infant 2 occurs in low abun-dance in infant 1 (as expected based on the lowrepresentation of E. faecalis in infant 1), but with highsequence identity.

Genes relevant to adaptation to the NICU environmentAnalysis of reconstructed genomes for gut microorgan-isms can lend clues as to how organisms detected in theGIT and room environment are able to persist in theNICU, which is subjected to regular cleaning/sterilization.Numerous antibiotic resistance genes were found in ge-nomes of microorganisms in fecal samples of both infants.A large portion of these were efflux pumps, with represen-tatives from all four families of multidrug transporters:major facilitator superfamily (MFS), small multidrug re-sistant (SMR), resistance-nodulation-cell division (RND),and multidrug and toxic compound extrusion (MATE)proteins [57]. Particularly interesting are genes encodingthe QacA/B MFS, SugE SMR, and MexA/B RND proteins,which are a growing concern in hospitals due to coselec-tion through the practice of combining two or more types

Sinks Surfaces

0.00

0.25

0.50

0.75

1.00

Electronics Hands Incubators Tubes Unknown

6 9 12 15 18 21 24 27 30 12 15 18 21 24 27 30

day of life

% p

roba

bilit

y of

sou

rce

Infant 1 Infant 2

Figure 6 The most probable source of gut colonizing microbes. This was generated using the source-sink characterization software, Source-Tracker. Neonatal intensive care unit room sequences were designated as putative sources and fecal sequences sinks.

Brooks et al. Microbiome 2014, 2:1 Page 11 of 16http://www.microbiomejournal.com/content/2/1/1

of antibiotic treatments [58]. Resistance to multiple typesof antibiotics can arise from a single resistance mechanismsuch as efflux pumping [59]. In addition to antibiotics,these pumps can expel quaternary ammonium com-pounds (QACs), the active biocide in the detergent usedto clean hospital surfaces during the study. Other notableobservations were the presence of biofilm forming genesin most colonizers, which can be induced by exposure toaminoglycosides [60], a suite of genes that confer resist-ance to starvation, and the presence of antibiotic resist-ance genes encoded on several phage and plasmidgenomes, as well as microbial genomes.

DiscussionIncreasing throughput, decreasing cost, and rapid devel-opment of informatics and sequencing pipelines hasreshaped the field of microbial ecology, allowing re-searchers to survey a breadth of new environments[34,61-63]. Recently, the first ICU survey to utilize nextgeneration sequencing technology was published [8] andshowed a surprising amount of bacterial diversity for anenvironment under constant attack via aggressive sanita-tion and antibiotic treatment efforts. The consortia weregenerally diverse, but some consortia contained a highrepresentation of members of the family Enterobacteria-ceae, typically considered to be gut microbes. Shortlyafter this publication, a study characterizing a snapshotof surfaces and sinks in two NICU rooms corroboratedhigh proportions of fecal coliform bacteria on surfacesamples [10]. Certainly the NICU has the capacity to re-tain enteric microbes, but their propensity to migrate tothe gut remains unclear.

Next-generation sequencing surveys in the ICU havereported high levels of community diversity. Poza et al.found 1,145 distinct OTUs in an ICU in Spain [8] andsubsequent studies reported 1,621 and 3,925 OTUs in aNICU in the US and in an Austrian ICU, respectively[9,10]. While comparing these studies is difficult due todifferences in sample size and protocols, we can begin toappreciate the need to better understand why so manytypes of bacteria can be found in a regularly cleaned en-vironment. Our study, the first time series survey of anICU using next-generation sequencing technologies, un-veiled over 20,000 OTUs across 2 NICU rooms occupiedby different infants with partial time overlap. Our studyis distinct from prior NICU surveys in that it usedamplicon-EMIRGE, a 16S rRNA gene assembly softwarewhich can be more sensitive in OTU detection [29] andprovide increased confidence when making lower taxo-nomic level classifications [64]. The increase in OTUsfrom study to study might be attributed to increases insequencing read lengths and, in this study, increased in-formation from reassembled, full-length genes, but thebiological relevance of this increase is unclear. Notably,of the over 20,000 OTUs characterized here, only 984were found in 2 or more samples. Further surveys areneeded, integrating time-series sampling and samplesfrom multiple surface types from different hospitals, tobetter characterize the expected number of OTUs in anICU and the implications of this number for ICUoccupants.The increased sensitivity provided by EMIRGE was

helpful when evaluating temporal patterns, especiallypertaining to source-sink characterization. Similarly, our

[Figure 7 image. Left: maximum likelihood tree of Streptococcus, Enterococcus, and Staphylococcus reference genomes plus Klebsiella pneumoniae NTUH-K2044, including the Infant1 and Infant2 Enterococcus faecalis genomes (scale bar 0.1). Right: excerpt of the E. faecalis clade (scale bar 0.01).]

Figure 7 Enterococcus faecalis phylogeny using 32 concatenated ribosomal proteins reveals closely related strains. The maximum likelihood phylogeny of E. faecalis strains was based on a concatenation of single-copy, highly conserved ribosomal proteins from our data set and available reference genomes. Bootstrap values greater than 50 are shown. An excerpt of the E. faecalis clade is shown to the right.
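The caption describes building the tree from a concatenation of single-copy ribosomal proteins. A minimal Python sketch of that concatenation step is given below. It assumes each protein has already been aligned separately and that every taxon has at most one copy of each protein. The helper name `concatenate_alignments`, the rule of padding a missing protein with gaps, and the toy sequences are illustrative assumptions, not the authors' pipeline. The maximum likelihood tree search and bootstrapping happen downstream and are not shown.

```python
from typing import Dict, List, Tuple

# taxon name -> aligned protein sequence for one ribosomal protein
Alignment = Dict[str, str]


def concatenate_alignments(
    gene_alignments: Dict[str, Alignment],
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Join per-gene alignments side by side into a single supermatrix.

    Hypothetical helper: taxa missing a gene are padded with '-' for that
    gene's full alignment width, so all rows end up the same length.
    Partitions are (gene, start, end) in 1-based inclusive coordinates.
    """
    # Every taxon that appears in at least one gene alignment.
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    offset = 0
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        # A valid alignment has exactly one width shared by all its sequences.
        if len(widths) != 1:
            raise ValueError(f"{gene}: sequences have unequal lengths or the alignment is empty")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * width))
        partitions.append((gene, offset + 1, offset + width))
        offset += width
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


if __name__ == "__main__":
    # Toy fragments invented for illustration; not real ribosomal protein sequences.
    genes = {
        "rplB": {"Infant1": "MAV-K", "Infant2": "MAVLK", "V583": "MAVLK"},
        "rpsC": {"Infant1": "GQK", "V583": "GRK"},
    }
    matrix, parts = concatenate_alignments(genes)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(parts)
```

Running the example prints `Infant2 MAVLK---`, which shows the gap padding for the missing rpsC fragment. Padding keeps incomplete genomes in the matrix instead of dropping them. The returned partition coordinates could be passed to a tree-building program if each protein is to get its own substitution model.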

Brooks et al. Microbiome 2014, 2:1 Page 12 of 16 http://www.microbiomejournal.com/content/2/1/1