Phylodynamics: The use of phylogenetics in epidemiology
CONOR MEEHAN, UNIT OF MYCOBACTERIOLOGY
BIOMEDICAL SCIENCES
Nasir and Caetano-Anollés, Science Advances 2015; https://en.wikipedia.org/wiki/Three-domain_system
Bacteria/Virus diversity
(no good 2-domain image; see Tom Williams' work)
John Snow
Father of infectious disease epidemiology
Demographic information
Scientific methods
Cholera outbreak in London, 1854
Cholera in London
Dot map and Voronoi diagram
Centered on pump
Near cesspit
Showed connection between water companies and fatalities
Didn't lead to acceptance
Germ theory, 1864
Terminology
Molecular epidemiology
The interface between molecular biology and epidemiology
Contribution of genetic and environmental factors to pathogen spread
Phylodynamics
The interface between evolutionary biology and molecular epidemiology
Estimating pathogen evo/epi parameters from phylogenies
Mutation rates
Transmission rates and chains (R0)
Population dynamics
Genome-based phylogenetics/phylodynamics
Phylogeny programs built with the assumption of single-gene input
All data present
Whole genome data can be input in 2 ways:
SNP alignment
Has an ascertainment bias
Selected only the variable sites
Breaks the calculations
Whole genome alignments
Large amounts of data
If combined with a large number of taxa, can be computationally too expensive
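A minimal sketch of the first input option, assuming the whole genome alignment is already in memory as a Python dict of equal-length strings (a hypothetical format; real pipelines would read FASTA). Only columns where more than one of A, C, G, T is observed are kept, so every constant site is discarded, which is exactly where the ascertainment bias comes from.

```python
# Hypothetical in-memory format: {isolate name: aligned sequence}
BASES = set("ACGT")

def variable_columns(alignment):
    """Indices of columns where more than one unambiguous base (A/C/G/T) occurs."""
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    keep = []
    for i in range(length):
        observed = {s[i] for s in seqs} & BASES  # gaps and ambiguity codes ignored
        if len(observed) > 1:
            keep.append(i)
    return keep

def snp_alignment(alignment):
    """Reduce a whole genome alignment to its variable sites only."""
    cols = variable_columns(alignment)
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

wga = {
    "isolate1": "ACGTACGTAA",
    "isolate2": "ACGTTCGTAA",
    "isolate3": "ACGAACGTAC",
}
print(snp_alignment(wga))  # only 3 of the 10 columns survive
```

Treating a column with a gap plus a single base as constant is one simple choice; tools differ in how they handle gaps and ambiguous calls.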
The power of code
Large datasets
Require lots of steps and computing power
Manual processing of thousands of genomes is not feasible
UNIX pipelines
Looping on multiple sets (folders/files)
E.g. Assemble all genomes in the same way
Coding language
Python/Perl/C++/others
Processing pipelines
E.g. Create whole genome alignments and inputs for tree building
Cloud/Server computing
Server usage requires UNIX knowledge
Amazon, CyVerse, university/national
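As a sketch of processing every genome the same way, the Python below pairs Illumina read files in one folder and builds one identical command per sample. The `_R1`/`_R2` file naming, the folder layout and the `results` output root are assumptions. Snippy is used only as an example tool, with its `--outdir`, `--ref`, `--R1` and `--R2` options, and the commands are printed rather than executed.

```python
import re
from pathlib import Path

# Assumed naming convention: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq\.gz$")

def pair_reads(folder):
    """Map each sample name to its (R1, R2) paths; samples missing a mate are skipped."""
    mates = {}
    for path in sorted(Path(folder).iterdir()):
        match = READ_PATTERN.match(path.name)
        if match:
            mates.setdefault(match["sample"], {})[match["mate"]] = path
    return {s: (m["1"], m["2"]) for s, m in mates.items() if "1" in m and "2" in m}

def build_command(sample, r1, r2, reference, out_root="results"):
    """The same command template for every sample, so all genomes are processed alike."""
    return ["snippy", "--outdir", str(Path(out_root) / sample),
            "--ref", str(reference), "--R1", str(r1), "--R2", str(r2)]

def print_commands(folder, reference):
    """Print one command per complete sample, ready to pipe to a shell or job scheduler."""
    for sample, (r1, r2) in pair_reads(folder).items():
        print(" ".join(build_command(sample, r1, r2, reference)))
```

Usage: `print_commands("reads", "reference.gbk")`. Printing instead of running keeps the loop easy to inspect and to submit as separate jobs on a server or cluster.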
Whole genome conversion / DNA reconstitution method
Whole genome alignment
Variable sites → SNP alignment
Constant sites → counts of A, C, G, T
Phylogenetic method: single-site calculation (X); count of each base → complete calculations
Leaché et al. Sys Biol 2015; Stamatakis ascertainment bias correction in RAxML
(assumes no recombination)
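The constant-site bookkeeping behind this method can be sketched in a few lines: alongside the SNP alignment, count how many invariant columns carry each of A, C, G and T. Those four numbers are the extra input a Stamatakis-style ascertainment bias correction uses (check your tree program's manual for how to pass them). The dict-of-strings alignment is a hypothetical format, and columns containing gaps or ambiguity codes are simply skipped.

```python
from collections import Counter

BASES = "ACGT"

def constant_site_counts(alignment):
    """Number of invariant columns for each of A, C, G, T (in that order).
    Columns containing any gap or ambiguity code are not counted."""
    seqs = [s.upper() for s in alignment.values()]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    counts = Counter()
    for column in zip(*seqs):
        if len(set(column)) == 1 and column[0] in BASES:
            counts[column[0]] += 1
    return [counts[base] for base in BASES]

wga = {
    "isolate1": "ACGTACGTAA",
    "isolate2": "ACGTTCGTAA",
    "isolate3": "ACGAACGTAC",
}
print(constant_site_counts(wga))
```

Together with the variable columns, these counts let the tree program account for the invariant sites without them being in the alignment it reads.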
Whole genome phylodynamics: The use of phylogenetics in epidemiology
MRSA
Methicillin-resistant Staphylococcus aureus
Any S. aureus resistant to β-lactam antibiotics
Primarily hospital-related infections (HA-MRSA)
Now also found in the community (CA-MRSA)
Also in livestock and domestic animals (LA-MRSA)
Primarily wound infection
Surgical and non-surgical
Skin-to-skin/infected object transmission
Setting
Baby unit in a UK hospital
Infection control unit screens all babies for S. aureus carriage using a nasopharyngeal swab
'Outbreak' was detected with 3 patients colonized with bacteria with the same antibiotic resistance profile
Reviewed microbiological records for other patients with S. aureus with the same resistance profile: 8 more patients matched
Joining of classical and molecular epidemiology
WGS contributed to:
Identifying the extent of the outbreak within the hospital and community
Identifying the hospital worker who likely re-introduced MRSA after deep cleaning of the ward
Followed by treatment: end of outbreak?
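A common way WGS delineates an outbreak is to compute pairwise SNP distances between isolates and link any pair at or below a threshold, so that chains of close isolates form one cluster. This sketch assumes a SNP alignment as a dict of strings; the isolate names, toy sequences and threshold are hypothetical, since the source does not state the cut-off used and real thresholds are organism- and study-specific.

```python
from itertools import combinations

def snp_distance(a, b):
    """Positions where both sequences have an unambiguous base and the bases differ."""
    return sum(1 for x, y in zip(a, b) if x in "ACGT" and y in "ACGT" and x != y)

def cluster_isolates(snps, threshold):
    """Single-linkage clusters: isolates joined by any chain of pairs <= threshold SNPs."""
    parent = {name: name for name in snps}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(snps, 2):
        if snp_distance(snps[a], snps[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in snps:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())

isolates = {  # toy SNP alignment
    "baby1": "AAAAAAAAAAAA",
    "baby2": "AAAAAAAAAAAC",
    "worker": "AAAAAAAAAACC",
    "unrelated": "CCCCCCCCCCCC",
}
print(cluster_isolates(isolates, threshold=1))
```

Single linkage means "worker" joins the babies through baby2 even though it is 2 SNPs from baby1, which mirrors how a transmission chain links cases that are not directly similar.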
Where did HIV come from?
Lentiviruses are known to infect several species of primates in sub-Saharan Africa
Tree constructed containing sequences from SIV and both HIV-1 and HIV-2
HIV-1 likely arose in western equatorial Africa
HIV-2 likely arose in West Africa
Primarily confined there too
Wertheim and Worobey, PLOS Comp Bio 2009
When did HIV get to humans?
Wertheim and Worobey, PLOS Comp Bio 2009
Early estimates of HIV-1 divergence from SIVcpz dated this event to ~1960 (Li et al. 1988, Mol. Biol. Evol.)
Reanalysis found that this estimate used too simple a model of nucleotide sequence evolution
HIV analyses are usually performed under the GTR model
Dates spread throughout the early 20th century
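Why an overly simple substitution model pulls a date towards the present can be shown with a toy calculation. The sketch compares an uncorrected p-distance with the Jukes-Cantor (JC69) corrected distance, d = -3/4 ln(1 - 4p/3), and converts each to a divergence time under a strict clock, t = d / (2μ). JC69 stands in only to show the direction of the effect; it is neither GTR nor the model used in the cited studies, and both p and μ below are made-up values.

```python
import math

def jc69_distance(p):
    """Jukes-Cantor corrected substitutions per site from the observed proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for JC69")
    return -0.75 * math.log(1 - 4 * p / 3)

def divergence_time(distance, rate):
    """Strict clock: both lineages accumulate `rate` subs/site/year since the split."""
    return distance / (2 * rate)

p = 0.20      # hypothetical observed proportion of differing sites
rate = 2e-3   # hypothetical substitutions/site/year
for label, d in [("p-distance", p), ("JC69", jc69_distance(p))]:
    print(f"{label}: d = {d:.4f}, t = {divergence_time(d, rate):.1f} years")
```

The uncorrected distance ignores multiple hits at the same site, so it underestimates d and therefore places the split more recently than the corrected model does.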
How did HIV get to humans?
An early hypothesis, outlined in ‘The River’ by Edward Hooper (1999), suggested a contaminated oral polio vaccine (OPV) used in the 1950s
The other leading hypothesis is that blood-to-blood transmission occurred from butchered primate meat to hunters
Molecular evidence was gathered by Sharp et al. (2001, Phil. Trans. R. Soc. Lond. B) to review these claims
OPV trial chimpanzees were not the same as those suggested to be the reservoir for SIV
The origins were dated to ~1931, not the 1950s
Although the bushmeat hypothesis cannot be directly proven, phylogenetic and molecular analyses lend strong support against the OPV hypothesis
HIV transmission routes
Sexual contact
Anal (1.43%; (0.62%/0.11%))
Vaginal (0.08%; 0.04%)
Oral (extremely low but not zero)
Blood-borne
Unsterilized pre-used needles (0.15-10%; context dependent)
Blood transfusions (90%)
Mother to child
15-30% from pregnancy/delivery
5-20% from breastfeeding
Needs to contact the blood; can't pass through epithelial cells
Risk factors can increase/decrease transmission:
Viral load, other STIs, tearing, anti-retroviral treatment
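Per-act percentages like these can be turned into a cumulative probability over repeated exposures with 1 - (1 - p)^n. This sketch assumes every exposure is independent and has the same p, which the risk factors listed above (viral load, other STIs, treatment) clearly violate, so it is a rough illustration only.

```python
def cumulative_risk(per_act_probability, n_acts):
    """P(at least one transmission) over n independent acts with the same per-act p."""
    if not 0 <= per_act_probability <= 1:
        raise ValueError("probability must be between 0 and 1")
    return 1 - (1 - per_act_probability) ** n_acts

# Per-act figures from the list above: vaginal 0.08%, anal 1.43%
for route, p in [("vaginal", 0.0008), ("anal", 0.0143)]:
    print(route, [round(cumulative_risk(p, n), 3) for n in (1, 10, 100)])
```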
Phylogenetics and criminal prosecution of HIV transmission
Intentional or negligent transmission of HIV can result in charges of assault, manslaughter or murder in several countries
Two things often must be proven for this:
The defendant was reckless
The defendant infected the complainant
In the UK, scientific evidence was required to prove infection, even if a plea of ‘guilty’ was entered
Phylogenetics is often used in this step
Phylogenetics is often required to prove recklessness too
Time of infection must be after the defendant became aware of their status and before the complainant became aware of the defendant’s status
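Phylogenetic dating usually yields a distribution of plausible infection times (for example, MCMC samples from a Bayesian time tree) rather than a single date. A sketch of the timing check, assuming such samples are already available as decimal years along with the two awareness dates, reports the fraction of samples falling inside the relevant window. All sample values and dates here are hypothetical.

```python
def fraction_in_window(samples, defendant_aware, complainant_aware):
    """Share of sampled infection times after the defendant learned their status
    and before the complainant learned it (all times as decimal years)."""
    if not samples:
        raise ValueError("no samples given")
    if defendant_aware >= complainant_aware:
        raise ValueError("window is empty")
    inside = sum(defendant_aware < t < complainant_aware for t in samples)
    return inside / len(samples)

posterior = [2010.2, 2010.6, 2010.9, 2011.1, 2011.3, 2011.4, 2011.8, 2012.3]
print(fraction_in_window(posterior, defendant_aware=2010.5, complainant_aware=2012.0))
```

Reporting a fraction rather than a yes/no keeps the uncertainty in the dating visible.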
Phylogenetics and criminal prosecution of HIV transmission
First used in 1990 in a case of a dentist infecting several patients, though this case never went to court
First used in a criminal rape case in Sweden in 1992, though directionality was not determined
In 2002, phylogenetic analysis was used to uphold, on appeal, a gastroenterologist's conviction on a 2nd degree murder charge involving his girlfriend, after the analysis had been found to meet standards of evidence admissibility
Phylogenetics and criminal prosecution of HIV transmission
Lemey et al. “Molecular testing of multiple HIV-1 transmissions in a criminal case”, AIDS 19(15), 2005
One suspect and six victims
2 samples from each person, anonymously labelled and sequenced for pol and env fragments
30 controls taken from a local hospital, matched as closely as possible to the age, risk and geographical parameters of the suspect/victims, and sampled as close as possible to the time of the alleged transmissions
Phylogenetic trees built under ML using 3 methods and also using Bayesian inference
Sites known to confer drug resistance were excluded to prevent clustering based on drug regimes
Evidence of HIV transmission
Demonstrated grouping of suspect and victim samples, monophyletic to the exclusion of controls
No inference was made about directionality (usually indicated by a paraphyletic relationship of source sequences around recipient sequences in a time tree)
Cannot rule out both suspect and victims being infected by a 3rd person, or the suspect infecting a person who infected the victims
Local control selection is critical
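The key evidence, suspect and victim sequences forming a clade that excludes all controls, is a monophyly test. A minimal sketch, representing a rooted tree as nested Python tuples (a hypothetical stand-in for a parsed Newick tree) with made-up tip labels: a set of tips is monophyletic if some clade contains exactly those tips and nothing else.

```python
def collect_clades(tree, found):
    """Return the tip set under `tree`, appending every clade's tip set to `found`."""
    if isinstance(tree, str):                     # a tip label
        tips = frozenset([tree])
    else:                                         # an internal node: tuple of children
        tips = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.append(tips)
    return tips

def is_monophyletic(tree, taxa):
    """True if some clade of the rooted tree contains exactly `taxa`."""
    found = []
    collect_clades(tree, found)
    return frozenset(taxa) in found

tree = ((("suspect_a", "victim1"), ("victim2", "suspect_b")),
        ("control1", ("control2", "control3")))
case = {"suspect_a", "suspect_b", "victim1", "victim2"}
print(is_monophyletic(tree, case))
print(is_monophyletic(tree, case | {"control1"}))
```

The answer depends on where the tree is rooted, and a monophyletic case cluster still says nothing about direction, as the caveats above note.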
Influenza
Seasonal infection
Fever, muscle pains, headache, coughing, nasal discharge
250-500k deaths a year
Caused by Influenza virus
Three types (A-C)
A causes all pandemics
Serotypes based on hemagglutinin (H/HA) and neuraminidase (N/NA)
E.g. Influenza A H1N1 (“Spanish flu” or “Swine flu”)
Tracking an influenza outbreak
The H1N1 (swine flu) influenza strain was first identified in April 2009
Within a few months it reached pandemic proportions
Phylogeographic analysis
Where did the outbreak start?
How and when did it spread?
Tracking an influenza outbreak
Lemey et al. (2009), “Reconstructing the initial global spread of a human influenza pandemic”, PLOS Currents
242 sequences
HA and NA gene sequences
40 locations worldwide
30th March to 12th July 2009
Bayesian framework
HKY+gamma model
Relaxed molecular clock
BSSVS model of spatial diffusion
Bayesian stochastic search variable selection
7 discrete locations as priors
Allows MCMC to assign location probabilities to internal nodes
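With BSSVS, each possible migration route between discrete locations has an indicator that the MCMC switches on or off, and support for a route is commonly summarised as a Bayes factor: posterior odds of inclusion divided by prior odds. This sketch assumes symmetric rates and a Poisson prior with mean ln(2) + K - 1 on the number of active rates, which is a commonly used setup in discrete phylogeography but should be checked against the prior actually specified in your analysis. The posterior inclusion frequencies are hypothetical.

```python
import math

def prior_inclusion_probability(n_locations):
    """Prior chance that one symmetric rate is 'on', assuming a Poisson prior with
    mean ln(2) + K - 1 on the number of non-zero rates among K(K-1)/2 rates."""
    k = n_locations
    n_rates = k * (k - 1) // 2
    return (math.log(2) + k - 1) / n_rates

def bayes_factor(posterior_frequency, prior_probability):
    """Posterior odds divided by prior odds that a rate indicator is switched on."""
    if not 0 < posterior_frequency < 1:
        raise ValueError("posterior frequency must be strictly between 0 and 1")
    posterior_odds = posterior_frequency / (1 - posterior_frequency)
    prior_odds = prior_probability / (1 - prior_probability)
    return posterior_odds / prior_odds

q = prior_inclusion_probability(7)  # 7 discrete locations, as in the study above
for freq in (0.6, 0.9, 0.99):       # hypothetical posterior inclusion frequencies
    print(freq, round(bayes_factor(freq, q), 1))
```

Comparing against the prior odds matters because, with few locations, each rate already has a sizeable prior chance of being switched on.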
Lemey et al. PLoS Curr. 2009
Phylogenetic reconstruction indicates Mexico as the likely origin of the virus
Several USA strains were seeded early in the outbreak
Most European lineages came from USA strains, not Mexico
Origin and spread of H1N1
Mycobacterium genus
The genus Mycobacterium includes many important human pathogens
M. tuberculosis (TB)
M. ulcerans (Buruli ulcer)
M. leprae (Leprosy)
All other mycobacteria are termed nontuberculous mycobacteria (NTMs)
Primarily environmental
Many emerging opportunistic pathogens
Mycobacterium ulcerans
Causative agent of Buruli ulcer (BU)
~6000 cases a year (declining)
Causes skin ulceration and sometimes bone involvement
M. ulcerans produces a toxin, mycolactone, which damages tissue
Mycobacterium ulcerans transmission
Currently unknown
Not directly human to human
Proximity to slow-flowing/stagnant water
Prevailing hypothesis:
Environmental species that infects after microtrauma
Do humans play a role in the spread of MU?
Aims and dataset
What is the population structure and evolutionary history of MU in Africa?
Vandelannoote et al. GBE 2017
165 isolates
1964-2012
Most endemic African countries
Papua New Guinea outgroup
Illumina reads mapped with the Snippy pipeline
SNP alignment
Recombination-free
9,193 SNPs
Maximum likelihood reconstruction
RAxML v8.2
GTRCAT with DNA reconstitution ascertainment bias correction (Stamatakis method)
Root-to-tip distance calculations and correlation with TreeStat and R
MRCA:12226
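Root-to-tip regression is a quick check for temporal signal: regress each tip's distance from the root on its sampling date; the slope estimates the substitution rate and the x-intercept estimates the date of the root (the MRCA). This sketch uses ordinary least squares plus a simple date-permutation test of the correlation, a lighter-weight relative of the permutation tests used in the Bayesian analysis. The dates and distances are synthetic; in practice they would come from the ML tree (for example, TreeStat output).

```python
import random
import statistics

def ols(x, y):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def root_to_tip(dates, distances, n_permutations=999, seed=1):
    """Rate (slope), root date (x-intercept), r, and a one-sided permutation p-value."""
    slope, intercept = ols(dates, distances)
    r = pearson(dates, distances)
    rng = random.Random(seed)
    shuffled = list(dates)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)               # break any date/distance association
        if pearson(shuffled, distances) >= r:
            hits += 1
    return {"rate": slope, "root_date": -intercept / slope, "r": r,
            "p_value": (hits + 1) / (n_permutations + 1)}

dates = [1964, 1975, 1983, 1990, 1998, 2004, 2008, 2012]               # synthetic
distances = [0.0128, 0.0152, 0.0163, 0.0181, 0.0194, 0.0207, 0.0215, 0.0224]
print(root_to_tip(dates, distances))
```

A strong correlation that rarely appears under shuffled dates suggests the data carry enough temporal signal for clock-based dating.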
Bayesian reconstruction
BEAST2
Tested clock and population model combinations (path sampling)
Uncorrelated lognormal clock and constant coalescent found to be best
Tested for time signal with permutation tests and prior adjustments
Mutation rate: 6.32E-8 /site/year [3.90E-8 - 8.84E-8]
0.33 SNPs/chromosome/year [0.20 - 0.46]
Introductions coincide with colonisation
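The per-chromosome rate gives a feel for how slowly M. ulcerans diverges. Two isolates whose MRCA lived t years ago have each accumulated mutations independently since the split, so under a strict clock the expected SNP distance between them is about 2 × rate × t. This sketch uses the 0.33 SNPs/chromosome/year estimate and its interval from above; the divergence times are hypothetical.

```python
RATE = 0.33                   # SNPs per chromosome per year (point estimate above)
RATE_INTERVAL = (0.20, 0.46)  # reported interval

def expected_snps(years_since_mrca, rate=RATE):
    """Expected SNP distance between two isolates under a strict clock:
    both lineages accumulate `rate` SNPs per year after splitting."""
    return 2 * rate * years_since_mrca

def years_from_snps(snp_distance, rate=RATE):
    """The same relation inverted: rough time back to the MRCA of two isolates."""
    return snp_distance / (2 * rate)

for years in (50, 100, 150):  # hypothetical divergence times
    low, high = (expected_snps(years, r) for r in RATE_INTERVAL)
    print(f"{years} years: ~{expected_snps(years):.0f} SNPs (range {low:.0f}-{high:.0f})")
```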
The spread of Mycobacterium ulcerans in Africa
Slow-evolving bacterium
One of the slowest rates recorded
Clonal expansion with no recombination
Multiple introductions
Each major lineage introduced separately
Likely began in South-East Asia (still to be confirmed)
Exact place of first introduction into Africa not known (perhaps central)
Potentially spread by humans
Infected Africans moved to new areas during the ‘Scramble for Africa’
Shed into water, which then infects new hosts
Will treatment of humans reduce the environmental population too?
Population modelling suggests yes
Estimating the rate of infection of Ebola
The 2013 West African Ebola virus epidemic spread primarily through Guinea, Sierra Leone and Liberia and killed over 11,000 people
It is estimated that the strain began at a funeral in Guinea in December 2013
Phylogenetic analysis places the MRCA of the outbreak in late February 2014, with 2 strains introduced to Sierra Leone
Stephen K. Gire et al. Science 2014;345:1369-1372
Estimating the rate of infection of Ebola
Multiple birth-death model approaches were used on the Sierra Leone sequences to estimate epidemiological parameters across a Bayesian phylogeny of the sequences
Here, birth is the rate of transmission from an infectious person and death is the rate of becoming non-infectious through recovery or death
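In the simplest birth-death model, an infected person transmits at rate λ (birth) and stops being infectious at rate δ (death), so the mean number of secondary infections is R0 = λ/δ and the mean infectious period is 1/δ. The simulation below checks that relationship: because both clocks are exponential, each next event is a transmission with probability λ/(λ + δ). The parameter values are arbitrary, and the model ignores the sampling and incubation components included in the cited analyses.

```python
import random

def rates_from_r0(r0, infectious_period_days):
    """Birth (transmission) and death (becoming non-infectious) rates per day."""
    delta = 1 / infectious_period_days
    return r0 * delta, delta

def simulate_offspring(birth_rate, death_rate, rng):
    """Count transmissions before the infectious period ends; each next event is
    a transmission with probability birth / (birth + death)."""
    p_transmit = birth_rate / (birth_rate + death_rate)
    count = 0
    while rng.random() < p_transmit:
        count += 1
    return count

rng = random.Random(42)
lam, delta = rates_from_r0(r0=2.0, infectious_period_days=3.0)  # arbitrary values
draws = [simulate_offspring(lam, delta, rng) for _ in range(100_000)]
print(f"lambda={lam:.3f}/day, delta={delta:.3f}/day, "
      f"mean offspring={sum(draws) / len(draws):.2f}")
```

With the estimates reported for Sierra Leone (R0 ≈ 2.18, infectious period ≈ 2.58 days), `rates_from_r0` gives λ of about 0.84 and δ of about 0.39 per day.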
Stadler T et al. PLOS Currents Outbreaks. 2014
Estimating the rate of infection of Ebola
R0: ~2.18 (range 1.24-3.55)
Incubation time: ~4.92 days
Infectious period: ~2.58 days
Thus, on average 2 people will be infected by every infected individual
This is low when compared to some other common pathogens, e.g.:
Influenza: 2-3
HIV: 2-5
Measles: 12-18
Stadler T et al. PLOS Currents Outbreaks. 2014
Conor Meehan
Lateral gene transfer (LGT)
Also called horizontal gene transfer (HGT)
First observed between pneumococci in mice
3 main ways:
Transformation
Uptake of naked DNA
Often limited to specific environmental cues
Estimated in ~1% of known species
Conjugation
Involves the transfer of plasmids
Many plasmids are highly promiscuous
Transduction
Involves an intermediate phage
Rampant evidence in nearly all prokaryotic genomes
© 2005 Nature Publishing Group
Figure 1 stages, from donor to recipient:
1 Entry into the transfer process: release of naked DNA; packaging into phage particle; presence of pac sites; interaction with mating-pair formation apparatus; integration of plasmid into chromosome
2 Selection of recipient: uptake sequences in DNA; binding of naked DNA; surface exclusion; phage receptor specificity; pilus specificity
3 Uptake + successful entry: restriction; antirestriction systems; selection against restriction sites
4 Establishment: replication; integration; homologous recombination; illegitimate recombination
COMPETENCE: The ability of bacteria to take up extracellular DNA.
For natural transformation to occur, bacterial cells must first develop a regulated physiological state of COMPETENCE, which has been found to involve approximately 20 to 50 proteins. With the exception of Neisseria gonorrhoeae, most naturally transformable bacteria develop time-limited competence in response to specific environmental conditions such as altered growth conditions, nutrient access, cell density (by quorum sensing) or starvation. The proportion of bacteria that develop competence in a bacterial population might range from near zero to almost 100%. As the growth environments and factors that regulate competence development vary between bacterial species and strains [6], there is no universal approach to determine if a given bacterial isolate can develop competence as a part of its life cycle. To the extent investigated, the proportion of bacteria found to be naturally transformable is approximately 1% of the validly described bacterial species [7]. The ability to take up naked DNA by natural transformation has been detected in archaea and divergent subdivisions (phyla) of bacteria, including representatives of the Gram-positive bacteria, cyanobacteria, Thermus spp., Deinococcus spp., green sulphur bacteria and many other Gram-negative bacteria [8,9]. Many human pathogenic bacteria, including representatives of the genera Campylobacter, Haemophilus, Helicobacter, Neisseria, Pseudomonas, Staphylococcus and Streptococcus, are naturally transformable [9]. The conserved ability to acquire DNA molecules by natural transformation among a broad range of bacteria indicates that the genetic trait is functionally important in the environment, enabling access to DNA as a source of nutrients or genetic information. Prerequisites for natural transformation include the release and persistence of extracellular DNA, the presence of competent bacterial cells and the ability of translocated chromosomal DNA to be stabilized by integration into the bacterial genome or the ability of translocated plasmid DNA to integrate or recircularize into self-replicating plasmids (FIG. 2).
Release of extracellular DNA in the environment. Natural transformation relies on bacterial exposure to extracellular DNA molecules in the environment. DNA continually enters the environment upon release from decomposing cells, disrupted cells or viral particles, or through excretion from living cells. The release of intact DNA from decomposing cells depends on the activity and location of nucleases and reactive chemicals. Active excretion of DNA has been reported for many genera of bacteria, including Acinetobacter, Alcaligenes, Azotobacter, Bacillus, Flavobacterium, Micrococcus, Pseudomonas and Streptococcus [8–10]. For instance, extracellular DNA has been found at concentrations of up to 1–3 µg per ml in liquid cultures of an Acinetobacter sp. and Bacillus subtilis [11] and up to 780 µg per ml in cultures of the environmental isolate Pseudomonas aeruginosa KYU-1 [12]. Recently, extracellular DNA has been identified as an important component in biofilm formation [13]. Nevertheless, the extent of, and role of, active release of DNA by bacteria in natural, nutrient-limited habitats remains to be fully understood.
Passive release of DNA from dead bacteria occurs after self-induced lysis, a process that results in broken cell walls and membranes and the subsequent exposure to, and release of, cytoplasmic contents, including DNA, in the environment [14]. Pathogenic microorganisms can also undergo lysis caused either by the host immune system or the antibiotic treatment of infections. From studies of ¹⁴C-labelled Escherichia coli, it has been estimated that between 95% and 100% of the bacterial DNA is released after contact with the immune system [15]. Most of this DNA is probably degraded by DNases present in human serum and plasma. In one study, the mean DNase activity of 50 patients destroyed 90% of the added DNA of Haemophilus influenzae within a few minutes [16]. A different study, however, reported longer persistence times for both chromosomal and plasmid DNA in serum [17]: large plasmids and chromosomal DNA were substantially degraded after a 4-hour exposure of a serum-sensitive E. coli strain, but smaller plasmids (pBR322 and
Figure 1 | The process of horizontal gene transfer. A schematic outlining the stages through which DNA must go on its journey from donor to recipient bacteria. The process begins with DNA in a potential donor cell becoming available and ends when this DNA becomes a functional part of a recipient cell’s genome.
712 | SEPTEMBER 2005 | VOLUME 3 www.nature.com/reviews/micro
REVIEWS
LGT consequences
Xenologs
New function
Orthologous replacement
Phylogenetics
Breaks everything
Species trees from gene trees
Ensure it's not a xenolog first
What is the true history of an organism?
Several sub-histories?
What is the unit of reproduction?
Philosophical, yet important for mapping evolution
Detecting LGT: homology approach
Detection of LGT is a very difficult problem
These are some suggestions, each with their flaws
Homology-based
Bi-directional best hits (BBH)
HGTector (Zhu et al. BMC Genomics 2014)
Issues
Database coverage
Distant LGT difficult to find
Phylogenetically unaware
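Bi-directional best hits can be sketched independently of any search tool: given similarity scores from searching genome A against B and B against A (for example BLASTP bit scores; the nested-dict format and gene names here are hypothetical stand-ins for parsed search output), a pair is kept only if each gene is the other's top hit. In an LGT screen, a gene whose BBH lies in an unexpectedly distant genome becomes a candidate, subject to the database-coverage caveats above.

```python
def best_hits(scores):
    """scores: {query: {subject: score}} -> {query: best-scoring subject}."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}

def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

a_vs_b = {"geneA1": {"geneB1": 250.0, "geneB2": 90.0},
          "geneA2": {"geneB2": 120.0},
          "geneA3": {"geneB2": 300.0}}
b_vs_a = {"geneB1": {"geneA1": 245.0},
          "geneB2": {"geneA3": 310.0, "geneA2": 118.0}}
print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

geneA2 is dropped because its best hit, geneB2, prefers geneA3; this reciprocity requirement is what filters out one-sided, paralog-driven matches.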
Detecting LGT: phylogenetic congruence approach
Leigh et al. GBE 2011 (DOI: 10.1093/gbe/evr050)
Is the species tree a reasonable tree given the gene alignment?
Collect all reasonable trees given the gene alignment
Bootstrap replicates/Bayesian posterior distribution often used
Should be all trees
Is the species tree within this reasonable set?
AU test
Issues:
Limited LGT in mostly vertically inherited genes, or ancient (e.g. pre-speciation) LGT, will not be detected
Sampling issues
What is the species tree?
Can you ever be sure a marker has not been transferred?
Often circular reasoning
Network thinking
Do we need microbial species?
Perhaps not
Useful to clinicians
Useful for counting organisms in an environment or relating abundances to changes
Useful for discussing projects etc.
“I look at the term species as one arbitrarily given for the sake of convenience to a set of individuals closely resembling each other” (Darwin, 1859)
Species as clusters
Species concepts are generally based on the notion that organisms comprise distinct clusters in nature
This means that there is not a continuum of genotypes and/or phenotypes
However, clusters can form under random birth/death models
A species should presumably be a cluster that is formed by some process, not just random drift
Any gaps between clusters should not be due to sampling bias or error
Probably the biggest problem for proving clustering
Defining a species
The Biological Species Concept is the most often used definition of a species (or at least the most generally known)
States that a species is a group whose members can produce fertile offspring through mating
Works for (most) animals and plants
Excludes all asexual organisms
Cohan ecotype model
States that an asexual clonal species can form by mutations that allow it to outcompete others, and thus selective sweeps occur
LGT is allowed in the model to initiate a selective sweep but not to shape long-term cohesiveness
Recombination has been shown to contribute more to diversification than point mutations in some bacteria
Defining microbial species
In prokaryotes, species were originally defined by >70% relatedness in a standardized DNA–DNA hybridisation experiment
Makes some bacterial species as diverse as vertebrate orders
Now, a species is often defined as two organisms having at least 97% identical 16S sequences
Single gene's evolutionary history as basis
Multiple copies with large differences are possible
Can also use shared orthologous genes, via:
Concatenated trees
Average Nucleotide Identity (ANI) ≥95%
Genome-to-Genome Distance (GGD) ≥70%
Genomic-Signature-Delta-Difference (GS-DD) δ < δ*
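Genome-based cut-offs are straightforward to apply once the underlying numbers exist. The sketch computes a simplified ANI as the mean percent identity of aligned genome fragments that pass coverage and identity filters (the filter defaults are assumptions loosely modelled on BLAST-based ANI, not a reference implementation), then applies the ≥95% ANI and ≥70% GGD thresholds listed above. The fragment identities are hypothetical; in practice an aligner would produce them.

```python
from statistics import fmean

def average_nucleotide_identity(fragments, min_identity=30.0, min_aligned_fraction=0.7):
    """fragments: (percent identity, fraction of fragment aligned) per query fragment.
    Fragments failing either (assumed) filter are ignored."""
    kept = [ident for ident, frac in fragments
            if ident >= min_identity and frac >= min_aligned_fraction]
    if not kept:
        raise ValueError("no fragments passed the filters")
    return fmean(kept)

def same_species(ani=None, ggd=None, ani_cutoff=95.0, ggd_cutoff=70.0):
    """Apply whichever genome-based cut-offs are available; all must agree."""
    checks = []
    if ani is not None:
        checks.append(ani >= ani_cutoff)
    if ggd is not None:
        checks.append(ggd >= ggd_cutoff)
    if not checks:
        raise ValueError("need at least one measure")
    return all(checks)

fragments = [(98.7, 0.95), (97.9, 0.88), (99.1, 1.0), (41.0, 0.35), (96.5, 0.81)]
ani = average_nucleotide_identity(fragments)
print(round(ani, 2), same_species(ani=ani))
```

Because the cut-offs are parameters, the same function can test the stricter ~99% ANI that some authors argue better matches species boundaries in animals and plants.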
ANI
investigators think that an ANI of 99% would match more closely to phenotypic diversity among species of animals and plants (Konstantinidis and Tiedje 2005), and perhaps even this is not stringent enough (Fig. 2).
Whatever species definition we adopt, there remains the problem of coupling to some underlying species concept(s) that rationalizes its methods and cut-off values. As Gevers et al. (2006) lament, “any effort to produce a robust species definition is hindered by the lack of a solid theoretical basis explaining the effect of biological processes on cohesion within and divergence between species.” Possible cohesive forces are addressed in the next sections, but it is worth mentioning here that two recent formulations of prokaryotic species concepts appear to be (deliberately) so general that, like de Querioz’s general lineage concept, they finesse the concept–definition coupling. The first is Staley’s “genomic-phylogenetic species concept” (Staley 2006), and the second is a “metapopulation lineage” formulation endorsed by Achtman and Wagner (2008). These latter authors, acknowledging a debt to and quoting de Querioz, claim that “unlike other species concepts, metapopulation lineages do not have to be phenotypically distinguishable, or diagnosable, or monophyletic, or reproductively isolated, or ecologically divergent, to be species. They only have to be evolving separately from other lineages. Microbes that form distinct groups owing to a cohesive force are metapopulation lineages and thus form species, whereas microbes without limits imposed by a cohesive force do not.”
This way of thinking embodies the spirit of what one hopes to capture with a species concept. But we must again point out that by giving up all methods of detecting or quantifying “cohesive forces,” such bare bones species concepts cannot be used to answer any questions we might have about species in general, such as how many there are, what their populations sizes are, and whether they are cosmopolitan or endemic.
Clustered diversity and its meaning for species
Basic to any notion of species is that in nature they comprise discrete clusters of organisms, defined genomically and phenomically: that genome/phenome space is not uniformly filled by a seamless spectrum of intergrading types. As Konstantinidis et al. (2006) note, “an important issue that remains unresolved is whether bacteria exhibit a genetic continuum in nature. . .”
It is necessary to recall here that even the simplest random birth and death model of replicating lineages will produce
Figure 2. Comparison of average nucleotide identities (ANI) with gene content. 773 genomes available in NCBI’s RefSeq database were initially clustered using 16S rRNA identity of at least 97% as a guide to form groups. A dozen clusters were selected (list of genomes within each cluster is available in Supplemental Table 1). For genomes within each cluster, pairwise ANI was calculated essentially as described in Konstantinidis and Tiedje (2005). Shared genes for each pair of genomes were identified as reciprocal top-scoring BLASTP matches (E-value < 0.001, z = 20,000,000). The proportion of shared genes was calculated as a ratio of the number of shared genes over the average number of genes in two genomes. Each ORF in a genome was assigned to a functional category according to the Clusters of Orthologous Groups (COG) database (August 2005 release), and three selected categories are depicted in this figure: categories J, P, and Q in COG category one-letter designation. Note that genomes of the E. coli/Shigella group have similar ANI values, but dramatically varying gene content. Some groups form tight clusters (e.g., Legionella spp.), while others exhibit a continuum of ANI/shared genes values (e.g., Burkholderia spp.). The clustering also exhibits a large variability in the number of shared genes if genes are considered by functional category.
Doolittle and Zhaxybayeva
746 Genome Research www.genome.org
Species | Suggested action | New name
M. conceptionense, M. senegalense | fusion | M. senegalense
M. chimaera, M. intracellulare, M. yongonense | subspeciation | M. intracellulare subsp. intracellulare, subsp. chimaera, subsp. yongonense
M. engbaekii, M. hiberniae | fusion | M. hiberniae
M. austroafricanum, M. vanbaalenii | fusion | M. austroafricanum
M. marinum, M. pseudoshottsii | fusion | M. marinum
Fuses many species defined in other ways (11/134 Mycobacterium species)
Perhaps fuses too many
Doolittle and Zhaxybayeva. Genome Res. 2009; Tortoli, Fedrizzi, Meehan et al. Submitted
Microbiomes and species
We cannot ask the simple question ‘Who is there?’ without defining the who (species?)
Metagenome data has allowed us to somewhat overcome the sampling bias
Minor populations
Continuum?
Community microbial ecology raises many questions
Community or assemblage?
Boon, Meehan et al. FEMS Microbiol Rev 2014
Host and microbe effects on each other's evolution?
Can begin to ask what a unit of diversity is
A gene or cell or community?