Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory...

15
Global patterns in coronavirus diversity Simon J. Anthony, 1,2,3, * Christine K. Johnson, 4 Denise J. Greig, 4 Sarah Kramer, 1,5 Xiaoyu Che, 1 Heather Wells, 1 Allison L. Hicks 1 , Damien O. Joly, 6,7 Nathan D. Wolfe, 6 Peter Daszak, 3 William Karesh 3 , W. I. Lipkin, 1,2 Stephen S. Morse, 2 PREDICT Consortium, 8 Jonna A. K. Mazet, 4,† and Tracey Goldstein 4, * ,† 1 Center for Infection and Immunity, Mailman School of Public Health, Columbia University, 722 West 168 th Street, New York, NY 10032, USA, 2 Department of Epidemiology, Mailman School of Public Health, Columbia University, 722 West 168 th Street, New York, NY 10032, USA, 3 EcoHealth Alliance, 460 West 34 th Street, New York, NY 10001, USA, 4 One Health Institute & Karen C Drayer Wildlife Health Center, School of Veterinary Medicine, University of California Davis, Davis, CA 95616, USA, 5 Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, 722 West 168 th Street, New York, NY 10032, USA, 6 Metabiota, Inc. One Sutter, Suite 600, San Francisco, CA 94104, USA, 7 Wildlife Conservation Society, New York, NY 10460, USA and 8 http://www.vetmed.ucdavis.edu/ohi/predict/publications/Authorship.cfm *Corresponding author: E-mail: [email protected]; [email protected] Joint senior author. Abstract Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV) it has become increasingly clear that bats are important reservoirs of CoVs. Despite this, only 6% of all CoV sequences in GenBank are from bats. The remaining 94% largely consist of known pathogens of public health or agri- cultural significance, indicating that current research effort is heavily biased towards describing known diseases rather than the ‘pre-emergent’ diversity in bats. Our study addresses this critical gap, and focuses on resource poor countries where the risk of zoonotic emergence is believed to be highest. We surveyed the diversity of CoVs in multiple host taxa from twenty countries to explore the factors driving viral diversity at a global scale. We identified sequences representing 100 discrete phylogenetic clusters, ninety-one of which were found in bats, and used ecological and epidemiologic analyses to show that patterns of CoV diversity correlate with those of bat diversity. This cements bats as the major evolutionary res- ervoirs and ecological drivers of CoV diversity. Co-phylogenetic reconciliation analysis was also used to show that host switching has contributed to CoV evolution, and a preliminary analysis suggests that regional variation exists in the dy- namics of this process. Overall our study represents a model for exploring global viral diversity and advances our funda- mental understanding of CoV biodiversity and the potential risk factors associated with zoonotic emergence. Key words: coronavirus; viral ecology; bat; evolution. V C The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/ licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] 1 Virus Evolution, 2017, 3(1): vex012 doi: 10.1093/ve/vex012 Research article

Transcript of Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory...

Page 1: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

Global patterns in coronavirus diversitySimon J Anthony123 Christine K Johnson4 Denise J Greig4

Sarah Kramer15 Xiaoyu Che1 Heather Wells1 Allison L Hicks1Damien O Joly67 Nathan D Wolfe6 Peter Daszak3 William Karesh3W I Lipkin12 Stephen S Morse2 PREDICT Consortium8 Jonna A K Mazet4dagger

and Tracey Goldstein4dagger

1Center for Infection and Immunity Mailman School of Public Health Columbia University 722 West 168th

Street New York NY 10032 USA 2Department of Epidemiology Mailman School of Public Health ColumbiaUniversity 722 West 168th Street New York NY 10032 USA 3EcoHealth Alliance 460 West 34th Street NewYork NY 10001 USA 4One Health Institute amp Karen C Drayer Wildlife Health Center School of VeterinaryMedicine University of California Davis Davis CA 95616 USA 5Department of Environmental HealthSciences Mailman School of Public Health Columbia University 722 West 168th Street New York NY 10032USA 6Metabiota Inc One Sutter Suite 600 San Francisco CA 94104 USA 7Wildlife Conservation Society NewYork NY 10460 USA and 8httpwwwvetmeducdaviseduohipredictpublicationsAuthorshipcfm

Corresponding author E-mail sja2127cumccolumbiaedu tgoldsteinucdavisedudaggerJoint senior author

Abstract

Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory SyndromCoronavirus (MERS-CoV) it has become increasingly clear that bats are important reservoirs of CoVs Despite this only 6 ofall CoV sequences in GenBank are from bats The remaining 94 largely consist of known pathogens of public health or agri-cultural significance indicating that current research effort is heavily biased towards describing known diseases ratherthan the lsquopre-emergentrsquo diversity in bats Our study addresses this critical gap and focuses on resource poor countrieswhere the risk of zoonotic emergence is believed to be highest We surveyed the diversity of CoVs in multiple host taxafrom twenty countries to explore the factors driving viral diversity at a global scale We identified sequences representing100 discrete phylogenetic clusters ninety-one of which were found in bats and used ecological and epidemiologic analysesto show that patterns of CoV diversity correlate with those of bat diversity This cements bats as the major evolutionary res-ervoirs and ecological drivers of CoV diversity Co-phylogenetic reconciliation analysis was also used to show that hostswitching has contributed to CoV evolution and a preliminary analysis suggests that regional variation exists in the dy-namics of this process Overall our study represents a model for exploring global viral diversity and advances our funda-mental understanding of CoV biodiversity and the potential risk factors associated with zoonotic emergence

Key words coronavirus viral ecology bat evolution

VC The Author 2017 Published by Oxford University PressThis is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (httpcreativecommonsorglicensesby-nc40) which permits non-commercial re-use distribution and reproduction in any medium provided the original work is properly citedFor commercial re-use please contact journalspermissionsoupcom

1

Virus Evolution 2017 3(1) vex012

doi 101093vevex012Research article

1 Introduction

In 20023 SARS-Coronavirus (SARS-CoV) emerged in theGuangdong province of southern China (Drosten et al 2003Ksiazek et al 2003) It quickly spread to twenty-seven countriesinfecting 8098 people with 774 deaths and was declared thefirst global pandemic of the 21st century Bats were identified asthe reservoir (Lau et al 2005 Li et al 2005) and probable source(Ge et al 2013) of the outbreak In 2012 the SARS pandemic wasfollowed by MERS-Coronavirus (MERS-CoV) which emerged inthe Middle East (Zaki et al 2012) with 1782 confirmed humancases and 640 deaths (as of September 2016) Camels were iden-tified as the likely source of human infections (Reusken et al2013 Azhar et al 2014) however bats were again found to hostclosely related (MERS-like) viruses and are therefore assumed tobe the original evolutionary source (Woo et al 2006 Anthonyet al 2012 Corman et al 2014ab Wacharapluesadee et al 2015Anthony et al 2017)

Together these outbreaks have cemented the coronaviridaeas a family of zoonotic concern and stimulated a surge in viraldiscovery efforts in bats [reviewed by Drexler et al (2014)]These efforts appear to show that almost all human CoVs havezoonotic origins or otherwise circulate in animals including hu-man 229E [bats (Pfefferle et al 2009) camels (Sabir et al 2016)]NL63 [bats (Donaldson et al 2010 Huynh et al 2012)] and OC43[cattle (Vijgen et al 2005)] Even non-human CoVs such as por-cine epidemic diarrhea virus (PEDV) may have emerged by hostswitching from other animals [bats (Tang et al 2006 Huanget al 2013)] Overall it seems that CoV diversity in bats is sub-stantial (Drexler et al 2014) that these viruses are prone to hostswitching (Woo et al 2009) and that they are a current historicand future threat to public health

Despite recent efforts there are several notable gaps in ourunderstanding of CoV diversity Foremost our knowledge ofCoVs in resource-limited countries is poor (Drexler et al 2014)This is particularly problematic given that many of these sameareas are predicted to be hotspots of disease emergence (Joneset al 2008) Second there has been little effort to understand theevolutionary and ecological drivers of CoV diversity on a globalscale or to evaluate host and regional variation in the factorsthat contribute to the risk of emergence (eg host switching)Indeed most studies to date have been somewhat limited in theirgeographic scope and have included only small sample sizeswith little epidemiologic or ecological context In short there is acritical need for a more global perspective on CoV diversity

In 2009 the PREDICT project was established in part to addressthis need Focused on strengthening capacity and identifying vi-ruses in wildlife at high-risk interfaces (PREDICT_Consortium2014) this USAID Emerging Pandemic Threats (EPT) initiativeworked with local partners in twenty countries across LatinAmerica Africa and Asia over 5 years to better understand the cur-rent diversity of CoVs (as well as other viruses) and evaluate thefactors that drive this diversity at different scales Herein we reporta large diversity of CoV sequences (mostly from bats) show thatthe biogeography of bats has shaped the diversity of CoVs globallyand provide evidence to suggest there could be regional variationin host switching and the risk for zoonotic emergence

2 Methods21 Animals and samples

Bats (nfrac14 12333) rodents (nfrac14 3387) and non-human primates(nfrac14 3470) were humanely sampled (capture and release) from

twenty lsquohotspotrsquo countries (Jones et al 2008) representing cen-tral Africa (Cameroon Gabon Democratic Republic of CongoRepublic of Congo Rwanda Tanzania Uganda) Latin America(Peru Bolivia Brazil Mexico) and Asia (Bangladesh CambodiaChina Indonesia Laos Malaysia Nepal Thailand Viet Nam)Samples were also collected from humans (nfrac14 1124) in sevencountries in central Africa and Asia as a pilot to begin to explorethe propensity for viral sharing with wildlife In all twenty coun-tries wildlife samples were collected from lsquohigh riskrsquo interfaceswhere direct or indirect contact with humans might promotezoonotic viral transmission The selected sampling sites in-cluded areas of land-use change (deforestation conversion toagriculture) sites in and around human dwellings foci of eco-tourism markets restaurants and farms along the animal valuechain and areas where occupational exposure was likely (ani-mal sanctuaries agricultural activities) When possible individ-uals were identified to lowest taxonomic order (genus andspecies) and assigned to an age class (adult subadult neonate)by the field teams All samples from swabs (eg oral urine rec-tal) fluids (eg saliva) and tissues were collected into (1)NucliSensVR Lysis Buffer (bioMerieux Inc Marcy-IrsquoEtoile France)and (2) viral transport media and then frozen in the field in liquidnitrogen and transferred to the laboratory for storage at 80 CAll animal sampling activities were conducted with permissionsfrom local authorities and under the Institutional Animal Careand Use Committee at the University of California Davis (proto-col number 16048) All human activities were reviewed and ap-proved by the UC Davis Institutional Review Board (IRB) underprotocols 215253 and 432330

22 Viral discovery

RNA was extracted from all samples and cDNA prepared usingsuperscript III (Invitrogen) Two broadly reactive consensus PCRassays targeting non-overlapping fragments of the orf1ab wereused to detect both known and novel CoVs (Quan et al 2010Watanabe et al 2010) The first [the lsquoWatanabersquo assay(Watanabe et al 2010)] amplified a 434 bp fragment of theRNA-dependent RNA polymerase (RdRp) corresponding to nu-cleotides (NTs) 15156ndash15589 in the human CoV OC43 genome(NC_005147) while the second [the lsquoQuanrsquo assay (Quan et al2010)] amplified a 332 bp fragment of a different peptide down-stream of the RdRp corresponding to NTs 18323ndash18654Amplified products of the expected size were cloned and se-quenced (traditional Sanger dideoxy sequencing) according tostandard protocols and sequences edited manually in GeneiousPro (ver 913 Biomatters Auckland NZ) We note that this ap-proach was adopted to facilitate viral discovery in resource-poor settings which are the target of pandemic preparednessinitiatives such as USAID PREDICT and where techniques suchas high throughput sequencing are largely unavailable

23 Phylogenetics

Coronavirus sequences from the Quan and Watanabe datasetswere aligned with reference sequences collected from GenBankusing MUSCLE (Edgar 2004) followed by manual alignment inSe-Al v20a11 (httptreebioedacuk) The best-fitting model ofnucleotide substitution for each dataset was selected byAkaikersquos Information Criterion (AIC) using jModeltest v215(Darriba et al 2012) For both alignments the general time re-versible model with a gamma distribution of rate heterogeneitywas the best-fitting model Maximum likelihood trees were con-structed from each alignment using MEGA v606 (Tamura et al

2 | Virus Evolution 2017 Vol 3 No 1

2013) Histograms of all pairwise sequence identities () wereplotted as described previously for hantaviruses (Maes et al2009) and used to define cut-offs between viral sequence clus-ters for analysis (akin to operating taxonomic units)

24 Diversity measures

The Earthrsquos surface was divided into grid cells each 10 latitudeby 10 longitude Thirty-four of these grid cells included areasfrom which bats were sampled (bats were identified down tothe species level in thirty cells) and twenty-seven cells hadsamples testing positive for CoVs If the coordinates listed for aparticular bat were on the line between grid cells it was arbi-trarily included in the grid cell with higher latitudelongitudeViral and host species richness for each cell was calculated bydetermining the total number of unique virus and bat specieswithin each cell Alpha diversity was calculated using theShannon index (Jost et al 2011) which was chosen over theGinindashSimpson index because it does not place disproportionateweight on dominant species Since we do not know how repre-sentative our sampling was the Shannon index was deemedthe best approach The effective number of viruses and bats ineach cell was then correlated using Kendallrsquos tau since the val-ues were not normally distributed Sensitivity analyses wereperformed by shifting the grid cells in 1 increments and re-calculating richness and effective species numbers

Matrices of beta diversity between each combination of gridcells were developed using the Jaccard index (Jost et al 2011)and Mantel tests were used to determine whether the distancebetween grids was associated with the dissimilarity in virus andbat species Geographic distance matrices were calculated byfinding the centroid of each grid cell and calculating the geode-sic distance between each pair using the R packagelsquogeospherersquo(Geosphere 2015) and tests were performed usingSpearmanrsquos rank correlation All analyses of diversity were con-ducted in R version 232

25 Network model

A presenceabsence matrix was constructed to show the distri-bution of viral sequence clusters across all species Using thePython package NetworkX (Hagberg et al 2008) a networkmodel was constructed connecting bat species to all viral clus-ters that were identified within that species in our data Thenetwork is made up of thirty-one connected componentseighty-two viral clusters eighty-five bat species and 159 edgesNetworks were plotted using Gephi using the force-directed al-gorithm ForceAtlas2 (Jacomy et al 2014) Specifically we plottedtwo bipartite graphs one where hosts are colored by region andone in which they are colored by family

26 Cophylogeny

The Jane (Conow et al 2010) software tool was used for cophylo-genetic reconstructions This approach applies an a priori lsquocostschemersquo to different evolutionary events in order to test the de-gree of congruence between two trees (virus and host) Theselsquoeventsrsquo include cospeciation duplication host switching andfailure to diverge (or sorting) (Charleston 1998 Conow et al2010) which are used to create a minimal-cost reconstruction ofthe evolutionary history between the viruses and hostsAnother cophylogeny program CoRe-PA (Merkle et al 2010)uses a parameter-adaptive approach to estimate an appropriatecost scheme without any prior value assignment For our analy-sis we used Jane with the default pre-determined cost scheme

where cospeciation is the null hypothesis and therefore thelsquocheapestrsquo event with the assumption that host switchingwould theoretically be more costly than genetic drift within aspecies to which a virus is already adapted Analysis was re-peated with varied cost values in Jane to test for sensitivity toour cost scheme as well as in CoRe-PA for robustness andevent types and costs from each reconstruction were comparedbetween the two programs As these changes ultimately did notsignificantly influence the outcome of the analysis exact eventassignments used in the final analysis were inferred solely fromJane with the default cost scheme

The analyses were performed using all unique associationsbetween viruses and their respective hosts (limited to bats)Alpha-CoV sequences and beta-CoV sequences were analyzedseparately since each appears to have diversified indepen-dently within bats This gave us a total of four reconstructions(alpha-Quan alpha-Watanabe beta-Quan beta-Watanabe)Only one instance of each virusndashhost association was includedeven if multiple instances of the same association were ob-served (ie our analysis does not account for the frequency ofdetection) Associations were excluded if the host was not iden-tified to the species level or if the cytochrome B sequence didnot exist in GenBank or could not be amplified by PCR The to-pology of the virus tree was inferred from the topology of thefull tree for consistency since subsets of sequences often pro-duce different arrangements

For each reconstruction the most recent evolutionary eventleading to a virusndashhost association was recorded Given thatsorting is particularly sensitive to missing data (it indicates aloss on the tree when in fact it may simply be an artifact ofunder-sampling) we have excluded this event from our analy-sis We also distinguished between two potential types of hostswitching based on an assumption that there could be differ-ences in zoonotic risk between viruses that only move betweenclosely related bats (eg those within the same genus) and thosethat are able to make more substantive jumps into distantly re-lated species (those in other host genera or families) We there-fore use lsquohost switchingrsquo to refer specifically to viral lineagesthat have moved into a host belonging to another genus or fam-ily and lsquosharingrsquo to indicate a viral lineage that has moved intoa different host species of the same genus (without speciating)We selected host genus as the cut-off to be consistent with cur-rent conventions of taxonomic hierarchy and verified that thenumber of species per genus is relatively consistent between re-gions We further verified that the mean genetic distance be-tween host species is consistent between regions (ie that thetaxonomic distance a virus has to navigate when moving be-tween any two species is roughly similar in all three regions)

A binary regression model with logistic link function wasused to investigate whether the degree of host switching variedby region The dependent variable was host switching and ei-ther cospeciation or sharing was the reference group Regionwas the independent variable We used the generalized estimat-ing equations (GEE) (Liang and Zeger 1986) method in order toaccount for multiple event types within one family and derive arobust estimate of the standard errors Event data for all four re-constructions were aggregated for the analysis Where hostndashvi-rus associations were duplicated one was removed If theduplicated event types did not agree [which can occur given thesensitivity of these methods to variation in topology and num-ber of taxa in each tree (Conow et al 2010)] we randomly se-lected one of the two possibilities and repeated therandomization 100 times The GEE estimates from the 100 ran-dom iterations were then combined using rules established by

S J Anthony et al | 3

Rubin (1987) The analysis was repeated to explore variationbased on bat family (independent variable) Some families onlypossessed one type of event which would introduce quasi-complete separation into the GEE model so they were excludedfrom the corresponding models However the numbers of batspecies within these excluded families were very small (lt5)compared to the overall number of bat species for the wholestudy so we felt this would not impact any overall trends

27 Viral discovery effort

The relationship between sampling effort and viral discoverywas evaluated using a log-link Poisson regression model fittedwith the count of viruses for each species as the dependent vari-able and the number of animals sampled as the independentvariable This model is universally applied with count data Thecoefficient of the sample number variable is the log of the ratioof the expected number of viruses with (nthorn 1) samples collectedto the expected number of viruses with n samples collectedwhere n can be any arbitrary positive integer Using this esti-mated coefficient we calculated the estimated numbers of vi-ruses (viral sequence clusters) that can be found with respect tothe sampling effort (ie all possible numbers of collected ani-mals within a species) and they were then used to plot the esti-mated curve together with its 95 confidence band We clarifythat we use the term lsquoexpectedrsquo to indicate the statistical expec-tation based on the estimate of the regression model We ac-knowledge that the model fit is not as good as the model withan additional quadratic term of the independent variable none-theless it avoids the problem of over-fitting and realistically re-flects the relationship between sampling effort and viraldiscovery when the number of animals collected is relativelylower

28 Factors associated with CoV positivity

Due to the small numbers of positive individuals obtained forrodents non-human primates and humans only bat data wereanalyzed for factors associated with positivity A total of 12333bats were tested This included 5624 males and 5767 females(922 were not sexed) Among aged bats 7385 were adults 1029were sub-adults and 8 were neonates (3911 were not aged)Binary logistic regression models were used to evaluate signifi-cant variables restricting analyses to species for which 50 in-dividuals were tested Model selection was first performed in R(version 325) (R Foundation for Statistical Computing 2008) us-ing the R package lsquogmultirsquo and the step function and the bestmodel determined by AIC A robust estimation of standard errorwas calculated by clustering by individual Animal Identificationnumber to account for non-independence of multiple tests fromthe same animal using STATA 131 SE (College Station TXUSA) The independent variables included in analyses wereSpecimen Type (blood fecesrectal swabs guano oralrectalsample oralnasal swabs tissue urineurogenital swabs) HostTaxon (family sub-family genus) Host Age (adult sub-adultneonate) Season (wet dry) and Interface (animal use humanactivity land use pristine area) Season was designated as lsquowetrsquoor lsquodryrsquo according to month and proximity to the equator foreach country Interfaces were broadly grouped according to thetype and intensity of animal contact with people specifically (1)Land Use (animals sampled in areas with crops extractive in-dustries livestock activities) (2) Animal Use (hunting marketsrestaurants trade wild animal farms wildlife managementszoossanctuaries handling by veterinarianresearchers) (3)

Human Activities (ecotourism in and around human dwell-ings) and (4) Pristine (where animal human contact was notlikely) The model fit for each region was evaluated using theHosmerndashLemeshow goodness-of-fit test and pseudo R2 values

29 Virus lsquoHotspotrsquo Maps

The number of viral sequence clusters (richness) in each sub-clade was summarized by host family (Supplementary TableS1) and chi square tests used to evaluate whether viral sub-clade and host family were independent of each other Bat spe-cies richness maps were then created to infer hotspots of CoVdiversity For each viral clade spatial distribution data on all batspecies belonging to associated families was obtained from theInternational Union for Conservation of Nature (IUCN) UsingArcGIS version 10 (ESRI 2011) bat species were quantified bycounting overlapping polygons The resulting count data werethen converted to a raster and plotted All sampled bats as wellas all identified sequence clusters within the clade in questionwere then plotted over the raster data

3 Results31 Global diversity of CoVs

A total of 19192 animals and humans were assayed for thepresence of CoV by consensus PCR (cPCR) (Table 1) The majoritywere bats (nfrac14 12333) representing 282 species from twelvefamilies Overall the proportion of CoV positive individuals was86 in bats (nfrac14 106512333) and 02 in non-bats (nfrac14 176859) In other words over 98 of all positive individuals werebats

Partial sequences were obtained from two non-overlappingfragments of the orf1ab gene yielding 654 sequences for thelsquoQuanrsquo region and 950 for the lsquoWatanabersquo region (see lsquoMethodsrsquo)There was a 27 overlap where sequences were obtained for bothregions from the same sample The distribution of pairwise se-quence identities defined a 90 cut-off between taxa (Fig 1) andthe resulting monophyletic groups (in which all sequenceshad90 identity) used as our operating taxonomic units Groupsthat shared less than 90 identity to a known sequence were la-beled sequentially as PREDICT_CoV-1 -2 -3 etc while groups shar-ing90 identity to a sequence already in GenBank wereconsidered to be strains of a known virus and assigned the samename as the matching sequence (eg Kenya_bat_CoV_KY33 orHKU-9) Sequences that sharedgt90 identity but were found indifferent hosts were considered part of the same taxonomic unitBased on these criteria 100 discrete viral taxa were identifiedninety-one of which were found in bats (Table 1 Fig 2)Importantly we make no claims that these groups correspond tolsquospeciesrsquo as full orf1ab replicase sequences would be required forthat (King et al 2012) Instead we clarify that we are using thesepartial fragments to cluster sequences into lsquooperational taxonomicunitsrsquo (analogous to OTU clustering in microbiome research) Wealso stress that our cut-off was determined based on two discreteregions within the orf1ab separated by gt3000 nucleotides andthat these regions correspond to unique peptides post-transla-tional cleavage

Of note was the detection of subgroup 2a CoV sequences inbats humans and non-human primates from four differentcountries This sub-clade is largely considered the rodent sub-clade (Lau et al 2015 Wang et al 2015 Tsoleridis et al 2016)but here we detected sequences corresponding to the known vi-rus betacoronavirus-1 in Pteropus medius from Bangladesh

4 | Virus Evolution 2017 Vol 3 No 1

Pteropus alecto from Indonesia and from humans in China andsequences corresponding to murine coronavirus in non-humanprimates from Nepal (these sequences were detected by fourdifferent laboratories and no rodent samples were processed atthe same time) (Fig 2) In addition we report the discovery ofseveral sub-clade 2b and 2c CoV sequence clusters (the SARSand MERS sub-clades respectively) (Fig 2) and the detection ofhuman coronavirus 229E-like sequences in hipposideros andrhinolophus bats sampled in ROC Uganda Cameroon andGabon supporting previous suggestions that 229E has zoonoticorigins (Pfefferle et al 2009 Hu et al 2015) Finally we highlightthe detection of avian-associated infectious bronchitis virus-like sequences (IBV) in bats and a closely related virus(PREDICT_CoV-49) in non-human primates from Bangladesh aswell as porcine epidemic diarrhea virus (PEDV) in bats (Fig 2)Again we confirm that no livestock samples were processed inany of these laboratories that might have acted as a source ofcontamination

32 Factors driving viral diversity

Viral a-diversity (the lsquoeffective numberrsquo of species) was signifi-cantly correlated with bat a-diversity (taufrac14 0325 Pfrac14 0022) (Fig3) suggesting that more viruses will be found in regions wherebat diversity is higher This association was maintained whenviral richness rather than effective number of species wasused as the diversity index (taufrac14 0421 Plt 0001) Viral and batb-diversity were also significantly correlated (Mantel testrhofrac14 0575 Plt 0001) In both cases diversity differentiated al-most entirely into three discrete communities by region (Fig 3)Only Africa and Asia shared any viral sequence clusters(PREDICT_CoV-35 and HKU9 Fig 4) demonstrating that bats

have had a strong biogeographic influence on the global ecologyand evolution of CoVs

Co-phylogenetic reconciliation analysis was used to investi-gate the evolutionary mechanisms driving the virusndashhost asso-ciations observed for example host switching viral sharing orco-speciation (see lsquoMethodsrsquo) Overall host switching was thedominant mechanism followed by co-speciation (Fig 5) Whenanalyzed by region host switching (inter-genus transmission)remained dominant in Africa and Asia but not in LatinAmerica Despite fewer host-switching events in Latin Americathere was a concomitant increase in virus sharing (intra-genustransmission) This suggests that viruses are still switchinghosts in Latin America but preferentially move between closelyrelated species while in Africa and Asia viruses cross betweenmore distantly related species To assess the sensitivity of ourresults to our chosen method (ie use of a pre-defined costscheme in the program Jane) we repeated the analysis for theQuan alpha-CoV and beta-CoV reconstructions using CoRePA aparameter-adaptive approach that estimates an appropriatecost scheme without any prior value assignment Comparingthe results we noted 91 agreement between Jane and CoRePAin alpha-CoV reconstruction (thirty-four events of which thirty-one agreed) and 100 agreement in the beta-CoV reconstruc-tions (twenty-six events all of which agreed) We further notethat the cost scheme calculated in CoRePA was 07 for hostswitching and 01 for co-speciationmdashsupporting the default costscheme used in Jane (ie that host switching should be consid-ered more lsquocostlyrsquo than co-speciation)

A bipartite network model connecting viral sequence clus-ters to their hosts supports these results illustrating that batCoVs were connected to several host families in Africa and Asiawhile being largely restricted to a single bat family in Latin

Table 1 Summary of individuals tested and number positive for at least one coronavirus by host taxa

Host taxa tested No individuals tested No individuals positive No distinct viruses detected

Bats 12333 1065 91Non-Human Primates 3470 4 2Rodents and Shrews 3387 11 7Humans 1124 2 2Total 19192 1082 100a

aNote Numbers do not total as two viruses were detected in two taxa

Figure 1 Histogram of the relative frequency of pairwise sequence identities used to define the cutoff between operating taxonomic units for all CoV sequences de-

tected A bimodal distribution was observed and a cutoff of 90 sequence identity used to separate sequences from both the Watanabe (Panel A) and Quan (Panel B) as-

says into discrete viral taxa

S J Anthony et al | 5

Table 2 Variables associated with coronavirus positive bat tests in Africa Asia and Latin America Regional datasets for each logistic regres-sion model consisted of bat species where greater than fifty bats were tested Age datasets were a subset of the regional datasets as age classwas not determined for all individuals Numbers in bold are statically significant for Plt 005

Odds ratio P 95 Confidence intervals

AfricaSample type Lower Upper

Fecesrectal swab 35098 lt0001 14664 84007Oralnasal swab 505 lt0001 203 1254Oralrectal swab 3098 lt0001 1346 7130Blood 165 0540 033 809Tissue 100

Host familyHipposideridae 122 0559 062 241Pteropodidae 384 lt0001 227 649Molossidae 100

SeasonDry 242 lt0001 179 325Wet 100

InterfaceAnimal use 198 0001 135 291Pristine area 139 0624 037 529Land use 166 0219 074 374Human activity 100

AgeSubadult 591 lt0001 428 817Adult 100

AsiaSample type

Fecesrectal swab 6280 lt0001 1544 25549Oralnasal swab 1325 lt0001 311 5638Urineurogenital swab 4692 lt0001 1085 20280Guano 2212 lt0001 366 13356Tissue Oralrectal swab 100

Host familyEmballonuridae 031 026 004 237Miniopteridae 978 lt0001 569 1681Pteropodidae 216 lt0001 129 362Rhinolophidae 108 082 057 203Vespertilionidae 367 lt0001 218 618Hipposideridae 100

SeasonDry 149 003 103 216Wet 100

InterfaceAnimal use 330 001 129 840Human activity 348 001 136 895Pristine area 181 035 052 635Land use 100

AgeSubadult 184 000 126 267Adult 100

Latin AmericaSample type

Fecesrectal swab 1566 0000 502 4879Oralrectal swab 2786 0000 686 11322Oralnasal swab 100

Host subfamilyCarolliinae 695 006 090 5376Stenodermatinae 504 012 067 3789Glossophaginae 100

InterfaceAnimal use 525 001 148 1865Human activity 273 012 077 969Land use 422 010 078 2295Pristine area 100

SeasonDry 130 052 058 289Wet 100

AgeSubadult 173 015 083 362Adult 100

6 | Virus Evolution 2017 Vol 3 No 1

America (Fig 4 Panel B) Further host switching was nearlyfour times more likely than virus sharing in Africa when com-pared with Latin America (OR 3858 Pfrac14 0040) CoVs in Asiawere also more likely to host switch than share when comparedwith Latin America however the results were not significant(OR 2474 Pfrac14 0143) No particular bat family was associatedwith increased host switching but this may reflect small sam-ple sizes when the data are summarized by family

33 Estimated number of CoVs in bats

A large number of host (bat) species in our study were negativefor CoV (nfrac14 197) however none of these species were sampledextensively (Fig 6) We found that all species with samplesizesgt110 individuals were positive for one or more CoVs sug-gesting that we would have detected CoVs in some of the nega-tive species in our study if sampling effort were increased Dueto the high number of these negative species and their effect onthe average number of virus sequence clusters per species theywere excluded from our estimates By retaining only those spe-cies for whichgt110 individuals were sampled (there weretwenty-seven species that qualified) we estimated the averagenumber of CoVs per species to be 267 (stdfrac14 138) thus account-ing for the likely scenario that some species will have more vi-ruses and others less We then extrapolated the average to all1200 bat species to estimate a total potential richness of 3204

CoVs (rangefrac14 1200ndash6000 CoVs) most of which have yet to bedescribed

34 Factors predicting CoV positivity

In order to refine future surveillance efforts to find the undis-covered diversity of CoVs in bats we evaluated the factors pre-dicting CoV positivity For each region the best model includedspecimen type bat family (or in the case of Latin America sub-family) season and animalndashhuman interface (Table 2) Seasonwas not included in the top model for Latin America but themodel with the season included was within two AIC (deltaAICfrac14 106) and therefore considered to have as much support asthe top model (Burnham and Anderson 2002) Specimen typewas highly associated with CoV positivity and samples contain-ing feces or fecal swabs were significantly more likely to testpositive than other specimen types in all regions Bat familywas important in Asia and Africa Season was also significantwith samples collected during the dry season more likely to testpositive in Africa and Asia than those collected during the wetseason (albeit with low odds ratios) Age class appears to be im-portant as sub-adults were more likely to test positive thanadults in Africa and Asia Sex was not related to CoV positivityin Asia or Latin America while males were slightly more likelyto be CoV positive in Africa (ORfrac14 15 12ndash19 CI Plt 0001) Broadcategories of animalndashhuman interfaces were significantly

Figure 2 Maximum likelihood phylogenetic reconstructions for all partial CoV RdRp fragments for both the Quan (Panel A) and Watanabe (Panel B) assays Sequences

are collapsed into clades representing our operating taxonomic units (sequences sharing90 identity) and number of sequences for each taxon is indicated in paren-

theses Representative published sequences from GenBank have been included for comparison (GenBank accession number indicated) Both trees were rooted using

the related Breda virus (NC_007447) and all nodes have 60 bootstrap support Pie charts indicate the distribution of viral taxa by host (bat) family in each virus sub-

clade

S J Anthony et al | 7

associated with CoV positivity among bats in all three regionsin Africa and Latin America bats sampled at the animal use in-terface category were more likely to be positive and in Asiabats sampled at the animal use and human activities interfaceswere more likely to be positive than those sampled at other in-terfaces The significance of the animal interface in all three

regions suggests that practices around animal use may be im-portant for disease transmission

35 Future sampling effort

Based on our study we were able to estimate the sampling ef-fort that would be required to find all CoVs in bats based on a

Figure 3 Comparison of viral and bat diversity The earthrsquos surface was divided into grid cells by latitude and longitude (10 10 degree units) for diversity calculations

(Panel A) Grid cells where bats were sampled are numbered in each region Alpha diversity (Shannon H) for virus (Panel B) and host (Panel C) were correlated indicat-

ing that areas of high bat diversity also have high viral diversity Darker cells indicate higher alpha diversity (ie more viral or host taxa) in each grid cell Beta diversity

was also correlated between virus (Panel D) and host (Panel E) and differentiated into three discrete communities by regionmdashLatin America (grid cells 1ndash10) Africa

(grid cells 11ndash20) and Asia (grid cells 21ndash34) Shading indicates that either viruses (in red) or hosts (in blue) are shared between two corresponding grid cells with darker

cells indicating higher pairwise similarity

8 | Virus Evolution 2017 Vol 3 No 1

Poisson regression model of our data (Fig 6) With 154 individ-uals we estimate that an average of one CoV will be detected(95 CI 136ndash177) With 397 individuals we estimate that upto five CoVs will be detected (95 CI 351ndash458) As we did not

observe any species with greater than five viral sequenceclusters we make the assumption that sampling 397 individ-uals should capture the full diversity of CoVs in each batspecies

Figure 4 Network model showing the connection of CoVs and their hosts Viral sequence clusters (colored grey) are connected to host species either by region (Panel

A) or family (Panel B) Viral and host and communities separate almost entirely by region only Africa and Asia are connected by two shared viruses (HKU9 and

PREDICT_CoV-35) found in species from both continents Networks also show that viruses appear to be shared by multiple host families in Africa and Asia while being

more restricted to a single family in Latin America

S J Anthony et al | 9

Figure 5 Relative proportion of evolutionary events leading to observed virushost associations for CoVs in bats Cophylogenetic reconstructions were used to identify

each event (Supplementary Fig S1 Panels AndashD) and significance evaluated by region Across all regions host switching was the dominant evolutionary event (Panel

A) When separated by region host switching remained dominant in Africa (Panel B) and Asia (Panel C) but not in Latin America (Panel D)

Figure 6 Relationship between sampling effort and viral detection among all bat species sampled A Poisson regression model was used to estimate the expected num-

ber of viruses based on the number of animals sampled by species (each circle indicates a separate species) The 95 confidence intervals are indicated

10 | Virus Evolution 2017 Vol 3 No 1

36 Predicting the distribution of unknown CoVs

Viral sub-clade and host family were not independent of eachother (v2 by cladefrac14 Plt 0001 Supplementary Table S1) demon-strating that different clades have significant associations withspecific bat families For example subgroup 2d CoVs were sig-nificantly associated with pteropid bats and were only identi-fied in regions where pteropid bats exist Likewise subgroup 2bCoVs were associated with rhinolophid and hipposiderid batsand 2c with vespertilionid bats To lsquopredictrsquo the potential distri-bution of unknown CoVs we plotted the known distribution ofbats belonging to these families and assume that lsquohotspotsrsquo ofbat diversity infer hotspots of viral diversity for each sub-clade(Fig 7) We note that the alphaCoV genus has been consideredas one group since they do not resolve well into sub-clades(de Groot et al 2009) and that no 2a map was generatedgiven that only one bat virus from this clade was identified

4 Discussion

The emergence of SARS and MERS has driven a need to under-stand more about the diversity ecology and evolution of coro-naviruses particularly at so-called lsquohotspotsrsquo of zoonoticemergence (Jones et al 2008 Drexler et al 2014) To this end wesurveyed the diversity of CoVs from twenty countries in LatinAmerica Africa and Asia to identify global factors driving viraldiversity and to look for regional differences in factors that con-tribute to the risk of emergence such as host switching In totalwe identified sequences from 100 discrete phylogenetic clus-ters ninety-one of which were found in bats

Our data suggest that the diversity of bat CoVs has beendriven primarily by host ecology First viral richness wasstrongly correlated with bat richness suggesting that mostCoVs will be found in regions where bat diversity is highestSecond we showed that CoV diversity separates into three

Figure 7 Viral diversity lsquohotspotrsquo maps Panel A shows the potential hotspots for CoVs based on the distribution of bats worldwide Location of alpha CoV sequences

from this study are shown in black and beta CoV sequences in blue indicating there is no geographical bias based on viral genus (ie alpha and beta CoVs are equally

likely to be found in all regions) Some viral sub-clades were associated with particular bat families and the spatial distribution data of all species belonging to these

families were plotted to indicate the potential hotspots of viral diversity (richness) for these sub-clades Panel B indicates the potential distribution of 2b CoVs based on

the distribution of rhinolphus and hipposideros bats Locations of 2b-positive animals identified in this study are indicated in black and correlate with areas of high

species richness (for these families) CoV-positive animals for other sub-clades shown in light blue Panel C indicates the potential distribution of 2c CoVs based on the

distribution of vespertilionid bats Locations of 2c-positive animals identified in this study are indicated in black This map suggests there are hotspots of 2c diversity

in regions not covered in this study (eg Europe) Panel D indicates the potential distribution of 2d viruses based on the distribution of pteropid bats The map suggests

these viruses may have a more limited distribution compared with viruses of other sub-clades Locations of 2d-positive animals identified in this study are shown in

black

S J Anthony et al | 11

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 2: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

1 Introduction

In 20023 SARS-Coronavirus (SARS-CoV) emerged in theGuangdong province of southern China (Drosten et al 2003Ksiazek et al 2003) It quickly spread to twenty-seven countriesinfecting 8098 people with 774 deaths and was declared thefirst global pandemic of the 21st century Bats were identified asthe reservoir (Lau et al 2005 Li et al 2005) and probable source(Ge et al 2013) of the outbreak In 2012 the SARS pandemic wasfollowed by MERS-Coronavirus (MERS-CoV) which emerged inthe Middle East (Zaki et al 2012) with 1782 confirmed humancases and 640 deaths (as of September 2016) Camels were iden-tified as the likely source of human infections (Reusken et al2013 Azhar et al 2014) however bats were again found to hostclosely related (MERS-like) viruses and are therefore assumed tobe the original evolutionary source (Woo et al 2006 Anthonyet al 2012 Corman et al 2014ab Wacharapluesadee et al 2015Anthony et al 2017)

Together these outbreaks have cemented the coronaviridaeas a family of zoonotic concern and stimulated a surge in viraldiscovery efforts in bats [reviewed by Drexler et al (2014)]These efforts appear to show that almost all human CoVs havezoonotic origins or otherwise circulate in animals including hu-man 229E [bats (Pfefferle et al 2009) camels (Sabir et al 2016)]NL63 [bats (Donaldson et al 2010 Huynh et al 2012)] and OC43[cattle (Vijgen et al 2005)] Even non-human CoVs such as por-cine epidemic diarrhea virus (PEDV) may have emerged by hostswitching from other animals [bats (Tang et al 2006 Huanget al 2013)] Overall it seems that CoV diversity in bats is sub-stantial (Drexler et al 2014) that these viruses are prone to hostswitching (Woo et al 2009) and that they are a current historicand future threat to public health

Despite recent efforts there are several notable gaps in ourunderstanding of CoV diversity Foremost our knowledge ofCoVs in resource-limited countries is poor (Drexler et al 2014)This is particularly problematic given that many of these sameareas are predicted to be hotspots of disease emergence (Joneset al 2008) Second there has been little effort to understand theevolutionary and ecological drivers of CoV diversity on a globalscale or to evaluate host and regional variation in the factorsthat contribute to the risk of emergence (eg host switching)Indeed most studies to date have been somewhat limited in theirgeographic scope and have included only small sample sizeswith little epidemiologic or ecological context In short there is acritical need for a more global perspective on CoV diversity

In 2009 the PREDICT project was established in part to addressthis need Focused on strengthening capacity and identifying vi-ruses in wildlife at high-risk interfaces (PREDICT_Consortium2014) this USAID Emerging Pandemic Threats (EPT) initiativeworked with local partners in twenty countries across LatinAmerica Africa and Asia over 5 years to better understand the cur-rent diversity of CoVs (as well as other viruses) and evaluate thefactors that drive this diversity at different scales Herein we reporta large diversity of CoV sequences (mostly from bats) show thatthe biogeography of bats has shaped the diversity of CoVs globallyand provide evidence to suggest there could be regional variationin host switching and the risk for zoonotic emergence

2 Methods21 Animals and samples

Bats (nfrac14 12333) rodents (nfrac14 3387) and non-human primates(nfrac14 3470) were humanely sampled (capture and release) from

twenty lsquohotspotrsquo countries (Jones et al 2008) representing cen-tral Africa (Cameroon Gabon Democratic Republic of CongoRepublic of Congo Rwanda Tanzania Uganda) Latin America(Peru Bolivia Brazil Mexico) and Asia (Bangladesh CambodiaChina Indonesia Laos Malaysia Nepal Thailand Viet Nam)Samples were also collected from humans (nfrac14 1124) in sevencountries in central Africa and Asia as a pilot to begin to explorethe propensity for viral sharing with wildlife In all twenty coun-tries wildlife samples were collected from lsquohigh riskrsquo interfaceswhere direct or indirect contact with humans might promotezoonotic viral transmission The selected sampling sites in-cluded areas of land-use change (deforestation conversion toagriculture) sites in and around human dwellings foci of eco-tourism markets restaurants and farms along the animal valuechain and areas where occupational exposure was likely (ani-mal sanctuaries agricultural activities) When possible individ-uals were identified to lowest taxonomic order (genus andspecies) and assigned to an age class (adult subadult neonate)by the field teams All samples from swabs (eg oral urine rec-tal) fluids (eg saliva) and tissues were collected into (1)NucliSensVR Lysis Buffer (bioMerieux Inc Marcy-IrsquoEtoile France)and (2) viral transport media and then frozen in the field in liquidnitrogen and transferred to the laboratory for storage at 80 CAll animal sampling activities were conducted with permissionsfrom local authorities and under the Institutional Animal Careand Use Committee at the University of California Davis (proto-col number 16048) All human activities were reviewed and ap-proved by the UC Davis Institutional Review Board (IRB) underprotocols 215253 and 432330

22 Viral discovery

RNA was extracted from all samples and cDNA prepared usingsuperscript III (Invitrogen) Two broadly reactive consensus PCRassays targeting non-overlapping fragments of the orf1ab wereused to detect both known and novel CoVs (Quan et al 2010Watanabe et al 2010) The first [the lsquoWatanabersquo assay(Watanabe et al 2010)] amplified a 434 bp fragment of theRNA-dependent RNA polymerase (RdRp) corresponding to nu-cleotides (NTs) 15156ndash15589 in the human CoV OC43 genome(NC_005147) while the second [the lsquoQuanrsquo assay (Quan et al2010)] amplified a 332 bp fragment of a different peptide down-stream of the RdRp corresponding to NTs 18323ndash18654Amplified products of the expected size were cloned and se-quenced (traditional Sanger dideoxy sequencing) according tostandard protocols and sequences edited manually in GeneiousPro (ver 913 Biomatters Auckland NZ) We note that this ap-proach was adopted to facilitate viral discovery in resource-poor settings which are the target of pandemic preparednessinitiatives such as USAID PREDICT and where techniques suchas high throughput sequencing are largely unavailable

23 Phylogenetics

Coronavirus sequences from the Quan and Watanabe datasetswere aligned with reference sequences collected from GenBankusing MUSCLE (Edgar 2004) followed by manual alignment inSe-Al v20a11 (httptreebioedacuk) The best-fitting model ofnucleotide substitution for each dataset was selected byAkaikersquos Information Criterion (AIC) using jModeltest v215(Darriba et al 2012) For both alignments the general time re-versible model with a gamma distribution of rate heterogeneitywas the best-fitting model Maximum likelihood trees were con-structed from each alignment using MEGA v606 (Tamura et al

2 | Virus Evolution 2017 Vol 3 No 1

2013) Histograms of all pairwise sequence identities () wereplotted as described previously for hantaviruses (Maes et al2009) and used to define cut-offs between viral sequence clus-ters for analysis (akin to operating taxonomic units)

24 Diversity measures

The Earthrsquos surface was divided into grid cells each 10 latitudeby 10 longitude Thirty-four of these grid cells included areasfrom which bats were sampled (bats were identified down tothe species level in thirty cells) and twenty-seven cells hadsamples testing positive for CoVs If the coordinates listed for aparticular bat were on the line between grid cells it was arbi-trarily included in the grid cell with higher latitudelongitudeViral and host species richness for each cell was calculated bydetermining the total number of unique virus and bat specieswithin each cell Alpha diversity was calculated using theShannon index (Jost et al 2011) which was chosen over theGinindashSimpson index because it does not place disproportionateweight on dominant species Since we do not know how repre-sentative our sampling was the Shannon index was deemedthe best approach The effective number of viruses and bats ineach cell was then correlated using Kendallrsquos tau since the val-ues were not normally distributed Sensitivity analyses wereperformed by shifting the grid cells in 1 increments and re-calculating richness and effective species numbers

Matrices of beta diversity between each combination of gridcells were developed using the Jaccard index (Jost et al 2011)and Mantel tests were used to determine whether the distancebetween grids was associated with the dissimilarity in virus andbat species Geographic distance matrices were calculated byfinding the centroid of each grid cell and calculating the geode-sic distance between each pair using the R packagelsquogeospherersquo(Geosphere 2015) and tests were performed usingSpearmanrsquos rank correlation All analyses of diversity were con-ducted in R version 232

25 Network model

A presenceabsence matrix was constructed to show the distri-bution of viral sequence clusters across all species Using thePython package NetworkX (Hagberg et al 2008) a networkmodel was constructed connecting bat species to all viral clus-ters that were identified within that species in our data Thenetwork is made up of thirty-one connected componentseighty-two viral clusters eighty-five bat species and 159 edgesNetworks were plotted using Gephi using the force-directed al-gorithm ForceAtlas2 (Jacomy et al 2014) Specifically we plottedtwo bipartite graphs one where hosts are colored by region andone in which they are colored by family

26 Cophylogeny

The Jane (Conow et al 2010) software tool was used for cophylo-genetic reconstructions This approach applies an a priori lsquocostschemersquo to different evolutionary events in order to test the de-gree of congruence between two trees (virus and host) Theselsquoeventsrsquo include cospeciation duplication host switching andfailure to diverge (or sorting) (Charleston 1998 Conow et al2010) which are used to create a minimal-cost reconstruction ofthe evolutionary history between the viruses and hostsAnother cophylogeny program CoRe-PA (Merkle et al 2010)uses a parameter-adaptive approach to estimate an appropriatecost scheme without any prior value assignment For our analy-sis we used Jane with the default pre-determined cost scheme

where cospeciation is the null hypothesis and therefore thelsquocheapestrsquo event with the assumption that host switchingwould theoretically be more costly than genetic drift within aspecies to which a virus is already adapted Analysis was re-peated with varied cost values in Jane to test for sensitivity toour cost scheme as well as in CoRe-PA for robustness andevent types and costs from each reconstruction were comparedbetween the two programs As these changes ultimately did notsignificantly influence the outcome of the analysis exact eventassignments used in the final analysis were inferred solely fromJane with the default cost scheme

The analyses were performed using all unique associationsbetween viruses and their respective hosts (limited to bats)Alpha-CoV sequences and beta-CoV sequences were analyzedseparately since each appears to have diversified indepen-dently within bats This gave us a total of four reconstructions(alpha-Quan alpha-Watanabe beta-Quan beta-Watanabe)Only one instance of each virusndashhost association was includedeven if multiple instances of the same association were ob-served (ie our analysis does not account for the frequency ofdetection) Associations were excluded if the host was not iden-tified to the species level or if the cytochrome B sequence didnot exist in GenBank or could not be amplified by PCR The to-pology of the virus tree was inferred from the topology of thefull tree for consistency since subsets of sequences often pro-duce different arrangements

For each reconstruction the most recent evolutionary eventleading to a virusndashhost association was recorded Given thatsorting is particularly sensitive to missing data (it indicates aloss on the tree when in fact it may simply be an artifact ofunder-sampling) we have excluded this event from our analy-sis We also distinguished between two potential types of hostswitching based on an assumption that there could be differ-ences in zoonotic risk between viruses that only move betweenclosely related bats (eg those within the same genus) and thosethat are able to make more substantive jumps into distantly re-lated species (those in other host genera or families) We there-fore use lsquohost switchingrsquo to refer specifically to viral lineagesthat have moved into a host belonging to another genus or fam-ily and lsquosharingrsquo to indicate a viral lineage that has moved intoa different host species of the same genus (without speciating)We selected host genus as the cut-off to be consistent with cur-rent conventions of taxonomic hierarchy and verified that thenumber of species per genus is relatively consistent between re-gions We further verified that the mean genetic distance be-tween host species is consistent between regions (ie that thetaxonomic distance a virus has to navigate when moving be-tween any two species is roughly similar in all three regions)

A binary regression model with logistic link function wasused to investigate whether the degree of host switching variedby region The dependent variable was host switching and ei-ther cospeciation or sharing was the reference group Regionwas the independent variable We used the generalized estimat-ing equations (GEE) (Liang and Zeger 1986) method in order toaccount for multiple event types within one family and derive arobust estimate of the standard errors Event data for all four re-constructions were aggregated for the analysis Where hostndashvi-rus associations were duplicated one was removed If theduplicated event types did not agree [which can occur given thesensitivity of these methods to variation in topology and num-ber of taxa in each tree (Conow et al 2010)] we randomly se-lected one of the two possibilities and repeated therandomization 100 times The GEE estimates from the 100 ran-dom iterations were then combined using rules established by

S J Anthony et al | 3

Rubin (1987) The analysis was repeated to explore variationbased on bat family (independent variable) Some families onlypossessed one type of event which would introduce quasi-complete separation into the GEE model so they were excludedfrom the corresponding models However the numbers of batspecies within these excluded families were very small (lt5)compared to the overall number of bat species for the wholestudy so we felt this would not impact any overall trends

27 Viral discovery effort

The relationship between sampling effort and viral discoverywas evaluated using a log-link Poisson regression model fittedwith the count of viruses for each species as the dependent vari-able and the number of animals sampled as the independentvariable This model is universally applied with count data Thecoefficient of the sample number variable is the log of the ratioof the expected number of viruses with (nthorn 1) samples collectedto the expected number of viruses with n samples collectedwhere n can be any arbitrary positive integer Using this esti-mated coefficient we calculated the estimated numbers of vi-ruses (viral sequence clusters) that can be found with respect tothe sampling effort (ie all possible numbers of collected ani-mals within a species) and they were then used to plot the esti-mated curve together with its 95 confidence band We clarifythat we use the term lsquoexpectedrsquo to indicate the statistical expec-tation based on the estimate of the regression model We ac-knowledge that the model fit is not as good as the model withan additional quadratic term of the independent variable none-theless it avoids the problem of over-fitting and realistically re-flects the relationship between sampling effort and viraldiscovery when the number of animals collected is relativelylower

28 Factors associated with CoV positivity

Due to the small numbers of positive individuals obtained forrodents non-human primates and humans only bat data wereanalyzed for factors associated with positivity A total of 12333bats were tested This included 5624 males and 5767 females(922 were not sexed) Among aged bats 7385 were adults 1029were sub-adults and 8 were neonates (3911 were not aged)Binary logistic regression models were used to evaluate signifi-cant variables restricting analyses to species for which 50 in-dividuals were tested Model selection was first performed in R(version 325) (R Foundation for Statistical Computing 2008) us-ing the R package lsquogmultirsquo and the step function and the bestmodel determined by AIC A robust estimation of standard errorwas calculated by clustering by individual Animal Identificationnumber to account for non-independence of multiple tests fromthe same animal using STATA 131 SE (College Station TXUSA) The independent variables included in analyses wereSpecimen Type (blood fecesrectal swabs guano oralrectalsample oralnasal swabs tissue urineurogenital swabs) HostTaxon (family sub-family genus) Host Age (adult sub-adultneonate) Season (wet dry) and Interface (animal use humanactivity land use pristine area) Season was designated as lsquowetrsquoor lsquodryrsquo according to month and proximity to the equator foreach country Interfaces were broadly grouped according to thetype and intensity of animal contact with people specifically (1)Land Use (animals sampled in areas with crops extractive in-dustries livestock activities) (2) Animal Use (hunting marketsrestaurants trade wild animal farms wildlife managementszoossanctuaries handling by veterinarianresearchers) (3)

Human Activities (ecotourism in and around human dwell-ings) and (4) Pristine (where animal human contact was notlikely) The model fit for each region was evaluated using theHosmerndashLemeshow goodness-of-fit test and pseudo R2 values

29 Virus lsquoHotspotrsquo Maps

The number of viral sequence clusters (richness) in each sub-clade was summarized by host family (Supplementary TableS1) and chi square tests used to evaluate whether viral sub-clade and host family were independent of each other Bat spe-cies richness maps were then created to infer hotspots of CoVdiversity For each viral clade spatial distribution data on all batspecies belonging to associated families was obtained from theInternational Union for Conservation of Nature (IUCN) UsingArcGIS version 10 (ESRI 2011) bat species were quantified bycounting overlapping polygons The resulting count data werethen converted to a raster and plotted All sampled bats as wellas all identified sequence clusters within the clade in questionwere then plotted over the raster data

3 Results31 Global diversity of CoVs

A total of 19192 animals and humans were assayed for thepresence of CoV by consensus PCR (cPCR) (Table 1) The majoritywere bats (nfrac14 12333) representing 282 species from twelvefamilies Overall the proportion of CoV positive individuals was86 in bats (nfrac14 106512333) and 02 in non-bats (nfrac14 176859) In other words over 98 of all positive individuals werebats

Partial sequences were obtained from two non-overlappingfragments of the orf1ab gene yielding 654 sequences for thelsquoQuanrsquo region and 950 for the lsquoWatanabersquo region (see lsquoMethodsrsquo)There was a 27 overlap where sequences were obtained for bothregions from the same sample The distribution of pairwise se-quence identities defined a 90 cut-off between taxa (Fig 1) andthe resulting monophyletic groups (in which all sequenceshad90 identity) used as our operating taxonomic units Groupsthat shared less than 90 identity to a known sequence were la-beled sequentially as PREDICT_CoV-1 -2 -3 etc while groups shar-ing90 identity to a sequence already in GenBank wereconsidered to be strains of a known virus and assigned the samename as the matching sequence (eg Kenya_bat_CoV_KY33 orHKU-9) Sequences that sharedgt90 identity but were found indifferent hosts were considered part of the same taxonomic unitBased on these criteria 100 discrete viral taxa were identifiedninety-one of which were found in bats (Table 1 Fig 2)Importantly we make no claims that these groups correspond tolsquospeciesrsquo as full orf1ab replicase sequences would be required forthat (King et al 2012) Instead we clarify that we are using thesepartial fragments to cluster sequences into lsquooperational taxonomicunitsrsquo (analogous to OTU clustering in microbiome research) Wealso stress that our cut-off was determined based on two discreteregions within the orf1ab separated by gt3000 nucleotides andthat these regions correspond to unique peptides post-transla-tional cleavage

Of note was the detection of subgroup 2a CoV sequences inbats humans and non-human primates from four differentcountries This sub-clade is largely considered the rodent sub-clade (Lau et al 2015 Wang et al 2015 Tsoleridis et al 2016)but here we detected sequences corresponding to the known vi-rus betacoronavirus-1 in Pteropus medius from Bangladesh

4 | Virus Evolution 2017 Vol 3 No 1

Pteropus alecto from Indonesia and from humans in China andsequences corresponding to murine coronavirus in non-humanprimates from Nepal (these sequences were detected by fourdifferent laboratories and no rodent samples were processed atthe same time) (Fig 2) In addition we report the discovery ofseveral sub-clade 2b and 2c CoV sequence clusters (the SARSand MERS sub-clades respectively) (Fig 2) and the detection ofhuman coronavirus 229E-like sequences in hipposideros andrhinolophus bats sampled in ROC Uganda Cameroon andGabon supporting previous suggestions that 229E has zoonoticorigins (Pfefferle et al 2009 Hu et al 2015) Finally we highlightthe detection of avian-associated infectious bronchitis virus-like sequences (IBV) in bats and a closely related virus(PREDICT_CoV-49) in non-human primates from Bangladesh aswell as porcine epidemic diarrhea virus (PEDV) in bats (Fig 2)Again we confirm that no livestock samples were processed inany of these laboratories that might have acted as a source ofcontamination

32 Factors driving viral diversity

Viral a-diversity (the lsquoeffective numberrsquo of species) was signifi-cantly correlated with bat a-diversity (taufrac14 0325 Pfrac14 0022) (Fig3) suggesting that more viruses will be found in regions wherebat diversity is higher This association was maintained whenviral richness rather than effective number of species wasused as the diversity index (taufrac14 0421 Plt 0001) Viral and batb-diversity were also significantly correlated (Mantel testrhofrac14 0575 Plt 0001) In both cases diversity differentiated al-most entirely into three discrete communities by region (Fig 3)Only Africa and Asia shared any viral sequence clusters(PREDICT_CoV-35 and HKU9 Fig 4) demonstrating that bats

have had a strong biogeographic influence on the global ecologyand evolution of CoVs

Co-phylogenetic reconciliation analysis was used to investi-gate the evolutionary mechanisms driving the virusndashhost asso-ciations observed for example host switching viral sharing orco-speciation (see lsquoMethodsrsquo) Overall host switching was thedominant mechanism followed by co-speciation (Fig 5) Whenanalyzed by region host switching (inter-genus transmission)remained dominant in Africa and Asia but not in LatinAmerica Despite fewer host-switching events in Latin Americathere was a concomitant increase in virus sharing (intra-genustransmission) This suggests that viruses are still switchinghosts in Latin America but preferentially move between closelyrelated species while in Africa and Asia viruses cross betweenmore distantly related species To assess the sensitivity of ourresults to our chosen method (ie use of a pre-defined costscheme in the program Jane) we repeated the analysis for theQuan alpha-CoV and beta-CoV reconstructions using CoRePA aparameter-adaptive approach that estimates an appropriatecost scheme without any prior value assignment Comparingthe results we noted 91 agreement between Jane and CoRePAin alpha-CoV reconstruction (thirty-four events of which thirty-one agreed) and 100 agreement in the beta-CoV reconstruc-tions (twenty-six events all of which agreed) We further notethat the cost scheme calculated in CoRePA was 07 for hostswitching and 01 for co-speciationmdashsupporting the default costscheme used in Jane (ie that host switching should be consid-ered more lsquocostlyrsquo than co-speciation)

A bipartite network model connecting viral sequence clus-ters to their hosts supports these results illustrating that batCoVs were connected to several host families in Africa and Asiawhile being largely restricted to a single bat family in Latin

Table 1 Summary of individuals tested and number positive for at least one coronavirus by host taxa

Host taxa tested No individuals tested No individuals positive No distinct viruses detected

Bats 12333 1065 91Non-Human Primates 3470 4 2Rodents and Shrews 3387 11 7Humans 1124 2 2Total 19192 1082 100a

aNote Numbers do not total as two viruses were detected in two taxa

Figure 1 Histogram of the relative frequency of pairwise sequence identities used to define the cutoff between operating taxonomic units for all CoV sequences de-

tected A bimodal distribution was observed and a cutoff of 90 sequence identity used to separate sequences from both the Watanabe (Panel A) and Quan (Panel B) as-

says into discrete viral taxa

S J Anthony et al | 5

Table 2 Variables associated with coronavirus positive bat tests in Africa Asia and Latin America Regional datasets for each logistic regres-sion model consisted of bat species where greater than fifty bats were tested Age datasets were a subset of the regional datasets as age classwas not determined for all individuals Numbers in bold are statically significant for Plt 005

Odds ratio P 95 Confidence intervals

AfricaSample type Lower Upper

Fecesrectal swab 35098 lt0001 14664 84007Oralnasal swab 505 lt0001 203 1254Oralrectal swab 3098 lt0001 1346 7130Blood 165 0540 033 809Tissue 100

Host familyHipposideridae 122 0559 062 241Pteropodidae 384 lt0001 227 649Molossidae 100

SeasonDry 242 lt0001 179 325Wet 100

InterfaceAnimal use 198 0001 135 291Pristine area 139 0624 037 529Land use 166 0219 074 374Human activity 100

AgeSubadult 591 lt0001 428 817Adult 100

AsiaSample type

Fecesrectal swab 6280 lt0001 1544 25549Oralnasal swab 1325 lt0001 311 5638Urineurogenital swab 4692 lt0001 1085 20280Guano 2212 lt0001 366 13356Tissue Oralrectal swab 100

Host familyEmballonuridae 031 026 004 237Miniopteridae 978 lt0001 569 1681Pteropodidae 216 lt0001 129 362Rhinolophidae 108 082 057 203Vespertilionidae 367 lt0001 218 618Hipposideridae 100

SeasonDry 149 003 103 216Wet 100

InterfaceAnimal use 330 001 129 840Human activity 348 001 136 895Pristine area 181 035 052 635Land use 100

AgeSubadult 184 000 126 267Adult 100

Latin AmericaSample type

Fecesrectal swab 1566 0000 502 4879Oralrectal swab 2786 0000 686 11322Oralnasal swab 100

Host subfamilyCarolliinae 695 006 090 5376Stenodermatinae 504 012 067 3789Glossophaginae 100

InterfaceAnimal use 525 001 148 1865Human activity 273 012 077 969Land use 422 010 078 2295Pristine area 100

SeasonDry 130 052 058 289Wet 100

AgeSubadult 173 015 083 362Adult 100

6 | Virus Evolution 2017 Vol 3 No 1

America (Fig 4 Panel B) Further host switching was nearlyfour times more likely than virus sharing in Africa when com-pared with Latin America (OR 3858 Pfrac14 0040) CoVs in Asiawere also more likely to host switch than share when comparedwith Latin America however the results were not significant(OR 2474 Pfrac14 0143) No particular bat family was associatedwith increased host switching but this may reflect small sam-ple sizes when the data are summarized by family

33 Estimated number of CoVs in bats

A large number of host (bat) species in our study were negativefor CoV (nfrac14 197) however none of these species were sampledextensively (Fig 6) We found that all species with samplesizesgt110 individuals were positive for one or more CoVs sug-gesting that we would have detected CoVs in some of the nega-tive species in our study if sampling effort were increased Dueto the high number of these negative species and their effect onthe average number of virus sequence clusters per species theywere excluded from our estimates By retaining only those spe-cies for whichgt110 individuals were sampled (there weretwenty-seven species that qualified) we estimated the averagenumber of CoVs per species to be 267 (stdfrac14 138) thus account-ing for the likely scenario that some species will have more vi-ruses and others less We then extrapolated the average to all1200 bat species to estimate a total potential richness of 3204

CoVs (rangefrac14 1200ndash6000 CoVs) most of which have yet to bedescribed

34 Factors predicting CoV positivity

In order to refine future surveillance efforts to find the undis-covered diversity of CoVs in bats we evaluated the factors pre-dicting CoV positivity For each region the best model includedspecimen type bat family (or in the case of Latin America sub-family) season and animalndashhuman interface (Table 2) Seasonwas not included in the top model for Latin America but themodel with the season included was within two AIC (deltaAICfrac14 106) and therefore considered to have as much support asthe top model (Burnham and Anderson 2002) Specimen typewas highly associated with CoV positivity and samples contain-ing feces or fecal swabs were significantly more likely to testpositive than other specimen types in all regions Bat familywas important in Asia and Africa Season was also significantwith samples collected during the dry season more likely to testpositive in Africa and Asia than those collected during the wetseason (albeit with low odds ratios) Age class appears to be im-portant as sub-adults were more likely to test positive thanadults in Africa and Asia Sex was not related to CoV positivityin Asia or Latin America while males were slightly more likelyto be CoV positive in Africa (ORfrac14 15 12ndash19 CI Plt 0001) Broadcategories of animalndashhuman interfaces were significantly

Figure 2 Maximum likelihood phylogenetic reconstructions for all partial CoV RdRp fragments for both the Quan (Panel A) and Watanabe (Panel B) assays Sequences

are collapsed into clades representing our operating taxonomic units (sequences sharing90 identity) and number of sequences for each taxon is indicated in paren-

theses Representative published sequences from GenBank have been included for comparison (GenBank accession number indicated) Both trees were rooted using

the related Breda virus (NC_007447) and all nodes have 60 bootstrap support Pie charts indicate the distribution of viral taxa by host (bat) family in each virus sub-

clade

S J Anthony et al | 7

associated with CoV positivity among bats in all three regionsin Africa and Latin America bats sampled at the animal use in-terface category were more likely to be positive and in Asiabats sampled at the animal use and human activities interfaceswere more likely to be positive than those sampled at other in-terfaces The significance of the animal interface in all three

regions suggests that practices around animal use may be im-portant for disease transmission

35 Future sampling effort

Based on our study we were able to estimate the sampling ef-fort that would be required to find all CoVs in bats based on a

Figure 3 Comparison of viral and bat diversity The earthrsquos surface was divided into grid cells by latitude and longitude (10 10 degree units) for diversity calculations

(Panel A) Grid cells where bats were sampled are numbered in each region Alpha diversity (Shannon H) for virus (Panel B) and host (Panel C) were correlated indicat-

ing that areas of high bat diversity also have high viral diversity Darker cells indicate higher alpha diversity (ie more viral or host taxa) in each grid cell Beta diversity

was also correlated between virus (Panel D) and host (Panel E) and differentiated into three discrete communities by regionmdashLatin America (grid cells 1ndash10) Africa

(grid cells 11ndash20) and Asia (grid cells 21ndash34) Shading indicates that either viruses (in red) or hosts (in blue) are shared between two corresponding grid cells with darker

cells indicating higher pairwise similarity

8 | Virus Evolution 2017 Vol 3 No 1

Poisson regression model of our data (Fig 6) With 154 individ-uals we estimate that an average of one CoV will be detected(95 CI 136ndash177) With 397 individuals we estimate that upto five CoVs will be detected (95 CI 351ndash458) As we did not

observe any species with greater than five viral sequenceclusters we make the assumption that sampling 397 individ-uals should capture the full diversity of CoVs in each batspecies

Figure 4 Network model showing the connection of CoVs and their hosts Viral sequence clusters (colored grey) are connected to host species either by region (Panel

A) or family (Panel B) Viral and host and communities separate almost entirely by region only Africa and Asia are connected by two shared viruses (HKU9 and

PREDICT_CoV-35) found in species from both continents Networks also show that viruses appear to be shared by multiple host families in Africa and Asia while being

more restricted to a single family in Latin America

S J Anthony et al | 9

Figure 5 Relative proportion of evolutionary events leading to observed virushost associations for CoVs in bats Cophylogenetic reconstructions were used to identify

each event (Supplementary Fig S1 Panels AndashD) and significance evaluated by region Across all regions host switching was the dominant evolutionary event (Panel

A) When separated by region host switching remained dominant in Africa (Panel B) and Asia (Panel C) but not in Latin America (Panel D)

Figure 6 Relationship between sampling effort and viral detection among all bat species sampled A Poisson regression model was used to estimate the expected num-

ber of viruses based on the number of animals sampled by species (each circle indicates a separate species) The 95 confidence intervals are indicated

10 | Virus Evolution 2017 Vol 3 No 1

36 Predicting the distribution of unknown CoVs

Viral sub-clade and host family were not independent of eachother (v2 by cladefrac14 Plt 0001 Supplementary Table S1) demon-strating that different clades have significant associations withspecific bat families For example subgroup 2d CoVs were sig-nificantly associated with pteropid bats and were only identi-fied in regions where pteropid bats exist Likewise subgroup 2bCoVs were associated with rhinolophid and hipposiderid batsand 2c with vespertilionid bats To lsquopredictrsquo the potential distri-bution of unknown CoVs we plotted the known distribution ofbats belonging to these families and assume that lsquohotspotsrsquo ofbat diversity infer hotspots of viral diversity for each sub-clade(Fig 7) We note that the alphaCoV genus has been consideredas one group since they do not resolve well into sub-clades(de Groot et al 2009) and that no 2a map was generatedgiven that only one bat virus from this clade was identified

4 Discussion

The emergence of SARS and MERS has driven a need to under-stand more about the diversity ecology and evolution of coro-naviruses particularly at so-called lsquohotspotsrsquo of zoonoticemergence (Jones et al 2008 Drexler et al 2014) To this end wesurveyed the diversity of CoVs from twenty countries in LatinAmerica Africa and Asia to identify global factors driving viraldiversity and to look for regional differences in factors that con-tribute to the risk of emergence such as host switching In totalwe identified sequences from 100 discrete phylogenetic clus-ters ninety-one of which were found in bats

Our data suggest that the diversity of bat CoVs has beendriven primarily by host ecology First viral richness wasstrongly correlated with bat richness suggesting that mostCoVs will be found in regions where bat diversity is highestSecond we showed that CoV diversity separates into three

Figure 7 Viral diversity lsquohotspotrsquo maps Panel A shows the potential hotspots for CoVs based on the distribution of bats worldwide Location of alpha CoV sequences

from this study are shown in black and beta CoV sequences in blue indicating there is no geographical bias based on viral genus (ie alpha and beta CoVs are equally

likely to be found in all regions) Some viral sub-clades were associated with particular bat families and the spatial distribution data of all species belonging to these

families were plotted to indicate the potential hotspots of viral diversity (richness) for these sub-clades Panel B indicates the potential distribution of 2b CoVs based on

the distribution of rhinolphus and hipposideros bats Locations of 2b-positive animals identified in this study are indicated in black and correlate with areas of high

species richness (for these families) CoV-positive animals for other sub-clades shown in light blue Panel C indicates the potential distribution of 2c CoVs based on the

distribution of vespertilionid bats Locations of 2c-positive animals identified in this study are indicated in black This map suggests there are hotspots of 2c diversity

in regions not covered in this study (eg Europe) Panel D indicates the potential distribution of 2d viruses based on the distribution of pteropid bats The map suggests

these viruses may have a more limited distribution compared with viruses of other sub-clades Locations of 2d-positive animals identified in this study are shown in

black

S J Anthony et al | 11

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 3: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

2013) Histograms of all pairwise sequence identities () wereplotted as described previously for hantaviruses (Maes et al2009) and used to define cut-offs between viral sequence clus-ters for analysis (akin to operating taxonomic units)

24 Diversity measures

The Earthrsquos surface was divided into grid cells each 10 latitudeby 10 longitude Thirty-four of these grid cells included areasfrom which bats were sampled (bats were identified down tothe species level in thirty cells) and twenty-seven cells hadsamples testing positive for CoVs If the coordinates listed for aparticular bat were on the line between grid cells it was arbi-trarily included in the grid cell with higher latitudelongitudeViral and host species richness for each cell was calculated bydetermining the total number of unique virus and bat specieswithin each cell Alpha diversity was calculated using theShannon index (Jost et al 2011) which was chosen over theGinindashSimpson index because it does not place disproportionateweight on dominant species Since we do not know how repre-sentative our sampling was the Shannon index was deemedthe best approach The effective number of viruses and bats ineach cell was then correlated using Kendallrsquos tau since the val-ues were not normally distributed Sensitivity analyses wereperformed by shifting the grid cells in 1 increments and re-calculating richness and effective species numbers

Matrices of beta diversity between each combination of gridcells were developed using the Jaccard index (Jost et al 2011)and Mantel tests were used to determine whether the distancebetween grids was associated with the dissimilarity in virus andbat species Geographic distance matrices were calculated byfinding the centroid of each grid cell and calculating the geode-sic distance between each pair using the R packagelsquogeospherersquo(Geosphere 2015) and tests were performed usingSpearmanrsquos rank correlation All analyses of diversity were con-ducted in R version 232

25 Network model

A presenceabsence matrix was constructed to show the distri-bution of viral sequence clusters across all species Using thePython package NetworkX (Hagberg et al 2008) a networkmodel was constructed connecting bat species to all viral clus-ters that were identified within that species in our data Thenetwork is made up of thirty-one connected componentseighty-two viral clusters eighty-five bat species and 159 edgesNetworks were plotted using Gephi using the force-directed al-gorithm ForceAtlas2 (Jacomy et al 2014) Specifically we plottedtwo bipartite graphs one where hosts are colored by region andone in which they are colored by family

26 Cophylogeny

The Jane (Conow et al 2010) software tool was used for cophylo-genetic reconstructions This approach applies an a priori lsquocostschemersquo to different evolutionary events in order to test the de-gree of congruence between two trees (virus and host) Theselsquoeventsrsquo include cospeciation duplication host switching andfailure to diverge (or sorting) (Charleston 1998 Conow et al2010) which are used to create a minimal-cost reconstruction ofthe evolutionary history between the viruses and hostsAnother cophylogeny program CoRe-PA (Merkle et al 2010)uses a parameter-adaptive approach to estimate an appropriatecost scheme without any prior value assignment For our analy-sis we used Jane with the default pre-determined cost scheme

where cospeciation is the null hypothesis and therefore thelsquocheapestrsquo event with the assumption that host switchingwould theoretically be more costly than genetic drift within aspecies to which a virus is already adapted Analysis was re-peated with varied cost values in Jane to test for sensitivity toour cost scheme as well as in CoRe-PA for robustness andevent types and costs from each reconstruction were comparedbetween the two programs As these changes ultimately did notsignificantly influence the outcome of the analysis exact eventassignments used in the final analysis were inferred solely fromJane with the default cost scheme

The analyses were performed using all unique associationsbetween viruses and their respective hosts (limited to bats)Alpha-CoV sequences and beta-CoV sequences were analyzedseparately since each appears to have diversified indepen-dently within bats This gave us a total of four reconstructions(alpha-Quan alpha-Watanabe beta-Quan beta-Watanabe)Only one instance of each virusndashhost association was includedeven if multiple instances of the same association were ob-served (ie our analysis does not account for the frequency ofdetection) Associations were excluded if the host was not iden-tified to the species level or if the cytochrome B sequence didnot exist in GenBank or could not be amplified by PCR The to-pology of the virus tree was inferred from the topology of thefull tree for consistency since subsets of sequences often pro-duce different arrangements

For each reconstruction the most recent evolutionary eventleading to a virusndashhost association was recorded Given thatsorting is particularly sensitive to missing data (it indicates aloss on the tree when in fact it may simply be an artifact ofunder-sampling) we have excluded this event from our analy-sis We also distinguished between two potential types of hostswitching based on an assumption that there could be differ-ences in zoonotic risk between viruses that only move betweenclosely related bats (eg those within the same genus) and thosethat are able to make more substantive jumps into distantly re-lated species (those in other host genera or families) We there-fore use lsquohost switchingrsquo to refer specifically to viral lineagesthat have moved into a host belonging to another genus or fam-ily and lsquosharingrsquo to indicate a viral lineage that has moved intoa different host species of the same genus (without speciating)We selected host genus as the cut-off to be consistent with cur-rent conventions of taxonomic hierarchy and verified that thenumber of species per genus is relatively consistent between re-gions We further verified that the mean genetic distance be-tween host species is consistent between regions (ie that thetaxonomic distance a virus has to navigate when moving be-tween any two species is roughly similar in all three regions)

A binary regression model with logistic link function wasused to investigate whether the degree of host switching variedby region The dependent variable was host switching and ei-ther cospeciation or sharing was the reference group Regionwas the independent variable We used the generalized estimat-ing equations (GEE) (Liang and Zeger 1986) method in order toaccount for multiple event types within one family and derive arobust estimate of the standard errors Event data for all four re-constructions were aggregated for the analysis Where hostndashvi-rus associations were duplicated one was removed If theduplicated event types did not agree [which can occur given thesensitivity of these methods to variation in topology and num-ber of taxa in each tree (Conow et al 2010)] we randomly se-lected one of the two possibilities and repeated therandomization 100 times The GEE estimates from the 100 ran-dom iterations were then combined using rules established by

S J Anthony et al | 3

Rubin (1987) The analysis was repeated to explore variationbased on bat family (independent variable) Some families onlypossessed one type of event which would introduce quasi-complete separation into the GEE model so they were excludedfrom the corresponding models However the numbers of batspecies within these excluded families were very small (lt5)compared to the overall number of bat species for the wholestudy so we felt this would not impact any overall trends

27 Viral discovery effort

The relationship between sampling effort and viral discoverywas evaluated using a log-link Poisson regression model fittedwith the count of viruses for each species as the dependent vari-able and the number of animals sampled as the independentvariable This model is universally applied with count data Thecoefficient of the sample number variable is the log of the ratioof the expected number of viruses with (nthorn 1) samples collectedto the expected number of viruses with n samples collectedwhere n can be any arbitrary positive integer Using this esti-mated coefficient we calculated the estimated numbers of vi-ruses (viral sequence clusters) that can be found with respect tothe sampling effort (ie all possible numbers of collected ani-mals within a species) and they were then used to plot the esti-mated curve together with its 95 confidence band We clarifythat we use the term lsquoexpectedrsquo to indicate the statistical expec-tation based on the estimate of the regression model We ac-knowledge that the model fit is not as good as the model withan additional quadratic term of the independent variable none-theless it avoids the problem of over-fitting and realistically re-flects the relationship between sampling effort and viraldiscovery when the number of animals collected is relativelylower

28 Factors associated with CoV positivity

Due to the small numbers of positive individuals obtained forrodents non-human primates and humans only bat data wereanalyzed for factors associated with positivity A total of 12333bats were tested This included 5624 males and 5767 females(922 were not sexed) Among aged bats 7385 were adults 1029were sub-adults and 8 were neonates (3911 were not aged)Binary logistic regression models were used to evaluate signifi-cant variables restricting analyses to species for which 50 in-dividuals were tested Model selection was first performed in R(version 325) (R Foundation for Statistical Computing 2008) us-ing the R package lsquogmultirsquo and the step function and the bestmodel determined by AIC A robust estimation of standard errorwas calculated by clustering by individual Animal Identificationnumber to account for non-independence of multiple tests fromthe same animal using STATA 131 SE (College Station TXUSA) The independent variables included in analyses wereSpecimen Type (blood fecesrectal swabs guano oralrectalsample oralnasal swabs tissue urineurogenital swabs) HostTaxon (family sub-family genus) Host Age (adult sub-adultneonate) Season (wet dry) and Interface (animal use humanactivity land use pristine area) Season was designated as lsquowetrsquoor lsquodryrsquo according to month and proximity to the equator foreach country Interfaces were broadly grouped according to thetype and intensity of animal contact with people specifically (1)Land Use (animals sampled in areas with crops extractive in-dustries livestock activities) (2) Animal Use (hunting marketsrestaurants trade wild animal farms wildlife managementszoossanctuaries handling by veterinarianresearchers) (3)

Human Activities (ecotourism in and around human dwell-ings) and (4) Pristine (where animal human contact was notlikely) The model fit for each region was evaluated using theHosmerndashLemeshow goodness-of-fit test and pseudo R2 values

29 Virus lsquoHotspotrsquo Maps

The number of viral sequence clusters (richness) in each sub-clade was summarized by host family (Supplementary TableS1) and chi square tests used to evaluate whether viral sub-clade and host family were independent of each other Bat spe-cies richness maps were then created to infer hotspots of CoVdiversity For each viral clade spatial distribution data on all batspecies belonging to associated families was obtained from theInternational Union for Conservation of Nature (IUCN) UsingArcGIS version 10 (ESRI 2011) bat species were quantified bycounting overlapping polygons The resulting count data werethen converted to a raster and plotted All sampled bats as wellas all identified sequence clusters within the clade in questionwere then plotted over the raster data

3 Results31 Global diversity of CoVs

A total of 19192 animals and humans were assayed for thepresence of CoV by consensus PCR (cPCR) (Table 1) The majoritywere bats (nfrac14 12333) representing 282 species from twelvefamilies Overall the proportion of CoV positive individuals was86 in bats (nfrac14 106512333) and 02 in non-bats (nfrac14 176859) In other words over 98 of all positive individuals werebats

Partial sequences were obtained from two non-overlappingfragments of the orf1ab gene yielding 654 sequences for thelsquoQuanrsquo region and 950 for the lsquoWatanabersquo region (see lsquoMethodsrsquo)There was a 27 overlap where sequences were obtained for bothregions from the same sample The distribution of pairwise se-quence identities defined a 90 cut-off between taxa (Fig 1) andthe resulting monophyletic groups (in which all sequenceshad90 identity) used as our operating taxonomic units Groupsthat shared less than 90 identity to a known sequence were la-beled sequentially as PREDICT_CoV-1 -2 -3 etc while groups shar-ing90 identity to a sequence already in GenBank wereconsidered to be strains of a known virus and assigned the samename as the matching sequence (eg Kenya_bat_CoV_KY33 orHKU-9) Sequences that sharedgt90 identity but were found indifferent hosts were considered part of the same taxonomic unitBased on these criteria 100 discrete viral taxa were identifiedninety-one of which were found in bats (Table 1 Fig 2)Importantly we make no claims that these groups correspond tolsquospeciesrsquo as full orf1ab replicase sequences would be required forthat (King et al 2012) Instead we clarify that we are using thesepartial fragments to cluster sequences into lsquooperational taxonomicunitsrsquo (analogous to OTU clustering in microbiome research) Wealso stress that our cut-off was determined based on two discreteregions within the orf1ab separated by gt3000 nucleotides andthat these regions correspond to unique peptides post-transla-tional cleavage

Of note was the detection of subgroup 2a CoV sequences inbats humans and non-human primates from four differentcountries This sub-clade is largely considered the rodent sub-clade (Lau et al 2015 Wang et al 2015 Tsoleridis et al 2016)but here we detected sequences corresponding to the known vi-rus betacoronavirus-1 in Pteropus medius from Bangladesh

4 | Virus Evolution 2017 Vol 3 No 1

Pteropus alecto from Indonesia and from humans in China andsequences corresponding to murine coronavirus in non-humanprimates from Nepal (these sequences were detected by fourdifferent laboratories and no rodent samples were processed atthe same time) (Fig 2) In addition we report the discovery ofseveral sub-clade 2b and 2c CoV sequence clusters (the SARSand MERS sub-clades respectively) (Fig 2) and the detection ofhuman coronavirus 229E-like sequences in hipposideros andrhinolophus bats sampled in ROC Uganda Cameroon andGabon supporting previous suggestions that 229E has zoonoticorigins (Pfefferle et al 2009 Hu et al 2015) Finally we highlightthe detection of avian-associated infectious bronchitis virus-like sequences (IBV) in bats and a closely related virus(PREDICT_CoV-49) in non-human primates from Bangladesh aswell as porcine epidemic diarrhea virus (PEDV) in bats (Fig 2)Again we confirm that no livestock samples were processed inany of these laboratories that might have acted as a source ofcontamination

32 Factors driving viral diversity

Viral a-diversity (the lsquoeffective numberrsquo of species) was signifi-cantly correlated with bat a-diversity (taufrac14 0325 Pfrac14 0022) (Fig3) suggesting that more viruses will be found in regions wherebat diversity is higher This association was maintained whenviral richness rather than effective number of species wasused as the diversity index (taufrac14 0421 Plt 0001) Viral and batb-diversity were also significantly correlated (Mantel testrhofrac14 0575 Plt 0001) In both cases diversity differentiated al-most entirely into three discrete communities by region (Fig 3)Only Africa and Asia shared any viral sequence clusters(PREDICT_CoV-35 and HKU9 Fig 4) demonstrating that bats

have had a strong biogeographic influence on the global ecologyand evolution of CoVs

Co-phylogenetic reconciliation analysis was used to investi-gate the evolutionary mechanisms driving the virusndashhost asso-ciations observed for example host switching viral sharing orco-speciation (see lsquoMethodsrsquo) Overall host switching was thedominant mechanism followed by co-speciation (Fig 5) Whenanalyzed by region host switching (inter-genus transmission)remained dominant in Africa and Asia but not in LatinAmerica Despite fewer host-switching events in Latin Americathere was a concomitant increase in virus sharing (intra-genustransmission) This suggests that viruses are still switchinghosts in Latin America but preferentially move between closelyrelated species while in Africa and Asia viruses cross betweenmore distantly related species To assess the sensitivity of ourresults to our chosen method (ie use of a pre-defined costscheme in the program Jane) we repeated the analysis for theQuan alpha-CoV and beta-CoV reconstructions using CoRePA aparameter-adaptive approach that estimates an appropriatecost scheme without any prior value assignment Comparingthe results we noted 91 agreement between Jane and CoRePAin alpha-CoV reconstruction (thirty-four events of which thirty-one agreed) and 100 agreement in the beta-CoV reconstruc-tions (twenty-six events all of which agreed) We further notethat the cost scheme calculated in CoRePA was 07 for hostswitching and 01 for co-speciationmdashsupporting the default costscheme used in Jane (ie that host switching should be consid-ered more lsquocostlyrsquo than co-speciation)

A bipartite network model connecting viral sequence clus-ters to their hosts supports these results illustrating that batCoVs were connected to several host families in Africa and Asiawhile being largely restricted to a single bat family in Latin

Table 1 Summary of individuals tested and number positive for at least one coronavirus by host taxa

Host taxa tested No individuals tested No individuals positive No distinct viruses detected

Bats 12333 1065 91Non-Human Primates 3470 4 2Rodents and Shrews 3387 11 7Humans 1124 2 2Total 19192 1082 100a

aNote Numbers do not total as two viruses were detected in two taxa

Figure 1 Histogram of the relative frequency of pairwise sequence identities used to define the cutoff between operating taxonomic units for all CoV sequences de-

tected A bimodal distribution was observed and a cutoff of 90 sequence identity used to separate sequences from both the Watanabe (Panel A) and Quan (Panel B) as-

says into discrete viral taxa

S J Anthony et al | 5

Table 2 Variables associated with coronavirus positive bat tests in Africa Asia and Latin America Regional datasets for each logistic regres-sion model consisted of bat species where greater than fifty bats were tested Age datasets were a subset of the regional datasets as age classwas not determined for all individuals Numbers in bold are statically significant for Plt 005

Odds ratio P 95 Confidence intervals

AfricaSample type Lower Upper

Fecesrectal swab 35098 lt0001 14664 84007Oralnasal swab 505 lt0001 203 1254Oralrectal swab 3098 lt0001 1346 7130Blood 165 0540 033 809Tissue 100

Host familyHipposideridae 122 0559 062 241Pteropodidae 384 lt0001 227 649Molossidae 100

SeasonDry 242 lt0001 179 325Wet 100

InterfaceAnimal use 198 0001 135 291Pristine area 139 0624 037 529Land use 166 0219 074 374Human activity 100

AgeSubadult 591 lt0001 428 817Adult 100

AsiaSample type

Fecesrectal swab 6280 lt0001 1544 25549Oralnasal swab 1325 lt0001 311 5638Urineurogenital swab 4692 lt0001 1085 20280Guano 2212 lt0001 366 13356Tissue Oralrectal swab 100

Host familyEmballonuridae 031 026 004 237Miniopteridae 978 lt0001 569 1681Pteropodidae 216 lt0001 129 362Rhinolophidae 108 082 057 203Vespertilionidae 367 lt0001 218 618Hipposideridae 100

SeasonDry 149 003 103 216Wet 100

InterfaceAnimal use 330 001 129 840Human activity 348 001 136 895Pristine area 181 035 052 635Land use 100

AgeSubadult 184 000 126 267Adult 100

Latin AmericaSample type

Fecesrectal swab 1566 0000 502 4879Oralrectal swab 2786 0000 686 11322Oralnasal swab 100

Host subfamilyCarolliinae 695 006 090 5376Stenodermatinae 504 012 067 3789Glossophaginae 100

InterfaceAnimal use 525 001 148 1865Human activity 273 012 077 969Land use 422 010 078 2295Pristine area 100

SeasonDry 130 052 058 289Wet 100

AgeSubadult 173 015 083 362Adult 100

6 | Virus Evolution 2017 Vol 3 No 1

America (Fig 4 Panel B) Further host switching was nearlyfour times more likely than virus sharing in Africa when com-pared with Latin America (OR 3858 Pfrac14 0040) CoVs in Asiawere also more likely to host switch than share when comparedwith Latin America however the results were not significant(OR 2474 Pfrac14 0143) No particular bat family was associatedwith increased host switching but this may reflect small sam-ple sizes when the data are summarized by family

33 Estimated number of CoVs in bats

A large number of host (bat) species in our study were negativefor CoV (nfrac14 197) however none of these species were sampledextensively (Fig 6) We found that all species with samplesizesgt110 individuals were positive for one or more CoVs sug-gesting that we would have detected CoVs in some of the nega-tive species in our study if sampling effort were increased Dueto the high number of these negative species and their effect onthe average number of virus sequence clusters per species theywere excluded from our estimates By retaining only those spe-cies for whichgt110 individuals were sampled (there weretwenty-seven species that qualified) we estimated the averagenumber of CoVs per species to be 267 (stdfrac14 138) thus account-ing for the likely scenario that some species will have more vi-ruses and others less We then extrapolated the average to all1200 bat species to estimate a total potential richness of 3204

CoVs (rangefrac14 1200ndash6000 CoVs) most of which have yet to bedescribed

34 Factors predicting CoV positivity

In order to refine future surveillance efforts to find the undis-covered diversity of CoVs in bats we evaluated the factors pre-dicting CoV positivity For each region the best model includedspecimen type bat family (or in the case of Latin America sub-family) season and animalndashhuman interface (Table 2) Seasonwas not included in the top model for Latin America but themodel with the season included was within two AIC (deltaAICfrac14 106) and therefore considered to have as much support asthe top model (Burnham and Anderson 2002) Specimen typewas highly associated with CoV positivity and samples contain-ing feces or fecal swabs were significantly more likely to testpositive than other specimen types in all regions Bat familywas important in Asia and Africa Season was also significantwith samples collected during the dry season more likely to testpositive in Africa and Asia than those collected during the wetseason (albeit with low odds ratios) Age class appears to be im-portant as sub-adults were more likely to test positive thanadults in Africa and Asia Sex was not related to CoV positivityin Asia or Latin America while males were slightly more likelyto be CoV positive in Africa (ORfrac14 15 12ndash19 CI Plt 0001) Broadcategories of animalndashhuman interfaces were significantly

Figure 2 Maximum likelihood phylogenetic reconstructions for all partial CoV RdRp fragments for both the Quan (Panel A) and Watanabe (Panel B) assays Sequences

are collapsed into clades representing our operating taxonomic units (sequences sharing90 identity) and number of sequences for each taxon is indicated in paren-

theses Representative published sequences from GenBank have been included for comparison (GenBank accession number indicated) Both trees were rooted using

the related Breda virus (NC_007447) and all nodes have 60 bootstrap support Pie charts indicate the distribution of viral taxa by host (bat) family in each virus sub-

clade

S J Anthony et al | 7

associated with CoV positivity among bats in all three regionsin Africa and Latin America bats sampled at the animal use in-terface category were more likely to be positive and in Asiabats sampled at the animal use and human activities interfaceswere more likely to be positive than those sampled at other in-terfaces The significance of the animal interface in all three

regions suggests that practices around animal use may be im-portant for disease transmission

35 Future sampling effort

Based on our study we were able to estimate the sampling ef-fort that would be required to find all CoVs in bats based on a

Figure 3 Comparison of viral and bat diversity The earthrsquos surface was divided into grid cells by latitude and longitude (10 10 degree units) for diversity calculations

(Panel A) Grid cells where bats were sampled are numbered in each region Alpha diversity (Shannon H) for virus (Panel B) and host (Panel C) were correlated indicat-

ing that areas of high bat diversity also have high viral diversity Darker cells indicate higher alpha diversity (ie more viral or host taxa) in each grid cell Beta diversity

was also correlated between virus (Panel D) and host (Panel E) and differentiated into three discrete communities by regionmdashLatin America (grid cells 1ndash10) Africa

(grid cells 11ndash20) and Asia (grid cells 21ndash34) Shading indicates that either viruses (in red) or hosts (in blue) are shared between two corresponding grid cells with darker

cells indicating higher pairwise similarity

8 | Virus Evolution 2017 Vol 3 No 1

Poisson regression model of our data (Fig 6) With 154 individ-uals we estimate that an average of one CoV will be detected(95 CI 136ndash177) With 397 individuals we estimate that upto five CoVs will be detected (95 CI 351ndash458) As we did not

observe any species with greater than five viral sequenceclusters we make the assumption that sampling 397 individ-uals should capture the full diversity of CoVs in each batspecies

Figure 4 Network model showing the connection of CoVs and their hosts Viral sequence clusters (colored grey) are connected to host species either by region (Panel

A) or family (Panel B) Viral and host and communities separate almost entirely by region only Africa and Asia are connected by two shared viruses (HKU9 and

PREDICT_CoV-35) found in species from both continents Networks also show that viruses appear to be shared by multiple host families in Africa and Asia while being

more restricted to a single family in Latin America

S J Anthony et al | 9

Figure 5 Relative proportion of evolutionary events leading to observed virushost associations for CoVs in bats Cophylogenetic reconstructions were used to identify

each event (Supplementary Fig S1 Panels AndashD) and significance evaluated by region Across all regions host switching was the dominant evolutionary event (Panel

A) When separated by region host switching remained dominant in Africa (Panel B) and Asia (Panel C) but not in Latin America (Panel D)

Figure 6 Relationship between sampling effort and viral detection among all bat species sampled A Poisson regression model was used to estimate the expected num-

ber of viruses based on the number of animals sampled by species (each circle indicates a separate species) The 95 confidence intervals are indicated

10 | Virus Evolution 2017 Vol 3 No 1

36 Predicting the distribution of unknown CoVs

Viral sub-clade and host family were not independent of eachother (v2 by cladefrac14 Plt 0001 Supplementary Table S1) demon-strating that different clades have significant associations withspecific bat families For example subgroup 2d CoVs were sig-nificantly associated with pteropid bats and were only identi-fied in regions where pteropid bats exist Likewise subgroup 2bCoVs were associated with rhinolophid and hipposiderid batsand 2c with vespertilionid bats To lsquopredictrsquo the potential distri-bution of unknown CoVs we plotted the known distribution ofbats belonging to these families and assume that lsquohotspotsrsquo ofbat diversity infer hotspots of viral diversity for each sub-clade(Fig 7) We note that the alphaCoV genus has been consideredas one group since they do not resolve well into sub-clades(de Groot et al 2009) and that no 2a map was generatedgiven that only one bat virus from this clade was identified

4 Discussion

The emergence of SARS and MERS has driven a need to under-stand more about the diversity ecology and evolution of coro-naviruses particularly at so-called lsquohotspotsrsquo of zoonoticemergence (Jones et al 2008 Drexler et al 2014) To this end wesurveyed the diversity of CoVs from twenty countries in LatinAmerica Africa and Asia to identify global factors driving viraldiversity and to look for regional differences in factors that con-tribute to the risk of emergence such as host switching In totalwe identified sequences from 100 discrete phylogenetic clus-ters ninety-one of which were found in bats

Our data suggest that the diversity of bat CoVs has beendriven primarily by host ecology First viral richness wasstrongly correlated with bat richness suggesting that mostCoVs will be found in regions where bat diversity is highestSecond we showed that CoV diversity separates into three

Figure 7 Viral diversity lsquohotspotrsquo maps Panel A shows the potential hotspots for CoVs based on the distribution of bats worldwide Location of alpha CoV sequences

from this study are shown in black and beta CoV sequences in blue indicating there is no geographical bias based on viral genus (ie alpha and beta CoVs are equally

likely to be found in all regions) Some viral sub-clades were associated with particular bat families and the spatial distribution data of all species belonging to these

families were plotted to indicate the potential hotspots of viral diversity (richness) for these sub-clades Panel B indicates the potential distribution of 2b CoVs based on

the distribution of rhinolphus and hipposideros bats Locations of 2b-positive animals identified in this study are indicated in black and correlate with areas of high

species richness (for these families) CoV-positive animals for other sub-clades shown in light blue Panel C indicates the potential distribution of 2c CoVs based on the

distribution of vespertilionid bats Locations of 2c-positive animals identified in this study are indicated in black This map suggests there are hotspots of 2c diversity

in regions not covered in this study (eg Europe) Panel D indicates the potential distribution of 2d viruses based on the distribution of pteropid bats The map suggests

these viruses may have a more limited distribution compared with viruses of other sub-clades Locations of 2d-positive animals identified in this study are shown in

black

S J Anthony et al | 11

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 4: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

Rubin (1987) The analysis was repeated to explore variationbased on bat family (independent variable) Some families onlypossessed one type of event which would introduce quasi-complete separation into the GEE model so they were excludedfrom the corresponding models However the numbers of batspecies within these excluded families were very small (lt5)compared to the overall number of bat species for the wholestudy so we felt this would not impact any overall trends

27 Viral discovery effort

The relationship between sampling effort and viral discoverywas evaluated using a log-link Poisson regression model fittedwith the count of viruses for each species as the dependent vari-able and the number of animals sampled as the independentvariable This model is universally applied with count data Thecoefficient of the sample number variable is the log of the ratioof the expected number of viruses with (nthorn 1) samples collectedto the expected number of viruses with n samples collectedwhere n can be any arbitrary positive integer Using this esti-mated coefficient we calculated the estimated numbers of vi-ruses (viral sequence clusters) that can be found with respect tothe sampling effort (ie all possible numbers of collected ani-mals within a species) and they were then used to plot the esti-mated curve together with its 95 confidence band We clarifythat we use the term lsquoexpectedrsquo to indicate the statistical expec-tation based on the estimate of the regression model We ac-knowledge that the model fit is not as good as the model withan additional quadratic term of the independent variable none-theless it avoids the problem of over-fitting and realistically re-flects the relationship between sampling effort and viraldiscovery when the number of animals collected is relativelylower

28 Factors associated with CoV positivity

Due to the small numbers of positive individuals obtained forrodents non-human primates and humans only bat data wereanalyzed for factors associated with positivity A total of 12333bats were tested This included 5624 males and 5767 females(922 were not sexed) Among aged bats 7385 were adults 1029were sub-adults and 8 were neonates (3911 were not aged)Binary logistic regression models were used to evaluate signifi-cant variables restricting analyses to species for which 50 in-dividuals were tested Model selection was first performed in R(version 325) (R Foundation for Statistical Computing 2008) us-ing the R package lsquogmultirsquo and the step function and the bestmodel determined by AIC A robust estimation of standard errorwas calculated by clustering by individual Animal Identificationnumber to account for non-independence of multiple tests fromthe same animal using STATA 131 SE (College Station TXUSA) The independent variables included in analyses wereSpecimen Type (blood fecesrectal swabs guano oralrectalsample oralnasal swabs tissue urineurogenital swabs) HostTaxon (family sub-family genus) Host Age (adult sub-adultneonate) Season (wet dry) and Interface (animal use humanactivity land use pristine area) Season was designated as lsquowetrsquoor lsquodryrsquo according to month and proximity to the equator foreach country Interfaces were broadly grouped according to thetype and intensity of animal contact with people specifically (1)Land Use (animals sampled in areas with crops extractive in-dustries livestock activities) (2) Animal Use (hunting marketsrestaurants trade wild animal farms wildlife managementszoossanctuaries handling by veterinarianresearchers) (3)

Human Activities (ecotourism in and around human dwell-ings) and (4) Pristine (where animal human contact was notlikely) The model fit for each region was evaluated using theHosmerndashLemeshow goodness-of-fit test and pseudo R2 values

29 Virus lsquoHotspotrsquo Maps

The number of viral sequence clusters (richness) in each sub-clade was summarized by host family (Supplementary TableS1) and chi square tests used to evaluate whether viral sub-clade and host family were independent of each other Bat spe-cies richness maps were then created to infer hotspots of CoVdiversity For each viral clade spatial distribution data on all batspecies belonging to associated families was obtained from theInternational Union for Conservation of Nature (IUCN) UsingArcGIS version 10 (ESRI 2011) bat species were quantified bycounting overlapping polygons The resulting count data werethen converted to a raster and plotted All sampled bats as wellas all identified sequence clusters within the clade in questionwere then plotted over the raster data

3 Results31 Global diversity of CoVs

A total of 19192 animals and humans were assayed for thepresence of CoV by consensus PCR (cPCR) (Table 1) The majoritywere bats (nfrac14 12333) representing 282 species from twelvefamilies Overall the proportion of CoV positive individuals was86 in bats (nfrac14 106512333) and 02 in non-bats (nfrac14 176859) In other words over 98 of all positive individuals werebats

Partial sequences were obtained from two non-overlappingfragments of the orf1ab gene yielding 654 sequences for thelsquoQuanrsquo region and 950 for the lsquoWatanabersquo region (see lsquoMethodsrsquo)There was a 27 overlap where sequences were obtained for bothregions from the same sample The distribution of pairwise se-quence identities defined a 90 cut-off between taxa (Fig 1) andthe resulting monophyletic groups (in which all sequenceshad90 identity) used as our operating taxonomic units Groupsthat shared less than 90 identity to a known sequence were la-beled sequentially as PREDICT_CoV-1 -2 -3 etc while groups shar-ing90 identity to a sequence already in GenBank wereconsidered to be strains of a known virus and assigned the samename as the matching sequence (eg Kenya_bat_CoV_KY33 orHKU-9) Sequences that sharedgt90 identity but were found indifferent hosts were considered part of the same taxonomic unitBased on these criteria 100 discrete viral taxa were identifiedninety-one of which were found in bats (Table 1 Fig 2)Importantly we make no claims that these groups correspond tolsquospeciesrsquo as full orf1ab replicase sequences would be required forthat (King et al 2012) Instead we clarify that we are using thesepartial fragments to cluster sequences into lsquooperational taxonomicunitsrsquo (analogous to OTU clustering in microbiome research) Wealso stress that our cut-off was determined based on two discreteregions within the orf1ab separated by gt3000 nucleotides andthat these regions correspond to unique peptides post-transla-tional cleavage

Of note was the detection of subgroup 2a CoV sequences inbats humans and non-human primates from four differentcountries This sub-clade is largely considered the rodent sub-clade (Lau et al 2015 Wang et al 2015 Tsoleridis et al 2016)but here we detected sequences corresponding to the known vi-rus betacoronavirus-1 in Pteropus medius from Bangladesh

4 | Virus Evolution 2017 Vol 3 No 1

Pteropus alecto from Indonesia and from humans in China andsequences corresponding to murine coronavirus in non-humanprimates from Nepal (these sequences were detected by fourdifferent laboratories and no rodent samples were processed atthe same time) (Fig 2) In addition we report the discovery ofseveral sub-clade 2b and 2c CoV sequence clusters (the SARSand MERS sub-clades respectively) (Fig 2) and the detection ofhuman coronavirus 229E-like sequences in hipposideros andrhinolophus bats sampled in ROC Uganda Cameroon andGabon supporting previous suggestions that 229E has zoonoticorigins (Pfefferle et al 2009 Hu et al 2015) Finally we highlightthe detection of avian-associated infectious bronchitis virus-like sequences (IBV) in bats and a closely related virus(PREDICT_CoV-49) in non-human primates from Bangladesh aswell as porcine epidemic diarrhea virus (PEDV) in bats (Fig 2)Again we confirm that no livestock samples were processed inany of these laboratories that might have acted as a source ofcontamination

32 Factors driving viral diversity

Viral a-diversity (the lsquoeffective numberrsquo of species) was signifi-cantly correlated with bat a-diversity (taufrac14 0325 Pfrac14 0022) (Fig3) suggesting that more viruses will be found in regions wherebat diversity is higher This association was maintained whenviral richness rather than effective number of species wasused as the diversity index (taufrac14 0421 Plt 0001) Viral and batb-diversity were also significantly correlated (Mantel testrhofrac14 0575 Plt 0001) In both cases diversity differentiated al-most entirely into three discrete communities by region (Fig 3)Only Africa and Asia shared any viral sequence clusters(PREDICT_CoV-35 and HKU9 Fig 4) demonstrating that bats

have had a strong biogeographic influence on the global ecologyand evolution of CoVs

Co-phylogenetic reconciliation analysis was used to investi-gate the evolutionary mechanisms driving the virusndashhost asso-ciations observed for example host switching viral sharing orco-speciation (see lsquoMethodsrsquo) Overall host switching was thedominant mechanism followed by co-speciation (Fig 5) Whenanalyzed by region host switching (inter-genus transmission)remained dominant in Africa and Asia but not in LatinAmerica Despite fewer host-switching events in Latin Americathere was a concomitant increase in virus sharing (intra-genustransmission) This suggests that viruses are still switchinghosts in Latin America but preferentially move between closelyrelated species while in Africa and Asia viruses cross betweenmore distantly related species To assess the sensitivity of ourresults to our chosen method (ie use of a pre-defined costscheme in the program Jane) we repeated the analysis for theQuan alpha-CoV and beta-CoV reconstructions using CoRePA aparameter-adaptive approach that estimates an appropriatecost scheme without any prior value assignment Comparingthe results we noted 91 agreement between Jane and CoRePAin alpha-CoV reconstruction (thirty-four events of which thirty-one agreed) and 100 agreement in the beta-CoV reconstruc-tions (twenty-six events all of which agreed) We further notethat the cost scheme calculated in CoRePA was 07 for hostswitching and 01 for co-speciationmdashsupporting the default costscheme used in Jane (ie that host switching should be consid-ered more lsquocostlyrsquo than co-speciation)

A bipartite network model connecting viral sequence clus-ters to their hosts supports these results illustrating that batCoVs were connected to several host families in Africa and Asiawhile being largely restricted to a single bat family in Latin

Table 1 Summary of individuals tested and number positive for at least one coronavirus by host taxa

Host taxa tested No individuals tested No individuals positive No distinct viruses detected

Bats 12333 1065 91Non-Human Primates 3470 4 2Rodents and Shrews 3387 11 7Humans 1124 2 2Total 19192 1082 100a

aNote Numbers do not total as two viruses were detected in two taxa

Figure 1 Histogram of the relative frequency of pairwise sequence identities used to define the cutoff between operating taxonomic units for all CoV sequences de-

tected A bimodal distribution was observed and a cutoff of 90 sequence identity used to separate sequences from both the Watanabe (Panel A) and Quan (Panel B) as-

says into discrete viral taxa

S J Anthony et al | 5

Table 2 Variables associated with coronavirus positive bat tests in Africa Asia and Latin America Regional datasets for each logistic regres-sion model consisted of bat species where greater than fifty bats were tested Age datasets were a subset of the regional datasets as age classwas not determined for all individuals Numbers in bold are statically significant for Plt 005

Odds ratio P 95 Confidence intervals

AfricaSample type Lower Upper

Fecesrectal swab 35098 lt0001 14664 84007Oralnasal swab 505 lt0001 203 1254Oralrectal swab 3098 lt0001 1346 7130Blood 165 0540 033 809Tissue 100

Host familyHipposideridae 122 0559 062 241Pteropodidae 384 lt0001 227 649Molossidae 100

SeasonDry 242 lt0001 179 325Wet 100

InterfaceAnimal use 198 0001 135 291Pristine area 139 0624 037 529Land use 166 0219 074 374Human activity 100

AgeSubadult 591 lt0001 428 817Adult 100

AsiaSample type

Fecesrectal swab 6280 lt0001 1544 25549Oralnasal swab 1325 lt0001 311 5638Urineurogenital swab 4692 lt0001 1085 20280Guano 2212 lt0001 366 13356Tissue Oralrectal swab 100

Host familyEmballonuridae 031 026 004 237Miniopteridae 978 lt0001 569 1681Pteropodidae 216 lt0001 129 362Rhinolophidae 108 082 057 203Vespertilionidae 367 lt0001 218 618Hipposideridae 100

SeasonDry 149 003 103 216Wet 100

InterfaceAnimal use 330 001 129 840Human activity 348 001 136 895Pristine area 181 035 052 635Land use 100

AgeSubadult 184 000 126 267Adult 100

Latin AmericaSample type

Fecesrectal swab 1566 0000 502 4879Oralrectal swab 2786 0000 686 11322Oralnasal swab 100

Host subfamilyCarolliinae 695 006 090 5376Stenodermatinae 504 012 067 3789Glossophaginae 100

InterfaceAnimal use 525 001 148 1865Human activity 273 012 077 969Land use 422 010 078 2295Pristine area 100

SeasonDry 130 052 058 289Wet 100

AgeSubadult 173 015 083 362Adult 100

6 | Virus Evolution 2017 Vol 3 No 1

America (Fig 4 Panel B) Further host switching was nearlyfour times more likely than virus sharing in Africa when com-pared with Latin America (OR 3858 Pfrac14 0040) CoVs in Asiawere also more likely to host switch than share when comparedwith Latin America however the results were not significant(OR 2474 Pfrac14 0143) No particular bat family was associatedwith increased host switching but this may reflect small sam-ple sizes when the data are summarized by family

33 Estimated number of CoVs in bats

A large number of host (bat) species in our study were negativefor CoV (nfrac14 197) however none of these species were sampledextensively (Fig 6) We found that all species with samplesizesgt110 individuals were positive for one or more CoVs sug-gesting that we would have detected CoVs in some of the nega-tive species in our study if sampling effort were increased Dueto the high number of these negative species and their effect onthe average number of virus sequence clusters per species theywere excluded from our estimates By retaining only those spe-cies for whichgt110 individuals were sampled (there weretwenty-seven species that qualified) we estimated the averagenumber of CoVs per species to be 267 (stdfrac14 138) thus account-ing for the likely scenario that some species will have more vi-ruses and others less We then extrapolated the average to all1200 bat species to estimate a total potential richness of 3204

CoVs (rangefrac14 1200ndash6000 CoVs) most of which have yet to bedescribed

34 Factors predicting CoV positivity

In order to refine future surveillance efforts to find the undis-covered diversity of CoVs in bats we evaluated the factors pre-dicting CoV positivity For each region the best model includedspecimen type bat family (or in the case of Latin America sub-family) season and animalndashhuman interface (Table 2) Seasonwas not included in the top model for Latin America but themodel with the season included was within two AIC (deltaAICfrac14 106) and therefore considered to have as much support asthe top model (Burnham and Anderson 2002) Specimen typewas highly associated with CoV positivity and samples contain-ing feces or fecal swabs were significantly more likely to testpositive than other specimen types in all regions Bat familywas important in Asia and Africa Season was also significantwith samples collected during the dry season more likely to testpositive in Africa and Asia than those collected during the wetseason (albeit with low odds ratios) Age class appears to be im-portant as sub-adults were more likely to test positive thanadults in Africa and Asia Sex was not related to CoV positivityin Asia or Latin America while males were slightly more likelyto be CoV positive in Africa (ORfrac14 15 12ndash19 CI Plt 0001) Broadcategories of animalndashhuman interfaces were significantly

Figure 2 Maximum likelihood phylogenetic reconstructions for all partial CoV RdRp fragments for both the Quan (Panel A) and Watanabe (Panel B) assays Sequences

are collapsed into clades representing our operating taxonomic units (sequences sharing90 identity) and number of sequences for each taxon is indicated in paren-

theses Representative published sequences from GenBank have been included for comparison (GenBank accession number indicated) Both trees were rooted using

the related Breda virus (NC_007447) and all nodes have 60 bootstrap support Pie charts indicate the distribution of viral taxa by host (bat) family in each virus sub-

clade

S J Anthony et al | 7

associated with CoV positivity among bats in all three regionsin Africa and Latin America bats sampled at the animal use in-terface category were more likely to be positive and in Asiabats sampled at the animal use and human activities interfaceswere more likely to be positive than those sampled at other in-terfaces The significance of the animal interface in all three

regions suggests that practices around animal use may be im-portant for disease transmission

35 Future sampling effort

Based on our study we were able to estimate the sampling ef-fort that would be required to find all CoVs in bats based on a

Figure 3 Comparison of viral and bat diversity The earthrsquos surface was divided into grid cells by latitude and longitude (10 10 degree units) for diversity calculations

(Panel A) Grid cells where bats were sampled are numbered in each region Alpha diversity (Shannon H) for virus (Panel B) and host (Panel C) were correlated indicat-

ing that areas of high bat diversity also have high viral diversity Darker cells indicate higher alpha diversity (ie more viral or host taxa) in each grid cell Beta diversity

was also correlated between virus (Panel D) and host (Panel E) and differentiated into three discrete communities by regionmdashLatin America (grid cells 1ndash10) Africa

(grid cells 11ndash20) and Asia (grid cells 21ndash34) Shading indicates that either viruses (in red) or hosts (in blue) are shared between two corresponding grid cells with darker

cells indicating higher pairwise similarity

8 | Virus Evolution 2017 Vol 3 No 1

Poisson regression model of our data (Fig 6) With 154 individ-uals we estimate that an average of one CoV will be detected(95 CI 136ndash177) With 397 individuals we estimate that upto five CoVs will be detected (95 CI 351ndash458) As we did not

observe any species with greater than five viral sequenceclusters we make the assumption that sampling 397 individ-uals should capture the full diversity of CoVs in each batspecies

Figure 4 Network model showing the connection of CoVs and their hosts Viral sequence clusters (colored grey) are connected to host species either by region (Panel

A) or family (Panel B) Viral and host and communities separate almost entirely by region only Africa and Asia are connected by two shared viruses (HKU9 and

PREDICT_CoV-35) found in species from both continents Networks also show that viruses appear to be shared by multiple host families in Africa and Asia while being

more restricted to a single family in Latin America

S J Anthony et al | 9

Figure 5 Relative proportion of evolutionary events leading to observed virushost associations for CoVs in bats Cophylogenetic reconstructions were used to identify

each event (Supplementary Fig S1 Panels AndashD) and significance evaluated by region Across all regions host switching was the dominant evolutionary event (Panel

A) When separated by region host switching remained dominant in Africa (Panel B) and Asia (Panel C) but not in Latin America (Panel D)

Figure 6 Relationship between sampling effort and viral detection among all bat species sampled A Poisson regression model was used to estimate the expected num-

ber of viruses based on the number of animals sampled by species (each circle indicates a separate species) The 95 confidence intervals are indicated

10 | Virus Evolution 2017 Vol 3 No 1

36 Predicting the distribution of unknown CoVs

Viral sub-clade and host family were not independent of eachother (v2 by cladefrac14 Plt 0001 Supplementary Table S1) demon-strating that different clades have significant associations withspecific bat families For example subgroup 2d CoVs were sig-nificantly associated with pteropid bats and were only identi-fied in regions where pteropid bats exist Likewise subgroup 2bCoVs were associated with rhinolophid and hipposiderid batsand 2c with vespertilionid bats To lsquopredictrsquo the potential distri-bution of unknown CoVs we plotted the known distribution ofbats belonging to these families and assume that lsquohotspotsrsquo ofbat diversity infer hotspots of viral diversity for each sub-clade(Fig 7) We note that the alphaCoV genus has been consideredas one group since they do not resolve well into sub-clades(de Groot et al 2009) and that no 2a map was generatedgiven that only one bat virus from this clade was identified

4 Discussion

The emergence of SARS and MERS has driven a need to under-stand more about the diversity ecology and evolution of coro-naviruses particularly at so-called lsquohotspotsrsquo of zoonoticemergence (Jones et al 2008 Drexler et al 2014) To this end wesurveyed the diversity of CoVs from twenty countries in LatinAmerica Africa and Asia to identify global factors driving viraldiversity and to look for regional differences in factors that con-tribute to the risk of emergence such as host switching In totalwe identified sequences from 100 discrete phylogenetic clus-ters ninety-one of which were found in bats

Our data suggest that the diversity of bat CoVs has beendriven primarily by host ecology First viral richness wasstrongly correlated with bat richness suggesting that mostCoVs will be found in regions where bat diversity is highestSecond we showed that CoV diversity separates into three

Figure 7 Viral diversity lsquohotspotrsquo maps Panel A shows the potential hotspots for CoVs based on the distribution of bats worldwide Location of alpha CoV sequences

from this study are shown in black and beta CoV sequences in blue indicating there is no geographical bias based on viral genus (ie alpha and beta CoVs are equally

likely to be found in all regions) Some viral sub-clades were associated with particular bat families and the spatial distribution data of all species belonging to these

families were plotted to indicate the potential hotspots of viral diversity (richness) for these sub-clades Panel B indicates the potential distribution of 2b CoVs based on

the distribution of rhinolphus and hipposideros bats Locations of 2b-positive animals identified in this study are indicated in black and correlate with areas of high

species richness (for these families) CoV-positive animals for other sub-clades shown in light blue Panel C indicates the potential distribution of 2c CoVs based on the

distribution of vespertilionid bats Locations of 2c-positive animals identified in this study are indicated in black This map suggests there are hotspots of 2c diversity

in regions not covered in this study (eg Europe) Panel D indicates the potential distribution of 2d viruses based on the distribution of pteropid bats The map suggests

these viruses may have a more limited distribution compared with viruses of other sub-clades Locations of 2d-positive animals identified in this study are shown in

black

S J Anthony et al | 11

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 5: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

Pteropus alecto from Indonesia and from humans in China andsequences corresponding to murine coronavirus in non-humanprimates from Nepal (these sequences were detected by fourdifferent laboratories and no rodent samples were processed atthe same time) (Fig 2) In addition we report the discovery ofseveral sub-clade 2b and 2c CoV sequence clusters (the SARSand MERS sub-clades respectively) (Fig 2) and the detection ofhuman coronavirus 229E-like sequences in hipposideros andrhinolophus bats sampled in ROC Uganda Cameroon andGabon supporting previous suggestions that 229E has zoonoticorigins (Pfefferle et al 2009 Hu et al 2015) Finally we highlightthe detection of avian-associated infectious bronchitis virus-like sequences (IBV) in bats and a closely related virus(PREDICT_CoV-49) in non-human primates from Bangladesh aswell as porcine epidemic diarrhea virus (PEDV) in bats (Fig 2)Again we confirm that no livestock samples were processed inany of these laboratories that might have acted as a source ofcontamination

32 Factors driving viral diversity

Viral a-diversity (the lsquoeffective numberrsquo of species) was signifi-cantly correlated with bat a-diversity (taufrac14 0325 Pfrac14 0022) (Fig3) suggesting that more viruses will be found in regions wherebat diversity is higher This association was maintained whenviral richness rather than effective number of species wasused as the diversity index (taufrac14 0421 Plt 0001) Viral and batb-diversity were also significantly correlated (Mantel testrhofrac14 0575 Plt 0001) In both cases diversity differentiated al-most entirely into three discrete communities by region (Fig 3)Only Africa and Asia shared any viral sequence clusters(PREDICT_CoV-35 and HKU9 Fig 4) demonstrating that bats

have had a strong biogeographic influence on the global ecologyand evolution of CoVs

Co-phylogenetic reconciliation analysis was used to investi-gate the evolutionary mechanisms driving the virusndashhost asso-ciations observed for example host switching viral sharing orco-speciation (see lsquoMethodsrsquo) Overall host switching was thedominant mechanism followed by co-speciation (Fig 5) Whenanalyzed by region host switching (inter-genus transmission)remained dominant in Africa and Asia but not in LatinAmerica Despite fewer host-switching events in Latin Americathere was a concomitant increase in virus sharing (intra-genustransmission) This suggests that viruses are still switchinghosts in Latin America but preferentially move between closelyrelated species while in Africa and Asia viruses cross betweenmore distantly related species To assess the sensitivity of ourresults to our chosen method (ie use of a pre-defined costscheme in the program Jane) we repeated the analysis for theQuan alpha-CoV and beta-CoV reconstructions using CoRePA aparameter-adaptive approach that estimates an appropriatecost scheme without any prior value assignment Comparingthe results we noted 91 agreement between Jane and CoRePAin alpha-CoV reconstruction (thirty-four events of which thirty-one agreed) and 100 agreement in the beta-CoV reconstruc-tions (twenty-six events all of which agreed) We further notethat the cost scheme calculated in CoRePA was 07 for hostswitching and 01 for co-speciationmdashsupporting the default costscheme used in Jane (ie that host switching should be consid-ered more lsquocostlyrsquo than co-speciation)

A bipartite network model connecting viral sequence clus-ters to their hosts supports these results illustrating that batCoVs were connected to several host families in Africa and Asiawhile being largely restricted to a single bat family in Latin

Table 1 Summary of individuals tested and number positive for at least one coronavirus by host taxa

Host taxa tested No individuals tested No individuals positive No distinct viruses detected

Bats 12333 1065 91Non-Human Primates 3470 4 2Rodents and Shrews 3387 11 7Humans 1124 2 2Total 19192 1082 100a

aNote Numbers do not total as two viruses were detected in two taxa

Figure 1 Histogram of the relative frequency of pairwise sequence identities used to define the cutoff between operating taxonomic units for all CoV sequences de-

tected A bimodal distribution was observed and a cutoff of 90 sequence identity used to separate sequences from both the Watanabe (Panel A) and Quan (Panel B) as-

says into discrete viral taxa

S J Anthony et al | 5

Table 2 Variables associated with coronavirus positive bat tests in Africa Asia and Latin America Regional datasets for each logistic regres-sion model consisted of bat species where greater than fifty bats were tested Age datasets were a subset of the regional datasets as age classwas not determined for all individuals Numbers in bold are statically significant for Plt 005

Odds ratio P 95 Confidence intervals

AfricaSample type Lower Upper

Fecesrectal swab 35098 lt0001 14664 84007Oralnasal swab 505 lt0001 203 1254Oralrectal swab 3098 lt0001 1346 7130Blood 165 0540 033 809Tissue 100

Host familyHipposideridae 122 0559 062 241Pteropodidae 384 lt0001 227 649Molossidae 100

SeasonDry 242 lt0001 179 325Wet 100

InterfaceAnimal use 198 0001 135 291Pristine area 139 0624 037 529Land use 166 0219 074 374Human activity 100

AgeSubadult 591 lt0001 428 817Adult 100

AsiaSample type

Fecesrectal swab 6280 lt0001 1544 25549Oralnasal swab 1325 lt0001 311 5638Urineurogenital swab 4692 lt0001 1085 20280Guano 2212 lt0001 366 13356Tissue Oralrectal swab 100

Host familyEmballonuridae 031 026 004 237Miniopteridae 978 lt0001 569 1681Pteropodidae 216 lt0001 129 362Rhinolophidae 108 082 057 203Vespertilionidae 367 lt0001 218 618Hipposideridae 100

SeasonDry 149 003 103 216Wet 100

InterfaceAnimal use 330 001 129 840Human activity 348 001 136 895Pristine area 181 035 052 635Land use 100

AgeSubadult 184 000 126 267Adult 100

Latin AmericaSample type

Fecesrectal swab 1566 0000 502 4879Oralrectal swab 2786 0000 686 11322Oralnasal swab 100

Host subfamilyCarolliinae 695 006 090 5376Stenodermatinae 504 012 067 3789Glossophaginae 100

InterfaceAnimal use 525 001 148 1865Human activity 273 012 077 969Land use 422 010 078 2295Pristine area 100

SeasonDry 130 052 058 289Wet 100

AgeSubadult 173 015 083 362Adult 100

6 | Virus Evolution 2017 Vol 3 No 1

America (Fig 4 Panel B) Further host switching was nearlyfour times more likely than virus sharing in Africa when com-pared with Latin America (OR 3858 Pfrac14 0040) CoVs in Asiawere also more likely to host switch than share when comparedwith Latin America however the results were not significant(OR 2474 Pfrac14 0143) No particular bat family was associatedwith increased host switching but this may reflect small sam-ple sizes when the data are summarized by family

33 Estimated number of CoVs in bats

A large number of host (bat) species in our study were negativefor CoV (nfrac14 197) however none of these species were sampledextensively (Fig 6) We found that all species with samplesizesgt110 individuals were positive for one or more CoVs sug-gesting that we would have detected CoVs in some of the nega-tive species in our study if sampling effort were increased Dueto the high number of these negative species and their effect onthe average number of virus sequence clusters per species theywere excluded from our estimates By retaining only those spe-cies for whichgt110 individuals were sampled (there weretwenty-seven species that qualified) we estimated the averagenumber of CoVs per species to be 267 (stdfrac14 138) thus account-ing for the likely scenario that some species will have more vi-ruses and others less We then extrapolated the average to all1200 bat species to estimate a total potential richness of 3204

CoVs (rangefrac14 1200ndash6000 CoVs) most of which have yet to bedescribed

34 Factors predicting CoV positivity

In order to refine future surveillance efforts to find the undis-covered diversity of CoVs in bats we evaluated the factors pre-dicting CoV positivity For each region the best model includedspecimen type bat family (or in the case of Latin America sub-family) season and animalndashhuman interface (Table 2) Seasonwas not included in the top model for Latin America but themodel with the season included was within two AIC (deltaAICfrac14 106) and therefore considered to have as much support asthe top model (Burnham and Anderson 2002) Specimen typewas highly associated with CoV positivity and samples contain-ing feces or fecal swabs were significantly more likely to testpositive than other specimen types in all regions Bat familywas important in Asia and Africa Season was also significantwith samples collected during the dry season more likely to testpositive in Africa and Asia than those collected during the wetseason (albeit with low odds ratios) Age class appears to be im-portant as sub-adults were more likely to test positive thanadults in Africa and Asia Sex was not related to CoV positivityin Asia or Latin America while males were slightly more likelyto be CoV positive in Africa (ORfrac14 15 12ndash19 CI Plt 0001) Broadcategories of animalndashhuman interfaces were significantly

Figure 2 Maximum likelihood phylogenetic reconstructions for all partial CoV RdRp fragments for both the Quan (Panel A) and Watanabe (Panel B) assays Sequences

are collapsed into clades representing our operating taxonomic units (sequences sharing90 identity) and number of sequences for each taxon is indicated in paren-

theses Representative published sequences from GenBank have been included for comparison (GenBank accession number indicated) Both trees were rooted using

the related Breda virus (NC_007447) and all nodes have 60 bootstrap support Pie charts indicate the distribution of viral taxa by host (bat) family in each virus sub-

clade

S J Anthony et al | 7

associated with CoV positivity among bats in all three regionsin Africa and Latin America bats sampled at the animal use in-terface category were more likely to be positive and in Asiabats sampled at the animal use and human activities interfaceswere more likely to be positive than those sampled at other in-terfaces The significance of the animal interface in all three

regions suggests that practices around animal use may be im-portant for disease transmission

35 Future sampling effort

Based on our study we were able to estimate the sampling ef-fort that would be required to find all CoVs in bats based on a

Figure 3 Comparison of viral and bat diversity The earthrsquos surface was divided into grid cells by latitude and longitude (10 10 degree units) for diversity calculations

(Panel A) Grid cells where bats were sampled are numbered in each region Alpha diversity (Shannon H) for virus (Panel B) and host (Panel C) were correlated indicat-

ing that areas of high bat diversity also have high viral diversity Darker cells indicate higher alpha diversity (ie more viral or host taxa) in each grid cell Beta diversity

was also correlated between virus (Panel D) and host (Panel E) and differentiated into three discrete communities by regionmdashLatin America (grid cells 1ndash10) Africa

(grid cells 11ndash20) and Asia (grid cells 21ndash34) Shading indicates that either viruses (in red) or hosts (in blue) are shared between two corresponding grid cells with darker

cells indicating higher pairwise similarity

8 | Virus Evolution 2017 Vol 3 No 1

Poisson regression model of our data (Fig 6) With 154 individ-uals we estimate that an average of one CoV will be detected(95 CI 136ndash177) With 397 individuals we estimate that upto five CoVs will be detected (95 CI 351ndash458) As we did not

observe any species with greater than five viral sequenceclusters we make the assumption that sampling 397 individ-uals should capture the full diversity of CoVs in each batspecies

Figure 4 Network model showing the connection of CoVs and their hosts Viral sequence clusters (colored grey) are connected to host species either by region (Panel

A) or family (Panel B) Viral and host and communities separate almost entirely by region only Africa and Asia are connected by two shared viruses (HKU9 and

PREDICT_CoV-35) found in species from both continents Networks also show that viruses appear to be shared by multiple host families in Africa and Asia while being

more restricted to a single family in Latin America

S J Anthony et al | 9

Figure 5 Relative proportion of evolutionary events leading to observed virushost associations for CoVs in bats Cophylogenetic reconstructions were used to identify

each event (Supplementary Fig S1 Panels AndashD) and significance evaluated by region Across all regions host switching was the dominant evolutionary event (Panel

A) When separated by region host switching remained dominant in Africa (Panel B) and Asia (Panel C) but not in Latin America (Panel D)

Figure 6 Relationship between sampling effort and viral detection among all bat species sampled A Poisson regression model was used to estimate the expected num-

ber of viruses based on the number of animals sampled by species (each circle indicates a separate species) The 95 confidence intervals are indicated

10 | Virus Evolution 2017 Vol 3 No 1

36 Predicting the distribution of unknown CoVs

Viral sub-clade and host family were not independent of eachother (v2 by cladefrac14 Plt 0001 Supplementary Table S1) demon-strating that different clades have significant associations withspecific bat families For example subgroup 2d CoVs were sig-nificantly associated with pteropid bats and were only identi-fied in regions where pteropid bats exist Likewise subgroup 2bCoVs were associated with rhinolophid and hipposiderid batsand 2c with vespertilionid bats To lsquopredictrsquo the potential distri-bution of unknown CoVs we plotted the known distribution ofbats belonging to these families and assume that lsquohotspotsrsquo ofbat diversity infer hotspots of viral diversity for each sub-clade(Fig 7) We note that the alphaCoV genus has been consideredas one group since they do not resolve well into sub-clades(de Groot et al 2009) and that no 2a map was generatedgiven that only one bat virus from this clade was identified

4 Discussion

The emergence of SARS and MERS has driven a need to under-stand more about the diversity ecology and evolution of coro-naviruses particularly at so-called lsquohotspotsrsquo of zoonoticemergence (Jones et al 2008 Drexler et al 2014) To this end wesurveyed the diversity of CoVs from twenty countries in LatinAmerica Africa and Asia to identify global factors driving viraldiversity and to look for regional differences in factors that con-tribute to the risk of emergence such as host switching In totalwe identified sequences from 100 discrete phylogenetic clus-ters ninety-one of which were found in bats

Our data suggest that the diversity of bat CoVs has beendriven primarily by host ecology First viral richness wasstrongly correlated with bat richness suggesting that mostCoVs will be found in regions where bat diversity is highestSecond we showed that CoV diversity separates into three

Figure 7 Viral diversity lsquohotspotrsquo maps Panel A shows the potential hotspots for CoVs based on the distribution of bats worldwide Location of alpha CoV sequences

from this study are shown in black and beta CoV sequences in blue indicating there is no geographical bias based on viral genus (ie alpha and beta CoVs are equally

likely to be found in all regions) Some viral sub-clades were associated with particular bat families and the spatial distribution data of all species belonging to these

families were plotted to indicate the potential hotspots of viral diversity (richness) for these sub-clades Panel B indicates the potential distribution of 2b CoVs based on

the distribution of rhinolphus and hipposideros bats Locations of 2b-positive animals identified in this study are indicated in black and correlate with areas of high

species richness (for these families) CoV-positive animals for other sub-clades shown in light blue Panel C indicates the potential distribution of 2c CoVs based on the

distribution of vespertilionid bats Locations of 2c-positive animals identified in this study are indicated in black This map suggests there are hotspots of 2c diversity

in regions not covered in this study (eg Europe) Panel D indicates the potential distribution of 2d viruses based on the distribution of pteropid bats The map suggests

these viruses may have a more limited distribution compared with viruses of other sub-clades Locations of 2d-positive animals identified in this study are shown in

black

S J Anthony et al | 11

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 6: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

Table 2 Variables associated with coronavirus positive bat tests in Africa Asia and Latin America Regional datasets for each logistic regres-sion model consisted of bat species where greater than fifty bats were tested Age datasets were a subset of the regional datasets as age classwas not determined for all individuals Numbers in bold are statically significant for Plt 005

Odds ratio P 95 Confidence intervals

AfricaSample type Lower Upper

Fecesrectal swab 35098 lt0001 14664 84007Oralnasal swab 505 lt0001 203 1254Oralrectal swab 3098 lt0001 1346 7130Blood 165 0540 033 809Tissue 100

Host familyHipposideridae 122 0559 062 241Pteropodidae 384 lt0001 227 649Molossidae 100

SeasonDry 242 lt0001 179 325Wet 100

InterfaceAnimal use 198 0001 135 291Pristine area 139 0624 037 529Land use 166 0219 074 374Human activity 100

AgeSubadult 591 lt0001 428 817Adult 100

AsiaSample type

Fecesrectal swab 6280 lt0001 1544 25549Oralnasal swab 1325 lt0001 311 5638Urineurogenital swab 4692 lt0001 1085 20280Guano 2212 lt0001 366 13356Tissue Oralrectal swab 100

Host familyEmballonuridae 031 026 004 237Miniopteridae 978 lt0001 569 1681Pteropodidae 216 lt0001 129 362Rhinolophidae 108 082 057 203Vespertilionidae 367 lt0001 218 618Hipposideridae 100

SeasonDry 149 003 103 216Wet 100

InterfaceAnimal use 330 001 129 840Human activity 348 001 136 895Pristine area 181 035 052 635Land use 100

AgeSubadult 184 000 126 267Adult 100

Latin AmericaSample type

Fecesrectal swab 1566 0000 502 4879Oralrectal swab 2786 0000 686 11322Oralnasal swab 100

Host subfamilyCarolliinae 695 006 090 5376Stenodermatinae 504 012 067 3789Glossophaginae 100

InterfaceAnimal use 525 001 148 1865Human activity 273 012 077 969Land use 422 010 078 2295Pristine area 100

SeasonDry 130 052 058 289Wet 100

AgeSubadult 173 015 083 362Adult 100

6 | Virus Evolution 2017 Vol 3 No 1

America (Fig 4 Panel B) Further host switching was nearlyfour times more likely than virus sharing in Africa when com-pared with Latin America (OR 3858 Pfrac14 0040) CoVs in Asiawere also more likely to host switch than share when comparedwith Latin America however the results were not significant(OR 2474 Pfrac14 0143) No particular bat family was associatedwith increased host switching but this may reflect small sam-ple sizes when the data are summarized by family

33 Estimated number of CoVs in bats

A large number of host (bat) species in our study were negativefor CoV (nfrac14 197) however none of these species were sampledextensively (Fig 6) We found that all species with samplesizesgt110 individuals were positive for one or more CoVs sug-gesting that we would have detected CoVs in some of the nega-tive species in our study if sampling effort were increased Dueto the high number of these negative species and their effect onthe average number of virus sequence clusters per species theywere excluded from our estimates By retaining only those spe-cies for whichgt110 individuals were sampled (there weretwenty-seven species that qualified) we estimated the averagenumber of CoVs per species to be 267 (stdfrac14 138) thus account-ing for the likely scenario that some species will have more vi-ruses and others less We then extrapolated the average to all1200 bat species to estimate a total potential richness of 3204

CoVs (rangefrac14 1200ndash6000 CoVs) most of which have yet to bedescribed

34 Factors predicting CoV positivity

In order to refine future surveillance efforts to find the undis-covered diversity of CoVs in bats we evaluated the factors pre-dicting CoV positivity For each region the best model includedspecimen type bat family (or in the case of Latin America sub-family) season and animalndashhuman interface (Table 2) Seasonwas not included in the top model for Latin America but themodel with the season included was within two AIC (deltaAICfrac14 106) and therefore considered to have as much support asthe top model (Burnham and Anderson 2002) Specimen typewas highly associated with CoV positivity and samples contain-ing feces or fecal swabs were significantly more likely to testpositive than other specimen types in all regions Bat familywas important in Asia and Africa Season was also significantwith samples collected during the dry season more likely to testpositive in Africa and Asia than those collected during the wetseason (albeit with low odds ratios) Age class appears to be im-portant as sub-adults were more likely to test positive thanadults in Africa and Asia Sex was not related to CoV positivityin Asia or Latin America while males were slightly more likelyto be CoV positive in Africa (ORfrac14 15 12ndash19 CI Plt 0001) Broadcategories of animalndashhuman interfaces were significantly

Figure 2 Maximum likelihood phylogenetic reconstructions for all partial CoV RdRp fragments for both the Quan (Panel A) and Watanabe (Panel B) assays Sequences

are collapsed into clades representing our operating taxonomic units (sequences sharing90 identity) and number of sequences for each taxon is indicated in paren-

theses Representative published sequences from GenBank have been included for comparison (GenBank accession number indicated) Both trees were rooted using

the related Breda virus (NC_007447) and all nodes have 60 bootstrap support Pie charts indicate the distribution of viral taxa by host (bat) family in each virus sub-

clade

S J Anthony et al | 7

associated with CoV positivity among bats in all three regionsin Africa and Latin America bats sampled at the animal use in-terface category were more likely to be positive and in Asiabats sampled at the animal use and human activities interfaceswere more likely to be positive than those sampled at other in-terfaces The significance of the animal interface in all three

regions suggests that practices around animal use may be im-portant for disease transmission

35 Future sampling effort

Based on our study we were able to estimate the sampling ef-fort that would be required to find all CoVs in bats based on a

Figure 3 Comparison of viral and bat diversity The earthrsquos surface was divided into grid cells by latitude and longitude (10 10 degree units) for diversity calculations

(Panel A) Grid cells where bats were sampled are numbered in each region Alpha diversity (Shannon H) for virus (Panel B) and host (Panel C) were correlated indicat-

ing that areas of high bat diversity also have high viral diversity Darker cells indicate higher alpha diversity (ie more viral or host taxa) in each grid cell Beta diversity

was also correlated between virus (Panel D) and host (Panel E) and differentiated into three discrete communities by regionmdashLatin America (grid cells 1ndash10) Africa

(grid cells 11ndash20) and Asia (grid cells 21ndash34) Shading indicates that either viruses (in red) or hosts (in blue) are shared between two corresponding grid cells with darker

cells indicating higher pairwise similarity

8 | Virus Evolution 2017 Vol 3 No 1

Poisson regression model of our data (Fig 6) With 154 individ-uals we estimate that an average of one CoV will be detected(95 CI 136ndash177) With 397 individuals we estimate that upto five CoVs will be detected (95 CI 351ndash458) As we did not

observe any species with greater than five viral sequenceclusters we make the assumption that sampling 397 individ-uals should capture the full diversity of CoVs in each batspecies

Figure 4 Network model showing the connection of CoVs and their hosts Viral sequence clusters (colored grey) are connected to host species either by region (Panel

A) or family (Panel B) Viral and host and communities separate almost entirely by region only Africa and Asia are connected by two shared viruses (HKU9 and

PREDICT_CoV-35) found in species from both continents Networks also show that viruses appear to be shared by multiple host families in Africa and Asia while being

more restricted to a single family in Latin America

S J Anthony et al | 9

Figure 5 Relative proportion of evolutionary events leading to observed virushost associations for CoVs in bats Cophylogenetic reconstructions were used to identify

each event (Supplementary Fig S1 Panels AndashD) and significance evaluated by region Across all regions host switching was the dominant evolutionary event (Panel

A) When separated by region host switching remained dominant in Africa (Panel B) and Asia (Panel C) but not in Latin America (Panel D)

Figure 6 Relationship between sampling effort and viral detection among all bat species sampled A Poisson regression model was used to estimate the expected num-

ber of viruses based on the number of animals sampled by species (each circle indicates a separate species) The 95 confidence intervals are indicated

10 | Virus Evolution 2017 Vol 3 No 1

36 Predicting the distribution of unknown CoVs

Viral sub-clade and host family were not independent of eachother (v2 by cladefrac14 Plt 0001 Supplementary Table S1) demon-strating that different clades have significant associations withspecific bat families For example subgroup 2d CoVs were sig-nificantly associated with pteropid bats and were only identi-fied in regions where pteropid bats exist Likewise subgroup 2bCoVs were associated with rhinolophid and hipposiderid batsand 2c with vespertilionid bats To lsquopredictrsquo the potential distri-bution of unknown CoVs we plotted the known distribution ofbats belonging to these families and assume that lsquohotspotsrsquo ofbat diversity infer hotspots of viral diversity for each sub-clade(Fig 7) We note that the alphaCoV genus has been consideredas one group since they do not resolve well into sub-clades(de Groot et al 2009) and that no 2a map was generatedgiven that only one bat virus from this clade was identified

4 Discussion

The emergence of SARS and MERS has driven a need to under-stand more about the diversity ecology and evolution of coro-naviruses particularly at so-called lsquohotspotsrsquo of zoonoticemergence (Jones et al 2008 Drexler et al 2014) To this end wesurveyed the diversity of CoVs from twenty countries in LatinAmerica Africa and Asia to identify global factors driving viraldiversity and to look for regional differences in factors that con-tribute to the risk of emergence such as host switching In totalwe identified sequences from 100 discrete phylogenetic clus-ters ninety-one of which were found in bats

Our data suggest that the diversity of bat CoVs has beendriven primarily by host ecology First viral richness wasstrongly correlated with bat richness suggesting that mostCoVs will be found in regions where bat diversity is highestSecond we showed that CoV diversity separates into three

Figure 7 Viral diversity lsquohotspotrsquo maps Panel A shows the potential hotspots for CoVs based on the distribution of bats worldwide Location of alpha CoV sequences

from this study are shown in black and beta CoV sequences in blue indicating there is no geographical bias based on viral genus (ie alpha and beta CoVs are equally

likely to be found in all regions) Some viral sub-clades were associated with particular bat families and the spatial distribution data of all species belonging to these

families were plotted to indicate the potential hotspots of viral diversity (richness) for these sub-clades Panel B indicates the potential distribution of 2b CoVs based on

the distribution of rhinolphus and hipposideros bats Locations of 2b-positive animals identified in this study are indicated in black and correlate with areas of high

species richness (for these families) CoV-positive animals for other sub-clades shown in light blue Panel C indicates the potential distribution of 2c CoVs based on the

distribution of vespertilionid bats Locations of 2c-positive animals identified in this study are indicated in black This map suggests there are hotspots of 2c diversity

in regions not covered in this study (eg Europe) Panel D indicates the potential distribution of 2d viruses based on the distribution of pteropid bats The map suggests

these viruses may have a more limited distribution compared with viruses of other sub-clades Locations of 2d-positive animals identified in this study are shown in

black

S J Anthony et al | 11

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 7: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

America (Fig 4 Panel B) Further host switching was nearlyfour times more likely than virus sharing in Africa when com-pared with Latin America (OR 3858 Pfrac14 0040) CoVs in Asiawere also more likely to host switch than share when comparedwith Latin America however the results were not significant(OR 2474 Pfrac14 0143) No particular bat family was associatedwith increased host switching but this may reflect small sam-ple sizes when the data are summarized by family

33 Estimated number of CoVs in bats

A large number of host (bat) species in our study were negativefor CoV (nfrac14 197) however none of these species were sampledextensively (Fig 6) We found that all species with samplesizesgt110 individuals were positive for one or more CoVs sug-gesting that we would have detected CoVs in some of the nega-tive species in our study if sampling effort were increased Dueto the high number of these negative species and their effect onthe average number of virus sequence clusters per species theywere excluded from our estimates By retaining only those spe-cies for whichgt110 individuals were sampled (there weretwenty-seven species that qualified) we estimated the averagenumber of CoVs per species to be 267 (stdfrac14 138) thus account-ing for the likely scenario that some species will have more vi-ruses and others less We then extrapolated the average to all1200 bat species to estimate a total potential richness of 3204

CoVs (rangefrac14 1200ndash6000 CoVs) most of which have yet to bedescribed

34 Factors predicting CoV positivity

In order to refine future surveillance efforts to find the undis-covered diversity of CoVs in bats we evaluated the factors pre-dicting CoV positivity For each region the best model includedspecimen type bat family (or in the case of Latin America sub-family) season and animalndashhuman interface (Table 2) Seasonwas not included in the top model for Latin America but themodel with the season included was within two AIC (deltaAICfrac14 106) and therefore considered to have as much support asthe top model (Burnham and Anderson 2002) Specimen typewas highly associated with CoV positivity and samples contain-ing feces or fecal swabs were significantly more likely to testpositive than other specimen types in all regions Bat familywas important in Asia and Africa Season was also significantwith samples collected during the dry season more likely to testpositive in Africa and Asia than those collected during the wetseason (albeit with low odds ratios) Age class appears to be im-portant as sub-adults were more likely to test positive thanadults in Africa and Asia Sex was not related to CoV positivityin Asia or Latin America while males were slightly more likelyto be CoV positive in Africa (ORfrac14 15 12ndash19 CI Plt 0001) Broadcategories of animalndashhuman interfaces were significantly

Figure 2 Maximum likelihood phylogenetic reconstructions for all partial CoV RdRp fragments for both the Quan (Panel A) and Watanabe (Panel B) assays Sequences

are collapsed into clades representing our operating taxonomic units (sequences sharing90 identity) and number of sequences for each taxon is indicated in paren-

theses Representative published sequences from GenBank have been included for comparison (GenBank accession number indicated) Both trees were rooted using

the related Breda virus (NC_007447) and all nodes have 60 bootstrap support Pie charts indicate the distribution of viral taxa by host (bat) family in each virus sub-

clade

S J Anthony et al | 7

associated with CoV positivity among bats in all three regionsin Africa and Latin America bats sampled at the animal use in-terface category were more likely to be positive and in Asiabats sampled at the animal use and human activities interfaceswere more likely to be positive than those sampled at other in-terfaces The significance of the animal interface in all three

regions suggests that practices around animal use may be im-portant for disease transmission

35 Future sampling effort

Based on our study we were able to estimate the sampling ef-fort that would be required to find all CoVs in bats based on a

Figure 3 Comparison of viral and bat diversity The earthrsquos surface was divided into grid cells by latitude and longitude (10 10 degree units) for diversity calculations

(Panel A) Grid cells where bats were sampled are numbered in each region Alpha diversity (Shannon H) for virus (Panel B) and host (Panel C) were correlated indicat-

ing that areas of high bat diversity also have high viral diversity Darker cells indicate higher alpha diversity (ie more viral or host taxa) in each grid cell Beta diversity

was also correlated between virus (Panel D) and host (Panel E) and differentiated into three discrete communities by regionmdashLatin America (grid cells 1ndash10) Africa

(grid cells 11ndash20) and Asia (grid cells 21ndash34) Shading indicates that either viruses (in red) or hosts (in blue) are shared between two corresponding grid cells with darker

cells indicating higher pairwise similarity

8 | Virus Evolution 2017 Vol 3 No 1

Poisson regression model of our data (Fig 6) With 154 individ-uals we estimate that an average of one CoV will be detected(95 CI 136ndash177) With 397 individuals we estimate that upto five CoVs will be detected (95 CI 351ndash458) As we did not

observe any species with greater than five viral sequenceclusters we make the assumption that sampling 397 individ-uals should capture the full diversity of CoVs in each batspecies

Figure 4 Network model showing the connection of CoVs and their hosts Viral sequence clusters (colored grey) are connected to host species either by region (Panel

A) or family (Panel B) Viral and host and communities separate almost entirely by region only Africa and Asia are connected by two shared viruses (HKU9 and

PREDICT_CoV-35) found in species from both continents Networks also show that viruses appear to be shared by multiple host families in Africa and Asia while being

more restricted to a single family in Latin America

S J Anthony et al | 9

Figure 5 Relative proportion of evolutionary events leading to observed virushost associations for CoVs in bats Cophylogenetic reconstructions were used to identify

each event (Supplementary Fig S1 Panels AndashD) and significance evaluated by region Across all regions host switching was the dominant evolutionary event (Panel

A) When separated by region host switching remained dominant in Africa (Panel B) and Asia (Panel C) but not in Latin America (Panel D)

Figure 6 Relationship between sampling effort and viral detection among all bat species sampled A Poisson regression model was used to estimate the expected num-

ber of viruses based on the number of animals sampled by species (each circle indicates a separate species) The 95 confidence intervals are indicated

10 | Virus Evolution 2017 Vol 3 No 1

36 Predicting the distribution of unknown CoVs

Viral sub-clade and host family were not independent of eachother (v2 by cladefrac14 Plt 0001 Supplementary Table S1) demon-strating that different clades have significant associations withspecific bat families For example subgroup 2d CoVs were sig-nificantly associated with pteropid bats and were only identi-fied in regions where pteropid bats exist Likewise subgroup 2bCoVs were associated with rhinolophid and hipposiderid batsand 2c with vespertilionid bats To lsquopredictrsquo the potential distri-bution of unknown CoVs we plotted the known distribution ofbats belonging to these families and assume that lsquohotspotsrsquo ofbat diversity infer hotspots of viral diversity for each sub-clade(Fig 7) We note that the alphaCoV genus has been consideredas one group since they do not resolve well into sub-clades(de Groot et al 2009) and that no 2a map was generatedgiven that only one bat virus from this clade was identified

4 Discussion

The emergence of SARS and MERS has driven a need to under-stand more about the diversity ecology and evolution of coro-naviruses particularly at so-called lsquohotspotsrsquo of zoonoticemergence (Jones et al 2008 Drexler et al 2014) To this end wesurveyed the diversity of CoVs from twenty countries in LatinAmerica Africa and Asia to identify global factors driving viraldiversity and to look for regional differences in factors that con-tribute to the risk of emergence such as host switching In totalwe identified sequences from 100 discrete phylogenetic clus-ters ninety-one of which were found in bats

Our data suggest that the diversity of bat CoVs has beendriven primarily by host ecology First viral richness wasstrongly correlated with bat richness suggesting that mostCoVs will be found in regions where bat diversity is highestSecond we showed that CoV diversity separates into three

Figure 7 Viral diversity lsquohotspotrsquo maps Panel A shows the potential hotspots for CoVs based on the distribution of bats worldwide Location of alpha CoV sequences

from this study are shown in black and beta CoV sequences in blue indicating there is no geographical bias based on viral genus (ie alpha and beta CoVs are equally

likely to be found in all regions) Some viral sub-clades were associated with particular bat families and the spatial distribution data of all species belonging to these

families were plotted to indicate the potential hotspots of viral diversity (richness) for these sub-clades Panel B indicates the potential distribution of 2b CoVs based on

the distribution of rhinolphus and hipposideros bats Locations of 2b-positive animals identified in this study are indicated in black and correlate with areas of high

species richness (for these families) CoV-positive animals for other sub-clades shown in light blue Panel C indicates the potential distribution of 2c CoVs based on the

distribution of vespertilionid bats Locations of 2c-positive animals identified in this study are indicated in black This map suggests there are hotspots of 2c diversity

in regions not covered in this study (eg Europe) Panel D indicates the potential distribution of 2d viruses based on the distribution of pteropid bats The map suggests

these viruses may have a more limited distribution compared with viruses of other sub-clades Locations of 2d-positive animals identified in this study are shown in

black

S J Anthony et al | 11

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 8: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

associated with CoV positivity among bats in all three regionsin Africa and Latin America bats sampled at the animal use in-terface category were more likely to be positive and in Asiabats sampled at the animal use and human activities interfaceswere more likely to be positive than those sampled at other in-terfaces The significance of the animal interface in all three

regions suggests that practices around animal use may be im-portant for disease transmission

35 Future sampling effort

Based on our study we were able to estimate the sampling ef-fort that would be required to find all CoVs in bats based on a

Figure 3 Comparison of viral and bat diversity The earthrsquos surface was divided into grid cells by latitude and longitude (10 10 degree units) for diversity calculations

(Panel A) Grid cells where bats were sampled are numbered in each region Alpha diversity (Shannon H) for virus (Panel B) and host (Panel C) were correlated indicat-

ing that areas of high bat diversity also have high viral diversity Darker cells indicate higher alpha diversity (ie more viral or host taxa) in each grid cell Beta diversity

was also correlated between virus (Panel D) and host (Panel E) and differentiated into three discrete communities by regionmdashLatin America (grid cells 1ndash10) Africa

(grid cells 11ndash20) and Asia (grid cells 21ndash34) Shading indicates that either viruses (in red) or hosts (in blue) are shared between two corresponding grid cells with darker

cells indicating higher pairwise similarity

8 | Virus Evolution 2017 Vol 3 No 1

Poisson regression model of our data (Fig 6) With 154 individ-uals we estimate that an average of one CoV will be detected(95 CI 136ndash177) With 397 individuals we estimate that upto five CoVs will be detected (95 CI 351ndash458) As we did not

observe any species with greater than five viral sequenceclusters we make the assumption that sampling 397 individ-uals should capture the full diversity of CoVs in each batspecies

Figure 4 Network model showing the connection of CoVs and their hosts Viral sequence clusters (colored grey) are connected to host species either by region (Panel

A) or family (Panel B) Viral and host and communities separate almost entirely by region only Africa and Asia are connected by two shared viruses (HKU9 and

PREDICT_CoV-35) found in species from both continents Networks also show that viruses appear to be shared by multiple host families in Africa and Asia while being

more restricted to a single family in Latin America

S J Anthony et al | 9

Figure 5 Relative proportion of evolutionary events leading to observed virushost associations for CoVs in bats Cophylogenetic reconstructions were used to identify

each event (Supplementary Fig S1 Panels AndashD) and significance evaluated by region Across all regions host switching was the dominant evolutionary event (Panel

A) When separated by region host switching remained dominant in Africa (Panel B) and Asia (Panel C) but not in Latin America (Panel D)

Figure 6 Relationship between sampling effort and viral detection among all bat species sampled A Poisson regression model was used to estimate the expected num-

ber of viruses based on the number of animals sampled by species (each circle indicates a separate species) The 95 confidence intervals are indicated

10 | Virus Evolution 2017 Vol 3 No 1

36 Predicting the distribution of unknown CoVs

Viral sub-clade and host family were not independent of eachother (v2 by cladefrac14 Plt 0001 Supplementary Table S1) demon-strating that different clades have significant associations withspecific bat families For example subgroup 2d CoVs were sig-nificantly associated with pteropid bats and were only identi-fied in regions where pteropid bats exist Likewise subgroup 2bCoVs were associated with rhinolophid and hipposiderid batsand 2c with vespertilionid bats To lsquopredictrsquo the potential distri-bution of unknown CoVs we plotted the known distribution ofbats belonging to these families and assume that lsquohotspotsrsquo ofbat diversity infer hotspots of viral diversity for each sub-clade(Fig 7) We note that the alphaCoV genus has been consideredas one group since they do not resolve well into sub-clades(de Groot et al 2009) and that no 2a map was generatedgiven that only one bat virus from this clade was identified

4 Discussion

The emergence of SARS and MERS has driven a need to under-stand more about the diversity ecology and evolution of coro-naviruses particularly at so-called lsquohotspotsrsquo of zoonoticemergence (Jones et al 2008 Drexler et al 2014) To this end wesurveyed the diversity of CoVs from twenty countries in LatinAmerica Africa and Asia to identify global factors driving viraldiversity and to look for regional differences in factors that con-tribute to the risk of emergence such as host switching In totalwe identified sequences from 100 discrete phylogenetic clus-ters ninety-one of which were found in bats

Our data suggest that the diversity of bat CoVs has beendriven primarily by host ecology First viral richness wasstrongly correlated with bat richness suggesting that mostCoVs will be found in regions where bat diversity is highestSecond we showed that CoV diversity separates into three

Figure 7 Viral diversity lsquohotspotrsquo maps Panel A shows the potential hotspots for CoVs based on the distribution of bats worldwide Location of alpha CoV sequences

from this study are shown in black and beta CoV sequences in blue indicating there is no geographical bias based on viral genus (ie alpha and beta CoVs are equally

likely to be found in all regions) Some viral sub-clades were associated with particular bat families and the spatial distribution data of all species belonging to these

families were plotted to indicate the potential hotspots of viral diversity (richness) for these sub-clades Panel B indicates the potential distribution of 2b CoVs based on

the distribution of rhinolphus and hipposideros bats Locations of 2b-positive animals identified in this study are indicated in black and correlate with areas of high

species richness (for these families) CoV-positive animals for other sub-clades shown in light blue Panel C indicates the potential distribution of 2c CoVs based on the

distribution of vespertilionid bats Locations of 2c-positive animals identified in this study are indicated in black This map suggests there are hotspots of 2c diversity

in regions not covered in this study (eg Europe) Panel D indicates the potential distribution of 2d viruses based on the distribution of pteropid bats The map suggests

these viruses may have a more limited distribution compared with viruses of other sub-clades Locations of 2d-positive animals identified in this study are shown in

black

S J Anthony et al | 11

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 9: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

Poisson regression model of our data (Fig 6) With 154 individ-uals we estimate that an average of one CoV will be detected(95 CI 136ndash177) With 397 individuals we estimate that upto five CoVs will be detected (95 CI 351ndash458) As we did not

observe any species with greater than five viral sequenceclusters we make the assumption that sampling 397 individ-uals should capture the full diversity of CoVs in each batspecies

Figure 4 Network model showing the connection of CoVs and their hosts Viral sequence clusters (colored grey) are connected to host species either by region (Panel

A) or family (Panel B) Viral and host and communities separate almost entirely by region only Africa and Asia are connected by two shared viruses (HKU9 and

PREDICT_CoV-35) found in species from both continents Networks also show that viruses appear to be shared by multiple host families in Africa and Asia while being

more restricted to a single family in Latin America

S J Anthony et al | 9

Figure 5 Relative proportion of evolutionary events leading to observed virushost associations for CoVs in bats Cophylogenetic reconstructions were used to identify

each event (Supplementary Fig S1 Panels AndashD) and significance evaluated by region Across all regions host switching was the dominant evolutionary event (Panel

A) When separated by region host switching remained dominant in Africa (Panel B) and Asia (Panel C) but not in Latin America (Panel D)

Figure 6 Relationship between sampling effort and viral detection among all bat species sampled A Poisson regression model was used to estimate the expected num-

ber of viruses based on the number of animals sampled by species (each circle indicates a separate species) The 95 confidence intervals are indicated

10 | Virus Evolution 2017 Vol 3 No 1

36 Predicting the distribution of unknown CoVs

Viral sub-clade and host family were not independent of eachother (v2 by cladefrac14 Plt 0001 Supplementary Table S1) demon-strating that different clades have significant associations withspecific bat families For example subgroup 2d CoVs were sig-nificantly associated with pteropid bats and were only identi-fied in regions where pteropid bats exist Likewise subgroup 2bCoVs were associated with rhinolophid and hipposiderid batsand 2c with vespertilionid bats To lsquopredictrsquo the potential distri-bution of unknown CoVs we plotted the known distribution ofbats belonging to these families and assume that lsquohotspotsrsquo ofbat diversity infer hotspots of viral diversity for each sub-clade(Fig 7) We note that the alphaCoV genus has been consideredas one group since they do not resolve well into sub-clades(de Groot et al 2009) and that no 2a map was generatedgiven that only one bat virus from this clade was identified

4 Discussion

The emergence of SARS and MERS has driven a need to under-stand more about the diversity ecology and evolution of coro-naviruses particularly at so-called lsquohotspotsrsquo of zoonoticemergence (Jones et al 2008 Drexler et al 2014) To this end wesurveyed the diversity of CoVs from twenty countries in LatinAmerica Africa and Asia to identify global factors driving viraldiversity and to look for regional differences in factors that con-tribute to the risk of emergence such as host switching In totalwe identified sequences from 100 discrete phylogenetic clus-ters ninety-one of which were found in bats

Our data suggest that the diversity of bat CoVs has beendriven primarily by host ecology First viral richness wasstrongly correlated with bat richness suggesting that mostCoVs will be found in regions where bat diversity is highestSecond we showed that CoV diversity separates into three

Figure 7 Viral diversity lsquohotspotrsquo maps Panel A shows the potential hotspots for CoVs based on the distribution of bats worldwide Location of alpha CoV sequences

from this study are shown in black and beta CoV sequences in blue indicating there is no geographical bias based on viral genus (ie alpha and beta CoVs are equally

likely to be found in all regions) Some viral sub-clades were associated with particular bat families and the spatial distribution data of all species belonging to these

families were plotted to indicate the potential hotspots of viral diversity (richness) for these sub-clades Panel B indicates the potential distribution of 2b CoVs based on

the distribution of rhinolphus and hipposideros bats Locations of 2b-positive animals identified in this study are indicated in black and correlate with areas of high

species richness (for these families) CoV-positive animals for other sub-clades shown in light blue Panel C indicates the potential distribution of 2c CoVs based on the

distribution of vespertilionid bats Locations of 2c-positive animals identified in this study are indicated in black This map suggests there are hotspots of 2c diversity

in regions not covered in this study (eg Europe) Panel D indicates the potential distribution of 2d viruses based on the distribution of pteropid bats The map suggests

these viruses may have a more limited distribution compared with viruses of other sub-clades Locations of 2d-positive animals identified in this study are shown in

black

S J Anthony et al | 11

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 10: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

Figure 5 Relative proportion of evolutionary events leading to observed virushost associations for CoVs in bats Cophylogenetic reconstructions were used to identify

each event (Supplementary Fig S1 Panels AndashD) and significance evaluated by region Across all regions host switching was the dominant evolutionary event (Panel

A) When separated by region host switching remained dominant in Africa (Panel B) and Asia (Panel C) but not in Latin America (Panel D)

Figure 6 Relationship between sampling effort and viral detection among all bat species sampled A Poisson regression model was used to estimate the expected num-

ber of viruses based on the number of animals sampled by species (each circle indicates a separate species) The 95 confidence intervals are indicated

10 | Virus Evolution 2017 Vol 3 No 1

36 Predicting the distribution of unknown CoVs

Viral sub-clade and host family were not independent of eachother (v2 by cladefrac14 Plt 0001 Supplementary Table S1) demon-strating that different clades have significant associations withspecific bat families For example subgroup 2d CoVs were sig-nificantly associated with pteropid bats and were only identi-fied in regions where pteropid bats exist Likewise subgroup 2bCoVs were associated with rhinolophid and hipposiderid batsand 2c with vespertilionid bats To lsquopredictrsquo the potential distri-bution of unknown CoVs we plotted the known distribution ofbats belonging to these families and assume that lsquohotspotsrsquo ofbat diversity infer hotspots of viral diversity for each sub-clade(Fig 7) We note that the alphaCoV genus has been consideredas one group since they do not resolve well into sub-clades(de Groot et al 2009) and that no 2a map was generatedgiven that only one bat virus from this clade was identified

4 Discussion

The emergence of SARS and MERS has driven a need to under-stand more about the diversity ecology and evolution of coro-naviruses particularly at so-called lsquohotspotsrsquo of zoonoticemergence (Jones et al 2008 Drexler et al 2014) To this end wesurveyed the diversity of CoVs from twenty countries in LatinAmerica Africa and Asia to identify global factors driving viraldiversity and to look for regional differences in factors that con-tribute to the risk of emergence such as host switching In totalwe identified sequences from 100 discrete phylogenetic clus-ters ninety-one of which were found in bats

Our data suggest that the diversity of bat CoVs has beendriven primarily by host ecology First viral richness wasstrongly correlated with bat richness suggesting that mostCoVs will be found in regions where bat diversity is highestSecond we showed that CoV diversity separates into three

Figure 7 Viral diversity lsquohotspotrsquo maps Panel A shows the potential hotspots for CoVs based on the distribution of bats worldwide Location of alpha CoV sequences

from this study are shown in black and beta CoV sequences in blue indicating there is no geographical bias based on viral genus (ie alpha and beta CoVs are equally

likely to be found in all regions) Some viral sub-clades were associated with particular bat families and the spatial distribution data of all species belonging to these

families were plotted to indicate the potential hotspots of viral diversity (richness) for these sub-clades Panel B indicates the potential distribution of 2b CoVs based on

the distribution of rhinolphus and hipposideros bats Locations of 2b-positive animals identified in this study are indicated in black and correlate with areas of high

species richness (for these families) CoV-positive animals for other sub-clades shown in light blue Panel C indicates the potential distribution of 2c CoVs based on the

distribution of vespertilionid bats Locations of 2c-positive animals identified in this study are indicated in black This map suggests there are hotspots of 2c diversity

in regions not covered in this study (eg Europe) Panel D indicates the potential distribution of 2d viruses based on the distribution of pteropid bats The map suggests

these viruses may have a more limited distribution compared with viruses of other sub-clades Locations of 2d-positive animals identified in this study are shown in

black

S J Anthony et al | 11

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 11: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

36 Predicting the distribution of unknown CoVs

Viral sub-clade and host family were not independent of eachother (v2 by cladefrac14 Plt 0001 Supplementary Table S1) demon-strating that different clades have significant associations withspecific bat families For example subgroup 2d CoVs were sig-nificantly associated with pteropid bats and were only identi-fied in regions where pteropid bats exist Likewise subgroup 2bCoVs were associated with rhinolophid and hipposiderid batsand 2c with vespertilionid bats To lsquopredictrsquo the potential distri-bution of unknown CoVs we plotted the known distribution ofbats belonging to these families and assume that lsquohotspotsrsquo ofbat diversity infer hotspots of viral diversity for each sub-clade(Fig 7) We note that the alphaCoV genus has been consideredas one group since they do not resolve well into sub-clades(de Groot et al 2009) and that no 2a map was generatedgiven that only one bat virus from this clade was identified

4 Discussion

The emergence of SARS and MERS has driven a need to under-stand more about the diversity ecology and evolution of coro-naviruses particularly at so-called lsquohotspotsrsquo of zoonoticemergence (Jones et al 2008 Drexler et al 2014) To this end wesurveyed the diversity of CoVs from twenty countries in LatinAmerica Africa and Asia to identify global factors driving viraldiversity and to look for regional differences in factors that con-tribute to the risk of emergence such as host switching In totalwe identified sequences from 100 discrete phylogenetic clus-ters ninety-one of which were found in bats

Our data suggest that the diversity of bat CoVs has beendriven primarily by host ecology First viral richness wasstrongly correlated with bat richness suggesting that mostCoVs will be found in regions where bat diversity is highestSecond we showed that CoV diversity separates into three

Figure 7 Viral diversity lsquohotspotrsquo maps Panel A shows the potential hotspots for CoVs based on the distribution of bats worldwide Location of alpha CoV sequences

from this study are shown in black and beta CoV sequences in blue indicating there is no geographical bias based on viral genus (ie alpha and beta CoVs are equally

likely to be found in all regions) Some viral sub-clades were associated with particular bat families and the spatial distribution data of all species belonging to these

families were plotted to indicate the potential hotspots of viral diversity (richness) for these sub-clades Panel B indicates the potential distribution of 2b CoVs based on

the distribution of rhinolphus and hipposideros bats Locations of 2b-positive animals identified in this study are indicated in black and correlate with areas of high

species richness (for these families) CoV-positive animals for other sub-clades shown in light blue Panel C indicates the potential distribution of 2c CoVs based on the

distribution of vespertilionid bats Locations of 2c-positive animals identified in this study are indicated in black This map suggests there are hotspots of 2c diversity

in regions not covered in this study (eg Europe) Panel D indicates the potential distribution of 2d viruses based on the distribution of pteropid bats The map suggests

these viruses may have a more limited distribution compared with viruses of other sub-clades Locations of 2d-positive animals identified in this study are shown in

black

S J Anthony et al | 11

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 12: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

distinct communities by region echoing the distribution of batsand suggesting an ecological dependence on their hosts Andthird we identified particular associations between viral sub-clade and bat family indicating that CoVs have evolved with (oradapted to) preferred families Collectively these data showthat the global diversity and distribution of CoVs in bats is non-random and is driven by variation in the biogeography of bats

Regional variation was also observed in the proportion ofhost switching events relative to other evolutionary mecha-nisms such as co-speciation duplication or sharing (seeMethods for definitions) Overall host switching was the domi-nant mechanism supporting the general trend observed forCoVs (Vijaykrishna et al 2007 Woo et al 2009 Lau et al 2012) aswell as similar findings for other viral families including para-mxyoviruses (Melade et al 2016) hantaviruses (Ramsden et al2009) and arenaviruses (Coulibaly-NrsquoGolo et al 2011 Irwin et al2012) However when analyzed by region there were propor-tionally fewer events in Latin America compared with Africa orAsia Given that host switching is the first critical step in zoo-notic emergence (Li et al 2005Pfefferle et al 2009 Woo et al2012 Azhar et al 2014) we suggest this finding could reflect re-gional differences in the risk of disease emergence

It is important to note that we distinguish lsquohost switchingrsquo(inter-genus or family switching) from lsquosharingrsquo (intra-genusswitching) in our analysis and that while the proportion of hostswitches was lower in Latin America the proportion of sharingevents was concomitantly higher This indicates that CoVs inthis region still switch hosts just like they do in Africa and Asiabut that they preferentially move between closely related spe-cies While we do not yet understand the factors driving this dif-ference if we assume that the risk of spillover to humansreflects the taxonomic distances that CoVs are able to jump(Kreuder Johnson et al 2015) our results would indicate ahigher risk for zoonotic emergence in Africa and Asia [acknowl-edging that host switching is not the only important factor driv-ing zoonotic emergence (Jones et al 2008 Keesing et al 2010Karesh et al 2012)] While evidence of regional variation in hostswitching has not been reported previously for viruses (toour knowledge) similar patterns have been observed inhemosporidian parasites (Ellis et al 2015)

In total we estimated that there are at least 3204 CoVs(based on the cluster definition used in this study) in bats Thissuggests that although there is still a large number of coronavi-ruses that remain to be discovered their detection remains anachievable goal and we advocate for their discovery given thepotential for even greater insights into the global ecology andevolution of these viruses (eg further investigation into re-gional differences in host switching) and the opportunity toelucidate their individual zoonotic potential (Ge et al 2013Wang et al 2014 Yang et al 2014 Anthony et al 2017) Buildingon our study we suggest that future surveillance efforts shouldconsider the number of samples carefully and include up to 400individuals (of either sex) per species in order to maximize thechance of detecting all CoVs and that including less than 154individuals per species might reduce the chance of finding evenone virus and would likely produce a poor return on invest-ment We also recommend that feces (or rectal swabs) followedby saliva (or oral swabs) should be prioritized if resources arelimited and all sample types cannot be processed and thatsampling should be biased towards immature animals and inthe dry season

To predict where this unknown diversity is likely to be weplotted the known distribution of all bat families associatedwith each CoV sub-clade The 2b SARS clade was associated

with rhinolophid and hipposiderid bats so we plotted the distri-bution of all bat species belonging to these families to generatea map of viral diversity (richness) Based on this map we wouldexpect the greatest diversity of unknown 2b viruses to be foundin South East Asia which is consistent with previous studies de-scribing 2b viruses in bats but also in isolated pockets through-out Africa In contrast the 2c MERS clade was associated withvespertilionid bats predicting that 2c diversity will be highest inMexico Europe Central and South East Africa and parts ofSouth East Asia Again this is consistent with the distributionof 2c viruses reported here and previously in the literature(Woo et al 2007 Reusken et al 2010 Anthony et al 2012 Lelliet al 2013 Corman et al 2014ab Anthony et al 2017 )

Certain limitations should be considered in the interpreta-tion of our data First the diversity of CoV sequences detected isalmost certainly biased by our sampling effort A small numberof species were sampled abundantly thus maximizing ourchances of finding a CoV (27282 species had gt110 individuals)however most were under-sampled (232282 with50 individ-uals) While this biases the probability of finding a CoV towardsthe more abundantly sampled species we stress that it also re-flects the uneven distribution of naturally occurring communi-ties (Magurran 2011) and that targeting the very rare specieswould be prohibitively expensive It is also unclear if populationsize has an effect on the number of viruses present in a spe-ciesmdasheg do larger populations accommodate more viral diver-sity Second our analysis only represents the lsquolastrsquo evolutionaryevent identified It does not represent or account for the com-plete evolutionary history of these lineages Therefore while asingle event has been ascribed to each association (seelsquoMethodsrsquo) it does not preclude other events having an equal orgreater impact in the past We are limited to using this most re-cent event type in order to assign a single evolutionary event toeach unique association It should also be noted that events aredependent on the relationships observed in the reconstructedtrees and could change as more viruses are added or longer se-quences are used Third we have only generated partial se-quences for analysis which limits our ability to comment onthe specific zoonotic potential of each virus or provide a com-prehensive analysis of their genetic histories and taxonomy forexample estimating the frequency or impact of recombinationwhich can be an important driver of host switching and diseaseemergence (Anthony et al 2017) We certainly support full-genome sequencing wherever possible [and have such effortsunderway (Anthony et al 2017)] however difficulties in virusisolation or sequencing directly from swabs (where materialand viral load can be low yielding variable results) can oftenpreclude extending sequences much beyond the short frag-ments amplified by consensus PCR (Drexler et al 2014) Equallylogistic and permitting issues in moving biological samplesacross borders and the need to first develop in-country capacityto fully sequence these viruses have all limited our ability tocharacterize (most of) these viruses further

Our lsquostate of the artrsquo is in the unique scope of our study focus-ing on a single viral family at a global scale To achieve this theUSAID PREDICT project first had to establish a strategy for viraldiscovery in twenty different countries many of which had noprior capacity to do this work at all For this reason we used asimple yet highly implementable approach based on cPCRInitially this approach might not appear to be as powerful ashigh throughput sequencing (HTS) techniques however thisstudy has demonstrated the particular and considerablestrengths of cPCR as an affordable screening and discovery tool inresource-poor settings where HTS is still largely unsupported

12 | Virus Evolution 2017 Vol 3 No 1

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 13: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

Studying viral diversity on a global scale is just one compo-nent of a larger strategy to understand the factors driving viraldiversity Ecological and evolutionary mechanisms can contrib-ute differently across scales and we therefore advocate compli-menting the global perspective with studies that focus on entireviral communities (rather than only one viral family) within sin-gle individuals or host species (rather than only on a globalscale) (Anthony et al 2015) We propose that it is critical to un-derstand both the component parts of the lsquozoonotic poolrsquo(Morse et al 2012) and the nature of their interactions at differ-ent scales if we are to move towards better predictive modelsand ultimately to reducing zoonotic emergence

Finally we offer a specific comment regarding the publichealth threat posed by bats While it is tempting to concludethat bats harbor a large number of potentially zoonotic CoVsmost of the putative viruses detected in this study are unlikelyto pose any threat to humansmdasheither because they lack the bio-logical pre-requisites to infect human cells or because the ecol-ogy of their hosts limits the opportunity for spillover Studiessuch as this are intended to advance our understanding of thefundamental biology of viruses not to create alarm or incite theretaliatory culling of bats Indeed such actions often have un-anticipated consequences and can even enhance disease trans-mission as seen with rabies in vampire bats (Streicker et al2012 Blackwood et al 2013) Bats are important insectivorespollinators and seed dispersers and we underscore both their vi-tal ecosystem role and the need to consider any public healthinterventions carefully

Supplementary data

Supplementary data are available at Virus Evolution online

Conflict of interest None declared

Acknowledgements

This study was made possible by the generous support ofthe American people through the United States Agency forInternational Development (USAID) Emerging PandemicThreats PREDICT project (cooperative agreement numberGHN-A-OO-09-00010-00) We thank the governments ofCameroon Gabon Democratic Republic of Congo Republicof Congo Rwanda Tanzania Uganda Peru Bolivia BrazilMexico Bangladesh Cambodia China Indonesia LaosMalaysia Nepal Thailand and Viet Nam for permission toconduct this study and the field teams and collaboratinglaboratories that performed sample collection and testing

ReferencesAnthony S J et al (2012) lsquoCoronaviruses in bats from Mexicorsquo

Journal of General Virology 94 1028ndash38 et al (2015) lsquoNon-random patterns in viral diversityrsquo

Nature Communications 6 8147 et al (2017) lsquoFurther evidence for bats as the evolutionary

source of MERS coronavirusrsquo mBio 82 pii e00373-17 doi101128mBio00373-17

Azhar E I et al (2014) lsquoEvidence for camel-to-human transmis-sion of MERS coronavirusrsquo The New England Journal of Medicine370 2499ndash505

Blackwood J C et al (2013) lsquoResolving the roles of immunitypathogenesis and immigration for rabies persistence in

vampire batsrsquo Proceedings of the National Academy of Sciences ofthe United States of America 110 20837ndash42

Burnham K P and Anderson D R (2002) Model Selection andMultimodel Inference A Practical Information-Theoretic ApproachNew York Springer

Charleston M A (1998) lsquoJungles a new solution to the hostpar-asite phylogeny reconciliation problemrsquo MathematicalBiosciences 149 191ndash223

Conow C et al (2010) lsquoJane a new tool for the cophylogeny recon-struction problemrsquo Algorithms for Molecular Biology AMB 5 16

Corman V M et al (2014a) lsquoRooting the phylogenetic tree ofmiddle east respiratory syndrome coronavirus by characteri-zation of a conspecific virus from an African batrsquo Journal ofVirology 88 11297ndash303

et al (2014b) lsquoCharacterization of a novel betacoronavirusrelated to middle East respiratory syndrome coronavirus inEuropean hedgehogsrsquo Journal of Virology 88 717ndash24

Coulibaly-NrsquoGolo D et al (2011) lsquoNovel arenavirus sequences inHylomyscus sp and Mus (Nannomys) setulosus from CotedrsquoIvoire implications for evolution of arenaviruses in AfricarsquoPLoS One 6 e20893

Darriba D et al (2012) lsquojModelTest 2 more models new heuris-tics and parallel computingrsquo Nature Methods 9 772

de Groot R J et al (2009) lsquoVirus taxonomy classification and no-menclature of virusesrsquo in A M Q King M J Adams E BCarstens and J Lefkkowitch (eds) Ninth Report of theInternational Committee for the Taxonomy of Viruses AmsterdamElsevier

Donaldson E F et al (2010) lsquoMetagenomic analysis of theviromes of three North American bat species viral diversityamong different bat species that share a common habitatrsquoJournal of Virology 84 13004ndash18

Drexler J F Corman V M and Drosten C (2014) lsquoEcology evo-lution and classification of bat coronaviruses in the aftermathof SARSrsquo Antiviral Research 101 45ndash56

Drosten C et al (2003) lsquoIdentification of a novel coronavirus inpatients with severe acute respiratory syndromersquo The NewEngland Journal of Medicine 348 1967ndash76

Edgar R C (2004) lsquoMUSCLE multiple sequence alignment withhigh accuracy and high throughputrsquo Nucleic Acids Research 321792ndash7

Ellis V A et al (2015) lsquoLocal host specialization host-switchingand dispersal shape the regional distributions of avian haemo-sporidian parasitesrsquo Proceedings of the National Academy ofSciences of the United States of America 112 11294ndash9

ESRI (2011) ArcGIS Desktop Release 10 Redlands CAEnvironmental Systems Research Institute

Ge X Y et al (2013) lsquoIsolation and characterization of a batSARS-like coronavirus that uses the ACE2 receptorrsquo Nature503 535ndash8

Hijmans R J Williams E Vennes C (2015) GeosphereSpherical Trigonometry v R Package version 15

Hagberg A A Schult D A and Swart P J (2008) lsquoExploring net-work structure dynamics and function using NetworkXrsquo in GVaroquaux T Vaught and J Millman (eds) Proceedings of 7thPython in Science Conference (SciPy2008) pp 11ndash15 Pasadena CA

Hu B et al (2015) lsquoBat origin of human coronavirusesrsquo VirologyJournal 12 221

Huang Y W et al (2013) lsquoOrigin evolution and genotyping ofemergent porcine epidemic diarrhea virus strains in theUnited Statesrsquo mBio 4 e00737ndash13

Huynh J et al (2012) lsquoEvidence supporting a zoonotic origin ofhuman coronavirus strain NL63rsquo Journal of Virology 8612816ndash25

S J Anthony et al | 13

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 14: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

Irwin N R et al (2012) lsquoComplex patterns of host switching inNew World arenavirusesrsquo Molecular Ecology 21 4137ndash50

Jacomy M et al (2014) lsquoForceAtlas2 a continuous graph layoutalgorithm for handy network visualization designed for theGephi softwarersquo PLoS One 9 e98679

Jones K E et al (2008) lsquoGlobal trends in emerging infectious dis-easesrsquo Nature 451 990ndash3

Jost L Chao A and Chazdon R L (2011) lsquoCompositional simi-larity and beta diversityrsquo in A E Magurran and B J McGill(eds) Biological Diversity Frontiers in Measurement andAssessment Oxford Oxford University Press

Karesh W B et al (2012) lsquoEcology of zoonoses natural and un-natural historiesrsquo Lancet 380 1936ndash45

Keesing F et al (2010) lsquoImpacts of biodiversity on the emergenceand transmission of infectious diseasesrsquo Nature 468 647ndash52

King A M Q et al (2012) Virus Taxonomy Classification andNomenclature of Viruses Ninth Report of the InternationalCommittee on Taxonomy of Viruses Elsevier Academic Press

Kreuder Johnson C et al (2015) lsquoSpillover and pandemic proper-ties of zoonotic viruses with high host plasticityrsquo ScientificReports 5 14830

Ksiazek T G et al (2003) lsquoA novel coronavirus associated withsevere acute respiratory syndromersquo The New England Journal ofMedicine 348 1953ndash66

Lau S K et al (2005) lsquoSevere acute respiratory syndromecoronavirus-like virus in Chinese horseshoe batsrsquo Proceedingsof the National Academy of Sciences of the United States of America102 14040ndash5

et al (2012) lsquoRecent transmission of a novel alphacoronavi-rus bat coronavirus HKU10 from Leschenaultrsquos rousettes topomona leaf-nosed bats first evidence of interspecies trans-mission of coronavirus between bats of different subordersrsquoJournal of Virology 86 11906ndash18

et al (2015) lsquoDiscovery of a novel coronavirus ChinaRattus coronavirus HKU24 from Norway rats supports the mu-rine origin of Betacoronavirus 1 and has implications for theancestor of Betacoronavirus lineage Arsquo Journal of Virology 893076ndash92

Lelli D et al (2013) lsquoDetection of coronaviruses in bats of variousspecies in Italyrsquo Viruses 5 2679ndash89

Li W et al (2005) lsquoBats are natural reservoirs of SARS-like coro-navirusesrsquo Science 310 676ndash9

Liang K and Zeger S L (1986) lsquoLongitudinal data analysis usinggeneralized linear modelsrsquo Biometrika 73 13ndash22

Maes P et al (2009) lsquoA proposal for new criteria for the classifica-tion of hantaviruses based on S and M segment protein se-quencesrsquo Infection Genetics and Evolution Journal of MolecularEpidemiology and Evolutionary Genetics in Infectious Diseases 9813ndash20

Magurran A E (2011) Biological Diversity Frontiers in Measurementand Assessment Oxford Oxford University Press

Melade J et al (2016) lsquoAn eco-epidemiological study of Morbilli-related paramyxovirus infection in Madagascar bats revealshost-switching as the dominant macro-evolutionary mecha-nismrsquo Scientific Reports 6 23752

Merkle D Middendorf M and Wieseke N (2010) lsquoA parameter-adaptive dynamic programming approach for inferring cophy-logeniesrsquo BMC Bioinformatics 11 Suppl 1 S60

Morse S S et al (2012) lsquoPrediction and prevention of the nextpandemic zoonosisrsquo The Lancet 380 1956ndash65

Pfefferle S et al (2009) lsquoDistant relatives of severe acute respira-tory syndrome coronavirus and close relatives of human

coronavirus 229E in bats Ghanarsquo Emerging Infectious Diseases15 1377ndash84

PREDICT_Consortium (2014) Reducing Pandemic Risk PromotingGlobal Health One Health Institute University of CaliforniaDavis Davis CA

Quan P L et al (2010) lsquoIdentification of a severe acute respira-tory syndrome coronavirus-like virus in a leaf-nosed bat inNigeriarsquo mBio 1 pii e00208-10 doi101128mBio00208-10

R Foundation for Statistical Computing (2008) R A Language andEnvironment for Statistical Computing Vienna Austria RFoundation for Statistical Computing

Ramsden C Holmes E C and Charleston M A (2009)lsquoHantavirus evolution in relation to its rodent and insectivorehosts no evidence for codivergencersquo Molecular Biology andEvolution 26 143ndash53

Reusken C B et al (2010) lsquoCirculation of group 2 coronavirusesin a bat species common to urban areas in Western EuropersquoVector Borne Zoonotic Dis 10 785ndash91

et al (2013) lsquoMiddle East respiratory syndrome coronavirusneutralising serum antibodies in dromedary camels a com-parative serological studyrsquo The Lancet Infectious Diseases 13859ndash66

Rubin D B (1987) Multiple Imputation for Nonresponse in SurveysNew York John Wiley and Sons

Sabir J S et al (2016) lsquoCo-circulation of three camel coronavirusspecies and recombination of MERS-CoVs in Saudi ArabiarsquoScience 351 81ndash4

Streicker D G et al (2012) lsquoEcological and anthropogenic driversof rabies exposure in vampire bats implications for transmis-sion and controlrsquo Proceedings Biological Sciencesthe RoyalSociety 279 3384ndash92

Tamura K et al (2013) lsquoMEGA6 molecular evolutionary geneticsanalysis version 60rsquo Molecular Biology and Evolution 30 2725ndash9

Tang X C et al (2006) lsquoPrevalence and genetic diversity of coro-naviruses in bats from Chinarsquo Journal of Virology 80 7481ndash90

Tsoleridis T et al (2016) lsquoDiscovery of novel alphacoronavirusesin European rodents and shrewsrsquo Viruses 8 84

Vijaykrishna D et al (2007) lsquoEvolutionary insights into the ecol-ogy of coronavirusesrsquo Journal of Virology 81 4012ndash20

Vijgen L et al (2005) lsquoComplete genomic sequence of human co-ronavirus OC43 molecular clock analysis suggests a relativelyrecent zoonotic coronavirus transmission eventrsquo Journal ofVirology 79 1595ndash604

Wacharapluesadee S et al (2015) lsquoDiversity of coronavirus inbats from Eastern Thailandrsquo Virology Journal 12 57

Wang Q et al (2014) lsquoBat origins of MERS-CoV supported by batcoronavirus HKU4 usage of human receptor CD26rsquo Cell Host ampMicrobe 16 328ndash37

Wang W et al (2015) lsquoDiscovery diversity and evolution ofnovel coronaviruses sampled from rodents in Chinarsquo Virology474 19ndash27

Watanabe S et al (2010) lsquoBat Coronaviruses and experimentalinfection of bats the Philippinesrsquo Emerging Infectious Diseases16 1217ndash23

Woo P C et al (2006) lsquoMolecular diversity of coronaviruses inbatsrsquo Virology 351 180ndash7

et al (2007) lsquoComparative analysis of twelve genomes ofthree novel group 2c and group 2d coronaviruses reveals uniquegroup and subgroup featuresrsquo Journal of Virology 81 1574ndash85

et al (2009) lsquoCoronavirus diversity phylogeny and inter-species jumpingrsquo Experimental Biology and Medicine 2341117ndash27

14 | Virus Evolution 2017 Vol 3 No 1

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1
Page 15: Global patterns in coronavirus diversity · Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV)

et al (2012) lsquoGenetic relatedness of the novel human groupC betacoronavirus to Tylonycteris bat coronavirus HKU4 andPipistrellus bat coronavirus HKU5rsquo Emerging Microbes andInfection 1 e35

Yang Y et al (2014) lsquoReceptor usage and cell entry of bat co-ronavirus HKU4 provide insight into bat-to-human

transmission of MERS coronavirusrsquo Proceedings of theNational Academy of Sciences of the United States of America111 12516ndash21

Zaki A M et al (2012) lsquoIsolation of a novel coronavirus from aman with pneumonia in Saudi Arabiarsquo The New England Journalof Medicine 367 1814ndash20

S J Anthony et al | 15

  • vex012-TF1