École doctorale et discipline ou spécialitéthesesups.ups-tlse.fr/4296/1/2017TOU30390.pdf ·...
Acknowledgements
I would like to warmly thank here all the people who contributed to the smooth progress of this thesis, more or less directly, through their help, their advice or their support.
Thank you first of all to Jérôme Chave, my thesis supervisor in Toulouse, for opening the doors of ecology to me, a discipline about which I knew next to nothing at the start of my thesis. Having until then mainly been exposed to theoretical research, I discovered a world where data, their production, their analysis and their interpretation are the sinews of war. My adaptation did not happen in a day, and I can now appreciate how far I have come. Thank you, Jérôme, for your efforts to integrate me into the discipline at the start of my thesis, in particular by allowing me to go twice to do fieldwork at the Nouragues, an unforgettable experience. Thank you for your always sound advice and your careful proofreading. Your high standards, your inexhaustible enthusiasm and energy, your perspective on the discipline and your broad scientific culture inspired me throughout my thesis.
Thank you to Hélène Morlon, my thesis supervisor in Paris, for welcoming me for the final year of my thesis. Besides being invaluable for finishing my thesis in good conditions, this additional year of immersion in macroevolution was an opportunity for me to discover another way of approaching research in ecology and evolution. Thank you, Hélène, for your advice, your support and your patience. And thank you for letting me enjoy this unique atmosphere, where dynamism, rigour and efficiency go so well with conviviality, kindness and a relaxed attitude. We did not have the opportunity to work together as much as I would have liked during this thesis, and I am therefore delighted to be able to continue doing so in the future.
I am very grateful to Christopher Quince and Corinne Vacher for taking the time to review this thesis. Thank you very much for your careful reading and your constructive criticism. I also thank Sébastien Brosse and Francesco Ficetola for agreeing to be part of the jury.
Thank you to my co-authors and collaborators, on whose work a large part of this thesis is based.
Thank you in particular to Lucie Zinger, whose work analysing and interpreting the data used in this thesis was crucial. Lucie, you were an invaluable bridge to biology throughout this thesis for the physicist by training that I am, and I am infinitely grateful for your many explanations and pieces of advice. Thank you to Pierre Taberlet and Eric Coissac for letting me benefit from their state-of-the-art expertise in the production and analysis of metabarcoding data. Thank you to Heidy Schimann for her help with the biological interpretation of the data, as well as for her logistical help in the field. Thank you to Amaia Iribar for her indispensable help in preparing the fieldwork, her contribution to data production, and her remarkable efficiency in overcoming technical, logistical and administrative difficulties of all kinds, all with unfailing good humour. Thank you to Sophie Manzi and Eliane Louisanna for their help in the field and their wet lab work. Thank you to Vincent Schilling for his contribution to the bioinformatic analyses, his help in the field, and his humour as a carbet mate at the Nouragues. Thanks also to Saint Omer Cazal and Audrey Sagne for their help in the field at Paracou and Arbocel, and to Daniel Boutaud for his very useful all-terrain pick carriers. Finally, thank you to Elodie Courtois and Blaise Tymen for introducing me to French Guiana and the Nouragues at the start of my thesis, and for the moments shared in the field.
I also thank Antoine Fouquet and Jean-Pierre Vacher for trusting me with the analysis of their data. Thanks also to Hélène Holota for her efficiency, availability and kindness, to Blaise Tymen for his help with the Lidar data, and to Mélanie Roy, Antoine Fouquet, Gaël Grenouillet and Lounès Chikhi, among others, for enriching discussions and for their advice.
I would next like to thank a number of people who, although they did not directly contribute to the content of this thesis, brightened my working days in Toulouse and Paris and were the spice of these four years of life.
Thank you to my Toulouse office mates Jessica and Félix for all those long discussions, and for putting up with my humour with kindness in all circumstances. Thank you to my four "classmates" at EDB, Arthur, Paul, Isabelle and Jean-Pierre, with whom it was an immense pleasure to share these three years in Toulouse. Thank you to Boris, Blaise, Mathieu, Olivia, Léa, Josselin, Aurèle, Luc, Nico, Camille, Céline, Marine, Lucie, Alice, Sébastien, Jade, Kévin, Jan, Fabian, Isabel… for all those shared moments, and to all the members of EDB, young and not so young, for the remarkable atmosphere of the lab.
Thank you to Simon and to the successive tenants of the indescribable "maison de la culture François Magendie", to Louise and her dives into the world of theatre, to Etienne and Mathilde and their hippie "summer schools", to Guillem, Claire, Hélène, Lucie, Lisa-Lou and the others, for having so brilliantly populated these Toulouse years. Thank you to Jean-Pierre and Sébastien for introducing me to herpetology. Thank you to Alex for the welcome in Montpellier and those passionate debates about science and ecology. Thank you to the Americans: Marc and Léo and their inspiring "Silicon Valley spirit", and Matthieu, faithful birding buddy. And thank you to Simon and Florian for their (all too rare) immersions in quantum computing and energy networks.
Many thanks also to my Paris office mates. Marc, Odile, Julien, Eric, Leandro, Olivier, your warm welcome inside and outside the lab greatly eased my transition to Paris.
Finally, thank you to the old guard, the old-timers, Grenoblois from here and elsewhere: Thibault, Mathieu, Arantxa, David, Lucas, Aurore, Carl, Vio… I count you among my friends, but you are already almost family.
Last but not least, I thank my parents and my brother for their unfailing and oh so indispensable support.
Table of Contents
Introduction
I. What drives the assembly of ecological communities?
II. DNA-based biodiversity patterns
III. Statistical approaches
IV. Objectives and outline
Chapter 1
Causes of variation in soil beta diversity across domains of life in the tropical forests of French Guiana
Chapter 2
Inferring neutral biodiversity parameters using environmental DNA datasets
Chapter 3
Topic modelling reveals spatial structure in a DNA-based biodiversity survey
Discussion
I. Synthesis
II. Perspectives
Appendix
Large-scale DNA barcoding of Amazonian anurans leads to a new definition of biogeographical subregions in the Guiana Shield and reveals a vast underestimation of diversity and local endemism
Introduction
I. What drives the assembly of ecological communities?
Science consists in finding patterns in a collection of isolated observations so as to gain understanding of the processes that generated them. Natural sciences began with attempts at classifying the diversity of living organisms into categories (Aristotle, 4th century BC), and this classification has been developed and perfected over the centuries into the modern binomial nomenclature (Linnaeus, 1753). But classification efforts were not limited to the description of species. Associations of species, and in particular plant associations, were named using the same model, and were carefully described based on their taxonomic composition and the abiotic properties of their environment (Braun-Blanquet & Pavillard, 1922). Even though forest plant associations were observed shifting through time, this phenomenon was described as mirroring the life cycle of individual organisms, from ‘youth’ to ‘senescence’ (Clements, 1916). The underlying idea was that the organization of the living world obeyed static and deterministic rules, which were to be uncovered. This idea was encouraged by the discovery of the elegant laws that govern physics and chemistry.
By contrast, early discoveries on evolution and biogeography (Darwin, 1859; Wallace, 1876) brought the idea that chance and history have played an overwhelming role in shaping the modern living world. Gleason (1926) and Tansley (1935) were the first to contend that the diversity of plant associations was not well described by discrete vegetation types, and that species associations were rather the transient outcome of random dispersal events, constrained by abiotic conditions and species interactions. Later, Hutchinson (1961), MacArthur (1972), Diamond (1975), Hubbell (1979), Ricklefs (1987), and Brown (1995), among others, successively elaborated on this idea, laying the foundations of modern community ecology. The term ‘community’ refers to all the organisms coexisting in a given location and at a given time. It may also refer to a taxonomic subgroup of these organisms, such as a ‘plant community’.
The question of the relative role played by deterministic and stochastic processes in shaping ecological communities remains central to ecology. In this section, I first argue that addressing this question is key to our ability to preserve natural ecosystems and to predict their response to human perturbations. I then briefly review the mechanisms of community assembly that have been proposed.
1. Motivations
The increasing awareness of the threats posed to natural ecosystems by human activities has added a sense of urgency to the study of ecological processes. Indeed, the fate of the Earth’s biodiversity, and beyond it, of the ecosystems on which human societies rely for food, water, clean air, health, and raw materials, has become a major source of concern (Daily, 1997). As a consequence, theoretical advances in ecology can no longer be considered in isolation from their practical implications. In particular, many predictions relevant to policy-making strongly depend on assumptions regarding the mechanisms of community assembly. Thus, data-driven understanding of community assembly is critical to well-informed policy-making. Three examples are given below: the prediction of ecosystem stability and state shifts in response to human perturbations, the prediction of the impact of climate change, and the conservation of biodiversity.
Measuring ecosystem stability to perturbations is a subject of active research, as is the relationship between biodiversity and ecosystem stability (McCann, 2000; Tilman et al., 2006; Loreau & de Mazancourt, 2013). In this context, natural ecosystems are commonly represented as stable communities held together by species interactions, in part because this representation lends itself well to theoretical approaches (Arnoldi et al., 2016). Drawing on this framework, it has been hypothesized that the response of ecosystems to perturbations may bear a similarity with that of physical systems exhibiting critical phase transitions (cf. Fig. 1; Scheffer et al., 2012). Accordingly, ‘tipping points’, sudden and difficult-to-reverse shifts in a system’s state in response to perturbation, should be expected (Brook et al., 2013). Moreover, such state shifts could possibly be predicted in advance through the identification of early-warning signals (Carpenter et al., 2011; Scheffer et al., 2012). While this type of non-linear behaviour has been evidenced in lake ecosystems (Carpenter et al., 2011), it remains difficult to study empirically, and knowledge of community assembly processes is key to providing realistic assumptions for the theoretical prediction of possible tipping points.
Figure 1. The response of ecosystems to human-induced stress is commonly studied using a network representation of ecological communities, envisioned as stable entities held together by interactions. Depending on network connectivity and modularity, the response may be linear (left) or exhibit a tipping point (right). Data-driven knowledge of community assembly processes is much needed to inform such models. Adapted from Scheffer et al. (2012).
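As a purely illustrative sketch of what a tipping point looks like (it is not a model taken from the works cited above), one can follow a single state variable x obeying dx/dt = x - x^3 - s, where s plays the role of a stress level. The equation, the stress ramp and all function names below are assumptions chosen for simplicity. The upper equilibrium of this toy system disappears at s = 2/(3√3), so a slowly increasing stress produces an abrupt collapse.

```python
import math

def relax(x, stress, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = x - x**3 - stress for a fixed time (steps * dt)."""
    for _ in range(steps):
        x += dt * (x - x**3 - stress)
    return x

def sweep(stress_levels, x0):
    """Follow the state while stress is changed in small successive steps."""
    states = []
    x = x0
    for s in stress_levels:
        x = relax(x, s)
        states.append(x)
    return states

def first_collapse(stress_levels, states):
    """Return the first stress level at which the state has turned negative."""
    for s, x in zip(stress_levels, states):
        if x < 0:
            return s
    return None

# Hypothetical stress ramp from 0 to 0.6, then back down to 0.
up = [i * 0.01 for i in range(61)]
down = up[::-1]

forward = sweep(up, x0=1.0)
backward = sweep(down, x0=forward[-1])

# Fold bifurcation of this toy model, obtained from d/dx (x - x**3) = 0.
critical_stress = 2 / (3 * math.sqrt(3))

print(f"analytical threshold: {critical_stress:.3f}")
print(f"collapse observed at stress: {first_collapse(up, forward):.2f}")
print(f"state once stress is back to zero: {backward[-1]:.2f}")
```

When the stress is ramped back down to zero, the state remains on the lower branch (close to -1) instead of returning to its initial value of +1, which is the 'difficult-to-reverse' aspect mentioned in the text.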
Climate change has become the foremost threat to many ecosystems, especially those that are less directly impacted by human activities. Species distribution modelling is an important tool to predict the effect of climate change on biodiversity (Miller, 2010). It consists in inferring the abiotic requirements of individual species from their observed geographic distribution, and predicting their future distribution based on predicted changes in abiotic conditions. The need to take into account processes other
than abiotic requirements, such as species interactions, dispersal limitation, adaptation, and phenotypic plasticity, has long been acknowledged (Guisan & Thuiller, 2005); nevertheless, most predictions are still obtained while ignoring these processes (Wisz et al., 2013). Another approach to predicting the effect of climate change on ecosystems is through the dynamical simulation of ecosystems, either by simulating each organism individually or using coarser models (Fisher et al., 2014). Building such models, especially at the level of individual organisms, requires a clear understanding of the processes relevant to community assembly and dynamics.
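The two steps of species distribution modelling described above (inferring abiotic requirements from observed occurrences, then projecting them onto predicted conditions) can be sketched with a deliberately naive climate envelope. This is only an illustration under strong assumptions: a single abiotic variable, invented site temperatures and presence records, and a hypothetical uniform warming of 2 °C. Actual models rely on many variables and on statistical fitting.

```python
def fit_envelope(temperatures, presences):
    """Infer the tolerated temperature range from the sites where the species occurs."""
    occupied = [t for t, p in zip(temperatures, presences) if p]
    return min(occupied), max(occupied)

def predict(envelope, temperatures):
    """Predict presence wherever the temperature falls inside the envelope."""
    low, high = envelope
    return [low <= t <= high for t in temperatures]

# Hypothetical mean annual temperatures (deg C) at six sites along a gradient,
# with hypothetical presence (True) / absence (False) records.
current_temp = [18.0, 20.0, 22.0, 24.0, 26.0, 28.0]
observed = [False, True, True, True, False, False]

envelope = fit_envelope(current_temp, observed)

# Hypothetical uniform warming of 2 deg C at every site.
future_temp = [t + 2.0 for t in current_temp]
future = predict(envelope, future_temp)

print("inferred envelope:", envelope)
print("predicted future presence:", future)
```

In this toy projection the predicted range simply shifts by one site towards the cooler end of the gradient. Nothing in it accounts for whether the species can disperse to the newly suitable site, adapt locally, or coexist with the species already present there, which are precisely the processes that the text identifies as commonly ignored.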
Lastly, knowledge of community assembly is necessary to guide conservation efforts. Assumptions on the mechanisms of community assembly play a key role in the debate on the optimal design of natural reserves (Cabeza & Moilanen, 2001) or on species sensitivity to extinction (Tilman et al., 1994). Such assumptions are also required to estimate the amount of biodiversity harboured in species-rich and poorly known ecosystems. A straightforward way to proceed is to assume that the relationship between the number of individuals and the number of species, observed for a sample of individuals, holds for the entire ecosystem. This reasoning implies that community assembly can be regarded as random at the scale of the ecosystem. It has for instance been applied to Amazonian trees, yielding an estimated total of 16,000 tree species extrapolated from about 5,000 observed species (ter Steege et al., 2013).
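The extrapolation logic described above can be sketched as follows. The paragraph does not specify which relationship between individuals and species is used; the sketch assumes Fisher's log-series, S = α ln(1 + N/α), as one classical choice, and does not claim to reproduce the procedure of the cited study. The sample size, species count and ecosystem size are invented for illustration.

```python
import math

def expected_species(alpha, n_individuals):
    """Fisher's log-series: expected number of species among n individuals."""
    return alpha * math.log(1.0 + n_individuals / alpha)

def fit_alpha(n_individuals, n_species, tol=1e-9):
    """Solve expected_species(alpha, N) = S for alpha by bisection.

    The expected species count increases with alpha, so bisection applies.
    """
    low, high = 1e-6, 1e7
    while high - low > tol * high:
        mid = 0.5 * (low + high)
        if expected_species(mid, n_individuals) < n_species:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

# Hypothetical sample: 10,000 individuals belonging to 500 species.
sample_n, sample_s = 10_000, 500
alpha = fit_alpha(sample_n, sample_s)

# Hypothetical ecosystem size: one billion individuals.
total_n = 1_000_000_000
total_s = expected_species(alpha, total_n)

print(f"alpha = {alpha:.1f}")
print(f"extrapolated richness = {total_s:.0f} species")
```

The single fitted parameter is assumed to hold unchanged from the sample to the whole ecosystem, which is where the assumption of random assembly at the ecosystem scale enters.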
2. Deterministic processes
The deterministic processes of community assembly can be decomposed into two major components: abiotic filtering and biotic interactions.
‘Abiotic filtering’ is a metaphor referring to the fact that species can only establish themselves in locations where abiotic conditions suit their needs: hence, any given location hosts only a subset of the species that would have the ability to reach it (Kraft et al., 2015). While this concept is very general, it has its roots in the study of plant community assembly (Noble & Slatyer, 1977). In this context, abiotic filters may include temperature, precipitation, soil nutrients, soil pH, soil grain size, soil water content, soil depth and bedrock.
Biotic interactions refer to any type of interaction between organisms, either between or within species, and can be broadly categorized into competition, predation, parasitism, commensalism and mutualism (Schemske et al., 2009). Biotic interactions may facilitate or hinder the establishment of a species in a community depending on the type of interaction, and as such their action on community assembly may be referred to as ‘biotic filtering’. Biotic and abiotic filtering are sometimes jointly referred to as ‘habitat filtering’ (Maire et al., 2012). Indirect biotic interactions across trophic levels may have complex and non-trivial outcomes. For instance, if we assume that a trophic network can be decomposed into discrete trophic levels, increasing abundances among the species belonging to a given trophic level (e.g., carnivores) lead to decreasing abundances in the trophic level immediately below (e.g., herbivores), and in turn to increasing abundances one level lower (primary producers), a process known as a ‘trophic cascade’ (Paine, 1980; Polis et al., 2000). Interspecific interaction may also take the form of a modification of surrounding abiotic conditions by organisms, for instance by so-called ‘ecosystem engineer’ species (Wright et al., 2002), or simply through shading in the case of plants, thus blurring the line between abiotic and biotic filtering.
Within a single trophic level, competition is considered to be the dominant type of biotic interaction (Chesson, 2000). The ‘competitive exclusion principle’ states that the coexistence of two species competing for the same resource is not stable (Gause, 1932; MacArthur, 1958; Hutchinson, 1961; Armstrong & McGehee, 1980). Indeed, if one of the species has even a slight competitive advantage, it will eventually outcompete the other. Thus, any set of coexisting species is expected to exhibit differences in the way they exploit their habitat. This has led to the concept of ‘niche’, which refers in its broader meaning to the relationship between a species and its habitat, including its resource use, its interactions with other species, and the way it occupies its habitat both spatially and temporally (cf. Fig. 2; Grinnell, 1917; Hutchinson, 1957; Chase & Leibold, 2003). A species’ niche may be represented as a hypervolume in the space of all available resources and possible habitat uses.
Figure 2. A classical example of niche partitioning: habitat preferences among closely related warbler species in the boreal forests of North America. (A) Cape May, (B) Blackburnian, (C) Bay-breasted, (D) Yellow-rumped, and (E) Black-throated Green warblers favour different tree layers and different tree heights when foraging for insects during the breeding season. Adapted from MacArthur (1958).
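The competitive exclusion principle stated above can be illustrated with a minimal simulation. The sketch below assumes a Lotka-Volterra-type model in which two species are limited by the same shared resource and differ only in the total density at which their own growth stops; the model form, the parameter values and the function name are hypothetical choices made for illustration.

```python
def compete(k1, k2, r=1.0, n1=10.0, n2=10.0, dt=0.05, steps=20000):
    """Euler integration of two species limited by the same shared resource.

    Both species respond to the total density n1 + n2; they differ only in
    the density (k1 or k2) at which their own growth stops.
    """
    for _ in range(steps):
        total = n1 + n2
        d1 = r * n1 * (1.0 - total / k1)
        d2 = r * n2 * (1.0 - total / k2)
        n1 += dt * d1
        n2 += dt * d2
    return n1, n2

# Hypothetical parameters: species 1 tolerates a 5% higher total density.
n1, n2 = compete(k1=105.0, k2=100.0)
print(f"species 1: {n1:.1f}, species 2: {n2:.4f}")
```

Even with this slight advantage, species 1 ends up at its own limiting density while species 2 declines towards zero. Swapping the two parameters swaps the outcome, so in this model stable coexistence requires the species to differ in how they use resources.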
In spite of theoretical predictions, the coexistence of many similar species competing for a common resource in homogeneous environments is a common occurrence in nature. This is for instance the case in species-rich communities such as tropical forest trees and oceanic phytoplankton communities. This apparent paradox has been called the ‘paradox of the plankton’ (Hutchinson, 1961). Thus, additional
October, 1958 WARBLER POPULATION ECOLOGY W (3
feeding. For this reason, differences between the species' feeding positions and behavior have been observed in detail.
For the purpose of describing the birds' feeding zone, the number of seconds each observed bird spent in each of 16 zones was recorded. (In the summer of 1956 the seconds were counted by saying "thousand and one, thousand and two, . . ." all subsequent timing was done by stop watch. When the stop watch became available, an attempt was made to calibrate the counted seconds. It was found that each counted second was approxi- mately 1.25 true seconds.) The zones varied with height and position on branch as shown in Figure 2. The height zones were ten foot units measured from the top of the tree. Each branch could be
divided into three zones, one of bare or lichen- covered base (B), a middle zone of old needles
(M), and a terminal zone of new (less than 1.5 years old) needles or buds (T). Thus a measure- ment in zone T3 was an observation between 20 and 30 feet from the top of the tree and in the terminal part of the branch. Since most of the trees were 50 to 60 feet tall, a rough idea of the height above the ground can also be obtained from
the measurements. There are certain difficulties concerning these
measurements. Since the forest was very dense, certain types of behavior rendered birds invisible. This resulted in all species being observed slightly disproportionately in the open zones of the trees. To combat this difficulty each bird was observed
for as long as possible so that a brief excursion into an open but not often-frequented zone would be compensated for by the remaining part of the observation. I believe there is no serious error in this respect. Furthermore, the comparative aspect is independent of this error. A different difficulty arises from measurements of time spent in each zone. The error due to counting should not affect results which are comparative in nature. If a bird sits very still or sings, it might spend a
large amount of time in one zone without actually requiring that zone for feeding. To alleviate this trouble, a record of activity, when not feeding, was kept. Because of these difficulties, non-parametric statistics have been used throughout the analysis of the study to avoid any a priori assumptions about distributions. One difficulty is of a dif- ferent nature; because of the density of the vegeta- tion and the activity of the warblers a large number of hours of watching result in disappointingly few seconds of worthwhile observations.
The results of these observations are illustrated in Figures 2-6 in which the species' feeding zones are indicated on diagrammatic spruce trees. While
4 9- fes 43. 8
13-2_-__ , -13.8
z ZO. 6-4_-ZI 3 8.4 . r '\ .3
I, .
4.0- / ~~V-5. 0
OBSERVAIO I OBEVAIN
~%
%T
- ~ ~ ~ ~ -
FIG. 2. Cape May warbler feeding position. The zones of most concentrated activity are shaded until at least 50% of the activity is in the stippled zones.
the base zone is always proximal to the trunk of the tree, as shown, the T zone surrounds the M, and is exterior to it but not always distal. For each species observed, the feeding zone is illus- trated. The left side of each illustration is the percentage of the number of seconds of observa- tions of the species in each zone. On the right hand side the percentage of the total number of times the species was observed in each zone is entered. The stippled area gives roughly the area in which the species is most likely to be found. More specifically, the zone with the highest per- centage is stippled, then the zone with the second highest percentage, and so on until at least fifty percent of the observations or time lie within the stippled zone.
Early in the investigation it became apparent that there were differences between the species' feeding habits other than those of feeding zones. Subjectively, the black-throated green appeared omnervous," the bay-breasted slow and "deliberate." In an attempt to make these observations objective, the following measurements were taken on feeding birds. Then a bird landed after a flight, a count
This content downloaded from 129.199.24.197 on Tue, 09 May 2017 16:57:19 UTCAll use subject to http://about.jstor.org/terms
October, 1958 WARBLER POPULATION ECOLOGY 605
34.8 ,z4.7 1 ~ 10.5_\'l',1 7
______. __SE-3b
2 ...Q / _ 13 0 2.7J A A-\- 6.4
11.0- io4 . 672.6'
/ 3/
J/
% OF TOTAL % OF TOTAL lqUMBER (1631) MUMBER (7 7)
OF SECODS OF OF
OB5PE:RVAT1O0E O B53EP:VATIONE 3
FIG. 5. Blackburnian warbler feeding position. The zones of most concentrated activity are shaded until at least 50% of the activity is in the stippled zones.
tion. To give a nonparametric test of the signifi- cance of these differences Table III is required.
Each motion was classified according to the di- rection in which the bird moved farthest. Thus, in 47 bay-breasted warbler observations of this type, the bird moved predominantly in a radial direction 32 times. Applying a X9 test to these, bay-breasted and blackburnian are not different but all others are significantly (P<.O1) different from one another and from bay-breasted and blackburnian.
There is one further quantitative comparison which can be made between species, providing ad- ditional evidence that during normal feeding be- havior the species could become exposed to dif- ferent types of food. During those observations of 1957 in which the bird was never lost from sight, occurrence of long flights, hawking, or hovering was recorded. A flight was called long if it went between different trees and was greater than an estimated 25 feet. Hawking is dis- tinguished from hovering by the fact that in hawk- ing a moving prey individual is sought in the air, while in hovering a nearly stationary prey indi-
L .59/ti ,t, 4.
:} 1 9 50
7 /
// /:0..:io: 7.0 \7
% OF TOTrAL OF TOT^AL. NqUM:BER L(416 6) 1;tUM-BM1:Z (Z 2R4S) OF SECONDS OF OF
O:B S3E1EV.ATI O OBS5:ERVAT I OqS
FIG. 6. Bay-breasted warbler feeding position. The zones of most concentrated activity are shaded until at least 50,01 of the activity is in the stippled zones.
duall is sought amid the foliage. This informa- tion is summarized in Table IV.
Both Cape May and myrtle hawk and undertake long flights significantly more often than any of the other species. Black-throated green hovers significantly more often than the others.
At this point it is possible to summarize differ- ences in the species' feeding behavior in the breed- ing season. Unfortunately, there are very few original descriptions in the literature for com- parison. The widely known writings of William Brewster (Griscom 1938), Ora Knight (1908), and S. C. Kendeigh (1947) include the best ob- servations that have been published. Based upon the observations reported by, these authors, the other scattered published observations, and the observations made during this study, the following comparison of the species' feeding behavior seems warranted.
Cape May W~arbler. The foregoing data show that this species feeds more consistently near the top of the tree than any species expect black- burnian, from which it differs principally in type
This content downloaded from 129.199.24.197 on Tue, 09 May 2017 16:57:19 UTCAll use subject to http://about.jstor.org/terms
604 ROBERT H. MACARTHUR Ecology, Vol. 39, No. 4
1 Al I t
9.8-I
_______~ 3 1_ HMlA
7.s/ l 1.*.* 10. 6
3~~~~~** '<34
/ l:,53.6 93.6\ ' \
/
5em.
% OF TOTAi. 7 OF TOTAL NUMBER (477 7) NUMBER (z263)
OF 5ECONDS OF OF
OZ5ERVATION OB5ERVATIONS
FIG. 3. Myrtle warbler feeding position. The zones of most concentrated activity are shaded until at least 50% of the activity is in the stippled zones.
of seconds was begun and continued until the bird was lost from sight. The total number of flights (visible uses of the wing) during this period was recorded so that the mean interval between uses of the wing could be computed.
The results for 1956 are shown in Table I. The results for 1957 are shown in Table II. Except for the Cape May fewer observations were taken than in 1956.
By means of the sign test (Wilson, 1952), treating each observation irrespective of the number of flights as a single estimate of mean interval between flights, a test of the difference in activity can be performed. These data are summarized in the following inequality, where < is interpreted to mean "has smaller mean interval between flights, with 95% certainty."
FIG. 4. Black-throated green warbler feeding position. The zones of most concentrated activity are shaded until at least 50% of the activity is in the stippled zones. [Axes: % of total number of seconds of observation; % of total number of observations.]
[Inequality among black-throated green, blackburnian, Cape May, myrtle and bay-breasted warblers; the symbols are not recoverable from the extraction.]
The differences in feeding behavior of the warblers can be studied in another way. For, while all the species spend a substantial part of their time searching in the foliage for food, some appear to crawl along branches and others to hop across branches. To measure this the following procedure was adopted. All motions of a bird from place to place in a tree were resolved into components in three independent directions. The natural directions to use were vertical, radial, and tangential. When an observation was made in which all the motion was visible, the number of feet the bird moved in each of the three directions was noted. A surprising degree of diversity was discovered in this way as is shown in Figure 7. Here, making use of the fact that the sum of the three perpendicular distances from an interior point to the sides of an equilateral triangle is independent of the position of the point, the proportion of motion in each direction is recorded within a triangle. Thus the Cape May moves predominantly in a vertical direction, black-throated green and myrtle in a tangential direction, bay-breasted and blackburnian in a radial direc-
Introduction
mechanisms need to be considered to account for species coexistence in such
communities (Tilman, 1982; Chesson, 2000). Even though a vast number of potential
mechanisms of species coexistence have been proposed (Palmer, 1994), they can be
roughly divided into ‘equalizing’ mechanisms, that reduce competitive differences
between species, and ‘stabilizing’ mechanisms, that balance the effect of interspecific
competition (Chesson, 2000).
Intraspecific competition represents one stabilizing mechanism. It has indeed
been found empirically that competition among conspecific individuals is often at least
as intense as among different species (Connell, 1983). Predation and parasitism are
another important cause of negative intraspecific interactions among prey or host
species. Indeed, the fact that predators and parasites tend to specialize on one or a few
species induces a ‘negative density-dependence’, i.e. favours lower population densities.
This effect, known as the Janzen-Connell effect, was first proposed for tropical forest
trees (Connell, 1970; Janzen, 1970). Lastly, spatial and temporal fluctuations in
environmental conditions are also a stabilizing mechanism favouring species
coexistence (Chase & Leibold, 2003; see section I.4 below).
Competition, predation and parasitism act also as equalizing mechanisms.
Indeed, interspecific competition eliminates less competitive species from the
community, while predation and parasitism effectively offset the competitive
advantage of the most successful species (Chesson, 2000). The importance of equalizing
mechanisms and intraspecific competition in species-rich communities has prompted
some ecologists to propose that competitive differences between organisms could be
altogether neglected in such systems (Hubbell, 2001), as discussed in the following section.
3. Stochastic processes
However complex and fascinating the interplay of species’ niches is, community
assembly cannot be fully understood without considering the influence of geography
and history on community composition (MacArthur, 1972; Ricklefs, 1987). Firstly, the
capacity to disperse is finite in all species: offspring are more likely to be found near
parent individuals. Thus, community composition in a given location is dependent on
the pool of species that are within dispersal distance of that location, and on random
dispersal events. The limited dispersal of individuals generates spatial clusters in the
distribution of a species (Houchmandzadeh, 2009), and thus causes spatial variations in
community composition even in the absence of other mechanisms. Secondly, if there are
no competitive differences between two competing species, the fact that one is locally
common and the other rare is due to chance alone. The relative abundances of the two
species are expected to fluctuate randomly over time, until one eventually goes extinct.
Thus, over a sufficiently long period of time, competitive exclusion is expected to take
place even in the absence of competitive differences. The larger the number of
competing species in a given location, the lower the average population of each species
is, and the faster the community will lose species to random demographic fluctuations.
This process has been called demographic or ecological drift, by analogy to the process
of genetic drift in population genetics (Etienne & Alonso, 2007).
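The drift argument can be illustrated with a short simulation. The sketch below is a minimal Moran-type model written for this purpose, not a model taken from the cited references: the function name `drift_until_fixation`, the fixed community size and the update rule (one random death replaced by the offspring of a randomly chosen individual) are all assumptions made for illustration.

```python
import random


def drift_until_fixation(n_a, n_total, rng, max_steps=1_000_000):
    """Neutral drift between two species A and B in a community of fixed size.

    n_a is the initial number of individuals of species A; the remaining
    n_total - n_a individuals belong to species B. Both species have the
    same per-capita chances of dying and of reproducing (neutrality).
    Returns the final abundance of A and the number of steps simulated.
    """
    steps = 0
    while 0 < n_a < n_total and steps < max_steps:
        freq_a = n_a / n_total
        dies_a = rng.random() < freq_a   # the dying individual belongs to A
        born_a = rng.random() < freq_a   # the parent of the replacement belongs to A
        n_a += int(born_a) - int(dies_a)
        steps += 1
    return n_a, steps


if __name__ == "__main__":
    rng = random.Random(1)
    final_a, steps = drift_until_fixation(10, 20, rng)
    print(f"species A ends with {final_a} of 20 individuals after {steps} steps")
```

Although neither species has any advantage, each run ends with one of them excluded. Under these assumptions, a species starting at relative abundance p is expected to take over in a fraction of runs close to p.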
The ‘neutrality’ assumption is defined as the absence of any competitive
differences among individuals, irrespective of the species they belong to (Watterson,
1974; Caswell, 1976). Since dispersal limitation and demographic drift take place
independently of any competitive differences between organisms, they are often
referred to as ‘neutral’ processes, even though they are also present in non-neutral
systems. Under a dynamics governed by dispersal limitation and demographic drift,
ecological communities never reach equilibrium: their composition indefinitely shifts
over time. Nevertheless, if the total number of individuals, the species richness, and the
dispersal capacity of individuals remain constant over time, community structure
reaches a stationary state that can be described statistically as a function of these
parameters.
MacArthur & Wilson (1967) were the first to build dispersal limitation and
demographic drift into a model, which they used as a foundation for a ‘theory of island
biogeography’ aimed at explaining species richness on islands. They reasoned that the
number of species found on a coastal island results from an equilibrium between
immigration of new species from the mainland and species extinction on the island
through demographic drift, even though they did not explicitly interpret their theory as
neutral. The two processes are stochastic and their relative frequency determines the
number of species found on the island at any given time (cf. Fig. 3). They further
assumed that the immigration rate depends on the distance to the mainland, and that
the extinction rate depends on the island’s area, thus enabling empirical comparison of
their theory to observations (Simberloff & Wilson, 1969).
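To make the equilibrium argument concrete, the following sketch computes the species number at which immigration and extinction balance. It assumes, purely for illustration, rates that are linear in the number of species S; the text does not specify the shape of the rate curves, and the names `equilibrium_richness` and `relax_to_equilibrium` as well as the parameter values are hypothetical.

```python
def equilibrium_richness(pool_size, max_immigration, max_extinction):
    """Species number S* at which immigration equals extinction.

    Assumed linear rates (an illustrative choice, not from the source):
      immigration(S) = max_immigration * (1 - S / pool_size)
      extinction(S)  = max_extinction * S / pool_size
    Setting the two equal gives S* = pool_size * I / (I + E).
    """
    return pool_size * max_immigration / (max_immigration + max_extinction)


def relax_to_equilibrium(pool_size, max_immigration, max_extinction,
                         s0=0.0, dt=0.01, steps=100_000):
    """Integrate dS/dt = immigration(S) - extinction(S) with Euler steps."""
    s = s0
    for _ in range(steps):
        immigration = max_immigration * (1 - s / pool_size)
        extinction = max_extinction * s / pool_size
        s += dt * (immigration - extinction)
    return s


if __name__ == "__main__":
    # A near island is given a higher maximum immigration rate than a far one.
    print(equilibrium_richness(100, max_immigration=4.0, max_extinction=2.0))
    print(equilibrium_richness(100, max_immigration=1.0, max_extinction=2.0))
```

In this sketch, distance to the mainland would enter through `max_immigration` and island area through `max_extinction`, so that nearer and larger islands hold more species at equilibrium.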
Figure 3. MacArthur & Wilson (1967) were the first to combine dispersal limitation and demographic drift into a simple model that aims at explaining the number of species found on islands. They assumed that the number of species results from a dynamic equilibrium between stochastic immigration and extinction, which are dependent on distance to the mainland and on island size, respectively. Adapted from Hubbell (2001).
The theory of island biogeography was later expanded to better account for
empirical observations (Brown & Kodric-Brown, 1977). It was also proposed that it
might apply more generally to any patch of isolated habitat (Brown, 1978). In parallel,
Watterson (1974) and Caswell (1976) used the mathematical tools of population
MacARTHUR AND WILSON’S RADICAL THEORY
Fig. 1.3. Various enhancements to the basic equilibrium hypothesis of MacArthur and Wilson do not change the dispersal assembly assumption underlying the model. Downwardly bowing immigration and extinction curves were added to characterize the effects of competition on these rates, but all species, whether early or late colonizers, good or bad competitors, experience the same changes in rates. Similarly, the effects of island distance from the mainland and island size on immigration and extinction rates, respectively, operate equally on all species.
and Wilson fully appreciated the implications of this radical assumption. A majority of their 1967 monograph was devoted to discussing such topics as species differences in colonization strategies, causes of species differences in extinction rates, temporal patterning in the order in which species would successfully establish, and so on, all differences forbidden by their model! Although MacArthur and Wilson (1967) wrote about traditional ecological processes such as competition, the actual parameters of their model were immigration and extinction rates, distance from
genetics to model neutral communities at the level of individual organisms instead of
the level of species, thus providing a more mechanistic description of the processes, but
without including dispersal limitation. Hubbell (1979, 1997, 2001) eventually combined
both ideas into an influential neutral model, which he used as a basis to propose a
‘unified neutral theory of biodiversity and biogeography’. His theory not only states the
importance of demographic drift and dispersal limitation for community assembly, but
also proposes that they may be the dominating mechanisms in some species-rich
communities, especially tropical forest trees and coral reefs. Indeed, strong interspecific
competition and predation could act as equalizing mechanisms between species in
these communities, as mentioned earlier, and combine with strong intraspecific
competition to make all individuals of all species effectively equivalent (Scheffer & van
Nes, 2006). Another hypothesis is that in highly diversified communities, complex
interspecific interactions could average out at the scale of the community, leading to an
‘emergent neutrality’ (Holt, 2006).
In Hubbell’s model, the mainland’s species reservoir, called the
‘metacommunity’, undergoes a demographic drift where random extinctions are offset
by random speciation events. The island, or ‘local community’, also undergoes a
demographic drift, but random extinctions are offset by the dispersal, or immigration, of
individuals from the source metacommunity. Since the model is neutral, all individuals
are considered to have the same dispersal capacity, irrespective of the species they
belong to. The scope of the theory is not limited to isolated habitat patches: the local
community may represent any spatially delineated ecological community, while the
metacommunity represents the regional pool of species constituted by the aggregation
of all local communities. The model is controlled by two parameters, the frequency of
speciation events in the metacommunity, which determines the regional species
richness, and the frequency of immigration into the local community. The immigration
flux into the local community modulates its connectivity with the metacommunity: the
stronger the immigration flux, the more species-rich and the more similar to the
metacommunity the local community is. Hubbell’s model and subsequent related
neutral models (Etienne & Alonso, 2007) are amenable to several quantitative
predictions, and thus to statistical testing (this is discussed in more detail in sections
II.1 and III.4).
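The local-community part of such a model can be sketched in a few lines. The code below is a minimal zero-sum simulation under stated assumptions, not Hubbell's full model: the metacommunity is held fixed as a vector of relative abundances instead of being simulated with speciation, `m` is taken to be the probability that a dead individual is replaced by an immigrant, and the function name `simulate_local_community` and all parameter values are hypothetical.

```python
import random


def simulate_local_community(meta_freqs, size, m, steps, rng):
    """Zero-sum neutral dynamics of a local community of fixed size.

    meta_freqs: relative abundances of the species in the metacommunity
                (assumed constant over the simulated period).
    m:          probability that a death is compensated by an immigrant;
                otherwise the replacement is the offspring of a randomly
                chosen local individual, whatever its species.
    Returns the list of species labels of the local individuals.
    """
    species = list(range(len(meta_freqs)))
    community = rng.choices(species, weights=meta_freqs, k=size)
    for _ in range(steps):
        dead = rng.randrange(size)            # every individual is equally likely to die
        if rng.random() < m:
            community[dead] = rng.choices(species, weights=meta_freqs, k=1)[0]
        else:
            community[dead] = community[rng.randrange(size)]
    return community


if __name__ == "__main__":
    rng = random.Random(42)
    meta = [0.4, 0.3, 0.2, 0.1]
    for m in (0.0, 0.1, 0.5):
        local = simulate_local_community(meta, size=50, m=m, steps=50_000, rng=rng)
        print(f"m = {m}: {len(set(local))} species remain locally")
```

With `m = 0` the local community is isolated and drift alone eventually leaves a single species, whereas a stronger immigration flux keeps the local community richer and closer in composition to the metacommunity.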
Two distinct neutrality assumptions can be distinguished in Hubbell’s neutral
theory: one regarding the metacommunity dynamics, over an evolutionary timescale,
and one regarding the local community dynamics, over the timescale of an individual’s
lifetime. Predictions regarding local community structure, namely the relationship
between area and species richness, the decay of taxonomic similarity with distance, and
the distribution of relative species abundances (see section II.1), integrate both
assumptions. They are in good qualitative agreement with empirical data (Hubbell,
2001); nevertheless, most datasets exhibit quantitative departures from neutrality
(McGill et al., 2006). The assumption of a neutral diversification dynamics in the
metacommunity can be tested separately, and has been shown to be unrealistic. Indeed,
the mean species ages predicted by Hubbell’s model are not consistent with empirical
measurements (Ricklefs, 2003, 2006), and the shape of the predicted phylogenetic trees
does not match that of empirically reconstructed trees (Davies et al., 2011). Hence,
recent approaches have instead focused on testing separately the assumption of local
neutral assembly through immigration, with contrasting results depending on the
system (Sloan et al., 2006; Jabot et al., 2008; Ofiteru et al., 2010; Harris et al., 2015).
Even though comparison of empirical patterns to model predictions suggests
that real ecological communities are rarely neutral, Hubbell’s neutral theory retains
important merits (Alonso et al., 2006). Indeed, it has been pointed out that all the
processes of community ecology are underpinned by only four fundamental processes:
natural selection, demographic drift, speciation, and dispersal (Vellend, 2010). Yet, the
majority of ecological literature focuses on only one of them, natural selection, which
underpins all niche differences between species and thus all deterministic ecological
processes. In contrast, Hubbell’s neutral theory focuses on the three remaining
fundamental processes, which are inherently stochastic, and places them in a
quantitative framework. In practice, neutral models are essential tools for two main
uses (Rosindell et al., 2012). Firstly, they may serve as a ‘null model’ against which
empirical patterns can be contrasted, so as to identify cases where neutral processes are
sufficient to explain the data and cases where they are not. Secondly, they may serve as
a parsimonious approximation to real systems, and as a foundation for incorporating
relevant non-neutral mechanisms, such as niche differences (Chisholm & Pacala, 2010),
environmental stochasticity (Kalyuzhny et al., 2015), negative density-dependence (Du
et al., 2011), or a more realistic speciation dynamics (Rosindell et al., 2010).
4. Spatial and temporal scales
Community assembly involves a range of temporal and spatial scales spanning many
orders of magnitude, from the evolutionary timescale to the behaviour of individual
organisms, and from the global scale to the scale of microorganisms (Chave, 2013). The
continental scale is the realm of biogeography, where species distribution reflects the
geological and evolutionary history of continents (Cox et al., 2016), as well as the
latitudinal gradient of diversity (Hillebrand, 2004). At the opposite end, most studies on
species interactions focus on a limited number of individuals. Community ecology is
concerned with the intermediate scales (Lawton, 1999): namely, within a biogeographic
unit (Morrone, 2015), but encompassing a number of individuals large enough for
statistical patterns to emerge. The scale at which statistical patterns start emerging
depends on the type of organisms considered, and will differ by many orders of
magnitude between plants and bacteria.
Niche and neutral processes might alternately dominate at different spatial and
temporal scales. Firstly, locally observed species interactions do not preclude random
species assembly over larger spatial and temporal scales. Indeed, the majority of
interspecific interactions are opportunistic and vary across space and time (Holt, 1996;
Poisot et al., 2014), despite much-studied instances of specialized interspecific
interactions such as plant-pollinator mutualisms (Rønsted et al., 2005). Secondly,
species dynamically adapt their niche to the local competitive context, either through
plasticity or through natural selection. For instance, closely related species with mostly
disjoint geographical distributions are known to display greater phenotypic differences
(such as a difference in size) wherever they co-occur, a process known as ‘character
displacement’ (Brown & Wilson, 1956). Natural selection has been found to have
measurable effects on phenotype over timescales as short as a few generations when
species are confronted with a sudden change in their biotic or abiotic environment, thus
questioning the legitimacy of the traditional separation between the timescales of
evolutionary and community assembly processes (Ghalambor et al., 2015).
Figure 4. Community assembly processes depend on the spatial and temporal scales considered: current geographical patterns of tree diversity in Europe might reflect on-going dispersal from ice age tree refugia, which started 14,000 years ago. Top right, bottom left and bottom right: geographical distribution of tree diversity (increasing from yellow to blue) for all 60 European tree species, the 45 temperate species and the 15 boreal species, respectively. Top left: accessibility through dispersal from ice age tree refugia (black dots). Adapted from Svenning & Skov (2007).
Another key aspect of community assembly is how fast community composition
responds to abiotic change, relative to the pace of the abiotic change itself. Indeed, if
abiotic change is fast enough relative to community response, the community may
never reach equilibrium, thus leading to an apparently random dynamics. This
present time, i.e. extrinsic ecogeographical factors. Notably, it is easy to imagine that the cold-hardy Quercus robur had more and more northerly located refugia than Q. cerris and for this reason could achieve an earlier and faster postglacial spread. The first tree species to spread would have met much less competition from other tree species than late-spreading species, which would have had to spread through well-established late-successional forest communities. Svenning & Skov (2004) did in fact find a relatively strong positive correlation between range filling and cold hardiness in European trees.
CAN CURRENT PATTERNS OF TREE DIVERSITY BE PREDICTED FROM A SIMPLE MEASURE OF ACCESSIBILITY FROM GLACIAL REFUGIA?
While it is evident that climate and to a lesser extent other environmental factors such as soil do constrain Europe-wide tree species diversity and distribution patterns (Walter & Breckle 1986; Pigott 1991; Sykes et al. 1996; Svenning & Skov 2005), we will now consider the extent to which these patterns could entirely be caused by dispersal.
If diversity patterns were entirely driven by limited dispersal out of the glacial refugia, we expect that the areas that are most accessible from the refugia, i.e. located closest to the greatest number of refugia, would harbour the greatest number of species. Figure 2 shows the pattern of accessibility across Central and Northern Europe as well as the observed pattern of tree species richness. The accessibility (ACC) of each grid cell in the receiving area (Central and Northern Europe) was computed as the inverse of the summed distances to all grid cells in the source area. Hence, the more distant a receiving grid cell on average is located from any one source cell the lower its accessibility. The source area was set to be Southern Europe at 43–46° N, as postglacial expansions into Central and Northern Europe primarily took place from or via this region (e.g. Petit et al. 2002; Magri et al. 2006). Albeit some of the most cold-tolerant tree species had LGM refugia somewhat further north, especially in eastern Europe (Willis & van Andel
Figure 2 Top-left: The accessibility of each 50 × 50 m grid cell in Central and Northern Europe to postglacial immigration from the ice age tree refugia, computed as the inverse of the summed distances to all grid cells in the source area (Southern Europe at 43–46° N). Top-right: The current native species richness of tree species (60 species in total, 2–31 species per cell) in Europe. Bottom-left: The current native species richness of temperate tree species (45 species in total, 0–22 species per cell). Bottom-right: The current native species richness of boreal tree species (15 species in total, 0–10 species per cell). Colour coding corresponds to 10 equal frequency categories, with yellow over green to blue representing low to high accessibility and few to many species, respectively.
456 J.-C. Svenning and F. Skov Idea and Perspective
© 2007 Blackwell Publishing Ltd/CNRS
phenomenon may be more pervasive than it seems: for instance, it has been shown that
the dispersal of tree species in Europe following the end of the last ice age is still an
on-going process (cf. Fig. 4; Svenning & Skov, 2007). In contrast, organisms with short
generation time and high dispersal ability are able to track environmental changes
more efficiently. Additionally, if several local communities are connected by a
permanent and strong enough dispersal flux, they may never reach the optimal
composition that would be expected based on local abiotic conditions (Gravel et al.,
2006). A local community will also be more prone to demographic stochasticity if it
hosts a smaller population (Fisher & Mehta, 2014). These observations have led to
the development in the last decade of ‘metacommunity theory’, a family of mathematical
models aiming at reconciling neutral and niche processes by explicitly accounting for
spatial and temporal dynamics (Leibold et al., 2004). However, unlike simpler neutral
models, these models do not provide predictions that are easily amenable to statistical
comparison with empirical data.
Lastly, most of the existing knowledge on community assembly comes from the
study of plants and vertebrates, and the extension of community ecology to
microorganisms is comparatively very recent (Curtis & Sloan, 2005; Martiny et al.,
2006; see section II.2). While the fundamental processes of community assembly apply
to all living organisms, they operate over very different scales for microorganisms, and
their relative importance is likely to differ (Hanson et al., 2012). It has long been
considered that microorganisms had effectively infinite dispersal capacity, and that
abiotic filtering was the dominant process of community assembly (Baas Becking,
1934). Microbial communities have indeed been found to be very sensitive to local
abiotic conditions and dominated by specialist taxa (Ramirez et al., 2014; Mariadassou
et al., 2015). Nevertheless, this view has now been nuanced, and dispersal limitation has
been shown to play a role as well (Ofiteru et al., 2010; Martiny et al., 2011; Roguet et al.,
2015). While microorganisms tend to be more cosmopolitan than larger organisms,
biogeographic patterns do exist (Hanson et al., 2012; Livermore & Jones, 2015).
Microorganisms have also been found capable of complex interactions beyond competition
(Cordero et al., 2012).
II. DNA-based biodiversity patterns
Most ecological knowledge comes from studies performed at the level of individual
species, and from this perspective, the singularity of each species and sometimes of
each individual is striking. Thus, ecologists have long wondered whether general laws
were hiding behind the collection of idiosyncrasies (Lawton, 1999). Integrative data on
species richness, abundance and spatial occurrence have been gathered with the hope
that they would yield insight into the general mechanisms of community assembly
(Brown, 1995). The underlying idea is that, as in statistical physics, informative
statistical properties might emerge from the observation of a large enough number of
individuals and species irrespective of the details of species identities.
In this section, I first introduce two types of integrative patterns that have been
widely studied in community ecology: the distribution of species abundances, and
spatial patterns. I then discuss why the emergence of automated data collection is
opening new horizons for the study of these patterns. Lastly, I briefly present the
ecosystem that this thesis more specifically focuses on, the tropical forests of French
Guiana.
1. Integrative biodiversity patterns
a. Species relative abundances
The distribution of species abundances in a random sample of individuals takes two
forms in the ecological literature: the ‘rank-abundance distribution’ (RAD), or
‘Whittaker’s plot’, consists of the abundances $n_i$ of all $S$ species in the sample ranked by
decreasing abundance, while the ‘species abundance distribution’ (SAD), or ‘Preston’s
plot’, is the distribution of the number $\Phi_n$ of species having abundance $n$ for all the
possible $n$ values in $\{n_1, \dots, n_S\}$ (cf. Fig. 5; Preston, 1948; Whittaker, 1965). To
accommodate the limited amount of data, the SAD is usually binned into abundance
categories. This binning step leads to a loss of information, thus the RAD is more
informative than the SAD. Nevertheless, the SAD has often been the preferred
distribution because it is easier to handle mathematically and to derive from theoretical
models. This is linked to the fact that it can be interpreted upon normalization as the
probability distribution for the abundance of a randomly chosen species in the sample.
Because of the wide range of abundances typically observed in empirical data,
abundances are often log-transformed in SAD and RAD; in the SAD, this amounts to
binning species into abundance classes of exponentially increasing width from the
lowest abundance class (one individual) to the highest, following the example of
Preston (1948).
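The two representations can be computed from the same list of abundances, as in the following minimal sketch. The function names and the sample abundances are made up for illustration, and the binning into classes of doubling width is a simplified convention that may differ in detail from the one used by Preston (1948).

```python
from collections import Counter


def rank_abundance(abundances):
    """RAD: the abundances of all species, ranked by decreasing abundance."""
    return sorted(abundances, reverse=True)


def species_abundance_distribution(abundances):
    """SAD: maps each observed abundance n to the number of species having it."""
    return dict(sorted(Counter(abundances).items()))


def log2_binned_sad(abundances):
    """SAD binned into classes of doubling width.

    Class k holds the species whose abundance n satisfies 2**k <= n < 2**(k + 1),
    so class 0 contains the species represented by a single individual.
    Abundances are assumed to be positive integers.
    """
    classes = Counter(n.bit_length() - 1 for n in abundances)
    return dict(sorted(classes.items()))


if __name__ == "__main__":
    sample = [1, 1, 1, 2, 3, 4, 8, 20]  # made-up abundances, one entry per species
    print(rank_abundance(sample))
    print(species_abundance_distribution(sample))
    print(log2_binned_sad(sample))
```

The binned SAD no longer distinguishes, for instance, a species with 2 individuals from a species with 3, which is the loss of information mentioned above, whereas the RAD retains every abundance.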
It was noticed early on that the distribution of species abundances tended to be
similar in species-rich communities. Indeed, within a single trophic level, there are
usually a few common species and a long tail of rare species; simply put, ‘most species
are rare’ (cf. Fig. 5). This spurred attempts at finding a general explanation for this
pattern. Fisher et al. (1943) and Preston (1948) were the first to propose statistical
distributions to fit the distribution of species abundances.
Fisher assumed that the sampled species abundances followed a negative-binomial
distribution without the zero-abundance class, and derived a SAD of the form
$\mathbb{E}[\Phi_n] = \alpha x^n / n$, where $\alpha$ is a constant parameter, $x$ is a function of $\alpha$ and of sample size
$N$ (with $0 < x < 1$), and $\mathbb{E}[\Phi_n]$ is the statistically expected value of $\Phi_n$ (cf. section III.3.b;
Chave, 2004). Since $\sum_{n=1}^{\infty} \mathbb{E}[\Phi_n] = -\alpha \ln(1 - x)$, this distribution is called the
‘log-series’. A remarkable property of this model is that the expected number of species $\mathbb{E}[S]$
in the sample is given as a function of the number of sampled individuals $N$ by
$\mathbb{E}[S] = \alpha \ln(1 + N/\alpha)$. Hence, the parameter $\alpha$ is sufficient to predict the observed
species richness as a function of the sampling effort. It can thus be used as a
sampling-independent measure of the community’s diversity. The value of $\alpha$ can be easily
visualized in the RAD representation, since the log-transformed abundances are
expected to decrease linearly with slope $-1/\alpha$ as a function of species rank (cf. Fig. 5).
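As an illustration of how the log-series is used in practice, the sketch below estimates Fisher's α from an observed species richness S and sample size N by solving E[S] = α ln(1 + N/α) numerically, then evaluates the expected SAD. The explicit form x = N/(N + α) is not stated in the text; it is obtained here by equating the two expressions for the expected richness, −α ln(1 − x) and α ln(1 + N/α). The function names and the bisection solver are choices made for this example.

```python
import math


def expected_richness(alpha, n_individuals):
    """E[S] = alpha * ln(1 + N / alpha)."""
    return alpha * math.log1p(n_individuals / alpha)


def fisher_alpha(n_species, n_individuals, rel_tol=1e-12):
    """Solve E[S] = S for alpha by bisection (E[S] increases with alpha)."""
    if not 0 < n_species < n_individuals:
        raise ValueError("a solution requires 0 < S < N")
    lo, hi = 0.0, 1.0
    while expected_richness(hi, n_individuals) < n_species:
        hi *= 2.0                              # enlarge the bracket until it contains the root
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_richness(mid, n_individuals) < n_species:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_series_sad(alpha, n_individuals, n_max):
    """Expected number of species with abundance n, for n = 1..n_max."""
    x = n_individuals / (n_individuals + alpha)   # derived under the assumption stated above
    return [alpha * x ** n / n for n in range(1, n_max + 1)]


if __name__ == "__main__":
    alpha = fisher_alpha(n_species=100, n_individuals=10_000)  # made-up sample
    print(f"alpha = {alpha:.2f}")
    print([round(v, 2) for v in log_series_sad(alpha, 10_000, 5)])
```

The expected number of species decreases monotonically with abundance, so the singleton class is always the largest: this is the absence of a mode that distinguishes the log-series from the log-normal SAD.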
Preston (1948) argued in contrast that a log-normal SAD best fitted empirical
data, i.e. $\mathbb{E}[\Phi_n] \propto e^{-(\ln n - \mu)^2 / 2\sigma^2}$ with $\mu$ and $\sigma$ constant parameters. A notable difference
between the two SADs is that the log-normal distribution exhibits a mode (i.e., the
abundance class with the most species is not the lowest abundance class), while Fisher’s
log-series does not. Preston explained the fact that both situations could be encountered
in empirical data by the effect of sampling: a community in which the ‘true’ SAD (i.e., for
METACOMMUNITY DYNAMICS
Fig. 5.7. Preston-type plot of relative species abundance for tree species >10 cm dbh in the 50 ha BCI plot, compared with expectations from the lognormal, and from the zero-sum multinomial of the unified neutral theory, for θ = 50 and m = 0.10. The error bars are ±1 standard deviation.
Fig. 5.8. Preston-type plot of relative species abundance for tree species >10 cm dbh in the 50 ha Pasoh plot, compared with expectations from the lognormal, and from the zero-sum multinomial of the unified neutral theory, for θ = 180 and m = 0.15. The error bars are ±1 standard deviation.
Fig. 5.9. Fitted and observed dominance-diversity distributions for trees >10 cm dbh in the 50 ha plot on Barro Colorado Island, Panama. The best fit θ had a value of 50. Note the departure of the metacommunity distribution for very rare species, but that the observed distribution is fit well once dispersal limitation (m = 0.10) is taken into account. The error bars are ±1 standard deviation.
each figure. The metacommunity logseries distribution is the diagonal line extending downward beyond the empirical curves to the lower right. The metacommunity distribution was calculated for a fitted θ value of 50 in the case of the BCI forest, and for a fitted θ value of 180 in the Pasoh forest. Then the parameters for dispersal limitation and local community size were included to predict the local community dominance-diversity curves in each forest plot. Local community size was 20,541 trees > 10 cm dbh in the BCI plot, but it was 28% higher (26,331) in the Pasoh plot. The previously estimated values of m of 0.10 and 0.15 for BCI and Pasoh, respectively, were used. The precision of the predicted local dominance-diversity curves in each plot is readily apparent from figures 5.9 and 5.10. The expected distributions fit even the abundances of the rarest species
Figure 5. Species Abundance Distribution (top) and Rank Abundance Distribution (bottom) for mature trees in the 50-ha Barro Colorado Island (BCI) monitored forest plot (Panama). Mature trees are defined as stems with diameter larger than 10 cm at breast height (or ‘>10 cm dbh’). The dispersal-limited Hubbell’s model is fitted to the data (θ = 50, m = 0.1), and is compared with the log-normal SAD (top; dashed line), and with the RAD of Fisher’s model (bottom; dashed line). Fisher’s model is equivalent to Hubbell’s model without dispersal limitation (i.e., case m = 1) for large sample size. Error bars indicate ±1 standard deviation.
an infinite number of individuals) is log-normal can lose its mode if under-sampled, and
be mistaken for a log-series. It has since then been acknowledged that the effect of
sampling is indeed paramount in our ability to distinguish between differently-shaped
SADs by curve-fitting (Sloan et al., 2007). In the RAD representation with
log-transformed abundances, a log-normal SAD takes the form of an S-shaped curve, the
common species being commoner and the rare species rarer than in Fisher’s log-series.
Later models have focused on finding a mechanistic justification for the
proposed distributions. MacArthur (1957) proposed that species relative abundances
resulted from the random partitioning of the niche space between the different species
of the community. A number of more sophisticated niche partitioning models were
subsequently proposed (Tokeshi, 1996; McGill et al., 2007). However, Hubbell’s neutral
model is the mechanistic model that has been the most successful at fitting empirical
SADs (Hubbell, 2001; cf. section I.3 and III.4). Indeed, the metacommunity SAD
converges toward Fisher’s log-series for a large enough sample size and is characterized
by a ‘fundamental biodiversity number’ $\theta$ that converges toward Fisher’s $\alpha$ (Chave,
2004). In the absence of dispersal limitation, the local community is a random sample
from the regional metacommunity, and hence also exhibits a log-series-like SAD. In the
presence of dispersal limitation however, the depletion of rare species and the increase
in abundance of locally common species lead to a log-normal-like SAD (cf. Fig. 5). Thus,
Hubbell’s neutral model can approximate both the log-series and the log-normal SADs,
while providing a mechanistic justification for them and fully accounting for sampling
effects.
Nevertheless, it has been shown that many types of non-neutral processes could
yield SADs similar to neutral ones (Chave et al., 2002; Pueyo et al., 2007; Chisholm &
Pacala, 2010). It has also been argued that the log-normal distribution fits empirical
SADs at least as well as Hubbell’s local community SAD (McGill, 2003). The log-normal is
still the most popular choice when it comes to choosing a realistically-shaped SAD for
modelling purposes irrespective of the underlying mechanisms (Connolly et al., 2017). A
log-normal SAD is not in itself very informative on the mechanisms of community
assembly. Indeed, the log-normal distribution is the limiting probability distribution for
anyproductofsufficientlymanyrandomvariables,asaconsequenceofthecentrallimit
theorem(cf.sectionIII.2andIII.3.a),thusalog-normalSADcouldariseastheresultof
anytypeofmultiplicativeprocess.Moregenerally,ithasbeensuggestedthattherange
ofempiricallyobservedSADscouldsimplyresultfromtheiterativespatialaggregation
of smaller-scale SADs, a phenomenon described as a ‘spatial analogy of central limit
theorem’(Sizlingetal.,2009).Asaconsequence,ithasbeencalledfor,ontheonehand,
more statistically powerful tests than simple curve-fitting (Chave et al., 2006; Al
Hammal et al., 2015), and on the other hand, testing multiple predicted patterns
simultaneouslyinsteadofsolelytheSAD(McGilletal.,2007).
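The link between multiplicative processes and the log-normal distribution can be checked numerically. The sketch below is only an illustration under assumed settings: each abundance is the product of 100 independent factors drawn uniformly in [0.5, 1.5], a hypothetical choice that does not represent any particular ecological model. The logarithm of such a product is a sum of independent terms, so it should be close to normal (skewness near zero), while the raw abundances remain strongly right-skewed.

```python
import math
import random


def skewness(values):
    """Sample skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / n / var**1.5


random.seed(1)
n_species = 2000  # hypothetical number of species
n_steps = 100     # hypothetical number of multiplicative steps

# Each abundance is a product of independent positive random factors.
abundances = [
    math.prod(random.uniform(0.5, 1.5) for _ in range(n_steps))
    for _ in range(n_species)
]
log_abundances = [math.log(a) for a in abundances]

skew_raw = skewness(abundances)
skew_log = skewness(log_abundances)
print(f"skewness of abundances:     {skew_raw:.2f}")
print(f"skewness of log-abundances: {skew_log:.2f}")
```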
b. Spatial patterns
Spatial patterns form a second family of integrative patterns in ecology. The
relationship between the sampled area and the number of sampled species is the oldest
such pattern to have been studied (Watson, 1859). This curve was first regarded as a
means to assess whether a community had been adequately sampled, i.e. to ensure that
only a marginal number of new species would appear in the sample if the sampled area
were to be increased. It was soon realized that the species-area relationship (SAR)
might also contain valuable information regarding spatial community structure. Indeed,
at the regional scale, the number of species S was found to consistently follow a power
law $S \propto A^{z}$ as a function of area A, where the exponent z takes values between
0.15 and 0.40 (Arrhenius, 1921; Williamson, 1988). This ‘law’ was later observed to break
down at the extremes, either for areas that are below approximately 1 km² (for plants
or vertebrates), or conversely for areas that exceed the boundaries of a single
biogeographic unit (cf. Fig. 6; Preston, 1960; Shmida & Wilson, 1985). The resulting
curve exhibits an ‘S’ shape on a log-log scale, with a linear domain in the central part
corresponding to the power-law behaviour described above, and steeper slopes at both
ends.
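Because the power law is linear on a log-log scale, the exponent z can be estimated as the slope of a least-squares regression of log S on log A. The sketch below illustrates this on hypothetical, noise-free data generated from S = 20 A^0.25; the data, the prefactor and the function name are assumptions made for the example, and real data would only be expected to follow this line within the intermediate range of areas.

```python
import math


def fit_power_law(areas, richness):
    """Least-squares fit of log S = log c + z log A; returns (c, z)."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(s) for s in richness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    z = sxy / sxx                       # slope on the log-log scale
    c = math.exp(mean_y - z * mean_x)   # prefactor of the power law
    return c, z


# Hypothetical, noise-free data generated from S = 20 * A^0.25 (A in km^2).
areas = [1, 10, 100, 1_000, 10_000]
richness = [20 * a**0.25 for a in areas]

c, z = fit_power_law(areas, richness)
print(f"c = {c:.2f}, z = {z:.3f}")
```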
The three domains of the SAR reflect different processes at play. At the local
scale, the SAR directly results from sampling the local species abundance distribution:
the number of detected species first increases linearly with area and then progressively
slows down as only the rarer species remain to be sampled. At the global scale, the SAR
approaches linearity again as species with distinct evolutionary history are sampled in
different biogeographic zones. At intermediate scales, the power-law regime reflects a
slow increase in species richness with area once the local species richness has been fully
sampled. This increase corresponds to a shift in species composition with distance,
referred to as ‘beta-diversity’ by Whittaker (1960), i.e. the link between
‘alpha-diversity’, the number of species in the local community, and ‘gamma-diversity’,
the number of species at the regional scale.
Figure 6: Number of bird species as a function of area; data from Preston (1960). The S-shaped Species-Area Relationship introduces two characteristic spatial scales for a given taxonomic group (vertical dashed lines), separating ‘local’, ‘intermediate’, and ‘large’ scales. The study of beta diversity mostly focuses on ‘intermediate’ scales, while biogeography is mostly concerned with ‘large’ scales. Adapted from Hubbell (2001).
Conceptually, beta-diversity is the variation in taxonomic composition among
sites within a region of interest. However, several quantitative definitions coexist. One
approach is to consider beta-diversity as a quantity β that links the mean local diversity
α to the regional diversity γ through γ = αβ, so that the regional diversity can be
partitioned into independent within-community and among-community components
(Whittaker, 1960; Jost, 2007). The spatial scale that separates alpha- and beta-diversity
may be defined as the scale witnessing the regime shift in the SAR. Another approach is
to measure beta-diversity independently of alpha- and gamma-diversity as the mean
taxonomic similarity between sites or as the variance of the community matrix
(Legendre & De Caceres, 2013). The community matrix is the matrix describing the
number of individuals per species and per site, usually taking species as columns and
sites as rows. A wealth of similarity metrics can be used to compare sites to each other,
depending for instance on the weight given to rare species, on whether the sampling
effort is homogeneous among sites or not, and on whether abundance information or
only species occurrence should be taken into account (Legendre & De Caceres, 2013).
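The two families of definitions above can be compared on a toy example. The sketch below computes the multiplicative beta-diversity β = γ / α from species richness, together with two commonly used pairwise metrics, the Sørensen similarity (occurrence-based) and the Bray-Curtis dissimilarity (abundance-based). The community matrix is hypothetical, and these two metrics are chosen here only as familiar examples among the many available.

```python
from itertools import combinations

# Hypothetical community matrix: rows are sites, columns are species,
# entries are numbers of individuals.
community = [
    [10, 5, 0, 0],
    [8, 0, 3, 0],
    [0, 0, 4, 6],
]


def richness(site):
    """Number of species present at a site."""
    return sum(1 for count in site if count > 0)


# Multiplicative partition: gamma = alpha * beta.
alpha = sum(richness(site) for site in community) / len(community)
gamma = sum(1 for column in zip(*community) if sum(column) > 0)
beta = gamma / alpha


def sorensen_similarity(site_a, site_b):
    """Occurrence-based similarity: 2 * shared / (richness_a + richness_b)."""
    shared = sum(1 for a, b in zip(site_a, site_b) if a > 0 and b > 0)
    return 2 * shared / (richness(site_a) + richness(site_b))


def bray_curtis_dissimilarity(site_a, site_b):
    """Abundance-based dissimilarity: sum |a - b| / (total_a + total_b)."""
    difference = sum(abs(a - b) for a, b in zip(site_a, site_b))
    return difference / (sum(site_a) + sum(site_b))


pairs = list(combinations(community, 2))
mean_similarity = sum(sorensen_similarity(a, b) for a, b in pairs) / len(pairs)

print(f"alpha = {alpha}, gamma = {gamma}, beta = {beta}")
print(f"mean Sorensen similarity = {mean_similarity:.3f}")
```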
Taxonomic similarity is well known to decrease with distance, a general pattern
of ecology that is related to the monotonic increase of diversity with area (Soininen et
al., 2007). Depending on the mechanisms of community assembly, this ‘distance-decay
of similarity’ can be interpreted either as the result of dispersal limitation, or as the
consequence of new habitats and community types being encountered. A major
motivation for the study of beta-diversity lies in the fact that it is an indirect means to
investigate the drivers of community assembly. Indeed, taxonomic similarity between
sites can be compared to distance and to environmental similarity, so as to empirically
assess the relative importance of dispersal and abiotic filtering in shaping community
composition (Tuomisto et al., 2003). This question may also be addressed by directly
comparing taxonomic composition with quantitative environmental descriptors using
multivariate statistical methods, an approach deemed more statistically powerful
(Legendre et al., 2005, 2008; cf. section III.2).
Formally, the distance-decay of similarity can be described using the
pair-correlation function of statistical physics, i.e. the probability for two individuals
at a given distance to belong to the same species (Chave & Leigh, 2002; Zillio et al., 2005;
Houchmandzadeh, 2009). Predictions for both the SAR and the distance-decay of
similarity can be obtained from a spatially explicit version of Hubbell’s neutral model.
The neutral predictions are in qualitative agreement with observations, including the
tri-phasic SAR (Hubbell, 2001; Condit et al., 2002). Nevertheless, as in the case of
species abundance distributions, this does not preclude other mechanisms from being
involved.
2. Environmental DNA data
Collecting the large amounts of data required to study integrative patterns has long
been a tedious and challenging task (Lawton et al., 1998). Direct taxonomic
identification relies on rare expert knowledge, and is prone to errors. Sampling
protocols are difficult to standardize, and owing to the amount of work involved, data
collection may spread over long periods of time (sometimes years), which may
introduce biases. Last but not least, only a small fraction of biodiversity can be directly
sampled and identified by a human observer, mostly vertebrates and plants. As a result,
datasets available for the study of integrative biodiversity patterns have long been
relatively rare and limited in their taxonomic extent. However, major technological
advances have been made over the last decades that now allow for the automatic
collection of ecological data. These advances are all related to the exponential increase
in computer power that took place over the same period of time, and that has
dramatically impacted all fields of science and industry.
For instance, remote sensing of ecological features over large spatial scales can
be achieved using Lidar and hyperspectral imaging. Lidar is a small-wavelength
equivalent of radar (either airborne or ground-based) that allows for fine-grain 3D
imaging. Hyperspectral imaging consists in recording images (from a plane or a
satellite) for a much larger spectrum of electromagnetic wavelengths than the human
eye does: the additional information may for instance be used for the automated
identification of tree species from their spectral signature, especially when combined
with Lidar data (Alonzo et al., 2014).
Arguably, the one recent technological innovation with the strongest impact on
biology has been high-throughput DNA sequencing (Schuster, 2007). While DNA
sequencing methods have existed since the 1970s (Sanger et al., 1977), a breakthrough
occurred around 2005 by which sequencing speed was multiplied by several orders of
magnitude for a fraction of the cost of previous methods (cf. Fig. 7; Margulies et al.,
2005). Since 2011, the dominant high-throughput DNA sequencing method has been
Illumina sequencing, which consists in spreading and attaching the target DNA strands
on a flat surface and synthesizing the complementary strands using four-colour
fluorescent nucleotides (Bentley et al., 2008). By recording the order of appearance of
the different colours at the location of each DNA strand with a fast and high-resolution
camera, millions of strands can be simultaneously sequenced with high accuracy. The
main limitation of the method is the length of the sequenced strands, which currently
cannot exceed 150 or 300 base pairs, depending on the exact technology.
The idea of using DNA sequencing to study biodiversity predates
high-throughput sequencing, and was introduced as a means to study microorganisms
(Giovannoni et al., 1990). Indeed, most microorganisms can only be detected in the
environment through their DNA, collected from soil or water samples (Pace, 1997). The
idea was to identify a short DNA sequence satisfying two properties. First, it should
have conserved extremities across the range of targeted taxa, so that it can be amplified
by PCR using a single pair of primers from bulk DNA. Second, its central part should
exhibit random mutations making the different taxa distinguishable, i.e. it should not be
under strong evolutionary selection. Such a sequence is called a barcode, and the first
to be used was the 16S rRNA gene of prokaryotes, which codes for the RNA
forming the small (16S) subunit of the prokaryotic ribosome (Giovannoni et al., 1990;
Pace, 1997). DNA barcodes were soon also recognized as a means to bypass the
need for traditional taxonomic expertise in identifying larger organisms, for which DNA
can be directly extracted from tissue (Hebert et al., 2003). Nevertheless, barcode
sequences can only be attributed to known taxa once a reference database has been
established for the barcode. When no reference database is available for the organisms
under study, molecular Operational Taxonomic Units (OTUs) defined based on
sequence similarity are substituted for species in analyses. Moreover, depending on the
frequency at which mutations occur in the barcode sequence, the comparison of
sequences across species may not be congruent with traditional species delineation, and
two barcodes targeting the same taxonomic group may have widely differing levels of
taxonomic resolution.
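To make the notion of similarity-based OTUs concrete, the sketch below implements a deliberately simplified greedy clustering: each sequence joins the first existing OTU whose representative it matches above an identity threshold, and otherwise founds a new OTU. This is an illustration only and not the procedure used in this thesis: it assumes pre-aligned reads of equal length, and the toy reads, the 90% threshold and the function names are all hypothetical. Actual pipelines rely on sequence alignment and on dedicated software.

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def greedy_otus(sequences, threshold):
    """Assign each sequence to the first matching centroid, else open a new OTU."""
    centroids = []  # one representative sequence per OTU
    labels = []     # OTU index of each input sequence
    for seq in sequences:
        for otu, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                labels.append(otu)
                break
        else:
            centroids.append(seq)
            labels.append(len(centroids) - 1)
    return labels, centroids


# Hypothetical 20-bp reads: two groups differing by at most one mismatch.
reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",
    "TTTTGGGGCCCCAAAATTTT",
    "TTTTGGGGCCCCAAAATTTA",
    "ACGTACGTACGTACGTACGT",
]

labels, centroids = greedy_otus(reads, threshold=0.9)
print(labels)          # OTU index of each read
print(len(centroids))  # number of OTUs
```

The outcome of such a greedy scheme depends on the order of the input sequences and on the threshold, which is one reason why OTUs need not coincide with traditionally delineated species.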
Figure 7: Evolution of DNA-based species identification over time. The successive introduction of Polymerase Chain Reaction (PCR) and High-throughput (or Next Generation) Sequencing to ecology has transformed the field, and automated data collection using molecular approaches is developing fast. Adapted from Taberlet et al. (2012b).
With the advent of high-throughput sequencing, several thousands to several
millions of barcode sequences can now be readily sequenced from a single bulk DNA
sample. As a consequence, the use of barcode sequencing to measure biodiversity from
environmental DNA (or ‘metabarcoding’) has boomed (Bik et al., 2012; Taberlet et al.,
2012a,b; Bohmann et al., 2014). The vast diversity of the microbial world is only starting
to be fully grasped, and whole new swathes of the tree of life are being discovered (Hug et
al., 2016). Ambitious projects aim at sampling microbial diversity across the globe,
either on land (Gilbert et al., 2014) or in the ocean (de Vargas et al., 2015). In parallel,
metabarcoding can be used as a fast and standardized means to gather information on
macroscopic organisms, either using environmental DNA or, for small enough
organisms, DNA extracted from a ‘soup’ of sampled specimens (Andersen et al., 2012; Yu
et al., 2012; Gibson et al., 2014). This wealth of data has led to a renewal of interest in
the study of integrative biodiversity patterns and biogeography, which were until
recently entirely unknown for microorganisms (Martiny et al., 2006; Fuhrman, 2009;
Hanson et al., 2012). Since metabarcoding is but the simplest method to exploit the
information contained in environmental DNA, and is being replaced by approaches
making use of a larger fraction of the organisms’ genome as sequencing capacity keeps
increasing (Taberlet et al., 2012b), the trend toward incorporating sequencing data into
ecological studies is probably just starting.
3. The tropical forests of French Guiana
Tropical forests are estimated to concentrate half of global biodiversity, and are as such
the archetypical ‘hyperdiverse’ ecosystem (Scheffers et al., 2012). They have played a
historical role in generating hypotheses in ecology and evolution, especially regarding
the mechanisms of species coexistence (Wright, 2002). Indeed, like the phytoplanktonic
communities at the origin of the ‘paradox of the plankton’ (Hutchinson, 1961), they
harbour for many taxonomic groups a wide range of species competing for the same
resources. Hubbell’s neutral theory of biodiversity was elaborated based primarily
on the observation of tropical forest tree communities (Hubbell, 2001), and much of the
ensuing debate initially focused on these communities as well (McGill, 2003;
Ricklefs, 2003; Volkov et al., 2003). In addition to their unparalleled biodiversity,
tropical forests are also thought to harbour the majority of the non-microbial terrestrial
taxa still unknown to science (Scheffers et al., 2012). Hence, the automated
measurement of integrative patterns is well suited to their study, and is in particular
uniquely comprehensive compared to other possible approaches.
Figure 8: Whether ecosystems can be considered pristine depends on the temporal scale considered: map showing estimated changes in phosphorus (P) concentrations over time in South America following the sudden extinction of most large mammal species 12,000 years ago, likely caused by human arrival, and the ensuing disruption of nutrient transport through dung. Adapted from Doughty et al. (2013).
Unlike most land ecosystems on Earth, a significant, if fast dwindling, fraction of
tropical forests can still be considered to be in a pristine state, thus guaranteeing access
to natural processes unaffected by human activities. Amazonia represents the world’s
largest tropical forest, and within it, French Guiana counts among its least disturbed
parts (Hansen et al., 2013). Nevertheless, recent findings have challenged the idea of an
entirely pristine Amazonian basin. Indeed, it appears that human population density
was relatively high in places until the European conquest (Heckenberger et al., 2008).
Moreover, on a longer timescale, the Amazonian basin may still be in a transient state
following the sudden disappearance of most large mammal species 12,000 years ago,
which was likely caused by the arrival of human hunters and has had deep
consequences on nutrient transport and seed dispersal (cf. Fig. 8; Doughty et al., 2013).
Two research stations were established in French Guiana in the 1980s for
research on Amazonian biodiversity. This is where the data used in this thesis were
collected. The Nouragues research station is about 100 km inland, in the heart of
the Nouragues natural reserve, and is devoted to the study of the undisturbed lowland
forest as well as of the neighbouring inselberg. The Paracou research station, near the
coast, is devoted to the study of the long-term effects of logging on biodiversity
(Gourlet-Fleury et al., 2004). In both stations, soils are acidic and nutrient-poor, as is
typical in tropical forests, with a more sandy soil in Paracou and a more clayey soil in
Nouragues. The mean rainfall is about 3,000 mm per year, with relatively strong
seasonal variation, and temperature is around 26 °C throughout the year.
III. Statistical approaches
In this section, I introduce the statistical approaches used in this thesis. I first briefly
review the classical approaches of community ecology. I then introduce Hubbell’s
neutral model and Dirichlet mixture models, which are respectively the foci of the
second and third chapters of this thesis, by emphasizing their common mathematical
structure based on the Dirichlet distribution and its Dirichlet process extension.
1. Comparing models to data in ecology
In physics, empirical data can often be satisfyingly characterized by a one-dimensional
mathematical function fitted to the data (‘curve fitting’). This is not the case in ecology,
because observations do not as a rule tightly follow the prediction of a theoretical
model, and because data points are always relatively scarce and costly to acquire. To
take full advantage of the available data, it is hence essential to account for the
statistical distribution of the observations around the fitted model, and often for the
statistical dependence between observations. In the absence of a theoretical prediction,
deterministic trends in the relationship between variables are conversely assumed to
be very simple (e.g., linear). Thus, ecological models aiming at comparison with data
need to be expressed in probabilistic terms, and model fitting heavily relies on
likelihood-based inference (Fisher, 1925; Pawitan, 2001).
The likelihood function of a model is given by the probability distribution
$p(X \mid \theta)$ for the data $X$ to be observed conditional on the model’s parameters
$\theta$. The model is fitted to data by maximizing the likelihood function
$L(\theta \mid X) = p(X \mid \theta)$, which is a means of simultaneously estimating the
model’s parameters as $\hat{\theta}(X) = \operatorname{argmax}_{\theta} L(\theta \mid X)$ and
measuring the goodness-of-fit as $\hat{L}(X) = \max_{\theta} L(\theta \mid X)$. In practice, the
logarithm of the likelihood is maximized, and the normalization factor in the likelihood
expression is discarded. Depending on the situation, the focus may be on measuring
goodness-of-fit or on estimating and interpreting model parameters. If several
alternative models are to be compared to each other, this can be achieved by comparing
the Akaike Information Criterion for each model, equal to $2K - 2 \ln \hat{L}(X)$, where $K$
is the number of parameters in the model, the model with the lowest value being
preferred (Akaike, 1974; Burnham & Anderson, 2002). If only one model is to be
compared to the data, the most popular approach is to assess how likely the data at
hand would be to be observed if they were to be generated by the probabilistic model
under consideration. To this end, the value taken by a ‘test statistic’ (for instance the
log-likelihood $\ln \hat{L}(X)$) is compared to its theoretical distribution given the model.
The threshold for rejecting the model with reasonable confidence is traditionally set at
5% probability, following the example of Fisher (1925).
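The model comparison described above can be illustrated with a case where the maximum-likelihood estimates are available in closed form. The sketch below compares two Poisson models for hypothetical counts of one species in two groups of plots: a single common rate (K = 1) against one rate per group (K = 2). The data and the function names are assumptions made for the example; for a Poisson model, the maximum-likelihood estimate of the rate is the sample mean.

```python
import math


def poisson_log_likelihood(counts, rate):
    """Log-likelihood of independent Poisson counts sharing a common rate."""
    return sum(x * math.log(rate) - rate - math.lgamma(x + 1) for x in counts)


def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2K - 2 ln L."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical counts of one species in two groups of plots.
group_a = [2, 3, 1, 4, 2, 3]
group_b = [9, 7, 11, 8, 10, 9]
pooled = group_a + group_b

# Model 1 (K = 1): a single rate for all plots.
rate_pooled = sum(pooled) / len(pooled)
aic_one_rate = aic(poisson_log_likelihood(pooled, rate_pooled), 1)

# Model 2 (K = 2): one rate per group.
rate_a = sum(group_a) / len(group_a)
rate_b = sum(group_b) / len(group_b)
log_likelihood_two = (
    poisson_log_likelihood(group_a, rate_a)
    + poisson_log_likelihood(group_b, rate_b)
)
aic_two_rates = aic(log_likelihood_two, 2)

print(f"AIC, one rate:  {aic_one_rate:.2f}")
print(f"AIC, two rates: {aic_two_rates:.2f}")
```

Here the gain in log-likelihood outweighs the penalty for the additional parameter, so the two-rate model has the lower AIC.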
Another approach to likelihood-based inference consists in estimating the full
probability distribution of the model’s parameters conditional on the data instead of
only their most likely value (Gelman et al., 2014). This approach is called Bayesian
inference, in contrast to maximum-likelihood inference, since the full probability
distribution of the model’s parameters $\theta$ is given by Bayes’ equation
$p(\theta \mid X) = p(X \mid \theta)\,p(\theta)/p(X)$ (Bayes & Price, 1763). Another distinction
between both approaches is that maximum-likelihood inference assumes that
$\operatorname{argmax}_{\theta} p(\theta \mid X) = \operatorname{argmax}_{\theta} p(X \mid \theta)$, and thus
implicitly that $p(\theta)$ is a uniform distribution. In contrast, $p(\theta)$ is often used to
express prior belief on parameter values in Bayesian inference. The normalization factor
$p(X) = \int p(X \mid \theta)\,p(\theta)\,\mathrm{d}\theta$, or marginal likelihood, can then be used as
a measure of goodness-of-fit accounting for all possible parameter choices. Because it is
less analytically tractable than maximum-likelihood inference, Bayesian inference has
been less employed historically. However, it can now be performed numerically, and
even though it is usually more computationally demanding than maximum-likelihood
inference, it has become increasingly popular with the steady increase in computer
power. One of the reasons for its success is that it can accommodate complex models in
which the likelihood is difficult to maximize.
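For a model with a single parameter, Bayes’ equation can be evaluated numerically on a grid. The sketch below is a minimal illustration under assumed settings: a species detected at 6 out of 20 sites, a binomial likelihood with detection probability θ, and a uniform prior on [0, 1]. The data and grid size are hypothetical. With a uniform prior, the posterior mode coincides with the maximum-likelihood estimate, and the marginal likelihood of this particular model is known to equal 1/(n + 1), which provides a check on the numerical integration.

```python
import math

n_sites, n_detections = 20, 6  # hypothetical data X
n_grid = 1000
d_theta = 1.0 / n_grid
thetas = [(i + 0.5) * d_theta for i in range(n_grid)]  # midpoints of grid cells


def likelihood(theta):
    """Binomial probability p(X | theta) of the observed number of detections."""
    return (
        math.comb(n_sites, n_detections)
        * theta**n_detections
        * (1.0 - theta) ** (n_sites - n_detections)
    )


prior = [1.0 for _ in thetas]  # uniform prior density p(theta) on [0, 1]
joint = [likelihood(t) * p for t, p in zip(thetas, prior)]

# Marginal likelihood p(X): integral of p(X | theta) p(theta) over theta.
marginal_likelihood = sum(joint) * d_theta

# Posterior density p(theta | X) from Bayes' equation.
posterior = [j / marginal_likelihood for j in joint]

posterior_mean = sum(t * p for t, p in zip(thetas, posterior)) * d_theta
theta_map = thetas[posterior.index(max(posterior))]

print(f"marginal likelihood: {marginal_likelihood:.5f}")
print(f"posterior mean:      {posterior_mean:.4f}")
print(f"posterior mode:      {theta_map:.4f}")
```

Grid evaluation becomes impractical as the number of parameters grows, which is why sampling-based numerical methods are generally used instead for complex models.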
2. The statistical tools of community ecology
Univariate models such as simple linear regression, where observations are regarded as
realizations of a single random variable, can be distinguished from multivariate models
where observations result from several non-independent random variables. The
analysis of community matrices relies on multivariate statistical methods, where the
abundance, or the occurrence, of each of the p taxa is regarded as a random variable
with a realization at each of the n sampling sites. Not all of the many multivariate
methods classically used in community ecology are explicitly model-based: they
typically combine multivariate linear regression, eigenvalue decomposition and the use
of (dis)similarity metrics (Legendre & Legendre, 2012). Their results are often
interpreted within the framework of the ‘analysis of variance’ (ANOVA), which consists
in partitioning the variance of the observed variables into components corresponding to
different sources of variation.
The multivariate methods that include an eigenvalue decomposition step (or a
generalized version of it) are called ‘ordination’ methods. A cornerstone of multivariate
analysis is Principal Component Analysis (PCA), a simple ordination method of
widespread use well beyond ecology. It consists in rotating p observed variables around
their mean so as to obtain p uncorrelated variables ordered by decreasing variance.
Namely, the n-by-p matrix $T$ containing the p new variables is obtained as the matrix
product $T = XW$, where $X$ is the n-by-p matrix containing the centred original
variables, and $W$ the p-by-p matrix formed by the eigenvectors of the covariance matrix
$\frac{1}{n-1} X^{\top} X$ ordered by decreasing eigenvalues. The first use of PCA is to
decorrelate the data. It may also be used for reducing data dimensionality by discarding
the uncorrelated variables accounting for the least variance. Thus, PCA allows for
conveniently representing the data by projecting them on the two or three axes that
account for the most variance. To investigate the dependence of a community matrix on
a set of explanatory variables, such as environmental variables measured at the
sampling sites, a classical method is to perform a multivariate linear regression of the
community matrix on the explanatory variables, followed by a PCA on the matrix of
fitted values, a method known as Canonical Redundancy Analysis (RDA). Using
partial linear regression, RDA can be extended into ‘partial RDA’ to compare the effect
of several sets of explanatory variables on the community matrix.
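The PCA construction $T = XW$ described above can be illustrated with a minimal sketch. It uses only the Python standard library and, as an assumption made for brevity, finds the eigenvectors of the covariance matrix by power iteration with deflation; the function names are hypothetical and a production analysis would rely on a dedicated linear algebra routine instead.

```python
import math

def covariance_matrix(X):
    """Covariance matrix X^T X / (n - 1) of a column-centred data matrix X."""
    n, p = len(X), len(X[0])
    return [[sum(row[i] * row[j] for row in X) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(C, iterations=1000):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix C,
    obtained by power iteration (illustrative choice, not a robust solver)."""
    p = len(C)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * C[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v

def pca(X_raw):
    """Return (T, eigenvalues): the rotated variables T = XW and their variances."""
    n, p = len(X_raw), len(X_raw[0])
    means = [sum(row[j] for row in X_raw) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in X_raw]  # centring
    C = covariance_matrix(X)
    eigenvalues, eigenvectors = [], []
    for _ in range(p):
        lam, v = leading_eigenpair(C)
        eigenvalues.append(lam)
        eigenvectors.append(v)
        # Deflation: remove the component just found before the next search.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    # Column k of W is eigenvectors[k], so T[r][k] = sum_j X[r][j] * W[j][k].
    T = [[sum(row[j] * w[j] for j in range(p)) for w in eigenvectors] for row in X]
    return T, eigenvalues
```

Keeping only the first two or three columns of `T` gives the low-dimensional representation mentioned in the text.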
Clustering methods constitute another family of extensively used statistical
methods in ecology (Legendre & Legendre, 2012), as well as more generally in data
mining and machine learning (Bishop, 2006; Jain, 2010). They aim at partitioning the
data into ‘natural’ clusters of observations, by searching for structure in the matrix of
pairwise similarity between observations. As such, their scope overlaps to some extent
with that of exploratory ordination methods such as PCA. In the terminology of machine
learning, clustering algorithms are ‘unsupervised’ algorithms, i.e. they aim at
discovering patterns without being provided any prior information, in contrast to
‘supervised’ algorithms aiming at classifying patterns based on pre-existing criteria.
The most popular clustering algorithms in ecology are ‘hierarchical’ ones. They
consist in recursively splitting the data into clusters of observations starting from the
whole dataset – or conversely, recursively agglomerating clusters of observations
starting from the individual observations – by maximizing between-cluster dissimilarity
at each step. Dissimilarity between two clusters is most commonly measured as the
mean pairwise dissimilarity between the observations of each cluster, a method called
UPGMA (‘Unweighted Pair Group Method with Arithmetic Mean’). The pairwise
dissimilarity between observations can be measured using any dissimilarity metric,
which is often an advantage in ecology owing to the wide range of dissimilarity metrics
in use (Legendre & De Caceres, 2013; cf. section II.1.b). Another advantage of
hierarchical clustering is that the result can be displayed as a tree of hierarchically
nested clusters (or ‘dendrogram’): in addition to visualizing data structure, this helps
choose the number of clusters according to the desired level of similarity within
clusters. Hierarchical clustering is however computationally intensive for large datasets.
Moreover, because splits – or merges – decided at each hierarchical step cannot be
undone and have a strong impact on the subsequent steps, the algorithm may be easily
trapped in suboptimal solutions for large and noisy datasets.
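As an illustration of the agglomerative variant, the sketch below implements UPGMA on a precomputed dissimilarity matrix. It assumes the usual agglomerative rule of merging, at each step, the two clusters with the smallest mean pairwise dissimilarity, and stops once a requested number of clusters is reached instead of building the full dendrogram; the function name and stopping rule are hypothetical simplifications.

```python
def upgma(dissimilarity, n_clusters):
    """Agglomerative clustering with average linkage (UPGMA).

    dissimilarity: symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns a list of clusters, each a sorted list of observation indices.
    """
    clusters = [[i] for i in range(len(dissimilarity))]

    def average_link(a, b):
        # Mean pairwise dissimilarity between the observations of two clusters.
        return sum(dissimilarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_link(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = clusters[x] + clusters[y]  # a merge is never undone afterwards
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]
```

The exhaustive search over all pairs of clusters at each step makes the cost grow quickly with the number of observations, which is the computational burden noted above.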
‘Partitional’ algorithms, which consist in searching for the optimal partition of
the data into a predefined number of clusters, form a second family of algorithms that
are better adapted to large datasets (Jain, 2010). The most widespread partitional
algorithm is k-means clustering, which formally consists in finding the k clusters that
minimize within-cluster variance in the Euclidean space of observations, with k a fixed
parameter. Unlike hierarchical clustering, which is purely heuristic, the problem of
k-means clustering can be reframed as the fit of a multivariate statistical model to the
data (specifically, a ‘Gaussian mixture model’). This is however achieved using heuristic
algorithms, which may converge to suboptimal solutions. The most common algorithm
consists in randomly setting the position of the k cluster centres in the space of
observations, delineating the clusters by assigning each observation to the closest
cluster centre based on Euclidean distance, and then iteratively reshaping the clusters
using their mean in the previous step as their new centre, until convergence. Lastly,
‘network science’ provides a range of clustering algorithms that are based on a graph
representation of the similarity matrix (Rosvall et al., 2009; Fortunato, 2010). These
methods, which are well adapted to large datasets, have recently enjoyed a rise in
popularity in ecology (Vilhena & Antonelli, 2015; Bloomfield et al., 2017; Wang et al.,
2017).
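The iterative k-means procedure described above can be sketched as follows. As an assumption of this sketch, the initial centres are chosen as k randomly picked observations, which is one common way of ‘randomly setting’ them; the function names are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, max_iter=100):
    """Heuristic k-means: returns (assignment, centres).

    points: list of coordinate lists; rng: a random.Random instance.
    """
    centres = rng.sample(points, k)  # assumed initialisation: k random observations
    assignment = None
    for _ in range(max_iter):
        # Assign each observation to the closest centre.
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                          for p in points]
        if new_assignment == assignment:  # convergence: clusters no longer change
            break
        assignment = new_assignment
        # Move each centre to the mean of the observations assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centres
```

Because the result depends on the initial centres, the procedure is typically restarted from several random initialisations and the partition with the lowest within-cluster variance is retained.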
A pervasive assumption in classical statistical models is that observations are
normally distributed – i.e., follow Gaussian probability distributions. This assumption
may be explicit, or sometimes implicit. For instance, model fitting by least-square
regression amounts to maximizing the log-likelihood of independent identically
distributed normal variables centred on the fitted model. Likewise, the assumption in
PCA that the observed variables can be entirely characterized by their mean and
variance implies that they are normally distributed, since this property is unique to the
Gaussian distribution. A justification for the normality assumption is that an
observation on a sample can typically be regarded as the sum, or the mean outcome, of
many random draws, and the central limit theorem states that the mean of a sufficiently
large number of independent, identically distributed random variables with finite
variance is approximately normally distributed. Thanks to the many
convenient mathematical properties of the Gaussian distribution, exact analytical
expressions have been obtained for maximum-likelihood estimators and for the
theoretical distribution of test statistics. Prior to the advent of computers, such
analytical results were an essential condition for the practical usefulness of statistical
models. This is however not the case anymore, and the exploration of models that are
not based on the Gaussian distribution is now possible.
3. The Dirichlet distribution and its Dirichlet process extension
a. The Dirichlet distribution
Not all natural processes are additive, and as a consequence, not all quantities can be
assumed to be normally distributed as the sum of a large number of random draws.
Some processes are multiplicative, and a direct consequence of the central limit
theorem is that the product of a large number of random draws will follow a log-normal
distribution. Indeed, for N positive random variables $X_i$,
$\ln \prod_{i=1}^{N} X_i = \sum_{i=1}^{N} \ln X_i$. Hence, the
central limit theorem states that $\ln \prod_{i=1}^{N} X_i$ is approximately normally
distributed for large N. It follows from the definition of the log-normal distribution that
$\prod_{i=1}^{N} X_i$ is log-normally
distributed. As mentioned in section II.1.a, this is a possible explanation for the often-
observed log-normal distribution of species abundances. Indeed, if the abundances of
species are independent of each other, a species’ change in abundance through time
may take the form of a random multiplicative factor applied to its reproductive output
at each generation, depending for instance on environmental fluctuations.
However, if changes in species abundance are rather driven by demographic
drift, as assumed in a neutral framework, relative species abundances are better
described by the following process: starting from abundances $(a_1, \ldots, a_S)$, where $a_i$ is
the number of individuals in species i, one of the S species is picked at each time step
with probability equal to its relative abundance (or equivalently, one individual is
picked at random in the population), and its abundance is increased by one individual. If
this sampling scheme, called a Pólya urn, is repeated indefinitely, the distribution of
species relative abundances $(x_1, \ldots, x_S)$ will follow the Dirichlet distribution of
parameters $(a_1, \ldots, a_S)$, which may be regarded as a distribution over distributions
(Blackwell & MacQueen, 1973):
$$p(x_1, \ldots, x_{S-1} \mid a_1, \ldots, a_S) = \frac{\Gamma\left(\sum_{i=1}^{S} a_i\right)}{\prod_{i=1}^{S} \Gamma(a_i)} \prod_{i=1}^{S} x_i^{a_i - 1}, \qquad x_S = 1 - \sum_{i=1}^{S-1} x_i$$
$\Gamma$ is the gamma function generalizing the factorial to real numbers and taking value
$\Gamma(a) = (a-1)!$ when $a$ is a positive integer. Note that the description of the Pólya urn
originally involves drawing balls of different colours from an urn instead of individuals
of different species from a community.
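The Pólya urn scheme lends itself to a short simulation. The sketch below is a minimal illustration with hypothetical function and parameter names; it stops the urn after a finite number of steps, so the returned relative abundances only approximate a draw from the limiting Dirichlet distribution.

```python
import random

def polya_urn(initial, steps, rng):
    """Run a Pólya urn and return the final relative abundances.

    initial: starting abundances (a_1, ..., a_S), positive numbers.
    steps:   number of individuals added, one per time step.
    rng:     a random.Random instance.
    """
    abundances = list(initial)
    for _ in range(steps):
        # Pick a species with probability equal to its current relative abundance...
        i = rng.choices(range(len(abundances)), weights=abundances)[0]
        # ...and increase its abundance by one individual.
        abundances[i] += 1
    total = sum(abundances)
    return [a / total for a in abundances]
```

Averaged over many independent runs, the relative abundance of species i stays close to $a_i / \sum_j a_j$, while individual runs differ widely from one another when the initial abundances are small.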
If there is no a priori reason to assume differences between the S species,
parsimony leads to setting all initial abundances $a_i$ to the same value $a$ (‘symmetric’
Dirichlet distribution). In that case, some species will randomly emerge as more
abundant than others over time in the Pólya urn sampling scheme, since any above-
average abundance tends to be amplified. The shape of the limiting distribution after an
infinite number of time steps is heavily influenced by the ‘concentration parameter’ $a$,
which can formally take any positive real value. If $a$ is much smaller than 1, the first
species to be picked by the sampling scheme will have its abundance updated to $a + 1$,
and will have a disproportionately higher probability to be picked again at the next time
step. Conversely, if $a$ is much larger than 1, the fact that a species’ abundance is
increased by 1 has little influence on its subsequent probability to be picked. Thus,
depending on the value of $a$ relative to 1, the symmetric Dirichlet distribution can either
describe a species abundance distribution with a few dominant species and many rare
ones ($a \ll 1$), reminiscent of the structure observed in species-rich communities, or in
contrast a very even species abundance distribution ($a \gg 1$). In the general case, any
set of parameters $(a_1, \ldots, a_S)$ can be rewritten as $(\theta p_1, \ldots, \theta p_S)$, with $\sum_{i=1}^{S} a_i = \theta$ and
$p_i = a_i / \theta$, so that $\sum_{i=1}^{S} p_i = 1$. The Dirichlet distribution with asymmetric parameters
behaves similarly to the symmetric case, except that the relative abundance $x_i$ of
species i has mean $p_i$ over all possible draws from the Dirichlet distribution, while its
variance is determined by the value of $\theta / S$.
The symmetric Dirichlet distribution is the distribution that Fisher (1943)
implicitly assumed for relative species abundances to derive the log-series SAD, defined
as $\mathbb{E}[\Phi_n] = \alpha x^n / n$ (cf. section II.1.a). He assumed that the number of sampled
individuals per species followed a negative-binomial distribution of parameters
$(\alpha/S, x)$ (without the zero-abundance class, because the latter cannot be observed), as
the result of Poisson sampling from a large number S of Gamma-distributed species
abundances with shape parameter $\alpha/S$ and rate parameter $(1-x)/x$. The negative-
binomial distribution $P_{NB}$ can indeed be obtained as
$P_{NB}(k \mid \alpha/S, x) = \int_0^{\infty} P_P(k \mid \lambda)\, p_\Gamma(\lambda \mid \alpha/S, (1-x)/x)\, d\lambda$, where $P_P$ and $p_\Gamma$ denote the
Poisson and Gamma distributions. Now, if S species have abundances $n_i$ identically
distributed as $\mathrm{Gamma}(\alpha/S, \theta)$, their relative abundances $n_i / N$, where $N = \sum_{i=1}^{S} n_i$,
follow a symmetric Dirichlet distribution with concentration parameter $\alpha/S$ (Devroye,
1986). Since Fisher assumed $\alpha/S \ll 1$ to obtain the log-series, this indeed corresponds
to the regime of very uneven relative species abundances.
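The Gamma representation cited above (Devroye, 1986) gives a direct way of drawing from a symmetric Dirichlet distribution and of comparing the two regimes of the concentration parameter. The following sketch assumes independent Gamma draws with a common scale of 1 (the common scale cancels out on normalization); the function name is hypothetical.

```python
import random

def symmetric_dirichlet(S, a, rng):
    """Draw relative abundances of S species from a symmetric Dirichlet
    distribution of concentration parameter a, by normalizing independent
    Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for _ in range(S)]
    total = sum(draws)
    return [d / total for d in draws]

def mean_dominance(S, a, rng, replicates=200):
    """Average relative abundance of the most abundant species."""
    return sum(max(symmetric_dirichlet(S, a, rng))
               for _ in range(replicates)) / replicates
```

For instance, with S = 50 species, `mean_dominance` is much larger for a = 0.1 (a few dominant species) than for a = 100, where all relative abundances stay close to 1/S.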
b. The Dirichlet process and the Ewens sampling formula
As is apparent in the case of Fisher’s log-series, a limitation of the Dirichlet
distribution as a means to describe species relative abundances is that it requires the
number S of species to be fixed in advance. It is hence appealing to generalize the
Dirichlet distribution by making S tend toward infinity. Let us consider the time step
$N + 1$ of the Pólya urn sampling scheme with symmetric concentration parameter $a$,
where N individuals have already been added to the original $Sa$ individuals. The
probability to pick species i is $(n_i + a) / (N + Sa)$, where $n_i$ is the number of times
species i has already been picked. Hence, the probability to pick one of the $S'$ species
that have already been picked at least once is $(N + S'a) / (N + Sa)$, while the
probability to pick one of the $S - S'$ species that have never been picked is
$(S - S')\,a / (N + Sa)$. If we simultaneously make S tend toward infinity and $a$ tend
toward 0, keeping the product $Sa$ equal to a constant $\theta$, we obtain an infinite-dimensional
version of the Pólya urn, called the Hoppe urn, where the probability to
pick an existing species i at time step $N + 1$ is $n_i / (N + \theta)$ and the probability to pick a
new species is $\theta / (N + \theta)$ (Hoppe, 1984). After an infinite number of time steps, species
relative abundances are distributed according to a Dirichlet process of concentration
parameter $\theta$ and uniform base distribution, which can be regarded as the limit of the
S-dimensional symmetric Dirichlet distribution of concentration parameter $\theta / S$ when S
tends toward infinity (Ferguson, 1973; Teh et al., 2006).
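The two transition probabilities of the Hoppe urn translate directly into a simulation. The sketch below is a minimal illustration with hypothetical names; it returns the abundances of the species present after N individuals have been added.

```python
import random

def hoppe_urn(theta, N, rng):
    """Simulate N steps of a Hoppe urn of parameter theta.

    Returns the list of species abundances, in order of first appearance.
    """
    counts = []
    for n in range(N):  # n individuals are already present at this step
        if rng.random() < theta / (n + theta):
            # New species, with probability theta / (n + theta).
            counts.append(1)
        else:
            # Existing species i, with probability n_i / (n + theta).
            i = rng.choices(range(len(counts)), weights=counts)[0]
            counts[i] += 1
    return counts
```

Since a new species appears at step n + 1 with probability $\theta / (n + \theta)$, the expected number of species after N steps is $\sum_{n=0}^{N-1} \theta / (n + \theta)$, which grows only logarithmically with N.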
More generally, a Dirichlet process of concentration parameter $\theta$ and base
distribution $\boldsymbol{p} = (p_i)_{i \in \mathbb{N}^*}$ can be regarded as the limit of the S-dimensional Dirichlet
distribution of concentration parameters $(\theta p_1, \ldots, \theta p_S)$, where $\sum_{i=1}^{S} p_i = 1$, when S tends
toward infinity (Ferguson, 1973). The infinite base distribution $\boldsymbol{p}$ is the distribution
from which new species are sampled during the Hoppe urn scheme of parameter $\theta$:
each new species is sampled from an infinite number of possible species labels with
probability weights $\boldsymbol{p}$. If the base distribution is uniform, as assumed in the previous
paragraph, a never-encountered label is simply assigned to each new species.
The Dirichlet process is most intuitively understood by sampling from it. If N
individuals are sampled from relative species abundances described by a Dirichlet
process of parameter $\theta$ and uniform base distribution, their partition $(\Phi_1, \ldots, \Phi_N)$ into S
species, where $\Phi_n$ is the number of species with abundance n, obeys the ‘Ewens
sampling formula’ of parameters $(\theta, N)$ (Ewens, 1972):
$$P(\Phi_1, \ldots, \Phi_N \mid \theta, N) = \frac{N!}{(\theta)_N} \prod_{n=1}^{N} \frac{1}{\Phi_n!} \left(\frac{\theta}{n}\right)^{\Phi_n}$$
where $(\theta)_N = \Gamma(\theta + N) / \Gamma(\theta)$. This formula also describes the partition of N individuals
into S species obtained by stopping a Hoppe urn scheme of parameter $\theta$ at step N, thus
the Dirichlet process does not need to be explicitly defined for the Ewens formula to
emerge from the Hoppe urn scheme. For a large enough sample, the $\Phi_n$ are
approximately drawn from independent Poisson random variables with parameter $\theta / n$
(Crane, 2016). A remarkable property of the Ewens formula is that it yields a sampling-
invariant description of relative species abundances characterized by the parameter $\theta$:
indeed, any random subsample of $N' < N$ individuals taken from the initial sample
obeys the Ewens sampling formula of parameters $(\theta, N')$. Moreover, the probability of
observing S species in a sample of N individuals does not depend on the exact partition
but only on $\theta$ and N, as $P(S \mid \theta, N) = s(N, S)\, \theta^S / (\theta)_N$, where the function $s$ denotes the
absolute value of the Stirling numbers of the first kind (Ewens, 1972). Thus, $\theta$ can be
regarded as a sampling-invariant measure of diversity in a species pool described by
the Ewens sampling formula, irrespective of whether this species pool is finite or infinite.
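Both the Ewens sampling formula and the distribution of the number of species can be evaluated numerically. The sketch below is an illustration with hypothetical function names; it works on the log scale through `math.lgamma` to limit overflow, and enumerates integer partitions so that normalization can be checked on small samples.

```python
import math

def log_rising_factorial(theta, N):
    """log of (theta)_N = Gamma(theta + N) / Gamma(theta)."""
    return math.lgamma(theta + N) - math.lgamma(theta)

def ewens_probability(phi, theta):
    """Ewens sampling formula; phi[n - 1] is the number of species with abundance n."""
    N = sum(n * f for n, f in enumerate(phi, start=1))
    log_p = math.lgamma(N + 1) - log_rising_factorial(theta, N)
    for n, f in enumerate(phi, start=1):
        log_p += f * math.log(theta / n) - math.lgamma(f + 1)
    return math.exp(log_p)

def unsigned_stirling_first(N):
    """Row N of the unsigned Stirling numbers of the first kind, indexed by S = 0..N."""
    row = [1]
    for n in range(1, N + 1):
        new = [0] * (n + 1)
        for k in range(1, n + 1):
            # Recurrence: c(n, k) = (n - 1) c(n - 1, k) + c(n - 1, k - 1).
            new[k] = (n - 1) * (row[k] if k < n else 0) + row[k - 1]
        row = new
    return row

def species_number_probability(S, theta, N):
    """P(S | theta, N) = s(N, S) theta^S / (theta)_N."""
    c = unsigned_stirling_first(N)[S]
    return c * theta ** S / math.exp(log_rising_factorial(theta, N))

def partitions(N, largest=None):
    """Yield all integer partitions of N as non-increasing lists of abundances."""
    if largest is None:
        largest = N
    if N == 0:
        yield []
        return
    for part in range(min(N, largest), 0, -1):
        for rest in partitions(N - part, part):
            yield [part] + rest
```

Summing `ewens_probability` over all partitions of a small N, or `species_number_probability` over S = 1, ..., N, returns 1 up to rounding error, which is a convenient sanity check of an implementation.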
4. Neutral models
The Ewens formula was first discovered by Ewens (1972) in the context of population
genetics. Indeed, it arises as the stationary distribution of allele frequency in the
Wright-Fisher and Moran models, which describe the neutral dynamics of alleles in a
population (Fisher, 1930; Wright, 1931; Moran, 1958; Wakeley, 2009). More
importantly for ecologists, the Ewens formula is also the stationary distribution of
species frequency in Hubbell’s neutral model of biodiversity, which was directly
inspired by population genetics (Hubbell, 2001). These models all bear some
resemblance to the Hoppe urn sampling scheme, except that they account for the death
of individuals, so that the total number of individuals remains constant over time. The
Wright-Fisher model assumes that all N individuals die at each time step and are
replaced by a new generation of N new individuals. The alleles of these new individuals
are sampled (with replacement) from the alleles in the previous generation, except for a
small probability in each new individual of mutating into a never-encountered allele.
This translates into a demographic drift of allele frequency through time and, over
longer time scales, into a turnover in the pool of alleles through random mutation and
extinction events. The Moran model is similar but assumes that individuals die and are
replaced one at a time, which allows for overlapping generations. Hubbell’s model is
almost identical to the Moran model, except that alleles are reinterpreted as species and
mutation as speciation, and that dying individuals cannot be replaced by their own
offspring. Even though all three models have the same stationary abundance
distribution described by the Ewens formula, the exact expression of $\theta$ depends on the
model’s dynamics (Etienne & Alonso, 2007). In Hubbell’s model, $\theta = (N - 1)\, \nu / (1 - \nu)$,
where $\nu$ is the speciation probability at each time step, i.e. the probability that the dying
individual is replaced by a new species.
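The dynamics of Hubbell’s metacommunity can be sketched as a simple simulation. The code below is a minimal illustration under the description given above (one death per time step, speciation with probability $\nu$, no replacement by the dying individual’s own offspring); the function names, the monodominant starting state and the integer species labels are assumptions of the sketch.

```python
import random

def hubbell_theta(N, nu):
    """theta = (N - 1) nu / (1 - nu) in Hubbell's model."""
    return (N - 1) * nu / (1 - nu)

def simulate_metacommunity(N, nu, steps, rng):
    """Simulate Hubbell's metacommunity dynamics.

    Returns the species label of each of the N individuals after `steps` deaths.
    """
    community = [0] * N  # assumed initial state: a single species
    next_label = 1
    for _ in range(steps):
        dying = rng.randrange(N)
        if rng.random() < nu:
            # Speciation: the dying individual is replaced by a new species.
            community[dying] = next_label
            next_label += 1
        else:
            # Replacement by the offspring of another individual, picked
            # uniformly among the N - 1 survivors.
            parent = rng.randrange(N - 1)
            if parent >= dying:
                parent += 1
            community[dying] = community[parent]
    return community
```

The total number of individuals stays equal to N throughout, and a long enough run is needed before the species abundances can be compared with the Ewens formula of parameter `hubbell_theta(N, nu)`.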
The key innovation of Hubbell’s model compared to population genetics models
is that it also includes the description of a dispersal-limited local community connected
to the regional metacommunity through immigration. The dynamics of the local
community is identical to that of the metacommunity, except that new species arise
through immigration instead of speciation: at each time step, an individual dies and
there is probability m that the replacing individual results from immigration from the
metacommunity instead of from local reproduction (if $m = 1$, there is no limitation to
dispersal). The difference is that unlike individuals arising through speciation, an
immigrating individual may belong to a species that is already present in the local
community. Thus, the stationary distribution of species frequency in a local community
of size N obeys a Ewens sampling formula of parameter $I = (N - 1)\, m / (1 - m)$,
modified to account for the fact that the immigrating ancestors to the current local
community are sampled from the Ewens formula of parameter $\theta$ (Etienne & Olff, 2004).
The resulting two-layer sampling formula was derived by Etienne (2005). The ‘Etienne
sampling formula’ can also be regarded as the result of ‘dispersal-limited sampling’
from the Ewens formula, which can be defined as a type of skewed sampling (Etienne &
Alonso, 2005). Importantly, the Etienne formula still satisfies the sampling-invariance
property of the Ewens formula, i.e. any random subsample of $N' < N$ individuals will follow
the Etienne formula of parameters $(\theta, I, N')$.
The Ewens and Etienne sampling formulas allow for likelihood-based inference of the
neutral parameters $\theta$ and $I$, as well as for rigorous statistical tests of model fit (cf. Fig. 9;
Etienne & Olff, 2005; Etienne, 2007; Al Hammal et al., 2015). In practice, the
metacommunity cannot be directly observed and is usually regarded as infinite, while
the local community is equated with the observed sample of individuals. Thus, $\theta$ and m
are often chosen as model parameters instead of $\theta$ and $I$, reflecting the fact that the
number of individuals is known in the local community but not in the metacommunity. $I$
can be interpreted as the effective number of individuals in the metacommunity that are
in direct competition with the local community for reproduction. Furthermore, data
usually consist of samples from several local communities. This considerably increases
statistical power, since statistical inference does not only rely on the shape of the local
abundance distributions, but also on the taxonomic overlap between local communities.
Exact sampling formulas have been derived both for the case where all local
communities have the same immigration parameter m (Etienne, 2007) and for the case
where they do not (Etienne, 2009).
Figure 9. Bayesian inference of neutral parameters based on the ‘Etienne sampling formula’: map showing the joint posterior probability density of $\theta$ and m for the tree abundance data (>10 cm dbh) of the 50-ha Barro Colorado Island monitored plot. Adapted from Etienne & Olff (2004).
Despite the interest of exact sampling formulas for statistical inference,
approximate approaches have proved more practical in some instances. When the
number of samples is large enough, the metacommunity composition may be simply
approximated as the sum of all samples, instead of being explicitly modelled. In so
doing, one can avoid making any assumption on the metacommunity when estimating
immigration rates, or when testing the assumption of dispersal-limited neutral
community assembly (Sloan et al., 2006; Jabot et al., 2008; Harris et al., 2015).
Furthermore, when abundance data are unavailable or unreliable, the immigration rate
from the metacommunity may be estimated solely based on the occurrence of species
across samples (Sloan et al., 2006). A limitation of exact sampling formulas is that their
computation is numerically demanding when the number of individuals becomes large.
An alternative approach is then to represent the sample as continuous species relative
abundances rather than in a fully discrete way. The species relative abundances
$(x_1, \ldots, x_S)$ in a large dispersal-limited sample containing S species may be
approximated as following the Dirichlet distribution of parameters $(I p_1, \ldots, I p_S)$, where
$(p_1, \ldots, p_S)$ are the relative abundances of those S species in the metacommunity (Sloan
et al., 2007). In turn, $(p_1, \ldots, p_S)$ can be approximated as following the symmetric
Dirichlet distribution of parameter $\theta / S$ (Woodcock et al., 2007). As is apparent from
section III.b, these continuous approximations may be extended to the case of an infinite
number of species S by modelling the relative abundances $\boldsymbol{x}$ in the local community as a
Dirichlet process of parameter $I$ and base distribution $\boldsymbol{p} = (p_i)_{i \in \mathbb{N}^*}$, and the relative
abundances $\boldsymbol{p}$ in the metacommunity as a Dirichlet process of parameter $\theta$ and uniform
base distribution (Harris et al., 2015). Such a model is referred to as a ‘hierarchical
Dirichlet process’ in the language of machine learning.
While multivariate likelihood expressions are powerful tools for statistical
inference, they are difficult to visualize, and one-dimensional SADs may be better suited
for intuitively understanding the model’s behaviour. For instance, the non-dispersal-limited
SAD in Hubbell’s model is equal to (Moran, 1958; Vallade & Houchmandzadeh,
2003):
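$$\mathbb{E}[\Phi_n \mid \theta, N] = \frac{\theta}{n}\, \frac{(N + 1 - n)_n}{(N + \theta - n)_n}$$
where $(x)_n = \Gamma(x + n) / \Gamma(x)$, and converges toward Fisher’s log-series with $\theta = \alpha$ for a
large enough number N of individuals (Chave, 2004). In general, the SAD can be regarded
as the first moment of the multivariate sampling formula, since it is obtained as:
$$\mathbb{E}[\Phi_n \mid \theta, N] = \sum_{\{\Phi_1, \ldots, \Phi_N \,\mid\, \sum_{k=1}^{N} k \Phi_k = N\}} \Phi_n\, P(\Phi_1, \ldots, \Phi_N \mid \theta, N)$$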
A more straightforward approach to deriving this quantity is to express Hubbell’s
dynamical model through the approximate conditional transition probabilities
$P(n + 1 \mid n, \theta)$, $P(n - 1 \mid n, \theta)$ and $P(n \mid n, \theta)$ that a given species with current abundance n
will have abundances $n + 1$, $n - 1$, or n at the next time step, respectively. The
stationary probability distribution of this ‘master equation’ then provides an estimate of
$\mathbb{E}[\Phi_n \mid \theta, N]$, once multiplied by the observed number S of species in the sample (Volkov
et al., 2003; Alonso & McKane, 2004; McKane et al., 2004; O’Dwyer et al., 2009). Unlike
the exact ‘genealogical’ approach described above, this approach typical of statistical
physics does not explicitly account for the interdependence between species, induced
by the constraint of a fixed total number of individuals through time (‘mean field’
approach). While this constraint was originally deemed a key element of the model
since it accounts for competition between species (Hubbell, 2001), both the
genealogical and the master equation approaches have been found to yield the same
SAD expression for a large enough sample (Etienne et al., 2007).
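The closed-form SAD of the non-dispersal-limited model is easy to evaluate numerically. The sketch below is an illustration with a hypothetical function name, which assumes that the ratio of rising factorials is written in terms of gamma functions, $\frac{\theta}{n} \frac{\Gamma(N+1)\, \Gamma(N+\theta-n)}{\Gamma(N+1-n)\, \Gamma(N+\theta)}$, and computed on the log scale.

```python
import math

def expected_sad(n, theta, N):
    """Expected number of species with abundance n in a sample of N individuals,
    E[Phi_n | theta, N] = (theta / n) (N + 1 - n)_n / (N + theta - n)_n."""
    log_value = (math.log(theta / n)
                 + math.lgamma(N + 1) - math.lgamma(N + 1 - n)
                 + math.lgamma(N + theta - n) - math.lgamma(N + theta))
    return math.exp(log_value)
```

Two consistency checks follow from the definition of the $\Phi_n$: the abundance-weighted sum $\sum_n n\, \mathbb{E}[\Phi_n]$ equals N, and $\sum_n \mathbb{E}[\Phi_n]$ equals the expected number of species in the sample.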
5. Categorical mixture models
Let us assume that the relative abundances $\boldsymbol{x} = (x_1, \ldots, x_S)$ of S species, with $\sum_{i=1}^{S} x_i = 1$,
follow a Dirichlet distribution of parameters $\boldsymbol{a} = (a_1, \ldots, a_S)$. The categorical
distribution describes the choice of one out of S species (or categories) with probability
weights $\boldsymbol{x}$. It can be regarded as a special case of the multinomial distribution, defined
as $P(\boldsymbol{n} \mid N, \boldsymbol{x}) = \frac{N!}{n_1! \cdots n_S!}\, x_1^{n_1} \cdots x_S^{n_S}$, which describes more generally the outcome
of N successive categorical draws with probability weights $\boldsymbol{x}$. A remarkable property of
the Dirichlet distribution is that it is the conjugate prior of the categorical and
multinomial distributions. Namely, if a multinomial sample $\boldsymbol{n} = (n_1, \ldots, n_S)$ is observed
from $\boldsymbol{x}$, with $\sum_{i=1}^{S} n_i = N$, then the posterior distribution of $\boldsymbol{x}$ given the observations $\boldsymbol{n}$
still follows a Dirichlet distribution, but with parameters updated to
$\boldsymbol{a} + \boldsymbol{n} = (a_1 + n_1, \ldots, a_S + n_S)$ to account for the observations. Fundamentally, this means that the
Dirichlet distribution is the “natural distribution occurring when the probability that a
forthcoming observation is of certain class only depends on the number of times this class
has already been observed and on the total number of observations made so far” (Crane,
2016).
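The posterior distribution of $\boldsymbol{x}$ given the observations $\boldsymbol{n}$ is defined as
$p(\boldsymbol{x} \mid \boldsymbol{n}, N, \boldsymbol{a}) = P(\boldsymbol{n} \mid N, \boldsymbol{x})\, p(\boldsymbol{x} \mid \boldsymbol{a}) / P(\boldsymbol{n} \mid N, \boldsymbol{a})$. The marginal likelihood
$P(\boldsymbol{n} \mid N, \boldsymbol{a}) = \int_{\boldsymbol{x}} P(\boldsymbol{n} \mid N, \boldsymbol{x})\, p(\boldsymbol{x} \mid \boldsymbol{a})\, d\boldsymbol{x}$ is the ‘Dirichlet-multinomial’ distribution of parameters $(N, \boldsymbol{a})$,
i.e. the distribution of an N-individual multinomial sample with Dirichlet-distributed
probability weights of parameters $\boldsymbol{a}$. The Dirichlet-multinomial distribution can be
regarded as a finite-dimensional version of the Ewens formula (Crane, 2016).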
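The conjugate update and the Dirichlet-multinomial marginal likelihood can both be written in a few lines. The sketch below uses hypothetical function names and relies on the standard closed form of the integral above, $P(\boldsymbol{n} \mid N, \boldsymbol{a}) = \frac{N!\, \Gamma(A)}{\Gamma(A + N)} \prod_{i=1}^{S} \frac{\Gamma(a_i + n_i)}{n_i!\, \Gamma(a_i)}$ with $A = \sum_i a_i$, which is stated here as a known result and is not derived in the text.

```python
import math

def posterior_parameters(a, n):
    """Dirichlet parameters after observing multinomial counts n: a + n."""
    return [ai + ni for ai, ni in zip(a, n)]

def posterior_mean(a, n):
    """Posterior mean of the probability weights x given counts n."""
    updated = posterior_parameters(a, n)
    total = sum(updated)
    return [u / total for u in updated]

def dirichlet_multinomial_pmf(n, a):
    """Marginal likelihood P(n | N, a) of counts n under Dirichlet(a) weights."""
    N, A = sum(n), sum(a)
    log_p = math.lgamma(N + 1) + math.lgamma(A) - math.lgamma(A + N)
    for ni, ai in zip(n, a):
        log_p += math.lgamma(ai + ni) - math.lgamma(ai) - math.lgamma(ni + 1)
    return math.exp(log_p)
```

The posterior mean $(a_i + n_i) / (A + N)$ makes the quoted interpretation concrete: the probability that the next observation falls in class i depends only on how often that class has been observed and on the total number of observations.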
Because the Dirichlet distribution is the conjugate prior of the categorical and
multinomial distributions, it is the natural prior in any probabilistic model involving
categorical or multinomial sampling from discrete classes. This is the case of
‘categorical mixture models’, which describe observations as sampled from a mixture of
K classes, with ‘mixture weights’ $\boldsymbol{\theta} = (\theta_k)_{k \in [1,K]}$, verifying $\sum_{k=1}^{K} \theta_k = 1$. The different
classes are typically not directly observable: instead, each is characterized by a
probability distribution of parameters $\boldsymbol{\phi}_k$ from which all the observations assigned to
class k are sampled. The probability distribution associated with each class may be for
instance Gaussian if observations are continuous, or categorical if observations are
discrete. If the goal of statistical inference is to capture data structure, the focus will be
on estimating the mixture weights $\boldsymbol{\theta}$ of the different classes given the observations, as
well as the parameters $\boldsymbol{\phi}_k$ of the probability distribution associated with each class. If
the goal is to cluster the observations (or to classify them, if inference is conducted in a
supervised way), the focus will be on assigning to each observation its most likely class
k. In a fully Bayesian setting, the parameters $\boldsymbol{\phi}_k$ of the probability distribution
associated with class k may also be given a prior distribution, such as a Dirichlet prior in
the case of categorical observations. In this latter case, the model may be referred to as
a ‘Dirichlet mixture model’, since it describes observations as categorical or multinomial
samples from a mixture of Dirichlet-distributed classes.
This family of models has found applications in many fields. In particular, Holmes
et al. (2012) applied such a model to investigate the structure of microbial communities
sampled by environmental DNA sequencing. They assume that each sample is a local
community belonging to one of K possible classes, which they interpret as
‘metacommunities’. To each of these metacommunities is assigned a mixture weight $\theta_k$,
which is the probability for a sample to originate from it. A metacommunity k is defined
by probability weights $\boldsymbol{\phi}_k = (\phi_s^k)_{s \in [1,S]}$ over the S OTUs observed in the dataset, from
which local OTU abundances are sampled. These probability weights are themselves
Dirichlet-distributed with parameters $\boldsymbol{a}_k = (a_s^k)_{s \in [1,S]}$. For practical purposes, the
parameters $a_s^k$ may be further assumed to follow a ‘hyperprior’ distribution
parameterized by ‘hyperparameters’, so as to reduce the number of fixed parameters to
estimate.
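The generative side of such a Dirichlet mixture model can be sketched as follows. This is only one reading of the description above, in which the weights $\boldsymbol{\phi}_k$ are drawn once per class and each sample is a multinomial draw from the weights of its class; it is not a reproduction of the implementation of Holmes et al. (2012), and all function names are hypothetical.

```python
import random

def sample_dirichlet(a, rng):
    """Draw probability weights from a Dirichlet distribution of parameters a."""
    draws = [rng.gammavariate(ai, 1.0) for ai in a]
    total = sum(draws)
    return [d / total for d in draws]

def sample_multinomial(N, weights, rng):
    """Counts obtained from N categorical draws with the given weights."""
    counts = [0] * len(weights)
    for s in rng.choices(range(len(weights)), weights=weights, k=N):
        counts[s] += 1
    return counts

def generate_mixture_dataset(theta, a, n_samples, N, rng):
    """Generate samples from a mixture of K Dirichlet-distributed classes.

    theta: mixture weights of the K classes.
    a:     list of K Dirichlet parameter vectors, one per class.
    Returns (phi, dataset), where dataset is a list of (class, counts) pairs.
    """
    phi = [sample_dirichlet(a_k, rng) for a_k in a]  # class-specific weights
    dataset = []
    for _ in range(n_samples):
        k = rng.choices(range(len(theta)), weights=theta)[0]  # class of the sample
        dataset.append((k, sample_multinomial(N, phi[k], rng)))
    return phi, dataset
```

Statistical inference runs in the opposite direction: only the counts are observed, and the class of each sample, the weights `phi` and the mixture weights `theta` have to be estimated.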
A version of this model was introduced earlier in population genetics, and
implemented in the software Structure (Pritchard et al., 2000). In the context of
population genetics, each sample is an individual, each class is a population, and
observations consist in the alleles found at a number of loci in each individual. As a
consequence, the model exhibits a few minor differences compared to that of Holmes et
al. (2012). Since there are L observed loci per individual, each class k is not defined by
one distribution, but by L distributions $\boldsymbol{\phi}_{k,l} = (\phi_s^{k,l})_{s \in [1,S_l]}$ over the $S_l$ possible alleles at
locus l, each of these distributions having a Dirichlet prior. Moreover, only one categorical
draw from $\boldsymbol{\phi}_{k,l}$ is observed at each locus in each individual, instead of a multinomial
sample.
Figure 10. Valle et al. (2014) applied Latent Dirichlet Allocation to identify forest tree assemblages in the Eastern United States, based on tree census data from 34,174 forest plots. Maps show the relative proportion of each of the $K = 11$ LDA classes in each forest plot. Adapted from Valle et al. (2014).
In the same paper, Pritchard et al. (2000) proposed a second slightly more
sophisticated model, which includes the possibility of admixture between populations.
This is achieved by relaxing the assumption that each individual m originates from a
single population, and by assuming instead that it originates from a mixture of K
populations with individual-specific weights $\boldsymbol{\theta}_m = (\theta_k^m)_{k \in [1,K]}$. As in the model without
admixture, the K populations are each defined by a single set of L distributions $\boldsymbol{\phi}_{k,l}$
across the dataset. Thus, each observed allele is the result of a categorical draw from the
individual-specific weights𝜽𝒎 , followed by a second categorical draw from the
population-specificandlocus-specificweights𝝓𝒌,𝒍.Anotherversionofthismodelwith
admixture was independently proposed under the name ‘Latent Dirichlet Allocation’
(LDA) by Blei et al. (2003) in the field of natural language processing, a subfield of
machinelearning,toaddresstheproblemof‘topicmodelling’.Inthiscontext,theaimof
themodel is to decompose text documents into topics based on their word content.
Each class or topick is defined by its distribution𝝓𝒌 = 𝜙!! !∈ !,!over theS distinct
words observed in the whole text corpus, and each documentm is a mixture, with
document-specificweights𝜽𝒎,ofmultinomialsamplesfromthesedistributions.
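The two-step categorical draw described above can be illustrated with a minimal simulation sketch. It follows the topic-modelling form of the model (a single distribution per class rather than one per locus), and it assumes symmetric Dirichlet priors on the weights; the function names and the hyperparameter values `alpha` and `beta` are illustrative choices, not taken from the cited papers.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one vector from a Dirichlet distribution by normalising Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_admixture_data(n_samples, n_obs, K, S, alpha=0.5, beta=0.5, seed=0):
    """Simulate data from the model with admixture.

    K classes (populations or topics), S distinct observations (alleles or words).
    alpha and beta are symmetric Dirichlet hyperparameters (assumed values).
    """
    rng = random.Random(seed)
    # One distribution phi_k over the S possible observations per class.
    phi = [sample_dirichlet([beta] * S, rng) for _ in range(K)]
    theta, data = [], []
    for _ in range(n_samples):
        # Sample-specific mixture weights theta_m over the K classes.
        theta_m = sample_dirichlet([alpha] * K, rng)
        obs = []
        for _ in range(n_obs):
            k = rng.choices(range(K), weights=theta_m)[0]  # first draw: class
            s = rng.choices(range(S), weights=phi[k])[0]   # second draw: observation
            obs.append(s)
        theta.append(theta_m)
        data.append(obs)
    return phi, theta, data


phi, theta, data = generate_admixture_data(n_samples=3, n_obs=20, K=2, S=5)
```

Inference runs in the opposite direction: given only `data`, the model recovers estimates of `phi` and `theta`.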
This model with admixture has proved very successful and has
subsequently been extended, both in its population genetics version (Falush et al., 2003,
2007; Hubisz et al., 2009) and in its topic modelling version (Griffiths & Steyvers, 2004;
Rosen-Zvi et al., 2004; Teh et al., 2006; Blei, 2012). The latter (LDA) has been applied to
a wide range of domains pertaining to machine learning where its ability to handle large
and complex datasets has been praised, including satellite image processing (Vaduva et
al., 2013), bioinformatics (Liu et al., 2010), fraud detection in telecommunications
(Olszewski, 2012) and social sciences (Mauch et al., 2015). In particular, it has
recently been applied to spatially and temporally explicit forest tree composition data in
ecology, where its ability to decompose samples into classes learnt over the whole
dataset allows for capturing smooth spatial and temporal gradients across the samples
(cf. Fig. 10; Valle et al., 2014). Related models have also been applied to the detection of
different source environments in microbial community samples, with a focus on
supervised inference: Knights et al. (2011) applied this approach to the detection of
contamination in a medical environment, while Shafiei et al. (2015) proposed a more
sophisticated two-layer model, where each class is itself a mixture of higher-level
classes.
As in the case of neutral models, a limitation of Dirichlet-multinomial models is
that the number of classes must be specified in advance. A number of methods have
been used to help select the number of classes (Airoldi et al., 2010). Nevertheless, the
most rigorous approach is to design a model with a potentially infinite number of
classes, an approach referred to as ‘nonparametric Bayesian’, since the size of the model
is not fixed in advance by a parameter. This can be achieved by setting a Dirichlet
process prior over the mixture weights, since the Dirichlet process is, like the Dirichlet
distribution, conjugate to the categorical and multinomial distributions (Crane, 2016).
This amounts to making the number K of classes tend toward infinity.
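As an illustration, mixture weights under a Dirichlet process prior can be drawn with the standard stick-breaking construction. The sketch below is a truncated approximation. The concentration parameter and the tolerance `tol` are assumptions introduced for the example, since the text does not name them.

```python
import random


def stick_breaking_weights(concentration, tol=1e-6, seed=0):
    """Draw mixture weights from a Dirichlet process by stick-breaking.

    Weights are generated until the unassigned mass falls below `tol`,
    so the number of classes is not fixed in advance.
    """
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    while remaining > tol:
        v = rng.betavariate(1.0, concentration)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


for concentration in (0.5, 5.0, 50.0):
    weights, leftover = stick_breaking_weights(concentration)
    print(concentration, len(weights), round(sum(weights) + leftover, 6))
```

Larger concentration values tend to spread the mass over more classes, so the effective number of classes is governed by the prior and the data rather than set by hand.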
In the infinite-dimensional extension of the model without admixture, the
mixture weights $\boldsymbol{\theta} = (\theta_k)_{k \in \mathbb{N}^*}$ over classes follow a Dirichlet process of uniform base
distribution over class labels, while each class k is defined as in the finite-dimensional
case by its distribution $\boldsymbol{\phi}_k = (\phi_k^s)_{s \in [1,S]}$ over the S possible observations (Teh et al.,
2006). In the model with admixture, however, a hierarchical Dirichlet process needs to
be defined. Indeed, if an independent Dirichlet process of uniform base distribution
were to be assigned in each sample m as a prior to the mixture weights $\boldsymbol{\theta}_m = (\theta_m^k)_{k \in \mathbb{N}^*}$,
two samples would not have any class in common, since independent draws would almost
surely select disjoint sets of class labels. Thus, in the infinite-dimensional
extension of the model with admixture, the mixture weights $\boldsymbol{\theta}_m$ in each sample m
originate from a Dirichlet process of base distribution $\boldsymbol{\beta}$ over classes, while the
distribution $\boldsymbol{\beta}$ itself follows a Dirichlet process of uniform base distribution over class
labels (Teh et al., 2006). Likewise, two local communities in the infinite-dimensional
approximation of Hubbell’s neutral model would not have any species in common if not
for the hierarchical Dirichlet process construction (cf. section III.4).
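The need for the hierarchical construction can be checked numerically with a small sketch based on truncated stick-breaking draws. The concentration value 2.0, the representation of class labels as uniform numbers in [0, 1), and all function names are assumptions made for the example.

```python
import random


def dp_draw(concentration, base_sampler, rng, tol=1e-6):
    """Truncated stick-breaking draw from a Dirichlet process.

    Returns a dict mapping class labels (drawn from the base distribution)
    to weights; repeated labels have their weights summed.
    """
    weights, remaining = {}, 1.0
    while remaining > tol:
        v = rng.betavariate(1.0, concentration)
        label = base_sampler()
        weights[label] = weights.get(label, 0.0) + remaining * v
        remaining *= 1.0 - v
    return weights


rng = random.Random(0)
uniform_labels = rng.random  # uniform base distribution over labels in [0, 1)

# Independent Dirichlet processes: each sample draws its own labels.
independent = [dp_draw(2.0, uniform_labels, rng) for _ in range(2)]
shared_independent = set(independent[0]) & set(independent[1])

# Hierarchical construction: beta is drawn once and serves as base distribution.
beta = dp_draw(2.0, uniform_labels, rng)
labels, probs = list(beta), list(beta.values())


def sample_from_beta():
    """Draw one class label according to the weights in beta."""
    return rng.choices(labels, weights=probs)[0]


hierarchical = [dp_draw(2.0, sample_from_beta, rng) for _ in range(2)]
shared_hierarchical = set(hierarchical[0]) & set(hierarchical[1])

print(len(shared_independent), len(shared_hierarchical))
```

Because `beta` is discrete, every sample reuses its labels, which is what allows classes to be shared across samples.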
IV. Objectives and outline
1. Objectives
Most of Earth’s biodiversity is concentrated in a few hyperdiverse ecosystems, such as
tropical forests. Yet, the mechanisms that permit the coexistence of such a large number
of species are not fully understood. In particular, the relative influence of deterministic
niche processes and stochastic dispersal limitation has long been debated. One
approach to address this question is through the study of integrative biodiversity
patterns, such as the distribution of species abundances and the turnover of species
composition through space. At a time when human activities threaten both biodiversity
and the associated ecosystems, a better understanding of these patterns and of the
underlying mechanisms is much needed.
A major obstacle lies in the difficulty of measuring biodiversity, which has long
relied on direct human observation. However, recent technological advances now make
automated data collection possible, which could alleviate this problem. Environmental
DNA sequencing is especially promising for improving our understanding of
biodiversity patterns. Indeed, it eases and standardizes the measurement of
biodiversity, increases the amount of available data by orders of magnitude, and
dramatically expands the range of accessible taxa. In particular, it allows for taking into
account microbial diversity, arguably the ‘hidden part of the biodiversity iceberg’.
Nevertheless, taking advantage of this new type of data is challenging. First, the
range of information types that can be collected is restricted, in that no complementary
measurements, such as size for instance, can be made on organisms. In most cases, even
taxonomic information is relatively imprecise owing to the lack of reference databases
for the retrieved DNA sequences. Thus, inference is mostly based on patterns of
unidentified OTUs. Second, because observations are indirect and noisy, their
interpretation is not as straightforward as in the case of direct censuses of individual
organisms. Third, the high diversity of microbial communities makes for large and
sparse datasets, to which existing statistical approaches are not well suited.
The overarching goal of this thesis was to investigate how environmental DNA
sequencing, and more generally the automated collection of ecological data, could
contribute to our understanding of biodiversity patterns and of their underlying
mechanisms. This work was motivated by two observations. First, theoretical models in
ecology are for the most part not oriented toward comparison with data, and when they
are, as in the case of Hubbell’s neutral model, they are centred on individual organisms,
which hampers their comparison to environmental DNA data. Second, existing
statistical methods in ecology have limitations in their ability to tackle such data. Thus,
this work has an important methodological component. A second goal of this thesis was
to apply the developed approaches to soil DNA data collected in the forests of French
Guiana, so as to better understand community assembly in tropical forests. This
includes a dataset that was collected as part of this thesis.
2. Outline
The first chapter addresses the issue of measuring beta diversity patterns from
environmental DNA data, and of using these patterns to disentangle dispersal-limited
and niche-based processes across the different domains of life. To this end, a soil DNA
dataset was collected in French Guiana, in forest plots that are approximately regularly
spaced on a logarithmic scale. A range of soil properties was also measured from the
soil samples. Three approaches are compared: distance-based analyses using
dissimilarity metrics, raw-data analyses using multivariate ordination, and fitting the
neutral prediction for the decay of taxonomic similarity with distance. These
approaches are typical of those used to analyse classical biodiversity census data. In
addition, the effect of human disturbance through logging is assessed, based on a more
limited number of plots presenting a gradient of logging intensities.
The second chapter focuses on species abundance distributions measured from
environmental DNA data, and addresses the problem of comparing this pattern to the
prediction of Hubbell’s neutral model. Indeed, it was unknown to what extent this
pattern may remain informative in spite of the potential noise. Simulation results are
presented that quantify how the estimates of the neutral diversity and dispersal
parameters are biased when inferred from environmental DNA data. A benchmark
dataset of limited extent is used to assess the level of noise that is to be expected in real
data.
Like the first chapter, the third chapter discusses spatial patterns in
environmental DNA data, but it proposes an approach differing from those classically
followed in ecology. It investigates the potential of a model-based statistical method,
Latent Dirichlet Allocation, to decompose the data into assemblages of spatially
co-occurring OTUs. In addition, a method is proposed to measure the stability of the
decomposition. The approach is tested through simulations, and by applying it to a large
soil DNA dataset. This dataset follows a regular spatial sampling scheme over a forest
plot, and was collected in French Guiana before the start of this thesis. The insights on
soil community structure provided by the approach are discussed, making use of Lidar
measurements of environmental features.
Finally, the discussion provides a synthesis of the results, and discusses the
perspectives arising from this thesis.
References
Airoldi, E.M., Erosheva, E.A., Fienberg, S.E., Joutard, C., Love, T. & Shringarpure, S. (2010) Reconceptualizing the classification of PNAS articles. PNAS, 107, 20899–20904.
Akaike, H. (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Al Hammal, O., Alonso, D., Etienne, R.S. & Cornell, S.J. (2015) When Can Species Abundance Data Reveal Non-neutrality? PLoS Computational Biology, 11, 23.
Alonso, D., Etienne, R.S. & McKane, A.J. (2006) The merits of neutral theory. Trends in Ecology & Evolution, 21, 451–457.
Alonso, D. & McKane, A.J. (2004) Sampling Hubbell’s neutral theory of biodiversity. Ecology Letters, 7, 901–910.
Alonzo, M., Bookhagen, B. & Roberts, D.A. (2014) Urban tree species mapping using hyperspectral and lidar data fusion. Remote Sensing of Environment, 148, 70–83.
Andersen, K., Bird, K.L., Rasmussen, M., Haile, J., Breuning-Madsen, H., Kjaer, K.H., Orlando, L., Gilbert, M.T.P. & Willerslev, E. (2012) Meta-barcoding of “dirt” DNA from soil reflects vertebrate biodiversity. Molecular Ecology, 21, 1966–1979.
Aristotle (IVth cent. BC) History of Animals.
Armstrong, R.A. & McGehee, R. (1980) Competitive exclusion. The American Naturalist, 115, 151–170.
Arnoldi, J.-F., Loreau, M. & Haegeman, B. (2016) Resilience, reactivity and variability: A mathematical comparison of ecological stability measures. Journal of Theoretical Biology, 389, 47–59.
Arrhenius, O. (1921) Species and area. Journal of Ecology, 9, 95–99.
Baas Becking, L.G.M. (1934) Geobiologie of inleiding tot de milieukunde, W.P. Van Stockum & Zoon, The Hague, the Netherlands.
Bayes, M. & Price, M. (1763) An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F.R.S., communicated by Mr. Price, in a letter to John Canton, A.M.F.R.S. Philosophical Transactions (1683-1775), 370–418.
Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton, J., Brown, C.G., Hall, K.P., Evers, D.J., Barnes, C.L. & Bignell, H.R. (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature, 456, 53.
Bik, H.M., Porazinska, D.L., Creer, S., Caporaso, J.G., Knight, R. & Thomas, W.K. (2012) Sequencing our way towards understanding global eukaryotic biodiversity. Trends in Ecology & Evolution, 27, 233–243.
Bishop, C.M. (2006) Pattern Recognition and Machine Learning, Springer. Michael Jordan, Jon Kleinberg, Bernhard Schölkopf.
Blackwell, D. & MacQueen, J.B. (1973) Ferguson distributions via Pólya urn schemes. The Annals of Statistics, 353–355.
Blei, D. (2012) Probabilistic Topic Models. Communications of the Association for Computing Machinery, 55, 77–84.
Blei, D.M., Ng, A.Y. & Jordan, M.I. (2003) Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
Bloomfield, N.J., Knerr, N. & Encinas-Viso, F. (2017) A comparison of network and clustering methods to detect biogeographical regions. Ecography.
Bohmann, K., Evans, A., Gilbert, M.T.P., Carvalho, G.R., Creer, S., Knapp, M., Yu, D.W. & de Bruyn, M. (2014) Environmental DNA for wildlife biology and biodiversity monitoring. Trends in Ecology & Evolution, 29, 358–367.
Braun-Blanquet & Pavillard (1922) Vocabulaire de sociologie végétale.
Brook, B.W., Ellis, E.C., Perring, M.P., Mackay, A.W. & Blomqvist, L. (2013) Does the terrestrial biosphere have planetary tipping points? Trends in Ecology & Evolution, 28, 396–401.
Brown, J.H. (1995) Macroecology, University of Chicago Press.
Brown, J.H. (1978) The theory of insular biogeography and the distribution of boreal birds and mammals. Great Basin Naturalist Memoirs, 209–227.
Brown, J.H. & Kodric-Brown, A. (1977) Turnover Rates in Insular Biogeography: Effect of Immigration on Extinction. Ecology, 58, 445–449.
Brown, W.L. & Wilson, E.O. (1956) Character displacement. Systematic Zoology, 5, 49–64.
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multimodel inference, Springer, New York.
Cabeza, M. & Moilanen, A. (2001) Design of reserve networks and the persistence of biodiversity. Trends in Ecology & Evolution, 16, 242–248.
Carpenter, S.R., Cole, J.J., Pace, M.L., Batt, R., Brock, W.A., Cline, T., Coloso, J., Hodgson, J.R., Kitchell, J.F., Seekell, D.A., Smith, L. & Weidel, B. (2011) Early Warnings of Regime Shifts: A Whole-Ecosystem Experiment. Science, 332, 1079.
Caswell, H. (1976) Community structure - neutral model analysis. Ecological Monographs, 46, 327–354.
Chase, J.M. & Leibold, M.A. (2003) Ecological niches: linking classical and contemporary approaches, University of Chicago Press.
Chave, J. (2004) Neutral theory and community ecology. Ecology Letters, 7, 241–253.
Chave, J. (2013) The problem of pattern and scale in ecology: what have we learned in 20 years? Ecology Letters, 16, 4–16.
Chave, J., Alonso, D. & Etienne, R.S. (2006) Theoretical biology - Comparing models of species abundance. Nature, 441, E1–E1.
Chave, J. & Leigh, E.G. (2002) A spatially explicit neutral model of beta-diversity in tropical forests. Theoretical Population Biology, 62, 153–168.
Chave, J., Muller-Landau, H.C. & Levin, S.A. (2002) Comparing classical community models: Theoretical consequences for patterns of diversity. American Naturalist, 159, 1–23.
Chesson, P. (2000) Mechanisms of maintenance of species diversity. Annual Review of Ecology and Systematics, 31, 343–+.
Chisholm, R.A. & Pacala, S.W. (2010) Niche and neutral models predict asymptotically equivalent species abundance distributions in high-diversity ecological communities. Proceedings of the National Academy of Sciences of the United States of America, 107, 15821–15825.
Clements, F.E. (1916) Plant succession: an analysis of the development of vegetation, Carnegie Institution of Washington.
Condit, R., Pitman, N., Leigh, E.G., Chave, J., Terborgh, J., Foster, R.B., Nunez, P., Aguilar, S., Valencia, R., Villa, G., Muller-Landau, H.C., Losos, E. & Hubbell, S.P. (2002) Beta-diversity in tropical forest trees. Science, 295, 666–669.
Connell, J.H. (1983) On the prevalence and relative importance of interspecific competition: evidence from field experiments. The American Naturalist, 122, 661–696.
Connell, J.H. (1970) On the role of natural enemies in preventing competitive exclusion in some marine animals and in rain forest trees. Dynamics of populations.
Connolly, S.R., Hughes, T.P. & Bellwood, D.R. (2017) A unified model explains commonness and rarity on coral reefs. Ecology Letters.
Cordero, O.X., Wildschutte, H., Kirkup, B., Proehl, S., Ngo, L., Hussain, F., Le Roux, F., Mincer, T. & Polz, M.F. (2012) Ecological Populations of Bacteria Act as Socially Cohesive Units of Antibiotic Production and Resistance. Science, 337, 1228.
Cox, C.B., Moore, P.D. & Ladle, R. (2016) Biogeography: an ecological and evolutionary approach, John Wiley & Sons.
Crane, H. (2016) The Ubiquitous Ewens Sampling Formula. Statistical Science, 31, 1–19.
Curtis, T.P. & Sloan, W.T. (2005) Exploring microbial diversity - A vast below. Science, 309, 1331–1333.
Daily, G. (1997) Nature’s services: societal dependence on natural ecosystems, Island Press.
Darwin, C. (1859) On the Origin of Species by Means of Natural Selection.
Davies, T.J., Allen, A.P., Borda-de-Agua, L., Regetz, J. & Melian, C.J. (2011) Neutral biodiversity theory can explain the imbalance of phylogenetic trees but not the tempo of their diversification. Evolution.
Devroye, L. (1986) Sample-based non-uniform random variate generation. Proceedings of the 18th conference on Winter simulation, pp. 260–265. ACM.
Diamond, J.M. (1975) Assembly of species communities. Ecology and Evolution of Communities, pp. 342–444. Cody, M.L. & Diamond, J.M., Cambridge, MA.
Doughty, C.E., Wolf, A. & Malhi, Y. (2013) The legacy of the Pleistocene megafauna extinctions on nutrient availability in Amazonia. Nature Geoscience, 6, 761–764.
Du, X., Zhou, S. & Etienne, R.S. (2011) Negative density dependence can offset the effect of species competitive asymmetry: A niche-based mechanism for neutral-like patterns. Journal of Theoretical Biology, 278, 127–134.
Etienne, R.S. (2007) A neutral sampling formula for multiple samples and an “exact” test of neutrality. Ecology Letters, 10, 608–618.
Etienne, R.S. (2005) A new sampling formula for neutral biodiversity. Ecology Letters, 8, 253–260.
Etienne, R.S. (2009) Maximum likelihood estimation of neutral model parameters for multiple samples with different degrees of dispersal limitation. Journal of Theoretical Biology, 257, 510–514.
Etienne, R.S. & Alonso, D. (2005) A dispersal-limited sampling theory for species and alleles. Ecology Letters, 8, 1147–1156.
Etienne, R.S. & Alonso, D. (2007) Neutral community theory: How stochasticity and dispersal-limitation can explain species coexistence. Journal of Statistical Physics, 128, 485–510.
Etienne, R.S., Alonso, D. & McKane, A.J. (2007) The zero-sum assumption in neutral biodiversity theory. Journal of Theoretical Biology, 248, 522–536.
Etienne, R.S. & Olff, H. (2004) A novel genealogical approach to neutral biodiversity theory. Ecology Letters, 7, 170–175.
Etienne, R.S. & Olff, H. (2005) Confronting different models of community structure to species-abundance data: a Bayesian model comparison. Ecology Letters, 8, 493–504.
Ewens, W.J. (1972) The sampling theory of selectively neutral alleles. Theoretical Population Biology, 3, 87–112.
Falush, D., Stephens, M. & Pritchard, J.K. (2007) Inference of population structure using multilocus genotype data: dominant markers and null alleles. Molecular Ecology Notes, 7, 574–578.
Falush, D., Stephens, M. & Pritchard, J.K. (2003) Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics, 164, 1567–1587.
Ferguson, T.S. (1973) A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 209–230.
Fisher, C.K. & Mehta, P. (2014) The transition between the niche and neutral regimes in ecology. Proceedings of the National Academy of Sciences of the United States of America, 111, 13111–13116.
Fisher, J.B., Huntzinger, D.N., Schwalm, C.R. & Sitch, S. (2014) Modeling the Terrestrial Biosphere. Annual Review of Environment and Resources, 39, 91–123.
Fisher, R.A. (1925) Statistical methods for research workers, Genesis Publishing Pvt Ltd.
Fisher, R.A. (1930) The genetical theory of natural selection: a complete variorum edition, Oxford University Press.
Fisher, R.A., Corbet, A.S. & Williams, C.B. (1943) The relation between the number of species and the number of individuals in a random sample of an animal population. The Journal of Animal Ecology, 42–58.
Fortunato, S. (2010) Community detection in graphs. Physics Reports.
Fuhrman, J.A. (2009) Microbial community structure and its functional implications. Nature, 459, 193–199.
Gause, G.F. (1932) Experimental studies on the struggle for existence: 1. Mixed population of two species of yeast. Journal of Experimental Biology, 9, 389–402.
Gelman, A., Carlin, J.B., Stern, H.S. & Rubin, D.B. (2014) Bayesian data analysis, Chapman & Hall/CRC Boca Raton, FL, USA.
Ghalambor, C.K., Hoke, K.L., Ruell, E.W., Fischer, E.K., Reznick, D.N. & Hughes, K.A. (2015) Non-adaptive plasticity potentiates rapid adaptive evolution of gene expression in nature. Nature, 525, 372–375.
Gibson, J., Shokralla, S., Porter, T.M., King, I., van Konynenburg, S., Janzen, D.H., Hallwachs, W. & Hajibabaei, M. (2014) Simultaneous assessment of the macrobiome and microbiome in a bulk sample of tropical arthropods through DNA metasystematics. Proceedings of the National Academy of Sciences of the United States of America, 111, 8007–8012.
Gilbert, J.A., Jansson, J.K. & Knight, R. (2014) The Earth Microbiome project: successes and aspirations. BMC Biology, 12.
Giovannoni, S.J., Britschgi, T.B., Moyer, C.L. & Field, K.G. (1990) Genetic diversity in Sargasso Sea bacterioplankton. Nature, 345, 60–63.
Gleason, H.A. (1926) The individualistic concept of the plant association. Bulletin of the Torrey Botanical Club, 7–26.
Gourlet-Fleury, S., Ferry, B., Molino, J.-F., Petronelli, P. & Schmitt, L. (2004) Experimental plots: key features, Elsevier.
Gravel, D., Canham, C.D., Beaudet, M. & Messier, C. (2006) Reconciling niche and neutrality: the continuum hypothesis. Ecology Letters, 9, 399–409.
Griffiths, T. & Steyvers, M. (2004) Collapsed Gibbs Sampling for LDA. 101, 5228–5235.
Grinnell, J. (1917) The niche-relationships of the California Thrasher. The Auk, 34, 427–433.
Guisan, A. & Thuiller, W. (2005) Predicting species distribution: offering more than simple habitat models. Ecology Letters, 8, 993–1009.
Hansen, M.C., Potapov, P.V., Moore, R., Hancher, M., Turubanova, S.A., Tyukavina, A., Thau, D., Stehman, S.V., Goetz, S.J., Loveland, T.R., Kommareddy, A., Egorov, A., Chini, L., Justice, C.O. & Townshend, J.R.G. (2013) High-Resolution Global Maps of 21st-Century Forest Cover Change. Science, 342, 850–853.
Hanson, C.A., Fuhrman, J.A., Horner-Devine, M.C. & Martiny, J.B.H. (2012) Beyond biogeographic patterns: processes shaping the microbial landscape. Nature Reviews Microbiology.
Harris, K., Parsons, T.L., Ijaz, U.Z., Lahti, L., Holmes, I. & Quince, C. (2015) Linking statistical and ecological theory: Hubbell’s Unified Neutral Theory of Biodiversity as a Hierarchical Dirichlet Process. Proc. IEEE, PP, 1–14.
Hebert, P.D.N., Cywinska, A., Ball, S.L. & DeWaard, J.R. (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society B-Biological Sciences, 270, 313–321.
Heckenberger, M.J., Russell, J.C., Fausto, C., Toney, J.R., Schmidt, M.J., Pereira, E., Franchetto, B. & Kuikuro, A. (2008) Pre-Columbian Urbanism, Anthropogenic Landscapes, and the Future of the Amazon. Science, 321, 1214.
Hillebrand, H. (2004) On the generality of the latitudinal diversity gradient. American Naturalist, 163, 192–211.
Holmes, I., Harris, K. & Quince, C. (2012) Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. PLoS One, 7.
Holt, R.D. (2006) Emergent neutrality. Trends in Ecology & Evolution, 21, 531–533.
Holt, R.D. (1996) Food Webs in Space: An Island Biogeographic Perspective. Food Webs: Integration of Patterns & Dynamics (ed. by G.A. Polis and K.O. Winemiller), pp. 313–323. Springer US, Boston, MA.
Hoppe, F.M. (1984) Polya-like urns and the Ewens sampling formula. Journal of Mathematical Biology, 20, 91–94.
Houchmandzadeh, B. (2009) Theory of neutral clustering for growing populations. Physical Review E, 80, 8.
Hubbell, S.P. (1997) A unified theory of biogeography and relative species abundance and its application to tropical rain forests and coral reefs. Coral Reefs, 16, S9–S21.
Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32), Princeton University Press.
Hubbell, S.P. (1979) Tree dispersion, abundance, and diversity in a tropical dry forest. Science, 203, 1299–1309.
Hubisz, M.J., Falush, D., Stephens, M. & Pritchard, J.K. (2009) Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources, 9, 1322–1332.
Hug, L.A., Baker, B.J., Anantharaman, K., Brown, C.T., Probst, A.J., Castelle, C.J., Butterfield, C.N., Hernsdorf, A.W., Amano, Y., Ise, K., Suzuki, Y., Dudek, N., Relman, D.A., Finstad, K.M., Amundson, R., Thomas, B.C. & Banfield, J.F. (2016) A new view of the tree of life. Nature Microbiology, 1, 16048.
Hutchinson, G.E. (1957) Concluding remarks. Cold Spring Harbor Symposia on Quantitative Biology, 22, 415–427.
Hutchinson, G.E. (1961) The paradox of the plankton. The American Naturalist, 95, 137–145.
Jabot, F., Etienne, R.S. & Chave, J. (2008) Reconciling neutral community models and environmental filtering: theory and an empirical test. Oikos, 117, 1308–1320.
Jain, A.K. (2010) Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31, 651–666.
Janzen, D.H. (1970) Herbivores and the number of tree species in tropical forests. The American Naturalist, 104, 501–528.
Jost, L. (2007) Partitioning diversity into independent alpha and beta components. Ecology, 88, 2427–2439.
Kalyuzhny, M., Kadmon, R. & Shnerb, N.M. (2015) A neutral theory with environmental stochasticity explains static and dynamic properties of ecological communities. Ecology Letters, 18, 572–580.
Knights, D., Kuczynski, J., Charlson, E.S., Zaneveld, J., Mozer, M.C., Collman, R.G., Bushman, F.D., Knight, R. & Kelley, S.T. (2011) Bayesian community-wide culture-independent microbial source tracking. Nature Methods, 8, 761-U107.
Kraft, N.J.B., Adler, P.B., Godoy, O., James, E.C., Fuller, S. & Levine, J.M. (2015) Community assembly, coexistence and the environmental filtering metaphor. Functional Ecology, 29, 592–599.
Lawton, J.H. (1999) Are There General Laws in Ecology? Oikos, 84, 177.
Lawton, J.H., Bignell, D.E., Bolton, B., Bloemers, G.F., Eggleton, P., Hammond, P.M., Hodda, M., Holt, R.D., Larsen, T.B. & Mawdsley, N.A. (1998) Biodiversity inventories, indicator taxa and effects of habitat modification in tropical forest. Nature, 391, 72–76.
Legendre, P., Borcard, D. & Peres-Neto, P.R. (2005) Analyzing beta diversity: partitioning the spatial variation of community composition data. Ecological Monographs.
Legendre, P., Borcard, D. & Peres-Neto, P.R. (2008) Analyzing or explaining beta diversity? Comment. Ecology, 89, 3238–3244.
Legendre, P. & De Caceres, M. (2013) Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters, 16, 951–963.
Legendre, P. & Legendre, L. (2012) Numerical Ecology, Elsevier.
Leibold, M.A., Holyoak, M., Mouquet, N., Amarasekare, P., Chase, J.M., Hoopes, M.F., Holt, R.D., Shurin, J.B., Law, R., Tilman, D., Loreau, M. & Gonzalez, A. (2004) The metacommunity concept: a framework for multi-scale community ecology. Ecology Letters, 7, 601–613.
Linnaeus, C. (1753) Species Plantarum.
Liu, B., Liu, L., Tsykin, A., Goodall, G.J., Green, J.E., Zhu, M., Kim, C.H. & Li, J. (2010) Identifying functional miRNA-mRNA regulatory modules with correspondence latent dirichlet allocation. Bioinformatics, 26, 3105–3111.
Livermore, J.A. & Jones, S.E. (2015) Local-global overlap in diversity informs mechanisms of bacterial biogeography. ISME J, 9, 2413–2422.
Loreau, M. & de Mazancourt, C. (2013) Biodiversity and ecosystem stability: a synthesis of underlying mechanisms. Ecology Letters, 16, 106–115.
MacArthur, R.H. (1972) Geographical ecology: patterns in the distribution of species, Princeton University Press.
MacArthur, R.H. (1957) On the relative abundance of bird species. Proceedings of the National Academy of Sciences of the United States of America, 43, 293–5.
MacArthur, R.H. (1958) Population ecology of some warblers of northeastern coniferous forests. Ecology, 39, 599–619.
MacArthur, R.H. & Wilson, E.O. (1967) The theory of island biogeography. Monographs in Population Biology, 1.
Maire, V., Gross, N., Börger, L., Proulx, R., Wirth, C., Pontes, L. da S., Soussana, J.-F. & Louault, F. (2012) Habitat filtering and niche differentiation jointly explain species relative abundance within grassland communities along fertility and disturbance gradients. New Phytologist, 196, 497–509.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.-J., Chen, Z. & others (2005) Genome sequencing in open microfabricated high density picoliter reactors. Nature, 437, 376.
Mariadassou, M., Pichon, S. & Ebert, D. (2015) Microbial ecosystems are dominated by specialist taxa. Ecology Letters, 18, 974–982.
Martiny, J.B.H., Bohannan, B.J.M., Brown, J.H., Colwell, R.K., Fuhrman, J.A., Green, J.L., Horner-Devine, M.C., Kane, M., Krumins, J.A., Kuske, C.R., Morin, P.J., Naeem, S., Øvreås, L., Reysenbach, A.-L., Smith, V.H. & Staley, J.T. (2006) Microbial biogeography: putting microorganisms on the map. Nature Reviews Microbiology, 4, 102–112.
Martiny, J.B.H., Eisen, J.A., Penn, K., Allison, S.D. & Horner-Devine, M.C. (2011) Drivers of bacterial beta-diversity depend on spatial scale. Proceedings of the National Academy of Sciences of the United States of America, 108, 7850–7854.
Mauch, M., MacCallum, R.M., Levy, M. & Leroi, A.M. (2015) The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2, 150081–150081.
McCann, K.S. (2000) The diversity-stability debate. Nature, 405, 228.
McGill, B.J. (2003) A test of the unified neutral theory of biodiversity. Nature, 422, 881–885.
McGill, B.J., Etienne, R.S., Gray, J.S., Alonso, D., Anderson, M.J., Benecha, H.K., Dornelas, M., Enquist, B.J., Green, J.L., He, F.L., Hurlbert, A.H., Magurran, A.E., Marquet, P.A., Maurer, B.A., Ostling, A., Soykan, C.U., Ugland, K.I. & White, E.P. (2007) Species abundance distributions: moving beyond single prediction theories to integration within an ecological framework. Ecology Letters, 10, 995–1015.
McGill, B.J., Maurer, B.A. & Weiser, M.D. (2006) Empirical evaluation of neutral theory. Ecology, 87, 1411–1423.
McKane, A.J., Alonso, D. & Sole, R.V. (2004) Analytic solution of Hubbell’s model of local community dynamics. Theoretical Population Biology, 65, 67–73.
Miller, J. (2010) Species Distribution Modeling. Geography Compass, 4, 490–509.
Moran, P.A.P. (1958) Random processes in genetics. Mathematical Proceedings of the Cambridge Philosophical Society, pp. 60–71. Cambridge University Press.
Morrone, J.J. (2015) Biogeographical regionalisation of the world: a reappraisal. Australian Systematic Botany, 28, 81.
Noble, I.R. & Slatyer, R.O. (1977) Post-fire succession of plants in Mediterranean ecosystems. Symposium on Environmental Consequences of Fire and Fuel Management in Mediterranean Ecosystems, Palo Alto, CA, USA, pp. 27–36.
O’Dwyer, J.P., Lake, J.K., Ostling, A., Savage, V.M. & Green, J.L. (2009) An integrative framework for stochastic, size-structured community assembly. Proceedings of the National Academy of Sciences of the United States of America, 106, 6170–6175.
Ofiteru, I.D., Lunn, M., Curtis, T.P., Wells, G.F., Criddle, C.S., Francis, C.A. & Sloan, W.T. (2010) Combined niche and neutral effects in a microbial wastewater treatment community. Proceedings of the National Academy of Sciences of the United States of America, 107, 15345–15350.
Olszewski, D. (2012) Employing Kullback-Leibler divergence and Latent Dirichlet Allocation for fraud detection in telecommunications. Intelligent Data Analysis, 16, 467–485.
Pace, N.R. (1997) A molecular view of microbial diversity and the biosphere. Science, 276, 734–740.
Paine, R.T. (1980) Food Webs: Linkage, Interaction Strength and Community Infrastructure. The Journal of Animal Ecology, 49, 666.
Palmer, M.W. (1994) Variation in species richness: towards a unification of hypotheses. Folia Geobotanica, 29, 511–530.
Pawitan, Y. (2001) In all likelihood: statistical modelling and inference using likelihood, Oxford University Press.
Poisot, T., Stouffer, D.B. & Gravel, D. (2014) Beyond species: why ecological interaction networks vary through space and time. BioRxiv preprint.
Polis, G.A., Sears, A.L., Huxel, G.R., Strong, D.R. & Maron, J. (2000) When is a trophic cascade a trophic cascade? Trends in Ecology & Evolution, 15, 473–475.
Preston, F.W. (1948) The commonness, and rarity, of species. Ecology, 29, 254–283.
Preston, F.W. (1960) Time and space and the variation of species. Ecology, 41, 611–627.
Pritchard, J.K., Stephens, M. & Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945–959.
Pueyo, S., He, F. & Zillio, T. (2007) The maximum entropy formalism and the idiosyncratic theory of biodiversity. Ecology Letters, 10, 1017–1028.
Ramirez, K.S., Leff, J.W., Barberan, A., Bates, S.T., Betley, J., Crowther, T.W., Kelly, E.F., Oldfield, E.E., Shaw, E.A., Steenbock, C., Bradford, M.A., Wall, D.H. & Fierer, N. (2014) Biogeographic patterns in below-ground diversity in New York City’s Central Park are similar to those observed globally. Proceedings of the Royal Society B-Biological Sciences, 281, 9.
Ricklefs, R.E. (2003) A comment on Hubbell’s zero-sum ecological drift model. Oikos, 100, 185–192.
Ricklefs, R.E. (1987) Community diversity: relative roles of local and regional processes. Science (Washington), 235, 167–171.
Ricklefs, R.E. (2006) The unified neutral theory of biodiversity: Do the numbers add up? Ecology, 87, 1424–1431.
Roguet, A., Laigle, G.S., Therial, C., Bressy, A., Soulignac, F., Catherine, A., Lacroix, G., Jardillier, L., Bonhomme, C., Lerch, T.Z. & Lucas, F.S. (2015) Neutral community model explains the bacterial community assembly in freshwater lakes. FEMS Microbiology Ecology, 91, 11.
Rønsted, N., Weiblen, G.D., Cook, J.M., Salamin, N., Machado, C.A. & Savolainen, V. (2005) 60 million years of co-divergence in the fig–wasp symbiosis. Proceedings of the Royal Society B: Biological Sciences, 272, 2593–2599.
Rosen-Zvi, M., Griffiths, T., Steyvers, M. & Smyth, P. (2004) The Author-Topic Model for Authors and Documents.
Rosindell, J., Cornell, S.J., Hubbell, S.P. & Etienne, R.S. (2010) Protracted speciation revitalizes the neutral theory of biodiversity. Ecology Letters, 13, 716–727.
Rosindell, J., Hubbell, S.P., He, F., Harmon, L.J. & Etienne, R.S. (2012) The case for ecological neutral theory. Trends in Ecology & Evolution, 27, 203–208.
Rosvall, M., Axelsson, D. & Bergstrom, C.T. (2009) The map equation. The European Physical Journal Special Topics, 178, 13–23.
Sanger, F., Air, G.M., Barrell, B.G., Brown, N.L., Coulson, A.R., Fiddes, J.C., Hutchison, C.A., Slocombe, P.M. & Smith, M. (1977) Nucleotide sequence of bacteriophage φX174 DNA. Nature, 265, 687–695.
Scheffer, M., Carpenter, S.R., Lenton, T.M., Bascompte, J., Brock, W., Dakos, V., Van de Koppel, J., Van de Leemput, I.A., Levin, S.A., Van Nes, E.H. & others (2012) Anticipating critical transitions. Science, 338, 344–348.
Scheffer, M. & van Nes, E.H. (2006) Self-organized similarity, the evolutionary emergence of groups of similar species. Proceedings of the National Academy of Sciences of the United States of America, 103, 6230–6235.
Scheffers, B.R., Joppa, L.N., Pimm, S.L. & Laurance, W.F. (2012) What we know and don’t know about Earth’s missing biodiversity. Trends in Ecology & Evolution, 27, 501–510.
Schemske, D.W., Mittelbach, G.G., Cornell, H.V., Sobel, J.M. & Roy, K. (2009) Is There a Latitudinal Gradient in the Importance of Biotic Interactions? Annual Review of Ecology, Evolution, and Systematics, 40, 245–269.
Schuster, S.C. (2007) Next-generation sequencing transforms today’s biology. Nature Methods, 5, 16–18.
Shafiei, M., Dunn, K.A., Boon, E., MacDonald, S.M., Walsh, D.A., Gu, H. & Bielawski, J.P. (2015) BioMiCo: a supervised Bayesian model for inference of microbial community structure. Microbiome, 3, 8.
Shmida, A.V.I. & Wilson, M.V. (1985) Biological determinants of species diversity. Journal of Biogeography, 1–20.
Simberloff, D.S. & Wilson, E.O. (1969) Experimental Zoogeography of Islands: The Colonization of Empty Islands. Ecology, 50, 278–296.
Sizling, A.L., Storch, D., Sizlingova, E., Reif, J. & Gaston, K.J. (2009) Species abundance distribution results from a spatial analogy of central limit theorem. Proceedings of the National Academy of Sciences of the United States of America, 106, 6691–6695.
Sloan, W.T., Lunn, M., Woodcock, S., Head, I.M., Nee, S. & Curtis, T.P. (2006) Quantifying the roles of immigration and chance in shaping prokaryote community structure. Environmental Microbiology, 8, 732–740.
Sloan, W.T., Woodcock, S., Lunn, M., Head, I.M. & Curtis, T.P. (2007) Modeling taxa-abundance distributions in microbial communities using environmental sequence data. Microbial Ecology.
Soininen, J., McDonald, R. & Hillebrand, H. (2007) The distance decay of similarity in ecological communities. Ecography, 30, 3–12.
ter Steege, H., Nigel, C.A., Sabatier, D., Baraloto, C., Salomao, R.P., Guevara, J.E., Phillips, O.L., Castilho, C.V., Magnusson, W.E., Molino, J.F., Monteagudo, A., Vargas, P.N., Montero, J.C., Feldpausch, T.R., Coronado, E.N.H., Killeen, T.J., Mostacedo, B., Vasquez, R., Assis, R.L., Terborgh, J., Wittmann, F., Andrade, A., Laurance, W.F., Laurance, S.G.W., Marimon, B.S., Marimon, B.H., Vieira, I.C.G., Amaral, I.L., Brienen, R., Castellanos, H., Lopez, D.C., Duivenvoorden, J.F., Mogollon, H.F., Matos, F.D.D., Davila, N., Garcia-Villacorta, R., Diaz, P.R.S., Costa, F., Emilio, T., Levis, C., Schietti, J., Souza, P., Alonso, A., Dallmeier, F., Montoya, A.J.D., Piedade, M.T.F., Araujo-Murakami, A., Arroyo, L., Gribel, R., Fine, P.V.A., Peres, C.A., Toledo, M., Gerardo, A.A.C., Baker, T.R., Ceron, C., Engel, J., Henkel, T.W., Maas, P., Petronelli, P., Stropp, J., Zartman, C.E., Daly, D., Neill, D., Silveira, M., Paredes, M.R., Chave, J., Lima, D.D., Jorgensen, P.M., Fuentes, A., Schongart, J., Valverde, F.C., Di Fiore, A., Jimenez, E.M., Mora, M.C.P., Phillips, J.F., Rivas, G., van Andel, T.R., von Hildebrand, P., Hoffman, B., Zent, E.L., Malhi, Y., Prieto, A., Rudas, A., Ruschell, A.R., Silva, N., Vos, V., Zent, S., Oliveira, A.A., Schutz, A.C., Gonzales, T., Nascimento, M.T., Ramirez-Angulo, H., Sierra, R., Tirado, M., Medina, M.N.U., van der Heijden, G., Vela, C.I.A., Torre, E.V., Vriesendorp, C., Wang, O., Young, K.R., Baider, C., Balslev, H., Ferreira, C., Mesones, I., Torres-Lezama, A., Giraldo, L.E.U., Zagt, R., Alexiades, M.N., Hernandez, L., Huamantupa-Chuquimaco, I., Milliken, W., Cuenca, W.P., Pauletto, D., Sandoval, E.V., Gamarra, L.V., Dexter, K.G., Feeley, K., Lopez-Gonzalez, G. & Silman, M.R. (2013) Hyperdominance in the Amazonian Tree Flora. Science, 342, 325–+.
Svenning, J. & Skov, F. (2007) Could the tree diversity pattern in Europe be generated by postglacial dispersal limitation? Ecology Letters, 10, 453–460.
Taberlet, P., Coissac, E., Hajibabaei, M. & Rieseberg, L.H. (2012a) Environmental DNA. Molecular Ecology, 21, 1789–1793.
Taberlet, P., Coissac, E., Pompanon, F., Brochmann, C. & Willerslev, E. (2012b) Towards next-generation biodiversity assessment using DNA metabarcoding. Molecular Ecology, 21, 2045–2050.
Tansley, A.G. (1935) The Use and Abuse of Vegetational Concepts and Terms. Ecology, 16, 284–307.
Teh, Y.W., Jordan, M.I., Beal, M.J. & Blei, D.M. (2006) Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101, 1566–1581.
Tilman, D. (1982) Resource competition and community structure, Princeton University Press.
Tilman, D., May, R.M., Lehman, C.L. & Nowak, M.A. (1994) Habitat destruction and the extinction debt. Nature, 371, 65–66.
Tilman, D., Reich, P.B. & Knops, J.M.H. (2006) Biodiversity and ecosystem stability in a decade-long grassland experiment. Nature, 441, 629–632.
Tokeshi, M. (1996) Power Fraction: A New Explanation of Relative Abundance Patterns in Species-Rich Assemblages. Oikos, 75, 543–550.
Tuomisto, H., Ruokolainen, K. & Yli-Halla, M. (2003) Dispersal, environment, and floristic variation of western Amazonian forests. Science, 299, 241–244.
Vaduva, C., Gavat, I. & Datcu, M. (2013) Latent Dirichlet Allocation for Spatial Analysis of Satellite Images. Ieee Transactions on Geoscience and Remote Sensing, 51, 2770–2786.
Vallade, M. & Houchmandzadeh, B. (2003) Analytical solution of a neutral model of biodiversity. Physical Review E, 68, 5.
Valle, D., Baiser, B., Woodall, C.W. & Chazdon, R. (2014) Decomposing biodiversity data using the Latent Dirichlet Allocation model, a probabilistic multivariate statistical method. Ecology Letters, 17, 1591–1601.
de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahe, F., Logares, R., Lara, E., Berney, C., Le Bescot, N., Probert, I., Carmichael, M., Poulain, J., Romac, S., Colin, S., Aury, J.-M., Bittner, L., Chaffron, S., Dunthorn, M., Engelen, S., Flegontova, O., Guidi, L., Horak, A., Jaillon, O., Lima-Mendez, G., Lukes, J., Malviya, S., Morard, R., Mulot, M., Scalco, E., Siano, R., Vincent, F., Zingone, A., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Acinas, S.G., Bork, P., Bowler, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Raes, J., Sieracki, M.E., Speich, S., Stemmann, L., Sunagawa, S., Weissenbach, J., Wincker, P., Karsenti, E. & Tara Oceans, C. (2015) Eukaryotic plankton diversity in the sunlit ocean. Science, 348.
Vellend, M. (2010) Conceptual synthesis in community ecology. Quarterly Review of Biology, 85, 183–206.
Vilhena, D.A. & Antonelli, A. (2015) A network approach for identifying and delimiting biogeographical regions. Nature Communications, 6.
Volkov, I., Banavar, J.R., Hubbell, S.P. & Maritan, A. (2003) Neutral theory and relative species abundance in ecology. Nature, 424, 1035–1037.
Wakeley, J. (2009) Coalescent Theory: An Introduction, Roberts & Company Publishers, Greenwood Village.
Wallace, A.R. (1876) The Geographical Distribution of Animals: With a Study of the Relations of Living and Extinct Faunas as Elucidating the Past Changes of the Earth’s Surface: In Two Volumes.
Wang, H.G., Wei, Z., Mei, L.J., Gu, J.X., Yin, S.S., Faust, K., Raes, J., Deng, Y., Wang, Y.L., Shen, Q.R. & Yin, S.X. (2017) Combined use of network inference tools identifies ecologically meaningful bacterial associations in a paddy soil. Soil Biology & Biochemistry, 105, 227–235.
Watson, H.C. (1859) Cybele Britannica.
Watterson, G.A. (1974) Models for the logarithmic species abundance distributions. Theoretical Population Biology, 6, 217–250.
Whittaker, R.H. (1965) Dominance and Diversity in Land Plant Communities. Science, 147, 250–260.
Whittaker, R.H. (1960) Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs, 30, 279–338.
Williamson, M.H. (1988) Relationship of species number to area, distance and other variables. In: Myers, A.A. and Giller, P.S. (eds), Analytical biogeography, an integrated approach to the study of animal and plant distributions, Chapman and Hall, pp. 91–115.
Wisz, M.S., Pottier, J., Kissling, W.D., Pellissier, L., Lenoir, J., Damgaard, C.F., Dormann, C.F., Forchhammer, M.C., Grytnes, J.-A., Guisan, A., Heikkinen, R.K., Høye, T.T., Kühn, I., Luoto, M., Maiorano, L., Nilsson, M.-C., Normand, S., Öckinger, E., Schmidt, N.M., Termansen, M., Timmermann, A., Wardle, D.A., Aastrup, P. & Svenning, J.-C. (2013) The role of biotic interactions in shaping distributions and realised assemblages of species: implications for species distribution modelling. Biological Reviews, 88, 15–30.
Woodcock, S., van der Gast, C.J., Bell, T., Lunn, M., Curtis, T.P., Head, I.M. & Sloan, W.T. (2007) Neutral assembly of bacterial communities. Fems Microbiology Ecology, 62, 171–180.
Wright, J.P., Jones, C.G. & Flecker, A.S. (2002) An ecosystem engineer, the beaver, increases species richness at the landscape scale. Oecologia, 132, 96–101.
Wright, S. (1931) Evolution in Mendelian populations. Genetics, 16, 97–159.
Wright, S.J. (2002) Plant diversity in tropical forests: a review of mechanisms of species coexistence. Oecologia, 130, 1–14.
Yu, D.W., Ji, Y.Q., Emerson, B.C., Wang, X.Y., Ye, C.X., Yang, C.Y. & Ding, Z.L. (2012) Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring. Methods in Ecology and Evolution, 3, 613–623.
Zillio, T., Volkov, I., Banavar, J.R., Hubbell, S.P. & Maritan, A. (2005) Spatial scaling in model plant communities. Physical Review Letters, 95.
Chapter 1 – DNA-based Beta Diversity
Chapter 1
Causes of variation in soil beta diversity across domains of life in the tropical forests of French Guiana
Guilhem Sommeria-Klein1,2, Lucie Zinger1,2, Amaia Iribar1, Eliane Louisanna3, Sophie Manzi1, Vincent Schilling1, Eric Coissac4, Heidy Schimann3, Pierre Taberlet4, Jérôme Chave1
1 Université Toulouse 3 Paul Sabatier, CNRS, IRD, UMR 5174 Laboratoire Evolution et Diversité Biologique (EDB), F-31062 Toulouse, France.
2 Ecole Normale Supérieure, CNRS, UMR 8197 Institut de Biologie de l’ENS (IBENS), F-75005 Paris, France.
3 INRA, AgroParisTech, CIRAD, CNRS, Université des Antilles, Université de la Guyane, UMR Ecologie des Forets de Guyane (EcoFoG), F-97379 Kourou, France.
4 Université Grenoble Alpes, CNRS, UMR Laboratoire d'Ecologie Alpine (LECA), F-38000 Grenoble, France.
Chapter outline
Beta diversity patterns, i.e. how taxonomic composition shifts through space, have long been used to infer the mechanisms of community assembly. Indeed, depending on whether taxonomic composition covaries with environmental conditions or with geographical distance, it can be inferred whether community assembly is driven by deterministic niche processes or by neutral dispersal limitation. In this chapter, this reasoning is applied to a soil DNA dataset collected in various 1-ha forest plots in French Guiana, for a range of barcodes spanning most of the tree of life. To enable both types of processes to be distinguished, the sampled plots cover a range of soil types as well as a range of inter-plot distances. Inter-plot distances are approximately regularly spaced on a logarithmic scale, so as to better assess the effect of dispersal limitation on taxonomic composition. Indeed, neutral dispersal limitation is predicted to yield a linear decrease of taxonomic similarity with log-distance. As a side question, the effect of past logging activities on soil biodiversity is assessed based on a set of disturbed forest plots.
Abstract
Disentangling the processes that drive the assembly of ecological communities, which include both stochastic (neutral) processes and deterministic niche filtering, is a key challenge. Progress in biodiversity assessment using environmental DNA now streamlines the study of biodiversity patterns across domains of life. Using soil DNA samples, we quantified the causes of variation in beta diversity patterns across major taxonomic groups in the lowland tropical forest of French Guiana on a spatial scale ranging from 40 m to 140 km, for a range of soil physico-chemical properties. We quantified the respective influence of soil conditions, dispersal limitation, and human disturbances on beta diversity. In undisturbed forest plots, we found that the beta diversity of bacteria and protists was primarily driven by soil conditions, while the observed patterns in plants, and to a lesser extent in annelids, were best explained by dispersal limitation. Both factors had an effect on fungi, arthropods and insects, whereas we could not detect an influence of either factor on nematodes and flatworms. This analysis was consistent with a comparison of our data to the similarity decay predicted by the neutral theory of biodiversity. These results suggest that spatial patterns of plant biodiversity across the Amazon do not necessarily extend to other taxonomic groups, and that environmental factors play a foremost role in explaining these patterns in tropical soils. Along the disturbance gradient, we found a significant shift in taxonomic composition in two functionally important groups, plants and annelids, a smaller effect on fungi, and no effect in the other groups.
Introduction
Beta diversity describes the turnover of taxonomic composition through geographical and environmental space, and yields insight into the mechanisms of community assembly (Whittaker, 1960, 1972; Rosenzweig, 1995; Gaston & Blackburn, 2008). As a measure of the spatial variability of taxonomic composition, it may be broadly defined as the difference or ratio between regional (gamma) diversity and local (alpha) diversity (Whittaker, 1960; Chao et al., 2012). This has important practical implications for biodiversity estimates and conservation (Basset et al., 2012; Hubbell, 2013; ter Steege et al., 2013; Socolar et al., 2017).
The extent of beta diversity and its causal mechanisms are dependent on the spatial scale at which taxonomic turnover is considered (Soininen et al., 2007). Beta diversity is often quantified within a biogeographic region, so that it is not caused by a large climatic difference or a different biogeographic history between stations (Kreft & Jetz, 2010). Variation in beta diversity can be ascribed to two types of processes: niche-based processes, when abiotic and biotic environmental heterogeneity determines the spatial distribution of taxa based on their phenotypic differences, and neutral processes, when turnover in taxonomic composition results from demographic stochasticity combined with limited dispersal (Leibold et al., 2004). However, because environmental differences tend to also be spatially structured, both types of processes are often difficult to disentangle (Gilbert & Lechowicz, 2004).
One limitation in the study of beta diversity is that it has most often been restricted to a single taxonomic group, especially forest trees (Whittaker, 1960, 1972; Nekola & White, 1999; Condit et al., 2002), amphibians (Baselga et al., 2012), arthropods (Harrison et al., 1992; Novotny et al., 2007; Hortal et al., 2011), and freshwater taxa (Cottenie, 2005). Studies that have attempted to compare patterns of beta diversity across taxa are scarce (but see Harrison et al., 1992). This is largely because the effort needed to coordinate inventories of biological diversity across taxa is enormous, and increases dramatically for smaller-bodied taxa (Lawton et al., 1998). DNA-based methods have lifted this constraint and have dramatically widened the range of taxa for which diversity patterns can be measured. Instead of collecting organisms and assigning them a taxon label based on observation and on expert knowledge, identification is based on minute amounts of biological material and on the sequencing of universal DNA amplicons (DNA barcodes), a method first developed for microorganisms (Pace, 1997). This method has been extended to rapid taxonomic surveys: bulk DNA is extracted from environmental samples, amplified using universal primers, and then sequenced (Taberlet et al., 2012; Yu et al., 2012). This environmental DNA approach to biological diversity inventory aims at detecting the presence of cells or of extracellular DNA for a range of taxa in a sample. Such an approach is in principle applicable to any taxonomic group in the tree of life (Bahram et al., 2013; Schuldt et al., 2015; Siles & Margesin, 2016; Vincent et al., 2016). Since it is possible to normalize the DNA extraction and sequencing procedures for many samples at once, such an approach is suited to the exploration of beta diversity patterns.
We expect that smaller organisms with short generation times display higher beta diversity at short spatial scale, i.e. over a few meters, than larger organisms, because they are locally filtered by environmental heterogeneity (Ramirez et al., 2014; Mariadassou et al., 2015). Conversely, the beta diversity of small organisms is predicted to be less dependent on distance compared to large organisms, owing to their higher dispersal ability (Soininen et al., 2007). Thus, we expect the spatial distribution of small organisms to be primarily governed by niche effects, while we expect large organisms to better comply with distance-limited neutral dynamics (Hubbell, 2001; Martiny et al., 2011). These predictions have direct implications for the maintenance of biodiversity in disturbed landscapes. Organisms with higher dispersal abilities should be found even in heavily disturbed habitats. On the other hand, slow dispersers should be more affected by disturbances, and would also take longer to recolonize habitats after abandonment.
In this study, we compare soil beta diversity patterns across domains of life in a lowland tropical rainforest. We collected soil samples at locations separated by a geographical distance ranging from 40 m to 140 km, and spanning a variety of soil types, which we quantified, as well as a range of human disturbance intensities. We targeted taxonomic groups using barcodes with different levels of taxonomic resolution, which allowed us to test the robustness of the observed patterns. We thus address the following questions: 1) What is the relative importance of dispersal limitation and environmental filtering in explaining beta diversity across taxonomic groups? 2) How good a fit is the dispersal-limited neutral theory for the various taxonomic groups? 3) How does beta diversity depend on forest disturbance by logging activities? Finally, we explore the implications of our findings for community ecology and for the conservation of tropical forest ecosystems.
Methods
1. Sampling scheme
We sampled fifteen 1-ha plots in the undisturbed lowland rainforest of French Guiana, to which we added four 1-ha plots in disturbed habitats (see below). Geographical distances between plots in pristine forest are approximately regularly spaced on a logarithmic scale. This choice was motivated by the expectation of a linear relationship between taxonomic similarity and log-distance in a spatially explicit neutral model (Chave & Leigh, 2002). Twelve plots are located at the Nouragues research station (about 100 km inland; latitude 4°5’17" N and longitude 52°40’48" W; Bongers et al., 2001), and three at the Paracou research station (near the coast; latitude 5°18′ N and longitude 52°53′ W; Gourlet-Fleury et al., 2004); see Fig. 1 for locations. All plots consist of terra firme forest, but cover a range of soil types (see below).
In addition to sampling plots in undisturbed forest, we also sampled areas that have undergone disturbances of different intensities. At Paracou, some plots have been experimentally logged at several logging intensities starting in 1986 (https://paracou.cirad.fr/experimental-design). In the two heaviest logging treatments (T2 and T3), 33–56% of the aboveground biomass was lost due to the felling operations. Eighteen years after logging, the impact of logging activities was still visible. We sampled two contiguous 1-ha plots in one of the most heavily impacted areas (P12 plot). We also sampled two contiguous 1-ha plots in a 25-ha area (Arbocel plot) 14 km away from Paracou, which was fully clear-cut in 1976 and left to regenerate since then.
Within each 1-ha plot, we collected eighty soil samples of about 30 g each with an auger from the mineral soil horizon (~10 cm deep) along a square grid. To minimize sampling bias and coarsen the spatial grain, we pooled soil samples five by five following a cross-shaped pattern about 15 meters across, with one sample at the centre and four samples in the corners (Fig. 2). This resulted in sixteen pooled samples per plot. We extracted DNA from about 10 g of soil per pooled sample within a few hours after sample collection, using the protocol described in Zinger et al. (2016). The remaining soil was dried for subsequent analyses of soil properties.
Figure 1: Sampling scheme. Relative position of all sampled 1-ha forest plots, in (A) Paracou, and (B) Nouragues; (C) relative position of the Paracou, Arbocel, and Nouragues sites. Undisturbed plots are in red and the four disturbed plots (two in Paracou and two in Arbocel) in yellow. In Nouragues, PP and GP denote respectively the Petit Plateau and Grand Plateau permanent monitored plots, and ‘GP-liana’ denotes the L18 subplot in Grand Plateau.
DNA amplification and sequencing yielded read counts for Operational Taxonomic Units (OTUs) at sixteen sites per plot (see below). We further pooled these samples four by four by averaging relative OTU abundances, so as to obtain one sampling point per 0.25-ha subplot (Fig. 2). We defined the distance between two sampling points as the distance between the centres of the two sets of pooled samples. Some samples were removed from the dataset owing to insufficient PCR yields (see below); hence some sampling points are based on fewer than four samples or are missing.
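The pooling step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`relative_abundances`, `pool_samples`) and the read counts are hypothetical, and the sketch assumes that a sample discarded for insufficient PCR yield is simply left out of the average.

```python
def relative_abundances(read_counts):
    """Convert a dict of OTU read counts into relative abundances summing to 1."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {otu: n / total for otu, n in read_counts.items()}


def pool_samples(samples):
    """Average relative OTU abundances over the samples that are present.

    `samples` is a list of read-count dicts; samples discarded earlier
    (for example after a failed PCR) are simply absent from the list.
    """
    if not samples:
        raise ValueError("no sample left to pool")
    profiles = [relative_abundances(s) for s in samples]
    otus = set().union(*profiles)
    return {otu: sum(p.get(otu, 0.0) for p in profiles) / len(profiles)
            for otu in otus}


# Hypothetical read counts for the three surviving samples of one subplot.
subplot = [
    {"OTU_1": 80, "OTU_2": 20},
    {"OTU_1": 10, "OTU_3": 30},
    {"OTU_2": 50, "OTU_3": 50},
]
pooled = pool_samples(subplot)
```

Averaging relative abundances, rather than summing raw read counts, gives each sample the same weight regardless of its sequencing depth.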
[Figure 1 map labels: panels A to D; sites Paracou, Arbocel and Nouragues (French Guiana); plots P06, P11, P12, Inselberg summit, PP, GP, GP-liana, Balanfois and Parare.]
Soil samples were also pooled four by four to obtain a single composite sample per 0.25-ha subplot. For each pooled soil sample, twelve measurements were made from about 60 g of dry soil. Granulometry distinguished the clay (0-2 µm), silt (2-63 µm) and sand fractions (63-2000 µm). The pH of soil in water solution was measured, as well as total carbon (C) and nitrogen (N) mass fractions. The mass fraction of plant-available phosphorus (P2O5) was measured using the Olsen extraction method. Lastly, a BaCl2 extraction was performed and the concentration of major elements was measured using ICP-MS (Ca, Mg, K, Fe, Mn, and Al).
2. Molecular and sequence analyses
We amplified five barcodes by PCR from soil samples, targeting bacteria (16S rRNA gene V5-V6 regions; Fliegerova et al., 2014), eukaryotes (18S rRNA gene V7 region; Guardiola et al., 2015), Viridiplantae (chloroplastic trnL-P6 loop; Taberlet et al., 1991), fungi (ITS1) and insects (mitochondrial 16S rRNA; Clarke et al., 2014). Each soil sample was amplified three times independently by PCR, following the same protocol as in Zinger et al. (2017). Amplicons were labelled with a distinct nucleotide tag for each PCR, and six sequencing libraries, one per barcode, were prepared. Sequencing was carried out using paired-end Illumina sequencing (MiSeq 2x250 for 16S bacteria, 16S insects and ITS fungi; HiSeq 2x100 for 18S eukaryotes and trnL plants). Negative PCR controls were included in the protocol to help detect contaminants. The PCRs that yielded fewer than 1,000 reads were discarded from subsequent analyses.
Data analyses were conducted as in Zinger et al. (2017). Sequencing data were curated using the OBITools package (Boyer et al., 2016): paired-end reads were assembled and dereplicated, and low-quality sequences were excluded. The resulting sequences were clustered into OTUs using the Infomap algorithm (Rosvall et al., 2009), with a dissimilarity threshold of three mismatches and exponentially decreasing weights on edges. OTUs represented by a single sequence were removed, and the most abundant sequence in each cluster was taken to be the true sequence. Taxonomic identifications were assigned to OTUs using the ecotag program in the OBITools package based on Genbank and SILVA databases (Zinger et al., 2017). OTUs with less than 75% similarity to any reference sequence were removed, as well as those with a taxonomic identification outside of the taxonomic group targeted by the barcode. Further steps were taken to minimize the number of contaminant OTUs as described in Zinger et al. (2017). Rare OTUs were not removed, and only the relative OTU abundances in each sample were used for further analyses.
Figure 2: Sampling scheme in each 1-ha forest plot. In each of the nineteen plots (fifteen undisturbed and four disturbed), eighty soil samples were collected (open and full black circles), and were pooled five by five (small dashed crosses). After conducting the molecular and sequence analyses on the sixteen pooled samples (full black circles), results were pooled four by four (large dashed cross), and statistical analyses were performed on the resulting four effective sampling points (open squares). The sixteen pooled soil samples were also directly pooled four by four for conducting soil analyses. [Plot title: Sampling scheme, 1-ha plot; both axes run from 0 to 100 m.]
Taxonomic identifications for the eukaryote 18S marker were used to assign OTUs to sub-clades (Table S1): arthropods, insects, annelids, nematodes, flatworms (Platyhelminthes), protists, fungi, and plants (Viridiplantae). The 18S marker was compared with more specific markers for fungi, plants, and insects (ITS1, trnL, and 16S, respectively; Table S1). A rarefaction analysis was performed for each marker by sampling with replacement between 1 and 8,000 reads per sample (Fig. S1). For all markers, the number of OTUs reached near saturation in most samples.
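As an illustration of the rarefaction step, the following Python sketch draws reads with replacement from a sample and counts the distinct OTUs recovered at increasing depths. The function names, the seed, and the synthetic read counts are assumptions made for the example; the analysis reported here was not necessarily implemented this way.

```python
import random


def rarefied_richness(read_counts, depth, rng):
    """Number of distinct OTUs among `depth` reads drawn with replacement."""
    otus = list(read_counts)
    weights = [read_counts[otu] for otu in otus]
    drawn = rng.choices(otus, weights=weights, k=depth)
    return len(set(drawn))


def rarefaction_curve(read_counts, depths, seed=0):
    """OTU richness at each depth, using one random draw per depth."""
    rng = random.Random(seed)
    return [rarefied_richness(read_counts, d, rng) for d in depths]


# Hypothetical sample: a few dominant OTUs and many rare ones.
sample = {f"OTU_{i}": max(1, 2000 // (i + 1)) for i in range(200)}
depths = [1, 10, 100, 1000, 8000]
curve = rarefaction_curve(sample, depths)
```

A curve that flattens at the largest depths suggests that additional reads would add few new OTUs, which is the sense in which saturation is used above.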
3. Statistical analyses
We performed all statistical analyses in R using the ‘vegan’ package (version 2.4-2, available at https://cran.r-project.org/), and followed the guidelines of Legendre & Legendre (2012). Analyses were performed separately for each taxonomic group.
We performed a PCA on soil variables after centring and normalizing them (i.e., subtracting their mean and dividing them by their standard deviation over all sampling points). Since clay, silt and sand fractions sum to 1, they yield only two independent measurements; we chose to keep clay and silt fractions, as clay and sand fractions were almost perfectly anticorrelated (correlation coefficient of -0.97; see Results, Table S3). Before conducting the PCA, we lumped Ca, Mg, Mn and K concentrations together into a single ‘exchangeable cations’ variable. In all further analyses, we used the first four PCA axes as environmental variables.
We first studied the taxonomic dissimilarity among pairs of sampled locations (‘distance-based’ approach). We computed the Sorensen taxonomic dissimilarity index (the number of OTUs found in only one of the two samples divided by the sum of the numbers of OTUs in the two samples), which is one possible measure of occurrence-based beta diversity (Koleff et al., 2003). The Sorensen index between pairs of sampling points was regressed against their environmental dissimilarity and against the logarithm of their geographical distance (measured in meters). The environmental dissimilarity between two sampling points was defined as their Euclidean distance with respect to the four soil PCA axes. To test the significance of regressions of the Sorensen index against environmental and geographical distances, we performed Mantel tests with 999 permutations using simple and partial Pearson’s correlation coefficients as test statistics (functions ‘mantel’ and ‘mantel.partial’).
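The distance-based approach can be sketched in plain Python as follows. This is an illustrative re-implementation under stated assumptions, not the ‘vegan’ code: only the simple (not the partial) Mantel test is shown, the helper names are hypothetical, and the five communities and their positions are invented for the example.

```python
import random
from math import log, sqrt


def sorensen_dissimilarity(otus_a, otus_b):
    """OTUs found in only one sample, divided by the summed richness of both."""
    shared = len(otus_a & otus_b)
    total = len(otus_a) + len(otus_b)
    return (total - 2 * shared) / total


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_x, dist_y, permutations=999, seed=0):
    """Simple Mantel test: Pearson r and one-sided permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(dist_x)))
    y = upper_triangle(dist_y, identity)
    observed = pearson(upper_triangle(dist_x, identity), y)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute the sampling points of one matrix only
        if pearson(upper_triangle(dist_x, perm), y) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical sampling points along a transect (positions in meters).
communities = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}, {4, 5, 6, 7}, {5, 6, 7, 8}]
positions = [0.0, 40.0, 400.0, 4000.0, 40000.0]
n = len(communities)
beta = [[0.0 if i == j else sorensen_dissimilarity(communities[i], communities[j])
         for j in range(n)] for i in range(n)]
log_dist = [[0.0 if i == j else log(abs(positions[i] - positions[j]))
             for j in range(n)] for i in range(n)]
r, p_value = mantel(beta, log_dist)
```

Permuting the labels of one matrix, rather than its individual entries, preserves the dependence among pairwise distances that share a sampling point, which is why an ordinary regression p-value is not used here.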
Figure 3. Principal Component Analysis of soil variables for the fifteen undisturbed plots, projected on the first two axes (40% and 30% of total variance). ‘GP-bottom’ corresponds to the lower half of the GP-O13 plot, which belongs to a bottomland. [Labels recovered from the plot: sites Balanfois, GP, GP-bottom, GP-Liana, Inselberg summit, Paracou, Parare and PP; soil variables pH, Ctot, Ntot, P2O5, Clay, Silt, Al, Fe and Mg+Mn+Ca+K.]
We then directly compared the taxonomic composition of sampled locations using canonical ordination ('raw-data' approach; Legendre et al., 2005). We regressed the OTU abundance data on environmental and spatial variables using Canonical Redundancy Analysis (RDA; function ‘rda’). We first applied the Hellinger transformation to OTU abundance data (i.e., square-root of the relative OTU abundances at each sampling point) and centred them per OTU (i.e., subtracted the mean over sampling points). We used the six selected soil variables as explanatory environmental variables, after centring and normalization. We used Principal Coordinates of Neighbour Matrices (PCNM) as spatial explanatory variables representing different possible patterns of spatial autocorrelation in the data (Borcard & Legendre, 2002; Borcard et al., 2004). Two separate PCNM decompositions were performed for the Nouragues and Paracou sites (function 'pcnm'; Borcard & Legendre, 2002), i.e. in each site we performed a Principal Coordinates Analysis of the distance matrix between sampling points, after setting all distances larger than a threshold distance to four times this threshold distance (chosen as the minimal distance required to connect all sampling points). We obtained seventeen PCNM variables with positive eigenvalues for Nouragues, and six for Paracou. PCNM variables from both sites were assembled into a single staggered matrix. The two submatrices were connected by adding a ‘dummy’ variable distinguishing Nouragues and Paracou sites by two different values. At each site, we added UTM coordinates (northings and eastings) as two additional explanatory variables after centring and normalization, so as to account for linear spatial trends that cannot be captured by PCNM variables.
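Two of the preprocessing steps above lend themselves to a short Python sketch: the Hellinger transformation followed by per-OTU centring, and the truncation of the distance matrix that precedes the PCNM decomposition. The function names and the toy matrices are hypothetical, and the Principal Coordinates Analysis itself is deliberately left out.

```python
from math import sqrt


def hellinger(abundances):
    """Square root of relative abundances, one row per sampling point."""
    transformed = []
    for row in abundances:
        total = sum(row)
        transformed.append([sqrt(x / total) for x in row])
    return transformed


def centre_columns(table):
    """Subtract from each column (OTU) its mean over rows (sampling points)."""
    n = len(table)
    means = [sum(col) / n for col in zip(*table)]
    return [[x - m for x, m in zip(row, means)] for row in table]


def truncate_distances(dist, threshold):
    """PCNM truncation: distances above the threshold become 4 x threshold."""
    return [[d if d <= threshold else 4 * threshold for d in row] for row in dist]


# Hypothetical read counts: two sampling points (rows), three OTUs (columns).
counts = [[9, 16, 0], [1, 0, 3]]
h = hellinger(counts)
centred = centre_columns(h)

# Hypothetical distance matrix in meters, truncated at a 3 m threshold.
truncated = truncate_distances([[0, 5], [5, 0]], 3)
```

After the Hellinger transformation each row has unit Euclidean norm, so Euclidean distances between rows are bounded and less dominated by the most abundant OTUs.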
The total variance of taxonomic composition was partitioned between an environmental component and a spatial component (function 'varpart'; Borcard et al., 1992; Legendre et al., 2005). Two RDA-based forward selections of environmental variables and spatial variables were performed separately (function 'ordiR2step' with 0.05 threshold p-value for adding a variable to the model; Blanchet et al., 2008), yielding two RDA-based linear models. We only proceeded with variable selection when the RDA conducted on all variables was significant (p < 0.05; Blanchet et al., 2008); when it was not for either environmental or spatial variables, we did not partition the variance.
We then tested the predictions of the dispersal-limited neutral theory on the dataset. Neutral processes are predicted to yield a decay of taxonomic similarity with distance in the absence of dispersal barrier (Chave & Leigh, 2002). We used here $F(A,B) = \sum_{s=1}^{S} p_s^A p_s^B$ as a measure of taxonomic similarity between samples A and B, where $p_s^A$ is the proportion of species $s$ in sample A, $p_s^B$ that in sample B, and $S$ the total number of species. Chave and Leigh (2002) predicted that in a continuous spatially explicit dispersal-limited neutral model with spatial density of individuals $\rho$, dispersal parameterized by a Gaussian kernel with variance $\sigma^2$, and a rate of apparition of new species equal to $\nu$, $F(A,B)$ depends only on the pairwise distance $r$ between samples, and can be expressed as $F(r) = -a \ln(r) + b$, with $b/a = -\left[\ln\left(\sqrt{2\nu}/(2\sigma)\right) + \gamma\right]$ (where $\gamma$ is Euler’s constant) and $1/a = \rho\pi\sigma^2 - \ln(\nu)/2$ (cf. Appendix). We measured $F$ among pairs of sampling points, regressed it against the log-transformed geographical distance $\ln(r)$, and assessed significance by Mantel test for 999 permutations, using Pearson’s correlation coefficient as test statistic. The mean dispersal distance per generation $2\sigma$ can be obtained provided that an estimate of $\rho$ is available. For plants, we assumed that most of the DNA retrieved came from tree species, and that the forest holds 500 mature trees (≥10 cm dbh) per hectare, i.e. $\rho = 0.05\ \mathrm{m}^{-2}$, which is close to observed densities (see Condit et al. 2002). We also computed the quantity $\sigma^2/\nu$, which may be interpreted as the ratio between dispersal ability and diversification rate.
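The similarity measure and its regression on log-distance can be sketched in Python as below. This is a minimal illustration with hypothetical function names and synthetic values: it computes the sum over taxa of the product of their proportions in two samples, fits a straight line to similarity against ln(r) by ordinary least squares, and converts the assumed tree density to individuals per square meter. It does not attempt to invert the fitted coefficients into dispersal and speciation parameters.

```python
from math import log


def similarity_f(p_a, p_b):
    """Sum over taxa of the product of their proportions in samples A and B."""
    return sum(p * p_b.get(taxon, 0.0) for taxon, p in p_a.items())


def fit_log_distance(distances, similarities):
    """Least-squares fit of F = -a * ln(r) + b; returns (a, b)."""
    x = [log(r) for r in distances]
    n = len(x)
    mx, my = sum(x) / n, sum(similarities) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, similarities))
             / sum((xi - mx) ** 2 for xi in x))
    return -slope, my - slope * mx


# Hypothetical proportions in two samples sharing a single taxon.
f_ab = similarity_f({"s1": 0.5, "s2": 0.5}, {"s1": 0.2, "s3": 0.8})

# Synthetic similarities lying exactly on F = -0.02 * ln(r) + 0.3.
distances = [40.0, 400.0, 4000.0, 40000.0, 140000.0]
similarities = [0.3 - 0.02 * log(r) for r in distances]
a, b = fit_log_distance(distances, similarities)

# 500 mature trees per hectare, expressed per square meter.
rho = 500 / 10_000
```

On real data the points scatter around the line, and the significance of the slope is assessed with a Mantel test rather than with the usual regression statistics, because pairwise similarities are not independent.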
Figure4:Occurrence-based(Sorensen)dissimilarityasafunctionoflog-distance.Theredlinefiguresthelinearregression.
[Figure 4 panels: Bacteria 16S, Protists 18S, Fungi ITS, Plants trnL, Arthropods 18S, Insects 16S, Annelids 18S, Nematodes 18S, Platyhelminthes 18S. x-axis: Log10 of geographical distance (2 to 5); y-axis: Sorensen dissimilarity (0.0 to 1.0).]
Finally, we conducted a separate analysis to explore how beta diversity depends
on logging activities. Because our sampling effort along this disturbance gradient was
limited, we simply investigated the relative effect of disturbance and soil conditions on
the various taxonomic groups without accounting for spatial structure. We measured
the Sorensen dissimilarity index among pairs of sampling points in all Paracou and
Arbocel plots, both disturbed and undisturbed. We quantified logging intensity by a
dummy variable taking value 0 in undisturbed locations (Paracou P6 and P11 areas), 1
in mildly disturbed ones (Paracou P12 area), and 2 in strongly disturbed ones (clear
cutting; Arbocel). We then followed a similar approach as for the comparison between
soil effects and spatial aggregation in the main dataset. We performed a multivariate
linear regression (i.e., a one-dimensional RDA) of the OTU abundance data
(Hellinger-transformed and OTU-centred) against the logging intensity variable (centred and
normalized). When the linear regression was significant, we partitioned the total
variance of taxonomic composition between a logging intensity component and a soil
component. We obtained the soil component as previously: we performed a PCA on soil
variables, kept the first four axes, and built a RDA-based model by forward variable
selection.
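The two data transformations mentioned in this paragraph can be sketched as follows. This is a minimal Python example under stated assumptions, not the pipeline used in the study: the function names are hypothetical, samples are assumed to be non-empty, and the OTU table is assumed to have samples in rows and OTUs in columns.

```python
import numpy as np

def sorensen_dissimilarity(a, b):
    """Occurrence-based Sorensen dissimilarity: 1 - 2*shared / (n_a + n_b).

    a and b are abundance (or presence/absence) vectors over the same OTUs;
    only presence (value > 0) is used.
    """
    present_a = np.asarray(a) > 0
    present_b = np.asarray(b) > 0
    shared = np.sum(present_a & present_b)
    return 1.0 - 2.0 * shared / (present_a.sum() + present_b.sum())

def hellinger_transform(counts):
    """Square root of the relative abundance of each OTU within each sample."""
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    return np.sqrt(counts / row_totals)
```

Centring by OTU, as described in the text, would then amount to subtracting the column means of the transformed table before the regression.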
Results
Chemical and physical soil properties varied across the samples (Table S2). The pH
ranged from 3.8 to 5.5, C content from 1.9% to 4.2%, N content from 0.12% to 0.31%, and
P content was very low (see also Grau et al., 2017). Soils were also poor in terms of
exchangeable cation content (K+, Ca2+, Mg2+, Mn2+), and varied significantly in terms of
texture, with sandy (up to 80% sand) to clayey (up to 80% clay) soils. Paracou soils
tended to be sandier and more nutrient-poor than Nouragues soils. This suggests that
the Nouragues-Paracou comparison confounds geographical distance and
environmental distance effects. The first PCA axis (40% of total variance) corresponds
to organic matter (total carbon and nitrogen) and clay contents, which are correlated to
aluminium concentration and anticorrelated to pH; the second PCA axis (30% of
variance) corresponds to nutrient and silt contents, the third to phosphorus (13% of
variance) and the fourth to iron (7%; Fig. 3).
                      Mean          Geographical distance                Soil
                      D_Sorensen    r_dist    r_dist,part  slope_dist    r_soil    r_soil,part  slope_soil
Plants trnL           0.42          0.65***   0.61***      0.038         0.29***   0.06         0.011
Bacteria 16S          0.49          0.16*     -0.02        0.014         0.46***   0.44***      0.028
Protists 18S          0.60          0.16**    0.05         0.012         0.30***   0.26***      0.015
Fungi ITS             0.87          0.43***   0.29***      0.029         0.54***   0.45***      0.025
Arthropods 18S        0.53          0.36***   0.29***      0.026         0.28***   0.17*        0.014
Insects 16S           0.89          0.23***   0.16**       0.013         0.25***   0.18**       0.010
Annelids 18S          0.35          -0.031    -0.08        -0.004        0.10      0.12         0.009
Nematodes 18S         0.70          0.11*     0.09         0.012         0.05      0.02         0.004
Platyhelminthes 18S   0.57          -0.079    -0.11*       -0.015        0.07      0.10         0.009
Table 1: Linear regression of taxonomic dissimilarity (Sorensen index) against soil and geographical distance. $r_{\mathrm{dist}}$, $r_{\mathrm{soil}}$, $r_{\mathrm{dist,part}}$, $r_{\mathrm{soil,part}}$ are the simple and partial Pearson's correlation coefficients. Significance was assessed using Mantel tests: *** for p < 0.001; ** for 0.001 < p < 0.01; * for 0.01 < p < 0.05.
Sorensen dissimilarity varied across taxonomic groups, and for the same group
depending on the tested DNA barcode (Table 1; see Table S4, Fig. S2 and S3 for a
comparison between barcodes within group). It was highest for insects 16S and fungi
ITS (ca. 0.9 on average), and lowest for annelids and plants trnL (ca. 0.4 on average).
When plotted against log-transformed geographical distance (Fig. 4), Sorensen
dissimilarity showed a strongly significant correlation for plants, fungi, arthropods and
insects, a weak correlation for protists, bacteria, and nematodes, and no correlation for
annelids and flatworms (by decreasing order of correlation coefficient; Table 1).
Sorensen dissimilarity was also regressed against soil dissimilarity (Fig. 5). We found a
strong correlation to soil dissimilarity in fungi, bacteria, protists, plants, arthropods and
insects, and no correlation in annelids, flatworms and nematodes (Table 1). To test a
possible collinearity between soil dissimilarity and geographical distance, we finally
computed the partial correlation $r_{\mathrm{dist,part}}$ to log-distance conditional on soil dissimilarity.
The partial correlation to log-distance was significant in plants, fungi, arthropods, and
insects, but not in the other groups. Conversely, when computing the partial correlation
$r_{\mathrm{soil,part}}$ to soil dissimilarity conditional on log-distance, the correlation was retained in
fungi, bacteria, protists, insects and arthropods, but lost in plants.
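A first-order partial correlation of this kind can be computed from the three pairwise Pearson correlations as $r_{xy \cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$. The Python sketch below applies this standard formula to vectors of pairwise values (for instance dissimilarity, log-distance and soil dissimilarity taken over the same pairs of sites). The function name is hypothetical, and the text does not state that the analysis was implemented this way; significance would still have to come from a permutation (Mantel) test.

```python
import numpy as np

def partial_correlation(x, y, z):
    """Pearson correlation between x and y after controlling for z."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
```

When `z` is uncorrelated with both `x` and `y`, the result reduces to the simple correlation; when the association between `x` and `y` is entirely due to `z`, it drops to zero, which is the pattern reported above for plants and soil dissimilarity.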
RDA-based partitioning of beta diversity showed that environmental factors and
spatial aggregation together explained a proportion of beta diversity that ranged from
45% in bacteria to zero in flatworms (Fig. 6, Tables 2, S5). Within the fraction of beta
diversity explained by soil effects, the first two soil PCA axes were the main explanatory
factors, with the silt-nutrient axis playing a particularly important role in bacteria (Fig.
S4). The relative contribution of spatial aggregation and soil properties varied across
groups, with a major effect of spatial aggregation relative to soil in annelids and plants,
while both effects were of the same magnitude for bacteria. While the collinearity
between environmental and spatial variables introduced uncertainty as to their actual
relative importance to beta diversity, pure spatial aggregation explained an equal or
higher proportion of the variation compared to pure environmental factors in all groups.
For bacteria and protists, this contrasts with the conclusions of distance-based analyses.
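The pure and mixed fractions discussed here follow from simple arithmetic on three adjusted R² values, in the usual variation-partitioning scheme. The sketch below is a hedged illustration rather than the code used in the study: the function name is hypothetical, and the soil-only and space-only values in the usage note (31.2% and 32.5%) are back-computed from the bacteria row of Table 2, not reported in the text.

```python
def partition_variance(r2_soil, r2_space, r2_both):
    """Split explained variance into pure soil, mixed and pure spatial fractions.

    r2_soil  : adjusted R2 of the soil-only model
    r2_space : adjusted R2 of the spatial-only model
    r2_both  : adjusted R2 of the model combining both sets of variables
    """
    return {
        "pure_soil": r2_both - r2_space,
        "mixed": r2_soil + r2_space - r2_both,
        "pure_space": r2_both - r2_soil,
        "total": r2_both,
    }
```

For example, `partition_variance(31.2, 32.5, 45.2)` gives approximately 12.7, 18.5 and 14.0 for the pure soil, mixed and pure spatial fractions, matching the bacteria 16S row of Table 2.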
The fit of the neutral prediction for the decay of taxonomic similarity $F$ with
geographical distance was statistically significant for plants, bacteria, protists, fungi,
insects and annelids, but not for arthropods, nematodes and flatworms (Table S6, Fig.
S6). At a given geographical distance, the $F$ statistic tended to be more scattered than
Sorensen dissimilarity and to exhibit outliers (Fig. S6). Assuming a density of one plant
individual per 20 m², as measured for mature neotropical forest trees, we estimated a
mean dispersal distance per generation of 43 m in plants. The dispersal to
diversification ratio $\sigma^2/\nu$ was highest for fungi and insects, intermediate for plants and
annelids, and smallest for protists and bacteria (Table S6).
Finally, we found that past logging activities had the strongest effect on plant
composition (Table S7, Fig. S6). They also had an effect on annelids, which was larger
than the effect of soil conditions, and a small but strongly significant effect on fungi.
However, they had little to no detectable effect on other groups.
Group                  Pure soil   Mixed      Pure spatial   Total explained
                       fraction    fraction   fraction       variance
Plants trnL            2.4***      7.8        11.0***        21.1***
Bacteria 16S           12.7***     18.5       14.0***        45.2***
Protists 18S           2.2**       8.7        10.0***        20.8***
Fungi ITS              3.8***      4.9        5.9***         14.5***
Arthropods 18S         1.5*        2.8        2.4**          6.7***
Insects 16S            0.1         1.3        1.5**          2.9***
Annelids 18S           5.5**       5.5        15.3***        26.2***
Nematodes 18S          1.4**       1.4        2.4***         5.2***
Platyhelminthes 18S    NA          NA         NA             NA

Table 2: Fractions of variance (adjusted R², in %) explained by Canonical Redundancy Analysis for environment-only and spatial-only models. Significance: *** for p<0.001; ** for p<0.01; * for p<0.05.
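The arithmetic behind a partition of this kind can be sketched as follows. This is a minimal illustration of the usual variance-partitioning identities, assuming that the pure and mixed fractions are derived from the adjusted R² of a soil-only model, a spatial-only model and a model combining both; the function name `partition` is hypothetical, and the soil-only and spatial-only R² values are back-computed from the Bacteria 16S row of Table 2 rather than reported in the text.

```python
def partition(r2_soil, r2_space, r2_both):
    """Return (pure soil, mixed, pure spatial) fractions from adjusted R2 values."""
    pure_soil = r2_both - r2_space         # explained by soil once space is accounted for
    pure_space = r2_both - r2_soil         # explained by space once soil is accounted for
    mixed = r2_soil + r2_space - r2_both   # shared (collinear) fraction
    return pure_soil, mixed, pure_space


# Bacteria 16S row of Table 2 (values in %). The single-model R2 values are
# assumptions, reconstructed here as pure fraction + mixed fraction.
r2_soil = 12.7 + 18.5    # soil-only model
r2_space = 14.0 + 18.5   # spatial-only model
r2_both = 45.2           # soil and spatial variables together

pure_soil, mixed, pure_space = partition(r2_soil, r2_space, r2_both)
print(round(pure_soil, 1), round(mixed, 1), round(pure_space, 1))
```

The mixed fraction is the part of the variance that cannot be attributed to either set of variables alone, which is why collinearity between soil and spatial variables leaves their relative importance uncertain.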
Chapter 1 – DNA-based Beta Diversity
Discussion
We have explored the patterns of soil beta diversity in the tropical forests of French
Guiana based on fifteen undisturbed 1-ha plots, as well as four disturbed plots. Distance-
based analyses using Sorensen dissimilarity suggest that at our study scale, plant beta
diversity is driven predominantly by geographical distance, bacterial and protist beta
diversity by soil properties, while fungal, arthropod and insect beta diversity depends on
both types of factors. Finally, annelid, nematode and flatworm beta diversity did not
correlate with any of these factors. The observation that both geographical distance and
environment play a role in explaining community assembly has already been reported
for a range of taxonomic groups, either in eukaryotes or in bacteria (Cottenie, 2005;
Thompson & Townsend, 2006; Martiny et al., 2011). However, our study is one of the
rare cases in which beta diversity has been quantified across the same sites over a
broad range of taxonomic groups.
The dependence of plant beta diversity on geographical distance in tropical
forests has been reported in the past, and has been presented as evidence for the
importance of dispersal-limited neutral processes in shaping these ecological
communities (Condit et al., 2002). Likewise, the strong dependence of beta diversity on
soil conditions in unicellular organisms (bacteria and protists) is in agreement with
expectations (Soininen et al., 2007; Ramirez et al., 2014). While we could expect fungi to
be primarily responsive to environmental conditions owing to their good dispersal
abilities, widespread plant-fungi associations may be responsible for the observed
dependence on both environmental conditions and geographical distance (Bahram et al.,
2013). Indeed, dispersal is hampered by host specificity, and the distribution of host-
specific fungal taxa reflects that of their plant hosts.
For insects, previous studies have reported a low beta diversity (Novotny et al.,
2007; Basset et al., 2012). However, these studies have primarily focused on above-
ground herbivores, which are known to have good dispersal ability. In contrast, we have
sampled soil-dwelling insects, and thus our finding that these organisms have high beta
diversity, influenced by both soil properties and dispersal limitation, does not
necessarily contradict the results of previous publications. Our finding is nonetheless
significant because it shows that spatial patterns of biodiversity in insects cannot be
easily generalized across ecosystem compartments. Finally, annelids, nematodes and
flatworms are represented by a limited number of OTUs (Table S1), and the lack of
patterns in these groups might be due to a lack of statistical power.
[Figure 5 plot: nine scatter panels (Bacteria 16S, Protists 18S, Fungi ITS, Plants trnL, Arthropods 18S, Insects 16S, Annelids 18S, Nematodes 18S, Platyhelminthes 18S); x-axis: soil dissimilarity (0 to 8); y-axis: Sorensen dissimilarity (0.0 to 1.0).]
Figure 5: Occurrence-based (Sorensen) dissimilarity as a function of soil dissimilarity. Soil dissimilarity is computed as the Euclidean distance between samples along the first four PCA axes of the measured soil variables. The red line shows the linear regression.
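The quantities plotted in a figure of this kind can be sketched as follows. This is a minimal illustration on made-up data, not the code used for the analyses: the sample names, OTU sets and PCA scores are hypothetical, and the helper names `sorensen_dissimilarity` and `ols_slope_intercept` are introduced here for the example only.

```python
from itertools import combinations
from math import dist


def sorensen_dissimilarity(a, b):
    """Occurrence-based dissimilarity: 1 - 2 * shared / (size of a + size of b)."""
    a, b = set(a), set(b)
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def ols_slope_intercept(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical samples: OTU sets and scores on the first four soil PCA axes.
otus = {
    "s1": {"otu1", "otu2", "otu3", "otu4"},
    "s2": {"otu1", "otu2", "otu3", "otu5"},
    "s3": {"otu1", "otu6", "otu7", "otu8"},
}
soil_axes = {
    "s1": (0.0, 0.0, 0.0, 0.0),
    "s2": (1.0, 0.0, 0.0, 0.0),
    "s3": (3.0, 4.0, 0.0, 0.0),
}

# One point per pair of samples, as in the scatter panels.
pairs = list(combinations(sorted(otus), 2))
soil_d = [dist(soil_axes[i], soil_axes[j]) for i, j in pairs]
sor_d = [sorensen_dissimilarity(otus[i], otus[j]) for i, j in pairs]
slope, intercept = ols_slope_intercept(soil_d, sor_d)
print(pairs, soil_d, sor_d, slope, intercept)
```

Because every sample contributes to several pairs, the points are not independent, so the regression line is best read as a visual summary rather than as a test in itself.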
The RDA (‘raw data’) approach led to slightly different conclusions than Mantel-
based correlations, and brought additional insight. A significant fraction of annelid beta
diversity could be explained by spatial aggregation, while distance-based analyses using
Sorensen dissimilarity did not detect any signal in this group. This is in line with the
limited dispersal abilities reported for annelids in this area (Decaëns et al., 2016).
Spatial aggregation was also found to be an important factor explaining the spatial
distribution of protists and bacteria in addition to soil properties. In contrast, we found
little explanatory power for insects and arthropods. Overall, this is in line with the
higher sensitivity to spatial structure reported in the literature for ‘raw data’ analyses
(Legendre et al., 2005), even though the interpretation of this spatial structure as being
indicative of neutral processes is not straightforward (Smith & Lundholm, 2010).
However, a potential problem in our study design is that the logarithmic geographic
sampling scheme is not ideally suited to the description by PCNM variables. Because
extracting DNA on site to minimize contamination is challenging, we could not increase
the number of sampling points, but we hope to address the issue of the sampling design
for DNA-based beta diversity analyses in a forthcoming contribution.
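For readers less familiar with the distance-based side of this comparison, a Mantel correlation can be sketched as below. This is a generic permutation version written for illustration, assuming a Pearson correlation between the upper triangles of two distance matrices and a one-sided test; the function names and the toy matrices are hypothetical and do not reproduce the settings used in this chapter.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(d, order):
    """Flatten the upper triangle of matrix d after reordering rows and columns."""
    n = len(order)
    return [d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, n_perm=999, seed=0):
    """Mantel statistic and one-sided permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # relabel the samples of the second matrix
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy example: five samples on a line, community dissimilarity proportional
# to geographical distance.
positions = [0.0, 1.0, 2.0, 4.0, 8.0]
geo = [[abs(a - b) for b in positions] for a in positions]
community = [[0.1 * d for d in row] for row in geo]
r, p = mantel(geo, community)
print(round(r, 3), p)
```

Permuting sample labels, rather than individual pairwise distances, is what accounts for the non-independence of the entries of a distance matrix.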
The fit of the neutral prediction for the decay of similarity with distance was significant for all taxonomic groups except arthropods, nematodes and flatworms, but was poorer than the fit of Sorensen dissimilarity to log-distance. A possible confounding factor is that, unlike the Sorensen index, the $F$ similarity measure is sensitive to noise in OTU abundances, and may also be biased by uneven sampling effort among samples in DNA-based data. Overall, a decay of $F$ similarity with distance was detected in the groups for which raw-data analyses showed an effect of spatial aggregation, which is consistent with the fact that both types of analysis rely on abundance information. In particular, a decay of $F$ similarity with distance was found in annelids while none was detected using Sorensen dissimilarity, which suggests that in this group, differences between samples lie in the abundance pattern of OTUs rather than in their occurrence pattern.
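The contrast between occurrence-based and abundance-based measures can be illustrated with a minimal sketch. It assumes samples are given as dictionaries of OTU read counts (a hypothetical input format), and uses the definition of the abundance-based similarity given in the caption of Table S6 (the sum over OTUs of the product of their proportions in the two samples). The function names and the toy samples are illustrative only and are not part of the analysis pipeline of this study.

```python
def sorensen_dissimilarity(a, b):
    """Occurrence-based Sorensen dissimilarity between two samples.

    a, b: dicts mapping OTU name -> read count (hypothetical input format).
    Assumes at least one OTU is present in one of the two samples.
    """
    present_a = {otu for otu, n in a.items() if n > 0}
    present_b = {otu for otu, n in b.items() if n > 0}
    shared = len(present_a & present_b)
    return 1.0 - 2.0 * shared / (len(present_a) + len(present_b))


def f_similarity(a, b):
    """Abundance-based similarity: sum over OTUs of p_s^A * p_s^B."""
    total_a = sum(a.values())
    total_b = sum(b.values())
    return sum((a[otu] / total_a) * (b.get(otu, 0) / total_b) for otu in a)


# Toy samples: A and B contain the same OTUs in very different proportions,
# while C has exactly the same abundances as A.
sample_a = {"otu1": 90, "otu2": 5, "otu3": 5}
sample_b = {"otu1": 5, "otu2": 5, "otu3": 90}
sample_c = {"otu1": 90, "otu2": 5, "otu3": 5}

print(sorensen_dissimilarity(sample_a, sample_b))  # occurrence patterns are identical
print(f_similarity(sample_a, sample_b))            # low: abundance patterns differ
print(f_similarity(sample_a, sample_c))            # higher: abundance patterns match
```

In this toy case the Sorensen index sees no difference between samples A and B, whereas the abundance-based similarity does, which is the kind of situation suggested above for annelids.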
Figure 6: Variance partitioning between soil PCA axes and spatial structure (PCNM decomposition). The spatial model is the union of two independent PCNM decompositions, one for the Nouragues sampling sites and one for the Paracou sampling sites, plus the UTM coordinates in both groups of sites. The two PCNM decompositions are connected by a dummy variable that takes one value in Nouragues and another in Paracou. Forward variable selection is performed on soil and spatial variables before variance partitioning. Hatching indicates non-significant pure fractions.
[Figure 6 plot: bars of variance partitioning (RDA), from 0.0 to 0.4, split into pure spatial, mixed and pure soil fractions, for Bacteria 16S, Protists 18S, Fungi ITS, Fungi 18S, Plants trnL, Plants 18S, Arthropods 18S, Annelids 18S, Nematodes 18S, Platyhelminthes 18S, Insects 18S and Insects 16S]
Our estimate of 43 m for the mean dispersal distance per generation in plants was close to that measured empirically for neotropical trees (39 m; Condit et al., 2002), and to that estimated by fitting the neutral similarity distance-decay prediction to tree census data (between 40 and 73 m; Condit et al., 2002). Because an important part of the retrieved plant DNA originates from the tree root system, conflating the density of plant individuals with that measured for trees may be a reasonable assumption; however, such estimates for the density of individuals are difficult to obtain in the other taxonomic groups. The dispersal to diversification ratio $\sigma^2/\nu$ is estimated directly from the ratio between the intercept $b$ and the slope $a$ of the linear regression of $F$ against log-distance (cf. Appendix). Lower $\sigma^2/\nu$ in fungi and insects reflects the low mean level of similarity between samples in these groups, which is, under a neutral model, indicative of faster diversification than dispersal, while the reverse would hold true in higher-$\sigma^2/\nu$ bacteria and protists (Table S6).
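The regression step described above can be sketched as an ordinary least-squares fit of similarity against the natural logarithm of distance, returning the slope $a$, the intercept $b$ and their ratio. This is a minimal illustration under stated assumptions: the function name and the synthetic data are hypothetical, and the conversion from $b/a$ to $\sigma^2/\nu$ (given in the Appendix) is deliberately not reproduced here.

```python
import math


def fit_log_distance(distances_m, similarities):
    """Least-squares fit of similarity = a * ln(distance) + b.

    distances_m: pairwise geographical distances in meters (all > 0).
    similarities: similarity values for the same sample pairs.
    Returns the slope a and the intercept b.
    """
    x = [math.log(d) for d in distances_m]
    y = list(similarities)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic, noise-free data generated from a = -0.002 and b = 0.03
# (illustrative values only, not estimates from this study).
distances = [10.0, 100.0, 1000.0, 10000.0]
similarities = [-0.002 * math.log(r) + 0.03 for r in distances]

a, b = fit_log_distance(distances, similarities)
print(a, b, b / a)
```

Significance in Table S6 is assessed by Mantel test rather than by the usual regression statistics, because pairwise similarities are not independent observations.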
The challenge of measuring beta diversity is critical in conservation biology (Koleff et al., 2003; Socolar et al., 2017), and today the vast majority of lowland tropical landscapes are partly deforested or at least degraded by human activities, with direct and measurable impact on biological diversity (Barlow et al., 2016). The tropical forests of French Guiana have experienced low rates of forest clearance over the past decade (Hansen et al., 2013) and our sampling sites can therefore be considered as undisturbed, and a baseline for the many studies focused on disturbed landscapes. Hence, in our study, the processes shaping community assembly are unlikely to be ascribed to human factors. We acknowledge that humans may have had previously unnoticed impacts on biodiversity, especially on cultivated plants (Heckenberger et al., 2008) or earthworms (Marichal et al., 2010); however, the great majority of our undisturbed sites are located far from present or historical locations of disturbances and we are therefore fairly confident that the patterns we have uncovered are contingent on natural processes. However, to better quantify the possible magnitude of human disturbances, we also studied how beta diversity is altered by intensive logging and by clear-cutting, at sites where the forest has had at least 18 years to recover. Differences in vegetation are easily noticeable in the field, and are indeed reflected in our DNA-based study. This analysis, although limited in the number of samples, also shows an effect of past logging activities on annelids and to a lesser extent on fungi; however, little impact on the other components of soil biodiversity can be detected.
The current study is predicated on our assumption that DNA-based metrics of beta diversity do capture the same ecological processes as classic ones. We did find that our data capture most of the diversity present in our soil samples, as indicated by rarefaction analyses (Fig. S1). We also tested whether our results were dependent on the choice of the DNA barcode, by comparing the results obtained for the same taxonomic group with two distinct DNA barcodes (Tables S4, S5, Fig. S4, S5). In most cases, the results appear robust to the choice of the DNA barcode, even though we detected, as expected, more signal in the specific barcodes for plants, insects and fungi than in the generic 18S barcode of lower taxonomic resolution. Overall, although we emphasize that current DNA-based inventories do not always capture the same taxonomic grain as classic surveys, this approach has the advantage of being scalable, and it should thus be appropriate for rapid biodiversity inventories, especially in fragile or threatened ecosystems.
Acknowledgements
We thank Maxime Réjou-Méchain for fruitful discussions. This work has benefited from “Investissement d’Avenir” grants managed by the French Agence Nationale de la Recherche (CEBA, ref. ANR-10-LABX-25-01 and TULIP, ref. ANR-10-LABX-0041; ANAEE-France: ANR-11-INBS-0001), an additional ANR grant (METABAR project; PI P. Taberlet), and funds from CNRS. Work has been carried out at the CNRS Nouragues Research Station, within the Nouragues Natural Reserve, and at the CIRAD Paracou Research Station. We thank the managers of both research stations.
References
Bahram, M., Koljalg, U., Courty, P.-E., Diedhiou, A.G., Kjoller, R., Polme, S., Ryberg, M., Veldre, V. & Tedersoo, L. (2013) The distance decay of similarity in communities of ectomycorrhizal fungi in different ecosystems and scales. Journal of Ecology, 101, 1335–1344.
Barlow, J., Lennox, G.D., Ferreira, J., Berenguer, E., Lees, A.C., Nally, R.M., Thomson, J.R., Ferraz, S.F. de B., Louzada, J., Oliveira, V.H.F., Parry, L., Solar, R.R. de C., Vieira, I.C.G., Aragão, L.E.O.C., Begotti, R.A., Braga, R.F., Cardoso, T.M., Jr, R.C. de O., Jr, C.M.S., Moura, N.G., Nunes, S.S., Siqueira, J.V., Pardini, R., Silveira, J.M., Vaz-de-Mello, F.Z., Veiga, R.C.S., Venturieri, A. & Gardner, T.A. (2016) Anthropogenic disturbance in tropical forests can double biodiversity loss from deforestation. Nature.
Baselga, A., Gómez-Rodríguez, C. & Lobo, J.M. (2012) Historical legacies in world amphibian diversity revealed by the turnover and nestedness components of beta diversity. PLoS One, 7, e32341.
Basset, Y., Cizek, L., Cuénoud, P., Didham, R.K., Guilhaumon, F., Missa, O., Novotny, V., Ødegaard, F., Roslin, T., Schmidl, J., Tishechkin, A.K., Winchester, N.N., Roubik, D.W., Aberlenc, H.-P., Bail, J., Barrios, H., Bridle, J.R., Castaño-Meneses, G., Corbara, B., Curletti, G., Duarte da Rocha, W., De Bakker, D., Delabie, J.H.C., Dejean, A., Fagan, L.L., Floren, A., Kitching, R.L., Medianero, E., Miller, S.E., Gama de Oliveira, E., Orivel, J., Pollet, M., Rapp, M., Ribeiro, S.P., Roisin, Y., Schmidt, J.B., Sørensen, L. & Leponce, M. (2012) Arthropod Diversity in a Tropical Forest. Science, 338, 1481–1484.
Blanchet, F.G., Legendre, P. & Borcard, D. (2008) Forward selection of explanatory variables. Ecology, 89, 2623–2632.
Bongers, F., Charles-Dominique, P., Forget, P.-M. & Théry, M. (2001) Nouragues: dynamics and plant-animal interactions in a Neotropical rainforest, Springer Science & Business Media.
Borcard, D. & Legendre, P. (2002) All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling, 153, 51–68.
Borcard, D., Legendre, P., Avois-Jacquet, C. & Tuomisto, H. (2004) Dissecting the spatial structure of ecological data at multiple scales. Ecology, 85, 1826–1832.
Borcard, D., Legendre, P. & Drapeau, P. (1992) Partialling out the spatial component of ecological variation. Ecology, 73, 1045–1055.
Boyer, F., Mercier, C., Bonin, A., Le Bras, Y., Taberlet, P. & Coissac, E. (2016) OBITOOLS: a UNIX-inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16, 176–182.
Chao, A., Chiu, C.-H. & Hsieh, T.C. (2012) Proposing a resolution to debates on diversity partitioning. Ecology, 93, 2037–2051.
Chave, J. & Leigh, E.G. (2002) A spatially explicit neutral model of beta-diversity in tropical forests. Theoretical Population Biology, 62, 153–168.
Clarke, L.J., Soubrier, J., Weyrich, L.S. & Cooper, A. (2014) Environmental metabarcodes for insects: in silico PCR reveals potential for taxonomic bias. Molecular Ecology Resources, 14, 1160–1170.
Condit, R., Pitman, N., Leigh, E.G., Chave, J., Terborgh, J., Foster, R.B., Nunez, P., Aguilar, S., Valencia, R., Villa, G., Muller-Landau, H.C., Losos, E. & Hubbell, S.P. (2002) Beta-diversity in tropical forest trees. Science, 295, 666–669.
Cottenie, K. (2005) Integrating environmental and spatial processes in ecological community dynamics. Ecology Letters, 8, 1175–1182.
Decaëns, T., Porco, D., James, S.W., Brown, G.G., Chassany, V., Dubs, F., Dupont, L., Lapied, E., Rougerie, R. & Rossi, J.-P. (2016) DNA barcoding reveals diversity patterns of earthworm communities in remote tropical forests of French Guiana. Soil Biology and Biochemistry, 92, 171–183.
Fliegerova, K., Tapio, I., Bonin, A., Mrazek, J., Callegari, M.L., Bani, P., Bayat, A., Vilkki, J., Kopečný, J. & Shingfield, K.J. (2014) Effect of DNA extraction and sample preservation method on rumen bacterial population. Anaerobe, 29, 80–84.
Gaston, K. & Blackburn, T. (2008) Pattern and process in macroecology, John Wiley & Sons.
Gilbert, B. & Lechowicz, M.J. (2004) Neutrality, niches, and dispersal in a temperate forest understory. Proceedings of the National Academy of Sciences of the United States of America, 101, 7651–7656.
Gourlet-Fleury, S., Ferry, B., Molino, J.-F., Petronelli, P. & Schmitt, L. (2004) Experimental plots: key features, Elsevier.
Grau, O., Peñuelas, J., Ferry, B., Freycon, V., Blanc, L., Desprez, M., Baraloto, C., Chave, J., Descroix, L. & Dourdain, A. (2017) Nutrient-cycling mechanisms other than the direct absorption from soil may control forest structure and dynamics in poor Amazonian soils. Scientific Reports, 7, 45017.
Guardiola, M., Uriz, M.J., Taberlet, P., Coissac, E., Wangensteen, O.S. & Turon, X. (2015) Deep-sea, deep-sequencing: metabarcoding extracellular DNA from sediments of marine canyons. PLoS One, 10, e0139633.
Hansen, M.C., Potapov, P.V., Moore, R., Hancher, M., Turubanova, S.A., Tyukavina, A., Thau, D., Stehman, S.V., Goetz, S.J., Loveland, T.R., Kommareddy, A., Egorov, A., Chini, L., Justice, C.O. & Townshend, J.R.G. (2013) High-Resolution Global Maps of 21st-Century Forest Cover Change. Science, 342, 850–853.
Harrison, S., Ross, S.J. & Lawton, J.H. (1992) Beta Diversity on Geographic Gradients in Britain. Journal of Animal Ecology, 61, 151–158.
Heckenberger, M.J., Russell, J.C., Fausto, C., Toney, J.R., Schmidt, M.J., Pereira, E., Franchetto, B. & Kuikuro, A. (2008) Pre-Columbian Urbanism, Anthropogenic Landscapes, and the Future of the Amazon. Science, 321, 1214.
Hortal, J., Diniz-Filho, J.A.F., Bini, L.M., Rodríguez, M.Á., Baselga, A., Nogués-Bravo, D., Rangel, T.F., Hawkins, B.A. & Lobo, J.M. (2011) Ice age climate, evolutionary constraints and diversity patterns of European dung beetles. Ecology Letters, 14, 741–748.
Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32), Princeton University Press.
Hubbell, S.P. (2013) Tropical rain forest conservation and the twin challenges of diversity and rarity. Ecology and Evolution, 3, 3263–3274.
Koleff, P., Gaston, K.J. & Lennon, J.J. (2003) Measuring beta diversity for presence–absence data. Journal of Animal Ecology, 72, 367–382.
Kreft, H. & Jetz, W. (2010) A framework for delineating biogeographical regions based on species distributions. Journal of Biogeography, 37, 2029–2053.
Lawton, J.H., Bignell, D.E., Bolton, B., Bloemers, G.F., Eggleton, P., Hammond, P.M., Hodda, M., Holt, R.D., Larsen, T.B. & Mawdsley, N.A. (1998) Biodiversity inventories, indicator taxa and effects of habitat modification in tropical forest. Nature, 391, 72–76.
Legendre, P., Borcard, D. & Peres-Neto, P.R. (2005) Analyzing beta diversity: partitioning the spatial variation of community composition data. Ecological Monographs.
Legendre, P. & Legendre, L. (2012) Numerical Ecology, Elsevier.
Leibold, M.A., Holyoak, M., Mouquet, N., Amarasekare, P., Chase, J.M., Hoopes, M.F., Holt, R.D., Shurin, J.B., Law, R., Tilman, D., Loreau, M. & Gonzalez, A. (2004) The metacommunity concept: a framework for multi-scale community ecology. Ecology Letters, 7, 601–613.
Mariadassou, M., Pichon, S. & Ebert, D. (2015) Microbial ecosystems are dominated by specialist taxa. Ecology Letters, 18, 974–982.
Marichal, R., Martinez, A.F., Praxedes, C., Ruiz, D., Carvajal, A.F., Oszwald, J., del Pilar Hurtado, M., Brown, G.G., Grimaldi, M., Desjardins, T. & others (2010) Invasion of Pontoscolex corethrurus (Glossoscolecidae, Oligochaeta) in landscapes of the Amazonian deforestation arc. Applied Soil Ecology, 46, 443–449.
Martiny, J.B.H., Eisen, J.A., Penn, K., Allison, S.D. & Horner-Devine, M.C. (2011) Drivers of bacterial beta-diversity depend on spatial scale. Proceedings of the National Academy of Sciences of the United States of America, 108, 7850–7854.
Nekola, J.C. & White, P.S. (1999) The distance decay of similarity in biogeography and ecology. Journal of Biogeography, 26, 867–878.
Novotny, V., Miller, S.E., Hulcr, J., Drew, R.A.I., Basset, Y., Janda, M., Setliff, G.P., Darrow, K., Stewart, A.J.A., Auga, J., Isua, B., Molem, K., Manumbor, M., Tamtiai, E., Mogia, M. & Weiblen, G.D. (2007) Low beta diversity of herbivorous insects in tropical forests. Nature, 448, 692–695.
Pace, N.R. (1997) A molecular view of microbial diversity and the biosphere. Science, 276, 734–740.
Ramirez, K.S., Leff, J.W., Barberan, A., Bates, S.T., Betley, J., Crowther, T.W., Kelly, E.F., Oldfield, E.E., Shaw, E.A., Steenbock, C., Bradford, M.A., Wall, D.H. & Fierer, N. (2014) Biogeographic patterns in below-ground diversity in New York City’s Central Park are similar to those observed globally. Proceedings of the Royal Society B-Biological Sciences, 281, 9.
Rosenzweig, M.L. (1995) Species diversity in space and time, Cambridge University Press.
Rosvall, M., Axelsson, D. & Bergstrom, C.T. (2009) The map equation. The European Physical Journal Special Topics, 178, 13–23.
Schuldt, A., Wubet, T., Buscot, F., Staab, M., Assmann, T., Böhnke-Kammerlander, M., Both, S., Erfmeier, A., Klein, A.-M., Ma, K., Pietsch, K., Schultze, S., Wirth, C., Zhang, J., Zumstein, P. & Bruelheide, H. (2015) Multitrophic diversity in a biodiverse forest is highly nonlinear across spatial scales. Nature Communications.
Siles, J.A. & Margesin, R. (2016) Abundance and Diversity of Bacterial, Archaeal, and Fungal Communities Along an Altitudinal Gradient in Alpine Forest Soils: What Are the Driving Factors? Microbial Ecology, 72, 207–220.
Smith, T.W. & Lundholm, J.T. (2010) Variation partitioning as a tool to distinguish between niche and neutral processes. Ecography, 33, 648–655.
Socolar, J.B., Gilroy, J.J., Kunin, W.E. & Edwards, D.P. (2017) How Should Beta-Diversity Inform Biodiversity Conservation? Trends in Ecology & Evolution, 31, 67–80.
Soininen, J., McDonald, R. & Hillebrand, H. (2007) The distance decay of similarity in ecological communities. Ecography, 30, 3–12.
ter Steege, H., Nigel, C.A., Sabatier, D., Baraloto, C., Salomao, R.P., Guevara, J.E., Phillips, O.L., Castilho, C.V., Magnusson, W.E., Molino, J.F., Monteagudo, A., Vargas, P.N., Montero, J.C., Feldpausch, T.R., Coronado, E.N.H., Killeen, T.J., Mostacedo, B., Vasquez, R., Assis, R.L., Terborgh, J., Wittmann, F., Andrade, A., Laurance, W.F., Laurance, S.G.W., Marimon, B.S., Marimon, B.H., Vieira, I.C.G., Amaral, I.L., Brienen, R., Castellanos, H., Lopez, D.C., Duivenvoorden, J.F., Mogollon, H.F., Matos, F.D.D., Davila, N., Garcia-Villacorta, R., Diaz, P.R.S., Costa, F., Emilio, T., Levis, C., Schietti, J., Souza, P., Alonso, A., Dallmeier, F., Montoya, A.J.D., Piedade, M.T.F., Araujo-Murakami, A., Arroyo, L., Gribel, R., Fine, P.V.A., Peres, C.A., Toledo, M., Gerardo, A.A.C., Baker, T.R., Ceron, C., Engel, J., Henkel, T.W., Maas, P., Petronelli, P., Stropp, J., Zartman, C.E., Daly, D., Neill, D., Silveira, M., Paredes, M.R., Chave, J., Lima, D.D., Jorgensen, P.M., Fuentes, A., Schongart, J., Valverde, F.C., Di Fiore, A., Jimenez, E.M., Mora, M.C.P., Phillips, J.F., Rivas, G., van Andel, T.R., von Hildebrand, P., Hoffman, B., Zent, E.L., Malhi, Y., Prieto, A., Rudas, A., Ruschell, A.R., Silva, N., Vos, V., Zent, S., Oliveira, A.A., Schutz, A.C., Gonzales, T., Nascimento, M.T., Ramirez-Angulo, H., Sierra, R., Tirado, M., Medina, M.N.U., van der Heijden, G., Vela, C.I.A., Torre, E.V., Vriesendorp, C., Wang, O., Young, K.R., Baider, C., Balslev, H., Ferreira, C., Mesones, I., Torres-Lezama, A., Giraldo, L.E.U., Zagt, R., Alexiades, M.N., Hernandez, L., Huamantupa-Chuquimaco, I., Milliken, W., Cuenca, W.P., Pauletto, D., Sandoval, E.V., Gamarra, L.V., Dexter, K.G., Feeley, K., Lopez-Gonzalez, G. & Silman, M.R. (2013) Hyperdominance in the Amazonian Tree Flora. Science, 342, 325–+.
Taberlet, P., Coissac, E., Hajibabaei, M. & Rieseberg, L.H. (2012) Environmental DNA. Molecular Ecology, 21, 1789–1793.
Taberlet, P., Gielly, L., Pautou, G. & Bouvet, J. (1991) Universal primers for amplification of three non-coding regions of chloroplast DNA. Plant Molecular Biology, 17, 1105–1109.
Thompson, R. & Townsend, C. (2006) A truce with neutral theory: local deterministic factors, species traits and dispersal limitation together determine patterns of diversity in stream invertebrates. Journal of Animal Ecology, 75, 476–484.
Vincent, J.B., Weiblen, G.D. & May, G. (2016) Host associations and beta diversity of fungal endophyte communities in New Guinea rainforest trees. Molecular Ecology, 25, 825–841.
Whittaker, R.H. (1972) Evolution and Measurement of Species Diversity. Taxon, 21, 213–251.
Whittaker, R.H. (1960) Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs, 30, 279–338.
Yu, D.W., Ji, Y.Q., Emerson, B.C., Wang, X.Y., Ye, C.X., Yang, C.Y. & Ding, Z.L. (2012) Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring. Methods in Ecology and Evolution, 3, 613–623.
Zinger, L., Chave, J., Coissac, E., Iribar, A., Louisanna, E., Manzi, S., Schilling, V., Schimann, H., Sommeria-Klein, G. & Taberlet, P. (2016) Extracellular DNA extraction is a fast, cheap and reliable alternative for multi-taxa surveys based on soil DNA. Soil Biology and Biochemistry, 96, 16–19.
Zinger, L., Taberlet, P., Schimann, H., Bonin, A., Boyer, F., De Barba, M., Gaucher, P., Gielly, L., Giguet-Covex, C., Iribar, A., Réjou-Méchain, M., Rayé, G., Rioux, D., Schilling, V., Tymen, B., Viers, J., Zouiten, C., Thuiller, W., Coissac, E. & Chave, J. (2017) Soil community assembly varies across body sizes in a tropical forest. BioRxiv.
Supplementary Information
Taxonomic group       #OTUs      #Reads
Plants trnL             776   5,142,400
Plants 18S               71     366,646
Bacteria 16S         11,380   3,863,620
Protists 18S            295     240,223
Fungi ITS             4,312   2,151,746
Fungi 18S               386     832,153
Arthropods 18S          342     463,057
Insects 16S           3,497   1,331,880
Insects 18S              70     185,446
Annelids 18S             18     145,044
Nematodes 18S            81      10,672
Platyhelminthes 18S      32      15,619
Table S1: Number of OTUs and read count per taxonomic group.
Plot              pH    Ctot    Ntot    P2O5     Clay  Silt  Sand  Al   Fe     Mg    Mn      K      Ca
Unit              None  (g/kg)  (g/kg)  (mg/kg)  (%)   (%)   (%)   (cmol+/kg for Al, Fe, Mg, Mn, K and Ca)
Inselberg summit  4.9   30.5    1.8     <5.0     27.5  4.6   67.9  2.1  0.081  0.27  0.011   0.097  0.08
PP-F21            4.9   29.1    1.9     5.8      33.6  4.0   62.6  2.6  0.068  0.13  0.011   0.088  0.09
PP-H20            4.3   34.3    2.1     7.3      48.1  4.5   47.4  2.5  0.144  0.46  0.018   0.127  0.26
PP-H21            4.8   31.1    2.2     6.0      51.0  4.3   44.7  2.3  0.063  0.37  0.017   0.082  0.25
GP-L11            5.0   35.7    3.0     5.3      73.3  12.8  13.9  1.2  0.006  0.67  0.125   0.113  1.48
GP-L12            4.6   37.0    3.1     7.3      71.8  16.8  11.4  1.6  0.006  0.43  0.215   0.112  0.75
GP-O13            4.3   32.6    2.1     6.8      71.7  12.8  15.6  3.5  0.030  0.51  0.070   0.114  0.75
GP-Liana          5.2   27.6    2.6     7.8      52.0  30.4  17.6  0.4  0.005  1.65  0.252   0.143  3.64
Balanfois-1       3.9   41.8    2.9     <5.0     78.7  7.6   13.8  3.5  0.041  0.22  0.028   0.084  0.35
Balanfois-2       3.9   40.7    2.9     <5.0     79.6  6.0   14.4  3.3  0.046  0.31  0.025   0.087  0.27
Parare-5          4.4   35.9    2.6     10.3     64.5  12.3  23.3  2.8  0.056  0.59  0.020   0.114  0.21
Parare-6          4.0   38.1    2.5     11.3     55.5  19.2  25.4  4.0  0.058  0.24  0.019   0.116  0.16
Paracou-06.3      5.0   19.2    1.2     6.5      13.4  7.2   79.4  1.2  0.043  0.22  <0.010  0.078  0.10
Paracou-06.4      4.9   20.1    1.3     8.3      12.2  7.0   80.9  1.2  0.035  0.16  <0.010  0.058  0.11
Paracou-11.1      5.0   20.1    1.2     6.8      16.3  8.0   75.7  1.6  0.053  0.23  <0.010  0.080  0.14
Paracou-12.1      4.6   27.8    1.7     <5.0     27.0  7.6   65.5  2.3  0.124  0.24  <0.010  0.080  0.07
Paracou-12.2      4.6   20.6    1.2     7.3      16.5  6.5   77.1  1.9  0.113  0.21  <0.010  0.079  0.16
Arbocel-7.3       4.6   30.7    1.9     7.0      22.0  10.0  68.0  1.6  0.143  0.32  <0.010  0.085  0.13
Arbocel-7.4       4.6   30.3    1.9     <5.0     24.6  10.7  64.7  1.7  0.147  0.29  <0.010  0.076  0.16
Table S2: Mean soil variables in all nineteen 1-ha plots. Each value is the average of four separate measurements, each made on twenty pooled soil samples. Al, Fe, Mg, Mn, K, and Ca concentrations are expressed in cmol of positive charges per kg. Values in italics correspond to disturbed plots.
      pH     Ctot   Ntot   P2O5   Clay   Silt   Al     Fe     Mg     Mn     K      Ca
pH    1      -0.59  -0.41  -0.02  -0.53  0.09   -0.82  -0.17  0.27   0.19   -0.01  0.32
Ctot         1      0.91   -0.04  0.74   0.11   0.60   -0.03  0.05   0.08   0.33   -0.01
Ntot                1      -0.01  0.83   0.36   0.35   -0.31  0.33   0.41   0.47   0.31
P2O5                       1      -0.01  0.27   0.08   -0.13  0.12   0.14   0.30   0.07
Clay                              1      0.24   0.48   -0.45  0.27   0.41   0.51   0.29
Silt                                     1      -0.22  -0.39  0.76   0.67   0.52   0.76
Al                                              1      0.15   -0.38  -0.38  0.04   -0.45
Fe                                                     1      -0.34  -0.56  -0.23  -0.48
Mg                                                            1      0.70   0.60   0.91
Mn                                                                   1      0.56   0.81
K                                                                           1      0.59
Ca                                                                                 1
VIF   5.0    35.9   44.8   1.5    12.5   4.5    8.3    3.3    7.2    5.0    3.2    10.8
Table S3: Correlation coefficients between soil variables in the fifteen undisturbed plots. Bold font indicates correlation coefficients above 0.70. Variance Inflation Factors (VIF) are computed as the diagonal elements of the inverse correlation matrix.
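The VIF computation stated in the caption of Table S3 can be sketched as follows: invert the correlation matrix and read off its diagonal. This is a minimal illustration using a hand-written Gauss-Jordan inversion so that it runs with the Python standard library alone; the function names and the toy 3-variable correlation matrix are hypothetical and do not come from the soil data.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [M | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def variance_inflation_factors(corr):
    """VIFs as the diagonal elements of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


# Toy correlation matrix: variables 1 and 2 are strongly correlated,
# variable 3 is uncorrelated with both.
corr = [[1.0, 0.9, 0.0],
        [0.9, 1.0, 0.0],
        [0.0, 0.0, 1.0]]

vifs = variance_inflation_factors(corr)
print(vifs)
```

In this toy example the two correlated variables share the same inflated VIF while the uncorrelated one stays at 1, which mirrors the high values found for Ctot and Ntot in Table S3.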
             Mean         Geographical distance                Soil
             D_Sorensen   r_dist    r_dist,part  slope_dist    r_soil    r_soil,part  slope_soil
Plants trnL  0.42         0.65***   0.61***      0.038         0.29***   0.06         0.011
Plants 18S   0.50         0.22**    0.15*        0.022         0.24***   0.17*        0.016
Fungi ITS    0.87         0.43***   0.29***      0.029         0.54***   0.45***      0.025
Fungi 18S    0.45         0.31***   0.20*        0.022         0.39***   0.31***      0.019
Insects 16S  0.89         0.23***   0.16**       0.013         0.25***   0.18**       0.010
Insects 18S  0.57         0.07      0.05         0.008         0.06      0.03         0.005
Table S4: Linear regression of taxonomic dissimilarity (Sorensen index) against soil and geographical distance: comparison between barcodes within taxonomic groups (cf. Table 2). $r_{dist}$, $r_{soil}$, $r_{dist,part}$ and $r_{soil,part}$ are the simple and partial Pearson’s correlation coefficients. Significance was assessed using Mantel tests: *** for p < 0.001; ** for 0.001 < p < 0.01; * for 0.01 < p < 0.05.
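The simple Mantel test used to assess significance in Table S4 can be sketched as a permutation test on the Pearson correlation between two distance matrices. The sketch below is an illustration under stated assumptions rather than the implementation used in this study: it covers only the simple (not the partial) test, uses a one-sided p-value, and the function names, the number of permutations and the toy matrices are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Simple one-sided Mantel test between two symmetric distance matrices.

    Rows and columns of dist_a are permuted jointly; the p-value is the
    fraction of correlations (observed one included) at least as large
    as the observed correlation.
    """
    rng = random.Random(seed)
    n = len(dist_a)
    flat_b = upper_triangle(dist_b)
    observed = pearson(upper_triangle(dist_a), flat_b)
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[dist_a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), flat_b) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Toy example: five sites along a line; the second matrix is the first one
# rescaled, so the two are perfectly correlated.
positions = [0.0, 1.0, 2.0, 4.0, 8.0]
geo = [[abs(p - q) for q in positions] for p in positions]
dissim = [[0.1 * d for d in row] for row in geo]

r, p_value = mantel_test(geo, dissim)
print(r, p_value)
```

Permuting sites rather than individual pairwise distances preserves the dependence structure among distances that share a site, which is why ordinary regression p-values are not used here.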
             Pure soil fraction  Mixed fraction  Pure spatial fraction  Total explained variance
Plants trnL  2.4***              7.8             11.0***                21.1***
Plants 18S   1.4                 6.8             15.1***                23.3***
Fungi ITS    3.8***              4.9             5.9***                 14.5***
Fungi 18S    4.1***              15.2            11.8***                31.2***
Insects 16S  0.1                 1.3             1.5**                  2.9***
Insects 18S  NA                  NA              NA                     NA
Table S5: Fractions of variance (adjusted R², in %) explained by Canonical Redundancy Analysis for environment-only and spatial-only models: comparison between barcodes within taxonomic groups (cf. Table 3). Significance: *** for p < 0.001; ** for p < 0.01; * for p < 0.05.
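The fractions reported in Table S5 follow the usual variance partitioning arithmetic, sketched below from the adjusted R² of three models (soil only, space only, and both). The function name is hypothetical, and the three input values are back-calculated from the Plants trnL row purely for illustration; they are not reported as such in the table, and rounding explains the small gap with the listed total.

```python
def partition_variance(r2_soil, r2_space, r2_both):
    """Split explained variance into pure and mixed fractions.

    r2_soil:  adjusted R2 of the soil-only model
    r2_space: adjusted R2 of the spatial-only model
    r2_both:  adjusted R2 of the model with soil and spatial variables
    """
    return {
        "pure_soil": r2_both - r2_space,
        "pure_spatial": r2_both - r2_soil,
        "mixed": r2_soil + r2_space - r2_both,
        "total": r2_both,
    }


# Illustrative inputs (in %), back-calculated from the Plants trnL row.
fractions = partition_variance(r2_soil=10.2, r2_space=18.8, r2_both=21.2)
for name, value in fractions.items():
    print(name, round(value, 1))
```

The mixed fraction is obtained by subtraction and can therefore be slightly negative, as seen for annelids in Table S7.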
𝑅 1 𝑎(×10!) 𝑏/𝑎 log!" 𝜎! 𝜈
Plants trnL          0.26***  0.55   24   -20
Plants 18S           0.16**   0.24   33   -28
Bacteria 16S         0.22***  6.8    48   -42
Protists 18S         0.33***  0.075  46   -40
Fungi ITS            0.14***  2.3    13   -11
Fungi 18S            0.23***  0.76   31   -26
Arthropods 18S       0.07     0.71   43   -37
Insects 16S          0.08**   1.1    15   -13
Insects 18S          0.09     0.11   33   -28
Annelids 18S         0.25***  0.061  30   -26
Nematodes 18S        0.06     1.0    53   -46
Platyhelminthes 18S  0.10     0.13   34   -30
Table S6: Fitting the neutral prediction for the decay of taxonomic similarity with distance (Chave & Leigh, 2002). $F(A,B) = \sum_{s=1}^{S} p_s^A p_s^B$, where $p_s^A$ is the proportion of species $s$ in sample A and $p_s^B$ that in sample B, is regressed against the log-transformed geographical distance $r$ between samples (expressed in meters), as $F(r) = a \ln r + b$: $R$ is the correlation coefficient; ***, ** and * denote the significance assessed by Mantel test ($p < 0.001$, $p < 0.01$ and $p < 0.05$, respectively); and $\sigma^2/\nu$ (expressed in square meters) is the ratio between the variance $\sigma^2$ of the dispersal kernel and the neutral speciation probability $\nu$.
                     Logging only  Pure logging fraction  Mixed fraction  Pure soil fraction  Total explained variance
Plants trnL          12.9***       4.0***                 9.0             3.5**               16.4***
Plants 18S           10.9***       3.1*                   7.8             2.3                 13.1***
Bacteria 16S         11.9***       1.9*                   10.0            18.2***             30.1***
Protists 18S         4.8**         1.1                    3.6             4.8**               9.6**
Fungi ITS            4.3***        1.6***                 2.6             5.2***              9.4***
Fungi 18S            7.6***        4.7***                 2.9             8.6***              16***
Arthropods 18S       1.7           NA                     NA              NA                  NA
Insects 16S          1.6*          NA                     NA              NA                  NA
Insects 18S          3.4           NA                     NA              NA                  NA
Annelids 18S         6.0*          6.4*                   -0.3            5.4*                11.4**
Nematodes 18S        1.9*          NA                     NA              NA                  NA
Platyhelminthes 18S  0.6           NA                     NA              NA                  NA
Table S7: Fractions of variance (adjusted R², in %) explained by Canonical Redundancy Analysis for logging intensity and for soil conditions. Significance: *** for p < 0.001; ** for p < 0.01; * for p < 0.05.
Taxonomic group      Selected spatial variables
Bacteria 16S         UTMN.Nouragues, UTME.Nouragues, Nouragues.MEM.1, Nouragues.MEM.5
Protists 18S         UTMN.Nouragues, UTME.Nouragues, Nouragues.MEM.1, Paracou.MEM.1
Plants trnL          UTMN.Nouragues, UTME.Nouragues, Nouragues.MEM.1, Nouragues.MEM.5, Paracou.Nouragues, UTME.Paracou
Plants 18S           UTMN.Nouragues, UTME.Nouragues, Nouragues.MEM.1, Nouragues.MEM.4, Nouragues.MEM.5, Nouragues.MEM.8, Nouragues.MEM.12, Nouragues.MEM.15, Nouragues.MEM.16
Fungi ITS            UTMN.Nouragues, UTME.Nouragues, Nouragues.MEM.1
Fungi 18S            UTMN.Nouragues, UTME.Nouragues, Nouragues.MEM.1
Arthropods 18S       UTMN.Nouragues, UTME.Nouragues
Annelids 18S         UTMN.Nouragues, Nouragues.MEM.1, Paracou.MEM.3
Nematodes 18S        UTMN.Nouragues
Platyhelminthes 18S  No selected model
Insects 16S          UTME.Nouragues, Paracou.Nouragues, Nouragues.MEM.2, Nouragues.MEM.11, Nouragues.MEM.15
Insects 18S          No selected model
Table S8: Selected spatial models after forward variable selection. Selection is applied on the following variables: UTM coordinates in Nouragues and Paracou (‘UTMN.Nouragues’, ‘UTME.Nouragues’, ‘UTMN.Paracou’, ‘UTME.Paracou’), the dummy variable connecting Nouragues and Paracou sites (‘Paracou.Nouragues’), and PCNM variables in Nouragues (‘Nouragues.MEM.1’ to ‘Nouragues.MEM.17’) and Paracou (‘Paracou.MEM.1’ to ‘Paracou.MEM.7’), which represent different possible patterns of spatial autocorrelation.
Figure S1: Rarefaction analyses. In each sample and for each barcode, we sampled with replacement between 1 and 8,000 reads, and plotted the corresponding number of OTUs (one curve per sample and per barcode).
[Figure S1 panels: Eukaryotes 18S, Plants trnL, Fungi ITS, Bacteria 16S, Insects 16S]
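The rarefaction procedure described in the caption of Figure S1 can be sketched as follows: reads are drawn with replacement, in proportion to OTU read counts, at increasing depths, and the number of distinct OTUs recovered is recorded. This is a minimal illustration; the function name, the chosen depths and the toy read counts are hypothetical, and each depth is drawn independently, which may differ from the procedure actually used.

```python
import random


def rarefaction_curve(otu_counts, depths, seed=0):
    """Number of distinct OTUs recovered when resampling reads with replacement.

    otu_counts: dict mapping OTU name -> read count in one sample.
    depths: iterable of numbers of reads to draw.
    Returns a list of (depth, number of OTUs observed) pairs.
    """
    rng = random.Random(seed)
    otus = list(otu_counts)
    weights = [otu_counts[otu] for otu in otus]
    curve = []
    for depth in depths:
        # Each read is drawn with probability proportional to its OTU's count.
        drawn = rng.choices(otus, weights=weights, k=depth)
        curve.append((depth, len(set(drawn))))
    return curve


# Toy sample with a few dominant OTUs and a rare tail (illustrative counts).
sample = {"otu1": 5000, "otu2": 2500, "otu3": 400, "otu4": 90, "otu5": 10}
curve = rarefaction_curve(sample, depths=[1, 10, 100, 1000, 8000])
for depth, richness in curve:
    print(depth, richness)
```

A curve that flattens well before the maximum depth suggests that most of the OTU diversity present in the sample has been captured.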
Figure S2: Occurrence-based (Sorensen) dissimilarity as a function of log-distance; comparison between barcodes within taxonomic groups (cf. Fig. 4). The red line shows the linear regression.
[Figure S2 panels: Insects 16S, Insects 18S; x-axis: log-distance (2 to 5); y-axis: Sorensen dissimilarity (0.0 to 1.0)]
●
●●
●
●
●●
●●●●●●
● ●
●●
● ●
●
●
●
●●
●
●
●
●●●●●●●
●
●
●
●
●
●
●●
●●
●
●
●
●●●●●●●
●
●
●
●
●●●
●●
●
●
●
●
●●●●●●
●●
●●●●
●●
●
●
●
●●●●
●●●
●
●
●●
●
●●
●
●●
●●●●
●●●
●
●●
●
●●
●
●
●
●
●●●●●●
● ●
●
●●●
●●●
●●●●●●
●
●
●●●●●●●●●●●●
●
●●●●
●
●●●●●●●●●
●
●
●
●●●●●●●
●
●●
●
●
●●
●●●●
●
●
●
●●
●
●
●●●
●
●
●●
●
●●●●
●
●
●●●●
●●●●
●
●●●●
●
● ●●●
●●
●●●●
●
●●
●
●●
●
●●●
2 3 4 5
0.0
0.2
0.4
0.6
0.8
1.0
Fungi ITS
●
●● ● ●
●●
●●
●●
●
●●●●●
●
●●●●●
●
●
●
●●
●
●
●●
●●
●●●●
●
●●
●
●
●
●
●
●●●●
●
●
●
●●
●●
●●
● ●●
●
●
●
●
●●●
●
●
●
●●●
●●●●●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●●
●
●
●●
●
●●●●
●●
●●
●
●●●
●
●●●●●●●
●●
●●
●
●●
●●●●●
●
●●●●
●
●●
●
●
●●
●
●●
●
●
●●●
●
●●●●
●●●●
●
●●
●
●●●●
●●●●●●●●●●●
●
●
●
●●
●●●
●
●
●●
●
●●●
●●●
●
●
●
●
●
●
●●●●
●
●
●●
●●●●
●
●
●
●
●
●
●
●●
●●●
●●●
●●●
●
●●
●
●
●●●
●
●●
●●
●●
●
●●
●
●
●
●●
●
●●
●●
●
●
●
●●
●●●
●
● ●
●
●●●●●
●●●
●
●
●
●●●
●●●●
●
●●●
●
●●●
●
●
●●
●●●
●
●
●
●
●
●●●●●
●●●●
●
●●●
●
●
●
●
●
●
●●●●●●●●●
●
●●●●●●●●
●●●
●
●
●
●●●
●●
●
●
●●
●
● ●
●●
●
●
●
●●
●
●●●
●
●●●●●
●
●●
●
●
●
●●●●
●
●●●●
●
●
●●●
●●
●●●
●●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●●●●
●
●●
●
●
●●●●●●●
●
●●
●
●●
●●●●
●
●
●●●
●●
●●●●
●
●
●
●
●
●
●●●●
●
●
●
●
●●●●
● ●
●
●
●●●●●
●
●●●●
●
●●●●●
●
●●●
●
●
●●●●
●●
●
●
●
●
●
●
●●●●
●
●●
●
●
●
●●
●
●
●●●
●●●
●●
●
●●
●
●
●
●●
●●
●
●●●
●
●●●●●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●●●●
●●●●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●●●
●●
●
●
●●
●●
●
●
●
●●
●●●
●●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●●
●●●●●
●
●●●●
●
●
●
●
●●●
●
●●●
●●●
●
●
●●●●
●●
●
●
●●
●
● ●
●●●
●
●●●
●
●●●
●●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●●●●
●
●●
●
●
●●●
●
●
●●
●
●●●●
●●
●
●
●●●
●
●●
●●
●●●
●●●●●
●●●
●
●●
●
●
●●
●
● ●●
●●
●
●●
●
●●
●
●
●●
●● ●●●
●●
●●●
●
●
●
●
●●
●
●●●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●●●
●
●
●●●
●
●●●●
●●
●
●
●●
●
●
●●●●
●
●●●
●●●
●
●
●
●
●●
●
●●
●
●
●●
●
●●●
●
●●●
●
●
●
●●●
●
●
●●●●
●
●
●●
●●
●
●
●
●
●
●
●
●●●
●
●●
●●
●
●●●
●
●●
●
●
●
●●●
●●
●●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●●
●●●●
● ●
●
●
●●●
●
●
●
●●
●●
●
●
●●●
●●
●
●
●●
●
●
●●
●●
●
●
●●
●
●●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●●●●●●
●
●
●
●
●
●
●
●●●
●
●
●●
●●●●●
●
●
●●
●●
●
●●●●●
●
●●
●
●
●
●
●
●●
●
●●
●●●
●
●●
●
●●
●
●
●●●
●
●
●
●
●●
●●● ●
●●●
●
●
●
●
●●
●●
●●●●
●
●●
●
●●
●●
●●
●
●
●
●
●
●●
●
●
●
●●
●●●
●
●
●
●
●●
●
●●●●
●
●●●
●●●
●
●
●●
●
●
●●●
● ●●●
●●●
●
●
●
●
●
●
●●●
●
●
●●●
●
●●
●●
●
●●
●
●
●
●●
●●●
●●●●
●
●●
●
●
●●●
●
●●
●
●●●
●
●
●●
●
●●
●●
●
●●●
●●●
●
●
●●
●
● ●●●
●
●
●
●●
●●●●
● ●
●
●
●
●●
●●
●
●●
●●
●
●●
●
●
●●●
●
●
●
●●
●
●
●●
●
●●
●
●●
●●●●
●
●
●
●
●●●
●●
●
●
●
●●●
●
●
●●
●
●
●
●●●
●
●●
●●●●
●
●●●
●
●
●●●
●
●
●●
●●
●
●
●●
●
●
●●●
●●●●
●
●●
●
●
●
●
●
●
●
●●●●●●●
● ●●
●
●
●
●
●●
●
●
●●
●●●●●
●
●
●
●
●
●
●●●
● ●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●●
●
●
●
●●
●
●●●
●
●●
●
●
●●●●
●●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●●●
●
●
●
●
● ●●
●
●
●
●
●●
●●●
●
●
●
●
●
●●
●
●
● ●●
●
●
●
●
●
● ●●●●●●
●
●●●●●
●
●
●
●
●
●
●
●
●
●●●●●●
●●
●
●●
●
● ●
●
●
●●
●
●
●
●●
●●●
●
●●●●●
●
●
●
●
●●
●
●●●
●●●●●●●●●
●
●
●●
●
●
●●
●●●●
●●
●●●●
●
●
●
●
● ●●
●
●●
●
●
●
●
●●
●
●●
●
●
●●●●●●
●
●
●●●●
●
●
●
●
●
●●
●
●
●●
●●●●
●
●
●●
●●
●●●●
●●●
●
●
●●
●●
●
●
●●
●●●●
●●●●●●
●●
●●●●
●●
● ●●
●
●
●
●
●
●●
●
● ●
●●
●●●●
●
●
●
●●
●●●
●
●
●
●●
●●
●●
●
●
● ●
●
●
●●
●●●●
●● ●●
●●●
●
●
●
●
●
●
●●
●
2 3 4 5
0.0
0.2
0.4
0.6
0.8
1.0
Fungi 18S
●
●
●
●●
●● ●●●●
●●●●
●●●●●●●● ●●●
●
●
●
●
●●●●
● ●
●●●
●
●
●
●●
●
●●
●●●●
●●
●
●
●
●●
●
●●
● ●
●● ●
●●●
●●
●●
●●●●
●●●●
●●●
●●
●
●●●
●●●
●●●●
●
●●●●
●●●
●●
●●
●
●
●
●
●
●●
●
●●
●
●● ●●●●
●●●●●●
●●
●●●● ●
●●●
●
●●
●
●●●● ●
●●
●●
●●●●●●●
●●●
●
●
●●
●
●
●●●
● ●
●●
●●●●
●●
●●
●●●●
●●●● ●●
●
●●
●●
●●●●●●
●●
●
●●●
●●●
●●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●●●●●
●●
●
●●●●
●●●●
●●●●●
●●●
●●●●●
●●●●●●●●
●
●●
●●●●●
●
●●
●
●
●●
●
●●●●●
●●●●
●●●●●●●●
●
●
●●●●●
●
●
●●
●●
●●●
●
●●●●●●
●●●
●
●
●
●●
●
●●
●●
●●●●●
●●●
●●●●●●●●● ●
●
●
●
●●
●
●●●●●●●●●
●●●●●
●
●
●●●
●
●
●
●
●
●●●●●
●●●●
●●●●
●●●●●●●●
●●●●●
●●
●●●●●●
●●●●●
●●●●●
●●●●
●
●
●●
●
●●●
●
●●●
●
●●●
●●
●●●
●●●●●●●●
●●
●
●●
●●
●●●●●●●●
●●
●
●
●●●●
●
●●
●
●
●
●
●
●●
●●●
●
●●
●●●
●●●●●●
●●
●
●
●
●●●● ●
●
●●●●
●●
●●●●
●●●●
●
●
●
●
●●●
●
●
●●●●
●●
●●●●
●●
●●
●
●●
●●
●
●
●●●
●
●●●●●●●
●●●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●●
●
●●
●
●●●●
●●●●
●●
●
●●
●●
●
●●●
●●●●
●
●●
●
●●●●
●
●●
●
●
●●
●
●
●
●
●●
●
●●
●●
● ●●●●●●●●
●●●●
●●
●●
●●●
●●●●●
●●●●
●
●●
●
●
●●●
● ●●●●●●●
●● ●
●●●
●●
●
●
●●
●●
●●●
●
●
●●●
●●●●
●●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●●●●●●●
●
●
●
●●
●
● ●●●
●●
●
●●
●
●
●● ●●
●
●
●
●
●
●
●
●●
●
●●●●●
●●
●
●●
●●●
●
●
●
●●
●● ●●●
●
●
●●
●●●
●●●●
●
●
●
●●
●●
●
●●
● ●●
●
●
●●
●
●●
●●
●●
●●●●●
●●●●●●●●●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
● ●
●●
●
●
●●●●
●●
●
●●
●●●●●●●●●
●●●
●●
●●●●
●
●●
●
●
●●
●
● ●
●●
●●
●●●●
●
●
●●●●●
●●●●●●●
●
●●●●
●●●
●
●
●●
●
●
●
●
●
●
●●
● ●●●
●●
●●
●●●●● ●
●●●
●●●
●
●●●●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●●●
●●
●●●
●●
●
●●●●●●●●●
●●
●●
●●
●
●
●
●
●
●●
●
● ●
●●●●●●●●●●●●
●
●●
●
●●●
●●●●●
●●
●●
●
●●
●
●
●●
●
●
●●
●●●●●●
●●
●●
●
●
●
●●●●●●●
●●
●●
●
●
●
●●
●
●●●
●
●
●
●●●
●●
●
●●
●●●
●●●●●●●
●●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
● ●●
●●●●
●●
●●●●●
●
●●●
●● ●●
●●
●
●
●
●●●
●
●
●
●
●●
●
●
●●●● ●
●●●●●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
● ● ●
●
●
●●●●
●
●
●●
●●●●
●
●●●
●●●●
●
●●
●
●
●●
●
●
●
●
●
●●●
●
●●●
●
●●●●●
●
●●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●●●● ●
●●●●●●
●
●
●●●
●●●
●
●
●●
●
●
●
●
●
● ●
●●●● ●●●●●●●
●
●●
●●
●●
●
●
●
●●
●
●
●
●●
●
●●●●
●
●●●●●●
●
●●
●
●●
●●●
●
●●
●●●●●
●●
●● ●●●
●
●●●●
●
●
●●●●●●
●
●●
●
●
●●
●
●●
●
●
●●●●●●●
●●
●●
●●●●
●
●●●
●
●●●
● ●
●●●●●●●●
●●●●
●●
●●
●
●●
●
●
●●
●
● ●
●●●●●●●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●●
●
●●●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
● ●●
●
●●
●
●
●
●
●
●
●
●●
●
●●●
●
●
● ● ●●●
●●●●
●
●
●
●
●
●
●
●
●●●●
●
●●●●
●●
●● ●
●●
●
●
●
●
●
●●●
●
●
●●
●
●●
●
●●●
●
●
●
●●
●
●
●●
●
●●●
●●
●● ●
●
●
●
●
●●
●●
●●
●
● ●
●●●
●●●
●
●
●
●●
●
●●●
●
●
●●●
● ●●
●
●
●
●
●
●
●
●●
●
●●●●
●●
●
●
●
●
●
●●●
●●
●
●
●
●●●●
●
●●
●
●●●
●
●●
●
●
●
●
●
●●
●●●●
●
●
●●
●●
●
●●
●
●
●●
●●●
●
●
●
●
●
●●●●●
●
●
●
●
●
●
●●●
●
●●
●
●●
●
●
●
●●●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●●
●
●●
●●
●
● ●
●●
●
●●●
●
● ●
●
●●
●
2 3 4 5
0.0
0.2
0.4
0.6
0.8
1.0
Plants trnL
●
●
●
●
●
●
●
●●
●●●
●●
●●
●
●●●●
●
● ●
●
●
●
●
●
●●
●●●●
●●
●●●
●●
●
●
●
●
● ●●
●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●●●
●
●●●●
●●●
●
●
●
●
●
●
●
●●●●●●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●●●●
●
●●
●
●
●●
●
●
●
●
●●●●
●●●●
●
●●
●
●●
●
●
●●
●●
●
●
●
●●●
●
●
●
●
●●
●
●●
●●
●
●
●●
●
●●
●●●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
● ●
●
●
●
●●
●
●
●
●●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●●
●●●
●●
●●
●
●
●●●
●
● ●●
●
●
●●●●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
● ●
●
●●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●●
●
●
●
●
● ●
●
●●●
●
●
●●
●
●
●
●
●
●
●●●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●●●
●●
●●●●●●
●●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●●●●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●●
●
●●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●●●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●●
●
●
●●
●
●
●
●
●●
●●
●
●●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●●
●●
●●
●
●
●
●●●
●
●●
●●
●●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
● ●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●●
●●●●
●●
●
●
●
●
●
●
●●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●●●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●●
●
● ●
●
●●●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
● ●●
●
●
●●
●
●
●
●●
●
●●●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●●●
●
●
●
●
●
●
●●
●●●●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●●●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●●●
●
●
●
●
●●●
●
●
●
●
●●
●
●
2 3 4 5
0.0
0.2
0.4
0.6
0.8
1.0
Plants 18S
Sore
nsen
dis
sim
ilarit
y
Log10 of geographical distance (m)
Chapter 1 – DNA-based Beta Diversity
Figure S3: Occurrence-based (Sorensen) dissimilarity as a function of log-distance; comparison between barcodes within taxonomic groups (cf. Fig. 5). The red line shows the linear regression.
[Figure: scatter plots of Sorensen dissimilarity (y-axis, 0.0 to 1.0) against soil dissimilarity (x-axis, 0 to 8). Panels: Insects 16S, Insects 18S, Fungi ITS, Fungi 18S, Plants trnL, Plants 18S.]
Figure S4: Proportion of variance explained by RDA for each soil variable. Only soil variables selected by forward variable selection (out of the six initial variables) are shown.
[Figure: bar chart. Y-axis: proportion of explained variance (RDA), 0.00 to 0.30. Legend: All; PCA axis 4 Fe; PCA axis 3 P; PCA axis 2 Silt−nutrients; PCA axis 1 Clay−C−N−Al−pH. Groups on the x-axis: Bacteria 16S, Protists 18S, Fungi ITS, Fungi 18S, Plants trnL, Plants 18S, Arthropods 18S, Annelids 18S, Nematodes 18S, Platyhelminthes 18S, Insects 18S, Insects 16S.]
Figure S5: Testing the neutral prediction for the decay of taxonomic similarity with geographical distance: F2 similarity as a function of log-distance. The red line denotes the linear regression. Note that the y-scale varies across taxonomic groups.
[Figure S5 panels: Bacteria 16S, Protists 18S, Fungi ITS, Plants trnL, Arthropods 18S, Insects 16S, Annelids 18S, Nematodes 18S, Platyhelminthes 18S. X-axis: Log10 of geographical distance (m), from 2 to 5; y-axis: F2.]
Appendix: Fitting the neutral prediction for the distance-decay of
similarity
Chave & Leigh (2002) derived an analytical prediction for the decay of taxonomic similarity with distance in a continuous, spatially explicit version of Hubbell's neutral model of biodiversity, where individuals have spatial density $\rho$ and where dispersal follows a radially symmetric Gaussian probability density

$$P(r) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{r^2}{2\sigma^2}\right)$$

as a function of distance $r$. They predict that the stationary probability $F_2(r)$ that two randomly selected individuals a distance $r$ apart belong to the same species decreases as:

$$F_2(r) \simeq \frac{2 K_0\left(\dfrac{r\sqrt{2\nu}}{\sigma}\right)}{\ln\dfrac{1}{\nu} + 2\rho\pi\sigma^2}$$

provided that $r$ is larger than $\sigma$. In our dataset, the minimal value taken by $r$ is 40 m, which is approximately equal to the mean dispersal distance per generation for tropical trees (Condit et al., 2002). Because the mean dispersal distance per generation is $\sqrt{2}\,\sigma$ in the model, and because trees are likely to be the organisms with the largest $\sigma$ in our study, the assumption that $r$ is larger than $\sigma$ can be regarded here as reasonable.

The parameter $\nu$ is the speciation probability in the underlying neutral dynamics, i.e. the probability for a newborn individual to belong to a new species. This parameter characterizes the regional species diversity for a given population size. The function $K_0$ is the modified Bessel function of the second kind and of zeroth order, which can be approximated as $K_0(\tilde{r}) \simeq -\ln(\tilde{r}/2) - \gamma$ if $\tilde{r} \ll 1$, where $\gamma$ is Euler's constant. Because $\nu \ll 1$, this approximation can be regarded as valid in our case, where $\tilde{r} = r\sqrt{2\nu}/\sigma$. The probability $F_2(r)$ then becomes:

$$F_2(r) \simeq -\frac{2\ln\left(\dfrac{r\sqrt{2\nu}}{2\sigma}\right) + 2\gamma}{\ln\dfrac{1}{\nu} + 2\rho\pi\sigma^2}$$
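As a numerical illustration of the approximation above, the following minimal Python sketch evaluates $K_0$ by direct quadrature of its integral representation $K_0(x) = \int_0^\infty e^{-x\cosh t}\,dt$ and compares the Bessel form of the prediction with its logarithmic approximation. It assumes that the argument of $K_0$ is $r\sqrt{2\nu}/\sigma$ and that the denominator is $\ln(1/\nu) + 2\rho\pi\sigma^2$. The function names, the quadrature settings and the parameter values in the usage example are illustrative assumptions, not values estimated in this study.

```python
import math

EULER_GAMMA = 0.5772156649015329


def bessel_k0(x, t_max=40.0, n=80000):
    """Approximate K_0(x) for x > 0 by the trapezoid rule applied to
    the integral of exp(-x * cosh(t)) over t in [0, t_max].
    Adequate for the moderately small x used here, not for x close to 0."""
    h = t_max / n
    total = 0.5 * math.exp(-x)  # endpoint t = 0 carries weight 1/2
    for i in range(1, n):
        exponent = x * math.cosh(i * h)
        if exponent > 745.0:  # exp(-exponent) underflows to zero beyond this
            break
        total += math.exp(-exponent)
    return total * h


def similarity_bessel(r, sigma, nu, rho):
    """Bessel form of the predicted similarity at distance r (assumed form)."""
    x = r * math.sqrt(2.0 * nu) / sigma
    return 2.0 * bessel_k0(x) / (math.log(1.0 / nu) + 2.0 * rho * math.pi * sigma ** 2)


def similarity_log(r, sigma, nu, rho):
    """Logarithmic approximation, using K_0(x) ~ -ln(x / 2) - gamma for x << 1."""
    x = r * math.sqrt(2.0 * nu) / sigma
    return -2.0 * (math.log(x / 2.0) + EULER_GAMMA) / (
        math.log(1.0 / nu) + 2.0 * rho * math.pi * sigma ** 2
    )


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to make the comparison concrete.
    for distance in (40.0, 100.0, 1000.0, 10000.0):
        print(distance,
              similarity_bessel(distance, 40.0, 1e-6, 0.05),
              similarity_log(distance, 40.0, 1e-6, 0.05))
```

With a speciation probability this small, the argument of $K_0$ stays well below 1 over the range of distances sampled, which is why the two forms are expected to agree closely.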
In empirical data, the probability $F_2(A,B)$ that a randomly selected individual in site A belongs to the same species as a randomly selected individual in site B can be measured as $F_2(A,B) = \sum_{s=1}^{S} p_{sA}\,p_{sB}$, where $p_{sA}$ is the proportion of species $s$ in site A and $p_{sB}$ that in site B, and $S$ is the total number of species in both sites (Chave & Leigh, 2002). We can thus compute the quantity $F_2(A,B)$ for every pair of sampling points and perform the linear regression $F_2 = -a\ln(r) + b$, where $r$ is the distance between two sampling points (in meters). By identification with the model's prediction, we obtain:

$$-\frac{b}{a} = \ln\left(\frac{\sqrt{2\nu}}{2\sigma}\right) + \gamma$$

$$\frac{1}{a} = \rho\pi\sigma^2 + \frac{1}{2}\ln\frac{1}{\nu}$$

The first equation provides the value of $\sigma$ as a function of $\nu$, $a$ and $b$ as $\sigma^2/\nu = \frac{1}{2}\exp\left(2\left(\frac{b}{a} + \gamma\right)\right)$, while the sum of the two equations provides the value of $\sigma$ as a function of $\rho$, $a$ and $b$ as the solution of: $\rho\pi\sigma^2 - \ln(2\sigma) + \frac{b-1}{a} + \gamma + \frac{\ln 2}{2} = 0$.
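The fitting procedure described in this appendix can be sketched in a few lines of Python. The sketch below computes the pairwise similarity from two abundance vectors, fits a straight line by ordinary least squares, and inverts the fitted coefficients. It assumes the regression is written as $F_2 = -a\ln(r) + b$ and the prediction as $F_2(r) \simeq -2[\ln(r\sqrt{2\nu}/(2\sigma)) + \gamma]/[\ln(1/\nu) + 2\rho\pi\sigma^2]$, so that $a = 2/[\ln(1/\nu) + 2\rho\pi\sigma^2]$ and $b = -a[\ln(\sqrt{2\nu}/(2\sigma)) + \gamma]$. All function names and numerical values are hypothetical, and using read counts as a proxy for individual counts is an additional assumption.

```python
import math

EULER_GAMMA = 0.5772156649015329


def f2_similarity(counts_a, counts_b):
    """Probability that one individual drawn from each site belongs to the
    same species; both vectors list counts in the same species order."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    return sum(x * y for x, y in zip(counts_a, counts_b)) / (total_a * total_b)


def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predicted_coefficients(sigma, nu, rho):
    """Coefficients (a, b) of F2 = -a ln(r) + b implied by the assumed prediction."""
    denom = math.log(1.0 / nu) + 2.0 * rho * math.pi * sigma ** 2
    a = 2.0 / denom
    b = -a * (math.log(math.sqrt(2.0 * nu) / (2.0 * sigma)) + EULER_GAMMA)
    return a, b


def sigma_from_nu(a, b, nu):
    """Invert the intercept relation: sigma^2 / nu = exp(2 (b/a + gamma)) / 2."""
    return math.sqrt(0.5 * nu * math.exp(2.0 * (b / a + EULER_GAMMA)))


def rho_from_sigma_nu(a, sigma, nu):
    """Invert the slope relation: 1/a = rho pi sigma^2 + ln(1/nu) / 2."""
    return (1.0 / a - 0.5 * math.log(1.0 / nu)) / (math.pi * sigma ** 2)


if __name__ == "__main__":
    # Round trip on synthetic, noise-free values (hypothetical parameters).
    a_true, b_true = predicted_coefficients(sigma=40.0, nu=1e-6, rho=0.05)
    distances = [40.0, 100.0, 1000.0, 10000.0, 100000.0]
    log_r = [math.log(r) for r in distances]
    f2 = [-a_true * x + b_true for x in log_r]
    slope, intercept = fit_line(log_r, f2)
    a_hat, b_hat = -slope, intercept
    sigma_hat = sigma_from_nu(a_hat, b_hat, 1e-6)
    print(sigma_hat, rho_from_sigma_nu(a_hat, sigma_hat, 1e-6))
```

The round trip only checks that the identification is internally consistent. With real data, the regression coefficients carry sampling error, and one of the parameters has to be supplied from external knowledge.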
Chapter2–NeutralParameterInference
113
Chapter 2
Inferring neutral biodiversity parameters using environmental DNA datasets
Guilhem Sommeria-Klein¹, Lucie Zinger¹, Pierre Taberlet², Eric Coissac², Jérôme Chave¹
As published in Scientific Reports, Volume 6, 2016
¹ Université Toulouse 3 Paul Sabatier, CNRS, UMR 5174 Laboratoire Evolution et Diversité Biologique, F-31062 Toulouse, France.
² Université Grenoble Alpes, CNRS, UMR 5553 Laboratoire d'Ecologie Alpine, F-38000 Grenoble, France.
Chapter outline
The distribution of species abundances has been one of the most intensively studied patterns in ecology, and the use of environmental DNA could dramatically increase our ability to measure empirical species abundance distributions over a wide range of taxa. However, DNA-based abundance measurements are noisy and difficult to interpret compared to classical censuses of individual organisms. This chapter discusses to what extent and under which conditions the whole species abundance distribution may nevertheless remain informative. The bias on the estimates of Hubbell's neutral parameters is taken as a measure of this loss of information. Indeed, Hubbell's neutral theory was the first to propose a realistic quantitative prediction for this pattern on mechanistic grounds. Even though the underlying assumptions have been much debated, this model remains fundamental as a null model against which non-neutral effects can be contrasted. It also provides a characterization of species abundance distributions based on two parameters, one measuring intrinsic diversity, and the other measuring the connectivity between local and regional communities through migration. The problem is addressed by simulating several plausible sources of bias, based on the literature and on assumptions backed by a benchmark dataset.
Abstract
The DNA present in the environment is a unique and increasingly exploited source of information for conducting fast and standardized biodiversity assessments for any type of organisms. The datasets resulting from these surveys are however rarely compared to the quantitative predictions of biodiversity models. In this study, we simulate neutral taxa-abundance datasets, and add simulated noise typical of DNA-based biodiversity surveys. The resulting noisy taxa abundances are used to assess whether the two parameters of Hubbell's neutral theory of biodiversity can still be estimated. We find that parameters can be inferred provided that PCR noise on taxa abundances does not exceed a certain threshold. However, inference is seriously biased by the presence of artifactual taxa. The uneven contribution of organisms to environmental DNA owing to size differences and barcode copy number variability does not impede neutral parameter inference, provided that the number of sequence reads used for inference is smaller than the number of effectively sampled individuals. Hence, estimating neutral parameters from DNA-based taxa abundance patterns is possible but requires some caution. In studies that include empirical noise assessments, our comprehensive simulation benchmark provides objective criteria to evaluate the robustness of neutral parameter inference.
Introduction
The observation of biodiversity patterns such as the diversity, relative abundance and spatial distribution of organisms underpins much of ecological theory (Brown, 1995; Rosenzweig, 1995; Hubbell, 2001). Yet empirical measurements of these patterns are noisy. In all cases, some taxa are counted more effectively than others, and error is generated by misidentification. A major question is whether this noise is significant enough to undermine comparisons between empirical measurements and models (Hilborn & Mangel, 1997; Legendre & Legendre, 2012). This issue has recently taken on new significance following the advent of DNA-based biodiversity exploration methods, which are developing fast and hold the promise of rapid, repeatable and comprehensive biodiversity measurements (Bik et al., 2012; Taberlet et al., 2012). Yet they are also less direct than classic biodiversity surveys and entail poorly assessed noise sources. In this study, we ask how the parameter estimates of Hubbell's neutral theory, one of the most prominent quantitative biodiversity models of the last decade (Hubbell, 2001; Etienne & Alonso, 2007; Rosindell et al., 2012), are affected by noise in taxa-abundance datasets. We focus on the type of noise generated in DNA-based surveys, and specifically in DNA metabarcoding surveys (see below; Taberlet et al., 2012), currently the most popular method for environmental DNA analysis. Nevertheless, our results can apply more generally.
DNA metabarcoding is a multi-taxa extension of the DNA-based identification of single specimens from tissue samples using a universal DNA-barcode sequence (Hebert et al., 2003). It consists in amplifying a short DNA barcode by PCR from the DNA extracted from an environmental sample (e.g. soil, water, bulk sample of organisms), and sequencing the product by high-throughput sequencing. This method is not restricted to the detection of known taxa and hence allows for comprehensive biodiversity measurement. DNA metabarcoding was initially developed to study bacterial communities (Giovannoni et al., 1990; Huber et al., 2007; Roesch et al., 2007; Zinger et al., 2012), but has since been extended to many other groups including archaea (Schleper et al., 2005) and eukaryotic clades (e.g. plants, earthworms, insects,
fungi; Bienert et al., 2012; Yoccoz et al., 2012; Yu et al., 2012; Tedersoo et al., 2014). It is hence now possible to study patterns of diversity across all domains of life (Ramirez et al., 2014; Tedersoo et al., 2015). However, DNA metabarcoding observations have seldom been compared to the predictions of biodiversity models (Hubbell, 2001; Ricklefs, 2004).
Over the past decade, the neutral theory of biodiversity has represented a significant advance in interpreting empirical biodiversity patterns within an ecological guild (Hubbell, 2001; Etienne & Alonso, 2007; Rosindell et al., 2012). Hubbell's neutral model is simple, easily generates biodiversity patterns, allows for exact maximum-likelihood parameter inference from taxa-abundance distributions, and neutral predictions on taxa-abundance distributions compare well with empirical surveys (Etienne, 2005; Etienne & Alonso, 2005; Jabot & Chave, 2009). In Hubbell's model, sites vacated by the death of an individual are replaced by the offspring of local individuals or by immigrants. Birth, death and immigration all occur irrespective of the taxon the organism belongs to (neutrality hypothesis). Immigrants are drawn from a much larger (regional) pool of individuals, and the addition of new taxa in the regional pool is made possible by (rare) speciation events. Hubbell's model has two parameters: θ describes the taxon diversity of the regional pool, and m is the immigration rate from the regional pool into the sampled community (see Supplementary Note 1).
The predictions of Hubbell’s neutral model have so far been primarily compared
to integrative patterns obtained for macroorganisms using classic census data, such as
the abundance distribution of tropical forest trees (Hubbell, 2001). Some studies have
also applied neutral models to environmental DNA data to interpret the composition of
microbial communities. Sloan et al. (2006, 2007) and Woodcock et al. (2007)
developed a continuous approximation to Hubbell’s model
adapted to large-sized bacterial populations. They focused on estimating the rate of
immigration into the local community independently of assumptions on the regional
pool of taxa, by comparing taxa occurrence in multiple samples (Sloan et al., 2006;
Drakare & Liess, 2010; Ostman et al., 2010; Ayarza & Erijman, 2011; Roguet et al.,
2015) or by measuring the turnover of taxa over time (Ofiteru et al., 2010). The
composition of many microbial communities was found to be compatible with
stochastic immigration of taxa of equivalent fitness from a regional pool, at odds with
the classic assumption that deterministic niche sorting explains the assemblage of
microbial communities (Baas Becking, 1934; Fenchel & Finlay, 2004). Another
approach is to simultaneously estimate the diversity and immigration parameters by
fitting the taxa-abundance distribution, as has commonly been done for classic
censuses of macroorganisms. Dumbrell et al. (2010) and Lee et al. (2013) did so on
fungal and bacterial DNA data using maximum-likelihood parameter inference based
on the exact Etienne sampling formulas (Etienne, 2005, 2007, 2009), while Harris et al.
(2015) followed a Bayesian approach inspired by the field of machine learning.
Most DNA-based studies comparing empirical abundance patterns to the
predictions of neutral models have been limited by the poor detectability of rare taxa
owing to the methods used (Sanger sequencing, DGGE, t-RFLP, ARISA). High-
throughput sequencing now allows for improved sampling and provides better quality
data. Nevertheless, metabarcoding data are not directly comparable with classic census
data owing to both experimental and biological factors. First, both PCR amplification
and sequencing produce artifacts. During the PCR amplification, DNA polymerase
makes mistakes when replicating DNA strands, at a rate that depends on enzyme types.
DNA strands suffer further damage during the high-temperature denaturation step
(Pienaar et al., 2006; Quince et al., 2011; Degnan & Ochman, 2012). Furthermore,
Illumina sequencing generates between $10^{-3}$ and $10^{-2}$ errors per base pair (Ross et al.,
2013). Clustering algorithms are used to group the reads displaying errors with
respect to the original sequence into a single Molecular Operational Taxonomic Unit
(MOTU; Sipos et al., 2010; Coissac et al., 2012; Mahe et al., 2014). While these
approaches strongly reduce the number of artifacts in the data, they do not exclude
artifactual MOTUs that are more difficult to detect (e.g. chimeric fragments, highly
degraded sequences). Second, unbalanced PCR amplification and sequencing among
taxa distorts the relative abundances of MOTUs (Sipos et al., 2007; Amend et al., 2010;
Aird et al., 2011; Nguyen et al., 2015). Third, relative abundances are further biased by
noise sources inherent to the use of DNA barcodes, such as the strong variability of the
barcode copy number among taxa (Kembel et al., 2012; Weber & Pawlowski, 2013).
This problem is even more serious for multicellular organisms because the read count
should also depend on cell abundance. Abundances are further biased by the variable
rate of DNA release into the environment through excreted, sloughed or decaying
material (Andersen et al., 2012; Maruyama et al., 2014; Klymus et al., 2015).
In this paper, we conduct simulations to address how the sources of uncertainty
mentioned above may distort parameter estimates in Hubbell’s neutral theory, and we
discuss the conceptual differences between individual-based and environmental DNA
approaches to the measurement of biodiversity. We ask the following questions: 1)
What is the effect of artifactual MOTUs and abundance noise on estimating the neutral
diversity parameter? 2) Can we use the same approach for multicellular as for
unicellular organisms? 3) What are the effects of the different noise sources on neutral
parameter inference when accounting for dispersal limitation?
Methods
1. Sampling from Hubbell’s neutral model
We generated samples of J individuals following the stationary taxa-abundance
distribution of Hubbell’s neutral model. Immigration from the regional pool, whose
diversity is set by the parameter θ, into the sampled community can be characterized
either by the immigration rate m or by the normalized immigration parameter
$I = \frac{m}{1-m}(J-1)$, which does not depend on the sample size J and is thus
invariant under sampling. If $m \ll 1$, I is approximated by the product Jm, denoted
$N_T m$ in Sloan et al. (2006, 2007).
We first assumed no dispersal limitation (i.e. $m = 1$). We generated a sample by
running J times the following algorithm parameterized by θ: at step j, draw individual
j+1 from a new taxon with probability $\theta/(j+\theta)$, or draw one of the j individuals
already present and add an individual j+1 of the same taxon. This algorithm, due to
Hoppe (1984), partitions J individuals into a random number T of taxa according to the
Ewens distribution of parameter θ (Ewens, 1972).
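The urn scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this study: the function name `hoppe_partition`, the fixed seed and the small sample size are assumptions chosen for the example, and steps are indexed from j = 0 so that the first individual always founds a new taxon.

```python
import random
from collections import Counter

def hoppe_partition(n_individuals, theta, rng):
    """Return one integer taxon label per individual, drawn with Hoppe's urn."""
    labels = []
    n_taxa = 0
    for j in range(n_individuals):
        # With j individuals already drawn, found a new taxon
        # with probability theta / (j + theta).
        if rng.random() < theta / (j + theta):
            labels.append(n_taxa)
            n_taxa += 1
        else:
            # Otherwise copy the taxon of one of the j existing individuals,
            # chosen uniformly at random.
            labels.append(labels[rng.randrange(j)])
    return labels

rng = random.Random(0)  # fixed seed, for reproducibility of the example only
labels = hoppe_partition(2000, 20.0, rng)
abundances = sorted(Counter(labels).values(), reverse=True)
```

Here `abundances` holds the ranked taxa-abundance distribution of the simulated sample; its length is the number T of taxa.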
We then generated samples from a dispersal-limited neutral community using
the two-step procedure provided in Etienne (2005), which partitions J individuals into
a random number T of taxa. First, we ran Hoppe’s algorithm J times as described above
but with parameter I, so as to partition the J individuals into A immigrating ancestors.
Second, we ran the algorithm A times with parameter θ, so as to partition the A
immigrating ancestors into T taxa, thus taking into account the taxa-abundance
distribution in the regional pool. Finally, we assigned the J individuals to the taxonomic
identity of their immigrating ancestor.
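The two-step procedure can be illustrated as follows. This is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the urn helper is repeated so that the block is self-contained, and the sketch requires 0 < m < 1 and at least two individuals so that I is finite and positive.

```python
import random

def hoppe_partition(n_items, parameter, rng):
    """Hoppe's urn: return one integer class label per item."""
    labels, n_classes = [], 0
    for j in range(n_items):
        if rng.random() < parameter / (j + parameter):
            labels.append(n_classes)                 # open a new class
            n_classes += 1
        else:
            labels.append(labels[rng.randrange(j)])  # copy an existing item
    return labels

def dispersal_limited_sample(n_individuals, theta, m, rng):
    """Return (taxon label per individual, number of immigrating ancestors)."""
    if not 0.0 < m < 1.0:
        raise ValueError("this sketch requires 0 < m < 1")
    # Normalized immigration parameter I = m / (1 - m) * (J - 1).
    immigration = m / (1.0 - m) * (n_individuals - 1)
    # Step 1: partition the J individuals into A immigrating ancestors.
    ancestor_of = hoppe_partition(n_individuals, immigration, rng)
    n_ancestors = max(ancestor_of) + 1
    # Step 2: partition the A ancestors into T taxa of the regional pool.
    taxon_of_ancestor = hoppe_partition(n_ancestors, theta, rng)
    # Step 3: each individual inherits the taxon of its ancestor.
    taxa = [taxon_of_ancestor[a] for a in ancestor_of]
    return taxa, n_ancestors

rng = random.Random(1)  # fixed seed, for the example only
taxa, n_ancestors = dispersal_limited_sample(500, theta=20.0, m=0.1, rng=rng)
```

By construction, the number of taxa in the sample cannot exceed the number of immigrating ancestors.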
We generated samples of $J = 10^5$ individuals. We explored a realistic range of
parameter values: θ in [1, 500] and m in [0.001, 1].
2. Simulating noise in DNA sequence reads: experimental noise
We simulated the DNA metabarcoding procedure by sampling N sequence reads from
the relative taxa abundances of the neutral model, possibly after modifying the relative
abundances according to simulated noise sources (see below). We present the results
obtained for the value $N = 10^4$, a typical number of Illumina sequence reads for one
environmental sample.
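The read-sampling step can be sketched as a multinomial draw from the relative abundances. The function name `sample_reads` and the toy community are assumptions made for this illustration; the weights may be raw abundances or noise-modified relative abundances.

```python
import random
from collections import Counter

def sample_reads(abundances, n_reads, rng):
    """Draw n_reads reads with probability proportional to each taxon's abundance."""
    taxa = list(range(len(abundances)))
    draws = rng.choices(taxa, weights=abundances, k=n_reads)
    counts = Counter(draws)
    # Taxa that received no read are kept with a count of zero.
    return [counts.get(taxon, 0) for taxon in taxa]

rng = random.Random(2)  # fixed seed, for the example only
community = [5000, 3000, 1500, 400, 100, 0]  # hypothetical taxon abundances
reads = sample_reads(community, 1000, rng)
```

Rare taxa may receive no read at all, which is how undersampling enters the simulated read-abundance distribution.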
In order to test the effect of misidentification bias on neutral parameter
inference, we added artifactual MOTUs to the data, while keeping the number of reads
constant. We assumed that each true MOTU with a read abundance r generates a
random number of artifactual MOTUs, drawn from a multinomial distribution with
weight r. We added either singletons, or MOTUs with larger read abundances. We
obtained an example of artifactual MOTUs with realistic abundance structure from a
benchmark experiment (see below and Supplementary Methods). Drawing on these
empirical data, we simulated read abundances in the following way: each artifactual
MOTU was assumed to have an abundance of 1 read if $r < 50$, or an abundance x if
$r \geq 50$, where x lies between 1 and $r/50$ with a probability density
$p(x) = \frac{1}{\log(r/50)\, x}$.
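The density p(x) above has cumulative distribution function log(x)/log(r/50) on [1, r/50], so it can be sampled by inversion as x = (r/50)^u with u uniform on [0, 1). The sketch below illustrates this; the function name is hypothetical, and the abundance is returned as a continuous value (any rounding to an integer read count is left out, as the text does not specify it).

```python
import random

def artifact_abundance(parent_reads, rng):
    """Draw the read abundance of one artifactual MOTU whose parent has parent_reads reads."""
    if parent_reads < 50:
        return 1.0
    # Inverse-CDF sampling of p(x) = 1 / (log(r/50) * x) on [1, r/50].
    return (parent_reads / 50.0) ** rng.random()

rng = random.Random(5)  # fixed seed, for the example only
small_parent = artifact_abundance(10, rng)
large_parent = artifact_abundance(5000, rng)
```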
Molecular experimental procedures also introduce biases in read abundances,
because the efficiency of PCR amplification and sequencing is variable across MOTUs.
For instance, PCR amplification is less efficient if PCR priming sites differ from the
primer sequence (Sipos et al., 2007), or if the barcode sequence is too long or GC-rich
(Aird et al., 2011). As a result, the read abundance distribution of MOTUs is noisy with
respect to the DNA barcode abundance distribution in the sample. We assumed that
the noise takes the form of a lognormally distributed multiplicative noise on relative
abundances, with mean 1 and log standard deviation $\sigma_{\log}$. This choice is parsimonious
because this noise is predominantly due to PCR (Aird et al., 2011), and the
multiplicative amplification of DNA strands by PCR generates a multiplicative noise on
abundances. This multiplicative noise can be further assumed to result from the
product of random independent variables and thus to be lognormally distributed by
virtue of the central limit theorem. We tested the effect of noise intensity $\sigma_{\log}$ on
neutral parameter inference. For completeness, we also tested the effect of an additive
Gaussian noise of standard deviation $\sigma_{\mathrm{add}}$ on MOTU relative abundances, for different
$\sigma_{\mathrm{add}}$ values. This type of noise can be regarded as simulating the noise generated in the
sequencing step.
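The multiplicative noise model can be sketched as follows. Two choices here are assumptions of the illustration rather than statements from the text: the multiplier is written as exp(σ Z − σ²/2) with Z standard normal, which is one way to obtain a lognormal variable with mean 1, and the noisy abundances are renormalized to sum to 1.

```python
import math
import random

def lognormal_noise(rel_abundances, sigma_log, rng):
    """Apply mean-1 lognormal multiplicative noise, then renormalize to sum to 1."""
    noisy = [
        p * math.exp(rng.gauss(0.0, sigma_log) - 0.5 * sigma_log ** 2)
        for p in rel_abundances
    ]
    total = sum(noisy)
    return [x / total for x in noisy]

rng = random.Random(6)  # fixed seed, for the example only
clean = [0.5, 0.3, 0.2]
noisy = lognormal_noise(clean, 1.2, rng)       # benchmark-like noise intensity
unchanged = lognormal_noise(clean, 0.0, rng)   # zero noise leaves abundances as they are
```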
To illustrate our modelling choices with empirical data, we produced a
benchmark dataset obtained by mixing the DNA of 16 plant species in known
quantities. The experiment and its results are detailed in the Supplementary Methods.
After following standard data curation protocols, we found that the dataset contained
33% of artifactual MOTUs and displayed a lognormally distributed multiplicative noise
on relative abundances of log standard deviation $\sigma_{\log} = 1.2$. We reported these values
on the figures as examples of realistic noise intensities.
3. Simulating noise in DNA sequence reads: ‘biological’ noise
Irrespective of experimental noise, variability in the number of barcode copies per
individual may cause bias in the interpretation of read abundances. For bacteria (16S
rDNA) or protists (18S rDNA), barcode copy number variability in nuclear DNA is an
important contribution to abundance noise (Kembel et al., 2012; Weber & Pawlowski,
2013): Kembel et al. (2012) found that the barcode copy number of the 16S rDNA gene
follows a zero-truncated Poisson distribution of parameter $\lambda = 4$ across a range of
bacterial clades. For multicellular eukaryotes, organellic barcodes are typically used,
and they similarly display variable copy numbers per cell across taxa and tissue types.
To assess this issue, we tested how a zero-truncated Poisson-distributed multiplicative
noise affects neutral parameter inference, for various values of the parameter λ. The
intensity of this noise is measured by the coefficient of variation (i.e., standard
deviation over mean) of the zero-truncated Poisson distribution. Since it reaches a
maximum at $\lambda = 1.8$, noise intensity is maximal for this value.
For multicellular organisms, the variability in the number of barcode copies per
individual is further amplified because the number of cells may vary vastly across
individuals, owing to body-size differences. We simulated size differences between
individuals following a simple and generic approach. As in O’Dwyer et al. (2009), we
assumed that all individuals, irrespective of the taxon they belong to, grow in size over
time at a constant rate g from an initial number of cells $n_0$ at birth, and die at a constant
rate d. The stationary probability density $p_{\mathrm{ind}}(n)$ of having a number n of cells for a
randomly chosen individual is given by the solution of the von Foerster equation
(O’Dwyer et al., 2009): $p_{\mathrm{ind}}(n) = \frac{d}{g}\, e^{-\frac{d}{g}(n - n_0)}$ (see Supplementary Note 2). We used this
distribution to draw a number n of cells between $n_0$ and infinity for each individual,
and modified the MOTU relative abundances accordingly. Note that we simulated size
differences between individuals and not between taxa, which would have been akin to
simulating a multiplicative noise on taxa abundances as above. We tested the effect on
neutral parameter inference for a range of values of $\frac{g}{d n_0} + 1$, the ratio of the mean cell
number $\frac{g}{d} + n_0$ to the initial cell number $n_0$. Noise intensity is measured by the
coefficient of variation $1/\left(1 + \frac{d}{g} n_0\right)$ of the probability density $p_{\mathrm{ind}}(n)$. It approaches its
upper bound of 1 for $\frac{g}{d n_0} \gg 1$, which corresponds to the case of taxa spanning large ranges of body
sizes, such as trees or vertebrates.
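The two noise-intensity measures used in this section can be checked numerically. The sketch below uses the standard mean and variance of the zero-truncated Poisson distribution and the coefficient of variation of the shifted exponential density given above; the function names and the grid search are assumptions made for the illustration.

```python
import math

def ztp_cv(lam):
    """Coefficient of variation of a zero-truncated Poisson distribution."""
    mean = lam / (1.0 - math.exp(-lam))
    variance = mean * (1.0 + lam - mean)
    return math.sqrt(variance) / mean

def body_size_cv(g_over_d_n0):
    """Coefficient of variation of p_ind(n), written as 1 / (1 + d*n0/g)."""
    return 1.0 / (1.0 + 1.0 / g_over_d_n0)

# Locate the value of lambda that maximizes the noise intensity on a grid.
grid = [0.01 * k for k in range(1, 1001)]
lam_max = max(grid, key=ztp_cv)

# The body-size noise intensity approaches 1 when g / (d * n0) is large.
cv_large_ratio = body_size_cv(1000.0)
```

On this grid the maximum falls close to λ = 1.8, in agreement with the value quoted in the text.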
Organisms may be entirely contained in the environmental sample if they are
sufficiently small, or when DNA is extracted from a mixture of directly sampled live
organisms, such as insects from a light trap (bulk samples; Yu et al., 2012). However, in
most cases, only small fractions of these organisms are sampled (e.g. roots, pollen,
seeds, spores, faeces, and different secretion types), or even only extracellular DNA
resulting from cell death and subsequent destruction of cell structure (Levy-Booth et
al., 2007; Taberlet et al., 2012). Thus, the abundance distribution of environmental
DNA also depends on the kinetics of DNA release and degradation in the environment.
We assumed that these dynamics are fast with respect to changes in community
composition, so that the ‘stock’ of environmental DNA is in a steady state. Under this
assumption, the rate of DNA release through the death of organisms is roughly
proportional to the total number of cells of the currently living individuals. In addition,
the rate of environmental DNA release by a living organism reflects its metabolic rate
and we assumed it to scale as the power 3/4 of body mass (or cell number), as
predicted by the metabolic theory of ecology (ref. 61). The DNA degradation rate was assumed
uniform across individuals. Even though we focus here on multicellular organisms,
unicellular organisms do excrete DNA material and differ in metabolic rates as well.
Based on the assumptions of the previous paragraph, we simulated the
abundance distribution of environmental DNA as follows. We (1) generated a neutral
sample of individuals, (2) assigned a number of cells n between $n_0$ and infinity to each
individual as above, (3) counted a first contribution $dn$ of each individual to the stock
of environmental DNA, with d the death rate, and (4) counted a second contribution
$r_0 n^{3/4}$ of each individual to the stock of environmental DNA, with $r_0$ the rate of DNA
release for a hypothetical one-cell individual. Thus, environmental DNA abundance per
individual is proportional to $n + \frac{r_0}{d} n^{3/4}$ rather than n. We tested the effect on neutral
parameter inference by varying $r_0/d$, the parameter controlling the relative
contribution of living and dead organisms to environmental DNA.
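Steps (2) to (4) can be sketched as below, taking a list of taxon labels (one per individual) as the neutral sample of step (1). The function name, the parameterization by the ratios g/d and r0/d, and the final normalization to relative abundances are assumptions of this illustration.

```python
import random
from collections import defaultdict

def edna_relative_abundances(taxon_labels, n0, g_over_d, r0_over_d, rng):
    """Relative environmental DNA abundance per taxon under the size-structured model."""
    totals = defaultdict(float)
    for taxon in taxon_labels:
        # Cell number n drawn from p_ind(n): n0 plus an exponential of mean g/d.
        n = n0 + rng.expovariate(1.0 / g_over_d)
        # Contribution proportional to n + (r0/d) * n^(3/4).
        totals[taxon] += n + r0_over_d * n ** 0.75
    grand_total = sum(totals.values())
    return {taxon: value / grand_total for taxon, value in totals.items()}

rng = random.Random(7)  # fixed seed, for the example only
individuals = [0, 0, 1, 2, 2, 2]  # hypothetical taxon label of each individual
shares = edna_relative_abundances(individuals, n0=1.0, g_over_d=1000.0,
                                  r0_over_d=10.0, rng=rng)
```

Because cell numbers are drawn per individual, two taxa with the same number of individuals generally end up with different DNA shares.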
4. Estimating the neutral model parameters from the taxa-abundance distribution
We estimated the parameters of Hubbell’s neutral model by maximum-likelihood
inference from the simulated taxa-abundance distribution for a number of simulated
noise sources. To test the influence of noise, we compared the estimated parameter
values $\hat\theta$ and $\hat I$ with the values of θ and I used to generate the initial samples of
individuals. For each set of parameters and noise intensity, we generated 100
simulated samples. We reported the mean and standard deviation of the relative biases
$(\hat\theta - \theta)/\theta$ and $\log_{10}(\hat I/I)$ over the 100 realizations.
In the absence of dispersal limitation, the Ewens distribution permits the
inference of θ by likelihood maximization. The maximum-likelihood estimator $\hat\theta$ of θ,
hereafter referred to as the Ewens estimator, is implicitly given by
$T = \sum_{j=0}^{J-1} \frac{\hat\theta}{\hat\theta + j}$ as a
function of the number T of taxa and the number J of individuals (Ewens, 1972). In the
dispersal-limited case, the Etienne distribution provides an exact likelihood expression
for the simultaneous inference of θ and I (Etienne, 2005), as implemented in the
software Tetame (Jabot et al., 2008). As noted previously in the literature, the
likelihood landscape of the Etienne formula often displays two local maxima (Etienne
et al., 2006; Jabot & Chave, 2009). To find the true parameter values, we first estimated
θ using the Ewens estimator, and selected the local maximum with the θ estimate
closest to the value yielded by the Ewens estimator. Prior to these analyses, we tested
the performances of both estimators on unbiased neutral data depending on
parameter values and sample size (see Supplementary Note 3).
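Because the right-hand side of the implicit Ewens equation increases monotonically with θ, the estimator can be obtained by a simple root search. The sketch below uses bisection on a logarithmic scale; the function names, the search bounds and the iteration count are assumptions of this illustration, and the dispersal-limited Etienne likelihood is not covered here.

```python
import math

def expected_taxa(theta, n_individuals):
    """Right-hand side of the Ewens equation: sum over j of theta / (theta + j)."""
    return sum(theta / (theta + j) for j in range(n_individuals))

def ewens_estimator(n_taxa, n_individuals, n_iter=100):
    """Solve T = sum_j theta / (theta + j) for theta (needs 1 < T < J)."""
    lo, hi = 1e-8, 1e8  # assumed search bounds for theta
    for _ in range(n_iter):
        mid = math.sqrt(lo * hi)  # bisection on a log scale
        if expected_taxa(mid, n_individuals) < n_taxa:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Consistency check: feed the expected number of taxa for theta = 20 back in.
theta_hat = ewens_estimator(expected_taxa(20.0, 1000), 1000)
```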
In typical environmental DNA data, the number J of individuals in the sample is
unknown. As already done in previous studies (Lee et al., 2013), we used the number
of sequence reads as an effective number of individuals. This is possible owing to a
mathematical property of the Ewens and Etienne distributions: both distributions are
invariant by sampling without replacement (Etienne & Alonso, 2005), hence maximum-
likelihood inference yields the same results on any random sample from the
community, and on any random subsample from an initial sample (up to a possible bias
in the estimator). As a consequence, read abundances can be used for neutral
parameter inference, as long as the reads can be regarded as forming a subsample
without replacement of the initial individuals. This assumption is however not always
verified in empirical data (see Discussion). The invariance property of the Etienne
distribution only holds if the distribution is expressed as a function of I, therefore we
used here the immigration parameter I instead of m for the purpose of inference. In the
following, m always refers to the value in the initial sample of J individuals.
In the absence of dispersal limitation, θ can also be estimated from the slope of
the ranked log-abundance curve, a method that has the advantage of being
independent of J. Indeed, the logarithm of $\mathbb{E}[P_i]$, the expected relative abundance of the
ith most abundant taxon, is given by: $\log \mathbb{E}[P_i] = -\log\theta - i \log(1 + 1/\theta)$ (Ewens &
Tavaré, 1997). For simulated abundance noise, we estimated θ using this method in
addition to the Ewens estimator. We restricted the linear regression to the linear domain
of the ranked log-abundance curve. We also compared the performance of both
inference methods in the absence of simulated noise for samples of $10^2$, $10^3$, $10^4$ and
$10^5$ sequence reads and for initial samples of individuals of different sizes (see
Supplementary Note 4).
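Since the slope of the ranked log-abundance curve equals −log(1 + 1/θ), θ follows from a fitted slope s as θ = 1/(exp(−s) − 1). The sketch below fits the slope by ordinary least squares on natural logarithms; the function name is hypothetical, and the choice of the linear domain is left to the caller, who passes only the ranks to be used.

```python
import math

def theta_from_rank_slope(ranked_rel_abundances):
    """Estimate theta from the slope of log relative abundance against rank."""
    ranks = list(range(1, len(ranked_rel_abundances) + 1))
    logs = [math.log(p) for p in ranked_rel_abundances]
    mean_rank = sum(ranks) / len(ranks)
    mean_log = sum(logs) / len(logs)
    sxy = sum((i - mean_rank) * (y - mean_log) for i, y in zip(ranks, logs))
    sxx = sum((i - mean_rank) ** 2 for i in ranks)
    slope = sxy / sxx
    # slope = -log(1 + 1/theta)  =>  theta = 1 / (exp(-slope) - 1)
    return 1.0 / math.expm1(-slope)

# Consistency check on the expected curve itself, for theta = 20.
theta_true = 20.0
expected_curve = [(1.0 / theta_true) * (1.0 + 1.0 / theta_true) ** (-i)
                  for i in range(1, 31)]
theta_hat = theta_from_rank_slope(expected_curve)
```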
Results
We first included artifactual MOTUs in a simulated sample and tested the effect on
estimating the diversity parameter θ of the neutral model without dispersal limitation.
The relative bias $(\hat\theta - \theta)/\theta$ increased with the proportion of artifactual MOTUs, first
linearly and then faster than linearly (Fig. 1a-b). It did not depend on the initial θ value
or on the read abundance of the introduced artifactual MOTUs. The standard deviation
of $\hat\theta$ was not modified by the presence of artifactual MOTUs.
Next, we simulated PCR noise, modelled as a lognormally distributed
multiplicative noise with log standard deviation $\sigma_{\log}$. This noise had no effect on the
inference of the θ parameter below a threshold value of $\sigma_{\log}$. Above this threshold, θ was
underestimated. The threshold decreased with increasing θ but remained of the
order of 1 for θ between 1 and 500 (threshold ≈ 5 for $\theta = 1$ and ≈ 0.5 for $\theta = 500$;
see Fig. 1c-d). We also applied an additive Gaussian noise of standard deviation $\sigma_{\mathrm{add}}$ to
the relative abundances. This type of noise introduced a bias in $\hat\theta$ for values of $\sigma_{\mathrm{add}}$ at
least one order of magnitude larger than the relative abundance of the least abundant
MOTUs (Supplementary Fig. S1). Neither type of noise affected the standard deviation
of $\hat\theta$ (Fig. 1, Supplementary Fig. S1). These results held both in maximum-likelihood
inference and when using linear regression on the ranked log-abundance.
We then simulated the variability in barcode copy number by applying a
multiplicative noise distributed according to a zero-truncated Poisson distribution.
This type of noise had no effect on θ inference, even for the maximum noise intensity at
$\lambda = 1.8$ (Fig. 1e-f). We accounted for body size differences by assuming a steadily
growing cell number n over the course of an individual’s life, and by varying the ratio
$\frac{g}{d n_0} + 1$ of the mean number of cells $\frac{g}{d} + n_0$ to the initial number of cells
$n_0$. We found that this ratio had no effect on the mean and standard deviation of $\hat\theta$,
even at large values (Fig. 1g-h). We also tested the effect of assigning an environmental
DNA mass proportional to $n + \frac{r_0}{d} n^{3/4}$ to individuals (where n is the cell number) to
reflect the joint effect of mortality (n term) and cellular turnover ($n^{3/4}$ term, proportional
to metabolic rate). We did not find any effect on θ inference even for large values of
$r_0/d$ (Supplementary Fig. S1).
Figure 1: Neutral parameter inference without dispersal limitation. Left panels: mean MOTU
rank abundance distributions (read abundance on a log scale against MOTU abundance rank)
over 100 realizations for $\theta = 20$ in a $10^4$-read sample, without (dashed blue line) and with
(black line) simulated noise: (a) 30% artifactual MOTUs added (as measured in benchmark
dataset), (c) multiplicative lognormal noise of log standard deviation $\sigma_{\log} = 1.2$ (as measured
in benchmark dataset), (e) multiplicative zero-truncated Poisson noise simulating barcode copy
number variability (Poisson parameter $\lambda = 4$; cf. Kembel et al. 2012), and (g) size structure
among individuals, for a ratio $g/(d n_0) = 1000$ (mean body mass over birth mass). Right panels:
mean and standard deviation over 100 realizations of the relative bias $(\hat\theta - \theta)/\theta$ on the θ
estimate in a $10^4$-read sample, for $\theta = 1$ (green), $\theta = 20$ (black) and $\theta = 500$ (red), as a
function of (b) the proportion of artifactual MOTUs (dashed blue line underlines the linear
dependence), (d) the lognormal noise intensity $\sigma_{\log}$ (log scale), (f) the Poisson parameter λ,
and (h) the ratio $g/(d n_0)$ (log scale). Panels b and d mark the benchmark dataset value; panel f
marks the value for bacteria from Kembel et al. 2012.
Finally, we replicated the analysis in the presence of dispersal limitation (i.e.
assuming that $m < 1$). We found that the dispersal-limited maximum-likelihood
estimator can be strongly biased even in the absence of simulated noise when dispersal
limitation is too strong or too weak, especially for large θ values (see Supplementary
Note 3). Therefore, we limited ourselves to parameter values that could be well
estimated in the absence of simulated noise. Provided the immigration rate is large
enough ($m > 0.1$), the relative bias $(\hat\theta - \theta)/\theta$ depended on the proportion of
artifactual MOTUs similarly to the $m = 1$ case. For lower values of m, the
dependence of $(\hat\theta - \theta)/\theta$ on the proportion of artifactual MOTUs was even stronger
(Fig. 2a-b). The relative bias $\log_{10}(\hat I/I)$ on the normalized immigration parameter
increased linearly with the proportion of artifactual MOTUs. Applying a lognormal
multiplicative noise of log standard deviation $\sigma_{\log}$ on MOTU relative abundances did
not bias the estimation of (θ, I) below a noise threshold identical to the one
found without dispersal limitation. The threshold decreased only slightly with
decreasing m value. Above the threshold, θ was underestimated and I overestimated (Fig. 2c-
d). Applying an additive Gaussian noise of standard deviation $\sigma_{\mathrm{add}}$ to the relative
abundances introduced a bias for values of $\sigma_{\mathrm{add}}$ larger than the relative abundance of
the least abundant MOTUs (Supplementary Fig. S2). A multiplicative noise distributed
according to a zero-truncated Poisson had no influence on the parameter estimates
(Fig. 2e-f), and likewise an exponentially distributed number of cells still had no effect
on parameter inference in the dispersal-limited case (Fig. 2g-h, Supplementary Fig. S2).
Discussion
Although they provide an unparalleled amount of information, biodiversity studies
based on environmental DNA also have limitations. One of them is that the abundance
of sequence reads corresponding to a given molecular taxonomic unit does not
necessarily reflect the true population abundance of the corresponding taxon. Our
analysis offers a quantitative assessment of the importance of this issue in attempting
to relate environmental DNA datasets with theoretical model predictions.
Our goal was to assess when amplicon-based DNA read abundance data can
offer biological insights into the predictions of Hubbell’s neutral theory. We selected
Hubbell’s model over other models predicting taxa-abundance distributions because it
incorporates a number of key features for any biodiversity model such as demographic
stochasticity and dispersal limitation (Vellend, 2010). Estimating the parameters θ and
m of the neutral model is useful in interpreting biodiversity patterns even if the
community is not governed by purely neutral mechanisms (Jabot et al., 2008). Indeed,
θ is closely related to Fisher’s biodiversity index, and is an unbiased index of
biodiversity, while m quantifies how the local sample is connected to its surroundings.
We simulated taxa abundance datasets from a neutral model and added noise to them
using a range of plausible noise types and intensities. We showed that the parameters
θ and I could still be reliably estimated by maximum likelihood inference from the
simulated sequence reads, provided that artifactual MOTUs are rare, and that
lognormal noise on relative read abundances is below a log standard deviation
threshold that depends on θ. We also showed that under our modelling assumptions,
neutral inference is unbiased for assemblages of multicellular organisms and for
variable barcode copy numbers. Finally, we found that the noise terms had a similar
effect on parameter inference when fitting the one-parameter version of the model
(without dispersal limitation) and when fitting Hubbell’s dispersal-limited model.
One of the major differences between environmental DNA surveys and classic
biodiversity surveys is that the number of sampled individuals is usually not measured.
Yet, most biodiversity measures assume the knowledge of the organisms’ sample size.
To solve this problem, we assumed in our simulations that the number of reads is
several times smaller than the number of effectively sampled individuals: $N = 10^4$
sequence reads for $J = 10^5$ initial individuals. Under this assumption, sequence reads
may be seen as a random subsample of the individuals, and because the maximum-
likelihood approach of the neutral theory relies on sampling formulas that are
invariant under subsampling, it follows that the inference on reads is unbiased (see
Supplementary Note 4). Generating a larger number of individuals did not alter our
results but was computationally prohibitive with our algorithm.
The assumption that the number of sampled individuals exceeds that of
sequence reads is reasonable for prokaryotes (Whitman et al., 1998) and
microorganisms in general, but is unrealistic for larger organisms. One empirical
method to test whether the sequencing data meet the requirement for neutral
maximum-likelihood inference is to take a smaller subsample of reads and check that
the parameter estimates are unchanged. If not, one should decrease sample size until
stability is achieved (see Supplementary Note 4). If environmental DNA data do not
consist of a discrete number of reads, as is the case in t-RFLP and ARISA, an arbitrarily
set sample size may be used (Lee et al., 2013). The number of individuals can also be
estimated empirically, as in Woodcock et al. (2007) or Dumbrell et al. (2010). In the
neutral model without dispersal limitation, a more straightforward approach is to infer
θ from the slope of the ranked log-abundance distribution, but this requires an
arbitrary delimitation of the linear domain of the curve, and it is reliable only if the
read sample is large enough and contains a large enough taxonomic diversity. A
general rule is that the sampling scheme should be suited to the size and spatial
density of the target organisms: for large organisms, multiple spatially distributed
environmental samples should be pooled so as to sample a sufficiently large number of
individuals. For instance, capturing the abundance distribution of plant taxa from soil
DNA samples requires pooling a sufficient number of soil samples over a sufficiently
large area.
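The subsampling check suggested above can be sketched for the model without dispersal limitation: reads are subsampled without replacement at several sizes and θ is re-estimated each time with the Ewens estimator. This is an illustration under assumptions, not a prescribed protocol: the function names, the toy read counts and the subsample sizes are hypothetical, and deciding when the estimates count as stable is left to the user.

```python
import math
import random

def expected_taxa(theta, n_reads):
    return sum(theta / (theta + j) for j in range(n_reads))

def ewens_estimator(n_taxa, n_reads, n_iter=100):
    """Solve T = sum_j theta / (theta + j) for theta by log-scale bisection."""
    lo, hi = 1e-8, 1e8  # assumed search bounds
    for _ in range(n_iter):
        mid = math.sqrt(lo * hi)
        if expected_taxa(mid, n_reads) < n_taxa:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def subsample_counts(read_counts, size, rng):
    """Subsample `size` reads without replacement from per-MOTU read counts."""
    pool = [motu for motu, count in enumerate(read_counts) for _ in range(count)]
    counts = [0] * len(read_counts)
    for motu in rng.sample(pool, size):
        counts[motu] += 1
    return counts

def theta_by_subsample_size(read_counts, sizes, rng):
    """Re-estimate theta on subsamples of decreasing or increasing size."""
    estimates = {}
    for size in sizes:
        sub = subsample_counts(read_counts, size, rng)
        n_motus = sum(1 for c in sub if c > 0)
        estimates[size] = ewens_estimator(n_motus, size)
    return estimates

rng = random.Random(3)  # fixed seed, for the example only
read_counts = [400, 200, 100, 50, 25, 12, 6, 3, 2, 1, 1]  # hypothetical MOTU table
estimates = theta_by_subsample_size(read_counts, [200, 400, 800], rng)
```

Comparing the values in `estimates` across sizes indicates whether the θ estimate drifts with subsample size.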
Figure 2: Neutral parameter inference in the presence of dispersal limitation. We simulated a
$10^4$-read sample and computed the mean and standard deviation over 100 realizations of
$\log_{10}(\hat I/I)$ (panels a, c, e, g) and $(\hat\theta - \theta)/\theta$ (panels b, d, f, h). Results are plotted for
$\theta = 20$ and for $m = 1$ (black), $m = 0.1$ (green), $m = 0.01$ (blue) and $m = 0.001$ (red). Panels a-b:
variation with the proportion of artifactual MOTUs (dashed blue line underlines the linear
dependence). Panels c-d: variation with the log standard deviation $\sigma_{\log}$ (log scale) of a
multiplicative lognormal noise on relative abundances. Panels e-f: variation with the
parameter λ of a multiplicative zero-truncated Poisson noise. Panels g-h: variation with the
body size ratio $g/(d n_0)$ (log scale). Panels a-d mark the benchmark dataset value; panels e-f
mark the value for bacteria from Kembel et al. 2012.
When accounting for dispersal limitation, a single sample of sequence reads
does not always provide enough information to reliably infer both θ and I from the
taxa-abundance distribution, even in the absence of additional noise sources. The
maximum-likelihood estimator may be strongly biased when the immigration rate into
the local community is either too low or too high, and increasingly so for larger
θ values (see Supplementary Note 3). Since these biases decrease with larger read
sample size, the number of sequence reads should be as large as possible as long as it
does not preclude using the sequence reads for parameter inference. Moreover, in
order to avoid bias in the case of weak dispersal limitation, the Ewens estimator should
be favoured whenever it yields a higher likelihood value than the dispersal-limited
estimator.
In practice, environmental DNA studies often sample the same regional species
pool in different locations, which allows for more robust multi-sample maximum-
likelihood inference (Etienne, 2007, 2009). It should be noted however that exact
maximum-likelihood inference can be computationally prohibitive in the dispersal-
limited case for larger numbers of reads than we used in this study or in the case of a
multi-sample approach with large read samples (Lee et al., 2013). Continuous
approximations drawing on the work of Sloan et al. (2006, 2007) and Woodcock et al.
(2007) might then be preferred, such as the Bayesian formulation of Harris et al.
(2015).
Our analysis reveals that, among the noise sources considered, the presence of
artifactual MOTUs is the most detrimental to neutral parameter inference.
Bioinformatics methods aiming at limiting
the number of artifactual MOTUs should be carefully applied to the sequencing data
before any attempt at estimating biodiversity indices (Sipos et al., 2010; Coissac et al.,
2012; Mahe et al., 2014). However, these methods do not guarantee a complete
filtering of artifactual MOTUs from empirical datasets. In particular, chimeric
sequences formed at the PCR stage may be misconstrued as MOTUs. Because these
sequences are generated by rare error-generating PCR events, they should be
predominantly represented by few reads. Thus one strategy for removing artifactual
MOTUs consists in ignoring all MOTUs below an empirically set abundance threshold.
However, in doing so, we lose the information on the relationship between the number
of reads and the number of MOTUs. Hence we suggest that a more satisfactory method
to mitigate this problem is to take a sufficiently small subsample of the sequence reads
so as to trim out the artifactual MOTUs.
The presence of artifactual MOTUs in our simulated taxa assemblages manifests
itself by a break in the slope of the ranked log-abundance curve (Fig. 1a, see also Fig. S3
in Supplementary Methods). Thus, the adequate subsample size for an empirical
dataset may be chosen so as to trim out the MOTUs with abundances below an
observed break in the ranked log-abundance curve. Another finding of our study is that
for the same proportion of artifactual MOTUs, the θ estimate has a similar relative bias
across θ values and the I estimate a similar relative bias across I values. Therefore, if
artifactual MOTUs cannot be entirely excluded in an environmental DNA dataset,
conclusions should be based on ratios of neutral parameter estimates among samples
rather than on absolute values.
We modelled PCR noise using a lognormally distributed multiplicative noise
term. We found a threshold noise value beyond which the inference of the neutral
parameters becomes biased. This threshold was found to be lower for larger θ values.
For instance, the empirical noise intensity $\sigma_{\mathrm{mul}} = 1.2$ measured on our benchmark
dataset was near or below the threshold value of $\sigma_{\mathrm{mul}}$ for θ values up to ca. θ = 20, while for
larger θ values, it was responsible for a moderate underestimation of θ (20% for
θ = 500) and for a serious overestimation of I. Nevertheless, our benchmark dataset
was here used for illustrative purposes, and noise intensity may differ in other
datasets. In metabarcoding studies, noise intensity likely depends on the barcode,
taxonomic group and wet laboratory protocol. Therefore we strongly advise including
at least one benchmark dataset as part of any environmental DNA study to quantify
noise intensity. Empirical noise assessments can then be compared to our simulation
results.
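A minimal Python sketch of such a multiplicative noise term is given here for illustration. It assumes a zero-mean Gaussian on the logarithmic scale and a renormalization of the noisy abundances to sum to one; both choices, as well as the function name `apply_multiplicative_noise`, are assumptions of this sketch rather than a description of the simulation code used in the study.

```python
import math
import random

def apply_multiplicative_noise(rel_abundances, sigma_mul, seed=0):
    """Multiply each relative abundance by an independent lognormal factor, then renormalize."""
    rng = random.Random(seed)
    # exp(N(0, sigma_mul)) is lognormal with logarithm standard deviation sigma_mul.
    noisy = [p * math.exp(rng.gauss(0.0, sigma_mul)) for p in rel_abundances]
    total = sum(noisy)
    return [x / total for x in noisy]

clean = [0.5, 0.3, 0.15, 0.05]   # hypothetical relative abundances
noisy = apply_multiplicative_noise(clean, sigma_mul=1.2)
print([round(x, 3) for x in noisy])
```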
We also simulated a Gaussian additive noise on abundance data and found that
it had a disproportionate effect on the least abundant MOTUs, thus distorting the
taxa-abundance distribution: parameter inference was biased if the standard deviation of
the noise was larger than the abundance of the least abundant MOTUs. Here again, it is
possible to correct for this type of noise in empirical datasets by subsampling the
sequence reads. Additive noise can be considered to model the abundance noise
generated by the sequencing step or by a single PCR cycle, while the succession of
several PCR cycles produces a multiplicative abundance noise.
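The disproportionate effect of additive noise on rare MOTUs can be seen in a short Python sketch. Truncating negative values at zero and renormalizing afterwards are assumptions made here for illustration (the text does not specify how negative abundances were handled), and `apply_additive_noise` is a hypothetical name.

```python
import random

def apply_additive_noise(rel_abundances, sigma_add, seed=0):
    """Add independent Gaussian noise of standard deviation sigma_add, then renormalize."""
    rng = random.Random(seed)
    # Assumption: a negative noisy abundance is truncated at zero (the MOTU is lost).
    noisy = [max(0.0, p + rng.gauss(0.0, sigma_add)) for p in rel_abundances]
    total = sum(noisy)
    return [x / total for x in noisy]

n_reads = 10_000
# Three abundant MOTUs and two MOTUs at the minimum relative abundance 1/N.
rel = [0.6, 0.3, 0.0998] + [1 / n_reads] * 2
noisy = apply_additive_noise(rel, sigma_add=5 / n_reads)
for before, after in zip(rel, noisy):
    print(f"{before:.4f} -> {after:.4f}")
```

With a noise standard deviation of five times 1/N, the abundant MOTUs are barely affected in relative terms, whereas the rarest ones can change several-fold or vanish.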
Another potential bias is due to the indirect relationship between the
number of DNA barcode sequences in the sample and the number of sampled
individuals. In particular, in the case of multicellular individuals, some of them may
contribute disproportionately more than others. Given the variability and complexity
of the associated noise structure, we chose to follow a modelling approach retaining as
much generality as possible. We size-biased our samples by assuming that DNA
availability in the environment is proportional to body mass, or to the turnover of body
mass (i.e. the metabolic rate). We found that neutral parameter estimates are not
modified by size structure in the community, irrespective of how strongly structured
the community is, which is an interesting and general result.
Our approach to accounting for body size is directly inspired by the
size-structured neutral model of O’Dwyer et al. (2009). This model integrates the growth of
individuals into a neutral population dynamics without dispersal limitation, and may
offer analytical predictions for the neutral “Species Biomass Distribution” (SBD) while
accounting for the dependence of birth, death and growth rates on the size of
individuals. When individuals grow in body size at a constant rate and neither birth
nor death rates depend on size, this model predicts the same SBD as obtained
analytically under our assumption of independent exponentially distributed sizes (see
Supplementary Note 2). Our choice of a rate of environmental DNA release scaling with
the 3/4 power of body mass is motivated by a prediction of the metabolic theory of
ecology, which relates the metabolic rate to the body mass in one of the few general
laws of ecology (West et al., 1997).
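The two size-biasing assumptions (DNA release proportional to body mass, or to its 3/4 power) can be compared with a small Python sketch. The function `dna_fractions`, the taxon labels and the masses are hypothetical and serve only to illustrate the scaling.

```python
from collections import defaultdict

def dna_fractions(taxa, masses, exponent=0.75):
    """Relative DNA contribution of each taxon when release scales as mass**exponent."""
    release = defaultdict(float)
    for taxon, mass in zip(taxa, masses):
        release[taxon] += mass ** exponent
    total = sum(release.values())
    return {taxon: value / total for taxon, value in release.items()}

# Hypothetical sample: one large individual of taxon "A", two small ones of taxon "B".
taxa = ["A", "B", "B"]
masses = [16.0, 1.0, 1.0]
print(dna_fractions(taxa, masses))                # metabolic-rate scaling: A contributes 8/10
print(dna_fractions(taxa, masses, exponent=1.0))  # body-mass scaling: A contributes 16/18
```

The sublinear exponent reduces, but does not remove, the over-representation of large individuals relative to a count of individuals (here 1/3 for taxon A).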
Even though our modelling approach derives from theoretical considerations, it
is also supported by some empirical evidence: it has been shown that the rate of DNA
detection in the environment is biased by the size of organisms (Andersen et al., 2012;
Maruyama et al., 2014; Klymus et al., 2015), and the fact that DNA abundance should
scale non-linearly with body mass has been experimentally verified in fishes
(Maruyama et al., 2014). Nevertheless, the noise introduced by size structure,
fragments of organisms and extracellular DNA certainly has a far more complicated
structure than we simulated. For instance, rates of DNA release into the environment
and of DNA degradation both depend on taxa and on local conditions, and fluctuate
temporally (Levy-Booth et al., 2007; Barnes et al., 2014; Strickler et al., 2015).
Moreover, the uneven spatial distribution of environmental DNA may prevent properly
sampling the taxa-abundance distribution in the community, especially if whole pieces
of living or decaying multicellular organisms are contained in the environmental
sample. Pooling multiple spatially distributed samples should help average out local
heterogeneity.
In this study, we considered that departure of the number of DNA barcode reads
from the real taxon abundance is a source of bias. However, this source of bias may be
generally seen as the accumulation of mutations during replication. In ecology, the only
type of replication taken into consideration is demography, but DNA metabarcoding
data are also the result of cellular and PCR replication processes. Since the
assumptions of the neutral theory are generic and apply to any collection of replicating,
mutating, and potentially dispersing entities, we could replace individual organisms by
DNA barcodes as our basic replicating entities, and reinterpret the neutral parameters
accordingly. As a consequence, we expect the taxa-abundance structure predicted by
the neutral theory to be robust as long as the DNA barcodes do not differ too much in
their replicating, mutating and dispersing abilities.
This study demonstrates that inferring the parameters of Hubbell’s neutral
model from the taxa-abundance distribution is possible even in noisy biodiversity
datasets. We tested this hypothesis for a range of biologically plausible noise terms on
simulated metabarcoding data, and we provide guidance for neutral parameter
inference from such data. Our results indicate that whether an environmental DNA
dataset really reflects the sampled community depends on noise intensity. They also
suggest that this question can be answered by computing simple metrics on a
benchmark dataset and comparing them to our simulations. The only way to quantify
the noise level is to conduct careful benchmarking experiments, which will depend on
the exact sampling and analysis protocol.
Acknowledgements
We thank Ryan Chisholm, Fabien Laroche and James O’Dwyer for fruitful discussion.
This work has benefited from “Investissement d’Avenir” grants managed by the French
Agence Nationale de la Recherche (CEBA, ref. ANR-10-LABX-25-01 and TULIP, ref.
ANR-10-LABX-0041) and from an additional ANR grant (METABAR project; PI P. Taberlet).
References
Aird, D., Ross, M.G., Chen, W.-S., Danielsson, M., Fennell, T., Russ, C., Jaffe, D.B., Nusbaum, C. & Gnirke, A. (2011) Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biology, 12.
Amend, A.S., Seifert, K.A. & Bruns, T.D. (2010) Quantifying microbial communities with 454 pyrosequencing: does read abundance count? Molecular Ecology, 19, 5555–5565.
Andersen, K., Bird, K.L., Rasmussen, M., Haile, J., Breuning-Madsen, H., Kjaer, K.H., Orlando, L., Gilbert, M.T.P. & Willerslev, E. (2012) Meta-barcoding of “dirt” DNA from soil reflects vertebrate biodiversity. Molecular Ecology, 21, 1966–1979.
Ayarza, J.M. & Erijman, L. (2011) Balance of Neutral and Deterministic Components in the Dynamics of Activated Sludge Floc Assembly. Microbial Ecology, 61, 486–495.
Baas Becking, L.G.M. (1934) Geobiologie of inleiding tot de milieukunde, W.P. Van Stockum & Zoon, The Hague, the Netherlands.
Barnes, M.A., Turner, C.R., Jerde, C.L., Renshaw, M.A., Chadderton, W.L. & Lodge, D.M. (2014) Environmental conditions influence eDNA persistence in aquatic systems. Environmental Science & Technology, 48, 1819–1827.
Bienert, F., De Danieli, S., Miquel, C., Coissac, E., Poillot, C., Brun, J. & Taberlet, P. (2012) Tracking earthworm communities from soil DNA. Molecular Ecology, 21, 2017–2030.
Bik, H.M., Porazinska, D.L., Creer, S., Caporaso, J.G., Knight, R. & Thomas, W.K. (2012) Sequencing our way towards understanding global eukaryotic biodiversity. Trends in Ecology & Evolution, 27, 233–243.
Boyer, F., Mercier, C., Bonin, A., Le Bras, Y., Taberlet, P. & Coissac, E. (2016) OBITOOLS: a UNIX-inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16, 176–182.
Brown, J.H. (1995) Macroecology, University of Chicago Press.
Coissac, E., Riaz, T. & Puillandre, N. (2012) Bioinformatic challenges for DNA metabarcoding of plants and animals. Molecular Ecology, 21, 1834–1847.
Degnan, P.H. & Ochman, H. (2012) Illumina-based analysis of microbial community diversity. ISME Journal, 6, 183–194.
Drakare, S. & Liess, A. (2010) Local factors control the community composition of cyanobacteria in lakes while heterotrophic bacteria follow a neutral model. Freshwater Biology, 55, 2447–2457.
Dumbrell, A.J., Nelson, M., Helgason, T., Dytham, C. & Fitter, A.H. (2010) Relative roles of niche and neutral processes in structuring a soil microbial community. ISME Journal, 4, 337–345.
Etienne, R.S. (2007) A neutral sampling formula for multiple samples and an “exact” test of neutrality. Ecology Letters, 10, 608–618.
Etienne, R.S. (2005) A new sampling formula for neutral biodiversity. Ecology Letters, 8, 253–260.
Etienne, R.S. (2009) Maximum likelihood estimation of neutral model parameters for multiple samples with different degrees of dispersal limitation. Journal of Theoretical Biology, 257, 510–514.
Etienne, R.S. & Alonso, D. (2005) A dispersal-limited sampling theory for species and alleles. Ecology Letters, 8, 1147–1156.
Etienne, R.S. & Alonso, D. (2007) Neutral community theory: How stochasticity and dispersal-limitation can explain species coexistence. Journal of Statistical Physics, 128, 485–510.
Etienne, R.S., Alonso, D. & McKane, A.J. (2007) The zero-sum assumption in neutral biodiversity theory. Journal of Theoretical Biology, 248, 522–536.
Etienne, R.S., Latimer, A.M., Silander, J.A. & Cowling, R.M. (2006) Comment on “Neutral ecological theory reveals isolation and rapid speciation in a biodiversity hot spot.” Science, 311, 610B–+.
Ewens, W.J. (1972) The sampling theory of selectively neutral alleles. Theoretical Population Biology, 3, 87–112.
Ewens, W.J. & Tavaré, S. (1997) Multivariate Ewens Distribution. Discrete Multivariate Distributions (ed. by Johnson,.
Fenchel, T. & Finlay, B.J. (2004) The ubiquity of small species: Patterns of local and global diversity. Bioscience, 54, 777–784.
von Foerster, H. (1959) Some remarks on changing populations. Kinetics of Cellular Proliferation, pp. 382–399. Stohlman, F.
Giovannoni, S.J., Britschgi, T.B., Moyer, C.L. & Field, K.G. (1990) Genetic diversity in Sargasso Sea bacterioplankton. Nature, 345, 60–63.
Harris, K., Parsons, T.L., Ijaz, U.Z., Lahti, L., Holmes, I. & Quince, C. (2015) Linking statistical and ecological theory: Hubbell’s Unified Neutral Theory of Biodiversity as a Hierarchical Dirichlet Process. Proc. IEEE, PP, 1–14.
Hebert, P.D.N., Cywinska, A., Ball, S.L. & DeWaard, J.R. (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society B-Biological Sciences, 270, 313–321.
Hilborn, R. & Mangel, M. (1997) The ecological detective: confronting models with data, Princeton University Press.
Hoppe, F.M. (1984) Polya-like urns and the Ewens sampling formula. Journal of Mathematical Biology, 20, 91–94.
Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32), Princeton University Press.
Huber, J.A., Mark Welch, D., Morrison, H.G., Huse, S.M., Neal, P.R., Butterfield, D.A. & Sogin, M.L. (2007) Microbial population structures in the deep marine biosphere. Science, 318, 97–100.
Jabot, F. & Chave, J. (2009) Inferring the parameters of the neutral theory of biodiversity using phylogenetic information and implications for tropical forests. Ecology Letters, 12, 239–248.
Jabot, F., Etienne, R.S. & Chave, J. (2008) Reconciling neutral community models and environmental filtering: theory and an empirical test. Oikos, 117, 1308–1320.
Kembel, S.W., Wu, M., Eisen, J.A. & Green, J.L. (2012) Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance. PLoS Computational Biology, 8, 11.
Klymus, K.E., Richter, C.A., Chapman, D.C. & Paukert, C. (2015) Quantification of eDNA shedding rates from invasive bighead carp Hypophthalmichthys nobilis and silver carp Hypophthalmichthys molitrix. Biological Conservation, 183, 77–84.
Lee, J.E., Buckley, H.L., Etienne, R.S. & Lear, G. (2013) Both species sorting and neutral processes drive assembly of bacterial communities in aquatic microcosms. FEMS Microbiology Ecology, 86, 288–302.
Legendre, P. & Legendre, L. (2012) Numerical Ecology, Elsevier.
Levy-Booth, D.J., Campbell, R.G., Gulden, R.H., Hart, M.M., Powell, J.R., Klironomos, J.N., Pauls, K.P., Swanton, C.J., Trevors, J.T. & Dunfield, K.E. (2007) Cycling of extracellular DNA in the soil environment. Soil Biology & Biochemistry, 39, 2977–2991.
Mahe, F., Rognes, T., Quince, C., de Vargas, C. & Dunthorn, M. (2014) Swarm: robust and fast clustering method for amplicon-based studies. PeerJ, 2.
Maruyama, A., Nakamura, K., Yamanaka, H., Kondoh, M. & Minamoto, T. (2014) The release rate of environmental DNA from juvenile and adult fish. PLoS ONE, 9, 13.
Nguyen, N.H., Smith, D., Peay, K. & Kennedy, P. (2015) Parsing ecological signal from noise in next generation amplicon sequencing. New Phytologist, 205, 1389–1393.
O’Dwyer, J.P., Lake, J.K., Ostling, A., Savage, V.M. & Green, J.L. (2009) An integrative framework for stochastic, size-structured community assembly. Proceedings of the National Academy of Sciences of the United States of America, 106, 6170–6175.
Ofiteru, I.D., Lunn, M., Curtis, T.P., Wells, G.F., Criddle, C.S., Francis, C.A. & Sloan, W.T. (2010) Combined niche and neutral effects in a microbial wastewater treatment community. Proceedings of the National Academy of Sciences of the United States of America, 107, 15345–15350.
Ostman, O., Drakare, S., Kritzberg, E.S., Langenheder, S., Logue, J.B. & Lindstrom, E.S. (2010) Regional invariance among microbial communities. Ecology Letters, 13, 118–127.
Pienaar, E., Theron, A., Nelson, A. & Viljoen, H.J. (2006) A quantitative model of error accumulation during PCR amplification. Computational Biology and Chemistry, 30, 102–111.
Quince, C., Lanzen, A., Davenport, R.J. & Turnbaugh, P.J. (2011) Removing noise from pyrosequenced amplicons. BMC Bioinformatics, 12, 18.
Ramirez, K.S., Leff, J.W., Barberan, A., Bates, S.T., Betley, J., Crowther, T.W., Kelly, E.F., Oldfield, E.E., Shaw, E.A., Steenbock, C., Bradford, M.A., Wall, D.H. & Fierer, N. (2014) Biogeographic patterns in below-ground diversity in New York City’s Central Park are similar to those observed globally. Proceedings of the Royal Society B-Biological Sciences, 281, 9.
Ricklefs, R.E. (2004) A comprehensive framework for global patterns in biodiversity. Ecology Letters, 7, 1–15.
Roesch, L.F., Fulthorpe, R.R., Riva, A., Casella, G., Hadwin, A.K.M., Kent, A.D., Daroub, S.H., Camargo, F.A.O., Farmerie, W.G. & Triplett, E.W. (2007) Pyrosequencing enumerates and contrasts soil microbial diversity. ISME Journal, 1, 283–290.
Roguet, A., Laigle, G.S., Therial, C., Bressy, A., Soulignac, F., Catherine, A., Lacroix, G., Jardillier, L., Bonhomme, C., Lerch, T.Z. & Lucas, F.S. (2015) Neutral community model explains the bacterial community assembly in freshwater lakes. FEMS Microbiology Ecology, 91, 11.
Rosenzweig, M.L. (1995) Species diversity in space and time, Cambridge University Press.
Rosindell, J., Hubbell, S.P., He, F., Harmon, L.J. & Etienne, R.S. (2012) The case for ecological neutral theory. Trends in Ecology & Evolution, 27, 203–208.
Ross, M.G., Russ, C., Costello, M., Hollinger, A., Lennon, N.J., Hegarty, R., Nusbaum, C. & Jaffe, D.B. (2013) Characterizing and measuring bias in sequence data. Genome Biology, 14.
Rosvall, M., Axelsson, D. & Bergstrom, C.T. (2009) The map equation. The European Physical Journal Special Topics, 178, 13–23.
Schleper, C., Jurgens, G. & Jonuscheit, M. (2005) Genomic studies of uncultivated archaea. Nature Reviews Microbiology, 3, 479–488.
Sipos, M., Jeraldo, P., Chia, N., Qu, A.I., Dhillon, A.S., Konkel, M.E., Nelson, K.E., White, B.A. & Goldenfeld, N. (2010) Robust computational analysis of rRNA hypervariable tag datasets. PLoS ONE, 5, 8.
Sipos, R., Szekely, A.J., Palatinszky, M., Revesz, S., Marialigeti, K. & Nikolausz, M. (2007) Effect of primer mismatch, annealing temperature and PCR cycle number on 16S rRNA gene-targetting bacterial community analysis. FEMS Microbiology Ecology, 60, 341–350.
Sloan, W.T., Lunn, M., Woodcock, S., Head, I.M., Nee, S. & Curtis, T.P. (2006) Quantifying the roles of immigration and chance in shaping prokaryote community structure. Environmental Microbiology, 8, 732–740.
Sloan, W.T., Woodcock, S., Lunn, M., Head, I.M. & Curtis, T.P. (2007) Modeling taxa-abundance distributions in microbial communities using environmental sequence data. Microbial Ecology.
Strickler, K.M., Fremier, A.K. & Goldberg, C.S. (2015) Quantifying effects of UV-B, temperature, and pH on eDNA degradation in aquatic microcosms. Biological Conservation, 183, 85–92.
Taberlet, P., Coissac, E., Hajibabaei, M. & Rieseberg, L.H. (2012) Environmental DNA. Molecular Ecology, 21, 1789–1793.
Taberlet, P., Coissac, E., Pompanon, F., Gielly, L., Miquel, C., Valentini, A., Vermat, T., Corthier, G., Brochmann, C. & Willerslev, E. (2007) Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Research, 35, e14–e14.
Tedersoo, L., Bahram, M., Cajthaml, T., Põlme, S., Hiiesalu, I., Anslan, S., Harend, H., Buegger, F., Pritsch, K., Koricheva, J. & Abarenkov, K. (2015) Tree diversity and species identity effects on soil fungi, protists and animals are context dependent. ISME Journal.
Tedersoo, L., Bahram, M., Polme, S., Koljalg, U., Yorou, N.S., Wijesundera, R., Ruiz, L.V., Vasco-Palacios, A.M., Thu, P.Q., Suija, A., Smith, M.E., Sharp, C., Saluveer, E., Saitta, A., Rosas, M., Riit, T., Ratkowsky, D., Pritsch, K., Poldmaa, K., Piepenbring, M., Phosri, C., Peterson, M., Parts, K., Partel, K., Otsing, E., Nouhra, E., Njouonkou, A.L., Nilsson, R.H., Morgado, L.N., Mayor, J., May, T.W., Majuakim, L., Lodge, D.J., Lee, S.S., Larsson, K.H., Kohout, P., Hosaka, K., Hiiesalu, I., Henkel, T.W., Harend, H., Guo, L.D., Greslebin, A., Grelet, G., Geml, J., Gates, G., Dunstan, W., Dunk, C., Drenkhan, R., Dearnaley, J., De Kesel, A., Dang, T., Chen, X., Buegger, F., Brearley, F.Q., Bonito, G., Anslan, S., Abell, S. & Abarenkov, K. (2014) Global diversity and geography of soil fungi. Science, 346, 1078–+.
Vellend, M. (2010) Conceptual synthesis in community ecology. Quarterly Review of Biology, 85, 183–206.
Volkov, I., Banavar, J.R., Hubbell, S.P. & Maritan, A. (2003) Neutral theory and relative species abundance in ecology. Nature, 424, 1035–1037.
Weber, A.A.T. & Pawlowski, J. (2013) Can abundance of protists be inferred from sequence data: A case study of Foraminifera. PLoS ONE, 8, 8.
West, G.B., Brown, J.H. & Enquist, B.J. (1997) A general model for the origin of allometric scaling laws in biology. Science, 276, 122–126.
Whitman, W.B., Coleman, D.C. & Wiebe, W.J. (1998) Prokaryotes: The unseen majority. Proceedings of the National Academy of Sciences of the United States of America, 95, 6578–6583.
Woodcock, S., van der Gast, C.J., Bell, T., Lunn, M., Curtis, T.P., Head, I.M. & Sloan, W.T. (2007) Neutral assembly of bacterial communities. FEMS Microbiology Ecology, 62, 171–180.
Yoccoz, N.G., Brathen, K.A., Gielly, L., Haile, J., Edwards, M.E., Goslar, T., von Stedingk, H., Brysting, A.K., Coissac, E., Pompanon, F., Sonstebo, J.H., Miquel, C., Valentini, A., de Bello, F., Chave, J., Thuiller, W., Wincker, P., Cruaud, C., Gavory, F., Rasmussen, M., Gilbert, M.T.P., Orlando, L., Brochmann, C., Willerslev, E. & Taberlet, P. (2012) DNA from soil mirrors plant taxonomic and growth form diversity. Molecular Ecology, 21, 3647–3655.
Yu, D.W., Ji, Y.Q., Emerson, B.C., Wang, X.Y., Ye, C.X., Yang, C.Y. & Ding, Z.L. (2012) Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring. Methods in Ecology and Evolution, 3, 613–623.
Zinger, L., Gobet, A. & Pommier, T. (2012) Two decades of describing the unseen majority of aquatic microbial diversity. Molecular Ecology, 21, 1878–1896.
Supplementary Information
Figure S1: Effect of additive noise and metabolic rate on neutral parameter inference. Left panels: mean MOTU rank abundance distributions over 100 realizations for $\theta = 20$ in a $10^4$-read sample, without (dashed blue line) and with (black line) simulated noise: (a) additive Gaussian noise of standard deviation $\sigma_{\mathrm{add}} = 5 \cdot 10^{-4}$ (5 times the relative abundance $1/N = 10^{-4}$ of the least abundant MOTUs), and (c) size structure among individuals and non-linear scaling of DNA release with body mass, for a body size ratio $g/(d\,n_0) = 1{,}000$ and a ratio $r_0/d = 100$ between metabolic rate and death rate. Right panels: mean and standard deviation over 100 realizations of the relative bias $(\hat{\theta} - \theta)/\theta$ on the $\theta$ estimate in a $10^4$-read sample, for $\theta = 1$ (green), $\theta = 20$ (black) and $\theta = 500$ (red), as a function of (b) the additive noise intensity $\sigma_{\mathrm{add}}$, and (d) the ratio $r_0/d$.
Figure S2: Effect of additive noise and metabolic rate on neutral parameter inference in the presence of dispersal limitation. We simulated a $10^4$-read sample and computed the mean and standard deviation over 100 realizations of $(\hat{\theta} - \theta)/\theta$ and $\log_{10}(\hat{I}/I)$. Results are plotted for $\theta = 20$ and for $m = 1$ (black), $m = 0.1$ (green), $m = 0.01$ (blue) and $m = 0.001$ (red). Panels a-b: variation with the noise intensity $\sigma_{\mathrm{add}}$ of an additive Gaussian noise on relative abundances ($1/N = 10^{-4}$ is the relative abundance of the least abundant MOTUs). Panels c-d: variation with the ratio $r_0/d$ between metabolic rate and death rate.
Supplementary Methods: Quantifying noise using a benchmark dataset
To build our benchmark dataset, we mixed the genomic DNA extracted from 16 Alpine
plant species in known quantities (Table S1), and we amplified and sequenced the
chloroplast trnL P6-loop barcode (primer g-h; Taberlet et al., 2007). Amplification and
sequencing were replicated eight times. The DNA concentrations of the different
species in the mixture scaled logarithmically, with a doubling in genomic DNA
concentration from one species to the next more abundant. The 16 species thus
spanned a large range of DNA concentration ($10^{-5}$ ng/µL to 1 ng/µL), representative
of the DNA abundances found in environmental samples.
The PCR mixtures comprised 2 ng DNA template, 10 µl of AmpliTaq Gold®
Master Mix (Life Technologies, Carlsbad, CA, USA), 0.25 µM of each primer, 3.2 µg of
BSA (Roche Diagnostic, Basel, Switzerland) for a final reaction volume of 20 µl.
Thermocycling conditions consisted of an initial denaturation step (95°C, 10 min)
followed by 35 cycles of denaturation at 95°C (30 s), primer annealing at 50°C (30 s)
and elongation at 72°C (1 min), and by a final extension step (72°C, 7 min). Amplicons
were then purified (MinElute™ PCR purification kit, Qiagen), pooled, loaded on a HiSeq
Illumina lane and sequenced using the paired-end technology. The read coverage was
about $10^5$ Illumina sequence reads for each of the eight replicates.
The sequencing data were first curated following classical procedures using the
OBITools package (Boyer et al., 2016), consisting in paired-end read assembly, read
assignment to their respective samples and dereplication. Sequences of length shorter
than 10 nucleotides or containing ambiguous nucleotides were excluded. The
sequences were then processed using the Infomap clustering algorithm (Rosvall et al.,
2009), to minimize the number of artifactual MOTUs by clustering sequences together
based on their similarity. The dataset is considered as a network of sequences
connected by links weighted according to sequence similarity. We used weights
decreasing exponentially with the number of nucleotide differences between
sequences and we discarded the links for more than 5 nucleotide differences. All
replicates were lumped for this clustering analysis. In parallel, all sequences were
assigned to a taxon using the barcodes of the 16 species as a reference database (Table
S1).
The clustering algorithm yielded 48 clusters (i.e. MOTUs), 24 of which were
found only in some of the replicates (Fig. S3a). Each input species was represented as
the most abundant sequence of a MOTU found in all 8 realizations. Taking into
account only the MOTUs shared across replicates, the proportion of artifactual MOTUs in
the curated dataset is 33% (8 out of 24; Fig. S3b). Using the taxonomic assignment of all sequences
to the most similar of the 16 species, we found that each artifactual MOTU originates
from a single species and is at least 50 times less abundant than the species that
generated it (Fig. S3a). Therefore, artifactual MOTUs have little impact on the
abundance of the true MOTUs in the dataset. Moreover, the number of artifactual
MOTUs generated by a species is proportional to the latter’s read abundance r (Fig.
S3c), and the log-abundance of these artifactual MOTUs is uniformly distributed
between 0 and log(r/50). Our modeling choice for simulating artifactual MOTUs with
realistic abundances builds on these empirical observations.
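These two empirical rules translate into a short Python sketch. The proportionality constant `motus_per_read` is a hypothetical parameter (its value is not given in the text), as are the function name and the rounding of the number of artifacts to the nearest integer.

```python
import math
import random

def simulate_artifactual_motus(read_abundance, motus_per_read, seed=0):
    """Return read abundances of artifactual MOTUs derived from one true MOTU."""
    rng = random.Random(seed)
    # Number of artifacts proportional to the parent's read abundance r
    # (motus_per_read is a hypothetical proportionality constant).
    n_artifacts = round(motus_per_read * read_abundance)
    upper = math.log(read_abundance / 50)
    if upper <= 0:
        return []  # in this sketch, parents with r <= 50 reads produce no artifact
    # Log-abundance uniform on [0, log(r/50)], i.e. abundance between 1 and r/50 reads.
    return [math.exp(rng.uniform(0.0, upper)) for _ in range(n_artifacts)]

artifacts = simulate_artifactual_motus(read_abundance=20_000, motus_per_read=2e-4)
print(len(artifacts), [round(a, 1) for a in artifacts])
```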
The amplification factor, i.e. the ratio between the read abundance and the
initial DNA concentration, was found to be approximately constant over the range of
DNA concentrations spanned in the dataset (Fig. S3d). However, it varied across
species and replicates. This results in a multiplicative noise on relative abundances
that is approximately lognormally distributed, with logarithm standard deviation
$\sigma_{\mathrm{mul}} = 1.2$ (Fig. S3e). Seventy-three percent of the variance of the logarithm is
explained by differences among species (likely related to the variability in barcode
copy number and in efficiency of PCR amplification) while the remaining variance
corresponds to the variability among realizations (Fig. S3d).
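For readers wishing to compute the same two metrics on their own benchmark, a minimal Python sketch follows. It assumes a balanced design (the same number of replicates for every species), strictly positive read counts and the natural logarithm; the function name `noise_summary` and the toy numbers are hypothetical.

```python
import math
import statistics

def noise_summary(reads, concentrations):
    """Return (sigma_mul, share of the log-variance among species) for a balanced design.

    reads[s][k] is the (positive) read count of species s in replicate k and
    concentrations[s] is the initial DNA concentration of species s.
    """
    # Natural logarithm of the amplification factor (assumed log base).
    log_af = [[math.log(r / c) for r in row] for row, c in zip(reads, concentrations)]
    pooled = [x for row in log_af for x in row]
    total_var = statistics.pvariance(pooled)
    # With equal numbers of replicates per species, the total variance splits into the
    # variance of the species means plus the mean within-species variance.
    among_species = statistics.pvariance([statistics.fmean(row) for row in log_af])
    share = among_species / total_var if total_var > 0 else 0.0
    return math.sqrt(total_var), share

# Toy benchmark: two species, two replicates, identical replicates within species.
reads = [[10, 10], [1000, 1000]]
concentrations = [1.0, 1.0]
sigma_mul, share = noise_summary(reads, concentrations)
print(round(sigma_mul, 3), share)
```

In this toy case all the variance lies among species, so the share is 1; in the benchmark described above the corresponding value is 73%.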
References:
Boyer, F., Mercier, C., Bonin, A., Le Bras, Y., Taberlet, P. & Coissac, E. (2016) OBITOOLS: a UNIX-inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16, 176–182.
Rosvall, M., Axelsson, D. & Bergstrom, C.T. (2009) The map equation. The European Physical Journal Special Topics, 178, 13–23.
Taberlet, P., Coissac, E., Pompanon, F., Gielly, L., Miquel, C., Valentini, A., Vermat, T., Corthier, G., Brochmann, C. & Willerslev, E. (2007) Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Research, 35, e14–e14.
Species | Dilution factor | Sequence | Sequence length (nt) | Sequence GC content (%)
Taxus baccata | 1.000000 | atccgtattataggaacaataattttattttctagaaaagg | 41 | 24.39
Salvia pratensis | 0.500000 | atcctgttttctcaaaacaaaggttcaaaaaacgaaaaaaaaaag | 45 | 26.67
Populus tremula | 0.250000 | atcctatttttcgaaaacaaacaaaaaaacaaacaaaggttcataaagacagaataagaatacaaaag | 68 | 25.00
Rumex acetosa | 0.125000 | ctcctcctttccaaaaggaagaataaaaaag | 31 | 35.48
Carpinus betulus | 0.062500 | atcctgttttcccaaaacaaataaaacaaatttaagggttcataaagcgagaataaaaaag | 61 | 27.87
Fraxinus excelsior | 0.031250 | atcctgttttcccaaaacaaaggttcagaaagaaaaaag | 39 | 33.33
Picea abies | 0.015625 | atccggttcatggagacaatagtttcttcttttattctcctaagataggaaggg | 54 | 38.89
Lonicera xylosteum | 0.007813 | atccagttttccgaaaacaagggtttagaaagcaaaaatcaaaaag | 46 | 32.61
Abies alba | 0.003906 | atccggttcatagagaaaagggtttctctccttctcctaaggaaagg | 47 | 44.68
Acer campestre | 0.001953 | atcctgttttacgagaataaaacaaagcaaacaagggttcagaaagcgagaaaggg | 56 | 39.29
Briza media | 0.000977 | atccgtgttttgagaaaacaagggggttctcgaactagaatacaaaggaaaag | 53 | 39.62
Rosa canina | 0.000488 | atcccgttttatgaaaacaaacaaggtttcagaaagcgagaataaataaag | 51 | 31.37
Capsella bursa-pastoris | 0.000244 | atcctggtttacgcgaacacaccggagtttacaaagcgagaaaaaagg | 48 | 45.83
Geranium robertianum | 0.000122 | atccttttttacgaaaataaagaggggctcacaaagcgagaatagaaaaaaag | 53 | 33.96
Rhododendron ferrugineum | 0.000061 | atccttttttcgcaaacaaacaaagattccgaaagctaaaaaaaag | 46 | 30.43
Lotus corniculatus | 0.000031 | atcctgctttacgaaaacaagggaaagttcagttaagaaagcgacgagaaaaatg | 55 | 38.18
Table S1: List and characteristics of the 16 plant species included in the benchmark dataset.
Figure S3: Empirical results for the benchmark dataset obtained by mixing the DNA of 16 plant species, then amplifying by PCR and sequencing on an Illumina platform the chloroplast trnL P6-loop barcode, with eight replicates. Panel a: Read abundance of the 16 species and of the artifactual MOTUs, averaged over the replicates, as a function of the species initial abundance. Some artifactual MOTUs were found in every realization, but others were not. The blue dotted lines delineate the abundance domain chosen to model the abundances of artifactual MOTUs. Panel b: Number of reads per MOTU as a function of the MOTU’s abundance rank, including and excluding artifactual MOTUs (black and dashed blue, respectively). Panel c: Linear relationship between the number of artifactual MOTUs and the relative abundance of the species that generated them (the most abundant species is excluded, as well as the MOTUs found in only some of the replicates). Panel d: Logarithm of the amplification factor, i.e. the ratio between the read abundance and the initial DNA concentration, as a function of the initial DNA concentration of the species (dotted lines: standard deviation $\sigma_{\mathrm{mul}} = 1.2$ over all species and all replicates). Panel e: Probability density of the logarithm of the amplification factor over the 16 species and the 8 realizations, approximately normally distributed.
Supplementary Note 1: Hubbell’s neutral model
Hubbell’s neutral model of biodiversity describes a large pool of $J_M$ individuals
undergoing random death, birth and speciation events in the following way: at each
time step, one individual at random dies, and is replaced by a new individual. This new
individual belongs to a taxon not previously found in the community with probability
ν, or to one of the already existing taxa with probability 1 − ν. In the latter case, each
taxon has a probability of being picked proportional to its abundance in the
community. In the absence of dispersal limitation, the multivariate steady-state
distribution of taxa abundances is called the Ewens distribution and is characterized
by the single parameter $\theta = \frac{\nu}{1-\nu}(J_M - 1)$ (Ewens, 1972; Etienne & Alonso, 2005). Any
sample consisting of $J < J_M$ individuals drawn at random from the community also
follows the Ewens distribution of parameter θ.
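A sample following the Ewens distribution can be generated directly, without simulating the full birth-death dynamics, using the urn scheme of Hoppe (1984). The Python sketch below is an illustration of that scheme, not the simulation code used in this study; the function name `ewens_sample` is hypothetical.

```python
import random

def ewens_sample(theta, n_individuals, seed=0):
    """Taxon abundances of a sample drawn from the Ewens distribution (Hoppe urn)."""
    rng = random.Random(seed)
    labels = []       # taxon label of each individual already drawn
    abundances = []   # abundances[k] = number of individuals of taxon k
    for j in range(n_individuals):
        # The (j+1)-th individual founds a new taxon with probability theta / (theta + j).
        if rng.random() < theta / (theta + j):
            abundances.append(1)
            labels.append(len(abundances) - 1)
        else:
            # Otherwise it takes the taxon of a uniformly chosen previous individual,
            # i.e. an existing taxon in proportion to its current abundance.
            k = labels[rng.randrange(j)]
            abundances[k] += 1
            labels.append(k)
    return sorted(abundances, reverse=True)

sample = ewens_sample(theta=20, n_individuals=1000)
print(len(sample), "taxa; most abundant:", sample[:5])
```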
A dispersal-limited version of this model is defined as follows (Hubbell, 2001;
Etienne & Alonso, 2005). New taxa disperse into a single local community by
immigration from a regional pool, which follows the model without dispersal limitation
described above. When an individual dies, it is replaced by an immigrating individual
with probability m, and by the offspring of a local individual with probability 1 − m. Two
immigrants may belong to the same taxon. The multivariate steady-state distribution
of taxa abundances in the dispersal-limited local community depends on two
parameters: the dispersal parameter $I = \frac{m}{1-m}(J - 1)$, where J is the number of
individuals in the local community, and the diversity parameter θ of the regional pool
(Etienne, 2005). Any sample drawn at random from the local community also follows
the Etienne distribution of parameters θ and I (Etienne & Alonso, 2005).
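The dispersal-limited case can be sketched in the same spirit with a two-level urn: each sampled individual is either a new immigrant or a descendant of an already sampled individual, and the taxa of successive immigrants follow a Hoppe urn of parameter θ. This Python sketch is a commonly used sequential construction given here under that assumption; the function names are hypothetical and it is not presented as the code used in this study.

```python
import random

def dispersal_parameter(m, local_size):
    """I = m / (1 - m) * (J - 1), for an immigration probability m < 1."""
    return m / (1.0 - m) * (local_size - 1)

def etienne_sample(theta, I, n_individuals, seed=0):
    """Taxon abundances of a dispersal-limited neutral sample (two-level urn)."""
    rng = random.Random(seed)
    ancestor_taxa = []   # taxon of each immigrant ancestor drawn so far
    labels = []          # taxon of each sampled individual
    n_taxa = 0
    for j in range(n_individuals):
        if rng.random() < I / (I + j):
            # Immigrant: its taxon follows a Hoppe urn over the previous ancestors.
            a = len(ancestor_taxa)
            if rng.random() < theta / (theta + a):
                taxon = n_taxa
                n_taxa += 1
            else:
                taxon = ancestor_taxa[rng.randrange(a)]
            ancestor_taxa.append(taxon)
        else:
            # Local birth: copy the taxon of a uniformly chosen sampled individual.
            taxon = labels[rng.randrange(j)]
        labels.append(taxon)
    abundances = [0] * n_taxa
    for taxon in labels:
        abundances[taxon] += 1
    return sorted(abundances, reverse=True)

I = dispersal_parameter(m=0.1, local_size=1000)
local = etienne_sample(theta=20, I=I, n_individuals=1000)
print(round(I, 1), len(local), "taxa")
```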
References:
Etienne, R.S. & Alonso, D. (2005) A dispersal-limited sampling theory for species and alleles. Ecology Letters, 8, 1147–1156.
Etienne, R.S. (2005) A new sampling formula for neutral biodiversity. Ecology Letters, 8, 253–260.
Ewens, W.J. (1972) The sampling theory of selectively neutral alleles. Theoretical Population Biology, 3, 87–112.
Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32), Princeton University Press.
Supplementary Note 2: Modeling size differences
1. Modeling size differences using the von Foerster equation
The von Foerster equation (von Foerster, 1959; O’Dwyer et al., 2009) describes a
populationwhereindividualsgrowinnumberofcells(ormass)nwithagrowthrate
g(n),andwheretheydiewithadeathrate𝑑 𝑛 .Theevolutionofthenumber𝑗 𝑛, 𝑡 d𝑛
ofindividualswithanumberofcellsbetweennandn+dnattimetisgivenby:
𝜕𝑗(𝑛, 𝑡)𝜕𝑡 = −
𝜕 𝑔 𝑛 𝑗 𝑛, 𝑡𝜕𝑛 − 𝑑 𝑛 𝑗 𝑛, 𝑡
When g(n) and𝑑 𝑛 are independent of n, the stationary (i.e., time-independent)
solutionofthevonForsterequationis:
𝑗 𝑛 = 𝐽𝑑𝑔 𝑒
!!!(!!!!)
whereJistheconstantpopulationsize,givenby:
𝐽 = 𝑗 𝑛!
!!𝑑𝑛
Therefore,arandomlychosenindividualinthepopulationhasanumbernofcellswith
probabilitydensity:
𝑝!"# 𝑛 =𝑗 𝑛𝐽 =
𝑑𝑔 𝑒
!!!(!!!!)
Weusedthisprobabilitydensitytodrawanumberofcellsbetweenn0andinfinityfor
eachindividualoftheneutralsample.Themean< 𝑛 >andthecoefficientofvariation
𝜎! < 𝑛 >ofthenumberofcellsofarandomlychosenindividualaregivenby:
< 𝑛 >=𝑔𝑑 + 𝑛!
𝜎!< 𝑛 > =
1𝑑𝑔 𝑛! + 1
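The stationary density above is a shifted exponential, so drawing cell numbers amounts to adding an exponential variate of rate d/g to the size at birth. The following is a minimal sketch of that draw, not the code used in the thesis; the function name `draw_cell_numbers` and the numerical values of g, d and n0 are hypothetical and chosen only for illustration.

```python
import random
import statistics


def draw_cell_numbers(n_individuals, g, d, n0, seed=0):
    """Draw one number of cells per individual from the shifted exponential
    density p(n) = (d/g) * exp(-(d/g) * (n - n0)), defined for n >= n0."""
    rng = random.Random(seed)
    rate = d / g  # rate of the exponential part; its mean is g/d
    return [n0 + rng.expovariate(rate) for _ in range(n_individuals)]


# Hypothetical parameter values, for illustration only.
g, d, n0 = 2.0, 0.5, 1.0
sizes = draw_cell_numbers(100_000, g, d, n0)

# Empirical mean and coefficient of variation of the drawn sizes.
mean_n = statistics.fmean(sizes)
cv_n = statistics.pstdev(sizes) / mean_n

# Theoretical values from the two expressions above.
expected_mean = g / d + n0
expected_cv = 1.0 / ((d / g) * n0 + 1.0)
```

With these illustrative values, the empirical mean and coefficient of variation should fall close to the theoretical ones, up to sampling error.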
2. Comparison with O’Dwyer et al. (2009)’s size-structured neutral model
O’Dwyer et al. (2009) transformed the deterministic von Foerster equation into a probabilistic equation, and integrated it into the master equation of Volkov et al. (2003), which describes neutral dynamics without dispersal limitation in a probabilistic way. The resulting size-structured neutral model predicts that, in steady state, if the growth rate g(n), the birth rate b(n) and the death rate d(n) are independent of n, and if $g/d \gg n_0$ (i.e., individuals grow much larger than their size at birth), a randomly chosen species will have a total number of cells (or a total biomass) n with probability density:
\[ p_{\mathrm{sp}}(n) = \frac{\nu}{b\,n} \left( e^{-\frac{d-b}{g}\, n} - e^{-\frac{d}{g}\, n} \right) \]
where ν is the speciation rate of the neutral model ($\nu / b \ll 1$). Adding size structure does not modify the probability for a randomly chosen species to have J individuals:
\[ P_{\mathrm{sp}}(J) = \frac{\nu}{b\,J} \left( \frac{b}{d} \right)^{J} \]
While the model of O’Dwyer et al. (2009) explicitly accounts for the coupling between the demographic dynamics and the growth of individuals, we generated a neutral sample of individuals and then assigned an independent number of cells to each individual. Therefore, under our assumptions, the numbers of cells of the different individuals are described by independent and identically distributed exponential random variables $N_i$, and for $g/d \gg n_0$, the total number of cells of a species with J individuals follows an Erlang distribution:
\[ \sum_{i=1}^{J} N_i \sim \mathrm{Erlang}\!\left( \frac{g}{d},\, J \right) \]
with probability density:
\[ p_{\mathrm{sp}}(n \mid J) = p_{\mathrm{Erlang}(g/d,\,J)}(n) = \frac{1}{(J-1)!} \left( \frac{d}{g} \right)^{J} n^{J-1}\, e^{-\frac{d}{g}\, n} \]
The probability density for a species of having a total number of cells n is then given by:
\[ p_{\mathrm{sp}}(n) = \sum_{J=1}^{\infty} P_{\mathrm{sp}}(J)\, p_{\mathrm{sp}}(n \mid J) \]
Combining the expressions of $P_{\mathrm{sp}}(J)$ and $p_{\mathrm{sp}}(n \mid J)$ above, we obtain the same expression for $p_{\mathrm{sp}}(n)$ as predicted by the size-structured model of O’Dwyer et al. (2009). Therefore, in the simple case where g(n), b(n) and d(n) are independent of the number of cells n, explicitly accounting for the coupling between demographic dynamics and individual growth is equivalent to assuming, as we did, that all individuals have independent and identically distributed numbers of cells.
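The equivalence stated above can be checked in a few lines. The derivation below is a sketch that uses only the log-series form of the probability of having J individuals, the Erlang density with mean g/d per individual, and the power series of the exponential function; it assumes, as in the text, that $g/d \gg n_0$ so that the size at birth can be neglected.

```latex
\begin{align*}
p_{\mathrm{sp}}(n)
  &= \sum_{J=1}^{\infty} \frac{\nu}{b\,J} \left(\frac{b}{d}\right)^{J}
     \frac{1}{(J-1)!} \left(\frac{d}{g}\right)^{J} n^{J-1}\, e^{-\frac{d}{g} n} \\
  &= \frac{\nu}{b\,n}\, e^{-\frac{d}{g} n}
     \sum_{J=1}^{\infty} \frac{1}{J!} \left(\frac{b\,n}{g}\right)^{J} \\
  &= \frac{\nu}{b\,n}\, e^{-\frac{d}{g} n} \left( e^{\frac{b}{g} n} - 1 \right) \\
  &= \frac{\nu}{b\,n} \left( e^{-\frac{d-b}{g} n} - e^{-\frac{d}{g} n} \right)
\end{align*}
```

The second line uses $\frac{1}{J\,(J-1)!} = \frac{1}{J!}$ and $\left(\frac{b}{d}\right)^{J} \left(\frac{d}{g}\right)^{J} = \left(\frac{b}{g}\right)^{J}$.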
The modelling approach of Volkov et al. (2003) and O’Dwyer et al. (2009) differs from that of Ewens (1972) and Etienne (2005). The former consists in describing the population dynamics of a single species with a fluctuating number of individuals, independently of the rest of the community, and then considering that the results hold for every species in the community (“mean-field” approach). In contrast, the Ewens and Etienne distributions are obtained by explicitly considering a community with a constant number of individuals and a fluctuating number of species through time. However, the two approaches yield identical stationary distributions provided that the number of species is large enough (Etienne et al., 2007).
References:
Etienne, R.S. (2005) A new sampling formula for neutral biodiversity. Ecology Letters, 8, 253–260.
Etienne, R.S., Alonso, D. & McKane, A.J. (2007) The zero-sum assumption in neutral biodiversity theory. Journal of Theoretical Biology, 248, 522–536.
Ewens, W.J. (1972) The sampling theory of selectively neutral alleles. Theoretical Population Biology, 3, 87–112.
von Foerster, H. (1959) Some remarks on changing populations. Kinetics of Cellular Proliferation, pp. 382–399. Stohlman, F.
O’Dwyer, J.P., Lake, J.K., Ostling, A., Savage, V.M. & Green, J.L. (2009) An integrative framework for stochastic, size-structured community assembly. Proceedings of the National Academy of Sciences of the United States of America, 106, 6170–6175.
Volkov, I., Banavar, J.R., Hubbell, S.P. & Maritan, A. (2003) Neutral theory and relative species abundance in ecology. Nature, 424, 1035–1037.
Supplementary Note 3: Estimator performance without simulated noise
We explored how the maximum-likelihood neutral estimators behave in the absence of simulated noise over the range of tested parameter values (θ in [1, 500] and m in [0.001, 1]). We found that while the Ewens estimator shows very little bias (Fig. S4a-b), the dispersal-limited estimator can be strongly biased depending on parameter values and sample size (Fig. S4c-f). The dispersal-limited estimator underestimates θ and overestimates I when the immigration rate into the local community is too small, and overestimates θ and underestimates I when the immigration rate is too large. In the
case of our10!-read sample, values of I around𝐼 = 10! (i.e.𝑚 = 0.01in the10!-
individual sample) allow for the least biased estimation of (θ, I). Biases are strongest for θ > 100.
For both estimators, standard deviation and bias decrease with sample size, but a much larger sample size is required to obtain accurate estimates in the dispersal-limited case than in the absence of dispersal limitation. While sample sizes of ca.
𝑁 = 100are sufficient for the Ewens estimator, sample sizes of𝑁 = 10!are still not
sufficient for some parameter values in the dispersal-limited case. Larger θ values and smaller I values require larger sample sizes. Estimating the neutral parameters simultaneously from several read samples reduces these biases (Etienne, 2007).
Reference:
Etienne, R.S. (2007) A neutral sampling formula for multiple samples and an “exact” test of neutrality. Ecology Letters, 10, 608–618.
Figure S4: Neutral parameter inference without simulated noise, for different parameter values. The mean and standard deviation of the relative biases on parameter estimates are plotted over 500 realizations. Panels a-b: θ inference without dispersal limitation, as a function of (a) the input θ value and (b) the read number N, for θ equal to 1, 20, and 500. Panels c-d: θ and $\log_{10}(I)$ inference as a function of the input θ value, for m equal to 0.1, 0.01, and 0.001. Panels e-f: θ and $\log_{10}(I)$ inference as a function of the read number N, for m = 0.01 and for θ equal to 1, 20, and 500.
[Figure S4, panels a-f. x-axes: θ (log scale) or read number N (log scale). y-axes: relative bias on θ, (θ̂ − θ)/θ, or $\log_{10}$ of the ratio of estimated to input I. Legends: θ = 1, 20, 500 (panels b and f); m = 0.1, 0.01, 0.001 (panel d).]
Supplementary Note 4: Neutral parameter inference with the number of individuals unknown
Because exact maximum-likelihood inference of the neutral parameters relies on sampling formulas that are invariant under subsampling, it is possible to use the sequence reads as effective individuals as long as we can consider the reads as a subsample from the initial individuals. Therefore, there should be fewer reads than individuals. A further complication is that the sequence reads are sampled with replacement from the initial individuals in our simulations (i.e. they are a multinomial sample from the relative abundances) instead of without replacement as required for the invariance property to hold. Hence there should in fact be several times fewer reads than individuals, because sampling with and without replacement are equivalent only in this case.
To illustrate this assumption, we explored how the Ewens maximum-likelihood estimator behaves in the absence of simulated noise depending on the initial number
of individuals J, for𝑁 = 10!,𝑁 = 10!,𝑁 = 10! and 𝑁 = 10! , and for 𝜃 = 20 . As
expected, the Ewens estimator yields an unbiased θ estimate as long as the initial number of individuals is ca. one order of magnitude larger than the number of reads (Fig. S5a-d). We then simulated a number of reads larger than the initial number of
individuals (𝑁 = 10! reads for 𝐽 = 10! individuals, and𝑁 = 10! reads for𝐽 = 10!
individuals), and took smaller subsamples of reads from the original read sample until reaching a stable θ maximum-likelihood estimate. As expected, the θ estimate becomes stable under subsampling for samples at least one order of magnitude smaller than the initial number J of individuals. Using this method, we achieved an unbiased estimation of θ in spite of the small initial number of individuals (Fig. S5e-f). In the dispersal-limited case, we expect the maximum-likelihood estimator based on the Etienne sampling formula to behave similarly.
We also compared estimating θ using the Ewens estimator and estimating θ by linear regression on the ranked log-abundance. We found that both methods perform similarly when the initial number of individuals is one order of magnitude larger than the number of reads, and that when this condition is not met, linear regression still provides an unbiased θ estimate (Fig. S5a-d). However, unlike maximum-likelihood inference, linear regression on the ranked log-abundance is not reliable when either the number of reads or the initial number of individuals is too low (lower than ca. 500 for θ = 20; Fig. S5a-d), or when there is too little taxonomic diversity in the sample. Moreover, the θ estimate depends on the arbitrary delimitation of the linear domain of the curve.
[Figure S5, panels a-f. Panels a-d: one panel per number of reads N; x-axis: initial number of individuals J (log scale); y-axis: relative bias on θ, (θ̂ − θ)/θ. Panels e-f: x-axis: size of the read subsample (log scale); y-axis: relative bias on θ, (θ̂ − θ)/θ.]
Figure S5:𝜃inference without dispersal limitation and without simulated noise for 𝜃 = 20.Themeanandstandarddeviationof therelativebiason theθ estimateareplottedover100realizations.Panels a-d:θ inferencebymaximumlikelihood(black)andbylinearregressionon the ranked log-abundance (blue), as a function of the initial number of individuals J, (a)for 𝑁 = 10! reads, (b)𝑁 = 10! reads, (c)𝑁 = 10! reads and (d)𝑁 = 10! reads (linearregression too inaccurate to be plotted for 𝑁 = 10!). Panels e-f: Maximum-likelihood θestimateasafunctionofthesize𝑁!!"#$%&'( ofthereadsubsampleusedforestimation,startingfromanoriginal sampleof (e)𝑁 = 10!readsor (f)𝑁 = 10!reads.Anunbiasedθestimate isobtainedwhen𝑁!"#!$%&'( isatleastoneorderofmagnitudesmallerthantheinitialnumberofindividuals(e)𝐽 = 10!or(f)𝐽 = 10!.
Chapter 3 – Topic Modelling

Chapter 3: Topic modelling reveals spatial structure in a DNA-based biodiversity survey

Guilhem Sommeria-Klein1, Lucie Zinger1,2, Eric Coissac3, Amaia Iribar1, Heidy Schimann4, Pierre Taberlet3, Jérôme Chave1

1 Université Toulouse 3 Paul Sabatier, CNRS, IRD, UMR 5174 Laboratoire Evolution et Diversité Biologique (EDB), F-31062 Toulouse, France.
2 Ecole Normale Supérieure, CNRS, UMR 8197 Institut de Biologie de l’ENS (IBENS), F-75005 Paris, France.
3 Université Grenoble Alpes, CNRS, UMR Laboratoire d'Ecologie Alpine (LECA), F-38000 Grenoble, France.
4 INRA, UMR 745 EcoFoG (AgroParisTech, CIRAD, CNRS, University of the French West Indies, University of French Guiana), F-97387 Kourou, France.
Chapter outline
The second chapter explored the effect of noise on the interpretability of environmental DNA data. This third chapter addresses another challenge of environmental DNA data, namely that microbial datasets typically yield a large number of rare OTUs, and that sampling effort cannot be controlled across samples. As in the first chapter, the focus is on datasets containing many spatially distributed samples. However, while the first chapter aimed at comparing the taxonomic composition of samples with respect to their spatial layout and to environmental descriptors, this chapter describes a method to explore the structure of an environmental DNA dataset independently of any additional information. The results can then be interpreted in light of contextual data. This method, which is closely related to methods already in use in microbiology, is suited to large and sparse datasets, and accounts for sampling effects. It consists in decomposing the data into assemblages of OTUs based on their propensity to co-occur across samples. In this chapter, it is tested using simulations and by applying it to a large soil DNA dataset collected over a forest plot following a regular sampling scheme. A measure of the stability of the decomposition is also proposed. Lastly, the application of this approach to ecological data is discussed more generally. Of particular interest is that this method is model-based, and could thus be extended by modifying the underlying model, including by the addition of more mechanistic elements.
Abstract
High-throughput sequencing of amplicons from environmental DNA samples has become a major method for rapid, standardized and comprehensive biodiversity assessments, allowing for the study of all life forms within a single sample. However, data interpretation is often difficult because a large number of rare taxa confound patterns. Hence, retrieving and describing the structure of such datasets requires efficient methods for dimensionality reduction. Here, we describe the first application of Latent Dirichlet Allocation (LDA) to an environmental DNA dataset. LDA uses a probabilistic model to decompose samples into overlapping assemblages based on the co-occurrence of taxa and the covariance of their abundances. It accounts for sampling effects and accommodates large and sparse datasets. We show that the grouping of taxa into assemblages can be tested statistically, and to this end develop a measure of assemblage stability. We then apply an LDA algorithm to a large soil survey of bacteria, protists and metazoans in a 12-ha plot of primary tropical forest. The LDA analysis reveals that bacterial and protist assemblages display a strong spatial structure while metazoans do not. Furthermore, bacteria and protists exhibit very similar spatial patterns, which match the topographical features of the plot. We conclude that LDA is a computationally efficient and robust method to detect and interpret the structure of large DNA-based biodiversity datasets. We discuss the possible future applications of this approach in biodiversity science.
Introduction
High-throughput sequencing is shedding new light on the study of biodiversity patterns across domains of life. A simple and efficient method is ‘DNA metabarcoding’ (Taberlet et al., 2012), which consists in amplifying and sequencing a genomic marker (‘DNA barcode’) in the DNA contained in environmental samples such as soil, water or feces (Thomsen & Willerslev, 2015). The resulting sequences can then be clustered into molecular Operational Taxonomic Units (OTUs), which serve as proxies of species in biodiversity assessments, and which can possibly be assigned to known taxa after comparison to reference databases. Metabarcoding data typically consist of a ‘community matrix’ that lists the OTUs found in each environmental sample, as well as their read counts.
A goal of community ecology is to understand patterns of species co-occurrence and turnover across space. Let us assume that many samples have been collected across space, in a regular fashion. So far, the search for community structure has been performed using multivariate ordination, as well as distance-based or partitioning-based clustering (Legendre & Legendre, 2012). These methods have proven their efficiency, but they have limitations when it comes to analysing datasets with a very large number of OTUs, and many rare OTUs, resulting in large and sparse community matrices (Holmes et al., 2012). Their results are also biased by the uneven sampling effort across samples in metabarcoding data, since sampling effort depends on the amount of DNA retrieved and on PCR yield for each sample.
Probabilistic approaches to detecting data structure offer an alternative to ordination methods by explicitly modelling the sampling process that underlies the data (Holmes et al., 2012). This can be achieved using a so-called mixture model, which assumes that the data are structured into a mixture of several (unobserved) component units, each with a distinctive taxonomic composition. Under this model, the observed discrete samples of sequence reads, which may be of different sizes, are sampled from this mixture. The component units can then be inferred from the data using maximum-likelihood or Bayesian inference, which provide rigorous means of assessing goodness-of-fit and of selecting the number of component units. Mixture models have been successfully used in microbiology (Knights et al., 2011; Holmes et al., 2012; Ding & Schloss, 2014; Shafiei et al., 2015) and in community ecology (Valle et al., 2014), either in an unsupervised way (data clustering) or in a supervised way (data classification). In particular, Valle et al. (2014) used Latent Dirichlet Allocation (LDA) to cluster tree abundance data across forest plots into component assemblages (or ‘component communities’). They showed that this method performed better than hierarchical and k-means clustering on simulated data. Here, we explore the potential of this method for the analysis of large metabarcoding datasets.
LDA decomposes samples into a mixture of component assemblages, which may themselves overlap in their taxonomic composition. The component assemblages can be interpreted as communities of co-occurring taxa. Because each sample is represented by a mixture of component assemblages, the model captures the smooth turnover in species composition along environmental gradients (Valle et al., 2014). This model was originally introduced by Blei et al. (2003) to decompose large sets of text documents into topics (a problem known as ‘topic modelling’), based solely on their word frequency, and has been subsequently extended to the analysis of large and complex datasets in various fields (see Blei (2012) for a review). The same model has been independently introduced in population genetics to model population structure using the distribution of alleles across individuals, and is now a cornerstone of population genetics analyses (model with admixture in the Structure software; Pritchard et al., 2000).
A first issue for the application of LDA to metabarcoding is that the interpretation that can be made of abundance information, i.e. the DNA read count per OTU, remains debated (Nguyen et al., 2015; Sommeria-Klein et al., 2016). For bacteria, it seems possible to relate the read count to the number of cells in the sample (Kembel et al., 2012), while in the case of macro-organisms, the read count may be indicative of the taxon’s biomass in the environment (Andersen et al., 2012; Klymus et al., 2015). Nevertheless, metabarcoding data are often best used as occurrence data, and it is thus important to evaluate the applicability of LDA to occurrence-based datasets. Second, depending on how strongly structured the data are, the LDA algorithm may fail to converge to an optimal solution. It is indeed acknowledged in the literature that the result of LDA decomposition may vary from one run to the other (Steyvers & Griffiths, 2007; Balagopalan, 2012; Valle et al., 2014). Hence, it is important to quantify the robustness of the LDA decomposition, especially since environmental DNA data are noisy. We first address these problems on simulated data, and then turn to the analysis of an empirical metabarcoding dataset describing the soil biodiversity of bacteria, protists and metazoans over a large tropical forest plot in French Guiana (Zinger et al., 2017). We thus address here the following questions: (1) can LDA accurately retrieve assemblages from occurrence data, (2) can we define a stability metric for the decomposition of metabarcoding data into component assemblages, and (3) can component assemblages retrieved from empirical data be related to variation in abiotic conditions? Finally, we discuss our results in light of those obtained by multivariate methods (Zinger et al., 2017).
Methods
1. Latent Dirichlet Allocation
LDA decomposition takes as an input a community matrix with samples as columns and OTUs as rows, where the entries are the read counts per OTU in each sample. Occurrence data can also be provided as an input, since they are a special case of abundance data where OTU abundances only take values 0 or 1. Inference consists in fitting a generative model to the observed community matrix. The generative model describes a way to generate the data based on two assumptions: the data are structured into K assemblages, where K is a fixed parameter, and each sample is a mixture of the K assemblages in Dirichlet-distributed proportions. The model involves unobserved (‘latent’) variables describing the underlying decomposition of the data into the K assemblages, and the fitting process consists in estimating the most likely value of the latent variables and of the model’s parameters given the observed data (Fig. 1).
The generative model consists of the following steps. For sequence read n in sample m, assemblage membership $z_n$ is generated by a categorical draw from a vector of K mixture weights $(\theta_k^m)_{k \in [1,K]}$ (i.e., one out of K categories is chosen at random with probability weights $(\theta_k^m)_{k \in [1,K]}$). Then, the OTU membership $w_n$ is generated by a categorical draw from a vector of V mixture weights $(\phi_v^{z_n})_{v \in [1,V]}$, where V is the number of distinct OTUs in the whole dataset. The mixture weights $\theta_k^m$ represent the decomposition of each sample m into the K assemblages, while the mixture weights $\phi_v^k$ represent the taxonomic composition of each assemblage k. The model further assumes that the mixture weights $\theta_k^m$ follow for each sample m a symmetric Dirichlet distribution of mixing parameter α. Therefore, for each sample m:
\[ \boldsymbol{\theta}^m = (\theta_k^m)_{k \in [1,K]} \sim \mathrm{Dirichlet}(\alpha) \]
And then, for each sequence read n in sample m:
\[ z_n \sim \mathrm{Categorical}(\boldsymbol{\theta}^m) \]
\[ w_n \sim \mathrm{Categorical}(\boldsymbol{\phi}^{z_n}) \]
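The three sampling statements above can be sketched directly in code. The following is a minimal illustration of the generative process for a single sample, assuming a known taxonomic composition for each assemblage; it is not the inference algorithm. The function names and the toy composition matrix `phi` are hypothetical.

```python
import random


def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]


def generate_sample(n_reads, alpha, phi, rng):
    """Generate one sample of reads under the LDA generative model.

    phi[k][v] is the relative abundance of OTU v in assemblage k.
    Returns the assemblage mixture theta and the read count per OTU.
    """
    n_assemblages = len(phi)
    n_otus = len(phi[0])
    # theta^m ~ Dirichlet(alpha), symmetric over the K assemblages
    theta = dirichlet([alpha] * n_assemblages, rng)
    counts = [0] * n_otus
    for _ in range(n_reads):
        # z_n ~ Categorical(theta^m): assemblage membership of the read
        z = rng.choices(range(n_assemblages), weights=theta)[0]
        # w_n ~ Categorical(phi^{z_n}): OTU membership of the read
        w = rng.choices(range(n_otus), weights=phi[z])[0]
        counts[w] += 1
    return theta, counts


rng = random.Random(42)
# Toy composition: 2 assemblages over 3 OTUs (illustrative values only).
phi = [[0.75, 0.25, 0.0],
       [0.0, 0.5, 0.5]]
theta, counts = generate_sample(n_reads=1000, alpha=0.1, phi=phi, rng=rng)
```

Calling `generate_sample` once per sample, with a possibly different number of reads each time, yields the columns of a community matrix.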
Thus, fitting the generative model to the observed data consists in finding the most likely assemblage mixtures $\boldsymbol{\theta}^m$ for the M samples, the most likely OTU compositions $\boldsymbol{\phi}^k$ for the K assemblages, and the most likely value for the mixing parameter α of the Dirichlet distribution. The value of α indicates whether the samples tend to be decomposed into an even mixture of component assemblages with similar abundances (case α > 1) or into an uneven mixture dominated by one or a few component assemblages (case α < 1). A sharp spatial segregation of the assemblages is associated with an α value markedly lower than unity. The Dirichlet distribution is used as a prior primarily because it is the conjugate prior of the categorical distribution, which eases analytical calculations.
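The effect of α on the evenness of the mixtures can be illustrated numerically. The sketch below draws many symmetric Dirichlet mixtures for a small and a large α and reports the average weight of the dominant assemblage; the function names, the number of assemblages and the two α values are illustrative assumptions.

```python
import random


def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]


def mean_dominant_weight(alpha, n_assemblages=5, n_samples=2000, seed=1):
    """Average weight of the most abundant assemblage across simulated samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += max(dirichlet([alpha] * n_assemblages, rng))
    return total / n_samples


# alpha < 1: mixtures dominated by one or a few assemblages
uneven = mean_dominant_weight(0.1)
# alpha > 1: mixtures close to even (about 1/K per assemblage)
even = mean_dominant_weight(10.0)
```

For α well below 1, the dominant assemblage takes most of each sample, whereas for α well above 1 its weight stays close to 1/K.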
[Figure 1 schematic. Community matrix (read counts of OTU 1, OTU 2, OTU 3): Sample 1 = (4, 1, 0); Sample 2 = (2, 2, 1); Sample 3 = (0, 3, 3). LDA decomposition with K = 2 assemblages: the assemblage mixtures $\theta_k^m$ (Assemblage 1, Assemblage 2) are (1, 0) in Sample 1, (0.5, 0.5) in Sample 2 and (0, 1) in Sample 3; the taxonomic compositions $\phi_v^k$ (OTU 1, OTU 2, OTU 3) are (0.75, 0.25, 0) for Assemblage 1 and (0, 0.5, 0.5) for Assemblage 2.]
Figure 1. Illustration of Latent Dirichlet Allocation’s (LDA) principle. LDA decomposes a community matrix with discrete abundance information (e.g., read count) into K assemblages based on the co-occurrence of OTUs and the covariance of their abundances across samples. K is fixed beforehand and can be selected using likelihood-based model selection methods. The assemblage mixture $(\theta_k^m)_{k \in [1,K]}$ in each sample m, with $\sum_{k=1}^{K} \theta_k^m = 1$, and the taxonomic composition $(\phi_v^k)_{v \in [1,V]}$ of each assemblage k, with $\sum_{v=1}^{V} \phi_v^k = 1$, are inferred from the data.
2. Inference using a Variational Expectation-Maximization algorithm
We fitted the generative model to the observed data using the Variational Expectation-Maximization (VEM) algorithm proposed and implemented by Blei et al. (2003), and wrapped into the R package ‘topicmodels’ by Grün & Hornik (2011). Compared to the often-followed Bayesian approach of Griffiths & Steyvers (2004), this approach is computationally faster, estimates all parameters and allows for a better-justified use of AIC for selecting the number of assemblages. The algorithm uses approximate likelihood maximization to estimate the parameters α and $\boldsymbol{\phi} = ((\phi_v^k)_{v \in [1,V]})_{k \in [1,K]}$, as well as the posterior distribution of the latent variables $\boldsymbol{z} = ((z_n)_{n \in [1,N_m]})_{m \in [1,M]}$ and $\boldsymbol{\theta} = ((\theta_k^m)_{k \in [1,K]})_{m \in [1,M]}$ given the data $\boldsymbol{w} = ((w_n)_{n \in [1,N_m]})_{m \in [1,M]}$.
First, we set the model parameters to α = 0.1 and to randomly chosen values for $\boldsymbol{\phi}$. Then, the following two steps are repeated until the likelihood (or more precisely, a lower bound for the likelihood) converges. The variational step approximates the posterior distribution $P(\boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{w}, \alpha, \boldsymbol{\phi})$ of z and θ, given the data w and given the current values of α and $\boldsymbol{\phi}$. This is achieved by minimizing the Kullback-Leibler divergence between a variational approximation and the true posterior. The Expectation-Maximization (EM) step estimates the parameters α and $\boldsymbol{\phi}$ by maximizing the marginal log-likelihood $L(\alpha, \boldsymbol{\phi}) = \ln P(\boldsymbol{w} \mid \alpha, \boldsymbol{\phi})$, making use of the approximation to the posterior distribution $P(\boldsymbol{z}, \boldsymbol{\theta} \mid \boldsymbol{w}, \alpha, \boldsymbol{\phi})$ found in the variational step (Blei et al., 2003; Grün & Hornik, 2011). We used a convergence threshold of $10^{-7}$ for the EM step and a convergence threshold of $10^{-8}$ for the variational step in all our analyses.
This algorithm provides an estimate of the marginal log-likelihood $\ln P(\boldsymbol{w} \mid \alpha, \boldsymbol{\phi})$ of the final decomposition, which can be used to compare different realizations of the algorithm or to compute the model’s AIC. It is a deterministic algorithm in the sense that it consists in a simple iterative optimization. However, the result may depend on the initialization for the taxonomic composition $\boldsymbol{\phi}$ of assemblages.
3. Computing the optimal number of assemblages
We selected the number K of assemblages based on AIC. There is no rigorous expression of AIC for a model such as LDA (Burnham & Anderson, 2002), but we chose to compute the AIC as $2\,(L(\alpha, \boldsymbol{\phi}) + K(V-1) + 1)$, where $L(\alpha, \boldsymbol{\phi}) = \ln P(\boldsymbol{w} \mid \alpha, \boldsymbol{\phi})$ is the marginal log-likelihood of the LDA decomposition. Indeed, there are $K(V-1)$ free parameters to be estimated in $\boldsymbol{\phi} = ((\phi_v^k)_{v \in [1,V]})_{k \in [1,K]}$, plus the mixing parameter α. This is the same expression as the one used elsewhere (Than & Ho, 2012). We used the lower bound on the marginal log-likelihood computed as part of the VEM algorithm as an approximation
for𝐿 𝛼,𝝓 . We also tried to correct the AIC for small sample size as2[𝐿 𝛼,𝝓 +
𝐾 𝑉 − 1 + 1 1+ !!], whereM is the number of samples (Burnham & Anderson,
2002), but this did not modify our results, and we do not report these analyses here.
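The selection of K can be sketched as a small helper that counts the free parameters and picks the value of K with the lowest AIC. This sketch assumes the standard sign convention, in which the log-likelihood enters the AIC with a negative sign (AIC = 2 × number of parameters − 2 × log-likelihood); the function names and the log-likelihood values in the example are hypothetical.

```python
def lda_aic(log_likelihood, n_assemblages, n_otus):
    """AIC of an LDA decomposition with K assemblages and V OTUs.

    Free parameters: K * (V - 1) for the assemblage compositions,
    plus 1 for the Dirichlet mixing parameter alpha.
    Assumes the standard convention AIC = 2 * n_params - 2 * log-likelihood.
    """
    n_params = n_assemblages * (n_otus - 1) + 1
    return 2.0 * (n_params - log_likelihood)


def select_number_of_assemblages(loglik_by_k, n_otus):
    """Return the K with the lowest AIC, together with all AIC values."""
    aic_by_k = {k: lda_aic(ll, k, n_otus) for k, ll in loglik_by_k.items()}
    best_k = min(aic_by_k, key=aic_by_k.get)
    return best_k, aic_by_k


# Hypothetical log-likelihood (lower bound) values for K = 2, 3, 4.
loglik_by_k = {2: -5000.0, 3: -4200.0, 4: -4195.0}
best_k, aic_by_k = select_number_of_assemblages(loglik_by_k, n_otus=50)
```

In this toy example, moving from 3 to 4 assemblages improves the log-likelihood too little to offset the 49 additional parameters, so K = 3 is retained.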
4. Assessing the stability of the decomposition
The LDA decomposition reflects the co-occurrence structure of OTUs among samples, as well as the covariance structure of their abundances in the case of abundance data. If the data are not strongly structured, they may exhibit a complex likelihood landscape, which increases the chance that the algorithm reaches a local likelihood maximum. To address this issue, we ran the algorithm a hundred times starting from random initial assemblages $\boldsymbol{\phi}^k$, and we selected only the realization with the highest likelihood value for interpretation. We also measured the stability of the decomposition across the hundred realizations, with two goals in mind: measuring how strongly structured the data are, and assessing whether the realization with highest likelihood has indeed reached the optimal solution. We removed the occasional realizations with α values much larger than 1 from the analysis, because they correspond to non-informative solutions where all samples contain all assemblages in similar proportions.
To measure the stability of the decomposition across realizations, we first needed to define a measure of similarity between two possible decompositions of the data. We computed it as the mean similarity between the assemblages of the two decompositions. Therefore, it boils down to defining a measure of similarity between two assemblages. We used the symmetrised Kullback-Leibler (sKL) divergence, a measure of dissimilarity between two distributions that stems from information theory and that is commonly used in statistics and machine learning (Burnham & Anderson, 2002; Meila, 2006; Steyvers & Griffiths, 2007). The Kullback-Leibler divergence (or relative entropy) of a distribution $\boldsymbol{q} = (q_i)_{i \in [1,N]}$ relative to a distribution $\boldsymbol{p} = (p_i)_{i \in [1,N]}$ is defined as $D(\boldsymbol{p} \,\|\, \boldsymbol{q}) = \sum_{i=1}^{N} p_i \ln (p_i / q_i)$, with $\sum_{i=1}^{N} p_i = 1$ and $\sum_{i=1}^{N} q_i = 1$ (Kullback, 1959). It measures the amount of information lost when approximating the distribution p by the distribution q. The symmetrised Kullback-Leibler divergence between p and q is then defined as $D_s(\boldsymbol{p}, \boldsymbol{q}) = \left[ D(\boldsymbol{p} \,\|\, \boldsymbol{q}) + D(\boldsymbol{q} \,\|\, \boldsymbol{p}) \right] / 2$. Between two assemblages $k_1$ and $k_2$, the sKL divergence can be computed either based on their spatial distribution, i.e. $D_s\!\left( \boldsymbol{\theta}_{k_1} / \sum_{m=1}^{M} \theta_{k_1}^m,\; \boldsymbol{\theta}_{k_2} / \sum_{m=1}^{M} \theta_{k_2}^m \right)$, or based on their OTU composition, i.e. $D_s(\boldsymbol{\phi}^{k_1}, \boldsymbol{\phi}^{k_2})$. Thus, we were able to measure both the spatial and the taxonomic similarity between two assemblages. Since $D_s(\boldsymbol{p}, \boldsymbol{q})$ is infinite as soon as there is at least one i in [1, N] that verifies $p_i = 0$ or $q_i = 0$, we avoided infinite sKL divergence values by setting a lower bound in every entry of θ and φ, equal to the inverse of the sum of all elements in the community matrix (i.e., the inverse of the total number of reads in the case of abundance data, or the inverse of the total number of occurrences in the case of occurrence data). Therefore, every point where both distributions take values below this threshold has a null contribution to $D_s(\boldsymbol{p}, \boldsymbol{q})$.
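The symmetrised divergence with its lower bound can be sketched as follows. This is a minimal illustration of the definition above; the function names are hypothetical, and the sketch assumes that entries are simply raised to the lower bound without renormalizing the distributions afterwards, a detail the text does not specify.

```python
import math


def apply_lower_bound(dist, lower_bound):
    """Raise every entry of a distribution to at least lower_bound."""
    return [max(x, lower_bound) for x in dist]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def skl_divergence(p, q, lower_bound):
    """Symmetrised KL divergence D_s(p, q) = [D(p||q) + D(q||p)] / 2.

    Entries are bounded from below first, so that zeros do not produce
    infinite values. Assumption: no renormalization after bounding.
    """
    p = apply_lower_bound(p, lower_bound)
    q = apply_lower_bound(q, lower_bound)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Two toy assemblage compositions over 3 OTUs (illustrative values only).
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
# Lower bound: inverse of the total number of reads, here assumed to be 1000.
lower_bound = 1.0 / 1000
divergence = skl_divergence(p, q, lower_bound)
```

Entries where both distributions sit at the lower bound contribute zero to the sum, as stated in the text.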
We used the sKL divergence to define the similarity measure $\sigma(k_1, k_2) = \left[ \langle D_s(k_1, k_2) \rangle_{\mathrm{rnd}} - D_s(k_1, k_2) \right] / \langle D_s(k_1, k_2) \rangle_{\mathrm{rnd}}$ between two assemblages $k_1$ and $k_2$, where $\langle D_s(k_1, k_2) \rangle_{\mathrm{rnd}}$ is the average sKL divergence over 1000 randomizations of the assemblages. When computing spatial similarity, we performed randomizations by randomly shifting the spatial distribution of one assemblage with respect to the other, so as to account for spatial autocorrelation (Fortin & Payette, 2002). When computing taxonomic similarity, we performed random permutations of the OTUs in one distribution with respect to the other. The similarity $\sigma(k_1, k_2)$ is equal to 1 for a perfect match, and to 0 when the assemblages are as similar as expected by chance. We then defined the similarity between two decompositions $d_1$ and $d_2$ as the mean similarity between their best-matching assemblages, i.e. $S(d_1, d_2) = \sum_{k_1=1}^{K} \sigma(k_1, k_2^*(k_1)) / K$, where assemblage $k_2^*(k_1)$ is the best match in decomposition $d_2$ of assemblage $k_1$ in decomposition $d_1$, as deduced from the comparison of σ values. When more than one assemblage $k_1$ in decomposition $d_1$ had a best match with assemblage $k_2^*$ in decomposition $d_2$, we forced a one-to-one correspondence between the assemblages of both decompositions by giving priority to higher σ values. This situation should be rarely encountered however, since assessing stability mostly involves comparing decompositions that closely resemble each other.
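The taxonomic version of the similarity measure and of the matching between decompositions can be sketched as below. This is an illustrative reading of the procedure, not the thesis code: the one-to-one matching is implemented here as a greedy assignment in decreasing order of σ, which is one possible way of giving priority to higher σ values, and all function names are hypothetical.

```python
import math
import random


def skl_divergence(p, q, lower_bound):
    """Symmetrised Kullback-Leibler divergence with a lower bound on entries."""
    p = [max(x, lower_bound) for x in p]
    q = [max(x, lower_bound) for x in q]
    forward = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    backward = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return 0.5 * (forward + backward)


def sigma(p, q, lower_bound, n_rand, rng):
    """Similarity between two assemblage compositions.

    Equals 1 for a perfect match and about 0 when the observed divergence
    equals the mean divergence over random permutations of the OTUs in q.
    """
    observed = skl_divergence(p, q, lower_bound)
    shuffled = list(q)
    total = 0.0
    for _ in range(n_rand):
        rng.shuffle(shuffled)
        total += skl_divergence(p, shuffled, lower_bound)
    mean_random = total / n_rand
    return (mean_random - observed) / mean_random


def decomposition_similarity(phi_1, phi_2, lower_bound, n_rand=1000, seed=0):
    """Mean sigma over one-to-one matched assemblages of two decompositions.

    Assumption: pairs are matched greedily, highest sigma first.
    """
    rng = random.Random(seed)
    scored_pairs = sorted(
        ((sigma(p, q, lower_bound, n_rand, rng), i, j)
         for i, p in enumerate(phi_1) for j, q in enumerate(phi_2)),
        reverse=True,
    )
    used_1, used_2, matched = set(), set(), []
    for score, i, j in scored_pairs:
        if i not in used_1 and j not in used_2:
            used_1.add(i)
            used_2.add(j)
            matched.append(score)
    return sum(matched) / len(phi_1)


# Two decompositions holding the same assemblages in a different order.
phi_a = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
phi_b = [phi_a[1], phi_a[0]]
similarity = decomposition_similarity(phi_a, phi_b, lower_bound=1e-4, n_rand=50)
```

Because the two toy decompositions contain the same assemblages up to relabelling, each assemblage finds a perfect match and the similarity equals 1. For spatial similarity, the permutation step would be replaced by random spatial shifts, which are not sketched here.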
Figure2.Assessing thestabilityof theLDAdecompositionusing themetric I.Eachpanelrepresents100realizationsof theLDAalgorithmwithrandomassemblage initializations foramockdataset. Inbothcases,therealizationwithhighest likelihood(i.e., thebestrealization) iscompared to each of the 99 others by plotting their similarity S as a function of their log-likelihooddifference. Themetric I is defined as the intercept of the linear regression (dashedblue line). Two cases are illustrated: (a) realizations grow increasingly similar to the bestrealization as their likelihood increases (𝐼 = 1), and (b) dissimilar realizations with similarlikelihood coexist (𝐼 = 0.5). Values of I close to 1 indicate that the best realization is likely tohavereachedtheoptimalsolution.
Wemeasured the stability of the decomposition across𝑛 = 100realizations by
computing two metrics. First, we computed the mean similarity across all pairs of
realizations 𝑆 ! = 𝑆 𝑑!,𝑑! 𝑛 𝑛 − 1 2!!,!! .Themoresimilartherealizationsare
irrespectiveof the initialcondition, themorestronglystructuredthedataare likelyto
be. Second, we compared the realization with highest likelihood (i.e., the best
realization) to each of the𝑛 − 1others. To assess whether the best realization had
indeed reached the optimal solution, we plotted for each pair their similarity𝑆as a
functionoftheir log-likelihooddifference(Fig.2).Weperformeda linearregressionof
thesimilarityagainstthelog-likelihooddifference,andusedtheintercept𝐼!asametric.
●●●●●●●●●●●●●●●●●●●●●
●
●●●●●
●
●●●●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.00
0.25
0.50
0.75
1.00
0 1000 2000 3000 4000 5000
Llh difference with best realization
Sim
ilarit
y S
to b
est r
ealiz
atio
n
a
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
0.00
0.25
0.50
0.75
1.00
0 1000 2000 3000 4000 5000
Llh difference with best realization
b
Chapter3–TopicModelling
176
This metric $I_n$ assesses whether the realizations tend to be increasingly similar to the best realization as their likelihood increases, i.e. to ‘converge’ toward the best realization. Values of $I_n$ close to 1 mean that we can be confident that the best realization has reached the global likelihood maximum, provided that the space of possible initializations has been adequately sampled. We computed both metrics for spatial ($\langle S_{\mathrm{spat.}} \rangle_n$, $I_{\mathrm{spat.},n}$) and taxonomic ($\langle S_{\mathrm{taxo.}} \rangle_n$, $I_{\mathrm{taxo.},n}$) similarities between assemblages.
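The two stability metrics can be illustrated with a minimal sketch. It assumes that a symmetric matrix `sim` of pairwise similarities between realizations and a list `loglik` of their log-likelihoods are already available; the function names and the toy values are hypothetical and are not taken from the analysis code used in this study.

```python
from itertools import combinations


def mean_pairwise_similarity(sim):
    """Mean of sim[i][j] over all n(n-1)/2 unordered pairs of realizations."""
    pairs = list(combinations(range(len(sim)), 2))
    return sum(sim[i][j] for i, j in pairs) / len(pairs)


def similarity_intercept(sim, loglik):
    """Intercept of the least-squares line S = I + slope * (llh_best - llh)."""
    n = len(loglik)
    best = max(range(n), key=lambda i: loglik[i])
    others = [i for i in range(n) if i != best]
    x = [loglik[best] - loglik[i] for i in others]  # log-likelihood differences
    y = [sim[best][i] for i in others]              # similarity to the best realization
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # all differences are equal: fall back to the mean similarity
        return my
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx


# Toy example with four realizations (illustrative values only).
sim = [[1.0, 0.9, 0.8, 0.7],
       [0.9, 1.0, 0.6, 0.5],
       [0.8, 0.6, 1.0, 0.4],
       [0.7, 0.5, 0.4, 1.0]]
loglik = [-100.0, -110.0, -120.0, -130.0]
mean_s = mean_pairwise_similarity(sim)
intercept = similarity_intercept(sim, loglik)
```

In this toy case the similarity to the best realization decreases linearly with the log-likelihood difference, so the intercept is close to 1 even though the mean pairwise similarity is lower.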
5. Simulated data
To test the performance of the LDA algorithm on occurrence-transformed data with respect to the original abundance data, we simulated a metabarcoding dataset. This simulated dataset comprised 1,131 samples containing a total of 1,000 OTUs and decomposed into 5 assemblages. We first defined the assemblages by drawing their OTU composition from a Dirichlet distribution of mixing parameter 0.02. We then assigned to each sample a mixture of assemblages in proportions determined by a sinusoidal function of the sample’s index, so that the relative abundances of all 5 assemblages successively peak at 100% (Fig. 3). Combining the assemblage mixture and the taxonomic composition of assemblages, we obtained the relative abundances of OTUs in each sample. We generated the simulated dataset by sampling 1,000 sequence reads per sample from these relative abundances, which resulted in an average diversity of 105 OTUs per sample.
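The simulation procedure can be sketched as follows. The exact sinusoidal function is not specified in the text, so the squared-cosine mixing weights below are an assumption chosen only because they sum to one and let each assemblage peak at 100% in turn. The function names `dirichlet` and `simulate` are hypothetical.

```python
import math
import random


def dirichlet(alpha, size, rng):
    """Symmetric Dirichlet draw obtained by normalising Gamma variates."""
    total = 0.0
    while total == 0.0:  # guard against the (very unlikely) all-zero draw
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
        total = sum(draws)
    return [d / total for d in draws]


def simulate(n_samples=1131, n_otus=1000, n_assemblages=5, reads=1000,
             alpha=0.02, seed=0):
    """Return (read counts, assemblage mixtures, assemblage compositions)."""
    rng = random.Random(seed)
    beta = [dirichlet(alpha, n_otus, rng) for _ in range(n_assemblages)]
    theta, counts = [], []
    for i in range(n_samples):
        # Position of the sample along the sequence of assemblages.
        pos = i / (n_samples - 1) * (n_assemblages - 1)
        # Assumed sinusoidal form: cos^2 weights on the two nearest assemblages.
        mix = [math.cos(math.pi / 2 * (pos - k)) ** 2 if abs(pos - k) < 1 else 0.0
               for k in range(n_assemblages)]
        # Relative abundance of each OTU in the sample.
        p = [sum(mix[k] * beta[k][v] for k in range(n_assemblages))
             for v in range(n_otus)]
        row = [0] * n_otus
        for otu in rng.choices(range(n_otus), weights=p, k=reads):
            row[otu] += 1
        theta.append(mix)
        counts.append(row)
    return counts, theta, beta


# Small illustrative run; the defaults reproduce the dimensions given in the text.
counts, theta, beta = simulate(n_samples=21, n_otus=50, reads=100)
```

The occurrence dataset is then obtained from `counts` by replacing every non-zero count with 1.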
Figure 3: LDA decomposition of simulated occurrence and abundance data. LDA applied to a simulated dataset with 5 assemblages, 1,000 MOTUs, 1,131 samples, and 1,000 sequence reads per sample, (a) for the original abundance data, and (b) for the occurrence data derived from the same dataset. Each plot shows the assemblage proportions estimated by LDA for $K = 5$ (coloured lines; only the realization with highest likelihood out of 100 is shown) and the simulated assemblage proportions (dashed black lines). Axes: samples (x) and assemblage proportions in samples (y, from 0 to 1).
6. Tropical forest soil metabarcoding dataset
We applied LDA to an empirical metabarcoding dataset describing the biodiversity of bacteria, protists and metazoans over a 300 × 400 m tropical forest plot (called Petit Plateau; Chave et al., 2008) at the Nouragues Ecological Research Station, in a lowland tropical forest of central French Guiana (Bongers et al., 2001). Site conditions, data collection, laboratory procedures, and sequence filtering procedures are all described in detail in Zinger et al. (2017) and are only briefly summarized here.
The sampling campaign was conducted towards the end of the 2012 dry season. Soil samples were collected from the mineral horizon (~10 cm deep) using a soil auger every 10 m on a square grid covering the plot and excluding the edges, which resulted in 1,131 soil samples (Fig. S1). Extracellular DNA was extracted in the field from each sample (Zinger et al., 2016). The present study uses data from two DNA barcodes amplified by PCR and sequenced on high-throughput Illumina sequencers, targeting bacteria (16S rDNA) and all eukaryotes (18S rDNA). The sequencing data were curated using the OBITools package (Boyer et al., 2016). Sequences were clustered into OTUs based on their similarity using the Infomap algorithm (Rosvall et al., 2009) with a similarity cut-off of 3 mismatches, so as to cluster spurious sequences resulting from PCR and sequencing errors. Each OTU was given a taxonomic assignment by comparing its sequence to the following reference databases: GenBank r197 for the eukaryotic 18S marker, and SILVA for the bacterial 16S marker. Sequence matching to databases was conducted using the ecotag program included in the OBITools package. Based on these taxonomic assignments, we further split the eukaryotic 18S dataset into protists, arthropods, annelids, nematodes, and flatworms (Platyhelminthes).
Out of the 1,131 samples, a number of samples were excluded from the sequencing results for each barcode due to insufficient PCR yields (7.2% of samples for bacteria and 0.2% for eukaryotes). We interpolated the content of the missing samples by sampling with replacement the mean number of reads per sample from the (up to eight) non-empty nearest neighbouring samples on the grid. We then applied LDA either to the read-abundance data or to the occurrence-transformed data, defining the absence of an OTU in a sample strictly as zero read-abundance in the sample. We did not trim the data for rare OTUs, or for OTUs represented in a single sample.
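As an illustration, the interpolation of missing samples and the occurrence transformation could be written as below. This is a sketch under stated assumptions: the reads of the neighbouring samples are pooled before resampling, and the number of reads drawn is the rounded mean over the neighbours actually used. Both choices are one possible reading of the procedure, and the names `interpolate_missing` and `to_occurrence` are hypothetical.

```python
import random


def to_occurrence(counts):
    """Presence (1) for at least one read, absence (0) strictly at zero reads."""
    return [1 if c > 0 else 0 for c in counts]


def interpolate_missing(grid, seed=0):
    """grid maps (row, col) to a list of read counts per OTU, or None if missing."""
    rng = random.Random(seed)
    filled = dict(grid)
    for (r, c), counts in grid.items():
        if counts is not None:
            continue
        # Up to eight nearest neighbours on the square grid, non-empty only.
        neighbours = [grid.get((r + dr, c + dc))
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if (dr, dc) != (0, 0)]
        neighbours = [n for n in neighbours if n is not None and sum(n) > 0]
        if not neighbours:
            continue  # nothing to interpolate from: the sample stays missing
        pooled = [sum(col) for col in zip(*neighbours)]
        n_reads = round(sum(pooled) / len(neighbours))  # mean reads per neighbour
        row = [0] * len(pooled)
        # Sampling with replacement from the pooled neighbouring reads.
        for otu in rng.choices(range(len(pooled)), weights=pooled, k=n_reads):
            row[otu] += 1
        filled[(r, c)] = row
    return filled


# Toy 3 x 3 grid with a missing central sample (illustrative values only).
grid = {(r, c): [10, 0, 0] for r in range(3) for c in range(3)}
grid[(1, 1)] = None
filled = interpolate_missing(grid)
occurrence = {site: to_occurrence(row) for site, row in filled.items()}
```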
A fine-grained description of the forest canopy structure and topography was obtained using a small-footprint LiDAR survey carried out over the sampling site in the same year as the soil sampling (2012; Rejou-Mechain et al., 2015). This allowed the generation of maps of topography, slope, and canopy height from the LiDAR cloud of points. The topography of the plot is relatively smooth, with a maximal difference in elevation of 30 m. Maps of soil wetness (Beven & Kirkby, 1979) and light at ground level were also derived from the LiDAR measurements (Tymen et al., 2017). We compared the LiDAR-derived data with the metabarcoding data by computing the mean values of the environmental variables over 10-m-by-10-m cells centred on the soil sampling points. We sought a biological interpretation for the retrieved assemblages by comparing their spatial distribution to the distribution of LiDAR-obtained environmental variables. To do so, we computed Pearson’s correlation coefficient between the spatial distributions and assessed the significance of the correlation by performing 100,000 spatial randomizations, i.e. randomly shifting one spatial distribution with respect to the other, so as to account for spatial autocorrelation.
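A spatial randomization test of this kind can be sketched as follows. The sketch assumes that both distributions are stored on the same rectangular grid, that shifts wrap around the grid edges (toroidal shifts), and that the test is two-sided. None of these details is specified in the text, and the function names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def shift_grid(grid, dr, dc):
    """Shift a 2-D grid by (dr, dc) cells, wrapping around the edges."""
    rows, cols = len(grid), len(grid[0])
    return [[grid[(r - dr) % rows][(c - dc) % cols] for c in range(cols)]
            for r in range(rows)]


def flatten(grid):
    return [v for row in grid for v in row]


def spatial_randomization_test(a, b, n_rand=1000, seed=0):
    """Observed correlation and p-value from random shifts of a relative to b."""
    rng = random.Random(seed)
    rows, cols = len(a), len(a[0])
    r_obs = pearson(flatten(a), flatten(b))
    hits = 0
    for _ in range(n_rand):
        dr, dc = rng.randrange(rows), rng.randrange(cols)
        r = pearson(flatten(shift_grid(a, dr, dc)), flatten(b))
        if abs(r) >= abs(r_obs):
            hits += 1
    # The +1 terms count the observed configuration among the randomizations.
    return r_obs, (hits + 1) / (n_rand + 1)


# Toy example: a smooth gradient compared with itself (illustrative only).
a = [[r + 0.1 * c for c in range(6)] for r in range(6)]
r_obs, p_value = spatial_randomization_test(a, a, n_rand=200)
```

Because shifting preserves the spatial autocorrelation of each map, the null distribution of the correlation coefficient is wider than under an independent permutation of the cells, which makes the test more conservative for smooth maps.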
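Table 1: Stability of LDA decomposition for occurrence data. For each of the taxonomic groups under study: total number of MOTUs (richness); optimal number of assemblages $K$ obtained from AIC minimization; spatial and taxonomic stability for three assemblages ($K = 3$) as measured by the $\langle S \rangle_{100}$ and $I_{100}$ metrics; estimated value of the mixing parameter $\alpha$ in the best realization out of 100 for three assemblages.

| Taxonomic group | Richness | Optimal $K$ (AIC) | $\langle S_{\mathrm{spat.}} \rangle_{100}$ | $I_{\mathrm{spat.},100}$ | $\langle S_{\mathrm{taxo.}} \rangle_{100}$ | $I_{\mathrm{taxo.},100}$ | $\alpha$ (best realization) |
|---|---|---|---|---|---|---|---|
| Bacteria 16S | 20,162 | 5 | 0.85 | 1.0 | 0.95 | 1.0 | 0.16 |
| Protists 18S | 1,648 | 2 | 0.68 | 1.0 | 0.95 | 1.0 | 0.082 |
| Arthropods 18S | 1,881 | 2 | 0.62 | 0.62 | 0.91 | 0.93 | 0.11 |
| Nematodes 18S | 378 | 2 | 0.33 | 0.49 | 0.88 | 0.94 | 0.05 |
| Platyhelminthes 18S | 126 | 2 | 0.52 | 0.50 | 0.86 | 0.88 | 7.0 |
| Annelids 18S | 51 | 2 | 0.41 | 0.57 | 0.83 | 0.90 | 0.035 |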
Results
We first applied Latent Dirichlet Allocation decomposition to a simulated dataset, and compared the results for abundance and occurrence data. AIC minimization correctly recovered the simulated number of assemblages (five) in both cases (Fig. S2). In the case of occurrence data, LDA yielded more even assemblage mixtures than simulated (Fig. 3). The algorithm reached the optimal solution more reliably for occurrence data than for abundance data ($\langle S \rangle_{100} = 0.98$ for occurrence data, $\langle S \rangle_{100} = 0.89$ for abundance data, $I_{100} = 1.0$ in both cases; cf. Fig. S2). Next, we applied the analysis to the tropical forest soil dataset. Using the read-abundance data, the optimal number of assemblages was always larger than 50, while it ranged between 2 and 5 for the occurrence data, depending on the taxonomic group (Table 1). As also observed on the simulated data, the LDA algorithm converged more reliably toward the optimal solution for occurrence data than for abundance data (Table S1). Thus we conclude that LDA can be effectively applied to occurrence-based biodiversity data. In the remainder of this section, we describe results obtained using occurrence data and assuming $K = 3$ assemblages, a value close to that minimizing the AIC across taxonomic groups.
We found clear differences when comparing bacteria and unicellular eukaryotes (henceforth denoted as protists) to metazoans (arthropods, annelids, flatworms and nematodes). Bacteria and protists displayed a stronger spatial structure at the scale of our study plot, as deduced from the spatial stability of the decomposition: the similarity intercept $I_{\mathrm{spat.},100}$ was equal to 1.0 (Table 1, Fig. S3), with a mean similarity across realizations $\langle S_{\mathrm{spat.}} \rangle_{100}$ of 0.85 and 0.68, respectively (Table 1). In contrast, metazoans displayed a lower similarity intercept ($0.49 \le I_{\mathrm{spat.},100} \le 0.62$), and also a lower similarity across realizations ($0.33 \le \langle S_{\mathrm{spat.}} \rangle_{100} \le 0.62$). We also found that spatial structure was positively correlated to taxonomic diversity, measured by OTU richness (correlation coefficient $\rho = 0.85$ between $\langle S_{\mathrm{spat.}} \rangle_{100}$ and the number of OTUs; Fig. 4). The taxonomic stability of the assemblages was higher than their spatial stability, following the same trends as the spatial stability, but with less pronounced differences among taxonomic groups (Table 1).
Figure 4. Stability for occurrence data measured as the mean similarity across realizations $\langle S \rangle_{100}$, as a function of the number of OTUs. The metric $\langle S \rangle_{100}$ is measured based on the (a) spatial and (b) taxonomic similarity between assemblages. The blue line shows a linear regression, and the shaded area its standard error. Pearson’s correlation coefficient is $\rho = 0.85$ for spatial stability and $\rho = 0.92$ for taxonomic stability.
For all taxonomic groups except flatworms, the estimates of the mixing parameter $\alpha$ were much smaller than 1 (Table 1), indicating a strong spatial segregation among assemblages. In bacteria and protists, the decomposition into three assemblages was strongly linked to topographical features (Fig. 5, Table S2). The blue assemblage of Fig. 5 was associated with terra firme areas, defined as areas of higher topography, gentler slope, and lower soil wetness. The green assemblage was associated with hydromorphic areas, defined as displaying the opposite environmental correlations (Table S2). Finally, the spatial distribution of the red assemblage matched the location of exposed rock patches that are scattered across the forest plot, based on direct observations. In metazoans, we were unable to identify similar terra firme and hydromorphic assemblages (Fig. S4, Table S2); however, one assemblage in arthropods and nematodes did match the exposed rock spatial pattern (Table S3). This exposed rock assemblage was indeed consistently found to be the most taxonomically distinctive in all taxonomic groups. Neither light at ground level nor canopy height explained the LDA decomposition in any of the taxonomic groups.
[Figure 4, panels a and b. x-axis: number of OTUs (log10), from 2 to 5; y-axis: (a) spatial stability, from 0.25 to 1.00, and (b) taxonomic stability, from 0.80 to 1.00. Points are labelled Bacteria, Protists, Arthropods, Nematodes, Platyhelminthes and Annelids.]
Discussion
Large environmental DNA datasets offer a unique opportunity to address some of the major challenges in community ecology. As a result, data accumulation is accelerating, creating the need for novel methods adapted to these data. Here we have presented the potential of the Latent Dirichlet Allocation method for the analysis of metabarcoding data. This model-based method is adapted to large and sparse datasets. It assigns a probability weight for each sample to belong to an assemblage based on the OTUs in this sample, and also infers the composition in OTUs of each assemblage (see Table S4). It thus goes beyond a categorical classification of samples and generates biologically interpretable assemblages. Here, we further elaborate on the advantages and limitations of this approach, and on the implications for the analysis of the forest soil dataset.
Discussing the assumptions of LDA. Unlike in classical multivariate methods, no prior transformation of the data is required: input data consist of discrete OTU abundances, or occurrences, and sample sizes may vary across samples. Input data are not required to meet a normality assumption, the definition of a dissimilarity metric is not required, and LDA thus makes a more parsimonious use of the data. The assumptions made by the underlying model are minimal: the Dirichlet prior is the natural (conjugate) prior for the parameters of the categorical distribution, and it is sufficiently flexible to fit most datasets (O’Brien & Record, 2016). One could take a step toward more mechanistic modelling by adding more assumptions to the LDA approach. For instance, one could assume that neutral dynamics take place within assemblages, so that their taxonomic composition follows the taxa-abundance distribution predicted by Hubbell’s neutral theory of biodiversity (Hubbell, 2001; Harris et al., 2015). Assuming a Dirichlet prior also on the taxonomic composition of assemblages, as done in the Bayesian version of LDA (Griffiths & Steyvers, 2004; Valle et al., 2014), is a first step in that direction, since the Dirichlet distribution approximates the neutral taxa-abundance distribution for a large number of taxa.
Assessing the robustness of the LDA decomposition and selecting the number of assemblages. In many applications of LDA, the question of the robustness of the decomposition is crucial. However, the robustness of the algorithm, as measured by the similarity of the output across runs, has rarely been assessed, probably because it entails a serious computational burden. Here we have proposed a practical way to measure the similarity across runs based on the symmetrised Kullback-Leibler divergence, and have used it to quantify how stable the decomposition is with respect to initialization. We have computed two complementary stability metrics. First, $\langle S \rangle$ measures the mean similarity across pairs of realizations. This stability metric is general since it is not centred on the best realization, and measures how strongly structured the data are. Second, $I$ is the similarity intercept obtained by comparing the highest-likelihood realization to all others through a linear regression of their similarity against their log-likelihood difference. This second stability metric takes account of the likelihood information, is less computationally intensive, and is used to assess whether the realization with highest likelihood has reached the optimal solution.
The symmetrised Kullback-Leibler (sKL) divergence is suited to assessing stability because it is sensitive to small differences between distributions. However, it is unbounded, which makes it difficult to interpret. By normalizing the sKL divergence by its mean value over randomizations, we defined a similarity index $\sigma$ equal to 1 when the distributions are identical and to 0 when they are no more similar than expected by chance. Because the randomizations are spatial, this index also accounts for spatial autocorrelation in the data.
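A similarity index with these two properties can be sketched as one minus the ratio of the observed sKL divergence to its mean over randomizations. This functional form is an assumption consistent with the description above, not a transcription of the formula used in this study. The sketch also assumes one-dimensional distributions randomized by circular shifts, and a small pseudo-count to avoid taking the logarithm of zero.

```python
import math
import random


def normalise(weights, eps=1e-12):
    """Add a small pseudo-count (assumption) and rescale to sum to 1."""
    total = sum(weights) + eps * len(weights)
    return [(w + eps) / total for w in weights]


def skl(p, q):
    """Symmetrised Kullback-Leibler divergence, KL(p||q) + KL(q||p)."""
    p, q = normalise(p), normalise(q)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def similarity_index(p, q, n_rand=1000, seed=0):
    """1 - sKL(p, q) / mean sKL over random circular shifts of q."""
    rng = random.Random(seed)
    n = len(q)
    null = []
    for _ in range(n_rand):
        s = rng.randrange(1, n)  # non-zero shift of q relative to p
        null.append(skl(p, q[s:] + q[:s]))
    return 1.0 - skl(p, q) / (sum(null) / n_rand)


# Toy distributions over four sites (illustrative values only).
p = [0.5, 0.3, 0.1, 0.1]
q = [0.3, 0.1, 0.1, 0.5]
sigma_identical = similarity_index(p, p)
sigma_shifted = similarity_index(p, q)
```

With this form the index can become negative when two distributions are less similar than expected by chance.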
To compute the similarity between two decompositions, we consider only the similarity between the best-matching assemblages of both decompositions, thus discarding part of the information. This method works well when the decompositions are similar; however, similarity is undesirably low when assemblages are merged or split between the two decompositions. This could be corrected by computing the sKL divergence between the full partitioning of the data in both decompositions, i.e. the assignment of every sequence read to an assemblage, instead of comparing pairs of assemblages. While this is the approach advocated in the clustering literature (Meila, 2006; Vinh et al., 2010), it would likely be very computationally intensive in the case of LDA.
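The matching step can be illustrated as follows. The text does not state how the best-matching assemblages are identified, so the sketch assumes a one-to-one matching that maximizes the total similarity, found by exhaustive search over permutations (tractable only for a small number of assemblages such as $K = 3$). The function name `best_matching` is hypothetical.

```python
from itertools import permutations


def best_matching(sim):
    """One-to-one matching of assemblages maximising the summed similarity.

    sim[i][j] is the similarity between assemblage i of the first
    decomposition and assemblage j of the second one (K x K matrix).
    Returns the matching and the mean similarity of the matched pairs.
    """
    k = len(sim)
    best = max(permutations(range(k)),
               key=lambda perm: sum(sim[i][perm[i]] for i in range(k)))
    score = sum(sim[i][best[i]] for i in range(k)) / k
    return list(best), score


# Toy similarity matrix between two decompositions with K = 3.
sim = [[0.1, 0.9, 0.2],
       [0.8, 0.1, 0.3],
       [0.2, 0.1, 0.7]]
matching, score = best_matching(sim)
```

When one assemblage of the first decomposition is split into two in the second, one of the matched pairs necessarily has a low similarity, which is the situation described above.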
There is no unique method to select the number of LDA component units (Airoldi et al., 2010). Here we use AIC minimization as an indication of the optimal number of assemblages. Another commonly used method consists in splitting the data into a learning set and a test set, and optimizing the predictive power of the model on the test set, as measured by a perplexity function (Blei et al., 2003). A more sophisticated method is to follow the non-parametric modelling approach of Teh et al. (2006), where the number of assemblages is modelled as a random latent variable that is estimated from the data. However, this method proved to have convergence issues on our empirical data. Stability of the algorithm’s output could also be used as a criterion to select the number of assemblages. When a large number of LDA component units is selected, an additional step of analysis using simpler statistical methods may be needed to represent and interpret the result of the LDA decomposition (Mauch et al., 2015).
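AIC-based selection can be sketched with the standard definition $\mathrm{AIC} = 2k - 2\ln L$, where $k$ is the number of free parameters and $L$ the maximized likelihood. The parameter count used below, $K(V - 1) + 1$ for $K$ assemblages over $V$ OTUs plus one mixing parameter, is an assumption made for illustration and may differ from the count used in this study. The log-likelihood values are invented.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def n_params_lda(k, n_otus):
    """Assumed parameter count: K compositions over V OTUs plus one alpha."""
    return k * (n_otus - 1) + 1


def select_k(loglik_by_k, n_otus):
    """Return the number of assemblages minimising the AIC, and all scores."""
    scores = {k: aic(ll, n_params_lda(k, n_otus))
              for k, ll in loglik_by_k.items()}
    return min(scores, key=scores.get), scores


# Invented log-likelihoods of the best realization for K = 2, 3, 4.
loglik_by_k = {2: -5000.0, 3: -4800.0, 4: -4750.0}
best_k, scores = select_k(loglik_by_k, n_otus=100)
```

In this toy case the gain in likelihood from a fourth assemblage does not compensate for the additional parameters, so three assemblages are selected.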
Tropical forest soil biodiversity decomposition. By applying LDA to an environmental DNA dataset, we described the spatial structure of bacterial, protist and metazoan soil communities in a 12-ha tropical forest plot. The spatial patterns retrieved by LDA for these taxonomic groups allowed us to shed light on soil community structure (see also Zinger et al., 2017).
We applied the LDA algorithm to metabarcoding data with no further transformation than clustering the sequences to avoid defining spurious OTUs. We verified that the interpolation of missing samples played no role in generating the observed patterns. The AIC minimization yielded between 2 and 5 assemblages for occurrence data depending on the taxonomic group, but we used the value $K = 3$ across groups to facilitate intercomparison and because the LDA decomposition is robust to changes in the number of assemblages close to the optimum (Fig. S7). For the 20,162-OTU bacterial dataset, the largest dataset considered in this study, numerical inference of the LDA decomposition for three assemblages took about 25 minutes for occurrence data and 35 minutes for abundance data, which amounts to respectively 48 and 60 hours when running 100 realizations of the algorithm to test stability.
[Figure 5 panel labels: rows Abundance and Occurrence; columns Bacteria, Protists and Fungi; environmental maps of topography (110 to 130 m a.s.l.), Topographic Wetness Index (0 to 10) and slope (0.1 to 0.5); scale bar 100 m.]
Figure 5: Spatial distribution of microorganism assemblages, for $K = 3$ assemblages. Spatial distribution of the assemblages obtained from independent LDA decompositions of bacteria and protists, for (a-b) abundance and (c-d) occurrence data. Sampled locations are indicated by dark dots, and the assemblage mixture between samples has been interpolated using ordinary kriging. Terra firme (in blue), hydromorphic (in green) and exposed rock (in red) assemblages can be identified in each taxonomic group, based on correlations to (e-f) LiDAR-derived topography, Topographic Wetness Index and slope, as well as on field observations. The spatial patterns retrieved for abundance data are similar to those obtained with occurrence data but less strongly correlated to topographic variables.
The stability analysis of the algorithm indicates that communities of unicellular organisms (i.e. bacteria and protists) are markedly structured at the scale of the plot, while metazoan communities are less so. The stability of the decomposition is also strongly correlated with the number of OTUs, which spans several orders of magnitude across taxonomic groups (Fig. 4, Table 1). Thus, the lower statistical power in groups containing fewer OTUs could explain this pattern. However, it is more likely due to ecological differences between groups. Indeed, this pattern is confirmed by Zinger et al. (2017) using ordination-based variation partitioning between environmental and spatial components.
Furthermore, the two unicellular organism groups can each be decomposed into three spatially segregated assemblages matching plot topography. While the covariation of microorganism composition with topography was already detected in Zinger et al. (2017), spatial patterns can here be directly represented in the form of assemblages that are characteristic of the different topographic conditions (Fig. 5, Table S4). These spatial patterns can also be shown to be similar between bacteria and protists, which is both a novel insight and a hint that the assemblages retrieved by LDA do reflect community structure. One assemblage associated with patches of exposed rock was retrieved in bacteria and protists but also in arthropods and nematodes. Its taxonomic composition is particularly distinctive (Fig. S7), which might be explained by the high amount of decaying organic matter retained between the boulders in these patches. A current limitation of LDA is that its ability to compare taxonomic composition to environmental data is limited to computing simple correlations between the spatial distribution of retrieved assemblages and environmental variables. This is in contrast to ordination-based methods such as Canonical Redundancy Analysis, and improving on this aspect would be a useful direction of research.
Using occurrence versus abundance data. The use of occurrence data was computationally faster, and led to more stable and more interpretable patterns. Because biodiversity data typically display a wide range of taxonomic abundances (Fig. S5), switching from abundance to occurrence data amounts to dramatically increasing the weight of rare taxa. In the empirical dataset, these OTUs constitute the bulk of the diversity: OTUs tallying on average less than one sequence read per sample make up over 85% of the total number of OTUs in bacteria and protists (Fig. S5). They play a significant role in shaping the patterns, since removing them erases the retrieved occurrence-based spatial patterns (Fig. S6). This hints at the importance of rare taxa in defining communities of microorganisms. A possible caveat, however, is that some of those rare OTUs might be generated by remnant PCR errors in the data. If PCR errors are repeatable for a given DNA sequence, this would produce groups of consistently co-occurring OTUs and thus artificially increase the stability of occurrence-based patterns.
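The change in the weight of rare taxa can be made concrete with a short sketch. It uses the threshold mentioned above (on average less than one read per sample) and compares the share of rare OTUs in the total number of reads with their share in the total number of occurrences. The function name and the toy count table are hypothetical.

```python
def rare_otu_summary(counts):
    """counts[s][v] is the number of reads of OTU v in sample s."""
    n_samples, n_otus = len(counts), len(counts[0])
    reads = [sum(row[v] for row in counts) for v in range(n_otus)]
    occurrences = [sum(1 for row in counts if row[v] > 0)
                   for v in range(n_otus)]
    # Rare OTUs: on average less than one read per sample.
    rare = [reads[v] / n_samples < 1 for v in range(n_otus)]
    return {
        "rare_fraction_of_otus": sum(rare) / n_otus,
        "rare_share_of_reads":
            sum(r for r, is_rare in zip(reads, rare) if is_rare) / sum(reads),
        "rare_share_of_occurrences":
            sum(o for o, is_rare in zip(occurrences, rare) if is_rare)
            / sum(occurrences),
    }


# Toy table: one dominant OTU and two rare ones, in three samples.
counts = [[100, 1, 0],
          [80, 0, 1],
          [120, 0, 0]]
summary = rare_otu_summary(counts)
```

In this toy table the two rare OTUs carry less than 1% of the reads but 40% of the occurrences, which illustrates how the occurrence transformation shifts weight toward rare taxa.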
Conclusion. LDA is an efficient method to detect structure in the large and complex datasets generated by environmental DNA sequencing methods. The representation of spatial biodiversity patterns derived from LDA is easily interpretable, and the method comes with a measure of how strongly this representation is supported by the data. LDA could be used to explore the biogeographic patterns arising in larger-scale DNA-based biodiversity surveys such as the Earth Microbiome Project (Gilbert et al., 2014) and the Tara Oceans Project (Sunagawa et al., 2015). It could also be applied in non-spatial sampling designs, such as time series. Lastly, LDA is one example of a family of models, which could for instance find applications in the study of plant-microorganism interactions (Rosen-Zvi et al., 2004). We hope this study will stimulate research on model-based methods of data analysis for the ecological interpretation of environmental DNA studies.
Acknowledgements
We thank Dylan Craven, Bart Haegeman, Hélène Morlon, Tim Paine, Mélanie Roy and Marc-André Selosse for fruitful discussions. We thank Blaise Tymen for his help with the LiDAR data. This work has benefited from “Investissement d’Avenir” grants managed by the French Agence Nationale de la Recherche (CEBA, ref. ANR-10-LABX-25-01 and TULIP, ref. ANR-10-LABX-0041; ANAEE-France: ANR-11-INBS-0001), an additional ANR grant (METABAR project; PI P. Taberlet), and funds from CNRS. We are grateful to the Genotoul bioinformatics platform Toulouse Midi-Pyrénées for providing computing resources.
References
Airoldi, E.M., Erosheva, E.A., Fienberg, S.E., Joutard, C., Love, T. & Shringarpure, S. (2010) Reconceptualizing the classification of PNAS articles. PNAS, 107, 20899–20904.
Andersen, K., Bird, K.L., Rasmussen, M., Haile, J., Breuning-Madsen, H., Kjaer, K.H., Orlando, L., Gilbert, M.T.P. & Willerslev, E. (2012) Meta-barcoding of “dirt” DNA from soil reflects vertebrate biodiversity. Molecular Ecology, 21, 1966–1979.
Balagopalan, A. (2012) Improving Topic Reproducibility in Topic Models.
Beven, K.J. & Kirkby, M.J. (1979) A physically based, variable contributing area model of basin hydrology. Hydrological Sciences Bulletin, 24, 43–69.
Blei, D. (2012) Probabilistic Topic Models. Communications of the Association for Computing Machinery, 55, 77–84.
Blei, D.M., Ng, A.Y. & Jordan, M.I. (2003) Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
Bongers, F., Charles-Dominique, P., Forget, P.-M. & Théry, M. (2001) Nouragues: dynamics and plant-animal interactions in a Neotropical rainforest, Springer Science & Business Media.
Boyer, F., Mercier, C., Bonin, A., Le Bras, Y., Taberlet, P. & Coissac, E. (2016) OBITOOLS: a UNIX-inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16, 176–182.
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multimodel inference, Springer, New York.
Chave, J., Olivier, J., Bongers, F., Châtelet, P., Forget, P.-M., van der Meer, P., Norden, N., Riéra, B. & Charles-Dominique, P. (2008) Above-ground biomass and productivity in a rain forest of eastern South America. Journal of Tropical Ecology, 24, 355–366.
Ding, T. & Schloss, P.D. (2014) Dynamics and associations of microbial community types across the human body. Nature, 509, 357–+.
Fortin, M.J. & Payette, S. (2002) How to test the significance of the relation between spatially autocorrelated data at the landscape scale: A case study using fire and forest maps. Ecoscience, 9, 213–218.
Gilbert, J.A., Jansson, J.K. & Knight, R. (2014) The Earth Microbiome project: successes and aspirations. BMC Biology, 12.
Griffiths, T. & Steyvers, M. (2004) Collapsed Gibbs Sampling for LDA. 101, 5228–5235.
Grün, B. & Hornik, K. (2011) topicmodels: an R package for fitting topic models.
Harris, K., Parsons, T.L., Ijaz, U.Z., Lahti, L., Holmes, I. & Quince, C. (2015) Linking statistical and ecological theory: Hubbell’s Unified Neutral Theory of Biodiversity as a Hierarchical Dirichlet Process. Proc. IEEE, PP, 1–14.
Holmes, I., Harris, K. & Quince, C. (2012) Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. PLoS ONE, 7.
Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32), Princeton University Press.
Kembel, S.W., Wu, M., Eisen, J.A. & Green, J.L. (2012) Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance. PLoS Computational Biology, 8, 11.
Klymus, K.E., Richter, C.A., Chapman, D.C. & Paukert, C. (2015) Quantification of eDNA shedding rates from invasive bighead carp Hypophthalmichthys nobilis and silver carp Hypophthalmichthys molitrix. Biological Conservation, 183, 77–84.
Knights, D., Kuczynski, J., Charlson, E.S., Zaneveld, J., Mozer, M.C., Collman, R.G., Bushman, F.D., Knight, R. & Kelley, S.T. (2011) Bayesian community-wide culture-independent microbial source tracking. Nature Methods, 8, 761-U107.
Kullback, S. (1959) Information Theory and Statistics, John Wiley & Sons.
Legendre, P. & Legendre, L. (2012) Numerical Ecology, Elsevier.
Mauch, M., MacCallum, R.M., Levy, M. & Leroi, A.M. (2015) The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2, 150081–150081.
Meila, M. (2006) Comparing clusterings – an information based distance. Journal of Multivariate Analysis, 98, 873–895.
Nguyen, N.H., Smith, D., Peay, K. & Kennedy, P. (2015) Parsing ecological signal from noise in next generation amplicon sequencing. New Phytologist, 205, 1389–1393.
O’Brien, J. & Record, N. (2016) The power and pitfalls of Dirichlet-multinomial mixture models for ecological count data. BioRxiv preprint.
Pritchard, J.K., Stephens, M. & Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945–959.
Rejou-Mechain, M., Tymen, B., Blanc, L., Fauset, S., Feldpausch, T.R., Monteagudo, A., Phillips, O.L., Richard, H. & Chave, J. (2015) Using repeated small-footprint LiDAR acquisitions to infer spatial and temporal variations of a high-biomass Neotropical forest. Remote Sensing of Environment, 169, 93–101.
Rosen-Zvi, M., Griffiths, T., Steyvers, M. & Smyth, P. (2004) The Author-Topic Model for Authors and Documents.
Rosvall, M., Axelsson, D. & Bergstrom, C.T. (2009) The map equation. The European Physical Journal Special Topics, 178, 13–23.
Shafiei, M., Dunn, K.A., Boon, E., MacDonald, S.M., Walsh, D.A., Gu, H. & Bielawski, J.P. (2015) BioMiCo: a supervised Bayesian model for inference of microbial community structure. Microbiome, 3, 8.
Sommeria-Klein, G., Zinger, L., Taberlet, P., Coissac, E. & Chave, J. (2016) Inferring neutral biodiversity parameters using environmental DNA datasets. Scientific Reports, 6.
Steyvers, M. & Griffiths, T. (2007) Probabilistic Topic Models. Latent Semantic Analysis: A Road to Meaning (ed. by T. Landauer, D. McNamara, S. Dennis, and W. Kintsch), Lawrence Erlbaum.
Sunagawa, S., Coelho, L.P., Chaffron, S., Kultima, J.R., Labadie, K., Salazar, G., Djahanschiri, B., Zeller, G., Mende, D.R., Alberti, A., Cornejo-Castillo, F.M., Costea, P.I., Cruaud, C., d’Ovidio, F., Engelen, S., Ferrera, I., Gasol, J.M., Guidi, L., Hildebrand, F., Kokoszka, F., Lepoivre, C., Lima-Mendez, G., Poulain, J., Poulos, B.T., Royo-Llonch, M., Sarmento, H., Vieira-Silva, S., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Bowler, C., de Vargas, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Jaillon, O., Not, F., Ogata, H., Pesant, S., Speich, S., Stemmann, L., Sullivan, M.B., Weissenbach, J., Wincker, P., Karsenti, E., Raes, J., Acinas, S.G., Bork, P. & Tara Oceans, C. (2015) Structure and function of the global ocean microbiome. Science, 348.
Taberlet, P., Coissac, E., Hajibabaei, M. & Rieseberg, L.H. (2012) Environmental DNA. Molecular Ecology, 21, 1789–1793.
Teh, Y.W., Jordan, M.I., Beal, M.J. & Blei, D.M. (2006) Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101, 1566–1581.
Than, K. & Ho, T.B. (2012) Fully Sparse Topic Models.
Thomsen, P.F. & Willerslev, E. (2015) Environmental DNA - An emerging tool in conservation for monitoring past and present biodiversity. Biological Conservation, 183, 4–18.
Tymen, B., Vincent, G., Courtois, E.A., Heurtebize, J., Dauzat, J., Marechaux, I. & Chave, J. (2017) Quantifying micro-environmental variation in tropical rainforest understory at landscape scale by combining airborne LiDAR scanning and a sensor network. Annals of Forest Science, 2, 1–13.
Valle, D., Baiser, B., Woodall, C.W. & Chazdon, R. (2014) Decomposing biodiversity data using the Latent Dirichlet Allocation model, a probabilistic multivariate statistical method. Ecology Letters, 17, 1591–1601.
Vinh, N.X., Epps, J. & Bailey, J. (2010) Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11, 2837–2854.
Zinger, L., Chave, J., Coissac, E., Iribar, A., Louisanna, E., Manzi, S., Schilling, V., Schimann, H., Sommeria-Klein, G. & Taberlet, P. (2016) Extracellular DNA extraction is a fast, cheap and reliable alternative for multi-taxa surveys based on soil DNA. Soil Biology and Biochemistry, 96, 16–19.
Zinger, L., Taberlet, P., Schimann, H., Bonin, A., Boyer, F., De Barba, M., Gaucher, P., Gielly, L., Giguet-Covex, C., Iribar, A., Rejou-Mechain, M., Raye, G., Rioux, D., Schilling, V., Tymen, B., Viers, J., Zouiten, C., Thuiller, W., Coissac, E. & Chave, J. (2017) Soil community assembly varies across body sizes in a tropical forest. bioRxiv.
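Supplementary Information

Table S1: Stability of LDA decomposition for occurrence and abundance data. For each of the taxonomic groups under study, spatial and taxonomic stability for three assemblages ($K = 3$) as measured by the $\langle S \rangle_{100}$ and $I_{100}$ metrics, for abundance and occurrence data.

| Taxonomic group | $\langle S_{\mathrm{spat.}} \rangle_{100}$ (abundance) | $I_{\mathrm{spat.},100}$ (abundance) | $\langle S_{\mathrm{spat.}} \rangle_{100}$ (occurrence) | $I_{\mathrm{spat.},100}$ (occurrence) | $\langle S_{\mathrm{taxo.}} \rangle_{100}$ (abundance) | $I_{\mathrm{taxo.},100}$ (abundance) | $\langle S_{\mathrm{taxo.}} \rangle_{100}$ (occurrence) | $I_{\mathrm{taxo.},100}$ (occurrence) |
|---|---|---|---|---|---|---|---|---|
| Bacteria 16S | 0.88 | 1.0 | 0.85 | 1.0 | 0.99 | 1.0 | 0.95 | 1.0 |
| Protists 18S | 0.72 | 0.87 | 0.68 | 1.0 | 0.92 | 0.96 | 0.95 | 1.0 |
| Arthropods 18S | 0.46 | 0.65 | 0.62 | 0.62 | 0.65 | 0.78 | 0.91 | 0.93 |
| Nematodes 18S | 0.43 | 0.67 | 0.33 | 0.49 | 0.69 | 0.87 | 0.88 | 0.94 |
| Platyhelminthes 18S | 0.45 | 0.69 | 0.52 | 0.50 | 0.66 | 0.83 | 0.86 | 0.88 |
| Annelids 18S | 0.63 | 0.81 | 0.41 | 0.57 | 0.75 | 0.85 | 0.83 | 0.90 |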
Table S2: Correlation coefficients between the spatial distribution of assemblages and abiotic variables. p-values $p$ were computed based on 100,000 spatial randomizations. Significant correlation coefficients are indicated by *, **, *** ($p < 0.05$, $p < 0.01$, $p < 0.001$), and additionally by bold font when they are consistent with a hydromorphic or terra firme interpretation. Taxonomic groups in bold are those that can be assigned a ‘terra firme’ or ‘hydromorphic’ label based on correlations to topography, wetness and slope, or an ‘exposed rock’ label based on correlation to the ‘exposed rock’ bacterial assemblage (see Table S3).

| Assemblage | Taxonomic group | Topography | Wetness | Slope |
|---|---|---|---|---|
| Terra firme | Bacteria 16S | 0.36** | -0.27** | -0.26** |
| Terra firme | Protists 18S | 0.23** | -0.15** | -0.17** |
| Terra firme | Arthropods 18S | 0.18** | -0.16*** | -0.027 |
| Terra firme | Nematodes 18S | 0.15** | -0.091** | -0.081** |
| Terra firme | Platyhelminthes 18S | 0.12** | -0.11** | -0.043 |
| Terra firme | Annelids 18S | 0.023 | 0.042 | -0.091* |
| Hydromorphic | Bacteria 16S | -0.43*** | 0.40*** | 0.31** |
| Hydromorphic | Protists 18S | -0.23** | 0.10* | 0.21*** |
| Hydromorphic | Arthropods 18S | -0.097* | 0.10** | -0.015 |
| Hydromorphic | Nematodes 18S | -0.096*** | 0.10** | 0.045 |
| Hydromorphic | Platyhelminthes 18S | 0.044 | -0.099** | 0.0087 |
| Hydromorphic | Annelids 18S | -0.057 | 0.058 | 0.022 |
| Exposed rock | Bacteria 16S | 0.00024 | -0.078** | 0.0084 |
| Exposed rock | Protists 18S | -0.052 | 0.084 | -0.025 |
| Exposed rock | Arthropods 18S | -0.12* | 0.083 | 0.070 |
| Exposed rock | Nematodes 18S | -0.071 | -0.018 | 0.049 |
| Exposed rock | Platyhelminthes 18S | -0.14** | 0.19** | 0.027 |
| Exposed rock | Annelids 18S | 0.0098 | -0.072* | 0.075** |
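Table S3: Correlation coefficients $\rho_{\mathrm{spat.}}$ between the spatial distribution of bacterial assemblages and the assemblages in other taxonomic groups. p-values $p$ were computed based on 100,000 spatial randomizations. Significant correlation coefficients are indicated by *, **, *** ($p < 0.05$, $p < 0.01$, $p < 0.001$), and correlation coefficients larger than 0.50 are indicated by bold font. Labels of assemblages are the same as in Table S2.

| Assemblage | Taxonomic group | Bacteria 16S: Terra firme | Bacteria 16S: Hydromorphic | Bacteria 16S: Exposed rock |
|---|---|---|---|---|
| 1st assemblage - Terra firme | Protists 18S | 0.76*** | -0.40*** | -0.53*** |
| 1st assemblage - Terra firme | Arthropods 18S | 0.23*** | 0.12** | -0.43*** |
| 1st assemblage - Terra firme | Nematodes 18S | 0.24*** | -0.19*** | -0.098*** |
| 1st assemblage - Terra firme | Platyhelminthes 18S | 0.32*** | -0.10** | -0.29*** |
| 1st assemblage - Terra firme | Annelids 18S | 0.23*** | 0.0046 | -0.29*** |
| 2nd assemblage - Hydromorphic | Protists 18S | -0.45*** | 0.51*** | 0.022 |
| 2nd assemblage - Hydromorphic | Arthropods 18S | 0.16*** | -0.21*** | 0.022 |
| 2nd assemblage - Hydromorphic | Nematodes 18S | 0.10*** | 0.16*** | -0.29*** |
| 2nd assemblage - Hydromorphic | Platyhelminthes 18S | 0.20*** | -0.13*** | -0.12** |
| 2nd assemblage - Hydromorphic | Annelids 18S | -0.064 | 0.13* | -0.059* |
| 3rd assemblage - Exposed rock | Protists 18S | -0.56*** | -0.055 | 0.76*** |
| 3rd assemblage - Exposed rock | Arthropods 18S | -0.65*** | 0.12* | 0.69*** |
| 3rd assemblage - Exposed rock | Nematodes 18S | -0.48*** | 0.045 | 0.56*** |
| 3rd assemblage - Exposed rock | Platyhelminthes 18S | -0.48*** | 0.20*** | 0.38*** |
| 3rd assemblage - Exposed rock | Annelids 18S | -0.18*** | -0.076** | 0.31*** |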
Table S4: Five most abundant OTUs per bacterial assemblage (out of 20,162 bacterial OTUs), for occurrence and abundance data.
OTU proportion   Taxonomic assignment
Occurrence-based assemblages
Terra firme
  8.1 × 10^-4    Acidobacteria
  8.1 × 10^-4    Acidobacteriaceae (Subgroup 1) sp.
  8.1 × 10^-4    Acidobacteriaceae (Subgroup 1) sp.
  8.0 × 10^-4    Acetobacteraceae sp.
  8.0 × 10^-4    uncultured Holophaga sp.
  …
Hydromorphic
  6.2 × 10^-4    Acidothermaceae sp.
  6.2 × 10^-4    uncultured Holophaga sp.
  6.1 × 10^-4    Nitrosomonadaceae sp.
  5.9 × 10^-4    uncultured Holophaga sp.
  5.9 × 10^-4    Haliangiaceae sp.
  …
Exposed rock
  6.1 × 10^-4    Rhizobiales Incertae Sedis sp.
  5.8 × 10^-4    uncultured Acetobacteraceae bacterium
  5.8 × 10^-4    uncultured Acidobacteriaceae bacterium
  5.7 × 10^-4    Acidobacteriaceae (Subgroup 1) sp.
  5.7 × 10^-4    Bacteria
  …
Abundance-based assemblages
Terra firme
  4.0 × 10^-2    Acidobacteria
  3.0 × 10^-2    uncultured Nitrosococcus sp.
  2.6 × 10^-2    uncultured Bacillaceae bacterium
  2.2 × 10^-2    Acidothermaceae sp.
  1.9 × 10^-2    Alcaligenaceae sp.
  …
Hydromorphic
  1.7 × 10^-2    Alcaligenaceae sp.
  1.6 × 10^-2    uncultured Thermosporotrichaceae bacterium
  1.6 × 10^-2    uncultured Bacillaceae bacterium
  1.4 × 10^-2    Acidobacteria
  1.3 × 10^-2    Acidothermaceae sp.
  …
Exposed rock
  3.8 × 10^-2    Acidothermaceae sp.
  1.6 × 10^-2    Acidothermaceae sp.
  1.5 × 10^-2    uncultured Nitrosococcus sp.
  1.0 × 10^-2    uncultured Steroidobacter sp.
  1.0 × 10^-2    Xanthobacteraceae sp.
  …
Figure S1. Soil sampling over 12 ha of tropical forest. 1,131 soil samples (one every 10 meters) were taken from the mineral soil horizon on a permanent plot of relatively homogeneous primary plateau forest at the Nouragues Ecological Research Station, French Guiana.
[Map labels: 300 m, 400 m, 100 m]
Figure S2. LDA applied to a simulated dataset with 5 assemblages, 1,000 MOTUs, 1,131 samples, and 1,000 sequence reads per sample, (a,c) for the original abundance data, and (b,d) for the occurrence data derived from the same dataset. Panels (a,b) show the comparison between the realization with highest likelihood and the 99 others using the spatial similarity 𝑆_spat.. 𝑆_spat. !"" = 0.98 for occurrence data, 𝑆_spat. !"" = 0.89 for abundance data, 𝐼_spat.,!"" = 1.0 in both cases; cf. Fig. 2. Panels (c,d) show AIC comparison between different K values, with 3 realizations per K value.
[Panels a, b: spatial similarity to best realization (y-axis) against Llh difference with best realization (x-axis). Panels c, d: AIC (y-axis) against number K of assemblages (x-axis).]
Figure S3. Stability of LDA decomposition (K = 3) for the different taxonomic groups. The realization with highest likelihood out of 100 is compared to the 99 others based on their spatial similarity (y-axis) and on their log-likelihood difference (x-axis), for occurrence data and for all the taxonomic groups under study. The intercept I of the linear regression (dashed blue line) shows a difference between unicellular organisms (𝐼 = 1.0) and metazoans (𝐼 < 0.62).
[Panels: a - Bacteria, b - Protists, c - Arthropods, d - Nematodes, e - Platyhelminthes, f - Annelids. x-axis: Llh difference with best realization; y-axis: spatial similarity to best realization.]
Figure S4: Spatial distribution of eukaryotic assemblages, for K = 3 assemblages. Spatial distribution of the assemblages obtained from independent LDA decompositions of arthropods, nematodes, flatworms (Platyhelminthes) and annelids, for occurrence (a-d) and abundance (e-h) data. As in figure 4, sampled locations are indicated by dark dots, and the assemblage mixture between samples has been interpolated using ordinary kriging. For occurrence data, an ‘exposed rock’ assemblage (in red) can be identified in arthropods and nematodes based on spatial correlation to the bacterial ‘exposed rock’ assemblage (Table S3). An ‘exposed rock’ assemblage may be distinguished in flatworms and annelids as well but is less conspicuous there.
[Columns: Arthropods, Nematodes, Platyhelminthes, Annelids. Rows: Occurrence (panels a-d), Abundance (panels e-h).]
Figure S5: Ranked log-abundance distribution for bacteria and protists.
[Panels: a - Bacteria, b - Protists, c - Fungi. x-axis: OTUs (or MOTUs) ranked by abundance; y-axis: abundance (log10 of read number).]
Figure S6: Effect of data pre-processing on LDA decomposition. Spatial distribution of the assemblages obtained from independent LDA decompositions of bacteria and protists for occurrence data, (a-b) without any filtering of rare OTUs, (c-d) after removing OTUs occurring only in a single sample, and (e-f) after removing OTUs with fewer than one read per sample on average (low-abundance OTUs). Removing single-sample OTUs brought little change to the decomposition. Removing low-abundance OTUs, on the other hand, yielded very degraded spatial patterns in bacteria and protists, hinting at the important role of rare MOTUs in defining the retrieved assemblages.
[Columns: Bacteria, Protists. Rows: Occurrence (a, b); Occurrence, no single-sample MOTU (c, d); Occurrence, no low-abundance MOTU (e, f).]
Figure S7: Spatial distribution of microorganism assemblages for occurrence data, for K = 3 and for K = K_min[AIC] assemblages (bacteria: K_min[AIC] = 5; protists: K_min[AIC] = 2). Decompositions for K = K_min[AIC] and for K = 3 differ primarily through splitting or merging of assemblages, without major disruption of the spatial patterns. This illustrates the robustness of LDA decomposition to the number of assemblages close to the optimum. The exposed rock assemblage (dark red) is left unchanged for K between 2 and 5 in bacteria and protists, which indicates a strong taxonomic distinctiveness.
[Columns: Bacteria, Protists, Fungi. Rows: K = 3; K = K_min[AIC].]
Discussion
I. Synthesis
While data acquisition in ecology has long been dominated by low-technology
approaches relying mostly on direct human observation, the field has recently witnessed
a trend toward automated data acquisition. In particular, automated biodiversity
measurements can now be obtained through the sequencing of environmental DNA. This
method originated in microbiology, where it is often the only means to obtain
information on the organisms under study, but can now be applied with increasing ease
to any type of organism. This provides ecologists with an unprecedented influx of data,
which also creates new challenges.
While DNA-based data are a unique means to obtain exhaustive and standardized
biodiversity measurements, it remains uncertain to what extent they will help solve the
classical questions of community ecology. Indeed, these data are obtained through
indirect observation of the targeted organisms, and lack detail and accuracy
compared to direct observations: in a sense, quality is traded for quantity. This entails a
shift from studies rich in biological details toward the study of structure and patterns in
large datasets. Moreover, the sheer amount of data produced is in itself an obstacle to
the use of the classical statistical approaches of ecology. Conversely, current theoretical
models in ecology are often not well suited for comparison with data.
The characteristics of environmental DNA data make them well suited to the
study of integrative patterns of biodiversity, for which the quantity and exhaustiveness
of available data matter more than detailed information on individual taxa. Integrative
patterns have long been a key source of information for addressing one of the core
questions of community ecology: what are the drivers of community assembly, and in
particular, when do dispersal limitation and demographic drift supersede abiotic
filtering and species interactions as the main drivers? The first and third chapters of this
thesis explore the use of environmental DNA data for the study of spatially explicit
biodiversity patterns, while the second chapter focuses on relative species abundances.
These patterns are studied in the tropical forest of French Guiana, a ‘hyperdiverse’ and
poorly known ecosystem, two characteristics that make automated data collection most
needed.
The first chapter shows how environmental DNA data can be used to investigate
the drivers of beta diversity in a spatially explicit context, as has been done previously
for classical data such as tree censuses in monitored forest plots. On a spatial scale
ranging from 40 m to 140 km between sampling points, a decay of taxonomic similarity
with distance is observed in most groups, i.e. plants, fungi, arthropods, insects, annelids,
bacteria, and protists, but not in nematodes and flatworms. Clear differences can be
observed between domains of life regarding the relative influence of geographic
distance and abiotic conditions on beta diversity: the data hint at a predominant effect of
dispersal limitation in plants and annelids, a predominant effect of abiotic filtering in
bacteria and protists, and a mixture of both in fungi, arthropods and insects. The beta
diversity of fungi and soil insects appears to be especially high. These findings are in
agreement with expectations and previous empirical results for plants and unicellular
organisms (Condit et al., 2002; Soininen et al., 2007; Ramirez et al., 2014), but bring
some novel insight for annelids, fungi and insects. In addition, the inclusion of a few
forest plots subject to past logging activities indicates that even after two decades at
least, an effect can be detected on plant and annelid composition, as well as on fungi to a
lesser extent, whereas this is not the case for other groups. Thus, large-scale patterns of
biodiversity can now be readily measured and compared across a tropical forest’s whole
range of taxa using environmental DNA.
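The decay of taxonomic similarity with distance can be made concrete with a minimal sketch. It assumes, for illustration only, that each sample is reduced to a set of detected taxa, that similarity is measured with the Sørensen index, and that the decay is summarized by the least-squares slope of similarity against log10 distance. These choices, the function names and the toy data are hypothetical and do not reproduce the analyses of the first chapter.

```python
import math
from itertools import combinations


def sorensen(a, b):
    """Sørensen similarity between two sets of taxa (1 = identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def distance_decay(samples):
    """Return (distance, similarity) for every pair of samples.

    samples: list of ((x, y), set_of_taxa), coordinates in meters.
    """
    return [(math.dist(xy1, xy2), sorensen(taxa1, taxa2))
            for (xy1, taxa1), (xy2, taxa2) in combinations(samples, 2)]


def decay_slope(pairs):
    """Least-squares slope of similarity against log10(distance)."""
    xs = [math.log10(d) for d, _ in pairs]
    ys = [s for _, s in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy example: two nearby samples share most taxa, a distant one shares few.
toy_samples = [
    ((0.0, 0.0), {"a", "b", "c", "d"}),
    ((100.0, 0.0), {"a", "b", "c", "e"}),
    ((10000.0, 0.0), {"a", "f", "g", "h"}),
]
toy_pairs = distance_decay(toy_samples)
toy_slope = decay_slope(toy_pairs)  # negative when similarity decays
```

A negative slope indicates a decay of similarity with distance; disentangling dispersal limitation from abiotic filtering additionally requires environmental covariates, which this sketch leaves out.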
The second chapter focuses on relative species abundances, a pattern that has
been extensively used to test the predictions of theoretical models of community
assembly, especially since Hubbell’s work on the neutral theory of biodiversity (Hubbell,
2001). A major obstacle in exploiting species abundance patterns generated using
environmental DNA is that abundance information is unreliable, because it is noisy and
difficult to interpret. However, simulations show that even if abundance measurements
are unreliable for individual taxa, valuable information can still be retrieved from the
species abundance distribution as a whole, as long as the noise is not too strong. In
particular, the parameters that characterize diversity and connectivity in a neutral
community may still be reliably estimated. Thanks to the sampling-invariance property of
neutral models, sequencing reads may be used as discrete abundance units in place of
individuals as long as the DNA originates from a number of individuals larger than the
number of reads. While this is usually the case for microorganisms, this condition may not
be met for larger organisms. Lastly, great care should be taken in clustering spurious
OTUs generated during PCR amplification and DNA sequencing, since they strongly bias
neutral parameter estimates.
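As an illustration of the setting discussed above, the following minimal sketch simulates a neutral local community under zero-sum drift with immigration from a fixed metacommunity, and then draws a random subsample standing in for sequencing reads. It is a toy model under simplifying assumptions (the metacommunity is a fixed table of relative abundances, reads are an unbiased random subsample of individuals), not the simulation framework used in the second chapter. The names `simulate_local_community` and `sample_reads`, and the parameters `J` (community size) and `m` (immigration probability), are hypothetical.

```python
import random
from collections import Counter


def simulate_local_community(meta_abundances, J, m, steps, seed=0):
    """Zero-sum neutral drift with immigration.

    meta_abundances: dict mapping species to metacommunity abundance.
    J: number of individuals in the local community.
    m: probability that a dead individual is replaced by an immigrant.
    """
    rng = random.Random(seed)
    species = list(meta_abundances)
    weights = [meta_abundances[s] for s in species]
    # Start from J individuals drawn from the metacommunity.
    community = rng.choices(species, weights=weights, k=J)
    for _ in range(steps):
        dead = rng.randrange(J)
        if rng.random() < m:
            # Replacement by an immigrant from the metacommunity.
            community[dead] = rng.choices(species, weights=weights, k=1)[0]
        else:
            # Replacement by the offspring of a random local individual
            # (for simplicity the dying individual may be drawn too).
            community[dead] = community[rng.randrange(J)]
    return Counter(community)


def sample_reads(counts, n_reads, seed=0):
    """Random subsample of individuals, standing in for sequencing reads."""
    rng = random.Random(seed)
    individuals = list(counts.elements())
    # Sampling without replacement requires n_reads <= number of individuals.
    return Counter(rng.sample(individuals, n_reads))
```

The requirement that `n_reads` does not exceed the number of individuals mirrors the condition stated above: reads can stand in for individuals only if the DNA originates from more individuals than there are reads.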
When the spatial distribution of species is shaped at least partly by niche
processes or by limited dispersal, the structure of spatially distributed environmental
DNA data should be marked by these processes. However, this signal may be faint and
complex. Moreover, it is usually obscured by a large number of rare species and an
uneven sampling effort across samples. The third chapter shows how a categorical
mixture model similar to some of the models used in microbiology, population genetics
or text document modelling, Latent Dirichlet Allocation, can be used to retrieve spatial
patterns in a regularly sampled 12-ha forest plot. Unlike the classical pattern-detection
tools of community ecology, such as simple ordination and clustering algorithms, this
model is designed to accommodate discrete abundance data in a large number of
unevenly sized samples, and performs well on large and sparse community matrices.
Even though the fitted model parameters may depend on the initialization of the
inference algorithm, this uncertainty can be quantified by measuring the similarity
between the outputs of different runs. The stability of the output across initial
conditions may even be used as an empirical measure of how strong the spatial
structure is.
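A minimal sketch of how the outputs of two runs might be compared is given below. Since assemblage labels are arbitrary, the assemblages of one run must first be matched to those of the other. The sketch assumes that each run is summarized by K vectors of per-sample assemblage proportions, and that similarity is the mean Pearson correlation under the best one-to-one matching; this is an illustrative choice and not necessarily the spatial similarity metric defined in the third chapter. The function names are hypothetical.

```python
import math
from itertools import permutations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)


def run_similarity(run_a, run_b):
    """Similarity between two runs, maximized over label permutations.

    run_a, run_b: lists of K vectors, one per assemblage, each holding
    the proportion of that assemblage in every sample.
    """
    K = len(run_a)
    best = -1.0
    # Exhaustive search over matchings; only practical for small K.
    for perm in permutations(range(K)):
        score = sum(pearson(run_a[k], run_b[perm[k]]) for k in range(K)) / K
        best = max(best, score)
    return best


# Toy example: the second run is the first one with relabelled assemblages.
run_1 = [[0.8, 0.1, 0.1, 0.6], [0.1, 0.8, 0.1, 0.3], [0.1, 0.1, 0.8, 0.1]]
run_2 = [run_1[2], run_1[0], run_1[1]]
similarity = run_similarity(run_1, run_2)
```

Averaging such a similarity over many runs, each compared to the best-likelihood run, gives one possible empirical measure of stability in the sense discussed above.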
In the 12-ha forest plot, the strongest structure is detected for bacteria and
protists. Moreover, the spatial patterns of these two groups are very similar, and match
the topography of the forest plot. This is in agreement with the findings of the first
chapter, since abiotic filtering was found there to strongly influence the beta diversity of
these groups. In contrast, spatial structure in arthropods and annelids is weak, which
indicates that the spatial scale and the level of environmental heterogeneity in a 12-ha
plot are insufficient to detect the processes that were found to act on these groups at
larger spatial scales.
Overall, we conclude that environmental DNA data can offer a uniquely
comprehensive, if somewhat crude, perspective on community structure in a complex
and species-rich ecosystem. In addition to the classical tools of community ecology,
model-based statistical methods can be borrowed from fields more accustomed to large
and complex datasets, and put to good use to take full advantage of these data. The
development of ecology into a data-rich field should foster the development of
theoretical models that can be compared to data using rigorous statistical approaches,
following the example of Hubbell’s neutral model and its subsequent theoretical
developments (Etienne, 2005; Harris et al., 2015). Building on generative models
stemming from machine learning, such as Latent Dirichlet Allocation, is one possible
avenue for the development of such models, as discussed in the following section.
II. Perspectives
1. Aquatic communities
This thesis aimed at exploring general approaches for the analysis and interpretation of
large biodiversity datasets, with the underlying goal of understanding community
assembly processes from biodiversity patterns. Nevertheless, it mostly focuses on
community assembly in land ecosystems, especially as studied through the amplification
and sequencing of DNA extracted from soil samples. One may follow a similar approach
for studying communities of aquatic organisms by extracting DNA from water samples.
Experimentally, the method consists of filtering water through a mesh so as to collect
small living organisms as well as fragments or sloughed material from larger organisms.
In particular, this approach allows for the study of planktonic microorganisms (i.e.,
suspended in the water column and passively transported by water movements), the
knowledge of which is so far very fragmented, even though they form the basis of the
ocean’s food web and are responsible for the production of half of the atmospheric
dioxygen (Field et al., 1998).
The Tara project is an unprecedented and ongoing effort to sample marine
planktonic communities in various locations spread across the world’s oceans (de
Vargas et al., 2015). Sampling was conducted chiefly in the open ocean from 2009 to
2012, with more recent campaigns focusing on more specific habitats. Samples were
collected at different depths, and using different mesh sizes so as to assign the sampled
organisms to different size ranges. The Latent Dirichlet Allocation approach of the third
chapter is currently being applied to this dataset, so as to understand the biogeography
and community structure of planktonic eukaryotes across the world’s ocean.
Figure 1: Biogeographic patterns in oceanic plankton predicted by a neutral agent-based model. Transport by oceanic currents was simulated over 1,400 years with constant mutation rate, starting from a single genome. Biogeographic regions are distinguished based on their dominant OTUs, defining OTUs at either (top) 99.9% similarity or (bottom) 99.5% similarity. Adapted from Hellweger et al. (2014).
However, it is unclear what a suitable neutral model would be for planktonic
communities. Indeed, unlike land organisms, planktonic organisms do not actively
disperse. Instead, the local community is transported over time along oceanic currents,
and slowly mixes with surrounding communities along the way. Hubbell’s model of a
local community under constant immigration flow could be regarded as a suitable model
for a planktonic community followed through time along an oceanic current
(‘Lagrangian’ perspective). However, whether several simultaneously sampled
communities can be considered as independent and undergoing immigration from the
same metacommunity depends on their positions relative to oceanic currents.
Simulations of the transport of plankton by currents between stations could help
measure their level of connectivity (see Fig. 1; Follows et al., 2007; Ward et al., 2012;
Hellweger et al., 2014), and serve as the basis for inference-oriented modelling efforts.
2. Topic modelling of biodiversity data
As discussed in Section III.5 of the Introduction, Latent Dirichlet Allocation is a very
versatile method that has been employed in a variety of contexts far beyond its original
intended use as a ‘natural language processing’ method. It could become a routine tool
for the analysis of environmental DNA data, as the very similar Structure software has
become in population genetics. While the third chapter focuses on the analysis of
spatially distributed samples, LDA could prove equally useful for the analysis of time
series, or when both a spatial and a temporal dimension are present, as in Valle et al.
(2014). It could also be used to analyse samples that are neither spatially nor temporally
distributed. This is for instance often the case of human microbiome data, which are
currently collected in large quantities, and the interpretation of which is an active
domain of research in medical sciences (Huttenhower et al., 2012). Another potential
application is the analysis of the bacterial communities found in sewage plants, the
understanding of which is of critical importance for the optimization of wastewater
treatment (Ofiteru et al., 2010).
The use of generative mixture models is not new in microbiology: these methods
were first introduced to the field with the works of Knights et al. (2011) and
Holmes et al. (2012). However, probably because ecology and microbiology are still
relatively separate scientific fields, and because the use of environmental DNA data is
more recent in ecology, generative mixture models have been little used so far in
ecology, except for the effort of Valle et al. (2014) on classical tree census data.
Furthermore, focus in microbiology appears to have been mostly on models without
admixture (i.e., where samples belong to a single assemblage, cf. Introduction), unlike
topic models. While LDA is one of the simplest topic models (along with the earlier
Probabilistic Latent Semantic Analysis model, or PLSA; Hofmann, 2001), many
extensions have been developed for the analysis of text documents since its original
introduction. The adaptation of these methods to bioinformatics, e.g. for the
classification of DNA sequences or the identification of protein function, has been
extensively explored (Liu et al., 2016). Ecology, and microbiology, would benefit from a
similar effort oriented toward biodiversity data. In the following, I review a few possible
examples.
Figure 2: Terrestrial biogeographic units of the world inferred from the distribution of 21,037 species of amphibians, birds and mammals. Inference was performed using UPGMA hierarchical clustering on phylogenetic dissimilarity. Thick lines denote main biogeographic boundaries (separating ‘realms’) and dotted lines denote minor ones (separating ‘regions’). Adapted from Holt et al. (2013).
First, the approach presented in the third chapter can be used irrespective of the
spatial scale at which the data are collected, and may for instance be applied to the
definition of biogeographic units. Aside from the sequencing of environmental DNA, the
development of DNA sequencing methods now allows for efficiently and accurately
assigning to a taxon any collected biological material, once a suitable reference database
has been established. Thus, a better use can be made of the large number of specimens
either collected in the field or stored in museum collections, and the resulting data may
be used to study biogeographic patterns in a data-driven way. In recent years,
alternatives to the classical hierarchical clustering approach (followed for instance in
Holt et al., 2013; see Fig. 2) have been sought to address this problem (Vilhena &
Antonelli, 2015; Bloomfield et al., 2017). The Appendix illustrates the potential of LDA in
(3). Using existing knowledge of his time (6),mostly on the distributions and taxonomic rela-tionships of broadly defined vertebrate families,Wallace divided the world into six terrestrialzoogeographic units largely delineated by what wenow know as the continental plates. Despite rely-ing on limited information and lacking a statisticalbasis, Wallace’s original map is still in use today.
Wallace’s original zoogeographic regional-ization scheme considered ancestral relationshipsamong species, but subsequent schemes generallyused data only on the contemporary distribu-tions of species without explicitly consideringphylogenetic relationships (7–9). Phylogenetictrees contain essential information on the evolu-tionary relationships of species and have be-come increasingly available in recent decades,permitting the delineation of biogeographic re-gions as originally envisioned by Wallace. The
opportunity now exists to use phylogenetic in-formation for grouping assemblages of speciesinto biogeographic units on a global scale. In ad-dition to permitting a sound delimitation of bio-geographic regions, phylogenetic informationallows quantifying phylogenetic affinities amongregions (e.g., 10). Newly developed statisticalframeworks facilitate the transparent character-ization of large biogeographic data sets while min-imizing the need for subjective decisions (11).
Here, we delineated the terrestrial zoogeo-graphic realms and regions of the world (12) byintegrating data on the global distributions andphylogenetic relationships of the world’s am-phibians (6110 species), nonpelagic birds (10,074species), and nonmarine mammals (4853 species),a total of 21,037 vertebrates species [see (13) fordetails]. Pairwise phylogenetic beta diversity (pb)metrics were used to quantify change in phyloge-netic composition among species assemblagesacross the globe. Analyses of combined taxa pbvalues identified a total of 20 zoogeographic re-gions, nested within 11 larger realms, and quan-tified phylogenetic relatedness among all pairs ofrealms and regions (Fig. 1, figs. S1 and S2, andtables S1 and S2). We also used pb to quantifythe uniqueness of regions, with the Australian(mean pb = 0.68),Madagascan (mean pb = 0.63),and South American (mean pb = 0.61) regionsbeing the most phylogenetically distinct assem-blages of vertebrates (Fig. 2). These evolutiona-rily unique regions harbor radiations of speciesfrom several clades that are either restricted to agiven region or found in only a few regions.
Our combined taxa map (Fig. 1) contrastswith some previously published global zoogeo-graphic maps derived exclusively from data onthe distribution of vertebrate species (8, 9, 11).The key discrepancy between our classification
of zoogeographic regions and these previousclassifications is the lack of support for previ-ous Palearctic boundaries, which restricted thisbiogeographic region to the higher latitudes ofthe Eastern Hemisphere. The regions of centraland eastern Siberia are phylogenetically moresimilar to the arctic parts of the Nearctic region,as traditionally defined, than to other parts ofthe Palearctic (fig. S2). As a result, our newlydefined Palearctic realm extends across thearctic and into the northern part of the WesternHemisphere (Fig. 1 and fig. S1). These resultsbear similarities with the zoogeographic map of(11) derived from data on the global distributionof mammal families. In addition, our results sug-gest that the Saharo-Arabian realm is interme-diate between theAfrotropical and Sino-Japaneserealms [see the nonmetric multidimensional scaling(NMDS) plot in fig. S2]. Finally, we newly definethe Panamanian, Sino-Japanese, and Oceanianrealms [but see the Oceanian realm of Udvardyin (14) derived from data on plants].
Our classification of vertebrate assemblages into zoogeographic units exhibits some interesting similarities with Wallace’s original classification, as well as some important differences (fig. S3). For example, Wallace classified islands east of Borneo and Bali in his Australian region (fig. S3), which is analogous to our Oceanian and Australian realms combined (Fig. 1 and fig. S1). In contrast, we find that at least some of these islands (e.g., Sulawesi) belong to our Oriental realm, which spans Sundaland, Indochina, and India (Fig. 1 and fig. S1). Moreover, our Oceanian realm is separate from the Australian realm and includes New Guinea together with the Pacific Islands (14), whereas Wallace lumped these two biogeographic units into the Australian region. Wallace further argued that the Makassar
1 Center for Macroecology, Evolution, and Climate, Department of Biology, University of Copenhagen, 2100 Copenhagen Ø, Denmark. 2 Biodiversity and Climate Research Centre (BiK-F) and Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, 60325 Frankfurt, Germany. 3 Department of Biogeography and Global Change, National Museum of Natural Sciences, Consejo Superior de Investigaciones Científicas, Calle de José Gutiérrez Abascal, 2, 28006 Madrid, Spain. 4 Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade de Évora, Largo dos Colegiais, 7000 Évora, Portugal. 5 Center for Macroecology, Evolution, and Climate, Natural History Museum of Denmark, University of Copenhagen, 2100 Copenhagen Ø, Denmark. 6 Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794–5245, USA. 7 Department of Vertebrate Zoology, MRC-116, National Museum of Natural History, Smithsonian Institution, Post Office Box 37012, Washington, DC 20013–7012, USA. 8 Biodiversity Research Group, School of Geography and the Environment, Oxford University Centre for the Environment, South Parks Road, Oxford OX1 3QY, UK.
*These authors contributed equally to this work.
†To whom correspondence should be addressed. E-mail: [email protected]
Fig. 1. Map of the terrestrial zoogeographic realms and regions of the world. Zoogeographic realms and regions are the product of analytical clustering of phylogenetic turnover of assemblages of species, including 21,037 species of amphibians, nonpelagic birds, and nonmarine mammals worldwide. Dashed lines delineate the 20 zoogeographic regions identified in this study. Thick lines group these regions into 11 broad-scale realms, which are named. Color differences depict the amount of phylogenetic turnover among realms. (For more details on relationships among realms, see the dendrogram and NMDS plot in fig. S1.) Dotted regions have no species records, and Antarctica is not included in the analyses.
www.sciencemag.org SCIENCE VOL 339 4 JANUARY 2013 75
this respect, by applying it to a large dataset of Amazonian frogs assembled with the
help of DNA identification. LDA proved in this case an efficient way to infer the optimal
number of biogeographic units, assess the strength of the underlying signal, and
distinguish between sharp and diffuse boundaries between biogeographic units.
Nevertheless, unlike distance-based clustering, LDA does not allow for taking
phylogenetic information into account, and improving on this aspect could be a useful
research avenue. This problem could for instance be approached through the
Generalized Polya urn LDA model (Mimno et al., 2011). Furthermore, despite its
shortcomings, distance-based hierarchical clustering is appreciated by ecologists
because it provides additional insight into the relationships between the different
samples, and because it offers the possibility of choosing the number of clusters based
on the hierarchical tree. The hierarchical LDA model (or hLDA; Griffiths et al., 2004),
which describes a hierarchy of nested topics, could be appealing in this respect.
Second, the study of interactions between taxa constitutes a central interest of
community ecology. Large environmental DNA datasets provide indirect information on
potential interactions between taxa through the co-occurrence of OTUs and the
covariance of their abundances (see Fig. 3; Faust & Raes, 2012). LDA assemblages are
inferred based on this information, and thus reflect the presence of potential
interactions within each assemblage. Nevertheless, the application of LDA
decomposition separately to different taxonomic groups, as done in the third chapter,
does not provide any information on the possible interactions between these groups.
Conversely, when LDA is applied to the whole dataset, it is not possible to explicitly
distinguish between subgroups of preferentially interacting taxa, such as plants and
fungi for instance. This shortcoming could be addressed by using the ‘author-topic
model’, an extension of LDA that accounts for sample ‘metadata’, such as authors
in a text document (Rosen-Zvi et al., 2004, 2010). This model is identical to LDA except
that each document (or sample) is not directly characterized by a mixture of topics (or
assemblages), but by its authors, to each of which is assigned a mixture of topics. In
practice, authors may be any discrete labels, and could for instance correspond to the
one or few tree species surrounding each soil sample if one is interested in tree-fungi
interactions. The method would thus yield a mixture of fungal assemblages for each tree
species, and indirectly a mixture of assemblages for each sample based on the tree
species surrounding it. A simpler version of this model might also be considered where a
single assemblage characterizes each tree species.
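To make the structure of the author-topic model concrete, the following minimal sketch simulates its generative process for one soil sample, assuming that each read is attributed to one of the surrounding tree species chosen uniformly at random. All names and numerical values (`tree_A`, `fungi_1`, `otu_x`, the function names) are hypothetical and chosen for illustration only; this is not the inference procedure, which would have to estimate `theta` and `phi` from data.

```python
import random


def sample_reads(authors, theta, phi, n_reads, rng):
    """Simulate n_reads OTU labels for one sample under the author-topic process."""
    reads = []
    for _ in range(n_reads):
        # 1. Pick one author (here: a tree species near the sample) uniformly.
        author = rng.choice(authors)
        # 2. Pick an assemblage (topic) from that author's mixture.
        topics = list(theta[author])
        topic = rng.choices(topics, weights=[theta[author][t] for t in topics])[0]
        # 3. Pick an OTU from the chosen assemblage.
        otus = list(phi[topic])
        otu = rng.choices(otus, weights=[phi[topic][o] for o in otus])[0]
        reads.append(otu)
    return reads


def sample_mixture(authors, theta):
    """Assemblage mixture implied for a sample: the average of its authors' mixtures."""
    mixture = {}
    for author in authors:
        for topic, weight in theta[author].items():
            mixture[topic] = mixture.get(topic, 0.0) + weight / len(authors)
    return mixture


# Hypothetical parameters: two tree species, two fungal assemblages, three OTUs.
theta = {
    "tree_A": {"fungi_1": 0.9, "fungi_2": 0.1},
    "tree_B": {"fungi_1": 0.2, "fungi_2": 0.8},
}
phi = {
    "fungi_1": {"otu_x": 0.7, "otu_y": 0.3, "otu_z": 0.0},
    "fungi_2": {"otu_x": 0.0, "otu_y": 0.4, "otu_z": 0.6},
}

rng = random.Random(0)
reads = sample_reads(["tree_A", "tree_B"], theta, phi, n_reads=100, rng=rng)
mixture = sample_mixture(["tree_A", "tree_B"], theta)
```

The simpler variant mentioned above, where a single assemblage characterizes each tree species, corresponds to the special case in which every row of `theta` puts all its weight on one assemblage.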
Figure 3: Occurrence-based inference of interactions between prokaryotic OTUs from a global dataset (Chaffron et al., 2010). Each node represents an OTU, and edges between nodes represent significant associations based on co-occurrence. Edge thickness increases with significance. Adapted from Faust & Raes (2012).
More generally, ecological studies often do not limit themselves to exploring the
structure of a single type of data, but attempt to uncover statistical relationships
between different types of data, such as taxonomic and environmental data. While the
author-topic model only allows for adding discrete labels to each sample, other models
such as Dirichlet-multinomial Regression (Mimno & McCallum, 2012) can also
accommodate continuous attributes, and could be used to account for environmental
measurements in the model. The goodness-of-fit of the model without environmental
data could then be compared to that of the model accounting for these data, ideally using
AIC. This would entail a slightly different use of topic modelling than that of the third
chapter: namely, shifting the focus from the exploration of data structure toward
hypothesis testing. However, both approaches have their own merits.
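As a reminder of how such a comparison works, AIC is defined as 2k − 2 ln L, where k is the number of fitted parameters and L the maximized likelihood, and the model with the lower AIC is preferred. The short sketch below applies this definition to two hypothetical fits; the log-likelihoods and parameter counts are invented for illustration, and counting the effective parameters of a topic model is itself a modelling choice that is not addressed here.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits of the same dataset (values invented for illustration).
aic_without_env = aic(log_likelihood=-1250.0, n_params=40)
aic_with_env = aic(log_likelihood=-1235.0, n_params=44)

# The environmental covariates are retained only if they lower the AIC,
# i.e. if the gain in likelihood outweighs the penalty for extra parameters.
keep_env = aic_with_env < aic_without_env
```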
Finally, the application of topic modelling to ecology need not be limited to taxa
abundance and occurrence data. It could for instance be extended to exon sequencing
data describing functional types, or to RNA sequencing data characterizing gene
expression. Moreover, aside from the major technological revolution that is
high-throughput DNA sequencing, other promising technologies are currently being adapted
to automated data collection in ecology, notably Lidar and hyperspectral imaging. Topic
modelling has been successfully used to retrieve patterns from images (Luo et al., 2015),
and could possibly also find application in the analysis of remote-sensing ecological
data.
3. Statistical versus mechanistic modelling
Topic modelling is but one of many competing branches of machine learning that are
currently actively developed to exploit the ever-increasing amount of data produced by
current technologies (Bishop, 2006). In recent years, some branches of machine
learning have become particularly prominent, especially multi-layered neural networks
under the name of ‘deep learning’ (LeCun et al., 2015). Such methods are indeed efficient
at detecting structure in large datasets, and have recently been applied to bioinformatics
problems such as DNA sequence classification (Rizzo et al., 2016). However, these
methods are not based on an easily interpretable model. As such, they can only be
fruitfully applied to supervised learning tasks, i.e. to situations where correct and
incorrect results can be told apart a priori, which are more typical of engineering than
basic science.
In contrast, topic models have a mathematical structure that is similar to the
multivariate formulation of neutral models, as discussed in Harris et al. (2015) and in
the third part of the Introduction. This parallel could be exploited to build mixed models
combining the advantages of both types of models. For instance, a local community, such
as an island, may receive immigrating individuals from different source communities
that have distinct (known) taxonomic compositions. By assuming neutral dynamics in
the local community, and modelling the origin of immigrating individuals by a topic
model, one could possibly infer from the local taxonomic composition the relative
contribution of the different source communities. Conversely, starting from a topic
model as in the third chapter, one could assume that neutral dynamics take place
within each assemblage.
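The island example can be illustrated with a deliberately simplified sketch. It assumes that local counts are independent draws from a mixture of the known source compositions, which ignores local drift and dispersal limitation and is therefore only the limiting case of a full neutral model. Under this assumption, the relative contributions of the sources can be estimated by expectation-maximization. The function name and the numerical values are hypothetical.

```python
def estimate_source_contributions(counts, sources, n_iter=200):
    """EM estimate of mixing proportions for known source compositions.

    counts  -- observed abundance of each taxon in the local community
    sources -- one list of taxon frequencies per source community
    """
    n_sources = len(sources)
    total = sum(counts)
    pi = [1.0 / n_sources] * n_sources  # start from equal contributions
    for _ in range(n_iter):
        new_pi = [0.0] * n_sources
        for taxon, n in enumerate(counts):
            if n == 0:
                continue
            # E-step: probability that an individual of this taxon came from each source.
            weights = [pi[s] * sources[s][taxon] for s in range(n_sources)]
            norm = sum(weights)
            if norm == 0.0:
                raise ValueError("taxon %d is absent from all sources" % taxon)
            for s in range(n_sources):
                new_pi[s] += n * weights[s] / norm
        # M-step: contributions are the expected fractions of individuals per source.
        pi = [w / total for w in new_pi]
    return pi


# Hypothetical example: two sources over three taxa, local community of 100 individuals.
sources = [
    [0.8, 0.2, 0.0],
    [0.0, 0.2, 0.8],
]
local_counts = [60, 20, 20]
contributions = estimate_source_contributions(local_counts, sources)
```

In this toy case the local frequencies (0.6, 0.2, 0.2) are exactly reproduced by a 75%/25% mixture of the two sources, which is the estimate the iteration converges to.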
While topic and neutral models may seem to be of different nature, the
distinction between statistical and mechanistic models is more tenuous than may
appear at first glance. The first topic modelling papers mentioned mechanistic
arguments to justify their models, arguing that they mirrored the way humans write
text documents, and some subsequent developments try to better account for the
structure of natural language (Wallach, 2006). When applied to ecological data, the
assumption that local communities are a mixture of several assemblages of co-occurring
taxa constitutes a genuine biological hypothesis. Conversely, the realism of the
hypotheses in Hubbell’s neutral model has been much debated (Rosindell et al., 2012),
and one might argue that its most valuable insight is on the nature of the species
abundance distribution pattern itself: namely, that most empirical species abundance
distributions can be approximately decomposed into orthogonal diversity and
connectivity components, irrespective of their exact mechanistic interpretation (Jabot et
al., 2008).
While a very flexible model is undesirable when one aims at testing modelling
hypotheses on data, it becomes an advantage when one aims at characterizing the
system at hand through a limited number of relevant parameters. This is often a more
realistic prospect when faced with large datasets resulting from automated data
collection. However, as illustrated by the case of Hubbell’s neutral model, relevant
parameters cannot be determined without an understanding of the basic processes at
play. Moreover, characterizing a system is of little use if this does not entail the
possibility of predictions and generalization. A balance must thus be found between
flexibility and falsifiability in building models for the analysis of large datasets, and the
shift toward inference-oriented models should not preclude building them on first
principles (Marquet et al., 2014).
References
Bishop, C.M. (2006) Pattern Recognition and Machine Learning, Springer. Michael Jordan, Jon Kleinberg, Bernhard Schölkopf.
Bloomfield, N.J., Knerr, N. & Encinas-Viso, F. (2017) A comparison of network and clustering methods to detect biogeographical regions. Ecography.
Chaffron, S., Rehrauer, H., Pernthaler, J. & von Mering, C. (2010) A global network of coexisting microbes from environmental and whole-genome sequence data. Genome Research, 20, 947–959.
Condit, R., Pitman, N., Leigh, E.G., Chave, J., Terborgh, J., Foster, R.B., Nunez, P., Aguilar, S., Valencia, R., Villa, G., Muller-Landau, H.C., Losos, E. & Hubbell, S.P. (2002) Beta-diversity in tropical forest trees. Science, 295, 666–669.
Etienne, R.S. (2005) A new sampling formula for neutral biodiversity. Ecology Letters, 8, 253–260.
Faust, K. & Raes, J. (2012) Microbial interactions: from networks to models. Nature Reviews Microbiology, 10, 538–550.
Field, C.B., Behrenfeld, M.J., Randerson, J.T. & Falkowski, P. (1998) Primary production of the biosphere: integrating terrestrial and oceanic components. Science, 281, 237–240.
Follows, M.J., Dutkiewicz, S., Grant, S. & Chisholm, S.W. (2007) Emergent Biogeography of Microbial Communities in a Model Ocean. Science, 315, 1843.
Griffiths, T.L., Jordan, M.I., Tenenbaum, J.B. & Blei, D.M. (2004) Hierarchical topic models and the nested Chinese restaurant process. Advances in Neural Information Processing Systems, pp. 17–24.
Harris, K., Parsons, T.L., Ijaz, U.Z., Lahti, L., Holmes, I. & Quince, C. (2015) Linking statistical and ecological theory: Hubbell’s Unified Neutral Theory of Biodiversity as a Hierarchical Dirichlet Process. Proc. IEEE, PP, 1–14.
Hellweger, F.L., van Sebille, E. & Fredrick, N.D. (2014) Biogeographic patterns in ocean microbes emerge in a neutral agent-based model. Science.
Hofmann, T. (2001) Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42, 177–196.
Holmes, I., Harris, K. & Quince, C. (2012) Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. Plos One, 7.
Holt, B., Lessard, J.P., Borregaard, M.K., Fritz, S.A., Araujo, M.B., Dimitrov, D., Fabre, P.H., Graham, C.H., Graves, G.R., Jonsson, K.A., Nogues-Bravo, D., Wang, Z.H., Whittaker, R.J., Fjeldsa, J. & Rahbek, C. (2013) An Update of Wallace’s Zoogeographic Regions of the World. Science, 339, 74–78.
Hubbell, S.P. (2001) The unified neutral theory of biodiversity and biogeography (MPB-32), Princeton University Press.
Huttenhower, C., Gevers, D., Knight, R., Abubucker, S., Badger, J.H., Chinwalla, A.T., Creasy, H.H., Earl, A.M., FitzGerald, M.G., Fulton, R.S., Giglio, M.G., Hallsworth-Pepin, K., Lobos, E.A., Madupu, R., Magrini, V., Martin, J.C., Mitreva, M., Muzny, D.M., Sodergren, E.J., Versalovic, J., Wollam, A.M., Worley, K.C., Wortman, J.R., Young, S.K., Zeng, Q., Aagaard, K.M., Abolude, O.O., Allen-Vercoe, E., Alm, E.J., Alvarado, L., Andersen, G.L., Anderson, S., Appelbaum, E., Arachchi, H.M., Armitage, G., Arze, C.A., Ayvaz, T., Baker, C.C., Begg, L., Belachew, T., Bhonagiri, V., Bihan, M., Blaser, M.J., Bloom, T., Bonazzi, V., Paul Brooks, J., Buck, G.A., Buhay, C.J., Busam, D.A., Campbell, J.L., Canon, S.R., Cantarel, B.L., Chain, P.S.G., Chen, I.-M.A., Chen, L., Chhibba, S., Chu, K., Ciulla, D.M., Clemente, J.C., Clifton, S.W., Conlan, S., Crabtree, J., Cutting, M.A., Davidovics, N.J., Davis, C.C., DeSantis, T.Z., Deal, C., Delehaunty, K.D., Dewhirst, F.E., Deych, E., Ding, Y., Dooling, D.J., Dugan, S.P., Michael Dunne, W., Scott Durkin, A., Edgar, R.C., Erlich, R.L., Farmer, C.N., Farrell, R.M., Faust, K., Feldgarden, M., Felix, V.M., Fisher, S., Fodor, A.A., Forney, L.J., Foster, L., Di Francesco, V., Friedman, J., Friedrich, D.C., Fronick, C.C., Fulton, L.L., Gao, H., Garcia, N., Giannoukos, G., Giblin, C., Giovanni, M.Y., Goldberg, J.M., Goll, J., Gonzalez, A., Griggs, A., Gujja, S., Kinder Haake, S., Haas, B.J., Hamilton, H.A., Harris, E.L., Hepburn, T.A., Herter, B., Hoffmann, D.E., Holder, M.E., Howarth, C., Huang, K.H., Huse, S.M., Izard, J., Jansson, J.K., Jiang, H., Jordan, C., Joshi, V., Katancik, J.A., Keitel, W.A., Kelley, S.T., Kells, C., King, N.B., Knights, D., Kong, H.H., Koren, O., Koren, S., Kota, K.C., Kovar, C.L., Kyrpides, N.C., La Rosa, P.S., Lee, S.L., Lemon, K.P., Lennon, N., Lewis, C.M., Lewis, L., Ley, R.E., Li, K., Liolios, K., Liu, B., Liu, Y., Lo, C.-C., Lozupone, C.A., Dwayne Lunsford, R., Madden, T., Mahurkar, A.A., Mannon, P.J., Mardis, E.R., Markowitz, V.M., Mavromatis, K., McCorrison, J.M., McDonald, D., McEwen, J., McGuire, A.L., McInnes, P., Mehta, T., Mihindukulasuriya, K.A., Miller, J.R., Minx, P.J., Newsham, I., Nusbaum, C., O’Laughlin, M., Orvis, J., Pagani, I., Palaniappan, K., Patel, S.M., Pearson, M., Peterson, J., Podar, M., Pohl, C., Pollard, K.S., Pop, M., Priest, M.E., Proctor, L.M., Qin, X., Raes, J., Ravel, J., Reid, J.G., Rho, M., Rhodes, R., Riehle, K.P., Rivera, M.C., Rodriguez-Mueller, B., Rogers, Y.-H., Ross, M.C., Russ, C., Sanka, R.K., Sankar, P., Fah Sathirapongsasuti, J., Schloss, J.A., Schloss, P.D., Schmidt, T.M., Scholz, M., Schriml, L., Schubert, A.M., Segata, N., Segre, J.A., Shannon, W.D., Sharp, R.R., Sharpton, T.J., Shenoy, N., Sheth, N.U., Simone, G.A., Singh, I., Smillie, C.S., Sobel, J.D., Sommer, D.D., Spicer, P., Sutton, G.G., Sykes, S.M., Tabbaa, D.G., Thiagarajan, M., Tomlinson, C.M., Torralba, M., Treangen, T.J., Truty, R.M., Vishnivetskaya, T.A., Walker, J., Wang, L., Wang, Z., Ward, D.V., Warren, W., Watson, M.A., Wellington, C., Wetterstrand, K.A., White, J.R., Wilczek-Boney, K., Wu, Y., Wylie, K.M., Wylie, T., Yandava, C., Ye, L., Ye, Y., Yooseph, S., Youmans, B.P., Zhang, L., Zhou, Y., Zhu, Y., Zoloth, L., Zucker, J.D., Birren, B.W., Gibbs, R.A., Highlander, S.K., Methé, B.A., Nelson, K.E., Petrosino, J.F., Weinstock, G.M., Wilson, R.K. & White, O. (2012) Structure, function and diversity of the healthy human microbiome. Nature, 486, 207–214.
Jabot, F., Etienne, R.S. & Chave, J. (2008) Reconciling neutral community models and environmental filtering: theory and an empirical test. Oikos, 117, 1308–1320.
Knights, D., Kuczynski, J., Charlson, E.S., Zaneveld, J., Mozer, M.C., Collman, R.G., Bushman, F.D., Knight, R. & Kelley, S.T. (2011) Bayesian community-wide culture-independent microbial source tracking. Nature Methods, 8, 761-U107.
LeCun, Y., Bengio, Y. & Hinton, G. (2015) Deep learning. Nature, 521, 436–444.
Liu, L., Tang, L., Dong, W., Yao, S. & Zhou, W. (2016) An overview of topic modeling and its current applications in bioinformatics. SpringerPlus, 5.
Luo, W., Stenger, B., Zhao, X. & Kim, T.-K. (2015) Automatic Topic Discovery for Multi-Object Tracking. AAAI, pp. 3820–3826.
Marquet, P.A., Allen, A.P., Brown, J.H., Dunne, J.A., Enquist, B.J., Gillooly, J.F., Gowaty, P.A., Green, J.L., Harte, J., Hubbell, S.P., O’Dwyer, J., Okie, J.G., Ostling, A., Ritchie, M., Storch, D. & West, G.B. (2014) On Theory in Ecology. Bioscience, 64, 701–710.
Mimno, D. & McCallum, A. (2012) Topic models conditioned on arbitrary features with dirichlet-multinomial regression. arXiv preprint arXiv:1206.3278.
Mimno, D., Wallach, H.M., Talley, E., Leenders, M. & McCallum, A. (2011) Optimizing Semantic Coherence in Topic Models. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing.
Ofiteru, I.D., Lunn, M., Curtis, T.P., Wells, G.F., Criddle, C.S., Francis, C.A. & Sloan, W.T. (2010) Combined niche and neutral effects in a microbial wastewater treatment community. Proceedings of the National Academy of Sciences of the United States of America, 107, 15345–15350.
Ramirez, K.S., Leff, J.W., Barberan, A., Bates, S.T., Betley, J., Crowther, T.W., Kelly, E.F., Oldfield, E.E., Shaw, E.A., Steenbock, C., Bradford, M.A., Wall, D.H. & Fierer, N. (2014) Biogeographic patterns in below-ground diversity in New York City’s Central Park are similar to those observed globally. Proceedings of the Royal Society B-Biological Sciences, 281, 9.
Rizzo, R., Fiannaca, A., La Rosa, M. & Urso, A. (2016) A Deep Learning Approach to DNA Sequence Classification. Computational Intelligence Methods for Bioinformatics and Biostatistics: 12th International Meeting, CIBB 2015, Naples, Italy, September 10-12, 2015, Revised Selected Papers (ed. by C. Angelini, P.M. Rancoita, and S. Rovetta), pp. 129–140. Springer International Publishing, Cham.
Rosen-Zvi, M., Chemudugunta, C., Griffiths, T., Smyth, P. & Steyvers, M. (2010) Learning Author-Topic Models from Text Corpora. ACM Transactions on Information Systems, 28.
Rosen-Zvi, M., Griffiths, T., Steyvers, M. & Smyth, P. (2004) The Author-Topic Model for Authors and Documents.
Rosindell, J., Hubbell, S.P., He, F., Harmon, L.J. & Etienne, R.S. (2012) The case for ecological neutral theory. Trends in Ecology & Evolution, 27, 203–208.
Soininen, J., McDonald, R. & Hillebrand, H. (2007) The distance decay of similarity in ecological communities. Ecography, 30, 3–12.
Valle, D., Baiser, B., Woodall, C.W. & Chazdon, R. (2014) Decomposing biodiversity data using the Latent Dirichlet Allocation model, a probabilistic multivariate statistical method. Ecology Letters, 17, 1591–1601.
de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahe, F., Logares, R., Lara, E., Berney, C., Le Bescot, N., Probert, I., Carmichael, M., Poulain, J., Romac, S., Colin, S., Aury, J.-M., Bittner, L., Chaffron, S., Dunthorn, M., Engelen, S., Flegontova, O., Guidi, L., Horak, A., Jaillon, O., Lima-Mendez, G., Lukes, J., Malviya, S., Morard, R., Mulot, M., Scalco, E., Siano, R., Vincent, F., Zingone, A., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Acinas, S.G., Bork, P., Bowler, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Raes, J., Sieracki, M.E., Speich, S., Stemmann, L., Sunagawa, S., Weissenbach, J., Wincker, P., Karsenti, E. & Tara Oceans, C. (2015) Eukaryotic plankton diversity in the sunlit ocean. Science, 348.
Vilhena, D.A. & Antonelli, A. (2015) A network approach for identifying and delimiting biogeographical regions. Nature Communications, 6.
Wallach, H.M. (2006) Topic modeling: beyond bag-of-words. Proceedings of the 23rd International Conference on Machine Learning, pp. 977–984. ACM.
Ward, B.A., Dutkiewicz, S., Jahn, O. & Follows, M.J. (2012) A size-structured food-web model for the global ocean. Limnology and Oceanography, 57, 1877–1891.
Appendix – Biogeography of Amazonian Anurans
Appendix
Large-scale DNA barcoding of Amazonian anurans leads to a new definition of biogeographical subregions in the Guiana Shield and reveals a vast underestimation of diversity and local endemism
Jean-Pierre Vacher (a), Guilhem Sommeria-Klein (a), Francesco Ficetola (b), Miguel Trefaut
Rodrigues (c), Philippe J.R. Kok (d), Brice P. Noonan (e), Andrew Snyder (e), Rawien Jairam (f), Paul
Ouboter (f), Jerriane Oliveira Gomes (g), Teresa C.S. Avila-Pires (g), Jucivaldo Dias Lima (h), Raffael
Ernst (i), Michel Blanc (j), Maël Dewynter (k), Tim J. Colson (l), Sergio M. de Souza (m), Pedro Nunes (n),
Augustin Camacho (o), Mauro Teixeira (p), Renato Recoder (q), José Cassimiro (r), Quentin
Martinez (s), Christian Marty (t), Philippe Gaucher (u), Christophe Thébaud (a), Antoine Fouquet (a, u).
(a) Laboratoire EDB, UMR 5174, CNRS-UPS-IRD, Toulouse, France;
(b) Laboratoire d'Écologie Alpine (LECA), UMR 5553, Grenoble, France;
(c) Universidade de São Paulo, Instituto de Biociências, Departamento de Zoologia, Caixa Postal 11.461, CEP 05508-090, São Paulo, SP, Brazil;
(d) Amphibian Evolution Lab, Biology Department, Vrije Universiteit Brussel, 2 Pleinlaan, 1050 Brussels, Belgium;
(e) Department of Biology, 507 Shoemaker Hall, University, MS 38677, USA;
(f) National Zoological Collection Suriname (NZCS), Anton de Kom University of Suriname, Paramaribo, Suriname;
(g) Museu Paraense Emílio Goeldi, Laboratório de Herpetologia/CZO, Av. Perimetral, 1901; Terra Firme, Belém, Pará, Brazil;
(h) Centro de Pesquisas Zoobotânicas e Geologicas (CPZG), Instituto de Pesquisas Cientificas e Tecnológicas do Estado do Amapá (IEPA), Macapá, AP, Brazil;
(i) Museum of Zoology, Senckenberg Natural History Collections Dresden, Königsbrücker Landst. 159, 01109 Dresden, Germany;
(j) Pointe Maripa, RN2/PK35, 93711, Roura, French Guiana;
(k) Biotope, Agence Amazonie-Caraïbes, 30 Domaine de Montabo, Lotissement Ribal, 97300 Cayenne, French Guiana;
(l) Adresses 8 rue Mareschal, 30900 Nîmes, France;
(t) impasse Jean Galot, Montjoly, French Guiana;
(u) Laboratoire Écologie, évolution, interactions des systèmes amazoniens (USR 3456 LEEISA), Université de Guyane, CNRS Guyane, Cayenne, French Guiana.
Introduction
Amazonia encompasses about 40% of the world’s tropical forests (Sioli, 1984; Hubbell et
al., 2008; Hoorn & Wesselingh, 2010), and many taxonomic groups reach their highest
species richness in this region (Antonelli & Sanmartín, 2011; Jenkins et al., 2013). The
processes that have given rise to this exceptionally high diversity have long intrigued
biologists (Wallace, 1852; Bates, 1863). Amazonia gives an appearance of homogeneity,
because it is a vast and seemingly uniform extent of forest that is faunistically very
distinct from other Neotropical regions (Dinerstein et al., 1995; Olson et al., 2001;
Antonelli & Sanmartín, 2011; Vilhena & Antonelli, 2015). However, this is misleading:
temperatures and rainfall vary widely across Amazonia (Mayle & Power, 2008), and so
do vegetation types (Anderson, 2012; Hughes et al., 2013). Moreover, Amazonia had a
tumultuous climatological and geological past, mainly caused by the Andean uplift and
the setting-up of the Rio Amazonas watershed during the late Tertiary (Hoorn et al.,
2010).
The distribution of species within Amazonia is known to relate to this large-scale
environmental heterogeneity. The observed congruence between the geographic
distribution of birds and primates on the one hand and the major interfluves on the
other hand (Wallace, 1852; Haffer, 1974) led to the definition of biogeographic
subregions (BSRs), coined as “Amazonian areas of endemism” (Wallace, 1852; Haffer,
1974; Cracraft, 1985). However, there is still little consensus on how to best delimit and
name BSRs, with many terms being used interchangeably (Vilhena & Antonelli, 2015). In
fact, the very existence and boundaries of different BSRs across Amazonia and the
relative degree of endemism within them have simply never been analysed using
modern analytic tools (e.g., clustering) and large species assemblages having
unambiguous distribution data (Nelson et al., 1990; Morrone, 2005; Naka et al., 2012).
Moreover, current knowledge on the delimitation of Amazonian BSRs is mainly based on
birds, the best-known taxonomic group, as well as primates and plants displaying
limited distributions in Amazonia. The explanatory power of the Amazonian BSRs as
currently defined remains questionable until their boundaries are proven to match
across multiple taxonomic groups. However, this seems unlikely because these groups
have overall high dispersal abilities, and their distribution patterns may be poor
predictors for less vagile taxa (Claramunt et al., 2011; Pigot & Tobias, 2015; Zizka et al.,
2016).
Because small terrestrial vertebrates such as anurans have more limited
dispersal abilities and possibly a greater sensitivity to environmental variation, they are
better suited to the delineation of relevant bioregions (Zeisset & Beebee, 2008). Anuran
assemblages may display different or finer geographic patterns than those previously
described at the continental (Vilhena & Antonelli, 2015) or the regional scale
(Vasconcelos et al., 2014). For instance, one of the rare studies unambiguously
delimiting BSRs in Amazonia found well-delimited bioregions in the Guiana Shield based
on the distribution of bird species, including a large homogeneous region spanning the
eastern part of the Guiana Shield (Naka et al., 2012). Yet, studies on anuran amphibians
suggest a finer biogeographic structure in the Eastern Guiana Shield, where divergent
lineages of frogs exhibit concordant distribution limits (Fouquet et al., 2012d, 2013,
2016). In this paper, we aim to delimit Amazonian bioregions in a data-driven way
based on a newly collected dataset of molecular anuran diversity, with a particular focus
on the Eastern Guiana Shield.
Revealing the basic geographical structure of species diversity in Amazonia is not
only of crucial importance for conservation (Da Silva et al., 2005), it is also an important
prerequisite for the study of the processes that gave rise to present-day diversity
patterns. Identifying BSRs in Amazonia may help identify the physical barriers relevant
to speciation, define the contact zones between closely related parapatric taxa, and
capture the effects of dispersal limitation in the structure of Amazonian communities
(e.g., Moura et al., 2016). Many hypotheses have been proposed to explain
heterogeneities in species distribution across Amazonia, including landscape change
induced by late Tertiary climate fluctuations (Haffer, 1969), the uplift of the Andes, and
continuous dispersal across large rivers (Hayes & Sewlal, 2004; Antonelli et al., 2010;
Hoorn et al., 2010), or past environmental gradients (Colinvaux et al., 2000). These
different hypotheses have been verified for some taxonomic groups at different spatial
and temporal scales (Hall & Harvey, 2002), but there is still no consensus about the main
drivers of diversification within Amazonia.
Two major challenges to our understanding of the basic structure of Amazonian
biodiversity are the scarcity of occurrence data and the imprecision of species
delineation (Wallacean and Linnean shortfalls). These shortfalls are particularly obvious
in small terrestrial vertebrates such as anurans (Ficetola et al., 2014). Almost all anuran
taxa with large ranges in Amazonia exhibit deep divergences when analysed with
genetic tools, suggesting that they comprise several species, each with a restricted
distribution (Fouquet et al., 2007a; Funk et al., 2012; Gehara et al., 2014; Fouquet et al.,
2015b; Ferrão et al., 2016; Fouquet et al., 2016). These studies typically imply that the
actual species richness in these groups is more than twice that estimated from
morphology only. Therefore, ranges of Amazonian amphibians used in broad
biodiversity assessments such as the International Union for the Conservation of Nature
(IUCN) Red List are likely to be largely inaccurate (Ficetola et al., 2014). Out of 427
amphibian species inhabiting the 6 million km² of Amazonia according to IUCN, at least
150 species (35%) are distributed over more than 1 million km² (Fouquet et al., 2007a).
Such a high proportion of broadly-distributed species seems unlikely (Wynn & Heyer,
2001), because amphibians usually display low dispersal capacities and often have small
niches (Duellman & Trueb, 1994; Wells, 2010). This gap in our understanding of the
actual diversity and distribution of species could seriously invalidate conclusions drawn
from IUCN data (Foden et al., 2013; Jenkins et al., 2013, 2015; Pimm et al., 2014; Feeley
& Silman, 2016).
The overall aims of this study were (1) to obtain a new georeferenced dataset of
Amazonian anurans based on molecular diversity, with a focus on the eastern Guiana
Shield (EGS) (east of the Tepuis, and north of Rio Negro and Rio Amazonas), (2) to
provide estimates of the number of species and of their distributions in this part of
Amazonia, and (3) to infer data-driven spatial boundaries between BSRs, as well as to
re-assess their rate of endemism. Given that anuran species boundaries and distributions
are plagued with uncertainty in Amazonia and that IUCN data are often outdated and
imprecise, it is necessary to use occurrence records linked to taxonomic frameworks
based on clear criteria. Therefore, we conducted extensive fieldwork to collect
specimens representative of present-day diversity at the scale of the entire region, and
obtained mitochondrial DNA sequences (16S rDNA) from these specimens. We also
included in our analyses publicly available sequences from other specimens. Based on
these sequences, we generated two new taxonomic frameworks for Amazonian anurans.
Our dataset represents the largest molecular diversity dataset gathered so far in
Amazonia for any taxonomic group.
Material and methods
1. Fieldwork
We undertook fieldwork in several localities throughout the Guiana Shield, notably in
southern Suriname, French Guiana, and the Brazilian states of Amapá and Roraima. We
collected specimens of as many anuran species as possible per locality by nocturnal and
diurnal active searches (visual and acoustic). Each specimen was identified and
photographed. They were subsequently euthanized using an injection of Xylocaine®
(lidocaine hydrochloride). Tissue samples (liver or muscle tissue from thigh or toe-clip)
were removed and stored in 95% ethanol, while specimens were tagged and fixed (using
10% formalin) before being transferred to 70% ethanol for permanent storage. These
field surveys allowed us to cover the anuran communities of the EGS at an
unprecedented fine scale (Fig. 1A). We completed these data for the rest of Amazonia
with loans of material from several institutions, notably from Universidade de São Paulo
for the upper Madeira, lower Xingu, Abacaxis and Purus Rivers. Ultimately, the total
number of analysed samples reached 4,681.
2. Molecular data
We extracted DNA from the samples using the Wizard Genomic extraction protocol
(Promega; Madison, WI, USA). We targeted a ca. 400 bp fragment of the mitochondrial
16S rDNA using MiSeq and Sanger techniques (Supplementary Methods). We eventually
generated 4,492 sequences.
Additionally, we retrieved from GenBank (as of 1 August 2015) all
sequences of species congeneric with those occurring in the Guiana Shield, as well as
sequences of Adelphobates and Phyzelaphryne, two genera restricted to southern
Amazonia. We removed low-quality or too short sequences, as well as duplicates from
the same specimen. We obtained approximate geographical coordinates for most of
these records by searching the original papers, locality information, or collection databases.
The final dataset contained 11,166 sequences, 10,254 of which were geotagged.
This barcode dataset is probably the most extensive gathered so far in Amazonia for any
vertebrate group. Of these records, 8,181 are from Amazonia proper, including 4,634 from
the EGS, while the remainder are from adjacent regions. The obtained sequences were
aligned with MAFFT v.7 (Katoh & Standley, 2013). We used the resulting alignment to
generate a neighbour-joining tree using pairwise deletion and the p-distance model with
MEGA v.7.0.16 (Kumar et al., 2016).
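The p-distance with pairwise deletion mentioned above can be sketched as follows. This is a minimal illustration in Python, not the MEGA implementation: the function name `p_distance` is hypothetical, and treating every character other than A, C, G and T as a site to skip is an assumption made for the sketch.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a site is skipped for this pair only when either
    sequence carries a gap or an ambiguous base at that position.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")  # assumption: anything else is treated as missing
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # site deleted for this pair only
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


# Toy alignment: one gapped site is skipped, one of nine remaining sites differs.
example = p_distance("ACGT-ACGTA", "ACGTTACCTA")
```

Because sites are dropped pair by pair rather than across the whole alignment, each comparison uses as many sites as the two sequences share.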
3. Taxonomic frameworks
While there is valid criticism against reliance on simplistic single-sequence approaches
to species delineation (Goldstein & DeSalle, 2011; Krishna Krishnamurthy & Francis,
2012), such approaches can take us further toward the comparative quantification of
biodiversity over different spatial scales (Emerson et al., 2011; Yu et al., 2012; Ji et al.,
2013). In the case of Amazonian anurans, clear and exhaustive delimitation of species
boundaries based on morphology, acoustics and molecular data remains out of reach. As
a consequence, many species groups have a very confused taxonomy leading to frequent
misidentification, lumping of undescribed species within a single taxon, and assignment
of species to polyphyletic groups. This results in largely inaccurate IUCN data. In order to
compare our sequence dataset to IUCN data, we built two different taxonomic
frameworks. The TAXO1 taxonomic framework is conservative, linking as much as
possible each sequence to a nominal taxon so as to form a monophyletic group, while the
TAXO2 taxonomic framework results from a purely DNA-based species delineation (see
below).
For TAXO1, our goal was to group under nominal taxa the sequences forming a
monophyletic group according to the neighbour-joining tree, so as to obtain the
geographic range of the lineages already considered by the IUCN. Original fieldwork and
GenBank assignments were often contradictory because of the above-mentioned
reasons and because of taxonomic changes subsequent to identification, and were thus
often modified. We first identified the sequences that could be unambiguously linked to
a nominal taxon by considering the literature (e.g. sequences from type series), the
known range of the taxon, and the location of the type locality. Then, we checked whether
this identification was in accordance with the ID of the most closely related sequences. If
so, this taxon ID was applied to the sequences until another taxon was
applicable to a more distant lineage. When a taxon was found to be paraphyletic, we
checked for possible misidentification, and whether one of the lineages could be
identified as another taxon. When paraphyly was ambiguous, we kept the original
identification. When paraphyly was unambiguous, one of the lineages was identified as
the nominal taxon while the other ones were identified as “sp.” if they did not share
affinities with another taxon. In a few cases, two or more taxa were largely intertwined, with
shallow genetic distances among sequences, and remained ambiguous despite the
allopatric distribution of the lineages. We then considered them as a single taxon (e.g.
Atelopus hoogmoedi, A. flavescens), given that they represent a single lineage and a single
patch of distribution. Ultimately, we think that TAXO1 provides a representative update of the
current taxonomic knowledge for Amazonian anurans. In total, 941 species were considered in
TAXO1, including 365 occurring in Amazonia.
For TAXO2 we applied the Automatic Barcode Gap Discovery (ABGD) species
delineation method (Puillandre et al., 2012) to our sequence dataset. We performed
ABGD analyses from the source code with default settings (JC69, Pmin: 0.001, Pmax: 0.1,
steps: 10, Nb bins: 20) on each genus, and attributed a number to each candidate species
retrieved in the analysis. Computations were performed on the EDB-Calc Cluster hosted
by the laboratory "Évolution et Diversité Biologique" (EDB), using software developed
by the Rocks(r) Cluster Group (San Diego Supercomputer Center, University of
California, San Diego) and its contributors. In 24 instances (17 concerning Amazonian
taxa), different nominal taxa in TAXO1 were lumped into a single candidate species in
TAXO2 because of a shallow mtDNA divergence between them (notably in Atelopus spp.
and Osteocephalus spp.). As these correspond to clearly distinct species based on
morphology and acoustics, and form monophyletic groups in previous studies (but herein
with shallow divergence or recovered ambiguously paraphyletic due to the low
resolution in our 400 bp-long 16S fragment), we considered them as false negatives and
we applied to them the same taxonomic assignment as in TAXO1. Ultimately, 1,246
species were considered in TAXO2, including 746 occurring in Amazonia.
Third, we compiled amphibian species range data from the IUCN
(http://www.iucnredlist.org/technical-documents/spatial-data#amphibians), which is
the most widely used amphibian distribution database. In order to make this dataset
comparable with TAXO1 and TAXO2, we excluded 22 genera (433 species) that only
partly overlap with our focal area, i.e., western Amazonia, northern Andes, Caatinga
and Cerrados. One genus from the Tepuis (Metaphryniscus) was also omitted given that
no sequences were available, as well as two introduced species (Eleutherodactylus
johnstonei and Lithobates catesbeianus). Overall, 51 genera were used in our analyses.
4. Study area and species distribution data
Our analyses focused on a rectangular area that includes the whole central, eastern and
northern parts of Amazonia (excluding most of the western and southern parts). The
limits of our study area were 72°W–47°W and 11°S–9°N. We applied a grid of 1° × 1°
(500 cells) to this area. This area includes the Guiana Shield (Lujan & Armbruster, 2011), the
central and eastern parts of the Rio Amazonas drainage, and the northern parts of the
Rio Purus, Rio Madeira, Rio Tapajós, Rio Xingú, and Rio Tocantins drainages (Fig. 1A) as
well as peripheral non-Amazonian areas.
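As a check on the grid dimensions, the study area spans 25 degrees of longitude and 20 degrees of latitude, which gives 25 × 20 = 500 one-degree cells. The sketch below assigns a coordinate to its cell; the function name `cell_index` and the row-major numbering starting at the south-western corner are assumptions made for illustration, not the numbering used in our analyses.

```python
import math

# Study-area bounds in decimal degrees (west and south are negative).
LON_MIN, LON_MAX = -72, -47   # 72°W to 47°W
LAT_MIN, LAT_MAX = -11, 9     # 11°S to 9°N
N_COLS = LON_MAX - LON_MIN    # 25 columns of 1 degree
N_ROWS = LAT_MAX - LAT_MIN    # 20 rows of 1 degree


def cell_index(lon: float, lat: float):
    """Return the 0-based index of the 1° x 1° cell containing the point.

    Cells are numbered row by row from the south-western corner
    (hypothetical convention). Points outside the study area return None.
    """
    if not (LON_MIN <= lon < LON_MAX and LAT_MIN <= lat < LAT_MAX):
        return None
    col = math.floor(lon - LON_MIN)
    row = math.floor(lat - LAT_MIN)
    return row * N_COLS + col


n_cells = N_COLS * N_ROWS  # 500
```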
We then estimated the putative range of each species by creating convex
polygons out of our occurrence datasets TAXO1 (358 species total within the focal area)
and TAXO2 (596 species) with the sp package implemented in R (R Development Core
Team, 2016). The numbers of Amazonian species included in TAXO1 and TAXO2 differ
from those occurring within the focal area because this area encompasses non-Amazonian
areas and excludes western and southern parts of Amazonia. We then
interpolated the occurrence of species in each cell of our study area for the three
datasets. We excluded species occurring in fewer than three localities and cells with fewer
than five species in them, thus removing poorly sampled species, which did not provide
enough information for range reconstruction, and poorly sampled peripheral cells. 118
species were discarded in TAXO1 and 318 in TAXO2. Finally, we considered 240 species
in TAXO1, 278 in TAXO2, and 440 in the IUCN dataset within the focal area (Fig. 1D, E,
F).
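The range reconstruction and filtering steps described above can be sketched in Python. This is only an illustration under stated assumptions: the analyses themselves relied on the R package sp, whereas the sketch uses a standard monotone-chain convex hull; the function names are hypothetical; and testing whether a cell centre falls inside the hull is an assumed simplification of the interpolation step.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, point):
    """True if the point lies inside or on a counter-clockwise convex hull."""
    n = len(hull)
    if n < 3:
        return False  # degenerate hull (point or segment) encloses no area
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        turn = (b[0] - a[0]) * (point[1] - a[1]) - (b[1] - a[1]) * (point[0] - a[0])
        if turn < 0:
            return False  # point is to the right of this edge, hence outside
    return True


def filter_species(occurrences, min_localities=3):
    """Keep species recorded in at least `min_localities` distinct localities."""
    return {sp: locs for sp, locs in occurrences.items()
            if len(set(locs)) >= min_localities}


# Toy data (hypothetical species and coordinates, as lon/lat pairs).
toy = {"sp_a": [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)], "sp_b": [(5, 5), (5, 5)]}
kept = filter_species(toy)
hull_a = convex_hull(kept["sp_a"])
```

The same threshold logic applies to cells: after interpolation, cells holding fewer than five species would be dropped before the decomposition.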
Figure 1. (A) All occurrences in the barcoding dataset and inset of the focal area; (B) Amazonian Areas of Endemism from Smith et al., 2014; (C) species richness mapped from occurrence data from TAXO1 and TAXO2, which provide identical results; (D) species richness mapped from TAXO1 after polygon transformation and exclusion of rare species; (E) species richness mapped
from TAXO2 after polygon transformation and exclusion of rare species; (F) species richness mapped from the distribution data of IUCN considered in our analyses.
5. Identification of Biogeographic Subregions
To delimit BSRs based on species occurrence, we decomposed the community matrix
(i.e., the matrix listing the species occurring in each grid cell) using Latent Dirichlet
Allocation (LDA; Blei et al., 2003; Valle et al., 2014). LDA is an unsupervised clustering
method based on a probabilistic model, which assumes that several species assemblages
coexist over the study area, the number K of which is fixed beforehand. This method has
major advantages compared to classic clustering (e.g., hierarchical or k-means
clustering). First, it is likelihood-based, thus providing rigorous tools for selecting the
number of assemblages and comparing decompositions. Second, assemblages may
partially overlap in taxonomic composition, and a given grid cell may either be
dominated by one assemblage or contain a mixture of assemblages. Thus, it allows for
modelling gradual changes in taxonomic composition over space. A mixing parameter α
is estimated as part of the inference procedure, and indicates whether the samples tend
to be decomposed into an even mixture of assemblages (case α > 1) or into an uneven
mixture dominated by one assemblage (case α < 1).
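The role of the mixing parameter α can be illustrated by simulating assemblage proportions from a symmetric Dirichlet distribution, which is the prior LDA places on the per-sample mixture. The sketch below uses only the Python standard library; the function names, the number of simulated cells and the seed are arbitrary choices for illustration and play no part in the inference itself.

```python
import random


def sample_mixture(alpha, k, rng):
    """Draw one vector of K assemblage proportions from a symmetric Dirichlet(alpha)."""
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
        if total > 0:  # guard against numerical underflow for very small alpha
            return [d / total for d in draws]


def mean_dominance(alpha, k=8, n=2000, seed=1):
    """Average share held by the single largest assemblage over n simulated cells."""
    rng = random.Random(seed)
    return sum(max(sample_mixture(alpha, k, rng)) for _ in range(n)) / n


# alpha << 1: each simulated cell is dominated by one assemblage.
uneven = mean_dominance(0.02)
# alpha > 1: each simulated cell is a fairly even blend of the K assemblages.
even = mean_dominance(5.0)
```

With α well below 1 the largest assemblage takes nearly the whole cell, whereas with α above 1 its share stays close to 1/K, which matches the two cases distinguished in the text.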
We used the Variational Expectation Maximization (EM) algorithm implemented
by Blei et al. (2003) and wrapped into the R package topicmodels (Grün & Hornik, 2011)
for parameter inference, with a convergence threshold of 10!! for the EM step and 10!!
for the variational step. We assessed the reliability of the solution by comparing the
taxonomic composition of assemblages between 100 realizations of the algorithm
starting from random initial conditions. We only interpreted the decomposition
corresponding to the realization with the highest likelihood out of 100. We selected the
number K of assemblages by AIC minimization. We represented the spatial distribution
of assemblages on a map after ordinary Kriging between cells (R package gstat;
Pebesma, 2004). We also computed the Jaccard taxonomic dissimilarity between
assemblages and displayed it as a dendrogram. Additionally, we decomposed the
datasets into K = 3 assemblages to assess the coarser biogeographic structure of the
study area. See Sommeria-Klein et al. (in prep.) for further methodological details.
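Two of the quantities used in this procedure, the AIC and the Jaccard dissimilarity, are simple enough to sketch directly. The Python below is illustrative only: the function names are hypothetical, the log-likelihoods and parameter counts in the example are invented, and how free parameters are counted for an LDA fit is left as an input because the text does not specify it.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def select_k(fits):
    """Pick the number of assemblages K with the lowest AIC.

    `fits` maps K to a (log_likelihood, n_params) pair.
    """
    return min(fits, key=lambda k: aic(*fits[k]))


def jaccard_dissimilarity(species_a, species_b):
    """1 - |A ∩ B| / |A ∪ B| for two assemblages given as collections of species."""
    a, b = set(species_a), set(species_b)
    union = a | b
    if not union:
        return 0.0  # two empty assemblages are treated as identical
    return 1.0 - len(a & b) / len(union)


# Invented fits: K = 3 gains enough likelihood to offset its extra parameters,
# while K = 4 does not.
best_k = select_k({2: (-1000.0, 20), 3: (-950.0, 30), 4: (-948.0, 40)})

# Two toy assemblages sharing two of four species.
d = jaccard_dissimilarity({"a", "b", "c"}, {"b", "c", "d"})
```

A matrix of such pairwise dissimilarities between assemblages is what a hierarchical clustering routine would take as input to draw the dendrogram.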
Results
Underestimation of species richness. Based on our analyses, among the 363
Amazonian species found in TAXO1, 53 genetic lineages could not be associated with
any nominal taxa. In the EGS, most of these undescribed lineages were already
documented (e.g., Adelophryne sp., Scinax sp. 2, or Pristimantis sp. 1) (Fouquet et al.,
2007b, 2012b). In southern and western Amazonia, however, several lineages are
reported here for the first time (e.g., Allobates sp. “Divisor”, Amazophrynella sp. “Acre”,
Dendropsophus sp. “Xingú”). This suggests that species diversity has been well sampled
in the lowlands of the Guiana Shield, but not in the rest of Amazonia. Our datasets also
provide evidence of range extension for many taxa compared to previous knowledge.
This is for example the case of Scinax nasicus, which extends to the Sipaliwini savannah
(Suriname), Pristimantis koehleri, to the southern part of the Guiana Shield, or
Synapturanus mirandaribeiroi, to the southern part of the Amazonas drainage. However,
most of these newly documented populations are highly genetically divergent from the
populations lying within the known range of the species and are considered as
independent species in TAXO2.
In fact, 246 TAXO1 species display splits, yielding 568 species (×2.3) in TAXO2.
TAXO2 provides 1,548 pairwise comparisons among species that are lumped as
conspecific in TAXO1. 39% of these average pairwise distances (p-distance, pairwise
deletion) were above 6%, a threshold believed to conservatively delimit species (Vences
et al., 2005; Fouquet et al., 2007a) and 85% were above 3% (Fig. 2A). In terms of
taxonomy, 436 TAXO2 species cannot be assigned to any of the 310 nominal taxa of
TAXO1. These observations suggest that the TAXO1 framework remains
overconservative in many instances.
Figure 2. (A) Histogram of the average pairwise distances among TAXO2 species considered as a single TAXO1 species (white bars) and among TAXO2 species considered as different TAXO1 species (red bars; this last distribution was randomly sampled to harbour the same number of comparisons as the previous one); (B-C) Examples of genetic and geographic patterns for two Panamazonian single TAXO1 species that provide drastically different patterns in TAXO2; Leptodactylus petersii being split into 16 species whereas Hypsiboas calcaratus is only split into two candidate species in TAXO2. The colours of the lineages on the tree correspond to the colours of the occurrence points and areas on the map. † indicates candidate species that were discarded from the analyses in TAXO2 (fewer than three locality records).
A number of distinct patterns of distribution emerge from the occurrence data of
TAXO1 and TAXO2. We highlight three of them that segregate groups of species
occurring in the EGS: Guiana Shield endemic groups, Panamazonian allopatric groups,
and widespread species. The first pattern concerns six groups that are endemic to the
Guiana Shield and occur in both the highlands and the lowlands: Adelophryne (4 species
in TAXO1 vs. 4 in TAXO2), Otophryne (3 vs. 3 species), Synapturanus (3 vs. 4 species),
Anomaloglossus (15 vs. 29 species), the Vitreorana ritae clade (3 vs. 3 species), and the
Hypsiboas benitezi clade (3 vs. 3 species). Among them, only Anomaloglossus seems to have
substantially diversified in the lowlands. Secondly, the vast majority of species occurring
in the EGS are nested in widespread Amazonian or lowland Neotropical clades (Fig.
2B). Most of these clades display deep divergence among populations (above 6%; e.g.
Leptodactylus petersii, with 16 candidate species in TAXO2) and contain several candidate
species with more restricted ranges. Finally, 78 species out of 358 (22%) in TAXO1, 45
out of 596 (8%) in TAXO2 and 142 out of 440 (32%) in IUCN actually have broad
distributions (> 1 million km²) within our focal study area (e.g., H. calcaratus) (Fig. 2C).
Biogeographical subregions. We decomposed the TAXO1, TAXO2 and IUCN datasets
using Latent Dirichlet Allocation. AIC minimization yielded an optimal number of species
assemblages close to K = 8 for all three datasets (Fig. S2). The retrieved assemblages
were found to be spatially segregated (mixing parameter α much smaller than 1:
$\alpha_{\mathrm{IUCN}} = 0.021$, $\alpha_{\mathrm{TAXO1}} = 0.019$, $\alpha_{\mathrm{TAXO2}} = 0.016$) and contiguous. We could thus
interpret them as BSRs. The LDA decomposition was found to be reliable for the three
datasets based on its stability over 100 realizations (Fig. S2).
Figure 3. Maps generated by interpolating the eight-assemblage Latent Dirichlet Allocation (LDA) decomposition of the species occurrence data (A, B, C), and corresponding dendrograms (D, E, F) showing the relationships between the eight assemblages recovered in the LDA decomposition using average Jaccard taxonomic dissimilarity (based on the presence/absence of species in assemblages). (A) TAXO1; (B) TAXO2; (C) IUCN data. The white dashed lines represent the approximate boundaries of the BSRs for a three-assemblage LDA decomposition (in panel [B], the north-western and south-eastern regions belong to the same assemblage). The numbers on the maps correspond to the numbers attributed to assemblages for each dataset. (D) TAXO1; (E) TAXO2; (F) IUCN data.
Even though not identical, the spatial boundaries of the eight BSRs retrieved for
TAXO1 and TAXO2 were very similar (Fig. 3A-B). The lowlands of the EGS were clearly
separated from the rest of the study area by the Rio Amazonas and the Pantepui region.
Moreover, the EGS was also found to exhibit some internal structure, since this area was
composed of three independent BSRs, all found in both TAXO1 and TAXO2 despite large
differences in the distribution of the species considered (e.g., Leptodactylus petersii). One
of these three BSRs (BSR1 on Fig. 3A-B) comprised the southern part of Guyana,
Roraima and the northern parts of Pará and Amazonas states (Brazil). A second one
(BSR2 on Fig. 3A-B) comprised the northern part of Guyana and adjacent Venezuela.
Finally, a third one (BSR3 on Fig. 3A-B) comprised the state of Amapá (Brazil), French
Guiana, and Suriname. These three BSRs were retrieved as a single cluster in the coarser
3-assemblage LDA decomposition. Taxonomic comparison between assemblages
indicated that among these three BSRs, BSR1 and BSR3 were more similar to each other,
in both TAXO1 and TAXO2 (Fig. 3D, E). The only notable difference between TAXO1 and
TAXO2 in the EGS area was that the boundaries of BSR1 matched well the Rio Negro and
Rio Amazonas in TAXO2, while BSR1 extended somewhat further west across the
Rupununi savannah in TAXO1. The boundaries between BSRs in this specific area were
also sharper in TAXO2 than in TAXO1. Outside of the EGS area, there was a striking
match between BSR boundaries and the Rio Madeira in TAXO1 that was already recovered
in the 3-assemblage decomposition. In contrast, the Purus and Tapajós Rivers were
each found to be at the center of a BSR in both TAXO1 and TAXO2.
The distribution of BSRs using the IUCN database provided a markedly different
pattern, notably not matching the EGS boundaries. The three Guianas (Guyana,
Suriname, and French Guiana) were grouped together in one BSR, excluding the north-western
part of Guyana and including adjacent areas of Amapá and Pará (Brazil). The
southern part of the EGS was grouped with the southern part of the Amazon drainage,
thus encompassing the Rio Amazonas (Fig. 3C).
Species richness and endemism. In terms of species richness and endemism, the three
datasets are radically different. BSR1 of IUCN is composed of 119 species, 27.7% of
which are endemic (Table 1), and is geographically comparable to BSR2 and BSR3 of
TAXO1 and TAXO2 lumped together. Yet, despite encompassing a smaller geographical
area, BSR3 of TAXO1 alone displays values of richness and endemism similar to those of
BSR1 of IUCN. When considering the three Guiana Shield BSRs together in TAXO1,
richness (184 species) and endemism (57%) are much higher than in BSR1 of IUCN.
These metrics increase to 250 species and 82.4% endemism in TAXO2 for the EGS
(Table 1). BSR2 (northern Guyana) contains the highest proportion of endemic species in
both taxonomic frameworks, reaching 75% endemism in TAXO2 (Table 1), while the
highest species richness (130 in TAXO2) is found in BSR3 (Suriname, French Guiana and
Amapá).
                 |            IUCN             |            TAXO1            |            TAXO2
Partition   BSR  | Species  Endemic  Endemism  | Species  Endemic  Endemism  | Species  Endemic  Endemism
                 | richness species  rate (%)  | richness species  rate (%)  | richness species  rate (%)
K = 8       1    | 119      33       27.7      | 89       4        0.4       | 71       25       35.2
K = 8       2    | –        –        –         | 85       46       54.1      | 90       68       75.5
K = 8       3    | –        –        –         | 118      30       25.4      | 130      77       59.2
K = 3       1    | –        –        –         | 184      105      57        | 250      206      82.4
Table 1: Species richness and endemism in each of the BSRs covering the EGS. The figures presented in this table include singletons (species with only one occurrence point) and species that occur in fewer than three cells. BSR numbers correspond to those displayed in Fig. 3. For K = 3, assemblage 1 actually corresponds to the EGS.
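The quantities reported in Table 1 can be derived from species lists per BSR. The sketch below is a minimal Python illustration under an assumed definition: a species counts as endemic to a BSR when it occurs in no other BSR of the same partition. The function name and the toy species lists are hypothetical.

```python
def endemism_table(bsr_species):
    """Return {BSR: (richness, endemic count, endemism rate in %)}.

    `bsr_species` maps each BSR label to the set of species it contains.
    A species is counted as endemic when it is absent from every other BSR
    (assumed definition for this sketch).
    """
    rows = {}
    for bsr, species in bsr_species.items():
        if not species:
            continue  # an empty BSR has no defined endemism rate
        elsewhere = set()
        for other, other_species in bsr_species.items():
            if other != bsr:
                elsewhere |= other_species
        endemic = species - elsewhere
        rate = round(100.0 * len(endemic) / len(species), 1)
        rows[bsr] = (len(species), len(endemic), rate)
    return rows


# Toy partition with invented species labels.
toy_rows = endemism_table({1: {"a", "b", "c", "d"}, 2: {"c", "d", "e"}})

# The rate is simply endemic / richness, e.g. 33 endemic species out of 119.
rate_example = round(100.0 * 33 / 119, 1)
```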
Discussion
Underestimation of species richness and regional endemism in Amazonia. We
analysed our molecular diversity data using two alternative taxonomic frameworks: a
conservative framework, TAXO1, in which sequences were as much as possible clustered
into monophyletic groups around previously described nominal taxa, and a framework,
TAXO2, in which species were delineated solely based on the molecular distance
between sequences. Our species delineation analysis corroborates previous suggestions
that the actual number of anuran species occurring in Amazonia remains vastly
underestimated (Fouquet et al., 2007a; Funk et al., 2012; Ferrão et al., 2016). The
number of species retrieved in TAXO2 (746) and the level of divergence among them are
particularly striking in many groups.
Our TAXO1 dataset comprises 363 Amazonian species, which is close to the 427
species recorded by the IUCN. However, our sampling effort is low outside the EGS, as
illustrated by the fact that we do not retrieve several nominal taxa included in the IUCN
database. Therefore, the actual number of species is likely to be largely underestimated
in TAXO1 outside the EGS. Moreover, TAXO1 remains over-conservative in many
instances, as the level of genetic divergence within species is often very high. TAXO2
suggests the existence of more than twice the number of species found in TAXO1.
Considering that uneven sampling is even more of an issue in TAXO2 than in TAXO1, as
many of our candidate species are only retrieved in one or a few localities, the actual
species count for Amazonia is likely to be substantially more than twice the current
count. Hence, comparisons between taxonomic frameworks should be limited to the
EGS, where our sampling effort is highest. When considering solely the EGS, the number
of candidate species retrieved in TAXO2 is 1.34 times higher than for TAXO1 (Table 2).
A species delineation solely based on mtDNA divergence remains overly
simplistic and cannot reliably delineate the species occurring in the region since it
necessarily overestimates the actual number of species in some cases (false positives)
and underestimates it in others (false negatives) (Hickerson et al., 2006). The pitfalls
inherent to the sole use of short mtDNA sequences for species delineation have been
already extensively discussed (Hubert & Hanner, 2015). Nevertheless, in most groups
for which the boundaries among species have been investigated using integrative
taxonomy, mtDNA divergence of a magnitude similar to that used in this study to differentiate
between intra- and interspecific genetic divergence was generally associated with
phenotypic or acoustic differentiation as well (Funk et al., 2012; Fouquet et al., 2015b;
Ortega-Andrade et al., 2015; Fouquet et al., 2016). Moreover, TAXO2 subdivisions have
already been proven to be associated with morphological or acoustic differences in
several groups (Jansen et al., 2011; Fouquet et al., 2013; Ferrão et al., 2016). Thus, the
TAXO2 taxonomic framework takes into account finer subdivisions that certainly
correspond to phenotypically distinct species in many cases, and it is highly probable
that the prevalence of false positives remains limited. In contrast, some false negatives
were detected since several nominal taxa were retrieved as a single candidate species
using ABGD (e.g., Atelopus flavescens and A. hoogmoedi, O. oophagus and O. taurinus).
These were corrected in TAXO2 but the prevalence of false negatives remains difficult to
evaluate in most groups where species boundaries have not been investigated using
phenotypic traits. Overall, the present work provides an important update to the
documentation of Amazonian anuran diversity, which will undoubtedly help
stimulate the process of species delineation and description.
While our work provides a glimpse of how far we still are from reaching a realistic
estimate of the number of species occurring throughout Amazonia, it also provides an
even more striking view of the degree of regional endemism. Our estimates of the rate of
endemism for the frogs of the EGS reach 57.0% based on TAXO1 and 82.4% based on
TAXO2. These figures are two to four times higher than the estimate of the IUCN for the
same area. They are also 1.0 to 1.4 times higher than the rate of endemism of frogs in the
whole geologically defined Guiana Shield, which also encompasses Venezuela and part of
Colombia (Señaris & MacCulloch, 2005). In comparison, only 7.7% of bird species are
endemic to the whole Guiana Shield, 29% of reptile species, and 11% of mammal
species (Hollowell & Reynolds, 2005). These figures are still certainly underestimated
(Lim, 2012), especially for reptiles (Geurgas & Rodrigues, 2010; de Oliveira et al., 2016),
but taxonomy has probably reached a much more stable level for birds and mammals in
the Guiana Shield than for anurans. In comparison with other tropical American regions,
51.3% of the vertebrate species from the Atlantic Forest of Brazil are endemic, and 46.2%
of the vertebrates from the tropical Andes are endemic (Myers et al., 2000).
A simple and rough extrapolation based on the species richness and endemism
we obtained for the EGS (184–250 species with 57–82% endemism) applied to the eight
Amazonian BSRs retrieved in our analysis leads to ca. 1,472–2,000 species in our focal
area, which represents about three to five times the 427 species that are supposed to
occur in Amazonia according to the IUCN. Enhancing data coverage in order to refine
these estimates would require extensive fieldwork in remote areas. Nevertheless, new
predictive approaches based on the detection of cryptic diversity (Espíndola et al., 2016)
may yield more precise estimates of species richness and endemism in each
BSR, and would therefore help target areas on which to focus sampling.
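The arithmetic behind this extrapolation is simply the EGS richness multiplied by the number of BSRs, compared with the IUCN count. The short Python sketch below reproduces it; the variable and function names are ours, and the calculation inherits the strong assumption that each of the eight BSRs holds as many species as the EGS.

```python
N_BSRS = 8                # BSRs retrieved in the focal area
IUCN_AMAZONIA = 427       # species recorded by the IUCN for Amazonia
EGS_RICHNESS = {"TAXO1": 184, "TAXO2": 250}


def extrapolate(richness_per_bsr, n_bsrs=N_BSRS):
    """Scale a per-BSR richness to the whole focal area (rough upper-level guess)."""
    return richness_per_bsr * n_bsrs


totals = {name: extrapolate(r) for name, r in EGS_RICHNESS.items()}
ratios = {name: total / IUCN_AMAZONIA for name, total in totals.items()}
# totals -> {"TAXO1": 1472, "TAXO2": 2000}; ratios are roughly 3.4 and 4.7.
```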
Biogeographic division of the eastern Guiana Shield. The extents of the BSRs
retrieved for TAXO1 and TAXO2 are very similar in spite of the use of two drastically
different taxonomic frameworks. In contrast, the BSRs retrieved from the IUCN database
are very different and do not correspond to any landscape feature. Not even a barrier effect
of the lower Rio Amazonas is distinguishable. This most likely results from the
artificially large distribution of many species contained in this database on both sides of
this river.
The location of the Rio Madeira matches well the boundary between BSR5 and
BSR6 in TAXO1, which is in accordance with what has already been shown for other
groups of terrestrial vertebrates, such as birds (Fernandes et al., 2012; Ribas et al.,
2012) and primates (Cortés-Ortiz et al., 2003). The sharpness of this pattern is not
obvious in TAXO2, but this is probably due to the removal of many singletons from the
dataset after species delineation. Another interesting aspect is the lack of apparent
suture effect between the Purus and the Solimões drainages, also in accordance with
what has previously been found for other groups of terrestrial vertebrates (Cortés-Ortiz
et al., 2003; Fernandes et al., 2012; Ribas et al., 2012). These rivers display a meandering
behaviour associated with an unstable course over time, thus enabling gene flow
through connection between populations located on both sides and dispersal of species
from one interfluve to the other (Aleixo, 2004, 2006; Bates et al., 2004; Jackson et al.,
2013). On the contrary, wide rivers in the Brazilian Shield such as the Rio Madeira display a
putatively more stable course over time and are more likely to act as long-lasting suture
zones that might have promoted diversification or at least been more efficient in
preventing dispersal (Antonelli et al., 2010; Moraes et al., 2016). Such characteristics are
also found in rivers of the EGS (Fernandes et al., 2012; Fouquet et al., 2012a, 2015a), but
except for the Rio Branco and Rio Negro, the impact of the Guiana Shield rivers on gene
flow through dispersal limitation might not be as important as for the Amazonian rivers
of the Brazilian Shield, owing to the smaller extent of the catchments and the smaller
width of the rivers themselves. This is reflected in our results, as the suture zones
between the three BSRs of the EGS do not correspond to any major drainage. In fact, it is
more likely that the delimitation of these assemblages resulted from the combined
influence of past climatic and landscape changes (Fouquet et al., 2012c). The current
climatic characteristics of the EGS are heterogeneous, with a large drier corridor
observed in the southern part (Mayle & Power, 2008), where patches of savannah are
found today. This corridor also matches the suture zone separating BSR1 from BSR2 and
BSR3. The strong climatic fluctuations in the Neotropics during the Miocene and
Pliocene played a crucial role in the diversification of several organisms (Antonelli et al.,
2010). More recent climate fluctuations and associated landscape modifications during
the Pleistocene certainly helped maintain the diversity that resulted from diversification
events during the Miocene and Pliocene periods (Carnaval & Bates, 2007).
The outer limits of the three BSRs match well the delimitation of the Guianan area
retrieved for birds (Naka, 2011), confirming the relevance of qualifying the EGS as a
biogeographic area. Nonetheless, using anuran assemblages as a model revealed
biogeographic heterogeneity within this region that could not be detected with bird
assemblages, likely because birds have much higher dispersal abilities than anurans
(Pigot & Tobias, 2015). The distinctiveness of the BSRs compared to the remainder of
the dataset is also reflected in the structure of the dendrogram illustrating the level of
taxonomic similarity between assemblages (Fig. 3D, E). The southern limit of BSR1
corresponds to the Rio Amazonas for both TAXO1 and TAXO2. This is congruent with
previous studies on terrestrial vertebrates indicating that this river is a strong barrier to
gene flow and that it structures species assemblages (Cortés-Ortiz et al., 2003; Haffer,
2008; Ribas et al., 2012). The delineation of the western part of BSR1 differs across
datasets. It coincides perfectly with the lower Rio Negro, and the Rio Branco and
associated savannahs (Rupununi) in TAXO2 but extends further west in TAXO1. These
differences are attributable to the scarcer sampling west and south-west of the Rio Negro
and Rio Branco, which weakens the sharpness of the analysis in that zone, a phenomenon
that becomes even more prevalent in TAXO2 because of the further taxonomic
subdivisions. Another reason could be the inclusion of both forest and open habitat
species in our analysis, which could blur the pattern in areas where both savannah and
forest are found.
It is interesting to note that the limits of the BSRs of the EGS are rather similar
when considering either a K = 3 or a K = 8 decomposition, for both TAXO1 and TAXO2.
This indicates that a strong co-occurrence signal underlies the delineation of these BSRs,
especially in the case of the two northernmost ones (BSR2 and BSR3) whose western
and eastern boundaries coincide perfectly with the ones retrieved in the three-assemblage
decomposition (Fig. 3).
Conclusion. Despite being far from exhaustive, our barcoding dataset is the largest ever
gathered for Amazonia, and we argue that it is close to being exhaustive within the
EGS. Of course, the patterns we obtained need to be confirmed in other taxonomic
groups and, even for anurans, need to be much improved outside the EGS.
Nevertheless, our results help us understand the spatial scale of the sampling efforts
needed to capture the actual diversity of Amazonia. They imply notably that the
magnitude of the Linnean and Wallacean shortfalls in Amazonia is so large that we could
question the conclusions of large-scale studies based on currently accepted biodiversity
data in Amazonia (Feeley & Silman, 2011; Foden et al., 2013). In fact, even with very
coarse data (IUCN), these studies estimated that Amazonian amphibians are highly threatened by
climate change. Considering that many species were not included and that they actually
have much narrower distributions, we can hypothesise that the situation is even
more worrying. If a degree of endemism similar to the one we estimated within the EGS
actually occurs across Amazonia, the impact of habitat loss could have been
underestimated. This is especially the case along the Arc of Deforestation (Vedovato et al.,
2016), where entire faunal assemblages that may harbour a high degree of endemism
are at risk of extinction (Da Silva et al., 2005). Moreover, only BSR3 encompasses a large
proportion of protected areas in the EGS. In contrast, BSR2 (northern Guyana) only
harbours two protected areas and BSR1 only encompasses three biological reserves
(REBIO), four national forests (FLONA) and three national parks (PARNA) in its
Brazilian part. Such results demonstrate the importance of deciphering the basic
structure of Amazonian diversity in order to conserve it efficiently.
Acknowledgements
This work has benefited from an 'Investissement d'Avenir' grant managed by Agence
Nationale de la Recherche (CEBA, ref. ANR-10-LABX-25-01), France. We would like to
thank the following people for their help in the field: Daniel Baudin, Sébastien Cally,
Elodie Courtois, Andy Lorenzini, Benoît Villette. We thank Pierre Solbès at Laboratoire
Évolution et Diversité Biologique (Toulouse, France) for support with the EDB-Calc
cluster.
References
Aleixo, A. (2004) Historical diversification of a terra-firme forest bird superspecies: a phylogeographic perspective on the role of different hypotheses of Amazonian diversification. Evolution, 58, 1303–1317.
Aleixo, A. (2006) Historical diversification of floodplain forest specialist species in the Amazon: a case study with two species of the avian genus Xiphorhynchus (Aves: Dendrocolaptidae). Biological Journal of the Linnean Society, 89, 383–395.
Anderson, L.O. (2012) Biome-Scale Forest Properties in Amazonia Based on Field and Satellite Observations. Remote Sensing, 4.
Antonelli, A., Quijada-Mascareñas, A., Crawford, A.J., Bates, J.M., Velazco, P.M. & Wüster, W. (2010) Molecular studies and phylogeography of Amazonian tetrapods and their relation to geological and climatic models. In Hoorn, C., Wesselingh, F.: Amazonia, Landscape and Species Evolution, 1st edition. Blackwell Publishing, 386–404.
Antonelli, A. & Sanmartín, I. (2011) Why are there so many plant species in the Neotropics? Taxon, 60, 403–414.
Bates, H.W. (1863) The naturalist on the River Amazons, a record of adventures, habits of animals, sketches of Brazilian and Indian life and aspects of nature under the Equator during eleven years of travel, John Murray, London.
Bates, J.M., Haffer, J. & Grismer, E. (2004) Avian mitochondrial DNA sequence divergence across a headwater stream of the Rio Tapajós, a major Amazonian river. Journal of Ornithology, 145, 199–205.
Blei, D.M., Ng, A.Y. & Jordan, M.I. (2003) Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
Carnaval, A.C. & Bates, J.M. (2007) Amphibian DNA shows marked genetic structure and tracks Pleistocene climate change in Northeastern Brazil. Evolution, 61, 2942–2957.
Claramunt, S., Derryberry, E.P., Remsen, J.V. & Brumfield, R.T. (2011) High dispersal ability inhibits speciation in a continental radiation of passerine birds. Proceedings of the Royal Society B: Biological Sciences.
Colinvaux, P.A., De Oliveira, P.E. & Bush, M.B. (2000) Amazonian and Neotropical plant communities on glacial time-scales: The failure of the aridity and refuge hypotheses. Quaternary Science Reviews, 19, 141–169.
Cortés-Ortiz, L., Bermingham, E., Rico, C., Rodríguez-Luna, E., Sampaio, I. & Ruiz-García, M. (2003) Molecular systematics and biogeography of the Neotropical monkey genus, Alouatta. Molecular Phylogenetics and Evolution, 26, 64–81.
Cracraft, J. (1985) Historical Biogeography and Patterns of Differentiation within the South American Avifauna: Areas of Endemism. Ornithological Monographs, 49–84.
Dinerstein, E., Olson, D.M., Graham, D.J., Webster, A.L., Primm, S.A., Bookbinder, M.P. & Ledec, G. (1995) A Conservation Assessment of the Terrestrial Ecoregions of Latin America and the Caribbean, Washington (DC): World Bank.
Duellman, W.E. & Trueb, L. (1994) Biology of Amphibians, Johns Hopkins University Press.
Emerson, B.C., Cicconardi, F., Fanciulli, P.P. & Shaw, P.J.A. (2011) Phylogeny, phylogeography, phylobetadiversity and the molecular analysis of biological communities. Philosophical Transactions of the Royal Society B: Biological Sciences, 366.
Appendix–BiogeographyofAmazonianAnurans
248
Espíndola,A.,Ruffley,M.,Smith,M.L.,Carstens,B.C.,Tank,D.C.&Sullivan, J. (2016) Identifyingcryptic diversity with predictive phylogeography. Proceedings of the Royal Society B:BiologicalSciences,283.
Feeley, K.J. & Silman, M.R. (2016) Disappearing climates will limit the efficacy of Amazonianprotectedareas.DiversityandDistributions,22,1081–1084.
Feeley,K.J.&Silman,M.R.(2011)Thedatavoidinmodelingcurrentandfuturedistributionsoftropicalspecies.GlobalChangeBiology,17,626–630.
Fernandes, A.M., Wink, M. & Aleixo, A. (2012) Phylogeography of the chestnut-tailed antbird(Myrmeciza hemimelaena) clarifies the role of rivers in Amazonian biogeography.JournalofBiogeography,39,1524–1535.
Ferrão,M., Colatreli, O., de Fraga, R., Kaefer, I.L.,Moravec, J. & Lima, A.P. (2016)High speciesrichnessofScinaxtreefrogs(Hylidae)inathreatenedAmazonianlandscaperevealedbyanintegrativeapproach.PLoSONE,11,e0165679.
Ficetola,G.F.,Rondinini,C.,Bonardi,A.,Katariya,V.,Padoa-Schioppa,E.&Angulo,A.(2014)Anevaluationof the robustnessof global amphibian rangemaps. JournalofBiogeography,41,211–221.
Foden,W.B.,Butchart, S.H.M., Stuart, S.N., Vié, J.-C.,Akçakaya,H.R.,Angulo,A.,DeVantier, L.M.,Gutsche,A.,Turak,E.,Cao,L.,Donner,S.D.,Katariya,V.,Bernard,R.,Holland,R.A.,Hughes,A.F., O’Hanlon, S.E., Garnett, S.T., Şekercioğlu, Ç.H. &Mace, G.M. (2013) Identifying theWorld’sMostClimateChangeVulnerableSpecies:ASystematicTrait-BasedAssessmentofallBirds,AmphibiansandCorals.PLoSONE,8,e65427.
Fouquet, A., Courtois, E.A., Baudain, D., Lima, J.D., Souza, S.M., Noonan, B.P. & Rodrigues,M.T.(2015a)Thetrans-riverinegeneticstructureof28Amazonianfrogspeciesisdependentonlifehistory.JournalofTropicalEcology,31,361–373.
Fouquet,A.,Gilles,A.,Vences,M.,Marty,C.,Blanc,M.&Gemmell,N.J.(2007a)UnderestimationofspeciesrichnessinneotropicalfrogsrevealedbymtDNAanalyses.PlosOne,2,e1109.
Fouquet,A.,Ledoux, J.-B.,Dubut,V.,Noonan,B.P.&Scotti, I. (2012a)Theinterplayofdispersallimitation, rivers, and historical events shapes the genetic structure of an Amazonianfrog.BiologicalJournaloftheLinneanSociety,106,356–373.
Fouquet,A.,Loebmann,D.,Castroviejo-Fisher,S.,Padial, J.M.,Orrico,V.G.D.,Lyra,M.L.,Roberto,I.J.,Kok,P.J.R.,Haddad,C.F.B.&Rodrigues,M.T.(2012b)FromAmazoniatotheAtlanticforest:MolecularphylogenyofPhyzelaphryninaefrogsrevealsunexpecteddiversityanda striking biogeographic pattern emphasizing conservation challenges. MolecularPhylogeneticsandEvolution,65,547–561.
Fouquet,A.,Martinez,Q.,Courtois,E.A.,Dewynter,M.,Pineau,K.,Gaucher,P.,Blanc,M.,Marty,C.&Kok,P.J.R.(2013)AnewspeciesofthegenusPristimantis(Amphibia,Craugastoridae)associatedwiththemoderatelyelevatedmassifsofFrenchGuiana.Zootaxa,3750,569–586.
Fouquet,A.,Martinez,Q.,Zeidler,L.,Courtois,E.A.,Gaucher,P.,Blanc,M.,Lima,J.D.,Souza,S.M.,Rodrigues, M.T. & Kok, P.J.R. (2016) Cryptic diversity in the Hypsiboas semilineatusspeciesgroup(Amphibia,Anura)withthedescriptionofanewspeciesfromtheeasternGuianaShield.Zootaxa,4084,79–104.
Fouquet,A.,Noonan,B.P.,Rodrigues,M.T.,Pech,N.,Gilles,A.&Gemmell,N.J. (2012c)Multiplequaternary refugia in the Eastern Guiana Shield revealed by comparativephylogeographyof12frogspecies.SystematicBiology,61,461–489.
Fouquet, A., Orrico, V.G.D., Ernst, R., Blanc, M., Martinez, Q., Vacher, J.-P., Rodrigues, M.T.,
Appendix–BiogeographyofAmazonianAnurans
249
Ouboter,P.,Jairam,R.&Ron,S.(2015b)AnewDendropsophusFitzinger,1843(Anura:Hylidae)oftheparvicepsgroupfromthelowlandsoftheGuianaShield.Zootaxa,4052,39–64.
Fouquet,A.,Recoder,R.,Teixeira Jr.,M.,Cassimiro, J.,Amaro,R.C.,Camacho,A.,Damasceno,R.,Carnaval, A.C., Moritz, C. & Rodrigues, M.T. (2012d) Molecular phylogeny andmorphometric analyses revealdeepdivergencebetweenAmazonia andAtlantic ForestspeciesofDendrophryniscus.MolecularPhylogeneticsandEvolution,62,826–838.
Fouquet, A., Vences, M., Salducci, M.D., Meyer, A., Marty, C., Blanc, M. & Gilles, A. (2007b)Revealingcrypticdiversityusingmolecularphylogeneticsandphylogeography in frogsof the Scinax ruber andRhinellamargaritifera species groups.MolecularPhylogeneticsandEvolution,43,567–582.
Funk,W.C.,Caminer,M.&Ron,S.R.(2012)HighlevelsofcrypticspeciesdiversityuncoveredinAmazonianfrogs.ProceedingsoftheRoyalSocietyB:BiologicalSciences,279,1806–1814.
Gehara,M.,Crawford,A.J.,Orrico,V.G.D.,Rodríguez,A., Lötters, S., Fouquet,A.,Barrientos,L.S.,Brusquetti,F.,DelaRiva,I.,Ernst,R.,Urrutia,G.G.,Glaw,F.,Guayasamin,J.M.,Hölting,M.,Jansen,M.,Kok,P.J.R.,Kwet,A., Lingnau,R., Lyra,M.,Moravec, J., Pombal Jr, J.P.,Rojas-Runjaic, F.J.M., Schulze, A., Señaris, J.C., Solé, M., Rodrigues,M.T., Twomey, E., Haddad,C.F.B.,Vences,M.&Köhler,J.(2014)Highlevelsofdiversityuncoveredinawidespreadnominaltaxon:continentalphylogeographyoftheneotropicaltreefrogDendropsophusminutus.PLoSONE,9,e103958.
Geurgas, S.R. & Rodrigues, M.T. (2010) The hidden diversity of Coleodactylus amazonicus(Sphaerodactylinae, Gekkota) revealed bymolecular data.Molecular Phylogenetics andEvolution,54,583–593.
Goldstein, P.Z. & DeSalle, R. (2011) Integrating DNA barcode data and taxonomic practice:Determination,discovery,anddescription.BioEssays,33,135–147.
Grün, B. & Hornik, K. (2011) topicmodels: An R Package for Fitting Topic Models. Journal ofStatisticalSoftware;Vol1,Issue13.
Haffer, J. (1974) Avian speciation in Tropical South America, Nuttall Ornithological Club, 14,Cambridge,Massachusetts.
Haffer, J. (2008)Hypotheses to explain the origin of species in Amazonia.Brazilian JournalofBiology,68,917–947.
Hall,J.P.W.&Harvey,D.J.(2002)ThephylogeographyofAmazoniarevisited:newevidencefromriodinidbutterflies.Evolution,56,1489–1497.
Hayes,F.E.&Sewlal,J.-A.N.(2004)TheAmazonRiverasadispersalbarriertopasserinebirds:effectsofriverwidth,habitatandtaxonomy.JournalofBiogeography,31,1809–1818.
Hickerson, M.J., Stahl, E.A. & Lessios, H.A. (2006) Test for Simultaneous Divergence usingApproximateBayesianComputation.Evolution,60,2435–2453.
Hollowell,T.&Reynolds,R.P.(2005)ChecklistoftheterrestrialvertebratesoftheGuianaShield.BulletinoftheBiologicalSocietyofWashington.
Hoorn, C. & Wesselingh, F.P. (2010) Introduction: Amazonia, landscape and species evolution.Amazonia:landscapeandspeciesevolution,pp.1–6.Wiley-Blackwell.
Hoorn, C., Wesselingh, F.P., ter Steege, H., Bermudez, M.A., Mora, A., Sevink, J., Sanmartín, I.,Sanchez-Meseguer, A., Anderson, C.L., Figueiredo, J.P., Jaramillo, C., Riff, D., Negri, F.R.,Hooghiemstra,H.,Lundberg,J.,Stadler,T.,Särkinen,T.&Antonelli,A.(2010)AmazoniaThrough Time: Andean Uplift, Climate Change, Landscape Evolution, and Biodiversity.Science,330,927–931.
Appendix–BiogeographyofAmazonianAnurans
250
Hubbell,S.P.,He,F.,Condit,R.,Borda-de-Água,L.,Kellner, J.&terSteege,H.(2008)HowmanytreespeciesarethereintheAmazonandhowmanyofthemwillgoextinct?ProceedingsoftheNationalAcademyofSciences,105,11498–11504.
Hubert,N.&Hanner,R. (2015)DNABarcoding,speciesdelineationandtaxonomy:ahistoricalperspective.DNABarcodes,3,44–58.
Hughes,C.E.,Pennington,R.T.&Antonelli,A.(2013)NeotropicalPlantEvolution:AssemblingtheBigPicture.BotanicalJournaloftheLinneanSociety,171,1–18.
Jackson, N.D., Austin, C.C., Haffer, J., Capparella, A., Haffer, J., Gascon, C.,Malcolm, J., Patton, J.,Silva, M. da, Bogart, J., Hayes, F., Sewlal, J., Colwell, R., Slatkin, M., Peres, C., Patton, J.,daSilva, M., McLuckie, A., Lamb, T., Schwalbe, C., McCord, R., Fouquet, A., Ledoux, J.,Dubut,V.,Noonan,B.,Scott,I.,Brice,J.,Stølum,H.,Akin,J.,Mather,C.,Brooks,G.,Fitch,H.,Achen,P.,Jackson,N.,Austin,C.,Jackson,N.,Austin,C.,Soltis,D.,Morris,A.,McLachlan,J.,Manos,P.,Soltis,P.,Pyron,R.,Burbrink,F.,Excoffier,L.,Smouse,P.,Quattro, J., Jackson,N.,Glenn,T.,Hagen,C.,Austin,C.,Irwin,D.,Kocher,T.,Wilson,A.,Austin,C.,Spataro,M.,Peterson,S.,Jordon,J.,McVay,J.,DeWoody,J.,Schupp,J.,Kenefic,L.,Busch,J.,Murfitt,L.,Hoffman, J., Amos, W., Pompanon, F., Bonin, A., Bellemain, E., Taberlet, P., Weir, B.,Cockerham, C., Kalinowski, S., Raymond, M., Rousset, F., Stamatakis, A., Pritchard, J.,Stephens, M., Donnelly, P., Jost, L., Hedrick, P., Excoffier, L., Hedrick, P., Slatkin, M.,Balloux,F.,Lugon-Moulin,N.,Paetkau,D.,Waits,L.,Clarkson,P.,Craighead,L.,Strobeck,C.,Gaggiotti,O.,Lange,O.,Rassmann,K.,Gliddon,C.,Crawford,N.,Heller,R.,Siegismund,H., Faubet, P., Gaggiotti, O., Barton, N., Slatkin, M., Pemberton, J., Slate, J., Bancroft, D.,Barrett, J.,Dakin,E.,Avise,J.,Evanno,G.,Regnaut,S.,Goudet, J.,Brandley,M.,Guiher,T.,Pyron,R.,Winne,C.,Burbrink,F.,O’Donnell,R.,Mock,K.,Austin,J.,Lougheed,S.,Boag,P.,Fontanella,F., Feldman,C., Siddall,M.,Burbrink,F.,Guiher,T.,Burbrink,F., Starkey,D.,Shaffer, H., Burke, R., Forstner, M., Iverson, J., Zamudio, K., Savage, W., Makowsky, R.,Chesser,J.,Rissler,L.,Niemiller,M.,Fitzpatrick,B.,Miller,B.,Li,J.,Yeung,C.,Tsai,P.,Lin,R.,Yeh,C.,Postma,E.,Noordwijk,A.van,Gavrilets,S.,Tatarenkov,A.,Healey,C.,Avise,J.,Gavrilets, S., Li,H., Vose,M., Jackson, S.,Webb,R., Anderson,K.,Overpeck, J.,Webb, 
T.,Haywood,A.,Valdes,P.,Sellwood,B.,Kaplan,J.,Dowsett,H.,Saucier,R.,Estoup,A.,Jarne,P.,Cornuet,J.,O’Reilly,P.,Canino,M.,Bailey,K.,Bentzen,P.,Frazier,D.,Kesel,R.,Blum,M.,Guccione,M.,Wysocki,D.,Robnett,P.,Rutledge,E.,Smith,L.,Baker,J.,Killgore,K.&Kasul,R.(2013)TestingtheRoleofMeanderCutoffinPromotingGeneFlowacrossaRiverineBarrierinGroundSkinks(Scincellalateralis).PLoSONE,8,e62812.
Jansen, M., Bloch, R., Schulze, A. & Pfenninger, M. (2011) Integrative inventory of Bolivia’slowlandanuransrevealshiddendiversity.ZoologicaScripta,40,567–583.
Jenkins, C.N., Alves,M.A.S., Uezu, A. & Vale,M.M. (2015) Patterns of Vertebrate Diversity andProtectioninBrazil.PLoSONE,10,e0145064.
Jenkins,C.N.,Pimm,S.L.& Joppa,L.N. (2013)Globalpatternsof terrestrialvertebratediversityandconservation.ProceedingsoftheNationalAcademyofSciences,110,E2602–E2610.
Ji, Y.,Ashton,L., Pedley, S.M.,Edwards,D.P.,Tang,Y.,Nakamura,A.,Kitching,R.,Dolman,P.M.,Woodcock,P.,Edwards,F.A.,Larsen,T.H.,Hsu,W.W.,Benedick,S.,Hamer,K.C.,Wilcove,D.S., Bruce, C., Wang, X., Levi, T., Lott, M., Emerson, B.C. & Yu, D.W. (2013) Reliable,verifiableandefficientmonitoringofbiodiversityviametabarcoding.EcologyLetters,16,1245–1257.
Katoh, K. & Standley, D.M. (2013) MAFFT Multiple Sequence Alignment Software Version 7:Improvements inPerformanceandUsability.MolecularBiologyandEvolution,30,772–
Appendix–BiogeographyofAmazonianAnurans
251
780.Krishna Krishnamurthy, P. & Francis, R.A. (2012) A critical review on the utility of DNA
barcodinginbiodiversityconservation.BiodiversityandConservation,21,1901–1919.Kumar, S., Stecher, G. & Tamura, K. (2016)MEGA7:Molecular Evolutionary Genetics Analysis
version7.0forbiggerdatasets.MolecularBiologyandEvolution,33,1870–1874.Lim, B.K. (2012) Preliminary assessment of Neotropical mammal DNA barcodes: an
underestimationofbiodiversity.TheOpenZoologyJournal,5,10–17.Lujan,N.K.&Armbruster,J.W.(2011)TheGuianaShield.HistoricalBiogeographyofNeotropical
FreshwaterFishes,pp.211–224.TheRegentsoftheUniversityofCalifornia.Mayle,F.E.&Power,M.J.(2008)ImpactofadrierEarly–Mid-HoloceneclimateuponAmazonian
forests.PhilosophicalTransactionsoftheRoyalSocietyB:BiologicalSciences,363,1829–1838.
Moraes,L.J.C.L.,Pavan,D.,Barros,M.C.&Ribas,C.C.(2016)Thecombinedinfluenceofriverinebarriers and flooding gradients on biogeographical patterns for amphibians andsquamatesinsouth-easternAmazonia.JournalofBiogeography,43,2113–2124.
Morrone, J.J. (2005) Biogeographic areas and transition zones of Latin America and theCaribbeanislandsbasedonpanbiogeographicandcladisticanalysesoftheentomofauna.AnnualReviewofEntomology,51,467–494.
Myers,N.,Mittermeier,R.A.,Mittermeier,C.G.,daFonseca,G.A.B.&Kent, J. (2000)Biodiversityhotspotsforconservationpriorities.Nature,403,853–858.
Naka, L.N. (2011) Avian distribution patterns in the Guiana Shield: implications for thedelimitationofAmazonianareasofendemism.JournalofBiogeography,38,681–696.
Naka, L.N., Bechtoldt, C.L., Henriques L. Magalli Pinto & Brumfield, R.T. (2012) The Role ofPhysicalBarriers in theLocationofAvianSutureZones in theGuianaShield,NorthernAmazonia.TheAmericanNaturalist,179,E115–E132.
Nelson,B.W.,Ferreira,C.A.C.,daSilva,M.F.&Kawasaki,M.L.(1990)Endemismcentres,refugiaandbotanicalcollectiondensityinBrazilianAmazonia.Nature,345,714–716.
deOliveira,D.P.,deCarvalho,V.T.&Hrbek,T.(2016)CrypticdiversityinthelizardgenusPlica(Squamata):phylogeneticdiversityandAmazonianbiogeography.ZoologicaScripta,45,630–641.
Olson,D.M.,Dinerstein,E.,Wikramanayake,E.D.,Burgess,N.D.,Powell,G.V.N.,Underwood,E.C.,D’amico,J.A.,Itoua,I.,Strand,H.E.,Morrison,J.C.,Loucks,C.J.,Allnutt,T.F.,Ricketts,T.H.,Kura, Y., Lamoreux, J.F.,Wettengel,W.W., Hedao, P. & Kassem, K.R. (2001) TerrestrialEcoregionsof theWorld:ANewMapofLifeonEarth:Anewglobalmapof terrestrialecoregionsprovidesaninnovativetoolforconservingbiodiversity.BioScience,51,933–938.
Ortega-Andrade,H.M.,Rojas-Soto,O.R.,Valencia,J.H.,EspinosadelosMonteros,A.,Morrone,J.J.,Ron,S.R.&Cannatella,D.C.(2015)InsightsfromIntegrativeSystematicsRevealCrypticDiversity inPristimantisFrogs (Anura:Craugastoridae) fromtheUpperAmazonBasin.PLoSONE,10,e0143392.
Pebesma, E. J. (2004) Multivariable geostatistics in S: the gstat package. Computers &Geosciences,30,683-691.
Pigot, A.L. & Tobias, J.A. (2015) Dispersal and the transition to sympatry in vertebrates.ProceedingsoftheRoyalSocietyB:BiologicalSciences,282,20141929.
Pimm,S.L., Jenkins,C.N.,Abell,R.,Brooks,T.M.,Gittleman, J.L., Joppa,L.N.,Raven,P.H.,Roberts,C.M. & Sexton, J.O. (2014) The biodiversity of species and their rates of extinction,
Appendix–BiogeographyofAmazonianAnurans
252
distribution,andprotection.Science,344.Puillandre, N., Lambert, A., Brouillet, S. & Achaz, G. (2012) ABGD, Automatic Barcode Gap
Discoveryforprimaryspeciesdelimitation.MolecularEcology,21,1864–1877.RDevelopmentCoreTeam(2016)R:ALanguageandEnvironmentforStatisticalComputing.R
FoundationforStatisticalComputingViennaAustria,0,{ISBN}3-900051-07-0.Ribas,C.C.,Aleixo,A.,Nogueira,A.C.R.,Miyaki,C.Y.&Cracraft, J. (2012)Apalaeobiogeographic
model for biotic diversification within Amazonia over the past three million years.ProceedingsoftheRoyalSocietyB:BiologicalSciences,279,681LP-689.
Señaris,J.C.&MacCulloch,R.D.(2005)Amphibians.ChecklistoftheTerrestrialVertebratesoftheGuianaShield(ed.byT.Hollowell)andR.P.Reynolds),pp.9–23.BulletinoftheBiologicalSocietyofWashingtonno.13.
Da Silva, J.M.C., Rylands, A.B.&Da Fonseca, G.A.B. (2005) The fate of theAmazonian areas ofendemism.ConservationBiology,19,689–694.
Sioli,H.(1984)TheAmazon:LimnologyandLandscapeEcologyofaMightyTropicalRiveranditsBasin,W.Junk,Dordrecht,TheNetherlands.
Valle,D.,Baiser,B.,Woodall,C.W.&Chazdon,R.(2014)DecomposingbiodiversitydatausingtheLatentDirichletAllocationmodel,aprobabilisticmultivariatestatisticalmethod.Ecologyletters,17,1591–1601.
Vedovato, L.B., Fonseca, M.G., Arai, E., Anderson, L.O. & Aragão, L.E.O.C. (2016) The extent of2014forestfragmentationintheBrazilianAmazon.RegionalEnvironmentalChange,1–6.
Vences, M., Thomas, M., van der Meijden, A., Chiari, Y. & Vieites, D.R. (2005) Comparativeperformanceofthe16SrRNAgeneinDNAbarcodingofamphibians.FrontiersinZoology,2,1–12.
Vilhena, D.A. & Antonelli, A. (2015) A network approach for identifying and delimitingbiogeographicalregions.NatureCommunications,6,6848.
Wallace, A.R. (1852) On the monkeys of the Amazon. Proceedings of the Zoological Society ofLondon,20,107–110.
Wells,K.D.(2010)TheEcologyandBehaviorofAmphibians,UniversityofChicagoPress,Chicago.Wynn, A. & Heyer,W.R. (2001) Do geographicallywidespread species of tropical amphibians
exist? An estimate of genetic relatedness within the neotropical frog Leptodactylusfuscus(Schneider,1799)(Anura,Leptodactylidae).TropicalZoology,14,255–285.
Yu, D.W., Ji, Y., Emerson, B.C., Wang, X., Ye, C., Yang, C. & Ding, Z. (2012) Biodiversity soup:metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring.MethodsinEcologyandEvolution,3,613–623.
Zeisset, I. & Beebee, T.J.C. (2008) Amphibian phylogeography: a model for understandinghistoricalaspectsofspeciesdistributions.Heredity,101,109–119.
Zizka,A.,Steege,H.Ter,Pessoa,M.D.C.R.&Antonelli,A.(2016)Findingneedlesinthehaystack:WheretolookforrarespeciesintheAmericantropics.Ecography.
Author: Guilhem Sommeria-Klein
Title: From models to data: understanding biodiversity patterns from environmental DNA data
Supervisors: Jérôme Chave, Hélène Morlon
Abstract: Integrative patterns of biodiversity, such as the distribution of taxa abundances and the spatial turnover of taxonomic composition, have been under scrutiny from ecologists for a long time, as they offer insight into the general rules governing the assembly of organisms into ecological communities. Thanks to recent progress in high-throughput DNA sequencing, these patterns can now be measured in a fast and standardized fashion through the sequencing of DNA sampled from the environment (e.g. soil or water), instead of relying on tedious fieldwork and rare naturalist expertise. They can also be measured for the whole tree of life, including the vast and previously unexplored diversity of microorganisms. Taking full advantage of this new type of data is challenging, however: DNA-based surveys are indirect, and suffer as such from many potential biases; they also produce large and complex datasets compared to classical censuses. The first goal of this thesis is to investigate how statistical tools and models classically used in ecology or coming from other fields can be adapted to DNA-based data so as to better understand the assembly of ecological communities. The second goal is to apply these approaches to soil DNA data from the Amazonian forest, the Earth’s most diverse land ecosystem.
Two broad types of mechanisms are classically invoked to explain the assembly of ecological communities: ‘neutral’ processes, i.e. the random birth, death and dispersal of organisms, and ‘niche’ processes, i.e. the interaction of the organisms with their environment and with each other according to their phenotype. Disentangling the relative importance of these two types of mechanisms in shaping taxonomic composition is a key ecological question, with many implications from estimating global diversity to conservation issues. In the first chapter, this question is addressed across the tree of life by applying the classical analytic tools of community ecology to soil DNA samples collected from various forest plots in French Guiana.
The second chapter focuses on the neutral aspect of community assembly. A mathematical model incorporating the key elements of neutral community assembly was proposed by S.P. Hubbell in 2001, making it possible to infer quantitative measures of dispersal and of regional diversity from the local distribution of taxa abundances. In this chapter, the biases introduced when reconstructing the taxa abundance distribution from environmental DNA data are discussed, and their impact on the estimation of the dispersal and regional diversity parameters is quantified.
The third chapter focuses on how non-random differences in taxonomic composition across a group of samples, resulting from various community assembly processes, can be efficiently detected, represented and interpreted. A method originally designed to model the different topics emerging from a set of text documents is applied here to soil DNA data sampled along a grid over a large forest plot in French Guiana. Spatial patterns of soil microorganism diversity are successfully captured, and related to fine variations in environmental conditions across the plot. Finally, the implications of the thesis findings are discussed. In particular, the potential of topic modelling for the modelling of DNA-based biodiversity data is stressed.
Keywords: spatial biodiversity patterns, species abundance distribution, beta-diversity, environmental DNA, metabarcoding, soil biodiversity, tropical forest, French Guiana, statistical modeling of biodiversity, neutral theory of biodiversity, topic modeling
Author: Guilhem Sommeria-Klein
Title: From models to data: understanding the structure of biodiversity from environmental DNA
Thesis supervisors: Jérôme Chave, Hélène Morlon
Place and date of defense: Université Paul Sabatier, Toulouse, 14 September 2017
Summary: The distribution of species abundances at a site and the similarity in taxonomic composition from one site to another are two measures of biodiversity that have long served ecologists as an empirical basis for attempting to establish the general rules governing the assembly of communities of organisms. For this kind of integrative measure, high-throughput sequencing of DNA sampled from the environment ("environmental DNA") is a recent and promising alternative to traditional naturalist observations. This approach has the advantage of being fast and standardized, and it gives access to a wide range of microbial taxa that were previously undetectable. However, these large datasets with a complex structure are difficult to analyze, and the indirect nature of the observations complicates their interpretation. The first objective of this thesis is to identify the statistical models that make it possible to exploit this new type of data to better understand community assembly. The second objective is to test the selected approaches on soil biodiversity data from the Amazonian forest, collected in French Guiana.
Two broad types of processes are invoked to explain the assembly of communities of organisms: "neutral" processes, which are independent of the species considered, namely the birth, death and dispersal of organisms, and processes linked to the ecological niche occupied by the organisms, that is, interactions with the environment and among organisms. Disentangling the relative importance of these two types of processes in community assembly is a fundamental question in ecology with many implications, notably for biodiversity estimation and conservation. The first chapter addresses this question by comparing environmental DNA samples collected from the soil of various forest plots in French Guiana, using the classical tools of statistical analysis in community ecology.
The second chapter focuses on the neutral processes of community assembly. In 2001, S.P. Hubbell proposed a model that describes these processes probabilistically and that can be used to quantify the dispersal ability of organisms, as well as their diversity at the regional scale, simply from the species abundance distribution observed at a site. In this chapter, the biases associated with using environmental DNA to reconstruct the species abundance distribution are discussed, and are quantified with respect to the estimation of the dispersal and regional diversity parameters.
The third chapter focuses on how non-random differences in taxonomic composition among sampled sites, resulting from the various community assembly processes, can be detected, represented and interpreted. A statistical model originally designed to classify documents according to the topics they address is applied here to soil samples collected on a regular grid within a large forest plot. The spatial structure of the taxonomic composition of microorganisms is successfully characterized and related to fine-scale variations in environmental conditions within the plot.
Finally, the implications of the thesis results are discussed. Particular emphasis is placed on the potential of topic models for modelling biodiversity data derived from environmental DNA.
Keywords: spatial structure of biodiversity, species abundance distribution, beta diversity, environmental DNA, metabarcoding, soil biodiversity, tropical forest, French Guiana, statistical modelling of biodiversity, neutral theory of biodiversity, topic modeling
Administrative discipline: Ecology
Laboratory name and address: Laboratoire Evolution & Diversité Biologique (EDB), UMR 5174 (CNRS/UPS/IRD), Université Paul Sabatier, Bâtiment 4R1, 118 route de Narbonne, 31062 Toulouse cedex 9, France.