
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

INVITED PAPER

Linking Statistical and Ecological Theory: Hubbell’s Unified Neutral Theory of Biodiversity as a Hierarchical Dirichlet Process

By Keith Harris, Todd L. Parsons, Umer Z. Ijaz, Leo Lahti, Ian Holmes, and Christopher Quince

ABSTRACT | Neutral models, which assume ecological equivalence between species, provide null models for community assembly. In Hubbell’s unified neutral theory of biodiversity (UNTB), many local communities are connected to a single metacommunity through differing immigration rates. Our ability to fit the full multisite UNTB has hitherto been limited by the lack of a computationally tractable and accurate algorithm. We show that a large class of neutral models with this mainland-island structure but differing local community dynamics converge in the large population limit to the hierarchical Dirichlet process. Using this approximation, we developed an efficient Bayesian fitting strategy for the multisite UNTB. We can also use this approach to distinguish between neutral local community assembly given a nonneutral metacommunity distribution and the full UNTB, where the metacommunity too assembles neutrally. We applied this fitting strategy to both tropical trees and a data set comprising 570 851 sequences from 278 human gut microbiomes. The tropical tree data set was consistent with the UNTB, but for the human gut, neutrality was rejected at the whole community level. However, when we applied the algorithm to gut microbial species within the same taxon at different levels of taxonomic resolution, we found that species abundances within some genera were almost consistent with local community assembly. This was not true at higher taxonomic ranks. This suggests that the gut microbiota is more strongly niche constrained than macroscopic organisms, with different groups adopting different functional roles, but within those groups diversity may at least partially be maintained by neutrality. We also observed a negative correlation between body mass index and immigration rates within the family Ruminococcaceae. This provides a novel interpretation of the impact of obesity on the human microbiome as a relative increase in the importance of local growth versus external immigration within this key group of carbohydrate-degrading organisms.

Manuscript received May 29, 2014; revised October 14, 2014; accepted March 13, 2015. The work of K. Harris was supported by a Unilever Research Grant. The work of T. L. Parsons was supported by the CNRS, and was also supported in part by the Fondation Sciences Mathématiques de Paris. The work of U. Z. Ijaz was supported by NERC IRF NE/L011956/1. The work of L. Lahti was supported by the Academy of Finland (Decision 256950). The work of I. Holmes was supported by NIH R01 Grant HG004483. The work of C. Quince was supported by an MRC Fellowship MR/M50161X/1 as part of the CLIMB consortium MR/L015080/1.

K. Harris is with the School of Mathematics and Statistics, University of Sheffield, Sheffield S10 2TN, U.K. (e-mail: [email protected]).

T. L. Parsons is with the Laboratoire de Probabilités et Modèles Aléatoires, Paris, France. He is also with the Collège de France, Center for Interdisciplinary Research in Biology, 75005 Paris, France (e-mail: [email protected]).

U. Z. Ijaz is with the Infrastructure and Environment Research Division, School of Engineering, University of Glasgow, Glasgow, G12 8LT, U.K. (e-mail: [email protected]).

L. Lahti is with the Department of Veterinary Biosciences, University of Helsinki, Helsinki 0170, Finland. He is also with the Laboratory of Microbiology, Wageningen University, Wageningen, Netherlands (e-mail: [email protected]).

I. Holmes is with the Department of Bioengineering, University of California, Berkeley, CA 94720 USA (e-mail: [email protected]).

C. Quince is with Warwick Medical School, University of Warwick, Coventry, CV4 7AL, U.K. (e-mail: [email protected]).

Digital Object Identifier: 10.1109/JPROC.2015.2428213

0018-9219 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

| Proceedings of the IEEE 1

KEYWORDS | Diversity; ecological modelling; hierarchical Dirichlet process; Hubbell’s unified neutral theory of biodiversity; microbial communities

I. INTRODUCTION

A key question in ecology is what maintains species diversity in communities. The classical view is that every species occupies a distinct niche and the species observed in a community are then determined by the niches present. The niche itself is viewed as an n-dimensional hypervolume in a space of abiotic and biotic environmental variables [1]. If two species occupy the same niche, then one will outcompete the other [2]. This viewpoint has been challenged by neutral theory. Neutral models of species abundance combine stochastic population dynamics with the assumption of ecological equivalence between species, formally defined as equivalent forms for all per capita demographic rates, e.g., birth and death. Ecological equivalence is assumed to operate between species with a similar functional role deriving from the same broad functional group or guild of species [3]. The result of the neutrality assumption is that, rather than one species always outcompeting another, the abundances within the neutral guild fluctuate. The diversity at a single site is then generated as a balance between the immigration of new species and local extinction [4]. In Hubbell’s Unified Neutral Theory of Biodiversity (UNTB) these ideas were extended to multiple sites [5] using a mainland-island structure [6]. The local communities experiencing neutral dynamics are coupled through migration to a metacommunity where neutral dynamics are again assumed, but diversity is generated through speciation on a longer time-scale.

The relative importance of niche versus neutral processes in macroscopic organisms is controversial. The first attempts to address this question fitted the UNTB to species abundance distributions (SADs) from a single site and compared model fit to nonneutral alternatives, e.g., log-normal or log-series [7]. The development of Etienne’s genealogical approach, which allowed the calculation of an exact sampling formula or likelihood for a single-site UNTB model [8], was key in allowing the UNTB to be fit efficiently to abundance data [8], [9]. Maximising this likelihood with respect to the model parameters generates a model fit. However, single samples do not provide enough information to reliably fit the UNTB [10], and it has been demonstrated that niche models can generate identical SADs to a single-site neutral model [11]. A more powerful test of the UNTB is to fit a data set from multiple sites simultaneously, assuming the same metacommunity but different immigration rates. The genealogical approach has been generalised to multiple sites with identical migration rates [12], but for the fully general case of multiple sites with different immigration rates the resulting sampling formula is computationally intractable for more than a few sites [13]. Instead, an approximate two-stage method has to be used [14]–[16].

If the importance of neutrality is still an open question for macroscopic organisms, then it is even more pertinent for microbes. Only the recent coupling of molecular methods for characterising species identity with next generation sequencing has allowed the efficient determination of microbial community structure in situ [17]. However, we are now regularly generating data sets comprising hundreds of sites and tens of thousands of sampled individuals per site [18]. In order to accurately fit the multisite UNTB to these data, we developed an alternative to the likelihood-based genealogical approach. We are able to show that the UNTB is, in the limit of large population sizes, equivalent to a model from machine learning, the hierarchical Dirichlet process (HDP) [19]. Moreover, our result is more general than the UNTB, as this limit applies irrespective of the exact local community dynamics, provided species are neutral and the total community size is fixed. We can use this result to adapt the existing Bayesian fitting strategy for the HDP to the problem of fitting the UNTB [15].

Using this strategy, it is possible to efficiently fit even the largest data sets in a reasonable amount of time, with the added advantage of generating full posterior distributions over the parameters rather than just a maximum likelihood estimate. This method also reconstructs the metacommunity distribution, enabling us to separate the key question of whether a community appears neutral into two parts. We can generate samples from the full neutral model with our fitted parameters and, as in [12], compare their likelihood with that of the observed samples to test for neutrality, but we can also generate samples given the observed metacommunity and, hence, test for neutral local community assembly alone.

We will validate this method by applying it to twenty-nine tropical tree plots from Panama [20]. We will then use it to determine the extent to which gut microbial communities are neutrally assembled [18]. The human gut is not a closed system, being constantly subjected to immigration events mainly through the diet; hence a metacommunity description is appropriate. However, for microbes it is not obvious at what level we would expect neutrality to operate, as different types of microorganisms perform very different roles. Indeed, there is evidence of clustering of gut microbiota into different enterotypes [21]–[23], which implies nonneutral structuring at the whole community level. We will address this issue by subdividing the species according to their taxa at multiple taxonomic levels. There is increasing evidence of ecological coherence at higher taxonomic levels for bacteria, with particular taxonomic groupings correlating with broad traits and metabolic functions [24]–[26]. Thus, even though within a species there may be variability in gene content and the precise niche occupied by strains, e.g., commensal and pathogenic Escherichia coli [27], at higher levels an ecological signal is preserved [24]. We will test whether this signal leads to species within taxa being distributed neutrally in the human gut.

This is the first time that the full multisite neutral model has been fit to microbial community data. Earlier studies fitted the proportion of sites that a given species was observed in as a function of its abundance in the metacommunity [28]. However, this approach models local neutral community assembly only, cannot allow for different immigration rates between sites, and does not utilise the actual abundances of species, only their presence or absence. Similarly, although [29] showed that the bacterial taxa-abundance distributions in tree-holes scaled across sites in a way that was consistent with the neutral model, they were not fitting to the actual species abundances directly, but rather to the shapes of those distributions in individual sites. Recently, an attempt was made to determine the degree of neutrality in human gut microbiota, but again by fitting the single-site distribution only [30]. By testing for neutrality at both the local and metacommunity level, and by resolving to different taxonomic groups, we will address the question of what is structuring the newly revealed microbial diversity of the human gut.

II. METHODS

A. Hubbell’s Unified Neutral Theory of Biodiversity (UNTB)

The UNTB separates the dynamics in the metacommunity from those in the local communities, but both are neutral. Assume that there are $M$ local communities indexed $i = 1, \ldots, M$, each with a fixed number of $N_i$ individuals. Each iteration of the local community dynamics for site $i$ comprises two steps: choose an individual at random and remove it; then, with probability $m_i$, migration occurs and this individual is replaced by a randomly chosen member of the metacommunity, or, with probability $1 - m_i$, it is replaced by a randomly chosen member of local community $i$. A generation in the model consists of replacing each individual once on average, which requires $N_i$ iterations of these two steps. These dynamics generate a Markov chain for the abundance of each species [31], which, given a sufficiently long time, will converge to a stationary, or time-invariant, distribution. In the UNTB it is assumed that the local communities are at this stationary state, which we will denote as a vector for each site, $\vec{\pi}_i$, with elements $(\pi_{i,1}, \ldots, \pi_{i,S})$ giving the probability of observing a particular species at site $i$. The two parameters $m_i$ and $N_i$ can be conveniently replaced by a single immigration rate $I_i = \frac{m_i}{1 - m_i}(N_i - 1)$ [9]. The parameter $I_i$ controls the coupling of the local community to the metacommunity. As $I_i \to \infty$, the local community stationary distribution will approach the metacommunity distribution and the number of species at that site will increase, while as $I_i \to 0$, the local community will become dominated by a single species.
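The local update rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the metacommunity is passed as a hypothetical `(labels, weights)` pair, the local parent is drawn from the $N_i - 1$ surviving individuals (the convention we assume underlies the $N_i - 1$ factor in $I_i$), and all function names are our own.

```python
import random

def immigration_number(m, n_individuals):
    """Fundamental immigration number I = (m / (1 - m)) * (N - 1)."""
    if not 0.0 <= m < 1.0:
        raise ValueError("migration probability m must lie in [0, 1)")
    return m / (1.0 - m) * (n_individuals - 1)

def local_iteration(community, m, metacommunity, rng=random):
    """One iteration of the local dynamics, modifying `community` in place.

    community: list of species labels, one per individual (length N >= 2).
    metacommunity: hypothetical (labels, relative_abundances) pair.
    """
    labels, weights = metacommunity
    dead = rng.randrange(len(community))  # choose an individual at random and remove it
    if rng.random() < m:
        # migration: replace it with a random member of the metacommunity
        community[dead] = rng.choices(labels, weights=weights)[0]
    else:
        # local birth: copy one of the N - 1 surviving individuals (assumed convention)
        parent = rng.randrange(len(community) - 1)
        if parent >= dead:
            parent += 1  # skip the removed individual
        community[dead] = community[parent]

def local_generation(community, m, metacommunity, rng=random):
    """One generation = N iterations, so each individual is replaced once on average."""
    for _ in range(len(community)):
        local_iteration(community, m, metacommunity, rng)
```

For example, `immigration_number(0.1, 50)` gives $0.1/0.9 \times 49 \approx 5.44$, and repeatedly calling `local_generation` on a site that starts with a single species shows immigrants from the metacommunity gradually entering it.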

In the metacommunity, equivalent neutral dynamics operate, but new species are generated through speciation with a probability $\nu$. This occurs on a longer time-scale than the local community dynamics, so the metacommunity can be assumed fixed relative to the local communities. Just as $I_i$ is preferred to $m_i$ in the local communities, it is more convenient to parameterise the metacommunity distribution by the speciation rate (or fundamental biodiversity number) $\theta = \frac{\nu}{1 - \nu}(N - 1)$ [9], where $N$ is the fixed number of individuals in the metacommunity. The parameter $\theta$ can be viewed as the rate at which new individuals appear in the metacommunity as a result of speciation. As it increases, the total number of species in the metacommunity, which we will denote $S$, also increases, and the species abundance distribution becomes increasingly skewed towards rare species. The final component of the UNTB is to recognise that the observed data, the $M \times S$ frequency matrix $X$ with elements $x_{ij}$ giving the number of times species $j$ is observed at site $i$, are a sample from the local community [9]. The simplest approach is to assume sampling with replacement, so that the multinomial distribution describes the vector of observations at a given site

$$\vec{X}_i \sim \mathrm{MN}(J_i, \vec{\pi}_i) \tag{1}$$

where $J_i = \sum_{j=1}^{S} x_{ij}$ is the sample size.
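The metacommunity reparameterisation and the sampling model in (1) are both simple to compute. The sketch below, with function names of our own choosing, converts a speciation probability $\nu$ into $\theta$ and draws one site's count vector $\vec{X}_i \sim \mathrm{MN}(J_i, \vec{\pi}_i)$ by sampling $J_i$ individuals with replacement; the local distribution `pi` is assumed to be supplied as a list of probabilities over the $S$ species.

```python
import random
from collections import Counter

def biodiversity_number(nu, n_meta):
    """Fundamental biodiversity number theta = (nu / (1 - nu)) * (N - 1)."""
    return nu / (1.0 - nu) * (n_meta - 1)

def sample_site_counts(sample_size, pi, rng=random):
    """Draw J_i individuals with replacement from pi, i.e. X_i ~ MN(J_i, pi).

    Returns the count of each species j = 0..S-1 at the site.
    """
    draws = rng.choices(range(len(pi)), weights=pi, k=sample_size)
    counts = Counter(draws)
    return [counts.get(j, 0) for j in range(len(pi))]
```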

B. HDP Limit to Neutral Metacommunities

In the SI Appendix we show that a wide class of neutral models, including the UNTB, converge in the large population limit to the same hierarchical Dirichlet process (HDP) approximation. This approximation captures the essential hypotheses of the UNTB (namely neutrality, finite populations, and multiple panmictic, geographically isolated populations linked by rare migration) whilst being robust to the specific details of the local community dynamics. Analogous to the relationship between Kingman’s coalescent, Kimura’s diffusion, and the Wright–Fisher model and its many generalisations (e.g., Cannings’ models), we find that under suitable conditions on the higher moments of the individual reproductive output (namely, that in the corresponding genealogical process, the coalescent, mergers of three or more ancestral lines happen with vanishingly small probability as the population size tends to infinity), it is sufficient to introduce local effective population sizes for each deme to accurately approximate many disparate models.


For example, just as Hubbell’s UNTB has population dynamics analogous to the Moran model of population genetics, we could equally well consider a “Wright–Fisher” neutral model, in which all individuals perish at the end of each time step, but each leaves behind a Poisson-distributed number of offspring (conditioned on the total population size). While qualitatively different, this model retains the notion of neutrality: each individual is equally likely to be the parent of a randomly chosen individual in the next generation. With an appropriate choice of time rescaling (see Example 2 in the SI), this model also gives rise to the HDP in the large population limit, much as both the Moran and Wright–Fisher models give rise to the same diffusive limits for appropriate choices of effective population size. By contrast, if we consider the highly skewed reproduction model in which the offspring of one randomly chosen individual replace all other individuals, we do not obtain the HDP, even though we preserve the neutral hypothesis. As we discuss in the SI (Section 1.2), we require that the offspring distribution is not so fat-tailed that one individual is reasonably likely to be parent to a significant portion of the next generation. In this latter case, there is still a well-defined limit, but it is poorly understood; in particular, there is no known analogue to the Antoniak equation (6) upon which our approach rests.
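The two alternative neutral update rules discussed above can be contrasted in a few lines. This is an illustrative sketch only, with migration and speciation omitted and names of our own: conditioning independent, identically distributed Poisson offspring numbers on a fixed total $N$ gives a uniform multinomial, so a Wright–Fisher generation is equivalent to every offspring choosing its parent uniformly at random, whereas the highly skewed model lets one parent replace the whole population.

```python
import random

def wright_fisher_generation(population, rng=random):
    """Non-overlapping generation with Poisson offspring numbers conditioned on
    the total size N: each of the N offspring picks a parent uniformly at random."""
    return [rng.choice(population) for _ in range(len(population))]

def sweepstakes_generation(population, rng=random):
    """Highly skewed but still neutral rule: the offspring of one randomly chosen
    individual replace everyone (the case that does not lead to the HDP limit)."""
    parent = rng.choice(population)
    return [parent] * len(population)
```

Both rules give every individual the same chance of parenting a randomly chosen offspring; the difference lies entirely in how fat-tailed the offspring distribution is.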

It has been shown previously that for large local population sizes, and assuming a fixed finite-dimensional metacommunity distribution with $S$ species present, the local community distribution $\vec{\pi}_i$ can be approximated by a Dirichlet distribution [28], [32]. The parameters of this Dirichlet distribution are proportional to the immigration rate multiplied by the metacommunity distribution

$$\vec{\pi}_i \mid I_i, \vec{\beta} \sim \mathrm{Dir}(I_i \vec{\beta}) \tag{2}$$

where $\vec{\beta} = (\beta_1, \ldots, \beta_S)$ is the relative frequency of each species in the metacommunity. In the SI Appendix (see Section 1.4: Corollary 1), we generalize this to the case where, as for the UNTB, there is a potentially infinite number of species that can be observed in the local community. Then the stationary distribution is a Dirichlet process (DP) [33]

$$\vec{\pi}_i \mid I_i, \vec{\beta} \sim \mathrm{DP}(I_i, \vec{\beta}). \tag{3}$$

The DP can be viewed as an infinite-dimensional generalization of the Dirichlet distribution. It generates an infinite set of samples from the base distribution, which in this case is the metacommunity $\vec{\beta}$, while the concentration parameter, which is $I_i$ here, controls the distribution of weights of those samples. Indeed, these weights are generated by a stick-breaking process (see below) with parameter $I_i$.

In the metacommunity, a Dirichlet process also applies (SI Appendix: Section 1.5), but now the base distribution is simply a uniform distribution over arbitrary species labels, and the concentration parameter is the biodiversity parameter, $\theta$. This is not a new observation, as it is implicit in the use of Ewens’s sampling formula [34] for the metacommunity in Etienne’s approach [9]. In this case the metacommunity distribution is purely the stick-breaking process. Define an infinite set of random variables $\{\beta'_k\}_{k=1}^{\infty}$ drawn from a beta distribution

$$\beta'_k \sim \mathrm{Beta}(1, \theta). \tag{4}$$

Then we can define the $k$th element of the metacommunity vector as

$$\beta_k = \beta'_k \prod_{l=1}^{k-1} \left(1 - \beta'_l\right). \tag{5}$$

We will denote this process $\vec{\beta} \sim \mathrm{Stick}(\theta)$. Since the local communities are also DPs, the model becomes a hierarchical Dirichlet process (HDP) in the parlance of machine learning [19]. The stick-breaking process is one way to view the DP, but an alternative perspective can be obtained by considering successive draws from a DP, which yields the Chinese restaurant process: each new draw derives from an existing type (which in our case would be a species) with probability proportional to the number of individuals already assigned to that type, and from a previously unseen type (or species) with probability proportional to $\theta$. From this process the Antoniak equation for the number of types or species $S$ observed following $N$ draws from a DP with concentration parameter $\theta$ can be derived
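The Chinese restaurant process view can be simulated directly. In this sketch (names ours), each of $N$ successive draws either joins an existing species with probability proportional to its current count or founds a new species with probability proportional to $\theta$, so the $n$th draw, counting from zero, is new with probability $\theta/(\theta + n)$.

```python
import random

def chinese_restaurant(n_draws, theta, rng=random):
    """Successive draws from a DP(theta) with a continuous base measure.

    Returns the number of individuals of each type (species) in order of appearance.
    """
    counts = []
    for n in range(n_draws):
        # previously unseen type with probability theta / (theta + n);
        # the first draw (n = 0) is always new
        if rng.random() < theta / (theta + n):
            counts.append(1)
        else:
            # existing type chosen with probability proportional to its count
            j = rng.choices(range(len(counts)), weights=counts)[0]
            counts[j] += 1
    return counts
```

The number of species returned, `len(counts)`, is a draw from the Antoniak distribution of $S$ given $\theta$ and $N$.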

$$P(S \mid \theta, N) = s(N, S)\, \theta^{S} \frac{\Gamma(\theta)}{\Gamma(\theta + N)} \tag{6}$$

where $s(N, S)$ is the unsigned Stirling number of the first kind [35] and $\Gamma(x)$ denotes the gamma function.
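Equation (6) can be evaluated exactly for moderate $N$ by building the unsigned Stirling numbers with the recurrence $s(m+1, k) = m\,s(m, k) + s(m, k-1)$ and working on the log scale. The sketch below (names ours) relies on Python's arbitrary-precision integers for $s(N, S)$; for very large $N$ a log-space recurrence would likely be preferable.

```python
import math

def unsigned_stirling_first(n):
    """Row n of the unsigned Stirling numbers of the first kind, [s(n, 0), ..., s(n, n)],
    built with s(m + 1, k) = m * s(m, k) + s(m, k - 1) using exact integers."""
    row = [1]  # s(0, 0) = 1
    for m in range(n):
        new = [0] * (m + 2)
        for k in range(m + 2):
            joins = m * row[k] if k <= m else 0   # element m+1 joins an existing cycle (m ways)
            founds = row[k - 1] if k >= 1 else 0  # element m+1 forms a new cycle
            new[k] = joins + founds
        row = new
    return row

def antoniak_log_prob(S, theta, N, stirling_row=None):
    """log P(S | theta, N) from Eq. (6) for N >= 1 draws from a DP(theta)."""
    row = unsigned_stirling_first(N) if stirling_row is None else stirling_row
    if not 1 <= S <= N:
        return -math.inf
    return (math.log(row[S]) + S * math.log(theta)
            + math.lgamma(theta) - math.lgamma(theta + N))
```

As a check, the probabilities sum to one over $S = 1, \ldots, N$, because $\sum_S s(N, S)\,\theta^S = \Gamma(\theta + N)/\Gamma(\theta)$.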

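C. Gibbs Sampler for the Neutral-HDP Model

Combining the model elements described above, we obtain the complete Neutral-HDP model as

$$\begin{aligned}
\vec{\beta} \mid \theta &\sim \mathrm{Stick}(\theta),\\
\vec{\pi}_i \mid I_i, \vec{\beta} &\sim \mathrm{DP}(I_i, \vec{\beta}),\\
\vec{X}_i \mid \vec{\pi}_i, J_i &\sim \mathrm{MN}(J_i, \vec{\pi}_i).
\end{aligned}$$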
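Data sets such as the artificial matrices used later for significance testing can be drawn from this model without truncation by integrating out $\vec{\pi}_i$ and $\vec{\beta}$. The sketch below uses the Chinese restaurant franchise representation of the HDP [19]; this is our choice of sampler, not necessarily the authors'. Within site $i$, each new individual either founds a new lineage (an independent immigration event) with probability $I_i/(I_i + n)$ or copies a species already sampled at that site, and each new lineage draws its species from a metacommunity-level Chinese restaurant process with concentration $\theta$. As a by-product it returns the numbers of ancestors $T_{ij}$ that the Gibbs sampler introduces as auxiliary variables. Names are ours.

```python
import random

def sample_neutral_hdp(sample_sizes, immigration_rates, theta, rng=random):
    """Exact draw of an M x S count matrix X and ancestor counts T from the
    Neutral-HDP with pi_i and beta integrated out (Chinese restaurant franchise)."""
    meta_ancestors = []  # T_.j: lineages of species j summed over all sites
    site_counts, site_ancestors = [], []
    for J, I in zip(sample_sizes, immigration_rates):
        x, t = {}, {}
        for n in range(J):
            if rng.random() < I / (I + n):
                # new local lineage (immigration event); the first draw always founds one
                if rng.random() < theta / (theta + sum(meta_ancestors)):
                    meta_ancestors.append(0)  # species never seen before
                    j = len(meta_ancestors) - 1
                else:
                    j = rng.choices(range(len(meta_ancestors)), weights=meta_ancestors)[0]
                meta_ancestors[j] += 1
                t[j] = t.get(j, 0) + 1
            else:
                # local copy: species chosen in proportion to its count at this site
                species = list(x)
                j = rng.choices(species, weights=[x[s] for s in species])[0]
            x[j] = x.get(j, 0) + 1
        site_counts.append(x)
        site_ancestors.append(t)
    S = len(meta_ancestors)
    X = [[x.get(j, 0) for j in range(S)] for x in site_counts]
    T = [[t.get(j, 0) for j in range(S)] for t in site_ancestors]
    return X, T
```

Each species column contains at least one positive count by construction, and $T_{ij} \le x_{ij}$ with $T_{ij} > 0$ exactly when $x_{ij} > 0$.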


To this we add gamma hyper-priors for the biodiversity parameter, $\theta$, and the immigration rates, $I_i$

$$\theta \mid \alpha, \zeta \sim \mathrm{Gamma}(\alpha, \zeta) \tag{7}$$

$$I_i \mid \eta, \kappa \sim \mathrm{Gamma}(\eta, \kappa) \tag{8}$$

where $\alpha$, $\zeta$, $\eta$, and $\kappa$ are all constants.

In any given sample, although the potential number of species is infinite, we only observe $S$ different types. It is therefore convenient to represent the model in terms of these finitely many types plus one further class corresponding to all unobserved species. We will represent the proportions of the $S$ observed species explicitly as $\beta_k$ with $k = 1, \ldots, S$, and the unrepresented component as $\beta_u = \sum_{k=S+1}^{L} \beta_k$ in the limit as $L \to \infty$. In this finite-dimensional representation we can determine the species distributions in the local communities

$$\vec{\pi}_i \sim \mathrm{Dir}(I_i \beta_1, \ldots, I_i \beta_S, I_i \beta_u). \tag{9}$$
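Equation (9) can be sampled with the standard construction of a Dirichlet draw from normalised independent gamma variables. In the sketch below (names ours), the metacommunity is passed as the $S$ observed proportions plus the unobserved mass $\beta_u$, and the last entry of the result is the probability mass that site $i$ assigns to all species not observed in the data. All Dirichlet parameters are assumed strictly positive; very small parameters can underflow to zero in floating point.

```python
import random

def dirichlet(alphas, rng=random):
    """Dirichlet draw: normalise independent Gamma(alpha_k, 1) variables."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [v / total for v in g]

def sample_local_distribution(immigration, beta_observed, beta_u, rng=random):
    """pi_i ~ Dir(I beta_1, ..., I beta_S, I beta_u); the last entry covers unobserved species."""
    alphas = [immigration * b for b in beta_observed] + [immigration * beta_u]
    if min(alphas) <= 0:
        raise ValueError("all Dirichlet parameters must be positive")
    return dirichlet(alphas, rng)
```

Because the parameters sum to $I_i$, large immigration rates concentrate $\vec{\pi}_i$ around $\vec{\beta}$, while small ones let a few species dominate a site.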

We can then marginalize the local community distributions and derive the probability of the observed frequencies given the metacommunity distribution $\vec{\beta}$ and the immigration rates $I_i$, $i = 1, \ldots, M$

$$P(X \mid \vec{\beta}, I_1, \ldots, I_M) = \prod_{i=1}^{M} \frac{J_i!}{x_{i1}! \cdots x_{iS}!} \frac{\Gamma(I_i)}{\Gamma(J_i + I_i)} \prod_{j=1}^{S} \frac{\Gamma(x_{ij} + I_i \beta_j)}{\Gamma(I_i \beta_j)}. \tag{10}$$

The observation that the UNTB is actually a hierarchical Dirichlet process allows us to utilise an efficient Gibbs sampling method to fit it. A Gibbs sampler is a type of Bayesian Markov chain Monte Carlo (MCMC) algorithm. An MCMC algorithm generates samples from the posterior distribution of the parameters given the data [36], which in this case is $P(\theta, I_1, \ldots, I_M \mid X)$. In general, the posterior is too complex to sample from directly, and in Gibbs sampling, samples are instead generated from the conditional distribution of one parameter given all the others. These full conditionals are often much simpler than the joint posterior distribution, and, crucially, if repeated samples are taken in this way, their distribution converges to the posterior after sufficient iterations. By introducing extra auxiliary variables, it is possible to devise an efficient Gibbs sampler for the UNTB-HDP approximation. One of these auxiliary variables is the metacommunity distribution itself, $\vec{\beta}$, and the other is the number of ancestors in site $i$ that gave rise to species $j$, denoted $T_{ij}$, i.e., the number of independent immigration events from the metacommunity. Using these variables, a Gibbs sampling iteration proceeds as follows:

1) Sample the biodiversity parameter $\theta$ from the conditional

$$P(\theta \mid S, T) \propto s(T, S)\, \theta^{S} \frac{\Gamma(\theta)}{\Gamma(\theta + T)} \mathrm{Gamma}(\theta \mid \alpha, \zeta) \tag{11}$$

where $T = \sum_{i=1}^{M} \sum_{j=1}^{S} T_{ij}$. The first part of the above expression derives from the Antoniak equation (6) for the number of unique species observed, $S$, when we sample $T$ ancestors from the metacommunity Dirichlet process with concentration parameter $\theta$; the second part is simply the prior on $\theta$ [35]. To sample from this we use the auxiliary variable approach of [37].

2) Sample the metacommunity distribution

$$\vec{\beta} = (\beta_1, \beta_2, \ldots, \beta_S, \beta_u) \sim \mathrm{Dir}(T_{\cdot 1}, T_{\cdot 2}, \ldots, T_{\cdot S}, \theta) \tag{12}$$

where $T_{\cdot j} = \sum_{i=1}^{M} T_{ij}$. This exploits the conjugacy between the stick-breaking prior for the metacommunity, $\vec{\beta}$, and the likelihood of the ancestor numbers $T_{ij}$ [19].

3) Sample the immigration rates

$$P(I_i \mid T_{i\cdot}) \propto \frac{\Gamma(I_i)}{\Gamma(J_i + I_i)} I_i^{T_{i\cdot}} \mathrm{Gamma}(I_i \mid \eta, \kappa). \tag{13}$$

This is again just Antoniak’s equation multiplied by the prior, but here the number of unique types observed is the number of ancestors from the metacommunity, $T_{i\cdot} = \sum_{j=1}^{S} T_{ij}$, in $J_i$ samples from the local community DP with concentration parameter $I_i$.

4) Sample the ancestral states

$$P(T_{ij} \mid x_{ij}, I_i, \beta_j) = \frac{\Gamma(I_i \beta_j)}{\Gamma(x_{ij} + I_i \beta_j)}\, s(x_{ij}, T_{ij})\, (I_i \beta_j)^{T_{ij}} \tag{14}$$

where again we recognise the Antoniak equation. This summarises the Gibbs sampling; in SI Appendix 2 we rigorously derive the above conditional distributions.
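The four steps can be combined into one sweep. The sketch below is ours, not the authors' code, and fixes several details the text leaves open: the gamma priors in (7) and (8) are taken as shape and rate, with hyperparameters passed as a hypothetical tuple `hyper = (alpha, zeta, eta, kappa)`; steps 1 and 3 use the Escobar–West auxiliary-variable update for a Dirichlet process concentration parameter, which we assume is the kind of approach cited as [37]; and step 4 draws $T_{ij}$ exactly from (14) by counting the new tables opened when $x_{ij}$ customers enter a Chinese restaurant with concentration $I_i\beta_j$. Every species is assumed to be observed at least once and every site to have $J_i \geq 1$.

```python
import math
import random

def sample_concentration(c, k, n, shape, rate, rng):
    """Escobar-West update for a DP concentration c with a Gamma(shape, rate) prior,
    targeting p(c) ~ c^k Gamma(c) / Gamma(c + n) * Gamma(c | shape, rate)."""
    w = rng.betavariate(c + 1.0, n)  # auxiliary variable
    eff_rate = rate - math.log(w)
    odds = (shape + k - 1.0) / (n * eff_rate)  # odds of the shape + k component
    new_shape = shape + k if rng.random() < odds / (1.0 + odds) else shape + k - 1.0
    return rng.gammavariate(new_shape, 1.0 / eff_rate)  # gammavariate takes a scale

def sample_antoniak(x, a, rng):
    """Draw T from Eq. (14): tables opened by x customers in a CRP with concentration a."""
    if x == 0:
        return 0
    # the first customer always opens a table
    return 1 + sum(1 for n in range(1, x) if rng.random() < a / (a + n))

def gibbs_sweep(X, state, hyper, rng=random):
    """One sweep of steps 1-4, updating `state` in place.

    X: M x S counts; state: dict with 'theta', 'beta' (S observed + beta_u),
    'I' (length M) and 'T' (M x S); hyper: (alpha, zeta, eta, kappa).
    """
    alpha, zeta, eta, kappa = hyper
    M, S = len(X), len(X[0])
    T = state["T"]
    # 1) theta | S, T  (Eq. 11)
    t_total = sum(sum(row) for row in T)
    state["theta"] = sample_concentration(state["theta"], S, t_total, alpha, zeta, rng)
    # 2) beta | T, theta ~ Dir(T_.1, ..., T_.S, theta)  (Eq. 12)
    g = [rng.gammavariate(sum(T[i][j] for i in range(M)), 1.0) for j in range(S)]
    g.append(rng.gammavariate(state["theta"], 1.0))
    total = sum(g)
    state["beta"] = [v / total for v in g]
    # 3) I_i | T_i.  (Eq. 13): T_i. ancestors among J_i individuals
    for i in range(M):
        state["I"][i] = sample_concentration(state["I"][i], sum(T[i]), sum(X[i]), eta, kappa, rng)
    # 4) T_ij | x_ij, I_i, beta_j  (Eq. 14)
    for i in range(M):
        for j in range(S):
            T[i][j] = sample_antoniak(X[i][j], state["I"][i] * state["beta"][j], rng)
    return state
```

A chain can be started, for example, from $T_{ij} = 1$ wherever $x_{ij} > 0$, after which `gibbs_sweep` is called repeatedly while the values of `state["theta"]` and `state["I"]` are recorded.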

In general we found that this MCMC procedure quickly converges, but to ensure that we were sampling from the stationary distribution, we generated 50 000 Gibbs samples for each fitted data set and discarded the first 25 000 iterations as burn-in, or, for the human gut microbiota when testing multiple taxa, we generated 10 000 Gibbs samples and discarded the first 5000 iterations as burn-in. The results below are quoted as the median values over these last 25 000 or 5000 samples, with lower and upper credible (Bayesian confidence) limits given by the 2.5% and 97.5% quantiles of these samples.
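These summaries can be computed from a stored chain with the standard library. In this sketch (names ours), the chain is a list of sampled values for one parameter; the first `burn_in` values are discarded and the median and the 2.5% and 97.5% quantiles of the remainder are returned. The "inclusive" interpolation rule is our choice; other quantile conventions give slightly different limits on short chains.

```python
import statistics

def summarize_chain(chain, burn_in):
    """Median and 95% credible limits (2.5% and 97.5% quantiles) after burn-in."""
    kept = chain[burn_in:]
    cuts = statistics.quantiles(kept, n=40, method="inclusive")  # cut points in 2.5% steps
    return statistics.median(kept), cuts[0], cuts[-1]
```

For a 50 000-sample chain this would be called as `summarize_chain(chain, 25000)`, and for the multiple-taxa gut runs as `summarize_chain(chain, 5000)`.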

An MCMC approach was used in an early method to fit the single-site model [8], but it required the more complicated Metropolis–Hastings algorithm rather than Gibbs sampling, which is central to the efficiency of our method. In SI Appendix Section 2 we present detailed results demonstrating that, on samples generated from the UNTB with known parameters, our method outperforms the two-stage approximate method of [16], providing accurate and reliable estimates of both $\theta$ and $I_i$ except when $I_i \gg \theta$. In this case there is a consistent bias towards underestimating $I_i$, which, as we explain in SI Appendix Section 2, is preferable to the large variation in the parameter estimates exhibited by the two-stage approximation. The HDP method also has two further advantages: it generates a full posterior distribution of the model parameters, which provides a realistic estimate of the uncertainty around their point estimates, and it also recovers the metacommunity distribution.

To determine whether an observed data set appears neutral we used a Monte Carlo significance test similar to that in [12]. Given the $k$th posterior sample of fitted UNTB parameters, $\theta^k, I_1^k, \ldots, I_M^k$, an artificial data matrix with the same number of samples $M$ and the same sample sizes $J_i$ as the original data matrix is generated by sampling from the full neutral-HDP; we denote it by $X_0^k$. Given this sample we can also generate a neutral metacommunity distribution, $\tilde{\pi}_0^k$, using (12), since the ancestral frequencies $T_{\cdot j} = \sum_{i=1}^{M} T_{ij}$ are known. This will be a true neutral metacommunity, since the distribution will correspond to stick-breaking with parameter $\theta$. Note that the number of species observed can differ from $S$. We then calculate the likelihood $P(X_0^k \mid \tilde{\pi}_0^k, I_1^k, \ldots, I_M^k)$ using (10). These likelihoods were then compared to the actual likelihood of the observed sample, $P(X \mid \tilde{\pi}^k, I_1^k, \ldots, I_M^k)$, and the proportion of posterior samples for which the observed likelihood exceeded the simulated one was calculated to give a pseudo p-value, denoted $p_N$, that the data are consistent with the neutral model; small values indicate that the observed data are consistently less likely than data generated under the fitted neutral model. In addition, we generated data sets, $X_1^k$, with the metacommunity fixed at the model fitted values, $\tilde{\pi}^k$. Due to the hierarchical nature of the model, the metacommunity DP only gives a prior on the metacommunity distribution, so the fitted metacommunity can deviate from the neutral expectation. This enables us to test for local neutral community assembly with a fitted, potentially nonneutral, metacommunity. We do this in the same way, calculating the likelihood $P(X_1^k \mid \tilde{\pi}^k, I_1^k, \ldots, I_M^k)$ for each of the samples and comparing it to $P(X \mid \tilde{\pi}^k, I_1^k, \ldots, I_M^k)$; the proportion of posterior samples for which the observed likelihood is the greater forms our pseudo p-value for local neutral community assembly, which we denote by $p_L$. For both tests, samples were generated either from 2500 sets of fitted parameters taken from every tenth iteration of the last 25 000 Gibbs samples or, for the human gut microbiota when testing multiple taxa, from 500 sets of fitted parameters taken from every tenth iteration of the last 5000 Gibbs samples.
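Once the paired likelihoods are available, both pseudo p-values reduce to a simple proportion. The sketch below assumes that, for each retained posterior sample $k$, the observed and simulated log-likelihoods have already been computed (the likelihood (10) itself is not reimplemented); log-likelihoods can be compared in place of likelihoods because the logarithm is monotone. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def pseudo_p_value(loglik_observed, loglik_simulated):
    """Proportion of posterior samples k whose observed log-likelihood
    strictly exceeds that of the matching simulated data set.

    Small values mean the observed data are consistently less likely than
    data simulated from the fitted neutral model (evidence against it).
    """
    obs = np.asarray(loglik_observed, dtype=float)
    sim = np.asarray(loglik_simulated, dtype=float)
    if obs.shape != sim.shape:
        raise ValueError("need one simulated likelihood per posterior sample")
    return float(np.mean(obs > sim))

# Thinning: every tenth of the last 25 000 of 50 000 iterations -> 2500 sets.
n_iter, burn_in, thin = 50_000, 25_000, 10
kept_iterations = np.arange(burn_in, n_iter, thin)
print(kept_iterations.size)  # 2500

# Toy log-likelihoods in which the observed data are always less likely.
rng = np.random.default_rng(2)
loglik_sim = rng.normal(-1000.0, 10.0, size=kept_iterations.size)
loglik_obs = loglik_sim - 50.0
print(pseudo_p_value(loglik_obs, loglik_sim))  # 0.0
```

The same function gives $p_N$ when the simulated values come from $X_0^k$ and $p_L$ when they come from $X_1^k$.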

There are many ways in which a distribution could appear nonneutral. A clear example is provided by the situation where communities fall into a finite number of distinct types, such that community configurations cluster together. It has been suggested that the human gut microbiome can be clustered into three distinct enterotypes [21]–[23]. This will appear nonneutral since a single metacommunity distribution will be unable to describe all the community configurations observed. In addition, communities can appear nonneutral at the level of the observed taxa abundances: if the abundances within individual samples are more or less skewed towards rare species than expected for a Dirichlet process, then this will appear nonneutral at the local community level. If this occurs for the metacommunity, then neutrality will be rejected there too.

D. Identifying Neutral Subsets of Species

For the microbial community data, we will separate species by their taxa and fit the model to taxa separately in an attempt to identify neutral subsets. The validity of this approach rests on two observations. The first is that if there are multiple neutral guilds of species in a community, where the abundance of a guild varies from site to site in a nonneutral fashion, then the community as a whole will appear nonneutral, but if we sample species from just one guild then the neutral patterns will be recovered [38]. This is self-evident. The second observation is that if only a subset of the species in a neutral guild is sampled, then that subset will still fluctuate neutrally but with renormalized probabilities. This derives from the following property of the Dirichlet distribution: if only a subset $U$ of the $S$ dimensions is observed, then the renormalized proportions within that subset are still Dirichlet distributed on the reduced space, with the corresponding subset of the original parameters. For the neutral model the result is that the biodiversity parameter is unchanged but the immigration rate at each site is reduced, $I_i^U = I_i\left(1 - \sum_{j \notin U} \pi_j\right)$, according to the weight of the missing species in the metacommunity. Consequently, if at some level of taxonomic resolution all species are from the same neutral guild, even if they do not represent all of that guild, they will still be identified as neutral.
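The renormalization argument can be checked numerically. The sketch below assumes a finite-dimensional view in which the local composition at site $i$ is $\mathrm{Dir}(I_i \pi)$ for a metacommunity vector $\pi$; it computes $I_i^U$ and the renormalized metacommunity $\pi^U$ for a subset $U$, confirms that the Dirichlet parameters on the subset, $I_i \pi_j$ for $j \in U$, are unchanged, and compares Monte Carlo means with $\pi^U$. The function name and numbers are hypothetical.

```python
import numpy as np

def restrict_to_subset(pi, immigration_rate, subset):
    """Return (I_U, pi_U) for the species indices in `subset`.

    I_U = I * (1 - sum of pi over species outside U) = I * sum(pi[U]);
    pi_U is the metacommunity renormalised over the subset.
    """
    pi = np.asarray(pi, dtype=float)
    subset = np.asarray(subset)
    weight_in_subset = pi[subset].sum()
    return immigration_rate * weight_in_subset, pi[subset] / weight_in_subset

pi = np.array([0.5, 0.2, 0.2, 0.1])
I = 40.0
i_u, pi_u = restrict_to_subset(pi, I, subset=[1, 2, 3])
print(i_u)  # 20.0
print(np.allclose(i_u * pi_u, I * pi[[1, 2, 3]]))  # True

# Monte Carlo check: the renormalised subset of Dir(I * pi) has mean pi_U.
rng = np.random.default_rng(3)
draws = rng.dirichlet(I * pi, size=20_000)[:, [1, 2, 3]]
draws /= draws.sum(axis=1, keepdims=True)
print(draws.mean(axis=0).round(2), pi_u)
```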

The key ideas used in the above derivations are summarized in Table 1.

E. Data

1) Neutral Simulation: In SI Appendix Section 2 we show that the UNTB-HDP fitting method accurately determines the parameters of data sets generated from the UNTB. To provide a further test of the model fitting on a sample that relaxes the mainland-island structure of the UNTB but maintains the assumption of neutrality, we performed a neutral model simulation. This comprised 50 sites indexed $i = 1, \ldots, 50$, with a fixed population of $N_i = 20\,000$ individuals per site. Discrete dynamics were used, with a probability of 5% that an individual was removed at each iteration. Deleted individuals were then replaced either by an entirely new species, with speciation probability $\nu = 10^{-5}$, by an individual chosen at random from the local community in the previous iteration, with probability $(1-\nu)(1-m_i)$, or by an individual chosen at random from all the other sites, with probability $(1-\nu)m_i$. The migration probability was varied across sites according to the rule $m_i = i \times 10^{-4}$, so that the immigration rate, $I_i = m_i N_i = 2i$, varied from 2 to 100. The model was run for 2000 generations, i.e., 40 000 iterations, at which point the species number appeared stationary, and then 1000 individuals were sampled with replacement from each site. The UNTB-HDP model was fit to this data set by Gibbs sampling, as was the two-stage approximate method of [16]. Although this simulation has strictly neutral dynamics, it does not correspond exactly to Hubbell's UNTB: rather than an explicit mainland-island structure with diversity generated only in the metapopulation, it has speciation occurring in the local populations themselves, with a metapopulation that is an implicit aggregate of the local populations rather than an explicit distribution.
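A scaled-down version of this simulation protocol can be sketched as follows. It is not the code used for the study: the number of sites, the population size, $\nu$, the run length, and the monomorphic starting community are illustrative assumptions chosen so that the sketch runs in seconds (and far too short a run to reach stationarity); only the update rules (5% removal per iteration, then replacement by a new species, a local individual, or an immigrant from all other sites) follow the description above. With $N_i = 200$ and $m_i = 0.01\,i$ the toy immigration rates are again $I_i = m_i N_i = 2i$.

```python
import numpy as np

def simulate_neutral_sites(m, n_ind=200, death_p=0.05, nu=1e-3,
                           n_iter=400, seed=0):
    """Discrete-time neutral dynamics on len(m) >= 2 linked sites.

    m: per-site migration probabilities m_i. Returns an (n_sites, n_ind)
    array of species labels after n_iter iterations.
    """
    rng = np.random.default_rng(seed)
    m = np.asarray(m, dtype=float)
    n_sites = m.size
    # Assumed initial condition: every site holds a single species (label 0).
    comm = np.zeros((n_sites, n_ind), dtype=np.int64)
    next_label = 1
    for _ in range(n_iter):
        prev = comm.copy()  # replacements come from the previous iteration
        for i in range(n_sites):
            others = np.delete(prev, i, axis=0).ravel()  # all other sites
            for d in np.flatnonzero(rng.random(n_ind) < death_p):
                u = rng.random()
                if u < nu:  # speciation: an entirely new species
                    comm[i, d] = next_label
                    next_label += 1
                elif u < nu + (1 - nu) * (1 - m[i]):  # local replacement
                    comm[i, d] = prev[i, rng.integers(n_ind)]
                else:  # immigrant, probability (1 - nu) * m_i
                    comm[i, d] = others[rng.integers(others.size)]
    return comm

def sample_abundances(comm, n_sample, seed=0):
    """Sample n_sample individuals per site with replacement and return a
    sites x species abundance matrix over the species that were observed."""
    rng = np.random.default_rng(seed)
    picks = np.stack([rng.choice(row, size=n_sample, replace=True) for row in comm])
    species, inverse = np.unique(picks.ravel(), return_inverse=True)
    counts = np.zeros((comm.shape[0], species.size), dtype=np.int64)
    rows = np.repeat(np.arange(comm.shape[0]), n_sample)
    np.add.at(counts, (rows, inverse), 1)
    return counts

# Five toy sites with m_i = i x 0.01 and N_i = 200, so I_i = m_i N_i = 2i.
m = np.arange(1, 6) * 0.01
comm = simulate_neutral_sites(m)
X = sample_abundances(comm, n_sample=100)
print(X.shape, X.sum(axis=1))
```

The resulting matrix `X` has the same layout (sites by species counts) as the data to which the UNTB-HDP model is fitted.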

2) Tropical Trees From Panama: To provide a well-distributed sample of tropical trees at a regional level, we took twenty-nine of the one-hectare forest plots considered in [20]. These comprised all the one-hectare samples from the Panama region with an elevation of less than 200 metres. This restriction ensured that all samples were from the same environment of lowland tropical forest. We also did not use data from the three larger Panama plots, in order to maintain even sampling at the regional level. Within each plot all trees at least 10 cm in diameter were censused and their morpho-species recorded. The network of sample sites was spread across a 15 × 50 km region along the Panama Canal; see [41] for details. A total of 13 263 trees were sampled from 367 species. The number of individuals observed in each plot ranged from 302 to 647, with a median of 450. The UNTB-HDP model was fit to these data as described above.

Table 1 Key Ideas Used in This Paper

3) Human Gut Microbiota: To compare with the tropical tree analysis we also fitted the UNTB-HDP model to a study of the gut microbiomes of twins and their mothers [18]. These comprised fecal samples from 154 different individuals characterized by family and body mass index (BMI). Each individual was sampled at two time points approximately two months apart. The V2 hypervariable region of the 16S rRNA gene was amplified by PCR and then sequenced using 454 pyrosequencing. We reprocessed this data set, filtering the reads, denoising, and removing chimeras using the AmpliconNoise pipeline [42], [43]. This gave a total of 570 851 reads split over 278 samples, since thirty of the 308 collected samples had no reads remaining after filtering. The size of individual samples varied from just 53 to 10 580 reads, with a median of 1598. The number of unique sequences remaining after noise removal was 19 647. These were then taxonomically classified using the RDP stand-alone classifier of [44]. We constructed operational taxonomic units (OTUs) at 3% sequence difference using average linkage clustering to approximate species [45]. This was done for the entire data set, generating 7238 OTUs. We fitted the UNTB-HDP model to this data set.
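Average-linkage OTU construction at a 3% cut-off can be illustrated with SciPy's hierarchical clustering. This is a sketch under simplifying assumptions: the sequences are taken to be already aligned and of equal length, and the distance is the proportion of mismatched positions, which is not necessarily the pairwise distance used in the study's pipeline; the toy sequences and the function name are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def otus_average_linkage(aligned_seqs, cutoff=0.03):
    """Cluster aligned, equal-length sequences into OTUs.

    The distance is the fraction of mismatched positions; the average-linkage
    tree is cut so that sequences in one OTU have a cophenetic distance of at
    most `cutoff`. Returns one integer OTU label per sequence.
    """
    codes = np.array([[ord(base) for base in seq] for seq in aligned_seqs])
    distances = pdist(codes, metric="hamming")  # condensed distance matrix
    tree = linkage(distances, method="average")  # UPGMA
    return fcluster(tree, t=cutoff, criterion="distance")

reference = "ACGT" * 25                       # 100 bp toy sequence
variant_1pct = "T" + reference[1:]            # 1 mismatch (1%)
variant_8pct = "T" * 10 + reference[10:]      # 8 mismatches (8%)
labels = otus_average_linkage([reference, variant_1pct, variant_8pct])
print(labels)  # first two share an OTU; the third forms its own
```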

To explore the impact of sample size and number on the ability of our pseudo p-values to correctly identify a community as nonneutral at the local and metacommunity levels, we generated a series of subsampled data sets from this study. First, we selected at random without replacement either 20, 50, 100, or 200 samples from all those that had 1000 reads or more (247 in total). Then we generated a series of data sets in which we sampled increasing numbers of individuals (reads) from these selected samples, from 20 individuals per sample to 400 inclusive in increments of 20. We used sampling with replacement, i.e., multinomial sampling, so that expected OTU proportions were equal to those in the observed communities. For each number of samples and number of reads we generated ten replicate communities. We then fitted the UNTB-HDP model to these communities and tested for neutrality at the local and metacommunity levels.
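The subsampling design can be sketched with NumPy. The sketch assumes an OTU table with samples as rows and OTUs as columns; it selects samples with at least 1000 reads without replacement and then draws a fixed number of reads per selected sample by multinomial sampling, so that expected OTU proportions equal the observed ones. The function name, the seeding scheme, and the random toy table are hypothetical.

```python
import numpy as np

def subsample_table(otu_table, n_samples, n_reads, min_reads=1000, seed=None):
    """Select n_samples rows with >= min_reads reads (without replacement),
    then draw n_reads reads per selected row by multinomial sampling."""
    rng = np.random.default_rng(seed)
    table = np.asarray(otu_table)
    eligible = np.flatnonzero(table.sum(axis=1) >= min_reads)
    chosen = rng.choice(eligible, size=n_samples, replace=False)
    proportions = table[chosen] / table[chosen].sum(axis=1, keepdims=True)
    return np.stack([rng.multinomial(n_reads, p) for p in proportions])

# Toy OTU table: 300 samples x 50 OTUs with sample-specific depths.
rng = np.random.default_rng(4)
toy = rng.poisson(lam=rng.uniform(10, 80, size=(300, 1)), size=(300, 50))

designs = {}
for n_samples in (20, 50, 100, 200):
    for n_reads in range(20, 401, 20):  # 20, 40, ..., 400 inclusive
        designs[(n_samples, n_reads)] = [
            subsample_table(toy, n_samples, n_reads, seed=(n_samples, n_reads, rep))
            for rep in range(10)  # ten replicate communities per design
        ]
print(len(designs))  # 80
```

Each replicate table would then be passed to the UNTB-HDP fit and both pseudo p-values recorded.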

Starting with the full data set, we split the unique sequences according to the phylum to which they were classified, using a cut-off of 70% bootstrap confidence. OTUs were then reconstructed at 3% for each phylum and the UNTB-HDP fit to each phylum separately. We repeated this process for family and genus too. Only samples that had more than 150 reads from a taxon were included in the analysis, and the model was only fit to taxa that had at least 50 samples satisfying this criterion. This ensured a sufficiently large data set for parameters to be inferred; moreover, if a taxon dominates a neutral guild occupying a particular role, we would expect it to appear in a large proportion of samples. We also generated ten replicate data sets from the full data set with the same number of samples and the same number of reads per sample as the data sets split by taxa at each level. Applying the UNTB-HDP to these then gives us an equivalent benchmark for the effect of subsampling on our ability to detect nonneutrality. We also did this for the tropical tree data.
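The inclusion rules for the per-taxon fits can be expressed as a simple filter. The sketch assumes an OTU table (samples by OTUs) and one taxon label per OTU, with sequences below the 70% bootstrap cut-off already labelled as unclassified; unlike the procedure above, it splits existing OTUs rather than re-clustering the unique sequences within each taxon. The names and toy data are hypothetical.

```python
import numpy as np

def per_taxon_tables(otu_table, taxon_of_otu, min_reads=150, min_samples=50,
                     unclassified="unclassified"):
    """Split a samples x OTUs table by taxon and apply the inclusion rules.

    A sample is kept for a taxon if it has more than `min_reads` reads from
    that taxon; the taxon is fitted only if at least `min_samples` samples
    pass. Returns {taxon: qualifying samples x that taxon's OTUs}.
    """
    table = np.asarray(otu_table)
    labels = np.asarray(taxon_of_otu)
    selected = {}
    for taxon in np.unique(labels):
        if taxon == unclassified:  # below the bootstrap cut-off
            continue
        sub = table[:, labels == taxon]
        keep = sub.sum(axis=1) > min_reads
        if keep.sum() >= min_samples:
            selected[str(taxon)] = sub[keep]
    return selected

rng = np.random.default_rng(5)
toy = rng.poisson(8.0, size=(100, 60))
taxa = np.array(["Bacteroidetes"] * 30 + ["Firmicutes"] * 25 + ["unclassified"] * 5)
toy[:60, 30:55] = 0  # Firmicutes absent from 60 of 100 samples -> only 40 pass
tables = per_taxon_tables(toy, taxa)
print({taxon: sub.shape for taxon, sub in tables.items()})
```

In this toy example the second taxon is dropped because fewer than 50 samples meet the read threshold.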

III. RESULTS

A. Neutral Simulation

In Fig. 1, we give the immigration rates estimated by the UNTB-HDP fitting algorithm for the neutral simulation. From this single sample we are able to accurately predict the immigration rates across all the sites. The uncertainty in our predictions increases for higher $I_i$, but there is no consistent bias. In contrast, the two-stage approximation substantially underestimates the immigration rate as $I_i$ increases. This is most likely because, although the simulation appears locally neutral ($p_L = 0.57$) as we would expect, the hypothesis that the neutral model also applies at the metacommunity level is rejected ($p_N = 0.0096$). The deviation from the mainland-island structure and the occurrence of speciation within the islands themselves result in a metacommunity distribution that deviates from the neutral stick-breaking process. This illustrates that, in contrast to the two-stage approximation, the UNTB-HDP model can still correctly predict immigration rates when neutral community assembly operates only at the local community level.

Fig. 1. Estimated immigration rates versus true values for the UNTB-HDP model fit to a neutral model simulation. Predictions are medians (solid line) from 25 000 posterior samples together with lower (2.5%) and upper (97.5%) Bayesian confidence limits (dotted lines). The predictions from the two-stage approximation are also given (blue line).

Fig. 2. An NMDS plot of the twenty-nine Panama tropical tree communities. Communities are visualised as bubbles with size proportional to the median $I_i$ values obtained from the UNTB-HDP Gibbs sampler. Contours calculated using the ordisurf function of the R vegan package are also shown. The metacommunity distribution is denoted by a solid black point.

Fig. 3. Impact of sample number and size on detection of nonneutrality in the human gut data. The figures show the pseudo p-values for neutrality for both the complete neutral model ($P_G$, i.e., $p_N$ in the text) and local community assembly ($P_L$). We generated ten replicate communities by sampling without replacement either 20, 50, 100, or 200 samples from those that had 1000 reads or more (247 in total), and from the selected samples we generated a fixed number of reads by sampling with replacement. We increased read numbers from 20 individuals per sample to 400 inclusive in increments of 20. We then tested the subsampled communities for neutrality.

B. Tropical Trees From Panama

By fitting the UNTB-HDP model to the twenty-nine tropical tree communities, we found that they have a distribution of abundances across sites that is consistent with the neutral model at both the metacommunity and local community levels ($p_N = 0.81$ and $p_L = 0.23$). The median fitted $\theta$ obtained was 109.3. The median fitted immigration rates varied across sites from 20.69 to 76.93, with a median of 41.7. In Fig. 2, we use nonmetric multidimensional scaling (NMDS) to position each community in two dimensions so as to preserve, as far as possible, the rank order of Bray-Curtis distances between communities. This was done using the metaMDS function of the vegan package in R [46]. The fitted metacommunity distribution is also shown in this plot. The sites are represented as bubbles with size proportional to their fitted immigration rates, and contours calculated using the ordisurf function are overlaid. From this it is apparent that the communities with higher $I_i$ are in general more similar to the metacommunity. The fitted immigration rates are also related to the spatial location of the sites. Although there is no spatial location associated with the metacommunity, if we assign it to the location of the site with the highest $I_i$, site 14, and calculate the distance from this site to each of the others, then we find a significant negative correlation ($p = 0.03$) between distance and immigration rate.
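The spatial check described above amounts to computing distances from the site with the highest fitted $I_i$ and correlating them with the immigration rates. The text does not state which correlation test was used here; the sketch below assumes Pearson's correlation via `scipy.stats.pearsonr` and planar site coordinates in kilometres, and the function name and data are invented for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

def distance_immigration_correlation(coords, immigration):
    """Correlate distance from the highest-I_i site with the fitted I_i.

    coords: (n_sites, 2) planar coordinates; immigration: (n_sites,) median
    fitted immigration rates. The reference site itself is excluded.
    Returns (Pearson r, two-sided p-value).
    """
    coords = np.asarray(coords, dtype=float)
    immigration = np.asarray(immigration, dtype=float)
    ref = int(np.argmax(immigration))
    dist = np.linalg.norm(coords - coords[ref], axis=1)
    others = np.arange(immigration.size) != ref
    r, p = pearsonr(dist[others], immigration[others])
    return float(r), float(p)

# Invented data: 29 sites in a 15 x 50 km box, I_i falling with distance.
rng = np.random.default_rng(7)
coords = rng.uniform([0.0, 0.0], [15.0, 50.0], size=(29, 2))
dist_to_site0 = np.linalg.norm(coords - coords[0], axis=1)
immigration = 70.0 - 0.8 * dist_to_site0 + rng.normal(0.0, 3.0, size=29)
immigration[0] = 90.0  # make site 0 the highest-I_i site
r, p = distance_immigration_correlation(coords, immigration)
print(f"r = {r:.2f}, p = {p:.2g}")
```

A rank-based alternative such as `scipy.stats.spearmanr` could be substituted if the relationship is not expected to be linear.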

C. Human Gut Microbiota

In contrast to the tropical trees, the human gut samples do not appear neutral at the whole community level ($p_N = 0$ and $p_L = 0$). This was not purely an effect of the tropical tree data set comprising fewer samples and fewer individuals. When we reduced the gut data set to an equivalent number of samples (29) with the same sizes, we would still always reject neutrality at the metacommunity level, while at the local level we observed a median $p_L$ of 0.062 across the ten replicates. We would therefore falsely fail to reject local neutrality, but not as strongly as for the real tree data ($p_L = 0.23$). We can thus conclude that the human gut is convincingly less neutral than the tropical trees, even accounting for the different sample numbers and sizes.

In Fig. 3 we show the impact of sample number and sample size on the pseudo p-values for the test of neutrality for whole community and local community assembly. With sufficient samples (i.e., at least 200) we have power to reject neutrality at both levels provided the sample size exceeds 150, but as sample number decreases, our power to correctly reject neutrality, particularly for local community assembly, decreases.

The results of subdividing the OTUs at different taxonomic levels and fitting the UNTB-HDP model are given in a nested format in Table 2. The families associated with each phylum are indented below it, as are the genera in each family. We see some evidence that as we move down the taxonomic hierarchy from phyla, through families, to genera, the subdivided communities appear more consistent with neutral local community assembly. We would reject local neutrality for both major phyla found in the human gut, the Bacteroidetes and Firmicutes, but there are two families out of four for which we cannot confidently reject neutral local community assembly at the 1% level, the Bacteroidaceae and Incertae Sedis XIV, with $p_L = 0.03$ and 0.05, respectively. At the level of genera, two out of three appear close to neutral at the local level, the exception being Faecalibacterium. This is not the case when we do not use the fitted metacommunities and instead test for both neutral local community assembly and a neutral metacommunity: then for all data sets we would completely reject neutrality. The figures in parentheses give pseudo p-values for the equivalent complete data set randomly subsampled to the same size as the taxa. This gives us a benchmark to verify that these effects are not purely due to small sample sizes. From these we see that in all cases the probability of incorrectly concluding that the subsampled data set is neutral is less than 1%.

Table 2 Fitting the UNTB-HDP Model to Human Gut Microbiota

To quantify how the metacommunity deviates from the neutral assumption for those data sets that appear locally neutral, we compared the fitted metacommunities, averaged over 500 Gibbs samples, with the metacommunity observed in samples from the full neutral model with the equivalent parameters. These two distributions are shown in Fig. 4 for the three genera, Bacteroides, Blautia, and Faecalibacterium, as rank-abundance plots: the OTUs are ordered by relative frequency, with that frequency given on a log-scaled y-axis. It is clear that the fitted metacommunities of all three genera have a small number of highly abundant OTUs followed by a long tail of rare OTUs. The neutral model cannot fit a metacommunity of this shape.
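Under the neutral model the metacommunity corresponds to stick-breaking with parameter $\theta$, so a neutral rank-abundance curve of the kind compared above can be sketched with a truncated stick-breaking (GEM) sampler. This is an illustration rather than the authors' implementation: the truncation tolerance, the value $\theta = 20$, and the function names are assumptions, and the fitted curve would instead be taken from the Gibbs output.

```python
import numpy as np

def stick_breaking(theta, tol=1e-8, rng=None):
    """Draw metacommunity weights by stick-breaking with parameter theta,
    stopping once the unbroken remainder of the stick is below `tol`."""
    rng = np.random.default_rng(rng)
    weights, remaining = [], 1.0
    while remaining > tol:
        fraction = rng.beta(1.0, theta)  # V_k ~ Beta(1, theta)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return np.array(weights)

def rank_abundance(weights):
    """Relative abundances sorted from most to least abundant."""
    w = np.asarray(weights, dtype=float)
    return np.sort(w / w.sum())[::-1]

pi = rank_abundance(stick_breaking(theta=20.0, rng=6))
print(len(pi), pi[:5].round(3))
print(f"top OTU share: {pi[0]:.3f}")
```

Plotting `pi` against rank on a log-scaled y-axis gives a curve directly comparable to the fitted metacommunities.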

We also looked for correlations between the fitted immigration rates for the different taxa and the body mass index of subjects. No significant relationships were found at the genus level, but for the family Ruminococcaceae a significant negative relationship was observed (p-value = 0.014; see Fig. 5). The same negative correlation was also observed for their parent phylum, the Firmicutes, where it was slightly stronger (p-value = 0.007).

IV. DISCUSSION

The results clearly demonstrate the usefulness of the UNTB-HDP Gibbs sampler: its ability to fit large multisample data sets, its robustness to deviations of the metacommunity from neutrality, and its ability to detect those deviations whilst still correctly inferring immigration rates. The resulting significance tests and fitted parameters reveal a great deal about the ecology of the human gut microbiota in comparison to macroscopic organisms such as tropical trees. The human gut is clearly much more strongly structured by functional niches. Only at the genus level do we see some evidence of neutral local community assembly in the gut, whilst tropical trees were well described by the neutral model without any subdivision of species. In some ways this is to be expected: given the multiplicity of metabolic roles performed by the human microbiota, we would not expect ecological equivalence at the whole community level. However, the borderline neutral patterns we did observe suggest that neutral local community assembly may be operating within the species occupying those roles, and that neutral processes may be responsible for maintaining some of the vast diversity observed in the human gut. This has to be a tentative conclusion, as pattern does not imply process [10], but, regardless, the fact that the observed abundances are consistent with the neutral model means that its importance for explaining fine-scale gut microbial diversity cannot be ruled out.

It is important to address the question of whether the tests have the power necessary to detect nonneutrality. It is clear from Fig. 3 that as the number of samples, in particular, decreases, it becomes hard to detect nonneutral distributions; this is actually a strong motivation for the use of the UNTB-HDP, which can be efficiently fit in the multisite case. However, our benchmarking against the full gut data set allows us to conclude that some genera and the tropical trees appear more neutral than the equivalent-sized complete gut microbiome. It is also important to note that the model was unable to detect the spatial signature in the tropical tree data as a deviation from neutrality. In the absence of that spatial information we would have concluded that a spatially implicit metapopulation was sufficient to explain these patterns. That certainly motivates inference strategies for spatially explicit neutral models [47].

Fig. 4. Human gut metacommunity distributions. The fitted metacommunity distributions (red line) and neutral metacommunity predictions (blue line) as rank-abundance curves for three genera: Bacteroides, Blautia, and Faecalibacterium.

It is striking that the metacommunity distributions could not be explained by the neutral process for any taxa. Instead, the metacommunity was dominated by a small number of very abundant OTUs; in all cases the most abundant OTU had a relative abundance exceeding 10% of the metacommunity. This may be a signature of nonneutral processes. The dominant OTUs may have a competitive advantage, or interactions with bacteriophages [48] or the host immune system may be structuring these distributions [49], skewing their apparent metacommunity abundance. Alternatively, it may genuinely reflect the abundance of these organisms in the metacommunity, perhaps coupled with an improved dispersal ability over their competitors.

The parameters of the fitted models, in particular the immigration rates, are also highly informative. For the Panamanian tree data set we showed that these correlated with the spatial location of the sites. A strong effect of distance on community similarity was found in the original study, and a spatially explicit version of the neutral model was fit to the data [20], but we have shown that even in the UNTB, where space is only implicit, this signal can be recovered from the fitted immigration rates. For the gut microbiota samples we have no spatial position, but here, remarkably, the immigration rates for the family Ruminococcaceae and phylum Firmicutes correlated negatively with body mass index. This provides a novel interpretation of the impact of obesity on the human gut microbiota: an increase in the rate of input of nutrients to the gut effectively results in an increase in microbial growth rates in the key carbohydrate-metabolising group, the Ruminococcaceae [50], and this equates to a decrease in immigration rate relative to local birth.

It is also instructive to compare immigration rates between fitted models. There has been debate as to the importance of dispersal for microbial community structure, given the theory that “everything is everywhere, but the environment selects” [51]. However, comparing the tropical tree fits with the gut microbiota at the phylum level, we find that the predicted immigration rates are comparable, implying that dispersal limitation may be just as important between human guts as it is between tropical forests. Interesting patterns also appear when comparing immigration rates between gut taxa. They are much lower, for example, for the Bacteroidetes than for the Firmicutes, probably reflecting the much higher tendency of the latter to be spore-forming.

Finally, whilst these results are of great interest in themselves, perhaps our most significant achievement is formally linking a model from ecology, the Unified Neutral Theory of Biodiversity, with a model from machine learning, the hierarchical Dirichlet process. In addition, by showing that, for a large class of neutral models, the details of the local community dynamics do not matter for the HDP approximation to hold, provided the neutrality assumption is met, we may explain why we were able to fit communities as different as tropical trees and the gut microbiota. This strongly motivates the HDP as an ecological null model. What is more, the mathematical structure of the HDP is easily extendable to, for example, niche-neutral models or further hierarchical levels. Therefore, we believe that the connection we have made here will lead to an explosion of hierarchical Bayesian modelling in community ecology.

Software for fitting the UNTB-HDP can be downloaded from https://github.com/microbiome/NMGS.

Acknowledgment

The authors would like to thank three anonymous reviewers for constructive comments.

Fig. 5. Immigration rate versus BMI. Median immigration rate for the family Ruminococcaceae determined by the UNTB-HDP model plotted against body mass index. A significant negative correlation is observed (p-value = 0.014, Pearson's correlation).


REFERENCES

[1] G. E. Hutchinson, “Concluding remarks,” in Proc. Cold Spring Harbor Symp. Quantitative Biol., 1957, vol. 22, pp. 415–427.

[2] G. Hardin, “The competitive exclusion principle,” Science, vol. 131, pp. 1292–1297, 1960.

[3] D. Simberloff and T. Dayan, “The guild concept and the structure of ecological communities,” Ann. Rev. Ecol. Syst., vol. 22, pp. 115–143, 1991.

[4] H. Caswell, “Community structure: A neutral model analysis,” Ecol. Monograph., vol. 46, pp. 327–354, 1976.

[5] S. P. Hubbell, The Unified Neutral Theory of Biodiversity and Biogeography. Princeton, NJ, USA: Princeton Univ. Press, 2001.

[6] R. H. MacArthur and E. O. Wilson, The Theory of Island Biogeography. Princeton, NJ, USA: Princeton Univ. Press, 1967.

[7] B. J. McGill, “A test of the unified neutral theory of biodiversity,” Nature, vol. 422, pp. 881–885, 2003.

[8] R. S. Etienne and H. Olff, “A novel genealogical approach to neutral biodiversity theory,” Ecol. Lett., vol. 7, pp. 170–175, 2004.

[9] R. S. Etienne, “A new sampling formula for neutral biodiversity,” Ecol. Lett., vol. 8, pp. 253–260, 2005.

[10] J. Rosindell, S. P. Hubbell, F. He, L. J. Harmon, and R. S. Etienne, “The case for ecological neutral theory,” Trends Ecol. Evol., vol. 27, pp. 203–208, 2012.

[11] R. A. Chisholm and S. W. Pacala, “Niche and neutral models predict asymptotically equivalent species abundance distributions in high-diversity communities,” Proc. Nat. Acad. Sci. USA, vol. 107, pp. 15 821–15 825, 2010.

[12] R. S. Etienne, “A neutral sampling formula for multiple samples and an ‘exact’ test of neutrality,” Ecol. Lett., vol. 10, pp. 608–618, 2007.

[13] R. S. Etienne, “Maximum likelihood estimation of neutral model parameters for multiple samples with different degrees of dispersal limitation,” J. Theor. Biol., vol. 257, pp. 510–514, 2009.

[14] F. Munoz, P. Couteron, B. R. Ramesh, and R. S. Etienne, “Estimating parameters of neutral communities: From one single large to several small samples,” Ecology, vol. 88, pp. 2482–2488, 2007.

[15] F. Jabot, R. S. Etienne, and J. Chave, “Reconciling neutral community models and environmental filtering: Theory and an empirical test,” Oikos, vol. 117, pp. 1308–1320, 2008.

[16] R. S. Etienne, “Improved estimation of neutral model parameters for multiple samples with different degrees of dispersal limitation,” Ecology, vol. 90, no. 3, pp. 847–852, 2009.

[17] M. Hamady, J. J. Walker, J. K. Harris, N. J. Gold, and R. Knight, “Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex,” Nat. Methods, vol. 5, no. 3, pp. 235–237, Mar. 2008.

[18] P. J. Turnbaugh et al., “A core gut microbiome in obese and lean twins,” Nature, vol. 457, no. 7228, pp. 480–484, Jan. 22, 2009.

[19] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, “Hierarchical Dirichlet processes,” J. Amer. Statist. Assoc., vol. 101, no. 476, pp. 1566–1581, Dec. 2006.

[20] R. Condit et al., “Beta-diversity in tropical forest trees,” Science, vol. 295, no. 5555, pp. 666–669, Jan. 25, 2002.

[21] M. Arumugam et al., “Enterotypes of the human gut microbiome,” Nature, vol. 473, no. 7346, pp. 174–180, May 12, 2011.

[22] I. Holmes, K. Harris, and C. Quince, “Dirichlet multinomial mixtures: Generative models for microbial metagenomics,” PLoS ONE, vol. 7, no. 2, Feb. 2012, Art. ID. e30126. [Online]. Available: http://dx.doi.org/10.1371%2Fjournal.pone.0030126.

[23] T. Ding and P. D. Schloss, “Dynamics and associations of microbial community types across the human body,” Nature, vol. 509, pp. 357–360, 2014.

[24] N. Fierer, M. A. Bradford, and R. B. Jackson, “Toward an ecological classification of soil bacteria,” Ecology, vol. 88, no. 6, pp. 1354–1364, Jun. 2007.

[25] L. Philippot et al., “Spatial patterns of bacterial taxa in nature reflect ecological traits of deep branches of the 16S rRNA bacterial tree,” Environ. Microbiol., vol. 11, no. 12, pp. 3096–3104, Dec. 2009.

[26] L. Philippot et al., “The ecological coherence of high bacterial taxonomic ranks,” Nature Rev. Microbiol., vol. 8, no. 7, pp. 523–529, Jul. 2010.

[27] D. A. Rasko et al., “The pangenome structure of Escherichia coli: Comparative genomic analysis of E. coli commensal and pathogenic isolates,” J. Bacteriol., vol. 190, no. 20, pp. 6881–6893, Oct. 2008.

[28] W. Sloan, M. Lunn, S. Woodcock, I. Head, S. Nee, and T. Curtis, “Quantifying the roles of immigration and chance in shaping prokaryote community structure,” Environ. Microbiol., vol. 8, no. 4, pp. 732–740, Apr. 2006.

[29] S. Woodcock et al., “Neutral assembly of bacterial communities,” FEMS Microbiol. Ecol., vol. 62, no. 2, pp. 171–180, Nov. 2007, Joint Symposium of the Environmental-Microbiology-Group/British-Ecological-Society/Society-for-General-Microbiology, Univ. York, York, England, Sep. 13, 2006.

[30] P. Jeraldo et al., “Quantification of the relative roles of niche and neutral processes in structuring gastrointestinal microbiomes,” Proc. Nat. Acad. Sci. USA, vol. 109, no. 25, pp. 9692–9698, Jun. 19, 2012.

[31] A. McKane, D. Alonso, and R. Sole, “Analytic solution of Hubbell's model of local community dynamics,” Theor. Pop. Biol., vol. 65, no. 1, pp. 67–73, Feb. 2004.

[32] W. T. Sloan, S. Woodcock, M. Lunn,I. M. Head, and T. P. Curtis, ‘‘Modelingtaxa-abundance distributions in microbialcommunities using environmentalsequence data,’’ Microb. Ecol., vol. 53, no. 3,pp. 443–455, Apr. 2007, Workshop onMicrobial Environmental Genomics,Shanghai Jiao Tong Univ., ManhangCampus, Shanghai, China,Jun. 12–15, 2005.

[33] T. S. Ferguson, ‘‘A Bayesian analysis of somenonparametric problems,’’ Ann. Stat., vol. 1,no. 2, pp. 209–230, 1973.

[34] W. J. Ewens, ‘‘The sampling theory ofselectively neutral mutations,’’ Theor. Pop.Biol., vol. 3, pp. 87–112, 1972.

[35] C. E. Antoniak, ‘‘Mixtures of Dirichletprocesses with applications to Bayesiannonparametric problems,’’ Ann. Statist., vol. 2,pp. 1152–1174, 1974.

[36] D. J. Mackay, ‘‘Bayesian interpolation,’’ NeuralComput., vol. 4, pp. 415–417, 1992.

[37] M. D. Escobar and M. West, ‘‘Bayesian densityestimation and inference using mixtures,’’J. Amer. Statist. Assoc., vol. 90, no. 430,pp. 577–588, Jun. 1995.

[38] S. C. Walker, ‘‘When and why do non-neutralmetacommunities appear neutral?’’ Theor.Pop. Biol., vol. 71, pp. 318–331, 2007.

[39] D. Aldous, ‘‘Exchangeability and related topicsEcole d’Ete de Probabilites deSaint-Flour XIII-1983. Berlin: Springer-Verlag,1985, pp. 1–198.

[40] F. M. Hoppe, ‘‘Polya-like urns and the Ewens’sampling formula,’’ J. Math. Biol., vol. 20,no. 1, pp. 91–94, 1984.

[41] C. Pyke, R. Condit, S. Aguilar, and S. Lao,‘‘Floristic composition across a climatic gradientin a neotropical lowland forest,’’ J. Veg. Sci.,vol. 12, no. 4, pp. 553–566, Aug. 2001.

[42] C. Quince et al., ‘‘Accurate determination ofmicrobial diversity from454 pyrosequencing data,’’ Nat. Methods,vol. 6, pp. 639–641, 2009.

[43] C. Quince, A. Lanzen, R. J. Davenport, andP. J. Turnbaugh, ‘‘Removing noise frompyrosequenced amplicons,’’ BMC Bioinf.,vol. 12, no. 38, 2011.

[44] Q. Wang, G. M. Garrity, J. M. Tiedje, andJ. R. Cole, ‘‘Naive Bayesian classifier for rapidassignment of rRNA sequences into the newbacterial taxonomy,’’ Appl. Environ. Microb.,vol. 73, no. 16, pp. 5261–5267, Aug. 2007.

[45] N. Youssef et al., ‘‘Comparison of speciesrichness estimates obtained using nearlycomplete fragmentsand simulated pyrosequencing-generatedfragments in 16S rRNA gene-basedenvironmental surveys,’’ Appl. Environ.Microbiol., vol. 75, no. 16, pp. 5227–5236,Aug. 15, 2009.

[46] R Development Core Team, R: A Language andEnvironment for Statistical Computing, Vienna,Austria, 3-900051-07-0, 2010.[Online]. Available: http://www.R-project.org.

[47] J. Rosindell, Y. Wong, and R. Etienne,‘‘A coalescence approach to spatial neutralecology,’’ Ecol. Informat., pp. 259–271, 2008.

[48] S. Minot et al., ‘‘The human gut virome: Inter-individual variation and dynamicresponse to diet,’’ Genome Res., vol. 21, no. 10,pp. 1616–1625, Oct. 2011.

[49] C. Quince et al., ‘‘The impact of Crohn’sdisease genes on healthy human gut micro-biota: A pilot study,’’ Gut, vol. 62, pp. 952–954, Jan. 2013.

[50] X. Ze, S. H. Duncan, P. Louis, and H. J. Flint,‘‘Ruminococcus bromii is a keystone speciesfor the degradation of resistant starch in thehuman colon,’’ ISME J., vol. 6, pp. 1535–1543,2012.

[51] B. J. Finlay and T. Fenchel, ‘‘Cosmopolitanmetapopulations of free-living microbialeukaryotes,’’ Protist, vol. 155, pp. 237–244,2004.

Harris et al. : Linking Statistical and Ecological Theory

| Proceedings of the IEEE 13


ABOUT THE AUTHORS

Keith Harris received the B.Sc. degree in mathematics and statistics from the University of York, York, U.K., in 2002, and the M.Sc. and Ph.D. degrees in statistics from the University of Sheffield, Sheffield, U.K., in 2003 and 2008, respectively.

From 2008 to 2013, he was a PDRA at the University of Glasgow, first working on a project entitled ‘‘Classifiers in Medicine and Biology (Advancing Machine Learning Methodology for New Classes of Prediction Problems)’’ in the Department of Computing Science, and later on a Unilever-funded project in the School of Engineering that focused on developing new statistical methods for the analysis of genomics data from microbial communities. He is currently a PDRA in the School of Mathematics and Statistics at the University of Sheffield, working on an EPSRC-funded project called ‘‘Simulation Tools for Automated and Robust Manufacturing.’’

Todd L. Parsons received the B.Sc. degree in pure mathematics from the University of Waterloo, Waterloo, ON, Canada, and the M.Sc. and Ph.D. degrees in mathematics from the University of Toronto, Toronto, ON, Canada.

From 2007 to 2011, he was a Postdoctoral Researcher in biology at the University of Pennsylvania. He is now a permanent Research Scientist (CR2) with the Centre National de la Recherche Scientifique (CNRS), with a primary appointment in the Probability and Stochastic Models Laboratory at l’Université Pierre et Marie Curie (Paris 06), and a courtesy appointment in the Centre for Interdisciplinary Research in Biology at the Collège de France. His research interests include stochastic population models and combinatorial stochastic processes arising from ecology, epidemiology, and population genetics.

Umer Z. Ijaz received the B.S. and M.S. degrees in computer systems engineering from Ghulam Ishaq Khan Institute, Pakistan, in 2000 and 2003, respectively, and the Ph.D. degree in electrical and electronics engineering from Jeju National University, South Korea, in 2008.

Between 2008 and 2014, he worked as a Postdoctoral Researcher at the Universities of Cambridge, Oxford, and Glasgow. He is currently a NERC Independent Research Fellow and Lord Kelvin Smith Fellow, leading his own group on Environmental Omics at the School of Engineering, University of Glasgow. The purpose of his current research is to integrate different sources of ’omics data (metagenomics, metatranscriptomics, metabolomics, and metaproteomics) in environmental science for microbial community analysis, focusing on software development and numerical ecology.

Leo Lahti received the doctoral degree in machine learning and bioinformatics from the Department of Computer Science, Aalto University, Espoo, Finland, in 2010.

He is currently affiliated with the Laboratory of Microbiology, Wageningen University, The Netherlands, and the Department of Veterinary Biosciences, University of Helsinki, Finland, as a Postdoctoral Research Fellow of the Academy of Finland. His recent research focuses on understanding the population dynamics and health associations of the human microbiome.

Ian Holmes received the Ph.D. degree from the Sanger Centre in the University of Cambridge (now the Sanger Institute), Cambridge, U.K., in 1998.

Following a spell at Los Alamos National Laboratory as a Fulbright-Zeneca Research Fellow, he worked on the Berkeley Drosophila Genome Project and the EBI’s Ensembl project before being appointed as a Lecturer in Bioinformatics at the Department of Statistics, University of Oxford (2002–2004). In 2004, he was hired by the Department of Bioengineering, University of California, Berkeley, CA, USA, and was promoted to tenure in 2010. His research involves building realistic stochastic models of various aspects of genome evolution, and making these models useful as practical tools for biological discovery.

Christopher Quince received the Ph.D. degree in food web modeling from the Theoretical Physics Group at the University of Manchester, Manchester, U.K., in 2002.

He then worked on theoretical population genetics and fish growth models during postdoctoral positions at Arizona State University and the University of Toronto before obtaining an LKAS fellowship at the University of Glasgow in 2006 to study microbial communities. Since then he has held successive fellowships from the EPSRC and MRC, enabling him to devote his time to developing algorithms and software for microbial community analysis. These include the widely used software AmpliconNoise and uchime for the analysis of 16S rRNA sequence data, and CONCOCT for extracting genomes from shotgun metagenomics data. He has pioneered the development of bioinformatics and statistics for the interpretation of next-generation sequence data from microbial communities.


SI Appendix: 1) Large Population Limits for a Neutral Metacommunity and 2) Gibbs Sampling for the UNTB-HDP

Keith Harris¹, Todd L. Parsons², Umer Ijaz³, Leo Lahti⁴, Ian Holmes⁵ and Christopher Quince⁶,*

¹ School of Mathematics and Statistics, University of Sheffield, Sheffield, UK

² Laboratoire de Probabilités et Modèles Aléatoires, CNRS UMR 7599, UPMC Univ Paris 06, Paris, France

³ Infrastructure and Environment Research Division, School of Engineering, University of Glasgow, Glasgow, G12 8LT, UK

⁴ Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland & Laboratory of Microbiology, Wageningen University, Wageningen, Netherlands

⁵ Department of Bioengineering, University of California, Berkeley, California, USA

⁶ Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK

* Corresponding author

1 Large Population Limits for a Neutral Metacommunity

1.1 Summary and Outline

Given the length and technical nature of this supplement, we will begin with a summary that outlines the results herein. Our intent is to formulate a class of models that generalize Hubbell’s formulation of the Unified Neutral Theory of Biodiversity and Biogeography (UNTB) and a number of variants that have appeared in the community ecology literature, whilst retaining the essential feature of neutrality. Our inspiration here comes from Cannings’ models [1], which have become the standard in theoretical population genetics. We discuss coalescent theory and these models in detail below, but in brief, a Cannings’ model allows any reproduction law with discrete generations that keeps the total population size fixed, provided that relabeling the parents leaves the distribution of offspring unchanged. More generally, we could consider models replacing fixed population sizes with density-dependent population dynamics, as in [2], [3, 4] and [5], but this would have further lengthened and complicated this supplement.
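To make the definition concrete, the following minimal Python sketch performs one generation of a Cannings’ model: an exchangeable offspring vector is drawn, its entries sum to the population size, and offspring inherit their parent’s type. The two offspring laws shown (multinomial, i.e. Wright-Fisher, and a single death/birth, i.e. Moran) are illustrative choices, and all function names are our own rather than part of any library.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def wright_fisher_offspring(n: int, rng: random.Random) -> List[int]:
    """Multinomial(n; 1/n, ..., 1/n) offspring numbers (an exchangeable law)."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1  # each child picks its parent uniformly
    return counts


def moran_offspring(n: int, rng: random.Random) -> List[int]:
    """One death and one birth: a 0 and a 2 in random distinct positions, 1 elsewhere."""
    counts = [1] * n
    dead, parent = rng.sample(range(n), 2)
    counts[dead] = 0
    counts[parent] = 2
    return counts


def cannings_generation(types: Sequence[float],
                        offspring_law: Callable[[int, random.Random], List[int]],
                        rng: random.Random) -> List[float]:
    """Replace the whole generation by its offspring; offspring inherit their parent's type.

    No mutation or migration is modelled here.
    """
    nu = offspring_law(len(types), rng)
    assert sum(nu) == len(types)  # Cannings' constraint: fixed population size
    next_generation: List[float] = []
    for parent_type, k in zip(types, nu):
        next_generation.extend([parent_type] * k)
    return next_generation


# Usage: follow four labelled founders for a few Wright-Fisher generations.
rng = random.Random(1)
population = [0.1, 0.2, 0.3, 0.4]
for _ in range(5):
    population = cannings_generation(population, wright_fisher_offspring, rng)
print(Counter(population))
```

Any other exchangeable law with a fixed total could be passed in place of these two functions; this is the freedom that the class of models below exploits.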

We formulate a mainland-island Cannings’ model, in which the mainland has size $N_0 = N$ and the islands have sizes $N_i$ that grow with $N$ but are approximately equal. We allow migration between any pair of demes (island or mainland), and further allow mutations to give rise to new types on both the islands and the mainland. After collecting a few results regarding the reproduction law for a Cannings’ model, we consider the islands in Section 1.4. Provided that:

• the islands are asymptotically smaller than the mainland (in both census and effective population size; see the discussion below),

• migration between demes is rare (we assume that the probability that a migrant arrives in a localcommunity is inversely proportional to the size of that community), and

• the probability of multiple mergers is asymptotically smaller in N than the rate of pairwise coalescence,

then Proposition 1 shows that, if we rescale time proportionally to the effective population size of the islands (i.e., we measure time so that one time step corresponds to $N_e$ generations, where $N_e$ is the effective population size of the islands), then for large values of $N$ the population dynamics on the islands converge to the dynamics of Moran’s infinitely many alleles model. In this limit, the migration rate from the mainland takes the place of the mutation rate in the population genetic model, and the type of every new mutant/migrant is drawn from the initial type distribution for the mainland (i.e., the probability of migration between islands, or of novel mutations appearing on an island, becomes vanishingly small as $N$ grows large and can be completely ignored in the limit). Moreover, the composition of the mainland remains constant on this timescale: its dynamics are sufficiently slow that one cannot see changes when time is scaled according to the effective population size of the islands. This limit is independent of the specific reproduction law for the islands, provided it satisfies Cannings’ conditions; indeed, we do not even need to assume the same law for every island. As a consequence of the identification of the islands’ dynamics as a variant of the infinitely many alleles model, we can use previous results from theoretical population genetics to conclude that the stationary distribution for the islands is a Dirichlet process, and that the composition of a sample is distributed according to Ewens’ sampling formula.
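For reference, Ewens’ sampling formula gives the probability of a sample of size $n$ with $a_j$ species represented exactly $j$ times as $\frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)}\prod_j \frac{(\theta/j)^{a_j}}{a_j!}$. The sketch below evaluates it in exact rational arithmetic and samples from it sequentially with Hoppe’s urn. Here $\theta$ is a generic Ewens parameter; on an island it plays the role of a (rescaled) immigration rate, but the exact mapping to the paper’s parameters is not asserted by this code. The function names are our own.

```python
import random
from collections import Counter
from fractions import Fraction
from math import factorial
from typing import Dict, Iterator, List, Optional


def rising_factorial(x, n: int):
    """x (x + 1) ... (x + n - 1)."""
    out = 1
    for k in range(n):
        out *= x + k
    return out


def ewens_probability(allele_counts: Dict[int, int], theta) -> Fraction:
    """Ewens' sampling formula.

    allele_counts[j] = a_j, the number of species seen exactly j times, in a
    sample of size n = sum_j j * a_j. theta must be an int or a Fraction.
    """
    n = sum(j * a for j, a in allele_counts.items())
    prob = Fraction(factorial(n)) / rising_factorial(theta, n)
    for j, a in allele_counts.items():
        prob *= Fraction(theta, j) ** a / factorial(a)
    return prob


def partitions(n: int, max_part: Optional[int] = None) -> Iterator[List[int]]:
    """All integer partitions of n, with parts in non-increasing order."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for part in range(min(n, max_part), 0, -1):
        for rest in partitions(n - part, part):
            yield [part] + rest


def hoppe_urn(n: int, theta: float, rng: random.Random) -> List[int]:
    """Species labels for n individuals drawn sequentially from Hoppe's urn."""
    labels: List[int] = []
    n_species = 0
    for i in range(n):
        if rng.random() < theta / (i + theta):
            labels.append(n_species)  # a brand-new species
            n_species += 1
        else:
            labels.append(labels[rng.randrange(i)])  # copy an earlier individual
    return labels


# The probabilities of all sample configurations of size 6 sum to one.
theta = Fraction(3, 2)
print(sum(ewens_probability(Counter(p), theta) for p in partitions(6)))
print(Counter(hoppe_urn(30, 2.0, random.Random(0))))
```

Counting species abundances in the urn output (`Counter(...)` above) and tallying how many species have each abundance yields exactly the $a_j$ that `ewens_probability` expects.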

In Section 1.5 we turn our attention to the mainland. We first observe that, for large values of time, the species distributions on the islands converge to stationary processes governed by the Dirichlet process above. We can then combine this with results from [6] to obtain Proposition 3, which tells us that we need to rescale time according to the effective population size of the mainland (again, so that one time step corresponds to $N_e$ generations, but now with $N_e$ the effective population size of the mainland, which is substantially larger). On this slow scale, the islands essentially instantaneously arrive at their stationary state (an instant on this “slow” time scale is in fact an extremely long time on the natural “intermediate” time scale for the islands), whilst the population on the mainland now follows the “real” infinitely many alleles model (with the actual mutation rate). Again, migrations from an island to the mainland become vanishingly rare as $N$ becomes large and, as before, the stationary distribution is a Dirichlet process, where each newly appearing genotype is assigned a label chosen uniformly at random from $[0,1]$ (thus the probability of two distinct mutations giving rise to the same type is 0). In particular, the islands have the hierarchical Dirichlet process as their stationary distribution: they are Dirichlet processes in which the types are drawn from the underlying Dirichlet process that describes the mainland.
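The limiting structure just described can be sampled with the standard “Chinese restaurant franchise” urn scheme for the hierarchical Dirichlet process of Teh et al.: each island runs its own Polya urn, and whenever an island founds a new immigrant lineage, its species is drawn from a shared mainland urn. The sketch below is our own illustration of that scheme under the assumption that `theta` is the mainland concentration and `alphas` are the island concentrations; it is not the authors’ Gibbs sampler from part 2 of this SI.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def sample_hdp_islands(sizes: Sequence[int], alphas: Sequence[float], theta: float,
                       rng: random.Random) -> Tuple[List[List[int]], List[int]]:
    """Species labels for several island samples from an HDP (Chinese restaurant franchise).

    theta  - concentration of the mainland Dirichlet process (new species)
    alphas - concentration of each island's Dirichlet process (new immigrant lineages)
    Returns the island samples and, for each mainland species, the number of
    immigrant lineages ("tables") it has founded across all islands.
    """
    lineages_per_species: List[int] = []
    islands: List[List[int]] = []
    for n_i, alpha_i in zip(sizes, alphas):
        sample: List[int] = []
        for t in range(n_i):
            if rng.random() < alpha_i / (t + alpha_i):
                # New immigrant lineage: its species comes from the mainland urn.
                total = sum(lineages_per_species)
                if rng.random() < theta / (total + theta):
                    species = len(lineages_per_species)  # species new to the mainland
                    lineages_per_species.append(0)
                else:
                    species = rng.choices(range(len(lineages_per_species)),
                                          weights=lineages_per_species)[0]
                lineages_per_species[species] += 1
                sample.append(species)
            else:
                # Local birth: copy the species of a uniformly chosen earlier individual.
                sample.append(sample[rng.randrange(t)])
        islands.append(sample)
    return islands, lineages_per_species


islands, lineages = sample_hdp_islands([40, 60], [3.0, 8.0], 5.0, random.Random(7))
for island in islands:
    print(Counter(island).most_common(5))
```

Because both islands draw from the same mainland urn, species labels are shared across islands, which is the defining property of the hierarchical (rather than independent) Dirichlet process.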

1.2 A Mainland-Island “Cannings’ Model”

We begin by formulating a broad class of haploid models that includes Hubbell’s Unified Neutral Theory of Biodiversity and Biogeography (UNTB) [7]. Our inspiration comes from Cannings’ population genetic models [1], which use exchangeability as a general mathematical formulation of neutrality: random variables $\nu_1, \ldots, \nu_N$ are exchangeable if the random vectors $(\nu_{\pi(1)}, \ldots, \nu_{\pi(N)})$ are equal in distribution for all permutations $\pi$ of $\{1, \ldots, N\}$. Informally, the labels $1, \ldots, N$ are arbitrary, and can be changed without essentially changing the process. In a Cannings’ model, one assumes a fixed population of size $N$ and discrete generations; $\nu_i(n)$ is the number of offspring in the $(n+1)$st generation of the $i$th individual of the $n$th generation. The vector $(\nu_1, \ldots, \nu_N)$ is assumed to be exchangeable and must satisfy

$$\sum_{i=1}^{N} \nu_i = N.$$

Under suitable conditions on the higher moments (n.b., as a consequence of exchangeability, we must have $\mathbb{E}[\nu_i] = 1$ for all $i$), one can show [8] that as $N \to \infty$ the frequency of types (here, the type of an individual is inherited from its ancestor in the initial population) and the genealogical process converge to the Wright-Fisher diffusion and Kingman’s coalescent, respectively (relaxing the moment conditions leads to a $\Lambda$-coalescent limit for the genealogical process). In particular, let $X_i^{(N)}(n)$ be the number of descendants alive in the $n$th generation of the $i$th ancestral individual in the 0th generation, and let $c_N$ be the coalescence probability, i.e., the probability that two individuals sampled without replacement from the population have the same parent,

$$c_N := \frac{\mathbb{E}[(\nu_1)_2]}{N-1},$$

where

$$(x)_k := x(x-1)\cdots(x-k+1)$$

is the falling factorial or Pochhammer symbol. Then, [8] shows that

$$\lim_{N\to\infty}\frac{\mathbb{E}[(\nu_1)_3]}{N\,\mathbb{E}[(\nu_1)_2]} = 0$$

is a necessary and sufficient condition for the frequencies $N^{-1}X_i^{(N)}(\lfloor c_N^{-1}t\rfloor)$ to converge weakly¹ as $N \to \infty$ to a Wright-Fisher diffusion, i.e., to a diffusion process with probability density

$$p(y, t \mid x)\,dy := \mathbb{P}\{X(t) \in y + dy \mid X(0) = x\}$$

satisfying the Kolmogorov backward equation

$$\frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} x_i(\delta_{ij} - x_j)\frac{\partial^2 p}{\partial x_i\,\partial x_j}.$$

The quantity $c_N^{-1}$ has been referred to as the coalescent effective population size, and can be shown to generalize previously defined notions of an effective population size [9].
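As a sanity check on these definitions, the sketch below enumerates every offspring vector of the Wright-Fisher (multinomial) and Moran (one death, one birth) laws for a small population and computes $c_N$ and Möhle’s ratio $\mathbb{E}[(\nu_1)_3]/(N\,\mathbb{E}[(\nu_1)_2])$ in exact rational arithmetic. The two laws are our illustrative choices and the helper names are hypothetical. For $N = 5$ it gives $c_N = 1/5$ for Wright-Fisher (so $c_N^{-1} = N$) and $c_N = 1/10 = 2/(N(N-1))$ for Moran, whose ratio is identically zero since no individual ever has three offspring.

```python
from fractions import Fraction
from math import factorial, prod
from typing import Iterator, List, Tuple

Law = List[Tuple[Fraction, Tuple[int, ...]]]  # (probability, offspring vector)


def falling(x: int, k: int) -> int:
    """Falling factorial (x)_k = x (x-1) ... (x-k+1); (x)_0 = 1."""
    return prod(x - r for r in range(k))


def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """All vectors of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def wright_fisher_law(n: int) -> Law:
    """Multinomial(n; 1/n, ..., 1/n) offspring vectors with their probabilities."""
    return [(Fraction(factorial(n), prod(factorial(k) for k in v) * n ** n), v)
            for v in compositions(n, n)]


def moran_law(n: int) -> Law:
    """One death and one birth; all ordered (dead, parent) pairs equally likely."""
    law: Law = []
    for dead in range(n):
        for parent in range(n):
            if parent != dead:
                v = [1] * n
                v[dead], v[parent] = 0, 2
                law.append((Fraction(1, n * (n - 1)), tuple(v)))
    return law


def factorial_moment(law: Law, k: int) -> Fraction:
    """E[(nu_1)_k]."""
    return sum(p * falling(v[0], k) for p, v in law)


def coalescence_probability(law: Law) -> Fraction:
    """c_N = E[(nu_1)_2] / (N - 1)."""
    n = len(law[0][1])
    return factorial_moment(law, 2) / (n - 1)


def mohle_ratio(law: Law) -> Fraction:
    """E[(nu_1)_3] / (N E[(nu_1)_2]); Mohle's condition asks that this vanish as N grows."""
    n = len(law[0][1])
    return factorial_moment(law, 3) / (n * factorial_moment(law, 2))


for name, law in (("Wright-Fisher", wright_fisher_law(5)), ("Moran", moran_law(5))):
    print(name, coalescence_probability(law), mohle_ratio(law))
```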

Here, we take our cues from the discussion of infinite-alleles models in [6], which we will closely follow, in formulating a “Cannings’ UNTB” with migration and mutation. As in previous models, we will assume a mainland, which supports a population of size $N_0 = N$, together with a collection of islands labelled $i = 1, \ldots, M$, which support populations of size $N_i$. We will assume that the islands are all approximately the same size, and substantially smaller than the mainland; for Section 1.4, we will require $N_i \ll N_0$ (see footnote 2 for notation), whereas we will need to impose sharper estimates of the relative sizes in Section 1.5. In what follows, we will refer to the mainland and each of the islands as having $N_0$ or $N_i$ niches, respectively; we will use the term deme when we are referring to a local community that can be either an island or the mainland, and will refer to, e.g., the individual in the $j$th niche in the $i$th deme. We will assume discrete generations, and that at each time step the current residents reproduce and are replaced by their offspring. The $j$th individual in the $i$th deme has $\nu_{ij}^{(N)}$ offspring, so that

$$\sum_{j=1}^{N_i}\nu_{ij}^{(N)} = N_i,$$

and we model neutrality by assuming that each random vector $(\nu_{i1}^{(N)}(n), \ldots, \nu_{iN_i}^{(N)}(n))$ is exchangeable. We further assume that $(\nu_{i1}^{(N)}(n), \ldots, \nu_{iN_i}^{(N)}(n))$ is independent of $(\nu_{j1}^{(N)}(m), \ldots, \nu_{jN_j}^{(N)}(m))$ unless $i = j$ and $m = n$. Following [8], we define

$$c_{N_i} := \frac{\mathbb{E}[(\nu_{i1})_2]}{N_i - 1}$$

for $i = 0, \ldots, M$, and assume the analogue of Möhle’s condition:

$$\lim_{N_i\to\infty}\frac{\mathbb{E}[(\nu_{i1})_3]}{N_i\,\mathbb{E}[(\nu_{i1})_2]} = 0, \tag{1}$$

which has the following consequence [10]:

¹ A family of random variables $\{X^{(N)}\}$ taking values in a space $S$ is said to converge weakly to $X$ if

$$\lim_{N\to\infty}\mathbb{E}[f(X^{(N)})] = \mathbb{E}[f(X)]$$

for all bounded $f \in C(S)$; the values $\mathbb{E}[f(X)]$ completely characterize the distribution of $X$. Weak convergence is denoted by $X^{(N)} \Rightarrow X$.

² We will write $a_N = o(b_N)$, or $a_N \ll b_N$, if

$$\lim_{N\to\infty}\frac{a_N}{b_N} = 0,$$

and use $a_N \sim b_N$ to indicate that

$$\lim_{N\to\infty}\frac{a_N}{b_N} = 1.$$

We will also write $a_N = O(b_N)$ if there exists a constant $C$ such that $a_N \le C b_N$ for all $N$.


Lemma 1. Assume (1). Then,

$$\lim_{N_i\to\infty} c_{N_i} = 0,$$

and

$$\lim_{N_i\to\infty}\frac{\mathbb{E}[(\nu_{i1})_2(\nu_{i2})_2]}{N_i^2\,c_{N_i}} = 0.$$

We will further assume that there exists a sequence $a_N$ such that

$$\lim_{N\to\infty}\frac{c_{N_i}}{a_N} = \begin{cases}\gamma_i & \text{if } i > 0,\\ 0 & \text{otherwise,}\end{cases} \tag{2}$$

which formalises the notion that the populations on the islands are all of the same order of magnitude (their effective population sizes are asymptotically proportional, $c_{N_i} \sim \gamma_i a_N$) and asymptotically smaller than the mainland ($c_N \ll a_N$, i.e., the mainland’s effective population size $c_N^{-1}$ is much larger than $a_N^{-1}$).

We will further assume that each individual has a type, which is a label in $[0,1]$, which we think of as a

probability space with the uniform (Lebesgue) measure $\lambda$. The labels are more of a mathematical convenience for tracking ancestries, and have no effect on fitness, so we could equally well take labels in any compact Polish space $\mathcal{X}$ that is equipped with a probability measure $\lambda(dx)$. We write $X_{ij}(n) \in [0,1]$ for the type of the individual in the $j$th niche of the $i$th deme in generation $n$; the labels are inherited from the parent, except when an individual is subject to mutation at birth. We discuss the processes of reproduction and mutation below. The state of the $i$th deme in the $n$th generation is conveniently represented by an atomic probability measure on $[0,1]$,

$$G_i^{(N)}(n) = \frac{1}{N_i}\sum_{j=1}^{N_i}\delta_{X_{ij}(n)},$$

where $\delta_{X_{ij}(n)}$ is the Dirac point mass at $X_{ij}(n)$, and the superscript $(N)$ emphasizes the dependence on the “system size” $N$, i.e., for any subset $A \subseteq [0,1]$, $G_i^{(N)}(n)(A)$ is the fraction of individuals in the $i$th deme with a type in the set $A$. We write $G^{(N)}(n) = G_0^{(N)}(n) \otimes \cdots \otimes G_M^{(N)}(n)$ for the product measure,

$$G^{(N)}(n)(A_0 \times \cdots \times A_M) = G_0^{(N)}(n)(A_0)\cdots G_M^{(N)}(n)(A_M).$$

Given a measure $\mu$ and a continuous function $f$ on $[0,1]$, we will use the shorthand

$$\langle f, \mu\rangle := \int f(x)\,\mu(dx)$$

for the integral. More generally, if $f \in C([0,1]^{M+1})$, then

$$\langle f, \mu_0 \otimes \cdots \otimes \mu_M\rangle := \int f(x_0, \ldots, x_M)\,\mu_0(dx_0)\cdots\mu_M(dx_M).$$

By definition, we have

$$\langle f, G_i^{(N)}(n)\rangle = \frac{1}{N_i}\sum_{j=1}^{N_i} f(X_{ij}(n)).$$

We model migration by assuming that with probability $c_{N_i}\varpi_i/2$ (the factor of $\frac{1}{2}$ is to maintain consistency of notation with the cited population genetics literature), a given individual in the $(n+1)$st generation is replaced by the migrant offspring of a parent chosen uniformly at random from the entire metapopulation, i.e., we assume a parent of type $X_{pq}(n)$, where the pair $(p, q)$ is drawn uniformly from the set of all individuals $\{(p, q) : 0 \le p \le M,\ 1 \le q \le N_p\}$. Thus, the average number of migrants to a given island is asymptotically independent of $N$; this is a weak migration limit. Equivalently, the parent is drawn from the metapopulation measure,

$$G^{(N)}(n) := \frac{1}{\sum_{k=0}^{M} N_k}\sum_{i=0}^{M} N_i\,G_i^{(N)}(n). \tag{3}$$
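A small illustration of these measures, under our own assumption that a deme is stored simply as a list of type labels: the empirical measure $G_i^{(N)}$, the integral $\langle f, \mu\rangle$ for an atomic measure, and the metapopulation measure (3). Drawing one individual uniformly from the pooled metapopulation reproduces (3); choosing a deme uniformly first would instead weight every deme equally, regardless of $N_i$. All names are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction
from typing import Callable, Dict, List, Sequence

Measure = Dict[float, Fraction]  # atom (type label) -> mass


def empirical_measure(types: Sequence[float]) -> Measure:
    """G_i = (1/N_i) sum_j delta_{X_ij}: fraction of the deme carrying each type."""
    n = len(types)
    return {x: Fraction(c, n) for x, c in Counter(types).items()}


def integrate(f: Callable[[float], float], mu: Measure):
    """<f, mu> for an atomic measure mu."""
    return sum(mass * f(x) for x, mass in mu.items())


def metapopulation_measure(demes: List[Sequence[float]]) -> Measure:
    """Deme measures weighted by N_i / sum_k N_k, as in (3)."""
    total = sum(len(d) for d in demes)
    out: Measure = {}
    for d in demes:
        weight = Fraction(len(d), total)
        for x, mass in empirical_measure(d).items():
            out[x] = out.get(x, Fraction(0)) + weight * mass
    return out


def sample_migrant_parent_type(demes: List[Sequence[float]], rng: random.Random) -> float:
    """Type of a parent drawn uniformly from all individuals of the metapopulation."""
    everyone = [x for d in demes for x in d]
    return rng.choice(everyone)


mainland = [0.1, 0.1, 0.2, 0.3]
island = [0.2, 0.2]
print(metapopulation_measure([mainland, island]))
```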


Finally, we allow for the possibility that individuals mutate after birth; we assume that there is a probability kernel $P^{(N)}$ such that the offspring of a parent with type $x \in [0,1]$ mutates to a type in $A \subseteq [0,1]$ with probability $P^{(N)}(x, A)$. Define an operator $Q^{(N)}$ on $C([0,1])$ by

$$(Q^{(N)}f)(x) = \int_0^1 f(y)\,P^{(N)}(x, dy).$$

Then, for all $f \in C([0,1])$, we define

$$(Q_i^{(N)}f)(x) := \mathbb{E}\left[f(X_{ij}) \,\middle|\, G^{(N)}(n), \text{ parent of type } x\right] = \left(1 - \frac{c_{N_i}\varpi_i}{2}\right)(Q^{(N)}f)(x) + \frac{c_{N_i}\varpi_i}{2}\int (Q^{(N)}f)(y)\,G^{(N)}(n)(dy) \tag{4}$$

and

$$(B_i^{(N)}f)(x) := \frac{\varpi_i}{2}\left(\int f(y)\,G^{(N)}(n)(dy) - f(x)\right). \tag{5}$$

While it may at first appear unusual, this notation will greatly simplify subsequent calculations. We will assume mutation is weak:

$$B := \lim_{N\to\infty} c_N^{-1}\left(Q^{(N)} - I\right)$$

exists and $B$ is a bounded operator. Thus, for any set $A \subseteq [0,1]$, the probability that the offspring has a type in $A$ approaches 1 as $N \to \infty$ if the parent has a type in $A$, and approaches 0 otherwise. Here, $c_N = c_{N_0}$ is the coalescence probability of the mainland, so that $c_N^{-1}$ is the coalescent effective population size for the mainland, and we are making the standard assumption that mutation rates scale like the reciprocal of the effective population size. For the sake of clarity in the arguments that follow, we emphasize that our assumptions entail that

$$Q_i^{(N)} = I + c_{N_i}B_i^{(N)} + c_N B + o(c_N).$$

One can consider many forms for the operator $B$; the operator

$$(B^{(L)}f)\left(\frac{i}{L}\right) = \frac{\theta}{L-1}\sum_{j=1}^{L}\left(f\left(\frac{j}{L}\right) - f\left(\frac{i}{L}\right)\right)$$

corresponds to the classical population genetic models, in which the number of possible types is discrete and finite (here, there are $L$) and mutation is symmetric (i.e., the offspring of an individual have the same type as their parent with probability $1 - \frac{\theta}{N}$, and mutate to any other type with probability $\frac{\theta}{N(L-1)}$). Since the labels are arbitrary, they can be assumed to be chosen from the set $\left\{\frac{1}{L}, \frac{2}{L}, \ldots, 1\right\}$. Now, as $L \to \infty$, $B^{(L)}$ converges to the operator

$$(Bf)(x) = \theta\left(\int_0^1 f(y)\,dy - f(x)\right) = \theta\left(\langle f, \lambda\rangle - f(x)\right),$$

which corresponds to the infinitely many alleles model; the probability that two mutations give rise to the same type is 0. We will henceforth assume $B$ is of this form.
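The convergence $B^{(L)} \to B$ can be checked numerically. The sketch below uses the normalisation $\theta/(L-1)$ written above and, as an arbitrary test function of our choosing, $f(y) = y^2$ (whose integral over $[0,1]$ is $1/3$), evaluated at $x = 1/2$; the error shrinks as $L$ grows. The function names are hypothetical.

```python
from typing import Callable


def b_discrete(f: Callable[[float], float], theta: float, L: int, i: int) -> float:
    """(B^(L) f)(i/L) = theta/(L-1) * sum_{j=1}^{L} (f(j/L) - f(i/L))."""
    x = i / L
    return theta / (L - 1) * sum(f(j / L) - f(x) for j in range(1, L + 1))


def b_infinite_alleles(f_integral: float, f: Callable[[float], float],
                       theta: float, x: float) -> float:
    """(B f)(x) = theta * (int_0^1 f(y) dy - f(x)); the integral is supplied by the caller."""
    return theta * (f_integral - f(x))


def square(y: float) -> float:
    return y * y  # int_0^1 y^2 dy = 1/3


target = b_infinite_alleles(1 / 3, square, theta=1.0, x=0.5)
for L in (10, 100, 1000):
    print(L, abs(b_discrete(square, 1.0, L, L // 2) - target))
```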

Remark 1. Although we have formulated the community dynamics in discrete time, we could equally well consider a continuous-time Markov process $G_i^{(N)}(t)$ in which disturbances happen at some rate $D$; in the latter case, we consider the embedded Markov chain: if disturbances happen at random times $\tau_1, \tau_2, \ldots$, then the embedded chain is the process $G_i^{(N)}(n) := G_i^{(N)}(\tau_n)$. The limiting (continuous-time) process as $N \to \infty$ is the same for both the continuous-time process and its embedded chain.


In the next section, we will consider the limiting behaviour as first $N$ and then $L$ are taken to infinity. We will see that under moment assumptions corresponding to those in [8], all of these models converge to the same limiting process. First, however, we illustrate how Hubbell’s original UNTB is an example of our class of models.

Example 1 (Hubbell’s UNTB). In Hubbell’s original model [7], only a single individual is replaced in each deme at each time step. We then have that $\nu_{i0}$ takes values in $\{0, 1\}$, with

$$\mathbb{P}\{\nu_{i0} = 1\} = m$$

(here $\nu_{i0} = 1$ indicates that the replacement is an immigrant, so $m$ is the immigration probability). The remaining offspring numbers are either

$$(\nu_{i1}, \ldots, \nu_{iN_i}) = (1, \ldots, 1, 0, 1, \ldots, 1)$$

(the vector with $k$th entry 0 and all others 1), if $\nu_{i0} = 1$, or

$$(\nu_{i1}, \ldots, \nu_{iN_i}) = (1, \ldots, 1, 0, 1, \ldots, 1, 2, 1, \ldots, 1)$$

(the vector with $k$th entry 0 and $l$th entry 2 for some $k \ne l$), if $\nu_{i0} = 0$, with conditional probabilities equal to $\frac{1}{N_i}$ and $\frac{1}{N_i(N_i-1)}$, respectively (and thus the $\nu_{ij}$ are exchangeable, given $\nu_{i0}$). For this model, we have

$$c_{N_i} = \frac{2}{N_i(N_i-1)},$$

whereas by definition $(\nu_{i1})_3 = 0$, so (1) holds. In Hubbell’s model, immigrants are always from the mainland, which is assumed to have a fixed, stationary distribution (usually taken so that samples from the mainland are distributed according to Ewens’ sampling formula [11]), and no mutations are assumed to occur on the islands. We will not need to make these assumptions, but will instead derive them (in the limit as $N \to \infty$) as a consequence of the relative sizes of the mainland and the islands.
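The local dynamics of Example 1 translate directly into a simulation step. The sketch below assumes, as in Hubbell’s original formulation, a fixed metacommunity given by relative abundances `meta_probs` (a hypothetical input), and represents the local community as a list of species labels. With probability $m$ the dead individual is replaced by an immigrant; otherwise it is replaced by the offspring of a different, uniformly chosen resident, matching the $k \ne l$ condition above.

```python
import random
from collections import Counter
from typing import List, Sequence


def hubbell_step(community: List[int], m: float, meta_probs: Sequence[float],
                 rng: random.Random) -> None:
    """One death-replacement event of Hubbell's local community, in place.

    community  - species label of each of the N_i individuals
    m          - probability that the replacement is an immigrant
    meta_probs - relative abundances of each species in a fixed metacommunity
    """
    n = len(community)
    dead = rng.randrange(n)
    if rng.random() < m:
        # Immigrant: species drawn from the (fixed) metacommunity distribution.
        community[dead] = rng.choices(range(len(meta_probs)), weights=meta_probs)[0]
    else:
        # Local birth: offspring of a different, uniformly chosen resident.
        parent = rng.randrange(n - 1)
        if parent >= dead:
            parent += 1
        community[dead] = community[parent]


rng = random.Random(42)
local = [0] * 50
for _ in range(20_000):
    hubbell_step(local, m=0.1, meta_probs=[0.5, 0.3, 0.2], rng=rng)
print(Counter(local))
```

The sketch keeps the metacommunity frozen, which is exactly the assumption that the following sections replace by a derivation from the relative sizes of mainland and islands.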

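Example 2 (“Wright-Fisher” UNTB). We can regard Hubbell’s UNTB as a community analogue of the discrete Moran model. We could similarly define a community analogue of the Wright-Fisher model by assuming that the vector $(\nu_{i1}, \ldots, \nu_{iN_i})$ follows a multinomial distribution with parameters $N_i$ and $\left(\frac{1}{N_i}, \ldots, \frac{1}{N_i}\right)$, i.e., for each $i$:

$$\mathbb{P}\{(\nu_{i1}, \ldots, \nu_{iN_i}) = (k_1, \ldots, k_{N_i})\} = \frac{N_i!}{k_1!\cdots k_{N_i}!}\left(\frac{1}{N_i}\right)^{k_1}\cdots\left(\frac{1}{N_i}\right)^{k_{N_i}}.$$

Here, $c_{N_i} = \frac{1}{N_i}$, whereas $\mathbb{E}[(\nu_{i1})_3] = \frac{(N_i-1)(N_i-2)}{N_i^2}$, so that $\frac{\mathbb{E}[(\nu_{i1})_3]}{N_i\,\mathbb{E}[(\nu_{i1})_2]} = \frac{N_i-2}{N_i^2} \to 0$ and (1) holds.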

Example 3. We briefly note that it is possible to have $c_{N_i} \equiv 0$, by assuming that $(\nu_{i1}, \ldots, \nu_{iN_i}) = (1, \ldots, 1)$ with probability 1 (a trivial case that we will ignore), whereas it need not be the case that

$$\lim_{N_i\to\infty} c_{N_i} = 0$$

if (1) is violated: if we assume that with probability $\frac{1}{N_i}$, $\nu_{ij} = N_i$ and $\nu_{ik} = 0$ for all $k \ne j$, then $c_{N_i} \equiv 1$.
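The “sweepstakes” law of Example 3 is simple enough to check by hand, but a short exact computation (our own illustration, with a hypothetical function name) makes the contrast with Example 2 explicit: here $c_{N_i} = 1$ for every $N_i$, and the ratio in (1) equals $(N_i - 2)/N_i$, which tends to 1 rather than 0.

```python
from fractions import Fraction
from typing import Tuple


def sweepstakes_moments(n: int) -> Tuple[Fraction, Fraction]:
    """Example 3's law: nu_1 = n with probability 1/n, otherwise 0.

    Returns (c_N, E[(nu_1)_3] / (n E[(nu_1)_2])).
    """
    p = Fraction(1, n)
    e2 = p * n * (n - 1)            # E[(nu_1)_2]
    e3 = p * n * (n - 1) * (n - 2)  # E[(nu_1)_3]
    return e2 / (n - 1), e3 / (n * e2)


for n in (10, 100, 1000):
    print(n, *sweepstakes_moments(n))
```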

1.3 Preliminaries Considering Exchangeable Variables

It is well known [12] that

$$\frac{(N_i)_j}{(N_i)_k}\mathbb{E}\left[(\nu_{i1})_{k_1}\cdots(\nu_{ij})_{k_j}\right],$$

where $j, k_1, \ldots, k_j \in \mathbb{N}$ and $k := k_1 + \cdots + k_j$, is the probability that $k$ individuals, sampled uniformly at random without replacement from the $i$th deme and divided into $j$ given groups of sizes $k_1, \ldots, k_j$, have exactly $j$ parents in the previous generation, with the members of each group sharing a parent and distinct groups having distinct parents. N.b., exchangeability implies that

$$\frac{(N_i)_j}{(N_i)_k}\mathbb{E}\left[(\nu_{i1})_{k_1}\cdots(\nu_{ij})_{k_j}\right] = \frac{(N_i)_j}{(N_i)_k}\mathbb{E}\left[(\nu_{i\pi(1)})_{k_1}\cdots(\nu_{i\pi(j)})_{k_j}\right]$$

for any permutation $\pi$ of $\{1, \ldots, N_i\}$, so that these probabilities only depend on $j$, $k$, and the unordered list of values $k_1, \ldots, k_j$. In [8], we find the following monotonicity result for these probabilities:

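Lemma 2. Let $j \ge l$, $k_1 \ge m_1, \ldots, k_l \ge m_l$, and $m := m_1 + \cdots + m_l$. Then,

$$\frac{(N_i)_j}{(N_i)_k}\mathbb{E}\left[(\nu_{i1})_{k_1}\cdots(\nu_{ij})_{k_j}\right] \le \frac{(N_i)_l}{(N_i)_m}\mathbb{E}\left[(\nu_{i1})_{m_1}\cdots(\nu_{il})_{m_l}\right].$$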

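Remark 2. In particular, Lemma 2 implies that

$$\frac{(N_i)_{j-1}}{(N_i)_j}\mathbb{E}\left[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}\right] \le c_{N_i} \tag{6}$$

(and, using exchangeability to reorder the indices, the same bound by $c_{N_i}$ holds for $\frac{(N_i)_j}{(N_i)_k}\mathbb{E}[(\nu_{i1})_{k_1}\cdots(\nu_{ij})_{k_j}]$ whenever at least one $k_q \ge 2$), and

$$\frac{(N_i)_j}{(N_i)_k}\mathbb{E}\left[(\nu_{i1})_{k_1}\cdots(\nu_{ij})_{k_j}\right] = o(c_{N_i}) \tag{7}$$

whenever $k_q, k_r \ge 2$ for at least two distinct indices $q, r$, or $k_q \ge 3$ for some index $q$.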

Remark 3. In particular, the quantity $\frac{(N_i)_j}{(N_i)_k}\mathbb{E}[(\nu_{i1})_{k_1}\cdots(\nu_{ij})_{k_j}]$ in (7) is always bounded above by one of

$$\frac{(N_i)_1}{(N_i)_3}\mathbb{E}\left[(\nu_{i1})_3\right] \quad\text{or}\quad \frac{(N_i)_2}{(N_i)_4}\mathbb{E}\left[(\nu_{i1})_2(\nu_{i2})_2\right].$$

In what follows, all terms $o(c_{N_i})$ will be of order at most equal to one of these two quantities (which are, respectively, the probability that three individuals sampled at random have the same parent in the previous generation, and the probability that a sample of four individuals consists of two pairs of descendants of two distinct parents), or will be of order less than or equal to $\frac{c_{N_i}}{N_i}$. This will be very important when we consider the long timescale.
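For the Wright-Fisher law of Example 2, the multinomial factorial moments $\mathbb{E}[\prod_l (\nu_{il})_{k_l}] = (N_i)_k/N_i^k$ give closed forms for these sampling probabilities, namely $(N_i)_j/N_i^k$. The sketch below (our illustration; the helper name is hypothetical and it assumes $k \le N_i$) confirms that a pairwise merger has probability $c_{N_i} = 1/N_i$, while the triple merger ($1/N_i^2$) and the double pair ($(N_i)_2/N_i^4$) are both $o(c_{N_i})$.

```python
from fractions import Fraction
from math import prod


def falling(x: int, k: int) -> int:
    """Falling factorial (x)_k."""
    return prod(x - r for r in range(k))


def wf_sampling_probability(n: int, *groups: int) -> Fraction:
    """(N)_j / (N)_k * E[(nu_1)_{k_1} ... (nu_j)_{k_j}] for the Wright-Fisher law (k <= N).

    Uses the multinomial factorial moment E[prod_l (nu_l)_{k_l}] = (N)_k / N^k.
    """
    j, k = len(groups), sum(groups)
    moment = Fraction(falling(n, k), n ** k)
    return Fraction(falling(n, j), falling(n, k)) * moment


for n in (10, 100, 1000):
    c = wf_sampling_probability(n, 2)          # one pair merges: c_N
    triple = wf_sampling_probability(n, 3)     # three share a parent
    two_pairs = wf_sampling_probability(n, 2, 2)
    print(n, c, triple / c, two_pairs / c)
```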

We will make use of some general relations between exchangeable random variables in the sequel:

Lemma 3. For all $j > 1$,

$$\mathbb{E}[\nu_{i1}\cdots\nu_{i,j-1}] - \mathbb{E}[\nu_{i1}\cdots\nu_{ij}] = (j-1)\frac{(N_i)_{j-1}}{(N_i)_j}\mathbb{E}\left[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}\right].$$

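Proof. We begin by observing that

$$\begin{aligned}
N_i\,\mathbb{E}[\nu_{i1}\cdots\nu_{i,j-1}] &= \mathbb{E}[N_i\,\nu_{i1}\cdots\nu_{i,j-1}] \\
&= \mathbb{E}[(\nu_{i1} + \cdots + \nu_{iN_i})\,\nu_{i1}\cdots\nu_{i,j-1}] \\
&= \mathbb{E}\Big[\sum_{k=1}^{N_i}\nu_{i1}\cdots\nu_{i,j-1}\,\nu_{ik}\Big] \\
&= \mathbb{E}\Big[\sum_{k=1}^{j-1}\nu_{i1}\cdots\nu_{i,j-1}\,\nu_{ik} + \sum_{k=j}^{N_i}\nu_{i1}\cdots\nu_{i,j-1}\,\nu_{ik}\Big] \\
&= \sum_{k=1}^{j-1}\mathbb{E}\Big[\nu_{ik}^2\prod_{\substack{l=1\\ l\ne k}}^{j-1}\nu_{il}\Big] + \sum_{k=j}^{N_i}\mathbb{E}[\nu_{i1}\cdots\nu_{i,j-1}\,\nu_{ik}]
\end{aligned}$$

and thus, exploiting the exchangeability of the $\nu_{ij}$,

$$N_i\,\mathbb{E}[\nu_{i1}\cdots\nu_{i,j-1}] = (j-1)\,\mathbb{E}\left[\nu_{i1}^2\,\nu_{i2}\cdots\nu_{i,j-1}\right] + (N_i - j + 1)\,\mathbb{E}[\nu_{i1}\cdots\nu_{ij}].$$

On the other hand, trivially,

$$N_i\,\mathbb{E}[\nu_{i1}\cdots\nu_{i,j-1}] = (j-1)\,\mathbb{E}[\nu_{i1}\cdots\nu_{i,j-1}] + (N_i - j + 1)\,\mathbb{E}[\nu_{i1}\cdots\nu_{i,j-1}].$$

Equating the two expressions and rearranging, we get

$$(N_i - j + 1)\left(\mathbb{E}[\nu_{i1}\cdots\nu_{i,j-1}] - \mathbb{E}[\nu_{i1}\cdots\nu_{ij}]\right) = (j-1)\left(\mathbb{E}\left[\nu_{i1}^2\,\nu_{i2}\cdots\nu_{i,j-1}\right] - \mathbb{E}[\nu_{i1}\cdots\nu_{i,j-1}]\right).$$

The result follows, since $\nu_{i1}^2 - \nu_{i1} = (\nu_{i1})_2$ and $\frac{1}{N_i - j + 1} = \frac{(N_i)_{j-1}}{(N_i)_j}$.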
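Since Lemma 3 is an exact identity that uses only exchangeability and the constraint $\sum_j \nu_{ij} = N_i$, it can be verified in exact arithmetic by enumerating all offspring vectors of a small population. The sketch below does so for the Wright-Fisher law of Example 2 and the Moran-type law of Example 1 without immigration; both laws, and all helper names, are our illustrative choices.

```python
from fractions import Fraction
from math import factorial, prod
from typing import Iterator, List, Tuple

Law = List[Tuple[Fraction, Tuple[int, ...]]]  # (probability, offspring vector)


def falling(x: int, k: int) -> int:
    """Falling factorial (x)_k."""
    return prod(x - r for r in range(k))


def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """All vectors of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def wright_fisher_law(n: int) -> Law:
    return [(Fraction(factorial(n), prod(factorial(k) for k in v) * n ** n), v)
            for v in compositions(n, n)]


def moran_law(n: int) -> Law:
    law: Law = []
    for dead in range(n):
        for parent in range(n):
            if parent != dead:
                v = [1] * n
                v[dead], v[parent] = 0, 2
                law.append((Fraction(1, n * (n - 1)), tuple(v)))
    return law


def expect(law: Law, g) -> Fraction:
    """E[g(nu)] under a finite law."""
    return sum(p * g(v) for p, v in law)


def lemma3_sides(law: Law, j: int) -> Tuple[Fraction, Fraction]:
    """Left- and right-hand sides of Lemma 3, for 1 < j <= N."""
    n = len(law[0][1])
    lhs = expect(law, lambda v: prod(v[:j - 1])) - expect(law, lambda v: prod(v[:j]))
    # (N)_{j-1} / (N)_j = 1 / (N - j + 1)
    rhs = Fraction(j - 1, n - j + 1) * expect(
        law, lambda v: falling(v[0], 2) * prod(v[1:j - 1]))
    return lhs, rhs


for name, law in (("Wright-Fisher", wright_fisher_law(5)), ("Moran", moran_law(5))):
    for j in range(2, 6):
        lhs, rhs = lemma3_sides(law, j)
        print(name, j, lhs, rhs, lhs == rhs)
```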

Remark 4. In conjunction with (6), the lemma tells us that for all $j > 1$,

$$\mathbb{E}[\nu_{i1}\cdots\nu_{i,j-1}] - \mathbb{E}[\nu_{i1}\cdots\nu_{ij}] = O(c_{N_i}),$$

and thus,

$$\mathbb{E}[\nu_{i1}\cdots\nu_{iq}] - \mathbb{E}[\nu_{i1}\cdots\nu_{ir}] = O(c_{N_i})$$

for any $q < r$.

Next, we observe that

Lemma 4. For all $j$,

$$\mathbb{E}[\nu_{i1}\cdots\nu_{ij}] = 1 - \binom{j}{2}\frac{(N_i)_{j-1}}{(N_i)_j}\mathbb{E}\left[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}\right] - o(c_{N_i}).$$

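Proof. This is a consequence of the identity (the falling-factorial analogue of the multinomial theorem)
$$(N_i)_j = (\nu_{i1}+\cdots+\nu_{iN_i})_j = \sum_{j_1+\cdots+j_{N_i}=j}\frac{j!}{j_1!\cdots j_{N_i}!}\,(\nu_{i1})_{j_1}\cdots(\nu_{iN_i})_{j_{N_i}},$$
where we use the conventions $0!=1$ and $(x)_0=1$, and note that most of the $j_l$ are equal to zero. Equivalently, if we only consider nonzero values,
$$(N_i)_j = \sum_{m=1}^{j}\ \sum_{n_1,\ldots,n_m\ \text{distinct}}\ \sum_{k_1+\cdots+k_m=j}\frac{j!}{k_1!\cdots k_m!}\,(\nu_{in_1})_{k_1}\cdots(\nu_{in_m})_{k_m}.$$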


Taking expectations on both sides, and using the exchangeability of $(\nu_{i1},\ldots,\nu_{iN_i})$, we have
$$(N_i)_j = \sum_{m=1}^{j}\ \sum_{n_1,\ldots,n_m\ \text{distinct}}\ \sum_{k_1+\cdots+k_m=j}\frac{j!}{k_1!\cdots k_m!}\,E[(\nu_{i1})_{k_1}\cdots(\nu_{im})_{k_m}].$$
Now, observe that the expected value of the summand is independent of the choice of the values $n_1,\ldots,n_m$, which can be chosen in $\binom{N_i}{m}$ ways. Moreover, the expectation $E[(\nu_{i1})_{k_1}\cdots(\nu_{im})_{k_m}]$ remains unchanged under permutations of $k_1,\ldots,k_m$, and thus all such expectations are equal to
$$E[(\nu_{i1})_{\bar k_1}\cdots(\nu_{im})_{\bar k_m}],$$
where $\bar k_1\ge\bar k_2\ge\cdots\ge\bar k_m$ are the values $k_1,\ldots,k_m$ listed in decreasing order. If we let $a_p$ be the number of indices $q$ such that $k_q=p$, $a_p=\#\{q : k_q=p\}$, then
$$
\begin{aligned}
&\sum_{m=1}^{j}\ \sum_{n_1,\ldots,n_m\ \text{distinct}}\ \sum_{k_1+\cdots+k_m=j}\frac{j!}{k_1!\cdots k_m!}\,E[(\nu_{i1})_{k_1}\cdots(\nu_{im})_{k_m}]\\
&\qquad= \sum_{m=1}^{j}\ \sum_{\substack{k_1+\cdots+k_m=j\\ k_1\ge k_2\ge\cdots\ge k_m}}\frac{j!}{k_1!\cdots k_m!}\,\frac{m!}{a_1!\cdots a_j!}\binom{N_i}{m}E[(\nu_{i1})_{k_1}\cdots(\nu_{im})_{k_m}],
\end{aligned}
$$
so that, simplifying and dividing through by $(N_i)_j$, we have
$$
\begin{aligned}
1 &= \sum_{m=1}^{j}\ \sum_{\substack{k_1+\cdots+k_m=j\\ k_1\ge k_2\ge\cdots\ge k_m}}\frac{j!}{k_1!\cdots k_m!}\,\frac{1}{a_1!\cdots a_j!}\,\frac{(N_i)_m}{(N_i)_j}\,E[(\nu_{i1})_{k_1}\cdots(\nu_{im})_{k_m}]\\
&= E[\nu_{i1}\cdots\nu_{ij}] + \binom{j}{2}\frac{(N_i)_{j-1}}{(N_i)_j}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}] + o(c_{N_i}),
\end{aligned}
$$
where, using (7), we have truncated after the two highest-order terms in the sum.
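The remainder in Lemma 4 can be computed exactly under the same illustrative Wright–Fisher assumption. In that model $c_{N_i}=1/N_i$, and every factorial moment of total order $s$ equals $(N_i)_s/N_i^s$. The remainder vanishes for $j=2$ and equals $-1/N_i^2$ for $j=3$, which is indeed $o(c_{N_i})$. The sketch also prints the ratio of remainder to $c_{N_i}$ for a few larger $j$.

```python
from fractions import Fraction
from math import comb


def falling(x, k):
    """Falling factorial (x)_k."""
    out = 1
    for r in range(k):
        out *= x - r
    return out


def lemma4_remainder(N, j):
    """E[nu_1 ... nu_j] minus the two-term expansion of Lemma 4, for Wright-Fisher.

    Both E[nu_1 ... nu_j] and E[(nu_1)_2 nu_2 ... nu_{j-1}] have total order j,
    so under Wright-Fisher both equal (N)_j / N^j.
    """
    moment = Fraction(falling(N, j), N ** j)
    expansion = 1 - comb(j, 2) * Fraction(falling(N, j - 1), falling(N, j)) * moment
    return moment - expansion


for N in (10, 100, 1000):
    c_N = Fraction(1, N)
    assert lemma4_remainder(N, 2) == 0
    assert lemma4_remainder(N, 3) == Fraction(-1, N ** 2)
    # The remainder divided by c_N shrinks as N grows, i.e. it is o(c_N).
    print(N, [float(lemma4_remainder(N, j) / c_N) for j in (3, 4, 5)])
```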

We conclude this section with a final observation,

Lemma 5. Let $j>1$. Then
$$\frac{(N_i)_j}{(N_i)_{j+1}}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{ij}] = \frac{(N_i)_{j-1}}{(N_i)_j}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}] + o(c_{N_i}).$$

Proof. Again exploiting exchangeability, we see that
$$
\begin{aligned}
(N_i-j+1)\,E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{ij}]
&= E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}\,\nu_{ij}] + E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}\,\nu_{i,j+1}] + \cdots + E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}\,\nu_{iN_i}]\\
&= E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}\,(\nu_{ij}+\cdots+\nu_{iN_i})]\\
&= E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}\,(N_i-\nu_{i1}-\cdots-\nu_{i,j-1})]\\
&= E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}\,(N_i-j-(\nu_{i1}-2)-(\nu_{i2}-1)-\cdots-(\nu_{i,j-1}-1))]\\
&= (N_i-j)\,E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}] - E[(\nu_{i1})_3\,\nu_{i2}\cdots\nu_{i,j-1}]\\
&\qquad - E[(\nu_{i1})_2\,(\nu_{i2})_2\,\nu_{i3}\cdots\nu_{i,j-1}] - \cdots - E[(\nu_{i1})_2\,\nu_{i2}\cdots(\nu_{i,j-1})_2],
\end{aligned}
$$
where we have used $(\nu_{i1})_2(\nu_{i1}-2)=(\nu_{i1})_3$ and $\nu_{il}(\nu_{il}-1)=(\nu_{il})_2$. In particular, dividing both sides by $(N_i-j+1)(N_i-j)$ and noting that $(N_i)_j/(N_i)_{j+1}=1/(N_i-j)$, we have
$$
\begin{aligned}
\frac{(N_i)_j}{(N_i)_{j+1}}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{ij}]
&= \frac{(N_i)_{j-1}}{(N_i)_j}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,j-1}] - \frac{(N_i)_{j-1}}{(N_i)_{j+1}}E[(\nu_{i1})_3\,\nu_{i2}\cdots\nu_{i,j-1}]\\
&\qquad - \frac{(N_i)_{j-1}}{(N_i)_{j+1}}E[(\nu_{i1})_2\,(\nu_{i2})_2\,\nu_{i3}\cdots\nu_{i,j-1}] - \cdots - \frac{(N_i)_{j-1}}{(N_i)_{j+1}}E[(\nu_{i1})_2\,\nu_{i2}\cdots(\nu_{i,j-1})_2],
\end{aligned}
$$
and the result again follows by (7).

Remark 5. Iterating the previous lemma, we see that
$$\frac{(N_i)_j}{(N_i)_{j+1}}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{ij}] = \cdots = \frac{E[(\nu_{i1})_2]}{N_i-1} + o(c_{N_i}) = c_{N_i} + o(c_{N_i}).$$

1.4 Convergence to a Limit

We will be interested in weak limits of the random measures $G^{(N)}(n)$ on two timescales determined by $N$: a "slow-time" process, $G^{(N)}(\lfloor c_N^{-1}t\rfloor)$, and an "intermediate-time" process, $G^{(N)}(\lfloor a_N^{-1}t\rfloor)$, where $t>0$ is a continuous time variable, and we will consider the limits as $N\to\infty$.

Our principal tool in doing this is the generator of $G^{(N)}(n)$, an operator on $C(\mathcal{P}([0,1])^{M+1})$ defined by
$$(\mathcal{G}_NF)(\mu) = E\big[F(G^{(N)}(n+1))\,\big|\,G^{(N)}(n)=\mu\big] - F(\mu).$$
Knowing $(\mathcal{G}_NF)(\mu)$ for all $F\in C(\mathcal{P}([0,1])^{M+1})$ and all $\mu\in\mathcal{P}([0,1])^{M+1}$ completely characterizes the transition probabilities of $G^{(N)}$, and thus, together with the initial value $G^{(N)}(0)$, allows us to characterize the process (although not necessarily the limit; see, e.g., [6]).

Our limiting processes are continuous-time rather than discrete-time processes, but they also have associated generators. In general, if $H(t)$ is a continuous-time process taking values in $\mathcal{P}([0,1])^{M+1}$ and $F\in C(\mathcal{P}([0,1])^{M+1})$, then $H(t)$ has generator $\mathcal{H}$:
$$(\mathcal{H}F)(\mu) = \lim_{h\to0^+}\frac{E[F(H(t+h))\mid H(t)=\mu]-F(\mu)}{h},$$
with domain $\mathcal{D}(\mathcal{H})$ consisting of all functions $F$ for which the limit exists. The notion of a generator simultaneously generalizes the transition matrix, master equation, and diffusion equations of classical probability. The typical proof of convergence proceeds by first showing that a limit exists, then characterizing the limit by determining the limit of the generators, and finally showing that, given the initial conditions (via a distribution from which they are drawn), there is a unique process with that generator (e.g., [6] is a standard reference).
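The following toy example makes the discrete generator and its rescaled limit concrete. It is a standard textbook case rather than part of this paper's argument. Take a single deme with two types following the neutral Wright–Fisher model, a Cannings model with $c_N=1/N$ and no immigration. Use the test function $F(\mu)=\langle f,\mu\rangle(1-\langle f,\mu\rangle)$, with $f$ the indicator of one type. Writing $x=\langle f,\mu\rangle$, the next-generation count is binomial. In this case $c_N^{-1}(\mathcal{G}_NF)$ coincides exactly with the Wright–Fisher diffusion generator $\tfrac12x(1-x)F''(x)$ applied to $F(x)=x(1-x)$.

```python
from fractions import Fraction
from math import comb


def one_step_generator(N, k, F):
    """(G_N F)(x) = E[F(x') | x] - F(x) for a neutral two-type Wright-Fisher model,
    where x = k / N and N x' ~ Binomial(N, x)."""
    x = Fraction(k, N)
    expected = sum(comb(N, m) * x ** m * (1 - x) ** (N - m) * F(Fraction(m, N))
                   for m in range(N + 1))
    return expected - F(x)


def heterozygosity(x):
    """F(x) = x (1 - x), viewed as a function of the frequency x = <f, mu>."""
    return x * (1 - x)


N = 20
for k in range(N + 1):
    x = Fraction(k, N)
    scaled = N * one_step_generator(N, k, heterozygosity)   # c_N^{-1} (G_N F)(x), c_N = 1/N
    diffusion = Fraction(1, 2) * x * (1 - x) * (-2)          # (1/2) x (1 - x) F''(x), F'' = -2
    assert scaled == diffusion
```

For this particular $F$ the agreement is exact for every $N$. For general $F$ one only gets agreement in the limit $N\to\infty$, which is the situation handled in the text.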

Remark 6. Note that (HF )(µ) is the right-hand derivative of E [F (H(t+ h))|H(t) = µ] at t = 0. In partic-ular, if the generator vanishes, then E [F (H(t))|H(0) = µ] = F (µ) for all t > 0, and all F , and the processH(t) ⌘ µ is constant. This will be important when we come to consider the limit on the intermediate timescale.

We will make use of the fact that the set of functions
$$\mathcal{C} := \Big\{F(\mu)=\prod_{i=0}^{M}\prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle\ \Big|\ K_i\in\mathbb{N}_0,\ f_{ik}\in C([0,1])\Big\}$$
is separating and convergence determining [6], so that for the purpose of characterizing our process and its limits, we need only compute the values the generator takes on functions $F\in\mathcal{C}$ and their limits.

We will evaluate the generator on this class of functions, but we first begin with a pair of lemmas. We will use $\amalg$ to indicate the disjoint union of sets and, for all integers $M>0$, we use the shorthand
$$[M]=\{1,\ldots,M\}.$$


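Lemma 6. Let $\mu_i=\frac{1}{N_i}\sum_{j=1}^{N_i}\delta_{x_{ij}}$ for $x_{ij}\in[0,1]$. Then
$$
\begin{aligned}
\prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle &= \frac{1}{N_i^{K_i}}\sum_{m=1}^{K_i}\ \sum_{j_1,\ldots,j_m\ \text{distinct}}\ \sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\ \prod_{q=1}^{m}\prod_{r\in A_q}f_{ir}(x_{ij_q})\\
&= \frac{1}{N_i^{K_i}}\sum_{j_1,\ldots,j_{K_i}\ \text{distinct}}\ \prod_{k=1}^{K_i}f_{ik}(x_{ij_k}) + O(N_i^{-1}),
\end{aligned}
$$
where the inner sum is over all partitions of $[K_i]$ into $m$ disjoint sets.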

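Proof. The first statement is simply a matter of collecting terms according to the number of distinct values $j_k$:
$$
\begin{aligned}
N_i^{K_i}\prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle &= N_i^{K_i}\prod_{k=1}^{K_i}\Big(\frac{1}{N_i}\sum_{j=1}^{N_i}f_{ik}(x_{ij})\Big) = \sum_{j_1=1}^{N_i}\cdots\sum_{j_{K_i}=1}^{N_i}\ \prod_{k=1}^{K_i}f_{ik}(x_{ij_k})\\
&= \sum_{m=1}^{K_i}\ \sum_{j_1,\ldots,j_m\ \text{distinct}}\ \sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\ \prod_{q=1}^{m}\prod_{r\in A_q}f_{ir}(x_{ij_q}).
\end{aligned}
$$
Now, for the $m=1$ term we have $A_1=[K_i]$, so it takes the form
$$\sum_{j_1=1}^{N_i}\prod_{k=1}^{K_i}f_{ik}(x_{ij_1}) = N_i\Big\langle\prod_{k=1}^{K_i}f_{ik},\mu_i\Big\rangle,$$
whilst for $m=2$ we have
$$
\begin{aligned}
&\sum_{j_1=1}^{N_i}\ \sum_{j_2\neq j_1}\ \sum_{A_1\amalg A_2=[K_i]}\ \prod_{k\in A_1}f_{ik}(x_{ij_1})\prod_{k\in A_2}f_{ik}(x_{ij_2})\\
&\quad= \sum_{A_1\amalg A_2=[K_i]}\ \sum_{j_1=1}^{N_i}\prod_{k\in A_1}f_{ik}(x_{ij_1})\sum_{j_2=1}^{N_i}\prod_{k\in A_2}f_{ik}(x_{ij_2}) - \sum_{A_1\amalg A_2=[K_i]}\ \sum_{j_1=1}^{N_i}\prod_{k\in A_1}f_{ik}(x_{ij_1})\prod_{k\in A_2}f_{ik}(x_{ij_1})\\
&\quad= N_i^2\sum_{A_1\amalg A_2=[K_i]}\Big\langle\prod_{k\in A_1}f_{ik},\mu_i\Big\rangle\Big\langle\prod_{k\in A_2}f_{ik},\mu_i\Big\rangle - S(K_i,2)\,N_i\Big\langle\prod_{k=1}^{K_i}f_{ik},\mu_i\Big\rangle,
\end{aligned}
$$
where $S(K_i,2)$ is a Stirling number of the second kind [13] and gives the number of distinct partitions of $K_i$ elements into 2 sets. Proceeding inductively in this manner completes the proof of the lemma.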
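For $K_i=2$ the expansion in Lemma 6 can be written out completely:
$$\frac{1}{N_i^2}\sum_{j_1\neq j_2}f_{i1}(x_{ij_1})f_{i2}(x_{ij_2}) = \langle f_{i1},\mu_i\rangle\langle f_{i2},\mu_i\rangle - \frac{1}{N_i}\langle f_{i1}f_{i2},\mu_i\rangle.$$
The distinct-index sum therefore differs from the product by a term of order $N_i^{-1}$. The sketch below checks this identity by brute force on random points; the test functions are arbitrary choices. It also checks that $S(K,2)=2^{K-1}-1$ counts the partitions of $[K]$ into two blocks, as used in the proof.

```python
import math
import random
from itertools import permutations


def pairing(f, xs):
    """<f, mu> for the empirical measure mu = (1/N) sum_j delta_{x_j}."""
    return sum(f(x) for x in xs) / len(xs)


def distinct_index_average(fs, xs):
    """(1/N^K) * sum over pairwise distinct (j_1, ..., j_K) of prod_k f_k(x_{j_k})."""
    N, K = len(xs), len(fs)
    total = 0.0
    for idx in permutations(range(N), K):  # ordered tuples of distinct indices
        total += math.prod(f(xs[j]) for f, j in zip(fs, idx))
    return total / N ** K


def stirling2(n, k):
    """Stirling numbers of the second kind: S(n, k) = k S(n-1, k) + S(n-1, k-1)."""
    if n == k:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


rng = random.Random(0)
xs = [rng.random() for _ in range(40)]
f1, f2 = math.cos, (lambda x: x * x)

product = pairing(f1, xs) * pairing(f2, xs)
distinct = distinct_index_average([f1, f2], xs)
correction = pairing(lambda x: f1(x) * f2(x), xs) / len(xs)
assert abs(distinct - (product - correction)) < 1e-12

for K in range(2, 9):
    assert stirling2(K, 2) == 2 ** (K - 1) - 1
```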

The previous lemma shows we will be interested in products over distinct indices $j_1,\ldots,j_m$. In particular, we have the result of Lemma 7.

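Lemma 7. For distinct values $j_1,\ldots,j_{K_i}$ in $\{1,\ldots,N_i\}$,
$$
\begin{aligned}
E\Big[\prod_{k=1}^{K_i}f_{ik}(X_{ij_k}(n+1))\,\Big|\,\{X_{ij}(n)=x_{ij}\}\Big]
&= \frac{E[\nu_{i1}\cdots\nu_{iK_i}]}{(N_i)_{K_i}}\sum_{p_1,\ldots,p_{K_i}\ \text{distinct}}\ \prod_{k=1}^{K_i}(Q_i^{(N)}f_{ik})(x_{ip_k})\\
&\quad+\frac{E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}]}{(N_i)_{K_i}}\sum_{q<r}\ \sum_{\substack{p_1,\ldots,p_{K_i}\\ p_q=p_r}}\ \prod_{\substack{k=1\\ k\neq q,r}}^{K_i}(Q_i^{(N)}f_{ik})(x_{ip_k})\,\big((Q_i^{(N)}f_{iq})(Q_i^{(N)}f_{ir})\big)(x_{ip_q}) + o(c_{N_i}).
\end{aligned}
$$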


Proof. We begin by recalling that, conditional on an individual's parent having type $x$, its type is independently distributed according to the probability measure $P(x,\cdot)$, i.e.,
$$E\big[f(X_{ij}(n+1))\,\big|\,G^{(N)}(n),\ \text{parent of type }x\big] = (Q_i^{(N)}f)(x).$$
We can thus, as in the previous lemma, write
$$E\Big[\prod_{k=1}^{K_i}f_{ik}(X_{ij_k}(n+1))\,\Big|\,\{X_{ij}(n)=x_{ij}\}\Big] = \sum_{m=1}^{K_i}\ \sum_{p_1,\ldots,p_m\ \text{distinct}}\ \sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\frac{E[(\nu_{ip_1})_{|A_1|}\cdots(\nu_{ip_m})_{|A_m|}]}{(N_i)_{K_i}}\prod_{q=1}^{m}\prod_{r\in A_q}(Q_i^{(N)}f_{ir})(x_{ip_q}),$$
where
$$\frac{E[(\nu_{ip_1})_{|A_1|}\cdots(\nu_{ip_m})_{|A_m|}]}{(N_i)_{K_i}} = \frac{E[(\nu_{i1})_{|A_1|}\cdots(\nu_{im})_{|A_m|}]}{(N_i)_{K_i}}$$
is the probability that the $K_i$ distinct individuals have $m$ ancestors $p_1,\ldots,p_m$ (with types $x_{ip_1},\ldots,x_{ip_m}$), and that the individuals in $A_q$ had parent $p_q$.

Next, we observe that since $\|Q_i^{(N)}\|\le1$,
$$\Bigg|\sum_{p_1,\ldots,p_m\ \text{distinct}}\ \sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\frac{E[(\nu_{ip_1})_{|A_1|}\cdots(\nu_{ip_m})_{|A_m|}]}{(N_i)_{K_i}}\prod_{q=1}^{m}\prod_{r\in A_q}(Q_i^{(N)}f_{ir})(x_{ip_q})\Bigg| \le \sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\frac{(N_i)_m}{(N_i)_{K_i}}E[(\nu_{i1})_{|A_1|}\cdots(\nu_{im})_{|A_m|}]\prod_{k=1}^{K_i}\|f_{ik}\|,$$
which is thus $o(c_{N_i})$ by (7) whenever $|A_q|\ge3$ for some $q$, or $|A_q|$ and $|A_r|$ are both $\ge2$ for distinct indices $q,r$. The result follows.
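The ratio $E[(\nu_{i1})_{|A_1|}\cdots(\nu_{im})_{|A_m|}]/(N_i)_{K_i}$ has a probabilistic reading, which can be checked by simulation in the simplest case $K_i=2$, $m=1$. There, the probability that two distinct individuals share a parent should equal $c_{N_i}=E[(\nu_{i1})_2]/(N_i-1)$. The sketch below uses two illustrative Cannings models, chosen for the example only, and its sampler names are hypothetical:

- **Wright–Fisher**, with $c_N=1/N$.
- **A Moran-type step**, in which one parent has two offspring, one has none and the rest have one, so that $c_N=2/(N(N-1))$.

```python
import random


def wright_fisher_offspring(N, rng):
    """Each of the N children picks its parent uniformly at random."""
    counts = [0] * N
    for _ in range(N):
        counts[rng.randrange(N)] += 1
    return counts


def moran_type_offspring(N, rng):
    """One parent has two children, one has none, all others have one."""
    counts = [2, 0] + [1] * (N - 2)
    rng.shuffle(counts)
    return counts


def pair_shares_parent(offspring, rng):
    """Label children by parent and test whether two distinct random children coincide."""
    parents = [p for p, k in enumerate(offspring) for _ in range(k)]
    a, b = rng.sample(range(len(parents)), 2)
    return parents[a] == parents[b]


def estimate_coalescence(sampler, N, trials=50_000, seed=1):
    """Monte Carlo estimate of the probability that two distinct children share a parent."""
    rng = random.Random(seed)
    hits = sum(pair_shares_parent(sampler(N, rng), rng) for _ in range(trials))
    return hits / trials


N = 10
print(estimate_coalescence(wright_fisher_offspring, N), 1 / N)
print(estimate_coalescence(moran_type_offspring, N), 2 / (N * (N - 1)))
```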

We now turn to the main result of this section:

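Proposition 1. Let $\mu_i^{(N)}=\frac{1}{N_i}\sum_{j=1}^{N_i}\delta_{x_{ij}}$ for $x_{ij}\in[0,1]$, and let $\mu^{(N)}=\mu_0^{(N)}\otimes\cdots\otimes\mu_M^{(N)}$ converge weakly to a measure $\mu\in\mathcal{P}([0,1])^{M+1}$.

Let $F(\mu)=\prod_{i=0}^{M}\prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle\in\mathcal{C}$ and, for $i=1,\ldots,M$, let
$$(\mathcal{G}_iF)(\mu) = \prod_{\substack{j=0\\ j\neq i}}^{M}\prod_{k=1}^{K_j}\langle f_{jk},\mu_j\rangle\left(\sum_{q=1}^{K_i}\frac{\varpi_i}{2}\langle f_{iq},\mu_0-\mu_i\rangle\prod_{\substack{k=1\\ k\neq q}}^{K_i}\langle f_{ik},\mu_i\rangle + \frac12\sum_{q\neq r}\ \prod_{\substack{k=1\\ k\neq q,r}}^{K_i}\langle f_{ik},\mu_i\rangle\big(\langle f_{iq}f_{ir},\mu_i\rangle-\langle f_{iq},\mu_i\rangle\langle f_{ir},\mu_i\rangle\big)\right)\tag{8}$$
define an operator on $C(\mathcal{P}([0,1])^{M+1})$. Then
$$\lim_{N\to\infty}a_N^{-1}(\mathcal{G}_NF)(\mu^{(N)}) = (\mathcal{G}F)(\mu) := \sum_{i=1}^{M}\lambda_i(\mathcal{G}_iF)(\mu).$$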


Moreover, given $\mu_i\in\mathcal{P}(\mathcal{P}([0,1]))$, there exist unique independent processes $G_i(t)$ with generators $\mathcal{G}_i$, such that $G_i(0)$ is distributed according to $\mu_i$ and such that
$$G^{(N)}(\lfloor a_N^{-1}t\rfloor)\Rightarrow G(t) := G_0(0)\otimes G_1(\lambda_1t)\otimes\cdots\otimes G_M(\lambda_Mt)$$
for all $t>0$, where convergence is in the space of càdlàg functions endowed with the Skorokhod topology, $D_{\mathcal{P}([0,1])^{M+1}}[0,\infty)$ (see, e.g., [6]).

Remark 7. Because
$$\lim_{N\to\infty}\frac{c_N}{a_N}=0,$$
the component of the generator acting on the mainland vanishes in the limit: if
$$\mathcal{C}_0 := \Big\{F\in\mathcal{C}\ \Big|\ F(\mu)=\prod_{k=1}^{K_0}\langle f_{0k},\mu_0\rangle\Big\},$$
then $\mathcal{G}_iF\equiv0$ for all $F\in\mathcal{C}_0$, and thus the generator vanishes on this set. Equivalently, the process $G_0(t)\equiv\mu_0$.

Remark 8. Recall from Equation 2 that the inverse effective population size of the $i$th island is $c_{N_i}\sim\lambda_ia_N$. Since we have rescaled time by $a_N$ rather than by the individual effective population sizes, the factors $\lambda_i$ appear in the generator and in the components $G_i$. These factors reflect the fact that the different effective population sizes on the different islands give their population dynamics different rates (i.e., different expected inter-event times), which are given by the $\lambda_i$.

Remark 9. This theorem tells us that on the intermediate timescale, the islands have essentially independent dynamics, coupled only by immigration from a mainland which remains unchanged on the intermediate timescale. The generator of the dynamics on each island is identical to that in the infinite population limit for the infinitely many alleles model, with two substitutions: the rescaled migration rate $\varpi_i$ takes the place of the rescaled mutation rate $\theta$, and the mainland measure $\mu_0$ takes the place of Lebesgue measure.

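Proof. Applying Lemmas 6 and 7, we have
$$
\begin{aligned}
E\big[F(G^{(N)}(n+1))\,\big|\,G^{(N)}(n)=\mu\big]
&= E\Big[\prod_{i=0}^{M}\prod_{k=1}^{K_i}\langle f_{ik},G_i^{(N)}(n+1)\rangle\,\Big|\,\{X_{ij}(n)=x_{ij}\}\Big]\\
&= \prod_{i=0}^{M}E\Big[\prod_{k=1}^{K_i}\langle f_{ik},G_i^{(N)}(n+1)\rangle\,\Big|\,\{X_{ij}(n)=x_{ij}\}\Big]\\
&= \prod_{i=0}^{M}\frac{1}{N_i^{K_i}}\sum_{m=1}^{K_i}\ \sum_{j_1,\ldots,j_m\ \text{distinct}}\ \sum_{A_1\amalg\cdots\amalg A_m=[K_i]}E\Big[\prod_{q=1}^{m}\prod_{r\in A_q}f_{ir}(X_{ij_q}(n+1))\,\Big|\,\{X_{ij}(n)=x_{ij}\}\Big]\\
&= \prod_{i=0}^{M}\frac{1}{N_i^{K_i}}\sum_{m=1}^{K_i}\ \sum_{j_1,\ldots,j_m\ \text{distinct}}\ \sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\Bigg(\frac{E[\nu_{i1}\cdots\nu_{im}]}{(N_i)_m}\sum_{p_1,\ldots,p_m\ \text{distinct}}\ \prod_{k=1}^{m}\Big(Q_i^{(N)}\prod_{l\in A_k}f_{il}\Big)(x_{ip_k})\\
&\qquad+\frac{E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,m-1}]}{(N_i)_m}\sum_{q<r}\ \sum_{\substack{p_1,\ldots,p_m\\ p_q=p_r}}\ \prod_{\substack{k=1\\ k\neq q,r}}^{m}\Big(Q_i^{(N)}\prod_{l\in A_k}f_{il}\Big)(x_{ip_k})\Big(\Big(Q_i^{(N)}\prod_{l\in A_q}f_{il}\Big)\Big(Q_i^{(N)}\prod_{l\in A_r}f_{il}\Big)\Big)(x_{ip_q}) + o(c_{N_i})\Bigg).
\end{aligned}
$$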

Now, observing that the term in brackets is independent of the values $j_k$, we note that $j_1,\ldots,j_m$ can be chosen in $(N_i)_m$ ways, and we are left with a product over sums of the form:

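$$
\begin{aligned}
&\frac{1}{N_i^{K_i}}\sum_{m=1}^{K_i}E[\nu_{i1}\cdots\nu_{im}]\sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\ \sum_{p_1,\ldots,p_m\ \text{distinct}}\ \prod_{k=1}^{m}\Big(Q_i^{(N)}\prod_{l\in A_k}f_{il}\Big)(x_{ip_k})\\
&\quad+\frac{1}{N_i^{K_i}}\sum_{m=1}^{K_i}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,m-1}]\sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\ \sum_{q<r}\ \sum_{\substack{p_1,\ldots,p_m\\ p_q=p_r}}\ \prod_{\substack{k=1\\ k\neq q,r}}^{m}\Big(Q_i^{(N)}\prod_{l\in A_k}f_{il}\Big)(x_{ip_k})\Big(\Big(Q_i^{(N)}\prod_{l\in A_q}f_{il}\Big)\Big(Q_i^{(N)}\prod_{l\in A_r}f_{il}\Big)\Big)(x_{ip_q}) + o(c_{N_i}).
\end{aligned}
$$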

We will focus our attention on the first sum in the first line. Using Lemma 6 in reverse, we have

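$$\frac{1}{N_i^{K_i}}\sum_{p_1,\ldots,p_{K_i}\ \text{distinct}}\ \prod_{k=1}^{K_i}(Q_i^{(N)}f_{ik})(x_{ip_k}) = \prod_{k=1}^{K_i}\langle Q_i^{(N)}f_{ik},\mu_i\rangle - \frac{1}{N_i^{K_i}}\sum_{m=1}^{K_i-1}\ \sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\ \sum_{p_1,\ldots,p_m\ \text{distinct}}\ \prod_{k=1}^{m}\Big(Q_i^{(N)}\prod_{l\in A_k}f_{il}\Big)(x_{ip_k}),$$
where the subtracted terms are $O(1/N_i)$. Thus,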

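$$
\begin{aligned}
&\frac{1}{N_i^{K_i}}\sum_{m=1}^{K_i}E[\nu_{i1}\cdots\nu_{im}]\sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\ \sum_{p_1,\ldots,p_m\ \text{distinct}}\ \prod_{k=1}^{m}\Big(Q_i^{(N)}\prod_{l\in A_k}f_{il}\Big)(x_{ip_k})\\
&\quad= E[\nu_{i1}\cdots\nu_{iK_i}]\prod_{k=1}^{K_i}\langle Q_i^{(N)}f_{ik},\mu_i\rangle\\
&\qquad+\frac{1}{N_i^{K_i}}\sum_{m=1}^{K_i-1}\big(E[\nu_{i1}\cdots\nu_{im}]-E[\nu_{i1}\cdots\nu_{iK_i}]\big)\sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\ \sum_{p_1,\ldots,p_m\ \text{distinct}}\ \prod_{k=1}^{m}\Big(Q_i^{(N)}\prod_{l\in A_k}f_{il}\Big)(x_{ip_k}).
\end{aligned}
$$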

Further, we observed in Remark 4 that the differences $E[\nu_{i1}\cdots\nu_{im}]-E[\nu_{i1}\cdots\nu_{iK_i}]$ are $O(c_{N_i})$, so that the first sum reduces to
$$E[\nu_{i1}\cdots\nu_{iK_i}]\prod_{k=1}^{K_i}\langle Q_i^{(N)}f_{ik},\mu_i\rangle + o(c_{N_i}).$$

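Proceeding similarly, applying Lemma 6 with the set of $K_i-1$ distinct functions $\{Q_i^{(N)}f_{ik}\}_{k\neq q,r}\cup\{(Q_i^{(N)}f_{iq})(Q_i^{(N)}f_{ir})\}$, we see that
$$
\begin{aligned}
&\frac{1}{N_i^{K_i}}\sum_{m=1}^{K_i}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,m-1}]\sum_{A_1\amalg\cdots\amalg A_m=[K_i]}\ \sum_{q<r}\ \sum_{\substack{p_1,\ldots,p_m\\ p_q=p_r}}\ \prod_{\substack{k=1\\ k\neq q,r}}^{m}\Big(Q_i^{(N)}\prod_{l\in A_k}f_{il}\Big)(x_{ip_k})\Big(\Big(Q_i^{(N)}\prod_{l\in A_q}f_{il}\Big)\Big(Q_i^{(N)}\prod_{l\in A_r}f_{il}\Big)\Big)(x_{ip_q})\\
&\quad= \frac{1}{N_i}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}]\sum_{q<r}\ \prod_{\substack{k=1\\ k\neq q,r}}^{K_i}\langle Q_i^{(N)}f_{ik},\mu_i\rangle\,\langle(Q_i^{(N)}f_{iq})(Q_i^{(N)}f_{ir}),\mu_i\rangle + o(c_{N_i}),
\end{aligned}
$$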


where we have used the fact that $\frac{1}{N_i}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,m-1}]=O(c_{N_i})$ in bounding the higher-order terms.

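Thus,
$$
\begin{aligned}
E\big[F(G^{(N)}(n+1))\,\big|\,G^{(N)}(n)=\mu\big]
&= \prod_{i=0}^{M}\Bigg(E[\nu_{i1}\cdots\nu_{iK_i}]\prod_{k=1}^{K_i}\langle Q_i^{(N)}f_{ik},\mu_i\rangle\\
&\qquad+\frac{1}{N_i}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}]\sum_{q<r}\ \prod_{\substack{k=1\\ k\neq q,r}}^{K_i}\langle Q_i^{(N)}f_{ik},\mu_i\rangle\,\langle(Q_i^{(N)}f_{iq})(Q_i^{(N)}f_{ir}),\mu_i\rangle + o(c_{N_i})\Bigg)\\
&= \prod_{i=0}^{M}E[\nu_{i1}\cdots\nu_{iK_i}]\prod_{k=1}^{K_i}\langle Q_i^{(N)}f_{ik},\mu_i\rangle + \sum_{i=0}^{M}\ \prod_{\substack{j=0\\ j\neq i}}^{M}E[\nu_{j1}\cdots\nu_{jK_j}]\prod_{k=1}^{K_j}\langle Q_j^{(N)}f_{jk},\mu_j\rangle\\
&\qquad\times\frac{1}{N_i}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}]\sum_{q<r}\ \prod_{\substack{k=1\\ k\neq q,r}}^{K_i}\langle Q_i^{(N)}f_{ik},\mu_i\rangle\,\langle(Q_i^{(N)}f_{iq})(Q_i^{(N)}f_{ir}),\mu_i\rangle + o(c_{N_i}).
\end{aligned}
$$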

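Further, recalling (4), by assumption
$$Q_i^{(N)} = I + c_{N_i}B_i^{(N)} + o(c_{N_i}),$$
so we have
$$
\begin{aligned}
E\big[F(G^{(N)}(n+1))\,\big|\,G^{(N)}(n)=\mu\big]
&= \prod_{i=0}^{M}E[\nu_{i1}\cdots\nu_{iK_i}]\prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle\\
&\quad+\sum_{i=0}^{M}c_{N_i}E[\nu_{i1}\cdots\nu_{iK_i}]\prod_{\substack{j=0\\ j\neq i}}^{M}E[\nu_{j1}\cdots\nu_{jK_j}]\prod_{k=1}^{K_j}\langle f_{jk},\mu_j\rangle\sum_{q=1}^{K_i}\langle B_i^{(N)}f_{iq},\mu_i\rangle\prod_{\substack{k=1\\ k\neq q}}^{K_i}\langle f_{ik},\mu_i\rangle\\
&\quad+\sum_{i=0}^{M}\ \prod_{\substack{j=0\\ j\neq i}}^{M}E[\nu_{j1}\cdots\nu_{jK_j}]\prod_{k=1}^{K_j}\langle f_{jk},\mu_j\rangle\,\frac{1}{N_i}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}]\sum_{q<r}\ \prod_{\substack{k=1\\ k\neq q,r}}^{K_i}\langle f_{ik},\mu_i\rangle\,\langle f_{iq}f_{ir},\mu_i\rangle\\
&\quad+o(c_{N_i}).
\end{aligned}\tag{9}
$$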

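Now recall
$$(B_i^{(N)}f)(x) := \frac{\varpi_i}{2}\Big(\int f(y)\,G^{(N)}(n)(dy) - f(x)\Big),$$
and, from (3), we have
$$G^{(N)}(n) = \frac{1}{\sum_{k=0}^{M}N_k}\sum_{i=0}^{M}N_iG_i^{(N)}(n) = \frac{1}{\sum_{k=0}^{M}N_k}\sum_{i=0}^{M}N_i\mu_i = \mu_0 + O\Big(\frac{N_i}{N_0}\Big),$$
so that
$$(B_i^{(N)}f)(x) = \frac{\varpi_i}{2}\Big(\int f(y)\,\mu_0(dy) - f(x)\Big) + o(1).$$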


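Now, recalling Lemma 4, we have
$$
\begin{aligned}
E[\nu_{i1}\cdots\nu_{iK_i}] &= 1-\binom{K_i}{2}\frac{(N_i)_{K_i-1}}{(N_i)_{K_i}}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}] - o(c_{N_i})\\
&= 1-\binom{K_i}{2}\frac{1}{N_i-K_i+1}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}] - o(c_{N_i})\\
&= 1-\binom{K_i}{2}\Big(\frac{1}{N_i}+\frac{K_i-1}{N_i(N_i-K_i+1)}\Big)E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}] - o(c_{N_i})\\
&= 1-\binom{K_i}{2}\frac{1}{N_i}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}] - o(c_{N_i}),
\end{aligned}
$$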

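so that
$$
\begin{aligned}
F(\mu) &= \prod_{i=0}^{M}\prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle\\
&= \prod_{i=0}^{M}\Big(E[\nu_{i1}\cdots\nu_{iK_i}]\prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle + \binom{K_i}{2}\frac{1}{N_i}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}]\prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle + o(c_{N_i})\Big)\\
&= \prod_{i=0}^{M}E[\nu_{i1}\cdots\nu_{iK_i}]\prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle\\
&\quad+\sum_{i=0}^{M}\ \prod_{\substack{j=0\\ j\neq i}}^{M}E[\nu_{j1}\cdots\nu_{jK_j}]\prod_{k=1}^{K_j}\langle f_{jk},\mu_j\rangle\,\frac{1}{N_i}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}]\sum_{q<r}\ \prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle + o(c_{N_i}).
\end{aligned}\tag{10}
$$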

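Thus, we take the difference of (9) and (10). Using Lemma 4 and Lemma 5, respectively, we replace $E[\nu_{i1}\cdots\nu_{iK_i}]$ by $1-O(c_{N_i})$ and $\frac{1}{N_i}E[(\nu_{i1})_2\,\nu_{i2}\cdots\nu_{i,K_i-1}]$ by $c_{N_i}+o(c_{N_i})$. Recalling that $c_{N_i}\sim\lambda_ia_N$ for the islands, we see that
$$E\big[F(G^{(N)}(n+1))\,\big|\,G^{(N)}(n)=\mu\big]-F(\mu) = \sum_{i=0}^{M}c_{N_i}(\mathcal{G}_iF)(\mu) + o(a_N).$$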

The first assertion follows directly. We now observe that, restricted to the space of functions
$$\mathcal{C}_i := \Big\{F(\mu)=\prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle\ \Big|\ K_i\in\mathbb{N}_0,\ f_{ik}\in C([0,1])\Big\}\subseteq C(\mathcal{P}([0,1])),$$
$\mathcal{G}_i$ is exactly the generator (4.4) of the infinitely many alleles model of Chapter 10 of [6]. In particular, Theorem 4.1 of the same chapter tells us that, given a fixed initial measure $\mu_i\in\mathcal{P}(\mathcal{P}([0,1]))$, the martingale problem for $(\mathcal{G}_i,\mu_i)$ is well posed. That is, there exists a process $G_i(t)$, unique in distribution, with initial value $G_i(0)$ distributed according to $\mu_i$ and with generator $\mathcal{G}_i$. Moreover, using Theorem 1.1 of Chapter 6 of [6], we see that $G_i(\lambda_it)$ is the unique process with generator $\lambda_i\mathcal{G}_i$. We can thus appeal to Theorem 10.1 in [6] to conclude that, given an initial measure $\mu=\mu_0\otimes\cdots\otimes\mu_M$, the martingale problem for
$$\sum_{i=1}^{M}\lambda_i\mathcal{G}_i$$
is well posed and has solution $G_0(0)\otimes G_1(\lambda_1t)\otimes\cdots\otimes G_M(\lambda_Mt)$. Given convergence of the generators and well-posedness of the limiting generator, the second assertion then follows by Lemma 5.1 in Chapter 4 of [6].


Finally, we conclude this section by observing that our characterization of the limiting generator in terms of the generator of the infinitely many alleles diffusion model also allows us to characterize the stationary distribution:

Corollary 1. The stationary distribution for the islands is the joint law of $M$ independent Dirichlet processes with scaling parameters $\varpi_i$ and base probability measure $\mu_0$, $\mathrm{DP}(\varpi_i,\mu_0)$.

Proof. This is immediate from the result for a single copy of the infinitely many alleles model. See, e.g., Theorem 4.1, Chapter 9 in [6].
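Corollary 1 can be sanity-checked in a setting where everything is explicit. As an illustrative assumption, take a two-atom base measure $\mu_0=p\,\delta_a+(1-p)\,\delta_b$. Under $\mathrm{DP}(\varpi_i,\mu_0)$ the weight $W=\mu_i(\{a\})$ is then $\mathrm{Beta}(\varpi_ip,\varpi_i(1-p))$. For $F(\mu)=\langle\mathbf{1}_a,\mu_i\rangle^K$, the single-island part of (8) reduces to $\frac{K\varpi_i}{2}(p-W)W^{K-1}+\binom{K}{2}(W^{K-1}-W^K)$. Stationarity means that its expectation is zero. The sketch evaluates that expectation exactly from Beta moments; the parameter values are arbitrary.

```python
from fractions import Fraction
from math import comb


def rising(x, k):
    """Rising factorial x (x + 1) ... (x + k - 1), with value 1 for k = 0."""
    out = Fraction(1)
    for r in range(k):
        out *= x + r
    return out


def beta_moment(a, b, k):
    """E[W^k] for W ~ Beta(a, b)."""
    return rising(a, k) / rising(a + b, k)


def expected_generator(varpi, p, K):
    """E[(G F)(mu)] with mu ~ DP(varpi, mu0), mu0 = p delta_a + (1 - p) delta_b,
    and F(mu) = <1_a, mu>^K, using the single-island generator (8) with f_{iq} = 1_a."""
    a, b = varpi * p, varpi * (1 - p)
    m = [beta_moment(a, b, k) for k in range(K + 1)]
    immigration = K * varpi / 2 * (p * m[K - 1] - m[K])
    resampling = comb(K, 2) * (m[K - 1] - m[K])
    return immigration + resampling


for varpi, p in [(Fraction(3, 2), Fraction(1, 3)), (Fraction(10), Fraction(1, 4))]:
    for K in range(1, 7):
        assert expected_generator(varpi, p, K) == 0
```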

1.5 Long-Term Behaviour

In the previous section, we simply assumed that the islands were asymptotically smaller in size (as measured by the coalescence probability of two randomly selected individuals) than the mainland. This allowed us to show that the Cannings UNTB converged to a sum of independent copies of the infinitely many alleles diffusion process, with migration from the mainland playing the role of mutation. In this section, we will show that on a slow timescale, the dynamics on the mainland converge to the standard infinitely many alleles model as well. As before, we can then conclude that the stationary distribution of the mainland is that of the Dirichlet process $\mathrm{DP}(\theta,\lambda)$, where $\lambda$ is Lebesgue measure on $[0,1]$. Thus, after a transient period, the mainland will approach a measure $\mu_0\sim\mathrm{DP}(\theta,\lambda)$, whereas the islands will converge on hierarchical Dirichlet processes $\mathrm{DP}(\varpi_i,\mu_0)$ [14].

Let $\nu_i$, $i=1,\ldots,M$, be the law of the stationary process $\mathrm{DP}(\varpi_i,\mu_0)$ from Corollary 1 above, and let $\nu=\nu_1\otimes\cdots\otimes\nu_M$, i.e., given a function $F\in C(\mathcal{P}([0,1])^M)$,
$$\int F(\mu)\,\nu(d\mu) = \int\cdots\int F(\mu_1,\ldots,\mu_M)\,\nu_1(d\mu_1)\cdots\nu_M(d\mu_M).$$
Then $\nu$ is a stationary distribution for $G(t)$: we have
$$\int(\mathcal{G}F)(\mu)\,\nu(d\mu)=0$$
for all $F\in C(\mathcal{P}([0,1])^{M+1})$. Equivalently, write $T(t)$ for the semigroup generated by $\mathcal{G}$, i.e.,
$$(T(t)F)(\mu) = E[F(G(t))\mid G(0)=\mu],$$
where $G(t)$ is the process with generator $\mathcal{G}$ of Proposition 1. Then
$$\int(T(t)F)(\mu)\,\nu(d\mu) = \int F(\mu)\,\nu(d\mu)$$
for all $F\in C(\mathcal{P}([0,1])^M)$. We start by showing that as $t\to\infty$, $G(t)$ converges to a stationary process $G^\star$ distributed according to $\nu$, i.e.,
$$P\{G^\star(t)\in A\mid G^\star(0)\sim\nu\} = \nu(A)$$
for all subsets $A\subseteq\mathcal{P}([0,1])^{M+1}$. To this end, we begin with a series of lemmas, which are essentially the same as results appearing in [15]:

Lemma 8. Let $\mu=\mu_0\otimes\cdots\otimes\mu_M\in\mathcal{P}([0,1])^{M+1}$ and let $F(\mu)=\prod_{i=0}^{M}\prod_{k=1}^{K_i}\langle f_{ik},\mu_i\rangle\in\mathcal{C}$. Let $K=\sum_{i=0}^{M}K_i$ be the degree of $F$. If $K\ge1$, there exist a scalar $\lambda>0$ and a function $\psi$, which is a sum of functions of the same form as $F$ but of degree $K-1$, such that
$$\mathcal{G}F = -\lambda F + \psi.$$
Thus,
$$(T(t)F)(\mu) = e^{-\lambda t}F + \int_0^t e^{-\lambda(t-s)}T(s)\psi\,ds.$$


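Proof. Recalling Equation (8), we have the following, where, to lighten notation, we suppress the factors $\prod_{i'\neq i}\prod_{k=1}^{K_{i'}}\langle f_{i'k},\mu_{i'}\rangle$ that multiply the $i$th summand:
$$
\begin{aligned}
(\mathcal{G}F)(\mu) &= \sum_{i=1}^{M}\lambda_i\Bigg(\frac12\sum_{1\le j\neq k\le K_i}\big(\langle f_{ij}f_{ik},\mu_i\rangle-\langle f_{ij},\mu_i\rangle\langle f_{ik},\mu_i\rangle\big)\prod_{l\neq j,k}\langle f_{il},\mu_i\rangle + \sum_{j=1}^{K_i}\frac{\varpi_i}{2}\langle f_{ij},\mu_0-\mu_i\rangle\prod_{k\neq j}\langle f_{ik},\mu_i\rangle\Bigg)\\
&= -\Bigg(\sum_{i=1}^{M}\lambda_i\Big(\frac{K_i(K_i-1)}{2}+\frac{K_i\varpi_i}{2}\Big)\Bigg)F + \sum_{i=1}^{M}\lambda_i\Bigg(\frac12\sum_{1\le j\neq k\le K_i}\langle f_{ij}f_{ik},\mu_i\rangle\prod_{l\neq j,k}\langle f_{il},\mu_i\rangle + \sum_{j=1}^{K_i}\frac{\varpi_i}{2}\langle f_{ij},\mu_0\rangle\prod_{k\neq j}\langle f_{ik},\mu_i\rangle\Bigg),
\end{aligned}
$$
giving the first statement. In particular, if $K=1$, say $K_i=1$, we have
$$\mathcal{G}F = -\lambda_i\frac{\varpi_i}{2}F + \Big\langle\lambda_i\frac{\varpi_i}{2}f_{i1},\mu_0\Big\rangle.$$
For the second statement, we observe that
$$\frac{d}{dt}\,e^{\lambda t}T(t)F = e^{\lambda t}\big(\lambda T(t)F + T(t)\mathcal{G}F\big) = e^{\lambda t}T(t)\psi.$$
The result follows by integrating both sides over $(0,t)$.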

With this lemma, we can show that the process $G(t)$ is ergodic, i.e., the distribution of $G(t)$ converges to $\nu$, independently of the initial condition.

Proposition 2. Let $F\in C(\mathcal{P}([0,1])^M)$. Then
$$\lim_{t\to\infty}\Big\|T(t)F-\int F(\mu)\,\nu(d\mu)\Big\| = 0.$$

Proof. Since the functions in $\mathcal{C}$ are convergence determining, it suffices to show the result for $F\in\mathcal{C}$. By Lemma 8 we then have
$$T(t)F = e^{-\lambda t}F + \int_0^t e^{-\lambda(t-s)}T(s)\psi\,ds$$
for some $\lambda>0$ and some $\psi$ of degree $K-1$. Integrating both sides against $\nu$, and recalling that
$$\int(T(t)F)(\mu)\,\nu(d\mu) = \int F(\mu)\,\nu(d\mu),$$
we have
$$\int F(\mu)\,\nu(d\mu) = e^{-\lambda t}\int F(\mu)\,\nu(d\mu) + \int_0^t e^{-\lambda(t-s)}\int\psi(\mu)\,\nu(d\mu)\,ds,$$
so that, subtracting,
$$\Big\|T(t)F-\int F(\mu)\,\nu(d\mu)\Big\| \le e^{-\lambda t}\Big\|F-\int F(\mu)\,\nu(d\mu)\Big\| + \int_0^t e^{-\lambda(t-s)}\Big\|T(s)\psi-\int\psi(\mu)\,\nu(d\mu)\Big\|\,ds.$$
The first term on the right-hand side clearly vanishes as $t\to\infty$. For the latter, we can iterate the above inequality, relying on the fact that the iteration terminates when the degree reaches 1. When $K=1$, say $\psi(\mu)=\langle f_{i1},\mu_i\rangle$, Lemma 8 applies with $\lambda=\lambda_i\varpi_i/2$. Since $\langle f_{i1},\mu_0\rangle$ does not change under $T(s)$, we have
$$T(t)\psi = e^{-\lambda t}\psi + \int_0^t e^{-\lambda(t-s)}T(s)\langle\lambda f_{i1},\mu_0\rangle\,ds = e^{-\lambda t}\psi + \int_0^t e^{-\lambda(t-s)}\langle\lambda f_{i1},\mu_0\rangle\,ds,$$
whereas
$$\int\psi(\mu)\,\nu(d\mu) = e^{-\lambda t}\int\psi(\mu)\,\nu(d\mu) + \int_0^t e^{-\lambda(t-s)}\langle\lambda f_{i1},\mu_0\rangle\,ds,$$
so that
$$\Big\|T(t)\psi-\int\psi(\mu)\,\nu(d\mu)\Big\| = e^{-\lambda t}\Big\|\psi-\int\psi(\mu)\,\nu(d\mu)\Big\|\to0$$
as $t\to\infty$.

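Define a linear map $\mathsf{P}$ on $C(\mathcal{P}([0,1])^M)$ by
$$\mathsf{P}F = \int F(\mu)\,\nu(d\mu),$$
i.e., $\mathsf{P}$ sends $F\in C(\mathcal{P}([0,1])^M)$ to a constant function; more generally, if $F\in C(\mathcal{P}([0,1])^{M+1})$, $\mathsf{P}F$ is a function of $\mu_0$ alone. In particular,
$$(\mathsf{P}F)(\mu_0) = E[F(\mu_0,G_1(t),\ldots,G_M(t))\mid G_i(0)\sim\nu_i],$$
so that applying the operator $\mathsf{P}$ is equivalent to conditioning on the islands being at their stationary state. Note that $\mathsf{P}^2=\mathsf{P}$, so that $\mathsf{P}$ is a projection. Moreover,
$$\mathsf{P}(\mathcal{G}F) = \int(\mathcal{G}F)\,\nu(d\mu) = 0,$$
so the range of $\mathcal{G}$ is contained in the null space of $\mathsf{P}$, $\mathcal{R}(\mathcal{G})\subseteq\mathcal{N}(\mathsf{P})$, whereas $\mathcal{G}\mathbf{1}=0$, so that $\mathcal{R}(\mathsf{P})\subseteq\mathcal{N}(\mathcal{G})$. In fact, we have: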

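Lemma 9. $\mathsf{P}$ is the spectral projection onto $\mathcal{N}(\mathcal{G})$.

Proof. By definition, the spectral projection onto $\mathcal{N}(\mathcal{G})$, $Q$, is the residue of the resolvent of $\mathcal{G}$ at $\lambda=0$:
$$Q = \lim_{\lambda\to0^+}\lambda(\lambda-\mathcal{G})^{-1} = \lim_{\lambda\to0^+}\lambda\int_0^\infty e^{-\lambda t}T(t)\,dt.$$
Now, fix $\varepsilon>0$ and choose $t_0>0$ so that $\|T(t)-\mathsf{P}\|<\varepsilon$ for $t>t_0$. Then, for $\lambda>0$,
$$
\begin{aligned}
\Big\|\lambda\int_0^\infty e^{-\lambda t}T(t)\,dt-\mathsf{P}\Big\| &= \Big\|\lambda\int_0^\infty e^{-\lambda t}\big(T(t)-\mathsf{P}\big)\,dt\Big\|\\
&\le \lambda\int_0^\infty e^{-\lambda t}\|T(t)-\mathsf{P}\|\,dt\\
&= \lambda\int_0^{t_0}e^{-\lambda t}\|T(t)-\mathsf{P}\|\,dt + \lambda\int_{t_0}^\infty e^{-\lambda t}\|T(t)-\mathsf{P}\|\,dt\\
&\le \lambda t_0\sup_{t\le t_0}\|T(t)-\mathsf{P}\| + \varepsilon.
\end{aligned}
$$
Since $\|T(t)-\mathsf{P}\|$ is a continuous function of $t$, it is bounded on $[0,t_0]$. Thus the first term vanishes as $\lambda\to0^+$, whereas $\varepsilon$ can be chosen arbitrarily small. We conclude $Q=\mathsf{P}$.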
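Lemma 9 has a transparent finite-dimensional analogue, offered here only as an illustration. Consider an irreducible continuous-time Markov chain on finitely many states with rate matrix $A$ (rows summing to zero). As $\lambda\to0^+$, the Abel mean $\lambda(\lambda I-A)^{-1}$ converges to the projection $\mathbf{1}\pi^\top$, where $\pi$ is the stationary distribution. This projection also satisfies $A\mathbf{1}\pi^\top=0$ and $\mathbf{1}\pi^\top A=0$. These are the analogues of $\mathcal{R}(\mathsf{P})\subseteq\mathcal{N}(\mathcal{G})$ and $\mathsf{P}\mathcal{G}=0$. The three-state rate matrix below is an arbitrary example.

```python
import numpy as np


def abel_mean(A, lam):
    """lam * (lam I - A)^{-1}, i.e. lam * int_0^inf e^{-lam t} exp(tA) dt, for a rate matrix A."""
    n = A.shape[0]
    return lam * np.linalg.inv(lam * np.eye(n) - A)


def stationary_projection(A):
    """Return 1 pi^T, where pi solves pi^T A = 0 with entries summing to one."""
    n = A.shape[0]
    system = np.vstack([A.T, np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    return np.outer(np.ones(n), pi)


# An arbitrary irreducible three-state rate matrix (rows sum to zero).
A = np.array([[-1.0, 0.6, 0.4],
              [0.3, -0.5, 0.2],
              [0.5, 0.5, -1.0]])
P = stationary_projection(A)

for lam in (1e-1, 1e-3, 1e-6):
    print(lam, np.abs(abel_mean(A, lam) - P).max())

assert np.allclose(P @ P, P)         # P is a projection
assert np.allclose(A @ P, 0.0)       # the range of P lies in the null space of A
assert np.allclose(P @ A, 0.0)       # P annihilates the range of A
assert np.abs(abel_mean(A, 1e-6) - P).max() < 1e-4
```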

With this, we are able to obtain our final result.

Proposition 3. Assume, as before, that
$$\lim_{N\to\infty}\frac{c_N}{a_N}=0.$$


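Let $\mathsf{P}$ be the projection defined above. Define an operator $\mathcal{G}_0$ on $\mathcal{C}_0$ by
$$(\mathcal{G}_0F)(\mu) = \sum_{q=1}^{K_0}\frac{\theta}{2}\langle f_{0q},\lambda-\mu_0\rangle\prod_{\substack{k=1\\ k\neq q}}^{K_0}\langle f_{0k},\mu_0\rangle + \frac12\sum_{q\neq r}\ \prod_{\substack{k=1\\ k\neq q,r}}^{K_0}\langle f_{0k},\mu_0\rangle\big(\langle f_{0q}f_{0r},\mu_0\rangle-\langle f_{0q},\mu_0\rangle\langle f_{0r},\mu_0\rangle\big),\tag{11}$$
and let $T_0(t)$ be the semigroup generated by $\mathsf{P}\mathcal{G}_0$. Then, for all $F\in C(\mathcal{P}([0,1])^M)$ and all $\delta\in(0,1)$, we have
$$\big(I+\mathcal{G}^{(N)}\big)^{\lfloor c_N^{-1}t\rfloor}F\to T_0(t)\mathsf{P}F$$
uniformly in $\delta\le t\le\delta^{-1}$. If, in addition, we assume that $G_i(0)\sim\nu_i$ for all $i=1,\ldots,M$, and $G_0(t)$ is a stochastic process with generator $\mathcal{G}_0$, then
$$G^{(N)}(\lfloor c_N^{-1}t\rfloor)\Rightarrow G(t) = G_0(t)\otimes G_1(t)\otimes\cdots\otimes G_M(t),$$
where the processes $G_i(t)$ are stationary for all $i=1,\ldots,M$.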

Remark 10. The heuristic understanding of Proposition 3 is that
$$\mathcal{G}^{(N)} = c_N^{-1}a_N\,\mathcal{G} + c_N^{-1}\mathcal{G}_0 + \text{lower order terms},$$
where $\mathcal{G}\mathsf{P}\equiv0$. Now $c_N^{-1}a_N\to\infty$ as $N\to\infty$, so the first term dominates. The factor $c_N^{-1}a_N$ is essentially the rate at which the first term shapes the dynamics of the process. As $N$ grows large, the first term, which acts only on the islands, therefore causes them to rapidly approach their equilibrium state, which, as we have already seen, corresponds to projection by $\mathsf{P}$. The first term, however, has no effect on the mainland. Moreover, the mainland only changes at the slower rate $c_N^{-1}$. Thus, the first term has already forced the faster terms to equilibrium, and we can assume that they are at equilibrium when we consider the mainland. Finally, the first two terms completely specify the limit, so what remains can only contribute a higher-order correction. This is essentially the infinite-dimensional analogue of the following simple dynamical system:
$$\dot x = -Nax + f(x,y),\qquad \dot y = -\sqrt{N}\,by + g(x,y),$$
for $a,b>0$. Using variation of constants, we have
$$
\begin{aligned}
x(t) &= e^{-Nat}x(0) + \int_0^t e^{-Na(t-s)}f(x(s),y(s))\,ds,\\
y(t) &= e^{-\sqrt{N}bt}y(0) + \int_0^t e^{-\sqrt{N}b(t-s)}g(x(s),y(s))\,ds.
\end{aligned}
$$
Thus, provided $f$ and $g$ are bounded,
$$\Big|\int_0^t e^{-Na(t-s)}f(x(s),y(s))\,ds\Big| \le \frac{1}{Na}\|f\|$$
and
$$\Big|\int_0^t e^{-\sqrt{N}b(t-s)}g(x(s),y(s))\,ds\Big| \le \frac{1}{\sqrt{N}b}\|g\|,$$


so that as $N\to\infty$, we have $x(t)=0+O(1/N)$. We can thus substitute this back into the equation for $y(t)$ to conclude that
$$y(t) = e^{-\sqrt{N}bt}y(0) + \int_0^t e^{-\sqrt{N}b(t-s)}g(0,y(s))\,ds + O\Big(\frac1N\Big)$$
(setting $x(t)\equiv0$ is equivalent to the action of the projection $\mathsf{P}$). Thus, similarly, $y(t)=0+O(1/\sqrt{N})$.
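The toy system can also be integrated directly. The sketch below uses hypothetical choices purely for illustration: $a=b=1$, $f(x,y)=\sin y$ and $g(x,y)=\cos x$ (both bounded by 1), and $x(0)=y(0)=1$. It applies explicit Euler with a step small compared with $1/(Na)$. After a short transient, $|x(t)|$ stays below $\|f\|/(Na)$, and $y(t)$ settles near $g(0,y)/(\sqrt{N}b)=1/\sqrt{N}$. This is consistent with $x=O(1/N)$ and $y=O(1/\sqrt{N})$.

```python
import math


def simulate(N, a=1.0, b=1.0, t_end=1.0, dt=1e-5, x0=1.0, y0=1.0):
    """Explicit Euler for x' = -N a x + sin(y), y' = -sqrt(N) b y + cos(x).

    The right-hand sides f = sin(y) and g = cos(x) are illustrative bounded choices.
    Returns a list of (t, x, y) after each step.
    """
    x, y = x0, y0
    path = []
    for step in range(1, round(t_end / dt) + 1):
        # simultaneous update: both increments use the values from the previous step
        x, y = (x + dt * (-N * a * x + math.sin(y)),
                y + dt * (-math.sqrt(N) * b * y + math.cos(x)))
        path.append((step * dt, x, y))
    return path


for N in (100, 10_000):
    path = simulate(N)
    late = [(x, y) for t, x, y in path if t >= 0.5]
    max_x = max(abs(x) for x, _ in late)
    y_end = late[-1][1]
    print(N, max_x, 1.0 / N, y_end * math.sqrt(N))
```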

Remark 11. It is necessary to assume $G_i(0)\sim\nu_i$ to obtain continuity of $T_0(t)\mathsf{P}$ at $t=0$, which in turn is required to ensure weak convergence. More generally, Proposition 3 tells us that on the slow timescale, the island demes instantaneously jump to their stationary states, and henceforth evolve as stationary processes; see [16] and [5] for more detailed discussions of processes with this behaviour.

Proof. Calculations essentially identical to those in Proposition 1 show that, when restricted to $\mathcal{C}_0$, $c_N^{-1}\mathcal{G}^{(N)}=\mathcal{G}_0+o(c_N)$. The primary difference lies in the operator $Q_0^{(N)}$. Here,
$$Q_0^{(N)} = I + c_NB_0^{(N)} + c_NB + o(c_N),$$
where, as before,
$$(B_0^{(N)}f)(x) = \frac{\varpi_i}{2}\big(\langle f,\mu_0\rangle - f(x)\big) + o(1),$$
but now
$$(Bf)(x) = \frac{\theta}{2}\Big(\int_0^1 f(y)\,dy - f(x)\Big) = \frac{\theta}{2}\big(\langle f,\lambda\rangle - f(x)\big)$$
is of the same asymptotic order (recall that $\lambda$ is Lebesgue measure, $\lambda(dx)=dx$). Moreover, we now only consider terms of the form $\langle Q_0^{(N)}f_{0k},\mu_0\rangle$, and
$$\langle B_0^{(N)}f_{0k},\mu_0\rangle = \frac{\varpi_i}{2}\big(\langle f_{0k},\mu_0\rangle - \langle f_{0k},\mu_0\rangle\big) + o(1),$$
which vanishes in the limit. Thus,
$$c_N^{-1}\big(\langle Q_0^{(N)}f_{0k},\mu_0\rangle - \langle f_{0k},\mu_0\rangle\big) = \frac{\theta}{2}\Big(\int_0^1 f_{0k}(y)\,dy - \langle f_{0k},\mu_0\rangle\Big) + o(1) = \frac{\theta}{2}\langle f_{0k},\lambda-\mu_0\rangle + o(1),$$
giving the corresponding terms in the generator (11). The first statement is then a restatement of Corollary 7.7, Chapter 1 of [6]; translating our notation into theirs, we have
$$\varepsilon_N = c_N,\qquad \alpha_N = c_N^{-1}a_N,\qquad A_N = c_N^{-1}\mathcal{G}_N,\qquad B=\mathcal{G},\qquad A=\mathcal{G}_0.$$
That $\mathcal{G}_0$ generates a strongly continuous semigroup is Theorem 4.1, Chapter 10 of [6], which we used previously.

The second statement is a consequence of Corollary 8.9, Chapter 4 of [6], where our initial condition ensures continuity of the semigroup $T_0(t)$ at $t=0$.

2 Gibbs Sampling for the UNTB-HDP

2.1 Observed abundances

The observed data take the form of an $N\times S$ matrix of counts $X$ whose elements $x_{ij}$ are the observed frequencies of species $j$ in community sample $i$. Here, $N$ denotes the total number of communities and $S$ the total number of different species found in those communities. We will also denote the row vectors of $X$, which give the observed frequency distribution of species in each individual sample, by $X_i$, $i=1,\ldots,N$. The size of each sample is simply $J_i=\sum_{j=1}^{S}x_{ij}$.


2.2 Neutral-HDP model

$$\beta\mid\theta\sim\mathrm{Stick}(\theta),\tag{12}$$
$$\pi_i\mid I_i,\beta\sim\mathrm{DP}(I_i,\beta),\tag{13}$$
$$X_i\mid\pi_i,J_i\sim\mathrm{MN}(J_i,\pi_i).\tag{14}$$

This model for the observed frequencies can be interpreted as the generation of an infinite-dimensional metacommunity distribution $\beta$, which is obtained from a stick-breaking or GEM distribution with concentration parameter $\theta$. From this, for each community $i$ we use the Dirichlet process to sample a vector of taxa probabilities $\pi_i$. This vector has concentration $I_i$, the immigration rate for that site, and base distribution $\beta$. Finally, we sample the observed frequencies for each community, $X_i$, from $\pi_i$ using the multinomial distribution. We also include gamma hyperpriors for $\theta$ and the $I_i$:

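$$\theta \mid \alpha, \zeta \sim \mathrm{Gamma}(\alpha, \zeta), \tag{15}$$

$$I_i \mid \eta, \nu \sim \mathrm{Gamma}(\eta, \nu), \tag{16}$$

where $\alpha$, $\zeta$, $\eta$ and $\nu$ are all constants, and the Gamma distributions are parameterised by shape and rate. This completes the definition of our model.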
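As an illustration of the generative structure in Equations 12–16, the following minimal Python sketch (standard library only) forward-simulates count data. It assumes a hypothetical truncation level `K` for the stick-breaking construction and lumps the leftover mass into one extra class, in the spirit of the finite-dimensional representation of Section 2.3; it also takes $\theta$, the $I_i$ and the $J_i$ as given rather than drawing them from the hyperpriors. The function name `sample_neutral_hdp` is illustrative only and is not the authors' code.

```python
import random


def sample_neutral_hdp(theta, I, J, K=200, rng=None):
    """Forward-simulate the Neutral-HDP model of Eqs. (12)-(14).

    theta : biodiversity parameter of the stick-breaking prior
    I     : list of immigration rates I_i, one per community
    J     : list of sample sizes J_i, one per community
    K     : truncation level of the stick-breaking construction (an assumption)
    Returns (beta, X): beta has K + 1 entries, the last one lumping all
    species beyond the truncation; X is a list of count rows.
    """
    if rng is None:
        rng = random.Random()
    # Truncated GEM(theta): V_k ~ Beta(1, theta), beta_k = V_k * prod_{l<k} (1 - V_l)
    beta, remaining = [], 1.0
    for _ in range(K):
        v = rng.betavariate(1.0, theta)
        beta.append(remaining * v)
        remaining *= 1.0 - v
    beta.append(remaining)  # leftover stick mass (unrepresented species)

    X = []
    for I_i, J_i in zip(I, J):
        # DP(I_i, beta) restricted to these K + 1 classes is Dir(I_i * beta)
        g = [rng.gammavariate(I_i * b, 1.0) if b > 0 else 0.0 for b in beta]
        total = sum(g)
        pi = [x / total for x in g]
        # Multinomial(J_i, pi_i), built from J_i categorical draws
        counts = [0] * len(pi)
        for k in rng.choices(range(len(pi)), weights=pi, k=J_i):
            counts[k] += 1
        X.append(counts)
    return beta, X
```

For example, `sample_neutral_hdp(5.0, [111.0, 249.75, 666.0], [1000, 1000, 1000], rng=random.Random(0))` draws three local samples using the immigration rates of data set 1 in Table 1.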

2.3 Finite dimensional representation

In any given sample, although the potential number of species is infinite, we observe only $S$ different types. It is therefore convenient to represent the model in terms of these finitely many types plus one further class corresponding to all unobserved species. We derive this as the limit of $L$ total types as $L \to \infty$. We represent the proportions of the $S$ observed species explicitly as $\beta_k$, $k = 1, \ldots, S$, and the unrepresented component as $\beta_u = \sum_{k=S+1}^{L} \beta_k$. Let $\theta_r = \theta/L$ and $\theta_u = \theta(L - S)/L$; then we have a Dirichlet prior $\beta \sim \mathrm{Dir}(\theta_r, \ldots, \theta_r, \theta_u)$. In this finite-dimensional representation we can also determine the distributions in the local communities:

$$\pi_i \sim \mathrm{Dir}(I_i\beta_1, \ldots, I_i\beta_S, I_i\beta_u). \tag{17}$$

We can then marginalise over the local community distributions and derive the probability of the observed frequencies given $\beta$:

$$P(X \mid \beta, I_1, \ldots, I_N) = \prod_{i=1}^{N} \frac{J_i!}{x_{i1}! \cdots x_{iS}!}\,\frac{\Gamma(I_i)}{\Gamma(J_i + I_i)} \prod_{j=1}^{S} \frac{\Gamma(x_{ij} + I_i\beta_j)}{\Gamma(I_i\beta_j)}. \tag{18}$$
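Equation 18 is a product of Dirichlet-multinomial likelihoods, so it can be evaluated stably on the log scale with `math.lgamma`. The sketch below is an illustration that assumes the counts are held as plain Python lists; the name `log_lik` is hypothetical.

```python
import math


def log_lik(X, beta, I):
    """Log of Eq. (18) for count rows X.

    X    : list of N rows, each holding the S counts x_ij
    beta : observed-species proportions beta_1..beta_S (beta_u contributes
           no factor because every x_iu = 0)
    I    : list of N immigration rates I_i
    """
    total = 0.0
    for row, I_i in zip(X, I):
        J_i = sum(row)
        # multinomial coefficient J_i! / (x_i1! ... x_iS!)
        total += math.lgamma(J_i + 1) - sum(math.lgamma(x + 1) for x in row)
        # Gamma(I_i) / Gamma(J_i + I_i)
        total += math.lgamma(I_i) - math.lgamma(J_i + I_i)
        for x, b in zip(row, beta):
            if x > 0:  # Gamma(x + a) / Gamma(a) equals 1 when x = 0
                total += math.lgamma(x + I_i * b) - math.lgamma(I_i * b)
    return total
```

A quick sanity check of the formula: with a single individual ($J_i = 1$) the probability of observing species $j$ reduces to $\beta_j$.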

2.4 Gibbs sampling

To devise a Gibbs sampling strategy we need to determine the full conditional distributions of the parameters we wish to sample, $\theta$ and $I_i$, for $i = 1, \ldots, N$. Our starting point will be the joint distribution of these parameters and the data, that is, Equation 18 multiplied by the prior distributions for $\beta$, $\theta$ and $I_i$, marginalised over $\beta$:

$$P(\theta, I_1, \ldots, I_N, X) = \int P(X \mid \beta, I_1, \ldots, I_N)\, P(\beta \mid \theta)\, d\beta \;\, \mathrm{Gamma}(\theta \mid \alpha, \zeta) \prod_{i=1}^{N} \mathrm{Gamma}(I_i \mid \eta, \nu). \tag{19}$$

The key to simplifying this expression is to expand the terms $\Gamma(x_{ij} + I_i\beta_j)/\Gamma(I_i\beta_j)$ in Equation 18 as polynomials [14]:

$$\frac{\Gamma(x_{ij} + I_i\beta_j)}{\Gamma(I_i\beta_j)} = \sum_{T_{ij}=0}^{x_{ij}} s(x_{ij}, T_{ij})\,(I_i\beta_j)^{T_{ij}}, \tag{20}$$


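where the coefficients $s(x_{ij}, T_{ij})$ are unsigned Stirling numbers of the first kind. We substitute these sums into Equation 19 and then introduce the $T_{ij}$ and $\beta$ as auxiliary variables to give:

$$Q(\theta, \beta, I_1, \ldots, I_N, T_{ij}) \propto \left( \prod_{i=1}^{N} \frac{J_i!}{x_{i1}! \cdots x_{iS}!}\,\frac{\Gamma(I_i)}{\Gamma(J_i + I_i)} \prod_{j=1}^{S} s(x_{ij}, T_{ij})\,(I_i\beta_j)^{T_{ij}} \right) P(\beta \mid \theta)\, \mathrm{Gamma}(\theta \mid \alpha, \zeta) \prod_{i=1}^{N} \mathrm{Gamma}(I_i \mid \eta, \nu). \tag{21}$$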

2.4.1 Full conditional for the ancestral states

From Equation 21, we see that the full conditional distribution for the number of ancestors (tables in the Chinese restaurant franchise analogy) of species $j$ in sample $i$ is given by:

$$P(T_{ij} \mid x_{ij}, I_i, \beta_j) \propto s(x_{ij}, T_{ij})\,(I_i\beta_j)^{T_{ij}}. \tag{22}$$

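Summing Equation 22 over $T_{ij} = 0, \ldots, x_{ij}$ recovers the right-hand side of Equation 20, so the reciprocal of the left-hand side of Equation 20 is the normalising constant of this distribution, and thus:

$$P(T_{ij} \mid x_{ij}, I_i, \beta_j) = \frac{\Gamma(I_i\beta_j)}{\Gamma(x_{ij} + I_i\beta_j)}\, s(x_{ij}, T_{ij})\,(I_i\beta_j)^{T_{ij}}. \tag{23}$$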
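Sampling $T_{ij}$ from Equation 23 only requires a row of unsigned Stirling numbers, which can be built with the standard recurrence $s(m+1, k) = m\,s(m, k) + s(m, k-1)$. The sketch below is one possible implementation (all function names are hypothetical); it combines exact integer Stirling numbers with log-scale arithmetic, which is adequate for moderate counts but may become slow for very large $x_{ij}$.

```python
import math
import random


def stirling1_unsigned(n):
    """Return [s(n, 0), ..., s(n, n)], unsigned Stirling numbers of the first kind.

    Built with the recurrence s(m + 1, k) = m * s(m, k) + s(m, k - 1),
    using exact Python integers.
    """
    row = [1]  # s(0, 0) = 1
    for m in range(n):
        row = [(m * row[k] if k <= m else 0) + (row[k - 1] if k >= 1 else 0)
               for k in range(m + 2)]
    return row


def t_probs(x, a):
    """Eq. (23): P(T_ij = t | x_ij = x, I_i * beta_j = a) for t = 0, ..., x."""
    s = stirling1_unsigned(x)
    log_norm = math.lgamma(x + a) - math.lgamma(a)  # log of the left-hand side of Eq. (20)
    return [math.exp(math.log(s[t]) + t * math.log(a) - log_norm) if s[t] > 0 else 0.0
            for t in range(x + 1)]


def sample_t(x, a, rng):
    """Draw the number of ancestors T_ij from Eq. (23)."""
    return rng.choices(range(x + 1), weights=t_probs(x, a), k=1)[0]
```

As a design note, an equivalent way to draw $T_{ij}$ that avoids Stirling numbers altogether is to add up independent Bernoulli variables with success probabilities $a/(a + m)$ for $m = 0, \ldots, x_{ij} - 1$, where $a = I_i\beta_j$; this is the number of occupied tables in a Chinese restaurant process with concentration $a$.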

2.4.2 Full conditional for the metapopulation

In their derivation of a posterior sampling scheme for the hierarchical Dirichlet process mixture model using an augmented Chinese restaurant franchise representation, [14] showed that the full conditional distribution for the metapopulation vector $\beta$ is:

$$\beta = (\beta_1, \beta_2, \ldots, \beta_S, \beta_u) \sim \mathrm{Dir}(T_{\cdot 1}, T_{\cdot 2}, \ldots, T_{\cdot S}, \theta), \tag{24}$$

where $T_{\cdot j} = \sum_{i=1}^{N} T_{ij}$.
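A draw from Equation 24 can be obtained in the usual way by normalising independent Gamma variates. A minimal sketch, assuming the $T_{ij}$ are stored as an $N \times S$ nested list and using the hypothetical name `sample_beta`:

```python
import random


def sample_beta(T, theta, rng):
    """Draw (beta_1, ..., beta_S, beta_u) from the Dirichlet in Eq. (24).

    T     : N x S nested list of ancestor counts T_ij; every column sum is
            assumed positive (each observed species has at least one ancestor)
    theta : biodiversity parameter, the weight of the unobserved class
    """
    S = len(T[0])
    T_col = [sum(row[j] for row in T) for j in range(S)]  # T_.j = sum_i T_ij
    # Dirichlet(a_1, ..., a_K) = normalised independent Gamma(a_k, 1) variates
    g = [rng.gammavariate(a, 1.0) for a in T_col + [theta]]
    total = sum(g)
    return [x / total for x in g]
```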

2.4.3 Full conditional for the immigration rates

To derive the full conditional distribution of each $I_i$ given the other parameters, we simply pull out all terms that depend on $I_i$ from Equation 21. This gives:

$$P(I_i \mid T_{ij}) \propto \frac{\Gamma(I_i)}{\Gamma(J_i + I_i)}\, I_i^{T_{i\cdot}}\, \mathrm{Gamma}(I_i \mid \eta, \nu), \tag{25}$$

where $T_{i\cdot} = \sum_{j=1}^{S} T_{ij}$. We can use the auxiliary variable approach of [17] to develop a Gibbs sampling update for $I_i$, $i = 1, \ldots, N$. Here, for each $i$, we can write:

$$\frac{\Gamma(I_i)}{\Gamma(I_i + J_i)} = \frac{1}{\Gamma(J_i)} \int_0^1 w_i^{I_i} (1 - w_i)^{J_i - 1} \left(1 + \frac{J_i}{I_i}\right) dw_i \tag{26}$$

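(cf. equation (A.2) of [14]); the identity follows from the Beta integral $\int_0^1 w^{I_i}(1-w)^{J_i-1}\,dw = \Gamma(I_i+1)\Gamma(J_i)/\Gamma(I_i+J_i+1)$ together with $1 + J_i/I_i = (I_i + J_i)/I_i$. We now define auxiliary variables $w = (w_i)_{i=1}^{N}$ and $s = (s_i)_{i=1}^{N}$, where each $w_i$ takes values in $[0, 1]$ and each $s_i$ is a binary $\{0, 1\}$ variable, and define the following distribution:

$$q(I_1, \ldots, I_N, w, s) \propto \prod_{i=1}^{N} I_i^{\eta - 1 + T_{i\cdot}}\, e^{-\nu I_i}\, w_i^{I_i} (1 - w_i)^{J_i - 1} \left(\frac{J_i}{I_i}\right)^{s_i} \tag{27}$$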

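(cf. equation (A.3) of [14]). Marginalising $q$ over $w_i$ and $s_i$ (integrating over $w_i$ and summing over $s_i$) gives the desired conditional distribution for $I_i$ in Equation 25. Hence $q$ defines an auxiliary variable sampling scheme for $I_i$. Given $w$ and $s$, we have:

$$q(I_i \mid w, s) \propto I_i^{\eta - 1 + T_{i\cdot} - s_i}\, e^{-I_i(\nu - \log w_i)}, \tag{28}$$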


which is a Gamma distribution with shape $\eta + T_{i\cdot} - s_i$ and rate $\nu - \log w_i$ (cf. equation (A.4) of [14]). Given $I_i$, the $w_i$ and $s_i$ are conditionally independent, with distributions:

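$$q(w_i \mid I_i) \propto w_i^{I_i} (1 - w_i)^{J_i - 1} \tag{29}$$

and

$$q(s_i \mid I_i) \propto \left(\frac{J_i}{I_i}\right)^{s_i}, \tag{30}$$

which are $\mathrm{Beta}(I_i + 1, J_i)$ and $\mathrm{Bernoulli}\big(J_i/(J_i + I_i)\big)$, respectively (cf. equations (A.5) and (A.6) of [14]).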
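Putting Equations 28–30 together, one update of $I_i$ draws $w_i$ and $s_i$ given the current $I_i$ and then draws a new $I_i$ from the Gamma distribution in Equation 28. The Python sketch below assumes the shape/rate parameterisation used in Equation 27; because `random.gammavariate` takes a scale parameter, the rate is inverted. The function name is illustrative.

```python
import math
import random


def update_immigration(I, J, T_i, eta, nu, rng):
    """One auxiliary-variable Gibbs update of I_i following Eqs. (28)-(30).

    I       : current value of I_i
    J       : sample size J_i (assumed >= 1)
    T_i     : total number of ancestors T_i. = sum_j T_ij
    eta, nu : shape and rate of the Gamma prior on I_i, Eq. (16)
    """
    w = rng.betavariate(I + 1.0, J)              # Eq. (29): w_i ~ Beta(I_i + 1, J_i)
    s = 1 if rng.random() < J / (J + I) else 0   # Eq. (30): s_i ~ Bernoulli(J_i / (J_i + I_i))
    shape = eta + T_i - s                        # Eq. (28): Gamma shape
    rate = nu - math.log(w)                      # Eq. (28): Gamma rate
    return rng.gammavariate(shape, 1.0 / rate)   # random.gammavariate expects a scale
```

In a full sampler this update would be applied once per site in each sweep, after the $T_{ij}$ have been resampled, so that $T_{i\cdot}$ is current.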

2.4.4 Full conditional for the biodiversity parameter

A direct consequence of the stick-breaking prior for $\beta$ is that the probability of observing $S$ species from a total number of $T = \sum_{i=1}^{N}\sum_{j=1}^{S} T_{ij}$ ancestors is given by:

$$P(S \mid \theta, T) = s(T, S)\,\theta^{S}\,\frac{\Gamma(\theta)}{\Gamma(\theta + T)} \tag{31}$$

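(cf. equation (A.7) of [14]). The biodiversity parameter $\theta$ does not govern any other aspect of the joint distribution in Equation 21, hence Equation 31, along with the prior for $\theta$ in Equation 15, is all that is needed to derive a Gibbs sampling update for $\theta$. The auxiliary variable approach of [17] can also be applied here, which leads to the following auxiliary variable sampling scheme for $\theta$:

$$\theta \mid \rho, \phi, S \sim \mathrm{Gamma}(\alpha + S - \rho, \zeta - \log\phi), \tag{32}$$

$$\rho \mid \theta, T \sim \mathrm{Bernoulli}\left(\frac{T}{T + \theta}\right), \tag{33}$$

$$\phi \mid \theta, T \sim \mathrm{Beta}(\theta + 1, T). \tag{34}$$

This scheme has exactly the same form as the update for $I_i$ in Equations 28–30, with $(I_i, J_i, T_{i\cdot}, \eta, \nu)$ replaced by $(\theta, T, S, \alpha, \zeta)$.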
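The sketch below evaluates Equation 31 and performs one update of $\theta$ via Equations 32–34. Function names are hypothetical, Gamma distributions are again taken in shape/rate form, and the exact Stirling recurrence is only practical for moderate $T$.

```python
import math
import random


def log_stirling1_unsigned(n, k):
    """log s(n, k) via the integer recurrence; returns -inf when s(n, k) = 0."""
    row = [1]  # s(0, 0) = 1
    for m in range(n):
        row = [(m * row[j] if j <= m else 0) + (row[j - 1] if j >= 1 else 0)
               for j in range(m + 2)]
    return math.log(row[k]) if row[k] > 0 else float("-inf")


def log_p_species(S, theta, T):
    """Eq. (31): log s(T, S) + S log(theta) + log Gamma(theta) - log Gamma(theta + T)."""
    return (log_stirling1_unsigned(T, S) + S * math.log(theta)
            + math.lgamma(theta) - math.lgamma(theta + T))


def update_theta(theta, S, T, alpha, zeta, rng):
    """One auxiliary-variable Gibbs update of theta, Eqs. (32)-(34)."""
    phi = rng.betavariate(theta + 1.0, T)                  # Eq. (34)
    rho = 1 if rng.random() < T / (T + theta) else 0       # Eq. (33)
    shape, rate = alpha + S - rho, zeta - math.log(phi)    # Eq. (32)
    return rng.gammavariate(shape, 1.0 / rate)             # gammavariate expects a scale
```

Summing Equation 31 over $S = 1, \ldots, T$ gives 1, and its mean is the familiar expected number of tables $\sum_{m=0}^{T-1} \theta/(\theta + m)$, which makes a convenient check on the implementation.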

2.5 Results

In order to examine how well our HDP estimation approach performed in comparison with existing methods [18–20], we used a combination of simulated data and real data that had been analysed before. First, we generated 1,000 simulated data sets, each consisting of three local samples of 1,000 individuals, for each of the eight parameter combinations given in Table 1. Note that the migration probability is simply $m_i = I_i/(I_i + J_i - 1)$. These data sets were generated using the PARI/GP code provided in [18], which implements an urn algorithm based on coalescence theory. We then estimated the parameters using the Gibbs sampling approach based on the HDP approximation and the approximate two-stage approach of [19]. Tables 2 and 3 give the means, coefficients of variation and mean absolute deviations from the true values for our approach and Etienne’s two-stage approximate method, respectively, across the 1,000 data sets for each parameter combination.
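The relation between the migration probability and the immigration rate is easy to invert, $I_i = m_i(J_i - 1)/(1 - m_i)$, and with $J_i = 1000$ this reproduces the immigration rates listed in Table 1 from the migration probabilities listed there. A small sketch (helper names hypothetical):

```python
def migration_probability(I, J):
    """m_i = I_i / (I_i + J_i - 1), the relation stated in Section 2.5."""
    return I / (I + J - 1)


def immigration_rate(m, J):
    """Inverse relation I_i = m_i (J_i - 1) / (1 - m_i), valid for 0 <= m_i < 1."""
    return m * (J - 1) / (1 - m)


if __name__ == "__main__":
    # Migration probabilities of Table 1 (J_i = 1000) mapped back to immigration rates
    for m in (0.1, 0.2, 0.4, 0.01, 0.05, 0.25, 0.001, 0.002, 0.004):
        print(m, round(immigration_rate(m, 1000), 4))
```

Running the script prints each migration probability alongside the corresponding immigration rate; these agree with the $I_1$, $I_2$ and $I_3$ columns of Table 1 to the precision shown there.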

For all parameter combinations considered, the HDP approximation outperforms Etienne’s approximation as an estimator of $\theta$: in each case the overall means are closer to the true values, and the coefficients of variation and mean absolute deviations from the true values are considerably smaller. The HDP approximation therefore provides a less biased and more reliable estimator of $\theta$ than Etienne’s approximation.

A similar pattern is observed with the estimates of the immigration probabilities $m_i$: for the parameter combinations considered, our approach gives lower coefficients of variation and mean absolute deviations from the true value than Etienne’s approximate method. Both approximations break down when the immigration rate $I$ is significantly larger than the fundamental biodiversity parameter $\theta$ (for example, see the estimates of $m_3$ for synthetic data sets 1–5 in Tables 2 and 3), but in different ways. Our method underestimates the immigration probability $m$ in such cases, but the standard deviation around that estimate remains low. Our estimator for $m$ is therefore biased when $I > \theta$, but as this bias is consistent it would be possible to correct for it. Etienne’s approximate approach, on the other hand, gives an overall mean over the 1,000 simulated data sets that is much closer to the true value in such a case than our method does. However, the variability around Etienne’s approximate estimate of $m$ is much higher, because the algorithm often converges to an immigration probability of 1 even when the true value is much lower. It is also worth noting that Etienne’s approximate method breaks down badly for data sets 7 and 8, where the immigration probabilities are very low, whereas the HDP approximation copes much better in such scenarios. We therefore conclude that the HDP approximation is a better estimator of the neutral model’s parameters than Etienne’s approximation unless $I \gg \theta$ and the immigration probabilities are close to 1.

| Data set | $J_i$ | $\theta$ | $I_1$ | $I_2$ | $I_3$ | $m_1$ | $m_2$ | $m_3$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 1000 | 5 | 111 | 249.75 | 666 | 0.1 | 0.2 | 0.4 |
| 2 | 1000 | 50 | 111 | 249.75 | 666 | 0.1 | 0.2 | 0.4 |
| 3 | 1000 | 500 | 111 | 249.75 | 666 | 0.1 | 0.2 | 0.4 |
| 4 | 1000 | 5 | 10.0909 | 52.5789 | 333 | 0.01 | 0.05 | 0.25 |
| 5 | 1000 | 50 | 10.0909 | 52.5789 | 333 | 0.01 | 0.05 | 0.25 |
| 6 | 1000 | 500 | 10.0909 | 52.5789 | 333 | 0.01 | 0.05 | 0.25 |
| 7 | 1000 | 5 | 1 | 2.002 | 4.012 | 0.001 | 0.002 | 0.004 |
| 8 | 1000 | 50 | 1 | 2.002 | 4.012 | 0.001 | 0.002 | 0.004 |

Table 1: The parameter values chosen for the synthetic neutral model data sets that composed our simulation study.

| Data set | $\hat{\theta}$ | CV | MAD | $m_1$ | CV | MAD | $m_2$ | CV | MAD | $m_3$ | CV | MAD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5.4092 | 0.20 | 0.8950 | 0.0934 | 0.29 | 0.0232 | 0.1508 | 0.23 | 0.0522 | 0.2002 | 0.19 | 0.1998 |
| 2 | 51.5476 | 0.09 | 3.9993 | 0.0990 | 0.14 | 0.0114 | 0.1923 | 0.15 | 0.0242 | 0.3262 | 0.12 | 0.0749 |
| 3 | 498.8622 | 0.07 | 25.8993 | 0.0999 | 0.08 | 0.0067 | 0.1982 | 0.07 | 0.0119 | 0.3836 | 0.07 | 0.0252 |
| 4 | 5.4477 | 0.22 | 1.0088 | 0.0110 | 0.42 | 0.0032 | 0.0526 | 0.36 | 0.0144 | 0.1417 | 0.26 | 0.1083 |
| 5 | 51.7504 | 0.12 | 4.8836 | 0.0101 | 0.21 | 0.0017 | 0.0504 | 0.17 | 0.0065 | 0.2211 | 0.16 | 0.0387 |
| 6 | 488.8805 | 0.10 | 40.7537 | 0.0100 | 0.17 | 0.0014 | 0.0503 | 0.10 | 0.0040 | 0.2495 | 0.09 | 0.0171 |
| 7 | 5.3388 | 0.46 | 1.8189 | 0.0014 | 0.96 | 0.0007 | 0.0030 | 0.98 | 0.0015 | 0.0066 | 0.95 | 0.0035 |
| 8 | 55.0994 | 0.43 | 17.2483 | 0.0010 | 0.44 | 0.0004 | 0.0022 | 0.34 | 0.0006 | 0.0043 | 0.29 | 0.0009 |

Table 2: Estimates of $\theta$ and $m_i$ from the various scenarios of simulated data sets of Table 1 using the hierarchical Dirichlet process approximation. The values reported are the means, coefficients of variation (CV) and mean absolute deviations from the true value (MAD) of the parameter estimates over 1,000 such data sets.

| Data set | $\hat{\theta}$ | CV | MAD | $m_1$ | CV | MAD | $m_2$ | CV | MAD | $m_3$ | CV | MAD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5.9130 | 0.40 | 1.9880 | 0.1899 | 1.45 | 0.1621 | 0.2763 | 1.14 | 0.2300 | 0.4057 | 1.10 | 0.3260 |
| 2 | 51.9033 | 0.20 | 8.2626 | 0.1071 | 0.44 | 0.0274 | 0.2239 | 0.56 | 0.0776 | 0.4231 | 0.48 | 0.1556 |
| 3 | 507.2382 | 0.12 | 50.4488 | 0.1006 | 0.09 | 0.0070 | 0.2010 | 0.09 | 0.0138 | 0.4032 | 0.12 | 0.0356 |
| 4 | 6.0710 | 0.45 | 2.1911 | 0.0410 | 3.62 | 0.0356 | 0.1177 | 1.88 | 0.1042 | 0.3086 | 1.11 | 0.2666 |
| 5 | 54.2026 | 0.29 | 12.6540 | 0.0102 | 0.55 | 0.0020 | 0.0580 | 0.83 | 0.0190 | 0.2897 | 0.72 | 0.1440 |
| 6 | 578.4131 | 0.36 | 166.5742 | 0.0100 | 0.18 | 0.0014 | 0.0503 | 0.13 | 0.0048 | 0.2601 | 0.34 | 0.0503 |
| 7 | 9.9517 | 1.41 | 6.5506 | 0.0164 | 7.03 | 0.0158 | 0.0348 | 4.69 | 0.0338 | 0.0473 | 3.88 | 0.0450 |
| 8 | 860.1590 | 7.00 | 824.9333 | 0.0011 | 1.61 | 0.0004 | 0.0022 | 0.73 | 0.0007 | 0.0075 | 6.32 | 0.0045 |

Table 3: Estimates of $\theta$ and $m_i$ from the various scenarios of simulated data sets of Table 1 using Etienne’s approximate method. The values reported are the means, coefficients of variation (CV) and mean absolute deviations from the true value (MAD) of the parameter estimates over 1,000 such data sets.

In Table 4, we present the average times in seconds taken by Etienne’s approximate method, using the code given in [19] with PARI/GP’s default settings, and by our Gibbs sampling approach, coded in C++ and run for 50,000 iterations with half of these conservatively discarded as burn-in. Under these settings, for all but one of the simulated data set scenarios of Table 1 (data set 3), Etienne’s approximate method is roughly one and a half to three times faster than our approach. However, we are being very conservative with the number of samples, and equivalent results could be achieved with as few as 10,000 iterations, in which case the two methods would be of comparable speed.

We were unable to replicate these results using Etienne’s ‘exact’ maximum likelihood method, so instead we quote in Table 5 those that he gave in a similar simulation study [20]. Etienne’s ‘exact’ method slightly outperforms the HDP approximation as an estimator of $\theta$: although the coefficients of variation are broadly similar, the overall means are generally closer to their true values, so Etienne’s ‘exact’ method is less biased for this parameter. Regarding the estimation of immigration probabilities, the results are comparable when $I \le \theta$. When $I > \theta$, there is a tendency for Etienne’s ‘exact’ method to overestimate the immigration probability, but not as badly as the HDP approximation underestimates it. The advantages of the HDP approximation are that it is easier to implement than the PARI/GP algorithm for Etienne’s ‘exact’ method, it is much faster, and it can handle the large data sets often encountered in microbiomics.

As an example of how the methods compare on real data, we reanalysed the tropical tree data set used as an example in [18–20]. The data consist of three forest plots in Panama, Barro Colorado Island (50 ha), Cocoli (4 ha) and Sherman (5.96 ha), which lie along a precipitation gradient [21]. Table 6 shows the results of the parameter estimation for Etienne’s three methods and our HDP approach. In this case the results from the HDP approximation closely match Etienne’s ‘exact’ method, while his approximate method overestimates $\theta$ and underestimates the immigration rates. The close agreement between our approach and Etienne’s ‘exact’ method is unsurprising, as in this case $\theta \gg I_i$.

| Data set | Etienne’s approximation (s) | HDP approximation (s) |
|---|---|---|
| 1 | 13.8583 | 40.6223 |
| 2 | 21.5615 | 41.1254 |
| 3 | 208.6595 | 41.5881 |
| 4 | 14.9588 | 41.8532 |
| 5 | 14.9767 | 40.6765 |
| 6 | 27.3442 | 42.4084 |
| 7 | 20.0091 | 56.1613 |
| 8 | 17.8649 | 57.5658 |

Table 4: Average time in seconds that Etienne’s approximate method and the HDP approximation took to run on the various scenarios of simulated data sets of Table 1. Note that the HDP approximation was run for 50,000 iterations and half of these were conservatively discarded as burn-in.

| Data set | $\theta$ | CV | $m_1$ | CV | $m_2$ | CV | $m_3$ | CV |
|---|---|---|---|---|---|---|---|---|
| 1 | 4.9689 | 0.21 | 0.1119 | 0.44 | 0.2353 | 0.49 | 0.4727 | 0.50 |
| 2 | 49.9838 | 0.10 | 0.1022 | 0.16 | 0.2041 | 0.16 | 0.4105 | 0.18 |
| 3 | 501.5142 | 0.07 | 0.1005 | 0.08 | 0.2009 | 0.08 | 0.4007 | 0.08 |
| 4 | 4.8982 | 0.25 | 0.0108 | 0.43 | 0.0572 | 0.46 | 0.3658 | 0.70 |
| 5 | 49.9892 | 0.12 | 0.0103 | 0.21 | 0.0513 | 0.16 | 0.2643 | 0.25 |
| 6 | 504.0792 | 0.11 | 0.0101 | 0.17 | 0.0504 | 0.11 | 0.2521 | 0.09 |
| 7 | 5.0388 | 0.45 | 0.0012 | 0.67 | 0.0027 | 1.27 | 0.0066 | 4.85 |
| 8 | 56.0378 | 0.55 | 0.0010 | 0.42 | 0.0020 | 0.35 | 0.0042 | 0.30 |

Table 5: Estimates of $\theta$ and $m_i$ from the various scenarios of simulated data sets of Table 1 using Etienne’s ‘exact’ maximum likelihood method. The values reported are the means and coefficients of variation (CV) over 1,000 such data sets, and were obtained from [20].

| Method | $\theta$ | $I_{\mathrm{BCI}}$ | $I_C$ | $I_S$ |
|---|---|---|---|---|
| Etienne fixed $I$ | 259 | 44.2 | 44.2 | 44.2 |
| Etienne approx | 342 | 53.7 | 30.8 | 33.9 |
| Etienne ‘exact’ | 235 ± 23 | 65.3 ± 5.9 | 31.5 ± 3.9 | 35.7 ± 3.9 |
| HDP approx | 231 ± 22 | 65.5 ± 5.9 | 31.6 ± 3.8 | 35.8 ± 3.9 |

Table 6: Neutral parameter estimates for samples from three local tree communities (Sherman, BCI and Cocoli) in the Panama Canal Zone using Etienne’s approaches and the hierarchical Dirichlet process approximation; $I_{\mathrm{BCI}}$, $I_C$ and $I_S$ are the immigration rates for BCI, Cocoli and Sherman, respectively. Standard errors are given for the methods where they are available.

References

1. Cannings C (1974) The latent roots of certain Markov chains arising in genetics: A new approach, I. Haploid models. Adv Appl Prob 6: 260–290.
2. Parsons TL, Quince C (2007) Fixation in haploid populations exhibiting density dependence II: The quasi-neutral case. Theor Popul Biol 72: 468–479.
3. Parsons TL, Quince C, Plotkin JB (2008) Expected times to absorption and fixation for quasi-neutral and neutral haploid populations exhibiting density dependence. Theor Popul Biol 74: 302–310.
4. Parsons TL, Quince C, Plotkin JB (2010) Some consequences of demographic stochasticity in population genetics. Genetics 185: 1345–1354.
5. Parsons TL (2012) Asymptotic Analysis of Some Stochastic Models from Population Dynamics and Population Genetics. Ph.D. thesis, University of Toronto.
6. Ethier SN, Kurtz TG (1986) Markov Processes: Characterization and Convergence. New York: John Wiley and Sons.
7. Hubbell SP (2001) The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press.
8. Möhle M (2001) Forward and backward diffusion approximations for haploid exchangeable population models. Stochastic Processes Appl 95: 133–149.
9. Sjödin P, Kaj I, Krone S, Lascoux M, Nordborg M (2005) On the meaning and existence of an effective population size. Genetics 169: 1061–1070.
10. Möhle M, Sagitov S (2003) Coalescent patterns in diploid exchangeable population models. J Math Biol 47: 337–352.
11. Ewens WJ (1972) The sampling theory of selectively neutral alleles. Theor Popul Biol 3: 87–112.
12. Kingman JFC (1982) On the genealogy of large populations. J Appl Prob 19A: 27–43.
13. Abramowitz M, Stegun IA (1964) Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover.
14. Teh YW, Jordan MI, Beal MJ, Blei DM (2006) Hierarchical Dirichlet processes. Journal of the American Statistical Association 101: 1566–1581.
15. Ethier SN, Kurtz TG (1981) The infinitely-many-neutral-alleles diffusion model. Advances in Applied Probability: 429–452.
16. Katzenberger GS (1991) Solutions of a stochastic differential equation forced onto a manifold by a large drift. Ann Probab 19: 1587–1628.
17. Escobar MD, West M (1995) Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association 90: 577–588.
18. Etienne RS (2007) A neutral sampling formula for multiple samples and an ‘exact’ test of neutrality. Ecology Letters 10: 608–618.
19. Etienne RS (2009) Improved estimation of neutral model parameters for multiple samples with different degrees of dispersal limitation. Ecology 90: 847–852.
20. Etienne RS (2009) Maximum likelihood estimation of neutral model parameters for multiple samples with different degrees of dispersal limitation. Journal of Theoretical Biology 257: 510–514.
21. Condit R, Pitman N, Leigh Jr E, Chave J, Terborgh J, et al. (2002) Beta-diversity in tropical forest trees. Science 295: 666–669.