Background
The demographic events experienced by populations influence their genealogical history and
therefore the pattern of neutral polymorphism observable within and between extant populations
(see Figure 1). For example, the number of alleles shared between very closely related species
depends on the time at which the species split and whether gene flow occurred since the split (see
Figure 2). Thus, polymorphism data can be used to estimate the demographic parameters
describing the history of two incipient species (see Figure 3).
Here, we consider a simple model in which two populations split T generations ago and the
number of migrants exchanged between them is M per generation. Na, N1 and N2 are the effective
population sizes for the ancestral, first and second descendant populations, respectively. We
denote the set of parameters by Θ. Our goal is to estimate the posterior distribution of Θ given the
data D. Rather than using all the data to estimate these parameters, we summarize the data for each
locus by four statistics known to be sensitive to the parameters of interest (see Figure 1 for details).
Given a genealogy G, the probability of obtaining these statistics can be calculated explicitly. We
therefore take the following approach to obtain an estimate of the posterior distribution of the
parameters.
Specifically, we pick a set of parameters independently from prior distributions, then simulate a
genealogical history for each locus and calculate u = p(D|G,Θ). We then weight the values of the
parameters by u to obtain an estimate of their posterior probability.
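The weighting scheme described above can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions, not the method's implementation: the model has a single hypothetical parameter `theta` (a scaled mutation rate) with a made-up uniform prior, the "genealogy" is the total branch length of a two-sequence coalescent tree, and the data at each locus are a single count of segregating sites. All function names are invented for this example.

```python
import math
import random


def poisson_pmf(k, lam):
    """Probability of k events under a Poisson distribution with mean lam."""
    if lam <= 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def sample_prior(rng):
    # Assumption: a uniform prior on one scaled mutation rate.
    return {"theta": rng.uniform(0.1, 20.0)}


def simulate_tree_length(params, rng):
    # Toy stand-in for a coalescent simulation: two sequences coalesce
    # after an Exp(1) waiting time, so the tree has two branches of that length.
    return 2.0 * rng.expovariate(1.0)


def likelihood_given_genealogy(s_obs, tree_length, params):
    # u = p(D | G, theta): mutations fall on the tree as a Poisson process.
    return poisson_pmf(s_obs, params["theta"] / 2.0 * tree_length)


def weighted_posterior(data, n_draws, seed=0):
    """Draw parameters from the prior and weight each draw by u."""
    rng = random.Random(seed)
    draws, weights = [], []
    for _ in range(n_draws):
        params = sample_prior(rng)
        u = 1.0
        for s_obs in data:  # loci are treated as independent
            tree_length = simulate_tree_length(params, rng)
            u *= likelihood_given_genealogy(s_obs, tree_length, params)
        draws.append(params)
        weights.append(u)
    return draws, weights


def posterior_mean(draws, weights, name):
    """Weighted average of one parameter over all prior draws."""
    total = sum(weights)
    return sum(w * d[name] for d, w in zip(draws, weights)) / total


if __name__ == "__main__":
    observed = [3, 5, 4, 6, 2]  # made-up segregating-site counts at five loci
    draws, weights = weighted_posterior(observed, n_draws=5000, seed=1)
    print(posterior_mean(draws, weights, "theta"))
```

Because each draw uses a single simulated genealogy per locus, the weights are noisy; many prior draws are needed before the weighted summaries stabilize.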
Keystone Symposia. Genome Sequence Variation. Jan 08 – Jan 13, 2006
Estimating divergence times and testing for migration using multi-locus polymorphism data
Céline Becquet (a), Andrea S. Putnam (b), Peter Andolfatto (b), and Molly Przeworski (a)
(a) Dept. of Human Genetics, Chicago, IL, USA, 60637; (b) University of California at San Diego, La Jolla, CA, 92093
Table 1 – Comparison between methods
Paper | Speed | Advantages | Drawbacks
Wakeley & Hey 1997 | Fast | Method of moments estimator; allows for recombination; multiple loci; can use genotype data | Summary of data; low accuracy; Ss & Sf > 0 required; model with no migration
Nielsen & Wakeley 2001 | Slow | Uses all the data; allows for uncertainty in nuisance parameters; more general model (allows for migration) | Uses one locus; recombination not allowed; haplotype data required
Hey & Nielsen 2004 | Slow | Same as Nielsen & Wakeley 2001; multiple loci | Recombination not allowed; haplotype data required
Leman et al. 2005 | Fast | Approximate Bayesian Computation approach; no need for Ss & Sf > 0; can use genotype data | Summary of data; uses one locus; recombination not allowed; model with no migration
Our method | Slow | Same as Leman et al. 2005; more general model; allows for recombination; multiple loci | Summary of data
An example of polymorphism data at a locus in
three sequences sampled from each of two
populations. The horizontal lines represent aligned
sequences; the colored squares, disc and ovals
stand for segregating sites. We use the following
summaries of the polymorphism data at each locus:
the number of segregating sites specific to sample
one (S1), specific to sample two (S2), shared
between samples from both populations (Sshared) and
fixed in either population sample (Sfixed).
In this example (sequences 1, 2, 3 from population one; a, b, c from population two):
S1=1, S2=2, Sshared=1, Sfixed=1.
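As an illustration of how the four summaries could be counted from aligned sequences, here is a minimal Python sketch. The function name and the example alignment are invented for this illustration (the sequences are not those drawn in the figure), and the classification assumes biallelic sites with no missing data.

```python
def summary_statistics(sample_one, sample_two):
    """Count S1, S2, Sshared and Sfixed from two lists of aligned sequences.

    Assumes all sequences have equal length, sites are biallelic,
    and there are no gaps or missing data.
    """
    counts = {"S1": 0, "S2": 0, "Sshared": 0, "Sfixed": 0}
    n_sites = len(sample_one[0])
    for site in range(n_sites):
        alleles_one = {seq[site] for seq in sample_one}
        alleles_two = {seq[site] for seq in sample_two}
        poly_one = len(alleles_one) > 1
        poly_two = len(alleles_two) > 1
        if poly_one and poly_two:
            counts["Sshared"] += 1   # segregating in both samples
        elif poly_one:
            counts["S1"] += 1        # segregating in sample one only
        elif poly_two:
            counts["S2"] += 1        # segregating in sample two only
        elif alleles_one != alleles_two:
            counts["Sfixed"] += 1    # each sample fixed for a different allele
    return counts


if __name__ == "__main__":
    # Made-up alignment chosen to give the same counts as the figure example.
    population_one = ["ACGAC", "ACGTC", "TCGAC"]
    population_two = ["ACGTT", "AGGAT", "ACAAT"]
    print(summary_statistics(population_one, population_two))
    # {'S1': 1, 'S2': 2, 'Sshared': 1, 'Sfixed': 1}
```

Because only the set of alleles at each site is used, the same counts can be obtained from unphased genotype data.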
Fig-2. Effects of divergence time and migration on polymorphism data
Examples of genealogical histories for three sequences sampled from each of two closely
related populations, under different models. The patterns of polymorphism and divergence
expected under each model are indicated below. For simplicity, we present a single genealogy, but
for recombining loci, there may be many histories within a single region (i.e. there is an ancestral
recombination graph, rather than a tree). The vertical branches represent ancestral lineages for the
six sequences; they are colored according to whether a mutation would lead to a fixed, shared or
unique polymorphism in the sample (see Figure 1). In c, gene flow occurred (yellow line), thus
sequence 3 was sampled in population one but its ancestor came from population two.
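Posterior distribution of the parameters Θ given the data D, integrating over genealogies G:
$$f(\Theta \mid D) \propto \int_G p(D \mid G, \Theta)\, p(G \mid \Theta)\, \pi(\Theta)\, dG$$
Here p(D|G,Θ) is calculated explicitly, p(G|Θ) is estimated from coalescent simulations, and π(Θ) denotes the prior distributions on the parameters.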
Future directions
Our current method is relatively slow when using data from multiple loci because it searches a huge space of possible histories and
parameters. We would like to speed up the method and extend it to more complex models. To do so, we will need to account for two sources of variance: in the genealogies and in the parameters. We therefore plan to generate many genealogies for the same set of parameters, in order to improve the accuracy of our estimate of p(D|Θ), and to use Markov Chain Monte Carlo to better explore the parameter space.
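One way to write the planned improvement, as a sketch only: for a fixed set of parameters Θ, the likelihood p(D|Θ) could be approximated by averaging p(D|G,Θ) over K genealogies simulated independently under Θ. The symbol K and the plain averaging form are assumptions made for this illustration.

```latex
\hat{p}(D \mid \Theta) \;=\; \frac{1}{K} \sum_{k=1}^{K} p(D \mid G_k, \Theta),
\qquad G_k \sim p(G \mid \Theta), \quad k = 1, \dots, K
```

With K = 1 this reduces to the single weight u = p(D|G,Θ) used in the current approach; larger K reduces the variance contributed by the genealogies at the cost of more simulations per parameter set.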
Abstract
Population divergence times are of interest in many contexts, from human genetics to
conservation biology. These times can be estimated from polymorphism data. However, existing
approaches make a number of assumptions (e.g., no recombination within loci or no migration
since the split) that limit their applicability. To overcome these limitations, we developed an
Approximate Bayesian Computation approach to estimate population parameters for a simple split
model, allowing for migration as well as intralocus recombination. Application to simulated data
suggests that the approach provides fairly accurate estimates of population sizes and divergence
times and has high power to detect migration since the split. We illustrate the potential of the
method by applying it to polymorphism data from five highly recombining loci surveyed in two
closely related species of Lepidoptera (Papilio glaucus and P. canadensis).
Fig-1. Summary statistics used for estimation
Fig-2 panels (axis labels in each panel: sequences 1, 2, 3 and a, b, c; divergence time T; population sizes N1, N2 and Na; migration M in panel c):
a. A gene genealogy for a recent divergence time without migration: excess of shared polymorphisms (occurring along the red branch) and few fixed sites (purple branch).
b. A gene genealogy for an old divergence time without migration: few shared polymorphisms (none here) and an excess of fixed sites.
c. A gene genealogy for an old divergence time with migration: excess of shared polymorphisms and few fixed sites.
Application to two Papilio species
Fig-3. Performance on a small simulated data set
Mean of the divergence time (a) and the ratio of ancestral to current population size (b). The
estimates are based on polymorphism data from ten simulated loci of 1 kb, generated with: a
sample size of 20 individuals from each population, the population mutation rates θ1=θ2=θa=.001,
T=5×10^4 generations and M=5. Each vertical line refers to a data set (Y-axis), the red line
indicates the true value and the X-axis range corresponds to the range of the prior distribution. As
can be seen, the divergence times tend to be over-estimated, while the ancestral population size
estimates are more accurate.
Fig-4. Ranges of P. glaucus and P. canadensis. A narrow hybrid zone forms where the ranges meet. The female
mimetic morph of P. glaucus is shown with yellow morphs.
We applied our method to data from five highly
recombining loci sampled in two species of Lepidoptera
(Papilio glaucus and P. canadensis). These two species
are known to exchange migrants and experience high
levels of recombination. In order to examine the sensitivity
to assumptions about migration, we compared the
parameter estimates obtained in models with and without
gene flow: the time of divergence appears to be under-
estimated and ancestral population size over-estimated
when migration is ignored (see Table 2).
Table 2 - Effect of model on estimation
Estimator | Model | N1 | N2 | Na | T (in generations) | Migration rate*
Mean | Migration | 2.65E+05 | 1.80E+05 | 1.72E+05 | 4.38E+05 | 0.154
Mean | No migration | 2.59E+05 | 1.75E+05 | 6.25E+05 | 1.46E+05 |
Median | Migration | 2.55E+05 | 1.66E+05 | 1.51E+05 | 4.45E+05 | 0.148
Median | No migration | 2.52E+05 | 1.62E+05 | 5.84E+05 | 1.43E+05 |
* The posterior probability of migration is > .999, while the prior probability is only .5. Thus, there is strong support for gene flow, as expected.
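The strength of this support can be expressed as a ratio of odds. The short calculation below is a sketch that assumes the models with and without migration are the only two alternatives, as the prior probability of .5 suggests; the symbols M > 0 and M = 0 are shorthand introduced here for those two models.

```latex
\text{prior odds} = \frac{0.5}{0.5} = 1, \qquad
\text{posterior odds} = \frac{P(M > 0 \mid D)}{P(M = 0 \mid D)} > \frac{0.999}{0.001} = 999, \qquad
\frac{\text{posterior odds}}{\text{prior odds}} > 999
```

Under that assumption, the data shift the odds in favor of gene flow by a factor of more than 999.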
[Fig-3 panels: a, X-axis T from 0 to 6×10^5 generations; b, X-axis Na/N1 from 0 to 6; Y-axis in both panels, data sets 0 to 11]