Sampling distributions of alleles under models of neutral evolution.
![Page 1: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/1.jpg)
Sampling distributions of alleles under models of neutral evolution
![Page 2: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/2.jpg)
Plan
1. Genetic drift and mutation
2. Coalescent
3. Pairwise differences and numbers of segregating sites
4. Population with time-varying size
![Page 3: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/3.jpg)
Mathematical model for sampling distributions of alleles
Genetic drift
Mutation
![Page 4: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/4.jpg)
Genetic drift
Alleles: A1, A2
Replication = sampling with replacement
A1: becomes fixed
A2: becomes lost
[Figure: successive generations G1, G2, ..., Gn of a population carrying alleles A1 and A2]
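As a rough illustration of drift as repeated sampling with replacement, the following Python sketch follows the number of A1 copies from one generation to the next until A1 is either fixed or lost. The function name `wright_fisher`, its parameters, and the population size in the usage lines are assumptions made for this example; they do not come from the slides.

```python
import random


def wright_fisher(n_copies, start_a1, max_generations=100_000, seed=0):
    """Follow the number of A1 copies until A1 is fixed or lost.

    n_copies: number of gene copies per generation (hypothetical parameter).
    start_a1: number of A1 copies in generation G1.
    Returns (outcome, generation), where outcome is "fixed", "lost",
    or "segregating" if neither happened within max_generations.
    """
    rng = random.Random(seed)
    count = start_a1
    generation = 1
    while 0 < count < n_copies and generation < max_generations:
        freq = count / n_copies
        # Replication = sampling with replacement: every copy in the next
        # generation picks a random parent, so it is A1 with probability freq.
        count = sum(rng.random() < freq for _ in range(n_copies))
        generation += 1
    if count == 0:
        outcome = "lost"
    elif count == n_copies:
        outcome = "fixed"
    else:
        outcome = "segregating"
    return outcome, generation


# Usage: 20 gene copies, 5 of them A1 in generation G1.
outcome, generation = wright_fisher(n_copies=20, start_a1=5, seed=1)
print(outcome, "in generation", generation)
```

Repeating the run with many seeds should give fixation of A1 in roughly a fraction `start_a1 / n_copies` of the runs, which is the standard neutral result.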
![Page 5: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/5.jpg)
Mutation
[Figure: generations Gk and Gk+1]
Mutation introduces genetic variability to the evolution process
![Page 6: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/6.jpg)
Mutation
Mutation follows a Poisson process with intensity $\mu$, measured per locus (or per site) per generation. A mutation model is further specified by where mutations occur and what effects they have. The most commonly applied models are:
- infinite sites model, where each mutation is assumed to take place at a DNA site that has never mutated before;
- infinite alleles model, where each mutation produces an allele never present in the population before;
- recurrent mutation model, where multiple changes of the nucleotide at a site are possible;
- stepwise mutation model, where mutation acts bidirectionally, increasing or reducing the number of repeats of a fixed DNA motif.
![Page 7: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/7.jpg)
Infinite sites model
Mutation configuration in the infinite sites model is fully described by a map between the numbers (labels) of sequences and the numbers (labels) of mutations.
[Figure: sequences 1 to 5 (rows) against mutations 1 to 6 (columns)]
![Page 8: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/8.jpg)
Statistics of mutations (segregating sites)
![Page 9: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/9.jpg)
Number of segregating sites
[Figure: sequences 1 to 5 (rows) against mutations 1 to 6 (columns)]
S = 6
![Page 10: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/10.jpg)
Pairwise differences
[Figure: sequences 1 to 5 (rows) against mutations 1 to 6 (columns)]
Number of differences between sequences 2 and 3: $d_{23} = 3$
Average number of pairwise differences = 3
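The statistics above can be computed directly from a 0/1 table of sequences against mutations. The sketch below uses a hypothetical table `HAPLOTYPES`, made up for this example so that S, $d_{23}$ and the average agree with the values quoted on the slides; the actual arrangement in the original figure is not recoverable from the transcript, and the function names are likewise assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: rows = sequences 1..5, columns = mutations 1..6,
# 1 = the sequence carries the mutation. Not taken from the original figure.
HAPLOTYPES = [
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]


def segregating_sites(rows):
    """Number of columns in which the sample is not monomorphic."""
    return sum(1 for column in zip(*rows) if 0 < sum(column) < len(rows))


def pairwise_differences(rows):
    """Map (i, j), 1-based with i < j, to the number of differing sites."""
    return {
        (i + 1, j + 1): sum(a != b for a, b in zip(rows[i], rows[j]))
        for i, j in combinations(range(len(rows)), 2)
    }


diffs = pairwise_differences(HAPLOTYPES)
print(segregating_sites(HAPLOTYPES))        # 6
print(diffs[(2, 3)])                        # 3
print(sum(diffs.values()) / len(diffs))     # 3.0
# Histogram of pairwise differences: (number of differences, number of pairs)
print(sorted(Counter(diffs.values()).items()))
```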
![Page 11: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/11.jpg)
Histogram of pairwise differences
[Figure: histogram; x-axis: number of differences (0 to 6), y-axis: number of pairs (0 to 3)]
![Page 12: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/12.jpg)
Classes of mutations
[Figure: sequences 1 to 5 (rows) against mutations 1 to 6 (columns), with one mutation marked "Mutation of class 2"]
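A minimal sketch of how mutation classes can be tabulated, assuming that the class of a mutation is the number of sequences in the sample that carry it (the slide only labels an example "class 2", so this definition is an assumption). The table `HAPLOTYPES` is hypothetical and the function names are made up for the example.

```python
from collections import Counter

# Hypothetical data: rows = sequences 1..5, columns = mutations 1..6.
HAPLOTYPES = [
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]


def mutation_classes(rows):
    """Class of each mutation: how many sequences carry it (assumed definition)."""
    return [sum(column) for column in zip(*rows)]


def class_frequencies(rows):
    """Relative frequency of each mutation class among all mutations."""
    classes = mutation_classes(rows)
    counts = Counter(classes)
    return {c: counts[c] / len(classes) for c in sorted(counts)}


print(mutation_classes(HAPLOTYPES))    # [2, 2, 2, 1, 1, 1]
print(class_frequencies(HAPLOTYPES))   # {1: 0.5, 2: 0.5}
```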
![Page 13: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/13.jpg)
Histogram of classes of mutations
[Figure: histogram; x-axis: class of mutation (1, 2), y-axis: frequency (0 to 1)]
![Page 14: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/14.jpg)
Coalescence method
One looks at the past of an n-sample of sequences taken at present. The possible events in the past are coalescences, which lead to common ancestors of sequences, and mutations along branches of the ancestral tree.
![Page 15: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/15.jpg)
Coalescence method
[Figure: an n-sample taken at present and traced back into the past through generation 1 ($\tau = 1$), generation 2 ($\tau = 2$), ..., generation k ($\tau = k$); the population size is 2N in every generation]
![Page 16: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/16.jpg)
Coalescence – pairwise statistics
Two sequences. For each sequence, draw a parent at random in generation 1 ($\tau = 1$); then for each parent draw a (grand)parent at random in generation 2 ($\tau = 2$), and so on.
$\pi_2(i)$: probability that a common ancestor of the two sequences lived in generation $i$ ($\tau = i$)
![Page 17: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/17.jpg)
$$\pi_2(1) = \frac{1}{2N}$$
$$\pi_2(2) = \left(1 - \frac{1}{2N}\right)\frac{1}{2N}$$
$$\pi_2(k) = \left(1 - \frac{1}{2N}\right)^{k-1}\frac{1}{2N}$$
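The geometric form of the probability that two sequences first share a parent $k$ generations back, $(1 - 1/(2N))^{k-1} \cdot 1/(2N)$, can be checked numerically. The sketch below evaluates that formula and also simulates the parent-drawing procedure; the function names and the parameter `two_n` (standing for the population size 2N) are assumptions made for this example.

```python
import random


def coalescence_prob(k, two_n):
    """Probability that two sequences first share a parent k generations back."""
    p = 1.0 / two_n
    return (1.0 - p) ** (k - 1) * p


def simulate_pair(two_n, rng):
    """Draw a random parent for each lineage until both pick the same one."""
    generation = 0
    while True:
        generation += 1
        if rng.randrange(two_n) == rng.randrange(two_n):
            return generation


two_n = 20
rng = random.Random(0)
times = [simulate_pair(two_n, rng) for _ in range(4000)]
exact_mean = sum(k * coalescence_prob(k, two_n) for k in range(1, 2000))
print("simulated mean:", sum(times) / len(times))
print("mean from the formula:", round(exact_mean, 6))  # close to 2N = 20
```

Both means should be close to 2N generations, which is what motivates measuring time in units of 2N generations.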
![Page 18: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/18.jpg)
Coalescence – continuous time approximations
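Population time scale: 1 unit = 2N generations
$$t = \frac{\tau}{2N}, \qquad p_2(t) = e^{-t}$$
Mutational time scale: 1 unit = $1/(2\mu)$ generations
$$t = 2\mu\tau, \qquad p_2(t) = \frac{1}{\theta} e^{-t/\theta}, \qquad \theta = 4\mu N$$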
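A short sketch of why the geometric law in generations becomes exponential on both time scales, assuming $N$ is large and writing $\tau$ for time in generations, $\mu$ for the mutation intensity and $\theta = 4\mu N$. The tail of the geometric distribution is

$$\Pr[\tau > k] = \left(1 - \frac{1}{2N}\right)^{k}.$$

On the population time scale, $t = \tau / (2N)$:

$$\Pr\left[\frac{\tau}{2N} > t\right] = \left(1 - \frac{1}{2N}\right)^{2Nt} \approx e^{-t} \quad\Longrightarrow\quad p_2(t) = e^{-t}.$$

On the mutational time scale, $t = 2\mu\tau$:

$$\Pr[2\mu\tau > t] = \left(1 - \frac{1}{2N}\right)^{t/(2\mu)} \approx e^{-t/(4\mu N)} = e^{-t/\theta} \quad\Longrightarrow\quad p_2(t) = \frac{1}{\theta}\, e^{-t/\theta}.$$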
![Page 19: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/19.jpg)
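Coalescence n-sample
$s_k$: independent, exponentially distributed random variables
$\mu$: mutation intensity
$N$: population's effective size
$\theta = 4\mu N$: product parameter
$t = 2\mu\tau$: mutational time scale ($\tau$ is time in number of generations)
$$p(s_2, \ldots, s_n) = \prod_{k=2}^{n} \frac{\binom{k}{2}}{\theta} \exp\left(-\frac{\binom{k}{2}}{\theta}\, s_k\right)$$
$$p(s_k) = \frac{\binom{k}{2}}{\theta} \exp\left(-\frac{\binom{k}{2}}{\theta}\, s_k\right)$$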
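A minimal sketch of sampling from this model, assuming that $s_k$ is the time (on the mutational scale) during which the sample has $k$ ancestral lineages, so that $s_k$ is exponential with rate $\binom{k}{2}/\theta$. The function name `sample_coalescent_times` and the parameter values are hypothetical.

```python
import random
from math import comb


def sample_coalescent_times(n, theta, rng):
    """Return {k: s_k} for k = n, ..., 2, with s_k ~ Exp(rate = C(k, 2) / theta)."""
    return {k: rng.expovariate(comb(k, 2) / theta) for k in range(n, 1, -1)}


rng = random.Random(0)
n, theta = 5, 2.0
times = sample_coalescent_times(n, theta, rng)
for k, s_k in times.items():
    print(f"s_{k} = {s_k:.3f}")
print("time to the most recent common ancestor:", round(sum(times.values()), 3))
```

Averaged over many draws, the sum of the $s_k$ should approach $\theta \sum_{k=2}^{n} 1/\binom{k}{2} = 2\theta(1 - 1/n)$, which gives a quick sanity check of the sampler.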
![Page 20: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/20.jpg)
Coalescence method
The use of coalescence theory allows efficient formulation of appropriate models and gives a good basis for approaching model analysis problems, such as hypothesis testing or parameter estimation.
[Figure: ancestral tree of a sample of sequences 1 to 5, annotated with times t2, ..., t5 and intervals s2, ..., s5]
![Page 21: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/21.jpg)
Independence of metrics (coalescence times) and topology
Topologies of trees (with ordered branches) are all equally probable.
Metrics (distributions of branch lengths) of trees are determined by the coalescence process which, in turn, depends on population parameters.
![Page 22: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/22.jpg)
Coalescence – statistics of pairwise differences
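Assume the mutational time scale. Then mutations occur with intensity 1/2 along each lineage, so two sequences whose common ancestor lived at time $t$ are separated by a total branch length $2t$ and accumulate mutations with mean $t$. Let $A_2$ denote a $\mathbb{Z}_+$-valued random variable defined as the number of segregating sites between sequence 1 and sequence 2, and let $T$ be the random variable given by the coalescence time $t$. The conditional distribution of $A_2$ given $T = t$ is Poisson with $\lambda = t$:
$$P[A_2 = n \mid T = t] = e^{-t}\, \frac{t^n}{n!}$$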
![Page 23: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/23.jpg)
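$$P[A_2 = n] = \frac{1}{1+\theta} \left(\frac{\theta}{1+\theta}\right)^{n}$$
Probability generating function of $A_2$:
$$\psi_2(s) = \sum_{n=0}^{\infty} P[A_2 = n]\, s^n$$
$$\psi_2(s \mid T = t) = e^{t(s-1)}$$
$$\psi_2(s) = \frac{1}{1 - \theta(s-1)} = \frac{1}{1+\theta} \cdot \frac{1}{1 - \frac{\theta}{1+\theta}\, s}$$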
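The geometric law for the number of pairwise differences, $P[A_2 = n] = \frac{1}{1+\theta}\left(\frac{\theta}{1+\theta}\right)^n$, can be checked by simulation: draw a coalescence time $T$ from the exponential density with mean $\theta$, then draw a Poisson number of mutations with mean $T$. This is only a sketch; the function names, the value $\theta = 3$ and the number of replicates are assumptions made for the example.

```python
import random
from collections import Counter


def poisson_count(t, rng):
    """Number of events of a rate-1 Poisson process in the interval [0, t]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_pairwise(theta, reps, seed=0):
    """Empirical distribution of A2: T ~ Exp(mean theta), A2 | T ~ Poisson(T)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(reps):
        t = rng.expovariate(1.0 / theta)   # coalescence time, mutational scale
        counts[poisson_count(t, rng)] += 1
    return {n: c / reps for n, c in sorted(counts.items())}


def geometric_pmf(n, theta):
    """Closed form P[A2 = n] = (1 / (1 + theta)) * (theta / (1 + theta)) ** n."""
    return (1.0 / (1.0 + theta)) * (theta / (1.0 + theta)) ** n


theta = 3.0
estimate = simulate_pairwise(theta, reps=20000)
for n in range(5):
    print(n, round(estimate.get(n, 0.0), 4), round(geometric_pmf(n, theta), 4))
```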
![Page 24: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/24.jpg)
Coalescence – population with time varying size
![Page 25: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/25.jpg)
Population with time-varying size
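The population's effective size $N(t)$ changes in time, so the product parameter is also a function of time: $\theta(t) = 4\mu N(t)$.
Joint probability density function:
$$p(t_2, \ldots, t_n) = \prod_{k=2}^{n} \frac{\binom{k}{2}}{\theta(t_k)} \exp\left(-\int_{t_{k+1}}^{t_k} \frac{\binom{k}{2}}{\theta(\sigma)}\, d\sigma\right), \qquad 0 = t_{n+1} \le t_n \le \cdots \le t_3 \le t_2.$$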
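For a pair of sequences ($n = 2$) the joint density reduces to $p_2(t) = \frac{1}{\theta(t)} \exp\left(-\int_0^t \frac{d\sigma}{\theta(\sigma)}\right)$, and the distribution of pairwise differences follows by averaging the Poisson law with mean $t$ over this density. The sketch below does that integration numerically with the trapezoid rule. The function name, the grid settings and the example history `growth` (with time measured backward from the present) are all assumptions made for illustration, not the histories used on the slides.

```python
import math


def pairwise_difference_pmf(theta_of_t, max_diff, t_max=100.0, steps=100_000):
    """Approximate P[A2 = n] for n = 0..max_diff under a time-varying theta(t)."""
    dt = t_max / steps
    pmf = [0.0] * (max_diff + 1)
    prev_rate = 1.0 / theta_of_t(0.0)
    cumulative = 0.0                                # integral of 1/theta over [0, t]
    prev_terms = [prev_rate] + [0.0] * max_diff     # integrand values at t = 0
    for i in range(1, steps + 1):
        t = i * dt
        rate = 1.0 / theta_of_t(t)
        cumulative += 0.5 * (prev_rate + rate) * dt
        prev_rate = rate
        density = rate * math.exp(-cumulative)      # p_2(t)
        poisson = math.exp(-t)                      # P[A2 = 0 | T = t]
        for n in range(max_diff + 1):
            term = density * poisson
            pmf[n] += 0.5 * (prev_terms[n] + term) * dt
            prev_terms[n] = term
            poisson *= t / (n + 1)                  # P[A2 = n + 1 | T = t]
    return pmf


def constant(t):
    """Constant history, theta(t) = 3."""
    return 3.0


def growth(t):
    """Hypothetical history: theta shrinks exponentially going back in time."""
    return 10.0 * math.exp(-0.1 * t)


print([round(p, 4) for p in pairwise_difference_pmf(constant, 5)])
print([round(p, 4) for p in pairwise_difference_pmf(growth, 5)])
```

With a constant $\theta$ the result should reproduce the geometric distribution with parameter $\theta$, which is a convenient check before trying other histories.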
![Page 26: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/26.jpg)
How is the history of population size $N(t)$ (equivalently $\theta(t)$) encoded in histograms of pairwise differences and mutation classes?
![Page 27: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/27.jpg)
Pairwise differences
![Page 28: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/28.jpg)
Pairwise differences I
[Figure, two panels: $\theta(t)$ against time t (time 0 to 15, $\theta$ 0 to 7); frequency against number of differences (0 to 25 differences, frequency 0 to 0.18)]
![Page 29: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/29.jpg)
Pairwise differences II
[Figure, two panels: frequency against number of differences (0 to 30 differences, frequency 0 to 0.12); $\theta(t)$ against time t (time 0 to 30, $\theta$ 0 to 120)]
![Page 30: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/30.jpg)
Pairwise differences III
[Figure, two panels: frequency against number of differences (0 to 30 differences, frequency 0 to 0.14); $\theta(t)$ against time t (time 0 to 15, $\theta$ 0 to 250)]
![Page 31: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/31.jpg)
Mutation classes
Frequencies are computed under the assumption that mutation intensity is low
![Page 32: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/32.jpg)
Mutation classes I
N(t) = const
[Figure, two panels: N(t) against time t (time 0 to 15, N 0 to 7); frequency against SNP type (types 1 to 10, frequency 0 to 0.35)]
![Page 33: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/33.jpg)
Mutation classes II
$N(t) = N_0 \exp(rt)$, $N_0 r = 10$
[Figure, two panels: frequency against SNP type (types 1 to 10, frequency 0 to 0.7); N(t) against time t (time 0 to 15, N 0 to 250)]
![Page 34: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/34.jpg)
Mutation classes III
[Figure, two panels: frequency against SNP type (types 1 to 10, frequency 0 to 0.7); N(t) against time t (time 0 to 30, N 0 to 120)]
![Page 35: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/35.jpg)
Conclusions
Different histories of population sizes lead to different sampling distributions of alleles
Parametric models of different form (exponential, stepwise, logistic) can lead to similar (difficult to distinguish) distributions of alleles
Estimation of population size history from DNA data can be unstable
![Page 36: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/36.jpg)
Models versus data
Parametric and nonparametric estimation of population size histories from DNA samples
Testing hypotheses on values of parameters under parametric models; testing hypotheses of time-constant versus time-varying scenarios
![Page 37: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/37.jpg)
Models versus data
[Figure, two panels with axis ranges 0 to 20 by 0 to 450, and 0 to 30 by 0 to 0.12]
Data on worldwide distribution of mtDNA pairwise differences, R. Cann et al. 1987
Estimation of history of human population size
![Page 38: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/38.jpg)
Models versus data II
[Figure: histogram with x-axis 2 to 20 and y-axis 0 to 0.6]
Histogram of classes of mutations. Data on worldwide distribution of mtDNA pairwise differences, R. Cann et al. 1987
![Page 39: Sampling distributions of alleles under models of neutral evolution.](https://reader036.fdocuments.net/reader036/viewer/2022062515/56649cf35503460f949c12f2/html5/thumbnails/39.jpg)
Models versus data III
Data on types of 44 SNPs randomly located in the genome, Picoult, Newberg 2000
[Figure, two panels with axis ranges 0.5 to 1 by 0 to 8, and 0 to 5 by 0 to 1]
Parametric estimates of N(t) based on the above data